Scientific publications

GABAC: an arithmetic coding solution for genomic data

1 Apr 2020 | Journal: Bioinformatics

Jan Voges 1, Tom Paridaens 2, Fabian Müntefering 1, Liudmila S Mainzer 3,4, Brian Bliss 3, Mingyu Yang 5, Idoia Ochoa 5, Jan Fostier 2, Jörn Ostermann 1, Mikel Hernaez 4


Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard.

This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.
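To make the three named ingredients concrete, here is a minimal Python sketch of a transformation (difference coding), a binarization (truncated unary) and an adaptive binary context model. It is not GABAC's implementation or API: every function name and the sample values are hypothetical, and in place of a real arithmetic coder it only sums the ideal cost of -log2(p) bits per bin, which is the code length an arithmetic coder approaches.

```python
import math

def delta_transform(values):
    """Transformation: replace each value by its difference from the previous one."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def inverse_delta(deltas):
    """Undo delta_transform by accumulating the differences."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out

def zigzag(n):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1

def truncated_unary(n, max_value):
    """Binarization: n ones, then a terminating zero unless n == max_value."""
    assert 0 <= n <= max_value
    bins = [1] * n
    if n < max_value:
        bins.append(0)
    return bins

class AdaptiveBitModel:
    """Context model: running counts of zeros and ones, both starting at 1."""
    def __init__(self):
        self.counts = [1, 1]

    def cost_and_update(self, bit):
        # Probability of this bit under the current counts, then adapt.
        p = self.counts[bit] / sum(self.counts)
        self.counts[bit] += 1
        return -math.log2(p)

def estimate_bits(symbols, max_value):
    """Ideal code length in bits, using one context per bin position."""
    contexts = [AdaptiveBitModel() for _ in range(max_value)]
    total = 0.0
    for s in symbols:
        for position, bit in enumerate(truncated_unary(s, max_value)):
            total += contexts[position].cost_and_update(bit)
    return total

# Made-up, slowly varying values (an assumption, not data from the paper).
values = [3, 3, 4, 4, 4, 5, 4, 4, 3, 3, 3, 3, 4, 4, 5, 5]
symbols = [zigzag(d) for d in delta_transform(values)]
print(f"estimated size: {estimate_bits(symbols, max_value=8):.1f} bits")
```

The sketch keeps one context per bin position so that the frequent "difference is zero" decision gets its own probability estimate. How a real codec selects contexts, binarizations and transformations is a configuration matter that this example does not attempt to reproduce.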

Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.

Availability and implementation: The GABAC library is written in C++. We also provide a command-line application that exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac.

Supplementary information: Supplementary data are available at Bioinformatics online.

ARTICLE CITATION: Bioinformatics. 2020 Apr 1;36(7):2275-2277. doi: 10.1093/bioinformatics/btz922