%SUMMARY
%- ABSTRACT
%- INTRODUCTION
%# BASICS
%- \acs{DNA} STRUCTURE
%- DATA TYPES
%  - BAM/FASTQ
%  - NON STANDARD
%- COMPRESSION APPROACHES
%  - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
%  - HUFFMAN ENCODING
%  - PROBABILITY APPROACHES (WITH BASE?)
%
%# COMPARING TOOLS
%-
%# POSSIBLE IMPROVEMENT
%- \acs{DNA}S STOCHASTICAL ATTRIBUTES
%- IMPACT ON COMPRESSION

\chapter{Compression approaches}
The goal of data compression is to generate an output that is smaller than the input data. In many use cases, such as genome compression, the compression should ideally be lossless. This means that decompressing the compressed data always recovers the full information that was contained in the original data. Lossy compression, on the other hand, may discard parts of the data during compression in order to increase the compression ratio. The discarded parts are typically not required to convey the original information. This works well for certain audio and image formats, or for network protocols that transmit live video or audio streams.

For \acs{DNA}, lossless compression is required. To be precise, lossy compression is not an option, because there is no unnecessary data: every nucleotide and its position is needed for the sequenced \acs{DNA} to be complete.

% list of algos and the tools that use them
The well-known Huffman coding is used in several tools for genome compression (officially documented for genomic squeeze; unofficially reported for GDC and GRS). Furthermore, \ac{ANS} or rANS ... TBD.

\section{Huffman encoding}

\section{Probability approaches}
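Both Huffman coding and probability approaches such as arithmetic coding or \ac{ANS} derive their code lengths from a probability model of the input symbols. The following minimal sketch (Python; the nucleotide probabilities are purely illustrative assumptions and do not stem from any particular tool or dataset) builds a Huffman code for the four nucleotides and compares its average code length with the Shannon entropy of the model, the lower bound that probability approaches can get closer to because they are not restricted to an integer number of bits per symbol.

\begin{verbatim}
# Minimal sketch: build a Huffman code for the four nucleotides from
# assumed, illustrative probabilities and compare the resulting average
# code length with the Shannon entropy of the same model.
import heapq
import math

# Hypothetical nucleotide probabilities; in practice these would be
# estimated from the sequence that is being compressed.
probs = {"A": 0.40, "C": 0.10, "G": 0.15, "T": 0.35}

# Build the Huffman tree with a min-heap of (probability, tiebreaker, node).
heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    p1, _, left = heapq.heappop(heap)
    p2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (p1 + p2, counter, (left, right)))
    counter += 1

# Walk the tree and assign bit strings: left child -> "0", right child -> "1".
codes = {}
def assign(node, prefix=""):
    if isinstance(node, str):          # leaf: a nucleotide symbol
        codes[node] = prefix or "0"
    else:
        assign(node[0], prefix + "0")
        assign(node[1], prefix + "1")

assign(heap[0][2])

avg_len = sum(probs[s] * len(codes[s]) for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
print(codes)                           # the resulting prefix-free code table
print(f"average code length: {avg_len:.3f} bits/symbol")
print(f"entropy lower bound: {entropy:.3f} bits/symbol")
\end{verbatim}

With the assumed probabilities, the Huffman code needs about 1.85 bits per nucleotide while the entropy of the model is about 1.80 bits; this gap, caused by rounding code lengths to whole bits, is what probability approaches such as arithmetic coding or \ac{ANS} aim to close.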