%SUMMARY
%- ABSTRACT
%- INTRODUCTION
%# BASICS
%- \acs{DNA} STRUCTURE
%- DATA TYPES
% - BAM/FASTQ
% - NON STANDARD
%- COMPRESSION APPROACHES
% - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
% - HUFFMAN ENCODING
% - PROBABILITY APPROACHES (WITH BASE?)
%
%# COMPARING TOOLS
%-
%# POSSIBLE IMPROVEMENT
%- \acs{DNA}S STOCHASTICAL ATTRIBUTES
%- IMPACT ON COMPRESSION

\chapter{Structure Of Biological Data}
To clarify how and where biological information is stored, this chapter starts with a brief, general rundown of the structure of living organisms.
% todo add picture

All living organisms, such as plants and animals, are made of cells (the human body is estimated to consist of about $3.7 \times 10^{13}$ cells).
% human body estimated 3.72 x 10^13 cells https://www.tandfonline.com/doi/full/10.3109/03014460.2013.807878
A cell is the smallest unit that is itself considered alive.
In plants and animals, each cell contains an inner compartment, the nucleus, which holds the chromosomes.
The chromosomes carry the genetic information in the form of \acs{DNA}.

\section{DNA}
\ac{DNA} is usually depicted as a double helix.
As the name suggests, a double helix consists of two single strands wound around each other.
Each strand has two main components: the sugar-phosphate backbone, which is irrelevant for this paper, and the bases.
The arrangement of the bases represents the information stored in the \acs{DNA}.
A base is an organic molecule; together with its segment of the backbone it forms a nucleotide.
%Nucleotides have special attributes and influence other Nucleotides in the \acs{DNA} Sequence
% describe Genomes?

\section{Nucleotides}
For this paper, nucleotides are the most important part of the \acs{DNA}.
A nucleotide can take one of four forms, named after the base it carries: adenine, thymine, guanine, or cytosine.
Each of them has a counterpart on the opposite strand: adenine bonds only with thymine, and guanine bonds only with cytosine.
This means that, given the content of one strand, the other can be derived by ``inverting'' the first, that is, by replacing each nucleotide with its counterpart.
For example, the counterpart of adenine, guanine, adenine is thymine, cytosine, thymine.
For the sake of simplicity, the full name of each nucleotide is usually not written out; only its initial is used: AGA on one strand, TCT on the other.
% if there is only one section -> remove it or move everything into introduction
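
The pairing rule can be illustrated with a minimal Python sketch. The function name \texttt{complement} and the lookup table are hypothetical and serve illustration only; the sketch assumes that a strand is given as a string containing only the four initials A, T, G, and C.

```python
# Pairing rule from the text: A <-> T and G <-> C.
# COMPLEMENT and complement() are illustrative names, not part of any library.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(strand: str) -> str:
    """Return the position-wise counterpart of a DNA strand."""
    strand = strand.upper()
    unknown = set(strand) - set("ATGC")
    if unknown:
        # Assumption: anything besides A, T, G, C is treated as an error.
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return strand.translate(COMPLEMENT)


print(complement("AGA"))  # TCT
```

Applying the function twice returns the original strand, which reflects that either strand is sufficient to reconstruct the other.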