%SUMMARY
%- ABSTRACT
%- INTRODUCTION
%# BASICS
%- \acs{DNA} STRUCTURE
%- DATA TYPES
% - BAM/FASTQ
% - NON-STANDARD
%- COMPRESSION APPROACHES
% - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
% - HUFFMAN ENCODING
% - PROBABILITY APPROACHES (WITH BASE?)
%
%# COMPARING TOOLS
%-
%# POSSIBLE IMPROVEMENT
%- STOCHASTIC ATTRIBUTES OF \acs{DNA}
%- IMPACT ON COMPRESSION
\chapter{Structure of Biological Data}
To clarify how and where biological information is stored, this chapter begins with a brief, general overview of the structure of living organisms.

% todo add picture
All living organisms, such as plants and animals, are made of cells; a human body, for instance, consists of several trillion cells \cite{cells}.
A cell is itself a living organism, in fact the smallest unit of life. In plant and animal cells, an inner compartment called the nucleus is separated from the rest of the cell. The nucleus contains chromosomes, which hold the genetic information in the form of \ac{DNA}.
% nucleosome and histone?

\section{DNA}
\ac{DNA} usually takes the form of a double helix. As the name suggests, a double helix consists of two single helices that are wound around each other (see Figure~\ref{k2:dna-struct}).
\begin{figure}[ht]
\centering
\includegraphics[width=15cm]{k2/dna.png}
\caption{A purely diagrammatic figure of the components \ac{DNA} is made of. The smaller, inner rods symbolize the links between paired nucleotides and the outer ribbons the phosphate-sugar chains \cite{dna_structure}.}
\label{k2:dna-struct}
\end{figure}
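Each of the two helices consists of two main components: the sugar-phosphate backbone, which is irrelevant for this work, and the bases. The order in which the bases are arranged encodes the information stored in the \ac{DNA}. A base is an organic molecule; together with its sugar and phosphate group it forms a building block called a nucleotide \cite{dna_structure}. Since only the base distinguishes one nucleotide from another, this work uses both terms interchangeably. %Nucleotides have special attributes and influence other Nucleotides in the \acs{DNA} Sequence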
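The following back-of-the-envelope calculation roughly illustrates how much information such an arrangement can carry. It is not taken from the cited source and assumes that every position of a helix independently holds one of the four bases introduced in the next section. A helix of $n$ bases then has $4^n$ possible arrangements, and a fixed-length binary code needs
\[
  \log_2\left(4^n\right) = n \log_2 4 = 2n
\]
bits to distinguish all of them, i.e.\ two bits per base. A sequence of $1000$ bases, for example, fits into $2000$ bits, or $250$ bytes. Real sequences are not uniformly random, so $2n$ bits is only an upper bound on their information content.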
% describe Genomes?
\section{Nucleotides}
For this work, nucleotides are the most important parts of the \acs{DNA}. A nucleotide can take one of four forms: adenine, thymine, guanine or cytosine. Each of them has a counterpart with which it can form a bond: adenine pairs with thymine, and guanine pairs with cytosine. For anyone who wishes to store this information, this means that the content of one helix can be determined by ``inverting'' (complementing) the other one. In other words, only the nucleotides of one entire helix need to be stored physically to preserve the information of the whole \ac{DNA}. For example, the counterpart of the chain \texttt{adenine, guanine, adenine} is the chain \texttt{thymine, cytosine, thymine}. For the sake of simplicity, each nucleotide is usually written as its initial only: \texttt{AGA} in one helix, \texttt{TCT} in the other.
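The complement rule can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited sources. The names \texttt{PAIRS} and \texttt{complement} are hypothetical and were chosen only for this example. The sketch stores one helix as a string of initials and reconstructs the other one on demand:

\begin{verbatim}
# Hypothetical illustration: base-pairing partners as described in the text
# (adenine <-> thymine, guanine <-> cytosine).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner helix, pairing bases position by position.

    Raises KeyError for any symbol other than A, C, G or T.
    """
    return "".join(PAIRS[base] for base in strand.upper())


# Only one helix is stored; the other one is derived when needed.
stored = "AGA"
print(complement(stored))  # prints TCT

# Complementing twice restores the stored helix, so no information is lost.
assert complement(complement(stored)) == stored
\end{verbatim}

The function pairs the bases position by position, which matches the \texttt{AGA}/\texttt{TCT} example above. The two helices run in opposite directions, so reading the partner helix in its own direction would also require reversing the result. The stored information is the same either way.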