3 years ago · 0b5889e5a7
--- a/latex/result/thesis.pdf
+++ b/latex/result/thesis.pdf
--- a/latex/tex/kapitel/abkuerzungen.tex
+++ b/latex/tex/kapitel/abkuerzungen.tex
@@ -6,4 +6,5 @@
 
				 %          sort. This does not happen automatically.
			
 
				 \begin{acronym}[IEEE]
			
 
				   \acro{DNA}{Deoxyribonucleic acid}
			
 
				+  \acro{ANS}{Asymmetric numeral systems}
			
 
				 \end{acronym}
			
--- a/latex/tex/kapitel/k2_dna_structure.tex
+++ b/latex/tex/kapitel/k2_dna_structure.tex
@@ -17,8 +17,16 @@
 
				 %- \acs{DNA}S STOCHASTICAL ATTRIBUTES 
			
 
				 %- IMPACT ON COMPRESSION
			
 
				 
			
 
				-\chapter{DNA Structure}
			
 
				-\ac{DNA} is well known in the form of a double helix. A double helix consists, as the name suggests, of two single helices. Each of them consists of two main components: the sugar-phosphate backbone, which is irrelevant for this paper, and the bases. The arrangement of the bases represents the information stored in the \acs{DNA}. A base is an organic molecule; in this paper, the bases are referred to as nucleotides. %Nucleotides have special attributes and influence other Nucleotides in the \acs{DNA} Sequence
			
 
				+\chapter{Structure Of Biological Data}
			
 
				+To clarify how and where biological information is stored, this chapter starts with a brief, general overview of the structure of living organisms.
			
 
				+% todo add picture
			
 
				+All living organisms, such as plants and animals, are made of cells (a human body is estimated to consist of about $3.72 \times 10^{13}$ cells).
			
 
				+% human body estimated 3.72 x 10^13 cells https://www.tandfonline.com/doi/full/10.3109/03014460.2013.807878
			
 
				+A cell is itself a living unit, the smallest one possible. Simplified, a cell can be divided into two regions; the inner one, called the nucleus, contains the chromosomes. The chromosomes hold the genetic information in the form of \acs{DNA}.
			
 
				+ 
			
 
				+\section{DNA}
			
 
				+\ac{DNA} is often depicted as a double helix. A double helix consists, as the name suggests, of two single helices. Each of them consists of two main components: the sugar-phosphate backbone, which is irrelevant for this paper, and the bases. The arrangement of the bases represents the information stored in the \acs{DNA}. A base is an organic molecule; in this paper, the bases are referred to as nucleotides. %Nucleotides have special attributes and influence other Nucleotides in the \acs{DNA} Sequence
			
 
				+% describe Genomes?
			
 
				 
			
 
				 \section{Nucleotides}
			
 
				 For this paper, nucleotides are the most important part of the \acs{DNA}. A nucleotide can take one of four forms: adenine, thymine, guanine or cytosine. Each of them has a counterpart on the opposite helix; more explicitly, adenine can only bond with thymine, and guanine can only bond with cytosine. This means that, given the content of one helix, the other can be determined by ``inverting'' the first. For example, the counterpart of adenine, guanine, adenine is thymine, cytosine, thymine. For the sake of simplicity, the full name of each nucleotide is not written out; only its initial is used: AGA in one helix, TCT in the other.
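				+
				+The pairing rule described above can also be written as a mapping. The symbol $c$ is notation introduced here purely for illustration and is not taken from any particular source:
				+\[
				+  c(\mathrm{A}) = \mathrm{T}, \quad c(\mathrm{T}) = \mathrm{A}, \quad c(\mathrm{G}) = \mathrm{C}, \quad c(\mathrm{C}) = \mathrm{G}.
				+\]
				+Applied position by position, the example sequence AGA yields $c(\mathrm{A})\,c(\mathrm{G})\,c(\mathrm{A}) = \mathrm{TCT}$.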
			
--- a/latex/tex/kapitel/k_algorithms
+++ b/latex/tex/kapitel/k_algorithms
@@ -1,27 +0,0 @@
 
				-%SUMMARY
			
 
				-%- ABSTRACT
			
 
				-%- INTRODUCTION
			
 
				-%# BASICS
			
 
				-%- \acs{DNA} STRUCTURE
			
 
				-%- DATA TYPES
			
 
				-% - BAM/FASTQ
			
 
				-% - NON STANDARD
			
 
				-%- COMPRESSION APPROACHES
			
 
				-% - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
			
 
				-% - HUFFMAN ENCODING
			
 
				-% - PROBABILITY APPROACHES (WITH BASE?)
			
 
				-%
			
 
				-%# COMPARING TOOLS
			
 
				-%- 
			
 
				-%# POSSIBLE IMPROVEMENT
			
 
				-%- \acs{DNA}S STOCHASTICAL ATTRIBUTES 
			
 
				-%- IMPACT ON COMPRESSION
			
 
				-
			
 
				-\section{Compression approaches}
			
 
				-Several algorithms for data compression have proven efficient over the last decades. The well-known Huffman coding is used in several tools for genome compression (genomic squeeze <- official | unofficial -> GDC, GRS).
			
 
				-
			
 
				-further algos
			
 
				-- (r)ANS Asymmetric numeral systems
			
 
				-- Arithmetic encoding
			
 
				-
			
 
				-
			
--- a/latex/tex/kapitel/k_algorithms.tex
+++ b/latex/tex/kapitel/k_algorithms.tex
@@ -0,0 +1,29 @@
 
				+%SUMMARY
			
 
				+%- ABSTRACT
			
 
				+%- INTRODUCTION
			
 
				+%# BASICS
			
 
				+%- \acs{DNA} STRUCTURE
			
 
				+%- DATA TYPES
			
 
				+% - BAM/FASTQ
			
 
				+% - NON STANDARD
			
 
				+%- COMPRESSION APPROACHES
			
 
				+% - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
			
 
				+% - HUFFMAN ENCODING
			
 
				+% - PROBABILITY APPROACHES (WITH BASE?)
			
 
				+%
			
 
				+%# COMPARING TOOLS
			
 
				+%- 
			
 
				+%# POSSIBLE IMPROVEMENT
			
 
				+%- \acs{DNA}S STOCHASTICAL ATTRIBUTES 
			
 
				+%- IMPACT ON COMPRESSION
			
 
				+
			
 
				+\chapter{Compression Approaches}
			
 
				+Data compression aims to produce an output that is smaller than its input. In many cases, as in genome compression, the compression is ideally lossless. This means that decompressing any compressed data recovers the full information that was available in the original data. Lossy compression, on the other hand, may discard parts of the data during compression in order to increase the compression ratio. The discarded parts are typically not needed to convey the original information. This works for certain audio and image files, or for network protocols that transmit live video or audio streams.
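				+
				+Stated more formally, and assuming a compression function $C$ and a decompression function $D$ (both names are chosen here only for illustration), lossless compression requires
				+\[
				+  D(C(x)) = x \quad \mbox{for every input } x,
				+\]
				+whereas lossy compression only guarantees an approximation, $D(C(x)) \approx x$.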
			
 
				+For \acs{DNA}, lossless compression is required. To be precise, lossy compression is not an option because there is no unnecessary data: every nucleotide and its position are needed for the sequenced \acs{DNA} to be complete.
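				+
				+As a rough illustration of what a lossless encoding can look like, assume that a sequence contains only the four symbols A, C, G and T and would otherwise be stored as text with 8 bits per character (both are simplifying assumptions made here). A fixed-length code then needs
				+\[
				+  \lceil \log_2 4 \rceil = 2
				+\]
				+bits per nucleotide, so a sequence of $n$ nucleotides occupies $2n$ instead of $8n$ bits, without discarding any nucleotide or its position.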
			
 
				+
			
 
				+% list of algos and the tools that use them
			
 
				+The well-known Huffman coding is used in several tools for genome compression (genomic squeeze <- official | unofficial -> GDC, GRS). Further: \ac{ANS} or rANS ... TBD.
			
 
				+
			
 
				+\subsection{Huffman encoding}
			
 
				+
			
 
				+\section{Probability approaches}
			
--- a/latex/tex/kapitel/k_datatypes.tex
+++ b/latex/tex/kapitel/k_datatypes.tex
@@ -25,7 +25,7 @@ To optimize this, people have developed other file types that focus on storing
 
				 \begin{itemize}
			
 
				   \item{FASTQ}
			
 
				   \item{SAM/BAM}
			
 
				-  \item{...}
			
 
				+  %\item{...}
			
 
				 \end{itemize}
			
 
				 
			
 
				 %BAM : Contains sequence, quality, mapping, signal, etc
			
--- a/latex/tex/thesis.tex
+++ b/latex/tex/thesis.tex
@@ -134,9 +134,10 @@
 
				 
			
 
				 % ------------------------------------------------------------------
			
 
				 % Main part of the thesis
			
 
				-\input{kapitel/k1_introduction} % include external file
			
 
				-\input{kapitel/k2_dna_structure} % include external file
			
 
				-\input{kapitel/k_datatypes} % include external file
			
 
				+\input{kapitel/k1_introduction} 
			
 
				+\input{kapitel/k2_dna_structure}
			
 
				+\input{kapitel/k_datatypes} 
			
 
				+\input{kapitel/k_algorithms} % include external file
			
 
				 % ------------------------------------------------------------------
			
 
				 
			
 
				 \label{lastpage}