added pdf, finished results for now.

u 3 years ago
parent
current commit
dc5f0411ba
2 files changed, with 4 insertions and 2 deletions
  1. Binary
      latex/result/thesis.pdf
  2. 4 2
      latex/tex/kapitel/k6_results.tex

Binary
latex/result/thesis.pdf


+ 4 - 2
latex/tex/kapitel/k6_results.tex

@@ -57,7 +57,7 @@ Overall, Samtools \acs{BAM} resulted in 71.76\% size reduction, the \acs{CRAM} m
 \sffamily
 \begin{footnotesize}
   \begin{longtable}[ht]{ p{.2\textwidth} p{.2\textwidth} p{.2\textwidth} p{.2\textwidth}}
-    \caption[Compression Effectivity]                       % Caption for the list of tables
+    \caption[Compression Efficiency]                       % Caption for the list of tables
         {Compression duration in seconds} % Caption for the table itself
         \\
     \toprule
@@ -154,6 +154,9 @@ Reviewing \ref{t:recal-time} one will notice, that \acs{GeCo} reached a runtime
 In both tables \ref{t:recal-time} and \ref{t:recal-size} the already identified pattern can be observed. Looking at the compression ratio in \ref{t:recal-size}, a maximum compression of 99.04\% was reached with \acs{GeCo}. In this set of test files, file seven was the largest (\textasciitilde{}1.3 gigabytes), closely followed by files one and two (\textasciitilde{}1.2 gigabytes).
 
 \section{View on Possible Improvements}
+So far, this work has covered formats for storing genomes, methods for compressing files in those formats, and tests in which implementations of the named algorithms compressed several files, followed by an analysis of the results. The test results show that \acs{GeCo} provides a better compression ratio than Samtools but takes longer to run. In this test run, implementations of arithmetic coding thus achieved a better compression ratio than Samtools \acs{BAM}, with its mix of Huffman coding and \acs{LZ77}, or Samtools' custom compression format \acs{CRAM}. The results in \autocite{survey} support this statement. That study used \acs{FASTA}/Multi-FASTA files from 71 MB to 166 MB and found that \acs{GeCo} produced output between 12.34 and 91.68 times smaller than the input reference, with long runtimes, the longest exceeding 600 minutes \cite{survey}. Since that study pursued a different goal than this work and therefore used different test variables and environments, the results cannot be compared directly. What can be taken from it is that arithmetic coding, at least as implemented in \acs{GeCo}, is in need of a runtime improvement.\\
+The actual mathematical proof of such an improvement and its implementation cannot be covered here, because it would go beyond the scope of this work. To lay a foundation for this task, the rest of this work consists of considerations and problem analysis that should be thought through and dealt with in order to develop an improvement.
+
 S.V. Petoukhov described his findings about the distribution of nucleotides \cite{pet21}. With the probability of one nucleotide, in a sequence of sufficient length, information about the direct neighbours is revealed. For example, with the probability of \texttt{C}, the probabilities for sets (n-plets) of any nucleotide \texttt{N}, including \texttt{C} can be determined without counting them \cite{pet21}.\\
 %\%C ≈ Σ\%CN ≈ Σ\%NС ≈ Σ\%CNN ≈ Σ\%NCN ≈ Σ\%NNC ≈ Σ\%CNNN ≈ Σ\%NCNN ≈ Σ\%NNCN ≈ Σ\%NNNC\\
 
@@ -222,7 +225,6 @@ Without determining probabilities, one can see that the amount of \texttt{A}s ou
 % length cutting
 
 
-
 % how is data interpreted
 % why did the tools result in this, what can we learn
 % improvements
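
The n-plet relation attributed to Petoukhov in the last hunk (\%C ≈ Σ\%CN ≈ Σ\%NC ≈ ...) can be checked empirically. The following is a minimal Python sketch, not part of the commit. It assumes that n-plets are read as non-overlapping, consecutive blocks of the sequence, and it uses a synthetic random sequence in place of a real genome. The function names `nplet_frequencies` and `positional_share` are hypothetical helpers introduced only for this illustration.

```python
import random
from collections import Counter


def nplet_frequencies(seq, n):
    """Relative frequency of each non-overlapping n-plet in seq.

    Assumption: the sequence is split into consecutive blocks of length n;
    a trailing remainder shorter than n is dropped.
    """
    plets = [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]
    counts = Counter(plets)
    total = len(plets)
    return {plet: count / total for plet, count in counts.items()}


def positional_share(seq, base, n, pos):
    """Sum of the frequencies of all n-plets that carry `base` at index `pos`."""
    freqs = nplet_frequencies(seq, n)
    return sum(freq for plet, freq in freqs.items() if plet[pos] == base)


# Synthetic stand-in for a long nucleotide sequence (illustrative weights only).
random.seed(0)
seq = "".join(random.choices("ACGT", weights=[3, 2, 2, 3], k=120_000))

share_c = positional_share(seq, "C", 1, 0)    # %C
share_cn = positional_share(seq, "C", 2, 0)   # sum of %CN
share_nc = positional_share(seq, "C", 2, 1)   # sum of %NC
share_ncn = positional_share(seq, "C", 3, 1)  # sum of %NCN

print(f"%C={share_c:.4f}  S%CN={share_cn:.4f}  "
      f"S%NC={share_nc:.4f}  S%NCN={share_ncn:.4f}")
```

On a sufficiently long sequence the four printed values should lie close together. For a compressor this would mean that the single-nucleotide frequencies already carry the positional sums over longer n-plets, so those sums would not need to be counted separately. Whether this holds for a given real genome has to be verified on that data.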