changed result tables, added percentages. extended analysis

u, 3 years ago
Parent
Commit
df8ef4c619
2 changed files with 157 additions and 32 deletions
  1. 73 32
      latex/tex/kapitel/k6_results.tex
  2. 84 0
      latex/tex/make

+ 73 - 32
latex/tex/kapitel/k6_results.tex

@@ -1,46 +1,87 @@
 \chapter{Results and Discussion}
+The two tables \ref{t:effectivity} and \ref{t:efficiency} contain the raw measurement values for the two goals described in \ref{k5:goals}. The first table lists how long each compression procedure took, in milliseconds. The second one contains the file sizes in bytes. Each row contains information about one of the \texttt{Homo\_sapiens.GRCh38.dna.chromosome.}x\texttt{.fa} files. To improve readability, the filenames in all tables were replaced by \texttt{File}. To determine which file was compressed, simply replace the placeholder x with the number following \texttt{File}.\\
 
-\label{t:effectivity}
+The units milliseconds and bytes preserve a high precision for the measurements. Unfortunately, they are harder for the human eye to read and compare. Therefore, starting with the file sizes, table \ref{t:sizepercent} contains the size of each compressed file as a percentage of the respective source file. For example, compressing Homo\_sapiens.GRCh38.dna.chromosome.11.fa with \acs{GeCo} resulted in a file that was only 17.6\% as big as the source file.\\
+
+\label{t:sizepercent}
 \sffamily
 \begin{footnotesize}
-  \begin{longtable}[c]{ p{.2\textwidth} p{.2\textwidth} p{.2\textwidth} p{.2\textwidth}}
+  \begin{longtable}[r]{ p{.2\textwidth} p{.2\textwidth} p{.2\textwidth} p{.2\textwidth}}
     \caption[Compression Effectivity]                       % caption for the list of tables
-        {File sizes in different compression formats} % caption for the table itself
+        {File sizes in different compression formats in \textbf{percent}} % caption for the table itself
         \\
     \toprule
-     \textbf{ID.} & \textbf{Source File} & \textbf{\acs{GeCo}} & \textbf{Samtools \acs{CRAM}} \\
+     \textbf{ID.} & \textbf{\acs{GeCo} \%} & \textbf{Samtools \acs{BAM} \%} & \textbf{Samtools \acs{CRAM} \%} \\
     \midrule
-     File 1& 253105752& 46364770& 55769827\\
-     File 2& 136027438& 27411806& 32238052\\
-     File 3& 137338124& 27408185& 32529673\\
-     File 4& 135496623& 27231126& 32166751\\
-     File 5& 116270459& 20696778& 23568321\\
-     File 6& 108827838& 18676723& 21887811\\
-     File 7& 103691101& 16804782& 20493276\\
-     File 8& 91844042& 16005173& 19895937\\
-     File 9& 84645123& 15877526& 20177456\\
-     File 10& 81712897& 16344067& 19310998\\
-     File 11& 59594634& 10488207& 14251243\\
-     File 12& 246230144& 49938168& 58026123\\
-     File 13& 65518294& 13074402& 15510100\\
-     File 14& 47488540& 7900773& 9708258\\
-     File 15& 51665500& 41117340& 47707954\\
-     File 16& 201600541& 39248276& 45564837\\
-     File 17& 193384854& 37133480& 43655371\\
-     File 18& 184563953& 35355184& 40980906\\
-     File 19& 173652802& 31813760& 38417108\\
-     File 20& 162001796& 30104816& 34926945\\
-     File 21& 147557670& 23932541& 29459829\\
+			File 1& 18.32& 24.51& 22.03\\
+			File 2& 20.15& 26.36& 23.7\\
+			File 3& 19.96& 26.14& 23.69\\
+			File 4& 20.1& 26.26& 23.74\\
+			File 5& 17.8& 22.76& 20.27\\
+			File 6& 17.16& 22.31& 20.11\\
+			File 7& 16.21& 21.69& 19.76\\
+			File 8& 17.43& 23.48& 21.66\\
+			File 9& 18.76& 25.16& 23.84\\
+			File 10& 20.0& 25.31& 23.63\\
+			File 11& 17.6& 24.53& 23.91\\
+			File 12& 20.28& 26.56& 23.57\\
+			File 13& 19.96& 25.6& 23.67\\
+			File 14& 16.64& 22.06& 20.44\\
+			File 15& 79.58& 103.72& 92.34\\
+			File 16& 19.47& 25.52& 22.6\\
+			File 17& 19.2& 25.25& 22.57\\
+			File 18& 19.16& 25.04& 22.2\\
+			File 19& 18.32& 24.4& 22.12\\
+			File 20& 18.58& 24.14& 21.56\\
+			File 21& 16.22& 22.17& 19.96\\
+      &&&\\
+			\textbf{Average}& 21.47& 28.24& 25.59\\
     \bottomrule
   \end{longtable}
 \end{footnotesize}
+
 \rmfamily
-% raw data and charts
-% differences in used algos/ algos in tools <- k5?
-% optimization approach
-% further research focus <- ask if wanted
+Overall, Samtools \acs{BAM} achieved a size reduction of 71.76\%; the \acs{CRAM} method improved on this by roughly 2.65 percentage points, reaching 74.41\%. \acs{GeCo} provided the greatest reduction with 78.53\%. This gap of about 4 percentage points to \acs{CRAM} comes at a comparatively great cost in time.\\
 
-% todo ms to minutes and bytes to mb. Those tables move to the appendix
-The two tables above contain rather raw measurement values for the two goals described in \ref{k5:goals}. The first table shows how long each compression procedure took. Each row contains information about one of the \texttt{Homo\_sapiens.GRCh38.dna.chromosome.}x\texttt{.fa} files. To improve readability, the filenames were replaced by \texttt{File}. To determine which file was compressed, simply replace the placeholder x with the number following \texttt{File}.\\
 
-While \acs{GeCo} takes more time to compress, it achieves a higher effectivity, meaning a greater reduction of file size.\\ 
+\label{t:time}
+\sffamily
+\begin{footnotesize}
+  \begin{longtable}[r]{ p{.2\textwidth} p{.2\textwidth} p{.2\textwidth} p{.2\textwidth}}
+    \caption[Compression Duration]                       % caption for the list of tables
+        {Compression duration in seconds} % caption for the table itself
+        \\
+    \toprule
+     \textbf{ID.} & \textbf{\acs{GeCo} } & \textbf{Samtools \acs{BAM}}& \textbf{Samtools \acs{CRAM} } \\
+    \midrule
+			% compression time for GeCo, BAM and CRAM in seconds
+			File 1 & 23.5& 3.786& 16.926\\
+			File 2 & 24.65& 3.784& 17.043\\
+			File 3 & 2.016& 3.123& 13.999\\
+			File 4 & 19.408& 3.011& 13.445\\
+			File 5 & 18.387& 2.862& 12.802\\
+			File 6 & 17.364& 2.685& 12.015\\
+			File 7 & 15.999& 2.503& 11.198\\
+			File 8 & 14.828& 2.286& 10.244\\
+      File 9 & 12.304& 2.078& 9.21\\
+			File 10 & 13.493& 2.127& 9.461\\
+			File 11 & 13.629& 2.132& 9.508\\
+			File 12 & 13.493& 2.115& 9.456\\
+			File 13 & 99.902& 1.695& 7.533\\
+			File 14 & 92.475& 1.592& 7.011\\
+			File 15 & 85.255& 1.507& 6.598\\
+			File 16 & 82.765& 1.39& 6.089\\
+			File 17 & 82.081& 1.306& 5.791\\
+			File 18 & 79.842& 1.277& 5.603\\
+			File 19 & 58.605& 0.96& 4.106\\
+			File 20 & 64.588& 1.026& 4.507\\
+			File 21 & 41.198& 0.721& 3.096\\
+      &&&\\
+      \textbf{Average}&42.57&2.09&9.32\\
+    \bottomrule
+  \end{longtable}
+\end{footnotesize}
+\rmfamily
+
+As table \ref{t:time} shows, the average compression duration for \acs{GeCo} is 42.57s. That is a little over 33s longer than the average runtime of Samtools for compressing into the \acs{CRAM} format (9.32s). In other words, \acs{GeCo} takes about 4.6 times as long, or Samtools needs roughly 78\% less time.\\
+Before interpreting this data further, a brief look at the development of both tools: development of \acs{GeCo} stopped in 2016, while Samtools has been under active development since 2015 and to this day, with over 70 contributors. With that in mind, improving \acs{GeCo}'s efficiency would be a first step towards closing the large gap in compression duration.\\
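
The summary figures quoted in the added paragraphs can be re-derived from the last row of the two tables in this diff. The following is a minimal Python sketch of that arithmetic, not part of the thesis sources; the names `MEAN_SIZE_PERCENT`, `MEAN_SECONDS`, `size_reduction` and `compare_duration` are hypothetical and chosen only for illustration.

```python
# Values copied from the last row of the percent table and the duration table.
MEAN_SIZE_PERCENT = {"GeCo": 21.47, "BAM": 28.24, "CRAM": 25.59}
MEAN_SECONDS = {"GeCo": 42.57, "BAM": 2.09, "CRAM": 9.32}


def size_reduction(tool):
    """Size reduction in percent: 100 minus the remaining size in percent."""
    return round(100.0 - MEAN_SIZE_PERCENT[tool], 2)


def compare_duration(slow, fast):
    """Compare two mean durations in three common ways."""
    t_slow, t_fast = MEAN_SECONDS[slow], MEAN_SECONDS[fast]
    return {
        # absolute gap in seconds
        "difference_s": round(t_slow - t_fast, 2),
        # how much less time the faster tool needs, relative to the slower one
        "time_saved_percent": round(100.0 * (t_slow - t_fast) / t_slow, 1),
        # how many times longer the slower tool runs
        "slowdown_factor": round(t_slow / t_fast, 2),
    }


if __name__ == "__main__":
    for tool in MEAN_SIZE_PERCENT:
        print(tool, size_reduction(tool))
    print(compare_duration("GeCo", "CRAM"))
```

Keeping the three duration measures separate makes explicit which baseline a percentage refers to: the 78\% figure is relative to the \acs{GeCo} runtime, while the factor is relative to the \acs{CRAM} runtime.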

+ 84 - 0
latex/tex/make

@@ -0,0 +1,84 @@
+\chapter{Results and Discussion}
+
+\label{t:effectivity}
+\sffamily
+\begin{footnotesize}
+  \begin{longtable}[r]{ p{.2\textwidth} p{.2\textwidth} p{.2\textwidth} p{.2\textwidth}}
+    \caption[Compression Effectivity]                       % caption for the list of tables
+        {File sizes in different compression formats} % caption for the table itself
+        \\
+    \toprule
+     \textbf{ID.} & \textbf{Source File} & \textbf{\acs{GeCo}} & \textbf{Samtools \acs{CRAM}} \\
+    \midrule
+     File 1& 253105752& 46364770& 55769827\\
+     File 2& 136027438& 27411806& 32238052\\
+     File 3& 137338124& 27408185& 32529673\\
+     File 4& 135496623& 27231126& 32166751\\
+     File 5& 116270459& 20696778& 23568321\\
+     File 6& 108827838& 18676723& 21887811\\
+     File 7& 103691101& 16804782& 20493276\\
+     File 8& 91844042& 16005173& 19895937\\
+     File 9& 84645123& 15877526& 20177456\\
+     File 10& 81712897& 16344067& 19310998\\
+     File 11& 59594634& 10488207& 14251243\\
+     File 12& 246230144& 49938168& 58026123\\
+     File 13& 65518294& 13074402& 15510100\\
+     File 14& 47488540& 7900773& 9708258\\
+     File 15& 51665500& 41117340& 47707954\\
+     File 16& 201600541& 39248276& 45564837\\
+     File 17& 193384854& 37133480& 43655371\\
+     File 18& 184563953& 35355184& 40980906\\
+     File 19& 173652802& 31813760& 38417108\\
+     File 20& 162001796& 30104816& 34926945\\
+     File 21& 147557670& 23932541& 29459829\\
+    \bottomrule
+  \end{longtable}
+\end{footnotesize}
+\rmfamily
+% raw data and charts
+% differences in used algos/ algos in tools <- k5?
+% optimization approach
+% further research focus <- ask if wanted
+
+% todo ms to minutes and bytes to mb. Those tables move to the appendix
+The two tables above contain rather raw measurement values for the two goals described in \ref{k5:goals}. The first table shows how long each compression procedure took. Each row contains information about one of the \texttt{Homo\_sapiens.GRCh38.dna.chromosome.}x\texttt{.fa} files. To improve readability, the filenames were replaced by \texttt{File}. To determine which file was compressed, simply replace the placeholder x with the number following \texttt{File}.\\
+
+While \acs{GeCo} takes more time to compress, it achieves a higher effectivity, meaning a greater reduction of file size.\\ 
+
+\label{t:sizepercent}
+\sffamily
+\begin{footnotesize}
+  \begin{longtable}[r]{ p{.2\textwidth} p{.2\textwidth} p{.2\textwidth} p{.2\textwidth}}
+    \caption[Compression Effectivity]                       % caption for the list of tables
+        {File sizes in different compression formats in \textbf{percent}} % caption for the table itself
+        \\
+    \toprule
+     \textbf{ID.} & \textbf{\acs{GeCo} \%} & \textbf{Samtools \acs{BAM} \%} & \textbf{Samtools \acs{CRAM} \%} \\
+    \midrule
+			File 1& 18.32& 24.51& 22.03\\
+			File 2& 20.15& 26.36& 23.7\\
+			File 3& 19.96& 26.14& 23.69\\
+			File 4& 20.1& 26.26& 23.74\\
+			File 5& 17.8& 22.76& 20.27\\
+			File 6& 17.16& 22.31& 20.11\\
+			File 7& 16.21& 21.69& 19.76\\
+			File 8& 17.43& 23.48& 21.66\\
+			File 9& 18.76& 25.16& 23.84\\
+			File 10& 20.0& 25.31& 23.63\\
+			File 11& 17.6& 24.53& 23.91\\
+			File 12& 20.28& 26.56& 23.57\\
+			File 13& 19.96& 25.6& 23.67\\
+			File 14& 16.64& 22.06& 20.44\\
+			File 15& 79.58& 103.72& 92.34\\
+			File 16& 19.47& 25.52& 22.6\\
+			File 17& 19.2& 25.25& 22.57\\
+			File 18& 19.16& 25.04& 22.2\\
+			File 19& 18.32& 24.4& 22.12\\
+			File 20& 18.58& 24.14& 21.56\\
+			File 21& 16.22& 22.17& 19.96\\
+      &&&\\
+			\textbf{Average}& 21.47& 28.24& 25.59\\
+    \bottomrule
+  \end{longtable}
+\end{footnotesize}
+\rmfamily
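
The percent table follows from the raw byte table by dividing each compressed size by the source size. The sketch below reproduces this for three rows, assuming the percentages were rounded to two decimals; it is an illustration only, and the names `RAW_BYTES`, `percent_of_source` and `percent_table` are hypothetical. Only the \acs{GeCo} and \acs{CRAM} columns are covered, because the byte table in this diff contains no \acs{BAM} column.

```python
# (source bytes, GeCo bytes, CRAM bytes), copied from the byte table above.
RAW_BYTES = {
    "File 1": (253105752, 46364770, 55769827),
    "File 11": (59594634, 10488207, 14251243),
    "File 15": (51665500, 41117340, 47707954),
}


def percent_of_source(source_bytes, compressed_bytes):
    """Compressed size as a percentage of the source size, two decimals."""
    return round(100.0 * compressed_bytes / source_bytes, 2)


def percent_table(raw=RAW_BYTES):
    """Map each file to its (GeCo %, CRAM %) pair."""
    return {
        name: (percent_of_source(src, geco), percent_of_source(src, cram))
        for name, (src, geco, cram) in raw.items()
    }


if __name__ == "__main__":
    for name, (geco, cram) in percent_table().items():
        print(f"{name}& {geco}& {cram}\\\\")
```

Note that the last row of the percent table is the unweighted mean of the per-file percentages, so large and small files contribute equally to it.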