pdf recreation

u, 3 years ago
parent
current commit
77f847a059
1 file changed, with 8 insertions and 8 deletions

+ 8 - 8
latex/tex/kapitel/k3_datatypes.tex

@@ -4,7 +4,7 @@
 %# BASICS
 %- \acs{DNA} STRUCTURE
 %- DATA TYPES
-% - BAM/FASTQ
+% - BAM/\ac{FASTQ}
 % - NON STANDARD
 %- COMPRESSION APPROACHES
 % - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
@@ -40,7 +40,7 @@ Some common fileformats would be:
 \begin{itemize}
 % which is relevant? 
   \item{FASTA}
-  \item{FASTQ}
+  \item{\ac{FASTQ}}
   \item{twoBit}
   \item{SAM/BAM}
   \item{VCF}
@@ -52,9 +52,9 @@ Since methods to store this kind of Data are still in development, there are man
 %rewrite:
 In order to not go beyond the scope, this paper will only focuse on compression tools which are using standard formats.
 
-\section{FASTQ}
+\section{\ac{FASTQ}}
 Is a text base format for storing sequenced data. It saves nucleotides as letters and in extend to that, the quality values are saved.
-FASTQ files are split into multiples of four, each four lines contain the informations for one sequence. The exact structure of FASTQ format is as follows:
+\ac{FASTQ} files are split into multiples of four, each four lines contain the informations for one sequence. The exact structure of \ac{FASTQ} format is as follows:
 \texttt{
 Line 1: Sequence identifier aka. Title, starting with an @ and an optional description.\\
 Line 2: The seuqence consisting of nucleoids, symbolized by A, T, G and C.\\
@@ -65,9 +65,9 @@ The quality values have no fixed type, to name a few there is the sanger format,
 The quality value shows the estimated probability of error in the sequencing process.
 [...]
 
-\section{SAM/BAM}
+\section{Sequence Alignment Map}
 % src https://github.com/samtools/samtools
-\ac{SAM} often seen in its compressed, binary representation \ac{BAM} with the fileextension \texttt{.bam}, is part of the SAMtools package, a uitlity tool for processing SAM/BAM and CRAM files. The SAM/BAM file is a text based format delimited by TABs. It uses 7-bit US-ASCII, to be precise Charset ANSI X3.4-1968 as defined in RFC1345. The structure is more complex than the one in FASTQ and described best, accompanied by an example:
+\ac{SAM} often seen in its compressed, binary representation \ac{BAM} with the fileextension \texttt{.bam}, is part of the SAMtools package, a uitlity tool for processing SAM/BAM and CRAM files. The SAM/BAM file is a text based format delimited by TABs. It uses 7-bit US-ASCII, to be precise Charset ANSI X3.4-1968 as defined in RFC1345. The structure is more complex than the one in \ac{FASTQ} and described best, accompanied by an example:
 
 \begin{figure}[ht]
   \centering
@@ -89,7 +89,7 @@ The regulare expression, shown above, filters touple of characters from a to z i
 %- allows viewing BAM data (localy and remote via ftp/http)
 %- file extention: <filename>.bam.bai
 
-%- stores more data than FASTQ
+%- stores more data than \ac{FASTQ}
  
 % src: https://support.illumina.com/help/BS_App_RNASeq_Alignment_OLH_1000000006112/Content/Source/Informatics/BAM-Format.htm
 %- allignment section includes
@@ -102,6 +102,6 @@ The regulare expression, shown above, filters touple of characters from a to z i
 
 %- BAM index files nameschema: <filename>.bam.bai 
 
-\section{CRAM - Compressed Reference-oriented Ailgnment Map}
+\section{Compressed Reference-oriented Ailgnment Map}
 \ac{CRAM} was developed as an alternative to the \ac{SAM} and \ac{BAM} Format. It specification is maintained by \ac{GA4GH}. It features both lossy and lossless compression mode. Since it is not relevant to this work, the lossy compression is ignored from here on. Even though it is part of \ac{GA4GH} suite, the file format can be used independently.\\
 The format saves data in containers which consist out of slices. Each slice is represented by a line in the file. Container and slices each store metadata in a header. Data is stored as blocks in slices, in a compressed form.