u 3 years ago
parent
commit
77f847a059
1 changed file with 8 additions and 8 deletions

+ 8 - 8
latex/tex/kapitel/k3_datatypes.tex

@@ -4,7 +4,7 @@
 %# BASICS
 %- \acs{DNA} STRUCTURE
 %- DATA TYPES
-% - BAM/FASTQ
+% - BAM/\ac{FASTQ}
 % - NON STANDARD
 %- COMPRESSION APPROACHES
 % - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
@@ -40,7 +40,7 @@ Some common file formats would be:
 \begin{itemize}
 % which is relevant? 
   \item{FASTA}
-  \item{FASTQ}
+  \item{\ac{FASTQ}}
   \item{twoBit}
   \item{SAM/BAM}
   \item{VCF}
@@ -52,9 +52,9 @@ Since methods to store this kind of Data are still in development, there are man
 %rewrite:
 To stay within scope, this paper focuses only on compression tools that use standard formats.
 
-\section{FASTQ}
+\section{\ac{FASTQ}}
 This is a text-based format for storing sequenced data. It stores nucleotides as letters and, in addition, a quality value for each of them.
-FASTQ files consist of groups of four lines; each group contains the information for one sequence. The structure of the FASTQ format is as follows:
+\ac{FASTQ} files consist of groups of four lines; each group contains the information for one sequence. The structure of the \ac{FASTQ} format is as follows:
 \texttt{
 Line 1: Sequence identifier (title), starting with an @ and optionally followed by a description.\\
 Line 2: The sequence, consisting of nucleotides symbolized by A, T, G and C.\\
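Review note: the four-line record layout described in this hunk can be sketched in Python as follows. This is a minimal sketch, not part of the commit. Only lines 1 and 2 are visible in the hunk; the sketch assumes the usual FASTQ layout in which line 3 starts with `+` and line 4 holds one quality character per base. The names `FastqRecord` and `read_fastq` are hypothetical, and the sample record is made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # line 1 without the leading '@'
    sequence: str    # line 2: nucleotides as letters
    separator: str   # line 3: assumed to start with '+'
    quality: str     # line 4: assumed one character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per group of four non-empty lines."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for start in range(0, len(stripped), 4):
        title, sequence, separator, quality = stripped[start:start + 4]
        if not title.startswith("@"):
            raise ValueError("line 1 of a record must start with '@'")
        if not separator.startswith("+"):
            raise ValueError("line 3 of a record must start with '+'")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality must have equal length")
        yield FastqRecord(title[1:], sequence, separator, quality)


# Made-up sample record, for illustration only.
sample = "@read1 sample description\nGATTACA\n+\nIIIIIII\n"
records = list(read_fastq(sample.splitlines()))
print(records[0].identifier, records[0].sequence)
```

Reading whole groups of four and validating the `@` and `+` markers keeps the sketch close to the prose description; a production parser would stream lines instead of collecting them in a list.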
@@ -65,9 +65,9 @@ The quality values have no fixed type, to name a few there is the sanger format,
 The quality value encodes the estimated probability that the corresponding base was called incorrectly during sequencing.
 [...]
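
Review note: the relation between a quality value and the error probability can be illustrated with a short sketch. It assumes the Sanger variant mentioned in this chapter, in which each quality character is the Phred score plus an ASCII offset of 33, and the standard Phred definition P = 10^(-Q/10). Other variants use different offsets. The function names are hypothetical.

```python
from typing import List


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to Phred scores (Sanger offset assumed)."""
    scores = [ord(char) - offset for char in quality]
    if any(score < 0 for score in scores):
        raise ValueError("quality character below the assumed offset")
    return scores


def phred_to_error_probability(score: int) -> float:
    """Estimated probability that a base call is wrong: 10^(-Q/10)."""
    return 10 ** (-score / 10)


scores = decode_quality("I5!")
probabilities = [phred_to_error_probability(s) for s in scores]
print(scores, probabilities)
```

A higher score means a lower error probability; a score of 20, for example, corresponds to an estimated error rate of one in a hundred.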
 
-\section{SAM/BAM}
+\section{Sequence Alignment Map}
 % src https://github.com/samtools/samtools
-\ac{SAM}, often seen in its compressed, binary representation \ac{BAM} with the file extension \texttt{.bam}, is the format handled by the SAMtools package, a utility for processing SAM/BAM and CRAM files. A \ac{SAM} file is text-based and delimited by TABs. It uses 7-bit US-ASCII, to be precise charset ANSI X3.4-1968 as defined in RFC1345. The structure is more complex than that of FASTQ and is best described with an example:
+\ac{SAM}, often seen in its compressed, binary representation \ac{BAM} with the file extension \texttt{.bam}, is the format handled by the SAMtools package, a utility for processing SAM/BAM and CRAM files. A \ac{SAM} file is text-based and delimited by TABs. It uses 7-bit US-ASCII, to be precise charset ANSI X3.4-1968 as defined in RFC1345. The structure is more complex than that of \ac{FASTQ} and is best described with an example:
 
 \begin{figure}[ht]
   \centering
@@ -89,7 +89,7 @@ The regular expression, shown above, filters tuples of characters from a to z i
 %- allows viewing BAM data (locally and remotely via ftp/http)
 %- file extension: <filename>.bam.bai
 
-%- stores more data than FASTQ
+%- stores more data than \ac{FASTQ}
  
 % src: https://support.illumina.com/help/BS_App_RNASeq_Alignment_OLH_1000000006112/Content/Source/Informatics/BAM-Format.htm
 %- alignment section includes
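
Review note: since the chapter describes SAM as a TAB-delimited text format, a small sketch of splitting one alignment line may help when drafting the example. It assumes the eleven mandatory columns of the SAM specification (QNAME through QUAL) followed by optional tags. The function name `parse_sam_line` and the sample line are hypothetical and made up for illustration; header lines starting with `@` are not handled.

```python
from typing import Any, Dict

# The eleven mandatory columns, in order, as assumed from the SAM specification.
SAM_FIELDS = (
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
)
INTEGER_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Split one TAB-delimited alignment line into named fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("an alignment line needs at least 11 columns")
    record: Dict[str, Any] = dict(zip(SAM_FIELDS, columns[:11]))
    for name in INTEGER_FIELDS:
        record[name] = int(record[name])
    record["TAGS"] = columns[11:]  # optional fields, kept as raw strings
    return record


# Made-up alignment line, for illustration only.
line = "read1\t0\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII\tNM:i:0"
alignment = parse_sam_line(line)
print(alignment["QNAME"], alignment["POS"], alignment["CIGAR"])
```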
@@ -102,6 +102,6 @@ The regular expression, shown above, filters tuples of characters from a to z i
 
 %- BAM index file naming scheme: <filename>.bam.bai
 
-\section{CRAM - Compressed Reference-oriented Alignment Map}
+\section{Compressed Reference-oriented Alignment Map}
 \ac{CRAM} was developed as an alternative to the \ac{SAM} and \ac{BAM} formats. Its specification is maintained by \ac{GA4GH}. It offers both lossy and lossless compression modes. Since lossy compression is not relevant to this work, it is ignored from here on. Even though \ac{CRAM} is part of the \ac{GA4GH} suite, the file format can be used independently.\\
 The format stores data in containers, which consist of slices. Each slice is represented by a line in the file. Containers and slices each store metadata in a header. Within slices, data is stored as compressed blocks.