BAM (file format)
The BAM (Binary Alignment Map) file format stores the comprehensive raw data of genome sequencing. It is the lossless, compressed binary representation of Sequence Alignment Map (SAM) files.
Schema
BAM is the compressed binary representation of SAM, a compact and indexable representation of nucleotide sequence alignments. The goal of indexing is to quickly retrieve the alignments that overlap a specific location without scanning all of them. Before indexing, a BAM file must be sorted by reference ID and then by leftmost coordinate. BAM files are compressed in the BGZF format. The structure of a BAM file includes a header section and an alignments section:
- Header: This section includes the sample name, sample length, and alignment method. Alignments in the alignments section are linked to specific information in the header section.
- Alignments: This section includes the read name, read sequence, read quality, alignment information, and custom tags. The alignment information includes the chromosome, start coordinate, alignment quality, and match descriptor string.
- The alignments section includes the following:
  - Read group
  - Barcode tag
  - Single-end alignment quality
  - Paired-end alignment quality
  - Edit distance tag
  - Amplicon name tag
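
As a rough illustration of the header section described above, the following Python sketch writes and re-reads the binary header layout given in the SAM/BAM specification (magic bytes, header text, then reference names and lengths). The function names `build_bam_header` and `parse_bam_header` and the sample `chr1` reference are hypothetical, chosen only for this example. Plain gzip from the standard library stands in for BGZF, which packs the same data into a series of small gzip-compatible blocks.

```python
import gzip
import io
import struct


def build_bam_header(text: str, refs: list[tuple[str, int]]) -> bytes:
    """Serialize an uncompressed BAM header (no alignment records follow)."""
    raw = text.encode("ascii")
    out = b"BAM\x01"                              # magic bytes
    out += struct.pack("<I", len(raw)) + raw      # l_text, then SAM header text
    out += struct.pack("<I", len(refs))           # n_ref
    for name, length in refs:
        encoded = name.encode("ascii") + b"\x00"  # reference name, NUL-terminated
        out += struct.pack("<I", len(encoded)) + encoded
        out += struct.pack("<I", length)          # l_ref, reference length
    return out


def parse_bam_header(stream) -> tuple[str, list[tuple[str, int]]]:
    """Read the header from a stream of already decompressed BAM bytes."""
    if stream.read(4) != b"BAM\x01":
        raise ValueError("not a BAM stream")
    (l_text,) = struct.unpack("<I", stream.read(4))
    text = stream.read(l_text).decode("ascii")
    (n_ref,) = struct.unpack("<I", stream.read(4))
    refs = []
    for _ in range(n_ref):
        (l_name,) = struct.unpack("<I", stream.read(4))
        name = stream.read(l_name).rstrip(b"\x00").decode("ascii")
        (l_ref,) = struct.unpack("<I", stream.read(4))
        refs.append((name, l_ref))
    return text, refs


# Round trip: build a tiny header, compress it, then decompress and parse it.
# gzip.compress is an assumption standing in for a real BGZF writer.
sam_text = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
compressed = gzip.compress(build_bam_header(sam_text, [("chr1", 1000)]))
with gzip.open(io.BytesIO(compressed), "rb") as fh:
    header_text, references = parse_bam_header(fh)
```

Because BGZF blocks are themselves valid gzip members, the same `parse_bam_header` function should also work on a real file opened with `gzip.open(path, "rb")`, although random access through an index needs a BGZF-aware library.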