Sanger sequencing
Sanger sequencing is a method of DNA sequencing that involves electrophoresis and is based on the random incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. After first being developed by Frederick Sanger and colleagues in 1977, it became the most widely used sequencing method for approximately 40 years. An automated instrument using slab gel electrophoresis and fluorescent labels was first commercialized by Applied Biosystems in March 1987. Later, automated slab gels were replaced with automated capillary array electrophoresis.
More recently, higher-volume Sanger sequencing has been replaced by next-generation sequencing methods, especially for large-scale, automated genome analyses. However, the Sanger method remains in wide use for smaller-scale projects and for validation of deep sequencing results. It still has an advantage over short-read sequencing technologies in that it can produce DNA sequence reads of more than 500 nucleotides while maintaining a very low error rate, with accuracies around 99.99%. Sanger sequencing is still actively used in public health initiatives, such as sequencing the spike protein gene of SARS-CoV-2 and surveillance of norovirus outbreaks through the United States Centers for Disease Control and Prevention's CaliciNet surveillance network.
Method
The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified dideoxynucleotide triphosphates (ddNTPs), the latter of which terminate DNA strand elongation. These chain-terminating nucleotides lack the 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, causing DNA polymerase to cease extension of DNA when a ddNTP is incorporated. The ddNTPs may be radioactively or fluorescently labelled for detection in automated sequencing machines.
The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides, while the other added nucleotides are ordinary ones. The deoxynucleotide concentration should be approximately 100-fold higher than that of the corresponding dideoxynucleotide, so that enough fragments are produced while extension can still proceed across the complete sequence. In short, four separate reactions are needed, one for each of the four ddNTPs.
Following rounds of template DNA extension from the bound primer, the resulting DNA fragments are heat denatured and separated by size using gel electrophoresis. This is frequently performed using a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes. In the original publication of 1977, the formation of base-paired loops of ssDNA was a cause of serious difficulty in resolving bands at some locations. The DNA bands may then be visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image.
Figure: Part of a radioactively labelled sequencing gel.
In the image, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths.
A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide. The relative positions of the different bands among the four lanes, from bottom to top, are then used to read the DNA sequence.
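The four-lane procedure described above can be illustrated with a small simulation. This is a minimal sketch, not a model of real reaction kinetics: the function names `run_lane` and `read_gel` are made up for illustration, the template is written in the order the polymerase reads it, the primer is ignored, and the per-position termination probability of 1/101 assumes that dNTPs and ddNTPs are incorporated equally well at a 100:1 ratio.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def run_lane(template, dd_base, n_strands=5000, p_term=1 / 101, rng=None):
    """Simulate one reaction containing a single ddNTP (dd_base).

    Returns the set of fragment lengths (band positions) seen in that lane.
    p_term is the assumed chance that the ddNTP, rather than the normal
    dNTP, is incorporated at a matching position.
    """
    if rng is None:
        rng = random.Random(0)
    lengths = set()
    for _ in range(n_strands):
        for pos, template_base in enumerate(template, start=1):
            new_base = COMPLEMENT[template_base]
            # Termination is only possible where the new strand needs dd_base.
            if new_base == dd_base and rng.random() < p_term:
                lengths.add(pos)
                break
    return lengths

def read_gel(lanes):
    """Read bands from shortest (bottom of gel) to longest (top of gel)."""
    band_to_base = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(band_to_base[length] for length in sorted(band_to_base))

# Hypothetical 10-nucleotide template, listed in the direction of reading.
template = "TACGGATCCA"
rng = random.Random(42)
lanes = {base: run_lane(template, base, rng=rng) for base in "ACGT"}
print(read_gel(lanes))
```

Each lane only ever contains fragments ending in its own dideoxynucleotide, so sorting all bands by length and noting which lane each band sits in recovers the newly synthesized strand, which is the complement of the template.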
Figure: DNA fragments are labelled with a radioactive or fluorescent tag on the primer, in the new DNA strand with a labeled dNTP, or with a labeled ddNTP.
Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. The later development by Leroy Hood and coworkers of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.
Figure: Sequence ladder by radioactive sequencing compared to fluorescent peaks.
Chain-termination methods have greatly simplified DNA sequencing. For example, chain-termination-based kits are commercially available that contain the reagents needed for sequencing, pre-aliquoted and ready to use. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.
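The roughly 100-fold excess of dNTP over ddNTP given earlier in this section can be motivated with a short worked calculation. This is a simplified sketch that assumes the polymerase incorporates a dNTP and its ddNTP counterpart with equal efficiency (real enzymes discriminate between them to varying degrees); the symbols r, p and k are introduced here for illustration only.

```latex
% r : concentration ratio [dNTP]/[ddNTP] in one reaction (about 100)
% p : chance of termination at a position that matches the reaction's ddNTP
% k : index of the matching position along the growing strand
\[
  p = \frac{1}{1 + r} \approx \frac{1}{101} \approx 0.0099
\]
\[
  P(\text{terminate at the } k\text{-th matching position}) = p\,(1 - p)^{k-1}
\]
\[
  P(\text{extend past the } k\text{-th matching position}) = (1 - p)^{k}
\]
% Example: k = 250 matching positions, roughly 1000 nucleotides of sequence
% in which the four bases are equally frequent.
\[
  \left(\tfrac{100}{101}\right)^{250} = e^{250 \ln(100/101)} \approx e^{-2.49} \approx 0.083
\]
```

Under these assumptions, about 8% of strands are still extending after 250 opportunities to terminate, so long fragments remain detectable. A much lower ratio would end nearly all strands close to the primer, while a much higher one would leave too few terminated fragments at any given position.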
Dye-terminator sequencing
Dye-terminator sequencing utilizes labelling of the chain-terminating ddNTPs, which permits sequencing in a single reaction rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a fluorescent dye, each of which emits light at a different wavelength.
Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace (electropherogram) after capillary electrophoresis.
This problem has been addressed with the use of modified DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating "dye blobs". The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, was used for the vast majority of sequencing projects until the introduction of next generation sequencing.
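Because each terminator carries its own dye, the base at each position can in principle be read from whichever fluorescence channel peaks there. The sketch below shows that idea on a synthetic four-channel trace. It is a deliberately naive illustration, not the algorithm used by any real base caller: `synthetic_trace` and `call_bases` are hypothetical helpers, and real traces need baseline correction, dye mobility correction and overlap handling that are omitted here.

```python
def synthetic_trace(seq, heights):
    """Build a toy four-channel trace: one three-sample peak per base.

    heights gives the peak height for each base, which lets the example
    mimic the unequal peak heights caused by dye effects.
    """
    trace = {base: [] for base in "ACGT"}
    for base, height in zip(seq, heights):
        for channel in "ACGT":
            signal = height if channel == base else 0.0
            trace[channel].extend([0.0, signal, 0.0])
    return trace

def call_bases(trace, min_height=0.05):
    """Call a base wherever the strongest channel has a local maximum."""
    n = len(trace["A"])
    calls = []
    for i in range(1, n - 1):
        # Channel with the highest intensity at this time point.
        top = max("ACGT", key=lambda base: trace[base][i])
        y = trace[top]
        if y[i] > min_height and y[i - 1] < y[i] >= y[i + 1]:
            calls.append(top)
    return "".join(calls)

# Peaks of very different heights are still called correctly here,
# because the call depends on which channel dominates, not on height.
trace = synthetic_trace("GATTACA", [1.0, 0.4, 0.9, 0.3, 0.8, 1.0, 0.5])
print(call_bases(trace))
```

In this toy setting the uneven heights do not matter, but in real data a weak peak next to a strong one can be masked, which is one reason the uneven incorporation described above is a practical limitation.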
Automation and sample preparation
Automated DNA-sequencing instruments can sequence up to 384 DNA samples in a single batch. Batch runs may occur up to 24 times a day. DNA sequencers separate strands by size using capillary electrophoresis, detect and record dye fluorescence, and output data as fluorescent peak trace chromatograms. Sequencing reactions, cleanup and re-suspension of samples in a buffer solution are performed separately, before loading samples onto the sequencer. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks. The accuracy of such algorithms is inferior to visual examination by a human operator, but is adequate for automated processing of large sequence data sets.
Applications of dye-terminating sequencing
The field of public health plays many roles in supporting patient diagnostics as well as environmental surveillance of potentially toxic substances and circulating biological pathogens. Public health laboratories and other laboratories around the world have played a pivotal role in providing rapid sequencing data for the surveillance of SARS-CoV-2, the causative agent of COVID-19, during the pandemic that was declared a public health emergency on January 30, 2020. Laboratories were tasked with the rapid implementation of sequencing methods and asked to provide accurate data to assist in the decision-making models for the development of policies to mitigate spread of the virus. Many laboratories resorted to next-generation sequencing methodologies, while others supported efforts with Sanger sequencing. Sequencing efforts for SARS-CoV-2 have varied: while most laboratories implemented whole genome sequencing of the virus, others opted to sequence specific genes such as the S gene, which encodes the spike protein. The high mutation rate of SARS-CoV-2 leads to genetic differences within the S gene, and these differences have played a role in the infectivity of the virus. Sanger sequencing of the S gene provides a quick, accurate, and more affordable method of retrieving the genetic code. Laboratories in lower-income countries may not have the capabilities to implement expensive applications such as next-generation sequencing, so Sanger methods may prevail in supporting the generation of sequencing data for surveillance of variants.
Sanger sequencing is also the "gold standard" for norovirus surveillance methods for the Centers for Disease Control and Prevention's CaliciNet network. CaliciNet is an outbreak surveillance network that was established in March 2009.
The goal of the network is to collect sequencing data on circulating noroviruses in the United States and to activate downstream action to determine the source of infection and mitigate the spread of the virus. The CaliciNet network has identified many infections as foodborne illnesses. These data can then be published and used to develop recommendations for future action to prevent food contamination. The methods employed for detection of norovirus involve targeted amplification of specific areas of the genome. The amplicons are then sequenced using dye-terminating Sanger sequencing, and the chromatograms and sequences generated are analyzed with a software package developed in BioNumerics. Sequences are tracked and strain relatedness is studied to infer epidemiological relevance.
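As a rough illustration of what comparing strain relatedness can mean at the sequence level, the sketch below computes percent identity between two amplicon sequences. It is not the method used by CaliciNet or BioNumerics. The function `percent_identity` is a hypothetical helper that assumes the two sequences have already been aligned to equal length, and it skips positions containing gaps or ambiguous `N` calls.

```python
def percent_identity(seq_a, seq_b):
    """Percent of comparable aligned positions at which two sequences agree.

    Assumes seq_a and seq_b are pre-aligned (same length). Positions where
    either sequence has a gap ('-') or an ambiguous call ('N') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "N-" or b in "N-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared

# Two made-up eight-base fragments differing at one position.
print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
```

Sequences from a common outbreak source would be expected to show very high identity over the amplified region, although any threshold for calling strains related is a matter for the surveillance protocol, not this sketch.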
Challenges
Common challenges of DNA sequencing with the Sanger method include poor quality in the first 15–40 bases of the sequence due to primer binding, and deteriorating quality of sequencing traces after 700–900 bases. Base-calling software such as Phred typically provides an estimate of quality to aid in trimming low-quality regions of sequences.
In cases where DNA fragments are cloned before sequencing, the resulting sequence may contain parts of the cloning vector. In contrast, PCR-based cloning and next-generation sequencing technologies based on pyrosequencing often avoid using cloning vectors. Recently, one-step Sanger sequencing methods such as Ampliseq and SeqSharp have been developed that allow rapid sequencing of target genes without cloning or prior amplification.
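The quality estimates mentioned above are conventionally expressed as Phred scores, where Q = -10 log10(P) and P is the estimated probability that a base call is wrong. On this scale an accuracy of 99.99% corresponds to Q40. The sketch below converts between the two and trims a read to its best-scoring stretch. The trimming rule (keep the contiguous segment that maximizes the sum of Q minus a cutoff) is one simple approach chosen for illustration, not necessarily what Phred or any particular package does, and the cutoff of 20 is an assumed default.

```python
import math

def error_to_phred(p_error):
    """Phred quality score for a given base-call error probability."""
    return -10 * math.log10(p_error)

def phred_to_error(q):
    """Base-call error probability for a given Phred quality score."""
    return 10 ** (-q / 10)

def trim_by_quality(seq, quals, cutoff=20):
    """Keep the contiguous segment with the highest total of (Q - cutoff).

    Low-quality bases at the start (near the primer) and at the end (where
    trace quality decays) score negatively and are dropped.
    """
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, q in enumerate(quals):
        run_sum += q - cutoff
        if run_sum <= 0:
            # Restart the candidate segment after this base.
            run_sum, run_start = 0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return seq[best_start:best_end], quals[best_start:best_end]

# Made-up read with poor quality at both ends.
read = "NNACGTACGTNN"
quals = [5, 8, 30, 35, 40, 40, 38, 36, 30, 25, 9, 4]
trimmed_seq, trimmed_quals = trim_by_quality(read, quals)
print(trimmed_seq)
```

A read in which no base exceeds the cutoff is trimmed to an empty sequence, which is a reasonable signal that the trace should be re-run or inspected by eye.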
Current methods can directly sequence only relatively short DNA fragments, on the order of several hundred to about 1,000 nucleotides, in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide.