DNA sequencing


DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, thymine, cytosine, and guanine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.
Knowledge of DNA sequences has become indispensable for basic biological research, DNA Genographic Projects and in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology and biological systematics. Comparing healthy and mutated DNA sequences can diagnose different diseases including various cancers, characterize antibody repertoire, and can be used to guide patient treatment. Having a quick way to sequence DNA allows for faster and more individualized medical care to be administered, and for more organisms to be identified and cataloged.
The rapid advancements in DNA sequencing technology have played a crucial role in sequencing complete genomes of humans, as well as numerous animal, plant, and microbial species.
The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with a DNA sequencer, DNA sequencing has become easier and orders of magnitude faster.

Applications

DNA sequencing can be used to determine the sequence of individual genes, larger genetic regions, full chromosomes, or entire genomes of any organism. DNA sequencing is also the most efficient way to indirectly sequence RNA or proteins. In fact, DNA sequencing has become a key technology in many areas of biology and other sciences such as medicine, forensics, and anthropology.

Molecular biology

Sequencing is used in molecular biology to study genomes and the proteins they encode. Information obtained using sequencing allows researchers to identify changes in genes and noncoding DNA, associations with diseases and phenotypes, and identify potential drug targets.

Evolutionary biology

Since DNA is an informative macromolecule in terms of transmission from one generation to another, DNA sequencing is used in evolutionary biology to study how different organisms are related and how they evolved. In February 2021, scientists reported, for the first time, the sequencing of DNA from animal remains, a mammoth in this instance, over a million years old, the oldest DNA sequenced to date.

Metagenomics

The field of metagenomics involves identification of organisms present in a body of water, sewage, dirt, debris filtered from the air, or swab samples from organisms. Knowing which organisms are present in a particular environment is critical to research in ecology, epidemiology, microbiology, and other fields. Sequencing enables researchers to determine which types of microbes may be present in a microbiome, for example.

Virology

As most viruses are too small to be seen by a light microscope, sequencing is one of the main tools in virology to identify and study the virus. Viral genomes can be based in DNA or RNA. RNA viruses are more time-sensitive for genome sequencing, as they degrade faster in clinical samples. Traditional Sanger sequencing and next-generation sequencing are used to sequence viruses in basic and clinical research, as well as for the diagnosis of emerging viral infections, molecular epidemiology of viral pathogens, and drug-resistance testing. There are more than 2.3 million unique viral sequences in GenBank. In 2019, NGS has surpassed traditional Sanger as the most popular approach for generating viral genomes.
During the 1997 avian influenza outbreak, viral sequencing determined that the influenza sub-type originated through reassortment between quail and poultry. This led to legislation in Hong Kong that prohibited selling live quail and poultry together at market. Viral sequencing can also be used to estimate when a viral outbreak began by using a molecular clock technique.

Medicine

Medical technicians may sequence genes from patients to determine if there is risk of genetic diseases. This is a form of genetic testing, though not all genetic tests involve complete genome DNA sequencing.
As of 2013 DNA sequencing was increasingly used to diagnose and treat rare diseases. As more and more genes are identified that cause rare genetic diseases, molecular diagnoses for patients become more mainstream. DNA sequencing allows clinicians to identify genetic diseases, improve disease management, provide reproductive counseling, and more effective therapies. Gene sequencing panels are used to identify multiple potential genetic causes of a suspected disorder.
Also, DNA sequencing may be useful for determining a specific bacteria, to allow for more precise antibiotics treatments, hereby reducing the risk of creating antimicrobial resistance in bacteria populations.

Forensic investigation

DNA sequencing may be used along with DNA profiling methods for forensic identification and paternity testing. DNA testing has evolved tremendously in the last few decades to ultimately link a DNA print to what is under investigation. The DNA patterns in fingerprint, saliva, hair follicles, etc. uniquely separate each living organism from another. Testing DNA is a technique which can detect specific genomes in a DNA strand to produce a unique and individualized pattern.

The four canonical bases

The canonical structure of DNA has four bases: thymine, adenine, cytosine, and guanine. DNA sequencing is the determination of the physical order of these bases in a molecule of DNA. However, there are many other bases that may be present in a molecule. In some viruses, cytosine may be replaced by hydroxymethyl- or hydroxymethylglucose- cytosine. In mammalian DNA, variant bases with methyl groups or phosphosulfate may be found. Depending on the sequencing technique, a particular modification, e.g., the 5mC common in humans, may or may not be detected.
In almost all organisms, DNA is synthesized in vivo using only the 4 canonical bases; modification that occurs post replication creates other bases like 5 methyl C. However, some bacteriophage can incorporate a non standard base directly.
In addition to modifications, DNA is under constant assault by environmental agents such as UV and Oxygen radicals. At the present time, the presence of such damaged bases is not detected by most DNA sequencing methods, although PacBio has published on this.

History

Discovery of DNA structure and function

Deoxyribonucleic acid was first discovered and isolated by Friedrich Miescher in 1869, but it remained under-studied for many decades because proteins, rather than DNA, were thought to hold the genetic blueprint to life. This situation changed after 1944 as a result of some experiments by Oswald Avery, Colin MacLeod, and Maclyn McCarty demonstrating that purified DNA could change one strain of bacteria into another. This was the first time that DNA was shown capable of transforming the properties of cells.
In 1953, James Watson and Francis Crick put forward their double-helix model of DNA, based on crystallized X-ray structures being studied by Rosalind Franklin. According to the model, DNA is composed of two strands of nucleotides coiled around each other, linked together by hydrogen bonds and running in opposite directions. Each strand is composed of four complementary nucleotides – adenine, cytosine, guanine and thymine – with an A on one strand always paired with T on the other, and C always paired with G. They proposed that such a structure allowed each strand to be used to reconstruct the other, an idea central to the passing on of hereditary information between generations.
File:Frederick Sanger2.jpg|thumb|Frederick Sanger, a pioneer of sequencing. Sanger is one of the few scientists who was awarded two Nobel prizes, one for the sequencing of proteins, and the other for the sequencing of DNA.
The foundation for sequencing proteins was first laid by the work of Frederick Sanger who by 1955 had completed the sequence of all the amino acids in insulin, a small protein secreted by the pancreas. This provided the first conclusive evidence that proteins were chemical entities with a specific molecular pattern rather than a random mixture of material suspended in fluid. Sanger's success in sequencing insulin spurred on x-ray crystallographers, including Watson and Crick, who by now were trying to understand how DNA directed the formation of proteins within a cell. Soon after attending a series of lectures given by Frederick Sanger in October 1954, Crick began developing a theory which argued that the arrangement of nucleotides in DNA determined the sequence of amino acids in proteins, which in turn helped determine the function of a protein. He published this theory in 1958.

RNA sequencing

was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing is the sequence of the first complete gene and the complete genome of Bacteriophage MS2, identified and published by Walter Fiers and his coworkers at the University of Ghent, in 1972 and 1976. Traditional RNA sequencing methods require the creation of a cDNA molecule which must be sequenced.

Early DNA sequencing methods

The first method for determining DNA sequences involved a location-specific primer extension strategy established by Ray Wu, a geneticist, at Cornell University in 1970. DNA polymerase catalysis and specific nucleotide labeling, both of which figure prominently in current sequencing schemes, were used to sequence the cohesive ends of lambda phage DNA. Between 1970 and 1973, Wu, scientist Radha Padmanabhan and colleagues demonstrated that this method can be employed to determine any DNA sequence using synthetic location-specific primers.
Walter Gilbert, a biochemist, and Allan Maxam, a molecular geneticist, at Harvard also developed sequencing methods, including one for "DNA sequencing by chemical degradation". In 1973, Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis. Advancements in sequencing were aided by the concurrent development of recombinant DNA technology, allowing DNA samples to be isolated from sources other than viruses.
Two years later in 1975, Frederick Sanger, a biochemist, and Alan Coulson, a genome scientist, developed a method to sequence DNA. The technique known as the "Plus and Minus" method, involved supplying all the components of the DNA but excluding the reaction of one of the four bases needed to complete the DNA.
In 1976, Gilbert and Maxam, invented a method for rapidly sequencing DNA while at Harvard, known as the Maxam–Gilbert sequencing. The technique involved treating radiolabelled DNA with a chemical and using a polyacrylamide gel to determine the sequence.
In 1977, Sanger then adopted a primer-extension strategy to develop more rapid DNA sequencing methods at the MRC Centre, Cambridge, UK. This technique was similar to his "Plus and Minus" strategy, however, it was based upon the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. Sanger published this method in the same year.