Plastid DNA


Plastid DNA, also known as chloroplast DNA in photosynthetic organisms, is the DNA located in chloroplasts, which are photosynthetic organelles located within the cells of some eukaryotic organisms, as well as some reduced plastids, such as apicoplasts. Chloroplasts, like other types of plastid, contain a genome separate from that in the cell nucleus. The existence of chloroplast DNA was identified biochemically in 1959, and confirmed by electron microscopy in 1962. The discoveries that the chloroplast contains ribosomes and performs protein synthesis revealed that the chloroplast is genetically semi-autonomous. The first complete chloroplast genome sequences were published in 1986, Nicotiana tabacum by Sugiura and colleagues and Marchantia polymorpha by Ozeki et al. Since then, tens of thousands of chloroplast genomes from various species have been sequenced.

Molecular structure

DNAs are circular, and are typically 120,000–170,000 base pairs long. They can have a contour length of around 30–60 micrometers, and have a mass of about 80–130 million daltons.
Most chloroplasts have their entire chloroplast genome combined into a single large ring, though those of dinophyte algae are a notable exception—their genome is broken up into about forty small plasmids, each 2,000–10,000 base pairs long. Each minicircle contains one to three genes, but blank plasmids, with no coding DNA, have also been found.
Chloroplast DNA has long been thought to have a circular structure, but some evidence suggests that chloroplast DNA more commonly takes a linear shape. Over 95% of the chloroplast DNA in corn chloroplasts has been observed to be in branched linear form rather than individual circles.

Inverted repeats

Many chloroplast DNAs contain two inverted repeats, which separate a long single copy section from a short single copy section.
The inverted repeats vary wildly in length, ranging from 4,000 to 25,000 base pairs long each. Inverted repeats in plants tend to be at the upper end of this range, each being 20,000–25,000 base pairs long.
The inverted repeat regions usually contain three ribosomal RNA and two tRNA genes, but they can be expanded or reduced to contain as few as four or as many as over 150 genes.
While a given pair of inverted repeats are rarely completely identical, they are always very similar to each other, apparently resulting from concerted evolution.
The inverted repeat regions are highly conserved among land plants, and accumulate few mutations. Similar inverted repeats exist in the genomes of cyanobacteria and the other two chloroplast lineages, suggesting that they predate the chloroplast, though some chloroplast DNAs like those of peas and a few red algae have since lost the inverted repeats. Others, like the red alga Porphyra flipped one of its inverted repeats. It is possible that the inverted repeats help stabilize the rest of the chloroplast genome, as chloroplast DNAs which have lost some of the inverted repeat segments tend to get rearranged more.

Nucleoids

Each chloroplast contains around 100 copies of its DNA in young leaves, declining to 15–20 copies in older leaves. They are usually packed into nucleoids which can contain several identical chloroplast DNA rings. Many nucleoids can be found in each chloroplast.
Though chloroplast DNA is not associated with true histones, in red algae, a histone-like chloroplast protein coded by the chloroplast DNA that tightly packs each chloroplast DNA ring into a nucleoid has been found.
In primitive red algae, the chloroplast DNA nucleoids are clustered in the center of a chloroplast, while in green plants, the nucleoids are dispersed throughout the stroma.

Gene content and plastid gene expression

More than 33,000 chloroplast genomes have been sequenced and are accessible via the NCBI organelle genome database. The first chloroplast genomes were sequenced in 1986, from tobacco and liverwort. Comparison of the gene sequences of the cyanobacteria Synechocystis to those of the chloroplast genome of Arabidopsis provided confirmation of the endosymbiotic origin of the chloroplast. It also demonstrated the significant extent of gene transfer from the cyanobacterial ancestor to the nuclear genome.
In most plant species, the chloroplast genome encodes approximately 120 genes. The genes primarily encode core components of the photosynthetic machinery and factors involved in their expression and assembly. Across species of land plants, the set of genes encoded by the chloroplast genome is fairly conserved. This includes four ribosomal RNAs, approximately 30 tRNAs, 21 ribosomal proteins, and 4 subunits of the plastid-encoded RNA polymerase complex that are involved in plastid gene expression. The large Rubisco subunit and 28 photosynthetic thylakoid proteins are encoded within the chloroplast genome.

Chloroplast genome reduction and gene transfer

Over time, many parts of the chloroplast genome were transferred to the nuclear genome of the host, a process called endosymbiotic gene transfer.
As a result, the chloroplast genome is heavily reduced compared to that of free-living cyanobacteria. Chloroplasts may contain 60–100 genes whereas cyanobacteria often have more than 1500 genes in their genome. The parasitic Pilostyles have even lost their plastid genes for tRNA. Contrarily, there are only a few known instances where genes have been transferred to the chloroplast from various donors, including bacteria.
Endosymbiotic gene transfer is how we know about the lost chloroplasts in many chromalveolate lineages. Even if a chloroplast is eventually lost, the genes it donated to the former host's nucleus persist, providing evidence for the lost chloroplast's existence. For example, while diatoms now have a red algal derived chloroplast, the presence of many green algal genes in the diatom nucleus provide evidence that the diatom ancestor had a green algal derived chloroplast at some point, which was subsequently replaced by the red chloroplast.
In land plants, some 11–14% of the DNA in their nuclei can be traced back to the chloroplast, up to 18% in Arabidopsis, corresponding to about 4,500 protein-coding genes. There have been a few recent transfers of genes from the chloroplast DNA to the nuclear genome in land plants.

Proteins encoded by the chloroplast

Of the approximately three-thousand proteins found in chloroplasts, some 95% of them are encoded by nuclear genes. Many of the chloroplast's protein complexes consist of subunits from both the chloroplast genome and the host's nuclear genome. As a result, protein synthesis must be coordinated between the chloroplast and the nucleus. The chloroplast is mostly under nuclear control, though chloroplasts can also give out signals regulating gene expression in the nucleus, called retrograde signaling.

Protein synthesis

Protein synthesis within chloroplasts relies on an RNA polymerase coded by the chloroplast's own genome, which is related to RNA polymerases found in bacteria. Chloroplasts also contain a mysterious second RNA polymerase that is encoded by the plant's nuclear genome. The two RNA polymerases may recognize and bind to different kinds of promoters within the chloroplast genome. The ribosomes in chloroplasts are similar to bacterial ribosomes.

RNA editing in plastids

is the insertion, deletion, and substitution of nucleotides in a mRNA transcript prior to translation to protein. The highly oxidative environment inside chloroplasts increases the rate of mutation so post-transcription repairs are needed to conserve functional sequences. The chloroplast editosome substitutes C -> U and U -> C at very specific locations on the transcript. This can change the codon for an amino acid or restore a non-functional pseudogene by adding an AUG start codon or removing a premature UAA stop codon.
The editosome recognizes and binds to cis sequence upstream of the editing site. The distance between the binding site and editing site varies by gene and proteins involved in the editosome. Hundreds of different PPR proteins from the nuclear genome are involved in the RNA editing process. These proteins consist of 35-mer repeated amino acids, the sequence of which determines the cis binding site for the edited transcript.
Basal land plants such as liverworts, mosses and ferns have hundreds of different editing sites while flowering plants typically have between thirty and forty. Parasitic plants such as Epifagus virginiana show a loss of RNA editing resulting in a loss of function for photosynthesis genes.

DNA replication

Leading model of cpDNA replication

The mechanism for chloroplast DNA replication has not been conclusively determined, but two main models have been proposed. Scientists have attempted to observe chloroplast replication via electron microscopy since the 1970s. The results of the microscopy experiments led to the idea that chloroplast DNA replicates using a double displacement loop. As the D-loop moves through the circular DNA, it adopts a theta intermediary form, also known as a Cairns replication intermediate, and completes replication with a rolling circle mechanism. Replication starts at specific points of origin. Multiple replication forks open up, allowing replication machinery to replicate the DNA. As replication continues, the forks grow and eventually converge. The new cpDNA structures separate, creating daughter cpDNA chromosomes.
In addition to the early microscopy experiments, this model is also supported by the amounts of deamination seen in cpDNA. Deamination occurs when an amino group is lost and is a mutation that often results in base changes. When adenine is deaminated, it becomes hypoxanthine. Hypoxanthine can bind to cytosine, and when the HC base pair is replicated, it becomes a GC.
In cpDNA, there are several A → G deamination gradients. DNA becomes susceptible to deamination events when it is single stranded. When replication forks form, the strand not being copied is single stranded, and thus at risk for A → G deamination. Therefore, gradients in deamination indicate that replication forks were most likely present and the direction that they initially opened. This mechanism is still the leading theory today; however, a second theory suggests that most cpDNA is actually linear and replicates through homologous recombination. It further contends that only a minority of the genetic material is kept in circular chromosomes while the rest is in branched, linear, or other complex structures.