Microsatellite


A microsatellite is a tract of repetitive DNA in which certain DNA motifs are repeated, typically 5–50 times. Microsatellites occur at thousands of locations within an organism's genome. They have a higher mutation rate than other areas of DNA, which leads to high genetic diversity. Microsatellites are often referred to as short tandem repeats by forensic geneticists and in genetic genealogy, or as simple sequence repeats by plant geneticists.
Microsatellites and their longer cousins, the minisatellites, together are classified as VNTR DNA. The name "satellite" DNA refers to the early observation that centrifugation of genomic DNA in a test tube separates a prominent layer of bulk DNA from accompanying "satellite" layers of repetitive DNA.
They are widely used for DNA profiling in cancer diagnosis, in kinship analysis and in forensic identification. They are also used in genetic linkage analysis to locate a gene or a mutation responsible for a given trait or disease. Microsatellites are also used in population genetics to measure levels of relatedness between subspecies, groups and individuals.

History

Although the first microsatellite was characterised in 1984 at the University of Leicester by Weller, Jeffreys and colleagues as a polymorphic GGAT repeat in the human myoglobin gene, the term "microsatellite" was introduced later, in 1989, by Litt and Luty. The increasing availability of DNA amplification by PCR at the beginning of the 1990s triggered a large number of studies using the amplification of microsatellites as genetic markers for forensic medicine, for paternity testing, and for positional cloning to find the gene underlying a trait or disease. Prominent early applications include the identifications by microsatellite genotyping of the eight-year-old skeletal remains of a British murder victim, and of the Auschwitz concentration camp doctor Josef Mengele who escaped to South America following World War II.

Structures, locations, and functions

A microsatellite is a tract of tandemly repeated DNA motifs that range in length from one to six or up to ten nucleotides, and are typically repeated 5–50 times. For example, the sequence TATATATATA is a dinucleotide microsatellite, and GTCGTCGTCGTCGTC is a trinucleotide microsatellite. Repeat units of four and five nucleotides are referred to as tetra- and pentanucleotide motifs, respectively. Most eukaryotes have microsatellites, with the notable exception of some yeast species. Microsatellites are distributed throughout the genome. The human genome for example contains 50,000–100,000 dinucleotide microsatellites, and lesser numbers of tri-, tetra- and pentanucleotide microsatellites. Many are located in non-coding parts of the human genome and therefore do not produce proteins, but they can also be located in regulatory regions and coding regions.
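The definition above can be turned into a simple search. The following is a minimal sketch, assuming a plain A/C/G/T string as input and using the thresholds quoted in the text (motifs of one to six nucleotides, at least five tandem copies); the function name find_microsatellites and its parameters are illustrative, not part of any established tool.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # Capture one motif of length `unit`, then require at least
        # (min_repeats - 1) further back-to-back copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter motif repeated,
            # e.g. "TATA" is two copies of "TA".
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)

# The two example sequences from the text
print(find_microsatellites("TATATATATA"))
print(find_microsatellites("GTCGTCGTCGTCGTC"))
```

This sketch only reports perfect repeats. Real microsatellites often contain interruptions, which a production tool would need to tolerate.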
Microsatellites in non-coding regions may not have any specific function, and therefore might not be selected against; this allows them to accumulate mutations unhindered over the generations and gives rise to variability that can be used for DNA fingerprinting and identification purposes. Other microsatellites are located in regulatory flanking or intronic regions of genes, or directly in codons of genes – microsatellite mutations in such cases can lead to phenotypic changes and diseases, notably in triplet expansion diseases such as fragile X syndrome and Huntington's disease.
Telomeres are linear sequences of DNA that sit at the very ends of chromosomes and protect the integrity of genomic material against the "end replication problem" during successive rounds of cell division. In white blood cells, telomere length has been shown to correlate inversely with age in several sample types. Telomeres consist of repetitive DNA, with the hexanucleotide repeat motif TTAGGG in vertebrates. They are thus classified as minisatellites. Similarly, insects have shorter repeat motifs in their telomeres that could arguably be considered microsatellites.

Mutation mechanisms and mutation rates

Unlike point mutations, which affect only a single nucleotide, microsatellite mutations lead to the gain or loss of an entire repeat unit, and sometimes two or more repeats simultaneously. Thus, the mutation rate at microsatellite loci is expected to differ from other mutation rates, such as base substitution rates. The mutation rate at microsatellite loci depends on the repeat motif sequence, the number of repeated motif units and the purity of the canonical repeated sequence. A variety of mechanisms for mutation of microsatellite loci have been reviewed, and their resulting polymorphic nature has been quantified. The actual cause of mutations in microsatellites is debated.
One proposed cause of such length changes is replication slippage, caused by mismatches between DNA strands while being replicated during meiosis. DNA polymerase, the enzyme responsible for reading DNA during replication, can slip while moving along the template strand and continue at the wrong nucleotide. DNA polymerase slippage is more likely to occur when a repetitive sequence is replicated. Because microsatellites consist of such repetitive sequences, DNA polymerase may make errors at a higher rate in these sequence regions. Several studies have found evidence that slippage is the cause of microsatellite mutations. Typically, slippage in each microsatellite occurs about once per 1,000 generations. Thus, slippage changes in repetitive DNA are three orders of magnitude more common than point mutations in other parts of the genome. Most slippage results in a change of just one repeat unit, and slippage rates vary for different allele lengths and repeat unit sizes, and within different species. If there is a large size difference between individual alleles, then there may be increased instability during recombination at meiosis.
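The slippage process described above can be illustrated with a toy stepwise simulation. This is a minimal sketch under strong simplifying assumptions: a constant rate of one event per 1,000 generations (the typical figure quoted above), every event changing the tract by exactly one repeat unit, and gains and losses being equally likely. The text notes that real rates vary with allele length, repeat unit size and species, so the function simulate_slippage and its parameters are hypothetical.

```python
import random

def simulate_slippage(start_repeats, generations, rate=1e-3, seed=0, min_repeats=1):
    """Follow one lineage; return (final_repeat_count, number_of_slippage_events)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    repeats = start_repeats
    events = 0
    for _ in range(generations):
        if rng.random() < rate:          # a slippage event occurs this generation
            step = rng.choice((-1, 1))   # assumed: gain and loss equally likely
            repeats = max(min_repeats, repeats + step)
            events += 1
    return repeats, events

final_length, n_events = simulate_slippage(start_repeats=20, generations=100_000)
print(final_length, n_events)
```

With these assumptions, about 100 events are expected over 100,000 generations, while the net change in repeat number is much smaller because gains and losses partly cancel.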
Another possible cause of microsatellite mutations is point mutation, where only one nucleotide is incorrectly copied during replication. A study comparing human and primate genomes found that most changes in repeat number in short microsatellites appear to be due to point mutations rather than slippage.

Microsatellite mutation rates

Direct estimates of microsatellite mutation rates have been made in numerous organisms, from insects to humans. In the desert locust Schistocerca gregaria, the microsatellite mutation rate was estimated at 2.1 × 10⁻⁴ per generation per locus. The microsatellite mutation rate in human male germ lines is five to six times higher than in female germ lines and ranges from 0 to 7 × 10⁻³ per locus per gamete per generation. In the nematode Pristionchus pacificus, the estimated microsatellite mutation rate ranges from 8.9 × 10⁻⁵ to 7.5 × 10⁻⁴ per locus per generation.
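To give a feel for what these per-locus rates imply, the short sketch below converts a rate into an average waiting time and into the chance that at least one of several loci mutates in a single generation. It assumes that loci mutate independently and share the same rate, which is a simplification; the panel size of 20 loci is a hypothetical figure chosen only for illustration.

```python
def expected_generations_between_mutations(mu):
    """Mean number of generations between mutations at one locus (1 / mu)."""
    return 1.0 / mu

def prob_any_mutation(mu, n_loci):
    """Chance that at least one of n_loci independent loci mutates in one generation."""
    return 1.0 - (1.0 - mu) ** n_loci

locust_rate = 2.1e-4       # desert locust estimate quoted above
human_upper_rate = 7e-3    # upper end of the human range quoted above

print(expected_generations_between_mutations(locust_rate))
print(prob_any_mutation(human_upper_rate, n_loci=20))  # 20 loci is hypothetical
```

At the locust rate, a given locus is expected to mutate roughly once every 4,800 generations.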
Microsatellite mutation rates vary with base position relative to the microsatellite, repeat type, and base identity. Mutation rate rises specifically with repeat number, peaking around six to eight repeats and then decreasing again. Increased heterozygosity in a population will also increase microsatellite mutation rates, especially when there is a large length difference between alleles. This is likely due to homologous chromosomes with arms of unequal lengths causing instability during meiosis.

Biological effects of microsatellite mutations

Many microsatellites are located in non-coding DNA and are biologically silent. Others are located in regulatory or even coding DNA – microsatellite mutations in such cases can lead to phenotypic changes and diseases. A genome-wide study estimates that microsatellite variation contributes 10–15% of heritable gene expression variation in humans.

Effects on proteins

In mammals, 20–40% of proteins contain repeating sequences of amino acids encoded by short sequence repeats. Most of the short sequence repeats within protein-coding portions of the genome have a repeating unit of three nucleotides, since that length will not cause frame-shifts when mutating. Each trinucleotide repeating sequence is translated into a repeating series of the same amino acid. In yeasts, the most common repeated amino acids are glutamine, glutamic acid, asparagine, aspartic acid and serine.
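The frame-preserving property of trinucleotide repeats can be checked directly. The following is a minimal sketch that builds the standard genetic code from its usual compact string form and translates a few example tracts; the helper names translate and keeps_reading_frame are illustrative, and the short flanking sequences are made up for the example.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding-strand DNA string, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def keeps_reading_frame(motif, units_changed=1):
    """True if gaining or losing `units_changed` copies of `motif` keeps the frame."""
    return (len(motif) * units_changed) % 3 == 0

print(translate("CAG" * 5))                        # a run of one amino acid (glutamine)
print(translate("ATG" + "CAG" * 3 + "GATTGG"))     # hypothetical flanks around the tract
print(translate("ATG" + "CAG" * 4 + "GATTGG"))     # one more unit: downstream residues unchanged
print(keeps_reading_frame("CAG"), keeps_reading_frame("TA"))
```

Adding one CAG unit lengthens the glutamine run by one residue and leaves the downstream sequence intact, whereas a single dinucleotide unit would shift the reading frame.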
Mutations in these repeating segments can affect the physical and chemical properties of proteins, with the potential for producing gradual and predictable changes in protein action. For example, length changes in tandemly repeating regions in the Runx2 gene lead to differences in facial length in domesticated dogs, with an association between longer sequence lengths and longer faces. This association also applies to a wider range of Carnivora species. Length changes in polyalanine tracts within the HOXA13 gene are linked to hand-foot-genital syndrome, a developmental disorder in humans. Length changes in other triplet repeats are linked to more than 40 neurological diseases in humans, notably trinucleotide repeat disorders such as fragile X syndrome and Huntington's disease. Evolutionary changes from replication slippage also occur in simpler organisms. For example, microsatellite length changes are common within surface membrane proteins in yeast, providing rapid evolution in cell properties. Specifically, length changes in the FLO1 gene control the level of adhesion to substrates. Short sequence repeats also provide rapid evolutionary change to surface proteins in pathogenic bacteria; this may allow them to keep up with immunological changes in their hosts. Length changes in short sequence repeats in a fungus control the duration of its circadian clock cycles.

Effects on gene regulation

Length changes of microsatellites within promoters and other cis-regulatory regions can change gene expression quickly, between generations. The human genome contains many short sequence repeats in regulatory regions, which provide 'tuning knobs' on the expression of many genes.
Length changes in bacterial SSRs can affect fimbriae formation in Haemophilus influenzae by altering promoter spacing. Dinucleotide microsatellites are linked to abundant variation in cis-regulatory control regions in the human genome. Microsatellites in control regions of the vasopressin 1a receptor gene in voles influence their social behavior and level of monogamy.
In Ewing sarcoma, a point mutation has created an extended GGAA microsatellite that binds a transcription factor, which in turn activates the EGR2 gene that drives the cancer. In addition, other GGAA microsatellites may influence the expression of genes that contribute to the clinical outcome of Ewing sarcoma patients.