Alternative splicing
Alternative splicing, alternative RNA splicing, or differential splicing is a process during gene expression that allows a single gene to produce different splice variants. For example, some exons of a gene may be included within or excluded from the final RNA product of the gene. This means the exons are joined in different combinations, leading to different splice variants. In the case of protein-coding genes, the proteins translated from these splice variants may differ in their amino acid sequence and in their biological functions.
Biologically relevant alternative splicing occurs as a normal phenomenon in eukaryotes, where it increases the number of proteins that can be encoded by the genome. In humans, it is widely believed that ~95% of multi-exonic genes are alternatively spliced to produce functional alternative products from the same gene. However, many scientists believe that most of the observed splice variants are due to splicing errors and that the actual number of biologically relevant alternatively spliced genes is much lower.
Discovery
Alternative splicing was first observed in 1977. The adenovirus produces five primary transcripts early in its infectious cycle, prior to viral DNA replication, and an additional one later, after DNA replication begins. The early primary transcripts continue to be produced after DNA replication begins. The additional primary transcript produced late in infection is large and comes from 5/6 of the 32 kb adenovirus genome. This is much larger than any of the individual adenovirus mRNAs present in infected cells. Researchers found that the primary RNA transcript produced by adenovirus type 2 in the late phase was spliced in many different ways, resulting in mRNAs encoding different viral proteins. In addition, the primary transcript contained multiple polyadenylation sites, giving different 3' ends for the processed mRNAs.
In 1981, the first example of alternative splicing in a transcript from a normal, endogenous gene was characterized. The gene encoding the thyroid hormone calcitonin was found to be alternatively spliced in mammalian cells. The primary transcript from this gene contains 6 exons; the calcitonin mRNA contains exons 1–4, and terminates after a polyadenylation site in exon 4. Another mRNA is produced from this pre-mRNA by skipping exon 4, and includes exons 1–3, 5, and 6. It encodes a protein known as CGRP. Examples of alternative splicing in immunoglobulin gene transcripts in mammals were also observed in the early 1980s.
Since then, many other examples of biologically relevant alternative splicing have been found in eukaryotes. The "record-holder" for alternative splicing is a D. melanogaster gene called Dscam, which could potentially have 38,016 splice variants.
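The figure of 38,016 is a product of independent choices. The following is a minimal sketch of that arithmetic, assuming the commonly cited layout of four clusters of mutually exclusive exons in Dscam (12, 48, 33, and 2 alternatives for exons 4, 6, 9, and 17); those per-cluster counts come from the Dscam literature rather than from the text above, and the names `DSCAM_CLUSTERS` and `count_isoforms` are illustrative.

```python
from math import prod

# Alternative-exon counts per cluster. Assumption: these figures are taken
# from the Dscam literature, not from the surrounding text.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def count_isoforms(clusters):
    """One alternative is chosen per cluster, so the counts multiply."""
    return prod(clusters.values())


print(count_isoforms(DSCAM_CLUSTERS))  # 38016
```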
In 2021, it was discovered that the genome of adenovirus type 2, the adenovirus in which alternative splicing was first identified, was able to produce a much greater variety of splice variants than previously thought. By using next-generation sequencing technology, researchers were able to update the human adenovirus type 2 transcriptome and document the presence of 904 splice variants produced by the virus through a complex pattern of alternative splicing. Very few of these splice variants have been shown to be functional, a point that the authors raise in their paper.
Modes
Five basic modes of alternative splicing are generally recognized.
- Exon skipping or cassette exon: In this case, an exon may be spliced out of the primary transcript or retained. This is the most common mode in mammalian pre-mRNAs.
- Mutually exclusive exons: One of two exons is retained in mRNAs after splicing, but not both.
- Alternative donor site: An alternative 5' splice junction is used, changing the 3' boundary of the upstream exon.
- Alternative acceptor site: An alternative 3' splice junction is used, changing the 5' boundary of the downstream exon.
- Intron retention: A sequence may be spliced out as an intron or simply retained. This is distinguished from exon skipping because the retained sequence is not flanked by introns. If the retained intron is in the coding region, the intron must encode amino acids in frame with the neighboring exons, or a stop codon or a shift in the reading frame will cause the protein to be non-functional. This is the rarest mode in mammals but the most common in plants.
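The five modes listed above can be illustrated on a toy transcript. This is a sketch only: the sequence, the coordinates, and the helper `splice` are invented for illustration, exons are written in upper case and introns in lower case, and each mode is expressed simply as a different set of intervals removed from the same pre-mRNA.

```python
def splice(pre_mrna, introns):
    """Remove each half-open (start, end) interval and join what is left."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        if start < pos:
            raise ValueError("intron intervals overlap")
        pieces.append(pre_mrna[pos:start])
        pos = end
    pieces.append(pre_mrna[pos:])
    return "".join(pieces)


# Hypothetical transcript: exons A-D (upper case), three introns (lower case).
PRE_MRNA = "AAAA" + "guuuag" + "BBBB" + "guuuag" + "CCCC" + "guuuag" + "DDDD"
I1, I2, I3 = (4, 10), (14, 20), (24, 30)  # intron coordinates

ISOFORMS = {
    "all exons": splice(PRE_MRNA, [I1, I2, I3]),
    "exon skipping (B)": splice(PRE_MRNA, [(4, 20), I3]),
    "mutually exclusive, B kept": splice(PRE_MRNA, [I1, (14, 30)]),
    "mutually exclusive, C kept": splice(PRE_MRNA, [(4, 20), I3]),
    "alternative donor": splice(PRE_MRNA, [(2, 10), I2, I3]),
    "alternative acceptor": splice(PRE_MRNA, [(4, 12), I2, I3]),
    "intron retention": splice(PRE_MRNA, [I1, I3]),
}

for name, mrna in ISOFORMS.items():
    print(f"{name:28s} {mrna}")
```

In this toy, skipping exon B and keeping only exon C of a mutually exclusive pair give the same product. What distinguishes the mutually exclusive mode is that B and C never appear together in one mRNA.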
These modes describe basic splicing mechanisms, but may be inadequate to describe complex splicing events. For instance, the figure to the right shows three spliceforms from the mouse hyaluronidase 3 gene. Comparing the exonic structure shown in the first line with the one in the second line shows intron retention, whereas the comparison between the second and the third spliceform exhibits exon skipping. A model nomenclature to uniquely designate all possible splicing patterns has recently been proposed.
Mechanisms
General splicing mechanism
When the pre-mRNA has been transcribed from the DNA, it includes several introns and exons. The exons to be retained in the mRNA are determined during the splicing process. The regulation and selection of splice sites are done by trans-acting splicing activator and splicing repressor proteins as well as cis-acting elements within the pre-mRNA itself such as exonic splicing enhancers and exonic splicing silencers.
The typical eukaryotic nuclear intron has consensus sequences defining important regions. Each intron has the sequence GU at its 5' end. Near the 3' end there is a branch site. The nucleotide at the branchpoint is always an A; the consensus around this sequence varies somewhat. In humans the branch site consensus sequence is yUnAy. The branch site is followed by a series of pyrimidines – the polypyrimidine tract – then by AG at the 3' end.
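The consensus features described above (5' GU, a yUnAy branch site, a polypyrimidine tract, and 3' AG) can be checked on a candidate sequence with a few string tests. This is a rough sketch, not a splice-site predictor: the function name `check_intron`, the minimum tract length of 6, and the test sequences are assumptions made for illustration, and real introns tolerate considerable deviation from the consensus.

```python
import re

# yUnAy: y = pyrimidine (C or U), n = any nucleotide.
BRANCH_SITE = re.compile(r"[CU]U[ACGU]A[CU]")


def check_intron(seq, min_tract=6):
    """Report which consensus features a candidate intron sequence shows.

    min_tract is a hypothetical threshold for the pyrimidine run length.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    result = {
        "5' GU": rna.startswith("GU"),
        "3' AG": rna.endswith("AG"),
        "branch site": False,
        "polypyrimidine tract": False,
    }
    tract = re.compile(r"[CU]{%d,}" % min_tract)
    for match in BRANCH_SITE.finditer(rna):
        result["branch site"] = True
        # Look for the tract between the branch site and the terminal AG.
        if tract.search(rna[match.end():-2]):
            result["polypyrimidine tract"] = True
            break
    return result


print(check_intron("GUAAGUACGACGCUGACUUUCUUUCUCAG"))
```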
Splicing of mRNA is performed by an RNA and protein complex known as the spliceosome, containing snRNPs designated U1, U2, U4, U5, and U6. U1 binds to the 5' GU, and U2, with the assistance of the U2AF protein factors, binds to the branchpoint A within the branch site. The complex at this stage is known as the spliceosome A complex. Formation of the A complex is usually the key step in determining the ends of the intron to be spliced out, and defining the ends of the exon to be retained.
The U4, U5, U6 complex binds, and U6 replaces U1 at its position. U1 and U4 leave. The remaining complex then performs two transesterification reactions. In the first transesterification, the 5' end of the intron is cleaved from the upstream exon and joined to the branch site A by a 2',5'-phosphodiester linkage. In the second transesterification, the 3' end of the intron is cleaved from the downstream exon, and the two exons are joined by a phosphodiester bond. The intron is then released in lariat form and degraded.
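The two transesterification steps can be followed as operations on a string. The sketch below tracks only which pieces are joined and where the branchpoint sits. It does not model the chemistry or the snRNPs, and the `Lariat` class, the function names, and the sequences are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lariat:
    rna: str           # RNA whose 5' end is joined to the branchpoint A
    branch_index: int  # index of the branchpoint A within rna


def first_transesterification(pre_mrna, five_prime_site, branchpoint):
    """Cleave the intron's 5' end from the upstream exon and link it to the branch A."""
    if pre_mrna[branchpoint] != "A":
        raise ValueError("branchpoint nucleotide must be A")
    upstream_exon = pre_mrna[:five_prime_site]
    intermediate = Lariat(pre_mrna[five_prime_site:], branchpoint - five_prime_site)
    return upstream_exon, intermediate


def second_transesterification(upstream_exon, intermediate, intron_length):
    """Cleave the intron's 3' end from the downstream exon and join the two exons."""
    intron = intermediate.rna[:intron_length]
    downstream_exon = intermediate.rna[intron_length:]
    return upstream_exon + downstream_exon, Lariat(intron, intermediate.branch_index)


# Hypothetical sequences chosen for illustration.
EXON1, INTRON, EXON2 = "AUGGCU", "GUAAGUCUGACUUUCUUUCUCAG", "GAAUAA"
PRE_MRNA = EXON1 + INTRON + EXON2
BRANCHPOINT = len(EXON1) + INTRON.index("CUGAC") + 3  # the A in CUGAC

exon1, intermediate = first_transesterification(PRE_MRNA, len(EXON1), BRANCHPOINT)
mrna, lariat = second_transesterification(exon1, intermediate, len(INTRON))
print(mrna)        # joined exons
print(lariat.rna)  # released intron, in lariat form
```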
Regulatory elements and proteins
Splicing is regulated by trans-acting proteins and corresponding cis-acting regulatory sites on the pre-mRNA. However, the effects of a splicing factor are frequently position-dependent. That is, a splicing factor that serves as a splicing activator when bound to an intronic enhancer element may serve as a repressor when bound to its splicing element in the context of an exon, and vice versa. The secondary structure of the pre-mRNA transcript also plays a role in regulating splicing, such as by bringing together splicing elements or by masking a sequence that would otherwise serve as a binding element for a splicing factor. Together, these elements form a "splicing code" that governs how splicing will occur under different cellular conditions.
There are two major types of cis-acting RNA sequence elements present in pre-mRNAs and they have corresponding trans-acting RNA-binding proteins. Splicing silencers are sites to which splicing repressor proteins bind, reducing the probability that a nearby site will be used as a splice junction. These can be located in the intron itself or in a neighboring exon. They vary in sequence, as well as in the types of proteins that bind to them. The majority of splicing repressors are heterogeneous nuclear ribonucleoproteins such as hnRNPA1 and polypyrimidine tract-binding protein. Splicing enhancers are sites to which splicing activator proteins bind, increasing the probability that a nearby site will be used as a splice junction. These also may occur in the intron or exon. Most of the activator proteins that bind to intronic splicing enhancers (ISEs) and exonic splicing enhancers (ESEs) are members of the SR protein family. Such proteins contain RNA recognition motifs and arginine and serine-rich domains.
In general, the determinants of splicing work in an inter-dependent manner that depends on context, so that the rules governing how splicing is regulated form a splicing code. The presence of a particular cis-acting RNA sequence element may increase the probability that a nearby site will be spliced in some cases, but decrease the probability in other cases, depending on context. The context within which regulatory elements act includes cis-acting context that is established by the presence of other RNA sequence features, and trans-acting context that is established by cellular conditions. For example, some cis-acting RNA sequence elements influence splicing only if multiple elements are present in the same region so as to establish context. As another example, a cis-acting element can have opposite effects on splicing, depending on which proteins are expressed in the cell. The adaptive significance of splicing silencers and enhancers is attested by studies showing that there is strong selection in human genes against mutations that produce new silencers or disrupt existing enhancers.
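The dependence on trans-acting context can be made concrete with a toy scoring model. This is only an illustration of the idea that one cis-acting element can push splicing in opposite directions depending on which proteins are expressed. The logistic form, the weights, and the names `element_X`, `activator_P`, and `repressor_Q` are invented and carry no biological measurement.

```python
import math


def usage_probability(elements, expressed, weights, baseline=0.0):
    """Toy model: sum the weights of (element, protein) pairs, then squash to 0..1."""
    score = baseline
    for element in elements:
        for protein in expressed:
            score += weights.get((element, protein), 0.0)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights: the same element is read as an enhancer by one
# protein and as a silencer by another.
WEIGHTS = {
    ("element_X", "activator_P"): 2.0,
    ("element_X", "repressor_Q"): -2.0,
}

for proteins in ({"activator_P"}, {"repressor_Q"}, set()):
    p = usage_probability(["element_X"], proteins, WEIGHTS)
    print(sorted(proteins), round(p, 3))
```

With only the activator expressed, the nearby site is favored; with only the repressor, it is disfavored; with neither, the element has no effect in this model.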