RNA-Seq
RNA-Seq is a next-generation sequencing technique used to quantify and identify RNA molecules in a biological sample, providing a snapshot of the transcriptome at a specific time. It enables transcriptome-wide analysis by sequencing cDNA derived from RNA. Modern workflows often incorporate pseudoalignment tools and cloud-based processing pipelines, improving speed, scalability, and reproducibility.
RNA-Seq makes it possible to examine alternatively spliced transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs, and changes in gene expression over time or between groups or treatments. In addition to mRNA transcripts, RNA-Seq can examine different populations of RNA, including total RNA, small RNA such as miRNA and tRNA, and ribosomal profiling. RNA-Seq can also be used to determine exon/intron boundaries and to verify or amend previously annotated 5' and 3' gene boundaries. Recent advances in RNA-Seq include single cell sequencing, bulk RNA sequencing, 3' mRNA-sequencing, in situ sequencing of fixed tissue, and native RNA molecule sequencing with single-molecule real-time sequencing. Advances in bioinformatics algorithms have enabled further emerging applications, such as detecting copy number alterations, microbial contamination, transposable elements, cell types, and the presence of neoantigens.
History
Prior to RNA-Seq, gene expression studies were done with hybridization-based microarrays. Issues with microarrays include cross-hybridization artifacts, poor quantification of lowly and highly expressed genes, and the need to know the sequence a priori. Because of these technical issues, transcriptomics transitioned to sequencing-based methods. These progressed from Sanger sequencing of expressed sequence tag libraries, to chemical tag-based methods, and finally to the current technology, next-generation sequencing of complementary DNA, notably RNA-Seq in the mid-2000s. The first manuscripts that used RNA-Seq, even without using the term, include those on prostate cancer cell lines, Medicago truncatula, and maize, while the term "RNA-Seq" itself was first mentioned in 2008. The number of manuscripts referring to RNA-Seq in the title or abstract has increased continuously, with 6754 manuscripts published in 2018. The intersection of RNA-Seq and medicine has grown at a similar pace.
Methods
Library preparation
The general steps to prepare a complementary DNA (cDNA) library for sequencing are described below, but often vary between platforms.
- RNA isolation: RNA is isolated from tissue and treated with deoxyribonuclease (DNase), which reduces the amount of genomic DNA. The amount of RNA degradation is checked with gel and capillary electrophoresis and is used to assign an RNA integrity number to the sample. This RNA quality and the total amount of starting RNA are taken into consideration during the subsequent library preparation, sequencing, and analysis steps.
- RNA selection/depletion: To analyze signals of interest, the isolated RNA can either be kept as is, enriched for RNA with 3' polyadenylated (poly(A)) tails to include only eukaryotic mRNA, depleted of ribosomal RNA (rRNA), and/or filtered for RNA that binds specific sequences. In eukaryotes, RNA molecules with 3' poly(A) tails are mainly mature, processed, coding sequences. Poly(A) selection is performed by mixing RNA with poly(T) oligomers covalently attached to a substrate, typically magnetic beads. Poly(A) selection has important limitations in RNA biotype detection. Many RNA biotypes are not polyadenylated, including many noncoding RNAs and histone-core protein transcripts, or are regulated via their poly(A) tail length, and thus might not be detected after poly(A) selection. Furthermore, poly(A) selection may display increased 3' bias, especially with lower quality RNA. These limitations can be avoided with ribosomal depletion, which removes the rRNA that typically represents over 90% of the RNA in a cell. Both poly(A) enrichment and ribosomal depletion steps are labor intensive and could introduce biases, so simpler approaches have been developed to omit these steps. Small RNA targets, such as miRNA, can be further isolated through size selection with exclusion gels, magnetic beads, or commercial kits.
- cDNA synthesis: RNA is reverse transcribed to cDNA because DNA is more stable, can be amplified, and can be sequenced with more mature DNA sequencing technology. Amplification subsequent to reverse transcription results in loss of strandedness, which can be avoided with chemical labeling or single molecule sequencing. Fragmentation and size selection are performed to purify sequences that are the appropriate length for the sequencing machine. The RNA, cDNA, or both are fragmented with enzymes, sonication, divalent ions, or nebulizers. Fragmentation of the RNA reduces the 5' bias of randomly primed reverse transcription and the influence of primer binding sites, with the downside that the 5' and 3' ends are converted to DNA less efficiently. Fragmentation is followed by size selection, where either small sequences are removed or a tight range of sequence lengths is selected. Because small RNAs like miRNAs are lost in this step, they are analyzed independently. The cDNA for each experiment can be indexed with a hexamer or octamer barcode, so that these experiments can be pooled into a single lane for multiplexed sequencing.
| Strategy | Predominant type of RNA | Ribosomal RNA content | Unprocessed RNA content | Isolation method |
| Total RNA | All | High | High | - |
| Poly(A) selection | Coding | Low | Low | Hybridization with poly(dT) oligomers |
| rRNA depletion | Coding, noncoding | Low | High | Removal of oligomers complementary to rRNA |
| RNA capture | Targeted | Low | Moderate | Hybridization with probes complementary to desired transcripts |
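The barcode indexing mentioned in the cDNA synthesis step can be illustrated with a minimal demultiplexing sketch. It assumes, purely for illustration, that each read begins with its hexamer barcode and that one mismatch is tolerated; the function names, sample names, and sequences are hypothetical, and real platforms typically read the index separately and use vendor software for this step.

```python
from collections import defaultdict

def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign pooled reads to samples by their leading barcode.

    reads: sequences whose first bases are the barcode (assumption of this sketch).
    barcodes: dict of sample name -> barcode, all of the same length.
    Reads matching no barcode, or more than one, are binned as "undetermined".
    """
    length = len(next(iter(barcodes.values())))
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:length], read[length:]
        hits = [sample for sample, bc in barcodes.items()
                if hamming(index, bc) <= max_mismatches]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append(insert)
    return dict(bins)

# Hypothetical hexamer barcodes and pooled reads.
barcodes = {"control": "ACGTAC", "treated": "TGCATG"}
reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTAAGGGCCC", "GGGGGGTTTTTT"]
binned = demultiplex(reads, barcodes)
```

Here the third read carries a single sequencing error in its barcode and is still assigned to "control", while the fourth matches neither barcode and is set aside. Barcodes are usually chosen to differ at several positions so that such error tolerance does not cause cross-assignment.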
Complementary DNA sequencing (cDNA-Seq)
The cDNA library derived from RNA biotypes is then sequenced into a computer-readable format. There are many high-throughput sequencing technologies for cDNA sequencing including platforms developed by Illumina, Thermo Fisher, BGI/MGI, PacBio, and Oxford Nanopore Technologies. For Illumina short-read sequencing, a common technology for cDNA sequencing, adapters are ligated to the cDNA, DNA is attached to a flow cell, clusters are generated through cycles of bridge amplification and denaturing, and sequencing by synthesis is performed in cycles of complementary strand synthesis and laser excitation of bases with reversible terminators. Sequencing platform choice and parameters are guided by experimental design and cost. Common experimental design considerations include deciding on the sequencing length, sequencing depth, use of single versus paired-end sequencing, number of replicates, multiplexing, randomization, and spike-ins.
Small RNA/non-coding RNA sequencing
When sequencing RNA other than mRNA, the library preparation is modified. The cellular RNA is selected based on the desired size range. For small RNA targets, such as miRNA, the RNA is isolated through size selection. This can be performed with a size exclusion gel, through size selection magnetic beads, or with a commercially developed kit. Once isolated, linkers are added to the 3' and 5' ends and the product is purified. The final step is cDNA generation through reverse transcription.
Direct RNA sequencing
Because converting RNA into cDNA, ligation, amplification, and other sample manipulations have been shown to introduce biases and artifacts that may interfere with both the proper characterization and quantification of transcripts, single molecule direct RNA sequencing has been explored by companies including Helicos, Oxford Nanopore Technologies, and others. This technology sequences RNA molecules directly in a massively parallel manner.
Single-molecule real-time RNA sequencing
Massively parallel single molecule direct RNA-Seq has been explored as an alternative to traditional RNA-Seq, in which RNA-to-cDNA conversion, ligation, amplification, and other sample manipulation steps may introduce biases and artifacts. Technology platforms that perform single-molecule real-time RNA-Seq include Oxford Nanopore Technologies (ONT) Nanopore sequencing. Sequencing RNA in its native form preserves modifications like methylation, allowing them to be investigated directly and simultaneously. Another benefit of single-molecule direct RNA-Seq is that transcripts can be covered in full length, allowing for higher confidence isoform detection and quantification compared to short-read sequencing. Traditionally, single-molecule RNA-Seq methods have higher error rates compared to short-read sequencing, but newer methods like ONT direct RNA-Seq have a reduced error rate. Recent uses of ONT direct RNA-Seq for differential expression in human cell populations have demonstrated that this technology can overcome many limitations of short and long cDNA sequencing.
Single-cell RNA sequencing (scRNA-Seq)
Standard methods such as microarrays and bulk RNA-Seq analyze the expression of RNAs from large populations of cells. In mixed cell populations, these measurements may obscure critical differences between individual cells within these populations. Single-cell RNA sequencing provides the expression profiles of individual cells. Although it is not possible to obtain complete information on every RNA expressed by each cell, due to the small amount of material available, patterns of gene expression can be identified through gene clustering analyses. This can uncover the existence of rare cell types within a cell population that may never have been seen before. For example, rare specialized cells in the lung called pulmonary ionocytes, which express the cystic fibrosis transmembrane conductance regulator, were identified in 2018 by two groups performing scRNA-Seq on lung airway epithelia.
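The idea of finding rare cell types by clustering expression profiles can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method used in the studies mentioned above: the count matrix, the three-gene panel, the scaling factor, and the correlation threshold are all hypothetical, and dedicated toolkits typically use more elaborate normalization, dimensionality reduction, and clustering.

```python
import math

def normalize(counts, scale=10_000):
    """Scale one cell's counts to a fixed total, then log-transform."""
    total = sum(counts)  # assumed non-zero in this sketch
    return [math.log1p(c / total * scale) for c in counts]

def correlation(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_cells(matrix, threshold=0.8):
    """Single-linkage grouping: merge cells whose profiles correlate >= threshold."""
    profiles = [normalize(cell) for cell in matrix]
    labels = list(range(len(profiles)))  # every cell starts in its own cluster
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if correlation(profiles[i], profiles[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels

# Hypothetical counts; rows are cells, columns are three genes,
# the last standing in for a marker of a rare cell type.
counts = [
    [90, 10, 0],
    [80, 15, 1],
    [85, 12, 0],
    [5, 8, 120],
]
labels = cluster_cells(counts)
```

In this toy matrix the first three cells fall into one cluster and the fourth, dominated by the marker gene, remains alone, which is the pattern that would flag a candidate rare cell type for follow-up.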