Long non-coding RNA
Long non-coding RNAs (lncRNAs) are a type of RNA, generally defined as transcripts of more than 200 nucleotides that are not translated into protein. This arbitrary limit distinguishes long ncRNAs from small non-coding RNAs, such as microRNAs, small interfering RNAs, Piwi-interacting RNAs, small nucleolar RNAs, and other short RNAs. Because some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, a more recent definition describes lncRNAs as a class of transcripts of over 200 nucleotides that have no or limited coding capacity. However, John S. Mattick and colleagues have suggested changing the definition of long non-coding RNAs to transcripts of more than 500 nt, which are mostly generated by Pol II. The exact definition of lncRNA is therefore still under discussion in the field. Long intergenic noncoding RNAs (lincRNAs) are lncRNA transcripts that do not overlap protein-coding genes.
Long non-coding RNAs include intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs, each type showing different genomic positions in relation to genes and exons.
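The positional categories above can be illustrated with a minimal sketch. The `Gene` model, the `classify` function, and the coordinate conventions are hypothetical and chosen only for illustration. The sketch compares a transcript against a single protein-coding gene and, as a simplification, labels any overlapping transcript on the opposite strand as antisense; real annotation pipelines use finer-grained rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical protein-coding gene model on one chromosome."""
    start: int                    # inclusive coordinate
    end: int                      # inclusive coordinate
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # inclusive (start, end) pairs


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def classify(tx_start: int, tx_end: int, tx_strand: str, gene: Gene) -> str:
    """Label a transcript by its position relative to one gene."""
    if not overlaps(tx_start, tx_end, gene.start, gene.end):
        return "intergenic"
    if tx_strand != gene.strand:
        # Simplification: any opposite-strand overlap counts as antisense.
        return "antisense"
    if any(overlaps(tx_start, tx_end, s, e) for s, e in gene.exons):
        return "sense"
    # Same strand, inside the gene body, but touching no exon.
    return "intronic"


# Made-up coordinates for a three-exon gene on the plus strand.
gene = Gene(1000, 5000, "+", [(1000, 1500), (3000, 3500), (4500, 5000)])
labels = [
    classify(6000, 7000, "+", gene),
    classify(1200, 2000, "-", gene),
    classify(1200, 2000, "+", gene),
    classify(1600, 2900, "+", gene),
]
print(labels)
```

In practice a transcript would be compared against every annotated gene on the chromosome, and a single transcript may fall into more than one category.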
The definition of lncRNAs differs from that of other RNAs such as siRNAs, mRNAs, miRNAs, and snoRNAs because it is not connected to the function of the RNA. A lncRNA is any transcript that is not one of the other well-characterized RNAs and is longer than a minimum length of 200 to 500 nucleotides, depending on the definition used. Some scientists think that most lncRNAs do not have a biologically relevant function because they are transcripts of junk DNA.
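Because this definition is operational rather than functional, it can be written as a simple filter. The sketch below is only an illustration: the function names are hypothetical, and the cutoff of 100 codons for "limited coding capacity" is an assumption rather than a value given here. It scans only the forward strand and ignores open reading frames that lack a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop excluded) of the longest ATG-to-stop ORF
    found in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        current = None  # codons counted since the ATG; None outside an ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


def is_candidate_lncrna(seq: str, min_length: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """Apply the length threshold and an assumed coding-capacity cutoff."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


# Toy sequences, not real transcripts.
short_orf_long_rna = "ATG" + "GCC" * 10 + "TAA" + "A" * 300   # 336 nt
long_orf_rna = "ATG" + "GCC" * 150 + "TAA"                    # 456 nt

print(is_candidate_lncrna(short_orf_long_rna))                  # 200-nt rule
print(is_candidate_lncrna(short_orf_long_rna, min_length=500))  # 500-nt rule
print(is_candidate_lncrna(long_orf_rna))
```

Setting `min_length=500` shows how the alternative 500-nt threshold would exclude transcripts that the 200-nt definition accepts.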
Abundance
Long non-coding transcripts are found in many species. Large-scale complementary DNA sequencing projects such as FANTOM reveal the complexity of these transcripts in humans. The FANTOM3 project identified ~35,000 non-coding transcripts that bear many signatures of messenger RNAs, including 5' capping, splicing, and poly-adenylation, but have little or no open reading frame. This number represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts. Identifying ncRNAs within these cDNA libraries is challenging since it can be difficult to distinguish protein-coding transcripts from non-coding transcripts. Multiple studies suggest that testis and neural tissues express the greatest amount of long non-coding RNAs of any tissue type. Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources.
Quantitatively, these transcripts demonstrate ~10-fold lower abundance than mRNAs, much of which is explained by higher cell-to-cell variation in the expression levels of lncRNAs in individual cells, when compared to protein-coding genes and well-characterized non-coding genes. This is consistent with the idea that many of these transcripts are non-functional spurious transcripts and the transcribed regions are not genes by any standard definition.
In general, the majority of lncRNAs are characterized as tissue-specific, as opposed to only ~19% of mRNAs. Only 3.6% of human lncRNAs are present in various biological contexts, and 34% of lncRNAs are present at a high level in at least one biological context. In addition to higher tissue specificity, lncRNAs are characterized by higher developmental stage specificity and by cell subtype specificity in tissues such as the human neocortex and other parts of the brain, where they regulate correct brain development and function. In 2022, a comprehensive integration of lncRNAs from existing databases revealed that there are 95,243 lncRNAs and 323,950 transcripts in humans.
In comparison to mammals, relatively few studies have focused on the prevalence of lncRNAs in plants. However, an extensive study considering 37 higher plant species and six algae identified ~200,000 non-coding transcripts using an in-silico approach, and also established the associated Green Non-Coding Database, a repository of plant lncRNAs.
Genomic organization
In 2005, the landscape of the mammalian genome was described as numerous 'foci' of transcription that are separated by long stretches of intergenic space. While some long ncRNAs are located within the intergenic stretches, the majority are overlapping sense and antisense transcripts that often include protein-coding genes, giving rise to a complex hierarchy of overlapping isoforms. Genomic sequences within these transcriptional foci are often shared among a number of coding and non-coding transcripts in the sense and antisense directions. For example, 3012 out of 8961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs. While the abundance and conservation of these arrangements suggest they have biological relevance, the complexity of these foci frustrates easy evaluation.
The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations and their genomic organisation, modifications, cellular locations and tissue expression profiles. Their analysis indicates that human lncRNAs show a bias toward two-exon transcripts.
Translation
There has been considerable debate about whether lncRNAs have been misannotated and do in fact encode proteins. Several lncRNAs have been found to encode peptides with biologically significant function. Ribosome profiling studies have suggested that anywhere from 40% to 90% of annotated lncRNAs are translated, although there is disagreement about the correct method for analyzing ribosome profiling data. Additionally, it is thought that many of the peptides produced by lncRNAs may be highly unstable and without biological function.
Conservation
The sequences of most long non-coding transcripts are not conserved, which supports the idea that most of them are spurious transcripts with no biological function. Initial studies into lncRNA conservation noted that some of them were enriched for conserved sequence elements, depleted in substitution and insertion/deletion rates, and depleted in rare frequency variants, indicative of purifying selection maintaining lncRNA function. However, further investigations into vertebrate lncRNAs revealed that while some lncRNAs are conserved in sequence, they are not conserved in transcription. In other words, even when the sequence of a human lncRNA is conserved in another vertebrate species, there is often no transcription of a lncRNA in the orthologous genomic region. Some argue that these observations suggest non-functionality of the majority of lncRNAs, while others argue that they may be indicative of rapid species-specific adaptive selection.
While most long non-coding transcripts are not conserved, hundreds of lncRNAs are still conserved at the sequence level. There have been several attempts to delineate the different categories of selection signatures seen amongst lncRNAs, including: lncRNAs with strong sequence conservation across the entire length of the gene, lncRNAs in which only a portion of the transcript is conserved, and lncRNAs that are transcribed from syntenic regions of the genome but have no recognizable sequence similarity. Additionally, there have been attempts to identify conserved secondary structures in lncRNAs, though these studies have so far yielded conflicting results. Several of the most well-studied lncRNAs show conservation of structure within their functional domains despite a lack of sequence similarity across species.
Functions
Some groups have claimed that the majority of long noncoding RNAs in mammals are likely to be functional, but other groups have claimed the opposite. This is an active area of research.
Some lncRNAs have been functionally annotated in LncRNAdb, with the majority of these being described in humans. Over 2600 human lncRNAs with experimental evidence have been community-curated in LncRNAWiki. According to literature-based curation of their functional mechanisms, lncRNAs are extensively reported to be involved in ceRNA regulation, transcriptional regulation, and epigenetic regulation. A further large-scale sequencing study provides evidence that many transcripts thought to be lncRNAs may, in fact, be translated into proteins.
In the regulation of gene transcription
In gene-specific transcription
In eukaryotes, RNA transcription is a tightly regulated process. Noncoding RNAs act upon different aspects of this process, targeting transcriptional modulators, RNA polymerase II and even the DNA duplex to regulate gene expression.
NcRNAs modulate transcription by several mechanisms, including functioning themselves as co-regulators, modifying transcription factor activity, or regulating the association and activity of co-regulators. For example, the noncoding RNA Evf-2 functions as a co-activator for the homeobox transcription factor Dlx2, which plays important roles in forebrain development and neurogenesis. Sonic hedgehog induces transcription of Evf-2 from an ultra-conserved element located between the Dlx5 and Dlx6 genes during forebrain development. Evf-2 then recruits the Dlx2 transcription factor to the same ultra-conserved element, whereby Dlx2 subsequently induces expression of Dlx5. The existence of other similar ultra- or highly conserved elements within the mammalian genome that are both transcribed and fulfill enhancer functions suggests that Evf-2 may be illustrative of a generalised mechanism that regulates developmental genes with complex expression patterns during vertebrate growth. Indeed, the transcription and expression of similar non-coding ultraconserved elements were shown to be abnormal in human leukaemia and to contribute to apoptosis in colon cancer cells, suggesting their involvement in tumorigenesis in a similar fashion to protein-coding RNA.
Local ncRNAs can also recruit transcriptional programmes to regulate adjacent protein-coding gene expression.
The RNA-binding protein TLS binds and inhibits the CREB binding protein and p300 histone acetyltransferase activities on a repressed gene target, cyclin D1. The recruitment of TLS to the promoter of cyclin D1 is directed by long ncRNAs expressed at low levels and tethered to 5' regulatory regions in response to DNA damage signals. Moreover, these local ncRNAs act cooperatively as ligands to modulate the activities of TLS. In the broad sense, this mechanism allows the cell to harness RNA-binding proteins, which make up one of the largest classes within the mammalian proteome, and integrate their function in transcriptional programs. Nascent long ncRNAs have been shown to increase the activity of CREB binding protein, which in turn increases the transcription of that ncRNA. One study found that a lncRNA transcribed in the antisense direction of the apolipoprotein A1 (APOA1) gene regulates the transcription of APOA1 through epigenetic modifications.
Recent evidence has raised the possibility that transcription of genes that escape from X-inactivation might be mediated by expression of long non-coding RNA within the escaping chromosomal domains.