Orphan gene
Orphan genes, ORF
In some cases, a gene can be classified as an orphan gene due to undersampling of the existing genome space. While it is possible that homologues exist for a given gene, that gene will still be classified as an orphan if the organisms harbouring homologues have not yet been discovered and had their genomes sequenced and properly annotated. For example, one study of orphan genes across 119 archaeal and bacterial genomes found that at least 56% were recently acquired from integrative elements of non-cellular origin, such as viruses and plasmids, that remain to be explored and characterized, and that another 7% arose through horizontal gene transfer from distant cellular sources. In other cases, limitations in computational methods for detecting homologues may result in missed homologous sequences and thus classification of a gene as an orphan. Homology detection failure appears to account for the majority, but not all, of orphan genes. Homology between genes may also go undetected because one or both genes have evolved and diverged so rapidly that they no longer meet the criteria used by computational methods to classify genes as evidently homologous. One analysis suggests that divergence accounts for a third of orphan gene identifications in eukaryotes. When homologous genes exist but are simply undetected, the emergence of these orphan genes can be explained by well-characterized phenomena such as genomic recombination, exon shuffling, and gene duplication and divergence. Orphan genes may also simply lack true homologues, and in such cases have an independent origin via de novo gene birth, which tends to be a more recent event. These processes may act at different rates in insects, primates, and plants. Despite their relatively recent origin, orphan genes may encode functionally important proteins.
Characteristics of orphan genes include AT richness, relatively recent origins, taxonomic restriction to a single genome, elevated evolution rates, and shorter sequences.
Some approaches characterize all microbial genes as belonging to one of two classes. One class is characterized by conservation or partial conservation across lineages, whereas the other is characterized by evolutionarily instantaneous rates of gene turnover/replacement, with a negligible effect on fitness when such genes are either gained or lost. These orphan genes primarily derive from mobile genetic elements and tend to be 'passively selfish': they are often devoid of cellular functions but persist in the biosphere due to their transient movement across genomes.
Evolution
Orphan genes evolve more rapidly than other genes. They often originate through two primary mechanisms: de novo gene birth, where new genes emerge from non-coding sequences within the genome, and horizontal gene transfer, the acquisition of genetic material from another organism. Biologists believe orphan genes may play a crucial role in developing species-specific traits, environmental adaptations, or responses to changing ecological niches. These functional innovations necessitate rapid evolutionary changes to optimize their efficacy within the organism's biology.
Multiple studies have supported these evolutionary theories regarding orphan genes. Domazet-Loso and Tautz conducted a study focusing on orphan genes in Drosophila, revealing that these genes evolve at a faster pace compared to conserved genes. This finding suggests a potential correlation between evolutionary rate and gene novelty. Similarly, Tautz and Domazet-Loso presented evidence indicating a substantial contribution of orphan genes to phenotypic diversity and adaptation across different species. Their research underscores the crucial role of orphan genes in driving evolutionary innovation and shaping biological diversity.
History
Orphan genes were first discovered when the yeast genome-sequencing project began in 1996. Orphan genes accounted for an estimated 26% of the yeast genome, but it was believed that these genes could be classified with homologues when more genomes were sequenced. At the time, gene duplication was considered the only serious model of gene evolution and there were few sequenced genomes for comparison, so a lack of detectable homologues was thought to be most likely due to a lack of sequencing data and not due to a true lack of homology. However, orphan genes continued to persist as the quantity of sequenced genomes grew, eventually leading to the conclusion that orphan genes are ubiquitous across genomes. Estimates of the percentage of genes that are orphans vary enormously between species and between studies; 10-30% is a commonly cited figure.
The study of orphan genes emerged largely after the turn of the century. In 2003, a study of Caenorhabditis briggsae and related species compared over 2000 genes. The authors proposed that these genes must be evolving too quickly for their homologues to be detected, and are consequently sites of very rapid evolution. In 2005, Wilson examined 122 bacterial species to test whether the large number of orphan genes in many species was legitimate. The study found that it was legitimate and that orphan genes played a role in bacterial adaptation. The definition of taxonomically restricted genes was introduced into the literature to make orphan genes seem less "mysterious."
In 2008, a yeast protein of established functionality, BSC4, was found to have evolved de novo from non-coding sequences whose homology was still detectable in sister species.
In 2009, an orphan gene was discovered to regulate an internal biological network: the orphan gene QQS, from Arabidopsis thaliana, modifies plant composition. The QQS orphan protein interacts with a conserved transcription factor known as NF-YC4, which explains the compositional changes that are induced when QQS is engineered into diverse species, such as corn, soybean, and rice. In 2011, a comprehensive genome-wide study of the extent and evolutionary origins of orphan genes in plants was conducted in the model plant Arabidopsis thaliana.
Identification
Genes can be tentatively classified as orphans if no orthologous proteins can be found in nearby species. One method used to estimate nucleotide or protein sequence similarity indicative of homology is the Basic Local Alignment Search Tool (BLAST). BLAST allows query sequences to be rapidly searched against large sequence databases. Simulations suggest that under certain conditions BLAST is suitable for detecting distant relatives of a gene. However, genes that are short and evolve rapidly can easily be missed by BLAST.
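As a rough illustration of this kind of screen, the following Python sketch flags genes of a focal species that have no significant BLAST hit. It assumes the search results are in BLAST's 12-column tabular layout (query ID in the first column, E-value in the eleventh) and that the searched database excludes the focal species itself. The function name `candidate_orphans`, the E-value cutoff of 1e-3, and the example records are illustrative assumptions, not values taken from any particular study.

```python
from typing import Iterable, List, Set


def candidate_orphans(
    all_genes: Iterable[str],
    blast_lines: Iterable[str],
    evalue_cutoff: float = 1e-3,  # assumed threshold, chosen for illustration
) -> List[str]:
    """Return genes that have no hit at or below the E-value cutoff."""
    genes_with_hits: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]        # column 1: query sequence ID
        evalue = float(fields[10])  # column 11: E-value of the hit
        if evalue <= evalue_cutoff:
            genes_with_hits.add(query_id)
    # Genes without any significant hit are only *candidate* orphans.
    return [g for g in all_genes if g not in genes_with_hits]


# Hypothetical tabular records, invented for this example.
example_hits = [
    "geneA\tsubj1\t78.5\t200\t43\t0\t1\t200\t5\t204\t1e-50\t180",
    "geneB\tsubj2\t31.0\t60\t40\t1\t3\t62\t10\t69\t2.5\t28.1",
]
print(candidate_orphans(["geneA", "geneB", "geneC"], example_hits))
# prints ['geneB', 'geneC']
```

Here geneB is flagged because its only hit is too weak to pass the cutoff, which mirrors the caveat that short or rapidly evolving genes can be missed; a result like this is a starting point for further checks rather than proof that no homologue exists.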
The systematic detection of homology to annotate orphan genes is called phylostratigraphy. Phylostratigraphy generates a phylogenetic tree in which the homology is calculated between all genes of a focal species and the genes of other species. The earliest common ancestor for a gene determines the age, or phylostratum, of the gene. The term "orphan" is sometimes used only for the youngest phylostratum containing only a single species, but when interpreted broadly as a taxonomically restricted gene, it can refer to all but the oldest phylostratum, with the gene orphaned within a larger clade. Researchers have automated this method to scan for orphan genes across genomic datasets using fast homology detection algorithms.
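The age assignment step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any published pipeline: the focal lineage is given as a list of clades from youngest to oldest, each comparison species is mapped to the youngest clade it shares with the focal species, and homology hits are assumed to have been computed beforehand. The function name `assign_phylostrata` and all species, clades, and genes in the example are hypothetical.

```python
from typing import Dict, List, Set


def assign_phylostrata(
    lineage: List[str],
    shared_clade: Dict[str, str],
    hits: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Assign each gene the oldest clade in which a homologue was detected.

    lineage: clades of the focal species, ordered youngest to oldest.
    shared_clade: for each other species, the youngest clade it shares
        with the focal species.
    hits: for each focal gene, the species with a detectable homologue.
    """
    rank = {clade: i for i, clade in enumerate(lineage)}
    strata: Dict[str, str] = {}
    for gene, species_with_hits in hits.items():
        oldest = 0  # no hits: gene is restricted to the focal species
        for species in species_with_hits:
            oldest = max(oldest, rank[shared_clade[species]])
        strata[gene] = lineage[oldest]
    return strata


# Hypothetical example data.
lineage = ["focal species", "genus", "kingdom", "cellular organisms"]
shared_clade = {
    "sister species": "genus",
    "distant relative": "kingdom",
    "bacterium": "cellular organisms",
}
hits = {
    "gene1": {"sister species", "bacterium"},
    "gene2": {"sister species"},
    "gene3": set(),
}
print(assign_phylostrata(lineage, shared_clade, hits))
# prints {'gene1': 'cellular organisms', 'gene2': 'genus', 'gene3': 'focal species'}
```

Under the narrow definition, only gene3 would be called an orphan; under the broader taxonomically restricted reading, gene2 also qualifies, since it is confined to the genus.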
Homology detection failure accounts for a majority of classified orphan genes. Some scientists have attempted to recover some homology by using more sensitive methods, such as remote homology detection. In one study, remote homology detection techniques were used to demonstrate that a sizable fraction of orphan genes still exhibited remote homology despite being missed by conventional homology detection techniques, and that their functions were often related to the functions of nearby genes at genomic loci. Other studies directly tested for homology detection failure to uncover orphan genes that are not explained by this process, finding an association between the emergence of orphan genes and the evolution of key traits in plants, animals and brown algae.
De novo gene birth
Novel orphan genes continually arise de novo from non-coding sequences. These novel genes may be sufficiently beneficial to be swept to fixation by selection or, more likely, they will fade back into the non-genic background. This latter option is supported by research in Drosophila showing that young genes are more likely to go extinct.
De novo genes were once thought to be a near impossibility due to the complex and potentially fragile intricacies of creating and maintaining functional polypeptides, but research from the past 10 years or so has found multiple examples of de novo genes, some of which are associated with important biological processes, particularly testes function in animals. De novo genes have also been found in fungi and plants.
For young orphan genes, it is sometimes possible to find homologous non-coding DNA sequences in sister taxa, which is generally accepted as strong evidence of de novo origin. However, the contribution of de novo origination to taxonomically restricted genes of older origin, particularly in relation to the traditional gene duplication theory of gene evolution, remains contested. Logistically, de novo origination is much easier for RNA genes than for protein-coding ones, and Nathan H. Lents and colleagues recently reported the existence of several young microRNA genes on human chromosome 21.