C1orf198
Chromosome 1 open reading frame 198 is a protein that in humans is encoded by the C1orf198 gene. This particular gene does not have any paralogs in Homo sapiens, but many orthologs have been found throughout the Eukarya domain. C1orf198 has high levels of expression in all tissues throughout the human body, but is most highly expressed in lung, brain, and spinal cord tissues. Its function is most likely involved in lung development and hypoxia-associated events in the mitochondria, which are major consumers of oxygen in cells and are severely affected by decreases in available cellular oxygen.
Gene
Location
C1orf198 is a protein-encoding gene found on the reverse strand of chromosome 1 at the locus 1q42. The longest mRNA transcript comprises 3,778 base pairs and spans from 230,837,119 to 230,869,589 on chromosome 1. The span of the gene from the start of transcription to the polyA site, including introns, is 32,470 bp. This gene also contains a domain of unknown function called DUF4706. In total, C1orf198 has 4 exons.
Expression
Tissue distribution
RNA-seq tissue data revealed high expression of C1orf198 across all tissues, with especially high expression in lung, heart, spinal cord, and brain tissues. Expression from RNA-seq assays is reported as mean TPM (transcripts per million), which corresponds to the mean value across the individual samples from each tissue. Transcription profiling by high-throughput sequencing revealed similar patterns of expression.
Conditional expression
Comparison of far-upstream element binding protein (FBP) knockdowns revealed differential expression of C1orf198. Compared with FBP1 and FBP3 knockdowns, FBP2 knockdown had a significant impact on the expression of C1orf198: it was associated with a decrease in C1orf198 expression relative to cells with normal expression of FBP2.
Regulation
Promoter
Genomatix predicted several promoters; the best-supported prediction was a 1,223 bp promoter that overlaps exon 1 of C1orf198 by 82 bp. This promoter, GXP_127773, is conserved in all 15 orthologs found by Genomatix.
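The coordinates quoted for the gene and its promoter can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this article. It assumes that exon 1 begins at the transcription start site, so the part of the promoter that does not overlap exon 1 lies upstream of it. The variable names are illustrative.

```python
# Values quoted in the article (chromosome 1 coordinates and lengths in bp).
TRANSCRIPT_START = 230_837_119
TRANSCRIPT_END = 230_869_589
PROMOTER_LENGTH = 1_223
EXON1_OVERLAP = 82

# Genomic span from start of transcription to polyA site, introns included.
gene_span = TRANSCRIPT_END - TRANSCRIPT_START

# Assumption: exon 1 starts at the transcription start site, so the
# non-overlapping part of the promoter sits upstream of the gene.
upstream_bp = PROMOTER_LENGTH - EXON1_OVERLAP

print(f"Gene span: {gene_span:,} bp")
print(f"Promoter sequence upstream of exon 1: {upstream_bp:,} bp")
```

The first value matches the 32,470 bp span given under Location.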
Transcription factor binding sites
Many transcription factor binding sites have been predicted; the more notable factors predicted to bind the C1orf198 promoter are XCPE1, HIF, LKLF, ETS1, and USF. XCPE1 is an important transcription factor for poorly characterized TATA-less genes in the human genome, and it drives RNA polymerase II transcription. It is found in the core promoter regions of approximately 1% of human genes. XCPE1 is located between nucleotides -8 and +2 relative to the start of transcription. Given its matrix score of 0.83, the presence of the correct consensus sequence, and its expected position on the promoter, the probability that this transcription factor actually binds the promoter is high.
HIF is a transcription factor that responds to decreases in available oxygen in the cellular environment. It functions as a master regulator of the cellular and systemic homeostatic response to hypoxia by activating transcription of many genes. HIF-1 is known to induce transcription of genes involved in energy metabolism, angiogenesis, and apoptosis, as well as other genes whose protein products increase oxygen delivery or facilitate metabolic adaptation to hypoxia.
LKLF (also known as KLF2) is a transcription factor that is highly expressed in adult mouse lungs and is thought to play a role in lung development. Overexpression of LKLF in lung epithelial cells increases cytosolic phospholipase A2, which has been shown to be a cause of tumorigenesis in non-small-cell lung cancer.
E26 transformation-specific proto-oncogene 1 (ETS1) functions as an oncogene and plays a key role in the progression of certain cancers. Expression of ETS1 was increased in cancer tissues compared with corresponding non-neoplastic tissues.
Finally, USF is an upstream stimulating factor, which is involved in mediating recruitment of chromatin remodelling enzymes and interacting with co-activators and members of the transcription pre-initiation complex.
Protein
C1orf198’s longest isoform has a sequence length of 327 amino acids. The entire sequence is as follows:
MASMAAAIAASRSAVMSGNRPLDDRERKRFTYFSSLSPMARKIMQDKEKIREKYGPEWARLPPAQQDEII
DRCLVGPRAPAPRDPGDSEELTRFPGLRGPTGQKVVRFGDEDLTWQDEHSAPFSWETKSQMEFSISALSI
QEPSNGTAASEPRPLSKASQGSQALKSSQGSRSSSLDALGPTRKEEEASFWKINAERSRGEGPEAEFQSL
TPSQIKSMEKGEKVLPPCYRQEPAPKDREAKVERPSTLRQEQRPLPNVSTERERPQPVQAFSSALHEAAP
SQLEGKLPSPDVRQDDGEDTLFSEPKFAQVSSSNVVLKTGFDFLDNW
The entire protein has a theoretical molecular weight of 36.346 kDa and its isoelectric point is 5.6.
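The length and approximate mass quoted above can be rechecked from the sequence itself. The following is a minimal sketch, not the method used to obtain the reported values. It assumes standard average residue masses (the `RESIDUE_MASS` table is supplied here, not taken from the source) and adds one water molecule for the free termini. The helper names `composition` and `molecular_weight` are illustrative. The final slice uses the residue range 31 to 131 that this article gives for DUF4706.

```python
from collections import Counter

# Longest isoform of C1orf198, as listed in this article.
SEQ = (
    "MASMAAAIAASRSAVMSGNRPLDDRERKRFTYFSSLSPMARKIMQDKEKIREKYGPEWARLPPAQQDEII"
    "DRCLVGPRAPAPRDPGDSEELTRFPGLRGPTGQKVVRFGDEDLTWQDEHSAPFSWETKSQMEFSISALSI"
    "QEPSNGTAASEPRPLSKASQGSQALKSSQGSRSSSLDALGPTRKEEEASFWKINAERSRGEGPEAEFQSL"
    "TPSQIKSMEKGEKVLPPCYRQEPAPKDREAKVERPSTLRQEQRPLPNVSTERERPQPVQAFSSALHEAAP"
    "SQLEGKLPSPDVRQDDGEDTLFSEPKFAQVSSSNVVLKTGFDFLDNW"
)

# Assumption: standard average residue masses in daltons (not from the source).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water molecule for the free N- and C-termini


def composition(seq):
    """Return each residue's fraction of the sequence, most common first."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.most_common()}


def molecular_weight(seq):
    """Approximate average molecular weight in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


length = len(SEQ)
mw_kda = molecular_weight(SEQ) / 1000
duf4706 = SEQ[30:131]  # residues 31-131 (1-based, inclusive)

print(f"Length: {length} aa")
print(f"Approximate mass: {mw_kda:.2f} kDa")
print("Most common residues:", list(composition(SEQ))[:5])
print(f"DUF4706 length: {len(duf4706)} aa")
```

The computed mass should land close to the reported 36.346 kDa. Small differences are expected depending on the mass table used.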
Isoforms
Three different isoforms of C1orf198 have been found. The longest isoform contains 327 amino acids and has a molecular mass of 36.3 kDa. The second isoform is 289 amino acids long. The third and last known isoform is 197 amino acids long and also lacks DUF4706.
Amino acid composition
C1orf198 is richest in serine, glutamic acid, proline, alanine, and arginine, and poorest in histidine. Relative to the average human protein, C1orf198 is serine-rich, proline-rich, and tyrosine-poor.
Domain
The protein sequence includes a domain of unknown function, DUF4706, which is approximately 101 amino acids long and spans amino acids 31 to 131 of C1orf198. The domain has a predicted molecular weight of 11.6 kDa and an isoelectric point of 5.41.
Post-translational modifications
The post-translational modifications (PTMs) found in C1orf198 include phosphorylation, SUMOylation, and O-linked β-N-acetylglucosamine (O-GlcNAc) modification. While phosphorylation is the most common PTM and is found in all protein types, O-GlcNAc is a regulatory PTM of nuclear and cytosolic proteins.
Subcellular location
C1orf198 is predicted to be targeted to the cytoplasm, mitochondria, and nucleus. The most strongly supported subcellular location is the cytoplasm, with many bioinformatics tools reporting it as the sole location. Both immunohistochemistry and immunofluorescent staining of human cells showed strong cytoplasmic positivity. However, a mitochondrial targeting peptide was predicted in C1orf198, suggesting that it is directed to the mitochondria in some situations.
Interactions
Multiple protein interactions with C1orf198 were found using text mining. One interaction involves SART1, which is also known as hypoxia-associated factor. SART1 is known to play a role in mRNA splicing and appears to play a role in hypoxia-induced regulation of EPO gene expression. Another protein that interacts with C1orf198 is TOMM20, a mitochondrial import receptor subunit. TOMM20 is responsible for the recognition and translocation of cytosolically synthesized mitochondrial preproteins.
Evolution
Paralogs
There are no known paralogs of C1orf198.
Homologs
As seen in the table below, the homologs of C1orf198 trace back to insects, which diverged from humans approximately 797 million years ago.
| Species | Estimated date of divergence from humans (million years ago) | Identity | Similarity | Amino acid sequence length | Reference sequence |
| --- | --- | --- | --- | --- | --- |
| Homo sapiens | 0 | 100% | 100% | 327 | NP_116189 |
| Delphinapterus leucas | 96 | 81% | 86% | 317 | XP_022408830.1 |
| Hipposideros armiger | 96 | 79% | 85% | 317 | XP_019521397.1 |
| Erinaceus europaeus | 96 | 76% | 82% | 333 | XP_007538428.1 |
| Phascolarctos cinereus | 159 | 65% | 76% | 333 | XP_020856095.1 |
| Parus major | 312 | 59% | 72% | 335 | XP_015478640.1 |
| Numida meleagris | 312 | 59% | 71% | 335 | XP_021245723.1 |
| Gallus gallus | 312 | 59% | 70% | 334 | XP_015139870.1 |
| Pogona vitticeps | 312 | 58% | 69% | 333 | XP_020656857.1 |
| Notechis scutatus | 312 | 57% | 69% | 333 | XP_026525262.1 |
| Gekko japonicus | 312 | 57% | 69% | 330 | XP_015284731.1 |
| Xenopus tropicalis | 352 | 47% | 68% | 350 | XP_002942404.1 |
| Monopterus albus | 435 | 42% | 56% | 360 | XP_020471043.1 |
| Anabas testudineus | 435 | 42% | 56% | 352 | XP_026197678.1 |
| Danio rerio | 435 | 41% | 54% | 330 | NP_001188382.1 |
| Callorhinchus milii | 473 | 48% | 60% | 349 | XP_007896578.1 |
| Helicoverpa armigera | 797 | 28% | 40% | 284 | XP_021198534.1 |
| Copidosoma floridanum | 797 | 25% | 41% | 297 | XP_014207188.1 |
| Chilo suppressalis | 797 | 24% | 40% | 280 | RVE51599.1 |
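The trend in the table above can be summarized by averaging sequence identity at each divergence date. The sketch below simply re-enters the table values. The tuple layout and the names `HOMOLOGS` and `mean_identity` are illustrative choices, not part of the original analysis.

```python
from collections import defaultdict
from statistics import mean

# (species, divergence from humans in MYA, identity %, similarity %, length in aa)
HOMOLOGS = [
    ("Homo sapiens", 0, 100, 100, 327),
    ("Delphinapterus leucas", 96, 81, 86, 317),
    ("Hipposideros armiger", 96, 79, 85, 317),
    ("Erinaceus europaeus", 96, 76, 82, 333),
    ("Phascolarctos cinereus", 159, 65, 76, 333),
    ("Parus major", 312, 59, 72, 335),
    ("Numida meleagris", 312, 59, 71, 335),
    ("Gallus gallus", 312, 59, 70, 334),
    ("Pogona vitticeps", 312, 58, 69, 333),
    ("Notechis scutatus", 312, 57, 69, 333),
    ("Gekko japonicus", 312, 57, 69, 330),
    ("Xenopus tropicalis", 352, 47, 68, 350),
    ("Monopterus albus", 435, 42, 56, 360),
    ("Anabas testudineus", 435, 42, 56, 352),
    ("Danio rerio", 435, 41, 54, 330),
    ("Callorhinchus milii", 473, 48, 60, 349),
    ("Helicoverpa armigera", 797, 28, 40, 284),
    ("Copidosoma floridanum", 797, 25, 41, 297),
    ("Chilo suppressalis", 797, 24, 40, 280),
]

# Group identity values by divergence date.
by_date = defaultdict(list)
for _species, mya, identity, _similarity, _length in HOMOLOGS:
    by_date[mya].append(identity)

# Mean identity per divergence date, ordered from most to least recent.
mean_identity = {mya: round(mean(vals), 1) for mya, vals in sorted(by_date.items())}

for mya, ident in mean_identity.items():
    print(f"{mya:>4} MYA: mean identity {ident}%")
```

Mean identity falls from about 79% among placental mammals to about 26% in insects. The decline is not strictly monotonic: the single Callorhinchus milii entry (48%) sits above the bony fish average.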