C7orf50


C7orf50 is a human gene that encodes a protein of the same name. The gene is ubiquitously expressed, with low tissue specificity, in the kidneys, brain, fat, prostate, spleen, and 22 other tissues. C7orf50 is conserved in chimpanzees, Rhesus monkeys, dogs, cows, mice, rats, and chickens, along with 307 other organisms from mammals to fungi. The protein is predicted to be involved in importing ribosomal proteins into the nucleus, where they are assembled into ribosomal subunits as part of rRNA processing. Additionally, the gene is predicted to be a microRNA protein-coding host gene, meaning that it may contain miRNA genes in its introns and/or exons.

Gene

Background

C7orf50, also known as YCR016W, MGC11257, and LOC84310, is a poorly characterized protein-coding gene in need of further research. Records for this gene are available from NCBI, HGNC, Ensembl, GeneCards, and UniProtKB.

Location

C7orf50 is located on the short arm of chromosome 7, starting at base pair 977,964 and ending at bp 1,138,325. This gene spans 160,361 bps on the minus strand and contains a total of 13 exons.

Gene Neighborhood

Genes within the neighborhood of C7orf50 are the following: LOC105375120, GPR146, LOC114004405, LOC107986755, ZFAND2A, LOC102723758, LOC106799841, COX19, ADAP1, CYP2W1, MIR339, GPER1, and LOC101927021. This neighborhood extends from bp 89700 to bp 1165958 on chromosome 7.

mRNA

Alternative Splicing

C7orf50 has a total of 7 experimentally curated mRNA transcripts. These transcripts are maintained independently of annotated genomes and were not generated computationally from a specific genome build such as the GRCh38.p13 primary assembly; therefore, they are typically more reliable. The longest and most complete of these transcripts is 2138bp long, consists of 5 exons, and produces a 194 amino acid-long protein. Four of the transcripts encode the same 194aa protein and differ only in their 5' and 3' untranslated regions. The three other transcripts encode isoforms b, c, and d, respectively. The table below is representative of these transcripts.
Alternatively, when the primary genomic assembly, GRCh38.p13, is used for annotation, there are 10 computationally predicted mRNA transcripts. The most complete and supported of these transcripts is 1896bp, producing a 225aa-long protein. In total, there are 6 different isoforms predicted for C7orf50. Of these transcripts, 5 of them encode for the same isoform. The remaining transcripts encode isoforms X2, X4, X5, X6, and X7 as represented below.

5' and 3' UTR

Based on the experimentally determined C7orf50 mRNA transcript variant 4, the 5' UTR of C7orf50 is 934 nucleotides long, while the 3' UTR is 619nt. The coding sequence of this transcript spans nt 935..1519, for a total length of 585nt including the stop codon, and is encoded in reading frame 2. The 5' UTR also contains a uORF in need of further study, ranging from nt 599 to nt 871, likewise in the second reading frame.
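The coordinates above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes 1-based inclusive coordinates, that the quoted coding interval includes the stop codon, and that frame 1 begins at transcript position 1. The helper names are hypothetical.

```python
# Coordinates quoted in the text for transcript variant 4 (1-based, inclusive).
CDS_START, CDS_END = 935, 1519
UORF_START, UORF_END = 599, 871
THREE_UTR_LEN = 619


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive interval."""
    return end - start + 1


def reading_frame(start: int) -> int:
    """Frame (1, 2, or 3) of a feature that begins at `start`.

    Assumed convention: frame 1 begins at transcript position 1.
    """
    return (start - 1) % 3 + 1


five_utr_len = CDS_START - 1               # nucleotides upstream of the start codon
cds_len = span_length(CDS_START, CDS_END)  # assumed to include the stop codon
protein_len = cds_len // 3 - 1             # codons minus the stop codon
transcript_len = CDS_END + THREE_UTR_LEN   # end of the CDS plus the 3' UTR
uorf_len = span_length(UORF_START, UORF_END)

print("5' UTR length:", five_utr_len)
print("CDS length:", cds_len, "nt in frame", reading_frame(CDS_START))
print("Protein length:", protein_len, "aa")
print("Transcript length:", transcript_len, "nt")
print("uORF length:", uorf_len, "nt in frame", reading_frame(UORF_START))
```

Under these assumptions the interval 935..1519 holds 585 nt (195 codons, or 194 amino acids plus a stop codon), and the summed lengths agree with the 2138bp transcript described under Alternative Splicing.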

Protein

General Properties

The 194aa protein sequence of C7orf50 isoform a, from NCBI, is as follows:
>NP_001127867.1 uncharacterized protein C7orf50 isoform a
MAKQKRKVPEVTEKKNKKLKKASAEGPLLGPEAAPSGEGAGSKGEAVLRPGLDAEPELSPEEQRVLERKL 70
KKERKKEERQRLREAGLVAQHPPARRSGAELALDYLCRWAQKHKNWRFQKTRQTWLLLHMYDSDKVPDEH 140
FSTLLAYLEGLQGRARELTVQKAEALMRELDEEGSDPPLPGRAQRIRQVLQLLS 194
The sequence contains a domain known as DUF2373, which is found in isoforms a, b, and c.
C7orf50 has a predicted molecular weight of 22 kDa, making it smaller than the average protein. The isoelectric point of this isoform is 9.7, meaning that C7orf50 is basic. Within isoform a, there is a significant mixed charge run from aa67 to aa79 and an acidic run from aa171 to aa173. This mixed charge run may form a protein-protein interaction site of C7orf50.
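As a rough illustration of where these figures come from, the sketch below estimates the molecular weight of isoform a from standard average residue masses and extracts the two charge runs by position. It is a simplified approximation, not the tool used for the predictions above, and it does not attempt to compute the isoelectric point. The variable names are hypothetical.

```python
# Isoform a sequence as given above (NP_001127867.1), 194 residues.
SEQ = (
    "MAKQKRKVPEVTEKKNKKLKKASAEGPLLGPEAAPSGEGAGSKGEAVLRPGLDAEPELSPEEQRVLERKL"
    "KKERKKEERQRLREAGLVAQHPPARRSGAELALDYLCRWAQKHKNWRFQKTRQTWLLLHMYDSDKVPDEH"
    "FSTLLAYLEGLQGRARELTVQKAEALMRELDEEGSDPPLPGRAQRIRQVLQLLS"
)

# Standard average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini

# Estimated mass of the unmodified polypeptide chain.
mw = sum(RESIDUE_MASS[aa] for aa in SEQ) + WATER

# Simple count of basic versus acidic side chains (histidine ignored).
positive = sum(SEQ.count(aa) for aa in "KR")
negative = sum(SEQ.count(aa) for aa in "DE")

# Convert 1-based inclusive positions to Python slices.
mixed_charge_run = SEQ[67 - 1:79]
acidic_run = SEQ[171 - 1:173]

print(f"Length: {len(SEQ)} aa, estimated mass: {mw / 1000:.1f} kDa")
print(f"K+R: {positive}, D+E: {negative}")
print("aa67-79:", mixed_charge_run)
print("aa171-173:", acidic_run)
```

An excess of lysine and arginine over aspartate and glutamate is consistent with a basic isoelectric point, although a real pI calculation would also need side-chain pKa values.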
Characterization of the protein has shown binding to GPR146. Based on a proposed role in regulation of serum cholesterol levels in response to dietary cholesterol intake, the protein has been called cholesin.

Domains and Motifs

DUF2373 is a domain of unknown function found in the C7orf50 protein. It is a highly conserved C-terminal region found from fungi to humans. As for motifs, a bipartite nuclear localization signal is predicted from aa6 to aa21, suggesting that C7orf50 localizes to the nucleus. A nuclear export signal is also predicted at aa150 and aa153 to aa155, suggesting that C7orf50 functions both inside and outside the nucleus.

Structure

Secondary Structure

The majority of C7orf50 secondary structure is made up of alpha helices, with the remainder being small portions of random coils, beta turns, or extended strands.

Tertiary Structure

The tertiary structure of C7orf50 consists primarily of alpha helices, as predicted by I-TASSER.

Quaternary Structure

The interaction network involving the C7orf50 protein has significantly more interactions than would be expected for a randomly selected set of proteins. This enrichment indicates that the proteins are at least partially connected biologically as a group and likely depend on each other within a shared biological pathway. Although the function of C7orf50 is uncharacterized, it is therefore most likely associated with the same processes and functions as the proteins within its network.
Biological Processes | rRNA processing; maturation of 5.8S, LSU, and SSU rRNA
Molecular Functions | catalytic activity, acting on RNA; ATP-dependent RNA helicase activity
Cellular Components | nucleolus; preribosomes
Reactome Pathways | major pathway of rRNA processing in the nucleolus and cytosol; rRNA modification in the nucleus and cytosol
Protein Domains and Motifs | helicase conserved C-terminal domain; DEAD/DEAH box helicase

The closest predicted functional partners of C7orf50 are the following proteins: DDX24, DDX52, PES1, EBNA1BP2, RSL1D1, NOP14, FTSJ3, KRR1, LYAR, and PWP1. These proteins are predicted to be co-expressed with C7orf50 and each other rather than to bind them directly.

Regulation

Gene Regulation

Promoter

C7orf50 has 6 predicted promoter regions. The promoter with the greatest number of transcripts and CAGE tags overall is promoter set 6 on ElDorado by Genomatix. This promoter region is on the minus strand, starts at position 1,137,965, and ends at position 1,139,325, making it 1,361bp long. It has 16 coding transcripts; the transcript with the greatest identity to C7orf50 transcript 4 is GXT_27788039, with 98,746 CAGE tags.
Promoter ID | Start Position | End Position | Length | # of Coding Transcripts | Greatest # of CAGE Tags in Transcripts
GXP_9000582 | 1013063 | 1013163 | 1101bp | 0 | N/A
GXP_6755691 | 1028239 | 1030070 | 1832bp | 4 | 169,233
GXP_6053282 | 1055206 | 1056306 | 1101bp | 1 | 449
GXP_3207505 | 1127288 | 1128388 | 1101bp | 1 | 545
GXP_9000584 | 1130541 | 1131641 | 1101bp | 0 | N/A
GXP_6755694 | 1137965 | 1139325 | 1361bp | 16 | 100,070
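The quoted length of promoter set 6 follows from its coordinates when both endpoints are counted. The sketch below also relates the promoter to the gene's highest coordinate from the Location section. Treating that coordinate as the approximate transcription start on the minus strand is an assumption made here for illustration, and the names are hypothetical.

```python
GENE_END = 1_138_325  # highest gene coordinate quoted in the Location section
PROMOTER = {"id": "GXP_6755694", "start": 1_137_965, "end": 1_139_325}


def inclusive_length(start: int, end: int) -> int:
    """Length of an interval when both endpoints are counted."""
    return end - start + 1


promoter_len = inclusive_length(PROMOTER["start"], PROMOTER["end"])

# On the minus strand, "upstream" corresponds to higher coordinates.
upstream_bp = PROMOTER["end"] - GENE_END
overlap_bp = inclusive_length(PROMOTER["start"], GENE_END)

print(f"{PROMOTER['id']}: {promoter_len} bp")
print(f"{upstream_bp} bp upstream of the gene, {overlap_bp} bp overlapping it")
```

Under that assumption, the promoter region covers roughly 1 kb of upstream sequence plus the first few hundred base pairs of the gene.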

The CpG island associated with this promoter has 75 CpGs, and is 676bp long. The C count plus G count is 471, the percentage C or G is 70% within this island, and the ratio of observed to expected CpG is 0.91.
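For readers unfamiliar with these statistics, the sketch below shows one commonly used way to derive them from a DNA sequence: the GC percentage is (C + G) / length, and the observed-to-expected CpG ratio is (CpG count × length) / (C count × G count). The island's sequence is not given here, so the function is demonstrated on a short made-up string, and only the percentage is recomputed from the reported counts. The function name is hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, CpG count, GC percent, and observed/expected CpG ratio."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides read 5' to 3'
    gc_percent = 100.0 * (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = cpg_count * n / (c_count * g_count) if c_count and g_count else 0.0
    return {
        "length": n,
        "cpg": cpg_count,
        "gc_percent": gc_percent,
        "obs_exp": obs_exp,
    }


# Toy sequence for demonstration only; it is not the C7orf50 island.
toy = cpg_stats("CGCGCGAT")
print(toy)

# Percentage recomputed from the counts reported in the text.
reported_gc_percent = 100.0 * 471 / 676
print(f"Reported island GC content: {reported_gc_percent:.1f}%")
```

The reported counts give about 69.7%, which rounds to the 70% stated above.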

Transcription Factor Binding Sites

As determined by MatInspector at Genomatix, the following transcription factor families are the most strongly predicted to bind the promoter region of C7orf50.
Transcription Factor | Detailed Family Information
NR2F | Nuclear receptor subfamily 2 factors
PERO | Peroxisome proliferator-activated receptor
HOMF | Homeodomain transcription factors
PRDM | PR domain transcription factor
VTBP | Vertebrate TATA binding protein factor
HZIP | Homeodomain-leucine zipper transcription factors
ZTRE | Zinc transcriptional regulatory element
XBBF | X-box binding factors
SP1F | GC-Box factors SP1/GC
CAAT | CCAAT binding factors
ZF57 | KRAB domain zinc finger protein 57
CTCF | CTCF and BORIS gene family, transcriptional regulators with highly conserved zinc finger domains
MYOD | Myoblast determining factors
KLFS | Krueppel like transcription factors

Expression Pattern

C7orf50 is ubiquitously expressed in the kidneys, brain, fat, prostate, spleen, and 22 other tissues, with low tissue and immune cell specificity. Expression is very high, about 4 times that of the average gene; therefore, C7orf50 mRNA is more abundant within a cell than the mRNA of the average gene. There does not appear to be any cell type in which this gene is not expressed.

Transcription Regulation

Splice Enhancers

The mRNA of C7orf50 is predicted to have exonic splicing enhancers, to which SR proteins can bind, at bp positions 45, 246, 703, 1301, and 1308.

Stem Loop Prediction

Both the 5' and 3' UTRs of the mRNA of C7orf50 are predicted to fold into structures such as bulge loops, internal loops, multibranch loops, hairpin loops, and double helices. The 5'UTR has a predicted free energy of -416 kcal/mol with an ensemble diversity of 238. The 3' UTR has a predicted free energy of -279 kcal/mol with an ensemble diversity of 121.

miRNA Targeting

There are many poorly conserved miRNA binding sites predicted within the 3' UTR of C7orf50 mRNA. The notable miRNA families predicted to bind C7orf50 mRNA and regulate or repress its expression are the following: miR-138-5p, miR-18-5p, miR-129-3p, miR-124-3p.1, miR-10-5p, and miR-338-3p.

Protein Regulation

Subcellular Localization

The C7orf50 protein is predicted to localize intracellularly in both the nucleus and cytoplasm, but primarily within the nucleoplasm and nucleoli.

Post-Translational Modification

The C7orf50 protein is predicted to be mucin-type GalNAc O-glycosylated at the following amino acid sites: 12, 23, 36, 42, 59, and 97. Additionally, this protein is predicted to be SUMOylated at aa71, with a SUMO-binding site from aa189 through aa193. C7orf50 is also predicted to undergo kinase-specific phosphorylation at the following amino acids: 12, 23, 36, 42, 59, 97, 124, 133, 159, and 175. Notably, the first six of these sites coincide with the O-glycosylation sites. Of the listed phosphorylation sites, the majority are serines, with the remainder being threonines. The kinase groups most associated with these sites include CAMK. Finally, this protein is predicted to have 8 glycations of the ε amino groups of lysines at the following sites: aa3, 5, 14, 15, 17, 21, 76, and 120.
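The listed positions can be checked against the isoform a sequence given under General Properties. The sketch below simply looks up the residue at each 1-based position. It does not reproduce any prediction tool, and the variable names are hypothetical.

```python
from collections import Counter

# Isoform a sequence as given under General Properties (NP_001127867.1).
SEQ = (
    "MAKQKRKVPEVTEKKNKKLKKASAEGPLLGPEAAPSGEGAGSKGEAVLRPGLDAEPELSPEEQRVLERKL"
    "KKERKKEERQRLREAGLVAQHPPARRSGAELALDYLCRWAQKHKNWRFQKTRQTWLLLHMYDSDKVPDEH"
    "FSTLLAYLEGLQGRARELTVQKAEALMRELDEEGSDPPLPGRAQRIRQVLQLLS"
)

O_GLYC_SITES = [12, 23, 36, 42, 59, 97]
PHOSPHO_SITES = [12, 23, 36, 42, 59, 97, 124, 133, 159, 175]
GLYCATION_SITES = [3, 5, 14, 15, 17, 21, 76, 120]
SUMO_SITE = 71


def residue_at(position: int) -> str:
    """Residue at a 1-based sequence position."""
    return SEQ[position - 1]


phospho_residues = Counter(residue_at(p) for p in PHOSPHO_SITES)
shared_sites = sorted(set(O_GLYC_SITES) & set(PHOSPHO_SITES))
sumo_binding_region = SEQ[189 - 1:193]  # aa189 through aa193

print("Phosphorylated residue types:", dict(phospho_residues))
print("Sites shared with O-glycosylation:", shared_sites)
print("Residue at SUMOylation site:", residue_at(SUMO_SITE))
print("aa189-193:", sumo_binding_region)
print("Glycation residues:", "".join(residue_at(p) for p in GLYCATION_SITES))
```

A lookup like this is a quick sanity check: O-glycosylation and phosphorylation sites should fall on serine or threonine (or tyrosine, for phosphorylation), while SUMOylation and glycation sites should fall on lysine.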

Homology

Paralogs

No paralogs of C7orf50 have been detected in the human genome; however, there is slight evidence of a paralogous DUF2373 domain in the protein of KIDINS220.

Orthologs

Below is a table of orthologs of the human C7orf50 gene, including closely, moderately, and distantly related orthologs. C7orf50 is highly evolutionarily conserved from mammals to fungi. When these ortholog sequences are compared, the most conserved portions are those of DUF2373, highlighting this domain's importance to the function of C7orf50. C7orf50 has evolved moderately and evenly over time, with a divergence rate greater than that of hemoglobin but less than that of cytochrome c.
Genus and species | Common name | Taxon Class | Date of Divergence (MYA) | Accession # | Length | % identity w/ human
Homo sapiens | Human | Mammalia | N/A | NM_001318252.2 | 194aa | 100%
Tupaia chinensis | Chinese Tree Shrew | Mammalia | 82 | XP_006167949.1 | 194aa | 76%
Dasypus novemcinctus | Nine-banded Armadillo | Mammalia | 105 | XP_004483895.1 | 198aa | 70%
Miniopterus natalensis | Natal Long-fingered Bat | Mammalia | 96 | XP_016068464.1 | 199aa | 69%
Protobothrops mucrosquamatus | Brown-spotted Pit Viper | Reptilia | 312 | XP_015673296.1 | 196aa | 64%
Balearica regulorum gibbericeps | Grey-crowned Crane | Aves | 312 | XP_010302837.1 | 194aa | 61%
Falco peregrinus | Peregrine Falcon | Aves | 312 | XP_027635198.1 | 193aa | 59%
Xenopus laevis | African Clawed Frog | Amphibia | 352 | XP_018094637.1 | 198aa | 50%
Electrophorus electricus | Electric Eel | Actinopterygii | 435 | XP_026880604.1 | 195aa | 53%
Rhincodon typus | Whale Shark | Chondrichthyes | 465 | XP_020372968.1 | 195aa | 52%
Ciona intestinalis | Sea Vase | Ascidiacea | 676 | XP_026696561.1 | 282aa | 37%
Octopus bimaculoides | California Two-spot Octopus | Cephalopoda | 797 | XP_014772175.1 | 221aa | 40%
Priapulus caudatus | Priapulus | Priapulida | 797 | XP_014663190.1 | 333aa | 39%
Bombus terrestris | Buff-tailed Bumblebee | Insecta | 797 | XP_012171653.1 | 260aa | 32%
Actinia tenebrosa | Australian Red Waratah Sea Anemone | Anthozoa | 824 | XP_031575029.1 | 330aa | 43%
Trichoplax adhaerens | Trichoplax | Trichoplacidae | 948 | XP_002110193.1 | 137aa | 44%
Spizellomyces punctatus | Branching Chytrid Fungi | Fungi | 1105 | XP_016610491.1 | 412aa | 29%
Eremothecium cymbalariae | Fungi | Fungi | 1105 | XP_003644395.1 | 266aa | 25%
Quercus suber | Cork Oak Tree | Plantae | 1496 | XP_023896156.1 | 508aa | 30%
Plasmopara halstedii | Downy Mildew of Sunflower | Oomycetes | 1768 | XP_024580369.1 | 179aa | 26%
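Percent identity understates divergence for distant orthologs because repeated substitutions at the same site go uncounted. One common adjustment is the Poisson correction, d = -ln(1 - p), where p is the fraction of differing residues. The sketch below applies it to a few rows of the table. Whether this correction was used for the divergence-rate comparison above is not stated, so it is an assumption here, as is reading the divergence dates as millions of years ago. The function name is hypothetical.

```python
import math


def corrected_distance(percent_identity: float) -> float:
    """Poisson-corrected distance from percent identity.

    With p = 1 - identity/100 as the fraction of differing residues,
    d = -ln(1 - p) = -ln(identity/100).
    """
    if not 0 < percent_identity <= 100:
        raise ValueError("percent identity must be in (0, 100]")
    return -math.log(percent_identity / 100.0)


# (species, divergence date assumed to be in MYA, % identity with human),
# taken from the ortholog table.
ORTHOLOGS = [
    ("Tupaia chinensis", 82, 76),
    ("Falco peregrinus", 312, 59),
    ("Xenopus laevis", 352, 50),
    ("Bombus terrestris", 797, 32),
    ("Plasmopara halstedii", 1768, 26),
]

for species, mya, identity in ORTHOLOGS:
    d = corrected_distance(identity)
    print(f"{species:22s} {mya:5d} MYA  {identity:3d}% identity  d = {d:.2f}")
```

Plotting d against divergence date for this gene alongside reference proteins is one way to compare relative rates of change.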

Function

I-TASSER provides consensus predictions for the molecular function, biological process, and associated cellular component of C7orf50. C7orf50 may import ribosomal proteins into the nucleus for assembly into ribosomes, but further research is needed to confirm this function.

Clinical Significance

C7orf50 has been noted in a variety of genome-wide association studies. It has been associated with type 2 diabetes among sub-Saharan Africans, daytime sleepiness in African-Americans, prenatal exposure to particulate matter, heritable DNA methylation marks associated with breast cancer, and DNA methylation in relation to plasma carotenoids and lipid profile. It has also been reported to have significant interactions with prion proteins.