C4orf51


Chromosome 4 open reading frame 51 is a protein that in humans is encoded by the C4orf51 gene.

Gene

The C4orf51 gene is located at 4q31.21 on the plus strand of chromosome 4. The gene spans 120,289 base pairs and contains 6 exons. The genomic neighborhood of C4orf51 includes LOC285422, LINC02491, NCOA4P3, and MMAA, all located upstream of C4orf51. ZNF827 and LOC105377468 are located downstream of C4orf51.

mRNA

There are three known transcript variants of C4orf51, which encode isoforms X1, X2, and X3. Though the variants differ in length, all contain exons 1 and 2. C4orf51 is occasionally transcribed together with the neighboring gene, producing a single read-through mRNA that corresponds to both genes.

Protein

C4orf51 encodes a protein of 202 amino acids with a molecular weight of 23 kDa. The theoretical isoelectric point of C4orf51 is 8.6. Relative to other human proteins, C4orf51 has more serine residues and fewer valine residues.

Domains and motifs

In humans, the C4orf51 protein contains one domain of unknown function, DUF4722. DUF4722 spans the first 168 amino acids of C4orf51 and has a predicted molecular weight of 19.3 kDa. In a compositional analysis of this domain, no extremes were identified. The DUF is highly conserved in orthologous proteins, particularly near the N-terminus.
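
The reported masses can be sanity-checked against residue counts. The sketch below is a rough illustration only: it assumes the common rule of thumb of about 110 Da per amino acid residue, which is not necessarily the method used by the tools behind the figures above, and the function name `estimate_mass_kda` is hypothetical.

```python
# Rule-of-thumb estimate: roughly 110 Da per amino acid residue.
# This average is an assumption for illustration, not the method
# used to derive the masses reported in the article.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(residue_count: int) -> float:
    """Return an approximate protein mass in kDa from a residue count."""
    if residue_count < 0:
        raise ValueError("residue_count must be non-negative")
    return residue_count * AVERAGE_RESIDUE_MASS_DA / 1000.0


full_length_kda = estimate_mass_kda(202)  # full-length C4orf51
duf4722_kda = estimate_mass_kda(168)      # DUF4722 (first 168 residues)

print(f"Full-length estimate: {full_length_kda:.1f} kDa (reported: 23 kDa)")
print(f"DUF4722 estimate: {duf4722_kda:.1f} kDa (reported: 19.3 kDa)")
```

Both estimates fall slightly below the reported values, which is unsurprising because the true mass depends on the actual amino acid composition rather than on an average residue mass.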

Secondary structure

Alpha helices are predicted to span amino acids 20–34 and 150–165 in C4orf51. Amino acids 45–48 are predicted to form a beta sheet. No coils are predicted in C4orf51.

Tertiary and quaternary structure

The best-aligned structural analog of C4orf51, generated by I-TASSER, contains Clr2_transil, a domain involved in transcriptional silencing. Per Origene, gel analysis with a rabbit polyclonal antibody against C4orf51 produced bands at 23 kDa and at approximately 44–46 kDa, suggesting that C4orf51 may form a dimer.
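
The dimer inference rests on simple arithmetic: the larger band is about twice the mass of the smaller one. A minimal sketch of that check follows, using only the band sizes quoted above; the helper name `subunit_ratio` is hypothetical, and a ratio near 2 is consistent with a dimer but does not prove one.

```python
# Band sizes in kDa as quoted in the text above.
MONOMER_KDA = 23.0
UPPER_BAND_RANGE_KDA = (44.0, 46.0)


def subunit_ratio(band_kda: float, monomer_kda: float = MONOMER_KDA) -> float:
    """Return how many monomer masses fit into an observed band mass."""
    if monomer_kda <= 0:
        raise ValueError("monomer_kda must be positive")
    return band_kda / monomer_kda


ratios = [subunit_ratio(band) for band in UPPER_BAND_RANGE_KDA]
nearest_subunit_counts = [round(ratio) for ratio in ratios]

for band, ratio in zip(UPPER_BAND_RANGE_KDA, ratios):
    print(f"{band:.0f} kDa band / {MONOMER_KDA:.0f} kDa monomer = {ratio:.2f}")
```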

Post-translational modifications

C4orf51 is predicted to undergo several post-translational modifications, including phosphorylation, glycation, and acetylation. Though SUMOylation and tyrosine sulfation are also predicted, the sites of these modifications are not conserved in distant C4orf51 orthologs.

Subcellular localization

C4orf51 is predicted to be localized to the cell nucleus. The protein contains pat4, a motif commonly used to identify potential nuclear localization signals. This motif is conserved in the most distantly related C4orf51 ortholog known, found in ''Anolis carolinensis.''

Expression

C4orf51 expression is low in all tissues except the testes. However, because the gene body of C4orf51 contains long terminal repeats of human endogenous retroviruses, the gene has exhibited high levels of expression in differentiation-defective human induced pluripotent stem cells.

Promoter

There are two promoter regions predicted by Genomatix, but only one is located upstream of the transcription start site. GXP_921944 spans 1910 base pairs on chromosome 4. There are 15 coding transcripts supporting this promoter, but none are experimentally verified.

Interacting proteins

Experimentally determined protein interactions for C4orf51 have not yet been identified.

Clinical significance

Vlaikou et al. report that a 4q deletion containing C4orf51 and six other genes causes growth failure and developmental delay, minor craniofacial dysmorphism, digital anomalies, and cardiac and skeletal defects.

Homology

Paralogs

No paralogs or paralogous domains of C4orf51 have been identified.

Orthologs

Orthologs of C4orf51 have been found in mammals and reptiles. Within class Mammalia, orthologs have been identified in orders Primates, Scandentia, Lagomorpha, Rodentia, Perissodactyla, Chiroptera, Carnivora, Cetartiodactyla, Sirenia, and Proboscidea, as well as mammalian infraclass Marsupialia. The green anole and Burmese python contain the most distantly related orthologs of C4orf51. Both species diverged from humans an estimated 312 million years ago. C4orf51 orthologs have not yet been identified in bacteria, archaea, protists, plants, fungi, Trichoplax, invertebrates, bony or cartilaginous fish, amphibians, or birds.

Genus and species | Common name | Taxonomic group | Estimated date of divergence (million years ago) | Accession number | Length (amino acids) | Sequence identity | Sequence similarity
Homo sapiens | Human | Mammalia | 0 | NP_001074000.1 | 202 | 100.00% | 100%
Macaca mulatta | Rhesus macaque | Mammalia | 29.44 | NP_001181807.1 | 202 | 94.55% | 97%
Callithrix jacchus | Common marmoset | Mammalia | 43.6 | XP_008990874.1 | 217 | 79.72% | 88%
Tupaia chinensis | Chinese tree shrew | Mammalia | 82 | XP_006143532.1 | 201 | 68.96% | 77%
Oryctolagus cuniculus | European rabbit | Mammalia | 90 | XP_017202803.1 | 222 | 57.40% | 76%
Mus musculus | House mouse | Mammalia | 90 | NP_080591.1 | 208 | 50.96% | 66%
Urocitellus parryii | Arctic ground squirrel | Mammalia | 90 | XP_026248522.1 | 142 | 43.35% | 72%
Ceratotherium simum simum | Southern white rhinoceros | Mammalia | 96 | XP_014635653.1 | 199 | 66.50% | 77%
Equus asinus | Donkey | Mammalia | 96 | XP_014693612.1 | 201 | 64.71% | 75%
Pteropus vampyrus | Large flying fox | Mammalia | 96 | XP_023385935.1 | 200 | 62.56% | 71%
Enhydra lutris kenyoni | Sea otter | Mammalia | 96 | XP_022368037.1 | 201 | 61.39% | 74%
Myotis brandtii | Brandt's bat | Mammalia | 96 | XP_014393999.1 | 199 | 59.11% | 69%
Callorhinus ursinus | Northern fur seal | Mammalia | 96 | XP_025730051.1 | 146 | 50.50% | 68%
Vicugna pacos | Alpaca | Mammalia | 96 | XP_006210007 | 158 | 50.50% | 59%
Balaenoptera acutorostrata scammoni | Minke whale | Mammalia | 96 | XP_007189508.1 | 168 | 49.51% | 56%
Trichechus manatus latirostris | West Indian manatee | Mammalia | 105 | XP_004378925.1 | 162 | 57.64% | 66%
Loxodonta africana | African bush elephant | Mammalia | 105 | XP_023412869.1 | 213 | 53.00% | 65%
Sarcophilus harrisii | Tasmanian devil | Mammalia | 159 | XP_023361728.1 | 190 | 28.71% | 52%
Anolis carolinensis | Green anole | Reptilia | 312 | XP_003221711.1 | 194 | 21.53% | 46%
Python bivittatus | Burmese python | Reptilia | 312 | XP_025028520.1 | 176 | 19.43% | 51%
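
The ortholog data suggest that sequence identity to the human protein falls as the estimated divergence date increases. The sketch below illustrates one way to quantify that trend with a Pearson correlation coefficient. It copies the divergence dates (in millions of years) and identity percentages from the ortholog table; the names `ORTHOLOGS` and `pearson` are hypothetical, and the result is purely descriptive, since the table includes the human reference itself and several species that share the same divergence estimate.

```python
from math import sqrt

# (common name, estimated divergence in million years, % identity to human),
# copied from the ortholog table in this section.
ORTHOLOGS = [
    ("Human", 0.0, 100.00),
    ("Rhesus macaque", 29.44, 94.55),
    ("Common marmoset", 43.6, 79.72),
    ("Chinese tree shrew", 82.0, 68.96),
    ("European rabbit", 90.0, 57.40),
    ("House mouse", 90.0, 50.96),
    ("Arctic ground squirrel", 90.0, 43.35),
    ("Southern white rhinoceros", 96.0, 66.50),
    ("Donkey", 96.0, 64.71),
    ("Large flying fox", 96.0, 62.56),
    ("Sea otter", 96.0, 61.39),
    ("Brandt's bat", 96.0, 59.11),
    ("Northern fur seal", 96.0, 50.50),
    ("Alpaca", 96.0, 50.50),
    ("Minke whale", 96.0, 49.51),
    ("West Indian manatee", 105.0, 57.64),
    ("African bush elephant", 105.0, 53.00),
    ("Tasmanian devil", 159.0, 28.71),
    ("Green anole", 312.0, 21.53),
    ("Burmese python", 312.0, 19.43),
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / sqrt(var_x * var_y)


divergence_dates = [row[1] for row in ORTHOLOGS]
identities = [row[2] for row in ORTHOLOGS]
r = pearson(divergence_dates, identities)

print(f"Pearson r between divergence date and identity: {r:.2f}")
```

A negative coefficient is expected here, reflecting lower identity in more distantly related species.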