UBALD1
UBALD1 is a protein encoded by the UBALD1 gene, located on chromosome 16 in humans. UBALD1 is highly and ubiquitously expressed across tissues and localizes to the nucleus and cytoplasm. UBALD1 is conserved in animals, including invertebrates. An alias for UBALD1 is FAM100A.
Gene
The human UBALD1 gene is located on the minus strand of chromosome 16 at cytogenetic location 16p13.3. The gene contains three exons and two introns, with a total gene length of 6,145 base pairs.
Transcripts
There are three isoforms of UBALD1 in humans, all of which contain three exons. UBALD1 isoform 1 has an mRNA sequence of 1,374 nucleotides and encodes the longest protein. Isoform 2 differs by the exclusion of 75 nucleotides at the start of exon 3, and isoform 3 differs by the insertion of 185 nucleotides at the end of exon 1.
| Transcript Variant | mRNA length (nt) | Exon 1 length (nt) | Exon 2 length (nt) | Exon 3 length (nt) | Protein Isoform | Protein Length (aa) |
| 1 | 1374 | 211 | 62 | 1099 | 1 | 177 |
| 2 | 1299 | 211 | 62 | 1024 | 2 | 152 |
| 3 | 1559 | 396 | 62 | 1099 | 3 | 122 |
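The length differences between the variants can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the numbers stated above; the variable and function names are illustrative. It assumes that both the excluded and the inserted nucleotides lie within the coding sequence, which is consistent with the protein lengths in the table.

```python
# Values copied from the text and the transcript table; names are illustrative.
ISOFORM1_MRNA_NT = 1374      # mRNA length of transcript variant 1
ISOFORM1_PROTEIN_AA = 177    # protein length of isoform 1
EXON3_EXCLUSION_NT = 75      # nucleotides excluded at the start of exon 3 (variant 2)
EXON1_INSERTION_NT = 185     # nucleotides inserted at the end of exon 1 (variant 3)


def describe_indel(length_nt):
    """Report how many whole codons an indel spans and whether it keeps the frame."""
    codons, remainder = divmod(length_nt, 3)
    return {"codons": codons, "remainder": remainder, "in_frame": remainder == 0}


variant2_mrna_nt = ISOFORM1_MRNA_NT - EXON3_EXCLUSION_NT
variant3_mrna_nt = ISOFORM1_MRNA_NT + EXON1_INSERTION_NT

exclusion = describe_indel(EXON3_EXCLUSION_NT)
insertion = describe_indel(EXON1_INSERTION_NT)

# An in-frame exclusion removes whole codons, so the protein shrinks by that many residues.
isoform2_protein_aa = ISOFORM1_PROTEIN_AA - exclusion["codons"]

print(variant2_mrna_nt, variant3_mrna_nt)  # 1299 1559
print(exclusion)   # {'codons': 25, 'remainder': 0, 'in_frame': True}
print(insertion)   # {'codons': 61, 'remainder': 2, 'in_frame': False}
print(isoform2_protein_aa)  # 152
```

Because 75 is a multiple of three, the exclusion removes 25 whole codons and leaves the reading frame intact. The 185-nucleotide insertion is not a multiple of three, so it shifts the reading frame downstream of exon 1.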
Protein
Isoforms
UBALD1 isoform 1 is the longest protein, consisting of 177 amino acids. The protein sequence of isoform 2 is 85.9% identical to that of isoform 1. Isoform 2 lacks 25 amino acids encoded by exon 3 and lacks the PHA03247 domain. Isoform 3 differs greatly from isoforms 1 and 2, being only 35.6% identical to isoform 1. Isoform 3 contains 122 amino acids; the insertion in exon 1 causes a frameshift and an earlier stop codon. Isoform 3 also lacks the PHA03247 domain.
Properties and domains
The protein encoded by UBALD1 isoform 1 has a predicted isoelectric point of 6.13 and a molecular weight of 19.0 kDa. UBALD1 is rich in alanine and proline and contains multiple doublets and triplets of these residues. The proline residues and multiplets are highly conserved, specifically within the PHA03247 domain. The protein contains one domain, PHA03247, also known as the large tegument protein UL36 domain. Tegument protein UL36 is the largest tegument protein found in herpes simplex virus 1 and has deubiquitinating activity.
Structure
The secondary structure of the UBALD1 protein consists mostly of coils, with four short alpha-helical regions. Its tertiary structure is accordingly coiled and globular-like.
Regulation
Gene level expression
UBALD1 is a highly expressed gene, with expression about 1.4 times that of the average gene. UBALD1 is ubiquitously expressed, with its highest levels in the placenta, skeletal muscle, liver, and brain. Within the brain, UBALD1 expression is highest in the hippocampal formation and olfactory regions.
Protein level regulation
The UBALD1 protein is predicted to localize to the nucleus and cytoplasm. Nuclear export signals are predicted moderately at positions 79-85 and strongly at positions 174-177. UBALD1 has many predicted phosphorylation and glycosylation sites, with known phosphorylation sites at S88, S90, S93, and S96.
Evolutionary history
Orthologs
The ortholog space for UBALD1 is large, with its most distant orthologs, found in invertebrates, having diverged 694 million years ago. The orthologs include most vertebrates (mammals, birds, reptiles, amphibians, and fishes) as well as some invertebrates, such as arthropods, cnidarians, mollusks, echinoderms, and nematodes. There are no orthologs in fungi, plants, or bacteria. Closely related orthologs, including those in mammals, birds, and reptiles, have 67-92% sequence similarity. Moderately related orthologs, including those in amphibians and fishes, have 55-75% sequence similarity. Distantly related orthologs, including those in invertebrates, have 29-50% sequence similarity.
| Taxonomic group | Genus and species | Common name | Date of divergence (million years ago) | Sequence length (aa) | Sequence similarity (%) | Accession number |
| Mammals | Homo sapiens | human | 0 | 177 | 100 | |
| Mammals | Mus musculus | mouse | 87 | 176 | 92.1 | |
| Mammals | Phascolarctos cinereus | koala | 160 | 174 | 80.6 | |
| Aves | Gallus gallus | chicken | 319 | 162 | 76.3 | |
| Aves | Strigops habroptila | owl parrot | 319 | 162 | 74.6 | |
| Reptiles | Chrysemys picta bellii | painted turtle | 319 | 165 | 76.3 | |
| Reptiles | Varanus komodoensis | komodo dragon | 319 | 166 | 67.6 | |
| Amphibians | Geotrypetes seraphini | gaboon caecilian | 353 | 161 | 70.4 | |
| Amphibians | Bufo bufo | common toad | 353 | 142 | 65.4 | |
| Fishes | Danio rerio | zebrafish | 431 | 155 | 68.0 | |
| Fishes | Petromyzon marinus | sea lamprey | 599 | 179 | 64.7 | |
| Invertebrates | Styela clava | tunicates | 603 | 111 | 42.9 | |
| Invertebrates | Strongylocentrotus purpuratus | purple sea urchin | 619 | 183 | 36.2 | |
| Invertebrates | Trichoplax sp. H2 | trichoplax | 661 | 94 | 39.5 | |
| Invertebrates | Vanessa tameamea | kamehameha butterfly | 694 | 151 | 50.5 | |
| Invertebrates | Gigantopelta aegis | deep sea snail | 694 | 128 | 42.1 | |
| Invertebrates | Mercenaria mercenaria | hard clam | 694 | 122 | 42.5 | |
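As a rough illustration, the similarity values in the table can be summarized per taxonomic group with a few lines of Python. This is only a sketch: the tuple layout and function name are illustrative, and the table is a sample of orthologs, so the ranges it yields need not match the broader ranges quoted in the text.

```python
# (taxonomic group, species, divergence in million years ago, similarity %)
# Values copied from the ortholog table; the tuple layout is illustrative.
ORTHOLOGS = [
    ("Mammals", "Homo sapiens", 0, 100.0),
    ("Mammals", "Mus musculus", 87, 92.1),
    ("Mammals", "Phascolarctos cinereus", 160, 80.6),
    ("Aves", "Gallus gallus", 319, 76.3),
    ("Aves", "Strigops habroptila", 319, 74.6),
    ("Reptiles", "Chrysemys picta bellii", 319, 76.3),
    ("Reptiles", "Varanus komodoensis", 319, 67.6),
    ("Amphibians", "Geotrypetes seraphini", 353, 70.4),
    ("Amphibians", "Bufo bufo", 353, 65.4),
    ("Fishes", "Danio rerio", 431, 68.0),
    ("Fishes", "Petromyzon marinus", 599, 64.7),
    ("Invertebrates", "Styela clava", 603, 42.9),
    ("Invertebrates", "Strongylocentrotus purpuratus", 619, 36.2),
    ("Invertebrates", "Trichoplax sp. H2", 661, 39.5),
    ("Invertebrates", "Vanessa tameamea", 694, 50.5),
    ("Invertebrates", "Gigantopelta aegis", 694, 42.1),
    ("Invertebrates", "Mercenaria mercenaria", 694, 42.5),
]


def similarity_range_by_group(rows):
    """Return {group: (lowest, highest)} similarity, skipping the human reference row."""
    ranges = {}
    for group, _species, divergence_mya, similarity in rows:
        if divergence_mya == 0:
            continue  # the reference sequence itself is not an ortholog comparison
        low, high = ranges.get(group, (similarity, similarity))
        ranges[group] = (min(low, similarity), max(high, similarity))
    return ranges


ranges = similarity_range_by_group(ORTHOLOGS)
for group, (low, high) in ranges.items():
    print(f"{group}: {low}-{high}%")
```

For the rows listed here, mammals, birds, and reptiles together span 67.6-92.1% similarity, and the invertebrates span 36.2-50.5%.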
Paralogs
The human UBALD1 protein has one paralog, UBALD2, which is present in vertebrates but not in invertebrates. The two proteins are similar, with 63.1% sequence identity and 70.9% sequence similarity. The UBALD2 protein has a length of 164 amino acids, a predicted isoelectric point of 6.78, and a molecular weight of 17.7 kDa. The UL36 tegument protein domain of UBALD1 is partially conserved in the UBALD2 paralog. UBALD1 has monoallelic expression, whereas UBALD2 has biallelic expression.