Intrinsically disordered proteins


In molecular biology, an intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure, typically in the absence of its macromolecular interaction partners, such as other proteins or RNA. IDPs range from fully unstructured to partially structured and include random coil, molten globule-like aggregates, or flexible linkers in large multi-domain proteins. They are sometimes considered a separate class of proteins, alongside globular, fibrous and membrane proteins.
IDPs are a very large and functionally important class of proteins. They are most numerous in eukaryotes, with an estimated 30-40% of residues in the eukaryotic proteome located in disordered regions. Disorder is present in around 70% of proteins, either in the form of disordered tails or flexible linkers. Proteins can also be entirely disordered and lack a defined secondary and/or tertiary structure. Their discovery has disproved the idea that the three-dimensional structures of proteins must be fixed for them to accomplish their biological functions. For example, IDPs have been found to participate in weak multivalent interactions that are highly cooperative and dynamic, lending them importance in DNA regulation and in cell signaling. Many IDPs can also adopt a fixed three-dimensional structure after binding to other macromolecules. Overall, IDPs differ from structured proteins in many ways and tend to have distinctive function, structure, sequence, interactions, evolution and regulation.

History

In the 1930s-1950s, the first protein structures were solved by protein crystallography. These early structures suggested that a fixed three-dimensional structure might be generally required to mediate the biological functions of proteins. These publications solidified the central dogma of molecular biology in that the amino acid sequence of a protein determines its structure, which in turn determines its function. In 1950, Fred Karush at the Neurological Institute of New York described the "configurational adaptability" found in serum albumins, contradicting this assumption. Karush was convinced that proteins have more than one configuration at the same energy level and can select one when binding to other substrates. In the 1960s, Levinthal's paradox suggested that a systematic conformational search of a long polypeptide is unlikely to yield a single folded protein structure on biologically relevant timescales. Nevertheless, for many proteins or protein domains, relatively rapid and efficient refolding can be observed in vitro. As stated in Anfinsen's dogma from 1973, the fixed 3D structure of these proteins is uniquely encoded in their primary structure, is kinetically accessible and stable under a range of physiological conditions, and can therefore be considered the native state of such "ordered" proteins.
During the subsequent decades, however, many large protein regions could not be assigned in X-ray datasets, indicating that they occupy multiple positions, which average out in electron density maps. The lack of fixed, unique positions relative to the crystal lattice suggested that these regions were "disordered". Nuclear magnetic resonance spectroscopy of proteins also demonstrated the presence of large flexible linkers and termini in many solved structural ensembles.
In 2001, Dunker questioned whether this information had been ignored for 50 years; more quantitative analyses became available in the 2000s. In the 2010s it became clear that IDPs are common among disease-related proteins, such as alpha-synuclein and tau.

Abundance

Proteins exist as an ensemble of similar structures with some regions more constrained than others. IDPs occupy the extreme end of this spectrum of flexibility and include proteins of considerable local structure tendency or flexible multidomain assemblies.
Intrinsic disorder is particularly elevated among proteins that regulate chromatin and transcription, and bioinformatic predictions indicate that it is more common in genomes and proteomes than in known structures in the protein database. Based on DISOPRED2 prediction, long disordered segments occur in 2.0% of archaeal, 4.2% of eubacterial and 33.0% of eukaryotic proteins, including certain disease-related proteins.
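Proteome-level figures of this kind are obtained by counting over per-residue disorder predictions. The following is a minimal sketch of that bookkeeping, not the DISOPRED2 method itself: it assumes each protein is already annotated with a per-residue boolean mask from some predictor, and the 30-residue cutoff for a "long" segment, the function names and the toy proteome are all assumptions made for illustration.

```python
from typing import Dict, List


def longest_disordered_run(mask: List[bool]) -> int:
    """Length of the longest consecutive stretch of residues flagged as disordered."""
    best = current = 0
    for flagged in mask:
        current = current + 1 if flagged else 0
        best = max(best, current)
    return best


def fraction_with_long_disorder(proteome: Dict[str, List[bool]], min_len: int = 30) -> float:
    """Fraction of proteins with at least one disordered run of min_len residues.

    min_len=30 is an assumed cutoff; the text above does not state one.
    """
    if not proteome:
        return 0.0
    hits = sum(1 for mask in proteome.values() if longest_disordered_run(mask) >= min_len)
    return hits / len(proteome)


def fraction_disordered_residues(proteome: Dict[str, List[bool]]) -> float:
    """Fraction of all residues in the proteome that are flagged as disordered."""
    total = sum(len(mask) for mask in proteome.values())
    disordered = sum(sum(mask) for mask in proteome.values())
    return disordered / total if total else 0.0


# Hypothetical toy proteome: True marks a residue predicted to be disordered.
toy_proteome = {
    "protA": [False] * 50 + [True] * 35 + [False] * 15,  # one long internal segment
    "protB": [False] * 80,                                # fully ordered
    "protC": [True] * 10 + [False] * 60 + [True] * 10,    # short disordered tails only
    "protD": [True] * 40,                                 # entirely disordered
}

print(fraction_with_long_disorder(toy_proteome))
print(fraction_disordered_residues(toy_proteome))
```

The two functions correspond to the two kinds of statistic quoted in this article: the share of proteins containing a long disordered segment, and the share of residues located in disordered regions.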

Biological roles

Highly dynamic disordered regions of proteins have been linked to functionally important phenomena such as allosteric regulation and enzyme catalysis. In many disordered proteins, the binding affinity for their receptors is regulated by post-translational modification; it has therefore been proposed that the flexibility of disordered proteins accommodates the different conformational requirements for binding the modifying enzymes as well as their receptors. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling and transcription, as well as chromatin remodeling functions. Genes that have recently been born de novo tend to have higher disorder. In animals, genes with high disorder are lost at higher rates during evolution.

Flexible linkers

Disordered regions are often found as flexible linkers or loops connecting domains. Linker sequences vary greatly in length but are typically rich in polar uncharged amino acids. Flexible linkers allow the connected domains to freely twist and rotate to recruit their binding partners via protein domain dynamics. They also allow their binding partners to induce larger-scale conformational changes by long-range allostery. For example, the flexible linker that connects the two domains of FKBP25 is important for the binding of FKBP25 to DNA.

Linear motifs

Linear motifs are short disordered segments of proteins that mediate functional interactions with other proteins or other biomolecules. Many roles of linear motifs are associated with cell regulation, for instance in control of cell shape, subcellular localisation of individual proteins and regulated protein turnover. Often, post-translational modifications such as phosphorylation tune the affinity of individual linear motifs for specific interactions. Relatively rapid evolution and a relatively small number of structural restraints for establishing novel interfaces make linear motifs particularly challenging to detect, but their widespread biological roles and the fact that many viruses mimic or hijack linear motifs to efficiently recode infected cells underline the importance of research on this topic.
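Because linear motifs are short and defined by only a few fixed positions, they are commonly written as regular-expression-like patterns and located by scanning a sequence. The sketch below illustrates the idea with Python's standard `re` module. The pattern `P..P` (proline, any two residues, proline), the dictionary name and the example sequence are illustrative assumptions, not curated motif definitions. The small number of fixed positions also means such patterns match frequently by chance, which is one reason motif detection is difficult.

```python
import re
from typing import List, Tuple

# Illustrative pattern only (an assumption for this sketch, not a curated motif entry).
HYPOTHETICAL_MOTIFS = {"PxxP_like": r"P..P"}


def find_motifs(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for every occurrence.

    The pattern is wrapped in a lookahead so that overlapping occurrences
    are reported as well.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence.upper())]


# Made-up example sequence.
example_sequence = "MAPPLPPRSTPAAPG"
hits = find_motifs(example_sequence, HYPOTHETICAL_MOTIFS["PxxP_like"])
for start, text in hits:
    print(start, text)
```

In practice a scan like this would be combined with further evidence, such as predicted disorder of the surrounding region or evolutionary conservation, to reduce spurious matches.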

Pre-structured motifs

Unlike globular proteins, IDPs do not have spatially-disposed active pockets. Of the target-unbound IDPs subjected to detailed structural characterization by NMR, 80% possess linear motifs termed PreSMos (pre-structured motifs), which are transient secondary structural elements primed for target recognition. In several cases it has been demonstrated that these transient structures become full and stable secondary structures, e.g., helices, upon target binding. Hence, PreSMos are the putative active sites in IDPs.

Coupled folding and binding

Many unstructured proteins undergo transitions to ordered states upon binding to their targets. The coupled folding and binding may be local, involving only a few interacting residues, or it might involve an entire protein domain. It was recently shown that coupled folding and binding allows the burial of a large surface area that would be possible for fully structured proteins only if they were much larger. Moreover, certain disordered regions might serve as "molecular switches" that regulate biological function by switching to an ordered conformation upon molecular recognition events such as small-molecule binding, DNA/RNA binding or ion interactions.
The ability of disordered proteins to bind, and thus to exert a function, shows that stability is not a required condition. Many short functional sites, for example short linear motifs, are over-represented in disordered proteins. Disordered proteins and short linear motifs are particularly abundant in many viruses, including RNA viruses such as Hendra virus, HCV and HIV-1, as well as human papillomaviruses. This enables such viruses to overcome their informationally limited genomes by facilitating binding to, and manipulation of, a large number of host cell proteins.

Disorder in the bound state (fuzzy complexes)

Intrinsically disordered proteins can retain their conformational freedom even when they bind specifically to other proteins. The structural disorder in the bound state can be static or dynamic. In fuzzy complexes, structural multiplicity is required for function, and manipulation of the bound disordered region changes activity. The conformational ensemble of the complex is modulated via post-translational modifications or protein interactions. The specificity of DNA-binding proteins often depends on the length of fuzzy regions, which is varied by alternative splicing. Some fuzzy complexes may exhibit high binding affinity, although other studies have reported different affinity values for the same system in a different concentration regime.

Structural aspects

Intrinsically disordered proteins adopt a dynamic range of rapidly interchanging conformations in vivo according to the cell's conditions, creating a structural or conformational ensemble.
Their structures are strongly function-related. Few proteins are fully disordered in their native state. Disorder is mostly found in intrinsically disordered regions within an otherwise well-structured protein. The term intrinsically disordered protein therefore includes proteins that contain IDRs as well as fully disordered proteins.
The existence and kind of protein disorder is encoded in its amino acid sequence. In general, IDPs are characterized by a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids, usually referred to as low hydrophobicity. This property leads to good interactions with water. Furthermore, high net charges promote disorder because of electrostatic repulsion between like-charged residues. Thus disordered sequences cannot sufficiently bury a hydrophobic core to fold into stable globular proteins. In some cases, hydrophobic clusters in disordered sequences provide clues for identifying the regions that undergo coupled folding and binding. Many disordered proteins contain regions without any regular secondary structure. These regions can be termed flexible, in contrast to structured loops: while the latter are rigid and contain only one set of Ramachandran angles, IDPs involve multiple sets of angles. The term flexibility is also used for well-structured proteins, but it describes a different phenomenon in the context of disordered proteins. Flexibility in structured proteins is bound to an equilibrium state, while it is not so in IDPs. Many disordered proteins also contain low complexity sequences, i.e. sequences with over-representation of a few residues. While low complexity sequences are a strong indication of disorder, the reverse is not necessarily true; that is, not all disordered proteins have low complexity sequences. Disordered proteins have a low content of predicted secondary structure.
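The sequence properties described above (low hydrophobicity, high net charge, low complexity) can each be reduced to a simple number. The following is a minimal sketch under stated assumptions: hydrophobicity is taken as the mean Kyte-Doolittle hydropathy, net charge is approximated by counting K and R as +1 and D and E as -1 (histidine and the chain termini are ignored), and sequence complexity is approximated by the Shannon entropy of the residue composition. The function names and both example sequences are hypothetical; this is not a disorder predictor.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (higher values are more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _validated(seq: str) -> str:
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = seq.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError("sequence must be non-empty and use the 20 standard residues")
    return seq


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues."""
    seq = _validated(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge_per_residue(seq: str) -> float:
    """Simplified net charge: (K + R - D - E) / length. His and termini are ignored."""
    seq = _validated(seq)
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return (positive - negative) / len(seq)


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits) of residue composition; low values mean low complexity."""
    seq = _validated(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


# Hypothetical sequences: one polar and charged, one dominated by hydrophobic residues.
polar_charged = "SEEKKPSEEGSKKPEESDKK"
hydrophobic_rich = "VLIAFVGLLAIVWMLCAVIF"

for name, seq in [("polar_charged", polar_charged), ("hydrophobic_rich", hydrophobic_rich)]:
    print(name, round(mean_hydropathy(seq), 2),
          round(net_charge_per_residue(seq), 2),
          round(composition_entropy(seq), 2))
```

Values like these are the kind of input that sequence-based disorder predictors build on, usually computed in sliding windows rather than over a whole protein.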
Topological approaches have been developed to search for conformational patterns in their dynamics. For instance, circuit topology has been applied to track the dynamics of disordered protein domains. By employing a topological approach, one can categorize motifs according to their topological buildup and the timescale of their formation.
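To make the idea of a topological categorization concrete, the sketch below classifies the arrangement of two intrachain contacts as series, parallel or cross, the pairwise relations commonly used in circuit topology. The text above does not spell out these relations, so the definitions in the comments, the function name and the example contacts are assumptions made for illustration. Contacts that share a residue are not handled.

```python
from typing import Tuple

Contact = Tuple[int, int]  # residue indices of the two sites in contact


def classify_contact_pair(a: Contact, b: Contact) -> str:
    """Classify two contacts by the order of their sites along the chain.

    With sites i < j for one contact and k < l for the other, and i < k:
      series:   i < j < k < l  (one contact ends before the other begins)
      parallel: i < k < l < j  (one contact is nested inside the other)
      cross:    i < k < j < l  (the contacts interleave)
    """
    first, second = sorted((tuple(sorted(a)), tuple(sorted(b))))
    i, j = first
    k, l = second
    if len({i, j, k, l}) < 4:
        raise ValueError("contacts sharing a site are outside the scope of this sketch")
    if j < k:
        return "series"
    if l < j:
        return "parallel"
    return "cross"


# Hypothetical contacts, given as residue index pairs.
print(classify_contact_pair((1, 5), (8, 12)))
print(classify_contact_pair((1, 20), (5, 10)))
print(classify_contact_pair((1, 10), (5, 15)))
```

Applied to every pair of contacts in each frame of a simulated or measured ensemble, a classification like this yields a time series of topological relations whose changes can then be tracked.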
A common aspect of IDP structural ensembles is the ability or tendency to fold upon interaction with a binding partner in the cell. Examples of IDP folding in a binding context are binding-coupled folding and the formation of fuzzy complexes. However, it is also possible for proteins to remain entirely disordered in a binding scenario. Conversely, it is also possible for an isolated IDP to form compact states while preserving disorder and high solvent accessibility.