Domain of unknown function
A domain of unknown function (DUF) is a protein domain that has no characterised function. These families have been collected together in the Pfam database using the prefix DUF followed by a number, for example DUF2992 and DUF1220. As of 2019, there were almost 4,000 DUF families within the Pfam database, representing over 22% of known families. Some DUFs are known by other names because of popular usage, and so do not follow this nomenclature, but are nevertheless DUFs.
The DUF designation is tentative, and such families tend to be renamed to a more specific name after a function is identified.
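The naming convention described above (the prefix DUF followed by a number) can be checked mechanically. The following is a minimal sketch in Python, assuming names have exactly the form "DUF" plus digits; the names `DUF_NAME`, `duf_number` and `split_families` are hypothetical helpers introduced for illustration and are not part of Pfam or any Pfam tooling.

```python
import re

# Assumed pattern: the literal prefix "DUF" followed by one or more digits.
DUF_NAME = re.compile(r"DUF(\d+)")


def duf_number(name):
    """Return the numeric part of a DUF-style name, or None if it does not match."""
    match = DUF_NAME.fullmatch(name)
    return int(match.group(1)) if match else None


def split_families(names):
    """Partition family names into DUF-style names and all other names."""
    dufs = [n for n in names if duf_number(n) is not None]
    others = [n for n in names if duf_number(n) is None]
    return dufs, others
```

For example, `duf_number("DUF2992")` yields the integer 2992, while a renamed family such as GGDEF does not match. A name-based check of this kind cannot detect families that are DUFs but are known by another name, so it is only a rough filter.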
History
The DUF naming scheme was introduced by Chris Ponting, through the addition of DUF1 and DUF2 to the SMART database. These two domains were found to be widely distributed in bacterial signaling proteins. Subsequently, the functions of these domains were identified and they have since been renamed as the GGDEF domain and EAL domain respectively.
Characterisation
Structural genomics programmes have attempted to understand the function of DUFs through structure determination. The structures of over 250 DUF families have been solved. This work showed that about two thirds of DUF families had a structure similar to a previously solved one and were therefore likely to be divergent members of existing protein superfamilies, whereas about one third possessed a novel protein fold.
Some DUF families share remote sequence homology with domains that have a characterised function. Computational work can be used to identify these relationships. A 2015 study was able to assign 20% of the DUFs to characterised structural superfamilies. Pfam also continuously performs such assignments through its "clan" superfamily entries.