LSm
In molecular biology, LSm proteins are a family of RNA-binding proteins found in virtually every cellular organism. LSm is a contraction of 'like Sm', because the first identified members of the LSm protein family were the Sm proteins. LSm proteins are defined by a characteristic three-dimensional structure and by their assembly into rings of six or seven individual LSm protein molecules, and they play many different roles in mRNA processing and regulation.
The Sm proteins were first discovered as antigens targeted by so-called anti-Sm antibodies in a patient with systemic lupus erythematosus (SLE), a debilitating autoimmune disease. They were named Sm proteins in honor of that patient, Stephanie Smith. Other proteins with very similar structures were subsequently discovered and named LSm proteins. New members of the LSm protein family continue to be identified and reported.
Proteins with similar structures are grouped into a hierarchy of protein families, superfamilies, and folds. The LSm protein structure is an example of a small beta sheet folded into a short barrel. Individual LSm proteins assemble into a six- or seven-member doughnut-shaped ring (a torus), which usually binds to a small RNA molecule to form a ribonucleoprotein complex. The LSm torus helps the RNA molecule assume and maintain its proper three-dimensional structure. Depending on which LSm proteins and RNA molecule are involved, this ribonucleoprotein complex facilitates a wide variety of RNA processing activities, including degradation, editing, splicing, and regulation.
Alternative terms for the LSm family are LSm fold and Sm-like fold, and alternative capitalization styles such as lsm, LSM, and Lsm are common and equally acceptable.
History
Discovery of the Smith antigen
The story of the discovery of the first LSm proteins begins with a young woman, Stephanie Smith, who was diagnosed in 1959 with systemic lupus erythematosus, eventually succumbing to complications of the disease in 1969 at the age of 22. During this period, she was treated at New York's Rockefeller University Hospital, under the care of Dr. Henry Kunkel and Dr. Eng Tan. As is typical of this autoimmune disease, SLE patients produce antibodies to antigens in their cells' nuclei, most frequently to their own DNA. However, in 1966 Kunkel and Tan found that Smith produced antibodies to a set of nuclear proteins, which they named the 'Smith antigen'. About 30% of SLE patients produce antibodies to these proteins rather than to double-stranded DNA. This discovery improved diagnostic testing for SLE, but the nature and function of this antigen were unknown.
Sm proteins, snRNPs, the spliceosome and messenger RNA splicing
Research continued during the 1970s and early 1980s. The Smith antigen was found to be a complex of ribonucleic acid (RNA) molecules and multiple proteins. Part of this complex was a set of uridine-rich small nuclear RNA (snRNA) molecules, which were given the names U1, U2, U4, U5 and U6. Four of these snRNAs were found to be tightly bound to several small proteins, which were named SmB, SmD, SmE, SmF, and SmG in decreasing order of size. SmB has an alternatively spliced variant, SmB', and a very similar protein, SmN, replaces SmB'/B in certain tissues. SmD was later discovered to be a mixture of three proteins, which were named SmD1, SmD2 and SmD3. These nine proteins became known as the Sm core proteins, or simply Sm proteins. The snRNAs are complexed with the Sm core proteins and with other proteins to form particles in the cell's nucleus called small nuclear ribonucleoproteins, or snRNPs. By the mid-1980s, it had become clear that these snRNPs help form a large complex, called the spliceosome, around pre-mRNA, excising portions of the pre-mRNA called introns and splicing the coding portions together. After a few more modifications, the spliced pre-mRNA becomes messenger RNA, which is then exported from the nucleus and translated into a protein by ribosomes.
Discovery of proteins similar to the Sm proteins
The snRNA U6 does not associate with the Sm proteins, even though the U6 snRNP is a central component of the spliceosome. In 1999 a protein heteromer was found that binds specifically to U6, consisting of seven proteins clearly homologous to the Sm proteins. These proteins were denoted LSm proteins, with the similar LSm8 protein identified later. In the bacterium Escherichia coli, the Sm-like protein HF-I, encoded by the gene hfq, had been described in 1968 as an essential host factor for replication of the RNA bacteriophage Qβ. The genome of Saccharomyces cerevisiae was sequenced in the mid-1990s, providing a rich resource for identifying homologs of these human proteins. Subsequently, as more eukaryotic genomes were sequenced, it became clear that eukaryotes in general share homologs of the same set of seven Sm and eight LSm proteins. Soon after, proteins homologous to these eukaryotic LSm proteins were found in Archaea and Bacteria. The archaeal LSm proteins are more similar to the eukaryotic LSm proteins than either group is to the bacterial LSm proteins. The LSm proteins described thus far were rather small, ranging from 76 amino acids for human SmG to 231 amino acids for human SmB. More recently, however, larger proteins have been discovered that include an LSm structural domain in addition to other protein structural domains.
Discovery of the LSm fold
Around 1995, comparisons between the various LSm homologs identified two sequence motifs, 32 amino acids long, that were very similar in each LSm homolog and were separated by a non-conserved region of variable length. This indicated the importance of these two sequence motifs and suggested that all LSm protein genes evolved from a single ancestral gene. In 1999, crystals of recombinant Sm proteins were prepared, allowing their atomic structure to be determined in three dimensions by X-ray crystallography. This demonstrated that the LSm proteins share a similar three-dimensional fold, consisting of a short alpha helix and a five-stranded folded beta sheet, subsequently named the LSm fold. Other investigations found that LSm proteins assemble into a torus of six or seven LSm proteins, and that RNA binds to the inside of the torus, with one nucleotide bound to each LSm protein.
Structure
Figure: Uridine phosphate binding in archaeal Sm1. The uridine phosphate binds between the β2b/β3a loop and the β4b/β5 loop. The uracil is stacked between the histidine and arginine residues and is stabilized by hydrogen bonding to an asparagine residue, while the aspartate residue hydrogen-bonds to the ribose.
LSm proteins are characterized by a beta sheet folded into the LSm fold, by polymerization into a six- or seven-member torus, and by binding to RNA oligonucleotides. Classifying proteins on the basis of their structure is a modern paradigm and a currently active field, with three major approaches: SCOP, CATH, and FSSP/DALI.
Secondary
The secondary structure of an LSm protein is a small five-strand anti-parallel beta sheet, with the strands identified from the N-terminal end to the C-terminal end as β1, β2, β3, β4 and β5. The SCOP class All beta proteins and the CATH class Mainly Beta are defined as protein structures that consist primarily of beta sheets, so both include LSm. The SM1 sequence motif corresponds to the β1, β2 and β3 strands, and the SM2 sequence motif corresponds to the β4 and β5 strands. The first four beta strands lie next to each other in order, but β5 lies next to β1, turning the overall structure into a short barrel. This structural topology is described as 51234. A short N-terminal alpha helix is also present in most LSm proteins. The β3 and β4 strands are short in some LSm proteins and are separated by an unstructured coil of variable length. The β2, β3 and β4 strands are strongly bent, by about 120°, at their midpoints. The bends in these strands often occur at glycine residues, and the side chains internal to the beta barrel are often the hydrophobic residues valine, leucine, isoleucine and methionine.
Tertiary
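The strand arrangement described above can be sketched as a small data structure. This minimal Python sketch is an illustration only: it assumes the sheet can be modeled as a flat list of strand numbers in spatial order (the 51234 topology), and the names `SPATIAL_ORDER`, `MOTIF_OF_STRAND`, `topology_string` and `sheet_neighbors` are hypothetical. It records which strands sit side by side; it does not model the bending or the barrel geometry.

```python
# Hypothetical model of the LSm five-strand sheet (illustration only).
# Strands are listed in their spatial order across the sheet, using the
# "51234" topology from the text; numbers follow N- to C-terminal order.
SPATIAL_ORDER = [5, 1, 2, 3, 4]

# Sequence motifs as described in the text: SM1 covers β1-β3, SM2 covers β4-β5.
MOTIF_OF_STRAND = {1: "SM1", 2: "SM1", 3: "SM1", 4: "SM2", 5: "SM2"}


def topology_string(order: list[int]) -> str:
    """Concatenate strand numbers in spatial order, e.g. '51234'."""
    return "".join(str(s) for s in order)


def sheet_neighbors(order: list[int]) -> dict[int, list[int]]:
    """Map each strand to the strands lying directly beside it in a flat sheet."""
    nbrs: dict[int, list[int]] = {s: [] for s in order}
    for left, right in zip(order, order[1:]):
        nbrs[left].append(right)
        nbrs[right].append(left)
    return nbrs


if __name__ == "__main__":
    nbrs = sheet_neighbors(SPATIAL_ORDER)
    print(topology_string(SPATIAL_ORDER))
    for strand in sorted(nbrs):
        beside = ", ".join(f"β{n}" for n in sorted(nbrs[strand]))
        print(f"β{strand} ({MOTIF_OF_STRAND[strand]}) lies beside: {beside}")
```

In this flat model, β5, the C-terminal strand of the SM2 motif, has β1, the N-terminal strand, as its only neighbor, even though the two strands sit at opposite ends of the sequence.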
SCOP simply classifies the LSm structure as the Sm-like fold, one of 149 different all-beta protein folds, without any intermediate groupings. The LSm beta sheet is sharply bent and is described as a Roll architecture in CATH. One of the beta strands crosses the open edge of the roll to form a small SH3-type barrel topology. CATH lists 23 homologous superfamilies with an SH3-type barrel topology, one of which is the LSm structure. SCOP continues its structural classification after Fold to Superfamily, Family and Domain, while CATH continues to Sequence Family, but these divisions are more appropriately described in the "Evolution and phylogeny" section.
The SH3-type barrel tertiary structure of the LSm fold is formed by the strongly bent β2, β3 and β4 strands, with the barrel closed by the β5 strand. To emphasize the tertiary structure, each bent beta strand can be described as two shorter beta strands. The LSm fold can then be viewed as an eight-strand anti-parallel beta sandwich, with five strands in one plane and three strands in a parallel plane, and a pitch angle of about 45° between the two halves of the sandwich. The short N-terminal alpha helix occurs at one edge of the beta sandwich. The alpha helix and the beta strands can be labeled α, β1, β2a, β2b, β3a, β3b, β4a, β4b, β5, where a and b refer either to the two halves of a bent strand in the five-strand description or to the individual strands in the eight-strand description. Each strand is typically formed from five amino acid residues. Including the bends and loops between the strands and the alpha helix, about 60 amino acid residues contribute to the LSm fold, but this number varies between homologs because of differences in the inter-strand loops, the alpha helix, and even the lengths of the β3b and β4a strands.
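The relabeling from the five-strand description to the eight-strand description is mechanical enough to sketch in code. This minimal Python sketch assumes, as the text states, that exactly β2, β3 and β4 are bent and that each strand is about five residues long; the function name `eight_strand_labels` and the constant names are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: expand the five-strand LSm description into the
# eight-strand (beta-sandwich) description by splitting each strongly bent
# strand into its a and b halves. Assumes only β2, β3 and β4 are bent.
FIVE_STRAND = ["β1", "β2", "β3", "β4", "β5"]
BENT = {"β2", "β3", "β4"}
RESIDUES_PER_STRAND = 5  # approximate figure given in the text


def eight_strand_labels(strands: list[str], bent: set[str]) -> list[str]:
    """Split each bent strand into 'a' and 'b' halves, keeping sequence order."""
    labels: list[str] = []
    for s in strands:
        if s in bent:
            labels.extend([s + "a", s + "b"])
        else:
            labels.append(s)
    return labels


if __name__ == "__main__":
    labels = eight_strand_labels(FIVE_STRAND, BENT)
    print(["α"] + labels)  # N-terminal helix first, then the strands
    print(len(labels), "strands,", len(labels) * RESIDUES_PER_STRAND, "residues in strands")
```

Under these assumptions it prints the nine labels from α to β5 and reports 8 strands totaling about 40 residues. The remaining residues of the roughly 60-residue fold (about 20) would then lie in the helix, bends and inter-strand loops, which are among the parts that vary between homologs.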