Coronavirus spike protein


Spike glycoprotein is the largest of the four major structural proteins found in coronaviruses. The spike protein assembles into trimers that form large structures, called spikes or peplomers, that project from the surface of the virion. The distinctive appearance of these spikes when visualized using negative stain transmission electron microscopy, "recalling the solar corona", gives the virus family its name.
The function of the spike glycoprotein is to mediate viral entry into the host cell by first interacting with molecules on the exterior cell surface and then fusing the viral and cellular membranes. Spike glycoprotein is a class I fusion protein that contains two regions, known as S1 and S2, responsible for these two functions. The S1 region contains the receptor-binding domain that binds to receptors on the cell surface. Coronaviruses use a very diverse range of receptors; HCoV-NL63, SARS-CoV and SARS-CoV-2 all interact with angiotensin-converting enzyme 2. The S2 region contains the fusion peptide and other fusion infrastructure necessary for membrane fusion with the host cell, a required step for infection and viral replication. Spike glycoprotein determines the virus' host range and cell tropism.
Spike glycoprotein is highly immunogenic. Antibodies against spike glycoprotein are found in patients recovered from SARS and COVID-19. Neutralizing antibodies target epitopes on the receptor-binding domain. Most COVID-19 vaccine development efforts in response to the COVID-19 pandemic aim to activate the immune system against the spike protein.

Structure

The spike protein is very large, often 1200 to 1400 amino acid residues long; it is 1273 residues in SARS-CoV-2. It is a single-pass transmembrane protein with a short C-terminal tail on the interior of the virus, a transmembrane helix, and a large N-terminal ectodomain exposed on the virus exterior.
Spike glycoprotein forms homotrimers in which three copies of the protein interact through their ectodomains. The trimer structures have been described as club-, pear-, or petal-shaped. Each spike protein contains two regions known as S1 and S2, and in the assembled trimer the S1 regions at the N-terminal end form the portion of the protein furthest from the viral surface while the S2 regions form a flexible "stalk" containing most of the protein-protein interactions that hold the trimer in place. Both the S1 and S2 regions are involved in ACE2 receptor binding and fusion between the viral envelope and the host cell membrane.

S1

The S1 region of the spike glycoprotein is responsible for interacting with receptor molecules on the surface of the host cell in the first step of viral entry. S1 contains two domains, called the N-terminal domain (NTD) and C-terminal domain (CTD), sometimes also known as the A and B domains. Depending on the coronavirus, either or both domains may be used as receptor-binding domains. Target receptors can be very diverse, including cell surface receptor proteins and sugars such as sialic acids, which can act as receptors or coreceptors. In general, the NTD binds sugar molecules while the CTD binds proteins, with the exception of mouse hepatitis virus, which uses its NTD to interact with a protein receptor called CEACAM1. The NTD has a galectin-like protein fold, but binds sugar molecules somewhat differently than galectins. The observed binding of N-acetylneuraminic acid by the NTD, and the loss of that binding through mutation of the corresponding sugar-binding pocket in emergent variants of concern, have suggested a potential role for transient sugar binding in the zoonosis of SARS-CoV-2, consistent with prior evolutionary proposals.
The CTD is responsible for the interactions of MERS-CoV with its receptor dipeptidyl peptidase-4, and those of SARS-CoV and SARS-CoV-2 with their receptor angiotensin-converting enzyme 2. The CTD of these viruses can be further divided into two subdomains, known as the core and the extended loop or receptor-binding motif, where most of the residues that directly contact the target receptor are located. There are subtle differences, mainly in the RBM, between the SARS-CoV and SARS-CoV-2 spike proteins' interactions with ACE2. Comparisons of spike proteins from multiple coronaviruses suggest that divergence in the RBM region can account for differences in target receptors, even when the core of the S1 CTD is structurally very similar.
Within coronavirus lineages, as well as across the four major coronavirus subgroups, the S1 region is less well conserved than S2, as befits its role in interacting with virus-specific host cell receptors. Within the S1 region, the NTD is more highly conserved than the CTD.
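Conservation statements like the one above are typically backed by comparing aligned sequences. A minimal sketch of one common measure, percent identity over aligned columns, is shown below. The function name and the short sequences are illustrative assumptions; the fragments are made-up toy strings, not real spike sequences, and real analyses would start from a proper multiple sequence alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, gap-free columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy, made-up aligned fragments (NOT real spike sequences), chosen only
# to show a poorly conserved pair and a well conserved pair.
toy_variable_a = "MKT-LVNQAG"
toy_variable_b = "MRSALI-QSG"
toy_conserved_a = "AEIRASANLA"
toy_conserved_b = "AEIRASANLG"

print(percent_identity(toy_variable_a, toy_variable_b))    # 50.0
print(percent_identity(toy_conserved_a, toy_conserved_b))  # 90.0
```

Applied region by region to real alignments, a measure of this kind is what would show S1 scoring lower than S2.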

S2

The S2 region of spike glycoprotein is responsible for membrane fusion between the viral envelope and the host cell, enabling entry of the virus' genome into the cell. The S2 region contains the fusion peptide, a stretch of mostly hydrophobic amino acids whose function is to enter and destabilize the host cell membrane. S2 also contains two heptad repeat subdomains known as HR1 and HR2, sometimes called the "fusion core" region. These subdomains undergo dramatic conformational changes during the fusion process to form a six-helix bundle, a characteristic feature of the class I fusion proteins. The S2 region is also considered to include the transmembrane helix and C-terminal tail located in the interior of the virion.
Relative to S1, the S2 region is very well conserved among coronaviruses.
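The layout described in the Structure, S1 and S2 sections can be summarized as an ordered list of segments. The sketch below is only an illustration: the `Segment` class and its field names are hypothetical, the N- to C-terminal ordering of the fusion peptide, HR1 and HR2 is taken from general knowledge rather than from the text, and linkers and residue boundaries are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    region: str    # "S1" or "S2"
    location: str  # "exterior", "membrane", or "interior" of the virion


# Ordered from N-terminus to C-terminus; a simplified view, not a
# complete domain map.
SPIKE_SEGMENTS = (
    Segment("N-terminal domain (NTD)", "S1", "exterior"),
    Segment("C-terminal domain (CTD)", "S1", "exterior"),
    Segment("fusion peptide", "S2", "exterior"),
    Segment("heptad repeat 1 (HR1)", "S2", "exterior"),
    Segment("heptad repeat 2 (HR2)", "S2", "exterior"),
    Segment("transmembrane helix", "S2", "membrane"),
    Segment("C-terminal tail", "S2", "interior"),
)


def segments_in(region: str) -> list[str]:
    """Names of the segments assigned to the given region, in order."""
    return [s.name for s in SPIKE_SEGMENTS if s.region == region]


print(segments_in("S1"))
print(segments_in("S2"))
```

Treating the transmembrane helix and C-terminal tail as part of S2 follows the convention stated above.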

Post-translational modifications

Spike glycoprotein is heavily glycosylated through N-linked glycosylation. Studies of the SARS-CoV-2 spike protein have also reported O-linked glycosylation in the S1 region. The C-terminal tail, located in the interior of the virion, is enriched in cysteine residues and is palmitoylated.
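Candidate N-linked glycosylation sites are usually located by scanning for the N-X-S/T sequon, where X is any residue except proline. The sketch below shows that scan; the sequon rule is general background rather than something stated in the text, the function name is hypothetical, and the test string is a toy sequence. A sequon is necessary but not sufficient, so hits are only candidates.

```python
import re

# N-X-S/T sequon, with X being any residue except proline.
# The zero-width lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of the asparagine in each candidate sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Toy sequence: "NAT" and "NGT" match, "NPS" is excluded by the proline.
print(find_sequons("ANATNPSNGT"))  # [2, 8]
```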
Spike proteins are activated through proteolytic cleavage. They are cleaved by host cell proteases at the S1/S2 boundary and later at what is known as the S2' site, at the N-terminus of the fusion peptide. This cleavage may occur upon receptor binding, or the spike protein may be pre-cleaved, for example by furin at a furin cleavage site if one is present.
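Whether a furin cleavage site "is present" is often screened for with a simple motif search. The sketch below looks for the minimal R-X-X-R pattern; that pattern, the function name and the example fragment are assumptions not drawn from the text. The fragment is a short stretch widely reported around the SARS-CoV-2 S1/S2 junction and is included only for illustration. A motif match suggests, but does not prove, that furin cleaves there.

```python
import re

# Minimal furin recognition pattern R-X-X-R; cleavage would occur just
# after the final arginine. Lookahead allows overlapping matches.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def furin_like_sites(seq: str) -> list[tuple[str, int]]:
    """Return (motif, cut_index) pairs.

    cut_index is the 0-based index of the first residue after the motif,
    i.e. the residue that would start the C-terminal cleavage product.
    """
    return [(m.group(1), m.start() + 4)
            for m in FURIN_MINIMAL.finditer(seq.upper())]


# Assumed illustrative fragment around the SARS-CoV-2 S1/S2 junction.
fragment = "TNSPRRARSVASQ"
for motif, cut in furin_like_sites(fragment):
    print(motif, fragment[:cut], "|", fragment[cut:])
```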

Conformational change

Like other class I fusion proteins, the spike protein undergoes a very large conformational change during the fusion process. Both the pre-fusion and post-fusion states of several coronaviruses, especially SARS-CoV-2, have been studied by cryo-electron microscopy. Functionally important protein dynamics have also been observed within the pre-fusion state, in which the orientations of some of the S1 regions relative to S2 in a trimer can vary. In the closed state, all three S1 regions are packed closely and the region that makes contact with host cell receptors is sterically inaccessible. In the open states, one or two S1 receptor-binding domains (RBDs) adopt an open or "up" conformation and are more accessible for receptor binding.
Figure: Transmission electron micrograph of a SARS-CoV-2 virion, showing the characteristic "corona" appearance with the spike proteins forming prominent projections from the surface of the virion.

Expression and localization

The gene encoding the spike protein is located toward the 3' end of the virus's positive-sense RNA genome, along with the genes for the other three structural proteins and various virus-specific accessory proteins. Protein trafficking of spike proteins appears to depend on the coronavirus subgroup: when expressed in isolation without other viral proteins, spike proteins from betacoronaviruses are able to reach the cell surface, while those from alphacoronaviruses and gammacoronaviruses are retained intracellularly. In the presence of the M protein, spike protein trafficking is altered and the protein is instead retained at the ERGIC, the site at which viral assembly occurs. In SARS-CoV-2, both the M and the E protein modulate spike protein trafficking through different mechanisms.
The spike protein is not required for viral assembly or the formation of virus-like particles; however, presence of spike may influence the size of the envelope. Incorporation of the spike protein into virions during assembly and budding is dependent on protein-protein interactions with the M protein through the C-terminal tail. Examination of virions using cryo-electron microscopy suggests that there are approximately 25 to 100 spike trimers per virion.

Function

The spike protein is responsible for viral entry into the host cell, a required early step in viral replication, and is therefore essential for replication. It performs this function in two steps, first binding to a receptor on the surface of the host cell through interactions with the S1 region, and then fusing the viral and cellular membranes through the action of the S2 region. The location of fusion varies depending on the specific coronavirus, with some able to enter at the plasma membrane and others entering from endosomes after endocytosis.

Attachment

The interaction of the receptor-binding domain in the S1 region with its target receptor on the cell surface initiates the process of viral entry. Coronaviruses vary widely in their target receptors: some bind sugar molecules such as sialic acids, while others form protein-protein interactions with proteins exposed on the cell surface. Some, such as SARS-CoV and HCoV-NL63, use the same receptor despite having widely divergent spike proteins. The presence of a target receptor that S1 can bind is a determinant of host range and cell tropism. Human serum albumin binds to the S1 region, competing with ACE2 and therefore restricting viral entry into cells.
Species | Genus | Receptor
Human coronavirus 229E | Alphacoronavirus | Aminopeptidase N
Human coronavirus NL63 | Alphacoronavirus | Angiotensin-converting enzyme 2
Human coronavirus HKU1 | Betacoronavirus | N-acetyl-9-O-acetylneuraminic acid
Human coronavirus OC43 | Betacoronavirus | N-acetyl-9-O-acetylneuraminic acid
Middle East respiratory syndrome–related coronavirus | Betacoronavirus | Dipeptidyl peptidase-4
Severe acute respiratory syndrome coronavirus | Betacoronavirus | Angiotensin-converting enzyme 2
Severe acute respiratory syndrome coronavirus 2 | Betacoronavirus | Angiotensin-converting enzyme 2 and N-acetylneuraminic acid
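The receptor table above can also be read as a lookup structure. The sketch below encodes its rows and groups viruses by receptor, which makes the point that receptor usage does not follow genus: the ACE2 group contains both an alphacoronavirus and betacoronaviruses. The variable and function names are illustrative assumptions; the data are only the rows listed above.

```python
from collections import defaultdict

# (species, genus, receptors) rows transcribed from the table above.
RECEPTOR_TABLE = [
    ("Human coronavirus 229E", "Alphacoronavirus",
     ["Aminopeptidase N"]),
    ("Human coronavirus NL63", "Alphacoronavirus",
     ["Angiotensin-converting enzyme 2"]),
    ("Human coronavirus HKU1", "Betacoronavirus",
     ["N-acetyl-9-O-acetylneuraminic acid"]),
    ("Human coronavirus OC43", "Betacoronavirus",
     ["N-acetyl-9-O-acetylneuraminic acid"]),
    ("Middle East respiratory syndrome–related coronavirus", "Betacoronavirus",
     ["Dipeptidyl peptidase-4"]),
    ("Severe acute respiratory syndrome coronavirus", "Betacoronavirus",
     ["Angiotensin-converting enzyme 2"]),
    ("Severe acute respiratory syndrome coronavirus 2", "Betacoronavirus",
     ["Angiotensin-converting enzyme 2", "N-acetylneuraminic acid"]),
]


def viruses_by_receptor(rows):
    """Map each receptor to the list of (species, genus) pairs that use it."""
    grouped = defaultdict(list)
    for species, genus, receptors in rows:
        for receptor in receptors:
            grouped[receptor].append((species, genus))
    return dict(grouped)


by_receptor = viruses_by_receptor(RECEPTOR_TABLE)
ace2_users = by_receptor["Angiotensin-converting enzyme 2"]
for species, genus in ace2_users:
    print(f"{species} ({genus})")
```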