Structure and genome of HIV
The genome and proteins of HIV have been the subject of extensive research since the discovery of the virus in 1983. In the search for the causative agent of AIDS, it was initially believed that the virus was a form of human T-cell leukemia virus (HTLV), which was known at the time to affect the human immune system and to cause certain leukemias. However, researchers at the Pasteur Institute in Paris isolated a previously unknown and genetically distinct retrovirus from patients with AIDS, which was later named HIV. This discovery came two years after the first major cases of AIDS-associated illness were reported. Each virion comprises a viral envelope and an associated matrix enclosing a capsid, which in turn encloses two copies of the single-stranded RNA genome and several enzymes.
Structure
The complete sequence of the HIV-1 genome, extracted from infectious virions, has been solved to single-nucleotide resolution. The genome encodes a small number of viral proteins that play essential roles during the HIV life cycle; these proteins cooperate with one another and with host proteins to invade host cells and hijack their internal machinery. HIV differs in structure from other retroviruses. The HIV virion is roughly 100 nm in diameter. Its innermost region consists of a cone-shaped core that contains two copies of the single-stranded RNA (ssRNA) genome, the enzymes reverse transcriptase, integrase and protease, some minor proteins, and the major core protein.
The HIV-1 genome consists of two copies of noncovalently linked, unspliced, positive-sense single-stranded RNA enclosed by a conical capsid composed of the viral protein p24, typical of lentiviruses. Although the two RNAs are often identical, they are not independent: they form a compact dimer within the virion. Several explanations have been proposed for why two copies of RNA are packaged rather than one, and the answer is probably a combination of the following advantages:
- The two RNA strands are vital for HIV-1 recombination, which occurs during the reverse transcription step of viral replication and increases genetic diversity.
- Two copies allow reverse transcriptase to switch templates when it encounters a break in the viral RNA, so that reverse transcription can be completed without loss of genetic information.
- The dimeric RNA genome may play a structural role in viral replication.

The packaging of two copies of single-stranded RNA within a virion, combined with the production of only a single DNA provirus, is called pseudodiploidy. The RNA component is 9749 nucleotides long and bears a 5' cap, a 3' poly(A) tail, and many open reading frames (ORFs). Viral structural proteins are encoded by long ORFs, whereas smaller ORFs encode regulators of the viral life cycle: attachment, membrane fusion, replication, and assembly.
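Template switching between the two packaged RNAs can be pictured with a toy model: reverse transcriptase copies one template and, at chosen positions, jumps to the same position on the other copy, yielding a single recombinant product that mixes sequence from both. The minimal sketch below assumes two equal-length, perfectly aligned templates and caller-supplied switch positions; real switching depends on sequence homology, RNA structure and strand breaks, none of which is modeled. The function name `copy_choice` and the example sequences are hypothetical.

```python
from typing import Sequence


def copy_choice(template_a: str, template_b: str, switch_points: Sequence[int]) -> str:
    """Toy model of copy-choice recombination during reverse transcription.

    Copying starts on template_a; at each index in switch_points the
    "polymerase" jumps to the same position on the other template.
    Templates are assumed to be equal length and perfectly aligned.
    The result is written in the templates' own sense; the real product
    would be the complementary DNA strand.
    """
    if len(template_a) != len(template_b):
        raise ValueError("templates must be aligned and of equal length")
    templates = (template_a, template_b)
    current = 0  # index of the template being copied (0 = A, 1 = B)
    start = 0    # first position not yet copied
    pieces = []
    for point in sorted(switch_points):
        if not 0 < point < len(template_a):
            raise ValueError(f"switch point {point} out of range")
        pieces.append(templates[current][start:point])  # copy up to the switch
        current = 1 - current                           # jump to the other copy
        start = point
    pieces.append(templates[current][start:])           # finish on the last template
    return "".join(pieces)


# Two hypothetical, nearly identical RNAs that differ at positions 2, 6 and 10.
rna_a = "ACGUACGUACGU"
rna_b = "ACCUACCUACCU"
print(copy_choice(rna_a, rna_b, [4, 8]))  # ACGUACCUACGU
```

With switches at positions 4 and 8, the middle segment comes from the second template, so the product carries a marker from each parent, which is how recombination between co-packaged genomes generates new combinations of mutations.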
The single-stranded RNA is tightly bound to p7 nucleocapsid proteins, the late assembly protein p6, and enzymes essential to the development of the virion, such as reverse transcriptase and integrase. A host-cell lysine tRNA serves as the primer for the magnesium-dependent reverse transcriptase. The nucleocapsid associates with the genomic RNA and protects it from digestion by nucleases. Also enclosed within the virion particle are Vif, Vpr, Nef, and the viral protease. The envelope of the virion is formed by a plasma membrane of host-cell origin, supported by a matrix composed of the viral p17 protein, which ensures the integrity of the virion particle. A limited number of copies of the HIV envelope glycoprotein (Env) are found at the virion surface; each is a trimer of gp120/gp41 heterodimers. Env binds the primary host receptor, CD4, and a co-receptor (CCR5 or CXCR4), leading to viral entry into the target cell.
As the only proteins on the surface of the virus, the envelope glycoproteins are the major targets of HIV vaccine efforts. Over half of the mass of the trimeric envelope spike consists of N-linked glycans, packed densely enough to shield the underlying viral protein from neutralisation by antibodies. The spike is one of the most densely glycosylated molecules known, and this density is sufficiently high to prevent the normal maturation of glycans during biogenesis in the endoplasmic reticulum and Golgi apparatus. Most of the glycans therefore remain stalled as immature 'high-mannose' glycans, which are not normally present on secreted or cell-surface human glycoproteins. Because of this unusual processing and high density, almost all broadly neutralising antibodies identified so far bind to, or are adapted to cope with, these envelope glycans.
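N-linked glycans are attached to asparagine residues that fall within the sequon Asn-X-Ser/Thr, where X is any amino acid except proline. Counting these motifs, often called potential N-linked glycosylation sites, in an Env sequence is a common first approximation of glycan density. The minimal sketch below scans a one-letter amino-acid string using only this simple rule; the function name and the toy peptide are hypothetical, and a sequon marks only a potential site, since not every one is actually glycosylated.

```python
import re

# N-X-S/T sequon with X != P. The zero-width lookahead lets overlapping
# motifs (e.g. "NNST") be reported as separate sites.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def potential_glycosylation_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide, not a real Env sequence.
peptide = "MKNVSAQNPTGNNSTW"
print(potential_glycosylation_sites(peptide))  # [3, 12, 13]
```

The Asn at position 8 is skipped because it is followed by proline, while the adjacent asparagines at 12 and 13 each start their own sequon.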
The molecular structure of the viral spike has now been determined by X-ray crystallography and cryo-electron microscopy. These advances in structural biology were made possible by the development of stable recombinant forms of the viral spike, created by introducing an intersubunit disulphide bond and an isoleucine-to-proline mutation in gp41. These so-called SOSIP trimers not only reproduce the antigenic properties of the native viral spike but also display the same degree of immature glycosylation as the native virus. Recombinant trimeric spikes are promising vaccine candidates because they display fewer non-neutralising epitopes than recombinant monomeric gp120; such epitopes act to suppress the immune response to target epitopes.
Genome organization
HIV has several major genes coding for structural proteins that are found in all retroviruses, as well as several nonstructural genes unique to HIV. The HIV genome contains nine genes that encode fifteen viral proteins. Three of these genes are translated into precursor polyproteins: Gag (group-specific antigen), which gives rise to the proteins of the virion interior; Pol, which gives rise to the viral enzymes; and Env, which gives rise to the envelope glycoproteins of the virion. In addition, HIV encodes proteins with regulatory and auxiliary functions. HIV-1 has two essential regulatory proteins, Tat and Rev, and several accessory proteins, Nef, Vpr, Vif and Vpu, which are not essential for replication in certain tissues. The gag gene provides the basic physical infrastructure of the virus, and pol provides the basic mechanism by which retroviruses reproduce, while the others help HIV to enter the host cell and enhance its reproduction. Though they may be altered by mutation, all of these genes except tev exist in all known variants of HIV; see Genetic variability of HIV.

HIV uses a sophisticated system of differential RNA splicing to obtain nine different gene products from a genome of less than 10 kb. It produces a 9.2 kb unspliced genomic transcript encoding the Gag and Gag-Pol precursors; singly spliced 4.5 kb mRNAs encoding Env, Vif, Vpr and Vpu; and multiply spliced 2 kb mRNAs encoding Tat, Rev and Nef.
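The three transcript classes described above amount to a small lookup table from mRNA class to the proteins it encodes. The sketch below records the approximate sizes and products exactly as given in this section and reports which class carries a given gene; the `TranscriptClass` layout and the function name are illustrative assumptions, not a standard data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptClass:
    name: str
    approx_size_kb: float
    products: tuple[str, ...]


# Approximate sizes and encoded products as described in the text.
# Pol is translated as part of a Gag-Pol fusion from the unspliced mRNA.
HIV1_TRANSCRIPTS = (
    TranscriptClass("unspliced", 9.2, ("Gag", "Pol")),
    TranscriptClass("singly spliced", 4.5, ("Env", "Vif", "Vpr", "Vpu")),
    TranscriptClass("multiply spliced", 2.0, ("Tat", "Rev", "Nef")),
)


def transcript_class_for(protein: str) -> TranscriptClass:
    """Return the mRNA class that encodes the given protein (case-insensitive)."""
    for tc in HIV1_TRANSCRIPTS:
        if protein.lower() in (p.lower() for p in tc.products):
            return tc
    raise KeyError(f"{protein!r} is not listed in any transcript class")


for gene in ("Gag", "Vpu", "Nef"):
    tc = transcript_class_for(gene)
    print(f"{gene}: {tc.name} (~{tc.approx_size_kb} kb)")
```

Asking for a gene absent from HIV-1, such as Vpx, raises `KeyError` rather than silently returning a class.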
| Class | Gene name | Primary protein products | Processed protein products | Present in HIV-1 | Present in HIV-2 |
| --- | --- | --- | --- | --- | --- |
| Viral structural proteins | gag | Gag polyprotein | MA, CA, SP1, NC, SP2, p6 | Yes | Yes |
| Viral structural proteins | pol | Pol polyprotein | PR, RT, RNase H, IN | Yes | Yes |
| Viral structural proteins | env | gp160 | gp120, gp41 | Yes | Yes |
| Essential regulatory elements | tat | Tat | | Yes | Yes |
| Essential regulatory elements | rev | Rev | | Yes | Yes |
| Accessory regulatory proteins | nef | Nef | | Yes | Yes |
| Accessory regulatory proteins | vpr | Vpr | | Yes | Yes |
| Accessory regulatory proteins | vif | Vif | | Yes | Yes |
| Accessory regulatory proteins | vpu | Vpu | | Yes | No |
| Accessory regulatory proteins | vpx | Vpx | | No | Yes |
Viral structural proteins
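- gag codes for the precursor Gag polyprotein, which is processed by the viral protease during maturation into MA (matrix protein, p17), CA (capsid protein, p24), SP1 (spacer peptide 1, p2), NC (nucleocapsid protein, p7), SP2 (spacer peptide 2, p1) and p6.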
- pol codes for the viral enzymes: protease (PR), reverse transcriptase (RT) with its RNase H domain, and integrase (IN). HIV protease is required to cleave the precursor Gag polyprotein to produce the structural proteins, RT is required to synthesize DNA from the RNA template, and IN is necessary to integrate the double-stranded viral DNA into the host genome. RT is produced both in a form that includes the RNase H domain (p66) and in a shorter form lacking it (p51); p66 and p51 together form the active RT heterodimer.
- env codes for gp160, which is cleaved by a host protease, furin, in the Golgi apparatus of the host cell. This post-translational processing produces a surface glycoprotein, gp120 or SU, which attaches to the CD4 receptors present on lymphocytes, and a transmembrane glycoprotein, gp41 or TM, which is embedded in the viral envelope and enables the virus to attach to and fuse with target cells.
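Maturation cuts each polyprotein into its domains, which lie in a fixed N- to C-terminal order; for Gag that order is MA, CA, SP1, NC, SP2, p6. The sketch below shows only the bookkeeping: given a precursor sequence and the positions of its cut sites, it returns the named fragments. The cut positions and the toy sequence are hypothetical placeholders rather than real Gag coordinates, and the model ignores the order and rate at which the viral protease actually processes each site.

```python
from typing import Sequence

GAG_DOMAINS = ("MA", "CA", "SP1", "NC", "SP2", "p6")  # N- to C-terminal order


def cleave(precursor: str, cut_sites: Sequence[int],
           names: Sequence[str] = GAG_DOMAINS) -> dict[str, str]:
    """Split a polyprotein at the given cut sites and name each fragment.

    cut_sites are 0-based indices of the first residue of each new fragment,
    so n names need exactly n - 1 strictly increasing cut sites.
    """
    if len(cut_sites) != len(names) - 1:
        raise ValueError("need exactly one fewer cut site than domain names")
    bounds = [0, *cut_sites, len(precursor)]
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("cut sites must be strictly increasing and inside the sequence")
    return {name: precursor[start:end]
            for name, start, end in zip(names, bounds, bounds[1:])}


# Hypothetical toy precursor and cut sites (NOT real Gag coordinates).
toy_gag = "MMMMCCCCCCSSNNNNTTPPP"
print(cleave(toy_gag, [4, 10, 12, 16, 18]))
```

Passing a different `names` tuple lets the same helper label fragments of another precursor, provided the names are listed in N- to C-terminal order.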
Essential regulatory elements
- tat plays an important role in regulating transcription of the viral genome, ensuring efficient synthesis of viral mRNAs, and regulating the release of virions from infected cells. Tat is expressed both as a 72-amino-acid one-exon form and as an 86–101-amino-acid two-exon form, and plays an important role early in HIV infection. Tat binds to a bulged RNA stem-loop structure at the 5' end of viral transcripts, in the region encoded by the 5' LTR, known as the trans-activation response element (TAR).
- rev: The Rev protein binds to viral RNA via an arginine-rich RNA-binding motif that also acts as a nuclear localization signal (NLS), required for the transport of Rev from the cytosol to the nucleus during viral replication. Rev recognizes a complex stem-loop structure, the HIV Rev response element (RRE), located within the env region of viral mRNAs, in the intron separating the coding exons of Tat and Rev. Rev is important for the synthesis of the major viral proteins, because it mediates the nuclear export of the unspliced and singly spliced viral RNAs that encode them, and is hence essential for viral replication.