AlphaFold


AlphaFold is an artificial intelligence program developed by DeepMind, a subsidiary of Alphabet, which performs predictions of protein structure. It is designed using deep learning techniques.
AlphaFold 1 placed first in the overall rankings of the 13th Critical Assessment of Structure Prediction in December 2018. It was particularly successful at predicting the most accurate structures for targets rated as most difficult by the competition organizers, where no existing template structures were available from proteins with partially similar sequences.
AlphaFold 2 repeated this placement in the CASP14 competition in November 2020. It achieved a level of accuracy much higher than any other entry. It scored above 90 on CASP's global distance test for approximately two-thirds of the proteins, a test measuring the similarity between a computationally predicted structure and the experimentally determined structure, where 100 represents a complete match. The inclusion of metagenomic data has improved the quality of the prediction of multiple sequence alignments. One of the biggest sources of the training data was the custom-built Big Fantastic Database of 65,983,866 protein families, represented as multiple sequence alignments and Hidden Markov models, covering 2,204,359,010 protein sequences from reference databases, metagenomes, and metatranscriptomes.
AlphaFold 2's results at CASP14 were described as "astounding" and "transformational". However, some researchers noted that the accuracy was insufficient for a third of its predictions, and that it did not reveal the underlying mechanism or rules of protein folding for the protein folding problem, which remains unsolved.
Despite this, the technical achievement was widely recognized. On 15 July 2021, the AlphaFold 2 paper was published in Nature as an advance access publication alongside open source software and a searchable database of species proteomes. As of November 2025, the paper had been cited nearly 43,000 times.
AlphaFold 3 was announced on 8 May 2024. It can predict the structure of complexes created by proteins with DNA, RNA, various ligands, and ions. The new prediction method shows a minimum 50% improvement in accuracy for protein interactions with other molecules compared to existing methods.
Demis Hassabis and John Jumper of Google DeepMind shared one half of the 2024 Nobel Prize in Chemistry, awarded "for protein structure prediction," while the other half went to David Baker "for computational protein design." Hassabis and Jumper had previously won the Breakthrough Prize in Life Sciences and the Albert Lasker Award for Basic Medical Research in 2023 for their leadership of the AlphaFold project.

Background

s consist of chains of amino acids which spontaneously fold to form the three dimensional structures of the proteins. The 3-D structure is crucial to understanding the biological function of the protein.
Protein structures can be determined experimentally through techniques such as X-ray crystallography, cryo-electron microscopy and nuclear magnetic resonance, which are all expensive and time-consuming. Such efforts, using the experimental methods, have identified the structures of about 170,000 proteins over the last 60 years, while there are over 200 million known proteins across all life forms.
Over the years, researchers have applied numerous computational methods to predict the 3D structures of proteins from their amino acid sequences, accuracy of such methods in best possible scenario is close to experimental techniques by the use of homology modeling based on molecular evolution. CASP, which was launched in 1994 to challenge the scientific community to produce their best protein structure predictions, found that GDT scores of only about 40 out of 100 can be achieved for the most difficult proteins by 2016. AlphaFold started competing in the 2018 CASP using an artificial intelligence deep learning technique.

Algorithm

DeepMind is known to have trained the program on over 170,000 proteins from the Protein Data Bank, a public repository of protein sequences and structures. The program uses a form of attention network, a deep learning technique that focuses on having the AI identify parts of a larger problem, then piece it together to obtain the overall solution. The overall training was conducted on processing power between 100 and 200 GPUs.

AlphaFold 1 (2018)

AlphaFold 1 was built on work developed by various teams in the 2010s, work that looked at the large databanks of related DNA sequences now available from many different organisms, to try to find changes at different residues that appeared to be correlated, even though the residues were not consecutive in the main chain. Such correlations suggest that the residues may be close to each other physically, even though not close in the sequence, allowing a contact map to be estimated. Building on recent work prior to 2018, AlphaFold 1 extended this by estimating a probability distribution for the distances between residues, effectively transforming the contact map into a distance map. It also used more advanced learning methods than previously to develop the inference.

AlphaFold 2 (2020)

The 2020 version of the program is significantly different from the original version that won CASP 13 in 2018, according to the team at DeepMind.
AlphaFold 1 used a number of separately trained modules to produce a guide potential, which was then combined with a physics-based energy potential. AlphaFold 2 replaced this with a system of interconnected sub-networks, forming a single, differentiable, end-to-end model based on pattern recognition. This model was trained in an integrated manner. After the neural network's prediction converges, a final refinement step applies local physical constraints using energy minimization based on the AMBER force field. This step only slightly adjusts the predicted structure.
A key part of the 2020 system are two modules, believed to be based on a transformer design, which are used to progressively refine a vector of information for each relationship between an amino acid residue of the protein and another amino acid residue ; and between each amino acid position and each different sequences in the input sequence alignment. Internally these refinement transformations contain layers that have the effect of bringing relevant data together and filtering out irrelevant data for these relationships, in a context-dependent way, learned from training data. These transformations are iterated, the updated information output by one step becoming the input of the next, with the sharpened residue/residue information feeding into the update of the residue/sequence information, and then the improved residue/sequence information feeding into the update of the residue/residue information. As the iteration progresses, according to one report, the "attention algorithm... mimics the way a person might assemble a jigsaw puzzle: first connecting pieces in small clumps—in this case clusters of amino acids—and then searching for ways to join the clumps in a larger whole."
The output of these iterations then informs the final structure prediction module, which also uses transformers, and is itself then iterated. In an example presented by DeepMind, the structure prediction module achieved a correct topology for the target protein on its first iteration, scored as having a GDT_TS of 78, but with a large number of stereochemical violations – i.e. unphysical bond angles or lengths. With subsequent iterations the number of stereochemical violations fell. By the third iteration the GDT_TS of the prediction was approaching 90, and by the eighth iteration the number of stereochemical violations was approaching zero.
The training data was originally restricted to single peptide chains. However, the October 2021 update, named AlphaFold-Multimer, included protein complexes in its training data. DeepMind stated this update succeeded about 70% of the time at accurately predicting protein-protein interactions.

AlphaFold 3 (2024)

Announced on 8 May 2024, AlphaFold 3 was co-developed by Google DeepMind and Isomorphic Labs, both subsidiaries of Alphabet. AlphaFold 3 is not limited to single-chain proteins, as it can also predict the structures of protein complexes with DNA, RNA, post-translational modifications and selected ligands and ions.
AlphaFold 3 introduces the "Pairformer," a deep learning architecture inspired by the transformer, which is considered similar to, but simpler than, the Evoformer used in AlphaFold 2. The Pairformer module's initial predictions are refined by a diffusion model. This model begins with a cloud of atoms and iteratively refines their positions, guided by the Pairformer's output, to generate a 3D representation of the molecular structure.
The AlphaFold server was created to provide free access to AlphaFold 3 for non-commercial research. As of November 2025, the AlphaFold 3 research paper has been directly cited more than 9,000 times.

Competitions

CASP13

In December 2018, DeepMind's AlphaFold placed first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction.
The program was particularly successfully predicting the most accurate structure for targets rated as the most difficult by the competition organisers, where no existing template structures were available from proteins with a partially similar sequence. AlphaFold gave the best prediction for 25 out of 43 protein targets in this class, achieving a median score of 58.9 on the CASP's global distance test score, ahead of 52.5 and 52.4 by the two next best-placed teams, who were also using deep learning to estimate contact distances. Overall, across all targets, AlphaFold 1 achieved a GDT score of 68.5.
In January 2020, implementations and illustrative code of AlphaFold 1 was released open-source on GitHub. but, as stated in the "Read Me" file on that website: "This code can't be used to predict structure of an arbitrary protein sequence. It can be used to predict structure only on the CASP13 dataset. The feature generation code is tightly coupled to our internal infrastructure as well as external tools, hence we are unable to open-source it." Therefore, in essence, the code deposited is not suitable for general use but only for the CASP13 proteins. The company has not announced plans to make their code publicly available as of 5 March 2021.