AlphaFold background

3D Protein structure prediction

The problem

Predicting protein three-dimensional (3D) structures given a linear sequence of amino acids.

The landscape of 3D protein structure prediction has changed dramatically since the publication (see references below) and subsequent release of the AlphaFold2 prediction method. Plenty of articles, blogs and videos on the subject can easily be found with a simple web search.

This post assembles a few interesting resources that may help clarify what AlphaFold is, ahead of one or more new blog entries that will provide more practical information.

Predictions by AlphaFold of two protein structures (cyan) match the experimental structures (yellow) almost perfectly. (Modified from the animated version on the DeepMind blog.)

Method of the year

In the past year, the deep-learning-based methods AlphaFold2 and RoseTTAfold have managed to achieve this feat over a range of targets, forever altering the course of the structural biology field. More impressively, a collaboration between the European Molecular Biology Laboratory and DeepMind has predicted structures for over 350,000 proteins for 21 model organisms and made them freely available at the AlphaFold Protein Structure Database — with plans for expanding predictions to millions of structures in 2022. For these remarkable achievements, we have chosen protein structure prediction as the Method of the Year 2021. (Excerpt from Nat Methods 19, 1 (2022).)

4 Videos

The inside story

The human adventure. 8 min video “AlphaFold: The making of a scientific breakthrough” – the inside story of the DeepMind team of scientists and engineers who created AlphaFold.

See also DeepMind blog: AlphaFold: a solution to a 50-year-old grand challenge in biology


Nature paper explained

December 1, 2020: Best video explaining the Nature paper (Senior et al., 2020), by AI researcher Yannic Kilcher, Ph.D. (ETH Zurich’s data analytics lab). (54 min)

0:00 – Intro & Overview
3:10 – Proteins & Protein Folding
14:20 – AlphaFold 1 Overview
18:20 – Optimizing a differentiable geometric model at inference
25:40 – Learning the Spatial Graph Distance Matrix
31:20 – Multiple Sequence Alignment of Evolutionarily Similar Sequences
39:40 – Distance Matrix Output Results
43:45 – Guessing AlphaFold 2 (it’s Transformers)
53:30 – Conclusion & Comments

2019 lecture

Nature paper author Andrew Senior (1h02). Andrew Senior is a research scientist at Google DeepMind and team lead on the AlphaFold project. This talk was recorded at the University of Washington on August 19, 2019.

00:01:25 — Protein structure prediction at DeepMind
00:05:05 — Protein folding problem (overview)
00:07:45 — CASP13 (overview)
00:12:28 — CASP13 results
00:14:55 — AlphaFold system (overview)
00:18:01 — Key aspects of AlphaFold
00:21:00 — Deep learning (overview)
00:25:35 — Why machine learning for protein structure modelling?
00:26:29 — Predicting inter-residue distances
00:31:20 — Data used by AlphaFold
00:33:06 — Deep Dilated Convolutional Residual network
00:34:56 — Data cropping
00:37:43 — Example of an AlphaFold prediction
00:39:50 — Distogram performance on contact metrics
00:41:55 — Secondary structure and Torsion angle prediction
00:43:31 — Using deep learning to construct a reference state
00:49:23 — Accuracy vs computational cost
00:50:00 — Conclusions
00:52:10 — What’s next
00:54:50 — Q&A

Kendrew lecture 2021

John Jumper, AlphaFold lead, DeepMind. (58min).
Highly accurate protein structure prediction with AlphaFold (paper: Jumper et al. 2021)

The lecture covers the machine learning and neural network details, as well as the distance matrix.
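
For readers unfamiliar with the term, a distance matrix simply records the distance between every pair of residues in a structure. The short Python sketch below is only an illustration of that idea and is not code from AlphaFold or from the lecture. The function names, the toy coordinates and the 8 Å contact cutoff (a commonly used convention) are all assumptions made for this example.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords is a list of (x, y, z) tuples, one per residue
    (for example one representative atom per residue).
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distance is symmetric
    return matrix


def contact_map(matrix, cutoff=8.0):
    """Mark residue pairs closer than cutoff (assumed 8 angstroms here)."""
    return [[d < cutoff for d in row] for row in matrix]


# Hypothetical toy coordinates in angstroms, not taken from a real protein.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (3.8, 3.8, 12.0),
]

dm = distance_matrix(toy_coords)
contacts = contact_map(dm)

for row in dm:
    print(" ".join(f"{d:6.2f}" for d in row))
```

A structure fixes its distance matrix completely, which is why predicting inter-residue distances (as discussed in the videos above) is a useful intermediate step toward predicting the 3D structure itself.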


Acronyms

CASP: Critical Assessment of Protein Structure Prediction

References

Senior, A.W., Evans, R., Jumper, J. et al. (2020) Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710. https://doi.org/10.1038/s41586-019-1923-7

Jumper, J., Evans, R., Pritzel, A. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. https://doi.org/10.1038/s41586-021-03819-2

Method of the Year 2021: Protein structure prediction. Nat Methods 19, 1 (2022). https://doi.org/10.1038/s41592-021-01380-4

Marx, V. Method of the Year: protein structure prediction. Nat Methods 19, 5–10 (2022). https://doi.org/10.1038/s41592-021-01359-1

Baek, M., DiMaio, F. et al. (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373(6557), 871–876. https://dx.doi.org/10.1126/science.abj8754

Database References

EMBL-EBI is part of the European Molecular Biology Laboratory (EMBL)

Varadi, M., Anyango, S., Deshpande, M. et al. (2022) AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50(D1), D439–D444. https://doi.org/10.1093/nar/gkab1061

Tunyasuvunakool, K., Adler, J., Wu, Z. et al. (2021) Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596. https://doi.org/10.1038/s41586-021-03828-1