Predicting protein three-dimensional (3D) structures given a linear sequence of amino acids.
The landscape of 3D protein structure prediction has changed dramatically since the publication (see references below) and subsequent release of the AlphaFold2 prediction method. Many articles, blog posts and videos on the topic can easily be found with a simple web search.
This post assembles a few interesting resources that may help clarify what AlphaFold is, ahead of one or more future blog entries that will provide more practical information.
Method of the year
In the past year, the deep-learning-based methods AlphaFold2 and RoseTTAfold have managed to achieve this feat over a range of targets, forever altering the course of the structural biology field. More impressively, a collaboration between the European Molecular Biology Laboratory and DeepMind has predicted structures for over 350,000 proteins for 21 model organisms and made them freely available at the AlphaFold Protein Structure Database – with plans for expanding predictions to millions of structures in 2022. For these remarkable achievements, we have chosen protein structure prediction as the Method of the Year 2021. (Excerpt from Nat Methods 19, 1 (2022).)
4 Videos
The inside story
The Human adventure. 8min video “AlphaFold: The making of a scientific breakthrough” – The inside story of the DeepMind team of scientists and engineers who created AlphaFold.
December 1, 2020: Best video explaining the Nature paper (Senior et al., 2020) by AI researcher Yannic Kilcher, Ph.D. (ETH Zurich’s data analytics lab.) (54min)
0:00 – Intro & Overview
3:10 – Proteins & Protein Folding
14:20 – AlphaFold 1 Overview
18:20 – Optimizing a differentiable geometric model at inference
25:40 – Learning the Spatial Graph Distance Matrix
31:20 – Multiple Sequence Alignment of Evolutionarily Similar Sequences
39:40 – Distance Matrix Output Results
43:45 – Guessing AlphaFold 2 (it’s Transformers)
53:30 – Conclusion & Comments
2019 lecture
Nature paper author Andrew Senior (1h02). Andrew Senior is a research scientist at Google DeepMind and team lead on the AlphaFold project. This talk was recorded at the University of Washington on August 19, 2019.
00:01:25 – Protein structure prediction at DeepMind
00:05:05 – Protein folding problem (overview)
00:07:45 – CASP13 (overview)
00:12:28 – CASP13 results
00:14:55 – AlphaFold system (overview)
00:18:01 – Key aspects of AlphaFold
00:21:00 – Deep learning (overview)
00:25:35 – Why machine learning for protein structure modelling?
00:26:29 – Predicting inter-residue distances
00:31:20 – Data used by AlphaFold
00:33:06 – Deep Dilated Convolutional Residual network
00:34:56 – Data cropping
00:37:43 – Example of an AlphaFold prediction
00:39:50 – Distogram performance on contact metrics
00:41:55 – Secondary structure and Torsion angle prediction
00:43:31 – Using deep learning to construct a reference state
00:49:23 – Accuracy vs computational cost
00:50:00 – Conclusions
00:52:10 – What’s next
00:54:50 – Q&A
Kendrew lecture 2021
John Jumper, AlphaFold lead, DeepMind. (58min).
Highly accurate protein structure prediction with AlphaFold (paper: Jumper et al. 2021)
The lecture covers the machine learning approach, details of the neural network, and the distance matrix.
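For readers unfamiliar with the term, a distance matrix simply records the distance between every pair of residues in a structure. The Python sketch below is only an illustration of that idea, not code from AlphaFold: the function names `distance_matrix` and `contact_map`, the toy coordinates, and the 8 Å contact cutoff are all assumptions chosen for this example.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dist, threshold=8.0):
    """Mark residue pairs closer than `threshold` (an assumed cutoff, in angstroms)."""
    return [[d <= threshold for d in row] for row in dist]

# Toy coordinates (in angstroms) for four residues, invented for illustration only.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
]

dist = distance_matrix(toy_coords)
contacts = contact_map(dist)

for row in dist:
    print(" ".join(f"{d:6.2f}" for d in row))
```

A prediction method that outputs such a matrix (or a probability distribution over each entry) still needs a further step to turn it into 3D coordinates; this sketch only goes in the easy direction, from coordinates to distances.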
Acronyms
CASP: Critical Assessment of Protein Structure Prediction
CASP is a community-wide, worldwide experiment in protein structure prediction that has taken place every two years since 1994.
References
Senior, A.W., Evans, R., Jumper, J. et al. (2020) Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710. https://doi.org/10.1038/s41586-019-1923-7
Jumper, J., Evans, R., Pritzel, A. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. https://doi.org/10.1038/s41586-021-03819-2
Baek, M., DiMaio, F. et al. (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373(6557), 871–876. https://dx.doi.org/10.1126/science.abj8754
Varadi, M., Anyango, S., Deshpande, M. et al. (2022) AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50(D1), D439–D444. https://doi.org/10.1093/nar/gkab1061
Tunyasuvunakool, K., Adler, J., Wu, Z. et al. (2021) Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596. https://doi.org/10.1038/s41586-021-03828-1