This post summarizes the installation of AlphaFold2 on a Macintosh with an M1-style (arm64) chip (i.e. not an Intel/AMD chip). It started from the blog post Installing Alphafold2 on Apple Silicon. The installation requires about 8 GB of disk space.
AlphaFold2 is a trained machine-learning program that predicts the 3D structure of a protein from its amino acid sequence. It does not predict the folding mechanism itself, which remains a challenge to be solved.
- AlphaFold source code (an open-source license): https://github.com/deepmind/alphafold.
The best set-up uses a 2.5 to 3 TB database and is soon to be available at CHTC.
A version with a limited (but still 8-15 GB) database is available via a Google Colab notebook: https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb. However, its 2-hour timeout makes it impossible to compute more complex structures.
The good news is that the LocalColab version can be installed on a local computer, even a laptop, by following Yoshitaka Moriwaki’s instructions at https://github.com/YoshitakaMo/localcolabfold, which are offered for multiple platforms. As in the original post, I will describe my installation on an M1 Apple Silicon machine, here with 32 GB of RAM and an Apple M1 Max chip (the post author had a 64 GB version).
Here we’ll also see how it is possible to install everything without an Admin password.
Commands are typed in the Terminal application: /Applications/Utilities/Terminal.app
Step 1 install homebrew
This step is rather involved, but can help with many other projects, way beyond AlphaFold!
Homebrew is a package manager that can install a large set of software, but the typical installation is for all users and requires an Admin password. There is a way to install Homebrew in one’s own area on the computer without Admin privileges.
- See Untar anywhere (unsupported) on the official page: https://docs.brew.sh/Installation
- Clear demonstration on this 6 ½ min video: https://youtu.be/RT8Rh8yJy-w
- Useful summary page: https://www.scivision.dev/macos-homebrew-non-sudo/
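As a minimal sketch of the "Untar anywhere" approach, the commands below wrap the download in a function so that nothing is fetched until it is called. The target directory `$HOME/homebrew`, the variable names, and the function name `install_local_brew` are assumptions chosen for illustration, and the tarball URL (including its branch name) should be checked against the current docs.brew.sh page before use.

```shell
# Assumed location for a per-user Homebrew; any writable directory should work.
BREW_PREFIX="$HOME/homebrew"
# Tarball URL as given in the Homebrew installation docs; the branch name may change.
BREW_TARBALL_URL="https://github.com/Homebrew/brew/tarball/master"

# Hypothetical helper: download Homebrew and unpack it into $BREW_PREFIX (no sudo).
install_local_brew() {
  mkdir -p "$BREW_PREFIX" &&
    curl -L "$BREW_TARBALL_URL" |
    tar xz --strip-components 1 -C "$BREW_PREFIX"
}

# Run once:               install_local_brew
# Then, in every shell:   eval "$("$BREW_PREFIX/bin/brew" shellenv)"
```

The `brew shellenv` line puts the local `brew` on the PATH for the current Terminal session; adding it to `~/.zshrc` makes it permanent.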
Install packages for AlphaFold
The following packages are necessary to eventually install the LocalColab version of AlphaFold:
brew install wget cmake gnu-sed
brew install brewsci/bio/hh-suite
The next command will install miniconda (through the miniforge cask), which is necessary:
brew install --cask miniforge
Install AlphaFold2 colab
Create a new directory, for example in your default home directory ($HOME), called e.g. AlphaFold, and change into it:
cd $HOME
mkdir AlphaFold
cd AlphaFold
Then we download the main installation script from GitHub with wget, installed earlier.
If Homebrew is installed normally, i.e. with sudo (meaning with Admin privileges), the script will work as is, since it assumes that Homebrew is located within /opt/homebrew/. The script runs a simple presence test for Homebrew and will not continue if the presence test fails.
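To illustrate what such a presence test amounts to, here is a minimal sketch. It is not the installer's actual code: the function name `brew_present` and the variable `BREW_PREFIX` are hypothetical, and the check is simply whether an executable `brew` exists under a given prefix.

```shell
# Prefix to test; defaults to the standard Apple Silicon location.
BREW_PREFIX="${BREW_PREFIX:-/opt/homebrew}"

# Hypothetical check: succeeds only if an executable brew exists under the prefix.
brew_present() {
  [ -x "$1/bin/brew" ]
}

if brew_present "$BREW_PREFIX"; then
  echo "Homebrew found in $BREW_PREFIX"
else
  echo "Homebrew not found in $BREW_PREFIX; an installer with this test would stop here"
fi
```

With a Homebrew installed under `$HOME`, a test hard-coded to `/opt/homebrew` fails even though `brew` works, which is why the script needs editing.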
I had to edit install_colabbatch_M1mac.sh to change the path on 2 lines:
- Line 10:
- Line 35: runs
In both cases, simply changing /opt/ to the value of $HOME is enough. In my case I changed
As part of the installation, Python 3.9.x will be installed locally within that directory.
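The `/opt/` to `$HOME` path change described above can also be made with `sed` rather than by hand. The sketch below works on a made-up two-line stand-in file, not on the real installer, whose exact lines are not reproduced here; the file names and contents are hypothetical.

```shell
# Stand-in for the installer: two invented lines that hard-code /opt/homebrew.
workdir="$(mktemp -d)"
cat > "$workdir/demo_install.sh" <<'EOF'
BREW=/opt/homebrew/bin/brew
/opt/homebrew/bin/wget --version
EOF

# Point every /opt/homebrew reference at a Homebrew located under $HOME instead.
# Writing to a new file keeps the original untouched for comparison.
sed "s|/opt/homebrew|$HOME/homebrew|g" "$workdir/demo_install.sh" \
  > "$workdir/demo_install_local.sh"

cat "$workdir/demo_install_local.sh"
```

Using `|` as the `sed` delimiter avoids having to escape the slashes in the paths.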
An important step is to add this location to the PATH, which is normally suggested at the end of the installation. For me it was:
This also makes it possible to call the command simply as colabfold_batch, which can be tested by asking for its help text with the --help option.
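A minimal sketch of the PATH step follows. The directory in `COLABFOLD_BIN` is an assumption for illustration only; the real value is the one printed by the installer at the end of its run.

```shell
# Hypothetical location of the installed executables; replace with the path
# that the installer prints at the end of its run.
COLABFOLD_BIN="$HOME/AlphaFold/localcolabfold/colabfold-conda/bin"

# Prepend the directory to PATH only if it is not already there.
case ":$PATH:" in
  *":$COLABFOLD_BIN:"*) ;;
  *) PATH="$COLABFOLD_BIN:$PATH" ;;
esac
export PATH

# Show which copies of the tools the shell will now pick up.
for tool in colabfold_batch python pip; do
  command -v "$tool" || echo "$tool: not found on PATH"
done
```

If `python` and `pip` resolve inside the installation folder, packages installed later with `pip` go into the local Python rather than the system one.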
In my case the command to predict a structure failed twice, as some additional Python packages had to be installed. If the PATH is set properly within the Terminal window, the pip command installs the new packages within the local Python 3.9.x version, as it should.
% pip install opt-einsum
Correct installation info was found at: https://developer.apple.com/metal/tensorflow-plugin/
% python -m pip install tensorflow-macos
The first run was done with the same protein as in https://www.macinchem.org/reviews/alphafold/installalphafold2.php
% cat ffa2.fa
>sp|O15552|FFAR2_HUMAN Free fatty acid receptor 2 OS=Homo sapiens OX=9606 GN=FFAR2 PE=1 SV=1
MLPDWKSSLILMAYIIIFLTGLPANLLALRAFVGRIRQPQPAPVHILLLSLTLADLLLLL
LLPFKIIEAASNFRWYLPKVVCALTSFGFYSSIYCSTWLLAGISIERYLGVAFPVQYKLS
RRPLYGVIAALVAWVMSFGHCTIVIIVQYLNTTEQVRSGNEITCYENFTDNQLDVVLPVR
LELCLVLFFIPMAVTIFCYWRFVWIMLSQPLVGAQRRRRAVGLAVVTLLNFLVCFGPYNV
SHLVGYHQRKSPWWRSIAVVFSSLNASLDPLLFYFSSSVVRRAFGRGLQVLRNQGSSLLG
RRGKDTAEGTNEDRGVGQGEGMPSSDFTTE
NOTE: AlphaFold takes the WHOLE first line as a name: in the future make it short!
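One way to shorten the header before a run is a small `awk` filter. This is a sketch under the assumption that the header follows the UniProt `sp|accession|NAME description` pattern; the file names are hypothetical and only the first sequence line is included to keep the demo short.

```shell
# Demo input with a long UniProt-style header (sequence truncated for brevity).
workdir="$(mktemp -d)"
cat > "$workdir/long_header.fa" <<'EOF'
>sp|O15552|FFAR2_HUMAN Free fatty acid receptor 2 OS=Homo sapiens OX=9606 GN=FFAR2 PE=1 SV=1
MLPDWKSSLILMAYIIIFLTGLPANLLALRAFVGRIRQPQPAPVHILLLSLTLADLLLLL
EOF

# Keep only the entry name (third |-separated field of the first word);
# if the header has no such field, fall back to the first word itself.
awk '
  /^>/ {
    split($1, part, "[|]")
    short = (part[3] != "" ? part[3] : substr($1, 2))
    print ">" short
    next
  }
  { print }
' "$workdir/long_header.fa" > "$workdir/short_header.fa"

head -n 1 "$workdir/short_header.fa"
```

For the file above the header becomes `>FFAR2_HUMAN`, and the sequence lines are passed through unchanged.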
% colabfold_batch --amber --templates --num-recycle 3 /Users/jsgro/AlphaFoldM1/ffa2.fa FFA2output
The sequence is 330 aa long. The run was successful and provided 5 models. Since there are no GPUs, the run took almost 6 hours: 8:48 PM till 2:37 AM (i.e. 5 h 49 min).
The structure of this protein has not been solved by X-ray, NMR, or cryo-EM and appears on the UniProt web site only as a predicted structure, taken from the AlphaFold database. However, BLAST shows that this sequence matches rather well with *Chain R, Probable G-protein coupled receptor 174 [Homo sapiens]* (PDB: 7XV3). It is also similar to “Free fatty acid receptor 1”, which has been crystallized (PDB 4PHU, and others).
I tested a multimer of alpha and beta chains (2 each) of human hemoglobin. Note that the input file format for the local colab version of AlphaFold is different from that for the regular AlphaFold: there has to be only one > symbol at the top, and the sequences are separated by a colon (:). The sequences below are ordered as alpha:beta:alpha:beta.
>HBA_HUMAN_2a
MALSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:
MAHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
TPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVD
PENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH:
MALSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:
MAHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
TPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVD
PENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
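A file in this single-header, colon-separated layout can be assembled from an ordinary FASTA file that has one record per chain. The sketch below uses short toy sequences rather than real hemoglobin chains, and the file names and the record name `complex` are hypothetical.

```shell
# Toy multi-record FASTA: one record per chain (made-up sequences).
workdir="$(mktemp -d)"
cat > "$workdir/chains.fa" <<'EOF'
>chainA
AAAAAA
AAA
>chainB
CCCCC
EOF

# Drop the per-chain headers, join each chain's lines, and separate
# the chains with ":" under a single header line.
awk -v name="complex" '
  /^>/ { n++; next }              # each header starts a new chain
  { seq[n] = seq[n] $0 }          # append sequence lines to the current chain
  END {
    print ">" name
    for (i = 1; i <= n; i++) printf "%s%s", seq[i], (i < n ? ":" : "\n")
  }
' "$workdir/chains.fa" > "$workdir/multimer_demo.fa"

cat "$workdir/multimer_demo.fa"
```

Chains are written in the order in which they appear in the input, so an alpha:beta:alpha:beta arrangement needs the four records listed in that order.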
The command was:
% colabfold_batch --amber --templates --num-recycle 3 --model-type alphafold2_multimer_v3 /Users/jsgro/AlphaFoldM1/multimer.fa HBAResults
The alpha sequences contain 142 aa; the beta sequences contain 147 aa.
Computations started at 10:14 PM and finished successfully at 1:31 AM (3 h 17 min).
The shorter size of each sequence allowed for faster computation in spite of the multimeric form.
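The two run times quoted in this post can be checked with a small helper that computes the elapsed time between two clock readings, allowing for a run that crosses midnight. The function name `elapsed_time` is hypothetical, and times are given in 24-hour HH:MM form.

```shell
# Elapsed time between two HH:MM clock readings (24-hour), assuming the run
# lasted less than 24 hours and crossed midnight at most once.
elapsed_time() {
  printf '%s %s\n' "$1" "$2" | awk '{
    split($1, s, ":"); split($2, e, ":")
    d = (e[1] * 60 + e[2]) - (s[1] * 60 + s[2])
    if (d < 0) d += 24 * 60        # the run crossed midnight
    printf "%dh %02dmin\n", int(d / 60), d % 60
  }'
}

elapsed_time 20:48 02:37   # monomer run: 8:48 PM to 2:37 AM
elapsed_time 22:14 01:31   # tetramer run: 10:14 PM to 1:31 AM
```

For an actual run, prefixing the command with the shell's `time` keyword gives the same information without reading the clock.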
The resulting quaternary structure appears very well defined. However, it contains only the protein and lacks the heme and iron.
It favorably compares to a crystal structure, here
It is remarkable that this works on a laptop. Of course, using an NVIDIA GPU on another computer or on a Linux cluster would provide faster computation.
Hard Drive Space
The disk space used for the complete installation is:
- Local folder (contains local Python, scripts etc.): 2.13 GB
- Cached “params” parameters for AlphaFold: 5.59 GB