Summary
This post summarizes the installation of AlphaFold2 on a Macintosh with an M1-style (arm64) chip, i.e. not an Intel/AMD chip. It started from the blog post Installing Alphafold2 on Apple Silicon. The installation requires about 8 GB of disk space.
Installing AlphaFold2
AlphaFold2 is a trained machine-learning model that predicts the 3D structure of a protein from its amino acid sequence. It does not predict the folding mechanism, which remains a challenge to be solved.
- AlphaFold source code (under an open-source license): https://github.com/deepmind/alphafold
The very best set-up uses a 2.5 to 3 TB database and is soon to be available at CHTC.
A version with a limited (but still 8-15 GB) database is available via a Google Colab notebook: https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb. However, there is a 2-hour timeout, which makes it impossible to compute more complex structures.
The good news is that the LocalColab version can be installed on a local computer, even a laptop, following Yoshitaka Moriwaki’s instructions at https://github.com/YoshitakaMo/localcolabfold, which are offered for multiple platforms. As in the original post, I will describe my installation on an Apple Silicon Mac, here with 32 GB of RAM and an Apple M1 Max chip (the original post's author had a 64 GB version).
Here we’ll also see how it is possible to install without an Admin password.
Requirement: the installation is done from /Applications/Utilities/Terminal.app, or simply Terminal.
Step 1 install homebrew
This step is rather involved, but can help with many other projects, way beyond AlphaFold!
Homebrew is a package manager that installs a large set of software, but the typical installation is for all users and requires an Admin password. There is a way to install homebrew in one’s local area on the computer without Admin privileges.
- See Untar anywhere (unsupported) on the official page https://docs.brew.sh/Installation
- Clear demonstration on this 6 ½ min video: https://youtu.be/RT8Rh8yJy-w
- Useful summary page: https://www.scivision.dev/macos-homebrew-non-sudo/
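For reference, the "untar anywhere" method documented on the Homebrew installation page can be wrapped in a small shell function. This is only a sketch of an unsupported procedure: the function name install_homebrew_local and the default location $HOME/homebrew are choices made here for illustration, and the function is defined but not run.

```shell
# Sketch of Homebrew's "untar anywhere" install (unsupported by Homebrew).
# install_homebrew_local and the $HOME/homebrew default are illustrative names.
install_homebrew_local() {
    prefix="${1:-$HOME/homebrew}"
    mkdir -p "$prefix" || return 1
    # Download the Homebrew tarball and unpack it directly into $prefix
    curl -L https://github.com/Homebrew/brew/tarball/master |
        tar xz --strip-components 1 -C "$prefix"
}

# Typical use, typed by hand in Terminal:
#   install_homebrew_local "$HOME/homebrew"
#   eval "$("$HOME/homebrew/bin/brew" shellenv)"   # puts brew on the PATH
```

No Admin password is needed because everything is written below the home directory.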
Install packages for AlphaFold
The following packages are necessary to eventually install the LocalColab version of AlphaFold:
brew install wget cmake gnu-sed
brew install brewsci/bio/hh-suite
This will install Miniforge, which provides the conda command that is necessary later:
brew install --cask miniforge
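A quick way to confirm that these packages ended up on the PATH is to look for the commands they provide. The sketch below assumes the usual command names: gnu-sed installs as gsed, hh-suite provides hhsearch, and miniforge provides conda. The function name missing_tools is made up for this example.

```shell
# Print the name of every command in the argument list that is NOT on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names assumed to be provided by the packages installed above
missing="$(missing_tools wget cmake gsed hhsearch conda)"
if [ -z "$missing" ]; then
    echo "All required tools were found."
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

If conda is reported as missing right after the install, opening a new Terminal window may be enough for the PATH to pick it up.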
Install AlphaFold2 colab
Create a new directory, for example in your default home directory ($HOME), called e.g. AlphaFold, and change into it:
cd $HOME
mkdir AlphaFold
cd AlphaFold
Then we download the main installation script from GitHub with wget, installed earlier.
wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_M1mac.sh
If homebrew is installed normally, i.e. with sudo (meaning with Admin privileges), the script will work as is, assuming that homebrew is located within /opt/homebrew/. The script runs a simple presence test for homebrew and will not continue if the presence test fails.
I had to edit install_colabbatch_M1mac.sh to change the path on 2 lines:
- Line 10: the presence test line
- Line 35: the line that runs conda.sh
In both cases, simply changing /opt/ to the value of $HOME is enough. In my case I changed /opt/ to /Users/jsgro/
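The same edit can be made with sed instead of a text editor. The sketch below is an assumption-laden illustration: patch_homebrew_prefix is a made-up helper, it assumes homebrew was unpacked into a homebrew folder inside the home directory, and the two-line demo file is a stand-in, not the actual content of the installer. It writes a patched copy rather than editing in place, so the original script is kept.

```shell
# Copy an installer script, replacing the default Homebrew prefix with a local one.
# patch_homebrew_prefix is an illustrative helper, not part of localcolabfold.
patch_homebrew_prefix() {
    src="$1"
    dst="$2"
    new_home="${3:-$HOME}"
    # '|' is used as the sed delimiter because the paths contain '/'
    sed "s|/opt/homebrew|${new_home}/homebrew|g" "$src" > "$dst"
}

# Demonstration on a two-line stand-in file (NOT the real installer contents)
patch_demo="$(mktemp -d)"
printf '%s\n' \
    'type /opt/homebrew/bin/brew' \
    '. /opt/homebrew/Caskroom/miniforge/base/etc/profile.d/conda.sh' \
    > "$patch_demo/stand_in.sh"
patch_homebrew_prefix "$patch_demo/stand_in.sh" "$patch_demo/patched.sh" "/Users/example"
cat "$patch_demo/patched.sh"
```

For the real script, the call would look like patch_homebrew_prefix install_colabbatch_M1mac.sh install_local.sh, and it is worth comparing the two files with diff before running the patched one.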
As part of the installation, Python 3.9.x will be installed locally within that directory.
An important step is to add this location to the PATH, as is normally suggested at the end of the installation. For me it was:
export PATH="/Users/jsgro/AlphaFoldM1/localcolabfold/colabfold-conda/bin:$PATH"
This also makes it possible to call the command simply as colabfold_batch, which can be tested with a request for help:
colabfold_batch -h
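An export typed in Terminal only lasts for that window. To make it permanent, the line can be appended to the shell profile (~/.zshrc for zsh, the default shell on recent macOS). The sketch below uses a made-up helper, add_path_line, that appends the line only if it is not already there; the demonstration writes to a temporary file and uses a placeholder path rather than touching the real profile.

```shell
# Append an 'export PATH=...' line to a shell profile unless it is already there.
# add_path_line is an illustrative helper name.
add_path_line() {
    profile="$1"
    bindir="$2"
    line="export PATH=\"${bindir}:\$PATH\""
    touch "$profile"
    # -x: match the whole line, -F: treat the text literally, -q: quiet
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Demonstration on a temporary file instead of the real ~/.zshrc;
# the second call adds nothing because the line is already present.
demo_profile="$(mktemp)"
add_path_line "$demo_profile" "/Users/example/AlphaFold/localcolabfold/colabfold-conda/bin"
add_path_line "$demo_profile" "/Users/example/AlphaFold/localcolabfold/colabfold-conda/bin"
cat "$demo_profile"
```

Pointed at "$HOME/.zshrc" with the real bin directory, the change takes effect in every new Terminal window.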
Finalizing installation
In my case the command to predict structure failed twice, as some additional Python packages had to be installed. If the PATH was set properly within the Terminal window, the pip command installs the new packages within the local Python 3.9.x version, as it should.
% pip install opt-einsum
The correct installation information for TensorFlow was found at: https://developer.apple.com/metal/tensorflow-plugin/
% python -m pip install tensorflow-macos
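Before installing extra packages, it can be worth checking that pip really resolves to the local colabfold-conda environment and not to another Python on the system. The sketch below is one way to do that; path_is_under is a made-up helper and env_bin is a placeholder to be replaced with the bin directory added to the PATH earlier.

```shell
# Succeed if PATH_TO_CHECK lies inside directory PREFIX (plain string comparison).
# path_is_under is an illustrative helper name.
path_is_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Placeholder location of the local environment's bin directory
env_bin="/Users/example/AlphaFold/localcolabfold/colabfold-conda/bin"

# Where does the shell find pip right now? (empty if pip is not on the PATH)
pip_path="$(command -v pip 2>/dev/null || true)"
if [ -n "$pip_path" ] && path_is_under "$pip_path" "$env_bin"; then
    echo "pip resolves inside the local environment: $pip_path"
else
    echo "pip does not resolve inside $env_bin; check the PATH first"
fi
```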
Monomer run
The first run was done with the same protein as in https://www.macinchem.org/reviews/alphafold/installalphafold2.php
% cat ffa2.fa
>sp|O15552|FFAR2_HUMAN Free fatty acid receptor 2 OS=Homo sapiens OX=9606 GN=FFAR2 PE=1 SV=1
MLPDWKSSLILMAYIIIFLTGLPANLLALRAFVGRIRQPQPAPVHILLLSLTLADLLLLL
LLPFKIIEAASNFRWYLPKVVCALTSFGFYSSIYCSTWLLAGISIERYLGVAFPVQYKLS
RRPLYGVIAALVAWVMSFGHCTIVIIVQYLNTTEQVRSGNEITCYENFTDNQLDVVLPVR
LELCLVLFFIPMAVTIFCYWRFVWIMLSQPLVGAQRRRRAVGLAVVTLLNFLVCFGPYNV
SHLVGYHQRKSPWWRSIAVVFSSLNASLDPLLFYFSSSVVRRAFGRGLQVLRNQGSSLLG
RRGKDTAEGTNEDRGVGQGEGMPSSDFTTE
NOTE: AlphaFold takes the WHOLE first line as a name: in the future make it short!
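Shortening the header can be automated. The sketch below uses awk to reduce a UniProt-style header (the format of the first line above) to its entry name and to cut any other header at the first space; shorten_fasta_headers is a made-up helper name, and the demonstration uses only the first sequence line of ffa2.fa.

```shell
# Rewrite FASTA headers to a short name; sequence lines pass through unchanged.
# UniProt-style ">db|accession|ENTRY description" becomes ">ENTRY";
# any other header is cut at the first space.
shorten_fasta_headers() {
    awk -F'|' '
        /^>/ {
            if (NF >= 3) { split($3, part, " "); print ">" part[1] }
            else         { split($0, part, " "); print part[1] }
            next
        }
        { print }
    ' "$1"
}

# Demonstration: header plus the first sequence line from ffa2.fa
demo_fa="$(mktemp)"
printf '%s\n' \
    '>sp|O15552|FFAR2_HUMAN Free fatty acid receptor 2 OS=Homo sapiens OX=9606 GN=FFAR2 PE=1 SV=1' \
    'MLPDWKSSLILMAYIIIFLTGLPANLLALRAFVGRIRQPQPAPVHILLLSLTLADLLLLL' \
    > "$demo_fa"
shorten_fasta_headers "$demo_fa"    # first line printed: >FFAR2_HUMAN
```

The output can be redirected to a new file, which is then passed to colabfold_batch in place of the original.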
Fold command:
% colabfold_batch --amber --templates --num-recycle 3 /Users/jsgro/AlphaFoldM1/ffa2.fa FFA2output
The sequence is 330 aa long. The run was successful and provided 5 models. Since there are no GPUs, the run took almost 6 hours: 8:48 PM till 2:37 AM (i.e. 5h 49min).
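Because the run crosses midnight, the elapsed time is easy to get wrong by hand. A minimal sketch of the arithmetic in shell, with made-up helper names elapsed_minutes and format_hm, assuming the run lasted less than 24 hours:

```shell
# Minutes elapsed between two 24-hour clock times, allowing for a run that
# crosses midnight (assumes the run lasted less than 24 hours).
elapsed_minutes() {
    start=$(( $1 * 60 + $2 ))
    end=$(( $3 * 60 + $4 ))
    echo $(( (end - start + 1440) % 1440 ))
}

# Format a number of minutes as "Xh Ymin"
format_hm() {
    echo "$(( $1 / 60 ))h $(( $1 % 60 ))min"
}

# 8:48 PM = 20:48, 2:37 AM = 2:37 (pass numbers without leading zeros)
format_hm "$(elapsed_minutes 20 48 2 37)"    # prints 5h 49min
```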
This sequence has not been solved by X-ray, NMR, or cryo-EM and appears on the UniProt web site only as a predicted structure, taken from the AlphaFold database. However, BLAST shows that this sequence matches rather well with *Chain R, Probable G-protein coupled receptor 174 [Homo sapiens]* (PDB: 7XV3). It is also similar to “Free fatty acid receptor 1”, which has been crystallized (PDB 4PHU, and others).
Multimer run
I tested a multimer of alpha and beta chains (2 each) of human hemoglobin. Note that the file structure for the local colab version of AlphaFold is different from that for the regular AlphaFold: there has to be only one > symbol at the top, and the sequences are separated by a colon (:). The sequences below are ordered as alpha:beta:alpha:beta.
>HBA_HUMAN_2a
MALSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:
MAHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
TPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVD
PENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH:
MALSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:
MAHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
TPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVD
PENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
The command was:
% colabfold_batch --amber --templates --num-recycle 3 --model-type alphafold2_multimer_v3 /Users/jsgro/AlphaFoldM1/multimer.fa HBAResults
The alpha sequences contain 142 aa and the beta sequences contain 147 aa.
Computations started at 10:14 PM and finished successfully at 1:31 AM (3h 17min).
The shorter size of each sequence allowed for faster computation in spite of the multimeric form.
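The colon-separated input file described above can also be assembled from the individual chains instead of being pasted together by hand. The sketch below uses a made-up helper, make_multimer_fasta, and writes all chains on a single line; it assumes that a single-line sequence is read the same way as the multi-line layout used above. The demo chains are truncated to their first 10 residues to keep the example short.

```shell
# Print a ColabFold-style multimer FASTA: one header, chains joined by ':'.
# make_multimer_fasta is an illustrative helper name.
make_multimer_fasta() {
    name="$1"
    shift
    joined=""
    for chain in "$@"; do
        if [ -z "$joined" ]; then
            joined="$chain"
        else
            joined="${joined}:${chain}"
        fi
    done
    printf '>%s\n%s\n' "$name" "$joined"
}

# Truncated stand-ins: only the first 10 residues of each hemoglobin chain
alpha="MALSPADKTN"
beta="MAHLTPEEKS"
make_multimer_fasta HBA_HUMAN_2a "$alpha" "$beta" "$alpha" "$beta"
```

With the full-length sequences, the output redirected to a file gives the same alpha:beta:alpha:beta ordering as multimer.fa.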
The resulting quaternary structure appears very well defined. However, it only contains the protein and is lacking the heme and iron.
It compares favorably to a crystal structure, here PDB 1SI4.
Conclusion
It is remarkable that this works on a laptop. Of course, using an NVIDIA GPU on another computer or on a Linux cluster would provide faster computation.
Hard Drive Space
The disk space used for the complete installation is:
- Local folder (contains local Python, scripts etc.): 2.13 GB
- Cached “params” parameters for AlphaFold: 5.59 GB
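The two figures add up to about 7.7 GB, consistent with the roughly 8 GB quoted in the summary. Numbers like these can be obtained with du; the sketch below wraps it in a made-up helper, report_sizes, and runs it on a temporary directory, since the location of the folders differs from one installation to another.

```shell
# Print the disk usage of each existing directory given as an argument.
# report_sizes is an illustrative helper name.
report_sizes() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # -s: one total per directory, -h: human-readable units
            du -sh "$dir"
        else
            echo "skipped (not a directory): $dir"
        fi
    done
}

# Demonstration on a temporary directory; replace with the real folders
size_demo="$(mktemp -d)"
printf 'example\n' > "$size_demo/file.txt"
report_sizes "$size_demo" "$size_demo/does_not_exist"
```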
FAQs
From https://zzun.app/repo/YoshitakaMo-localcolabfold-python-miscellaneous