AlphaFold2 on Macintosh M1

Summary

This post summarizes the installation of AlphaFold2 on a Macintosh with an M1-type (arm64) Apple Silicon chip (i.e. not an Intel/AMD chip). It started from the blog post Installing Alphafold2 on Apple Silicon. The installation requires about 8 GB of disk space.

Installing AlphaFold2

AlphaFold2 is a trained machine-learning system that predicts the 3D structure of a protein from its amino acid sequence. It does not predict the folding mechanism itself, which remains an unsolved problem.

The very best set-up uses a 2.5 to 3 TB database and will soon be available at CHTC.

A version with a limited (but still 8-15 GB) database is available via a Google Colab notebook: https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb. However, there is a 2-hour timeout, which makes it impossible to compute more complex structures.

The good news is that the LocalColab version can be installed on a local computer, even a laptop, following Yoshitaka Moriwaki’s instructions at https://github.com/YoshitakaMo/localcolabfold, which are offered for multiple platforms. As in the original post, I will describe my installation on Apple Silicon: an Apple M1 Max chip with 32 GB RAM (the author of the original post had a 64 GB version).

Here we’ll also see how it is possible to install without an Admin password.

Requirement: use /Applications/Utilities/Terminal.app (or simply Terminal) to install.

Step 1 install homebrew

This step is rather involved, but can help with many other projects, way beyond AlphaFold!

Homebrew is a package manager that can install a large set of software, but the typical installation is for all users and requires an Admin password. There is a way to install homebrew in one’s local area on the computer without Admin privileges.
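One documented route is the "untar anywhere" method from the Homebrew installation notes, which the Homebrew project itself describes as unsupported. The sketch below is a minimal version of it, assuming the target directory is $HOME/homebrew (an assumption; any user-writable directory should work). The function name install_local_homebrew is made up for this illustration.

```shell
# Assumed install location; change it if you prefer another directory.
BREW_PREFIX="$HOME/homebrew"

install_local_homebrew() {
    mkdir -p "$BREW_PREFIX" || return 1
    # Download the current Homebrew source tree and unpack it in place.
    curl -L https://github.com/Homebrew/brew/tarball/master \
        | tar xz --strip-components 1 -C "$BREW_PREFIX" || return 1
    # Put this copy of brew on PATH for the current Terminal session.
    eval "$("$BREW_PREFIX/bin/brew" shellenv)"
    brew update --force --quiet
}

# Nothing is downloaded until the function is called explicitly:
#   install_local_homebrew
```

Because nothing is written outside $HOME, no Admin password is requested. Some packages may need to be built from source when Homebrew does not live in its default location, so installs can take longer.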

Install packages for AlphaFold

The following packages are necessary to eventually install the LocalColab version of AlphaFold:

brew install wget cmake gnu-sed
brew install brewsci/bio/hh-suite

This will install miniforge, a minimal conda distribution, which is necessary:

brew install --cask miniforge
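Before going further it can help to confirm that the tools are actually reachable from the Terminal. The sketch below lists any that are missing; the executable names are assumptions about what each package provides (gsed for gnu-sed, hhblits for hh-suite, conda for miniforge), and missing_tools is a helper name invented here.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names for the packages installed above.
missing=$(missing_tools wget cmake gsed hhblits conda)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required tools found."
fi
```

If a tool is reported missing right after a local Homebrew install, the likely cause is that the brew bin directory has not been added to PATH in the current window.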

Install AlphaFold2 colab

Create a new directory, for example in your default home directory ($HOME), called e.g. AlphaFold, and change into it:

cd $HOME
mkdir AlphaFold
cd AlphaFold

Then we download the main installation script from GitHub with wget, installed earlier.

wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_M1mac.sh

If homebrew is installed normally, i.e. with sudo (meaning with Admin privileges), the script will work as is, since it assumes that homebrew is located within /opt/homebrew/. The script runs a simple presence test for homebrew and will not continue if the test fails.

I had to edit install_colabbatch_M1mac.sh to change the path on 2 lines:

  1. Line 10: presence test line
  2. Line 35: runs conda.sh

In both cases simply changing /opt/ to the value of $HOME is enough. In my case I changed /opt/ to /Users/jsgro/
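The same edit can be made from the command line instead of a text editor. This is a sketch only: it assumes the script refers to Homebrew through the literal string /opt/homebrew, and the line numbers quoted above may differ in other versions of the script. The helper name patch_brew_prefix is hypothetical.

```shell
# Rewrite every literal "/opt/homebrew" in a script to "<new_root>/homebrew",
# keeping a backup copy with a .bak suffix next to the original.
patch_brew_prefix() {
    script=$1
    new_root=${2%/}            # drop a trailing slash, if any
    cp "$script" "$script.bak" || return 1
    sed "s|/opt/homebrew|$new_root/homebrew|g" "$script.bak" > "$script"
}

# Inside the directory holding the downloaded script, first list the
# affected lines, then apply the change:
#   grep -n '/opt/homebrew' install_colabbatch_M1mac.sh
#   patch_brew_prefix install_colabbatch_M1mac.sh "$HOME"
```

Plain sed with a redirect is used rather than sed -i because the in-place flag behaves differently between the macOS sed and GNU sed. The substitution would break if the new path contained a | or & character.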

As part of the installation, Python 3.9.x will be installed locally within that directory.

An important step is to add this location to the PATH, which is normally suggested at the end of the installation. For me (with the directory named AlphaFoldM1) it was:

export PATH="/Users/jsgro/AlphaFoldM1/localcolabfold/colabfold-conda/bin:$PATH"

This also makes it possible to call the command simply as colabfold_batch, which can be tested with a request for help:

colabfold_batch -h
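The export line only affects the current Terminal window. A small sketch of a guarded version follows, suitable for a shell start-up file such as ~/.zshrc (the default shell on recent macOS). The directory shown is hypothetical and follows the AlphaFold folder created earlier; substitute the path printed at the end of your own installation. prepend_path_once is a helper name made up for this example.

```shell
# Prepend a directory to PATH unless it is already listed.
prepend_path_once() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Hypothetical location of the ColabFold executables.
COLABFOLD_BIN="$HOME/AlphaFold/localcolabfold/colabfold-conda/bin"
prepend_path_once "$COLABFOLD_BIN"

# Report where the command was found, or that it was not.
command -v colabfold_batch || echo "colabfold_batch not found on PATH"
```

The guard keeps PATH from growing each time the start-up file is read again.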

Finalizing installation

In my case the command to predict structure failed twice, as some additional Python packages had to be installed. If the PATH is set properly within the Terminal window, the pip command installs the new packages within the local Python 3.9.x version, as it should.

% pip install opt-einsum

Correct installation info for TensorFlow was found at: https://developer.apple.com/metal/tensorflow-plugin/

% python -m pip install tensorflow-macos
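Since these installs only help if pip belongs to the local ColabFold Python, it is worth checking which pip the Terminal actually picks up. A minimal sketch, assuming the environment lives in the hypothetical directory shown (adjust it to your own installation); pip_is_under is an invented helper name.

```shell
# Succeed only if the first "pip" found on PATH lives under the given directory.
pip_is_under() {
    pip_path=$(command -v pip) || return 1
    case "$pip_path" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical location of the ColabFold conda environment.
ENV_DIR="$HOME/AlphaFold/localcolabfold/colabfold-conda"
if pip_is_under "$ENV_DIR"; then
    echo "pip belongs to the ColabFold environment"
else
    echo "pip resolves elsewhere (or is missing): check PATH before installing"
fi
```

If the second message appears, packages would be installed into a different Python and colabfold_batch would keep failing with the same import errors.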

Monomer run

The first run was done with the same protein as in https://www.macinchem.org/reviews/alphafold/installalphafold2.php

% cat ffa2.fa 
>sp|O15552|FFAR2_HUMAN Free fatty acid receptor 2 OS=Homo sapiens OX=9606 GN=FFAR2 PE=1 SV=1
MLPDWKSSLILMAYIIIFLTGLPANLLALRAFVGRIRQPQPAPVHILLLSLTLADLLLLL
LLPFKIIEAASNFRWYLPKVVCALTSFGFYSSIYCSTWLLAGISIERYLGVAFPVQYKLS
RRPLYGVIAALVAWVMSFGHCTIVIIVQYLNTTEQVRSGNEITCYENFTDNQLDVVLPVR
LELCLVLFFIPMAVTIFCYWRFVWIMLSQPLVGAQRRRRAVGLAVVTLLNFLVCFGPYNV
SHLVGYHQRKSPWWRSIAVVFSSLNASLDPLLFYFSSSVVRRAFGRGLQVLRNQGSSLLG
RRGKDTAEGTNEDRGVGQGEGMPSSDFTTE

NOTE: AlphaFold takes the WHOLE first line as a name: in the future make it short!
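The header can be shortened before the run rather than by hand. The sketch below assumes a UniProt-style header of the form >db|ACCESSION|ENTRY_NAME description and keeps only the entry name; headers in any other form are left untouched. The function name shorten_fasta_header is made up for this example.

```shell
# Replace ">db|ACCESSION|ENTRY_NAME description" with ">ENTRY_NAME".
# Lines that do not match (sequence lines, other headers) pass through unchanged.
shorten_fasta_header() {
    sed -E 's/^>[^|]*\|[^|]*\|([^ ]+).*/>\1/' "$@"
}

# Demonstration on a truncated copy of the record above.
printf '%s\n' \
    '>sp|O15552|FFAR2_HUMAN Free fatty acid receptor 2 OS=Homo sapiens' \
    'MLPDWKSSLILMAYIIIFLT' \
    | shorten_fasta_header
```

On a real file the call would look like shorten_fasta_header ffa2.fa > ffa2_short.fa, with the new file then given to colabfold_batch.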

Fold command:

% colabfold_batch --amber --templates --num-recycle 3 /Users/jsgro/AlphaFoldM1/ffa2.fa FFA2output

The sequence is 330 aa long. The run was successful and provided 5 models. Since there are no GPUs, the run took almost 6 hours: 8:48 PM till 2:37 AM (i.e. 5h 49min).
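Run times that cross midnight are easy to miscount, so here is a small sketch of the arithmetic. It takes start and end times on a 24-hour clock and treats an end time earlier than the start as falling on the next day. All function names are invented for this example.

```shell
# Convert "HH:MM" to minutes since midnight.
to_minutes() {
    h=${1%%:*}; m=${1##*:}
    h=${h#0}; m=${m#0}          # strip one leading zero so "08" is not read as octal
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Minutes from START to END; wraps around midnight when END is earlier.
elapsed_minutes() {
    start=$(to_minutes "$1")
    end=$(to_minutes "$2")
    echo $(( ($end - $start + 1440) % 1440 ))
}

# Format a number of minutes as hours and minutes.
format_hm() {
    echo "$(( $1 / 60 ))h $(( $1 % 60 ))min"
}

# Monomer run: 8:48 PM to 2:37 AM.
format_hm "$(elapsed_minutes 20:48 02:37)"
```

For the monomer run this gives 349 minutes, i.e. 5h 49min.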

The structure of this protein has not been solved by X-ray, NMR, or cryo-EM and appears on the UniProt web site only as a predicted structure, taken from the AlphaFold database. However, BLAST shows that this sequence matches rather well with *Chain R, Probable G-protein coupled receptor 174 [Homo sapiens]* (PDB: 7XV3). It is also similar to “Free fatty acid receptor 1”, which has been crystallized (PDB 4PHU, and others).

ffar2 (light blue) onto 7xv3
ffar2 (light blue) onto 4phu

Multimer run

I tested a multimer of alpha and beta chains (2 each) of human hemoglobin. Note that the file format for the local colab version of AlphaFold differs from that of the regular AlphaFold: there must be only one > symbol at the top, and the sequences are separated by a colon (:). The sequences below are ordered as alpha:beta:alpha:beta.

>HBA_HUMAN_2a
MALSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:
MAHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
TPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVD
PENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH:
MALSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK
LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR:
MAHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
TPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVD
PENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
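A file in this format can also be assembled from separate chain sequences instead of being edited by hand. The sketch below writes one header line followed by all chains joined with colons on a single line, which is assumed to be accepted in the same way as the multi-line layout above. The function name and the short demo fragments are illustrative only.

```shell
# Emit a single-header FASTA record whose chains are joined by ":".
# Usage: make_multimer_fasta NAME CHAIN1 CHAIN2 ...
make_multimer_fasta() {
    name=$1; shift
    printf '>%s\n' "$name"
    joined=
    for chain in "$@"; do
        joined="${joined:+$joined:}$chain"   # add ":" only between chains
    done
    printf '%s\n' "$joined"
}

# Truncated fragments used as placeholders, not full-length chains.
ALPHA=MALSPADK
BETA=MAHLTPEE
make_multimer_fasta demo_2a2b "$ALPHA" "$BETA" "$ALPHA" "$BETA"
```

Redirecting the output to a file (for example > multimer.fa) gives an input ready for colabfold_batch. Passing the chains as separate arguments makes the stoichiometry explicit and avoids a stray > between sequences.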

The command was:

% colabfold_batch --amber --templates --num-recycle 3 --model-type alphafold2_multimer_v3 /Users/jsgro/AlphaFoldM1/multimer.fa HBAResults

The alpha sequences contain 142 aa; the beta sequences contain 147 aa.

Computations started at 10:14 PM and finished successfully at 1:31 AM (3h 17min).

The shorter size of each sequence allowed for faster computation in spite of the multimeric form.

The resulting quaternary structure appears very well defined. However, it only contains the protein and lacks the heme and iron.

Predicted quaternary structure (no heme or iron.)

It compares favorably to a crystal structure, here 1SI4:

Predicted quaternary structure (cyan) overlaid with 1SI4.

Conclusion

It is remarkable that this works on a laptop. Of course, using an NVIDIA GPU on another computer or on a Linux cluster would provide faster computation.

Hard Drive Space

The disk space used for the complete installation is:

  • Local folder (contains local Python, scripts etc.): 2.13 GB
  • Cached “params” parameters for AlphaFold: 5.59 GB

FAQs

From https://zzun.app/repo/YoshitakaMo-localcolabfold-python-miscellaneous

  • The lack of the large (~3 TB) database is compensated for by generating the multiple sequence alignment (MSA) with the MMseqs2 web server, just as is done in the Google Colab web page.
  • ColabFold Tutorial presented at the Boston Protein Design and Modeling Club (video and slides).