AlphaFold2 with ColabFold in Container

A computer inside an aquarium as an analogy for a Linux container running on a Mac or Windows computer.

Summary

Run the ColabFold version of AlphaFold2 on your laptop (slow without a GPU) or on a large Linux cluster.

The full tutorial, with scripts, is located at ColabFold with HTCondor.

What is AlphaFold2

Excerpt from a previous post (Five ways to run AlphaFold)

AlphaFold can accurately predict 3D models of protein structures from an amino acid sequence. The AlphaFold network directly predicts the 3D coordinates of all heavy atoms for a given protein using the primary amino acid sequence and aligned sequences of homologues as inputs (Jumper et al., 2021).

Running the “native” AlphaFold2 software requires installing a large (2.5 terabytes) database of known sequences and 3D structures, so in practice this version can only be run on large clusters. The “Colab” version below is more flexible: it retrieves “pre-made” multiple sequence alignments online, avoiding the need for the large local database.

AlphaFold2 Colab

AlphaFold2 Colab is a Google Colab notebook that allows users to predict protein structures using the AlphaFold2 model developed by DeepMind. Although it is free, this Jupyter notebook version has usage limits that make it unsuitable for some projects. Fortunately, the software is also available as a container image that can run on many kinds of computers, from laptops to large servers. An Nvidia GPU, when present, makes the computation faster than running on the CPU alone.

The software is available from github.com/sokrypton/ColabFold/

AlphaFold2 Colab Container

Containers are a method for running software (usually Linux software) on other computers, in an isolated environment. A simple analogy is an aquarium, which holds a water environment within our world of air. The two co-exist but don’t mix. The owner of the aquarium can still access its contents, for example to feed the fish.

The local computer needs supporting software (e.g. Docker or Podman) that can open an existing “image” to create a “container”, which then runs the “bottled” operating system, libraries, and software it needs. (Singularity is typically only available on Linux clusters.)

In the case of ColabFold, the instructions are on this page: Running ColabFold in Docker. The provided commands start with docker, but the same commands can be run with podman in place of docker if that is what is installed.
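The docker/podman substitution can be sketched in a few lines of Python. This is only an illustration, not the command from the ColabFold page: the image reference is a placeholder to be replaced with the one given in the official instructions, the file name query.fasta and the /work mount point are hypothetical, and the only ColabFold call assumed is colabfold_batch taking an input file and an output directory.

```python
import shlex
import shutil
from pathlib import Path

# Assumption: placeholder image reference. Replace it with the image name and
# tag given on the "Running ColabFold in Docker" page.
IMAGE = "ghcr.io/sokrypton/colabfold:<tag>"


def pick_engine(candidates=("docker", "podman")):
    """Return the first container engine found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def build_run_command(engine, image, fasta, workdir, extra_args=()):
    """Build (but do not run) a container command for one FASTA file.

    workdir is mounted inside the container at /work (a hypothetical mount
    point); extra_args is the place for engine-specific options such as GPU
    flags, which differ between Docker and Podman.
    """
    workdir = Path(workdir).resolve()
    cmd = [engine, "run", "--rm", "-v", f"{workdir}:/work"]
    cmd += list(extra_args)
    cmd += [image, "colabfold_batch", f"/work/{fasta}", "/work/output"]
    return cmd


if __name__ == "__main__":
    engine = pick_engine() or "docker"  # fall back to the documented name
    command = build_run_command(engine, IMAGE, "query.fasta", ".")
    # Print the command for inspection instead of executing it.
    print(shlex.join(command))
```

Because only the first word of the command changes, the same function serves both engines; the printed command can be checked by eye before it is pasted into a terminal.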

Running on Linux cluster under HTCondor

For large computations and access to a GPU, it is best to run the software on a Linux cluster such as the one in the Biochemistry Department or the larger University cluster at CHTC.

In this case we cannot use a command like docker run... because the job must run non-interactively through the HTCondor scheduler, with a script that describes the job we want to accomplish, i.e. running ColabFold with a protein sequence inside the container.
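As a rough idea of what such a job description can look like, the Python sketch below writes an HTCondor submit file that uses the docker universe. It is not the script from the full tutorial: the wrapper script run_colabfold.sh, the file names, the output directory, and the resource requests are all hypothetical, and individual clusters may require different settings.

```python
def render_submit_file(image, fasta, request_gpus=1, request_cpus=4,
                       request_memory="16GB"):
    """Return the text of a minimal HTCondor submit description.

    Assumptions: run_colabfold.sh is a hypothetical wrapper script that calls
    colabfold_batch on its first argument and writes results to a directory
    named "output"; the resource values are illustrative only.
    """
    lines = [
        "universe = docker",
        f"docker_image = {image}",
        "executable = run_colabfold.sh",
        f"arguments = {fasta}",
        "should_transfer_files = YES",
        f"transfer_input_files = {fasta}",
        "transfer_output_files = output",
        "when_to_transfer_output = ON_EXIT",
        "output = colabfold_$(Cluster).out",
        "error = colabfold_$(Cluster).err",
        "log = colabfold_$(Cluster).log",
        f"request_gpus = {request_gpus}",
        f"request_cpus = {request_cpus}",
        f"request_memory = {request_memory}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder image reference; use the one from the ColabFold instructions.
    text = render_submit_file("ghcr.io/sokrypton/colabfold:<tag>", "query.fasta")
    with open("colabfold.sub", "w") as handle:
        handle.write(text)
    print(text)
```

The resulting file would then be handed to the scheduler with condor_submit colabfold.sub, and HTCondor starts the container on a matching machine without any interactive docker run command.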


Image Credits: Created by Copilot. Prompt: “A simple image depicting colabfold alphafold as running inside an aquarium.”