Rosetta Ligand Docking – Help with Docker

Summary

Combine software and scripts in Docker and on a local macOS computer (Intel amd64 or Apple Silicon M-series arm64) to successfully follow the Rosetta tutorial Ligand Docking with a G-Protein Coupled Receptor. This method keeps the speed of native macOS binaries for the docking itself, while the preparatory and exploratory steps that fail or are too complex to set up on the local computer are handled in Docker.

NOTE: The complete tutorial is now on the page Rosetta – Ligand Docking, which extends the material from this blog post.

Rosetta

The Rosetta software suite includes algorithms for computational modeling and analysis of protein structures. […] including de novo protein design, enzyme design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes.

The software is rather complex, and it can be difficult to “make things work” given the breadth of algorithm options and of supported hardware and operating systems.

This post aims to help users who want to run the software on their native OS (macOS). Because some functionality built into the tutorials assumes a Linux OS, some of the steps are in fact easier to handle on the Linux side, thanks to Docker.

Preparations

These instructions should benefit macOS users primarily.

Users should be somewhat familiar with bash command line (see e.g. my Survival Command Line tutorial) and have the Docker Desktop software installed (see e.g. my tutorial Docker – Beginner for Biologists.)

The Rosetta Docker image can be downloaded from Terminal with the command below, assuming that Docker is already installed:

docker pull rosettacommons/rosetta:latest

While the main purpose of the Docker image is to provide the Linux-compiled binaries, we will also take advantage of its Linux utilities and its installed Python.

Rosetta preparations

Rosetta is freely available for academic and non-commercial purposes, under license. The software can be downloaded from the links provided on the Download page.

To run the docking computation “natively” (for faster results) on the local computer, users should download the newest release, e.g. from the Academic download page. For this post I used Rosetta 3.14 for M1 Macintosh (Apple Silicon, “M1 binaries”, 13 GB); Intel-based Mac users should download the “Mac binaries” (14 GB) instead. The unarchived directory requires about 45 GB of disk space, but contains the material for all tutorials and demos.

Getting Started

We will use two Terminal sessions: one to work natively on the Macintosh, and the other to run a Docker container, launched so that both Terminal sessions share the same directory on the local computer.

Environment variables

On the Macintosh side it is useful to create environment variables as suggested in the Rosetta Commons section How To Read These Tutorials. Assuming that the binaries are in the Downloads directory, we can keep that location and its default name. For the M1 series the unarchived directory was called rosetta.binary.m1.release-371, and the following variables were created. I replaced my username with $USER so that these commands are generic and can be copied, with the caveat that the directory name might be different (change it accordingly!).

Open a Mac Terminal (/Applications/Utilities/Terminal.app) and paste the (edited) commands:

export ROSETTA3=/Users/$USER/Downloads/rosetta.binary.m1.release-371/main/source
export ROSETTA3_DB=/Users/$USER/Downloads/rosetta.binary.m1.release-371/main/database
export ROSETTA_TOOLS=/Users/$USER/Downloads/rosetta.binary.m1.release-371/main/tools
export ROSETTA3_DEMOS=/Users/$USER/Downloads/rosetta.binary.m1.release-371/main/demos

Copy and paste these (edited) commands into your Terminal; they only last for the current session. To make them permanent, add them to your ~/.zshrc or ~/.bashrc file, or simply paste them again next time!
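To double-check the variables, a short Python script can report any that are missing or that point to a directory that does not exist. This is only an illustrative sketch (check_rosetta_env is a name I made up, not a Rosetta tool); save it, for example, as check_env.py and run it with python3 from the same Terminal session, because exported variables are only visible in that session.

```python
import os
from pathlib import Path

# Illustrative check (not a Rosetta tool): the four variables exported above
ROSETTA_VARS = ("ROSETTA3", "ROSETTA3_DB", "ROSETTA_TOOLS", "ROSETTA3_DEMOS")


def check_rosetta_env(env=None):
    """Return {variable: problem} for every variable that is unset or not a directory.

    An empty dict means all variables point to existing directories.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in ROSETTA_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = f"not a directory: {value}"
    return problems


if __name__ == "__main__":
    issues = check_rosetta_env()
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All Rosetta variables point to existing directories.")
```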

Start the Docker container

Open a new Terminal window. I usually choose a different color to better distinguish which terminal I am using, with the top menu Shell > New Window and then a different profile (Basic has a white background).

Navigate to the main directory at the top level of the release directory. For me it would be:

cd /Users/$USER/Downloads/rosetta.binary.m1.release-371/main/

Verify that this was successful with pwd, and then continue.

Launch the Docker container. The -v option shares the current directory (i.e. main) with the container, mounting it as /data inside, and the -w option makes /data the working directory:

docker run -it --rm -v ${PWD}:/data -w /data rosettacommons/rosetta

(On an Apple Silicon M-series Mac, Docker will print a warning about the image “platform”, but this can be ignored.)

A listing of the files and directories should reveal the same content as the main directory:

/data# ls -F
CITING_ROSETTA.md     README.md           rosetta_scripts_scripts/
CLA.md                database/           source/
CONTRIBUTING.md       demos/              tests/
LICENSE.md            documentation/      tools/
PyRosetta.notebooks/  pyrosetta_scripts/

Continue Docker Container set-up

The tutorial assumes that the user is sitting in front of a fully functional Linux computer, including a graphical interface and all the ancillary software such a computer normally has installed. However, a few important utilities are missing within the container, so we need to add them now. You can check which Linux distribution is running with the command cat /etc/os-release, and which Python version is installed with python -V.

Then issue the following commands after the # prompt:

apt-get update
apt-get install -y wget nano pymol

With these installed, the container can run the command-line steps of the tutorial. Installing pymol also provides the pymol Python module, which allows PyMOL .pse session files to be created from the Docker Container Terminal, as described in the tutorial.
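To confirm that the installation worked, the container's Python can check for the tools. The sketch below is illustrative (check_container_tools is not part of Rosetta) and simply mirrors the apt-get line above; it assumes the pymol module is visible to the same python3 that runs the script.

```python
import importlib.util
import shutil


def check_container_tools(commands=("wget", "nano"), modules=("pymol",)):
    """Report which command-line tools and Python modules are available.

    Returns a dict mapping each name to True (found) or False (missing).
    """
    found = {cmd: shutil.which(cmd) is not None for cmd in commands}
    found.update({mod: importlib.util.find_spec(mod) is not None for mod in modules})
    return found


if __name__ == "__main__":
    for name, ok in check_container_tools().items():
        print(f"{name:6s} {'OK' if ok else 'MISSING'}")
```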

Following the tutorial

Open a web browser with the Rosetta tutorial Ligand Docking with a G-Protein Coupled Receptor in order to follow the steps. Some steps will be performed on macOS, while others are more easily performed within the Docker container. Besides the two Terminal colors (if you chose to use them), the prompt tells you where you are: the container runs as root and shows # as the prompt, while on the Mac the prompt is $ (bash) or % (zsh), depending on the shell in use.

Within the Docker container we don’t really need the ROSETTA variables, as all paths can start with /data to be “absolute” (i.e. unambiguous).
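Because the shared main directory is mounted as /data, every path below main on the Mac has an equivalent inside the container. The hypothetical helper below (the function name and the user name alice are illustrative) spells out the translation implied by -v ${PWD}:/data; it only manipulates path strings and does not touch the files.

```python
from pathlib import PurePosixPath


def mac_to_container(mac_path, main_dir, mount="/data"):
    """Translate a macOS path below main_dir into the matching container path.

    main_dir is the directory that was current (${PWD}) when the container was
    started with -v ${PWD}:/data. Raises ValueError for paths outside main_dir,
    because those are not visible inside the container.
    """
    relative = PurePosixPath(mac_path).relative_to(PurePosixPath(main_dir))
    return str(PurePosixPath(mount) / relative)


# Hypothetical example with the user name "alice"
main = "/Users/alice/Downloads/rosetta.binary.m1.release-371/main"
print(mac_to_container(main + "/demos/tutorials/ligand_docking/protein_prep", main))
# → /data/demos/tutorials/ligand_docking/protein_prep
```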

The sections below follow the same numbering as the tutorial.
However, read most of the explanations on the tutorial web page itself!

1. Go to the desired location:

On macOS Terminal:
cd $ROSETTA3_DEMOS/tutorials/ligand_docking/protein_prep

On Container Terminal:
cd /data/demos/tutorials/ligand_docking/protein_prep

2. Prepare a human dopamine 3 receptor structure:

The first step has to be accomplished on the Docker Container side: the Python script clean_pdb.py calls wget to download the PDB file, and wget is not installed on macOS by default. The script then calls zcat to decompress the file, but the macOS version of zcat behaves differently. It is therefore best to accomplish this task on the Container side, and since the directories are shared, the resulting files will “magically” appear on the macOS side as well!
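For the curious, the download-and-decompress part of this step can be sketched in pure Python (standard library only), which also shows what wget and zcat are used for. This is not how clean_pdb.py is written: the RCSB URL pattern is my assumption, and the script may fetch the file differently. For the tutorial itself, simply run clean_pdb.py in the container as shown next.

```python
import gzip
import urllib.request

# Assumed RCSB URL pattern for gzipped PDB entries (clean_pdb.py may use another)
RCSB_URL = "https://files.rcsb.org/download/{code}.pdb.gz"


def decompress_pdb(gz_bytes):
    """Decompress gzipped PDB content (the job zcat does) and return it as text."""
    return gzip.decompress(gz_bytes).decode("ascii", errors="replace")


def fetch_pdb(code):
    """Download a gzipped PDB entry (the job wget does) and return the PDB text."""
    url = RCSB_URL.format(code=code.upper())
    with urllib.request.urlopen(url) as response:
        return decompress_pdb(response.read())
```

Calling fetch_pdb("3PBL") would return the PDB text, which could then be written to 3PBL.pdb.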

On the Container Terminal type:

/data/tools/protein_tools/scripts/clean_pdb.py 3PBL.pdb A

Files 3PBL_A.fasta and 3PBL_A.pdb should now be within the directory.
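The “cleaning” keeps only the requested chain (here A) and writes its sequence to the .fasta file. As a much-simplified, hypothetical sketch of that idea, based on the fixed-column PDB format (record name in columns 1-6, residue name in 18-20, chain ID in 22, residue number and insertion code in 23-27), one could write the following; clean_pdb.py itself handles more cases, so use the script for the tutorial.

```python
# Simplified illustration only; clean_pdb.py handles more cases than this.
# Three-letter to one-letter codes for the 20 standard amino acids
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_chain(pdb_lines, chain):
    """Keep only the ATOM records of one chain (drops HETATM, waters, other chains)."""
    return [line for line in pdb_lines
            if line.startswith("ATOM  ") and line[21:22] == chain]


def chain_sequence(atom_lines):
    """One-letter sequence, one letter per residue; X for non-standard residues."""
    sequence, seen = [], set()
    for line in atom_lines:
        residue_id = (line[21], line[22:27])  # chain ID, residue number + insertion code
        if residue_id not in seen:
            seen.add(residue_id)
            sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)
```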

2.3. From either terminal, within the protein_prep directory, type:

cp 3PBL_A.pdb ../docking

3. Prepare the ligand files:

The tutorial command pymol eticlopride_conformers.sdf assumes a Linux computer with a graphical interface. It will not work on macOS, nor in the Docker Container, which runs in text-only mode.

To open PyMOL from the command line on a Mac, use the command: open -a /Applications/PyMOL.app. Then drag the file eticlopride_conformers.sdf onto PyMOL with your mouse (passing the file name as an argument does not open it).

3.2.3. Generate a .params file

This command can be run from either side. However, the first line of the script reads #!/usr/bin/env python, which assumes that a python command is available on the PATH. On my Mac only python3 is currently defined, so the script fails with env: python: No such file or directory. This is easily fixed by typing python3 in front of the command.
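To see whether #!/usr/bin/env python can work on a given machine, one can ask which interpreter names are on the PATH. A tiny illustrative check (find_interpreters is a hypothetical name), which can be run with python3 on either side:

```python
import shutil


def find_interpreters(names=("python", "python3")):
    """Map each interpreter name to its full path on the PATH, or None if absent.

    `#!/usr/bin/env python` only works when the entry for "python" is not None.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print(f"{name}: {path or 'not found'}")
```

On a Mac where only python3 exists, the entry for python is None, which is exactly the situation behind the env: python error.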

You can run this step with either of the following commands. The first one would also work on the Mac if python is defined there. The Mac Terminal option uses the environment variable $ROSETTA3. As suggested in the tutorial, you can first run the script with the -h option to display its help.

Option 1 (In Container): /data/source/scripts/python/public/molfile_to_params.py -n ETQ -p ETQ --conformers-in-one-file eticlopride_conformers.sdf

Option 2 (Mac Terminal): python3 $ROSETTA3/scripts/python/public/molfile_to_params.py -n ETQ -p ETQ --conformers-in-one-file eticlopride_conformers.sdf

Note: The tutorial assumes that we are within the ligand_prep/ directory, but then refers to the sdf file as ligand_prep/eticlopride_conformers.sdf, which causes an error since we are already within that directory. Hence the directory name has been removed from the above commands.

You can check with the command tail ETQ.params that the last line of the newly created file ETQ.params reads PDB_ROTAMERS ETQ_conformers.pdb. Then copy the files into the ligand_docking directory with the command:

cp ETQ* ../
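The tail check can also be scripted, which is handy when preparing several ligands. A minimal sketch (the function names are hypothetical), assuming the expected last line is exactly PDB_ROTAMERS ETQ_conformers.pdb as stated above:

```python
from pathlib import Path


def last_nonempty_line(path):
    """Return the last non-empty line of a text file (similar to `tail -n 1`)."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    non_empty = [line for line in lines if line]
    return non_empty[-1] if non_empty else ""


def has_expected_rotamers(params_path, expected="PDB_ROTAMERS ETQ_conformers.pdb"):
    """True if the .params file ends with the expected PDB_ROTAMERS line."""
    return last_nonempty_line(params_path) == expected
```

From the ligand_prep directory, has_expected_rotamers("ETQ.params") should return True.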

4. Final preparations in the docking directory

First, go back up to the ligand_docking directory:

cd ../

The tutorial command pymol 3PBL_A.pdb ETQ.pdb invites you to explore the complex graphically (see step 3 above for opening files in PyMOL on the Mac).

4.2 Concatenate protein and ligand

cp protein_prep/3PBL_A.pdb .
cat 3PBL_A.pdb ETQ.pdb > 3PBL_ETQ.pdb

If you are missing these files, check the web page for instructions on obtaining them from the answers directory.
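As a sanity check that the ligand made it into the combined file, the concatenation can be sketched in Python together with a count of the ligand's atom records. The helper names are hypothetical; the sketch assumes the ligand residue name is ETQ (as set with -n ETQ above) and reads the residue name from columns 18-20 of ATOM/HETATM records.

```python
from pathlib import Path


def concatenate_pdbs(inputs, output):
    """Write the input files one after another into output, like `cat a b > out`."""
    with open(output, "w") as out:
        for path in inputs:
            out.write(Path(path).read_text())


def count_residue_atoms(pdb_path, resname="ETQ"):
    """Count ATOM/HETATM records whose residue name (columns 18-20) is resname."""
    count = 0
    for line in Path(pdb_path).read_text().splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and line[17:20] == resname:
            count += 1
    return count
```

After concatenate_pdbs(["3PBL_A.pdb", "ETQ.pdb"], "3PBL_ETQ.pdb"), count_residue_atoms("3PBL_ETQ.pdb") should return a non-zero number.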

5. Rosetta wrapper and helpers

See the web page for details. Copy the files (remember that . means “current directory”):

cp docking/dock.xml .
cp docking/options .
cp docking/crystal_complex.pdb .

6. Run the docking study

This is where it is useful to have the macOS binaries installed: for large projects they will run faster than the Linux binaries in the Docker Container, which on an Apple Silicon Mac run under emulation. The name of the binary differs depending on the operating system.

The tutorial assumes a standard installation, with binary:

$ROSETTA3/bin/rosetta_scripts.linuxgccrelease

The binaries within the container are in /usr/local/bin, and the one to call for this section is:

rosetta_scripts.cxx11threadserialization.linuxgccrelease

On the Mac it will be:

$ROSETTA3/bin/rosetta_scripts.static.macosclangrelease

To run the docking, call the appropriate binary followed by @options (the @ prefix tells Rosetta to read its command-line options from the file options).

On the Mac it would be:

$ROSETTA3/bin/rosetta_scripts.static.macosclangrelease @options
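Since only the binary name differs between the two sides, a hypothetical Python helper can choose it from the host OS. The binary names are the ones quoted above; the macOS path assumes $ROSETTA3 is set, and the Linux path assumes the container's /usr/local/bin.

```python
import os
import platform

# Binary names quoted in this section
MAC_BINARY = "rosetta_scripts.static.macosclangrelease"
LINUX_BINARY = "rosetta_scripts.cxx11threadserialization.linuxgccrelease"


def rosetta_scripts_command(system=None, rosetta3=None, options="@options"):
    """Return the rosetta_scripts command line, as an argument list, for this OS.

    On macOS the binary is taken from $ROSETTA3/bin (or the rosetta3 argument);
    in the Linux container it is assumed to be in /usr/local/bin.
    """
    system = system or platform.system()
    if system == "Darwin":
        rosetta3 = rosetta3 or os.environ["ROSETTA3"]
        binary = os.path.join(rosetta3, "bin", MAC_BINARY)
    elif system == "Linux":
        binary = os.path.join("/usr/local/bin", LINUX_BINARY)
    else:
        raise ValueError(f"no rosetta_scripts binary listed for {system!r}")
    return [binary, options]
```

From the ligand_docking directory, subprocess.run(rosetta_scripts_command(), check=True) would then start the same run as the command above.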

7. Rosetta models

The Rosetta models are saved with the prefix 3PBL_ETQ_ followed by a four-digit identifier, e.g. 3PBL_ETQ_0001.pdb. Each model PDB file contains the coordinates and, further down the file, the Rosetta scores for that model. The data for all models are also summarized in the plain-text file score.sc.
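Since score.sc is plain text, it is easy to inspect programmatically, e.g. to find the best-scoring model. A minimal sketch, assuming the usual Rosetta score-file layout (a SCORE: header line ending in description, then one SCORE: line per model, with the model name in the description column); the function names are hypothetical:

```python
def read_score_file(path):
    """Parse a Rosetta score file into a list of {column: value} dicts.

    Assumes a "SCORE:" header line ending in "description", followed by one
    "SCORE:" line per model; numeric fields are converted to float.
    """
    header, rows = None, []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0] != "SCORE:":
                continue  # skip SEQUENCE: and blank lines
            if fields[-1] == "description":
                header = fields[1:]  # (re)read the header, e.g. if runs were appended
                continue
            if header is None or len(fields) - 1 != len(header):
                continue  # no header yet, or malformed row
            row = {}
            for name, value in zip(header, fields[1:]):
                try:
                    row[name] = float(value)
                except ValueError:
                    row[name] = value  # e.g. the model name in "description"
            rows.append(row)
    return rows


def best_model(rows, column="total_score"):
    """Return the row with the lowest value in column (lower Rosetta scores are better)."""
    return min(rows, key=lambda row: row[column])
```

For example, best_model(read_score_file("score.sc"))["description"] gives the name of the lowest-scoring model, and passing "ligand_rms_no_super_X" as the column gives the model closest to the crystal ligand.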

8. Transform_accept_ratio.

9. ligand_rms_no_super_X

This column gives the RMSD between our model ligand and the crystal structure ligand in crystal_complex.pdb, computed without superimposing the two structures first (hence no_super).
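For reference, an RMSD without superposition compares the model and crystal coordinates as they are: RMSD = sqrt((1/N) × sum over i of |x_i − y_i|²), over the N matched ligand atoms. The sketch below is a minimal illustration of that formula, assuming the two coordinate lists are already matched atom-for-atom; Rosetta's own implementation may match atoms differently.

```python
import math


def rmsd_no_superposition(model, reference):
    """RMSD between two matched lists of (x, y, z) coordinates, without any fitting.

    Atom i of model is compared with atom i of reference; neither set is moved,
    so both orientation and placement differences contribute to the value.
    """
    if not model or len(model) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


# Two atoms, each displaced by (3, 4, 0): every distance is 5, so the RMSD is 5
print(rmsd_no_superposition([(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (4, 4, 0)]))  # → 5.0
```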

10. Use PyMOL to compare visually

11. Script visualize_ligand.py

This script provides a quick visualization of protein-ligand interfaces and saves a .pse PyMOL session file. This can be done on the Docker Container side if the pymol Python package was installed (see above). Note that the tutorial file name is 3PBL_A_ETQ_0001.pdb, but our result files do not have the _A_ portion.

Run this command from the Docker Container Terminal, assuming we are within the ligand_docking directory:

scripts/visualize_ligand.py 3PBL_ETQ_0001.pdb

The new file 3PBL_ETQ_0001.pse can be opened on the Mac side, either graphically or from the Mac Terminal with the command: open ./3PBL_ETQ_0001.pse


Analysis

The out directory contains 50 precomputed structures for a more meaningful analysis, as well as the files score.sc, score_vs_rmsd.csv and rmsds_to_best_model.data, and several .png image files.

Change into that directory:

cd out

The following files should be present:

  • score.sc: summary score file for the 50 structures as outputted by Rosetta
  • score_vs_rmsd.csv: a comma separated file with the filename in the first column, total_score for the complex in the second column, the interface score in the third column, and ligand RMSD to the native structure in the fourth column.

If score_vs_rmsd.csv is absent, it can be recreated with the provided script extract_scores.bash:

../scripts/extract_scores.bash score.sc > score_vs_rmsd.csv

The next file is created with the script calculate_ligand_rmsd.py:

  • rmsds_to_best_model.data: space separated file containing RMSD comparisons with the best scoring model (not crystal structure!) for all PDB files.

However, when run under Python 3, the script calculate_ligand_rmsd.py gives an error, which can be prevented by updating the print statements on lines 83 and 221: they just need parentheses around their arguments. Open the file with a simple text editor (e.g. nano) and change line 83 from print "Doing aligning" to print("Doing aligning")
and line 221 from print "file, name: "+file+' '+name
to: print("file, name: "+file+' '+name)

(See Appendix “Python print statement” for a script to modify these lines without manual editing.)

Finally, comment out the last line of the script (by adding # at its start), since the script capture_command.sh is not present.

After editing, the script can be run from the Docker Container with the command below. The tutorial suggests first leaving the out directory and running it on the predictions computed earlier:

cd ../

Then:

scripts/calculate_ligand_rmsd.py -n 3PBL_ETQ_0003.pdb -c X -a 7 -o rmsds_to_best_model.data *_000*.pdb

The script calculate_ligand_rmsd.py uses the pymol Python module and should therefore be run in the Docker Container session, where that module was installed.

2.4. PNG files

The tutorial does not explain how the PNG files were produced. (See the Appendix for short R and Python scripts that create similar images from the file out/score_vs_rmsd.csv.)

The command gthumb is a Linux image viewer and will not work from within the Docker Container. On macOS, use the graphical interface (double-click on the file icon) or the command line in the macOS Terminal. For example, if the file is within out:

open -a /System/Applications/Preview.app ./out/score_vs_crystal_rmsd_plot.png

5. Look at some structures

This is mostly a visual exercise.

 


Appendix

Python print statement

In Python 3 the Python 2 print statement became a function, so its arguments must be enclosed in parentheses. The following regular-expression command uses the sed (stream editor) program to change the print statements to print() function syntax within the same file. The default (BSD) sed on macOS does not accept this syntax; GNU sed (gnu-sed, installed as gsed) would be needed there instead.

Thus the substitute command should be issued at the Docker Container Terminal, from within the out directory (the relative path ../scripts assumes this):

sed -i -r 's/^(\s*print)\s+(.*)/\1(\2)/g' ../scripts/calculate_ligand_rmsd.py

This command was derived from a vi/vim text editor command found on stackoverflow.com

The -i option will edit and overwrite the file calculate_ligand_rmsd.py

(Note: if there is a permission error remove -i and redirect the output to a new file.)

Copilot provided the following explanations:

Here’s what’s happening in this sed command:

  • -r option allows sed to understand extended regular expressions.
  • s is the substitute command.
  • ^(\s*print)\s+(.*) is the pattern to match. It matches lines that start with zero or more whitespace characters followed by print (captured as group 1), then one or more whitespace characters, then the rest of the line (captured as group 2).
  • \1(\2) is the replacement pattern. It rewrites the matched line as group 1 (indentation and print) followed by group 2 in parentheses.
  • g at the end is a flag that tells sed to apply the substitution to every match on a line; since the pattern is anchored at the start of the line (^), it matches at most once per line anyway.
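Where GNU sed is not available (e.g. on macOS), the same substitution can be sketched with Python's re module. This is an illustrative equivalent rather than part of the procedure above: it uses [ \t] instead of \s so that, like sed working line by line, it never matches across a line break, and it shares the sed command's limitations (for instance, a trailing comment on a print line would end up inside the parentheses). The function names are hypothetical.

```python
import re
from pathlib import Path

# Same idea as the sed pattern: group 1 = indentation + "print", group 2 = the rest.
# [ \t] is used instead of \s so that a match never crosses a line break.
PRINT_STATEMENT = re.compile(r"^([ \t]*print)[ \t]+(.*)$", re.MULTILINE)


def convert_print_statements(source):
    """Rewrite Python 2 `print x` lines as Python 3 `print(x)` calls."""
    return PRINT_STATEMENT.sub(r"\1(\2)", source)


def convert_file_in_place(path):
    """Counterpart of `sed -i`: read the file, convert it, and overwrite it."""
    target = Path(path)
    target.write_text(convert_print_statements(target.read_text()))
```

From the out directory, convert_file_in_place("../scripts/calculate_ligand_rmsd.py") would have the same effect as the sed -i command.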

PyMOL as pymol

Note: It may be possible to use PyMOL on macOS in a similar way as on Linux, since the pymol executable is found at /Applications/PyMOL.app/Contents/bin/pymol.

Scripts for PNG files

The following scripts were suggested by Copilot, in response to the simple request to plot the 3rd and 4th columns.

R Script

Assumes ggplot2 has been installed; otherwise install it with this command at the R console: install.packages("ggplot2")

Make sure that you are in the directory containing the desired CSV file.

The plot is shown graphically but can be saved manually.

# Load the ggplot2 library for plotting
library(ggplot2)

# Read score_vs_rmsd.csv (no header row). Commas are replaced by spaces first,
# so the file is read correctly whether it is comma- or space-separated.
data <- read.table(text = gsub(",", " ", readLines("score_vs_rmsd.csv")),
                   header = FALSE)

# Column 3 = interface score, column 4 = ligand RMSD to the native structure
ggplot(data, aes(x = V3, y = V4)) +
  geom_point() +
  theme_minimal() +
  labs(title = "Interface score vs ligand RMSD",
       x = "Interface score (3rd column)",
       y = "Ligand RMSD (4th column)")

Python version

Assumes pandas and matplotlib are installed (e.g. using pip)

The script will export a PNG file called plot.png

import pandas as pd
import matplotlib.pyplot as plt

# Read score_vs_rmsd.csv (no header row); the regular-expression separator
# accepts commas and/or spaces between fields
data = pd.read_csv('score_vs_rmsd.csv', sep=r'[,\s]+', header=None, engine='python')

# Column 3 = interface score, column 4 = ligand RMSD (0-based indices 2 and 3)
x = data.iloc[:, 2]
y = data.iloc[:, 3]

# Create a scatter plot
plt.scatter(x, y)
plt.title('Interface score vs ligand RMSD')
plt.xlabel('Interface score (3rd column)')
plt.ylabel('Ligand RMSD (4th column)')
plt.grid(True)

# Save the plot to a PNG file
plt.savefig('plot.png')