No GUI PyMOL for high throughput images and optional Docker

Summary

Rendering a PNG image for each of 1,000 PDB files as a cartoon, color-coded by B factor.

29 contact sheets containing the 1,000 images

PyMOL without GUI

PyMOL is routinely used by biologists to illustrate molecules, using the graphical user interface (GUI). However, there are situations where it may be beneficial to run PyMOL without using the mouse. I recently computed 1,000 predicted 3D structures using Omegafold and I wanted to render each of the predicted structures as a ribbon diagram, colored by B factor. Using the mouse would mean opening each of the 1,000 PDB files and then clicking the relevant buttons, or at best typing their command equivalents.

This is where the option to start PyMOL without the GUI is amazingly powerful. This can be accomplished on one’s own laptop, even if PyMOL is not installed, thanks to a Docker image. (See also a previous Docker attempt with PyMOL in the blog entry “Docker PyMOL,” posted on March 3, 2020.)

1,000 PDB files

This large number of files is the result of computing sequence-based predictions with “evodiff” and subsequently computing their 3D structures with “Omegafold,” which will be detailed in a subsequent post. This post is dedicated to imaging 1,000 sets of 3D coordinates all at once, without manual intervention.

PyMOL Script: reproducibility and sharing

While most users are happy to tweak an image graphically using the PyMOL interface, the best way to work with PyMOL is to create a text script that can re-create the same image at will. The use of scripts is very well illustrated in the book Exploring Protein Structure – Principles and Practice, which provides a PyMOL script for each of the molecular images within the book. I also made an example of this in my “PyMOL scripts book” illustrating the COVID-19 spike protein.

The principle is simple: each action within the PyMOL GUI can be done as a command. The list of commands saved into a simple text file creates a “PyMOL Script” file typically saved with the .pml filename extension. In turn, the script can be submitted to PyMOL from either the GUI or a call to PyMOL from the command line resulting in a saved PNG image.

From script to image with PyMOL

A command can be “looped” in a terminal, which makes it possible to process a large number of files with a single call.

One script to image them all

One script could probably contain most of the commands necessary to open and present each structure as a cartoon, and then save a graphic file in PNG format. This would be nice, but in many cases it is useful to color the structure based on a property contained within a specific column in the 3D data. In that sense, “one script to rule them all” would probably not provide the best images.

Coloring each structure by B factor

The B factor (sometimes called the “temperature factor”) contained within these 3D predictions can help visualize a measure of the accuracy of the prediction. To “personalize” each structure image, it is necessary to provide the minimal and maximal values of the B factor within each PDB file. It is therefore necessary to process each file to obtain these numbers. Once again, the “loop” option of a shell command can integrate that request. The computation for each file will be done with a short awk script, saving the results in temporary shell variables called min and max, respectively.
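The min/max scan can be sketched as a single awk pass. This is a minimal sketch, not the script used for this post: it assumes the files follow the standard fixed-width PDB layout, where the B factor occupies columns 61 to 66 of each ATOM record, and the function name bfactor_range is hypothetical. Reading fixed columns avoids trouble when two neighboring columns touch (for example an occupancy of 1.00 followed by a B factor of 100.00).

```bash
#!/bin/bash
# Hypothetical helper: print "min max" of the B factor over the ATOM records
# of one PDB file.
# Assumption: standard fixed-width PDB layout, B factor in columns 61-66.
bfactor_range() {
    awk '/^ATOM/ {
             b = substr($0, 61, 6) + 0        # numeric value of columns 61-66
             if (n == 0 || b < min) min = b   # first record initializes min
             if (n == 0 || b > max) max = b   # first record initializes max
             n++
         }
         END { if (n > 0) print min, max }' "$1"
}

# Usage (file name is a placeholder):
#   set -- $(bfactor_range SEQUENCE_0.pdb); min=$1; max=$2
```

Both values come back from one read of the file, so each PDB file is scanned only once.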

“Here file” to the rescue

Since we need specific numbers for each file, we need to create a specific .pml script for each one. How can we make each command specific to each structure without manual intervention? This is the genius of a “here document,” which can create a new document “on the fly.” And since it’s used only once, it can be overwritten each time once a structure has been processed.
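As a small illustration of why a here document suits this job, the sketch below contrasts the two ways of writing the delimiter. With a plain delimiter the shell substitutes variables before writing the text; with a quoted delimiter the text is kept literally. The variable name and both function names are hypothetical, chosen only for this example.

```bash
#!/bin/bash
# Hypothetical file name, used only for illustration
name="SEQUENCE_0.pdb"

# Unquoted delimiter: $name is replaced by its value
expanded() {
cat <<EOF
load $name
EOF
}

# Quoted delimiter: the text is copied as-is, $name stays literal
literal() {
cat <<'EOF'
load $name
EOF
}

expanded   # prints: load SEQUENCE_0.pdb
literal    # prints: load $name
```

For a per-file PyMOL script the unquoted form is the one we want, since the file name and the B-factor limits must be filled in on each pass of the loop.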

PyMOL Commands

PyMOL is written in the Python language, but most commands have a simpler PyMOL equivalent closer to English. For example, the PyMOL load command is simpler than the Python function cmd.load() while providing the same functionality. What do we need to insert in the script?

1. load the file: load filename.pdb
2. orient the molecule along its longer axis: orient
3. remove solvent, i.e. water molecules (if any): remove solvent
4. color by B factor: spectrum b
5. show a ribbon diagram: as cartoon
6. save the image: png. The default size is 640×480 pixels

Calling PyMOL on the command line

OK that sounds great. BUT… How do we call PyMOL from the command line?
On a Linux system where PyMOL is properly installed, the command is simply pymol.
However, this is also possible on a Mac and on Windows if one knows where to look…!

Mac option: On macOS, an application is a special kind of folder containing all that is necessary. To look inside an application, right-click (or control-click) the icon named PyMOL or PyMOL.app, choose the second menu item, “Show Package Contents,” and then open the folders Contents > MacOS, in which we find PyMOL with a black terminal icon. If we right-click (or control-click) this specific PyMOL icon and then hold down the “option” key, we can see further down the option to ‘Copy “PyMOL” as Pathname’, which is the “secret” we need… The clipboard will then contain:
/Applications/PyMOL.app/Contents/MacOS/PyMOL which is the text command we need and should be the same for all Macs if the application is installed in the standard location.

Windows option: PyMOL is typically installed in the “Program Files” or “Program Files (x86)” directory on Windows 10 or 11. Look for a folder named “PyMOL” or a similar name. We want to find the PyMOL executable (pymol.exe or pymolwin.exe), which should be located within this folder. Once you locate the file, note its full path. For example, it could be:
C:\Program Files\Schrodinger\PyMOL\pymolwin.exe
However, the exact location of the executable may vary depending on the version of PyMOL and how it was installed.

The current Windows installer for version 2.5.4 offers to install either “just for me” (i.e., the user) or for all users (requiring an Admin password). The installation locations for these were one of:

C:\Users\sgro\AppData\Local\pymol
C:\ProgramData\pymol

In both cases the name of the program was PyMOLWin.exe, but upper/lower case does not affect Windows commands.

Linux option: the program is simply called with pymol

Docker option: This Docker image contains the free version of PyMOL 2.5.0. It can be started with the command below, which shares the current directory (containing all the PDB files) with the container.

docker run -it --rm -v ${PWD}:/data -w /data biopod/pymol:2.5.0

(Note: Macs with an Apple Silicon chip (M1, etc.) may need to add --platform linux/amd64 to “pull” or “run” this Docker image.)

We are almost there… One last thing we need to do is to tell PyMOL that we don’t want the GUI to start when it is invoked on the command line. There are many command-line options available. The options of interest for our purpose are:

-c        launch in command-line only mode for batch processing
-q        suppress startup message
-Q        quiet, suppress all text output

We can add -c together with either -q or -Q to suppress text output. The flags can be combined as follows for the Linux command; the same flags also work with the PyMOL executable located as described above on macOS and Windows.

pymol -qc

or

pymol -Qc
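To avoid editing a script by hand for each operating system, the choice of executable could be wrapped in a small function. This is only a sketch: the function name pymol_cmd is hypothetical, the macOS path is the standard location given above, and the Linux and fallback branches assume that pymol is on the PATH. On Windows, the fallback should be replaced by the full path found earlier.

```bash
#!/bin/bash
# Hypothetical helper: print the PyMOL command for an operating system.
# The OS name defaults to the output of "uname -s" when no argument is given.
pymol_cmd() {
    case "${1:-$(uname -s)}" in
        Darwin) echo "/Applications/PyMOL.app/Contents/MacOS/PyMOL" ;;  # standard macOS location
        Linux)  echo "pymol" ;;   # assumes pymol is on the PATH
        *)      echo "pymol" ;;   # fallback; replace with the full path on Windows
    esac
}

# Usage: run a script without the GUI and without startup messages
#   "$(pymol_cmd)" -qc commands.pml
```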

One adapted script to rule them all

We can combine all these requirements to create a “here document” that is personalized to each file with a loop. The very last command is calling PyMOL, and this command will depend on the operating system as described just above. The example below is calling the macOS version.

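#!/bin/bash
# Render every SEQUENCE*.pdb file in the current directory as a cartoon colored by B factor.

for f in SEQUENCE*.pdb
do
    # Minimum and maximum B factor: 11th whitespace-separated field of the ATOM records
    min=$(awk '/^ATOM/ {b = $11 + 0; if (!seen++ || b < m) m = b} END {print m}' "$f")
    max=$(awk '/^ATOM/ {b = $11 + 0; if (!seen++ || b > m) m = b} END {print m}' "$f")

    # Write a one-off PyMOL script for this file; it is overwritten on each pass
    cat > commands.pml <<EOF
load $f
orient
remove solvent
spectrum b, blue_white_red, minimum=$min, maximum=$max
as cartoon
# cartoon putty
png $f.png
EOF

    # macOS path; on Linux this would simply be: pymol -Qc commands.pml
    /Applications/PyMOL.app/Contents/MacOS/PyMOL -Qc commands.pml
done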

To make the command simpler, this version will create a file ending with .pdb.png
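If a plain .png name is preferred, shell parameter expansion can strip the .pdb suffix before the new extension is added. The helper name png_name below is hypothetical, and this is an optional variation rather than what was used for this post.

```bash
#!/bin/bash
# Hypothetical helper: SEQUENCE_12.pdb -> SEQUENCE_12.png
# ${1%.pdb} removes a trailing ".pdb" from the first argument, if present.
png_name() {
    echo "${1%.pdb}.png"
}

# Usage inside the loop: write "png $(png_name "$f")" in the here document
```

Any later step that lists the images by name would then need to look for .png instead of .pdb.png.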

If the folder contains 1,000 PDB files then 1,000 PNG files will be created… with just a dozen lines of code.

Contact sheets

The final step was to create contact sheets to assemble multiple images together. This was done with yet another Docker image containing the command-line programs of “ImageMagick” placing 35 PNGs onto each sheet with the program montage. This resulted in 29 contact sheets. The program was called with a docker command from a Docker image:

docker run -it --rm -v ${PWD}:/data -w /data minidocks/imagemagick montage

Then for each set of 35 PNG files within a PNG directory the command looked like:

montage -verbose -label '%f' -font Inconsolata-Regular -pointsize 10 -background '#EEEEEE' -fill 'black' -define jpeg:size=200x200 -geometry 200x200+2+2 -auto-orient -tile 5x7  ./PNG/SEQUENCE_{0..34}.pdb.png contact_0..34.jpg
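Repeating this command for each group of 35 files can itself be looped. The sketch below assumes the images are numbered consecutively from 0 as ./PNG/SEQUENCE_N.pdb.png and that montage is available on the PATH (for example inside the ImageMagick container). The function names sheet_ranges and make_sheets are hypothetical, and only a subset of the options above is repeated to keep the example short.

```bash
#!/bin/bash
# Hypothetical helper: print "start end" index pairs, one line per contact sheet.
sheet_ranges() {
    total=$1; per_sheet=$2; start=0
    while [ "$start" -lt "$total" ]; do
        end=$((start + per_sheet - 1))
        # The last sheet may hold fewer images than the others
        if [ "$end" -ge "$total" ]; then end=$((total - 1)); fi
        echo "$start $end"
        start=$((start + per_sheet))
    done
}

# Hypothetical helper: build one contact sheet per index range.
# Assumption: files are named ./PNG/SEQUENCE_<index>.pdb.png
make_sheets() {
    sheet_ranges "$1" "$2" | while read -r start end; do
        set --                       # reset the list of files for this sheet
        i=$start
        while [ "$i" -le "$end" ]; do
            set -- "$@" "./PNG/SEQUENCE_${i}.pdb.png"
            i=$((i + 1))
        done
        montage -label '%f' -geometry 200x200+2+2 -tile 5x7 \
            "$@" "contact_${start}..${end}.jpg" < /dev/null
    done
}

# Usage: make_sheets 1000 35
```

With 1,000 images and 35 per sheet, sheet_ranges yields 29 ranges, the last one covering indices 980 to 999.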

The 29 contact sheets are at the top of this page.