No GUI PyMOL – computing distances for thousands of PDB files from Rosetta

""

Summary

We run a script to compute the distance between 2 atoms in a series of PDB files with identical molecules in different conformations and save the distance results into a plain text file.

No GUI PyMOL

In previous posts I explained how we can use PyMOL as a command-line tool from a Terminal when PyMOL is launched without its GUI interface. See for example No GUI PyMOL for high throughput images and optional Docker.

This post is about a similar task, but this time exporting distance(s) between atoms within a PDB file rather than exporting a PNG image for each PDB structure.

Thousands of files

Materials: we’ll use the 50 PDB files exported at the end of the tutorial “Rosetta – Ligand Docking” but the number of PDB files can be much larger.

Philosophy: rather than trying to create a complicated program I like to apply the Unix philosophy if using multiple specific tools that can be combined together.

PyMOL distance command

The PyMOL distance command first computes and then displays a distance in Angstroms between selected atoms within the graphical representation  of the molecule. Our goal for this exercise is to capture the distance and save the number within a plain text file.

Since PyMOL is written over the Python programming language, “under the hood “the distance command is in fact a Python function (see bottom page of distance under paragraph PyMOL API.)

For our purpose this alternate nomenclature will be useful as it is the method used in the PyMOL Wiki page Measure_Distance that we will use with very few modifications.

Selecting Atoms

The PDB files are assumed to be the same protein/ligand with various configurations, and we want to measure the distance between specific pairs of atoms. For now we’ll use a single pair.

For selecting the atom and writing their name with the correct PyMOL syntax it is easier to use the graphical interface: open one of the PDB files and click to the 2 atoms for which we want to compute the distance. As an example I opened the first of the 50 Rosetta ligand exercise which was named  3PBL_ETQ_0001.pdb and clicked on 2 atoms. PyMOL will echo the syntax for selecting these atoms within the top text window. The named appeared as:

You clicked /3PBL_ETQ_0001//X/ETQ`1/O2
You clicked /3PBL_ETQ_0001//X/ETQ`1/C11

The standard PyMOL command would be:

PyMOL>distance d1, /3PBL_ETQ_0001//X/ETQ`1/O2,  /3PBL_ETQ_0001//X/ETQ`1/C11
Executive: object "d1" created.

The object created is presented as dashed line with the distance value.

''

This can also be used within the line from the PyMOL Wiki script Measure_Distance by changing the new names into this line from the script that reads by default:

dst=cmd.distance('tmp','mol1///25/ha','mol1///26/ha')

However, we need to add one line at the top of the script to load that specific PDB file. (Later we’ll do this in a different way with additional help from the shell.)

load 3PBL_ETQ_0001.pdb

We can then save this into a plain text file that we can call e.g. dist_01.pml that we can use to test if it all works. The script has instructions to write the distance value within a plain text file called dist.txt.

Test for one file

To test for this one file (the name of the PDB is wihtin the script – see above) and the output will be written into a file called dist.txt. On a Macintosh the PyMOL program without the GUI can be called with the following command. (See also No GUI PyMOL for high throughput images and optional Docker for other methods.)

/Applications/PyMOL.app/Contents/MacOS/PyMOL -Qc dist_01.pml

We can the check the content from the terminal with the cat command:

$ cat dist.txt
   3.563

On the GUI the number was rounded to 3.6 and therefore the conclusion is that the method works!

Adapt to many files

OK. However, we now have to process many, perhaps thousands of files. So we need to adapt the script and then call on the shell to create a loop calling.

Adapt the script

In the example script the name of the molecule was defined asmol1 and indeed we need to have a “generic” name for calling the atoms. We can use the shell (see below) to copy the PDB file that we want to analyze to e.g. mol.pdb which would result in the naming inside PyMOL as asmol.

We can then simply change the load command at the top of the script to load mol.pdb.

We now need to make sure that the numbers will be added to the dist.txt result file. For this we need to modify the Python code that write to file from f=open('dist.txt','w') to f=open('dist.txt','a') changing the w (write) into an a (append) otherwise our file would be overwritten every time and result in a single line containing the very last measure!

The final script will then be:

# This script writes the distance from 
# to the file "dist.txt"
# Simply change your selections to see different distances.

load mol.pdb

# import PyMOL's command namespace
from pymol import cmd

# open dist.txt for writing
f=open('dist.txt','a')

# calculate the distance and store it in dst
dst=cmd.distance('tmp', '/mol//X/ETQ`1/O2',  '/mol//X/ETQ`1/C11')

# write the formatted value of the distance (dst) to the output file
f.write("%8.3f\n"%dst)

# close the output file.
f.close()

Using the script

Assuming that we are in a directory with all the PDBs to analyze, we can create a bash shell loop to repetitively rename each file into the generic PDB name mol.pdb and then compute the distance. We can also add one line to print the name of the file before the distance measure so that we know to which file the measure corresponds.This is done with the echo command.

However, since we are appending to the dist.txt result file it is best to delete any previous version first with rm dist.txt. The loop will then be:

for f in *.pdb         
do
echo $f >> dist.txt
cp $f mol.pdb
/Applications/PyMOL.app/Contents/MacOS/PyMOL -Qc dist_01.pml
rm mol.pdb
done

Here is a decription of each line of the loop:

Command Description
for f in *.pdb Create a variable called f that will in turn take the name of each PDB file one by one
do it’s a bash way of saying let’s go to work and do something
echo $f >> dist.txt copy the file name into a text file called dist.txt
cp $f mol.pdb duplicate the PDB file into a copy called mol.pdb
/Applications/PyMOL.app/Contents/MacOS/PyMOL -Qc dist_01.pml run the no GUI PyMOL. (Mac command.) This will compute distance for each PDB one at a time and append the result into the dist.txt file
rm mol.pdb remove the copy made above (not mandatory but cleaner code.)
done We are done with the loop.

Formatting the output

The first 6 lines of the dist.txt result file can be shown with the command:

$ head -6 dist.txt
3PBL_ETQ_0001.pdb
   3.563
3PBL_ETQ_0002.pdb
   6.069
3PBL_ETQ_0003.pdb
   6.069

If the format is not adapted, we can very easily convert this output into a 2-column format which may be easier to use in e.g. a spreadsheet, thanks to the very useful Unix utility called paste as follows:

paste - - < dist.txt

which will result in a column format, for example here are the first 5 lines:

3PBL_ETQ_0001.pdb      3.563
3PBL_ETQ_0002.pdb      6.069
3PBL_ETQ_0003.pdb      6.069
3PBL_ETQ_0004.pdb      6.068
3PBL_ETQ_0005.pdb      6.069

If we want a `.csv` file instead we can pass (with the Unix pipe |)  this output to awk to easily add a comma in between. With awk  each column is called with a preceding $ sign, so column 1 is $1 and column 2 is $2. We add the comma between quotes:

paste - - < dist.txt | awk '{print $1","$2}' 

To save this into a file called e.g. dist.csv then we “redirect” ( > ) the output at the end of the  bash or zsh shell command on the terminal:

paste - - < dist.txt | awk '{print $1","$2}' > dist.csv

To save just the numbers we can use the pattern recognition programgrep to weed out all lines containing e.g. pdb, for example: grep -v pdb < dist.txt > dist_justnumber.txt

More than one measure

So far we have measured only one distance, albeit in perhaps thousands of files… But what if we need to measure more than one. The solution is contained within the Mailing list question: Re: [PyMOL] command question RE printing out distances to text file showing an example with 3 distance measures, in 3 separate cmd.distance() commands, and also 3 separate write statements.

Note that for a recursive loop method as above the write statement should be changed to harbor the 'a' rather than 'w' in the line  f=open('distnew.txt','w') on the suggested script to append (i.e. add at the end of the file) otherwise the file will be written all over again as explained above.

The rest of the loop would then be exactly the same.


Reference links used:

– Mailing list question: Re: [PyMOL] command question RE printing out distances to text file
– Between states: measure distance between two atoms, in each state, in pymol
– Append rather then write: How do you append to a file instead of overwriting it in Python?
PyMOL Wiki links within the text