PyMOL and Biological Units from REMARK 350

Illustration of 3x4 rotation/translation matrix

Summary

Compute a biological assembly from coordinates with a Python script.

Symmetry

X-ray crystallography solves the mathematical problem of providing 3D (Cartesian) coordinates, but the resulting “asymmetric unit” does not necessarily represent the biological structure. The Protein Data Bank (PDB) now provides options to download the biological entity (the biological assembly). In some cases it contains fewer molecules than the deposited coordinates; in other cases, symmetry operations must be applied to the deposited coordinates to create the additional symmetry-related copies. For these entries, the coordinate files contain REMARK 350 records providing the 3×4 rotation/translation matrices used to compute the assembly.
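To make the 3×4 matrices more concrete, here is a minimal sketch of how the BIOMT rows of REMARK 350 could be read and applied to one coordinate. The function names parse_biomt and apply_matrix are hypothetical (they are not part of PyMOL or quat.py), and the sample records are made-up values for an identity plus a 120° rotation about z, not the matrices of a real entry. It also assumes a single biomolecule per file and whitespace-separated fields.

```python
def parse_biomt(pdb_lines):
    """Collect REMARK 350 BIOMT rows into {matrix_number: [row1, row2, row3]}.

    Each row holds four floats: three rotation terms and one translation.
    Assumption: one biomolecule per file, so matrix numbers do not repeat.
    """
    matrices = {}
    for line in pdb_lines:
        if not line.startswith("REMARK 350"):
            continue
        fields = line.split()
        # A BIOMT row looks like: REMARK 350 BIOMT1 1 r1 r2 r3 t
        if len(fields) != 8 or not fields[2].startswith("BIOMT"):
            continue
        row_index = int(fields[2][5]) - 1      # BIOMT1/2/3 -> 0/1/2
        number = int(fields[3])                # matrix serial number
        row = [float(value) for value in fields[4:8]]
        matrices.setdefault(number, [None, None, None])[row_index] = row
    return matrices


def apply_matrix(matrix, xyz):
    """Apply one 3x4 matrix: new = R * old + t."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)


# Made-up sample records (illustrative values only).
sample = [
    "REMARK 350 APPLY THE FOLLOWING TO CHAINS: A",
    "REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000",
    "REMARK 350   BIOMT1   2 -0.500000 -0.866025  0.000000        0.00000",
    "REMARK 350   BIOMT2   2  0.866025 -0.500000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000",
]

biomt = parse_biomt(sample)
for number in sorted(biomt):
    print(number, apply_matrix(biomt[number], (1.0, 0.0, 0.0)))
```

Each matrix produces one copy of the coordinates, so an entry with three matrices (the identity plus two others) would yield a trimer from a monomer.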

For example, entry 2BIW contains 4 identical proteins in the deposited coordinates, but the biological entity is just one of them. Conversely, entry 1DUD is deposited as a monomer while the biological entity is a trimer, so 2 additional symmetry-related units need to be computed.

The following guide is a useful resource to understand these concepts: Introduction to Biological Assemblies and the PDB Archive

Note: There may be other symmetry matrices within the file that relate to the packing of the molecules within the crystal.

PyMOL and Symmetry

PyMOL is based on the Python language and scripts exist to help compute symmetry-related molecules.

While the PDB offers the possibility to download the biological assembly, there are still times when computing your own may be necessary, for example when receiving pre-published data or working with altered coordinates.

I found entries for this purpose within the PyMOL Wiki. I was not able to run BiologicalUnit, but I was able to run Quat after “fixing” Python 2 nomenclature so that it runs within a modern PyMOL, which is now under Python 3. (I changed the Python 2 print statements into the Python 3 print() function by adding parentheses, and changed basestring to str. The modified script file can be downloaded as quat.py.)
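As an illustration of the two kinds of changes, here is a minimal sketch of the Python 2 to Python 3 conversion. The function report and its message are hypothetical and are not taken from quat.py; only the pattern (print with parentheses, str instead of basestring) reflects the fixes described above.

```python
# Python 2 style, which fails under Python 3:
#     print "loading", name
#     if isinstance(name, basestring): ...

def report(name):
    """Hypothetical helper showing the Python 3 form of both fixes."""
    # basestring no longer exists in Python 3; str covers text strings.
    if not isinstance(name, str):
        raise TypeError("object name must be a string")
    message = "loading " + name
    print(message)  # print is a function in Python 3, so parentheses are required
    return message


report("1dud")
```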

Using and testing quat.py

I tested the slightly modified script with a small, simpler file first, 1DUD (crystal structure of the Escherichia coli dUTPase in complex with a substrate analogue, dUDP), and then with a more challenging file, 4RHV (human rhinovirus B14 structure).

The procedure is rather simple, but the novice user may wonder how to activate the quat.py script within PyMOL. This is done with the run command, typed on the PyMOL command line.

Here are the steps:

1. Download quat.py and keep the file in e.g. the Downloads directory

2. Load the coordinates. For the deposited coordinates, simply call the PDB ID with fetch, or use the load command if the file is already within the Downloads directory.
For example: fetch 1dud, or load 1dud.pdb if the file is already present.

IMPORTANT NOTE: the file needs to be in PDB format and NOT CIF for the script to work.
Therefore the command may need to be modified to fetch 1dud, type=pdb as the default is now CIF. If the file is not present a warning will print with: “please provide filename”

3. Add the script functionality to PyMOL with: run quat.py

4. Use the script: quat 1dud
Note: the name used is the object name that appears on the right in PyMOL after loading the coordinates.
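The steps above can be sketched as a short list of PyMOL command lines. The helper name quat_session_commands is hypothetical; it only assembles the same commands given in the steps (fetch with type=pdb, run, quat) so they can be pasted or replayed. It assumes quat.py is in PyMOL's current working directory.

```python
def quat_session_commands(pdb_id, script_path="quat.py"):
    """Return the PyMOL command lines for steps 2 to 4, in order."""
    return [
        "fetch {}, type=pdb".format(pdb_id),   # step 2: PDB format, not CIF
        "run {}".format(script_path),          # step 3: add the quat command
        "quat {}".format(pdb_id),              # step 4: build the assembly
    ]


for line in quat_session_commands("1dud"):
    print(line)

# Inside PyMOL, the same lines could be replayed with:
#     from pymol import cmd
#     for line in quat_session_commands("1dud"):
#         cmd.do(line)
```

Typing the three printed lines directly on the PyMOL command line gives the same result; the helper is only a convenience when repeating the procedure for several entries.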

1dud: monomer and trimer

The image shows the addition of 2 symmetry-related units to obtain the biological assembly.

For the rhinovirus test the computation was a bit longer as there are 60 BIOMT matrices.

4RHV rhinovirus as protomer, complete capsid and pentamer

The 4RHV protomer (left), containing viral proteins VP1, VP2, VP3 and VP4, was first expanded and saved as a complete capsid (center). The pentamer (right) was then extracted by simple text editing, keeping only the coordinates of the pentamer shown.

PyMOL can now save the coordinates with the menu cascade: File > Export Molecule…

The default option of this menu is to “Write segment identifier (segi) column”, which is very useful. This number is easiest to recognize if the file is saved in the older PDB format: it will be the column before the last one.

For the rhinovirus example, the first 5 segi represent the first pentamer of the structure. The first and last lines of coordinates were as follows:

ATOM      1  N   THR 1  17      32.241  35.354  96.610  1.00 15.33      1    N  
...
ATOM  32701  OXT ASN 4  68      66.658  66.484  74.345  1.00 37.38      5    O1-

It is easy to recognize 1 and 5 as the segi numbers. This number thus makes it easier to select segments by text editing, as selecting graphically is more challenging in such a large structure.
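The same text-editing step could also be scripted. The following is a minimal sketch, assuming the file was exported in PDB format with the segi column written, so that the segment identifier occupies columns 73 to 76 of each ATOM/HETATM record. The function name filter_by_segi and the file names in the comment are hypothetical.

```python
def filter_by_segi(lines, wanted):
    """Keep only ATOM/HETATM records whose segi (columns 73-76) is in wanted.

    Other records (TER, END, REMARK, ...) are dropped in this sketch.
    """
    kept = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        segi = line[72:76].strip()   # columns 73-76, zero-based slice 72:76
        if segi in wanted:
            kept.append(line)
    return kept


# The first 5 segi represent the first pentamer.
pentamer_segi = {"1", "2", "3", "4", "5"}

# Example use with hypothetical file names:
#     with open("capsid.pdb") as handle:
#         pentamer = filter_by_segi(handle, pentamer_segi)
#     with open("pentamer.pdb", "w") as handle:
#         handle.writelines(pentamer)
```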

The pentamer was colored by first making a command-line selection for each VP, e.g. select vp1, chain 1, and then coloring each selection from the menu: C > blues > slate

Alternate methods

Other commands are useful for this purpose:

all_states
split_states
assembly
– Chain characters: assembly ID

Note: The assembly option does not require running a Python script, but the results of quat provide a better separation of the coordinates, allowing the selection of single chains thanks to the segi numbers.