Summary
Compute a biological assembly from coordinates with a Python script.
Symmetry
X-ray crystallography solves the mathematical problem of providing 3D (Cartesian) coordinates, but the resulting “asymmetric unit” does not necessarily represent the biological structure. The Protein Data Bank (PDB) now provides options to download the biological entity. It can contain fewer molecules than the deposited coordinates; in other cases, the deposited coordinates must be used to create additional, symmetry-related coordinates by applying symmetry operations. For these biological entries, the coordinate files contain REMARK 350 entries providing the 3×4 rotation/translation matrices used to compute them.
For example, entry 2BIW has a unit of 4 identical proteins, but the biological entity is just one of them. Conversely, the deposited coordinates of entry 1DUD are a monomer while the biological entity is a trimer, so 2 additional symmetry-related units need to be computed.
The following guide is a useful resource to understand these concepts: Introduction to Biological Assemblies and the PDB Archive
Note: There may be other symmetry matrices within the file which are related to the packing of the molecules within the crystal.
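To make the role of the REMARK 350 matrices concrete, here is a minimal sketch of how a 3×4 matrix could be read and applied to one atom. The function names parse_biomt and apply_biomt are hypothetical, and the matrix values in the example are made up for illustration (they are not taken from 1DUD or any real entry). Each new coordinate is the rotation part times (x, y, z) plus the translation in the fourth column.

```python
def parse_biomt(pdb_lines):
    """Collect REMARK 350 BIOMT rows as {serial: [row1, row2, row3]}.

    Each row is [r1, r2, r3, t]: three rotation terms and one translation.
    """
    matrices = {}
    for line in pdb_lines:
        fields = line.split()
        if (len(fields) == 8 and fields[:2] == ["REMARK", "350"]
                and fields[2].startswith("BIOMT")):
            row_index = int(fields[2][5]) - 1      # BIOMT1/2/3 -> 0/1/2
            serial = int(fields[3])                # matrix number
            row = [float(value) for value in fields[4:8]]
            matrices.setdefault(serial, [None, None, None])[row_index] = row
    return matrices


def apply_biomt(matrix, xyz):
    """Apply one 3x4 rotation/translation matrix to an (x, y, z) tuple."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)


# Made-up example: a 90 degree rotation about z plus a 5 A shift along z.
example = [
    "REMARK 350   BIOMT1   2  0.000000 -1.000000  0.000000        0.00000",
    "REMARK 350   BIOMT2   2  1.000000  0.000000  0.000000        0.00000",
    "REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        5.00000",
]
matrices = parse_biomt(example)
print(apply_biomt(matrices[2], (1.0, 2.0, 3.0)))   # (-2.0, 1.0, 8.0)
```

A real script also has to honor the “APPLY THE FOLLOWING TO CHAINS” lines, which this sketch ignores.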
PyMOL and Symmetry
PyMOL is based on the Python language and scripts exist to help compute symmetry-related molecules.
While the PDB offers the possibility to download the biological assembly, there are still times when computing your own may be necessary, for example when receiving pre-publication data or working with altered coordinates.
I found entries for this purpose within the PyMOL Wiki. I was not able to run BiologicalUnit, but I was able to run Quat after “fixing” Python 2 nomenclature so that it runs within a modern PyMOL, which is now under Python 3. (I converted Python 2 print statements into Python 3 print() function calls by adding the parentheses, and changed basestring to str. The modified script file can be downloaded as quat.py.)
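As an illustration of the two kinds of change described above, the following sketch shows the Python 2 form in comments and the Python 3 form as live code. The function describe is a hypothetical stand-in written for this example; it is not code from quat.py.

```python
# Python 2 style, as found in older scripts (shown as comments only,
# since it no longer runs under Python 3):
#
#     if isinstance(arg, basestring):
#         print "argument is a string:", arg


def describe(arg):
    """Hypothetical helper showing the Python 3 equivalents."""
    if isinstance(arg, str):          # basestring -> str
        kind = "string"
    else:
        kind = "other"
    print("argument is a", kind)      # print statement -> print() call
    return kind


describe("1dud")
describe(42)
```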
Using and testing quat.py
I tested the slightly modified script with a small, simple file first, 1DUD (crystal structure of the Escherichia coli dUTPase in complex with a substrate analogue (dUDP)), and then with a more challenging file, 4RHV (human rhinovirus B14 structure).
The procedure is rather simple, but the novice user may wonder how to activate the quat.py script within PyMOL: this is done with the run command, typed at the PyMOL command line.
Here are the steps:
1. Download quat.py and keep the file in e.g. the Downloads directory.
2. Load the coordinates. For the deposited coordinates simply call the PDB ID with fetch, or use the load command if the file is already within the Downloads directory.
For example: fetch 1dud, or load 1dud.pdb if the file is already present.
IMPORTANT NOTE: the file needs to be in PDB format and NOT CIF for the script to work. Therefore the command may need to be modified as fetch 1dud, type=pdb as the default is now CIF. If the file is not present a warning will print with: “please provide filename”
3. Add the script functionality to PyMOL with: run quat.py
4. Use the script: quat 1dud
Note: the name used is that which appears on the right in PyMOL after loading the coordinates.
The image shows the addition of 2 symmetry-related units to obtain the biological assembly.
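The steps above can also be collected into a PyMOL command script. The sketch below, with the hypothetical helper names quat_commands and write_pml and the hypothetical file name make_assembly.pml, only writes the same commands listed in the steps to a text file; it assumes quat.py sits in PyMOL's working directory.

```python
from pathlib import Path


def quat_commands(pdb_id, local_file=None, script="quat.py"):
    """Return the PyMOL commands from the steps above, in order."""
    if local_file is None:
        # Ask for PDB format explicitly, since the default is now CIF.
        load_step = f"fetch {pdb_id}, type=pdb"
    else:
        load_step = f"load {local_file}"
    return [load_step, f"run {script}", f"quat {pdb_id}"]


def write_pml(path, commands):
    """Write one command per line to a PyMOL script file."""
    Path(path).write_text("\n".join(commands) + "\n")


for command in quat_commands("1dud"):
    print(command)
```

Calling write_pml("make_assembly.pml", quat_commands("1dud")) would produce a file that can then be run from the PyMOL command line with @make_assembly.pml.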
For the rhinovirus test the computation was a bit longer as there are 60 BIOMT matrices.
The 4RHV protomer (left), containing viral proteins VP1, VP2, VP3 and VP4, was first expanded and saved as a complete capsid (center); the pentamer shown was then extracted by simple text editing.
PyMOL can now save the coordinates with the menu cascade: File > Export Molecule…
The default option of this menu is to “Write segment identifier (segi) column”, which is very useful. This number is easiest to recognize if the file is saved in the older PDB format: it will be the column before the last one.
For the rhinovirus example, the first 5 segi values represent the first pentamer of the structure. The first and last lines of coordinates were as follows:
ATOM 1 N THR 1 17 32.241 35.354 96.610 1.00 15.33 1 N
...
ATOM 32701 OXT ASN 4 68 66.658 66.484 74.345 1.00 37.38 5 O1-
It is easy to recognize 1 and 5 as the segi numbers. This number thus makes it easier to select segments by text editing, as selecting graphically is more challenging in such a large structure.
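The text-editing step can also be scripted. The following is a minimal sketch, assuming the file was saved in the older fixed-width PDB format, where the segment identifier occupies columns 73 to 76 of each ATOM/HETATM record. The names segi_of and keep_segments are hypothetical, and the demo records are synthetic placeholders rather than real 4RHV lines.

```python
def segi_of(line):
    """Segment identifier: columns 73-76 of a fixed-width PDB record."""
    return line[72:76].strip()


def keep_segments(pdb_lines, wanted):
    """Keep only ATOM/HETATM records whose segi is in `wanted`."""
    wanted = {str(segi) for segi in wanted}
    return [
        line for line in pdb_lines
        if line.startswith(("ATOM", "HETATM")) and segi_of(line) in wanted
    ]


def fake_record(segi):
    """Synthetic record for the demo: only the segi field is meaningful."""
    return "ATOM".ljust(72) + str(segi).ljust(4) + " N\n"


# Segments 1 to 5 form the first pentamer in the rhinovirus example.
demo = [fake_record(segi) for segi in (1, 2, 5, 6, 60)]
pentamer = keep_segments(demo, range(1, 6))
print(len(pentamer))   # 3
```

With a real file, the lines would come from reading the exported capsid, and the kept lines would be written to a new PDB file.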
The pentamer was colored by command-line selection for each VP, e.g. select vp1, chain 1, followed by coloring through the menu: C > blues > slate
Alternate methods
Other commands are useful for this purpose:
– all_states
– split_states
– assembly
– Chain characters: assembly ID
Note: The assembly option does not require running a Python script, but the results of quat provide a better separation of the coordinates, which allows the selection of single chains thanks to the segi numbers.