Building a Protein Backbone from alpha-Carbon
Files containing only C-alpha ATOM records can be the result of modeling but can also be a step in structure resolution. Theoretical models are now deposited in the ModelArchive(*).
How can we obtain a complete structure for further investigation, e.g. for use in molecular graphics software?
In 2019 we installed the now-defunct software Sybyl-X (v 2.1.1) on the teaching Macintoshes for a hands-on workshop.
Many years ago, when Sybyl was very expensive, I used an older version that ran on Silicon Graphics workstations for a specific task that Sybyl could handle: adding complete amino acid side-chains to a C-alpha-only 3D coordinate (PDB) file. However, the first step was to construct a complete backbone from the C-alpha coordinates before any side-chain could be added.
Today I was wondering if the Mac version that is installed could handle that. In theory it could, using the “Biopolymer module” that is installed, but after multiple failed attempts it seems that the installation would need a “PRODAT” binary database that is not installed.
One reason I was testing this is that the computers will likely be erased in the near future, and installing this particular version of Sybyl had been a difficult task back then.
Since I was not getting there with Sybyl-X, I went to the search engine and found some newer, free options. They are not all simple: some require using a web site, some are limited to non-commercial use, and some require compiling from C++ code.
My first find was PULCHRA. I was happily surprised that a compiled version was part of the downloadable code. This is command-line software and is not encumbered with a graphical interface. To try it I created a C-alpha-only PDB-formatted file from the 2nd monomer of 2biw that I had used in my PyMOL tutorials. The PyMOL command line is:
fetch 2biw, type=pdb2
The file gets saved (in PDB format) within the home directory as 2biw.pdb2. To create a CA-only file, a Terminal shell command is the simplest method, combining the pattern matcher grep with standard Unix pipes and redirection:
grep "^ATOM " < 2biw.pdb2 | grep " CA " > 2biw_CA.pdb
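The same filter can also be written against the fixed PDB columns rather than as a text pattern, which avoids accidental matches elsewhere on the line. The following is only a minimal sketch in Python, not part of the original workflow: the function names `extract_ca_lines` and `write_ca_only` are made up for illustration, and it assumes standard fixed-width PDB records (record name in columns 1-6, atom name in columns 13-16). Like the grep pipeline, it does not deal with alternate locations.

```python
def extract_ca_lines(pdb_lines):
    """Return the ATOM records whose atom-name field (columns 13-16) is CA."""
    kept = []
    for line in pdb_lines:
        is_atom = line[:6].strip() == "ATOM"       # record name, columns 1-6
        is_alpha = line[12:16].strip() == "CA"     # atom name, columns 13-16
        if is_atom and is_alpha:
            kept.append(line)
    return kept


def write_ca_only(src_path, dst_path):
    """Copy only the C-alpha ATOM records from src_path to dst_path."""
    with open(src_path) as src:
        ca_lines = extract_ca_lines(src)
    with open(dst_path, "w") as dst:
        dst.writelines(ca_lines)
    return len(ca_lines)  # number of C-alpha records written
```

With the files used here, the call would be something like `write_ca_only("2biw.pdb2", "2biw_CA.pdb")`.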
Using the PULCHRA software is easy. If the pulchra executable and the file are in the same directory, it is simply a matter of running pulchra with the C-alpha file as its argument.
The resulting file 2biw_CA.rebuilt.pdb and the original can be compared within PyMOL: original in gray and reconstructed in gold yellow.
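Before the visual comparison, a quick sanity check can confirm that every residue in the rebuilt file now carries the four backbone atoms. The sketch below is an illustration only: the helper name `missing_backbone_atoms` is hypothetical, and it assumes standard fixed-width PDB ATOM records.

```python
from collections import defaultdict

BACKBONE = {"N", "CA", "C", "O"}


def missing_backbone_atoms(pdb_lines):
    """Map each incomplete residue to the backbone atom names it lacks.

    An empty dict means every residue has N, CA, C and O.
    """
    atoms_by_residue = defaultdict(set)
    for line in pdb_lines:
        if line[:6].strip() == "ATOM":
            # chain ID (col 22), residue number + insertion code (cols 23-27),
            # residue name (cols 18-20)
            residue = (line[21], line[22:27].strip(), line[17:20])
            atoms_by_residue[residue].add(line[12:16].strip())
    return {
        residue: sorted(BACKBONE - names)
        for residue, names in atoms_by_residue.items()
        if not BACKBONE <= names
    }
```

Run on the C-alpha-only input, this reports C, N and O as missing for every residue; run on the rebuilt file, an empty result would indicate a complete backbone.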
A PyMOL alignment command provides a deviation estimate:
RMSD = 0.021 (460 to 460 atoms)
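For a check outside PyMOL, the C-alpha deviation can also be computed directly from the two files. This is a rough sketch under stated assumptions, not a reproduction of PyMOL's alignment: it performs no superposition and no outlier rejection, so it assumes both files are in the same coordinate frame, and its value will not necessarily match the figure above. The function names `read_ca_coords` and `ca_rmsd` are invented for this example.

```python
import math


def read_ca_coords(pdb_lines):
    """Return {(chain, residue number + insertion code): (x, y, z)} for C-alpha atoms."""
    coords = {}
    for line in pdb_lines:
        if line[:6].strip() == "ATOM" and line[12:16].strip() == "CA":
            key = (line[21], line[22:27])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords.setdefault(key, xyz)  # keep the first record if a residue repeats
    return coords


def ca_rmsd(coords_a, coords_b):
    """RMSD over C-alpha atoms present in both sets, without superposition."""
    shared = sorted(set(coords_a) & set(coords_b))
    if not shared:
        raise ValueError("no C-alpha atoms in common")
    total = 0.0
    for key in shared:
        total += sum((p - q) ** 2 for p, q in zip(coords_a[key], coords_b[key]))
    return math.sqrt(total / len(shared)), len(shared)
```

With the files from this post, the inputs would be `read_ca_coords(open("2biw_CA.pdb"))` and `read_ca_coords(open("2biw_CA.rebuilt.pdb"))`; the function returns the RMSD together with the number of atoms compared.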
For most purposes the quality of the resulting file would be quite sufficient.
Comparison to others
A later paper (Moore et al. 2013) proposing a new method (PD2) provides an extensive comparison table covering multiple proteins and several software packages. A short extract is below:
|SD (Å) from this work||RMSD (Å) from previously published methods|
|Structure||PD2−min||PD2+min||BBQ||BBQ||MaxSprout||Milik et al.||PULCHRA||SABBAC||REMO|
(Truncated table from Moore et al. 2013)
The PD2 code is available as C++ source (which needs to be compiled), or the method can be used via the PD2 ca2main web server.
PULCHRA: Piotr Rotkiewicz and Jeffrey Skolnick. Fast procedure for reconstruction of full-atom protein models from reduced representations. Journal of Computational Chemistry 2008, 29(9), 1460–1465. doi: 10.1002/jcc.20906
PD2: Benjamin L. Moore, Lawrence A. Kelley, James Barber, James W. Murray, James T. MacDonald. High-quality protein backbone reconstruction from alpha carbons using Gaussian mixture models. Journal of Computational Chemistry 2013, 34, 1881–1889. doi: 10.1002/jcc.23330
(*) From ModelArchive web site:
Since 2006, only structures that have been determined experimentally are allowed to be deposited in the PDB, and theoretical models of macromolecular structures are no longer part of the PDB archive (Berman et al, 2006). ModelArchive is being developed following a community recommendation during a workshop on applications of protein models in biomedical research (Schwede et al, 2009).
ModelArchive provides a unique stable accession code (DOI) for each deposited model, which can be directly referenced in the corresponding manuscripts. Besides of the actual model coordinates, archiving of models should include sufficient details about assumptions, parameters and constraints applied in the simulation to allow the user of a model to assess and if necessary reproduce the simulation.
The development of ModelArchive is supported by the SIB – Swiss Institute of Bioinformatics.