Protein Backbone from alpha-Carbon

Building a Protein Backbone from alpha-Carbon

Files containing only C-alpha ATOM records can be the result of modeling but can also be a step in structure resolution. Theoretical models are now deposited in the ModelArchive(*).

How can we obtain a complete structure for further investigation, e.g. for use in molecular graphics software?

Carbon-Alpha tracing over text

Using Sybyl?

In 2019 we installed the now-defunct software Sybyl-X (v 2.1.1) on the teaching Macintoshes for a hands-on workshop.

Many years ago, when Sybyl was very expensive, I used an older version that ran on Silicon Graphics workstations for one specific task that Sybyl could handle: adding complete amino acid side chains to a C-alpha-only 3D coordinate (PDB) file. However, a complete backbone had to be constructed from the C-alpha coordinates before any side chain could be added.

Today I was wondering whether the installed Mac version could handle that. In theory it could, using the “Biopolymer module” that is installed, but after multiple failed attempts it seems that the installation needs a “PRODAT” binary database that is not installed.

One reason I was testing this is that the computers will likely be erased in the near future, and installing this particular version of Sybyl had been a difficult task back then.

Newer options

Since I was not getting there with Sybyl-X, I turned to a search engine and found some newer, free options. Not all of them are simple: some require using a web site, some are limited to non-commercial use, and some must be compiled from C++ code.


My first find was PULCHRA. I was happily surprised that a compiled version was part of the downloadable code. This is command-line software, not encumbered with a graphical interface. To try it I created a C-alpha-only PDB-formatted file from the 2nd monomer of 2biw, which I had used in my PyMOL tutorials. The PyMOL command is:

fetch 2biw, type=pdb2

The file is saved in PDB format in the home directory as 2biw.pdb2. The simplest way to create a CA-only file is a Terminal shell command that chains two calls to the pattern matcher grep with a Unix pipe:

grep "^ATOM " < 2biw.pdb2 | grep " CA " > 2biw_CA.pdb
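The same filter can also be sketched in Python, which may be convenient inside a PyMOL script. This is only an illustration, not part of the original workflow: the function name extract_ca_lines is hypothetical, and the code assumes the standard fixed-column PDB layout (record name in columns 1-6, atom name in columns 13-16). Testing the atom-name columns directly, rather than searching the whole line for " CA ", avoids accidental matches elsewhere in a record.

```python
def extract_ca_lines(pdb_text):
    """Return the ATOM records whose atom name is CA, one string per record.

    Assumes fixed-column PDB formatting: record name in columns 1-6,
    atom name in columns 13-16 (0-based slices [0:6] and [12:16]).
    """
    kept = []
    for line in pdb_text.splitlines():
        is_atom_record = line[0:6] == "ATOM  "
        is_alpha_carbon = line[12:16].strip() == "CA"
        if is_atom_record and is_alpha_carbon:
            kept.append(line)
    return kept
```

Reading 2biw.pdb2 as text, passing it to this function, and writing the returned lines to 2biw_CA.pdb should give the same kind of file as the grep pipeline.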

Using the PULCHRA software is easy. If the pulchra executable and the file are in the same directory it is simply:

pulchra 2biw_CA.pdb

The resulting file 2biw_CA.rebuilt.pdb and the original can be compared within PyMOL: original in gray and reconstructed in gold yellow.



2biw compared to reconstructed


A PyMOL alignment command provides a deviation estimate: RMSD = 0.021 (460 to 460 atoms)
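To make the deviation estimate concrete, here is a minimal sketch of an RMSD calculation over matched C-alpha atoms. It is not what PyMOL does internally: PyMOL's align superimposes the two structures first and can reject outlier atoms during refinement, whereas this sketch assumes the two files already share the same coordinate frame and list the same residues in the same order. The function names ca_coordinates and rmsd are hypothetical, and the parsing assumes standard fixed-column PDB records (x, y, z in columns 31-54).

```python
import math


def ca_coordinates(pdb_text):
    """Collect (x, y, z) tuples for every CA atom in the ATOM records."""
    coords = []
    for line in pdb_text.splitlines():
        if line[0:6] == "ATOM  " and line[12:16].strip() == "CA":
            # Columns 31-38, 39-46 and 47-54 hold x, y and z in angstroms.
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    No superposition is performed; atoms are paired by list position.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared_sum += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

Applied to the CA atoms of the original and rebuilt files, a value near zero would indicate that the alpha-carbon positions were essentially preserved during reconstruction.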

For most purposes the quality of the resulting file would be quite sufficient.

Comparison to others

A later paper (Moore et al. 2013) proposing a new method (PD2) provides an extensive comparison table covering multiple proteins and different software. A short extract is below:

RMSD (Å) from this work; RMSD (Å) from previously published methods
Structure PD2−min PD2+min BBQ BBQ MaxSprout Milik et al. PULCHRA SABBAC REMO
1CRN 0.316 0.306 0.470 0.456 0.408 0.358 0.317 0.509
1CTF 0.264 0.239 0.398 0.388 0.750 0.461 0.594 0.327 0.574
1TIM 0.572 0.569 0.621 0.643 0.668 0.595
1UBQ 0.225 0.198 0.214 0.259 0.320 0.324 0.376 0.267 0.490
2ALP 0.398 0.382 0.430 0.462 0.439 0.453 0.691 0.513 0.525
2CTS 0.378 0.366 0.432 0.422 0.484 0.369 0.493 0.417 0.612

(Truncated table from Moore et al. 2013)

PD2 is available as C++ source code (which needs to be compiled) or can be used via the PD2 ca2main web server.


PULCHRA: Piotr Rotkiewicz and Jeffrey Skolnick. Fast procedure for reconstruction of full-atom protein models from reduced representations. Journal of Computational Chemistry 2008, 29(9), 1460–1465. doi: 10.1002/jcc.20906

PD2: Benjamin L. Moore, Lawrence A. Kelley, James Barber, James W. Murray, James T. MacDonald. High-quality protein backbone reconstruction from alpha carbons using Gaussian mixture models. Journal of Computational Chemistry 2013, 34, 1881–1889. doi: 10.1002/jcc.23330

(*) From ModelArchive web site:

Since 2006, only structures that have been determined experimentally are allowed to be deposited in the PDB, and theoretical models of macromolecular structures are no longer part of the PDB archive (Berman et al, 2006). ModelArchive is being developed following a community recommendation during a workshop on applications of protein models in biomedical research (Schwede et al, 2009).

ModelArchive provides a unique stable accession code (DOI) for each deposited model, which can be directly referenced in the corresponding manuscripts. Besides of the actual model coordinates, archiving of models should include sufficient details about assumptions, parameters and constraints applied in the simulation to allow the user of a model to assess and if necessary reproduce the simulation.

The development of ModelArchive is supported by the SIB – Swiss Institute of Bioinformatics.