Automatic structure prediction of HIV coreceptor CCR5
Kuan-ming Lin (Department of Computer Science), Jieru Zheng (Department of Chemistry), Ji Zhang (Department of Biomedical Engineering)
Duke University, Durham, North Carolina, USA

Introduction

Currently, there is no cure or vaccine for HIV or AIDS, and most HIV patients rely on anti-retroviral treatments that target the reverse transcription process inside cells. Since the discovery of the chemokine coreceptor function of CCR5 and CXCR4, many scientists have preferred CCR5 antagonists to other medications because they stop the infection process before HIV even enters a cell (Fig. 1). However, owing to technical difficulties in resolving membrane proteins by either X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, no structural or kinetic data are available for this membrane protein system. In this study, we generate structure predictions for CCR5 with a number of existing automatic structure prediction tools and compare them with published CCR5 models produced by molecular dynamics simulation.

Fig. 1. Interaction between the HIV envelope protein gp120 and the CCR5 receptor on the surface of the target T cell. (Picture from http://www.aidsreagent.org/techlib/default.cfm?Action=HIVGraphics)

Data postprocessing

The predicted secondary structures and alignment profiles from the different algorithms come in various formats such as FASTA, BLAST, ALI, and A2M. To obtain comparable results from all algorithms, we wrote Perl scripts that translate each output into our desired template format. Beyond the secondary structure, the 3D configuration is of greater interest because it is what actually forms the binding grooves of the receptor. As almost none of the algorithms returned a 3D structure of the protein, the second postprocessing task is to build 3D structures from the templates. MODELLER 8v1 performs this step. It outputs a standard PDB file, which can be parsed by most visualization software.

Results

Table 1 highlights five of our predicted secondary structures for CCR5, compared with five available theoretical models published in (Liu 2003) and later. The main structure of the CCR5 protein consists of seven α-helices. With the exception of GenTHREADER, most algorithms recover all the α-helices, with positions that align well along the sequence. These methods also return Bovine Rhodopsin, which likewise contains seven α-helices, as the best-aligned protein. Predictions for other structural elements, however, are not consistent. For example, the small β-strands are placed differently, and some methods (PROSPECT, FUGUE) predict no β-strands at all. The theoretical model does not show any β-strands, suggesting that the β-strand predictions are probably unreliable.

The 3D structure alignments of our predictions with the theoretical model are depicted in Fig. 3 and quantified in Table 1 via the RMSD (root-mean-square deviation) between our models and the theoretical model. With different template proteins, the RMSD is usually greater than 10 Å, outside the range typical of a good alignment (<5 Å). After we aligned predicted models based on the same template protein, the RMSD dropped to around 6 Å, which suggests that the 3D constructor (MODELLER) relies largely on the given template structure.

Fig. 3. Alignment between the theoretical model (blue) and the predictions of PROSPECT-local (red). Both use Bovine Rhodopsin as the template, but from structures of different resolutions. Left: 1LN6 (NMR, RMSD = 14.57 Å); right: 1F88 (X-ray, RMSD = 12.30 Å).

Prediction methods

Currently available structure prediction algorithms were grouped in (Ginalski 2005) into those based on the physical principles that underlie the protein folding mechanism and those relying on statistics or evolutionary information. The former are currently far from capable of generating large-scale protein structures. We therefore study the statistics-based algorithms, which are further categorized into three groups:

• Sequence-only methods – these algorithms do not consider any structure information when comparing two sequences.
• Threading methods – named after the conceptual threading of the protein sequence through the structure of the template, these methods generate profiles by calculating the probabilities of the 20 amino acids at a position based on the surrounding structure.
• Hybrid methods – these combine the former two techniques to use both sequence and structure information.

All the methods we employed are available online and open to academic users.
Specifically, nine algorithms were applied; their relations are shown in Fig. 2.

• Sequence-only methods: PDB-BLAST, FFAS03, SAM-T99, SAM-T02
• Hybrid methods: PSIPRED, FUGUE, SPARKS2, Phyre
• Threading methods: PROSPECT
Fig. 2. Categorization of the nine algorithms studied.

Table 1. Predicted secondary structure alignments and RMSD values for CCR5, with α-helices in red, β-strands in green, coils in black, and S-S bonds in purple.
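
The RMSD values reported for CCR5 compare atom positions in a predicted model with those in the theoretical model. As a minimal sketch of that quantity, the Python function below computes the RMSD between two coordinate lists. It assumes the two structures have already been superimposed and that atoms are paired one-to-one (for example, matched Cα atoms); the function name `rmsd` and the toy coordinates are illustrative and are not taken from the poster's own scripts.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, in the units of the input (e.g. angstroms).

    Assumption: the structures are already superimposed and the i-th
    atom of coords_a corresponds to the i-th atom of coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates, made up for illustration (not CCR5 data).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0), (2.0, 0.0, 3.0)]
print(rmsd(model, reference))  # every atom is displaced by 3 units -> 3.0
```

In practice the superposition step matters as much as the formula: an RMSD computed without first finding the optimal rotation and translation will overstate the difference between two models.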
PDB-BLAST consists of two PSI-BLAST runs. The first run searches the non-redundant database to derive a profile, which the second run uses to search the PDB amino acid database.
FFAS03 builds profiles with PSI-BLAST and then compares them with one another using a dot-product metric and a standard Smith-Waterman dynamic programming algorithm.
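
To illustrate how a dot-product metric can be combined with Smith-Waterman dynamic programming, the sketch below scores the best local alignment between two profiles. It is not FFAS03's actual implementation: the constant `shift` subtracted from each column score, the linear `gap` penalty, and the three-letter toy alphabet are all assumptions made for illustration.

```python
def dot(u, v):
    """Dot product of two equally long frequency vectors."""
    return sum(a * b for a, b in zip(u, v))


def smith_waterman_profiles(prof_a, prof_b, gap=0.5, shift=0.3):
    """Best local alignment score between two profiles.

    Each profile is a list of per-position frequency vectors.
    A column pair scores dot(column_a, column_b) - shift, so that
    dissimilar columns are penalized (shift is an assumed constant).
    Gaps cost a fixed linear penalty (also an assumption).
    """
    rows, cols = len(prof_a), len(prof_b)
    # H[i][j] holds the best score of a local alignment ending at
    # position i of prof_a and position j of prof_b.
    H = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = H[i - 1][j - 1] + dot(prof_a[i - 1], prof_b[j - 1]) - shift
            H[i][j] = max(0.0, match, H[i - 1][j] - gap, H[i][j - 1] - gap)
            best = max(best, H[i][j])
    return best


# Toy profiles over a three-letter alphabet (illustrative only).
prof_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
prof_b = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# prof_a matches positions 2-4 of prof_b; each identical column
# contributes 1.0 - shift to the local alignment score.
print(smith_waterman_profiles(prof_a, prof_b))
```

Subtracting a shift is one simple way to make unrelated columns score below zero, which local alignment needs in order to terminate an alignment rather than extend it indefinitely.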
SAM-T99 uses BLAST to predict the protein secondary structure and then builds HMMs to search PDB for similar proteins. SAM-T02 improves on SAM-T99 by generating an additional HMM scoring function from the secondary structure predictions.
PSIPRED and its component GenTHREADER incorporate the predictions of four separate pairs of feed-forward neural networks, whose inputs come from PSI-BLAST.
FUGUE uses environment-specific substitution tables and local gap penalties to produce a list of potential homologues and alignments.
SPARKS2 scores alignments with sequence profiles, secondary structure, and knowledge-based energy tables derived from torsion-angle information and surface contacts.
Phyre threads the sequence using secondary structures, sequence profiles from close and remote homologues, and the propensities of the residues for varying levels of solvent accessibility.
PROSPECT uses threading templates containing both protein chains and compact domains. Such threading algorithms are able to find remotely homologous sequences.

Conclusion

The protein structure prediction algorithms provide an economical and efficient tool for predicting the unknown structure of CCR5. All algorithms studied in this work produce reasonable 3D structures, although they cannot predict certain elements such as S-S bonds and the upper binding loops. Statistics-based predictions should therefore be used to provide a first approximation of the real structure and to serve as the starting point for more sophisticated methods such as molecular dynamics simulation.

References

Ginalski, K., Grishin, N. V., Godzik, A., and Rychlewski, L. (2005). Practical lessons from protein structure prediction. Nucleic Acids Res. 33(6):1874-91.
Liu, S., Fan, S., and Sun, Z. (2003). Structural and functional characterization of the human CCR5 receptor in complex with HIV gp120 envelope glycoprotein and CD4 receptor by molecular modeling studies. J. Mol. Modeling 9(5):329-36.

For further information, contact email@example.com.