Improved prediction of protein-protein binding sites using a support vector machines approach



James R. Bradford and David R. Westhead*



School Of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK.



Telephone: +44 113 2333116



Fax: +44 113 2333167



Email: westhead@bmb.leeds.ac.uk



*To whom correspondence should be addressed.



Running title: Binding site prediction using SVMs.







Important properties





Our original choice of the seven properties used for training the SVM was based on past

studies that have implicated them in distinguishing binding sites from the rest of the protein

surface. Here we present our own posterior analysis of how each property contributed to

training the SVM. Although the SVM did not explicitly provide such details, training the SVM

on each property separately gave us a measure of their relative importance. As well as carrying

out this procedure on the whole training set, we used two subsets of the training set: one

containing the 66 proteins involved in transient interactions, the other containing the 114

proteins involved in obligomeric interactions. This allowed us to compare important properties

at transient binding sites with those at obligomeric binding sites.
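
The per-property procedure described above can be sketched as follows. This is a minimal illustration, not the code used in this study: the function `score_each_property`, the stand-in scorer `threshold_accuracy` and the toy patch values are all hypothetical. In particular, the stand-in scorer is a simple one-threshold rule, not an SVM; in practice it would be replaced by a routine that trains the SVM on the single property and returns its performance.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer type: takes the values of ONE property and the 0/1 labels,
# trains some classifier on them, and returns a training score.
Scorer = Callable[[Sequence[float], Sequence[int]], float]


def score_each_property(
    patches: List[Dict[str, float]],
    labels: List[int],
    properties: List[str],
    fit_and_score: Scorer,
) -> Dict[str, float]:
    """Train and score on one property at a time; return {property: score}."""
    results = {}
    for name in properties:
        # Keep only the column for this property, discarding all others.
        column = [patch[name] for patch in patches]
        results[name] = fit_and_score(column, labels)
    return results


def threshold_accuracy(values: Sequence[float], labels: Sequence[int]) -> float:
    """Stand-in for SVM training (an assumption): best training accuracy
    achievable with a single threshold in either direction."""
    best = 0.0
    for t in values:
        for sign in (1, -1):
            preds = [1 if sign * v >= sign * t else 0 for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            best = max(best, acc)
    return best


# Toy, made-up patches: 1 = interacting, 0 = non-interacting.
patches = [
    {"hydrophobicity": 0.9, "curvedness": 0.2},
    {"hydrophobicity": 0.8, "curvedness": 0.7},
    {"hydrophobicity": 0.1, "curvedness": 0.3},
    {"hydrophobicity": 0.2, "curvedness": 0.6},
]
labels = [1, 1, 0, 0]

scores = score_each_property(
    patches, labels, ["hydrophobicity", "curvedness"], threshold_accuracy
)
# Properties ordered from most to least informative on this toy data.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Running the same function on a subset of the patches (for example, only those from transient complexes) would give the subset-specific ranking.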





We evaluated training performance using the Matthews Correlation Coefficient (MCC;

Matthews 1975):







$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$



where TP = true positives (correctly classified interacting patches), TN = true negatives

(correctly classified non-interacting patches), FP = false positives (non-interacting patches

classified as interacting patches), FN = false negatives (interacting patches classified as non-

interacting patches). An MCC of +1 represents perfect training classification (no false

positives or negatives) whereas –1 represents a complete failure (all interacting patches

classified as non-interacting patches and vice versa). Because the non-interacting patch was

chosen at random, it was not possible to obtain identical results from any two training runs.

Therefore, we repeated training on each data set ten times, and calculated the mean and

standard deviation of our performance indicators.
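
As a worked illustration of the MCC and of summarising repeated runs, the short Python sketch below computes the coefficient from the four counts and reports a mean and standard deviation. The confusion counts are made up for the example (three runs rather than ten, for brevity), the zero-denominator convention is an assumption, and the use of the sample standard deviation is also an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from the four confusion counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when any marginal total is zero,
    # where the coefficient is otherwise undefined.
    return numerator / denominator if denominator else 0.0


# Limiting cases described in the text.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)    # no false positives or negatives
inverted = mcc(tp=0, tn=0, fp=50, fn=50)   # every patch misclassified

# Hypothetical (TP, TN, FP, FN) counts from repeated training runs, each
# with a different randomly chosen set of non-interacting patches.
runs = [(45, 40, 10, 5), (44, 42, 8, 6), (46, 39, 11, 4)]
values = [mcc(*counts) for counts in runs]

mean_mcc = statistics.mean(values)
sd_mcc = statistics.stdev(values)  # sample standard deviation (assumption)
summary = f"{mean_mcc:.2f} ± {sd_mcc:.2f}"
```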





Generally, results obtained using the whole training set were as expected (Supplementary

Table 3a). Attributes based on interface residue propensity, hydrophobicity and ASA achieved

the highest MCC values. Shape index and conservation also performed well. Electrostatic

potential seemed to have some differentiating power even though it has been used in past

studies more successfully to predict DNA binding sites (Jones et al. 2003). In general, using

the SVM on all attributes gave better performance than any one attribute alone, indicating that

attributes give complementary information.





Supplementary Table 3: Importance of each attribute in training the SVM.





Results from training on obligomeric and transient interfaces separately go some way to

explaining the heterogeneous cross validation results. All the properties that are important at

an obligomeric interface (Supplementary Table 3b) seem to be important to a transient

interface (Supplementary Table 3c) as well. The major differences concern electrostatic

potential and curvedness. The higher MCC value achieved with curvedness on transient

interfaces probably reflects the number of enzyme-inhibitor interfaces in the subset. It is

common for a protrusion on the inhibitor surface to bind inside a cleft on the enzyme surface

and these protrusions and clefts will be highly curved. Electrostatic potential has almost no

distinguishing power on transient interfaces, achieving an MCC value of only 0.08 ± 0.01 in

contrast to obligomeric interfaces where it achieves an MCC value of 0.27 ± 0.05.





A higher MCC value (0.72 ± 0.03) was achieved with training on obligomeric interfaces using

all attributes than with transient interfaces (0.63 ± 0.04) suggesting that obligomeric interfaces

contain stronger signals that distinguish them from the rest of the protein surface than transient

interfaces.





References





Matthews,B.W. (1975) Comparison of the predicted and observed secondary structure of T4

phage lysozyme, Biochim Biophys Acta, 405, 442-451.

Jones,S., Shanahan,H.P., Berman,H.M. and Thornton,J.M. (2003) Using electrostatic potentials

to predict DNA-binding sites on DNA-binding proteins, Nucleic Acids Res, 31, 7189-7198.


