
Geometry and Protein Evolution

Protein Structure Space

Patrice Koehl Computer Science and Genome Center http://www.cs.ucdavis.edu/~koehl/

From Sequence to Function

[Figure: Sequence → Structure → Function; the function panel shows a bound ligand.]

Sequence: KKAVINGEQIRSISDLHQTLKK WELALPEYYGENLDALWDCLTG VEYPLVLEWRQFEQSKQLTENG AESVLQVFREAKAEGCDITI

Protein Structure Space
Example structures (PDB ID: length):
1CTF: 68 AA
1TIM: 247 AA
1K3R: 268 AA
1A1O: 384 AA
1NIK: 4504 AA
1AON: 8337 AA

Outline
•Protein Structure Space
Dimension?

•Protein Shape Descriptors
Differential Geometry Tools

•Complexity of Protein Structures
Are Proteins 3D, or 1D objects?

•Classifying Proteins
The Shapes of Protein Structures


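Classification of Protein Structure: CATH
[Figure: the CATH hierarchy (Class, Architecture, Topology, Homologous superfamily). Class level (C): Alpha, Mixed Alpha Beta, Beta. Lower-level examples at the Architecture (A) and Topology (T) levels: Barrel, Sandwich, TIM Barrel, Super Roll, Other Barrel.]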

Protein Structure Space
Test set
2,930 proteins selected from the 23,000 proteins in the PDB, with no sequence similarity between them (no pair with a FASTA E-value below 10^-4)

Reference structural similarity defined from CATH:
769 folds; 104,000 pairs of similar structures out of 4,600,000 pairs

Performance measure: ROC curve
(Receiver Operating Characteristic)

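Projecting Protein Structure Space
Distance matrix → metric matrix → points in space:

$$D = \begin{pmatrix} 0 & \cdots & d_{1N} \\ \vdots & \ddots & \vdots \\ d_{N1} & \cdots & 0 \end{pmatrix} \quad\longrightarrow\quad G = X X^{T} \quad\longrightarrow\quad X$$

D: distance matrix; G: metric matrix; X: coordinates of the points in space.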
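The chain from distance matrix to metric matrix to points is classical multidimensional scaling. A minimal sketch with NumPy, assuming a symmetric matrix of pairwise distances (for example cRMS values): double centering turns the squared distances into the metric matrix G, and its leading eigenvectors, scaled by the square roots of their eigenvalues, give the coordinates X. Keeping only the first k coordinates is one way to read a projection such as "first 20 coordinates"; the name `classical_mds` is our own.

```python
import numpy as np


def classical_mds(D, k):
    """Embed N objects in k dimensions from an N x N distance matrix D.

    Returns (X, w): X holds the N x k coordinates and w all eigenvalues
    of the metric matrix, sorted in decreasing order.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double centering of the squared distances gives G = X X^T
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    # G is symmetric; eigh returns ascending eigenvalues, so reverse them
    w, V = np.linalg.eigh(G)
    w, V = w[::-1], V[:, ::-1]
    # Scale the k leading eigenvectors; small negative eigenvalues
    # (distances that are not exactly Euclidean) are clipped to zero
    X = V[:, :k] * np.sqrt(np.clip(w[:k], 0.0, None))
    return X, w


# Usage: the corners of a unit square are recovered in 2D
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D = np.linalg.norm(square[:, None] - square[None, :], axis=-1)
X, w = classical_mds(D, 2)
```

The recovered coordinates match the input only up to a rotation or reflection; what the projection preserves is the set of pairwise distances.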

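Projecting Protein Structure Space
Binary distance matrices derived from CATH (zeros on the diagonal):

Class: $d_{lk} = 0$ if proteins $k$ and $l$ belong to the same CATH class, $1$ otherwise.

Fold: $d_{lk} = 0$ if proteins $k$ and $l$ belong to the same CATH fold, $1$ otherwise.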
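A sketch of these binary matrices, assuming $d_{lk} = 0$ for proteins sharing a label and $1$ otherwise. With m distinct labels the points collapse onto the vertices of a regular simplex, so the metric matrix has m − 1 nonzero eigenvalues; with CATH's four classes that gives three coordinates, which fits the "first 3 coordinates" derived from the CATH class matrix. The toy labels and function names are hypothetical.

```python
import numpy as np


def label_distance_matrix(labels):
    """Binary distances: d_lk = 0 if proteins k and l share a label, 1 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def metric_matrix_eigenvalues(D):
    """Eigenvalues (decreasing) of the double-centred metric matrix G = X X^T."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    return np.linalg.eigvalsh(G)[::-1]


# Hypothetical toy set: eight proteins spread over the four CATH class labels
classes = ["1", "1", "2", "2", "3", "3", "4", "4"]
eigenvalues = metric_matrix_eigenvalues(label_distance_matrix(classes))
n_coordinates = int(np.sum(eigenvalues > 1e-9))
print(n_coordinates)  # 3
```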

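Protein Structure Similarity
Coordinate root mean square distance (cRMS):

$$\mathrm{cRMS}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| a_i - (R\, b_i + T) \right\|^{2}}$$

N: number of equivalent atoms between A and B; R, T: the rigid transformation (rotation R, translation T) that minimizes cRMS.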
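The optimal rigid transformation in the cRMS definition has a closed-form solution. The sketch below uses the Kabsch algorithm (an SVD of the 3 × 3 covariance matrix), a standard way to obtain R and T. It assumes the N equivalent atoms are already paired row by row; finding those equivalences (the structural alignment) is a separate problem that this sketch does not address.

```python
import numpy as np


def crms(A, B):
    """cRMS between two N x 3 sets of equivalent atoms after optimally
    superimposing B onto A (Kabsch algorithm for R and T)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # The optimal translation T brings both centroids to the origin
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    # The optimal rotation R comes from the SVD of the 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(B0.T @ A0)
    # Flip the last axis if needed so that R is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = A0 - B0 @ R.T
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

A rotated and translated copy of a structure should give a cRMS of essentially zero, while its mirror image generally should not.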

Protein Structure Classes
Measure of structure similarity: cRMS after optimal superposition (Structal).
[Figure: eigenvalues of the metric matrix.]

A Picture of the Protein Structure Space
[Figure: projection of the protein structure space, showing regions of α proteins, β proteins, and α and β proteins.]
[Figure: the same projection with example domains labeled: 1repC2, 1bdo00, 1a81G2, 2bi6H0, 1sfcK0.]


Protein Fold Space ROC Analysis
(Receiver Operating Characteristic)
[Figure: ROC curve; x-axis: rate of true negatives (%), y-axis: rate of true positives (%), both from 0 to 100. A “perfect” measure has area = 1.0; a random measure has area = 0.5.]
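The area under an ROC curve equals the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair. A minimal sketch, assuming a similarity score where higher means more similar (for a distance such as cRMS, pass negated values), and assuming that the slides' areas follow this standard definition. It uses the rank-sum (Mann-Whitney) identity so that millions of pairs only cost one sort; the function name is hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve for a similarity score (higher = more similar).

    The area equals the fraction of (positive, negative) pairs in which the
    positive pair scores higher, with ties counted as one half.
    """
    labeled = [(s, 1) for s in positive_scores] + [(s, 0) for s in negative_scores]
    labeled.sort(key=lambda item: item[0])
    # Assign 1-based ranks, giving tied scores the average of their ranks
    ranks = [0.0] * len(labeled)
    i = 0
    while i < len(labeled):
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n_pos, n_neg = len(positive_scores), len(negative_scores)
    rank_sum = sum(r for r, (_, is_pos) in zip(ranks, labeled) if is_pos)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A measure that ranks every positive pair above every negative pair gets 1.0; one that cannot separate them at all gets about 0.5.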

Protein Fold Space ROC Analysis
(Receiver Operating Characteristic)

True positives
pairs of proteins that belong to the same CATH topology (T level)

True negatives
pairs of proteins that belong to the same CATH class (C level) but not to the same topology (T level)
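These two definitions translate directly into a labeling rule for protein pairs. A sketch, assuming CATH codes are written as dot-separated "C.A.T.H" strings (for example "3.20.20.70"). The slides do not say how pairs from different classes are treated; here they are excluded (returned as None), which is an assumption.

```python
def pair_label(cath_a, cath_b):
    """Label a protein pair from CATH codes written as 'C.A.T.H'.

    'positive': same topology (first three levels equal)
    'negative': same class but different topology
    None:       different classes (excluded; an assumption)
    """
    a, b = cath_a.split("."), cath_b.split(".")
    if a[:3] == b[:3]:
        return "positive"
    if a[0] == b[0]:
        return "negative"
    return None
```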

Protein Fold Space
[Figure: ROC curves; x-axis: rate of true negatives (%), y-axis: rate of true positives (%).]
ROC areas: CATH Fold20: 0.98; FASTA: 0.54; CATH Class: 0.51

Fold20: first 20 coordinates derived from the CATH fold matrix
CATH Class: first 3 coordinates derived from the CATH class matrix

Protein Fold Space
[Figure: ROC curves; x-axis: rate of true negatives (%), y-axis: rate of true positives (%).]
ROC areas: Structal: 0.88; FASTA: 0.54

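Protein Structure Features
Global radius of curvature:
[Figure: three points x, y, z on the curve and the circle of radius R(x, y, z) through them.]

$$R(x, y, z) = \frac{d(x, y)}{2 \sin \hat{z}}$$

where $\hat{z}$ is the angle at $z$ in the triangle $(x, y, z)$, and

$$\rho_G(x) = \min_{(y, z)} R(x, y, z)$$

Thickness:

$$\Delta = \min_{x} \rho_G(x)$$

(Gonzalez & Maddocks, PNAS, 1999, 96:4769)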
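A sketch of these definitions for a protein backbone, assuming the curve is the polygon through distinct C-alpha positions and that x, y, z are restricted to its vertices. That vertex-only restriction is a simplification of the continuous definitions above, not a claim about how the thickness on the slides was computed. R uses the law of sines exactly as written, with the sine of the angle at z obtained from a cross product.

```python
import math
from itertools import combinations


def circumradius(x, y, z):
    """R(x, y, z) = d(x, y) / (2 sin of the angle at z); inf if collinear.

    The three points must be distinct.
    """
    u = [xi - zi for xi, zi in zip(x, z)]
    v = [yi - zi for yi, zi in zip(y, z)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    # |u x v| = |u| |v| sin(angle at z)
    sin_z = math.hypot(*cross) / (math.hypot(*u) * math.hypot(*v))
    if sin_z < 1e-12:
        return math.inf
    return math.dist(x, y) / (2.0 * sin_z)


def global_radius_of_curvature(points, i):
    """rho_G at vertex i: smallest R(x_i, y, z) over pairs of other vertices."""
    others = [p for j, p in enumerate(points) if j != i]
    return min(circumradius(points[i], y, z) for y, z in combinations(others, 2))


def thickness(points):
    """Delta: smallest global radius of curvature over all vertices."""
    return min(global_radius_of_curvature(points, i) for i in range(len(points)))


# Usage: six points on a circle of radius 5, so every triple has R = 5
circle = [(5 * math.cos(k * math.pi / 3), 5 * math.sin(k * math.pi / 3), 0.0)
          for k in range(6)]
delta = thickness(circle)
```

The triple loop is O(n³) in the number of residues, which is acceptable for a sketch but slow for large proteins.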

Thickness of a protein structure

Δ = 2.60 Å

Curvature Feature Vector

$$U_p = \left( \iiint \frac{dC_x\, dC_y\, dC_z}{R(x, y, z)^{p}} \right)^{1/p}$$

$$C_5 = (U_1, U_2, U_3, U_4, U_5)$$
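A sketch of the C5 vector, assuming the triple integral over the curve is replaced by a sum over unordered triples of distinct C-alpha vertices; this discretization is hypothetical and ignores the arc-length weights $dC$. It works with 1/R directly, computed as 2|(x − z) × (y − z)| divided by the product of the three side lengths, so collinear triples contribute 0 instead of producing an infinite radius.

```python
import math
from itertools import combinations


def inverse_circumradius(x, y, z):
    """1 / R(x, y, z) = 2 |(x - z) x (y - z)| / (|x - y| |y - z| |z - x|)."""
    u = [xi - zi for xi, zi in zip(x, z)]
    v = [yi - zi for yi, zi in zip(y, z)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 2.0 * math.hypot(*cross) / (math.dist(x, y) * math.dist(y, z) * math.dist(z, x))


def curvature_vector(points, p_max=5):
    """(U_1, ..., U_pmax), with the integral replaced by a sum over
    unordered triples of distinct vertices (hypothetical discretization)."""
    inv_r = [inverse_circumradius(x, y, z) for x, y, z in combinations(points, 3)]
    return [sum(k ** p for k in inv_r) ** (1.0 / p) for p in range(1, p_max + 1)]


# Usage: four points on a circle of radius 2, so each of the 4 triples has 1/R = 0.5
square = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (-2.0, 0.0, 0.0), (0.0, -2.0, 0.0)]
c5 = curvature_vector(square)
```

Larger p weights the most tightly curved triples more heavily, so the five components summarize different parts of the curvature distribution.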

Performance of the Curvature Feature Vector
[Figure: ROC curves; x-axis: rate of true negatives (%), y-axis: rate of true positives (%).]
ROC areas: Structal: 0.88; C5: 0.65; FASTA: 0.54

The curvature vector performs better than FASTA, but it needs more features to match Structal.

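Protein Structure Features: Writhing
[Figure: sign of a crossing (+ or −) between the curve points γ(t1) and γ(t2), seen in projection.]

Writhing number:

$$Wr_1 = \frac{1}{4\pi} \iint \omega(t_1, t_2)\, dt_1\, dt_2, \qquad \omega(t_1, t_2) = \frac{\det\left[\gamma'(t_1),\ \gamma(t_1) - \gamma(t_2),\ \gamma'(t_2)\right]}{\left|\gamma(t_1) - \gamma(t_2)\right|^{3}}$$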

Writhe Feature Vector for Each Protein

$$W_{10} = \left( Wr_1,\ |Wr|_1,\ Wr_{12},\ \ldots,\ |Wr|_{12} \right)$$

Fain and Røgen, PNAS, 100: 119 (2003)
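For a polygonal backbone the writhing integral can be evaluated segment pair by segment pair. The sketch below uses a standard signed-solid-angle discretization: each pair of non-adjacent segments contributes Ω/(2π), where Ω is the signed solid angle they subtend. The `absolute` flag sums absolute contributions, which is our hypothetical reading of the |Wr| entries in W10; the other components of W10 are not reproduced here.

```python
import numpy as np


def _unit(v):
    """Normalized copy of v, or None if v is (numerically) zero."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 1e-12 else None


def segment_pair_writhe(p1, p2, p3, p4):
    """Signed contribution Omega / (2 pi) of segments p1->p2 and p3->p4."""
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    n = [_unit(np.cross(r13, r14)), _unit(np.cross(r14, r24)),
         _unit(np.cross(r24, r23)), _unit(np.cross(r23, r13))]
    if any(v is None for v in n):
        return 0.0  # three of the four points are collinear
    omega = sum(np.arcsin(np.clip(np.dot(n[k], n[(k + 1) % 4]), -1.0, 1.0))
                for k in range(4))
    # Orientation of the pair; zero when the four points are coplanar
    sign = np.sign(np.dot(np.cross(p4 - p3, p2 - p1), r13))
    return float(sign * omega / (2.0 * np.pi))


def writhe(coords, absolute=False):
    """Writhe of the polygon through coords (e.g. C-alpha atoms), summed over
    pairs of non-adjacent segments; absolute=True sums |contributions|."""
    P = np.asarray(coords, dtype=float)
    n_seg = len(P) - 1
    total = 0.0
    for i in range(n_seg):
        for j in range(i + 2, n_seg):
            w = segment_pair_writhe(P[i], P[i + 1], P[j], P[j + 1])
            total += abs(w) if absolute else w
    return total
```

A planar chain has zero writhe, and a mirror image flips the sign, which makes writhe sensitive to the handedness of helices.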

Protein Structure Features: Writhing
[Figure: ROC curves; x-axis: rate of true negatives (%), y-axis: rate of true positives (%).]
ROC areas: Structal: 0.88; W10: 0.77; C5: 0.65; FASTA: 0.54

The W10 writhe vector performs better than the C5 curvature vector.


Clustering Protein Fragments to Extract a Small Set of Representatives (a Library)
[Figure: data → clustered data → library (simulated annealing k-means).]
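The slides use simulated-annealing k-means; the sketch below swaps in a simpler technique, alternating k-medoids on a precomputed distance matrix, to show the basic idea of extracting representatives. Medoids are actual fragments, so any fragment distance (for example pairwise cRMS) can be used without averaging coordinates. Unlike simulated annealing, this greedy loop can get stuck in a local optimum that depends on the initial medoids; the function name and inputs are hypothetical.

```python
import numpy as np


def k_medoids(D, initial_medoids, max_iter=100):
    """Alternating k-medoids on a precomputed N x N distance matrix D
    (for fragments, e.g. pairwise cRMS). Returns (medoids, labels), where
    medoids are indices into D and labels[i] is the cluster of item i."""
    D = np.asarray(D, dtype=float)
    medoids = [int(m) for m in initial_medoids]
    for _ in range(max_iter):
        # Assignment step: each item joins its closest representative
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                new_medoids.append(old)  # keep the old medoid for an empty cluster
                continue
            # Update step: the member with the smallest summed distance to its cluster
            cost = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(cost)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```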

Generating an approximate structure
[Figure: fragments A, B, C, D from the fragment library are assembled to approximate the structure.]


Structural Sequence: AC
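A structure written with a fragment library becomes a string of fragment letters. The sketch below does only a simple local assignment, not the fragment assembly shown on the slides: it cuts a C-alpha trace into consecutive non-overlapping windows of the fragment length and labels each window with the library fragment of lowest cRMS (Kabsch superposition). The two-letter library, the window scheme, and the function names are hypothetical.

```python
import numpy as np


def crms(A, B):
    """cRMS after optimal superposition (Kabsch), for equal-length N x 3 arrays."""
    A0 = np.asarray(A, dtype=float)
    B0 = np.asarray(B, dtype=float)
    A0 = A0 - A0.mean(axis=0)
    B0 = B0 - B0.mean(axis=0)
    U, _, Vt = np.linalg.svd(B0.T @ A0)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.sqrt(np.mean(np.sum((A0 - B0 @ R.T) ** 2, axis=1))))


def structural_sequence(ca, library):
    """Label each non-overlapping window of L residues (L = fragment length)
    with the closest library fragment by cRMS; leftover residues are ignored."""
    ca = np.asarray(ca, dtype=float)
    L = len(next(iter(library.values())))
    letters = []
    for start in range(0, len(ca) - L + 1, L):
        window = ca[start:start + L]
        letters.append(min(library, key=lambda name: crms(window, library[name])))
    return "".join(letters)


# Hypothetical two-letter library of 4-residue fragments (3.8 A spacing)
library = {
    "A": [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]],  # straight
    "B": [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]],   # U-turn
}
```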

Fitting Protein Structures

50 fragments of length 7: 2.78 Å cRMS

100 fragments of length 5: 0.91 Å cRMS

Longer fragments give a better fit at the same complexity.

Complexity: $C = N^{1/(L-3)}$ states per residue, where N is the number of fragments and L the size of each fragment.

[Figure: average cRMS distance versus complexity (states/residue) for fragment sizes of 7, 6, 5, and 4 residues.]
(Kolodny, Koehl, Guibas, Levitt, J. Mol. Biol., 323, 297, 2002)

Choosing the “right” library
Fragment size L:                  7        6      5     4
N such that complexity = 20:      160000   8000   400   20
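The complexity formula and this table can be checked against each other. A small sketch with hypothetical function names computes $C = N^{1/(L-3)}$ and inverts it as $N = C^{L-3}$; with C = 20 it reproduces the library sizes above. The L − 3 in the exponent presumably reflects that consecutive fragments overlap, so each new fragment adds only L − 3 residues; treat that reading as an assumption.

```python
def complexity(n_fragments, length):
    """States per residue: C = N ** (1 / (L - 3))."""
    return n_fragments ** (1.0 / (length - 3))


def library_size_for(c, length):
    """Number of fragments giving complexity c at fragment length L: N = c ** (L - 3)."""
    return round(c ** (length - 3))


for L in (7, 6, 5, 4):
    print(L, library_size_for(20, L))
```

The same function puts the two fits quoted earlier at different complexities: about 2.66 states per residue for 50 fragments of length 7, and 10 for 100 fragments of length 5.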

A Structural Alphabet for Protein Backbone
Fragment size: 4; number of fragments: 20
[Figure: cRMS between model and experimental structure, shown against the number of structures and against protein size.]

Structural Alphabet: Application to Structure Comparison

cRMS = 1 Å

Collaborators
• Marc Delarue (Biophysics), Institut Pasteur, Paris
• Herbert Edelsbrunner (Math/Computer Science), Duke University
• Peter Roegen (Math), DTU, Denmark
• Michael Levitt (Computational Biology), Stanford University
• Rachel Kolodny (Computer Science), Columbia University
• Joel Hass (Math), UC Davis

Thank You

