Associating Biomedical Terms: Case Study for Acetylation









Aaron Buechlein

Indiana University School of Informatics

Advisor: Dr. Predrag Radivojac

Overview

• Background



• Previous Work



• Methods



• Results

Central Dogma

[Figure: the central dogma of molecular biology. Source: http://www.accessexcellence.org/RC/VL/GG/images/central.gif]

Post-Translational Modifications (PTMs)

Acetylation

• Acetylation involves the substitution of an acetyl group (-COCH3) for hydrogen
• Typically occurs on N-terminal tails and lysine residues (Lys or K)

Previous Predictors

• Several PTM predictors have been created prior to this work
• Acetylation predictors also existed prior to this work
  • NetAcet is a predictor for N-terminal sites only
  • AutoMotif Server is a predictor for various PTMs and includes an acetylation portion
  • PAIL is a lysine acetylation predictor

Methods

• Create Dataset
  • Download articles relevant to acetylation and extract sites
  • Rank articles in order to elucidate sites quickly
  • Swiss-Prot and Human Protein Reference Database (HPRD)
• Create Predictors
  • Leave-one-protein-out validation
  • Matlab
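Leave-one-protein-out validation can be illustrated with a minimal Python sketch. It assumes each candidate lysine carries a protein identifier; the function name `leave_one_protein_out` and the toy identifiers are hypothetical and are not taken from the original Matlab work.

```python
from collections import defaultdict

def leave_one_protein_out(protein_ids):
    """Yield (held_out_protein, train_indices, test_indices) for each protein.

    All sites from one protein are held out together, so a model is never
    tested on a protein that contributed examples to its training set.
    """
    groups = defaultdict(list)
    for index, protein in enumerate(protein_ids):
        groups[protein].append(index)
    for protein in sorted(groups):
        test = groups[protein]
        train = sorted(i for p, idx in groups.items() if p != protein for i in idx)
        yield protein, train, test

# Hypothetical toy input: one protein ID per candidate lysine.
protein_ids = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
folds = list(leave_one_protein_out(protein_ids))
for protein, train, test in folds:
    print(f"held out protein {protein}: train={train} test={test}")
```

Grouping by protein rather than by individual site keeps highly similar sites from the same protein out of both sides of a split.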

Article Retrieval

• Searched individual journal sites for articles relevant to acetylation
• Saved the resulting HTML pages for each journal
• These pages were then used as the input for a web crawler to download articles
• Because journal sites are constructed differently, each journal required a unique regular expression to extract article links
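The per-journal link extraction might look like the following Python sketch. The journal keys, URL patterns, and sample HTML are all hypothetical placeholders; the actual expressions depend on each site's markup and are not given in the slides.

```python
import re

# Hypothetical per-journal patterns; real ones depend on each site's HTML.
LINK_PATTERNS = {
    "journal_a": re.compile(r'href="(/content/full/[^"]+)"'),
    "journal_b": re.compile(r"href='(/articles/[^']+\.html)'"),
}

def extract_article_links(journal, html):
    """Return article links from a saved results page, in page order, without duplicates."""
    pattern = LINK_PATTERNS[journal]
    links = []
    for link in pattern.findall(html):
        if link not in links:
            links.append(link)
    return links

# Hypothetical saved search-results page for "journal_a".
sample_page = (
    '<a href="/content/full/281/1/12">Article one</a>'
    '<a href="/help">Help</a>'
    '<a href="/content/full/281/1/12">Article one (PDF)</a>'
    '<a href="/content/full/282/5/99">Article two</a>'
)
links = extract_article_links("journal_a", sample_page)
print(links)
```

The extracted links would then be handed to the crawler that downloads each article.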

Rank Articles

• First, locate occurrences of the first phrase, “phrase 1”
  • A = {a1, a2, …, a|A|}
• Next, locate occurrences of the second phrase, “phrase 2”
  • R = {r1, r2, …, r|R|}
• Each occurrence r is weighted by $f(x) = c \cdot e^{-d x}$
  • c and d are constants
  • x is the distance in characters between r and the nearest occurrence a

An example: acetylation

1. word “acetylat”
   A = {a1, a2, …, am}

2. regular expression
   (k | lys | lysine)(space)*(digit)+
   R = {r1, r2, …, rn}
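The two searches can be sketched in Python as follows. The case-insensitive matching and the leading word boundary are assumptions added here so that a bare "k" inside another word is not picked up; the sample sentence is invented for illustration.

```python
import re

# The stem "acetylat" matches acetylated, acetylation, acetylates, ...
ACETYL = re.compile(r"acetylat", re.IGNORECASE)
# (k | lys | lysine)(space)*(digit)+ ; the \b word boundary is an added assumption.
LYSINE_SITE = re.compile(r"\b(?:lysine|lys|k)\s*\d+", re.IGNORECASE)

text = ("Acetylation of histone H3 at Lys 9 and K14 was detected; "
        "lysine 27 was not acetylated.")

# A: character offsets of the term; R: offsets and text of candidate site mentions.
A = [m.start() for m in ACETYL.finditer(text)]
R = [(m.start(), m.group()) for m in LYSINE_SITE.finditer(text)]
print(A)
print(R)
```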

An example: acetylation

Score for article S:

$$S = \sum_{i=1}^{n} \mathrm{score}(r_i, A)$$

where

$$\mathrm{score}(r_i, A) = f\left(\left|\mathrm{position}(r_i) - \mathrm{position}(a_k)\right|\right)$$

and

$$k = \arg\min_{j = 1 \ldots m} \left|\mathrm{position}(r_i) - \mathrm{position}(a_j)\right|$$

with

$$f(x) = 10 \cdot e^{-0.005x}$$

[Figure: f(x) plotted against distance in characters]

Papers with S > 100 are rich in sites; papers with S < 30 fall in the “twilight” zone
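The article score can be computed with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, positions are character offsets, and an article with no occurrence of the term is assumed to score 0, a case the slides do not address.

```python
import math

def f(x, c=10.0, d=0.005):
    """Proximity weight: c at distance 0, decaying exponentially with distance x."""
    return c * math.exp(-d * x)

def article_score(site_positions, term_positions):
    """S = sum over site mentions r_i of f(distance from r_i to its nearest term a_k)."""
    if not term_positions:
        return 0.0  # assumption: no "acetylat" occurrence means no evidence
    return sum(
        f(min(abs(r - a) for a in term_positions))
        for r in site_positions
    )

# Hypothetical character offsets within one article.
A = [0, 400]          # occurrences of "acetylat"
R = [20, 390, 1000]   # occurrences of lysine-site mentions
S = article_score(R, A)
print(round(S, 2))
```

Because each site mention contributes at most 10, a score above 100 requires at least eleven mentions in total, which is why high scores flag articles rich in sites.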

Elucidate Sites

• Sites were manually extracted from articles, beginning with the highest rank
• The original experimental paper for each site was checked for traceable evidence
• Sites were extracted from Swiss-Prot
• Sites were extracted from HPRD

Predictors

• Support Vector Machine
• Artificial Neural Network
• Decision Tree

Predictor Input

• Positives taken as all lysines found to be acetylated
• Negatives taken as all lysines not found to be acetylated
• Features created based on characteristics surrounding lysines
  • Amino acid content, hydrophobicity, charge, disorder, etc.
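A minimal Python sketch of window-based features around each lysine follows. The window half-width, the choice of amino-acid fractions plus net charge, and the example sequence are all assumptions made for illustration; the slides do not specify the window size or the exact feature definitions, and hydrophobicity and disorder are omitted here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_features(sequence, position, half_width=5):
    """Features for the residue at 0-based `position`, from a symmetric window.

    Returns the 20 amino-acid fractions followed by the net charge per residue
    (K and R count +1, D and E count -1). The window is truncated at the ends
    of the sequence.
    """
    start = max(0, position - half_width)
    end = min(len(sequence), position + half_width + 1)
    window = sequence[start:end]
    composition = [window.count(aa) / len(window) for aa in AMINO_ACIDS]
    charge = sum(window.count(aa) for aa in "KR") - sum(window.count(aa) for aa in "DE")
    return composition + [charge / len(window)]

# Hypothetical sequence; every lysine becomes one candidate example.
sequence = "MSGRGKGGKGLGKGGAKRHRK"
candidates = [i for i, aa in enumerate(sequence) if aa == "K"]
features = {i: window_features(sequence, i) for i in candidates}
print(candidates)
```

Each candidate lysine then becomes one row of predictor input, labeled 1 if it was found to be acetylated and 0 otherwise.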

Predictor Input

Protein   Features                                        Acetylated
1          8  1  0.48609  0.001767  0.48979  0.51508      1
1          7  1  0.92146  0.03019   0.96423  0.79416      1
1          0  0  0.50622  0.015251  0.52335  0.51855      0
2         10  2  0.2008   0.038708  0.25441  0.36071      1
2          1  0  0.62016  0.009772  0.62846  0.67525      0
2          0  0  0.27783  0.028957  0.32162  0.34207      0
3         11  1  0.89239  0.018354  0.91884  0.88125      1
3         12  2  0.87354  0.022307  0.90349  0.87446      1
3          8  1  0.81549  0.025339  0.85289  0.85702      1
3          2  0  0.84588  0.024766  0.88219  0.86599      0

Article and Ranking Results

• 4888 articles from 10 sites were searched
  • Nature provided 2147 articles
  • Science Direct provided 1519 articles
• The highest ranking article was obtained from the Journal of Biological Chemistry
  • Score of 151.87
  • Contained 10 acetylation sites
• When histones are excluded, the highest ranking article was obtained from Nature
  • Previously ranked at #5
  • Score of 116.36
  • Contained 9 unique acetylation sites

Top 25

Rank  Score     Sites  Article Source
1     151.8667  10     Journal of Biological Chemistry
2     123.2314  12     Cell / Science Direct
3     121.9031  6      Nature
4     117.7988  9      Journal of Proteome Research
5     116.3582  9      Nature
6     111.1745  14     Biochemistry
7     104.4652  6      Cell / Science Direct
8     104.0166  7      Nature
9     102.0683  13     Molecular Cell / Science Direct
10    98.80812  6      Journal of Biological Chemistry
11    97.64634  6      Biochemistry
12    96.76536  6      Journal of Biological Chemistry
13    96.0845   9      International Journal of Mass Spectrometry / Science Direct
14    88.12967  9      Biochemistry
15    86.17157  6      Journal of Biological Chemistry
16    81.78705  5      Nucleic Acids Research
17    81.30967  6      Biochemistry
18    81.06128  6      Molecular Cell / Science Direct
19    80.74899  9      Journal of Biological Chemistry
20    80.16261  9      Nature
21    79.65658  6      Molecular Cell / Science Direct
22    77.9022   4      Cell / Science Direct
23    77.88304  5      Nucleic Acids Research
24    77.60087  8      Gene / Science Direct
25    77.44198  6      Journal of the American Society for Mass Spectrometry

Ranking Results

• Articles with scores greater than 30 had the potential to provide at least one site
• As scores approached 30, articles became less fruitful

Dataset Results

• Dataset included 1442 total sites and 1085 non-redundant sites
• HPRD contributed 90 total sites
• Swiss-Prot contributed 825
• Our study contributed 527


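Sensitivity, Specificity, and Precision

With TP, FP, TN, and FN denoting the counts of true positives, false positives, true negatives, and false negatives:

• Sensitivity (sn): $sn = TP / (TP + FN)$
• Specificity (sp): $sp = TN / (TN + FP)$
• Precision (pr): $pr = TP / (TP + FP)$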

Accuracy and AUC

• Accuracy (acc): $acc = (sn + sp) / 2$, the form that matches the acc values reported in the results tables
• Area Under Curve (AUC)
  • Refers to the area under the Receiver Operating Characteristic (ROC) curve
  • The ROC curve is the graphical plot of sensitivity vs. 1 - specificity
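The evaluation measures can be sketched in Python as below. Two assumptions are made: accuracy is taken as the mean of sensitivity and specificity, because the acc values in the results tables equal (sn + sp) / 2, and AUC is computed through its rank interpretation (the probability that a positive example outscores a negative one, which equals the area under the ROC curve). The function names and toy numbers are hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (sn, sp, pr, acc) from confusion-matrix counts."""
    sn = tp / (tp + fn)      # sensitivity: fraction of positives recovered
    sp = tn / (tn + fp)      # specificity: fraction of negatives rejected
    pr = tp / (tp + fp)      # precision: fraction of predicted positives that are correct
    acc = (sn + sp) / 2      # assumption: balanced accuracy, as the tables suggest
    return sn, sp, pr, acc

def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts and prediction scores.
sn, sp, pr, acc = confusion_metrics(tp=50, fp=30, tn=70, fn=50)
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(sn, sp, pr, acc, auc)
```

Averaging sn and sp keeps the accuracy figure from being dominated by the much larger negative class of non-acetylated lysines.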

SVM Predictor

Polynomial kernel

Degree      sn    sp    pr    acc   AUC
p = 1       52.3  71.0  24.6  61.6  65.2
p = 2       46.1  69.8  20.3  57.9  62.8
p = 3       31.6  80.8  23.5  56.2  60.3

Gaussian kernel

Width       sn    sp    pr    acc   AUC
σ = 10^-2   43.8  75.8  24.9  59.8  64.3
σ = 10^-3   54.1  72.1  25.9  63.1  68.1
σ = 10^-6   52.8  70.7  24.6  61.8  65.3

Artificial Neural Network

Hidden neurons  sn    sp    pr    acc   AUC
1               68.0  47.7  20.7  57.8  61.9
3               65.2  47.7  19.4  56.4  58.9
5               65.0  47.2  19.1  56.1  57.5

Decision Tree

Algorithm       sn    sp    pr    acc   AUC
Decision Tree   61.7  45.9  18.3  53.8  42.1

Algorithm Comparison

Algorithm       sn    sp    pr    acc   AUC
SVM             54.1  72.1  25.9  63.1  68.1
Neural Network  68.0  47.7  20.7  57.8  61.9
Decision Tree   61.7  45.9  18.3  53.8  42.1

I would like to acknowledge those who have helped me throughout the duration of this project: Dr. Predrag Radivojac, Dr. Haixu Tang, and Wyatt Clark.

I welcome your questions and/or comments.

