Assessment of Genome-wide Protein Function Classification for Drosophila melanogaster

Huaiyu Mi
mihn@fc.celera.com
Panther Protein Informatics group
Celera Genomics

How to classify proteins in a robust and accurate way?

Outline

1. Introduction to PANTHER
2. Comparison of functional classification of Drosophila proteins by FlyBase and PANTHER

What is PANTHER?

PANTHER library (PANTHER/LIB)
• a family tree
• a multiple sequence alignment (MSA)
• a hidden Markov model (HMM)

PANTHER index (PANTHER/X)
• Molecular function
• Biological process

Building the library

500,000 protein sequences (filtered GenBank NR)
-> MSA, HMM, tree
-> 2200+ protein family clusters, 40,000 subfamilies
-> Biologist curation: each family and subfamily was labeled with a name and classified by PANTHER/X categories

PANTHER library (PANTHER/LIB)

PANTHER index (PANTHER/X)



PANTHER/X

RECEPTOR
=> G-protein coupled receptor
=> protein kinase receptor
=> => serine/threonine protein kinase receptor
=> => tyrosine protein kinase receptor

GO

signal transducer GO:0004871
=> receptor GO:0004872
=> => transmembrane receptor GO:0004888
transmembrane receptor protein kinase GO:0019199
=> transmembrane receptor protein serine/threonine kinase GO:0004675
=> => transforming growth factor alpha receptor GO:0005023
=> => transforming growth factor beta receptor GO:0005024
=> => => activin receptor GO:0017002
=> => => => type I activin receptor GO:0016361
=> => => => type II activin receptor GO:0016362
=> => => type I transforming growth factor beta receptor GO:0005025
=> => => => type I activin receptor GO:0016361
=> => => type II transforming growth factor beta receptor GO:0005026
=> => => => type II activin receptor GO:0016362
=> transmembrane receptor protein tyrosine kinase GO:0004714
=> => boss receptor GO:0008288
=> => ephrin receptor GO:0005003
=> => => GPI-linked ephrin receptor GO:0005004
=> => => transmembrane-ephrin receptor GO:0005005
=> => epidermal growth factor receptor GO:0005006
=> => => gurken receptor GO:0008313
=> => fibroblast growth factor receptor GO:0005007
=> => hepatocyte growth factor receptor GO:0005008
=> => insulin receptor GO:0005009
=> => insulin-like growth factor receptor GO:0005010
=> => macrophage colony stimulating factor receptor GO:0005011
=> => macrophage receptor GO:0008019
=> => Neu/ErbB-2 receptor GO:0005012
=> => neurotrophin TRK receptor GO:0005013
=> => => neurotrophin TRKA receptor GO:0005014
=> => => neurotrophin TRKB receptor GO:0005015
=> => => neurotrophin TRKC receptor GO:0005016
=> => platelet-derived growth factor receptor GO:0005017
=> => => platelet-derived growth factor, alpha-receptor GO:0005018
=> => => platelet-derived growth factor, beta-receptor GO:0005019
=> => stem cell factor receptor GO:0005020
=> => vascular endothelial growth factor receptor GO:0005021
=> vascular endothelial growth factor receptor GO:0005021
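
The GO listing on this slide is deep, and some terms (for example type I activin receptor) appear under more than one parent. A minimal sketch of how such an arrow-indented listing could be read into a parent map and queried for ancestors follows. It assumes each "=>" marks one level of depth; the function names are illustrative and not part of PANTHER or GO tooling.

```python
from collections import defaultdict

# A few lines taken from the GO listing on the slide; each "=>" is one level of depth.
LISTING = """\
transmembrane receptor protein kinase GO:0019199
=> transmembrane receptor protein serine/threonine kinase GO:0004675
=> => transforming growth factor beta receptor GO:0005024
=> => => activin receptor GO:0017002
=> => => => type I activin receptor GO:0016361
=> => => type I transforming growth factor beta receptor GO:0005025
=> => => => type I activin receptor GO:0016361
"""


def parse_parents(listing):
    """Return {term_id: set of parent ids} from an arrow-indented listing."""
    parents = defaultdict(set)
    stack = []  # stack[d] holds the most recent term id seen at depth d
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        depth = 0
        while tokens[depth] == "=>":
            depth += 1
        term_id = tokens[-1]      # the GO id is the last token on each line
        del stack[depth:]         # drop terms that are not ancestors of this line
        if stack:
            parents[term_id].add(stack[-1])
        else:
            parents[term_id]      # register a root term with no parents
        stack.append(term_id)
    return parents


def ancestors(term_id, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    todo = [term_id]
    while todo:
        for parent in parents.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


PARENTS = parse_parents(LISTING)
```

With this map, `ancestors("GO:0016361", PARENTS)` walks up both branches, which is one way a specific term from one source could be related to a broader term from the other.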

PANTHER Scoring

A fasta file
-> scored against family and subfamily HMMs
-> Score above threshold?
-> yes: Classified (Name, Molecular function, Biological process)
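
The scoring step can be sketched as a threshold test on HMM hits. The example below is only an illustration under stated assumptions: the `Hit` record, the "higher score is better" convention, the choice of the single best hit, and the threshold value are all hypothetical, since the slide gives only the flow (score, threshold, classified).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    hmm_id: str               # hypothetical family or subfamily identifier
    score: float              # assumed convention: higher is better
    name: str
    molecular_function: str
    biological_process: str


def classify(hits: List[Hit], threshold: float) -> Optional[Hit]:
    """Return the best-scoring hit if it clears the threshold, else None."""
    if not hits:
        return None
    best = max(hits, key=lambda h: h.score)
    return best if best.score >= threshold else None


# Made-up hits for one query sequence.
example_hits = [
    Hit("FAM_A", 35.0, "example family A", "receptor", "signal transduction"),
    Hit("FAM_A:SF1", 82.5, "example subfamily A1", "receptor", "signal transduction"),
]
classified = classify(example_hits, threshold=50.0)
unclassified = classify(example_hits, threshold=100.0)
```

A sequence whose best hit stays below the threshold is left unclassified, which corresponds to the "Not hit" group in the coverage slide.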

How accurate is PANTHER?

FlyBase: a manually curated database for Drosophila genes
PANTHER: an automated annotation process

Assess the associations

Process for comparison

Fly protein sequences
-> FlyBase annotation with GO terms
-> PANTHER annotation by scoring against PANTHER
-> Automated comparison of FlyBase and PANTHER assignments
   -> Match
   -> Not match -> Manual review -> Correct / Inconclusive / Incorrect
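
The automated comparison step could look roughly like the sketch below. It assumes a match means both sources assign the identical GO term to the same protein; the slides do not state the actual matching rule. The protein identifiers and their annotations are made up for illustration, and the GO ids are borrowed from the receptor listing.

```python
def compare_assignments(flybase, panther):
    """Split associations into automatic matches and candidates for manual review.

    flybase, panther: dict mapping protein id -> set of GO term ids.
    Only proteins annotated by both sources are compared.
    """
    result = {"match": [], "flybase_only": [], "panther_only": []}
    for protein in sorted(flybase.keys() & panther.keys()):
        fb, pt = flybase[protein], panther[protein]
        result["match"] += [(protein, term) for term in sorted(fb & pt)]
        result["flybase_only"] += [(protein, term) for term in sorted(fb - pt)]
        result["panther_only"] += [(protein, term) for term in sorted(pt - fb)]
    return result


# Hypothetical input: protein ids and their annotations are invented.
flybase = {"protein_A": {"GO:0005006"}, "protein_B": {"GO:0005009"}}
panther = {"protein_A": {"GO:0005006"}, "protein_B": {"GO:0005010"},
           "protein_C": {"GO:0005007"}}
comparison = compare_assignments(flybase, panther)
```

Entries in `flybase_only` and `panther_only` are the ones that would go on to manual review; `protein_C` is skipped because only one source annotates it.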

Coverage of Drosophila proteins classified by FlyBase and PANTHER.

Molecular function
A. FlyBase: classified to GO 6301; not classified to GO 8031
B. PANTHER: HMM hits classified to GO 4862; HMM hits not classified to GO 3265; not hit 6205
C. Both: classified overlap 3283

Biological process
D. FlyBase: classified to GO 2794; not classified to GO 11538
E. PANTHER: HMM hits classified to GO 3658; HMM hits not classified to GO 4469; not hit 6205
F. Both: classified overlap 1159
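
The counts in the coverage panels can be sanity-checked with a little arithmetic. The sketch below assumes a particular reading of the panels: A and D split all proteins into FlyBase classified and not classified, and B and E split them into PANTHER classified, hit but not classified, and not hit. Under that reading every panel should add up to the same protein total.

```python
# Counts read from the coverage panels; the grouping into panels is an assumption.
PANELS = {
    "A FlyBase, molecular function": [6301, 8031],
    "B PANTHER, molecular function": [4862, 3265, 6205],
    "D FlyBase, biological process": [2794, 11538],
    "E PANTHER, biological process": [3658, 4469, 6205],
}

panel_sums = {panel: sum(counts) for panel, counts in PANELS.items()}


def overlap_share(overlap, total):
    """Percentage of all proteins classified by both sources."""
    return 100.0 * overlap / total


total_proteins = panel_sums["A FlyBase, molecular function"]
mf_overlap_pct = overlap_share(3283, total_proteins)   # panel C
bp_overlap_pct = overlap_share(1159, total_proteins)   # panel F
```

All four panels give the same total, which supports this reading of the figure.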

Assessment of molecular function associations

                 PANTHER   FlyBase
Auto match          2737      2747
Manual match         700       663
Correct              345       195
Incorrect             50        58
Inconclusive          35        37
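
The share of associations that match can be worked out from the chart values. The sketch below assumes the values are assigned to categories so that each source's column adds up to the association totals given in the summary (3867 for PANTHER, 3700 for FlyBase), and that "match" means auto match plus manual match.

```python
# Chart values per source; the assignment to categories is inferred from the totals.
COUNTS = {
    "PANTHER": {"auto_match": 2737, "manual_match": 700, "correct": 345,
                "incorrect": 50, "inconclusive": 35},
    "FlyBase": {"auto_match": 2747, "manual_match": 663, "correct": 195,
                "incorrect": 58, "inconclusive": 37},
}


def match_share(counts):
    """Percentage of a source's associations that match the other source."""
    matched = counts["auto_match"] + counts["manual_match"]
    return 100.0 * matched / sum(counts.values())


totals = {source: sum(c.values()) for source, c in COUNTS.items()}
shares = {source: match_share(c) for source, c in COUNTS.items()}
```

Both shares come out close to the figure of about 90% quoted in the summary.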

Types of errors

• Homology error – an error caused by incorrect functional prediction based on sequence homology.
• Human error – an error on the part of the human curator.
• Evidence error – an error caused by relying on incorrect evidence.

Analysis of errors

                                          PANTHER   FlyBase
Number of homology errors                       8        35
Number of human errors                         40        23
Number of evidence errors                       2         0
Total number of incorrect associations         50        58
Association error rate (%)                   1.3%      1.6%
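
The association error rate can be reproduced from the error counts. The sketch below assumes the denominators are the molecular function association totals quoted in the summary (3867 for PANTHER, 3700 for FlyBase); the slides do not spell out the formula.

```python
ERRORS = {
    "PANTHER": {"homology": 8, "human": 40, "evidence": 2},
    "FlyBase": {"homology": 35, "human": 23, "evidence": 0},
}
# Assumed denominators: total molecular function associations per source.
ASSOCIATIONS = {"PANTHER": 3867, "FlyBase": 3700}


def error_rate(source):
    """Incorrect associations as a percentage of all associations for a source."""
    incorrect = sum(ERRORS[source].values())
    return 100.0 * incorrect / ASSOCIATIONS[source]


incorrect_totals = {source: sum(e.values()) for source, e in ERRORS.items()}
rates = {source: round(error_rate(source), 1) for source in ERRORS}
```

The three error types add up to the totals in the table, and the rounded rates agree with the percentages shown.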

Example of homology error

PANTHER function inference in the context of a protein sequence tree

FBgn0032382 (CG14934)
FlyBase: alpha glucosidase; neutral amino acid transporter
PANTHER: alpha glucosidase

[Figure: protein sequence tree showing CG14934 with clades labeled alpha glucosidase, alpha amylase, and neutral amino acid transporter]

Summary

• PANTHER is an automated method for classifying proteins in a robust way.
• The accuracy of PANTHER was assessed by comparing its classification of Drosophila proteins with FlyBase’s.
• A total of 3283 Drosophila proteins were associated with at least one molecular function category by both FlyBase and PANTHER (3867 molecular function associations by PANTHER, and 3700 by FlyBase).
• About 90% of these associations by FlyBase and PANTHER match each other.
• Total error rate is < 2% for both methods.

Acknowledgements



Celera Genomics

Paul Thomas
Jody Vandergriff
Michael Campbell
Apurva Narechania
William Majoros
Karen Diemer
Olivier Doremieux
Nan Guo
Anish Kejariwal
Steven Ladunga
Betty Lazareva
Anushya Muruganujan
Steve Rabkin

FlyBase

Michael Ashburner
Susanna Lewis


