Bioinformatics for Proteomics

Bioinformatics for Proteomics – 2D Gels

By Andrew Garrow, University of Leeds

Myself
• PhD: bioinformatics for a proteomics approach to understanding the schistosome tegument.
• Construction of a proteomics database
• 2D gels
• Mass spectrometry
• Data analysis

Lecture Layout
• Introduction to 2D gel electrophoresis
• 2D gel databases
• 2D gel analysis

Proteomics
• Analysis by direct measurement of proteins in terms of their presence and relative abundance.

Why study Proteomics?
To better understand protein expression and formation, which is shaped by:
• Transcriptional control
• Post-transcriptional control, e.g. alternative splicing, RNA editing
• Translational and degradation control, translational frameshifting
• More than 200 known post-translational modifications (PTMs), e.g. phosphorylation, glycosylation, lipid attachment, peptide cleavage

Number of proteins > number of genes

Biological question → Biological sample → Sample prep. → 2D gel separation → Imaging → Protein excision → Protein digestion → MS/protein ID → Global bioinformatics → Proteomic discovery

2D Gels
• Used to separate the proteins within a sample.
• Separation depends on protein pI (first dimension) and molecular weight (second dimension).
• Can resolve >1,800 spots (Choe and Lee, 2000).
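Because a protein's position on a 2D gel is set by its pI and molecular weight, its theoretical location can be estimated from the sequence alone. The Python sketch below computes average molecular weight from standard average residue masses and finds the pI by bisection on the Henderson-Hasselbalch net charge. The pKa set is an illustrative assumption; published tools use different sets, so their pI values will differ slightly, and post-translational modifications are ignored.

```python
# Average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVG = 18.01528

# Illustrative pKa values (assumption); different tools use different sets.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER_AVG


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    counts = {group: seq.count(group) for group in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus of each kind
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    peptide = "MKWVTFISLLLLFSSAYSRGVFRR"  # arbitrary example sequence
    print(f"MW = {molecular_weight(peptide):.1f} Da, pI = {isoelectric_point(peptide):.2f}")
```

Net charge falls monotonically with pH, which is why simple bisection is enough to locate the zero crossing.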


The Problem
• 2D gels can routinely separate >1,000 spots, yet cells express thousands to tens of thousands of proteins.
• Approaches to improve protein coverage:
  - Separation on the basis of differential compartmentalisation/solubilisation
  - Narrow-range immobilised pH gradient (IPG) strips to focus on particular pI ranges

Zooming
• Use narrow-range IPG strips to focus on particular pI ranges.
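Why zooming helps can be shown with simple arithmetic: on a strip of fixed length, a narrower pH range spreads the same pI difference over more millimetres. The Python sketch below assumes a linear pH gradient (some wide-range strips are deliberately non-linear) and an assumed 170 mm strip; both proteins are hypothetical.

```python
def strip_position_mm(pi: float, ph_min: float, ph_max: float, length_mm: float):
    """Distance from the acidic end at which a protein focuses, assuming a linear gradient.

    Returns None if the pI lies outside the strip's range (the protein is not resolved).
    """
    if not ph_min <= pi <= ph_max:
        return None
    return (pi - ph_min) / (ph_max - ph_min) * length_mm


def separation_mm(pi_a, pi_b, ph_min, ph_max, length_mm):
    """Physical distance between two focused proteins on the same strip."""
    a = strip_position_mm(pi_a, ph_min, ph_max, length_mm)
    b = strip_position_mm(pi_b, ph_min, ph_max, length_mm)
    if a is None or b is None:
        return None
    return abs(a - b)


if __name__ == "__main__":
    # Two hypothetical proteins 0.1 pH units apart on an assumed 170 mm strip.
    for ph_range in [(3.0, 10.0), (4.0, 7.0)]:
        print(ph_range, round(separation_mm(5.0, 5.1, *ph_range, 170.0), 2), "mm")
```

Under these assumptions the pair is roughly 2.4 mm apart on a pH 3-10 strip but about 5.7 mm apart on a pH 4-7 strip; the cost is that proteins outside pH 4-7 are not resolved at all.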

2D Gel Procedure
• Three-day process:
  - Day 1: rehydration of the IPG strip
  - Day 2: isoelectric focusing (IEF)
  - Day 3: second dimension (SDS-PAGE)

Staining
Stain | Sensitivity | Process time/steps | Advantages
Bio-Safe Coomassie | 10 ng | 2.5 h/3 steps | MS compatible; easily visualised; non-hazardous
Coomassie Blue R-250 | 40 ng | 2.5 h/2 steps | Oldest and least expensive method
Silver Stain Plus | 1 ng | 1.5 h/3 steps | MS compatible; high sensitivity; low background
Bio-Rad silver stain | 1 ng | 2 h/7 steps | High sensitivity; detects some highly glycosylated and other difficult-to-stain proteins
SYPRO Ruby | 1 ng | 3 h/2 steps | MS compatible; allows analysis in fluorescent imagers; linear over 3 orders of magnitude
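The staining table can be turned into a simple selection helper: keep the stains sensitive enough for the expected spot load, optionally only those listed as MS compatible, and prefer the fastest protocol. This Python sketch uses only the figures in the table; note that "not listed as MS compatible" in the table does not necessarily mean incompatible, and the ranking rule (time, then number of steps) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stain:
    name: str
    sensitivity_ng: float        # detection limit quoted in the table
    hours: float
    steps: int
    listed_ms_compatible: bool   # True only where the table says "MS compatible"


STAINS = [
    Stain("Bio-Safe Coomassie", 10, 2.5, 3, True),
    Stain("Coomassie Blue R-250", 40, 2.5, 2, False),
    Stain("Silver Stain Plus", 1, 1.5, 3, True),
    Stain("Bio-Rad silver stain", 1, 2.0, 7, False),
    Stain("SYPRO Ruby", 1, 3.0, 2, True),
]


def suitable_stains(spot_ng: float, need_ms: bool = True):
    """Stains able to detect `spot_ng` of protein, fastest (then fewest steps) first."""
    hits = [
        s for s in STAINS
        if s.sensitivity_ng <= spot_ng and (s.listed_ms_compatible or not need_ms)
    ]
    return sorted(hits, key=lambda s: (s.hours, s.steps))


if __name__ == "__main__":
    for stain in suitable_stains(5):
        print(stain.name, stain.hours, "h")
```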

Gel Analysis
• Gel images are digitally captured using a charge-coupled device (CCD) camera or a scanner.
• Images are analysed by specialised software, e.g.:
  - Phoretix 2D Advanced (www.phoretix.com)
  - PDQuest (www.proteomeworks.bio-rad.com)
  - 2D Elite (http://www.imsupport.com/)
  - Melanie (www.expasy.ch/melanie)

Gel Analysis
Software features:
• Spot detection
• Spot quantification
• Noise reduction
• Gel comparison by warping
• Linkage with robots
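Spot detection and quantification can be illustrated with a toy algorithm: threshold the image, group bright pixels into 4-connected regions, drop tiny regions as noise, and report each region's intensity-weighted centroid and background-subtracted volume. This Python sketch assumes a constant background and an image given as a list of rows of intensities; it is a teaching example, not the method used by the commercial packages listed above.

```python
from collections import deque


def detect_spots(image, background, min_pixels=2):
    """Segment a gel image into spots by thresholding and 4-connected flood fill.

    Returns a list of dicts with centroid (row, col), volume and area in pixels.
    Regions smaller than `min_pixels` are discarded as noise.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= background:
                continue
            # Breadth-first flood fill of one connected bright region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and image[ny][nx] > background):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_pixels:
                continue
            weights = [image[y][x] - background for y, x in pixels]
            volume = sum(weights)  # quantification: summed signal above background
            cy = sum(w * y for w, (y, _) in zip(weights, pixels)) / volume
            cx = sum(w * x for w, (_, x) in zip(weights, pixels)) / volume
            spots.append({"centroid": (cy, cx), "volume": volume, "area": len(pixels)})
    return spots


if __name__ == "__main__":
    gel = [
        [0, 0, 0, 0, 0, 0],
        [0, 5, 9, 0, 0, 0],
        [0, 7, 5, 0, 0, 8],  # the isolated 8 is treated as noise
        [0, 0, 0, 0, 0, 0],
    ]
    print(detect_spots(gel, background=1))
```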

Robot Spot Cutter

Post-Gel Analysis
• Robotic spot picking
• Protein/spot digestion, e.g. with trypsin
• Mass spectrometry (MS)
• MS data analysis
• Data repository
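The digestion and MS steps can be mimicked in silico: cut a sequence with the simple trypsin rule (after K or R, but not before P) and compute each peptide's singly protonated monoisotopic mass, the kind of peptide mass list that is compared with a measured spectrum. This Python sketch uses standard monoisotopic residue masses, ignores modifications and the known exceptions to the trypsin rule, and allows an optional number of missed cleavages.

```python
import re
from typing import List

# Monoisotopic residue masses (Da).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056
PROTON = 1.00728


def tryptic_digest(seq: str, missed_cleavages: int = 0) -> List[str]:
    """Cleave C-terminal to K or R, except before P (the simple trypsin rule)."""
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", seq.upper()) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` neighbouring fragments onto this one.
        for end in range(start, min(start + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


def mh_plus(peptide: str) -> float:
    """Singly protonated monoisotopic mass [M+H]+ of an unmodified peptide."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER_MONO + PROTON


if __name__ == "__main__":
    for pep in tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR"):
        print(f"{pep:20s} {mh_plus(pep):10.4f}")
```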

Data Repository
• Database: a collection of data records, held either in a single file or in multiple files.
• Database management system (DBMS): a software suite comprising a database and the utilities required to organise, search, update and maintain it, keep the data secure and control access.
• Database types: flat file, relational, object-oriented.

2D Gel Databases
• www.expasy.ch - SWISS-2DPAGE
• http://www.anl.gov/BIO/PMG/ - Mouse liver, human breast cell lines, Pyrococcus; Argonne Protein Mapping Group
• http://www.harefield.nthames.nhs.uk/nhli/protein/index.html - HSC-2DPAGE, Heart Science Centre, Harefield Hospital
• http://oto.wustl.edu/thc/peri-gels.htm - Washington University Inner Ear Protein Database
• http://ca.expasy.org/ch2d/2d-index.html - WORLD-2DPAGE, index of 2D gel databases

Federated 2D PAGE database
• Described by Appel et al. (1996).
• Aimed to tackle (then) emerging problems with 2D gel databases:
  - non-uniformity of data-encoding conventions
  - robustness
  - consistency
  - commitment of groups to maintain the databases and data quality

Federated 2D PAGE database
Rules:
• Rule 1: Individual entries in the database must be accessible by a keyword search. Other methods are possible but not required.
• Rule 2: The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must at least be linked to the main index.
• Rule 3: A main index has to be supplied that provides a means of querying all databases through one unique query point. Currently, the main index is the SWISS-PROT database.
• Rule 4: Individual protein entries must be available through clickable images.
• Rule 5: 2DE analysis software designed for use with federated databases must be able to access individual entries in any federated 2DE database.

http://ca.expasy.org/ch2d/fed-rules.html
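Rules 1 and 3 together amount to a keyword index spanning several databases with one query point. The Python sketch below builds such a main index from entry descriptions and answers AND queries across all member databases. It is a toy illustration only: the database names and accessions are hypothetical, and in the actual federation the main index role was played by SWISS-PROT, as Rule 3 states.

```python
from collections import defaultdict


def build_main_index(databases):
    """Map each keyword to the (database, accession) pairs whose description contains it.

    `databases` is {database_name: {accession: description}}.
    """
    index = defaultdict(set)
    for db_name, entries in databases.items():
        for accession, description in entries.items():
            for word in description.lower().split():
                index[word.strip(".,;:()")].add((db_name, accession))
    return index


def keyword_search(index, query):
    """Return the entries, across all databases, that match every query word."""
    words = [w.lower() for w in query.split()]
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


if __name__ == "__main__":
    # Hypothetical federated databases and accessions, for illustration only.
    federation = {
        "liver_db": {"SPOT_001": "Serum albumin precursor", "SPOT_002": "Catalase"},
        "heart_db": {"SPOT_017": "Serum albumin", "SPOT_020": "Myosin light chain"},
    }
    index = build_main_index(federation)
    print(sorted(keyword_search(index, "serum albumin")))
```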

Swiss 2DPAGE
• Established in 1993.
• Maintained by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics.
• Entries are highly annotated, containing textual data on proteins including:
  - mapping procedure
  - physiological and pathological information
  - experimental data (isoelectric point, molecular weight, amino acid composition, peptide masses)
  - bibliographical references

Swiss 2DPAGE
• Entries are linked to images showing the experimentally determined and theoretical protein locations.
• Cross-references are provided to other federated 2D-PAGE database entries, MEDLINE and SWISS-PROT.
• Search via clickable images or keywords.
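Searching by clickable image reduces to a hit test: take the click position in image coordinates and return the spot whose recorded centre is nearest, provided it lies within a tolerance. The Python sketch below assumes spot centres are already stored in pixel coordinates; the spot identifiers and the 10-pixel default tolerance are hypothetical.

```python
import math


def spot_at_click(spots, x, y, max_distance=10.0):
    """Return the identifier of the spot nearest to (x, y), or None if none is close enough.

    `spots` is a list of (identifier, centre_x, centre_y) in image pixel coordinates.
    """
    best_id, best_distance = None, max_distance
    for spot_id, sx, sy in spots:
        distance = math.hypot(sx - x, sy - y)
        if distance <= best_distance:
            best_id, best_distance = spot_id, distance
    return best_id


if __name__ == "__main__":
    spots = [("SPOT_A", 120, 340), ("SPOT_B", 410, 95)]  # hypothetical spot map
    print(spot_at_click(spots, 123, 338))  # a click close to SPOT_A
```

A linear scan is adequate for a few thousand spots per gel; a spatial index would only matter for much larger maps.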

Make2DDB
• Software package provided by ExPASy.
• Allows a 2D-PAGE database to be set up on the user's own server.
• The resulting database can be queried by description, accession number or by clicking on spots.
• Provides links to SWISS-PROT.

Make2DDB databases
• http://semele.anu.edu.au/2d/2d.html - ANU 2D-PAGE, Australian National University 2D-PAGE database
• http://babbage.csc.ucm.es/2d/2d.html - COMPLUYEAST-2DPAGE, Saccharomyces cerevisiae 2D-PAGE database at the Universidad Complutense de Madrid, Spain
• http://www.gram.au.dk/ - PHCI-2DPAGE, Parasite host cell interaction 2D-PAGE database
• http://www.bio-mol.unisi.it/2d/2d.html - Siena 2D-PAGE
A sample of 2D-PAGE databases created with Make2DDB.

2D Gel Databases
Limitations of current databases:
• They do not contain strict, detailed descriptions of protocols (buffers, sample volume and staining technique are all important information for gel comparisons).
• They were designed as 2D gel (not proteomics) databases and are therefore not readily expandable to incorporate other proteomics data, e.g. MS or multidimensional liquid chromatography (MDLC).
• They were designed for reference gels, not ongoing projects.

Proteomics Database Schema
What should it encompass?
• Proteomics methods (e.g. protein sample preparation, electrophoresis buffers, staining techniques, digestion for MS).
• Results from each stage of the experiment (e.g. gel images, MS data).
• Parameters used for MS data analysis, and statistical results.
• All stored in a strict format.

Note: compare MIAME (Minimum Information About a Microarray Experiment; Brazma et al., 2001) and MAGE-ML, the corresponding standards for microarray data.

Database querying
• Interaction via a web interface using Perl/CGI
• Clickable gel images
• Text querying for keywords, gel/spot name, author, sequence, etc.
• XML used for data exchange
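XML-based exchange can be illustrated with Python's standard xml.etree.ElementTree module: a spot record is written out as an XML element and read back without loss. The element and attribute names here (spot, pI, mw_kda, volume) are invented for the example and do not follow any published proteomics exchange format.

```python
import xml.etree.ElementTree as ET

NUMERIC_FIELDS = ("pI", "mw_kda", "volume")  # hypothetical field names


def spot_to_xml(spot: dict) -> str:
    """Serialise a spot record to an XML string."""
    root = ET.Element("spot", id=spot["id"], gel=spot["gel"])
    for field in NUMERIC_FIELDS:
        ET.SubElement(root, field).text = repr(spot[field])
    return ET.tostring(root, encoding="unicode")


def spot_from_xml(text: str) -> dict:
    """Parse the XML produced by spot_to_xml back into a dict."""
    root = ET.fromstring(text)
    record = {"id": root.get("id"), "gel": root.get("gel")}
    for field in NUMERIC_FIELDS:
        record[field] = float(root.findtext(field))
    return record


if __name__ == "__main__":
    spot = {"id": "SPOT_001", "gel": "gel_1", "pI": 5.62, "mw_kda": 66.5, "volume": 22.0}
    document = spot_to_xml(spot)
    print(document)
    print(spot_from_xml(document) == spot)
```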


Introduction to databases
• Flat file: the simplest database type, an ordered collection of data entries, analogous to how files would be stored in a filing cabinet.
• Relational: more sophisticated, storing data in interrelated tables and allowing flexible querying using Structured Query Language (SQL).
• Object-oriented: consistent with object-oriented principles, allowing storage of complex data types (e.g. multimedia) and querying beyond that defined by a rigid query language.

DBMS choice
• A flat-file database would contain many redundancies when storing complex data types.
• An object-oriented database could store complex data types such as large images intrinsically; however, a relational database could instead contain links to images stored elsewhere.
• SQL would provide a fast and easy way of querying and updating the database.
• A relational database would provide a platform that is easily expandable to accommodate additional forms of data.
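A minimal relational sketch of this choice, using Python's built-in sqlite3 module: gel images stay on disk and the gel table stores only a path to them, while spots and MS identifications sit in linked tables that SQL can join. The table and column names, and the example rows, are hypothetical and are not the author's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gel (
    gel_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    stain      TEXT,
    image_path TEXT              -- link to an image stored outside the database
);
CREATE TABLE spot (
    spot_id INTEGER PRIMARY KEY,
    gel_id  INTEGER NOT NULL REFERENCES gel(gel_id),
    x       REAL,                -- spot centre in image pixels
    y       REAL,
    volume  REAL                 -- background-subtracted intensity
);
CREATE TABLE ms_identification (
    ms_id     INTEGER PRIMARY KEY,
    spot_id   INTEGER NOT NULL REFERENCES spot(spot_id),
    accession TEXT,
    score     REAL
);
"""


def create_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn


def identified_spots(conn, gel_name, min_score):
    """Spots on one gel with an MS identification scoring at least min_score."""
    return conn.execute(
        """
        SELECT s.spot_id, m.accession, m.score
        FROM gel AS g
        JOIN spot AS s ON s.gel_id = g.gel_id
        JOIN ms_identification AS m ON m.spot_id = s.spot_id
        WHERE g.name = ? AND m.score >= ?
        ORDER BY m.score DESC
        """,
        (gel_name, min_score),
    ).fetchall()


if __name__ == "__main__":
    conn = create_db()
    # Hypothetical example rows.
    conn.execute("INSERT INTO gel VALUES (1, 'gel_1', 'SYPRO Ruby', 'images/gel_1.tif')")
    conn.executemany("INSERT INTO spot VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 120.0, 340.0, 22.0), (2, 1, 410.0, 95.0, 8.5)])
    conn.executemany("INSERT INTO ms_identification VALUES (?, ?, ?, ?)",
                     [(1, 1, "ACC_A", 87.0), (2, 2, "ACC_B", 41.0)])
    print(identified_spots(conn, "gel_1", 50))
```

Adding a new kind of data (for example an MDLC run) would mean adding a table with a foreign key, without changing the existing ones.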

Future
• Standard database schema for proteomics and a mark-up language for data exchange.
• Improved spot detection, quantification and gel-warping algorithms.
• Improved sample preparation techniques.
• More automation (linkage of robots!).
• Protein array technologies.

References
• Appel RD, et al. (1993) SWISS-2DPAGE: a database of two-dimensional gel electrophoresis images. Electrophoresis, 14, 1232-1238.
• Appel RD, Bairoch A, Sanchez JC, Vargas JR, Golaz O, Pasquali C, Hochstrasser DF (1996) Federated two-dimensional electrophoresis database: a simple means of publishing two-dimensional electrophoresis data. Electrophoresis, 17, 540-546.
• Bjellqvist B, Ek K, Righetti PG, Gianazza E, Görg A, Westermeier R, Postel W (1982) Isoelectric focusing in immobilised pH gradients: principle, methodology and applications. J. Biochem. Biophys. Methods, 6, 317-339.
• Brazma A, et al. (2001) Minimum information about a microarray experiment (MIAME) - towards standards for microarray data. Nat. Genet., 29, 365-371.
• Hoogland C, Baujard, Sanchez JC, Hochstrasser DF, Appel RD (1997) Make2ddb: a simple package to set up a two-dimensional electrophoresis database for the World Wide Web. Electrophoresis, 18, 2755-2758.
• O'Farrell (1975) High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., 250, 4007-4021.