Bioinformatics for Proteomics

Bioinformatics for Proteomics: 2D Gels
By Andrew Garrow, University of Leeds

Myself
PhD: Bioinformatics for a proteomics approach to understanding the Schistosome tegument.
  Construction of a proteomics database
  2D Gels
  Mass spectrometry
  Data analysis

Lecture Layout
  Introduction to 2D gel electrophoresis
  2D gel databases
  2D gel analysis

Proteomics
Analysis by direct measurement of proteins in terms of their presence and relative abundance.

Why study Proteomics?
To better understand protein expression and formation:
  Transcriptional control
  Post-transcriptional control, e.g. alternative splicing, RNA editing
  Translational and degradation control, translational frameshifting
  >200 known post-translational modifications (PTMs), e.g. phosphorylation, glycosylation, lipid attachment, peptide cleavage
  Number of proteins > number of genes

Biological question -> Biological sample -> Sample prep. -> 2D Gel separation -> Imaging -> Protein excision -> Protein digestion -> MS/Protein ID -> Global bioinformatics -> Proteomic discovery

2D Gels
Used for the separation of proteins within a sample.
Separation depends upon protein molecular weight and isoelectric point (pI).
Can be used to resolve >1,800 spots (Choe and Lee, 2000).

2D Gels: The Problem
2D gels can routinely be used to separate >1000 spots, yet cells express thousands to tens of thousands of proteins.
Approaches to improve protein coverage:
  Separation on the basis of differential compartmentalisation/solubilisation
  Narrow range IPG strips for focusing on particular pI ranges

Zooming
Use narrow range IPG strips to focus on particular pI ranges.
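The benefit of "zooming" can be illustrated with a little arithmetic: the same physical strip length spread over a narrower pH range leaves more distance between proteins of similar pI. The sketch below is illustrative only. It assumes a perfectly linear pH gradient, and the 18 cm strip length, the pH ranges and the two pI values are hypothetical numbers chosen for the example, not values from the lecture.

```python
def cm_per_ph_unit(strip_length_cm, ph_low, ph_high):
    """Strip length available per pH unit, assuming a linear gradient."""
    if ph_high <= ph_low:
        raise ValueError("ph_high must be greater than ph_low")
    return strip_length_cm / (ph_high - ph_low)


def position_on_strip(pi, strip_length_cm, ph_low, ph_high):
    """Distance (cm) from the acidic end where a protein of this pI focuses.

    Returns None when the pI lies outside the strip's pH range.
    """
    if not ph_low <= pi <= ph_high:
        return None
    return (pi - ph_low) * cm_per_ph_unit(strip_length_cm, ph_low, ph_high)


def separation_cm(pi_a, pi_b, strip_length_cm, ph_low, ph_high):
    """Distance (cm) between two focused proteins, or None if either is off-strip."""
    a = position_on_strip(pi_a, strip_length_cm, ph_low, ph_high)
    b = position_on_strip(pi_b, strip_length_cm, ph_low, ph_high)
    if a is None or b is None:
        return None
    return abs(a - b)


# Hypothetical example: two proteins with pI 5.40 and 5.45 on an 18 cm strip.
wide = separation_cm(5.40, 5.45, 18.0, 3.0, 10.0)    # broad-range strip, pH 3-10
narrow = separation_cm(5.40, 5.45, 18.0, 5.0, 6.0)   # narrow-range strip, pH 5-6
print(f"pH 3-10: {wide:.2f} cm apart")    # pH 3-10: 0.13 cm apart
print(f"pH 5-6:  {narrow:.2f} cm apart")  # pH 5-6:  0.90 cm apart
```

Under these assumptions the narrow-range strip separates the two proteins about seven times further, at the cost of losing every protein whose pI falls outside pH 5-6. This is why several overlapping narrow-range gels are needed to cover a whole sample.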
2D Gel Procedure
Three day process:
  Day 1: Rehydration phase
  Day 2: Isoelectric focusing (IEF)
  Day 3: Second dimension

Staining
Stain                | Sensitivity | Process time/steps | Advantages
Bio-safe Coomassie   | 10 ng       | 2.5 hr / 3 steps   | MS compatible; easily visualised; non-hazardous
Coomassie Blue R-250 | 40 ng       | 2.5 hr / 2 steps   | Oldest and least expensive method
Silver Stain Plus    | 1 ng        | 1.5 hr / 3 steps   | MS compatible; high sensitivity; low background
Bio-Rad silver stain | 1 ng        | 2 hr / 7 steps     | High sensitivity; detects some highly glycosylated and other difficult-to-stain proteins
SYPRO Ruby           | 1 ng        | 3 hr / 2 steps     | MS compatible; allows analysis in fluorescent imagers; linear over 3 orders of magnitude

Gel Analysis
Gel images are digitally captured using a charge-coupled device (CCD) camera or scanner.
Analysed by specialised software:
  Phoretix 2D Advanced (www.phoretix.com)
  PDQuest (www.proteomeworks.bio-rad.com)
  2D Elite (http://www.imsupport.com/)
  Melanie (www.expasy.ch/melanie)

Gel Analysis: Software features
  Spot detection
  Spot quantification
  Noise reduction
  Gel comparison by warping
  Linkage with robots

Robot Spot Cutter

Post Gel analysis
  Robotic spot picking
  Protein/spot digestion, e.g. with trypsin
  Mass spectrometry (MS)
  MS data analysis
  Data repository

Data Repository
Database: a collection of data records, either in a single file or in multiple files.
Database management system (DBMS): a software suite comprising a database and the utilities required to organise, search and update it, maintain data security and control access.
Database types: flat file, relational, object-oriented.

2D Gel Databases
  www.expasy.ch - SWISS-2DPAGE
  http://www.anl.gov/BIO/PMG/ - Mouse liver, human breast cell lines, Pyrococcus. Argonne Protein Mapping Group.
  http://www.harefield.nthames.nhs.uk/nhli/protein/index.html - HSC-2DPAGE, Heart Science Centre, Harefield Hospital
  http://oto.wustl.edu/thc/peri-gels.htm - Washington Univ. Inner Ear Protein Database
  http://ca.expasy.org/ch2d/2d-index.html - WORLD-2DPAGE, index of 2D gel databases

Federated 2D PAGE database
Described by Appel et al. (1996).
Aimed to tackle (then) emerging problems with 2D gel databases:
  non-uniformity of data-encoding conventions
  robustness
  consistency
  commitment of groups to maintain the databases and data quality

Federated 2D PAGE database: Rules
Rule 1: Individual entries in the database must be accessible by a keyword search. Other methods are possible but not required.
Rule 2: The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must at least be linked to the main index.
Rule 3: A main index has to be supplied that provides a means of querying all databases through one unique query point. Currently, the main index is the SWISS-PROT database.
Rule 4: Individual protein entries must be available through clickable images.
Rule 5: 2DE analysis software designed for use with federated databases must be able to access individual entries in any federated 2DE database.
Source: http://ca.expasy.org/ch2d/fed-rules.html

SWISS-2DPAGE
Established in 1993.
Maintained by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics.
Entries are highly annotated, containing textual data on proteins including:
  mapping procedure
  physiological and pathological information
  experimental data (isoelectric point, molecular weight, amino acid composition, peptide masses)
  bibliographical references
Entries are linked to images showing the experimentally determined and theoretical protein locations.
Cross-references are provided to other federated 2D-PAGE database entries, Medline and SWISS-PROT.
Search via:
  clickable images
  keywords

Make2DDB
Software package provided by ExPASy.
Allows a 2D-PAGE database to be set up on the user's own server.
The resulting database is queryable by description, accession number or spot clicking.
Provides links to Swiss-Prot.

Make2DDB databases
  http://semele.anu.edu.au/2d/2d.html - ANU 2D-PAGE, Australian National University 2D-PAGE database
  http://babbage.csc.ucm.es/2d/2d.html - COMPLUYEAST 2DPAGE, Saccharomyces cerevisiae 2D-PAGE database at Universidad Complutense Madrid, Spain
  http://www.gram.au.dk/ - PHCI-2DPAGE, Parasite host cell interaction 2D-PAGE database
  http://www.bio-mol.unisi.it/2d/2d.html - Siena 2D-PAGE
A sample of 2D-PAGE databases created with make2ddb.

2D Gel Databases
Limitations of current databases:
  Do not contain strict/detailed descriptions of protocol (buffers, sample volume and staining techniques are all important information for gel comparisons).
  Designed as 2D gel (not proteomics) databases, and therefore not readily expandable to incorporate other proteomics data, e.g. MS, MDLC.
  Designed for reference gels, not ongoing projects.

Proteomics Database Schema
What should it encompass?
  Proteomics methods (e.g. protein sample prep, electrophoresis buffers, staining techniques, digestion for MS etc.).
  Results from each stage of the experiment (e.g. gel images, MS data).
  Parameters used for MS data analysis/statistical results.
  All stored in a strict format.
Note: MIAME and MAGE-ML

Database querying
  Interact via a web interface using Perl/CGI
  Clickable gel images
  Text querying for keywords, gel/spot name, author, sequence etc.
  XML used for data exchange

Proteomics Database Schema: Introduction to databases
Flat file: the simplest database type, an ordered collection of data entries, analogous to how files would be stored in a filing cabinet.
Relational: more sophisticated, storing data in interrelated tables. Allows flexible querying using Structured Query Language (SQL).
Object-oriented: a database consistent with object-oriented principles, allowing storage of complex data types (e.g. multimedia) and querying beyond what a rigidly defined query language permits.

DBMS choice
A flat file database would contain many redundancies when storing complex data types.
An object-oriented database could intrinsically store complex data types, e.g. large images. A relational database, however, could instead hold links to images stored elsewhere.
SQL would provide a fast and easy way of querying and updating the database.
A relational database would provide a platform that is easily expandable to accommodate additional forms of data.

Future
  Standard database schema for proteomics and mark-up language for data exchange.
  Improved spot detection, quantification and gel warping algorithms.
  Improved sample preparation techniques.
  More automation (linkage of robots!).
  Protein array technologies.

References
Appel RD, et al., 1993. SWISS-2DPAGE: a database of two-dimensional gel electrophoresis images. Electrophoresis, 14, 1232-1238.
Appel RD, Bairoch A, Sanchez JC, Vargas JR, Golaz O, Pasquali C and Hochstrasser DF, 1996. Federated two-dimensional electrophoresis database: a simple means of publishing two-dimensional electrophoresis data. Electrophoresis, 17, 540-546.
Bjellqvist B, Ek K, Righetti PG, Gianazza E, Gorg A, Westermeier R and Postel W, 1982. Isoelectric focusing in immobilised pH gradients: principle, methodology and applications. J. Biochem. Biophys. Methods, 6, 317-339.
Brazma A, et al., 2001. Minimum information about a microarray experiment (MIAME): towards standards for microarray data. Nat. Genet., 29, 365-371.
Hoogland C, Baujard, Sanchez JC, Hochstrasser DF and Appel RD, 1997. Make2ddb: a simple package to set up a two-dimensional electrophoresis database for the world wide web. Electrophoresis, 18, 2755-2758.
O'Farrell, 1975. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., 250, 4007-4021.
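
Appendix: relational schema sketch
The relational option argued for under "DBMS choice" can be sketched roughly as follows. This is a minimal illustration, not the schema actually built for the project. The table names, column names, file path and accession strings are all invented for the example. It uses Python's built-in sqlite3 module as a stand-in for whichever relational DBMS is chosen. The gel table holds protocol details plus a link to an image stored outside the database, and spots are queried with SQL.

```python
import sqlite3

# Hypothetical two-table schema: one row per gel, one row per detected spot.
SCHEMA = """
CREATE TABLE gel (
    gel_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    stain      TEXT,              -- protocol detail needed for gel comparison
    image_path TEXT               -- link to an image file stored elsewhere
);
CREATE TABLE spot (
    spot_id    INTEGER PRIMARY KEY,
    gel_id     INTEGER NOT NULL REFERENCES gel(gel_id),
    label      TEXT NOT NULL,
    pi         REAL,              -- experimental isoelectric point
    mw_kda     REAL,              -- experimental molecular weight (kDa)
    accession  TEXT               -- cross-reference to a sequence database
);
"""


def build_demo_db():
    """Create an in-memory database populated with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO gel (gel_id, name, stain, image_path) VALUES (?, ?, ?, ?)",
        (1, "demo_gel", "SYPRO Ruby", "images/demo_gel.tif"),
    )
    conn.executemany(
        "INSERT INTO spot (gel_id, label, pi, mw_kda, accession) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, "spot_001", 4.8, 52.0, "ACC0001"),  # invented accession
            (1, "spot_002", 5.6, 31.5, "ACC0002"),  # invented accession
            (1, "spot_003", 7.9, 18.2, None),       # not yet identified
        ],
    )
    conn.commit()
    return conn


def spots_in_pi_window(conn, gel_name, pi_min, pi_max):
    """Return (label, pI, MW, image link) for spots on a gel within a pI window."""
    query = """
        SELECT s.label, s.pi, s.mw_kda, g.image_path
        FROM spot AS s
        JOIN gel AS g ON g.gel_id = s.gel_id
        WHERE g.name = ? AND s.pi BETWEEN ? AND ?
        ORDER BY s.pi
    """
    return conn.execute(query, (gel_name, pi_min, pi_max)).fetchall()


conn = build_demo_db()
print(spots_in_pi_window(conn, "demo_gel", 5.0, 6.0))
# [('spot_002', 5.6, 31.5, 'images/demo_gel.tif')]
```

Further tables (for example MS results keyed on spot_id) could be added without altering the existing ones, which is the expandability argument made for the relational approach.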