					ChEBI: an EBI chemistry reference




                         EBI is an Outstation of the European Molecular Biology Laboratory.
    Overview

    • Introduction to ChEBI

    • Searching and browsing

    • Understanding the ontology

    • Downloads and programmatic
      access




2   08.01.2011   ChEBI – Chemical Entities of Biological Interest
Introduction to ChEBI
Block 1




Small Molecules within Bioinformatics
[Diagram: data types in bioinformatics. Genomes; Literature; Expressions;
Nucleotide sequences; Protein sequences; Protein domains, families; Enzymes;
3D structures; Small molecules; Pathways; Systems]
Small molecules participate in all the processes of life
Signaling: γ-aminobutyric acid

GABA: chief inhibitory neurotransmitter in the mammalian central nervous system.
In humans, also regulates muscle tone.

•    synthesized by neurons

•    found mostly as a zwitterion, that is, with the carboxyl group deprotonated and
     the amino group protonated

•    conformational flexibility of GABA is important for its biological function, as it has
     been found to bind to different receptors with different conformations

•     GABA deficiency linked to
      •  anxiety disorder, depression, alcoholism
      •  multiple sclerosis, action tremors, tardive dyskinesia
Metabolism: adenosine 5'-triphosphate
Adenosine 5'-triphosphate (ATP): the
"molecular unit of currency" of intracellular
energy transfer.

•   ATP synthesis requires an input of energy; ATP breakdown releases energy
    that drives other cellular processes

•   many proteins that bind ATP do so in a characteristic protein fold known as the
    Rossmann fold, which is a general nucleotide-binding structural domain that
    can also bind the cofactor NAD
Enzymes
• Enzyme inhibitors are molecules that bind to enzymes and
  decrease their activity.

• Many drugs are enzyme inhibitors.
  They are also used as herbicides and pesticides.
  Example: clavulanic acid acts as a suicide inhibitor of
  bacterial β-lactamase enzymes.
• Enzyme activators bind to enzymes and increase their enzymatic
  activity.

• Enzyme activators are often involved in the allosteric regulation of
  enzymes in the control of metabolism.
Pathways




http://www.genome.jp/kegg-bin/highlight_pathway?scale=1.0&map=map00231&keyword=tryptophan
Systems biology




BioModels: quantitative models of biochemical and cellular systems

tryptophan




D-enantiomer: sweet                         L-enantiomer: bitter
Drug design
•   Ligand-based: relies on knowledge of other molecules that bind to the
    biological target of interest.

•   Structure-based: relies on knowledge of the 3D structure of the biological
    target.

•   A suitable target has
     •   evidence that modulation of the target will have therapeutic value: e.g. disease
         linkage studies showing associations between mutations in the biological target
         and certain disease states.
     •   evidence that the target is druggable, i.e. capable of binding to a small molecule
         and that its activity can be modulated by the small molecule.


•   Target is cloned and expressed, then libraries of potential drug compounds
    are screened using screening assays
  Drug types 2003 - 2009




'Small molecules' in various shades of blue (http://chembl.blogspot.com/)
Getting the chemistry right
•   Thalidomide is a non-barbiturate hypnotic.

•   Thalidomide displays immunosuppressive and anti-angiogenic activity. It
    inhibits release of tumor necrosis factor-alpha from monocytes, and
    modulates other cytokine action.


•   Thalidomide is racemic: it contains both left- and right-handed isomers in
    equal amounts. One enantiomer is effective against morning sickness, and
    the other is teratogenic.

•   Enantiomers are interconverted in vivo. That is, if a human is given D-
    thalidomide or L-thalidomide, both isomers can be found in the serum.
    Hence, administering only one enantiomer does not prevent the teratogenic
    effect in humans.
                                             http://www.drugbank.ca/drugs/DB01041
Small molecule data sources
 PubChem: http://pubchem.ncbi.nlm.nih.gov/
 Deposition-driven publicly available compound repository,
 containing more than 25 million unique structures.

 ChemSpider: http://www.chemspider.com/
 Automatic aggregation of publicly available chemistry data
 with crowdsourced annotation.

 ChEBI: http://www.ebi.ac.uk/chebi/
 Manually annotated database and ontology
Small molecule annotations

• Often appear as free text in biological databases, in
  which they are not the core data
• Are frequently referred to by common names which may
  be chemically ambiguous
   • e.g. adrenaline
     = (S)-adrenaline ? (R)-adrenaline ?




• May be referred to by several different names
   • paracetamol, acetaminophen, 4-acetamidophenol,
     N-(4-hydroxyphenyl)acetamide, …
Chemicals - ChEBI

Nomenclature
caffeine
1,3,7-trimethylxanthine
methyltheobromine

Ontology
metabolite
CNS stimulant
trimethylxanthines

Chemical data
Formula: C8H10N4O2
Charge: 0
Mass: 194.19

Database Xrefs
MSDchem: CFF
KEGG DRUG: D00528

Chemical Informatics
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O

Visualisation (structure image)
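The formula and mass shown for caffeine are related by simple arithmetic. The sketch below recomputes the average mass from the formula string. The function names and the small atomic-weight table (rounded standard values) are assumptions made for illustration; this is not how ChEBI itself computes the field.

```python
import re

# Rounded standard atomic weights; only the elements needed for this example.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C8H10N4O2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def average_mass(formula):
    """Sum atomic weight times count over all elements in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


caffeine_mass = average_mass("C8H10N4O2")
print(round(caffeine_mass, 2))
```

The parser only handles flat formulae without brackets or charges, which is enough for the caffeine example.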
     What is ChEBI?

     •      Chemical Entities of Biological Interest
     •      Freely available
     •      Focused on ‘small’ chemical entities (no proteins or
            nucleic acids)
     •      Illustrated dictionary of chemical nomenclature
     •      High quality, manually annotated
     •      Provides chemical ontology

     Access ChEBI at http://www.ebi.ac.uk/chebi/




     ChEBI home page




     How is ChEBI maintained?

     • Automatic loading of preliminary data

     • Automatic loading of 2 star annotated data

     • Manual annotation

     • User requests via Submission Tool

     • Public release: First Wednesday of every
       month.



     ChEBI entries contain

     • A unique, unambiguous, recommended ChEBI name and
       an associated stable unique identifier
     • An illustration where appropriate (compounds and
       groups, but generally not classes)
     • A definition where appropriate (mostly classes)
     • A collection of synonyms, including the IUPAC
       recommended name for the entity where appropriate
     • A collection of cross-references to other databases

     • Links to the ChEBI ontology



     ChEBI entry view




     Automatic Cross-references




     Chemical Structures

     • Chemical structure may be
       interactively explored
       using the MarvinView applet

     • Available in formats
            •     Image
            •     Molfile
            •     InChI and InChIKey
            •     SMILES




     Molfile format




Time for Exercises




Searching and browsing ChEBI
Block 2




     Simple text search

     • Enter any text
     • Wildcard: %




     Advanced text search

     • Combine terms with AND, OR and BUT NOT
     • Narrow to category




     Structure search

     • Structure drawing tools
     • Search options




     Search Results

     • Hover over for search menu
     • Click to go to entry page

     Fingerprints
     • Chemical substructure searching is computationally expensive…




     Fingerprints [2]
     • … so heuristics must be used to decrease the number of search
       candidates

     • Example: C8H9NO2 cannot be a substructure of an entity which does
       not have at least 8 carbon atoms, 9 hydrogen atoms…

      • Fingerprints are a generalized, abstract encoding of structural
        features which can be used as an effective screening device
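The atom-count heuristic on this slide can be sketched as a cheap pre-filter that runs before any real substructure matching. The function names and the three candidate formulae below are illustrative assumptions, not part of ChEBI.

```python
import re


def atom_counts(formula):
    """Return {element: count} for a simple formula such as 'C8H9NO2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def may_contain(candidate_formula, query_formula):
    """True if the candidate has at least as many atoms of every element as the query.

    A False result rules the candidate out; a True result only means the
    expensive substructure search still has to be run.
    """
    candidate = atom_counts(candidate_formula)
    query = atom_counts(query_formula)
    return all(candidate.get(el, 0) >= n for el, n in query.items())


query = "C8H9NO2"  # formula from the slide
candidates = {"caffeine": "C8H10N4O2", "water": "H2O", "ethanol": "C2H6O"}
survivors = [name for name, f in candidates.items() if may_contain(f, query)]
print(survivors)
```

Caffeine survives the screen although it does not contain the query structure: a screen of this kind only removes impossible candidates, and never confirms a match.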




     Fingerprints [3]
     • Encoding of structural patterns, e.g. water (HOH)

                  0-bond paths     H, O, H
                  1-bond paths     HO, OH
                  2-bond paths     HOH

     • Each pattern is hashed to a bit string; the bit strings are combined
       with a bitwise OR to give the final fingerprint

                  Pattern     Hashed bitmap
                  H           0000010000
                  O           0010000000
                  HO          1010000000
                  OH          0000100010
                  HOH         0000000101
                  Result:     1010110111
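The combination step can be checked with a few lines of code. This sketch takes the hashed bitmaps for water exactly as listed on the slide and ORs them together. The hash function that produced those bitmaps is not given in the slides, so it is not reproduced here, and the function name is an assumption.

```python
# Hashed bitmaps for the paths of water, copied from the slide.
HASHED_PATTERNS = {
    "H": "0000010000",
    "O": "0010000000",
    "HO": "1010000000",
    "OH": "0000100010",
    "HOH": "0000000101",
}


def combine(bitmaps):
    """Bitwise-OR equal-length bit strings and return the result as a bit string."""
    width = len(bitmaps[0])
    value = 0
    for bitmap in bitmaps:
        value |= int(bitmap, 2)
    return format(value, f"0{width}b")


fingerprint = combine(list(HASHED_PATTERNS.values()))
print(fingerprint)  # 1010110111
```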


     Types of structure search
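     • Identity – based on InChI, e.g. InChI=1/H2O/h1H2

     • Substructure – uses fingerprints to narrow search range, then
       performs full substructure search algorithm
       (e.g. query fingerprint 0010110010 against candidate 1010110111)

     • Similarity – based on Tanimoto coefficient calculated between the
       fingerprints

              Tanimoto(a, b) = c / (|a| + |b| - c)
              where |a| and |b| are the numbers of bits set in a and b,
              and c is the number of bits set in both

              a = 0010110010   (4 bits set)
              b = 1010110111   (7 bits set, 4 in common with a)
              Tanimoto(a, b) = 4 / (4 + 7 - 4) = 0.57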
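Both fingerprint comparisons on this slide reduce to bit operations. The sketch below uses the two example fingerprints from the slide; the function names are assumptions. The screen shown is the usual subset test (every bit set in the query must also be set in the candidate), which is one plausible reading of "uses fingerprints to narrow search range".

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two bit strings: common bits / bits set in either."""
    a, b = int(fp_a, 2), int(fp_b, 2)
    common = bin(a & b).count("1")
    total = bin(a).count("1") + bin(b).count("1") - common
    return common / total if total else 0.0


def passes_screen(query_fp, candidate_fp):
    """True if every bit set in the query is also set in the candidate."""
    q, c = int(query_fp, 2), int(candidate_fp, 2)
    return (q & c) == q


a = "0010110010"
b = "1010110111"
print(round(tanimoto(a, b), 2))   # 0.57
print(passes_screen(a, b))        # True: b remains a candidate for the full search
```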

     Browse via Periodic Table

     • Molecular entities / Elements




     Navigate via links in ontology




                                                                     Click to follow links



Time for Exercises




Understanding the ChEBI ontology
Block 3




     Annotation of bioinformatics data
     • Essential for capturing understanding and knowledge
       associated with core data

     • Often captured in free text, which is easier to read and better
       for conveying understanding to a human audience, but…

             • Difficult for computers to parse
             • Quality varies from database to database
             • Terminology used varies from annotator to annotator

     • Towards annotation using standard vocabularies: ontologies
       within bioinformatics



     The ChEBI ontology

     Organised into three sub-ontologies, namely
                  • Molecular structure ontology
                  • Subatomic particle ontology
                  • Role ontology




                  (R)-adrenaline


     Molecular structure ontology




     Role ontology




     ChEBI ontology relationships

     • Generic ontology relationships



     • Chemistry-specific relationships




     Viewing ChEBI ontology




     Viewing ChEBI ontology [2]



                  Tree view




     Browsing ChEBI ontology (OLS)



                  Browse the ontology




     Ontology Lookup Service (OLS): http://www.ebi.ac.uk/ontology-lookup/

     Ontology Lookup Service

     • Provides a centralised query interface for ontology and
       controlled vocabulary lookup
     • Can integrate any ontology available in OBO format
     • At last release, 58 ontologies integrated, including
            •     GO
            •     ChEBI
            •     Molecular interaction (PSI MI)
            •     Pathway ontology (PW)
            •     Human disease (DOID)
            •     and many more…
     • Provides a search and a browse facility, as well as
       displaying a graph of terms and relationships

     OBO Foundry



        “The OBO Foundry is a collaborative experiment
       involving developers of science-based ontologies
      who are establishing a set of principles for ontology
         development with the goal of creating a suite of
      orthogonal interoperable reference ontologies in the
                      biomedical domain.”




Time for Exercises




Download and programmatic access
Block 4




     ChEBI domain model
                                                                     Self-referencing -
                                                                          merging




     Compound IDs and Merging
     • Compound accessions are maintained after merging, but only the
       main accession of a merged group is displayed
            • Main accession: CHEBI:15377
            • Navigated accession: CHEBI:5585

     Compound IDs and Merging [2]

     Compound rows (CHEBI:5585 is an additional accession; its parent ID is 15377):

     15377    C    CHEBI:15377    ChEBI    null     water    null
     5585     C    CHEBI:5585     KEGG     15377    null     null

     Cross-reference rows (their compound ID, 5585, is the additional accession):

     16213    5585    C00001       KEGG accn       C    KEGG    KEGG
     17314    5585    7732-18-5    CAS Registry    C    KEGG    null
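The parent ID column is what links an additional accession back to its main accession. The sketch below follows that link for the two compound rows on this slide. The tuple layout and the function name are assumptions for illustration, and following the link repeatedly (in case a parent itself has a parent) is a defensive choice rather than something the slides state.

```python
# (compound ID, accession, parent ID) taken from the two compound rows on the slide.
COMPOUNDS = [
    (15377, "CHEBI:15377", None),
    (5585, "CHEBI:5585", 15377),
]


def main_accession(accession, rows):
    """Follow parent IDs from any accession up to the main accession of its group."""
    by_accession = {acc: (cid, parent) for cid, acc, parent in rows}
    by_id = {cid: (acc, parent) for cid, acc, parent in rows}
    cid, parent = by_accession[accession]
    while parent is not None:
        cid = parent
        parent = by_id[cid][1]
    return by_id[cid][0]


print(main_accession("CHEBI:5585", COMPOUNDS))   # CHEBI:15377
print(main_accession("CHEBI:15377", COMPOUNDS))  # CHEBI:15377
```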


     Downloading ChEBI flavours

      • All downloads come in two flavours
         • 3 star only entries (manually annotated ChEBI
            entries)
         • 2 and 3 star entries (manually annotated ChEBI,
            ChEMBL and user submissions)




     Downloading ChEBI

     • OBO file
        • Use in OBO-Edit
     • SDF file
        • Compatible with chemistry software such as Bioclipse
     • Flat file, tab delimited
        • Import all the data into Excel
        • Parse it into your own database structure
     • Oracle binary dumps
        • Import into an Oracle database
     • Generic SQL insert statements
        • Import into a MySQL or PostgreSQL database
     OBO File Format

      • File format defined specifically for capturing biological
        ontologies
      • Why use this format?
         • Use it if you are primarily interested in the ontology.
         • Don’t use it if you are interested in chemical
            structural information.
      • What can you do with it?
         • Can parse it directly using parsers such as OBO-Edit
         • Can upload and browse the ontology using OBO-Edit
      • Parts of the file highlighted in the screenshot:
         • General header information
         • Synonym types used in terms
         • Root terms
         • Relationships to other terms
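For readers who want to read an OBO file without OBO-Edit, the stanza layout is simple enough to parse by hand. The sketch below is a minimal reader for `[Term]` stanzas. The sample text uses hypothetical `EX:` identifiers and is not copied from a ChEBI release; only the tag names (`id`, `name`, `synonym`, `is_a`, `relationship`) come from the OBO format itself.

```python
# Illustrative OBO text with hypothetical EX: identifiers (not real ChEBI IDs).
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:1
name: trimethylxanthine

[Term]
id: EX:2
name: caffeine
synonym: "1,3,7-trimethylxanthine" RELATED []
is_a: EX:1
relationship: has_role EX:3

[Term]
id: EX:3
name: central nervous system stimulant
"""


def parse_obo(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


terms = parse_obo(SAMPLE_OBO)
for term in terms:
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

Header lines before the first stanza are skipped here; a fuller reader would also keep them and handle trailing comments and escaped characters.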
     SDF File Lite format

     • Chemistry software compliant format
     • Why use this format?
        • Use it to obtain the ChEBI entries with their chemical
          structural information.
        • Don’t use it for the ontology.
     • What can I do with this format?
        • Parse it using existing software libraries such as CDK.
        • Open it in standalone tools such as Bioclipse
        • Copy and paste individual structures into JChemPaint
     • Entries are separated by $$$$
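Because entries are separated by `$$$$`, an SDF file can be split into records with plain text handling before it is passed to a chemistry toolkit. The sketch below does this for a two-record sample. The sample records, the `EXAMPLE:2` identifier and the data field names `ChEBI ID` and `ChEBI Name` are assumptions for illustration; check them against a real download before relying on them.

```python
# Two illustrative single-atom records; the second one is entirely hypothetical.
SAMPLE_SDF = """\

  illustrative record 1

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ChEBI ID>
CHEBI:15377

> <ChEBI Name>
water

$$$$

  illustrative record 2

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ChEBI ID>
EXAMPLE:2

> <ChEBI Name>
hypothetical one-carbon entry

$$$$
"""


def split_records(text):
    """Yield the text of each record, using lines equal to $$$$ as separators."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield "\n".join(record)
            record = []
        else:
            record.append(line)


def parse_record(chunk):
    """Split one record into its molfile block and its '> <name>' data fields."""
    molfile, _, data = chunk.partition("M  END")
    fields = {}
    name = None
    for line in data.splitlines():
        if line.startswith("> <"):
            name = line[3:line.index(">", 3)]
            fields[name] = []
        elif line.strip() and name is not None:
            fields[name].append(line.strip())
    return {
        "molfile": molfile + "M  END",
        "fields": {key: "\n".join(values) for key, values in fields.items()},
    }


entries = [parse_record(chunk) for chunk in split_records(SAMPLE_SDF)]
for entry in entries:
    print(entry["fields"]["ChEBI ID"], entry["fields"]["ChEBI Name"])
```

For real work the molfile block would be handed to a toolkit such as CDK, as the slide suggests; this sketch only shows the record structure.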




     SDF File complete format




                                            Entries
                                         separated by
                                             $$$$

     Flat-file tab and comma delimited

     • Why use this format?
        • Use it to obtain the entire ChEBI database structure.
     • What can I do with this format?
        • Open it using Excel
        • Import it into a relevant database such as Oracle
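As a rough illustration of loading a tab-delimited file into a database, the sketch below reads a small in-memory sample with Python's `csv` module and inserts it into SQLite, used here only as a stand-in for Oracle or another database. The column names, the table layout and the literal `null` marker are assumptions based on the rows shown in the merging slides, not the documented file layout.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited sample; real ChEBI files have more columns.
SAMPLE_TSV = (
    "ID\tCHEBI_ACCESSION\tPARENT_ID\tNAME\n"
    "15377\tCHEBI:15377\tnull\twater\n"
    "5585\tCHEBI:5585\t15377\tnull\n"
)


def load_compounds(tsv_text):
    """Load tab-delimited compound rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compounds ("
        "id INTEGER PRIMARY KEY, accession TEXT, parent_id INTEGER, name TEXT)"
    )
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        conn.execute(
            "INSERT INTO compounds VALUES (?, ?, ?, ?)",
            (
                int(row["ID"]),
                row["CHEBI_ACCESSION"],
                None if row["PARENT_ID"] == "null" else int(row["PARENT_ID"]),
                None if row["NAME"] == "null" else row["NAME"],
            ),
        )
    return conn


conn = load_compounds(SAMPLE_TSV)
main_names = conn.execute(
    "SELECT name FROM compounds WHERE parent_id IS NULL"
).fetchall()
print(main_names)
```

With a real download, `io.StringIO(tsv_text)` would be replaced by an open file handle.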




     Table dumps

     • Similar structure to the flat-file tab delimited files
     • Why use this format?
        • Use it to obtain the entire ChEBI database structure.

     • Oracle binary dumps
       • Import into an Oracle database
     • Generic SQL insert statements
       • Import into a MySQL or PostgreSQL database




     Web services

     • Allow users to create their own applications to query data




       User
     application




     The ChEBI web service

     • Programmatic access to a ChEBI entry
     • SOAP-based Java implementation
            • Clients currently available in Java and Perl
     • Methods
            •     getLiteEntity
            •     getCompleteEntity and getCompleteEntityByList
            •     getOntologyParents
            •     getOntologyChildren and getAllOntologyChildrenInPath
            •     getStructureSearch


     • Documented at
       http://www.ebi.ac.uk/chebi/webServices.do

     Web service client object model


        getLiteEntity


       getCompleteEntity




     getOntology
     (Parents and
       Children)




     Methods and parameters (1)




     Methods and parameters (2)




     Methods and parameters (3)




Time for Exercises




     For more information

     • Email: chebi-help@ebi.ac.uk

     • SourceForge: https://sourceforge.net/projects/chebi/

     • User Manual:
       http://www.ebi.ac.uk/chebi/userManualForward.do

     • RSS Feed




     Acknowledgements
     • The ChEBI team
            Nico Adams                                           Paula de Matos
            Adriano Dekker                                       Marcus Ennis
            Janna Hastings                                       Duncan Hull
            Zara Josephs                                         Steve Turner
            Christoph Steinbeck


     • Everyone @ the EBI and elsewhere who uses or
       contributes to ChEBI

          ChEBI is funded by the European Commission under SLING, grant agreement
          number 226073 (Integrating Activity) within Research Infrastructures of the FP7
          Capacities Specific Programme; and by the BBSRC, grant agreement number
          BB/G022747/1 within the "Bioinformatics and biological resources" fund.




Thank you



