Ontologies in Bioinformatics
Robert Stevens
Department of Computer Science
University of Manchester

http://img.cs.man.ac.uk/stevens
• What is knowledge?
• What is an ontology?
• Relationships between the two communities
• The last decade of bio-ontologies
• The future

            What is Knowledge?
• Knowledge: all information and an understanding to carry out
  tasks and to infer new information
• Information: data equipped with meaning
• Data: un-interpreted signals that reach our senses

[Figure: one example at three levels]
  Data:        Michael Ashburner; Professor; University of Cambridge; UK
  Information: Name; Job; Institution; Country
  Knowledge:   man; academic, senior; ancient university, 5 rated;
               European; important figure in biology
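A minimal Python sketch of the three levels, using the example from the figure. The field names come from the slide; the two inference rules and the list of European countries are hypothetical stand-ins for the "understanding" that turns information into knowledge.

```python
# Data: un-interpreted signals, here just a tuple of strings.
data = ("Michael Ashburner", "Professor", "University of Cambridge", "UK")

# Information: the same data equipped with meaning (field names from the slide).
fields = ("Name", "Job", "Institution", "Country")
information = dict(zip(fields, data))

# Hypothetical, deliberately incomplete background knowledge.
EUROPEAN_COUNTRIES = {"UK", "France", "Germany"}


def infer(info):
    """Apply two hypothetical rules to infer facts not stated in `info`."""
    facts = set()
    if info.get("Job") == "Professor":
        facts.add("academic, senior")
    if info.get("Country") in EUROPEAN_COUNTRIES:
        facts.add("European")
    return facts


print(information["Institution"])   # University of Cambridge
print(sorted(infer(information)))   # ['European', 'academic, senior']
```

Nothing in the tuple itself says "European"; that fact only appears once a rule encoding domain understanding is applied.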
           Things, Symbols & Concepts
• Humans require words (or at least symbols) to communicate
  efficiently. The mapping of words to things is only indirectly
  possible. We do it by creating symbols that stand for things.

• The relation between symbols and things has been described in the
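  form of the meaning triangle:

[Figure: the meaning triangle: a symbol evokes a concept, the concept refers to a thing, and the symbol stands for the thing only indirectly] [Ogden, Richards, 1923]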

          Representing Knowledge
• Language uses symbols and rules (natural language) to
  communicate knowledge
• Need human intelligence to deal with pragmatics
• NLP notoriously difficult
• Need to capture knowledge in a computationally amenable form
• Ontology: a conceptual model
• Ontology plus lexicon is a terminology
• Primary aim: creating a shared understanding of a domain and
  the relationships within that domain
• Common symbols for the things within a domain
• Capturing domain knowledge with fidelity and precision
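To make "ontology plus lexicon is a terminology" concrete, here is a small Python sketch assuming a toy data model: concepts carry hypothetical IDs (C1, C2) and no words, and a separate lexicon maps symbols, including an abbreviation, onto those concepts. As in the meaning triangle, words reach things only through concepts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A unit of thought, identified by an ID rather than by any word."""
    cid: str
    is_a: list = field(default_factory=list)  # IDs of parent concepts


# Ontology: concepts and their relationships (IDs are hypothetical).
ONTOLOGY = {
    "C1": Concept("C1"),               # the concept we call "protein"
    "C2": Concept("C2", is_a=["C1"]),  # the concept we call "hexokinase"
}

# Lexicon: symbols mapped onto concepts; several symbols may share one concept.
LEXICON = {
    "protein": "C1",
    "hexokinase": "C2",
    "hk": "C2",  # abbreviation, included for illustration
}


def resolve(word: str) -> Optional[Concept]:
    """Terminology lookup: from a symbol to the concept it stands for."""
    cid = LEXICON.get(word.strip().lower())
    return ONTOLOGY.get(cid) if cid else None


print(resolve("HK") is resolve("Hexokinase"))  # True
print(resolve("Hexokinase").is_a)              # ['C1']
print(resolve("DNA"))                          # None
```

Keeping concepts and words apart means a new synonym can be added to the lexicon without touching the ontology.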

            From Sharing Information to Sharing Meaning
•   Data describing the content and meaning of resources and
    services
•   But everyone must speak the same language
Terminologies
•   Shared and common vocabularies
•   For search engines, agents, curators, authors and users
•   But everyone must mean the same thing

[Figure: several service providers linked by a shared and common understanding of a domain, essential for search, exchange and …]
               What is an Ontology?
•   Concepts: units of thought; classes and individuals
      e.g. Protein, Gene, DNA, Hexokinase, glycolysis, …
•   Terms: labels for concepts, e.g. “Protein”, “Gene”, …
•   Relationships: semantic links between concepts
      e.g. is-a-kind-of, is-a, part-of, name-of, …
•   A taxonomy forms the backbone of an ontology
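The four ingredients above can be sketched in Python as a toy data structure. The concept IDs, the extra root concept Molecule and the specific links are hypothetical, chosen only for illustration; following is-a links upwards walks the taxonomy, while part-of is kept as a separate relation type.

```python
from collections import defaultdict

# Terms: labels for concepts (concept ID -> preferred term). IDs are hypothetical.
terms = {
    "c:molecule": "Molecule",
    "c:protein": "Protein",
    "c:dna": "DNA",
    "c:gene": "Gene",
    "c:hexokinase": "Hexokinase",
}

# Relationships: typed semantic links (subject, relation, object).
# An illustrative fragment, not taken from any published ontology.
relations = [
    ("c:protein", "is-a", "c:molecule"),
    ("c:dna", "is-a", "c:molecule"),
    ("c:hexokinase", "is-a", "c:protein"),
    ("c:gene", "part-of", "c:dna"),
]

# Index links by relation type so each type can be followed separately.
links = defaultdict(lambda: defaultdict(set))
for subj, rel, obj in relations:
    links[rel][subj].add(obj)


def ancestors(concept, rel="is-a"):
    """All concepts reachable from `concept` by following `rel` links upwards."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in links[rel][stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(terms[c] for c in ancestors("c:hexokinase")))  # ['Molecule', 'Protein']
print(ancestors("c:gene", "part-of"))                       # {'c:dna'}
```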

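           So What Counts as an Ontology?
                        [Deborah McGuinness, Stanford]

[Figure: a spectrum of ontologies, from informal to formal]
  Catalog/ID → Thesauri → Informal is-a → Formal is-a → Formal instance
  → Frames (properties) → Value restrictions → Logical constraints (inverse, part-of)
  Examples placed on the spectrum: Gene Ontology and Mouse Anatomy towards
  the informal end; TAMBIS and PharmGKB towards the formal end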
The art of ranking things in genera and species is of no small importance
   and very much assists our judgment as well as our memory. You know
   how much it matters in botany, not to mention animals and other
   substances, or again moral and notional entities as some call them.
   Order largely depends on it, and many good authors write in such a
   way that their whole account could be divided and subdivided
   according to a procedure related to genera and species. This helps one
   not merely to retain things, but also to find them. And those who have
   laid out all sorts of notions under certain headings or categories have
   done something very useful.

Gottfried Wilhelm Leibniz, New Essays on Human Understanding

The Gene Ontology

  Bio-Ontologies in the Past Decade

• Explicit use of ontologies is fairly recent
• EcoCyc and RiboWeb used frame-based systems to create
  knowledge bases
• An area in which the CS community can test its technology
• The domain is large, complex and dynamic
• “A knowledge based discipline”
• The post-genomic era encourages the need for shared …
• Cross-genome comparisons need structured, controlled vocabularies
• Moved from a small niche to a much bigger niche
• Biologists are building ontologies

             Uses of Bio-Ontologies
•   Controlled vocabularies for annotation
•   Describing schemas and their content
•   Domain maps
•   Query mechanisms
•   Resolution of semantic heterogeneity
•   Text analysis…
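As an illustration of a controlled vocabulary for annotation doubling as a query mechanism, here is a Python sketch that retrieves gene products annotated to a term or to any more specific term. The hierarchy is a simplified, hypothetical fragment (real GO has intermediate terms between these), and the gene products geneA to geneC are invented.

```python
# Hypothetical, simplified is-a fragment: term -> parent terms.
PARENTS = {
    "catalytic activity": set(),
    "kinase activity": {"catalytic activity"},
    "hexokinase activity": {"kinase activity"},
    "transporter activity": set(),
}

# Hypothetical annotations: gene product -> terms it was annotated with.
ANNOTATIONS = {
    "geneA": {"hexokinase activity"},
    "geneB": {"kinase activity"},
    "geneC": {"transporter activity"},
}


def with_ancestors(term):
    """The term plus everything above it in the is-a hierarchy."""
    result, stack = {term}, [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def query(term):
    """Gene products annotated to `term` or to any more specific term."""
    return {gp for gp, annotated in ANNOTATIONS.items()
            if any(term in with_ancestors(t) for t in annotated)}


print(sorted(query("catalytic activity")))   # ['geneA', 'geneB']
print(sorted(query("hexokinase activity")))  # ['geneA']
```

Because the query walks the hierarchy, a curator can annotate at the most specific term known and still be found by broader searches.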

               The Gene Ontology
• Tutorial and the first Bio-Ontologies meeting at ISMB 1998 in Montréal
• The fly, mouse and yeast communities got together to develop GO
• First release: some 3,500 terms covering Molecular Function,
  Biological Process and Cellular Component
• Now some 15,000 terms and growing
• The Gene Ontology Consortium covers some 15 organism
  databases plus SWISS-PROT and others
• Synonyms, abbreviations and associations to gene products:
  Access to names, genes etc.
• A common understanding across a community

GO DAG for heparin biosynthesis

GO:0003673 : Gene_Ontology (46199)
  GO:0008150 : biological_process (30188)
    GO:0008151 : cell growth and/or maintenance (20547)
      GO:0008152 : metabolism (14693)
        GO:0016051 : carbohydrate metabolism (267)
          GO:0006023 : aminoglycan metabolism (18)
            GO:0030203 : glycosaminoglycan metabolism
              GO:0030202 : heparin metabolism (3)
                GO:0030210 : heparin biosynthesis (3)

        Open Bio-Ontologies (OBO)
• GO, though large, is narrow in scope
• Sequence Ontology
• Chemical Ontology
• Promotes a common ontology format, tools and house style
• The microarray community gives a further boost, avoiding the
  mistakes of previous bioinformatics resources
• Need ontologies for phenotypes, tissues, anatomies, etc.
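A common format is what lets tools be shared. The sketch below is a minimal Python reader for stanzas in the OBO flat-file style ([Term] headers, tag: value lines, "!" comments), assuming that layout as it was later standardised; it ignores every tag except id, name and is_a. The sample stanzas are built from the GO DAG for heparin biosynthesis shown earlier, and treating each link there as is_a is an assumption.

```python
SAMPLE = """\
format-version: 1.2

[Term]
id: GO:0030210
name: heparin biosynthesis
is_a: GO:0030202 ! heparin metabolism

[Term]
id: GO:0030202
name: heparin metabolism
is_a: GO:0030203 ! glycosaminoglycan metabolism

[Term]
id: GO:0030203
name: glycosaminoglycan metabolism
"""


def parse_obo(text):
    """Minimal reader: keeps only id, name and is_a from [Term] stanzas."""
    terms, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # a new stanza begins
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


def lineage(terms, tid):
    """Names from `tid` upwards, following the first is_a parent at each step."""
    names = []
    while tid in terms:
        names.append(terms[tid]["name"])
        parents = terms[tid]["is_a"]
        tid = parents[0] if parents else None
    return names


terms = parse_obo(SAMPLE)
print(lineage(terms, "GO:0030210"))
# ['heparin biosynthesis', 'heparin metabolism', 'glycosaminoglycan metabolism']
```

Following only the first parent is a simplification: in a DAG a term may have several parents, so a full tool would walk all of them.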

                      Two Communities
Computer Scientists                Biologists
Building ontologies                Ontology content
Knowledge representation (KR)      Domain knowledge

                Together: Better Ontologies
         What are We Saying?
[Figure: Man is-a Person; Woman is-a Person]

• Are all instances of Man instances of Person?
• Can an instance of Person be both an instance of Man
  and an instance of Woman?
• Can there be any more kinds of Person?
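Set semantics give one precise reading of these three questions. In this Python sketch each class is modelled by its extension, the set of its instances, over three hypothetical individuals; subsumption, disjointness and covering then become subset, empty-intersection and union checks. This only tests one particular set of individuals; a formal ontology language such as OWL would instead state disjointness or covering as axioms that every model must satisfy.

```python
# Each class is modelled by its extension: the set of its instances.
# The individuals p1..p3 and their memberships are hypothetical.
person = {"p1", "p2", "p3"}
man = {"p1"}
woman = {"p2"}


def is_a(sub: set, sup: set) -> bool:
    """Subsumption: every instance of `sub` is an instance of `sup`."""
    return sub <= sup


def disjoint(a: set, b: set) -> bool:
    """No individual is an instance of both classes."""
    return not (a & b)


def covers(parent: set, *children: set) -> bool:
    """The children exhaust the parent: no other kind of instance is left over."""
    return parent <= set().union(*children)


print(is_a(man, person), is_a(woman, person))  # True True
print(disjoint(man, woman))                    # True
print(covers(person, man, woman))              # False: p3 is neither
```

The bare is-a diagram answers only the first question; the other two need extra statements that an informal taxonomy leaves unsaid.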
                This Year’s Meeting
•   A theme of text analysis and ontology
•   The first time the talks have matched the theme
•   Ontologies and indexing
•   Integrating ontologies into NLP systems
•   Ontologies in information retrieval
•   Developing terminologies
•   GO in NLP
•   New Ontologies
•   Semantic Similarity
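One simple way to compute semantic similarity between two terms, sketched in Python, is the overlap (Jaccard index) of their ancestor sets in an is-a hierarchy. This is a swapped-in illustrative measure, not the method of any particular talk at the meeting, and the small hierarchy is hypothetical.

```python
# Hypothetical is-a hierarchy: term -> parent terms.
PARENTS = {
    "process": [],
    "metabolism": ["process"],
    "transport": ["process"],
    "carbohydrate metabolism": ["metabolism"],
    "lipid metabolism": ["metabolism"],
}


def up_set(term):
    """The term together with all of its is-a ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def similarity(a, b):
    """Jaccard index of the two ancestor sets: 1.0 for identical terms."""
    up_a, up_b = up_set(a), up_set(b)
    return len(up_a & up_b) / len(up_a | up_b)


print(similarity("carbohydrate metabolism", "lipid metabolism"))  # 0.5
print(similarity("carbohydrate metabolism", "transport"))         # 0.25
```

Terms that share more of their ancestry score higher; information-content measures such as Resnik's refine this by weighting shared ancestors by how specific they are.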

• Ontologies to help text analysis
• Text analysis to help build ontologies
• Biology community steadily building a large number of large
  domain ontologies
• CS community can help build computationally amenable ontologies
• Vast quantities of domain knowledge in natural language forms
  in literature and databanks
• Opportunities for language and ontology communities
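The first point, ontologies helping text analysis, can be sketched as dictionary-based term tagging in Python: labels and synonyms from a lexicon are matched against the text, preferring the longest match. The lexicon and concept IDs are hypothetical, and real systems also have to handle ambiguity, inflection and far larger vocabularies.

```python
import re

# Hypothetical lexicon: lower-cased label or synonym -> concept ID.
LEXICON = {
    "hexokinase": "c:hexokinase",
    "hk": "c:hexokinase",
    "glycolysis": "c:glycolysis",
    "glycolytic pathway": "c:glycolysis",
    "protein": "c:protein",
}
MAX_WORDS = max(len(k.split()) for k in LEXICON)


def tag(text):
    """Return (matched text, concept ID) pairs, preferring the longest match."""
    words = re.findall(r"[A-Za-z0-9-]+", text)
    found, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase.lower() in LEXICON:
                found.append((phrase, LEXICON[phrase.lower()]))
                i += n
                break
        else:
            i += 1  # no term starts at this word
    return found


print(tag("HK catalyses the first step of the glycolytic pathway."))
# [('HK', 'c:hexokinase'), ('glycolytic pathway', 'c:glycolysis')]
```

Both "HK" and "glycolytic pathway" are resolved to concepts rather than strings, so text mentions can be linked back to the ontology.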
