Ontologies in Bioinformatics

            Robert Stevens
    Department of Computer Science
       University of Manchester
     Robert.stevens@cs.man.ac.uk



         http://img.cs.man.ac.uk/stevens   1
                      Introduction
• What is knowledge?
• What is an ontology?
• Relationships between the two communities
• The last decade of bio-ontologies
• The future




            What is Knowledge?
• Knowledge – all information and an understanding to carry out
  tasks and to infer new information
• Information – data equipped with meaning
• Data – un-interpreted signals that reach our senses

[Figure: three panels describing the same person]
  BIOLOGY: man; academic, senior; ancient university, 5 rated;
           European; important figure in biology
  Conf:    Name; Job; Institution; Country
  ISMB:    Michael Ashburner; Professor; University of Cambridge; UK
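
The data, information and knowledge levels can be illustrated with the record from the slide. This is only a sketch: the field names and values come from the slide, but reading the three panels as data, information and knowledge is an assumption, and the inference rules in `RULES` are hypothetical, chosen to reproduce two of the slide's conclusions.

```python
# Field names and values are taken from the slide; the rules are hypothetical.
FIELDS = ("Name", "Job", "Institution", "Country")

# Data: un-interpreted values.
data = ("Michael Ashburner", "Professor", "University of Cambridge", "UK")

# Information: data equipped with meaning (each value paired with its field).
information = dict(zip(FIELDS, data))

# Knowledge: background rules that allow new information to be inferred.
# These (field, value) -> conclusion rules are illustrative assumptions.
RULES = {
    ("Job", "Professor"): "academic, senior",
    ("Country", "UK"): "European",
}


def infer(record, rules=RULES):
    """Return the conclusions of every rule whose condition the record meets."""
    return [
        conclusion
        for (field_name, value), conclusion in rules.items()
        if record.get(field_name) == value
    ]


print(information)
print(infer(information))
```

The same tuple of strings carries no meaning until it is paired with field names, and yields new statements only once background rules are applied.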
           Things, Symbols & Concepts
• Humans require words (or at least symbols) to communicate
  efficiently. The mapping of words to things is only indirectly
  possible. We do it by creating symbols that stand for things.

• The relation between symbols and things has been described in the
  form of the meaning triangle:

[Figure: the meaning triangle, relating the symbol “Jaguar” to a
 Concept. Ogden, Richards, 1923]


          Representing Knowledge
• Language uses symbols and rules (natural language) to
  communicate knowledge
• Need human intelligence to deal with pragmatics
• NLP notoriously difficult
• Need to capture knowledge in a computationally amenable
  manner
• Ontology: A conceptual model
• Ontology plus lexicon is a terminology
• Primary aim of creating a shared understanding of a domain and
  the relationships within that domain
• Common symbols for the things within a domain
• Capturing domain knowledge with fidelity and precision

            Sharing info  Sharing meaning
Metadata
•   Data describing the content and meaning of resources and
    services.
•   But everyone must speak the same language…

Terminologies
•   Shared and common vocabularies
•   For search engines, agents, curators, authors and users
•   But everyone must mean the same thing…

Ontologies
•   Shared and common understanding of a domain
•   Essential for search, exchange and discovery

[Figure: a network of service providers]
               What is an Ontology?
•   Concepts: units of thought (classes and individuals), e.g.
    Protein, Gene, DNA, Hexokinase, glycolysis,…
•   Terms: labels for concepts, e.g. “Protein”, “Gene”,…
•   Relationships: semantic links between concepts, e.g.
    is-a-kind, is-a, part-of, name-of,…
•   A taxonomy forms the backbone of an ontology
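
The three ingredients listed above (concepts, terms, relationships) can be sketched as a small data structure. This is a minimal illustration, not any particular ontology tool's API: the `Ontology` class, its method names and the lower-case concept identifiers are all hypothetical. The only relationship asserted is that Hexokinase is a kind of Protein.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical minimal container: concepts, their terms, and relations."""

    labels: dict = field(default_factory=dict)    # concept id -> term (label)
    relations: set = field(default_factory=set)   # (subject, relation, object)

    def add_concept(self, concept_id, term):
        self.labels[concept_id] = term

    def relate(self, subject, relation, obj):
        # Only allow links between concepts that have been declared.
        if subject not in self.labels or obj not in self.labels:
            raise KeyError("both concepts must be added before relating them")
        self.relations.add((subject, relation, obj))

    def parents(self, concept_id, relation="is-a-kind"):
        """Concepts directly linked from concept_id by the given relation."""
        return sorted(
            o for s, r, o in self.relations if s == concept_id and r == relation
        )


onto = Ontology()
onto.add_concept("protein", "Protein")
onto.add_concept("gene", "Gene")
onto.add_concept("hexokinase", "Hexokinase")
onto.relate("hexokinase", "is-a-kind", "protein")

print(onto.parents("hexokinase"))
```

Keeping the identifier separate from the term reflects the distinction on the slide: the term is only a label for the concept, so it can be changed or given synonyms without touching the relationships.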




           So what Counts as an ontology?
                        [Deborah McGuinness, Stanford]


[Figure: a spectrum of what may count as an ontology, left to right]
  1. Catalog/ID
  2. Terms/glossary
  3. Thesauri
  4. Informal is-a
  5. Formal is-a
  6. Formal instance
  7. Frames (properties)
  8. Value restrictions
  9. Disjointness, inverse, part-of
 10. General logical constraints

Examples shown along the spectrum, from left to right: Gene Ontology,
Mouse Anatomy, Arom, EcoCyc, PharmGKB, TAMBIS
The art of ranking things in genera and species is of no small importance
   and very much assists our judgment as well as our memory. You know
   how much it matters in botany, not to mention animals and other
   substances, or again moral and notional entities as some call them.
   Order largely depends on it, and many good authors write in such a
   way that their whole account could be divided and subdivided
   according to a procedure related to genera and species. This helps one
   not merely to retain things, but also to find them. And those who have
   laid out all sorts of notions under certain headings or categories have
   done something very useful.

Gottfried Wilhelm Leibniz, New Essays on Human Understanding




The Gene Ontology




  Bio-Ontologies in the Past Decade

• Explicit use of ontologies fairly recent
• EcoCyc and RiboWeb using Frame Based Systems to create
  knowledge bases
• An area in which the CS community can test their technology
• Large, complex and dynamic
• “A knowledge based discipline”
• The post-genomic era encourages the need for shared
  understanding
• Cross-genome comparisons need structured, controlled
  vocabularies
• Moved from a small niche to a much bigger niche
• Biologists are building ontologies

             Uses of Bio-Ontologies
•   Controlled vocabularies for annotation
•   Describing schema and the content of schema
•   Domain maps
•   Query mechanisms
•   Resolution of semantic heterogeneity
•   Text analysis…




               The Gene Ontology
• Tutorial and the first Bio-Ontologies meeting at ISMB 1998 in
  Montreal
• Fly, mouse and yeast get together to develop GO
• First release: some 3,500 terms covering Molecular Function,
  Biological Process and Cellular Component
• Now some 15,000 terms and growing
• Gene Ontology Consortium covers some 15 organism
  databases plus SWISS-PROT and others
• Synonyms, abbreviations and associations to gene products:
  Access to names, genes etc.
• A common understanding across a community


 GO DAG for heparin biosynthesis

GO:0003673 : Gene_Ontology (46199)
  GO:0008150 : biological_process (30188)
    GO:0008151 : cell growth and/or maintenance (20547)
      GO:0008152 : metabolism (14693)
        GO:0016051 : carbohydrate metabolism (267)
          GO:0006023 : aminoglycan metabolism (18)
            GO:0030203 : glycosaminoglycan metabolism
              GO:0030202 : heparin metabolism (3)
                GO:0030210 : heparin biosynthesis (3)
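
The listing above can be treated as a directed acyclic graph and walked from a term up to the root. The sketch below assumes that each term in the listing is a child of the term on the preceding line; the `PARENTS` mapping and the `ancestors` function are illustrative names, not part of any GO software. Because GO is a DAG, a term may have more than one parent, so each entry holds a list.

```python
# Child -> list of parents, assuming each listed term is a child of the
# term on the line before it. Identifiers are those shown on the slide.
PARENTS = {
    "GO:0030210": ["GO:0030202"],  # heparin biosynthesis
    "GO:0030202": ["GO:0030203"],  # heparin metabolism
    "GO:0030203": ["GO:0006023"],  # glycosaminoglycan metabolism
    "GO:0006023": ["GO:0016051"],  # aminoglycan metabolism
    "GO:0016051": ["GO:0008152"],  # carbohydrate metabolism
    "GO:0008152": ["GO:0008151"],  # metabolism
    "GO:0008151": ["GO:0008150"],  # cell growth and/or maintenance
    "GO:0008150": ["GO:0003673"],  # biological_process
    "GO:0003673": [],              # Gene_Ontology (root)
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # a DAG can reach one term by several paths
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("GO:0030210")))
```

A gene product annotated to heparin biosynthesis is implicitly annotated to every term returned by `ancestors`, which is one reading of why the counts in brackets grow towards the root.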




        Open bio-Ontologies (OBO)
• GO, though large, is narrow
• Sequence Ontology
• Chemical Ontology
• Promotes a common ontology format, tools and house-style
• Micro-array community a further boost – avoiding mistakes of
  previous bioinformatics resources
• Need ontologies for phenotype, tissues, anatomies, etc.




                      Two Communities
Computer Scientists
•   Building ontologies
•   KR
•   Reasoning

Biologists
•   Ontology content
•   Domain Knowledge

Both contribute to: Better Ontologies
         What are We Saying?
[Figure: Man is-a Person; Woman is-a Person]

• Are all instances of Man instances of Person?
• Can an instance of Person be both an instance of Man and an
  instance of Woman?
• Can there be any more kinds of Person?
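
The three questions correspond to three separate checks: subsumption, disjointness and covering. The two is-a links in the drawing, taken alone, settle none of them. The sketch below tests the checks against sets of instances. The individuals are invented for illustration, and checking one data set is a weaker thing than stating the constraint in the ontology, where it would hold for all possible instances.

```python
# Hypothetical instance data; the names are invented for illustration.
person = {"alice", "bob", "sam"}
man = {"bob"}
woman = {"alice"}


def all_instances_of(sub, sup):
    """Question 1: is every instance of `sub` also an instance of `sup`?"""
    return sub <= sup


def disjoint(a, b):
    """Question 2: do the two classes share no instance?"""
    return a.isdisjoint(b)


def covered_by(parent, *children):
    """Question 3: do the children together account for every instance?"""
    return parent == set().union(*children)


print(all_instances_of(man, person))   # subsumption holds in this data
print(disjoint(man, woman))            # no shared instance in this data
print(covered_by(person, man, woman))  # "sam" is in neither subclass
```

Whether the answers should be "always", "never" or "sometimes" is a modelling decision, and it has to be stated explicitly if a machine is to reason with it.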
                This Year’s Meeting
•   A theme of text analysis and ontology
•   First time talks have matched theme
•   Ontologies and indexing
•   Integrating ontologies into NLP systems
•   Ontologies in information retrieval
•   Developing terminologies
•   GO in NLP
•   New Ontologies
•   Semantic Similarity



                    Opportunities
• Ontologies to help text analysis
• Text analysis to help build ontologies
• Biology community steadily building a large number of large
  domain ontologies
• CS community can help build computationally amenable
  ontologies
• Vast quantities of domain knowledge in natural language forms
  in literature and databanks
• Opportunities for language and ontology communities





				