Plant Ontologies

                         Plant Ontologies –
                      Industrial Science meets
                       Renaissance Concepts

                                 Dave Selinger
                            Computational Biologist
                                Pioneer Hi-Bred,
                         DuPont Agriculture and Nutrition
   What is the nature of the problem that a Plant
    Anatomy Ontology can solve?
   What is an Ontology?
   How do you make a Plant Anatomy Ontology?
   Does it really solve the problem?

                  Industrial Science
   Not science in industry, but the industrialization of data
    creation, i.e. the 'omics revolutions.
   High-throughput data
       Sequencing
       Expression
   Medium-throughput data
       Proteomics
       Metabolomics
   Low-throughput data
       Gene/protein function
       Phenotype
      The double-edged sword of
          Industrial Science
   Industrial science means lots of cheap data
     Sequencing: << $0.01/base
          $10,000 prokaryotic genomes are reality
          $10,000 eukaryotic genomes will be reality in the next five years
     Expression: < $0.50/gene
     And much of this data is available for free after it is
   Lots of data means that you can't sit down with
    your lab notebook and analyze the data by hand.
     Databases, software for searching and comparing
     Whole new areas of research devoted to finding
      meaningful patterns in lots of data.
               Organizing information
   Information is not knowledge.
       But knowledge can be acquired from information.
       But only with a lot of effort, see second law of thermodynamics
   Central challenge with Industrial science is organizing the information
       The organization of the information determines what you can
       Experimental design
            Good design will produce a contrast that will support or refute a hypothesis
            Statistical rigor –
              – Is the signal higher than the noise?
              – How conclusive will the discoveries be?
   How do we compare across experiments?
     Not too hard if one person did all the experiments and
      kept careful notes.
     If multiple people, then we need to define what was
      done, what the analysis was, and what the sample was.
          What was done – e.g. MIAME standard for describing the
           technical details of an expression experiment.
          Analysis – e.g. ANOVA, SAM, etc.
          Sample – ?

           Renaissance concepts
        (historically Enlightenment)
   Things can be systematically described
    and classified
       Organisms - Linnaeus, Species Plantarum,
   Linnaeus' problem is much the same as
    the sample description problem
       Variable specificity
            California Laurel or Oregon Myrtlewood?
            Kernel or seed?
   In addition, a term like kernel assumes all
    parts, but this assumption could be wrong

          Ontologies to the rescue?
   Ontology = the study of being (Philosophy)
       The specification of a conceptualization of a domain of interest
        (Computer Science)
       Original and continuing computer science interest was Artificial
        Intelligence
            How can a computer make inferences?
            Need to define meanings – "can", for example.
            Structure and relationships in an ontology allow a computer to make
             inferences
              – Mary is the mother of Bill. Is Mary a parent of Bill?
              – IsA Mother Parent
   Parts of an ontology
       Concepts -> objects, real and abstract, processes, functions
       Partitions -> rules that can classify concepts
       Attributes -> properties of a concept, can have individual and class
       Relationships -> is a, part of
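
The Mary and Bill inference can be sketched in a few lines. This is a minimal illustration, not any particular reasoner: the `IS_A` table, the `facts` dictionary, and the function names are all assumptions made for the example.

```python
# Toy ontology: each concept maps to the set of its direct "is a" parents.
# Concept names are illustrative, not taken from a published ontology.
IS_A = {
    "Mother": {"Parent"},
    "Father": {"Parent"},
    "Parent": {"Relative"},
    "Relative": set(),
}


def is_a(concept, target):
    """Return True if `concept` is `target` or reaches it via "is a" links."""
    if concept == target:
        return True
    return any(is_a(parent, target) for parent in IS_A.get(concept, ()))


# Stated fact: Mary is the mother of Bill.
facts = {("Mary", "Bill"): "Mother"}


def holds(subject, role, obj):
    """Check a role, letting the ontology generalize the stated role."""
    stated = facts.get((subject, obj))
    return stated is not None and is_a(stated, role)


print(holds("Mary", "Parent", "Bill"))  # True: IsA Mother Parent
print(holds("Mary", "Father", "Bill"))  # False
```

Only "Mary is the mother of Bill" is stored; the answer to "Is Mary a parent of Bill?" comes from the "is a" relationship between the concepts.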
    Does an ontology make sense?
   The value of ontologies is a current debate among
    information scientists.
       One group advocates that ontologies are necessary for computers
        to understand content.
            Semantic web -> an extension of the current HTML/XML based web to
             something with ontological inference
       Others argue that ontologies are not needed and are not practical
            Complexity is OK; just use a Google-like search to connect concepts.
       However, some problems, like organismal classification and the
        periodic table, are very amenable to an ontological approach.
            Formal categories and stable entities
            Expert users and catalogers

                Forms of ontologies
   Ontologies can take several forms (data
     Controlled vocabulary (List)
          Terms but no relationships
          Enforces systematic naming
     Hierarchy (tree structure) => Taxonomy
          Terms and “is a” relationship
          Children are unique and have a single parent
     Directed acyclic graph => Gene Ontology
          Multiple relationship types
          Children with multiple parents
                      Features of Trees
   Because each child node has only one parent
       There is an unambiguous path to the root from each leaf
       Child nodes can be easily grouped at any level of the structure
   Trees can express only one organizing principle
   Work well for taxonomy (at least eukaryotic taxonomy)
       Organizing principle is classification by similarity
       All terms have an “is a” relationship to the next level term
       Organisms were classified before evolution was hypothesized, but
        the classification matches the evolutionary relationships
            Similar example would be the periodic table of the elements
            Classification can facilitate discovery of underlying principles
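
The two tree properties above (one unambiguous path to the root, easy grouping at any level) can be sketched as follows. The taxonomy is deliberately simplified, with intermediate ranks skipped, and the names `PARENT`, `path_to_root`, and `group_at_level` are assumptions for illustration.

```python
# Each term has exactly one parent (None marks the root), so this is a tree.
PARENT = {
    "Plantae": None,
    "Poaceae": "Plantae",
    "Fabaceae": "Plantae",
    "Zea mays": "Poaceae",
    "Oryza sativa": "Poaceae",
    "Glycine max": "Fabaceae",
}


def path_to_root(term):
    """Follow the single parent link upward until the root is reached."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def group_at_level(terms, depth):
    """Group terms by their ancestor at `depth` below the root (0 = root)."""
    groups = {}
    for term in terms:
        root_first = path_to_root(term)[::-1]
        key = root_first[min(depth, len(root_first) - 1)]
        groups.setdefault(key, []).append(term)
    return groups


print(path_to_root("Zea mays"))
# ['Zea mays', 'Poaceae', 'Plantae']
print(group_at_level(["Zea mays", "Oryza sativa", "Glycine max"], 1))
# {'Poaceae': ['Zea mays', 'Oryza sativa'], 'Fabaceae': ['Glycine max']}
```

Because every child has one parent, `path_to_root` never has to choose between branches; that is exactly what is lost once children may have multiple parents.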

    A tree based Anatomy Ontology
   Developed by Winston Hide's group at SANBI and
    Electric Genetics
   Single concept, orthogonal trees
     Cells
     Tissues
     Organs
     Disease state
   Each tree is independent, but has related
    dimensions describing a sample
   Set operations, intersection or union, between
    trees allow specific queries.
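
A rough sketch of how set operations across independent trees can answer a query. The two small trees, the sample labels, and the helper names are hypothetical; they are not taken from the SANBI ontology itself.

```python
# Two independent trees (child -> parent); all term names are hypothetical.
ORGAN_PARENT = {"organ": None, "leaf": "organ", "root": "organ"}
STATE_PARENT = {
    "state": None,
    "healthy": "state",
    "diseased": "state",
    "rust": "diseased",
}

# Each sample carries one label per tree (one label per dimension).
SAMPLES = {
    "s1": {"organ": "leaf", "state": "healthy"},
    "s2": {"organ": "leaf", "state": "rust"},
    "s3": {"organ": "root", "state": "rust"},
}


def under(term, ancestor, parent):
    """True if `term` equals `ancestor` or lies below it in the tree."""
    while term is not None:
        if term == ancestor:
            return True
        term = parent[term]
    return False


def select(dimension, ancestor, parent):
    """Samples whose label in `dimension` falls under `ancestor`."""
    return {
        name
        for name, labels in SAMPLES.items()
        if under(labels[dimension], ancestor, parent)
    }


leaf = select("organ", "leaf", ORGAN_PARENT)
diseased = select("state", "diseased", STATE_PARENT)
print(sorted(leaf & diseased))  # ['s2']
print(sorted(leaf | diseased))  # ['s1', 's2', 's3']
```

Each tree is queried on its own, and the intersection or union of the resulting sample sets gives the combined answer.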
                   Features of DAGs
   A tree is a special case of the DAG class
   Children can have multiple parents.
     Allows multiple classifications of the same child
          E.g. a guard cell is both part of a leaf and is an epidermal cell.
          Allows for more than a binary classification of a concept
     If this results from poor definition of the concept, then it
      is not good.
   Multiple parentage fits a “normalized” data model
     Like a normalized relational database, a DAG can
      minimize duplication of objects (concepts).
                             Sample DAG
   Root
     Cooking
         Spices
           – Bay leaf
              •   Laurus nobilis
              •   Umbellularia californica (California laurel)

     Trees
         Lauraceae
           – Laurus
              •   Laurus nobilis
           – Umbellularia
              •   Umbellularia californica
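
The sample DAG can be written down as a child-to-parents table, which makes the "normalized" point concrete: each species is stored once but is reachable through both the Cooking and the Trees branches. This is a sketch; the `PARENTS` table and the `ancestors` helper are illustrative names, and the laurel genus is written with its botanical name, Laurus.

```python
# child -> set of direct parents, transcribed from the sample DAG.
PARENTS = {
    "Root": set(),
    "Cooking": {"Root"},
    "Spices": {"Cooking"},
    "Bay leaf": {"Spices"},
    "Trees": {"Root"},
    "Lauraceae": {"Trees"},
    "Laurus": {"Lauraceae"},
    "Umbellularia": {"Lauraceae"},
    # The two species each have two parents, one in each branch.
    "Laurus nobilis": {"Bay leaf", "Laurus"},
    "Umbellularia californica": {"Bay leaf", "Umbellularia"},
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    found = set()
    stack = list(PARENTS[term])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(PARENTS[node])
    return found


print(sorted(ancestors("Umbellularia californica")))
# ['Bay leaf', 'Cooking', 'Lauraceae', 'Root', 'Spices', 'Trees', 'Umbellularia']
```

A query for "Spices" and a query for "Lauraceae" both find the same species node, with no duplicated entry to keep in sync.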
    Constructing the Pioneer Plant Ontology
   Decided to produce a DAG
     Used DAGeditor (editor developed for GO)
     Developed our own web based viewing tool
         AmiGO was too complicated to re-use. Other public browsers
          did not have the functionality we wanted.
   Decided to focus on Corn and Soybeans
     Used Kiesselbach's 1949 Monograph on Corn structure
      and reproduction as the primary source.
     Used Iowa State University Ag Extension publications
      for the development stages of corn and soybeans
     Added information from a botany textbook to cover
      missing terms from soybean.
           To collaborate or not to collaborate?
   Advantage of just using the Pioneer Ontology was
    that it served our needs and was focused on corn
    and soybeans, our major crops.
   Disadvantage was that it was not synchronized to
    the public ontology
     We would not be able to easily integrate public tissue
      classifications with ours
     We would not be able to easily take advantage of
      improvements to the public ontology
     Presumably the public ontology would be more
      “botanically correct” than ours.
      Plant Ontology Consortium
   Focused on model organisms
     Arabidopsis
     Rice and other grasses with the rice terms (corn).
   Used a DAG approach
     Multiple concepts
         Structure (cells, tissues, sporophyte and gametophyte)
         Development
     Used DAGeditor and other GO approaches
         Most terms have multiple parents
         Same software and data structures as GO
                         Plant Ontology
   Domain = Plant anatomy and development
   Concepts
          Plant parts (leaf, root, flower, meristem, etc.)
          Life cycle stages (sporophyte, gametophyte)
          Developmental stages (V1, flowering, R1, etc.)
   Relationships between concepts
          “A kind of” (Is a)
            – A prop root is a root
          “A part of” (part of)
            – A root cap is part of a root
          In addition, for plant anatomy a “develops from” relation is needed
            – For example the relationship between stomatal guard cells and the guard
              mother cell
            – Guard cells develop from guard mother cells
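
A small sketch of how the three relationship types might be stored and queried together. The edge list uses only examples mentioned in this talk; the triple layout and the function names are assumptions, not the Plant Ontology's actual file format.

```python
# (child, relation, parent) triples built from the examples in the talk.
EDGES = [
    ("prop root", "is_a", "root"),
    ("root cap", "part_of", "root"),
    ("guard cell", "is_a", "epidermal cell"),
    ("guard cell", "part_of", "leaf"),
    ("guard cell", "develops_from", "guard mother cell"),
]


def parents(term, relations):
    """Direct parents of `term`, restricted to the given relation types."""
    return {p for child, rel, p in EDGES if child == term and rel in relations}


def closure(term, relations):
    """All terms reachable from `term` by following the chosen relations."""
    seen = set()
    frontier = [term]
    while frontier:
        for parent in parents(frontier.pop(), relations):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


print(sorted(closure("guard cell", {"is_a", "part_of"})))
# ['epidermal cell', 'leaf']
print(sorted(closure("guard cell", {"develops_from"})))
# ['guard mother cell']
```

Keeping the relation type on each edge lets a query choose which links to follow: structural grouping uses "is a" and "part of", while lineage questions use "develops from".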
    Adapting the POC ontology for
           Pioneer’s needs
   Problem is that it has many more terms than
    required for our experiments
     Some terms describe tissues or cells that are not
      practical to collect (e.g. antipodal cells)
     Some terms describe parts not found in corn (e.g.
   Another problem is that we collect samples that
    are convenient subdivisions of structures
     Tip and base of an immature ear. Each differs from a
      whole immature ear in terms of what it contains.
     Basal endosperm – morphologically distinct from starchy
      endosperm, but not found in the ontology
                   Our current solution
   Add additional terms to the POC ontology
       Use a different id system
            easily distinguished from POC terms
            will not be overwritten by on-going public curation efforts.
   Label experiments with the terms from the ontology.
   Create a Custom ontology
       Query the whole ontology with the terms used in the labeling and
        keep only
            terms that are used to label an experimental sample
            Parent terms of used terms.
       Can be readily rebuilt if new experiments or terms are added.
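
The custom-ontology step can be sketched as a pruning pass: start from the terms used to label samples and keep them plus all of their parent terms. Everything named here is hypothetical, including the `public:` and `local:` id prefixes, the term table, and the experiment labels; the real ids and tooling are not given in the talk.

```python
# Full ontology as child -> set of parents. The "public:" prefix stands in
# for POC terms and "local:" for locally added terms (both hypothetical).
FULL = {
    "public:plant structure": set(),
    "public:ear": {"public:plant structure"},
    "public:seed": {"public:plant structure"},
    "public:endosperm": {"public:seed"},
    "public:embryo sac": {"public:plant structure"},
    "public:antipodal cell": {"public:embryo sac"},
    "local:immature ear tip": {"public:ear"},
    "local:basal endosperm": {"public:endosperm"},
}

# Labels attached to experimental samples.
SAMPLE_LABELS = {
    "exp1": "local:immature ear tip",
    "exp2": "local:basal endosperm",
    "exp3": "public:endosperm",
}


def build_custom(full, used_terms):
    """Keep only the used terms and every ancestor of a used term."""
    keep = set()
    stack = list(used_terms)
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(full[term])
    return {term: full[term] for term in full if term in keep}


custom = build_custom(FULL, set(SAMPLE_LABELS.values()))
print(sorted(custom))
```

Terms that no sample uses, such as the antipodal cell branch, drop out. Because the result is derived from the full ontology and the labels, rerunning `build_custom` after adding an experiment or a term rebuilds it.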

          What can you do with the ontology?
   Provides a grouping mechanism
       Summarize expression for a tissue
       Compare expression between tissues
       Make complex queries that involve multiple tissues
   Provides a systematic label for annotating genes
       Where is the gene expressed?
       Query annotation of genes based on terms
   Provides a description of the complexity of tissue samples
       Leaf sample is composed of multiple cell types with different roles
       Cell types can be shared between tissues or structures

             Comparing by tissue
   The ontology provides the groupings, but how to
     Mean?
     Median?
     Maximum value?
   Significance of differences?
     Each group will be much more variable than a set of
      samples from a controlled experiment.
     But you may be able to eliminate the inevitable false
      discoveries that appear when looking at large numbers
      of genes.
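
A minimal sketch of grouping by ontology term and then computing the candidate summaries side by side. The term table and the expression values are invented for illustration, and the sketch takes no position on which summary is the right one.

```python
from statistics import mean, median

# child -> set of parents; a tiny hypothetical slice of an anatomy ontology.
PARENTS = {
    "leaf": set(),
    "leaf blade": {"leaf"},
    "leaf sheath": {"leaf"},
    "root": set(),
    "root tip": {"root"},
}

# sample -> (ontology term, expression value for one gene); values invented.
EXPRESSION = {
    "s1": ("leaf blade", 120.0),
    "s2": ("leaf sheath", 80.0),
    "s3": ("leaf", 100.0),
    "s4": ("root tip", 10.0),
    "s5": ("root", 30.0),
}


def is_under(term, group):
    """True if `term` is `group` or a descendant of it."""
    if term == group:
        return True
    return any(is_under(parent, group) for parent in PARENTS[term])


def summarize(group):
    """Mean, median, and maximum over all samples labelled under `group`."""
    values = [v for term, v in EXPRESSION.values() if is_under(term, group)]
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
    }


print(summarize("leaf"))  # {'n': 3, 'mean': 100.0, 'median': 100.0, 'max': 120.0}
print(summarize("root"))  # {'n': 2, 'mean': 20.0, 'median': 20.0, 'max': 30.0}
```

The ontology decides which samples belong to a group; the choice of summary statistic, and any significance test on top of it, remains a separate decision.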
                   Annotating genes
   This is the primary use for TAIR and Gramene
     Potentially label most genes with tissues of expression
     However, need to differentiate presence from
      preferential expression.
          A gene may be present in many tissues, but highly expressed in
           a few
          Another gene may be present in the same tissues, but similarly
           expressed in all of them.
            – Might need to precompute and indicate which tissues the gene is
              significantly preferentially expressed in.
            – Might be able to use the RMS differences between expression in
              each tissue as a measure of consistency.
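
One possible reading of the RMS idea, sketched under stated assumptions: take a gene's summarized expression in each tissue and compute the root-mean-square of all pairwise differences between tissues. The function name and the expression values are hypothetical, and other definitions (for example, RMS deviation from the across-tissue mean) would serve the same purpose.

```python
from itertools import combinations
from math import sqrt


def rms_difference(by_tissue):
    """Root-mean-square of pairwise differences between tissue values."""
    pairs = list(combinations(by_tissue.values(), 2))
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


# Invented values: gene_a is present everywhere but high in one tissue;
# gene_b is present in the same tissues at a similar level in all of them.
gene_a = {"leaf": 10.0, "stem": 10.0, "root": 100.0}
gene_b = {"leaf": 40.0, "stem": 40.0, "root": 40.0}

print(rms_difference(gene_a) > rms_difference(gene_b))  # True
print(rms_difference(gene_b))  # 0.0
```

A value near zero suggests consistent expression across tissues, while a large value flags a gene worth testing for preferential expression. The measure needs at least two tissues and depends on the scale of the data (raw versus log-transformed), so any threshold would have to be chosen with that in mind.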

   Genes may appear to differ between tissues for
    trivial reasons
     Example: Gene appears to be preferentially expressed
      in stem versus leaf tissue.
          If gene is really specific to vascular tissue and stem has more…
          Gene is expressed late in development, adjacent leaves and
           stems may differ in development.
     Ontology can guide further experiments
          Compare vascular and non-vascular tissue from both leaf and stem
          Compare multiple leaf and stem samples from different positions
           (developmental stages).
   The Plant Ontology classifies experiments and
    genes based on anatomical and developmental
   Now that we have significant data, can we, like
    Darwin, discern the underlying mechanisms for
    how anatomical and developmental differences
   The Plant Ontology will be successful and used
    long term if it facilitates these kinds of
   Pioneer
     Henry Mirsky
     Lane Arthur
     Bob Merrill
   POC
     Doreen Ware (Gramene)
     Katica Ilic (TAIR)

