TAGGO: a Tool for Automatic Grouping of Gene Ontology Annotations



                                   George Papachristoudis
                                   Sophia Kossida

Meeting on Bioinformatics and Medical Informatics
          Athens, 4 – 5 October 2006


               Organization of the presentation



                                             • Introduction
                                             • Why was the development
                                                of TAGGO a necessity?
                                             • Description of the tool
                                             • Results
                                             • Comparison
                                             • Conclusions
                                             • Acknowledgements


                              Introduction




                                         What is TAGGO?






                               Introduction




                                                     Is it Tango?



                                                       No!




         But they are both appealing… each in its own domain



                              Introduction



                                               TAGGO is a tool
                                               that attempts to derive
                                               a protein’s main functions
                                               automatically




                                      Introduction

What are the main questions to ask in order to fully characterize
a protein’s main activities?

                             In what processes is it involved?

[Figure: a protein linked to example annotations: nucleus, metabolism, protein binding]


               Organization of the presentation



                                              Introduction
                                             • Why was the development
                                               of TAGGO a necessity?
                                             • Description of the tool
                                             • Results
                                             • Comparison
                                             • Conclusions
                                             • Acknowledgements


                 Why was the development of
                   TAGGO a necessity?




                             Three main reasons justify this…


                 Why was the development of
                   TAGGO a necessity?




                                      1) Biologists currently spend a lot of time
                                      and effort searching manually for the
                                      characteristics of each protein.





                 Why was the development of
                   TAGGO a necessity?




                                       2) To save time, the information provided
                                       by large amounts of data must be refined.
                               Each dataset contains 1000–3000 proteins,
                               and each protein is annotated with 5–12 GO terms.


                 Why was the development of
                   TAGGO a necessity?
                                                                    The Gene Ontology (GO)

                                       3) The process is hampered further by the wide
                                       variations in terminology (nomenclature) that
                                       different groups of biologists adopt.

                   The GO project
                   • provides controlled, well-structured vocabularies
                   • offers a widely accepted nomenclature
                   The Gene Ontology Dictionary: http://www.geneontology.org/doc/GODict.DAT


               Organization of the presentation



                                              Introduction
                                              Why was the development
                                               of TAGGO a necessity?
                                             • Description of the tool
                                             • Results
                                             • Comparison
                                             • Conclusions
                                             • Acknowledgements


              Description of the tool – 1st Part
   What is the Gene Ontology project?
   The Gene Ontology (GO) project is a collaborative effort to address the
   need for consistent descriptions of gene products in different databases.
   The project began in 1998 as a collaboration between three model
   organism databases: FlyBase (Drosophila), the Saccharomyces Genome
   Database (SGD) and the Mouse Genome Database (MGD).
   Since then it has grown to include several plant, animal and microbial
   databases.


  What is the Gene Ontology Consortium?
  The GO Consortium is the set of model organism and protein databases
  and biological research communities actively involved in the development
  and application of the Gene Ontology project.
                                        Source:
                                        An Introduction to the Gene Ontology
                                        The GO Consortium


              Description of the tool – 1st Part

                             GOA Databases






              Description of the tool – 1st Part
 What is the Gene Ontology Annotation (GOA) project?
 The GOA project provides high-quality Gene Ontology (GO)
 annotations to proteins in the UniProt Knowledgebase
 (SWISS-PROT / TrEMBL / PIR-PSD) and InterPro databases.

                                          Genomes of seven (7) species supplied:
                                          Human, Mouse, Rat, Arabidopsis,
                                          Zebrafish, Chicken, Cow


                                         Source:
                                         Gene Ontology Annotation (GOA) Database


[Figure: TAGGO workflow. Input protein identifiers (e.g. O00154, ATP9A_HUMAN, IPI00304150, …) → SELECT species (Human, Mouse, …, Zebrafish) and SELECT undesired ECs → exclude entries with undesired ECs from the GOA DB → GO annotations file → Gene Ontology → Results (CC, MF, BP)]


              Description of the tool – 1st Part

   • GOA file format
    The associations between gene products and GO terms are stored in
    tab-delimited files. Each record comprises 15 fields (not all mandatory).

    Most important fields:
    1. gene / gene product ID (e.g. Q5T0T3)
    2. gene / gene product symbol (e.g. TRIO_HUMAN)
    3. gene / gene product synonym (IPI) (e.g. IPI00396431)
    4. GO term ID (e.g. GO:0007582)
    5. GO term aspect (P, F or C)
    6. Evidence Code (EC) (e.g. TAS, IEA)
    Source: Annotation File Fields

    The EC is an index of the record’s reliability.
    13 different EC types: 12 manual, 1 electronic (IEA)
    Source: Guide to GO Evidence Codes
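To illustrate the record layout, here is a minimal Python sketch that splits one tab-delimited GOA line and keeps the six fields listed above. It assumes the 15 columns follow the GAF 1.0 order (DB, DB object ID, symbol, qualifier, GO ID, DB:reference, evidence code, with/from, aspect, name, synonym, type, taxon, date, assigned by); the `GoaRecord` type and `parse_goa_line` function are hypothetical names, not part of TAGGO.

```python
from typing import NamedTuple


class GoaRecord(NamedTuple):
    """The six GOA fields highlighted on the slide."""
    object_id: str  # gene / gene product ID, e.g. O00115
    symbol: str     # gene / gene product symbol, e.g. DNS2A_HUMAN
    synonym: str    # synonym (IPI), e.g. IPI00010348
    go_id: str      # GO term ID, e.g. GO:0005764
    aspect: str     # P, F or C
    evidence: str   # evidence code, e.g. TAS or IEA


# Assumed 0-based column positions in a 15-column GAF 1.0 line.
COL_ID, COL_SYMBOL, COL_GO, COL_EC, COL_ASPECT, COL_SYNONYM = 1, 2, 4, 6, 8, 10


def parse_goa_line(line: str) -> GoaRecord:
    """Split one tab-delimited GOA line; empty optional columns stay as ''."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return GoaRecord(
        object_id=fields[COL_ID],
        symbol=fields[COL_SYMBOL],
        synonym=fields[COL_SYNONYM],
        go_id=fields[COL_GO],
        aspect=fields[COL_ASPECT],
        evidence=fields[COL_EC],
    )


# Example built from the first row of the annotation table
# (qualifier and with/from columns left empty).
example = "\t".join([
    "UniProt", "O00115", "DNS2A_HUMAN", "", "GO:0005764", "PMID:9714827",
    "TAS", "", "C", "DNASE2, DNASE2A", "IPI00010348", "protein",
    "taxon:9606", "20030904", "PINC",
])
record = parse_goa_line(example)
print(record.go_id, record.aspect, record.evidence)  # GO:0005764 C TAS
```

Keeping the column positions as named constants makes it easy to adapt the sketch if a file uses a different column layout.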


                             Description of the tool – 1st Part
DB      | ID     | Symbol       | GO ID      | DB:Reference  | EC  | Aspect | Name                        | Synonym     | Type    | Taxon      | Date     | Assigned by
--------------------------------------------------------------------------------------------------------------------------------------------------------------
UniProt | O00115 | DNS2A_HUMAN  | GO:0005764 | PMID:9714827  | TAS | C      | DNASE2, DNASE2A…            | IPI00010348 | protein | taxon:9606 | 20030904 | PINC
UniProt | O00154 | BACH_HUMAN   | GO:0000062 | PMID:10578051 | TAS | F      | ACOT7, BACH: Cytos…         | IPI00010415 | protein | taxon:9606 | 20030904 | PINC
UniProt | O00154 | BACH_HUMAN   | GO:0004759 | GOA:spkw      | IEA | F      | ACOT7, BACH: Cytos…         | IPI00010415 | protein | taxon:9606 | 20060721 | UniProt
UniProt | O00462 | MANBA_HUMAN  | GO:0004567 | PMID:9384606  | TAS | F      | MANBA, MANB1: Beta-manno…   | IPI00298793 | protein | taxon:9606 | 20030904 | PINC
UniProt | Q5T0T3 | Q5T0T3_HUMAN | GO:0005622 | GOA:Interpro  | IEA | C      | RP11-508N12.1, RP11-508N12… | IPI00748153 | protein | taxon:9606 | 20060721 | UniProt


Input proteins
------------------
BACH_HUMAN
O00115
…
IPI00748153

[Figure: each input identifier (gene product ID, symbol or IPI synonym) is looked up in the annotation records above to retrieve its GO terms]

BACH_HUMAN  → GO:0000062
BACH_HUMAN  → GO:0004759
O00115      → GO:0005764
IPI00748153 → GO:0005622
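The lookup above can be sketched in Python as follows: every GOA record is indexed under its ID, symbol and IPI synonym, records whose evidence code the user marked as undesired are skipped, and each input identifier is resolved to its GO terms. The tuple layout, `build_index`, `map_inputs` and the `excluded_ecs` parameter are illustrative assumptions rather than TAGGO's actual code, and species selection is assumed to have happened already by choosing the per-species GOA file.

```python
from collections import defaultdict

# (ID, symbol, IPI synonym, GO ID, evidence code) for the five annotation rows above.
RECORDS = [
    ("O00115", "DNS2A_HUMAN", "IPI00010348", "GO:0005764", "TAS"),
    ("O00154", "BACH_HUMAN", "IPI00010415", "GO:0000062", "TAS"),
    ("O00154", "BACH_HUMAN", "IPI00010415", "GO:0004759", "IEA"),
    ("O00462", "MANBA_HUMAN", "IPI00298793", "GO:0004567", "TAS"),
    ("Q5T0T3", "Q5T0T3_HUMAN", "IPI00748153", "GO:0005622", "IEA"),
]


def build_index(records, excluded_ecs=frozenset()):
    """Map every identifier form (ID, symbol, IPI) to its GO terms, in file order."""
    index = defaultdict(list)
    for obj_id, symbol, ipi, go_id, ec in records:
        if ec in excluded_ecs:
            continue  # drop entries with undesired evidence codes
        for key in (obj_id, symbol, ipi):
            if go_id not in index[key]:
                index[key].append(go_id)
    return index


def map_inputs(inputs, index):
    """Return (input, GO term) pairs; unknown identifiers yield no pairs."""
    return [(name, go) for name in inputs for go in index.get(name, [])]


inputs = ["BACH_HUMAN", "O00115", "IPI00748153"]
for name, go in map_inputs(inputs, build_index(RECORDS)):
    print(name, "->", go)
```

With no evidence codes excluded this reproduces the four mappings shown on the slide. Passing `excluded_ecs={"IEA"}` would, under these assumptions, drop the electronic annotations and leave only BACH_HUMAN → GO:0000062 and O00115 → GO:0005764.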


               Description of the tool – 2nd Part
What is the Gene Ontology (GO)?
The Gene Ontology is in fact an “umbrella” under which reside
three structured, controlled and orthogonal vocabularies
(ontologies) that describe gene products in terms of their
associated biological processes, cellular components and
molecular functions in a species-independent manner.

[Figure: Gene Ontology → Biological Process Ontology, Molecular Function Ontology, Cellular Component Ontology]



              Description of the tool – 2nd Part

                               Structure of each ontology

                               • Each ontology is totally independent of the others
                               • It is a Directed Acyclic Graph (DAG)
                               • It allows multiple parents: a term may have more than one parent
                               • It is made up of “IS_A” and “PART_OF” links
                                  IS_A: each term inherits the attributes of its parent(s)
                                  PART_OF: each term is a component of its parent(s)
                               • IS_A relations outnumber PART_OF relations
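To make this structure concrete, here is a minimal Python sketch of one ontology stored as a DAG: each term keeps a list of (parent, relation) pairs, so several parents are allowed, and an upward breadth-first walk collects every ancestor reachable through IS_A or PART_OF links. The cellular-component term names and edges are hypothetical, chosen for illustration and not copied from a GO release.

```python
from collections import deque

# Hypothetical CC fragment: term -> list of (parent, relation).
# Edges are illustrative only, not taken from an actual GO release.
PARENTS = {
    "cellular_component": [],
    "cell": [("cellular_component", "is_a")],
    "organelle": [("cellular_component", "is_a")],
    "cell part": [("cell", "part_of")],
    "intracellular": [("cell part", "is_a")],
    "intracellular part": [("intracellular", "part_of")],
    "intracellular organelle": [("organelle", "is_a"),
                                ("intracellular part", "is_a")],
}


def ancestors(term, parents=PARENTS):
    """All terms reachable by following IS_A and PART_OF links upwards (BFS)."""
    seen, queue = set(), deque([term])
    while queue:
        for parent, _relation in parents[queue.popleft()]:
            if parent not in seen:  # a DAG can reach the same ancestor by several paths
                seen.add(parent)
                queue.append(parent)
    return seen


def is_acyclic(parents=PARENTS):
    """Check the DAG property: no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)


print(sorted(ancestors("intracellular organelle")))
print(is_acyclic())
```

With these assumed edges, `ancestors("intracellular organelle")` reaches both `organelle` and `intracellular part`, showing how multiple parents widen the set of terms that could serve as a category.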








                                                                   [Figure: Example of the BP ontology]








                                                                   [Figure: Example of the CC ontology]








                                                                   [Figure: Example of the MF ontology]






              Description of the tool – 2nd Part
                  We now have gene product – GO term pairs

                                                      O00115 – endonuclease activity
                                                      O00115 – DNA metabolism
                                                      O00151 – zinc ion binding
                                                      O00168 – integral to membrane
                                                      O00170 – transcription factor binding
                                                      O00217 – iron ion binding
                                                      O00264 – integral to plasma membrane
                                                      O00487 – ubiquitin-dependent
                                                               protein catabolism




                                                     …and we want to group them into general categories:
                                  catalytic activity, membrane, ion binding, protein binding, metabolism
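The target of this grouping step can be illustrated with a small Python sketch: given gene product–GO term pairs and a term-to-category assignment, it collects the gene products filed under each general category. The pairs are the ones listed on the slide; the `CATEGORY_OF` assignments and the `group_by_category` function are hypothetical and hard-coded here purely for illustration, whereas TAGGO derives categories from the ontology itself.

```python
from collections import defaultdict

# Gene product - GO term pairs listed on the slide.
PAIRS = [
    ("O00115", "endonuclease activity"),
    ("O00115", "DNA metabolism"),
    ("O00151", "zinc ion binding"),
    ("O00168", "integral to membrane"),
    ("O00170", "transcription factor binding"),
    ("O00217", "iron ion binding"),
    ("O00264", "integral to plasma membrane"),
    ("O00487", "ubiquitin-dependent protein catabolism"),
]

# Hypothetical term -> category assignments, for illustration only.
CATEGORY_OF = {
    "endonuclease activity": "catalytic activity",
    "DNA metabolism": "metabolism",
    "zinc ion binding": "ion binding",
    "integral to membrane": "membrane",
    "transcription factor binding": "protein binding",
    "iron ion binding": "ion binding",
    "integral to plasma membrane": "membrane",
    "ubiquitin-dependent protein catabolism": "metabolism",
}


def group_by_category(pairs, category_of):
    """Collect the gene products annotated under each general category."""
    groups = defaultdict(set)
    for product, term in pairs:
        # Terms without an assigned category form a group of their own.
        groups[category_of.get(term, term)].add(product)
    return {cat: sorted(products) for cat, products in groups.items()}


for cat, products in sorted(group_by_category(PAIRS, CATEGORY_OF).items()):
    print(f"{cat}: {', '.join(products)}")
```

Note that one gene product (O00115) can land in several categories, since each of its annotations is grouped independently.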





              Description of the tool – 2nd Part
                             How can we achieve that?

                             Here comes the Gene Ontology…
                             • Terms high in the hierarchy (shallow terms)
                             provide a sufficient degree of abstraction
                             (generality)
                             • Thus, we can regard them as categories
                             that hold the basic features of their children
                             • Parse all the paths leading from
                             the term to the root of the ontology
                             • Collect all the ancestors of the term along these paths
                             • Sort the ancestors in order of descending generality
                             • Take the most general ancestor as the
                             term’s category
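The steps above can be sketched in Python as follows. The sketch enumerates every path from a term up to the root, gathers the ancestors found on those paths, orders them from most to least general and returns the most general one. It skips the root itself, on the assumption that the root would otherwise become every term's category. The graph, the term names and the `GENERALITY` scores are hypothetical placeholders; the slides define generality through information content.

```python
# Hypothetical BP-like fragment: term -> parents (IS_A and PART_OF merged).
PARENTS = {
    "biological_process": [],
    "development": ["biological_process"],
    "response to stimulus": ["biological_process"],
    "response to stress": ["response to stimulus"],
    "tissue development": ["development"],
    "wound healing": ["response to stress"],
    "tissue regeneration": ["tissue development", "wound healing"],
}
ROOT = "biological_process"

# Placeholder generality scores (higher = more general); hypothetical values.
GENERALITY = {"development": 0.5, "response to stimulus": 0.4,
              "response to stress": 0.3, "tissue development": 0.2,
              "wound healing": 0.1}


def paths_to_root(term):
    """Every path term -> ... -> ROOT; a term with several parents yields several paths."""
    if term == ROOT:
        return [[ROOT]]
    return [[term] + path for p in PARENTS[term] for path in paths_to_root(p)]


def category(term):
    """Most general ancestor on any path to the root, excluding the root itself."""
    ancestors = {t for path in paths_to_root(term) for t in path[1:] if t != ROOT}
    ranked = sorted(ancestors, key=lambda t: GENERALITY[t], reverse=True)
    # A term whose only parent is the root is treated as its own category.
    return ranked[0] if ranked else term


for path in paths_to_root("tissue regeneration"):
    print(" -> ".join(path))
print("category:", category("tissue regeneration"))
```

Enumerating paths explicitly mirrors the slide, but the number of paths can grow quickly in a large DAG; collecting ancestors with a visited set gives the same ancestor set without listing every path.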


              Description of the tool – 2nd Part
    Defining generality…
    The Information Content (IC) of a term is an indication of its generality:
    the less informative a term c is, the more general it is.

    $$\mathrm{IC}(c) = -\log_b \Pr(c), \qquad \Pr(c) = \frac{n_c}{n_r}$$

    In our case b = e (natural logarithm), where:
    n_c : the number of times the term (or any of its children) occurs
    n_r : the number of times the root of the ontology occurs

    In other words:
    n_c : the number of the term’s children (descendants), including the term itself
    n_r : the number of the root’s children (descendants), including the root itself
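As a worked illustration of the formula, here is a minimal Python sketch that follows the "in other words" reading: n_c is the size of the set containing the term and all its descendants, n_r is the same count for the root, and IC(c) = -ln(n_c / n_r). The ontology fragment and the function names are hypothetical; under the annotation-frequency reading, the set sizes would instead be replaced by counts of annotations to each term and its descendants in the GOA data.

```python
import math

# Hypothetical ontology fragment: term -> children (IS_A and PART_OF merged).
CHILDREN = {
    "biological_process": ["development", "response to stimulus"],
    "development": ["tissue development", "morphogenesis", "developmental growth"],
    "response to stimulus": ["response to stress"],
    "response to stress": ["wound healing"],
    "tissue development": ["tissue regeneration"],
    "wound healing": ["tissue regeneration"],
    "morphogenesis": [],
    "developmental growth": [],
    "tissue regeneration": [],
}
ROOT = "biological_process"


def descendants_incl(term):
    """The term plus all of its descendants; shared descendants are counted once."""
    seen, stack = {term}, [term]
    while stack:
        for child in CHILDREN[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def information_content(term):
    """IC(c) = -ln(n_c / n_r), counting descendants including the term itself."""
    n_c = len(descendants_incl(term))
    n_r = len(descendants_incl(ROOT))
    return -math.log(n_c / n_r)


def rank_by_generality(terms):
    """Most general (lowest IC) first."""
    return sorted(terms, key=information_content)


for term in rank_by_generality(["tissue regeneration", "response to stimulus", "development"]):
    print(f"{term}: {information_content(term):.3f}")
```

Here n_r = 9, so the root gets IC 0, `development` (5 terms in its subgraph) gets ln(9/5), and the leaf `tissue regeneration` gets the largest value, ln 9.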


               Description of the tool – 2nd Part

[Figure: part of the BP ontology with the terms response to stimulus, growth, development, response to stress, response to external stimulus, morphogenesis, developmental growth, tissue development, wound healing and tissue regeneration; the label "development" appears beside tissue regeneration]

Ranked terms (arrow labelled "descending IC"):
1. development
2. response to stimulus
3. morphogenesis
4. response to stress
5. tissue development
6. response to external stimulus
7. growth
8. developmental growth
9. wound healing


                                                                                              28


                 Description of the tool – 2nd Part




    [Figure: part of the CC ontology graph with the terms cell, organelle,
    cell part, intracellular, intracellular part and intracellular organelle,
    annotated "a subtle change in CC ontology…"]

    Ranking (descending IC):
       1. cell
       2. cell part
       3. intracellular
       4. intracellular part
       5. organelle
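    [Figure: TAGGO workflow]
    • Input: a list of protein identifiers (e.g. O00154, ATP9A_HUMAN, …, IPI00304150)
    • SELECT species (Human, Mouse, …, Zebrafish) from the GOA DB
    • SELECT undesired ECs: exclude entries with undesired ECs
    • GO annotations file, combined with the Gene Ontology
    • Results, split into CC, MF and BP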
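The filtering steps of the workflow can be sketched as below. This is a hypothetical illustration, not TAGGO's implementation. It assumes that "ECs" are GO evidence codes, that the annotations come as tab-separated GAF 2.x lines (DB object ID in column 2, GO ID in column 5, evidence code in column 7, aspect C/F/P in column 9, taxon in column 13), and that the species is selected by taxon ID; TAGGO may instead have used species-specific GOA files. Identifiers such as entry names or IPI IDs would need a mapping step that is not shown.

```python
from collections import defaultdict

# GAF 2.x column indices (0-based) used below.
OBJ_ID, GO_ID, EVIDENCE, ASPECT, TAXON = 1, 4, 6, 8, 12

def filter_gaf(lines, protein_ids, undesired_ecs, taxon="taxon:9606"):
    """Collect GO IDs per (protein, aspect) from GAF lines.

    Keeps only rows for the requested proteins and species (taxon:9606 is human)
    and drops rows whose evidence code is in `undesired_ecs`.
    Aspect 'C', 'F' or 'P' corresponds to CC, MF or BP.
    """
    result = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if taxon not in cols[TAXON].split("|"):
            continue  # different species
        if cols[EVIDENCE] in undesired_ecs:
            continue  # this row's evidence code is excluded
        if cols[OBJ_ID] in protein_ids:
            result[(cols[OBJ_ID], cols[ASPECT])].add(cols[GO_ID])
    return dict(result)
```

For example, passing `{"IEA"}` as `undesired_ecs` would drop electronically inferred annotations; which codes count as undesired is a user choice in the workflow.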


              Description of the tool – 3rd Part
Creating pie charts…
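The slides do not state how slice sizes are computed. A plausible sketch, assuming each slice is a category's share of all category occurrences: the bar-chart slides report more occurrences than there are input proteins (e.g. 1961 automatic CC occurrences for 1401 proteins), which suggests that a protein can fall into several categories. The function name is hypothetical, and drawing the chart itself is omitted.

```python
from collections import Counter

def pie_shares(protein_categories):
    """Map each category to its rounded percentage of all category occurrences.

    protein_categories: dict protein -> set of categories. A protein assigned to
    several categories contributes one occurrence to each of them.
    """
    counts = Counter(cat for cats in protein_categories.values() for cat in cats)
    total = sum(counts.values())
    # Rounded to whole percents, as on the slide labels; slices may not sum to 100.
    return {cat: round(100 * n / total) for cat, n in counts.most_common()}
```

Rounding to whole percents would also explain the many "0%" labels on the pie charts: very small categories round down to zero.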






               Organization of the presentation



                                              • Introduction
                                              • Why was the development
                                                 of TAGGO a necessity?
                                              • Description of the tool
                                              • Results
                                              • Comparison
                                              • Conclusions
                                              • Acknowledgements


                                Results

                              • We used a human brain protein set as the input set:
                                           1401 proteins
                              • Each protein is annotated to approximately 8 terms:
                                     11429 protein – GO term annotation pairs
                              • A huge amount of data is collected:

                                      • finding the ten most general
                                      categories of each term

                                      • determining the categories of
                                      each protein

                                      • finding 15 CC categories,
                                                18 BP categories,
                                                20 MF categories
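The slides do not spell out how the "ten most general categories" of a term are chosen. One possible reading, sketched here as an assumption: take the term and its ancestors, drop the root (IC 0, so it carries no information), sort by ascending IC (most general first) and keep the first k = 10. A protein's categories are then the union over its annotated terms. The DAG, the IC values and the function names are all hypothetical.

```python
# Hypothetical CC fragment (term -> parents) and hypothetical IC values.
PARENTS = {
    "cellular_component": [],
    "cell": ["cellular_component"],
    "organelle": ["cellular_component"],
    "cell part": ["cell"],
    "intracellular": ["cell part"],
    "intracellular organelle": ["organelle", "intracellular"],
}
IC = {"cellular_component": 0.0, "cell": 0.1, "organelle": 0.9,
      "cell part": 0.2, "intracellular": 0.4, "intracellular organelle": 1.2}

def ancestors(term, parents):
    """All strict ancestors of a term in the DAG."""
    seen, stack = set(), list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

def most_general_categories(term, parents, ic, root, k=10):
    """Up to k of {term} plus its ancestors, root excluded, by ascending IC."""
    candidates = (ancestors(term, parents) | {term}) - {root}
    return sorted(candidates, key=lambda t: (ic[t], t))[:k]

def protein_categories(terms, parents, ic, root, k=10):
    """Union of the most general categories of all terms annotating a protein."""
    cats = set()
    for term in terms:
        cats.update(most_general_categories(term, parents, ic, root, k))
    return cats
```

Collecting these categories over all proteins would yield the per-ontology category sets (15 CC, 18 BP and 20 MF in the reported run), although the exact selection rule TAGGO used may differ.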


               Organization of the presentation



                                              • Introduction
                                              • Why was the development
                                                 of TAGGO a necessity?
                                              • Description of the tool
                                              • Results
                                              • Comparison
                                              • Conclusions
                                              • Acknowledgements


                                                        Comparison
    [Figure: pie charts of the CC categories, Manual: CC vs. Automatic: CC]
    Manual categories: membrane, mitochondrion, cytoplasm, nucleus, endoplasmic
    reticulum, extracellular, intracellular, cytoskeleton, cellular component unknown,
    no entry, lysosomal, other
    Automatic categories: membrane, mitochondrion, cytoplasm, nucleus, endoplasmic
    reticulum, extracellular region, intracellular, cytoskeleton, cellular component
    unknown, no entry, cell, protein complex, synapse part, extracellular matrix part,
    synapse, virion

 The two pie charts closely resemble each other; only in “intracellular”
 is there a big difference.
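Spotting the category with the big difference can be sketched as below. Only categories present in both legends can be compared directly, because the two groupings use partly different names (e.g. "lysosomal" appears only in the manual chart and "virion" only in the automatic one). The function name and the percentages are hypothetical and are NOT read from the charts above.

```python
def biggest_difference(manual, automatic):
    """Return (category, gap) for the category shared by both groupings whose
    percentages differ the most; gap is in percentage points."""
    shared = manual.keys() & automatic.keys()
    category = max(shared, key=lambda c: abs(manual[c] - automatic[c]))
    return category, abs(manual[category] - automatic[category])

# Hypothetical percentages, for illustration only.
manual = {"nucleus": 20, "intracellular": 5, "membrane": 30, "lysosomal": 2}
automatic = {"nucleus": 18, "intracellular": 25, "membrane": 27, "synapse": 4}
print(biggest_difference(manual, automatic))  # ('intracellular', 20)
```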


                                                           Comparison
                                             Manual vs. Automatic Grouping
                                                     CC ontology
    [Bar chart: Manual vs. Automatic counts per CC category, y-axis 0 to 600]
     • Slight differences; the most remarkable one is in “intracellular”
     • Manual: 1783 occurrences; Automatic: 1961 occurrences





                                                   Comparison
    [Figure: pie charts of the MF categories, Manual: MF vs. Automatic: MF]
    Manual categories: structural activity, transporter activity, enzyme activity,
    catalytic activity, antioxidant activity, ion binding, signaling,
    protein/ribonucleoprotein binding, nucleic acid binding, ATP binding,
    molecular function unknown, no entry, GTP-, ATPase activity, other
    Automatic categories: structural molecule activity, transporter activity,
    enzyme regulator activity, catalytic activity, antioxidant activity, ion binding,
    signal transducer activity, protein/ribonucleoprotein binding, nucleic acid binding,
    nucleotide binding, molecular function unknown, no entry, motor activity,
    transcription regulator activity, chaperone regulator activity, binding

 The two pie charts closely resemble each other; only in “catalytic activity”
 is there a big difference.


                                           Comparison
                             Manual vs. Automatic Grouping
                                     MF ontology
    [Bar chart: Manual vs. Automatic counts per MF category, y-axis 0 to 700]
 • Slight differences; the most remarkable one is in “catalytic activity”
 • Manual: 2078 occurrences; Automatic: 2858 occurrences





                                               Comparison
    [Figure: pie charts of the BP categories, Manual: BP vs. Automatic: BP]
    Manual categories: apoptosis, metabolism, cell adhesion, cell cycle, development,
    transport, signaling, biological process unknown, no entry, stress-related, other
    Automatic categories: death, metabolism, cell adhesion, cellular physiological
    process, development, localization, cell communication, biological process
    unknown, no entry, regulation of physiological process, organismal physiological
    process, response to stimulus, regulation of cellular process, cell
    differentiation, homeostasis, regulation of biological process, reproduction

 The two pie charts closely resemble each other.


                                      Comparison
                        Manual vs. Automatic Grouping
                                BP ontology
    [Bar chart: Manual vs. Automatic counts per BP category, y-axis 0 to 900]
    • Slight differences overall; the most notable one is in “metabolism”
    • Manual: 2167 occurrences; Automatic: 2353 occurrences
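The comparison above amounts to counting, per category, how many annotations each grouping produces and finding the category where the two counts diverge most. A minimal Python sketch, assuming each grouping is available as a list of (protein, category) pairs; the function name `compare_groupings` and the toy data are hypothetical and are not taken from TAGGO or from the data set behind the chart.

```python
from collections import Counter


def compare_groupings(manual, automatic):
    """Compare two groupings given as iterables of (protein, category) pairs.

    Returns the total occurrences in each grouping and the category with the
    largest absolute difference as (category, manual_count, automatic_count),
    or None when both groupings are empty.
    """
    manual_counts = Counter(category for _, category in manual)
    auto_counts = Counter(category for _, category in automatic)

    # Categories seen in either grouping; a Counter returns 0 for missing keys.
    # Sorting first makes ties resolve alphabetically, since max keeps the first maximum.
    categories = sorted(set(manual_counts) | set(auto_counts))
    largest = None
    if categories:
        top = max(categories, key=lambda c: abs(manual_counts[c] - auto_counts[c]))
        largest = (top, manual_counts[top], auto_counts[top])
    return {
        "manual_total": sum(manual_counts.values()),
        "automatic_total": sum(auto_counts.values()),
        "largest_difference": largest,
    }


# Hypothetical toy data, NOT the counts shown in the chart.
manual = [("P1", "metabolism"), ("P2", "transport"), ("P3", "metabolism")]
automatic = [("P1", "metabolism"), ("P2", "transport"), ("P3", "metabolism"),
             ("P4", "metabolism"), ("P5", "metabolism")]
print(compare_groupings(manual, automatic))
# {'manual_total': 3, 'automatic_total': 5, 'largest_difference': ('metabolism', 2, 4)}
```

Sorting the categories before calling `max` keeps the reported category stable across runs when two categories differ by the same amount.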






                             Conclusions


                              What are the advantages of this tool?
                             • Option to discard annotations supported by less reliable evidence codes (ECs)
                             • Extremely fast processing
                             • All protein–GO term pairs are considered
                             • Convenient search of the ten most general categories of each term
                             • Results are extracted using one of the most reliable biological ontologies, the Gene Ontology (well-structured ontologies based on biological evidence, widely accepted nomenclature)

                              On the other hand…
                             • Terms are sometimes assigned to very abstract categories (low information content)
                             • Categories with very small percentages often do not merge into a broader one
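Two of the advantages listed above, discarding annotations by evidence code and looking up the most general categories of a term, can be illustrated with a small sketch. The slides do not say how TAGGO ranks generality, so this version assumes that a category is more general the closer it sits to the ontology root, following only parent links. The excluded evidence-code set, all function names, and the toy ontology fragment (GO-style term names, simplified) are hypothetical.

```python
from collections import deque

# Hypothetical choice: treat IEA (Inferred from Electronic Annotation) as the
# "less reliable" evidence code. This is an assumption, not TAGGO's setting.
UNRELIABLE_EVIDENCE = frozenset({"IEA"})


def filter_annotations(annotations, excluded=UNRELIABLE_EVIDENCE):
    """Keep (protein, go_term, evidence_code) triples whose code is not excluded."""
    return [a for a in annotations if a[2] not in excluded]


def depths_from_root(parents, root):
    """Shortest number of parent links between the root and every reachable term.

    `parents` maps each term to its direct parents; we invert it into a
    children map and run a breadth-first search downwards from the root.
    """
    children = {}
    for term, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(term)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in depth:
                depth[child] = depth[term] + 1
                queue.append(child)
    return depth


def ancestors(parents, term):
    """All terms reachable from `term` by following parent links (term itself excluded)."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def most_general_categories(parents, depth, root, term, k=10):
    """Up to k ancestors of `term`, shallowest first, excluding the root itself.

    Ties at the same depth are broken alphabetically; ancestors missing from
    `depth` (not connected to the root) are placed last.
    """
    candidates = ancestors(parents, term) - {root}
    ranked = sorted(candidates, key=lambda t: (depth.get(t, float("inf")), t))
    return ranked[:k]


def group_annotations(annotations, parents, root, k=10, excluded=UNRELIABLE_EVIDENCE):
    """Map each retained (protein, term) pair to its k most general categories."""
    depth = depths_from_root(parents, root)
    return {
        (protein, term): most_general_categories(parents, depth, root, term, k)
        for protein, term, _code in filter_annotations(annotations, excluded)
    }


# Toy, simplified fragment of the Biological Process ontology (hypothetical).
root = "biological_process"
parents = {
    "metabolic process": [root],
    "cellular process": [root],
    "cellular metabolic process": ["metabolic process", "cellular process"],
    "lipid metabolic process": ["metabolic process"],
    "cellular lipid metabolic process": ["cellular metabolic process",
                                         "lipid metabolic process"],
}
annotations = [
    ("P1", "cellular lipid metabolic process", "IDA"),
    ("P2", "lipid metabolic process", "IEA"),  # dropped by the evidence filter
]
print(group_annotations(annotations, parents, root, k=2))
# {('P1', 'cellular lipid metabolic process'): ['cellular process', 'metabolic process']}
```

Ranking by depth alone pushes very broad terms such as "cellular process" to the front, which is the same low-information-content effect named as a drawback above.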


                               TAGGO








                              Acknowledgements
 I would like to thank my project supervisor, Assoc. Professor Sophia Kossida,
 for the constant feedback and support, as well as for our constructive
 discussions throughout our collaboration.

 Special thanks go to Karin Soderman for supplying the data and contributing
 to the comparison and evaluation of the results.




                             Acknowledgements
I also want to sincerely thank:
   Fotis E. Psomopoulos (PhD student of ISSEL), for generously and promptly
   offering his help whenever I asked for his advice;
   the Director of the Intelligent Systems and Software Engineering Lab (ISSEL),
   Professor Pericles A. Mitkas, and my diploma thesis supervisor,
   Sotiris T. Diplaris (PhD student of ISSEL), for introducing me to these topics.


                              Questions




                                          3… 2… 1… the end:
                               time’s up…! ;-)