Ontology of Genetic Susceptibility Factors to Diabetes Mellitus


      Yu Lin, Norihiro Sakamoto
    Department of Sociomedical Informatics,
 Graduate School of Medicine, Kobe University
 What are Genetic Susceptibility Factors (GSF)?
 How do we confirm genetic susceptibility?
 Why do we need an ontology?
 The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
           Methodology
           Testing
 Discussion
2008/02                    InterOntology08     2
 Search “Genetic Susceptibility” in UMLS

 Scope of “GSF to Diabetes Mellitus”
   Those genetic characteristics, and interactions
   between genetic and environmental factors, that
   increase the probability of developing diabetes
   mellitus (DM).                      If “decrease”,
     polymorphism
     linked loci
     SNP
     haplotype
     genotype



            Ref: [Rioux JD, Abbas AK.] Paths to understanding the genetic basis of autoimmune disease.
                                   Nature. 2005 Jun 2;435(7042):584-9. Review.
           How to confirm the GSF

    Through a family-based linkage study combined with
     a population-based association study
    Through a candidate-gene (gene-by-gene,
     function-based) association approach combined with a
     genome-wide association approach
    Through statistical studies combined with
     biological function studies

    Factors Affecting Statistical Power
            of Confirming GSF
    Number of disease variants
    Allele frequencies in the population
    Effect size on disease phenotype
       Odds Ratio (OR)
    Population structure and geography
    Selection bias
    Genotype and phenotype misclassification errors

          Ref: [ Wang WYS, et al.] Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.
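
As an illustration of the effect-size measure listed above, the following is a minimal sketch of how an odds ratio and its 95% confidence interval can be computed from a 2x2 case-control table using the standard log (Woolf) method. The function name and the counts are hypothetical and are not taken from any study cited in these slides.

```python
import math

def odds_ratio_ci(case_exposed, case_unexposed,
                  control_exposed, control_unexposed, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the Woolf (log) method.

    z=1.96 gives an approximate 95% confidence interval.
    """
    a, b, c, d = case_exposed, case_unexposed, control_exposed, control_unexposed
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only:
# 60 of 100 cases and 40 of 100 controls carry the risk allele.
or_value, ci_lower, ci_upper = odds_ratio_ci(60, 40, 40, 60)
print(f"OR = {or_value:.2f}, 95% CI = {ci_lower:.2f}-{ci_upper:.2f}")
```

An interval whose lower bound stays above 1 is the usual informal signal that the association is unlikely to be due to chance alone; it says nothing about the other factors listed on this slide, such as population structure or selection bias.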

                   No Criteria Established

    There are no established criteria for confirming
     GSF (Genetic Susceptibility Factors)
             OR 1.5-2.0?
             sample size
             population

                                      Can we settle this?

          A Knowledge Base is Needed
    The primary idea is to catalog all GSF to Diabetes
     Mellitus (DM)
    The reality of research on GSF to DM
       Different levels of genetic object
       Different types of study design
       Inconsistent results
       Complex phenotypes of DM
    Such heterogeneous datasets demand a knowledge base

                  Ontology in General
    Originally from philosophy
    An ontology is a “specification of a shared conceptualization” [Gruber]
    Ontology as an approach to “annotation of multiple bodies of
     data” [Smith B. et al]
    Widely used in computer science and information science
       artificial intelligence
       the Semantic Web
       software engineering
       biomedical informatics (the Gene Ontology is a successful example)
       library science
       information architecture as a form of knowledge representation

                  Ontology is a Good Tool

    In our case, ontology can help with:
             Knowledge representation
             Database design
             Content-oriented analysis
             Information retrieval and extraction
             Information integration
    By setting rules, can we establish criteria to
     demonstrate either genetic susceptibility to, or
     causality of, a complex disease?

          The Methodology of OGSF-DM

 specification           Specify the domain and scope

 conceptualization       Build the conceptual model

 integration             Reuse and import other ontologies

 implementation          Protégé 3.3.1, OWL, SWRL rules

                         Step1. Specification
    Domain: Represent the knowledge of GSF to DM and
     related phenotypes
    Explore relevant literature resources:
             PubMed: a corpus of 5873 abstracts (as of 31 Oct. 2007)
             Books:
                  Joslin’s Diabetes Mellitus
                  Human Molecular Genetics 3
     The most fundamental terms:
              i) Human disease: diabetes mellitus and related disorders;
             ii) Phenotypes and observed quantitative parameters;
             iii) Genetic concepts;
             iv) Geographical regions;
             v) Disease gene study of the original paper.

            Step2. Conceptualization

    The core conception was generated by analyzing the titles
     in the corpus
    The conception shows an N-ary relationship
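
OWL properties are binary, so an N-ary relationship is usually modeled by introducing a node for the relationship itself and attaching each participant to that node with a separate property. The sketch below illustrates this pattern in plain Python. The property names and individual names are the ones that appear in the worked example later in these slides; the dataclass, its field names and the helper function are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObservedRelationship:
    """One N-ary fact: a genetic factor related to a disease,
    observed in a study of a population, with supporting evidence."""
    node_id: str          # name of the reified relationship individual
    genetic_factor: str
    disease: str
    study: str
    population: str
    evidence: str

def to_binary_facts(rel):
    """Decompose the N-ary record into (subject, property, object) triples."""
    return [
        (rel.node_id, "isObservedRelationshipOf", rel.genetic_factor),
        (rel.node_id, "isRelationshipWith", rel.disease),
        (rel.node_id, "isObservedIn", rel.study),
        (rel.node_id, "hasSupportingEvidence", rel.evidence),
        (rel.study, "hasStudyPopulation", rel.population),
    ]

rel = ObservedRelationship(
    node_id="associated_with_1",
    genetic_factor="a_3_intronic_SNP_rs3818247",
    disease="Type_2_Diabetes",
    study="Disease_Gene_Study_15047632",
    population="an_ashkenazi_jewish_population",
    evidence="odds_ratio_OR_1.49",
)

facts = to_binary_facts(rel)
for fact in facts:
    print(fact)
```

The population is attached to the study rather than to the relationship node, mirroring the way the slides link a relationship to its population through the study in which it was observed.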

          The top-level of OGSF-DM

    Adopted terms from BFO (Basic Formal Ontology):
     Continuant, Occurrent, Independent_Continuant,
     Dependent_Continuant, Quality

          The position of core concepts

          CLASS: Observed_Relationship

          • Class hierarchy                • Constraints of class

          The term ‘Allele’ is polysemous
    Genetics definition: an allele is any one of a pair (or
     series) of alternative forms of a gene that can occupy
     the same locus on a particular chromosome and that
     control the same character of the phenotype.
    “Allele” appears with different meanings in different resources:

Meaning of Allele                        Appeared Form                        Resource
the variant of a gene in an individual   disease “allele”                     original paper
representation of a SNP                  “allele/allele” at the DNA, RNA      HGVBase
                                         and amino acid levels
allele sharing in sibs                   IBS, IBD “allele”                    linkage study
           Allele CLASS in OGSF-DM
    An abstraction
    Currently, it satisfies the data model
    Needs to be refined in the future

                  Gene concept has evolved
    1860s-1900s    Gene as a discrete unit of heredity
    1910s          Gene as a distinct locus
    1940s          Gene as a blueprint for a protein
    1950s          Gene as a physical molecule
    1960s          Gene as transcribed code
    1970s-1980s    Gene as ORF sequence pattern
    1990s-2000s    Gene as annotated genomic entity
    2007-          Gene as …

    Ref: [Gerstein MB, et al.] What is a gene, post-ENCODE? History and updated definition. Genome Research. 2007 Jun;17(6):669-81.

             Some definitions of ‘gene’
    Human Genome Nomenclature Organization: “a DNA segment that
     contributes to phenotype/function. In the absence of demonstrated function
     a gene may be characterized by sequence, transcription or homology” (Wain
     et al. 2002)
    Rat Genome Database: “the DNA sequence necessary and sufficient to
     express the complete complement of functional products derived from a unit
     of transcription” (2003)
    Sequence Ontology Consortium: “locatable region of genomic
     sequence, corresponding to a unit of inheritance, which is associated with
     regulatory regions, transcribed regions and/or other functional sequence
     regions” (Pearson 2006)
    ENCODE Project Consortium: “The gene is a union of genomic
     sequences encoding a coherent set of potentially overlapping functional
     products.” (Gerstein et al. 2007)
    MeSH: genes are “Specific sequences of nucleotides along a molecule of
     DNA (or, in the case of some viruses, RNA) which represent functional units
     of HEREDITY. Most eukaryotic genes contain a set of coding regions
     (EXONS) that are spliced together in the transcript, after removal of
     intervening sequence (INTRONS) and are therefore labeled split genes.”
           Gene CLASS in OGSF-DM
    A placeholder
    The instance of Gene is the name of the gene which
     appears in the research paper

                       Step3. Integration

    Importing two ontologies:
             Ontology of glucose metabolism disorders
                A slim OBO file was extracted from the Human
                 Disease Ontology
                The OBO file was converted to an OWL file
                The class hierarchy was restructured and new terms
                 from “Joslin’s Diabetes Mellitus” were added
             Ontology of geographical regions
                Generated by hand, adopting the terms from
                 MeSH 2008 “Geographic Locations [Z01]”
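
The slides do not say which tool was used to extract the slim OBO file. As a rough illustration of the idea only, the sketch below parses [Term] stanzas from OBO-format text and keeps one root term together with all of its is_a descendants. The function names, the EX: identifiers and the sample stanzas are hypothetical; they are not real Human Disease Ontology entries.

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas into {id: {"name": str, "is_a": [parent ids]}}."""
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = (line == "[Term]")
            current = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current = terms.setdefault(value, {"name": "", "is_a": []})
        elif current is not None and tag == "name":
            current["name"] = value
        elif current is not None and tag == "is_a":
            # Drop the trailing "! parent name" comment if present.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms

def slim(terms, root):
    """Return the ids of `root` and every term reachable from it via is_a."""
    keep = {root}
    changed = True
    while changed:
        changed = False
        for term_id, term in terms.items():
            if term_id not in keep and any(p in keep for p in term["is_a"]):
                keep.add(term_id)
                changed = True
    return keep

# Hypothetical stanzas, for illustration only.
SAMPLE = """
[Term]
id: EX:0000001
name: disease

[Term]
id: EX:0000002
name: glucose metabolism disorder
is_a: EX:0000001 ! disease

[Term]
id: EX:0000003
name: diabetes mellitus
is_a: EX:0000002 ! glucose metabolism disorder

[Term]
id: EX:0000004
name: unrelated disorder
is_a: EX:0000001 ! disease
"""

terms = parse_obo_terms(SAMPLE)
slim_ids = slim(terms, "EX:0000002")
print(sorted(slim_ids))
```

Converting the resulting slim to OWL and restructuring the hierarchy are separate steps that this sketch does not attempt.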
     Step4. Implementation and Evaluation

   Protégé_3.3.1 + OWL

   SWRL rule example:
          hasPopulation-1 Rule
          isObservedIn (?x, ?y) ∧ hasStudyPopulation(?y, ?z)
          → hasPopulation(?x, ?z)

    This rule infers the population (z) of the
    Observed_Relationship (x); y is a
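
To make the effect of the hasPopulation-1 rule concrete, here is a minimal plain-Python sketch that applies the same pattern to a small set of triples. It only imitates the rule; it is not a SWRL engine and is not how Protégé evaluates rules. The function name is hypothetical, and the individual names are borrowed from the worked example in these slides.

```python
def apply_has_population_rule(triples):
    """isObservedIn(x, y) AND hasStudyPopulation(y, z) -> hasPopulation(x, z).

    Returns only the newly inferred triples.
    """
    triples = set(triples)
    inferred = set()
    for (x, prop1, y) in triples:
        if prop1 != "isObservedIn":
            continue
        for (subject, prop2, z) in triples:
            if prop2 == "hasStudyPopulation" and subject == y:
                inferred.add((x, "hasPopulation", z))
    return inferred - triples

asserted = [
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation",
     "an_ashkenazi_jewish_population"),
]

new_facts = apply_has_population_rule(asserted)
print(new_facts)
```

The population never has to be asserted on the relationship directly; it follows from the study in which the relationship was observed.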
                                         The example article

    Full text URL:

                Asserting individual 1)
1) associated_with_1 ⊆ Not_Stated_Resistance_or_Susceptibility_Association
           ⋂∀ hasSupportingEvidence ( ∋ {odds_ratio_OR_1.49} )
           ⋂∃ isObservedIn ( ∋ {Disease_Gene_Study_15047632} )
           ⋂∃ isObservedRelationshipOf ( ∋ {a_3_intronic_SNP_rs3818247} )
           ⋂∃ isRelationshipWith ( ∋ {Type_2_Diabetes} )

    This means that the 3’ intronic SNP rs3818247 is
    associated with Type 2 Diabetes, with supporting
    evidence of OR 1.49. The relationship is an
    association, but the study does not state whether
    the SNP is a susceptibility factor or a
    resistance factor.
          Asserting individual 2),3),4)
2) odds_ratio_OR_1.49 ⊆ Odds_Ratio
                   ⋂∀ hasOR ( ∋ {1.49} )
                   ⋂∀ hasCI95 ( ∋ {1.15-1.90} )
                   ⋂∃ hasP ( ∋ {Corrected_P_0.0252} ⋂ {Uncorrected_P_0.0028} )
                   ⋂∃ hasClassifiedGroup ( ∋ {Control_Group_1} ⋂ {Case_Group_1} )
3) Control_Group_1 ⊆ Classified_Group
                   ⋂∃hasPopulationSize ( ∋ {342 int})
                   ⋂∀isPartOf ( ∋ {an_ashkenazi_jewish_population})
4) Case_Group_1 ⊆ Classified_Group
                  ⋂∃hasPopulationSize ( ∋ {275 int})
                  ⋂∀isPartOf ( ∋ {an_ashkenazi_jewish_population})

     2), 3) and 4) together mean that the study was a case-control
     study (case size = 275, control size = 342) in an
     Ashkenazi Jewish population.
     Result: Odds Ratio 1.49 (95% CI: 1.15-1.90, corrected P = 0.0252,
                            uncorrected P = 0.0028).
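
Because hasCI95 stores the interval as a single literal ("1.15-1.90"), a consumer of the ontology has to parse it before using it. The sketch below shows one way to do that and to run two basic sanity checks on the values from this slide. The function names are hypothetical, and the parser assumes a non-negative "lower-upper" literal, as is always the case for an odds ratio.

```python
def parse_ci(literal):
    """Split a 'lower-upper' literal such as '1.15-1.90' into two floats."""
    lower_text, upper_text = literal.split("-")
    return float(lower_text), float(upper_text)

def check_odds_ratio(or_value, ci_literal):
    """Return basic consistency flags for an odds ratio and its CI."""
    lower, upper = parse_ci(ci_literal)
    return {
        # The point estimate should fall inside its own interval.
        "or_within_ci": lower <= or_value <= upper,
        # An interval that excludes 1 indicates a nominally significant effect.
        "ci_excludes_one": lower > 1.0 or upper < 1.0,
    }

# Values from the slide: OR 1.49, 95% CI 1.15-1.90.
report = check_odds_ratio(1.49, "1.15-1.90")
print(report)
```

Checks like these could be run over every Odds_Ratio individual to catch transcription errors when results are entered from papers.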

                 Asserting individual 5), 6)
5) Disease_Gene_Study_15047632 ⊆ Disease_Gene_Study
        ⋂∀ hasPubMedID ( ∋ {PMID_15047632} )
        ⋂∃ hasStudyPopulation ( ∋ {an_ashkenazi_jewish_population} )
        ⋂∀ hasURI ( ∋ {})
6) an_ashkenazi_jewish_population ⊆ Population_Group
                  ⋂∃ hasPopulationCharacteristic ( ∋ {Jews} )
                  ⋂∃ hasGeographicalSite ( ∋ {Israel} ⋂ {U.S.} )

5) and 6) mean:
① An Ashkenazi Jewish population was investigated in this study;
② The population belongs to the Jewish ethnic group and is
   located in Israel and the U.S.;
③ The PubMedID and URL of this paper were collected.
                    The core conception

    Putting 1)-5) together, the core conception of
     this one relationship is built:
     a relationship {associated} between {a_3_intronic_SNP_rs3818247} and
     {Type_2_Diabetes}, observed in {an_ashkenazi_jewish_population}, from the study
     {PMID_15047632}.

              Representation of a SNP
a_3_intronic_SNP_rs3818247 ⊆ htSNP
     ⋂∃ hasAlleleComponent ( ∋ {DNA_Level_Allele_T} ⋂ { DNA_Level_Allele_G})
     ⋂∃ hasGenomeSite ( ∋ {flanking_3_intronic})
     ⋂∃ isGeneticVariantOf ( ∋ {hepatocyte_nuclear_factor-4_alpha})
     ⋂∃ hasVariantDatabase ( ∋ {HGVBase_SNP002310533} ⋂{dbSNP_rs3818247})

     This means that the 3’ intronic SNP rs3818247 is an htSNP of
     hepatocyte nuclear factor 4 alpha, located in the flanking 3’ intronic
     sequence of the gene. The alleles of this SNP are T/G at the DNA level.
     Reference database entries:
      1) HGVBase : “SNP002310533”
      2) dbSNP : “rs3818247”


                  Discussion

    A hybrid of the middle-out and top-down approaches was
     used to build our ontology.
    BFO is important for harmonizing the domain ontologies
     in our case.
    The ontology can be applied to other complex diseases too.
    We anticipate further applications of this ontology:
             Information retrieval
             Knowledge base development
             Establishing logic rules
             Mapping or linking to other ontologies, such as GO and
              Mammalian Phenotype.
