Ontology of Genetic Susceptibility Factors to Diabetes Mellitus


      Yu Lin, Norihiro Sakamoto
    Department of Sociomedical Informatics,
 Graduate School of Medicine, Kobe University
• What are Genetic Susceptibility Factors (GSF)?
• How do we confirm genetic susceptibility?
• Why do we need an ontology?
• The Ontology of Genetic Susceptibility Factors to Diabetes Mellitus (OGSF-DM)
    • Methodology
    • Testing
• Discussion
2008/02                    InterOntology08     2
Search “Genetic Susceptibility” in UMLS

Scope of “GSF to Diabetes Mellitus”
   Those genetic characteristics, and interactions
   between genetic and environmental factors, that
   increase the probability of developing diabetes
   mellitus (DM).                      If “decrease”,
     • polymorphism
     • linked loci
     • SNP
     • haplotype
     • genotype



    Ref: [Rioux JD, Abbas AK.] Paths to understanding the genetic basis of autoimmune disease. Nature. 2005 Jun 2;435(7042):584-9. Review.

           How to confirm the GSF

• Through a family-based linkage study combined with a population-based association study
• Through a genetic (gene-by-gene, function-candidate) association approach combined with a genome-wide association approach
• Through a statistical study combined with a biological function study

    Factors Affecting Statistical Power of Confirming GSF

• Number of disease variants
• Allele frequencies in the population
• Effect size on disease phenotype
    • Odds Ratio (OR)
• Population structure and geography
• Selection bias
• Genotype and phenotype misclassification errors

    Ref: [Wang WYS, et al.] Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.
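
To illustrate the effect-size measure listed above, the following minimal Python sketch computes an odds ratio and an approximate 95% confidence interval from a 2x2 case-control table using Woolf's logit method. The function name and the carrier counts are hypothetical; they are not taken from any study cited in these slides.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses Woolf's logit method: the standard error of ln(OR) is the
    square root of the sum of the reciprocals of the four cell counts.
    All four counts must be greater than zero.
    """
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )
    se_log_or = math.sqrt(
        1 / case_carriers
        + 1 / case_noncarriers
        + 1 / control_carriers
        + 1 / control_noncarriers
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry the allele.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

With small samples the interval is wide, which is one reason sample size appears among the factors affecting statistical power.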

                   No Criteria Established

• There are no established criteria for confirming
  GSF (Genetic Susceptibility Factors)
    • OR 1.5-2.0?
    • sample size
    • population

                                      Can we settle this?

          A Knowledge Base is Needed
• The primary idea is to catalog all GSF to Diabetes
  Mellitus (DM)
• The reality of research on GSF to DM
    • Different levels of genetic object
    • Different types of study design
    • Inconsistent results
    • Complex phenotypes of DM
• Such diverse datasets demand a knowledge base

                  Ontology in General
• Originally from philosophy
• An ontology is “specification of a shared conceptualization” [Gruber
• Ontology as an approach to “annotation of multiple bodies of
  data” [Smith B. et al]
• Widely used in computer science and information science
    • artificial intelligence
    • the Semantic Web
    • software engineering
    • biomedical informatics (the Gene Ontology is a successful example)
    • library science
    • information architecture as a form of knowledge representation

                  Ontology is a Good Tool

• In our case, ontology can help with:
    • Knowledge representation
    • Database design
    • Content-oriented analysis
    • Information retrieval and extraction
    • Information integration
• By setting rules, can we establish criteria for
  demonstrating either genetic susceptibility to, or
  causality of, complex disease?

          The Methodology of OGSF-DM

 specification           Specify the domain and scope

 conceptualization          Build the conceptual model

 integration         Reuse and import other ontologies

                     Protégé 3.3.1, OWL, SWRL rules

                         Step 1. Specification
• Domain: Represent the knowledge of GSF to DM and
  related phenotypes
• Explore relevant literature resources:
    • PubMed: a corpus of 5873 abstracts (as of 31 Oct. 2007)
    • Books:
        • Joslin’s Diabetes Mellitus
        • Human Molecular Genetics 3
• The most fundamental terms:
    i) Human disease: diabetes mellitus and related disorders;
    ii) Phenotypes and observed quantity parameters;
    iii) Genetic concepts;
    iv) Geographical regions;
    v) Disease gene study of the original paper.

            Step 2. Conceptualization

• The core concept was generated by analyzing the titles
  of the abstracts in the corpus
• The core concept is an N-ary relationship
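
One common way to handle an N-ary relationship in a language built on binary properties, such as OWL, is to reify it: the relationship becomes an individual of its own, linked to each participant by a separate property. The sketch below illustrates that idea in plain Python. The property and individual names are the ones used in the worked example later in these slides; the Python class, its field names, and the `to_triples` method are hypothetical and are not part of OGSF-DM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedRelationship:
    """A reified N-ary relationship (hypothetical Python stand-in)."""

    identifier: str  # the relationship individual itself
    variant: str     # genetic object the relationship is about
    disease: str     # disease or phenotype it is related to
    study: str       # study in which it was observed
    evidence: str    # supporting statistical evidence

    def to_triples(self):
        """Flatten the N-ary relationship into binary (s, p, o) triples."""
        return [
            (self.identifier, "isObservedRelationshipOf", self.variant),
            (self.identifier, "isRelationshipWith", self.disease),
            (self.identifier, "isObservedIn", self.study),
            (self.identifier, "hasSupportingEvidence", self.evidence),
        ]


relationship = ObservedRelationship(
    identifier="associated_with_1",
    variant="a_3_intronic_SNP_rs3818247",
    disease="Type_2_Diabetes",
    study="Disease_Gene_Study_15047632",
    evidence="odds_ratio_OR_1.49",
)

for triple in relationship.to_triples():
    print(triple)
```

Every triple shares the same subject, so all facts about one observation stay attached to a single relationship individual.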

          The top level of OGSF-DM

• Terms adopted from BFO (Basic Formal Ontology):
  Continuant, Occurrent, Independent_Continuant,
  Dependent_Continuant, Quality

          The position of core concepts

          CLASS: Observed_Relationship

          • Class hierarchy                • Constraints of class

          The term ‘Allele’ is polysemous
• Genetics definition: an allele is one of a pair (or
  series) of alternative forms of a gene that can occupy
  the same locus on a particular chromosome and that
  control the same character of the phenotype.
• “Allele” appears with different meanings in different resources:

Meaning of “allele”                     Form in which it appears                            Resource
the variant of a gene in an individual  disease “allele”                                    original paper
representation of a SNP                 “allele/allele” at DNA, RNA and amino acid level    HGVBase
allele sharing in sibs                  IBS, IBD “allele”                                   linkage study

           Allele CLASS in OGSF-DM
• An abstraction
• Currently, it satisfies the data model
• Needs to be refined in the future

                  Gene concept has evolved

   1860s-1900s     Gene as a discrete unit of heredity
   1910s           Gene as a distinct locus
   1940s           Gene as a blueprint for a protein
   1950s           Gene as a physical molecule
   1960s           Gene as transcribed code
   1970s-1980s     Gene as ORF sequence pattern
   1990s-2000s     Gene as annotated genomic entity
   2007-           Gene as …

    Ref: [Gerstein MB, et al.] What is a gene, post-ENCODE? History and updated definition. Genome Research. 2007 Jun;17(6):669-81.

             Some definitions of “gene”
• Human Genome Nomenclature Organization: “a DNA segment that
  contributes to phenotype/function. In the absence of demonstrated function
  a gene may be characterized by sequence, transcription or homology” (Wain
  et al. 2002)
• Rat Genome Database: “the DNA sequence necessary and sufficient to
  express the complete complement of functional products derived from a unit
  of transcription” (2003)
• Sequence Ontology Consortium: “locatable region of genomic
  sequence, corresponding to a unit of inheritance, which is associated with
  regulatory regions, transcribed regions and/or other functional sequence
  regions” (Pearson 2006)
• ENCODE Project Consortium: “The gene is a union of genomic
  sequences encoding a coherent set of potentially overlapping functional
  products.” (Gerstein et al. 2007)
• MeSH: genes are “Specific sequences of nucleotides along a molecule of
  DNA (or, in the case of some viruses, RNA) which represent functional units
  of HEREDITY. Most eukaryotic genes contain a set of coding regions
  (EXONS) that are spliced together in the transcript, after removal of
  intervening sequence (INTRONS) and are therefore labeled split genes.”

           Gene CLASS in OGSF-DM
• A placeholder
• An instance of Gene is the name of the gene as it
  appears in the research paper

                       Step 3. Integration

• Importing two ontologies:
    • Ontology of glucose metabolism disorders
        • A slim OBO file was extracted from the Human
          Disease Ontology
        • The OBO file was converted to an OWL file
        • The class hierarchy was restructured, and new terms
          from “Joslin’s Diabetes Mellitus” were added
    • Ontology of geographical regions
        • Generated by hand, adopting the terms from
          MeSH 2008 “Geographic Locations [Z01]”

     Step 4. Implementation and Evaluation

   Protégé 3.3.1 + OWL

   SWRL rule example:
          hasPopulation-1 Rule
          isObservedIn(?x, ?y) ∧ hasStudyPopulation(?y, ?z)
          → hasPopulation(?x, ?z)

    The rule infers the population (z) of the
    Observed_Relationship (x); y is a
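
The effect of the hasPopulation-1 rule can be sketched in plain Python as a single forward-chaining step over (subject, property, object) triples. This is only an illustration of what the rule derives, not how a SWRL engine evaluates it; the function name and the triple representation are assumptions, while the property and individual names come from the worked example in these slides.

```python
def apply_has_population_rule(triples):
    """Derive hasPopulation(x, z) from isObservedIn(x, y) and
    hasStudyPopulation(y, z), returning only the inferred triples."""
    triples = list(triples)

    # Index each study (y) by the populations (z) it investigated.
    populations_by_study = {}
    for subject, prop, obj in triples:
        if prop == "hasStudyPopulation":
            populations_by_study.setdefault(subject, set()).add(obj)

    # Join every isObservedIn(x, y) fact with the populations of y.
    inferred = set()
    for subject, prop, obj in triples:
        if prop == "isObservedIn":
            for population in populations_by_study.get(obj, ()):
                inferred.add((subject, "hasPopulation", population))
    return inferred


facts = [
    ("associated_with_1", "isObservedIn", "Disease_Gene_Study_15047632"),
    ("Disease_Gene_Study_15047632", "hasStudyPopulation",
     "an_ashkenazi_jewish_population"),
]

inferred = apply_has_population_rule(facts)
for triple in sorted(inferred):
    print(triple)
```

The join runs on the shared variable y (the study), so the population never has to be asserted directly on the relationship.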
                                         The example article

    Full text URL:

                Asserting individual 1)
1) associated_with_1 ⊆ Not_Stated_Resistance_or_Susceptibility_Association
           ⋂∀ hasSupportingEvidence ( ∋ {odds_ratio_OR_1.49} )
           ⋂∃ isObservedIn ( ∋ {Disease_Gene_Study_15047632} )
           ⋂∃ isObservedRelationshipOf ( ∋ {a_3_intronic_SNP_rs3818247} )
           ⋂∃ isRelationshipWith ( ∋ {Type_2_Diabetes} )

    This means that the 3’ intronic SNP rs3818247 is
    associated with Type 2 Diabetes, with
    supporting evidence of OR 1.49. The
    relationship is an association, but the study
    does not state whether it is a susceptibility or a
    resistance factor.

          Asserting individual 2),3),4)
2) odds_ratio_OR_1.49 ⊆ Odds_Ratio
                   ⋂∀ hasOR ( ∋ {1.49} )
                   ⋂∀ hasCI95 ( ∋ {1.15-1.90} )
                   ⋂∃ hasP ( ∋ {Corrected_P_0.0252} ⋂ {Uncorrected_P_0.0028} )
                   ⋂∃ hasClassifiedGroup ( ∋ {Control_Group_1} ⋂ {Case_Group_1} )
3) Control_Group_1 ⊆ Classified_Group
                   ⋂∃hasPopulationSize ( ∋ {342 int})
                   ⋂∀isPartOf ( ∋ {an_ashkenazi_jewish_population})
4) Case_Group_1 ⊆ Classified_Group
                  ⋂∃hasPopulationSize ( ∋ {275 int})
                  ⋂∀isPartOf ( ∋ {an_ashkenazi_jewish_population})

     2), 3) and 4) together mean that the study was a case-control
     study (case size = 275, control size = 342) in an
     Ashkenazi Jewish population.
     Result: Odds Ratio 1.49 (95% CI: 1.15-1.90, corrected P = 0.0252,
                            uncorrected P = 0.0028).

                 Asserting individual 5), 6)
5) Disease_Gene_Study_15047632 ⊆ Disease_Gene_Study
        ⋂∀ hasPubMedID ( ∋ {PMID_15047632} )
        ⋂∃ hasStudyPopulation ( ∋ {an_ashkenazi_jewish_population} )
        ⋂∀ hasURI ( ∋ {})
6) an_ashkenazi_jewish_population ⊆ Population_Group
                  ⋂∃ hasPopulationCharacteristic ( ∋ {Jews} )
                  ⋂∃ hasGeographicalSite ( ∋ {Israel} ⋂ {U.S.} )

5) and 6) mean:
① An Ashkenazi Jewish population was investigated in this study;
② The population belongs to the Jewish ethnic group and is
   located in Israel and the U.S.;
③ The PubMedID and URL of this paper were collected.

                    The core concept

• Putting 1)-5) together, the core concept of
  this one relationship is built:
  the relationship {associated} between {a_3_intronic_SNP_rs3818247} and
  {Type_2_Diabetes}, observed in {an_ashkenazi_jewish_population} in the study
  {PMID_15047632}.

              Representation of a SNP
a_3_intronic_SNP_rs3818247 ⊆ htSNP
     ⋂∃ hasAlleleComponent ( ∋ {DNA_Level_Allele_T} ⋂ { DNA_Level_Allele_G})
     ⋂∃ hasGenomeSite ( ∋ {flanking_3_intronic})
     ⋂∃ isGeneticVariantOf ( ∋ {hepatocyte_nuclear_factor-4_alpha})
     ⋂∃ hasVariantDatabase ( ∋ {HGVBase_SNP002310533} ⋂{dbSNP_rs3818247})

     This means that the 3’ intronic SNP rs3818247 is an htSNP of
     hepatocyte nuclear factor 4 alpha, located in the flanking 3’ intronic
     sequence of the gene. The alleles of this SNP are T/G at the DNA level.
     Reference database entries:
      1) HGVBase: “SNP002310533”
      2) dbSNP: “rs3818247”


• A hybrid of the middle-out and top-down approaches was
  used to build our ontology.
• BFO is important for harmonizing the domain ontologies
  in our case.
• The ontology can be applied to other complex diseases too.
• We anticipate further applications of this ontology:
    • Information retrieval
    • Knowledge base development
    • Establishing logic rules
    • Mapping or linking to other ontologies, such as GO, Mammalian
      Phenotype, and so on.


