Ontology Quality and the Semantic Web

       Chris Welty
IBM Watson Research Center
•   Welcome, opening joke
•   History of web and hypertext
•   Semantic Web overview
•   Ontology Engineering and Quality
•   Summary and Closing joke
            History of Hypertext
• 1945: Vannevar Bush’s Memex
   – Associative Indexing and links
• 1965: Ted Nelson coins hypertext
   – “Nonsequential writing”
• 1967: Andries van Dam’s Hypertext Editing System
  (sponsored by IBM).
• 1985: Janet Walker’s Symbolics Document Examiner
• 1987: Bill Atkinson’s HyperCard on the Mac
• 1991: Tim Berners-Lee proposes HTTP, HTML, & URL
   – Genesis c. 1989
• 1993: Marc Andreessen releases Mosaic for Mac, Unix, and Windows
         Hypertext Research
• Dating back at least to the late 60s
• Many foci
  – Technology (mouse, software, protocols)
  – User interaction
  – Aesthetic
  – Post-modern
  – Engineering
• Largely ignored by web developers
  – Especially in the early days of the web (93-96)
        Grassroots to the Web
• Early web dominated by “what it looks like” in
• Focus on spreading the word, not doing it right
• Many early web pages didn’t have links in text at all
   – “Catalog” pages with lists of links
   – “Text” pages with few or no links
   – Embedded images more interesting than links
• Just do it rather than do it right
• But…
   – When the web became serious, the research started
     to matter
                    Semantic Web
• Defined, to date, by RDF and OWL
• Genesis c. 2000
• Still in the “early days”
  – Faster adoption (so far) than early web
  – FOAF the most widely used SW Ontology
  [Figure: fragment of the FOAF class hierarchy, showing Agent, Document and Image, with Person and Group below Agent]
            Ontology Research
• Dating back…
• Multiple foci
  –   Technology (logics, reasoners…)
  –   Metaphysics (what there is)
  –   Knowledge Acquisition
  –   NLP
  –   Engineering
• Largely ignored by SW developers
  – Web 2.0, groundswell
  – Specifically criticized by some SW pundits
          A little semantics…
• The SW catchphrase
  – “A little semantics goes a long way”
• Sometimes strengthened
  – A lot of semantics is too much
  – 80/20 rule
• Double-edged sword
  – FOAF doesn’t look like even 1%
  – The simplicity of FOAF hides any serious value
    proposition for SW
  – SW not for people, for data
  – Important to get it right?
            Some evidence
• Does quality matter?
• Good quality ontologies cost more
  – Required for some applications
• Improvements in quality can improve
  performance [Welty, et al, 2004]
  – 18% F-measure improvement in search
  – Cleanup cost ~1mw/3000 classes
  – BUT … low quality ontology still improved
       Dimensions of Quality
• Coverage, correctness, richness,
  commitment [Kashyap, 2003]
• Organization, modularity [Rector, 2002]
• Relation to reality [Smith & Welty, 2001]
• Making meaning clear [Guarino, 1998]
• Meta-level consistency [Guarino & Welty,
• Captures the invariant structure of the
  domain [Welty & Guarino, 2001]
       Making Meaning Clear
• Part-of relates parts to their wholes
  – E.g. part-of(engine,car)

   • Part-of is irreflexive
   • Part-of is anti-symmetric
   • Nothing can have only one part
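The three constraints above can be tested mechanically on any finite set of part-of facts. A minimal sketch, assuming the relation is given as (part, whole) pairs; the function name `check_part_of` and the `chassis` fact are hypothetical:

```python
def check_part_of(pairs):
    """Return the names of the axioms violated by a finite part-of relation.

    `pairs` is a set of (part, whole) tuples.
    """
    violations = []
    # Irreflexive: nothing is a part of itself.
    if any(part == whole for part, whole in pairs):
        violations.append("irreflexive")
    # Anti-symmetric: two distinct things are never parts of each other.
    if any((whole, part) in pairs for part, whole in pairs if part != whole):
        violations.append("anti-symmetric")
    # Nothing can have only one part.
    wholes = {whole for _, whole in pairs}
    if any(len({p for p, w in pairs if w == whole}) == 1 for whole in wholes):
        violations.append("single part")
    return violations


car = {("engine", "car"), ("chassis", "car")}
print(check_part_of(car))                       # satisfies all three axioms
print(check_part_of({("engine", "car")}))       # car has only one part
print(check_part_of({("a", "b"), ("b", "a")}))  # mutual parthood
```

Each axiom rules out relations that the bare name "part-of" would otherwise allow.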
Reduction of unintended models
• Generally, involves more axioms
• Typically requires negation
  – Disjointness
• Positive axioms
  – Also makes meaning clear, e.g.
  [Figure: two taxonomies, one with Horse under Mammal, the other with Horse under Chess Piece]

• Clear significance for ontology alignment
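The reduction of unintended models can be illustrated by brute force. The sketch below enumerates every binary relation over a toy three-element domain and counts how many survive as part-of axioms are added; the domain, the function names, and the simplified "no single part" reading are assumptions made for illustration:

```python
from itertools import product

DOMAIN = ("a", "b", "c")
PAIRS = [(part, whole) for part in DOMAIN for whole in DOMAIN]


def irreflexive(rel):
    return all((x, x) not in rel for x in DOMAIN)


def antisymmetric(rel):
    return all((y, x) not in rel for x, y in rel if x != y)


def no_single_part(rel):
    # Every object has either no parts or at least two.
    return all(sum(1 for _, whole in rel if whole == x) != 1 for x in DOMAIN)


def count_models(axioms):
    """Count the relations over DOMAIN that satisfy every axiom."""
    total = 0
    for bits in product((False, True), repeat=len(PAIRS)):
        rel = {pair for pair, keep in zip(PAIRS, bits) if keep}
        if all(axiom(rel) for axiom in axioms):
            total += 1
    return total


axioms = [irreflexive, antisymmetric, no_single_part]
for n in range(len(axioms) + 1):
    print(n, "axioms:", count_models(axioms[:n]))
```

On this domain the count falls from 512 candidate relations to 64, then 27, then 4, which is the sense in which more axioms leave fewer unintended models.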
      Meta-Level Consistency with
•   Identity
•   Unity
•   Rigidity
•   Dependence
•   Actuality
•   Permanence

• Note on terminology: a property is a unary relation
  (aka class); a meta-property is a property of a property
                Identity
• The foundation of ontology, conceptual analysis, etc
• The criteria under which equivalence is determined
   – Or under which difference is determined
• Already accepted practice in RDBs, OOP
• When you conceive of a class, ask “What makes each
  instance unique?”
   – Note for SW: uniqueness not assumed
• Meta-property
   – Is there an identity criterion for this class (+I)
   – Not always productive to specify the precise condition
       • Esp. if this results in artificial attributes
   – $-I \not\subset +I$: a property without identity cannot be subsumed by one that carries it
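In OOP terms, an identity criterion is what equality is defined over. A minimal sketch, assuming a hypothetical `Person` class whose identity criterion is a hypothetical `national_id` attribute (exactly the kind of artificial attribute the slide warns can be unproductive):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    # Assumed identity criterion: two records denote the same person
    # iff they share this value.
    national_id: str
    # Excluded from equality and hashing: may differ without changing identity.
    name: str = field(compare=False)


a = Person("123-45", "Ada Lovelace")
b = Person("123-45", "A. Lovelace")
c = Person("999-99", "Ada Lovelace")

print(a == b)          # same identity, different spelling of the name
print(a == c)          # same name, different identity
print(len({a, b, c}))  # number of distinct individuals
```

Equality answers "are these the same instance?", and nothing in the code assumes that different names imply different individuals.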
               Unity Criteria
• An object x is a whole under w iff w is an
  equivalence relation that binds together all
  the parts of x, such that
          $P(y,x) \rightarrow (P(z,x) \leftrightarrow w(y,z))$
but not
          $w(y,z) \leftrightarrow \exists x\,(P(y,x) \land P(z,x))$

• P is the part-of relation
• w can be seen as a generalized indirect
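The first condition of the definition can be checked on finite data. A rough sketch, assuming P is a set of (part, whole) pairs and w a set of pairs over a fixed domain of candidate parts; the ship example and all names are hypothetical, and the second ("but not") condition is not checked:

```python
def is_equivalence(rel, domain):
    reflexive = all((a, a) in rel for a in domain)
    symmetric = all((b, a) in rel for a, b in rel)
    transitive = all((a, d) in rel for a, b in rel for c, d in rel if b == c)
    return reflexive and symmetric and transitive


def is_whole_under(x, part_of, w, domain):
    """Check P(y,x) -> (P(z,x) <-> w(y,z)) for all y, z in domain."""
    if not is_equivalence(w, domain):
        return False
    parts = {y for y, whole in part_of if whole == x}
    return all(((z, x) in part_of) == ((y, z) in w) for y in parts for z in domain)


def from_blocks(blocks):
    """Build an equivalence relation from its equivalence classes."""
    return {(a, b) for block in blocks for a in block for b in block}


domain = {"hull", "mast", "sail", "pier"}
part_of = {("hull", "ship"), ("mast", "ship"), ("sail", "ship")}

attached = from_blocks([{"hull", "mast", "sail"}, {"pier"}])  # binds exactly the parts
loose = from_blocks([{"hull", "mast"}, {"sail"}, {"pier"}])   # misses one part
moored = from_blocks([{"hull", "mast", "sail", "pier"}])      # binds a non-part too

for w in (attached, loose, moored):
    print(is_whole_under("ship", part_of, w, domain))
```

Only a relation that links all the parts of the ship to each other, and to nothing else, makes it a whole.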
          Unity Meta-Properties
• If all instances of a property are wholes under the same
  relation it carries unity (+U)
• When at least one instance of a property is not a whole,
  or when two instances are wholes under different
  relations, it does not carry unity (-U)
• When no instance of a property is a whole, it carries anti-
  unity (~U)

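• $-U \not\subset +U$: a property without unity cannot be subsumed by one that carries unity
• $+U \not\subset {\sim}U$: a property with unity cannot be subsumed by one that carries anti-unity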
                Rigidity
• An essential property of an entity is a property
  that must necessarily (always) hold
• A rigid property is a property that is essential to
  all possible instances (+R)
• A non-rigid property is a property that is not rigid
• An anti-rigid property is a property that is not
  essential to any of its instances (~R)
• $+R \not\subset {\sim}R$: a rigid property cannot be subsumed by an anti-rigid one
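                Formal Rigidity
   • $\phi$ is rigid (+R): $\forall x\, \phi(x) \rightarrow \Box\phi(x)$
    – e.g. Person, Apple

   • $\phi$ is non-rigid (-R): $\exists x\, \phi(x) \land \neg\Box\phi(x)$
    – e.g. Red, Male

   • $\phi$ is anti-rigid (~R): $\forall x\, \phi(x) \rightarrow \neg\Box\phi(x)$
    – e.g. Student, Agent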
                                       (what about time?)
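The three definitions can be read as a test over a set of possible worlds. A minimal sketch, assuming a fixed set of individuals, a finite list of worlds, and "necessarily" read as "in every listed world"; the worlds and the function name are invented for illustration and time is ignored:

```python
def classify_rigidity(prop, worlds):
    """Return the strongest label among '+R', '~R' and '-R'.

    `worlds` is a list of dicts mapping a property name to its set of
    instances in that world. Every ~R property is also -R by definition;
    '-R' is returned only for the mixed case.
    """
    instances = set().union(*(world[prop] for world in worlds))
    essential = {x for x in instances if all(x in world[prop] for world in worlds)}
    if essential == instances:
        return "+R"  # essential to every instance
    if not essential:
        return "~R"  # essential to no instance
    return "-R"      # essential to some instances but not all


worlds = [
    {"Person": {"ann", "bob"}, "Student": {"ann"}, "Red": {"ruby", "apple"}},
    {"Person": {"ann", "bob"}, "Student": {"bob"}, "Red": {"ruby"}},
]

for prop in ("Person", "Student", "Red"):
    print(prop, classify_rigidity(prop, worlds))
```

With these invented worlds, Person comes out rigid, Student anti-rigid, and Red merely non-rigid, matching the examples on the slide.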
         Rigidity Constraint
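                     $+R \not\subset {\sim}R$
• Why?

  • Suppose $\forall x\, P(x) \rightarrow Q(x)$ with P +R and Q ~R. Every instance of P is then an instance of Q, but Q is necessary for none of its instances, so each instance of P could fail to be Q and hence fail to be P, contradicting the rigidity of P.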

             Which one is better?
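  [Figure: two candidate models of computer parts.
   First: Computer Part (the target of has-part) above Disk Drive (+I+R+U) and Memory (+I+R~U).
   Second: Disk Drive (+I+R+U), Memory (+I+R~U) and Computer Part (-I~R-U), with Disk Part (+I~R-U) and Memory Part (+I~R~U) beneath them.]
                                    Due to: Guizzardi, et al, 2004.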
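The subsumption constraints stated so far (-I under +I, -U under +U, +U under ~U, +R under ~R) can be checked automatically once classes are tagged. A minimal sketch applied to the first design, assuming Computer Part carries the tags -I~R-U and directly subsumes Disk Drive and Memory; the function and variable names are hypothetical:

```python
# (child tag, parent tag) combinations that the constraints forbid.
FORBIDDEN = [("-I", "+I"), ("-U", "+U"), ("+U", "~U"), ("+R", "~R")]


def violations(tags, subclass_of):
    """List (child, parent, reason) for every forbidden subsumption.

    tags: class name -> set of meta-property tags
    subclass_of: iterable of (child, parent) pairs
    """
    found = []
    for child, parent in subclass_of:
        for child_tag, parent_tag in FORBIDDEN:
            if child_tag in tags[child] and parent_tag in tags[parent]:
                found.append((child, parent, f"{child_tag} under {parent_tag}"))
    return found


tags = {
    "Computer Part": {"-I", "~R", "-U"},
    "Disk Drive": {"+I", "+R", "+U"},
    "Memory": {"+I", "+R", "~U"},
}
# Assumed reading of the first design: both types sit under Computer Part.
subclass_of = [("Disk Drive", "Computer Part"), ("Memory", "Computer Part")]

for item in violations(tags, subclass_of):
    print(item)
```

Both links are flagged because a rigid type is placed under an anti-rigid property.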
             Ontology Alignment
              Are these the same?

  [Figure: two taxonomies. Left: Food above Apple. Right: Food above Apple and Caterpillar]

• Most automatic alignment tools would say yes
• Let’s take a closer look
               Ontology Alignment
  [Figure: the same two taxonomies with meta-properties. Left: Food (+I+U-D+R) above Apple. Right: Food (+I~U+D~R) above Apple and Caterpillar]

•   Different meta-properties for Food
•   Different intended meaning
•   Should not be aligned
•   Meta-level analysis helps make meaning
    more clear
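An alignment tool could use the meta-property tags as a guard before matching on labels. A minimal sketch, assuming tags are written as strings such as "+I+U-D+R"; the helper names are hypothetical and label equality stands in for whatever matcher a real tool uses:

```python
import re


def parse_tags(text):
    """Turn '+I~U+D~R' into {'I': '+', 'U': '~', 'D': '+', 'R': '~'}."""
    return {tag[1]: tag[0] for tag in re.findall(r"[+~-][A-Z]", text)}


def meta_conflicts(tags_a, tags_b):
    """Meta-properties on which the two classes disagree."""
    a, b = parse_tags(tags_a), parse_tags(tags_b)
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


def should_align(label_a, tags_a, label_b, tags_b):
    return label_a == label_b and not meta_conflicts(tags_a, tags_b)


print(meta_conflicts("+I+U-D+R", "+I~U+D~R"))
print(should_align("Food", "+I+U-D+R", "Food", "+I~U+D~R"))
```

The two Food classes share a label but disagree on unity, dependence and rigidity, so the guard rejects the match.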
    A formal ontology of properties
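  [Figure: taxonomy of property kinds, annotated with meta-properties. Recoverable labels: Property; Sortal; Rigid; Non-rigid (-R); a branch tagged ~R; Role (~R+D), with Formal Role and Material role; Category (+R); Attribution (-R-D); Phased sortal (-D, +L); Mixin (-D); Type (+O); Quasi-type (-O)]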
     The Backbone Taxonomy
   Assumption: no entity without identity
                                Quine, 1969

• Since identity is supplied by types, every entity
  must instantiate a type
• The taxonomy of types spans the whole domain
• Together with categories, types form the
  backbone taxonomy, which represents the
  invariant structure of a domain (rigid properties
  spanning the whole domain)

  [Figure: example taxonomy. Recoverable labels: Location, Amount of matter, Agent, Group, object, Living being, Food, Legal agent, Fruit, Social entity, Group of people, Lepidopteran, Vertebrate, Geographical Region, Country, Caterpillar, Butterfly, Red apple, Person, Organization]

  [Figure: the same taxonomy reduced to its backbone. Recoverable labels: Location, Amount of matter, Group, object, Living being, Fruit, Social entity, Group of people, Lepidopteran, Vertebrate, Geographical Region, Country, Person, Organization]
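Extracting a backbone amounts to keeping only the rigid classes and re-linking each one to its nearest rigid ancestor. A minimal sketch, assuming a single-parent taxonomy; the edges and the +R tags below are illustrative assumptions, not read off the figures:

```python
def backbone(parent, rigid):
    """Restrict a taxonomy to its rigid classes.

    parent: class name -> parent class name (None for a root)
    rigid:  set of class names tagged +R
    """
    def nearest_rigid_ancestor(cls):
        p = parent[cls]
        while p is not None and p not in rigid:
            p = parent[p]
        return p

    return {cls: nearest_rigid_ancestor(cls) for cls in parent if cls in rigid}


# Hypothetical fragment: edges and rigidity tags are assumptions.
parent = {
    "Living being": None,
    "Lepidopteran": "Living being",
    "Caterpillar": "Lepidopteran",
    "Butterfly": "Lepidopteran",
    "Vertebrate": "Living being",
    "Person": "Vertebrate",
}
rigid = {"Living being", "Lepidopteran", "Vertebrate", "Person"}

for cls, sup in backbone(parent, rigid).items():
    print(cls, "->", sup)
```

Caterpillar and Butterfly drop out as phases of a Lepidopteran, while the types that supply identity remain.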
      Upper-Level Backbone
• The upper level backbone accounts for 5%
  of an ontology and spans the domain
• In empirical work, this is the most
  important layer [Fan et al, 2003]
• Some value in providing upper level
  ontologies to establish the basic
         Backbone of quality
• Conjecture: the primary purpose of an
  ontology is to specify the backbone
  taxonomy, which is the invariant structure
  of the domain
• Bad ontologies:
  – “Folksonomies”
  – Subject hierarchies
  – Thesauri
• Good ontologies should:
  – Clarify meaning
    • Add constraints to eliminate unintended models
  – Have clear identity criteria
  – Have consistent meta-level properties
  – Specify the invariant structure of a domain
Use OntoClean for all your cleaning needs!