Ontology Quality and the Semantic Web

       Chris Welty
IBM Watson Research Center
                  Outline
•   Welcome, opening joke
•   History of web and hypertext
•   Semantic Web overview
•   Ontology Engineering and Quality
•   Summary and Closing joke
            History of Hypertext
• 1945: Vannevar Bush’s Memex
   – Associative Indexing and links
• 1965: Ted Nelson coins hypertext
   – “Nonsequential writing”
• 1967: Andries van Dam’s Hypertext Editing System
  (sponsored by IBM).
• 1985: Janet Walker’s Symbolics Document Examiner
• 1987: Bill Atkinson’s Hypercard on the Mac
• 1991: Tim Berners-Lee proposes HTTP, HTML, & URL
   – Genesis c. 1989
• 1993: Marc Andreessen releases Mosaic for Mac, Unix,
  Windows…
         Hypertext Research
• Dating back at least to the late 60s
• Many foci
  – Technology (mouse, software, protocols)
  – User interaction
  – Aesthetic
  – Post-modern
  – Engineering
• Largely ignored by web developers
  – Especially in the early days of the web (93-96)
        Grassroots to the Web
• Early web dominated by “what it looks like” in
  Mosaic
• Focus on spreading the word, not doing it right
• Many early web pages didn’t have links in text at
  all
   – “Catalog” pages with lists of links
   – “Text” pages with few or no links
   – Embedded images more interesting than links
• Just do it rather than do it right
• But…
   – When the web became serious, the research started
     to matter
                    Semantic Web
• Defined, to date, by RDF and OWL
• Genesis c. 2000
• Still in the “early days”
  – Faster adoption (so far) than early web
  – FOAF the most widely used SW Ontology
  [Figure: FOAF classes (http://xmlns.com/foaf/0.1/): Agent, Person,
   Group, Organization, Document, Image]
            Ontology Research
• Dating back…
• Multiple foci
  –   Technology (logics, reasoners…)
  –   Metaphysics (what there is)
  –   Knowledge Acquisition
  –   NLP
  –   Engineering
• Largely ignored by SW developers
  – Web 2.0, groundswell
  – Specifically criticized by some SW pundits
          A little semantics…
• The SW catchphrase
  – “A little semantics goes a long way”
• Sometimes strengthened
  – A lot of semantics is too much
  – 80/20 rule
• Double-edged sword
  – FOAF doesn’t look like even 1%
  – The simplicity of FOAF hides any serious value
    proposition for SW
  – SW not for people, for data
  – Important to get it right?
            Some evidence
• Does quality matter?
• Good quality ontologies cost more
  – Required for some applications
• Improvements in quality can improve
  performance [Welty, et al, 2004]
  – 18% F-measure improvement in search
  – Cleanup cost ~1mw/3000 classes
  – BUT … a low-quality ontology still improved
    on the baseline
       Dimensions of Quality
• Coverage, correctness, richness,
  commitment [Kashyap, 2003]
• Organization, modularity [Rector, 2002]
• Relation to reality [Smith & Welty, 2001]
• Making meaning clear [Guarino, 1998]
• Meta-level consistency [Guarino & Welty,
  2000]
• Captures the invariant structure of the
  domain [Welty & Guarino, 2001]
       Making Meaning Clear
• Part-of relates parts to their wholes
  – E.g. part-of(engine,car)

   • Part-of is irreflexive
   • Part-of is anti-symmetric
   • Nothing can have only one part
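
The three constraints on part-of can be checked mechanically over a finite set of assertions. The following is a minimal sketch, not a reference implementation: the function name `check_part_of` and the tuple encoding `(part, whole)` are assumptions made for illustration, and "anti-symmetric" is read here as "no two distinct things are parts of each other".

```python
def check_part_of(pairs):
    """Return sorted axiom violations for a finite part-of relation.

    pairs: set of (part, whole) tuples, read as part-of(part, whole).
    """
    violations = set()
    parts_of = {}  # whole -> set of its asserted parts
    for part, whole in pairs:
        if part == whole:
            # Irreflexivity: nothing is part of itself.
            violations.add(("irreflexive", part))
            continue
        if (whole, part) in pairs:
            # Anti-symmetry: two distinct things cannot be parts of each other.
            violations.add(("anti-symmetric",) + tuple(sorted((part, whole))))
        parts_of.setdefault(whole, set()).add(part)
    for whole, parts in parts_of.items():
        if len(parts) == 1:
            # "Nothing can have only one part."
            violations.add(("single-part", whole))
    return sorted(violations)


good = {("engine", "car"), ("wheel", "car")}
bad = {("engine", "car")}
print(check_part_of(good))
print(check_part_of(bad))
```

With only part-of(engine,car) asserted, the checker reports that car has a single part, which is the kind of unintended model the extra axioms are meant to rule out.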
Reduction of unintended models
• Generally, involves more axioms
• Typically requires negation
  – Disjointness
• Positive axioms
  – Also makes meaning clear, e.g.
         [Figure: Horse under Mammal vs. Horse under Chess Piece]

• Clear significance for ontology alignment
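
To illustrate how a negative axiom reduces unintended models, the sketch below enumerates every interpretation of a single individual over two classes and counts how many survive a disjointness axiom. The class names follow the Horse example; the helper names `models` and `disjoint` are hypothetical, and the one-individual setting is a deliberate simplification.

```python
from itertools import product

CLASSES = ("Mammal", "ChessPiece")


def models(axioms):
    """Enumerate interpretations of one individual that satisfy all axioms."""
    result = []
    for values in product([False, True], repeat=len(CLASSES)):
        interp = dict(zip(CLASSES, values))
        if all(axiom(interp) for axiom in axioms):
            result.append(interp)
    return result


def disjoint(a, b):
    """Axiom: no individual belongs to both class a and class b."""
    return lambda interp: not (interp[a] and interp[b])


no_axioms = models([])
with_disjointness = models([disjoint("Mammal", "ChessPiece")])
print(len(no_axioms), len(with_disjointness))
```

The only interpretation removed is the one in which the same individual is both a Mammal and a Chess Piece, which is presumably the unintended one.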
      Meta-Level Consistency with
              OntoClean
•   Identity
•   Unity
•   Rigidity
•   Dependence
•   Actuality
•   Permanence

• Note on terminology: property is a unary relation
  (aka class), meta-property is a property of a
  class
                                Identity
• The foundation of ontology, conceptual analysis, etc
• The criteria under which equivalence is determined
   – Or under which difference is determined
• Already accepted practice in RDBs, OOP
• When you conceive of a class, ask “What makes each
  instance unique?”
   – Note for SW: uniqueness not assumed
• Meta-property
   – Is there an identity criterion for this class (+I)
   – Not always productive to specify the precise condition
       • Esp. if this results in artificial attributes
   – -I ⊄ +I (a class without identity cannot be subsumed by one that carries identity)
               Unity Criteria
• An object x is a whole under w iff w is an
  equivalence relation that binds together all
  the parts of x, such that
          P(y,x) → (P(z,x) ↔ w(y,z))
but not
          w(y,z) ↔ ∃x (P(y,x) ∧ P(z,x))

• P is the part-of relation
• w can be seen as a generalized indirect
  connection
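
A finite-model reading of the first condition can be sketched as follows: x is a whole under w when every part y of x is w-related to exactly the parts of x. This is an illustrative sketch only. The function name `is_whole_under`, the explicit `domain` argument, and the toy data are assumptions; the code does not check that w is an equivalence relation, and the condition holds vacuously when x has no parts in the model.

```python
def is_whole_under(x, part_of, w, domain):
    """Check the unity condition for x over a finite domain.

    part_of: set of (part, whole) pairs.
    w: set of (y, z) pairs, assumed to be an equivalence relation.
    """
    parts = {y for y in domain if (y, x) in part_of}
    for y in parts:
        # Everything w-linked to y must be a part of x, and vice versa.
        linked = {z for z in domain if (y, z) in w}
        if linked != parts:
            return False
    return True


domain = {"car", "engine", "wheel", "rock"}
part_of = {("engine", "car"), ("wheel", "car")}

# Hypothetical "connected" relation that links exactly the car's parts.
connected = {(a, b) for a in ("engine", "wheel") for b in ("engine", "wheel")}

# The same relation extended so that it also reaches something outside the car.
leaky = connected | {("wheel", "rock"), ("rock", "wheel"), ("rock", "rock"),
                     ("engine", "rock"), ("rock", "engine")}

print(is_whole_under("car", part_of, connected, domain))
print(is_whole_under("car", part_of, leaky, domain))
```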
          Unity Meta-Properties
• If all instances of a property are wholes under the same
  relation it carries unity (+U)
• When at least one instance of a property is not a whole,
  or when two instances are wholes under different
  relations, it does not carry unity (-U)
• When no instance of a property is a whole, it carries anti-
  unity (~U)

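• -U ⊄ +U (a property without unity cannot be subsumed by one that carries unity)
• +U ⊄ ~U (a property with unity cannot be subsumed by one that carries anti-unity)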
                     Rigidity
• An essential property of an entity is a property
  that must necessarily (always) hold
• A rigid property is a property that is essential to
  all possible instances (+R)
• A non-rigid property is a property that is not rigid
  (-R)
• An anti-rigid property is a property that is
  essential to none of its instances (~R)
• +R ⊄ ~R (a rigid property cannot be subsumed by an anti-rigid one)
                Formal Rigidity
   f is rigid (+R): ∀x f(x) → □f(x)
    – e.g. Person, Apple

   f is non-rigid (-R): ∃x f(x) ∧ ¬□f(x)
    – e.g. Red, Male

   f is anti-rigid (~R): ∀x f(x) → ¬□f(x)
    – e.g. Student, Agent
                                       (what about time?)
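
One way to make the three definitions concrete is to evaluate them over a small set of possible worlds. The sketch below assumes a fixed domain of individuals and reads "necessarily f(x)" as "x is in the extension of f in every world". The function name `classify_rigidity` and the toy worlds are hypothetical. Since anti-rigid implies non-rigid, the function returns the strongest label that applies.

```python
def classify_rigidity(extension_by_world):
    """Classify a property as '+R', '~R' or '-R' from its extensions.

    extension_by_world: dict mapping a world name to the set of individuals
    that have the property in that world.
    """
    extensions = list(extension_by_world.values())
    instances = set().union(*extensions)
    if not instances:
        raise ValueError("property has no instances in any world")
    # An instance has the property essentially if it has it in every world.
    essential = {x for x in instances if all(x in ext for ext in extensions)}
    if essential == instances:
        return "+R"   # essential to all instances
    if not essential:
        return "~R"   # essential to none of its instances
    return "-R"       # essential to some instances but not all


person = {"w1": {"ann", "bob"}, "w2": {"ann", "bob"}}
student = {"w1": {"ann"}, "w2": {"bob"}}
red = {"w1": {"apple1", "car1"}, "w2": {"apple1"}}

print(classify_rigidity(person), classify_rigidity(student), classify_rigidity(red))
```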
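         Rigidity Constraint
                     +R ⊄ ~R
• Why?

  • Suppose ∀x P(x) → Q(x), with P +R and Q ~R
  • Every instance of P is then necessarily P, hence necessarily Q,
    which contradicts Q being anti-rigid

  [Figure: example instance labeled O10]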
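
The meta-property constraints lend themselves to an automated check over a tagged taxonomy. The sketch below is one possible encoding, not the OntoClean tooling itself: the `FORBIDDEN` table, the function name `violations`, and the subclass links in the example are assumptions made for illustration. Class names and tags are borrowed from the deck's computer-part example, and only direct subclass links are inspected.

```python
# (tag on subclass, tag on superclass) combinations that are not allowed.
FORBIDDEN = [
    ("-I", "+I"),  # identity is inherited, so -I cannot sit under +I
    ("-U", "+U"),  # unity is inherited, so -U cannot sit under +U
    ("~U", "+U"),  # anti-unity implies -U
    ("+U", "~U"),  # unity cannot sit under anti-unity
    ("+R", "~R"),  # a rigid property cannot sit under an anti-rigid one
]


def violations(tags, subclass_of):
    """List constraint violations.

    tags: dict mapping class name to a set of meta-property tags.
    subclass_of: iterable of (subclass, superclass) pairs.
    """
    found = []
    for sub, sup in subclass_of:
        for sub_tag, sup_tag in FORBIDDEN:
            if sub_tag in tags[sub] and sup_tag in tags[sup]:
                found.append((sub, sup, f"{sub_tag} under {sup_tag}"))
    return found


tags = {
    "Computer Part": {"-I", "~R", "-U"},
    "Disk Drive": {"+I", "+R", "+U"},
    "Memory": {"+I", "+R", "~U"},
}
# Assumed links: both rigid classes placed under the anti-rigid Computer Part.
links = [("Disk Drive", "Computer Part"), ("Memory", "Computer Part")]

for item in violations(tags, links):
    print(item)
```

Both links are flagged for placing a rigid class under an anti-rigid one.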
             Which one is better?
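  [Figure: two alternative models of computer parts.
   Left:  Computer has-part Computer Part (-I~R-U), with Disk Drive (+I+R+U)
          and Memory (+I+R~U) under Computer Part.
   Right: Computer has-part Disk Drive (+I+R+U) and Memory (+I+R~U);
          Computer Part (-I~R-U), with Disk Part (+I~R-U) and
          Memory Part (+I~R~U).]

                                    Due to: Guizzardi, et al, 2004.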
             Ontology Alignment
              Are these the same?

  [Figure: two taxonomies. Left: Apple under Food.
   Right: Apple and Caterpillar under Food.]

• Most automatic alignment tools would say
  yes
• Let’s take a closer look
               Ontology Alignment
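  [Figure: the same two taxonomies with meta-properties.
   Left:  Food (+I+U-D+R), with Apple below it.
   Right: Food (+I~U+D~R), with Apple and Caterpillar below it.]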


•   Different meta-properties for Food
•   Different intended meaning
•   Should not be aligned
•   Meta-level analysis helps make meaning
    more clear
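
A name-based alignment step could be guarded by a meta-property comparison along these lines. This is a sketch under stated assumptions: the function name `alignment_conflicts` and the dictionary encoding are hypothetical, and only Food is included because it is the only class for which the slide gives tags.

```python
def alignment_conflicts(tags_a, tags_b):
    """Return same-named classes whose meta-property tags differ.

    tags_a, tags_b: dicts mapping class name to a frozenset of tags.
    """
    shared = tags_a.keys() & tags_b.keys()
    return {
        name: (tags_a[name], tags_b[name])
        for name in sorted(shared)
        if tags_a[name] != tags_b[name]
    }


left = {"Food": frozenset({"+I", "+U", "-D", "+R"})}
right = {"Food": frozenset({"+I", "~U", "+D", "~R"})}

conflicts = alignment_conflicts(left, right)
print(sorted(conflicts))
```

A non-empty result signals that the two classes share a label but not an intended meaning, so they should not be merged automatically.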
    A formal ontology of properties
  Property
    Non-sortal (-I)
      Category (+R)
      Attribution (-R-D)
      Formal role (Role, ~R+D)
    Sortal (+I)
      Non-rigid (-R)
        Anti-rigid (~R)
          Material role (Role, ~R+D)
          Phased sortal (-D, +L)
        Mixin (-D)
      Rigid (+R)
        Type (+O)
        Quasi-type (-O)
     The Backbone Taxonomy
   Assumption: no entity without identity
                                Quine, 1969

• Since identity is supplied by types, every entity
  must instantiate a type
• The taxonomy of types spans the whole domain
• Together with categories, types form the
  backbone taxonomy, which represents the
  invariant structure of a domain (rigid properties
  spanning the whole domain)
  [Figure: example taxonomy under Entity, containing Location,
   Amount of matter, Physical object, Living being, Agent, Group,
   Food, Fruit, Apple, Red, Red apple, Animal, Lepidopteran,
   Caterpillar, Butterfly, Vertebrate, Person, Legal agent,
   Social entity, Group of people, Organization, Country,
   Geographical Region]
  [Figure: backbone taxonomy under Entity, containing Location,
   Amount of matter, Physical object, Living being, Group, Fruit,
   Apple, Animal, Lepidopteran, Vertebrate, Person, Social entity,
   Group of people, Organization, Country, Geographical Region]
      Upper-Level Backbone
• The upper level backbone accounts for 5%
  of an ontology and spans the domain
• In empirical work, this is the most
  important layer [Fan et al, 2003]
• Some value in providing upper level
  ontologies to establish the basic
  distinctions
         Backbone of quality
• Conjecture: the primary purpose of an
  ontology is to specify the backbone
  taxonomy, which is the invariant structure
  of the domain
• Bad ontologies:
  – “folksonomies”,
  – Subject hierarchies
  – Thesauri
                 Summary
• Good ontologies should:
  – Clarify meaning
    • Add constraints to eliminate unintended models
  – Have clear identity criteria
  – Have consistent meta-level properties
  – Specify the invariant structure of a domain
Use OntoClean for all your ontology cleaning needs!

				