Ontology Engineering & Maintenance


    Semantic Web - Spring 2006
  Computer Engineering Department
   Sharif University of Technology
Outline
 Ontology Engineering
 Ontology evaluation
Introduction
   Why do we use ontology?
     To describe the semantics of the data (we call
      this description metadata)

   Why do we describe the semantics?
     To provide a uniform way for different parties
      to understand each other

   Which data?
     Any data (on the web, or in the existing legacy
      databases)
Introduction
   Formal definition of ontology:
       Ontologies are knowledge bodies that provide a
        formal representation of a shared
        conceptualization of a particular domain.
   Ontologies are widely used in the Semantic
    Web.
     Recently, ontologies have become
      increasingly common on the WWW, where
      they provide semantics for annotations in
      web pages
What Is “Ontology Engineering”?
Ontology Engineering: Defining terms in the
 domain and relations among them
     Defining concepts in the domain (classes)
     Arranging the concepts in a hierarchy
      (subclass-superclass hierarchy)
     Defining which attributes and properties (slots)
      classes can have and constraints on their
      values
     Defining individuals and filling in slot values
Ontology-Development Process
Here (as a linear sequence):

  determine scope -> consider reuse -> enumerate terms -> define classes
  -> define properties -> define constraints -> create instances

In reality - an iterative process:

  determine scope -> consider reuse -> enumerate terms -> consider reuse
  -> define classes -> enumerate terms -> define classes -> define properties
  -> define classes -> define properties -> define constraints
  -> create instances -> define classes -> create instances -> consider reuse
  -> define properties -> define constraints -> create instances
Determine Domain and Scope
   What is the domain that the ontology will
    cover?
   What are we going to use the ontology for?
   What types of questions should the information in
    the ontology answer?
Consider Reuse
   Why reuse other ontologies?
       to save effort
       to interact with the tools that use other
        ontologies
       to use ontologies that have been validated
        through use in applications
What to Reuse?
   Ontology libraries
       DAML ontology library (www.daml.org/ontologies)
       Ontolingua ontology library
        (www.ksl.stanford.edu/software/ontolingua/)
       Protégé ontology library
        (protege.stanford.edu/plugins.html)
   Upper ontologies
       IEEE Standard Upper Ontology (suo.ieee.org)
       Cyc (www.cyc.com)
What to Reuse? (II)
   General ontologies
       DMOZ (www.dmoz.org)
       WordNet (www.cogsci.princeton.edu/~wn/)
   Domain-specific ontologies
       UMLS Semantic Net
       GO (Gene Ontology) (www.geneontology.org)
Enumerate Important Terms
 What are the terms we need to talk
  about?
 What are the properties of these terms?
 What do we want to say about the terms?
Define Classes and the Class Hierarchy
   A class is a concept in the domain
        a class of wines
        a class of wineries
        a class of red wines
   A class is a collection of elements with similar
    properties
   Instances of classes
        a glass of California wine you’ll have for lunch
Class Inheritance
   Classes usually constitute a taxonomic hierarchy
    (a subclass-superclass hierarchy)
   A class hierarchy is usually an IS-A hierarchy:
    an instance of a subclass is an instance of
     a superclass
   If you think of a class as a set of elements, a
    subclass is a subset
   e.g., Apple is a subclass of Fruit
    Every apple is a fruit
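
The subset view of subclassing can be sketched in a few lines of Python. This is only an illustration: the instance names (`apple_1`, `pear_1`, ...) and the helper `is_subclass` are made up for the example and do not come from any ontology tool.

```python
# Hypothetical instance sets; the names are illustrative only.
fruit = {"apple_1", "apple_2", "pear_1"}
apple = {"apple_1", "apple_2"}


def is_subclass(sub: set, sup: set) -> bool:
    """Treat a class as its set of instances: a subclass is a subset."""
    return sub <= sup


print(is_subclass(apple, fruit))  # every apple is a fruit
print(is_subclass(fruit, apple))  # but not every fruit is an apple
```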
Levels in the Hierarchy
[Figure: example class hierarchy with the top, middle, and bottom levels labeled]
Modes of Development
 top-down – define the most general
  concepts first and then specialize them
 bottom-up – define the most specific
  concepts and then organize them in more
  general classes
 combination – define the more salient
  concepts first and then generalize and
  specialize them
Documentation
   Classes (and Properties) usually have
    documentation
       Describing the class in natural language
       Listing domain assumptions relevant to the
        class definition
       Listing synonyms
   Documenting classes and slots is as
    important as documenting computer code!
Define Properties (Slots) of Classes
   Properties in a class definition describe
    attributes of instances of the class and
    relations to other instances
     Each wine will have color, sugar content,
       producer, etc.
Properties (Slots)
   Types of properties
       “intrinsic” properties: flavor and color of wine
       “extrinsic” properties: name and price of wine
       parts: ingredients in a dish
       relations to other objects: producer of wine (winery)


   Simple and complex properties
       simple properties (attributes): contain primitive values
        (strings, numbers)
       complex properties: contain (or point to) other objects
        (e.g., a winery instance)
Property Constraints (facets)
   Property constraints (facets) describe or
    limit the set of possible values for a
    property

     The name of a wine is a string
     The wine producer is an instance of Winery
     A winery has exactly one location
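
The three example facets above can be expressed as simple checks. The sketch below assumes plain Python dataclasses stand in for frames; the class layout and the function `facet_violations` are hypothetical, not the API of any ontology editor.

```python
from dataclasses import dataclass


@dataclass
class Winery:
    name: str
    locations: list  # facet: a winery has exactly one location


@dataclass
class Wine:
    name: object      # facet: the name of a wine is a string
    producer: object  # facet: the producer is an instance of Winery


def facet_violations(wine: Wine) -> list:
    """Return a list of human-readable facet violations for one wine."""
    problems = []
    if not isinstance(wine.name, str):
        problems.append("name must be a string")
    if not isinstance(wine.producer, Winery):
        problems.append("producer must be a Winery")
    elif len(wine.producer.locations) != 1:
        problems.append("winery must have exactly one location")
    return problems


good = Wine("Chianti Classico", Winery("Example Estate", ["Tuscany"]))
bad = Wine(42, "someone")
print(facet_violations(good))
print(facet_violations(bad))
```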
An Example: Domain and Range
[Figure: class (DOMAIN) -> slot -> allowed values (RANGE)]

   When defining a domain or range for a slot, find the most
    general class or classes
   Consider the flavor slot
       Domain: Red wine, White wine, Rosé wine (too specific)
       Domain: Wine (preferred: the most general class)
   Consider the produces slot for a Winery:
       Range: Red wine, White wine, Rosé wine (too specific)
       Range: Wine (preferred: the most general class)
Create Instances
   Create an instance of a class
        The class becomes a direct type of the instance
        Any superclass of the direct type is a type of the
         instance
   Assign slot values for the instance frame
        Slot values should conform to the facet constraints
        Knowledge-acquisition tools often check that
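
As a rough analogy (not a faithful model of a frame system), Python's own classes show the difference between the direct type of an instance and its other types. The class names below are illustrative.

```python
class Wine:
    pass


class RedWine(Wine):
    pass


glass = RedWine()

# The class used to create the instance is its direct type.
direct_type = type(glass)

# Every superclass of the direct type is also a type of the instance.
all_types = [c.__name__ for c in type(glass).__mro__ if c is not object]

print(direct_type.__name__)
print(all_types)
```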
Defining Classes and a Class Hierarchy
   The things to remember:
       There is no single correct class hierarchy
       But there are some guidelines


   The question to ask:
    “Is each instance of the subclass an instance of
      its superclass?”
Transitivity of the Class Hierarchy

   The is-a relationship is
    transitive:
    If B is a subclass of A
    and C is a subclass of B,
    then C is a subclass of A


   A direct superclass of a
    class is its “closest”
    superclass
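
Transitivity means that the full set of superclasses can be derived from the direct ones alone. A minimal sketch, assuming the hierarchy is stored as a hypothetical `direct_super` map from each class to its direct superclasses:

```python
# Hypothetical direct-superclass map matching the A, B, C example.
direct_super = {"B": {"A"}, "C": {"B"}}


def all_superclasses(cls, direct_super):
    """Follow direct-superclass links to collect every superclass of cls."""
    seen = set()
    stack = list(direct_super.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(direct_super.get(parent, ()))
    return seen


print(all_superclasses("C", direct_super))
```

Here B is the direct superclass of C, while A is reached only through transitivity.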
Multiple Inheritance
   A class can have more than
    one superclass
   A subclass inherits slots and
    facet restrictions from all the
    parents
   Different systems resolve
    conflicts differently
Disjoint Classes
   Classes are disjoint if they cannot have common instances
   Disjoint classes cannot have any common subclasses either




   Red wine, White wine, Rosé wine are disjoint
   Dessert wine and Red wine are not disjoint
[Figure: Wine with subclasses Dessert wine, Red wine, White wine, Rosé wine]
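
The rule that disjoint classes cannot share a subclass can be checked mechanically. The sketch below is an assumption-laden illustration: the classes `Port` and `PinkBlend`, the `direct_super` map, and the function names are hypothetical and chosen only to show one legal and one illegal case.

```python
direct_super = {
    "RedWine": {"Wine"},
    "WhiteWine": {"Wine"},
    "RoseWine": {"Wine"},
    "DessertWine": {"Wine"},
    "Port": {"RedWine", "DessertWine"},     # fine: these parents are not disjoint
    "PinkBlend": {"RedWine", "WhiteWine"},  # error: these parents are disjoint
}
disjoint_pairs = [
    ("RedWine", "WhiteWine"),
    ("RedWine", "RoseWine"),
    ("WhiteWine", "RoseWine"),
]


def ancestors(cls):
    """All superclasses of cls (assumes the hierarchy has no cycles)."""
    result = set()
    for parent in direct_super.get(cls, ()):
        result.add(parent)
        result |= ancestors(parent)
    return result


def disjointness_violations():
    """Classes that fall under two classes declared disjoint."""
    found = []
    for cls in sorted(direct_super):
        lineage = ancestors(cls) | {cls}
        for a, b in disjoint_pairs:
            if a in lineage and b in lineage:
                found.append((cls, a, b))
    return found


print(disjointness_violations())
```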
Avoiding Class Cycles
           Danger of multiple
            inheritance: cycles in the
            class hierarchy
           Classes A, B, and C have
            equivalent sets of instances
                 By many definitions, A, B, and
                  C are thus equivalent
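
A cycle of this kind can be detected with a depth-first search over the superclass links. This is a generic sketch, not the check used by any particular tool; the map format and the function name `find_cycle` are assumptions.

```python
def find_cycle(direct_super):
    """Return one cycle as a list of class names, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(cls, path):
        color[cls] = GREY
        path.append(cls)
        for parent in direct_super.get(cls, ()):
            state = color.get(parent, WHITE)
            if state == GREY:
                # parent is already on the current path: a cycle is closed
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        color[cls] = BLACK
        return None

    for cls in list(direct_super):
        if color.get(cls, WHITE) == WHITE:
            cycle = visit(cls, [])
            if cycle:
                return cycle
    return None


print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
print(find_cycle({"B": ["A"], "C": ["B"]}))
```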
The Perfect Family Size
                If a class has only one child,
                 there may be a modeling
                 problem
                If the only Red Burgundy we
                 have is Côtes d’Or, why
                 introduce the sub-hierarchy?
                Compare to bullets in a
                 bulleted list
The Perfect Family Size (II)
                       If a class has more
                        than a dozen children,
                        additional
                        subcategories may be
                        necessary
                       However, if no natural
                        classification exists,
                        the long list may be
                        more natural
Single and Plural Class Names
   A “wine” is not a kind-of “wines”
   A wine is an instance of the class Wines
   Class names should be either
       all singular
       all plural
[Figure: Instance --instance-of--> Class]
Classes and Their Names
   Classes represent concepts in the domain, not
    their names
   The class name can change, but it will still refer
    to the same concept
   Synonym names for the same concept are not
    different classes
       Many systems allow listing synonyms as part of the class
        definition
Content: Top-Level Ontologies
   What does “top-level” mean?
       Objects: tangible, intangible
       Processes, events, actors, roles
       Agents, organizations
       Spaces, boundaries, location
       Time
   IEEE Standard Upper Ontology effort
       Goal: Design a single upper-level ontology
       Process: Merge the upper levels of existing ontologies
CYC: Top-Level Categories
WORDNET: Representation of Subclass Relation among Synsets
Sowa’s Ontology
Ontology Evaluation
   A key factor that makes a particular discipline or
    approach scientific is the ability to evaluate and
    compare the ideas within the area.
   In most practical cases, ontologies are not
    uniquely expressible.
   One can build many different ontologies that
    conceptualize the same body of knowledge.
   We should be able to say which of these
    ontologies better serves some predefined
    criterion.
Categories of Ontology Evaluation
   Those based on comparing the ontology to a
    “golden standard” (which may itself be an ontology).
   Those based on using the ontology in an
    application and evaluating the results of it.
   Those involving comparisons with a source of
    data (e.g. a collection of documents) about the
    domain that is to be covered by the ontology.
   Those where evaluation is done by humans who
    try to assess how well the ontology meets a set
    of predefined criteria, standards, requirements,
    etc.
Different Levels of Evaluation
 Lexical, vocabulary, or Data Layer
 Hierarchy or Taxonomy
 Other Semantic relations
 Context or application level
 Syntactic Level
 Structure, Architecture, Design
 Multiple-criteria approaches
A: Lexical, Vocabulary, or Data Layer
   The focus is on which concepts, instances, facts, etc. have
    been included in the ontology, and the vocabulary used to
    represent or identify these concepts.
   Evaluation on this level tends to involve comparisons with
    various sources of data concerning the problem, as well as
    techniques such as string similarity measures (e.g. edit
    distance).
   MAEDCHE AND STAAB (2002): concepts are compared to a
    “Golden Standard” set of strings that are considered a good
    representation of the concepts.
   Golden standard
       Another ontology
       Taken statistically from a corpus of documents
       Prepared by domain experts.
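
Edit distance, mentioned above as a string similarity measure, can be computed with the standard Levenshtein dynamic program. The sketch below is plain Levenshtein distance and is not claimed to be the exact measure of Maedche and Staab; the helper `closest_golden_term` and the sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_golden_term(term, golden):
    """Pick the golden-standard string nearest to an ontology term."""
    return min(golden, key=lambda g: edit_distance(term.lower(), g.lower()))


print(edit_distance("kitten", "sitting"))  # 3
print(closest_golden_term("Winerys", ["wine", "winery", "grape"]))
```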
B: Hierarchy or Taxonomy
   An ontology typically includes a hierarchical “is-a”
    (subsumption) relation between concepts.
   BREWSTER et al. (2004) used a data-driven
    approach to evaluate the degree of structural fit
    between an ontology and a corpus of documents.
       Cluster the documents to identify the topics present
        in the corpus.
       Each concept c of the ontology is represented by a set of
        terms including its name in the ontology and the
        hypernyms of this name, taken from WordNet.
       Measure how well each concept fits each topic resulting
        from the clustering step.
       A good fit indicates that the structure of the ontology is
        reasonably well aligned with the hidden structure of topics
        in the domain-specific corpus of documents.
C: Context Level
   An ontology may be part of a larger collection of ontologies,
    and may reference or be referenced by various definitions
    in these other ontologies. In this case it may be important
    to take this context into account when evaluating it.
   Swoogle search engine uses cross-references between
    semantic-web documents to define a graph and compute a
    score for each ontology in a manner analogous to PageRank
    used by the Google web search engine. The resulting
    “ontology rank” is used by Swoogle to rank its query
    results.
   An important difference in comparison to PageRank is that
    not all “links” or references between ontologies are treated
    the same. If one ontology defines a subclass of a class from
    another ontology, this reference might be considered more
    important than if one ontology only uses a class from
    another as the domain or range of some relation.
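
The idea of a PageRank-like score in which link types carry different importance can be sketched as a weighted power iteration. This is not Swoogle's actual algorithm or its real weights: the link kinds, the weight values, the file names, and the function `ontology_rank` are all assumptions made for illustration.

```python
def ontology_rank(links, weights, damping=0.85, iterations=50):
    """Weighted PageRank over (source, target, link_kind) references."""
    nodes = sorted({n for src, dst, _ in links for n in (src, dst)})
    out_weight = {n: 0.0 for n in nodes}
    for src, _, kind in links:
        out_weight[src] += weights[kind]

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, dst, kind in links:
            # A source passes on rank in proportion to the link's weight.
            new[dst] += damping * rank[src] * weights[kind] / out_weight[src]
        # Ontologies with no outgoing references spread their rank evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        for n in nodes:
            new[n] += damping * dangling / len(nodes)
        rank = new
    return rank


# Hypothetical weights: subclassing counts more than domain/range use.
weights = {"subclass": 3.0, "domain_range": 1.0}
links = [
    ("wine.owl", "food.owl", "subclass"),
    ("wine.owl", "geo.owl", "domain_range"),
]
ranks = ontology_rank(links, weights)
print(sorted(ranks, key=ranks.get, reverse=True))
```

With these made-up weights, the ontology referenced through a subclass link ends up ranked above the one referenced only as a domain or range.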
D: Application Level
   It may be more practical to evaluate an ontology
    within the context of a particular application, and
    to see how the results of the application are
    affected by the use of the ontology in question.
   The outputs of the application, or its performance
    on the given task, might be better or worse
    depending partly on the ontology used in it.
   One might argue that a good ontology is one
    which helps the application in question produce
    good results on the given task.
E: Syntactic Level
 Mainly of interest for manually constructed ontologies.
 The ontology is usually described in a
  particular formal language and must
  match the syntactic requirements of that
  language (use of the correct keywords,
  etc.).
 This is probably the level that lends itself
  most easily to automated processing.
F: Structure, Architecture, Design
 This is primarily of interest in manually
  constructed ontologies.
 Assuming that some kind of design
  principles or criteria have been agreed
  upon prior to constructing the ontology,
  evaluation on this level means checking to
  what extent the resulting ontology
  matches those criteria.
 This must usually be done largely or even
  entirely manually by people such as
  ontology engineers and domain experts.
G: Multiple-Criteria Approaches
   Selecting a good ontology from a given set of
    ontologies.
   Techniques familiar from the area of decision
    support systems can be used to help us evaluate
    the ontologies and choose one of them.
   These approaches are based on defining several
    decision criteria or attributes:
       for each criterion, the ontology is evaluated and given a
        numerical score.
       A weight is assigned to each criterion.
       An overall score for the ontology is then computed as a
        weighted sum of its per-criterion scores.
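
The weighted-sum selection described above can be written out directly. The criteria names, weights, and scores below are invented for the example; only the scoring rule comes from the text.

```python
def overall_score(scores, weights):
    """Weighted sum of per-criterion scores."""
    return sum(weights[c] * scores[c] for c in weights)


# Hypothetical criteria, weights, and per-criterion scores.
weights = {"coverage": 0.5, "clarity": 0.3, "reuse": 0.2}
candidates = {
    "ontology_a": {"coverage": 0.9, "clarity": 0.6, "reuse": 0.2},
    "ontology_b": {"coverage": 0.7, "clarity": 0.8, "reuse": 0.5},
}

for name, scores in candidates.items():
    print(name, round(overall_score(scores, weights), 2))

best = max(candidates, key=lambda n: overall_score(candidates[n], weights))
print("selected:", best)
```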
Example: Selecting an Ontology (Type G)
Ontology Auditor Metrics Suite
  Metric              Attribute           Description
  Syntactic Quality   Lawfulness          Correctness of syntax used
                      Richness            Breadth of syntax used
  Semantic Quality    Interpretability    Meaningfulness of terms
                      Consistency         Consistency of meaning of terms
                      Clarity             Average number of word senses
  Pragmatic Quality   Comprehensiveness   Amount of information
                      Accuracy            Accuracy of information
                      Relevance           Relevance of information for a task
  Social Quality      Authority           Extent to which other ontologies rely on it
                      History             Number of times ontology has been used
Example Cont.: Overall Quality Metric
   Overall quality (Q) is a weighted function of its
    constituents:
      Q = c1 × S + c2 × E + c3 × P + c4 × O
           where
            S = syntactic quality
            E = semantic quality
            P = pragmatic quality
            O = social quality, and
            c1+c2+c3+c4 = 1

   The weights sum to unity and are currently set
    by the user or the application, or else assumed
    equal
Example Cont.: Syntactic Quality (S)
   Measures the quality of the ontology
    according to the way it is written.
            Lawfulness
                  refers to the degree to which an ontology language’s
                   rules have been complied with.
            Richness
                  refers to the proportion of features in the ontology
                   language that have been used in an ontology

 Syntactic Quality (S)    S = b1 × SL + b2 × SR

 Lawfulness (SL)          Let X be the total number of syntactical rules. Let Xb be the
                          total number of breached rules. Let NS be the number of
                          statements in the ontology. Then SL = Xb / NS.

 Richness (SR)            Let Y be the total number of syntactical features available in
                          the ontology language. Let Z be the total number of syntactical
                          features used in this ontology. Then SR = Z/Y.
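
A small worked sketch of the syntactic quality formulas, following them literally as stated above. The input counts and the equal weights b1 = b2 = 0.5 are assumptions for the example.

```python
def lawfulness(breached_rules: int, statements: int) -> float:
    """SL = Xb / NS, exactly as the formula is stated."""
    return breached_rules / statements


def richness(features_used: int, features_available: int) -> float:
    """SR = Z / Y."""
    return features_used / features_available


def syntactic_quality(sl: float, sr: float, b1: float = 0.5, b2: float = 0.5) -> float:
    """S = b1 * SL + b2 * SR (equal weights are an assumption)."""
    return b1 * sl + b2 * sr


# Hypothetical counts: 2 breached rules in 40 statements, 6 of 24 features used.
sl = lawfulness(2, 40)
sr = richness(6, 24)
print(sl, sr, syntactic_quality(sl, sr))
```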
Example Cont.: Semantic Quality (E)
   Evaluates the meaning of terms in the
    ontology library.
        Interpretability
               refers to the meaning of terms in the ontology
        Consistency
               whether terms have consistent meaning
        Clarity
               whether the context of terms is clear
 Semantic Quality (E)     E = b1 × EI + b2 × EC + b3 × EA

 Interpretability (EI)    Let C be the total number of terms used to define classes and
                          properties in the ontology. Let W be the number of terms that
                          have a sense listed in WordNet. Then EI = W/C.

 Consistency (EC)         Let I = 0. Let C be the number of classes and properties in the
                          ontology. For each Ci, if its meaning in the ontology is
                          inconsistent, increment I, so that I is the number of terms with
                          inconsistent meaning. Then EC = I/C.

 Clarity (EA)             Let Ci be the name of a class or property in the ontology. For
                          each Ci, count Ai (the number of word senses for that term in
                          WordNet). With A the total of the Ai, EA = A/C.
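
The three semantic attributes can be computed from simple counts. In the sketch below the WordNet lookup is replaced by a hand-written dictionary of made-up sense counts (not real WordNet values), the set of inconsistent terms is assumed to come from a manual review, and A is taken to be the sum of the per-term sense counts.

```python
# Hypothetical stand-in for WordNet: term -> number of word senses.
sense_counts = {"wine": 2, "winery": 1, "color": 8}

# Hypothetical class and property names from an ontology.
terms = ["wine", "winery", "color", "hasMaker"]

# Hypothetical result of a manual consistency review.
inconsistent_terms = {"color"}


def interpretability(terms, sense_counts):
    """EI = W / C: share of terms with at least one listed sense."""
    with_sense = sum(1 for t in terms if sense_counts.get(t, 0) > 0)
    return with_sense / len(terms)


def consistency(terms, inconsistent_terms):
    """EC = I / C: share of terms flagged as inconsistent."""
    flagged = sum(1 for t in terms if t in inconsistent_terms)
    return flagged / len(terms)


def clarity(terms, sense_counts):
    """EA = A / C: average number of word senses per term."""
    total_senses = sum(sense_counts.get(t, 0) for t in terms)
    return total_senses / len(terms)


ei = interpretability(terms, sense_counts)
ec = consistency(terms, inconsistent_terms)
ea = clarity(terms, sense_counts)
print(ei, ec, ea)
```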
Example Cont.: Pragmatic Quality (P)
   Refers to an ontology’s usefulness for users or their
    agents, irrespective of syntax or semantics.
         Accuracy
               whether the claims an ontology makes are ‘true.’
         Comprehensiveness
               measure of the size of the ontology.
         Relevance
               whether ontology satisfies the agent’s specific requirements.
 Pragmatic Quality (P)     P = b1 × PO + b2 × PU + b3 × PR

 Comprehensiveness (PO)    Let C be the total number of classes and properties in the
                           ontology. Let V be the average value for C across the entire
                           library. Then PO = C/V.

 Accuracy (PU)             Let NS be the number of statements in the ontology. Let F be
                           the number of false statements. Then PU = F/NS. Requires
                           evaluation by a domain expert and/or a truth maintenance
                           system.

 Relevance (PR)            Let NS be the number of statements in the ontology. Let S be
                           the type of syntax relevant to the agent. Let R be the number
                           of statements within NS that use S. Then PR = R/NS.
Example Cont.: Social Quality (O)
   Reflects that agents and ontologies exist
    in communities.
       Authority
             number of other ontologies that link to it
       History
             number of times the ontology is accessed

 Social Quality (O)    O = b1 × OT + b2 × OH

 Authority (OT)        Let an ontology in the library be OA. Let the set of other
                       ontologies in the library be L. Let the total number of links
                       from ontologies in L to OA be K. Let the average value for K
                       across the ontology library be V. Then OT = K/V.

 History (OH)          Let the total number of accesses to an ontology be A. Let the
                       average value for A across the ontology library be H.
                       Then OH = A/H.
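
Both social attributes compare one ontology against the library average, which a short sketch makes concrete. The library contents, link counts, access counts, and equal weights below are invented for the example.

```python
# Hypothetical library statistics: incoming links and access counts.
incoming_links = {"onto_a": 6, "onto_b": 2, "onto_c": 1}
accesses = {"onto_a": 30, "onto_b": 50, "onto_c": 10}


def relative_to_average(name, counts):
    """Value for one ontology divided by the library-wide average."""
    average = sum(counts.values()) / len(counts)
    return counts[name] / average


def social_quality(name, b1=0.5, b2=0.5):
    """O = b1 * OT + b2 * OH (equal weights are an assumption)."""
    authority = relative_to_average(name, incoming_links)  # OT = K / V
    history = relative_to_average(name, accesses)          # OH = A / H
    return b1 * authority + b2 * history


print(social_quality("onto_a"))
```

A value above 1 for either attribute means the ontology is linked to, or accessed, more often than the library average.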
The End

				