The Semantic Web


Is it “The Web shortcut to A.I.”?

       Or more? Or less?
Evolution of World Wide Web

[Figure: eras of the Web plotted by richness of data connections against richness of social connections]
   PC era: FTP, file systems, SGML, file servers, SQL, databases, Gopher
   WWW era (Web 1.0): HTTP/HTML, groupware, OO/Java, intranet, XML, portals
   Web 2.0: SOAP, blogs/wikis, AJAX/JSON, social networks, ATOM, cloud computing & SaaS
   Semantic Web era (Web 3.0, 2010-2020): RDF/OWL, SPARQL, WWW database, Linked Data, rule interchange, personal agents
Motivation for Semantic Web

[Figure: two diagrams. Before the Semantic Web: creators and users meet through web content on the WWW and beyond. Semantic Web structure: the same picture with ontologies, logical support, tools and applications layered on top of the web content.]
Linked Data: The World Wide
Web database
The Semantic Web
   A Vision Of Possibilities
   “The Semantic Web is an extension of
    the current web in which information
    is given well-defined meaning, better
    enabling computers and people to
    work in cooperation.”
   -- Tim Berners-Lee, James Hendler and Ora Lassila,
    The Semantic Web, Scientific American, May 2001

Semantic Web
   In the Semantic Web we will need:
       Machines talking to machines – semantics need to
        be unambiguously declared
       Joined-up data – enabling complex tasks based on
        information from various sources
       Wide scope – from, say, home to government to
       Trust – both in data and who is saying it

   This is not going to be easily achieved

   Semantic Web vs Semantic Technologies

Semantic Web                               Semantic Technologies
Semantic Web formats (RDF, OWL, etc.)      Natural-language processing
Query language (SPARQL)                    Data mining/Machine learning
Rules language (RIF)                       Artificial intelligence/Expert systems
Web page markup language (RDFa)            Classification
Triple/Quad stores                         Semantic search
    Semantic web applications
   Examples:
       Personal information management (Chandler)
       Social networking (FOAF)
       Information syndication (RSS,PRISM)
       Library/museum data (Dublin Core, Harmony)
       Network security and configuration (SWAD-E)
Interesting quotes
   “Knowledge representation (…) is
    clearly a good idea, and some very nice
    demonstrations exist, but it has not yet
    changed the world.”

Meaning: Of course the Semantic Web
 will do that. Will it?
Today's web
   It is designed for human consumption
   Information retrieval is mainly
    supported by keyword-based search
   Some problems with information
      High recall, low precision

      Low or no recall

      Results are highly sensitive to
            But what about machines?



            Machines still have a very minimal
            understanding of text and images.
   machine-friendly data
       Natural language
         as seen by a person:  Li Ding is a person
         as seen by a machine: an uninterpreted string of symbols
       XML – represents structure
         as seen by a person:  <person>Li Ding</person>
         as seen by a machine: nested tags whose names carry no meaning
       Semantic Web – represents more semantics
            represents structure
            enables a common vocabulary
            associates symbols with a logical interpretation for inference
  The Semantic Web
  XML    Customized tags, like:

+ RDF    Relations, in triples, like:
    (Nena) (is_dog_of) (Ahmed/Said)

+ Ontologies        Hierarchies of concepts, like
    mammal -> canine -> Coton de Tulear -> Nena

+ Inference rules            Like:
    If (person) (owns) (dog), then (person) (cares_for) (dog)

= Semantic Web!
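
The layering above (triples, a concept hierarchy, and an inference rule) can be sketched in a few lines of Python. This is only an illustrative toy, not an RDF implementation: the triple store is a plain set of tuples, and the predicate names `owns`, `is_a` and `cares_for` as well as the helper functions are assumptions made for the example.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
triples = {
    ("Ahmed", "owns", "Nena"),
    ("Nena", "is_a", "Coton de Tulear"),
}

# Ontology: each concept points to its more general parent concept.
parent = {
    "Coton de Tulear": "canine",
    "canine": "mammal",
}

def ancestors(concept):
    """Return every concept above `concept` in the hierarchy, nearest first."""
    found = []
    while concept in parent:
        concept = parent[concept]
        found.append(concept)
    return found

def is_a(thing, concept):
    """True if `thing` is typed as `concept` or as one of its subconcepts."""
    for s, p, o in triples:
        if s == thing and p == "is_a":
            if o == concept or concept in ancestors(o):
                return True
    return False

def apply_rule(facts):
    """Inference rule: if (person) (owns) (dog), then (person) (cares_for) (dog)."""
    inferred = set(facts)
    for s, p, o in facts:
        if p == "owns" and is_a(o, "canine"):
            inferred.add((s, "cares_for", o))
    return inferred

inferred = apply_rule(triples)
print(sorted(inferred - triples))
```

The rule fires only because the hierarchy lets the program conclude that Nena is a canine, which is the point of combining ontologies with inference rules.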
             Semantic Web Layers

[Figure: Semantic Web layer diagram]
     XML (eXtensible Markup Language)

 Standard for information and exchange

    HTML: restricted set of tags, e.g. <TABLE>, <H1>, <B>, etc.
    XML: you can create your own tags
 Selena Sol (2000) highlights four major benefits of using XML:
    XML separates data from presentation, so changing how data is
    displayed does not affect the XML data;
    Searching XML documents becomes easier because search engines can
    parse the description-bearing tags;
    XML tags are human-readable: even a person with no knowledge of XML
    can still read an XML document;
    Complex structures and relations of data can be encoded using XML.

               XML: An Example

• XML is a semi-structured language
     <Book Id="B105">
       <Title> Topics in Optimal Transportation </Title>
       <Name> Cedric Villani </Name>
       <Name> American Mathematical Society </Name>
       <Place> New York </Place>
     </Book>
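
Because the tags describe the data, a program can pick out fields by name. The following minimal sketch uses Python's standard `xml.etree.ElementTree` module on a shortened, well-formed version of the book record; the variable names are chosen for the example only.

```python
import xml.etree.ElementTree as ET

# A shortened, well-formed version of the book record.
book_xml = """
<Book Id="B105">
  <Title>Topics in Optimal Transportation</Title>
  <Name>Cedric Villani</Name>
</Book>
"""

book = ET.fromstring(book_xml)

book_id = book.get("Id")                          # attribute of the root element
title = book.findtext("Title")                    # text of the first <Title> child
names = [n.text for n in book.findall("Name")]    # text of every <Name> child

print(book_id, title, names)
```

Note that the program still has no idea what a "Title" is; XML provides structure, not meaning.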
RDF Motivation
   The Resource Description Framework (RDF)
    is a language for representing information
    about resources in the World Wide Web.
   RDF is intended for situations in which this
    information needs to be processed by
    applications, rather than being only displayed
    to people.
   RDF is based on the idea of identifying things
    using Web identifiers (URIs).
                The Semantic Web is simple
        Don't say "colour" say <>
              Each URI denotes a concept
              URIs are connected by triples

              Relational database                            RDF (Resource Description Framework)
              Machines read data as directed RDF graph

Source: Tim Berners-Lee, Putting the Web back into Semantic Web, ISWC2005 Keynote
      RDF Basic Concepts

    “Imagine trying to state that someone named Ahmed
    Hassan created a particular Web page.”
    In RDF terms: the web page has a creator whose value is Ahmed Hassan

   the thing the statement describes (the web page's URL)
   a specific property of the thing (e.g. creator)
   the concrete message the statement wants to give,
    in other words the value of the property (Ahmed Hassan)
       RDF Basic Concepts
RDF terminology
   the part that identifies the thing the statement is about is called the subject
   the part that identifies the property is called the predicate
   the part that identifies the value of the property is called the object

In the statement “the web page has a creator whose value is Ahmed Hassan”:
   the subject is the URL
   the predicate is the word “creator”
   the object is the name “Ahmed Hassan”
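
As a minimal sketch, a single statement can be held in a three-field record. The page URL below is a hypothetical placeholder (the slides do not preserve the original), and the `Triple` class and `describe` helper are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # identifies the thing the statement is about
    predicate: str  # identifies the property
    object: str     # identifies the value of the property

# Hypothetical URL standing in for "a particular Web page".
PAGE = "http://www.example.org/index.html"

statement = Triple(PAGE, "creator", "Ahmed Hassan")

def describe(t):
    """Spell a triple out in the sentence form used on the slide."""
    return f"{t.subject} has a {t.predicate} whose value is {t.object}"

print(describe(statement))
```

In real RDF the predicate would itself be a URI rather than the bare word "creator", which is what removes ambiguity between vocabularies.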
      RDF Model
      As mentioned:
             RDF makes statements about resources
             Each statement consists of a subject, a predicate and an object
             Example: the web page (subject) has a creator (predicate) whose value is Ahmed Hassan (object)



  RDF Basic Concepts
To make these statements machine-processable,
two things are needed:
      a system of machine-processable identifiers (for subjects,
       predicates and objects) without any possibility of confusion between
       similar-looking identifiers
  Uniform Resource Identifiers (URIs) make it possible to identify and
  uniquely name things, even if they have no network-accessible
  location.

      a machine-processable language for representing these statements
       and exchanging them between machines

  RDF defines an XML markup language, named RDF/XML,
  for representing RDF statements.
       RDF Syntax


          August 16, 1999

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf=""
         xmlns:exterms="">
   <rdf:Description rdf:about="">
     <exterms:creation-date>August 16, 1999</exterms:creation-date>
   </rdf:Description>
</rdf:RDF>
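
To show how such a document maps back to triples, here is a small sketch using only Python's standard library. The RDF namespace URI is the standard one; the `example.org` URIs are hypothetical stand-ins for the values missing from the slide, and `simple_triples` handles only this flat pattern (an `rdf:Description` with literal-valued properties), not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace

# The example.org URIs below are assumptions made for illustration.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>
"""

def simple_triples(document):
    """Extract (subject, predicate, literal) triples from flat RDF/XML."""
    root = ET.fromstring(document)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, local_name = prop.tag[1:].split("}")
            result.append((subject, namespace + local_name, prop.text))
    return result

for triple in simple_triples(rdf_xml):
    print(triple)
```

A production application would use a dedicated RDF parser, since RDF/XML allows many equivalent ways of writing the same graph.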

RDF Developments
   We have seen that:
       RDF looks complex
       There are still some uncertain areas
   Let’s now look at:
       A simple RDF application
       Browser support
       Project work
       Related work which may:
            require the Semantic Web
            be used to build the Semantic Web
        A Lightweight RDF Application

   RSS (RDF Site Summary):
     Example of a lightweight RDF application

     A format for news syndication

     Worth looking at for:

              News syndication
              Gaining experience of an RDF application
       See:
        Browser Support

       The Mozilla open
        source browser is
        using RDF to
        integrate and

RDF Conclusion
   Expressivity of RDF is limited; it cannot express:
       Local scope of properties
       Disjointness of classes
       Boolean combinations of classes
       Cardinality restrictions
       Special characteristics of properties
   Need for standardized ontology language
    that builds upon existing concepts of RDF

    => OWL Web Ontology Language
What Is An Ontology
   An ontology is an explicit description of a domain:
       concepts
       properties and attributes of concepts
       constraints on properties and attributes
       Individuals (often, but not always)
   An ontology defines
       a common vocabulary
       a shared understanding
Ontology Examples
   Taxonomies on the Web
       Yahoo! categories
   Catalogs for on-line shopping
     product catalog
   Domain-specific standard terminology
       Unified Medical Language System (UMLS)
       UNSPSC - terminology for products and services
Ontology-Development Process

 determine scope -> consider reuse -> enumerate terms -> define classes ->
 define properties -> define constraints -> create instances

In reality - an iterative process:
 determine scope -> consider reuse -> enumerate terms -> consider reuse ->
 define classes -> enumerate terms -> define classes -> define properties ->
 define classes -> define properties -> define constraints -> create instances ->
 define classes -> create instances -> consider reuse -> define properties ->
 define constraints -> create instances
       Determine Domain and Scope

   What is the domain that the ontology will cover?
   What are we going to use the ontology for?
   What types of questions should the information
    in the ontology answer
    (competency questions)?
    Answers to these questions may change during
                      the lifecycle
        Consider Reuse

   Why reuse other ontologies?
       to save effort
       to interact with the tools that use other ontologies
       to use ontologies that have been validated
        through use in applications
What to Reuse?
   Ontology libraries
       DAML ontology library
       Ontolingua ontology library
       Protégé ontology library
   Upper ontologies
       IEEE Standard Upper Ontology
       Cyc
What to Reuse? (II)
   General ontologies
       DMOZ
       WordNet
   Domain-specific ontologies
       UMLS Semantic Net
       GO (Gene Ontology)
        Enumerate Important Terms

   What are the terms we need to talk about?
   What are the properties of these terms?
   What do we want to say about the terms?
Define Classes and the Class Hierarchy

   A class is a concept in the domain
       a class of courses
       a class of students
       a class of graduate students
   A class is a collection of elements with similar properties
   Instances of classes
       a student of the AI course who will come today
Class Inheritance
   Classes usually constitute a taxonomic
    hierarchy (a subclass-superclass hierarchy)
   A class hierarchy is usually an IS-A hierarchy:
    an instance of a subclass is an instance of a superclass
   If you think of a class as a set of elements, a
    subclass is a subset
Class Inheritance - Example
   Apple is a subclass of Fruit
    Every apple is a fruit
   Student is a subclass of Person
    Every Student is a Person
   Graduate Student is a subclass of Student
    Every Graduate Student is a Student
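
The IS-A reading of the hierarchy maps directly onto class inheritance in an object-oriented language. The sketch below uses Python classes named after the slide's examples, plus a set-based view of "a subclass is a subset"; the member identifiers are invented for illustration.

```python
class Fruit: ...
class Apple(Fruit): ...

class Person: ...
class Student(Person): ...
class GraduateStudent(Student): ...

# An instance of a subclass is an instance of every superclass.
g = GraduateStudent()
print(isinstance(g, Student), isinstance(g, Person))

# Set view: the members of a subclass form a subset of the superclass members.
students = {"s1", "s2", "s3"}          # hypothetical student identifiers
graduate_students = {"s2"}
print(graduate_students <= students)   # subset test
```

The relation only runs one way: every graduate student is a student, but not every student is a graduate student.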
Modes of Development
   top-down – define the most general
    concepts first and then specialize them
   bottom-up – define the most specific
    concepts and then organize them in
    more general classes
   combination – define the more salient
    concepts first and then generalize and
    specialize them
Define Properties of Classes – Slots

   Slots in a class definition describe
    attributes of instances of the class and
    relations to other instances
    Each Student will have Name, GPA, Address,
Slots for the Class

      (in Protégé-2000)
        Property Constraints

   Property constraints (facets) describe or
    limit the set of possible values for a slot
    The name of a Student is a string
    The Birth date is an instance of Date
    A student has exactly one Address
Common Facets
   Slot cardinality – the number of values
    a slot has
   Slot value type – the type of values a
    slot has
   Minimum and maximum value – a range
    of values for a numeric slot
   Default value – the value a slot has
    unless explicitly specified otherwise
Common Facets: Value Type
   String: a string of characters (“Château
   Number: an integer or a float (15, 4.5)
   Boolean: a true/false flag
   Enumerated type: a list of allowed values
    (high, medium, low)
   Complex type: an instance of another class
       Specify the class to which the instances belong
    The Wine class is the value type for the slot
      “produces” at the Winery class
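
Facets can be thought of as checks applied to the values stored in a slot. The following sketch models cardinality, value type and enumerated values; the `Slot` class and `check` function are hypothetical names for this example and do not mirror the API of Protégé or any other tool.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    name: str
    value_type: type                        # value-type facet
    min_cardinality: int = 0                # cardinality facets
    max_cardinality: Optional[int] = None
    allowed_values: Optional[tuple] = None  # enumerated-type facet

def check(slot, values):
    """Return a list of facet violations for the values assigned to a slot."""
    problems = []
    if len(values) < slot.min_cardinality:
        problems.append(f"{slot.name}: too few values")
    if slot.max_cardinality is not None and len(values) > slot.max_cardinality:
        problems.append(f"{slot.name}: too many values")
    for v in values:
        if not isinstance(v, slot.value_type):
            problems.append(f"{slot.name}: {v!r} has the wrong type")
        elif slot.allowed_values is not None and v not in slot.allowed_values:
            problems.append(f"{slot.name}: {v!r} is not an allowed value")
    return problems

# "The name of a Student is a string"; "A student has exactly one Address".
name_slot = Slot("name", str, min_cardinality=1, max_cardinality=1)
address_slot = Slot("address", str, min_cardinality=1, max_cardinality=1)
level_slot = Slot("level", str, allowed_values=("high", "medium", "low"))

print(check(address_slot, ["Street A", "Street B"]))
```

A complex value type (an instance of another class) fits the same scheme: `value_type` would simply be that class.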
        Create Instances

   Create an instance of a class
       The class becomes a direct type of the instance
       Any superclass of the direct type is a type of the instance
   Assign slot values for the instance frame
       Slot values should conform to the facet constraints
       Knowledge-acquisition tools often check that
Creating an Instance: Example
Ontologies and the SW Languages

      Most Semantic Web languages are designed
       explicitly for representing ontologies
          RDF Schema
          DAML+OIL
          SHOE
          XOL
          XML Schema
Ontology Languages: RDFS and OWL
   RDFS
       Set theory – rdfs:Class
       Relation – rdf:Property, rdfs:domain, rdfs:range
       Hierarchy – rdfs:subClassOf, rdfs:subPropertyOf
       Built-in Datatype – xsd:string, xsd:dateTime
   OWL
       Description Logic
             Class, Thing, Nothing
             DatatypeProperty, ObjectProperty, AnnotationProperty,…
       Class axioms
             oneOf, disjointWith, unionOf, complementOf, intersectionOf …
             Restriction, onProperty, cardinality, hasValue…
       Property axioms
             inverseOf , TransitiveProperty , SymmetricProperty
             FunctionalProperty, InverseFunctionalProperty
       Equality– equivalentClass , sameAs , differentFrom…
       Ontology annotation – Ontology, imports, versionInfo
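
To give a feel for what an RDFS inference engine does with rdfs:subClassOf, here is a naive sketch that repeatedly applies two rules until nothing new is derived: subclass transitivity and propagation of rdf:type up the hierarchy. The `ex:` names are illustrative, and real reasoners implement many more rules far more efficiently.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Apply two RDFS rules until a fixed point is reached."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closure:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in closure:
                # (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # (x type A), (A subClassOf B) => (x type B)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if not new <= closure:
            closure |= new
            changed = True
    return closure

facts = {
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:GraduateStudent"),
}

closure = rdfs_closure(facts)
for triple in sorted(closure - facts):
    print(triple)
```

OWL constructs such as disjointWith or cardinality restrictions need a description-logic reasoner and cannot be handled by simple forward rules like these.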
               More languages and more ontologies
    Languages (require special inference engine)
          [Trust/Uncertainty] BayesOWL

          [Proof] PML (Proof Markup Language)

          [Query/Data Access] SPARQL Query Language for RDF
          [Rule] SWRL( Semantic Web Rule Language)
          [Policy] REI: A Policy Specification Language

          [Service] OWL-S by DAML (1.2 preview available)
          [Service] SAWSDL (Semantic Annotations for WSDL)
          [Thesauri] SKOS (Simple Knowledge Organization System)

    Ontologies (only need RDFS and/or OWL inference)
          Upper ontologies - OpenCyc, WordNet, OntoSem, SUO
          Specialized common ontologies - FOAF, Dublin Core, RSS
          Domain ontologies – bibtex, biology, and many…
    Li Ding, Pranam Kolari, Zhongli Ding, and Sasikanth Avancha, “Using Ontologies in the Semantic Web: A Survey”, in Ontologies
    in the Context of Information Systems (book chapter), 2005.
                Semantic Web Tools
[Figure: map of Semantic Web tools grouped by role]
   Online registries: DAML Ontology Library, Schema Web
   Search engine: Swoogle (Semantic Web search)
   Browsers: Tabulator, IsaViz, Piggybank, Arago, Horus, Mspace, Magpie
   Managing ontologies: Protégé, Swoop
   Reasoners: Pellet (DL), Racer (DL), FACT++ (DL), Jena, JTP, F-OWL, Euler, CWM
   Triple, instance and RDF stores: Jena (SPARQL), KAON, Kowari, Sesame, OWLIM, Tap, Yars, IBM IODT, RDFLib, RDF Gateway, Allegro
   Mapping tools (integration): ONION, PROMPT, OntoMapper, Glue, OntoMerge, Ontomorph
source1:    Oracle 10

        Swoogle Semantic Web Search Engine
   Harvesting Semantic Web data
    from the Web
   Provide search/navigation
    services for machines (via
       Digest doc, term, namespace
       Links
   Also serves human users
   Status
       Running since summer 2004
       1.6M RDF documents, 300M RDF
        triples, 10K ontologies

• Jena is a Java framework for building Semantic Web applications. It
  provides a programmatic environment for RDF, RDFS and OWL,
  including a rule-based inference engine.
• Jena is open source and grew out of work with the HP Labs Semantic
  Web Program.
• The Jena Framework includes:
       An RDF API
       Reading and writing RDF in RDF/XML, N3 and N-Triples
       An OWL API
       In-memory and persistent storage
       RDQL – a query language for RDF

 Jena Integration of Protégé-OWL
• Jena is one of the most widely used Java APIs for RDF and OWL,
  providing services for model representation, parsing, database
  persistence, querying and some visualization tools. Protege-OWL
  has always had a close relationship with Jena. The Jena ARP parser is still
  used in the Protege-OWL parser, and various other services such as
  species validation and datatype handling have been reused from Jena.
  It was furthermore possible to convert a Protege OWLModel into a Jena
  OntModel, to get a static snapshot of the model at run time. This
  model, however, had to be rebuilt after each change in the model.
• As of August 2005, Protege-OWL is much more closely integrated with
  Jena. This integration allows programmers to use certain Jena
  functions at run time, without having to go through the slow rebuild
  process each time. The architecture of this integration is illustrated on
  the next slide…
SPARQL dataset finder
           Who knows Anupam Joshi?
           Show me their names, email address
           and pictures
                      1. Compose a SPARQL query
                      without FROM clause

                     2. Parse SPARQL query, search
                     Swoogle for related URLs,
                     and compose a dataset

                      3. Run SPARQL query on dataset
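
A sketch of steps 1 and 3 follows. The SPARQL text uses the real FOAF vocabulary (foaf:knows, foaf:name, foaf:mbox, foaf:depiction) and has no FROM clause; the dataset, the blank-node labels and the Python function `who_knows` are invented for illustration and only imitate what a SPARQL engine would do on the dataset composed in step 2.

```python
# Step 1: the question as a SPARQL query without a FROM clause.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox ?img
WHERE {
  ?joshi  foaf:name      "Anupam Joshi" .
  ?person foaf:knows     ?joshi ;
          foaf:name      ?name ;
          foaf:mbox      ?mbox ;
          foaf:depiction ?img .
}
"""

# Step 2 would search for documents to load; here a tiny made-up dataset.
dataset = [
    ("_:a", "foaf:name", "Anupam Joshi"),
    ("_:b", "foaf:name", "Example Person"),
    ("_:b", "foaf:knows", "_:a"),
    ("_:b", "foaf:mbox", "mailto:person@example.org"),
    ("_:b", "foaf:depiction", "http://example.org/person.jpg"),
]

def value(subject, predicate):
    """Return the first object for (subject, predicate), or None."""
    for s, p, o in dataset:
        if s == subject and p == predicate:
            return o
    return None

def who_knows(name):
    """Step 3 (imitated): names, mailboxes and pictures of people who know `name`."""
    targets = {s for s, p, o in dataset if p == "foaf:name" and o == name}
    rows = []
    for s, p, o in dataset:
        if p == "foaf:knows" and o in targets:
            rows.append((value(s, "foaf:name"),
                         value(s, "foaf:mbox"),
                         value(s, "foaf:depiction")))
    return rows

print(who_knows("Anupam Joshi"))
```

Unlike this imitation, a real SPARQL engine would drop a person entirely if any of the three requested properties were missing, unless the query marked them OPTIONAL.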

Joseki - a SPARQL Server for Jena

• Joseki: The Jena RDF Server. Joseki is a server for publishing RDF
  models on the web. Models have URLs and they can be accessed by HTTP
  GET. Joseki is part of the Jena RDF framework.
• Joseki is an HTTP and SOAP engine that supports the SPARQL Protocol and
  the SPARQL RDF Query language. SPARQL is developed by the W3C
  RDF Data Access Working Group.
• Joseki Features:
       RDF Data from files and databases
       HTTP (GET and POST) implementation of the SPARQL protocol
       SOAP implementation of the SPARQL protocol
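
In the HTTP GET binding of the SPARQL protocol, the query text travels URL-encoded in a `query` parameter. The sketch below only builds such a request URL and does not contact any server; the endpoint address is a hypothetical placeholder, not a documented Joseki default.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address, used purely for illustration.
ENDPOINT = "http://localhost:2020/sparql"

def sparql_get_url(endpoint, query):
    """Build the URL for a SPARQL protocol query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})

query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, query)
print(url)

# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Long queries are usually sent with POST instead, since URLs have practical length limits.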

            Integrating Social Networks
[Figure: the same group of researchers (L. Ding, T. Finin, A. Joshi, L. Kagal, H. Chen, F. Perich, P. Kolari, Y. Peng, J. Golbeck, J. Hendler, A. Sheth, M. P. Singh) shown in three linked networks: the FOAF network (“knows” links, with hub, source and sink nodes), the DBLP coauthor network (weighted links, matched by sameName) and Golbeck's trust network, next to reputation systems (Google PageRank, Citeseer Rank).]
   Data
        FOAF: “knows” relations, in RDF/XML
        DBLP: coauthor relations, in HTML
        Trust: reputation, trust network
   Computation
        Entity mapping
        Tie strength
        Trust aggregation
      PML: Proof Markup Language
[Figure: justification trace. A Query foo:query1, “(type TonysSpecialty ?x)”, for the Question foo:question1, “(what is Tony’s Specialty)”, is linked by hasAnswer to a NodeSet foo:ns1 with a conclusion. Each NodeSet is the consequent (isConsequentOf) of an InferenceStep, which carries hasInferencEngine, hasRule, hasVariableMapping and hasSourceUsage (SourceUsage: hasSource, usageTime, …) and leads to a further NodeSet foo:ns2. Links fromQuery and fromAnswer tie the steps back to the query; IWBase appears as the registry.]

What is a Domain-Specific Markup Language

• Medium of
  communication for
  users of the domain

• Follows XML syntax

• Encompasses the
  semantics of the
  domain
         Examples of DSMLs

 MML: Medical Markup Language
 CML: Chemical Markup Language
 MatML: Materials Markup Language
 WML: Wireless Markup Language
 MathML: Mathematics Markup Language

MathML: Presentation Markup in
Villani’s works
       <mi> H </mi>
       <mo> = </mo>
       <mo> ∫ </mo>
       <mi> ρ </mi>
       <mo> log </mo>
       <mi> ρ </mi>
       <mo> d</mo>
       <mi> v </mi>
A Search Engine Using
    Semantic Web
   Semantic search: extending traditional search
    with the semantic web technology
       Exploiting the explicit meaning of documents (i.e.,
        ontology-based metadata)

   Current semantic search tools
       Form-based, e.g., SHOE, Magnet
       View-based, e.g., GRQL, SQoogle, Ontogator,
       QA-based, e.g., AquaLog, ORAKEL
       Keyword-based, e.g., TAP, Squiggle, DOSE
       The search process
   Step1: making sense of the user query
   Step2: translating user queries into
    formal queries
   Step3: Querying the back-end semantic
    data repository
   Step4: Ranking
     Making sense of user queries
   Finding out the meaning of keywords
       Class, e.g., the keyword “phd students”
       Relation, e.g., “author”
       Instance, e.g., “Enrico”, “director”

   Method: text search
       Labels (rdfs:label)
       Short literals also used in the case of instances
            When searching for “director”, the instances can be picked up.
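
The text-search method can be sketched as a lookup over labels, with short literals consulted only for instances. The entity records, the `ex:` identifiers and the `match_keyword` function are invented for this example; a real system would search an index built from rdfs:label values in the repository.

```python
# Hypothetical semantic entities with their labels and (for instances) literals.
entities = [
    {"uri": "ex:PhDStudent", "kind": "class", "label": "PhD students"},
    {"uri": "ex:hasAuthor", "kind": "relation", "label": "author"},
    {"uri": "ex:enrico", "kind": "instance", "label": "Enrico",
     "literals": ["director"]},
]

def match_keyword(keyword):
    """Return (kind, uri) pairs whose label or short literal contains the keyword."""
    needle = keyword.lower()
    hits = []
    for entity in entities:
        texts = [entity["label"]]
        if entity["kind"] == "instance":
            # Short literals are also searched in the case of instances.
            texts += entity.get("literals", [])
        if any(needle in text.lower() for text in texts):
            hits.append((entity["kind"], entity["uri"]))
    return hits

print(match_keyword("phd students"))
print(match_keyword("director"))
```

A keyword may match several entities of different kinds, which is why the next step has to consider multiple interpretations of the same query.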
Translating user queries into formal queries

   Input: semantic entity matches of the
    search keywords
       Each keyword -> multiple matches

   Output: formal queries which reflect
    the user query
       One user query -> multiple formal queries.
      Simple queries

   There are only two keywords involved: <subject:keyword>
   Fixed number of combination types
    Subject match   Keyword match   Example
    Class           Class           <news: phd students>
    Class           Property        <news: author>
    Class           Instance        <news: chief scientist>
    Instance        Property        <victoria:author>
    Instance        Instance        <victoria:yuangui>
    Property        Instance        <member: x-media>
    Property        Property        <member: author>
   Pattern: Subject -> Class Cs; Keyword -> Class Ck
   Results: <Is,Relation,Ik> associated with exploratory links.
   Example: news stories about phd students
         <news “KMi success”, mentions-person, Tom-Heath>

   A simplified template in Sesame SeRQL (one query per direction of the relation):

         select {Is}, {R}, {Ik} from {Is} rdf:type {Cs},
                                     {Ik} rdf:type {Ck},
                                     {Is} R {Ik}
         select {Is}, {R}, {Ik} from {Is} rdf:type {Cs},
                                     {Ik} rdf:type {Ck},
                                     {Ik} R {Is}
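
Filling the template is plain string construction: every pair of class matches yields two formal queries, one per direction of the relation. The function names and the `kmi:` class identifiers below are assumptions for illustration, and the generated text follows the simplified template on the slide rather than complete SeRQL syntax.

```python
from itertools import product

def class_class_queries(cs, ck):
    """Instantiate the simplified template for subject class cs and keyword class ck."""
    head = ("select {Is}, {R}, {Ik} from "
            "{Is} rdf:type {" + cs + "}, "
            "{Ik} rdf:type {" + ck + "}, ")
    # One query for each direction in which the relation R may hold.
    return [head + "{Is} R {Ik}", head + "{Ik} R {Is}"]

def formal_queries(subject_matches, keyword_matches):
    """Each keyword may match several classes: expand every combination."""
    queries = []
    for cs, ck in product(subject_matches, keyword_matches):
        queries.extend(class_class_queries(cs, ck))
    return queries

# Hypothetical class matches for the query <news: phd students>.
queries = formal_queries(["kmi:News"], ["kmi:PhDStudent"])
for q in queries:
    print(q)
```

With m subject matches and n keyword matches this produces 2 × m × n formal queries, which is why the final ranking step matters.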
Text mining and the Semantic Web
Text mining stages

      Document selection and filtering (IR techniques)
      Document pre-processing (NLP techniques)
      Document processing (NLP / ML /
       statistical techniques)
      Stages of document processing
   Document selection involves identification and retrieval
    of potentially relevant documents from a large set (e.g.
    the web) in order to reduce the search space. Standard
    or semantically-enhanced IR techniques can be used for this.
   Document pre-processing involves cleaning and
    preparing the documents, e.g. removal of extraneous
    information, error correction, spelling normalisation,
    tokenisation, POS tagging, etc.
   Document processing consists mainly of information extraction
   For the Semantic Web, this is realised in terms of
    metadata extraction
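
A minimal sketch of the pre-processing stage, using only the standard library: it strips markup as "extraneous information", lower-cases the text, tokenises it, and normalises spelling with a tiny made-up variant table. POS tagging is left out because it needs a trained model; the `SPELLING` table and `preprocess` function are illustrative assumptions.

```python
import re

# Toy spelling-variant table; a real system would use a much larger resource.
SPELLING = {"color": "colour", "normalization": "normalisation"}

def preprocess(text):
    """Clean, tokenise and normalise one document."""
    text = re.sub(r"<[^>]+>", " ", text)             # remove markup tags
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # lower-case and tokenise
    return [SPELLING.get(token, token) for token in tokens]

print(preprocess("<b>Color</b> and colour!"))
```

The cleaned token list is what the later processing stage (information extraction) would operate on.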
    IE as an alternative to IR
   Information Extraction returns knowledge
    at a much deeper level than traditional IR
   Constructing a database through IE and
    linking it back to the documents can
    provide a valuable alternative search tool.
   Even if results are not always accurate,
    they can be valuable if linked back to the
    original text
Some example applications
   HaSIE
   KIM
   Threat Trackers
   HaSIE identifies relevant sections of
    each document, pulls out sentences
    about health and safety issues, and
    populates a database with relevant information
   KIM is a software platform developed by
    Ontotext for semantic annotation of text.
   KIM performs automatic ontology population
    for Semantic Web
   Indexing and retrieval (an IE-enhanced
    search technology)
     Threat tracker
   Application developed by Alias-I which finds and
    relates information in documents
   Intended for use by Information Analysts who use
    unstructured news feeds and standing collections
    as sources
   Used by DARPA for tracking possible information
    about terrorists etc.
Semantic Web Services
A Motivating Example
[Figure: a company in Germany needs to find all … for tax preparation. Requirements shown around it: located in Berlin, Germany; available on a 24/7 basis; the process should be fully automated, with no human interaction; … method should be …]
Search topics
   Semantic Web in e-learning
   Semantic Web in D-Library
   Semantic Web in regular search engine
   Semantic Web in Security
