Semantic Web

					The Semantic Web

   Foothill College
  Robert D. Cormia
            Presentation Roadmap
•   We need a bigger brain
•   Knowing what we know
•   Semantic Web technologies
•   Developing machine intelligence
•   Building the ‘second generation Web’
    – Indexing the ‘pool of human knowledge’
     Three Eras of the Web
• Content Web (1995 to 2005)
  – Designed for humans
  – HTML ruled the day
• Process Web (2000 to 2010)
  – Designed for machines
  – XML ruled the day
• Semantic Web (2005 to 2015)
  – Designed for networks
  – RDF and ontologies
      Semantic Web Activities
• Index Web pages (or Web data objects) by the
  three descriptive dimensions
  – What it is
  – What it does
  – How it’s used
• Follow the ‘user space’: how information is
  searched and combined, how problems are
  solved, how information space is navigated
• Map out the logical paths in ‘synaptic space’
• These relationships form a ‘schema’ that is encoded in RDF
     Three Dimensions of Indexing
[Diagram: a Web data object (Web page, chart or graph, multimedia file) at the center of three indexing dimensions]
• What it is
• What it does
  – What specific information is conveyed?
• How it’s used
  – Who uses this information?
  – To answer what kinds of questions?
Semantic Web Applied to Knots
• What is a knot?
  – declarative description
• How do you tie a knot?
  – procedural diagram of tying a knot
• How is a particular knot used?
  – Who uses square knots (user space)
  – Applications of square knots (process space)
 Picture of a Square Knot

                 square knot

Definition - A common double knot in which the
loose ends are parallel to the standing parts, most
often used to join the ends of two cords or lines.
     How to Tie a Square Knot

Procedure: Start by (insert steps to tie a square knot here)
          1. Step one 2. Step 2         3. Step 3
Semantic View of a Square Knot

        Declarative          Procedural          Cognitive

  Square Knot. Used for joining ropes of equal thickness. It is
  also the knot used for tying bandages, as it lies flat. This knot
  is also known as a "Reef Knot". The working end is tied over
  the standing end, "right over left, left over right."
     The Four Building Blocks
1.   XML
2.   RDF
3.   Ontologies
4.   Agents

“XML allows users to add arbitrary
structure to their documents but says
nothing about what the structures mean.”
• Meaning encoded in sets of ‘triples’
  – Entities have properties, which have values
• Entities, properties and values all have
  distinct URIs (long-lived URL / namespace)
• RDF schemas encode the concept of
  ‘edges’ that connect related entities
  – Edges form the lattice of a three-dimensional graph
• Database A and Database B may use
  different fields to contain ‘zipcode’
• Ontologies (and schemas) sort this out
• Ontology = ‘a document or file that formally
  defines the relations among terms’
• Ontologies for the Web normally have
  – A taxonomy (words)
  – A set of inference rules
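A minimal sketch of how an ontology could reconcile the two databases mentioned above. The field names `zip` and `postal_code`, the sample values, and the helper `normalize` are hypothetical, chosen only for illustration; a real ontology would express the mapping in RDF rather than in a Python dictionary.

```python
# Hypothetical records from two databases that name the same concept differently.
record_a = {"name": "Record from Database A", "zip": "94022"}
record_b = {"name": "Record from Database B", "postal_code": "95014"}

# A toy "ontology": each local field name maps to one shared term.
FIELD_TO_TERM = {
    "zip": "zipcode",
    "postal_code": "zipcode",
    "name": "name",
}


def normalize(record):
    """Rewrite a record's keys into the shared vocabulary."""
    return {FIELD_TO_TERM.get(field, field): value for field, value in record.items()}


merged = [normalize(record_a), normalize(record_b)]
zipcodes = [row["zipcode"] for row in merged]
print(zipcodes)
```

Once both records use the shared term, a single query on `zipcode` covers both sources.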
• Modern-day ‘crawlers’
  – WebCrawler®
  – Spiders / robots
  – Rogue harvesters
• Future Web agents
  – Determine relationships (context)
  – Harvest content / discern semantics
  – Create schemas that ‘scaffold’ the Web
            Semantic Web
The Web was built for human consumption
  Although everything on it is machine-readable,
  this data is not machine-understandable

Semantic Web -- a Web of data that can be
processed directly or indirectly by machines.
The solution proposed is to use metadata to
describe the data contained on the Web.
             Meta Data
• Information about information
• In HTML, we use ‘meta tags’
• <meta name="description" content="a
  description of the document in natural
  language that search engines display"/>
• <meta name="keywords" content="a list of
  keywords that search engines digest"/>
• In XML, we use RDF to create metadata
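As a rough illustration of how a program might read HTML meta tags, the sketch below collects name/content pairs with Python's standard `html.parser` module. It assumes the HTML `content` attribute carries the value; the class name `MetaTagCollector` and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.metadata[attributes["name"]] = attributes["content"]


# Hypothetical page used only for this example.
page = """<html><head>
<meta name="description" content="Course notes on the Semantic Web">
<meta name="keywords" content="RDF, XML, ontology">
</head><body></body></html>"""

collector = MetaTagCollector()
collector.feed(page)
print(collector.metadata)
```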
       Resource Description
        Framework (RDF)
• Part of the W3C XML activities
• Creates a formal indexing markup
• Most tags defined by two working groups
  – Dublin Core Meta Data Initiative
  – W3C Semantic Web activities
• Think of it as ‘meta tags on steroids’
  – Authors may define their own indexing schema
  RDF – Purpose and Scope
• RDF provides infrastructure for the
  – Encoding
  – Exchange
  – Reuse
  of metadata
• Human-readable and machine-processable
  vocabularies – ‘interchangeable semantics’
• Reuse and extension of metadata semantics
          Why RDF (i.e., Why Logic)?
• Separates simple statements of the facts conveyed by a
  resource from the representation structure
• With extensions, provides formal foundations for
  reasoning, inference mechanisms, and high-level query
  and other languages (as in relational database model)
• Allows reuse of years of work on knowledge
  representation, AI, techniques for object mapping
• Proven to be an effective tool for dealing with heterogeneous
  database systems (create RDF for each schema namespace)
• Common confusion: “RDF is about metadata”; but metadata is still
  data, and there is no reason to restrict RDF to describing Web objects
• Instance-level RDF can be generated automatically from
  RDF schema and XML schema or DTD (Perl scripting)
  RDF (Schema) Data Model
• RDF provides a model for describing
  (Web) resources – or any resource
• The formal RDF data model comprises ‘triples’
  – Resource
  – Property
  – Value
• Resources have a ‘URI’ (long-lived URL)
• Property-types connect resources / values
• Values can be text strings – or resources
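The data model above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type, the `values_of` helper, and every `example.org` URI are assumptions introduced here, while the Dublin Core namespace is the published one.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str  # the thing being described, identified by a URI
    prop: str      # the property-type, also identified by a URI
    value: str     # a text string, or the URI of another resource


# Hypothetical URIs; only the Dublin Core namespace is a real vocabulary.
DOC = "http://example.org/doc1"
AUTHOR = "http://example.org/people/jsmith"
NAME = "http://example.org/terms/name"
DC = "http://purl.org/dc/elements/1.1/"

triples = [
    Triple(DOC, DC + "title", "My document"),  # value is a text string
    Triple(DOC, DC + "creator", AUTHOR),       # value is another resource
    Triple(AUTHOR, NAME, "John Smith"),
]


def values_of(resource, prop):
    """Return every value attached to a resource through a property."""
    return [t.value for t in triples if t.resource == resource and t.prop == prop]


# Follow the edge from the document to its creator, then read the creator's name.
creator = values_of(DOC, DC + "creator")[0]
print(values_of(creator, NAME))
```

Because a value can itself be a resource, following properties from one resource to the next walks a graph.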
             RDF Basics
• Resources
• Properties that have values
• Statement: Resource A has property B
  with Value C

[Diagram: resource A connected to value C by a labeled arc; arc label: Mail address]

                 RDF Schema
• Not just an XML schema – an RDF schema!
•   rdf:RDF – root element of an RDF document
•   xmlns:rdf – namespace declaration for rdf
•   xmlns:rdfs – namespace declaration for rdfs
•   xmlns:dc – namespace declaration for dc (Dublin Core)
•   rdf:Description – description of the (RDF) resource
•   rdf:about – identifies which (RDF) resource is described
•   dc:creator – who created the (RDF) resource
•   dc:title – title of the (RDF) resource
•   dc:description – description of the (RDF) resource
•   dc:date – date of the (RDF) resource
         Basic RDF Document
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="
syntax-ns#" xmlns:rdfs="
 <rdf:Description rdf:about="http://doc" dc:creator="Joe
Smith" dc:title="My document" dc:description="Joe's
ramblings about his summer vacation." dc:date="1999-09-
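To show how such a document could be read by a program, here is a hedged sketch that parses a complete RDF/XML document of the same shape and lists its triples. The namespace URIs are the standard RDF and Dublin Core ones, which is an assumption about what the slide originally declared, and the `dc:date` attribute is left out because its value is incomplete in the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed Dublin Core namespace

document = f"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://doc"
      dc:creator="Joe Smith"
      dc:title="My document"/>
</rdf:RDF>"""

root = ET.fromstring(document.encode("utf-8"))

triples = []
for description in root.findall(f"{{{RDF_NS}}}Description"):
    subject = description.get(f"{{{RDF_NS}}}about")
    for attribute, value in description.attrib.items():
        # Keep only Dublin Core properties and shorten them to the dc: prefix.
        if attribute.startswith(f"{{{DC_NS}}}"):
            triples.append((subject, attribute.replace(f"{{{DC_NS}}}", "dc:"), value))

for triple in sorted(triples):
    print(triple)
```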
RDF Schematic Description
  Relatedness - Inheritance

The application and use of the RDF data model can
   be illustrated by concrete examples.
Consider the following statements:
1. "The author of Document 1 is John Smith"
2. "John Smith is the author of Document 1"
       RFC 2396: RDF Resource
• A resource can be anything that has identity. Familiar
  examples include an electronic document, an image, a
  service (e.g., "today's weather report for Los Angeles"),
  and a collection of other resources. Not all resources are
  network "retrievable"; e.g., human beings, corporations,
  and bound books in a library can also be considered
  resources. The resource is the conceptual mapping to an
  entity or set of entities, not necessarily the entity which
  corresponds to that mapping at any particular instance in
  time. Thus, a resource can remain constant even when
  its content --- the entities to which it currently
  corresponds --- changes over time, provided that the
  conceptual mapping is not changed in the process.
     Machine Readable Code
• To humans, these statements convey the same
  meaning (that is, John Smith is the author of a
  particular document). To a machine, however, these
  are completely different strings. Whereas humans are
  extremely adept at extracting meaning from differing
  syntactic constructs, machines remain grossly inept.
  Using a triadic model of resources, property-types
  and corresponding values, RDF attempts to provide
  an unambiguous method of expressing semantics in
  a machine-readable encoding.
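The point can be made concrete with a small sketch: the two example sentences differ as strings, yet both reduce to the same triple. The patterns and the `to_triple` helper are illustrative assumptions and handle only these two sentence shapes.

```python
import re

# Two hypothetical sentence shapes that express the same "author" relationship.
AUTHOR_PATTERNS = [
    re.compile(r"^The author of (?P<doc>.+) is (?P<person>.+)$"),
    re.compile(r"^(?P<person>.+) is the author of (?P<doc>.+)$"),
]


def to_triple(sentence):
    """Map a sentence to a (resource, property, value) triple, or None."""
    for pattern in AUTHOR_PATTERNS:
        match = pattern.match(sentence)
        if match:
            return (match.group("doc"), "author", match.group("person"))
    return None


s1 = "The author of Document 1 is John Smith"
s2 = "John Smith is the author of Document 1"

print(s1 == s2)                        # the raw strings differ
print(to_triple(s1) == to_triple(s2))  # the triples are identical
```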
Relatedness - Inheritance
   Paths and ‘Edges’
Resource Description Framework (RDF)

• W3C’s model for representing resource
  metadata, “semantic assertions”
• provides:
  –   formal data model
  –   syntax (XML) for (meta)data interchange
  –   schema type system
  –   syntax (XML) for machine understandable schemas
• RDFS (Schema) allows definition of vocabulary
  and types for a particular RDF model
                RDF Model and Syntax
English:    Ora Lassila is the creator of the resource

RDF model:                  Ora Lassila

logic:            Creator ( , Ora Lassila )

RDF syntax (XML encoding):

         <rdf:RDF xmlns:rdf=""
                 <rdf:Description about="">
                         <rdf:Creator>Ora Lassila</rdf:Creator>
             A Simple RDF Schema
<rdf:RDF xml:lang="en" xmlns:rdf=""
         <rdfs:Class rdf:ID="Person">
                   <rdfs:comment>The class of people.</rdfs:comment>
         <rdf:Property ID="maritalStatus">
                   <rdfs:range rdf:resource="#MaritalStatus"/>
                   <rdfs:domain rdf:resource="#Person"/>
         <rdf:Property ID="age">
                   <rdfs:domain rdf:resource="#Person"/>
         <rdfs:Class rdf:ID="MaritalStatus"/>
         <MaritalStatus rdf:ID="Married"/>
         <MaritalStatus rdf:ID="Single"/>
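One way to see what the domain and range declarations buy us is to check statements against them. The sketch below transcribes the schema facts visible in the slide into plain Python; the instance `JohnSmith` and the `is_valid` helper are hypothetical, and the range of `age` is left open because it is not shown.

```python
# Schema facts from the slide: each property has a domain and (optionally) a range.
SCHEMA = {
    "maritalStatus": {"domain": "Person", "range": "MaritalStatus"},
    "age": {"domain": "Person", "range": None},  # range not shown in the slide
}

# Known instances and their classes; "JohnSmith" is a made-up instance.
TYPES = {
    "Married": "MaritalStatus",
    "Single": "MaritalStatus",
    "JohnSmith": "Person",
}


def is_valid(subject, prop, value):
    """Check a statement against the domain and range of its property."""
    rule = SCHEMA.get(prop)
    if rule is None:
        return False
    if TYPES.get(subject) != rule["domain"]:
        return False
    return rule["range"] is None or TYPES.get(value) == rule["range"]


print(is_valid("JohnSmith", "maritalStatus", "Married"))    # consistent with the schema
print(is_valid("JohnSmith", "maritalStatus", "JohnSmith"))  # value is not a MaritalStatus
```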
 Dublin Core Meta Data Model
• Defined as Open Metadata
• Usual “Library Catalog” Elements, like Creator, Title,
  Publisher, Date, Language
• Extended by: Source, Relations, Type (what is it – text,
  image, software), Format
• URI (Identifier)
• Used for cataloging “Global Web Library”
• Others: RSS (RDF Site Summary) – Netscape,
• PRISM – publishing – Adobe, Quark, ..
• <indecs> - e-business – EU companies
• RDF mappings for P3P, PICS,…
            Dublin Core Schema

<?xml:namespace ns = "" prefix ="RDF" ?>
<?xml:namespace ns = "" prefix = "DC" ?>
   <RDF:Description RDF:HREF="http://uri-of-Document-1">
       <DC:Creator>John Smith</DC:Creator>
Dublin Core Model
      Creating RDF Documents
• Manually from HTML or “user domain XML”
• With special assisting tools – like Protégé,
  Reggie, DC-dot, RDF for XML
• Ideally – with some automated procedure from
  HTML/XML documents
• Can we use XSLT there?
• Web based tools:
        DC Version of RDF vCard
<?xml:namespace ns = "" prefix = "RDF" ?>
<?xml:namespace ns = "" prefix = "DC" ?>
<?xml:namespace ns = "" prefix = "CARD" ?>
        <RDF:Description RDF:HREF="">
                 <DC:Creator RDF:HREF="#Creator_001"/>
         <RDF:Description ID="Creator_001">
                 <CARD:Name>Robert D. Cormia</CARD:Name>
                 <CARD:Affiliation>Foothill College</CARD:Affiliation>
     rdcormia.rdf (RDF for website)
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="" xmlns:rdf="
<channel rdf:about="">
          <title>RD Cormia website</title>
          <description>RD Cormia website for Foothill College courses</description>
                                 <rdf:li resource=""/>
                                 <rdf:li resource=""/>
                                 <rdf:li resource=""/>
        <item rdf:about="">
                  <title>COIN78 Introduction to XML</title>
                  <description>Website for COIN78 Introduction to XML</description>
        <item rdf:about="">
                  <title>COIN79 Introduction to XML for Biologists</title>
                  <description>Website for COIN79 Introduction to XML</description>
        <item rdf:about="">
                  <title>COIN81 Bioinformatics, Tools, Databases, and Methods</title>
                  <description>Website for COIN81 Intro to Bioinformatics</description>
               RDF Extensions
• W3C expects RDF add-ons in several areas
  – More complete logic capabilities (negation, inferencing)
  – Conversion rules
  – Functions (computation)
• Basis of “Semantic Web” ideas
  – Tim Berners-Lee
  – DARPA Agent Markup Language program
• Other potential specifications of semantics exist
  – cf. paper “Using UML to Define XML Document Types”
    at Extreme Markup Languages 2000 conference
             Advanced RDF

•   Using objects and edges
•   Containers: bags, sequences, alternatives
•   aboutEach, aboutEachPrefix
•   Reification (higher order statements)
•   Namespaces and Vocabularies
•   Maximizing ontologies
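Reification, listed above, can be sketched as turning one statement into four triples that describe it, so that further statements can be made about the statement itself. The `rdf:` terms used are from the standard RDF vocabulary; the `reify` helper and all `example.org` URIs are assumptions for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj):
    """Describe a statement as a resource using the RDF reification vocabulary."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


# Hypothetical statement: "Document 1 was created by John Smith".
statement = (
    "http://example.org/doc1",
    "http://purl.org/dc/elements/1.1/creator",
    "John Smith",
)
statement_id = "http://example.org/statements/1"
reified = reify(statement_id, *statement)

# A higher-order statement: someone asserts the original statement.
claim = (statement_id, "http://example.org/terms/assertedBy", "http://example.org/people/editor")

for triple in reified + [claim]:
    print(triple)
```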
WWW circa 1989
    RDF - Schema Captured
• Create a tree map of relationships
• Create ‘edges’ for each relationship
• Code those edges into RDF schema
  – Determine the ‘resource => property-value’ path
  – This is how ‘non-Web’ resources can be coded in
    RDF schema (reify a subject and describe as a URI)
  – (website)
  – Robert D. Cormia (website owner)
  – (website owner email)
  – 800.555.1212 (website owner contact information)
         Edge Space in RDF
• Resources attach to other resources
  • Inheritance is one direction of a path
  • Associations are the other direction
  –   Foothill College
  –   CTIS Division
  –   Robert D. Cormia
  –   COIN78 XML
  –   650.949.7456
   Edge Space – Complex Resources
  California State CC System
                                       FHDA CC District

            De Anza College                  Foothill College

   CIS61 Informatics                  Classes

  COIN78 XML                                 CTIS Division

   650.949.7456          Robert D. Cormia
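Navigating such an edge space amounts to finding a path through a directed graph. The sketch below uses a breadth-first search; the exact edges are an assumed reading of the diagram above, and `find_path` is a hypothetical helper.

```python
from collections import deque

# Directed edges, as one plausible reading of the diagram (an assumption).
EDGES = {
    "California State CC System": ["FHDA CC District"],
    "FHDA CC District": ["De Anza College", "Foothill College"],
    "De Anza College": ["CIS61 Informatics"],
    "Foothill College": ["CTIS Division"],
    "CTIS Division": ["Robert D. Cormia"],
    "Robert D. Cormia": ["COIN78 XML", "650.949.7456"],
}


def find_path(start, goal):
    """Breadth-first search for the shortest chain of edges from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in EDGES.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path("California State CC System", "COIN78 XML"))
```

Edges are followed in one direction only, so a search starting from a leaf such as "COIN78 XML" finds no path back up unless reverse edges are added.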
Exploring an Information Space
•   Information space - information cubes
•   Lattices and edges - navigating a graph
•   Representing graphs as RDF schemas
•   Example systems
    – Finding (finished) scientific data on the Web
    – Representing a ‘student – class – outcome’
• Information space in a Semantic Web
  The Power of Ontologies
Robert Cormia’s website, having been found by the
Google search agent, links to an ontology which
defines information about computer science,
bioinformatics, and informatics departments. Areas
of research are also described in the documents.
The system can therefore locate information relating
to current research projects. (Current method is for a
human to sift through the content of pages turned up
by a search engine.) This is how searches discover
people, email addresses, and contact information to
connect ‘non-web’ resources with Web content.
“Agent based computing appears to be the
appropriate paradigm to work in a complex
world with multiple ontologies, fragments and
multiple inferencing engines.”

Stork, Hans-Georg and Mastroddi, Franco, Semantic
Web Technologies - a New Action Line in the
European Commission’s IST Programme, 2001
       The Power of Agents
“The real power of the Semantic Web will be realized
when people create many programs that collect Web
content from diverse sources, process the
information and exchange the results with other
programs. The effectiveness of such software agents
will increase exponentially as more machine-readable
Web content and automated services (including other
agents) become available.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The Semantic Web’,
Scientific American, May 2001
       ‘Ambient Intelligence’
“In the next step, the Semantic Web will break
out of the virtual realm and extend into our
physical world. URIs can point to anything,
including physical entities, which means we
can use the RDF language to describe
devices such as cell phones and TVs.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The Semantic Web’,
Scientific American, May 2001
             Key Problems
• Ontology learning
• Ontology-based annotation of legacy content
• Management of ontology repositories
  – Managing Meta-data and complex pointer space
• Creating and testing algorithms
  – Until the network can learn by itself
• Not least – getting publisher buy-in
  – XML and RDF add a great deal of overhead
     Agents and Metadata
“given the sheer size and dynamics of the
contents involved, it is a case in point for
automating to the largest extent possible the
production of metadata through algorithmic
content analysis.”

Stork, Hans-Georg and Mastroddi, Franco, Semantic
Web Technologies - a New Action Line in the
European Commission’s IST Programme, 2001
 Need for Increased Semantic Content
• General eBusiness operations on the Web require
  semantic interoperability
• Web resources have no explicit, machine-
   interpretable semantics
  – Semantics typically implicit in minds of users
    or developers (e.g., meaning of displayed HTML)
• Implicit semantics works where predefined shared
  semantic agreements exist, e.g.
  – Within a single system or data collection
  – Among a group of people (or an eBusiness domain)
• Breaks when multiple systems must interoperate,
  different data sets must be merged, or programs
  not sharing those semantics must access the data
     Why doesn’t XML handle this?
        XML            <NAME>John Doe</NAME>
      namespace        <EXPERTISE>XML</EXPERTISE>

•  Multiple XML vocabularies may exist (for sources
   or consumer profiles): EXPERTISE vs. SUBJECT
   – attempts to address by tag standards / namespaces
• Multiple XML structures can represent the same “fact” (even with
  the same vocabulary)
• Ontologies (W3C RDF) can provide more semantics
   – XML tags point to terms defined in ontologies
   – Allow, e.g., deduction that EMPLOYEE is a PERSON
• Multiple ontologies may exist
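The deduction mentioned above (an EMPLOYEE is a PERSON) can be sketched as a walk up a subclass hierarchy. Only the EMPLOYEE/PERSON relation comes from the slide; the PROFESSOR term and the `is_a` helper are hypothetical additions for illustration.

```python
# Toy ontology: each term points to its parent class.
SUBCLASS_OF = {
    "EMPLOYEE": "PERSON",
    "PROFESSOR": "EMPLOYEE",  # hypothetical extra term
}


def is_a(term, ancestor):
    """Return True if term equals ancestor or inherits from it."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)  # guards against accidental cycles in the ontology
        term = SUBCLASS_OF.get(term)
    return False


print(is_a("EMPLOYEE", "PERSON"))
print(is_a("PROFESSOR", "PERSON"))
print(is_a("PERSON", "EMPLOYEE"))
```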
Multiple XML Structures for same “fact”

RDF assertion: PurchaseOrder --from--> Company

Structure 1
  Encoding DTD:
    <!ELEMENT PurchaseOrder (from)>
    <!ATTLIST PurchaseOrder id ID #REQUIRED>
    <!ELEMENT from (Company)>
    <!ATTLIST Company id ID #IMPLIED>
  Example XML instance data:
    <PurchaseOrder id="X">
      <from>
        <Company id="Y"/>
      </from>
    </PurchaseOrder>

Structure 2
  Encoding DTD:
    <!ELEMENT from (PurchaseOrder, Company)>
    <!ELEMENT PurchaseOrder (#PCDATA)>
    <!ELEMENT Company (#PCDATA)>
  Example XML instance data:
    <from>
      <PurchaseOrder>X</PurchaseOrder>
      <Company>Y</Company>
    </from>

Structure 3
  Encoding DTD:
    <!ELEMENT PurchaseOrderInfo (Company)>
    <!ATTLIST PurchaseOrderInfo orderID ID #REQUIRED>
    <!ELEMENT Company (#PCDATA)>
  Example XML instance data:
    <PurchaseOrderInfo orderID="X">
      <Company>Y</Company>
    </PurchaseOrderInfo>
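A short sketch of why this matters to a program: each of the three XML shapes needs its own extraction logic, yet all three yield the same assertion. The `extract_fact` helper is hypothetical, and the instance strings are written out in full with plain quotes so that they parse.

```python
import xml.etree.ElementTree as ET

# The three instance documents, one per structure.
INSTANCES = [
    '<PurchaseOrder id="X"><from><Company id="Y"/></from></PurchaseOrder>',
    "<from><PurchaseOrder>X</PurchaseOrder><Company>Y</Company></from>",
    '<PurchaseOrderInfo orderID="X"><Company>Y</Company></PurchaseOrderInfo>',
]


def extract_fact(xml_text):
    """Reduce any of the three structures to one (order, 'from', company) triple."""
    root = ET.fromstring(xml_text)
    if root.tag == "PurchaseOrder":
        return (root.get("id"), "from", root.find("from/Company").get("id"))
    if root.tag == "from":
        return (root.findtext("PurchaseOrder"), "from", root.findtext("Company"))
    if root.tag == "PurchaseOrderInfo":
        return (root.get("orderID"), "from", root.findtext("Company"))
    raise ValueError(f"unrecognized structure: {root.tag}")


facts = {extract_fact(text) for text in INSTANCES}
print(facts)
```

The set collapses to a single triple, which is the kind of structure-independent statement RDF is meant to carry.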
     Solution: “The Semantic Web”

• Approach: make semantics explicit
   – As additional machine-readable data
• Vision by Tim Berners-Lee of Web future
   – white papers at
• Web contains resources + machine-processable
  descriptions of their meanings or capabilities
  – additional semantics derived from Web of related things
 Semantic Web “Architecture”
[Layered diagram, top to bottom]
• Services, agents, etc.: processing (e.g., Turing-complete logical
  language with inference and functions)
• Schemas describing data and metadata (e.g., XML Schema, RDF Schema)
• Logical assertions about base data (e.g., RDF)
• Base data: structured data in XML languages; structured data that
  can be interpreted as XML; other data (e.g., images)
   Semantic Web - Metadata
• Web has unprecedented access to
  globally distributed information
• Finding and navigating that information
  is a tedious and labor intensive task
• Metadata – structured ‘data about data’ –
  improves discovery and access (RDF)
• Metadata needs conventions regarding
  semantics, syntax, and structure (RDF)
Semantic Web Section
  Observations on the Semantic Web

• Lots of agreement on general idea; lots of
  disagreements on details, e.g.:
  – levels in the “architecture” where various capabilities go
  – is RDF the right formalism?
• The Semantic Web rests on logical foundations
  – shouldn’t be a put-off: so do relational database systems
  – provides formal foundations for more user-friendly
    languages and advanced reasoning capabilities required
  – users and developers will not necessarily deal directly
    with all the logical infrastructure
     Semantic Web Observations
• Web technology is evolving to support more
  object-like ideas
• Script components show how to build “real
  objects” in “real object models” using Web
• Lots of development is under way; needs further
  integration, and “shake out”
• So does CORBA / Web integration
• Web emphasis on “loose binding” and ease of
  content creation an important consideration in
  merging Web and object concepts
    Searching in the Semantic Web
• User tells search agent to find an XML expert
  (where expertise = “XML”)
  – Agent consults ontology, enhances query with
    synonyms for “expertise” (“subject”, “topic”)
  – Agent consults user context for rules (e.g. anyone
    teaching a course on XML has “expertise”) to
    further enhance query
• University course catalog page contains metadata
  (“subject = XML”) describing course “Web202”
• University Web page profiling Prof. Lila Smith has
   metadata listing “Web202” in courses she teaches
• Agent returns Prof. Smith as potential XML expert
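The scenario above can be sketched as a tiny search agent. Everything here is an illustrative assumption: the dictionaries stand in for page metadata and an ontology, and `find_experts` hard-codes the single rule that teaching a course on a subject implies expertise in it.

```python
# Ontology fragment: synonyms the agent may substitute for a query field.
SYNONYMS = {"expertise": {"expertise", "subject", "topic"}}

# Hypothetical metadata harvested from the course catalog and a faculty page.
COURSES = {"Web202": {"subject": "XML"}}
PEOPLE = {"Prof. Lila Smith": {"teaches": ["Web202"]}}


def find_experts(field, value):
    """Expand the query with synonyms, then apply the 'teaching implies expertise' rule."""
    fields = SYNONYMS.get(field, {field})
    matching_courses = {
        course
        for course, metadata in COURSES.items()
        if any(metadata.get(name, "").lower() == value.lower() for name in fields)
    }
    return sorted(
        person
        for person, metadata in PEOPLE.items()
        if matching_courses.intersection(metadata["teaches"])
    )


print(find_experts("expertise", "xml"))
```

Neither page mentions the word "expertise"; the match comes from the synonym expansion and the rule together.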
                  OMG Activities
• XML Metadata Interchange (XMI)
  – Interchange format for exchanging UML metadata
• Common Warehouse Metadata Interchange (CWMI)
  – Defines common metamodel for data warehousing using
    UML (based on OMG MOF specification)
  – Defines syntax and semantics for import, export, and other
    data warehousing operations using XML documents
• XML “wiring” language for CORBA components
• CORBA/SOAP Interworking RFP
• Shows increasing integration of object and Web
  technologies based on industry requirements
   How will Object / Web Merger Go?
• “Why do we need more than XML and HTTP?”
• How this will work out is not totally clear, but:
   – You need much more than the current “Web duct tape” to
     build really complex systems
       • Still need architecture concepts like services
       • Still need worked-out domain models
   – You still need to define an object model
       • It’s just going to be more flexible
   – You need a unified object space that allows flexible
     tradeoff between Web flexibility and service efficiencies
   – More work on improved object concepts for composability
The Ultimate Neural Network
Machine Learning Networks
“Most of the Web's content today is
designed for humans to read, not for
computer programs to manipulate
meaningfully.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The Semantic Web’,
Scientific American, May 2001
    Semantic Processing
We want to be able to say to our search
agent, for example:

‘What aspects of digital libraries has Bill
Arms most recently been funded to
How Can the System ‘Know’?
Massive artificial intelligence not
required. Web pages will be endowed
with semantic awareness using off-the-
shelf software.
How Can Machines Understand?

 Content is machine-understandable if it
 is bound to some formal description of
 itself (i.e. metadata).
Can Machines Learn?
Can Machines Remember?
Can Networks Become Intelligent?
Ontologies & Inference Engines

“For the semantic web to function,
computers must have access to
structured collections of information and
sets of inference rules that they can use
to conduct automated reasoning.”

Berners-Lee, T, Hendler, J & Lassila, O ‘The Semantic Web’,
Scientific American, May 2001
              How it May Work
[Diagram: a user sends a query to an index; each indexed web page is connected through rules to a knowledge representation scheme]
• There is a wealth of information in
  ‘unstructured text’, including:
  – Email (archive and index)
  – Voice (capture and meta tag)
  – Radio (capture, meta tag, and publish as XML)
• From speech to text and then to indexed Web
  documents, searchable through RDF space.
• Navigate a larger pool of human knowledge