					Trustworthy Semantic Webs

       Dr. Bhavani Thuraisingham
     The University of Texas at Dallas

             December 2007

  Vision
  XML
  RDF
  Ontology/OWL
  Rules
  Applications
  Ontology Engineering
  Web Services
  Reference
   A Semantic Web Primer: Antoniou and van Harmelen
Today’s Web

  High recall, low precision: searches return too many web pages,
   many of them not relevant
  Sometimes low recall
  Results are sensitive to vocabulary: different words with the same
   meaning do not return the same web pages
  Results are single web pages, not linked web pages
From Today’s Web to the Semantic Web

  Machine understandable web pages
  Activities on the web, such as searching, carried out with little or
   no human intervention
  Technologies for knowledge management, e-commerce, etc.
  Solutions to the problems faced by today's web
     - Retrieving appropriate web pages, sensitivity to vocabulary, etc.
     - Semantic web applications including knowledge management,
       e-commerce and personal agents
Knowledge Management

  Corporate needs
    - Searching, extracting and maintaining information, uncovering
      hidden dependencies, viewing information
  Semantic web for knowledge management
    - Organizing knowledge, automated tools for maintaining
      knowledge, question answering, querying multiple documents,
      controlling access to documents
Business to Consumer E-Commerce

  Users shopping on the web; wrapper technology is used to extract
   information about user preferences etc. and display the products to
   the user
  Use of semantic web: develop software agents that can interpret
   privacy requirements, pricing and product information and display
   timely and correct information to the user; agents can also provide
   information about the reputation of shops
  Future: agents negotiating on behalf of the user
Business to Business E-Commerce

  Organizations work together and carry out transactions such as
   collaborating on a product, managing supply chains, etc. Today's web
   lacks standards for data exchange
  Use of semantic web: XML is a big improvement, but organizations still
   need to agree on vocabulary. In the future, ontologies will be used
   to agree on meanings and interpretations
Personal Agents

  John is the president of a company. He needs surgery for a serious
   but not critical illness. With the current web, he has to check each
   web page for relevant information and make decisions based on the
   information provided
  With the semantic web, an agent will retrieve all the relevant
   information, synthesize it, ask John if needed, and then present
   the various options to John and also make …
Semantic Web Technologies

  Explicit metadata
     - XML, RDF, etc.
  Ontologies
  Logic
  Agents
Explicit metadata

  Metadata is data about data
  Need metadata to be explicitly specified so that different groups and
   organizations will know what is on the web
  Using metadata, one can then carry out various activities such as
   searching, integration and executing actions
  Metadata specification languages include XML and RDF
Ontologies

  An ontology is an explicit and formal specification of a
   conceptualization; it describes a domain of discourse
  Consists of concepts and the relationships between them
  Web searches can exploit ontologies to facilitate the search process
  Ontology languages include XML, RDF, OWL
Logic

  Logic can be used to specify facts as well as rules
  New facts are derived from existing facts through inference
  Description logic is the family of logics adopted for semantic web
   applications
Agents

  Agents are essentially processes that have evolved from
   object-oriented programming; an agent is an active object
  Agents will use metadata to find resources on the web;
   ontologies will be used to interpret statements; logic will be
   used for drawing conclusions
  Agents will not completely replace humans, but they will make
   human tasks much easier
Semantic Web vs Artificial Intelligence

  Goal of Artificial Intelligence is to build an intelligent agent
   exhibiting human-level intelligence
  Goal of the semantic web is to assist humans in their day-to-day
   online activities
Layered Approach: Tim Berners-Lee’s Vision
What is XML all about?
  XML is needed due to the limitations of HTML and
   complexities of SGML
  It is an extensible markup language specified by the W3C
   (World Wide Web Consortium)
  Designed to make the interchange of structured documents
   over the Internet easier
  Key to XML used to be Document Type Definitions (DTDs)
      - Define the role of each element of text in a formal model
  XML schemas have now become critical to specify the structure
   of XML documents
      - XML schemas are also XML documents
XML Elements

  XML Statement
  John Smith is a Professor in Texas

  This can be expressed as follows:

          <Professor>
            <name>John Smith</name>
            <state>Texas</state>
          </Professor>
XML Elements

 Now suppose this data can be read by anyone. We can then augment
 the XML statement with an additional element, called access,
 as follows:

         <Professor>
           <name>John Smith</name>
           <state>Texas</state>
           <access>All, Read</access>
         </Professor>
XML Attributes

 Suppose we want to specify access based on attribute values.
 One way to specify such access is given below.

         Name = “John Smith”, Access = All, Read
         Salary = “60K”, Access = Administrator, Read, Write
         Department = “Security”, Access = All, Read

 Here we assume that everyone can read the name John Smith
 and the department Security.

 But only the administrator can read and write the salary attribute.
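The attribute-level policy above can be sketched as a filter that builds a subject's view of the document. This is a minimal sketch and not a standard mechanism. The table `POLICY` and the function `visible_attributes` are hypothetical names, and "All" is assumed to mean every subject.

```python
# Hypothetical attribute-level policy mirroring the example above:
# attribute -> (value, subjects allowed, privileges granted)
POLICY = {
    "Name": ("John Smith", {"All"}, {"Read"}),
    "Salary": ("60K", {"Administrator"}, {"Read", "Write"}),
    "Department": ("Security", {"All"}, {"Read"}),
}


def visible_attributes(role: str, privilege: str) -> dict:
    """Return the attributes a subject in `role` may access with `privilege`."""
    view = {}
    for attr, (value, subjects, privileges) in POLICY.items():
        # "All" admits every subject; otherwise the role must be listed explicitly
        if ("All" in subjects or role in subjects) and privilege in privileges:
            view[attr] = value
    return view


if __name__ == "__main__":
    print(visible_attributes("Staff", "Read"))
    print(visible_attributes("Administrator", "Write"))
```

With these rules, a subject with role "Staff" reading the document sees only Name and Department, while the administrator can also read and write Salary.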

XML DTDs

 DTDs essentially specify the structure of XML documents.

 Consider the following DTD for the Professor element, with child
 elements name and state (and the optional access element
 introduced above).

 This will be specified as:

 <!ELEMENT Professor (name, state, access?)>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT state (#PCDATA)>
 <!ELEMENT access (#PCDATA)>
XML Schema
 While DTDs were the early attempts to specify structure for
 XML documents, XML schemas are a far more elegant way to
 specify structure.

 Unlike DTDs, XML schemas are themselves written in XML syntax.

 Consider the following example:

 <xsd:complexType name="ProfessorType">
   <xsd:sequence>
     <xsd:element name="name" type="xsd:string"/>
     <xsd:element name="state" type="xsd:string"/>
     <xsd:element name="access" type="xsd:string"/>
   </xsd:sequence>
 </xsd:complexType>
XML Namespaces
  Namespaces are used for DISAMBIGUATION

  <CountryX:Academic-Institution
       xmlns:CountryX="… DTD"
       xmlns:USA="… DTD"
       xmlns:UK="… DTD">

    <USA:Title USA:Name="University of Texas at Dallas"
               USA:State="Texas">College</USA:Title>
    <UK:Title UK:Name="Cambridge University"
              UK:State="Cambs">University</UK:Title>

  </CountryX:Academic-Institution>
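The sketch below parses a version of the example with Python's standard `xml.etree.ElementTree` to show the disambiguation at work. The slide elides the namespace URIs, so the `http://example.org/...` URIs used here are hypothetical placeholders. Only the prefixes come from the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs standing in for the elided ones
NS = {
    "CountryX": "http://example.org/countryx",
    "USA": "http://example.org/usa",
    "UK": "http://example.org/uk",
}

DOC = """<CountryX:Academic-Institution
    xmlns:CountryX="http://example.org/countryx"
    xmlns:USA="http://example.org/usa"
    xmlns:UK="http://example.org/uk">
  <USA:Title USA:Name="University of Texas at Dallas" USA:State="Texas">College</USA:Title>
  <UK:Title UK:Name="Cambridge University" UK:State="Cambs">University</UK:Title>
</CountryX:Academic-Institution>"""


def institutions(xml_text: str) -> dict:
    """Map each country prefix to (title, name), resolving prefixes to namespace URIs."""
    root = ET.fromstring(xml_text)
    result = {}
    for prefix in ("USA", "UK"):
        uri = NS[prefix]
        elem = root.find(f"{prefix}:Title", NS)
        # ElementTree stores namespaced attributes under '{uri}local-name'
        result[prefix] = (elem.text, elem.get(f"{{{uri}}}Name"))
    return result


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    # Both elements are called "Title", but their expanded names differ
    print([child.tag for child in root])
    print(institutions(DOC))
```

Both child elements share the local name Title, but the parser keeps them apart because each one expands to a different `{namespace}Title` name.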
XML Databases

  Data is presented as XML documents
  Query language: XML-QL
  Query optimization
  Managing transactions on XML documents
  Metadata management: XML schemas/DTDs
  Access methods and index strategies
  XML security and integrity management
Credentials in XML

 <Professor credID="9" subID="16" CIssuer="2">
          <name>Alice Brown</name>
          <university>University of X</university>
          <department>CS</department>
          <research-group>Security</research-group>
 </Professor>

 <Secretary credID="12" subID="4" CIssuer="2">
          <name>John James</name>
          <university>University of X</university>
          <department>CS</department>
          <level>Senior</level>
 </Secretary>
Policies in XML
  <?xml version="1.0" encoding="utf-8"?>

      <policy-spec cred-expr="//Professor[department='CS']"
      target="annual_report.xml" path="//Patent[@Dept='CS']//node()" priv="VIEW"/>

      <policy-spec cred-expr="//Professor[department='CS']"
      target="annual_report.xml"
      path="//Patent[@Dept='EE']/Short-descr/node() | //Patent[@Dept='EE']/authors"
      priv="VIEW"/>

      <!-- further policy-spec elements elided -->


  Explanation: CS professors are entitled to access all the patents of their department.
  They are entitled to see only the short descriptions and authors of the patents of the EE department.
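A credential expression such as `//Professor[department='CS']` selects the subjects that a policy applies to. The sketch below evaluates such expressions against the credentials shown earlier, using Python's `xml.etree.ElementTree`. It relies on two assumptions. First, the credentials are wrapped in a hypothetical `<credentials>` root element. Second, ElementTree implements only a subset of XPath, so the leading `//` is written as `.//`.

```python
import xml.etree.ElementTree as ET

# The two credentials from the slide, wrapped in a hypothetical <credentials> root
CREDENTIALS = """<credentials>
  <Professor credID="9" subID="16" CIssuer="2">
    <name>Alice Brown</name>
    <university>University of X</university>
    <department>CS</department>
    <research-group>Security</research-group>
  </Professor>
  <Secretary credID="12" subID="4" CIssuer="2">
    <name>John James</name>
    <university>University of X</university>
    <department>CS</department>
    <level>Senior</level>
  </Secretary>
</credentials>"""


def matching_subjects(cred_expr: str, credentials_xml: str) -> list:
    """Return the subID of every credential selected by cred_expr.

    ElementTree supports only a subset of XPath, so a policy's
    //Professor[department='CS'] is written here as .//Professor[department='CS'].
    """
    root = ET.fromstring(credentials_xml)
    return [cred.get("subID") for cred in root.findall(cred_expr)]


if __name__ == "__main__":
    # Subjects covered by the two policy specs (CS professors)
    print(matching_subjects(".//Professor[department='CS']", CREDENTIALS))
```

Only Alice Brown's credential matches. She would then receive the view that each policy's path defines over annual_report.xml.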
Access Control Strategy
  Subjects request access to XML documents under two modes: Browsing and
   Authoring
      -   With browsing access, subjects can read/navigate documents
      -   Authoring access is needed to modify, delete or append to documents
  The access control module checks the policy base and applies the policy specs
  Views of the document are created based on credentials and policy specs
  In case of conflict, the least-privilege rule is enforced
  Works for Push/Pull modes
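One way to sketch the least-privilege rule is to take the weakest privilege among all the policy specs that apply to a request. This is a minimal sketch that rests on stated assumptions. The `Privilege` ordering is hypothetical, authoring is assumed to imply browsing, and a request with no applicable policy is denied (a closed policy).

```python
from enum import IntEnum


class Privilege(IntEnum):
    """Hypothetical ordering of access modes, weakest first."""
    NONE = 0
    BROWSE = 1   # read/navigate documents
    AUTHOR = 2   # modify, delete, append (assumed to include browsing)


def resolve(applicable: list) -> Privilege:
    """Least-privilege rule: among conflicting policy specs, keep the weakest grant."""
    if not applicable:
        return Privilege.NONE  # closed policy: no applicable spec means no access
    return min(applicable)


def is_allowed(requested: Privilege, applicable: list) -> bool:
    """A request succeeds only if the resolved privilege covers the requested mode."""
    return requested <= resolve(applicable)


if __name__ == "__main__":
    specs = [Privilege.AUTHOR, Privilege.BROWSE]  # two specs grant different modes
    print(resolve(specs).name)                    # BROWSE
    print(is_allowed(Privilege.AUTHOR, specs))    # False
```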
System Architecture for Access Control

  [Figure: Pull/Query and Push/Result flows; X-Access and X-Admin modules; policy base and credential base]
Third-Party Architecture

  The Owner is the producer of information. It specifies access
   control policies
  The Publisher is responsible for managing (a portion of) the Owner
   information and answering subject queries
  Goal: Untrusted Publisher with respect to Authenticity and …

  [Figure: Owner with XML source, policy base and credential base; SE-XML; Publisher; subject credentials, query and reply document]
Inference/Privacy Control
  [Figure: Interface to the Semantic Web; Inference Engine/Rules Processor; semantic web engine; XML, RDF, OWL documents; web pages]
Example Policies
  Temporal Access Control
     - After 1/1/05, only doctors have access to medical records
  Role-based Access Control
     - Manager has access to salary information
     - Project leader has access to project budgets, but he does not
       have access to salary information
     - What happens if the manager is also the project leader?
  Positive and Negative Authorizations
     - John has write access to EMP
     - John does not have read access to DEPT
     - John does not have write access to Salary attribute in EMP
     - How are conflicts resolved?
Privacy Policies

   Privacy constraints processing
      - Simple Constraint: an attribute of a document is private
      - Content-based constraint: If document contains information
        about X, then it is private
      - Association-based constraint: two or more documents taken
        together are private; individually each document is public
      - Release constraint: after X is released, Y becomes private
   Augment a database system with a privacy controller for constraint
    processing
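The four constraint types can be sketched as checks that a privacy controller runs before it releases a document. This is a hypothetical sketch: the class and field names are illustrative. To keep it short, a simple constraint refuses the whole document, whereas a real controller might withhold only the private attribute.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyController:
    """Hypothetical controller that checks the four constraint types before release."""
    private_attributes: set = field(default_factory=set)   # simple constraints
    private_topics: set = field(default_factory=set)       # content-based constraints
    private_groups: list = field(default_factory=list)     # association-based: sets of doc ids
    release_rules: dict = field(default_factory=dict)      # release constraints: X -> Y
    released: set = field(default_factory=set)
    now_private: set = field(default_factory=set)          # made private by an earlier release

    def may_release(self, doc_id: str, attributes=(), topics=()) -> bool:
        if doc_id in self.now_private:
            return False  # a release constraint has already fired
        if self.private_attributes & set(attributes):
            return False  # simple constraint
        if self.private_topics & set(topics):
            return False  # content-based constraint
        for group in self.private_groups:
            # Association-based: refuse the release that would complete a private group
            if doc_id in group and (group - {doc_id}) <= self.released:
                return False
        return True

    def release(self, doc_id: str, attributes=(), topics=()) -> bool:
        """Release a document if allowed, then apply any release constraint it triggers."""
        if not self.may_release(doc_id, attributes, topics):
            return False
        self.released.add(doc_id)
        if doc_id in self.release_rules:
            self.now_private.add(self.release_rules[doc_id])
        return True


if __name__ == "__main__":
    pc = PrivacyController(private_groups=[{"A", "B"}], release_rules={"X": "Y"})
    print(pc.release("A"), pc.release("B"))  # B would complete the private pair
    print(pc.release("X"), pc.release("Y"))  # releasing X makes Y private
```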
Why RDF?
  XML cannot be used to specify semantics
  Example:
    - Professor is a subclass of Academic Staff
    - Professor inherits all properties of Academic Staff
  RDF was specified so that the inadequacies of XML could be
   addressed
  RDF uses XML syntax
  Additional constructs are needed for RDF

  Resource Description Framework is the essence of the
   semantic web
  Adds semantics with the use of ontologies, XML syntax
  RDF Concepts
     - Basic Model
         Resources, Properties and Statements
     - Container Model
         Bag, Sequence and Alternative
RDF Basics
  Resource: Everything is a resource
    - Person, Vehicle, etc.
  Property: properties describe relationships between resources
    -  E.g., Invented
  Statement: (Object, Property, Value) triple
     - Berners-Lee invented the Semantic Web
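The triple model can be sketched as a small Python data structure. This is only an illustration. `Statement` and `as_graph` are hypothetical names, and the field names follow the slide's (Object, Property, Value) wording rather than any RDF library. The second statement is made up to show that a value can itself be a resource, which is what links statements into a graph.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """An RDF statement in the (Object, Property, Value) form used above."""
    resource: str  # the object (resource) being described
    prop: str      # the property relating it to the value
    value: str     # a literal or another resource


def as_graph(statements):
    """Group statements by resource: resource -> [(property, value), ...]."""
    graph = defaultdict(list)
    for s in statements:
        graph[s.resource].append((s.prop, s.value))
    return dict(graph)


facts = [
    Statement("BernersLee", "invented", "SemanticWeb"),
    Statement("SemanticWeb", "builtOn", "XML"),  # hypothetical second statement
]

if __name__ == "__main__":
    print(as_graph(facts))
```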
RDF Container Model
  Bag: unordered container, may contain multiple occurrences
     - rdf:Bag
  Seq: ordered container, may contain multiple occurrences
     - rdf:Seq
  Alt: a set of alternatives
     - rdf:Alt
RDF Specification

   <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http:// - - -"
      xmlns:uni="http:// - - - -">

   <rdf:Description rdf:about="949352">
     <uni:name>Berners-Lee</uni:name>
     <uni:title>Professor</uni:title>
   </rdf:Description>

   <rdf:Description rdf:about="ZZZ">
           <uni:bookname>semantic web</uni:bookname>
           <uni:authoredby>Berners-Lee</uni:authoredby>
   </rdf:Description>

   </rdf:RDF>
RDF Specification
  RDF specifications have been given for Attributes, Types
   Nesting, Containers, etc.
  How can security policies be included in the specification?
  Example: consider the statement “Berners-Lee is the author
   of the book Semantic Web”
  Do we allow access to the connection between author and
   book? Do we allow access to the connection but not to the
   author name and book name?
RDF Policy Specification
   <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http:// - - -"
      xmlns:uni="http:// - - - -">

   <rdf:Description rdf:about="949352">
     <uni:name>Berners-Lee</uni:name>
     <uni:title>Professor</uni:title>
   Level = L1
   </rdf:Description>

   <rdf:Description rdf:about="ZZZ">
            <uni:bookname>semantic web</uni:bookname>
            <uni:authoredby>Berners-Lee</uni:authoredby>
   Level = L2
   </rdf:Description>

   </rdf:RDF>
RDF Schema
  Need RDF Schema to specify statements such as professor is
  a subclass of academic staff

 <rdfs:Class rdf:ID="professor">
 <rdfs:comment>
 The class of Professors.
 All professors are Academic Staff Members.
 </rdfs:comment>
 <rdfs:subClassOf rdf:resource="#academicStaffMember"/>
 </rdfs:Class>
RDF Schema: Security Policies
  How can security policies be specified?

 <rdfs:Class rdf:ID="professor">
 <rdfs:comment>
 The class of Professors.
 All professors are Academic Staff Members.
 </rdfs:comment>
 <rdfs:subClassOf rdf:resource="#academicStaffMember"/>
 Level = L
 </rdfs:Class>
RDF Axiomatic Semantics
  First order logic to specify formulas and inferencing
    - Built in functions (First) and predicates (Type)
    - Modus Ponens
    - From A and If A then B, deduce B
  Example: All containers are Resources
    - Type(?C, Container) → Type(?C, Resource)
    - If we have Type(A, Container), then we can infer
      Type(A, Resource)
RDF Inferencing
  While first-order logic provides a proof system, reasoning in full
   first-order logic is computationally infeasible in general
  As a result, Horn clause logic was developed for logic
   programming; this is still computationally expensive
  RDF uses if-then rules

  IF E contains the triples (?u, rdfs:subClassOf, ?v)
 and (?v, rdfs:subClassOf, ?w)
 THEN E also contains the triple (?u, rdfs:subClassOf, ?w)

 That is, if u is a subclass of v, and v is a subclass of w, then u is
  a subclass of w
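The if-then rule above is applied repeatedly until no new triples can be derived. The sketch below performs this naive forward chaining over a set of Python tuples. It also adds one more rule in the same style, assumed here for illustration: if x has type c and c is a subclass of d, then x has type d (the RDFS semantics includes such a rule). The class and individual names in the usage are hypothetical.

```python
def rdfs_closure(triples):
    """Forward-chain two if-then rules until a fixpoint is reached.

    Rule 1: (?u, rdfs:subClassOf, ?v) and (?v, rdfs:subClassOf, ?w)
            => (?u, rdfs:subClassOf, ?w)
    Rule 2: (?x, rdf:type, ?c) and (?c, rdfs:subClassOf, ?d)
            => (?x, rdf:type, ?d)
    """
    closure = set(triples)
    while True:
        subclass = {(s, o) for (s, p, o) in closure if p == "rdfs:subClassOf"}
        derived = {(u, "rdfs:subClassOf", w)
                   for (u, v) in subclass for (v2, w) in subclass if v == v2}
        derived |= {(x, "rdf:type", d)
                    for (x, p, c) in closure if p == "rdf:type"
                    for (c2, d) in subclass if c == c2}
        if derived <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= derived


if __name__ == "__main__":
    kb = {
        ("professor", "rdfs:subClassOf", "academicStaffMember"),
        ("academicStaffMember", "rdfs:subClassOf", "staffMember"),  # hypothetical
        ("alice", "rdf:type", "professor"),                          # hypothetical
    }
    for triple in sorted(rdfs_closure(kb) - kb):
        print(triple)
```

The loop always terminates, because the rules only combine terms that already appear in the input, so the set of possible new triples is finite.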
RDF Query
  One can query RDF using XML query languages, but this is very
   difficult, as RDF is much richer than XML
  Is there an analogy between, say, XQuery and a query
   language for RDF?
  RQL, an SQL-like language, has been developed for RDF
  Select from “RDF document” where some “condition”
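The select-from-where idea can be illustrated with a tiny evaluator over triples. This is not RQL syntax. It is a hypothetical sketch in which a query names the variables to return and gives a conjunction of triple patterns, and terms starting with `?` are treated as variables. The data are the triples from the RDF specification example above.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None on a mismatch."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable matches anything but must agree with earlier bindings
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(variables, triples, where):
    """Return one tuple of variable values per way of satisfying all where-patterns."""
    solutions = [{}]
    for pattern in where:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in solutions]


TRIPLES = [
    ("949352", "uni:name", "Berners-Lee"),
    ("949352", "uni:title", "Professor"),
    ("ZZZ", "uni:bookname", "semantic web"),
    ("ZZZ", "uni:authoredby", "Berners-Lee"),
]

# "Select each book name with the title of its author", as triple patterns
WHERE = [
    ("?b", "uni:bookname", "?book"),
    ("?b", "uni:authoredby", "?n"),
    ("?p", "uni:name", "?n"),
    ("?p", "uni:title", "?title"),
]

if __name__ == "__main__":
    print(select(["?book", "?title"], TRIPLES, WHERE))
```

The query joins the book resource to its author through the shared value of `?n`. This kind of join is awkward to express when the same data is treated as plain XML.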
Ontologies
  Common definitions for any entity, person or thing
  Several ontologies have been defined and available for use
  Defining common ontology for an entity is a challenge
  Mappings have to be developed for multiple ontologies
  Specific languages have been developed for ontologies
Why RDF is not sufficient?
   RDF was developed because XML is not sufficient to specify
    semantics
      - E.g., class/subclass relationship
   RDF has issues also
     -  Cannot express several other properties such as union,
        intersection, relationships, etc.
   Need a richer language
   Ontology languages were developed by the semantic web
    community for this purpose
   Essentially RDF is not sufficient to specify ontologies
OWL: Background
  It's a language for ontologies and relies on RDF
  DARPA (Defense Advanced Research Projects Agency) developed the
   early language DAML (DARPA Agent Markup Language)
  European researchers developed OIL (Ontology Inference Layer)
  DAML+OIL combines both and was the starting point for OWL
  OWL was developed by W3C
OWL Features
  Subclass relationship
  Class membership
  Equivalence of classes
  Classification
  Consistency (e.g., detecting a contradiction such as: x is an instance
   of A, A is a subclass of B, but x is not an instance of B)
  Three sublanguages of OWL: OWL Full, OWL DL, OWL Lite
  Automated tools for managing ontologies
     - Ontology engineering
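The consistency example can be sketched in two steps. First, infer every class an individual belongs to by following subclass links. Second, flag any inferred class that the individual is also declared not to belong to. This is a toy illustration and not how a full OWL reasoner works. The function names and the data layout are hypothetical.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all its direct and indirect superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inconsistencies(instance_of, not_instance_of, subclass_of):
    """List (individual, class) pairs whose membership is both inferred and denied."""
    problems = []
    for x, classes in instance_of.items():
        inferred = set()
        for c in classes:
            inferred |= superclasses(c, subclass_of)  # classification step
        for c in sorted(inferred & not_instance_of.get(x, set())):
            problems.append((x, c))
    return problems


if __name__ == "__main__":
    # x is an A, A is a subclass of B, but x is declared not to be a B
    print(inconsistencies({"x": ["A"]}, {"x": {"B"}}, {"A": ["B"]}))
```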
OWL Specification (e.g., Classes)
 <owl:Class rdf:about="#associateProfessor">
   <owl:disjointWith rdf:resource="#professor"/>
   <owl:disjointWith rdf:resource="#assistantProfessor"/>
 </owl:Class>

 <owl:Class rdf:ID="faculty">
   <owl:equivalentClass rdf:resource="#academicStaffMember"/>
 </owl:Class>

 Faculty and Academic Staff Member are the same (equivalent classes)
 Associate Professor is not a professor
 Associate professor is not an Assistant professor
OWL Specification (e.g., Property)
 Courses are taught by Academic staff members

 <owl:ObjectProperty rdf:about="#isTaughtBy">
   <rdfs:domain rdf:resource="#course"/>
   <rdfs:range rdf:resource="#academicStaffMember"/>
   <rdfs:subPropertyOf rdf:resource="#involves"/>
 </owl:ObjectProperty>
OWL Specification (e.g., Property Restriction)
 All first year courses are taught only by professors

 <owl:Class rdf:about="#firstYearCourse">
   <rdfs:subClassOf>
     <owl:Restriction>
       <owl:onProperty rdf:resource="#isTaughtBy"/>
       <owl:allValuesFrom rdf:resource="#professor"/>
     </owl:Restriction>
   </rdfs:subClassOf>
 </owl:Class>
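OWL uses an open-world reading. The restriction above therefore does not reject a teacher who is not known to be a professor. Instead, it lets a reasoner infer that every teacher of a first-year course is a professor. A contradiction arises only if that teacher already belongs to a class declared disjoint with professor, such as associateProfessor in the earlier example. The sketch below illustrates this. The course and teacher names are hypothetical, and the function is a simplified illustration rather than a real OWL reasoner.

```python
def apply_all_values_from(types, taught_by, disjoint_with,
                          restricted_class="firstYearCourse",
                          filler_class="professor"):
    """Apply 'restricted_class subClassOf (isTaughtBy allValuesFrom filler_class)'.

    types:         individual -> set of asserted classes
    taught_by:     course -> list of teachers (the isTaughtBy property)
    disjoint_with: class -> set of classes declared disjoint with it
    Returns the updated class memberships and any (course, teacher) clashes.
    """
    inferred = {ind: set(classes) for ind, classes in types.items()}
    clashes = []
    for course, teachers in taught_by.items():
        if restricted_class not in inferred.get(course, set()):
            continue  # the restriction only constrains first-year courses
        for teacher in teachers:
            classes = inferred.setdefault(teacher, set())
            classes.add(filler_class)  # open world: infer membership, do not reject
            if classes & disjoint_with.get(filler_class, set()):
                clashes.append((course, teacher))
    return inferred, clashes


if __name__ == "__main__":
    asserted = {"cs101": {"firstYearCourse"}, "lee": {"associateProfessor"}}
    disjoint = {"professor": {"associateProfessor", "assistantProfessor"}}
    memberships, found = apply_all_values_from(asserted, {"cs101": ["smith", "lee"]}, disjoint)
    print(memberships["smith"], found)
```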
Policies in OWL: Example
 <owl:Class rdf:about="#associateProfessor">
   <owl:disjointWith rdf:resource="#professor"/>
   <owl:disjointWith rdf:resource="#assistantProfessor"/>
 Level = L1
 </owl:Class>

 <owl:Class rdf:ID="faculty">
   <owl:equivalentClass rdf:resource="#academicStaffMember"/>
 Level = L2
 </owl:Class>
Logic and Inference
  First order predicate logic
  High level language to express knowledge
  Well understood semantics
  Logical consequence - inference
  Proof systems exist
  Sound and complete
  OWL is based on a subset of first-order logic: description logic
Why Rules?
  RDF is built on XML and OWL is built on RDF
  We can express subclass relationships in RDF; additional
   relationships can be expressed in OWL
  However, the reasoning power of OWL is still limited
  Hence the need for rules, and subsequently for a markup language
   for rules that machines can understand
Rule Markup
  The various components of logic are expressed in the Rule Markup
   Language (RuleML)
  Both monotonic and nonmonotonic rules can be represented

  Example: representation of the fact P(a), "a is a parent"
Types of Application

  Horizontal Information Products at Elsevier: Integration
  Data integration at Audi: Integration
  Skill finding at Swiss Life: Search
  Think Tank Portal at EnterSearch: Knowledge management
  E-Learning: Knowledge management
  Multimedia Collection at Scotland Yard: Searching
  Online Procurement at Daimler Chrysler: E-Business
  Device Interoperability at Nokia: Interoperability
Horizontal Information Products at Elsevier

  Elsevier is a publishing company based in Amsterdam
     - E.g., publisher of the journal Computer Standards & Interfaces,
       which has papers on all kinds of computer-related standards
  Currently the journals and books are grouped by topics such as
   operating systems, databases, etc. (or at a higher level, Biology,
   Chemistry, etc.)
  Where, then, do we put the journal Computer Standards &
   Interfaces?
  Need horizontal groupings also
Horizontal Information Products at Elsevier

  Semantic web technologies are being used by Elsevier
     - RDF for document representation
     - RDF for ontologies
     - Query language based on RDF to query the documents and the
     - E.g. Life Science Thesaurus EMTREE
     - Other publishing companies are following in Elsevier's direction
Data Integration at Audi

  Integrate the data in multiple data sources to provide better
   customer relationship management and other services to improve
  The databases are disparate and heterogeneous
  Many current operations are carried out manually
  This is expensive and leads to missed opportunities
Data Integration at Audi

  Ontologies are being specified to address semantic heterogeneity
  E.g., SLR is a type of camera; one application calls it SLR, another
   application calls it Olympus-OM-10
  When the latter application encounters the term SLR, it will query the
   ontology and determine that SLR is a camera
  Details are given in Chapter 6
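The SLR example can be sketched as a lookup that follows broader-concept links in a shared ontology. The `BROADER` table and the `is_a` function are hypothetical. A real deployment would query an ontology store rather than a Python dictionary.

```python
# Hypothetical fragment of a shared product ontology: term -> broader concept
BROADER = {
    "Olympus-OM-10": "SLR",
    "SLR": "camera",
}


def is_a(term: str, concept: str, broader: dict = BROADER) -> bool:
    """Follow broader-concept links upward to decide whether term is a kind of concept."""
    seen = set()
    while term is not None and term not in seen:  # the seen-set guards against cycles
        if term == concept:
            return True
        seen.add(term)
        term = broader.get(term)
    return False


if __name__ == "__main__":
    # The application that says "Olympus-OM-10" meets the term "SLR"
    print(is_a("SLR", "camera"), is_a("Olympus-OM-10", "SLR"))
```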
Skill Finding at Swiss Life

  Swiss Life is an insurance company that developed a system to find
   all the skills in the company
     - E.g., John's skills are on data management, ontology
  Challenging problem as people have multiple skills for different
  Need the following capabilities
     - Cross listing of skills
     - Querying skills
     - ----
Skill Finding at Swiss Life

  Ontologies are being developed to specify the skills and query
   languages to query the ontologies
  E.g.
     <owl:Class rdf:ID="Publishing">
       <rdfs:subClassOf rdf:resource="#Skills"/>
     </owl:Class>

     <owl:Class rdf:ID="Skills">
       <!-- … -->
     </owl:Class>
Think Tank Portal at EnterSearch

  EnterSearch is a consortium of corporations in Europe that provide
   IT for the energy companies
  Similar to MCC in Austin TX
  The EnterSearch portal currently describes the various research
   projects, papers, etc.
  XML representation is used for describing the web content
  Need to represent semantics so that the corporations can get
   answers to useful questions of the form
     - “where do I put my computing resources to solve a problem?”
Think Tank Portal at EnterSearch

  Semantic web technologies are being utilized; in particular,
   ontologies are developed for the following
     - Hardware
     - Software
     - Communications
     - E-Commerce
     - Agents
     - Market/Auction
     - Resource Allocation
     - ----
E-Learning

  With the Internet and the web, we now have on-line universities,
   course offerings, tutoring etc.
  Students should have the choice of selecting various courses in the
   order they want, provided they take the prerequisites
  Semantic web technologies enable flexible access as well as
   integration of various data sources and processes to enable learning
  Ontologies are being developed for learning applications
     - E.g., Contents of the courses
     - Description of the courses etc.
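Once an ontology records course prerequisites, the rule that students may choose any order, provided they take the prerequisites, can be checked mechanically. The sketch below assumes a simple hypothetical mapping from each course to its prerequisite courses. The course names are made up.

```python
def check_order(plan, prerequisites):
    """Check a student's chosen course order against prerequisite requirements.

    Returns (True, None, set()) if every course comes after its prerequisites,
    otherwise (False, course, missing) for the first course taken too early.
    """
    taken = set()
    for course in plan:
        missing = set(prerequisites.get(course, ())) - taken
        if missing:
            return False, course, missing
        taken.add(course)
    return True, None, set()


if __name__ == "__main__":
    prereqs = {"semantic-web": {"xml", "databases"}}  # hypothetical courses
    print(check_order(["xml", "databases", "semantic-web"], prereqs))
    print(check_order(["xml", "semantic-web"], prereqs))
```

Any order that respects the prerequisites is accepted. The student stays free to interleave courses, and only orderings that violate a prerequisite are reported.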
Multimedia Collection Indexing at Scotland Yard

  Scotland Yard uses a database to keep track of the antiques that are
  While sophisticated indexing techniques have been developed, there
   is a problem with semantics
  E.g., Red cushioned chair could also be described as Queen Anne
  Ontologies for describing semantics
  Need more details of the project
On-line Procurement at Daimler Chrysler
  Daimler Chrysler interacts with numerous suppliers to develop a
  Standards developed by RosettaNet for e-business are being used
   for interoperability
     - XML syntax only; no semantics of the product descriptions
  Ontologies for describing the various product descriptions including
   the semantics are the long term goal for seamless integration of the
   supply chain operation
  Need more details of the project
Device Interoperability at Nokia

  Nokia's objective is to integrate multiple devices (cell phone, PDA,
   cars, laptop, etc.) to provide a pervasive computing environment
  The objective is to locate the various services and understand the
   different devices and their functions
     - Need to describe the various services
     - Current technology provides syntactic descriptions
  Semantic web technologies, through ontologies, enable understanding
   of the devices and reasoning about their functions
  Need more details of the project
Common Threads and Challenges

  Common Threads
    - Building Ontologies for Semantics
    - XML for Syntax
  Challenges
    - Scalability, Resolvability
    - Security policy specification, Securing the documents and
    - Developing applications for secure semantic web technologies
    - Automated tools for ontology management
What is Ontology Engineering?
  Tools and Techniques to
    - Create Ontologies
    - Specify Ontologies
    - Maintain Ontologies
    - Query Ontologies
    - Evolve Ontologies
    - Reuse Ontologies
    - Incorporate features such as security, data quality, integrity
Manual Construction of Ontologies
  Determine Scope
  Consider Reuse
  Enumerate Terms
  Define Taxonomy
  Define Properties
  Define facets
  Define Instances
  Check for Anomalies
Reusing Existing Ontologies
  The goal is not to reinvent the wheel
  Several ontologies have been developed for different domains
  Codified Bodies of Expert Knowledge
  Integrated Vocabularies
  Upper Level Ontologies
  Topic Hierarchies
  Linguistic Resources
  Ontology Libraries
Semi-Automatic Methods for Ontology Construction
  Much of the research is focusing on developing ontologies
   using tools from multiple heterogeneous data sources
  Essentially extracting concepts and expanding on concepts
   from the data sources
  Uses combination of data integration, metadata extraction,
   and machine learning techniques
  E.g. Clustering of concepts, Classification of concepts etc.
  The textbook describes semantic web knowledge management
Web Services

  Web services can be utilized by any of the other applications
   discussed in this unit
  Web services are invoked to carry out functions on the web,
   including finding locations, searching for documents, etc.
  Simple services and compound services
  Three components to the service
     - Service profile: description of the service (what it does)
     - Service model: how it does it
     - Service grounding: protocol for invoking the service
  E.g.,
     <profile:ServiceProvider rdf:ID="Sportsnews">
       <!-- … -->
     </profile:ServiceProvider>
Web service architecture



  Secure Web Service Architecture
       Confidentiality, Authenticity, Integrity

  [Figure: service requestor and service provider; query; UDDI registry with BusinessService and PublisherAssertion]
  Need tools for developing semantic web technologies
     - XML documents, RDF documents, Ontologies, etc.
  How to integrate the multiple ontologies and tools?
  Role of agents: agents are processes that reason with semantic
   web technologies
  Semantic web services, data mining, knowledge management