The Semantic Web


Is it “The Web shortcut to A.I.”?

       Or more? Or less?
Evolution of World Wide Web

[Figure: timeline chart. Horizontal axis: richness of social connections. Vertical axis: richness of data connections.]
   Eras: PC era, WWW era, Semantic Web era
   Phases: Desktop computing (1980-1990), Web 1.0 (1990-2000), Web 2.0 (2000-2010), Web 3.0 (2010-2020)
   Technologies along the curve, earliest to latest: Email, FTP, file systems, SGML, file servers, SQL, databases, Gopher, HTTP/HTML, groupware, OO/Java, intranet, XML, portals, SOAP, blogs/wikis, AJAX/JSON, social networks, ATOM, RDF/OWL, cloud computing & SaaS, SPARQL, WWW database, Linked Data, personal agents, rule interchange

Motivation for Semantic Web

[Figure: two panels.]
   Before Semantic Web: creators and users of the WWW and beyond are connected through Web content.
   Semantic Web structure: creators and users are still connected through Web content, now accompanied by a Semantic Web layer made up of semantic annotations, ontologies, logical support, languages, tools, and applications/services.

Linked Data: The World Wide
Web database
The Semantic Web
   A Vision Of Possibilities
   “The Semantic Web is an extension of
    the current web in which information
    is given well-defined meaning, better
    enabling computers and people to
    work in cooperation.”
   -- Tim Berners-Lee, James Hendler and Ora Lassila,
    The Semantic Web, Scientific American, May 2001


Semantic Web
   In the Semantic Web we will need:
       Machines talking to machines – semantics need to
        be unambiguously declared
       Joined-up data – enabling complex tasks based on
        information from various sources
       Wide scope – from, say, home to government to
        commerce
       Trust – both in data and who is saying it


   This is not going to be easily achieved

   Semantic Web vs Semantic Technologies
   Semantic Web:
       Semantic Web formats (RDF, OWL, etc.)
       Query language (SPARQL)
       Rules language (RIF)
       Web page markup language (RDFa)
       Triple/quad stores
   Semantic technologies:
       Natural-language processing
       Data mining/machine learning
       Artificial intelligence/expert systems
       Classification
       Semantic search
    Semantic web applications
   Examples:
       Personal information management (Chandler)
       Social networking (FOAF)
       Information syndication (RSS,PRISM)
       Library/museum data (Dublin Core, Harmony)
       Network security and configuration (SWAD-E)
Interesting quotes
   “Knowledge representation (…) is
    clearly a good idea, and some very nice
    demonstrations exist, but it has not yet
    changed the world.”

Meaning: Of course the Semantic Web
 will do that. Will it?
Today's web
   It is designed for human consumption
   Information retrieval is mainly
    supported by keyword-based search
    engines
   Some problems with information
    retrieval:
      High recall, low precision

      Low or no recall

      Results are highly sensitive to
       vocabulary
            But what about machines?

[Figure: diagram with the labels "tell" and "register".]

            Machines still have a very minimal
            understanding of text and images.
   Machine-friendly data
       Natural language
            as seen by a person:  Li Ding is a person
            as seen by a machine: an opaque string of symbols
       XML – represents structure
            as seen by a person:  <person>Li Ding</person>
            as seen by a machine: opaque symbols inside tags whose names are equally opaque
       Semantic Web – represents more semantics
            represents structure
            enables a common vocabulary
            associates symbols with a logical interpretation for
             inference
  The Semantic Web
  XML    Customized tags, like:
   <dog>Nena</dog>

+ RDF    Relations, in triples, like:
    (Nena) (is_dog_of) (Ahmed/Said)

+ Ontologies        Hierarchies of concepts, like
    mammal -> canine -> Coton de Tulear -> Nena

+ Inference rules            Like:
    If (person) (owns) (dog), then (person) (cares_for) (dog)


= Semantic Web!
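
The four ingredients above can be sketched in a few lines of Python. This is only an illustrative toy, not how any Semantic Web toolkit works internally. The `owns` fact, the choice of `Ahmed` as owner, and the use of `canine` as a stand-in for "dog" are assumptions added for the example.

```python
# Facts as (subject, predicate, object) triples, as in the RDF line above.
# The "owns" fact is an assumption added so the rule has something to fire on.
triples = {
    ("Nena", "is_dog_of", "Ahmed"),
    ("Ahmed", "owns", "Nena"),
}

# Ontology: each concept points to its more general parent concept.
parent = {
    "Nena": "Coton de Tulear",
    "Coton de Tulear": "canine",
    "canine": "mammal",
}


def ancestors(concept):
    """Return every more general concept reachable from `concept`."""
    found = []
    while concept in parent:
        concept = parent[concept]
        found.append(concept)
    return found


def is_a(thing, concept):
    """True if `thing` falls under `concept` in the hierarchy."""
    return concept in ancestors(thing)


def apply_care_rule(facts):
    """Rule: if (person) (owns) (dog), then (person) (cares_for) (dog)."""
    inferred = {
        (s, "cares_for", o)
        for (s, p, o) in facts
        if p == "owns" and is_a(o, "canine")
    }
    return facts | inferred


all_facts = apply_care_rule(triples)
print(ancestors("Nena"))
print(("Ahmed", "cares_for", "Nena") in all_facts)
```

The hierarchy lookup is what lets the rule apply to Nena even though no triple states directly that Nena is a dog.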
             Semantic Web Layers

[Figure: the W3C Semantic Web stack, with the Web aspect (HTTP) at the bottom and the semantic aspect above it.]
Image source: http://en.wikipedia.org/wiki/Image:W3c_semantic_web_stack.jpg

           "The Semantic Web is an extension of the current web in which information is
           given well-defined meaning, better enabling computers and people to work in
           cooperation."    – Berners-Lee, Hendler & Lassila, Scientific American, 2001
     XML (eXtensible Markup Language)

 Standard for information and exchange

 XML v. HTML
    HTML: restricted set of tags, e.g. <TABLE>, <H1>, <B>, etc.
    XML: you can create your own tags
 Selena Sol (2000) highlights four major benefits of using XML:
    XML separates data from presentation, so changes to how data is
    displayed do not affect the XML data itself;
    Searching XML documents becomes easier because search engines can
    parse the descriptive tags;
    XML tags are human-readable: even a person with no knowledge of XML
    can still read an XML document;
    Complex structures and relations of data can be encoded in XML.

               XML: An Example


• XML is a semi-structured language
     <Book Id="B105">
       <Title> Topics in Optimal Transportation </Title>
       <Author>
            <Name> Cedric Villani </Name>
       </Author>
       <Publisher>
            <Name> American Mathematical Society </Name>
            <Place> New York </Place>
       </Publisher>
     </Book>
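
Because the tags describe the data, a program can pull out individual fields by name. The sketch below uses Python's standard `xml.etree.ElementTree` module on a cleaned copy of the book record (padding inside the tags removed). The `record` dictionary and its keys are illustrative choices, not part of the original example.

```python
import xml.etree.ElementTree as ET

# Cleaned copy of the book record from the slide.
BOOK_XML = """
<Book Id="B105">
  <Title>Topics in Optimal Transportation</Title>
  <Author>
    <Name>Cedric Villani</Name>
  </Author>
  <Publisher>
    <Name>American Mathematical Society</Name>
    <Place>New York</Place>
  </Publisher>
</Book>
"""

book = ET.fromstring(BOOK_XML)

# Attributes are read with .get(); nested elements with a simple path.
record = {
    "id": book.get("Id"),
    "title": book.findtext("Title"),
    "author": book.findtext("Author/Name"),
    "publisher": book.findtext("Publisher/Name"),
}
print(record)
```

Note that the program still has to know in advance that `Author/Name` means "the author's name". XML gives structure, not meaning, which is the gap RDF and ontologies are meant to fill.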
RDF Motivation
   The Resource Description Framework (RDF)
    is a language for representing information
    about resources in the World Wide Web.
   RDF is intended for situations in which this
    information needs to be processed by
    applications, rather than being only displayed
    to people.
   RDF is based on the idea of identifying things
    using Web identifiers (URIs).
                The Semantic Web is simple
        Don't say "colour", say <http://example.com/2002/std6#col>
              Each URI denotes a concept
              URIs are connected by triples
              Machines read data as a directed RDF graph

[Figure: a relational database compared with RDF (Resource Description Framework).]

Source: Tim Berners-Lee, Putting the Web back into Semantic Web, ISWC2005 Keynote
      RDF Basic Concepts
    Example

    "Imagine trying to state that someone named Ahmed
    Hassan created a particular Web page."
http://www.example.org/index.html has a creator whose value is Ahmed Hassan

   the thing the statement describes (the web page's URL)
   a specific property of the thing (e.g. creator)
   the value of that property, i.e. what the statement
    actually asserts (Ahmed Hassan)
     RDF Basic Concepts
RDF terminology
   the part that identifies the thing the statement is about
    is called the subject
   the part that identifies the property is called the predicate
   the part that identifies the value of the property is called
    the object

                      Subject --Predicate--> Object
http://www.example.org/index.html has a creator whose value is Ahmed Hassan

   the subject is the URL
    "http://www.example.org/index.html"
   the predicate is the word "creator"
   the object is the name "Ahmed Hassan"
      RDF Model
      As mentioned:
             RDF makes statements about resources
             Each statement consists of a subject, a predicate and an object

http://www.example.org/index.html has a creator whose value is Ahmed Hassan



   subject:   http://www.example.org/index.html
   predicate: http://purl.org/dc/elements/1.1/creator
   object:    http://www.example.org/staffid/5232
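
As a rough illustration, the statement can be held in a program as a three-field record and written out in N-Triples, the simplest line-based RDF syntax. The `Triple` class and `to_ntriples` helper are names invented for this sketch, and the serializer only handles the case shown here, where all three terms are URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; every field here holds a URI string."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


statement = Triple(
    "http://www.example.org/index.html",
    "http://purl.org/dc/elements/1.1/creator",
    "http://www.example.org/staffid/5232",
)
print(to_ntriples(statement))
```

Using a URI for the object, rather than the plain string "Ahmed Hassan", lets other statements say more about the same person.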
  RDF Basic Concepts
To make these statements machine-processable,
two things are needed:
      a system of machine-processable identifiers (for subjects,
       predicates and objects) without any possibility of confusion between
       similar-looking identifiers
  Uniform Resource Identifiers (URIs) make it possible to identify and
  uniquely name things, even if they have no
  network-accessible location.

      a machine-processable language for representing these statements
       and exchanging them between machines

  RDF defines an XML markup language, named RDF/XML,
  for representing RDF statements.
       RDF Syntax
  subject:   http://www.example.org/index.html
  predicate: http://www.example.org/terms/creation-date
  object:    August 16, 1999



<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:exterms="http://www.example.org/terms/">


   <rdf:Description rdf:about="http://www.example.org/index.html">
     <exterms:creation-date>August 16, 1999</exterms:creation-date>
   </rdf:Description>


</rdf:RDF>
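
To show how the RDF/XML above maps back to a triple, here is a minimal sketch using only Python's standard library. It handles just the simple pattern on this slide (an `rdf:Description` with `rdf:about` and literal-valued child elements) and is not a general RDF/XML parser. The function name `simple_triples` is made up for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Extract (subject, predicate, literal) triples from flat RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; joining the
            # two parts yields the full property URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples


for triple in simple_triples(RDF_XML):
    print(triple)
```

The namespace prefix `exterms:` disappears in the result: what matters to a machine is the full property URI, not the abbreviation used in the file.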
RDF Developments
   We have seen that:
       RDF looks complex
       There are still some uncertain areas
   Let’s now look at:
       A simple RDF application
       Browser support
       Project work
       Related work which may:
            require the Semantic Web
            be used to build the Semantic Web
        A Lightweight RDF Application


   RSS (RDF Site Summary):
     Example of a lightweight RDF application

     A format for news syndication

     Worth looking at for:

              News syndication
              Gaining experience of an RDF application
       See:
          <http://blogspace.com/rss/>
          <http://www.oreillynet.com/rss/>
          <http://www.webreference.com/authoring/languages/xml/rss/intro/>
        Browser Support
                            http://www.mozilla.org/rdf/doc/

       The Mozilla open source browser is using RDF to integrate and
        aggregate Internet resources.



RDF Conclusion
   Expressivity of RDF is limited; it cannot express:
       Local scope of properties
       Disjointness of classes
       Boolean combinations of classes
       Cardinality restrictions
       Special characteristics of properties
   Need for standardized ontology language
    that builds upon existing concepts of RDF


    => OWL Web Ontology Language
What Is An Ontology
   An ontology is an explicit description of a
    domain:
       concepts
       properties and attributes of concepts
       constraints on properties and attributes
       Individuals (often, but not always)
   An ontology defines
       a common vocabulary
       a shared understanding
Ontology Examples
   Taxonomies on the Web
       Yahoo! categories
   Catalogs for on-line shopping
       Amazon.com product catalog
   Domain-specific standard terminology
       Unified Medical Language System (UMLS)
       UNSPSC - terminology for products and
        services
Ontology-Development Process

 determine scope -> consider reuse -> enumerate terms -> define classes -> define properties -> define constraints -> create instances

In reality - an iterative process:
 determine scope -> consider reuse -> enumerate terms -> consider reuse -> define classes -> enumerate terms -> define classes -> define properties -> define classes -> define properties -> define constraints -> create instances -> define classes -> create instances -> consider reuse -> define properties -> define constraints -> create instances
       Determine Domain and Scope



   What is the domain that the ontology will
    cover?
   What are we going to use the ontology for?
   What types of questions should the information
    in the ontology answer (competency questions)?
    Answers to these questions may change during the lifecycle
        Consider Reuse




   Why reuse other ontologies?
       to save the effort
       to interact with the tools that use other
        ontologies
       to use ontologies that have been validated
        through use in applications
What to Reuse?
   Ontology libraries
       DAML ontology library
        (www.daml.org/ontologies)
       Ontolingua ontology library
        (www.ksl.stanford.edu/software/ontolingua/)
       Protégé ontology library
        (protege.stanford.edu/plugins.html)
   Upper ontologies
       IEEE Standard Upper Ontology
        (suo.ieee.org)
       Cyc (www.cyc.com)
What to Reuse? (II)
   General ontologies
       DMOZ (www.dmoz.org)
       WordNet
        (www.cogsci.princeton.edu/~wn/)
   Domain-specific ontologies
       UMLS Semantic Net
       GO (Gene Ontology)
        (www.geneontology.org)
        Enumerate Important Terms




   What are the terms we need to talk
    about?
   What are the properties of these terms?
   What do we want to say about the terms?
Define Classes and the Class
Hierarchy



   A class is a concept in the domain
       a class of courses
       a class of students
       a class of graduate students
   A class is a collection of elements with similar
    properties
   Instances of classes
       a student of AI course who will come today
Class Inheritance
   Classes usually constitute a taxonomic
    hierarchy (a subclass-superclass hierarchy)
   A class hierarchy is usually an IS-A hierarchy:
    an instance of a subclass is an instance of a
      superclass
   If you think of a class as a set of elements, a
    subclass is a subset
Class Inheritance - Example
   Apple is a subclass of Fruit
    Every apple is a fruit
   Student is a subclass of Person
    Every Student is a Person
   G student is a subclass of Student
    Every G Student is a Student
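
The IS-A reading of the hierarchy can be made concrete with a small sketch: an instance of a subclass is automatically an instance of every superclass. The individual `alice` and the helper names are hypothetical, introduced only for illustration.

```python
# Direct subclass -> superclass links from the examples above.
SUBCLASS_OF = {
    "Apple": "Fruit",
    "Student": "Person",
    "G Student": "Student",
}

# Hypothetical individual, declared only with its most specific class.
INSTANCE_OF = {"alice": "G Student"}


def superclasses(cls):
    """Follow subclass links upward and return all superclasses in order."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def types_of(individual):
    """Return the direct class of an individual plus every inherited class."""
    direct = INSTANCE_OF[individual]
    return [direct] + superclasses(direct)


print(types_of("alice"))
```

This mirrors the set view: if G Student is a subset of Student and Student is a subset of Person, any member of the first set belongs to all three.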
Modes of Development
   top-down – define the most general
    concepts first and then specialize them
   bottom-up – define the most specific
    concepts and then organize them in
    more general classes
   combination – define the more salient
    concepts first and then generalize and
    specialize them
Define Properties of Classes – Slots




   Slots in a class definition describe
    attributes of instances of the class and
    relations to other instances
    Each Student will have Name, GPA, Address,
     etc.
Slots for the Class

[Figure: slot definitions for a class, as shown in Protégé-2000.]
        Property Constraints



   Property constraints (facets) describe or
    limit the set of possible values for a slot
    The name of a Student is a string
    The Birth date is an instance of Date
    A student has exactly one Address
Common Facets
   Slot cardinality – the number of values
    a slot has
   Slot value type – the type of values a
    slot has
   Minimum and maximum value – a range
    of values for a numeric slot
   Default value – the value a slot has
    unless explicitly specified otherwise
Common Facets: Value Type
   String: a string of characters (“Château
    Lafite”)
   Number: an integer or a float (15, 4.5)
   Boolean: a true/false flag
   Enumerated type: a list of allowed values
    (high, medium, low)
   Complex type: an instance of another class
       Specify the class to which the instances belong
    The Wine class is the value type for the slot
      “produces” at the Winery class
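
A rough sketch of how a tool might check slot values against the common facets (cardinality, value type, minimum/maximum, default). The `Slot` class, the `check` function, and the 0.0 to 4.0 GPA range are assumptions made for this example and do not reflect the Protégé API. Enumerated and complex (class-valued) types are left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Slot:
    """A slot together with its facets."""
    name: str
    value_type: type
    min_card: int = 0
    max_card: Optional[int] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    default: Any = None


def check(slot, values):
    """Return a list of facet violations for the values given to one slot."""
    values = list(values)
    if not values and slot.default is not None:
        values = [slot.default]  # the default applies when nothing is given
    problems = []
    if len(values) < slot.min_card:
        problems.append(f"{slot.name}: needs at least {slot.min_card} value(s)")
    if slot.max_card is not None and len(values) > slot.max_card:
        problems.append(f"{slot.name}: allows at most {slot.max_card} value(s)")
    for value in values:
        if not isinstance(value, slot.value_type):
            problems.append(
                f"{slot.name}: {value!r} is not of type {slot.value_type.__name__}")
        elif slot.minimum is not None and value < slot.minimum:
            problems.append(
                f"{slot.name}: {value!r} is below the minimum {slot.minimum}")
        elif slot.maximum is not None and value > slot.maximum:
            problems.append(
                f"{slot.name}: {value!r} is above the maximum {slot.maximum}")
    return problems


# "A student has exactly one Address": cardinality 1..1, value type string.
address = Slot("address", str, min_card=1, max_card=1)
# The GPA range is an assumed example of a minimum/maximum facet.
gpa = Slot("gpa", float, max_card=1, minimum=0.0, maximum=4.0)

print(check(address, []))
print(check(gpa, [4.5]))
```

"Exactly one" is expressed as a minimum and a maximum cardinality of 1, which is how cardinality facets are usually combined.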
        Create Instances


   Create an instance of a class
       The class becomes a direct type of the instance
       Any superclass of the direct type is a type of the
        instance
   Assign slot values for the instance frame
       Slot values should conform to the facet constraints
       Knowledge-acquisition tools often check that
Creating an Instance: Example
Ontologies and the SW Languages

      Most Semantic Web languages are designed
       explicitly for representing ontologies
          RDF Schema
          DAML+OIL
          SHOE
          XOL
          XML Schema
Ontology Languages: RDFS and OWL
   RDFS
       Set theory – rdfs:Class
       Relation – rdf:Property, rdfs:domain, rdfs:range
       Hierarchy – rdfs:subClassOf, rdfs:subPropertyOf
       Built-in datatypes – xsd:string, xsd:dateTime
   OWL
       Description Logic
             Class, Thing, Nothing
             DatatypeProperty, ObjectProperty, AnnotationProperty,…
       Class axioms
             oneOf, disjointWith, unionOf, complementOf, intersectionOf …
             Restriction, onProperty, cardinality, hasValue…
       Property axioms
             inverseOf , TransitiveProperty , SymmetricProperty
             FunctionalProperty, InverseFunctionalProperty
       Equality– equivalentClass , sameAs , differentFrom…
       Ontology annotation – Ontology, imports, versionInfo
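
To give a feel for what the OWL property axioms buy, the sketch below forward-chains three of them (inverseOf, SymmetricProperty, TransitiveProperty) over plain tuples until nothing new can be derived. It is a toy, not an OWL reasoner, and the property names `produces`, `producedBy`, `partOf` and `marriedTo` are hypothetical.

```python
def infer(triples, inverse_of=(), symmetric=(), transitive=()):
    """Apply inverse, symmetric and transitive axioms until a fixpoint."""
    # Make the inverse mapping work in both directions.
    inverse = dict(inverse_of)
    inverse.update({b: a for a, b in inverse_of})
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in symmetric:
                new.add((o, p, s))
            if p in transitive:
                # Chain s -p-> o with o -p-> o2 to get s -p-> o2.
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # nothing new was derived: stop
            return facts
        facts |= new


facts = infer(
    [
        ("Winery1", "produces", "Wine1"),
        ("a", "partOf", "b"),
        ("b", "partOf", "c"),
        ("x", "marriedTo", "y"),
    ],
    inverse_of=[("produces", "producedBy")],
    symmetric={"marriedTo"},
    transitive={"partOf"},
)
print(sorted(facts))
```

Only four facts were stated, but the axioms let a program derive three more. This is the sense in which an ontology language adds inference on top of RDF data.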
               More languages and more
               ontologies
    Languages (require special inference engine)
          [Trust/Uncertainty] BayesOWL

          [Proof] PML (Proof Markup Language)

          [Query/Data Access] SPARQL Query Language for RDF
          [Rule] SWRL (Semantic Web Rule Language)
          [Policy] REI: A Policy Specification Language

          [Service] OWL-S by DAML (1.2 preview available)
          [Service] SAWSDL (Semantic Annotations for WSDL)
          [Thesauri] SKOS (Simple Knowledge Organization System)

    Ontologies (only need RDFS and/or OWL inference)
          Upper ontologies - OpenCyc, WordNet, OntoSem, SUO
          Specialized common ontologies - FOAF, Dublin Core, RSS
          Domain ontologies – bibtex, biology, and many…
    Li Ding, Pranam Kolari, Zhongli Ding, and Sasikanth Avancha, “Using Ontologies in the Semantic Web: A Survey”, in Ontologies
    in the Context of Information Systems (book chapter), 2005. http://ebiquity.umbc.edu/paper/html/id/257/
                Semantic Web Tools

[Figure: tools grouped around the task of managing ontologies (inference, instance, integrate).]
   Online registry: DAML Ontology Library, Schema Web
   Search engine: Swoogle (Semantic Web Search)
   Browser: Tabulator, IsaViz, Piggybank, Arago, Horus, Mspace, Magpie
   Editor: Protégé, Swoop
   Reasoner: Pellet (DL), Racer (DL), FACT++ (DL), Jena, JTP, F-OWL, Euler, CWM
   Triple store / RDF store: Jena (SPARQL), KAON, Kowari, Sesame, OWLIM, 3store, Instance store, Redland, Tap, Yars, IBM IODT, RDFLib, RDF gateway, allegro, Oracle 10
   Mapping tools: ONION, PROMPT, OntoMapper, Glue, OntoMerge, Ontomorph
source1: http://ebiquity.umbc.edu/paper/html/id/257/Using-Ontologies-in-the-Semantic-Web-A-Survey
source2: http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/




        Swoogle Semantic Web Search Engine
   Harvests Semantic Web data
    from the Web
   Provides search/navigation
    services for machines (via
    REST + RDF/XML)
       Digest doc, term, namespace
       Links
   Also serves human users
   Status
       Running since summer 2004
       1.6M RDF documents, 300M RDF
        triples, 10K ontologies
       JENA

• Jena is a Java framework for building Semantic Web applications. It
  provides a programmatic environment for RDF, RDFS and OWL,
  including a rule-based inference engine.
• Jena is open source and grew out of work with the HP Labs Semantic
  Web Program.
• The Jena Framework includes:
       An RDF API
       Reading and writing RDF in RDF/XML, N3 and N-Triples
       An OWL API
       In-memory and persistent storage
       RDQL – a query language for RDF

       http://jena.sourceforge.net/tutorial/RDF_API/index.html

                  http://jena.sourceforge.net/
 Jena Integration of Protégé-OWL
• Jena is one of the most widely used Java APIs for RDF and OWL,
  providing services for model representation, parsing, database
  persistence, querying and some visualization tools. Protege-OWL
  has always had a close relationship with Jena. The Jena ARP parser is still
  used in the Protege-OWL parser, and various other services such as
  species validation and datatype handling have been reused from Jena.
  It was also possible to convert a Protege OWLModel into a Jena
  OntModel to get a static snapshot of the model at run time. This
  snapshot, however, had to be rebuilt after each change in the model.
• As of August 2005, Protege-OWL is much more closely integrated with
  Jena. This integration allows programmers to use certain Jena
  functions at run time without having to go through the slow rebuild
  process each time. The architecture of this integration is illustrated on
  the next slide…



    http://protege.stanford.edu/plugins/owl/api/guide.html
SPARQL dataset finder
           Who knows Anupam Joshi?
           Show me their names, email address
           and pictures
                      1. Compose a SPARQL query
                      without FROM clause



                     2. Parse SPARQL query, search
                     Swoogle for related URLs,
                     and compose a dataset



                      3. Run SPARQL query on dataset

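
Step 3 boils down to matching triple patterns against the data in the dataset. The sketch below shows that idea in plain Python with a tiny pattern matcher. It is not SPARQL and not how Swoogle or Jena implement it. The data, the `ex:` identifiers, the person "Alice Example" and the example.org mailbox are all made up for illustration.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings satisfying every triple pattern.

    Pattern terms that start with '?' are variables; all others must
    match the data exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


# Hypothetical FOAF-style dataset.
DATA = [
    ("ex:alice", "foaf:name", "Alice Example"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice", "foaf:knows", "ex:joshi"),
    ("ex:joshi", "foaf:name", "Anupam Joshi"),
]

# "Who knows Anupam Joshi? Show me their names and email addresses."
QUERY = [
    ("?who", "foaf:knows", "?target"),
    ("?target", "foaf:name", "Anupam Joshi"),
    ("?who", "foaf:name", "?name"),
    ("?who", "foaf:mbox", "?mbox"),
]

results = [(b["?name"], b["?mbox"]) for b in match(QUERY, DATA)]
print(results)
```

The query itself never names a data source, which is why a FROM-less SPARQL query needs a service such as Swoogle to find documents that mention the relevant terms.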
Joseki - a SPARQL Server for Jena


• Joseki: The Jena RDF Server. Joseki is a server for publishing RDF
  models on the web. Models have URLs and can be accessed by HTTP
  GET. Joseki is part of the Jena RDF framework.
• Joseki is an HTTP and SOAP engine that supports the SPARQL Protocol and
  the SPARQL RDF Query language. SPARQL is developed by the W3C
  RDF Data Access Working Group.
• Joseki features:
       RDF data from files and databases
       HTTP (GET and POST) implementation of the SPARQL protocol
       SOAP implementation of the SPARQL protocol


       http://prdownloads.sourceforge.net/joseki/joseki-3.0-beta-1.zip?download

                   http://www.joseki.org/
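
With the HTTP GET binding of the SPARQL protocol, the query text travels URL-encoded in a `query` parameter. The sketch below only builds such a URL and sends nothing. The endpoint address is a hypothetical local server, not a documented Joseki default, and `sparql_get_url` is a helper name made up for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; replace with the address of a real SPARQL service.
ENDPOINT = "http://localhost:2020/sparql"

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name }"""


def sparql_get_url(endpoint, query):
    """Build the URL for a SPARQL protocol query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

Long queries can exceed practical URL length limits, which is one reason the protocol also offers POST.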
            Integrating Social Networks

[Figure: three networks over an overlapping set of people: a FOAF network, the DBLP coauthor network, and Golbeck's trust network, shown alongside reputation systems (Google PageRank, Citeseer Rank).]
   Data sources:
       FOAF: "knows" relations, RDF/XML
       DBLP: coauthor database, HTML
       Trust: reputation, trust network
   Edge labels: knows, co-author, sameName (some edges carry numeric weights)
   Node roles: source, hub, sink, island
   People shown: L. Ding, P. Kolari, H. Chen, J. Golbeck, J. Hendler, F. Perich, L. Kagal, T. Finin, A. Joshi, Y. Peng, A. Sheth, M. P. Singh
   Computation:
       Entity mapping
       Tie strength
       Trust aggregation
      PML: Proof Markup Language

[Figure: a PML justification trace.]
   Query foo:query1: (type TonysSpecialty ?x)
   Question foo:question1: (what is Tony’s Specialty)
   NodeSets foo:ns1 and foo:ns2, each with (hasConclusion …), linked through InferenceSteps
   Properties shown: isQueryFor, hasAnswer, fromQuery, fromAnswer, isConsequentOf, hasAntecendent, hasLanguage, hasInferencEngine, hasRule, hasVariableMapping, hasSourceUsage, hasSource, usageTime …
   IWBase classes: Language, InferenceEngine, InferenceRule, Source, Mapping, SourceUsage
What is a Domain-Specific Markup Language (DSML)?

• Medium of
  communication for
  users of the domain

• Follows XML syntax

• Encompasses the
  semantics of the
  domain
   [Figure: DSML users]
         Examples of DSMLs


 MML: Medical Markup Language
 CML: Chemical Markup Language
 MatML: Materials Markup Language
 WML: Wireless Markup Language
 MathML: Mathematics Markup Language




MathML: Presentation Markup in Villani’s works
    <mrow>
       <mi> H </mi>
       <mo> = </mo>
       <mo> ∫ </mo>
       <mi> ρ </mi>
       <mi> log </mi>
       <mi> ρ </mi>
       <mo> d </mo>
       <mi> v </mi>
    </mrow>
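
As a rough illustration of how software can consume presentation markup, the sketch below parses a well-formed version of the fragment above with Python's standard xml.etree.ElementTree and lists each token with its element type. The variable names are illustrative only and are not part of the slides.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the presentation-markup fragment.
MATHML = """
<mrow>
  <mi> H </mi>
  <mo> = </mo>
  <mo> ∫ </mo>
  <mi> ρ </mi>
  <mi> log </mi>
  <mi> ρ </mi>
  <mo> d </mo>
  <mi> v </mi>
</mrow>
"""

root = ET.fromstring(MATHML)

# Each child is a token element: <mi> for identifiers, <mo> for operators.
tokens = [(child.tag, (child.text or "").strip()) for child in root]

identifiers = [text for tag, text in tokens if tag == "mi"]
linear_form = " ".join(text for _, text in tokens)
print(linear_form)
```

Presentation markup only records how the formula is laid out; nothing in these tokens says that the expression is an integral over v, which is the kind of meaning a domain-specific language has to add.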
A Search Engine Using the Semantic Web
    Background
   Semantic search: extending traditional search
    with Semantic Web technology
       Exploiting the explicit meaning of documents (i.e.,
        ontology-based metadata)


   Current semantic search tools
       Form-based, e.g., SHOE, Magnet
       View-based, e.g., GRQL, SQoogle, Ontogator,
        Falcon-S
       QA-based, e.g., AquaLog, ORAKEL
       Keyword-based, e.g., TAP, Squiggle, DOSE
       The search process
   Step 1: Making sense of user
    queries
   Step 2: Translating user queries into
    formal queries
   Step 3: Querying the back-end semantic
    data repository
   Step 4: Ranking
     Making sense of user queries
   Finding out the meaning of keywords
        Class, e.g., the keyword “phd students”
        Relation, e.g., “author”
        Instance, e.g., “Enrico”, “director”


   Method: text search
        Labels (rdfs:label)
        Short literals are also searched when matching
         instances
             e.g., a search for “director” picks up the instances
              that carry it as a literal value.
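
The matching step can be sketched over a toy in-memory triple list. Everything below is an assumption made for illustration: the `kmi:` names, the `kmi:job-title` property, and the 30-character cut-off for a "short" literal are invented, and a real system would query an indexed repository rather than scan a list.

```python
# Toy triple store: (subject, predicate, object). All names are made up.
TRIPLES = [
    ("kmi:PhD-Student", "rdf:type", "rdfs:Class"),
    ("kmi:PhD-Student", "rdfs:label", "phd students"),
    ("kmi:has-author", "rdf:type", "rdf:Property"),
    ("kmi:has-author", "rdfs:label", "author"),
    ("kmi:enrico", "rdf:type", "kmi:Professor"),
    ("kmi:enrico", "rdfs:label", "Enrico"),
    ("kmi:enrico", "kmi:job-title", "director"),
]

MAX_LITERAL_LEN = 30  # assumed cut-off for a "short" literal


def kind_of(entity):
    """Classify an entity as class, relation, or instance from its rdf:type."""
    types = {o for s, p, o in TRIPLES if s == entity and p == "rdf:type"}
    if "rdfs:Class" in types:
        return "class"
    if "rdf:Property" in types:
        return "relation"
    return "instance"


def match_keyword(keyword):
    """Return (entity, kind) pairs whose label or short literal contains the keyword."""
    kw = keyword.lower()
    matches = set()
    for s, p, o in TRIPLES:
        if p == "rdf:type":
            continue
        kind = kind_of(s)
        # Labels are searched for every entity; other short literals only for instances.
        searchable = p == "rdfs:label" or (kind == "instance" and len(o) <= MAX_LITERAL_LEN)
        if searchable and kw in o.lower():
            matches.add((s, kind))
    return sorted(matches)


print(match_keyword("director"))
```

Here “director” is not the label of any entity, so it is found only through the short literal attached to an instance.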
Translating user queries into formal queries

   Input: semantic entity matches of the
    search keywords
       Each keyword -> multiple matches


   Output: formal queries which reflect
    the user query
       One user query -> multiple formal queries.
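
Because every keyword can match several entities, one way to picture the fan-out is a Cartesian product over the candidate matches. The sketch below assumes the matches are already available as a dictionary; the entity names are hypothetical.

```python
from itertools import product

# Hypothetical output of the keyword-matching step:
# each keyword maps to several candidate (entity, kind) matches.
matches = {
    "news": [("kmi:News-Item", "class"), ("kmi:news-archive", "instance")],
    "author": [("kmi:has-author", "property"), ("kmi:Author", "class")],
}


def candidate_queries(subject_kw, keyword_kw, matches):
    """Pair every subject match with every keyword match.

    Each pair corresponds to one formal query to be generated.
    """
    return list(product(matches[subject_kw], matches[keyword_kw]))


queries = candidate_queries("news", "author", matches)
for subject_match, keyword_match in queries:
    print(subject_match, keyword_match)
```

With two matches per keyword, the single user query <news: author> expands into four candidate formal queries.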
      Simple queries

   There are only two keywords involved: <subject:keyword>
   Fixed number of combination types

    Subject match   Keyword match   Example
    Class           Class           <news: phd students>
    Class           Property        <news: author>
    Class           Instance        <news: chief scientist>
    Instance        Property        <victoria: author>
    Instance        Instance        <victoria: yuangui>
    Property        Instance        <member: x-media>
    Property        Property        <member: author>
    Example
   Pattern: Subject -> Class Cs; Keyword -> Class Ck
   Results: triples <Is, Relation, Ik>, where Is is an instance of Cs
    and Ik an instance of Ck, associated with exploratory links.
   Example: news stories about phd students
         <news “KMi success”, mentions-person, Tom-Heath>


   A simplified template in Sesame SeRQL:


         select Is, R, Ik from {Is} rdf:type {Cs},
                               {Ik} rdf:type {Ck},
                               {Is} R {Ik}
         union
         select Is, R, Ik from {Is} rdf:type {Cs},
                               {Ik} rdf:type {Ck},
                               {Ik} R {Is}
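
To make the two branches of the union concrete, here is a minimal Python sketch that evaluates the same class-to-class pattern over an in-memory set of triples. It is not how Sesame executes the query; the `kmi:` and `news:` names and the reverse link `kmi:has-written` are assumptions added so that both directions return something.

```python
# Toy data: (subject, predicate, object). Names are illustrative only.
TRIPLES = {
    ("news:kmi-success", "rdf:type", "kmi:News-Item"),
    ("kmi:Tom-Heath", "rdf:type", "kmi:PhD-Student"),
    ("news:kmi-success", "kmi:mentions-person", "kmi:Tom-Heath"),
    ("kmi:Tom-Heath", "kmi:has-written", "news:kmi-success"),  # hypothetical reverse link
}


def instances_of(cls):
    """All subjects typed with the given class."""
    return {s for s, p, o in TRIPLES if p == "rdf:type" and o == cls}


def class_class_query(cs, ck):
    """Return (Is, R, Ik) for every relation linking the two classes, in either direction."""
    subjects, keywords = instances_of(cs), instances_of(ck)
    results = set()
    for s, p, o in TRIPLES:
        if p == "rdf:type":
            continue
        if s in subjects and o in keywords:    # first branch:  {Is} R {Ik}
            results.add((s, p, o))
        elif s in keywords and o in subjects:  # second branch: {Ik} R {Is}
            results.add((o, p, s))
    return sorted(results)


rows = class_class_query("kmi:News-Item", "kmi:PhD-Student")
for row in rows:
    print(row)
```

Both rows are reported in (Is, R, Ik) order, so a relation stored from the student to the news story still appears under the news story.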
Text mining and the Semantic Web
Text mining stages



      Document selection and filtering (IR
       techniques)
      Document pre-processing (NLP
       techniques)
      Document processing (NLP / ML /
       statistical techniques)
      Stages of document processing
   Document selection involves identification and retrieval
    of potentially relevant documents from a large set (e.g.
    the web) in order to reduce the search space. Standard
    or semantically-enhanced IR techniques can be used for
    this.
   Document pre-processing involves cleaning and
    preparing the documents, e.g. removal of extraneous
    information, error correction, spelling normalisation,
    tokenisation, POS tagging, etc.
   Document processing consists mainly of information
    extraction
   For the Semantic Web, this is realised in terms of
    metadata extraction
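
The pre-processing stage can be sketched with the standard library alone. This is a deliberately crude illustration: the spelling table is an invented two-entry example, markup is stripped with a simple regular expression, and POS tagging is left out because it needs an NLP toolkit.

```python
import re

# Assumed toy spelling-normalisation table; a real system would use a full lexicon.
SPELLING = {"normalisation": "normalization", "tokenisation": "tokenization"}

TAG_RE = re.compile(r"<[^>]+>")                         # crude removal of markup remnants
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")   # words, numbers, simple contractions


def preprocess(raw):
    """Strip extraneous markup, tokenise, lower-case, and normalise spelling."""
    text = TAG_RE.sub(" ", raw)
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return [SPELLING.get(t, t) for t in tokens]


print(preprocess("<p>Spelling Normalisation, then tokenisation.</p>"))
```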
    IE as an alternative to IR
   Information Extraction returns knowledge
    at a much deeper level than traditional IR
   Constructing a database through IE and
    linking it back to the documents can
    provide a valuable alternative search tool.
   Even if results are not always accurate,
    they can be valuable if linked back to the
    original text
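
A small sketch of what "linking back" can mean in practice: each extracted fact keeps the document identifier and the character offsets of the text that supports it, so a user can check a doubtful result against the source. The extraction pattern, the relation name, and the sample sentence are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    doc_id: str
    relation: str
    subject: str
    obj: str
    start: int  # character offsets of the supporting text in the source document
    end: int


# Hypothetical pattern for sentences of the form "X works for Y".
PATTERN = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) works for ([A-Z][A-Za-z]+)")

DOCS = {"d1": "The report says that Ann Lee works for Acme."}


def extract(doc_id, text):
    """Populate the fact table from one document, keeping provenance offsets."""
    return [
        Fact(doc_id, "works-for", m.group(1), m.group(2), m.start(), m.end())
        for m in PATTERN.finditer(text)
    ]


database = [fact for doc_id, text in DOCS.items() for fact in extract(doc_id, text)]

# Follow the link from a database row back to the original wording.
for fact in database:
    print(fact.subject, fact.relation, fact.obj, "<-", DOCS[fact.doc_id][fact.start:fact.end])
```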
Some example applications
   HaSIE
   KIM
   Threat Tracker
HaSIE
   System identifies relevant sections of
    each document, pulls out sentences
    about health and safety issues, and
    populates a database with relevant
    information
      KIM
   KIM is a software platform developed by
    Ontotext for semantic annotation of text.
   KIM performs automatic ontology population
    for the Semantic Web.
   It also provides indexing and retrieval (an
    IE-enhanced search technology).
     Threat Tracker
   Application developed by Alias-i that finds and
    relates information in documents
   Intended for information analysts who use
    unstructured news feeds and standing collections
    as sources
   Used by DARPA for tracking possible information
    about terrorists, etc.
Semantic Web Services
A Motivating Example
   A company in Germany needs to find all services for
    tax preparation software that are:
        Located in Berlin, Germany
        Available on a 24/7 basis
        Payable by credit card (the required payment method)
   The process should be fully automated: no human
    interaction
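
If services carried machine-readable descriptions, the request above would reduce to a filter over those descriptions. The sketch below assumes a flat record per service; the field names, category string, and the three sample services are invented for illustration and do not correspond to any particular service-description standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    category: str
    city: str
    country: str
    always_available: bool   # offered on a 24/7 basis
    fully_automated: bool    # no human interaction required
    payment_methods: frozenset


def matches_request(s):
    """Check one service description against the constraints of the example request."""
    return (
        s.category == "tax-preparation-software"
        and s.city == "Berlin"
        and s.country == "Germany"
        and s.always_available
        and s.fully_automated
        and "credit-card" in s.payment_methods
    )


# Hypothetical registry entries.
REGISTRY = [
    Service("TaxA", "tax-preparation-software", "Berlin", "Germany",
            True, True, frozenset({"credit-card", "invoice"})),
    Service("TaxB", "tax-preparation-software", "Munich", "Germany",
            True, True, frozenset({"credit-card"})),
    Service("TaxC", "tax-preparation-software", "Berlin", "Germany",
            True, False, frozenset({"credit-card"})),
]

found = [s.name for s in REGISTRY if matches_request(s)]
print(found)
```

The filter is trivial once the descriptions exist; the hard part, and the point of semantic descriptions, is getting independent providers to express location, availability, and payment terms in a shared vocabulary.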
Search topics
   Semantic Web in e-learning
   Semantic Web in digital libraries
   Semantic Web in regular search engines
   Semantic Web in security

				