							                                   Finding and Ranking
                                    Knowledge on the
                                      Semantic Web

                                   Li Ding, Rong Pan, Tim Finin, Anupam
                                    Joshi, Yun Peng and Pranam Kolari

                                             University of Maryland,
                                               Baltimore County
                                   http://creativecommons.org/licenses/by-nc-sa/2.0/
                                   This work was partially supported by DARPA contract F30602-97-1-0215, NSF
                                   grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
UMBC
an Honors University in Maryland
                                              This talk
                                   •   Motivation
                                   •   Swoogle overview
                                   •   Bots navigate the Semantic Web
                                   •   Ranking Semantic Web content
                                   •   Use cases and applications
                                   •   Conclusions

                           Google has made us smarter




                                   But what about our agents?
                                   [Figure labels: tell, register]
                                   A Google for knowledge on the Semantic Web is needed by people and software agents.
                                   Swoogle Architecture
               [Figure: Swoogle architecture. Labeled parts:
                  data analysis:      IR analyzer, SWD analyzer
                  metadata creation:  SWD Cache, SWD Metadata
                  SWD discovery:      SWD Reader, Candidate URLs, Web Crawler
                  interface:          Web Server, Web Service, Agent Service
                  external:           The Web]

               Swoogle 2: 340K SWDs, 48M triples, 5K SWOs, 97K classes,
                          55K properties, 7M individuals (4/05)
               Swoogle 3: 700K SWDs, 135M triples, 7.7K SWOs (11/05)
Demo 1: Find “Time” Ontology
  We can search for an ontology with a set of keywords. For example, “time”, “before”
  and “after” are basic concepts for a “Time” ontology.
Demo 2(a): Digest “Time” Ontology (document view)
Demo 2(b): Digest “Time” Ontology (term view)
  [Screenshot labels: TimeZone, before, …, intAfter]
Demo 3: Find Term “Person”
  Not capitalized! A URIref is case sensitive!
Demo 4: Digest Term “Person”
  [Screenshot callouts: 167 different properties; 562 different properties]
Demo 5(a): Swoogle Today
Demo 5(b): Swoogle Statistics
  [Chart labels: FOAF, Trustix, W3C, Stanford]
           Swoogle’s Triple Store lets you shop
           and check out your triples into any of several reasoners
                                                         Summary
                     2004
                     • Swoogle (Mar, 2004)
                        – Automated SWD discovery
                        – SWD metadata creation and search
                        – Ontology rank (rational surfer model)
                        – Swoogle watch
                        – Web Interface
                     • Swoogle2 (Sep, 2004)
                        – Ontology dictionary
                        – Swoogle statistics
                        – Web service interface (WSDL)
                        – Bag of URIref IR search
                        – Triple shopping cart
                     2005
                     • Swoogle3 (July 2005)
                        – Better (re-)crawling strategies
                        – Better navigation models
                        – Index instance data
                        – More metadata (ontology mapping and OWL-S services)
                        – Better web service interfaces
                        – IR component for string literals
                                     The Semantic Web Onion

                     [Figure: nested layers, from outermost to innermost]
                     • Universal RDF Graph: the “Semantic Web” (about 10M documents)
                     • RDF Document: physically hosting knowledge
                       (about 100 triples per SWD on average)
                     • Class-instance: triples modifying the same subject
                     • Molecule: finest lossless set of triples
                     • Triple: atomic knowledge block
                     • Resource and Literal

                                   Swoogle maintains metadata about objects in
                                   different layers of the Semantic Web Onion.
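
As a rough illustration of the "class-instance" layer, the sketch below groups a flat list of triples by subject. The triples, the `ex:` prefix and the function name `group_by_subject` are made up for illustration; this is not Swoogle's code.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; not taken from Swoogle.
triples = [
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:name", "Tim Finin"),
    ("ex:doc", "dc:title", "Tim's FOAF File"),
]


def group_by_subject(triples):
    """Collect the triples that modify the same subject."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        groups[subject].append((predicate, obj))
    return dict(groups)


groups = group_by_subject(triples)
```

Each key of `groups` is one subject and each value is the list of predicate/object pairs that describe it.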
                                   Semantic Web Navigation Model

                 [Figure: navigation graph. Nodes: Web, RDF graph, Resource, literal, SWT, SWD, SWO
                  (rdfs:subClassOf drawn between SWD and SWO). Entry points: Term Search, Document Search.]
                 Numbered links in the figure:
                   1   sameNamespace, sameLocalname, extends class-property bond
                   2   uses, populates
                   3   isUsedBy, isPopulatedBy
                   4   officialOnto, isDefinedBy
                   5   defines
                   6   rdfs:seeAlso, rdfs:isDefinedBy
                   7   owl:imports, …

                 Navigating the HTML web is simple; there’s just one kind of link.
                 The SW has more kinds of links and hence more navigation paths.
                                   Semantic Web Navigation Model (continued)

                 The relations in links 1 and 3, and parts of 4, require a global view to discover.
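
One way to see why a relation such as isUsedBy (link 3) needs a global view: "uses" can be read off a single document, but the set of documents using a term is only known after every document has been scanned. A minimal sketch, with made-up URLs and an assumed function name `build_is_used_by`:

```python
from collections import defaultdict

# Hypothetical per-document "uses" relations, as a crawler might record them.
uses = {
    "http://example.org/a.rdf": {"foaf:Person", "foaf:name"},
    "http://example.org/b.rdf": {"foaf:Person", "dc:title"},
}


def build_is_used_by(uses):
    """Invert document -> terms into term -> documents."""
    is_used_by = defaultdict(set)
    for doc, terms in uses.items():
        for term in terms:
            is_used_by[term].add(doc)
    return dict(is_used_by)


is_used_by = build_is_used_by(uses)
```

The inverted index has to be rebuilt or updated whenever a new document is crawled, which is why a central service is a natural place to keep it.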
                                   Rank has its privilege
                     • Google introduced a new approach to ranking query
                       results using a simple “popularity” metric.
                        – It was a big improvement!
                     • Swoogle also ranks its query results
                        – When searching for an ontology, class or property,
                          wouldn’t one want to see the most used ones first?
                     • Ranking SW content requires different algorithms for
                       different kinds of SW objects
                        – For SWDs, SWTs, individuals, “assertions”,
                          molecules, etc…

                     Google’s PageRank
                     • A page’s rank is a function of
                       how many links point to it and the
                       rank of the pages hosting those links.
                     • The “random surfer” model provides
                       the intuition:
                                   (1) Jump to a random page
                                   (2) Select and follow a random link on the
                                       page and repeat until ‘bored’
                                   (3) If bored, go to (1)
                     • Pages are ranked by the relative
                       frequency with which they are visited.
                     [Flowchart: Jump to a random page → bored? → yes: jump to a random page; no: follow a random link]
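
The random-surfer intuition can be sketched as a power iteration. This is a generic textbook form, not Google's implementation; the `damping` value of 0.85 (the chance of following a link rather than getting bored) is an assumed conventional value that the slide does not state, and the three-page web is made up. Every link target is assumed to appear as a key of `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration for the random-surfer model.

    links maps each page to the list of pages it links to.
    damping is the probability of following a link instead of
    getting bored and jumping to a random page (assumed value).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Share of rank that arrives through random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # No outgoing links: the surfer jumps to a random page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: A and B both link to C, and C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
```

In this toy graph C collects the most rank, A inherits most of C's rank, and B, which nothing links to, keeps only the random-jump share.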

                                   Ranking Semantic Web Documents
                     • Target: a pure SW dataset
                        – Nodes: a collection of online SWDs (330K SWDs, of which
                          1.5% are labeled as ontologies)
                        – Links: in addition to hyperlinks, term-level relations are
                          generalized into TM, EX, IM.
                     • Rational surfer model (extension of weighted PageRank)
                        – Semantic content (term-level relations) is encoded into links
                        – The rank of a node is iteratively spread via links
                        – The weight/capacity of a link varies according to link semantics
                        – Weight is propagated to imported ontologies
                     • Evaluation
                        – Method: compare OntoRank with PageRank at promoting
                          ontologies, using the same pure SW dataset
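
A minimal sketch of the weighted-PageRank step, assuming each link carries one of the slide's types (TM, EX, IM) and that a link's share of its source's rank is proportional to a per-type weight. The weights in `LINK_WEIGHTS`, the damping value and the tiny graph are illustrative assumptions; the slides do not give the real values.

```python
# Assumed, illustrative weights per link type (not from the slides).
LINK_WEIGHTS = {"TM": 1.0, "EX": 2.0, "IM": 3.0}


def weighted_pagerank(links, damping=0.85, iterations=50):
    """links is a list of (source, target, link_type) tuples."""
    nodes = sorted({node for s, t, _ in links for node in (s, t)})
    n = len(nodes)
    # Total outgoing weight of each source document.
    out_weight = {node: 0.0 for node in nodes}
    for source, _, kind in links:
        out_weight[source] += LINK_WEIGHTS[kind]
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Each link passes on rank in proportion to its weight.
        for source, target, kind in links:
            share = LINK_WEIGHTS[kind] / out_weight[source]
            new_rank[target] += damping * rank[source] * share
        # Rank held by documents without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        for node in nodes:
            new_rank[node] += damping * dangling / n
        rank = new_rank
    return rank


# Hypothetical document that uses terms from onto_a and imports onto_b.
ranks = weighted_pagerank([("doc", "onto_a", "TM"), ("doc", "onto_b", "IM")])
```

With these assumed weights the imported ontology receives three times the share that the merely referenced one does, which is the kind of bias toward ontologies the rational surfer model is after.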
                                                        An Example

                   Document                                    wPR     OntoRank
                   http://www.w3.org/2000/01/rdf-schema        300     403
                   http://xmlns.com/wordnet/1.6/               3       103
                   http://xmlns.com/foaf/1.0/                  100     100
                   http://www.cs.umbc.edu/~finin/foaf.rdf      0.2     0.2

                   [Figure: the documents are connected by links labeled TM (three links) and EX (one link).]
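
The numbers on this slide are consistent with OntoRank being a document's wPR plus the wPR of every ontology that depends on it, directly or transitively. The sketch below checks that reading. The wPR values come from the slide, but the `depends_on` edges are an assumption, since the figure's arrows did not survive extraction, and the short document names are shorthand for the slide's URLs.

```python
# wPR values copied from the slide.
wpr = {
    "rdf-schema": 300.0,
    "wordnet": 3.0,
    "foaf": 100.0,
    "foaf.rdf": 0.2,
}

# Assumed reading of the figure: ontology-level dependencies only.
# The links from the instance document foaf.rdf are left out.
depends_on = {
    "foaf": ["wordnet", "rdf-schema"],
    "wordnet": ["rdf-schema"],
}


def dependents(target):
    """All ontologies that reach target by following depends_on links."""
    found = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for source, targets in depends_on.items():
            if current in targets and source not in found:
                found.add(source)
                stack.append(source)
    return found


def onto_rank(node):
    """wPR of the node plus the wPR of everything that depends on it."""
    return wpr[node] + sum(wpr[d] for d in dependents(node))


onto_ranks = {node: onto_rank(node) for node in wpr}
```

Under these assumed edges the sketch gives 403 for rdf-schema (300 + 100 + 3), 103 for wordnet (3 + 100), 100 for foaf and 0.2 for foaf.rdf, matching the slide.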



                                            Ontology Dictionary
                     •             Motivation
                                   – One ontology does not always provide all needed
                                     vocabulary
                                   – Many scenarios require assembling terms from
                                     multiple ontologies
                     •             DIY ontology engineering
                                   1. Search for an appropriate class C
                                   2. Search for popular properties used for modifying
                                      instances of C
                                   3. Go back to step 1 if more classes are needed



                                    Ranking Semantic Web Terms
                     • Pr(Term|Doc) can be measured by the normalized
                       value of the product of the term’s
                                   – Popularity: how many SWDs use the term
                                   – Frequency: how many times the term is used in the SWD
                     • SWDs are not accessed uniformly; OntoRank models how
                       often each one is accessed
                     • TermRank estimates a term’s importance as
                        ∑_Doc Pr(Term|Doc) * OntoRank(Doc)
                     • Evaluation
                        – Compare TermRank with term popularity for the 10
                          highest-ranked terms and evaluate the results analytically.
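
A minimal sketch of the TermRank sum, assuming Pr(Term|Doc) is normalized within each document so that a document's terms share its OntoRank. The counts, ranks and the function name `term_rank` are made up for illustration.

```python
def term_rank(usage, onto_rank):
    """Estimate term importance.

    usage[doc][term] is how many times term is used in doc.
    onto_rank[doc] is the OntoRank score of doc.
    """
    # Popularity: number of documents that use each term.
    popularity = {}
    for terms in usage.values():
        for term in terms:
            popularity[term] = popularity.get(term, 0) + 1

    scores = {}
    for doc, terms in usage.items():
        # Unnormalized weight: popularity times frequency in this document.
        weights = {t: popularity[t] * count for t, count in terms.items()}
        total = sum(weights.values())
        if total == 0:
            continue
        for term, weight in weights.items():
            # Pr(Term|Doc) * OntoRank(Doc), summed over documents.
            scores[term] = scores.get(term, 0.0) + (weight / total) * onto_rank[doc]
    return scores


# Hypothetical usage counts and OntoRank scores.
usage = {
    "d1": {"foaf:Person": 2, "foaf:name": 2},
    "d2": {"foaf:Person": 1},
}
scores = term_rank(usage, {"d1": 3.0, "d2": 1.0})
```

Because each document's probabilities sum to one under this assumption, the term scores add up to the total OntoRank of the documents.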



                                              Class-Property Bonds

                     [Figure: three documents contribute knowledge about foaf:Person]
                     • SWD1: foaf:mbox rdfs:domain foaf:Person; foaf:name rdfs:domain foaf:Person
                        – Class-property bond (introduced by ontology): foaf:mbox, foaf:name
                     • SWD2: an instance with rdf:type foaf:Person, foaf:name “Tim Finin”,
                       dc:title “Tim’s FOAF File”
                        – Class-property bond (introduced by instances): foaf:name, dc:title
                     • SWD3: foaf:Person rdf:type owl:Class; rdfs:subClassOf foaf:Agent;
                       rdfs:comment “a human being”
                        – Class definition: rdfs:subClassOf: foaf:Agent; rdfs:label: “Person”
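
The two kinds of bond on this slide can be sketched over plain triples. The triples mirror the slide's SWD1 and SWD2; the instance URI `ex:tim` and both function names are assumptions made for illustration.

```python
from collections import defaultdict

# Triples mirroring the slide; "ex:tim" is a made-up instance URI.
swd1 = [
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
]
swd2 = [
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:name", "Tim Finin"),
    ("ex:tim", "dc:title", "Tim's FOAF File"),
]


def bonds_from_ontology(triples):
    """Class -> properties whose rdfs:domain is that class."""
    bonds = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rdfs:domain":
            bonds[obj].add(subject)
    return dict(bonds)


def bonds_from_instances(triples):
    """Class -> properties used on instances typed with that class."""
    types = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types[subject].add(obj)
    bonds = defaultdict(set)
    for subject, predicate, _ in triples:
        if predicate != "rdf:type":
            for cls in types.get(subject, ()):
                bonds[cls].add(predicate)
    return dict(bonds)


ontology_bonds = bonds_from_ontology(swd1)
instance_bonds = bonds_from_instances(swd2)
```

Here dc:title is bonded to foaf:Person only through instance usage, which is the kind of evidence step 2 of the DIY procedure ("search for popular properties") would draw on.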



                                       Applications and use cases
                     • Supporting Semantic Web developers, e.g.,
                                   – Ontology designers
                                   – Vocabulary discovery
                                   – Who’s using my ontologies or data?
                                   – Etc.
                     • Searching specialized collections, e.g.,
                                   – Proofs in Inference Web
                                   – Text Meaning Representations of news stories in
                                     SemNews
                     • Supporting SW tools, e.g.,
                                   – Discovering mappings between ontologies

                                            Will it Scale? How?
                     Here’s a rough estimate of the data in RDF documents on the
                     semantic web based on Swoogle’s crawling

                      System/date   Terms      Documents   Individuals   Triples     Bytes
                      Swoogle2      1.5×10^5   3.5×10^5    7×10^6        5×10^7      7×10^9
                      Swoogle3      2×10^5     7×10^5      1.5×10^7      7.5×10^7    1×10^10
                      2005          2.5×10^5   5×10^6      5×10^7        5×10^8      5×10^10
                      2008          5×10^5     5×10^7      5×10^8        5×10^9      5×10^11



                  We think Swoogle’s centralized approach can be made to work
                  for the next few years if not longer.
                                           How much reasoning?
                     • SwoogleN (N<=3) does limited reasoning
                                   – It’s expensive
                                   – It’s not clear how much should be done
                     • More reasoning would benefit many use cases
                                   – e.g., type hierarchy
                     • Recognizing specialized metadata
                                   – e.g., that ontology A maps some terms from B to C




                                                  Conclusion
                        • The web will contain the world’s knowledge in
                          forms accessible to people and computers
                                   – We need better ways to discover, index, search and
                                     reason over SW knowledge
                        • SW search engines address different tasks than
                          HTML search engines
                                   – So they require different techniques and APIs
                        • Swoogle-like systems can help create consensus
                          ontologies and foster best practices


             For more information



                                   http://ebiquity.umbc.edu/
                                                          Annotated
                                                           in OWL
