Presentation BYU Data Extraction Research Group

    HyKSS: Hybrid Keyword and
    Semantic Search
     Andrew Zitzelberger




                            1
Keyword Search




                 2
Form Based Search




                    3
                            What about?




over 8,000 meters in elevation   less than 100K miles   faster than 100 mph




                                                                              4
5
                     HyKSS
• Hybrid Keyword and Semantic Search
• Semantics – extracted annotations
  – Multiple ontologies
• Keywords – text




                                       6
            Thesis Statement
• HyKSS (hybrid search)
  – Outperforms keyword and semantic search
  – Dynamic query weighting outperforms various
    other hybrid search approaches
  – Allows queries over multiple ontologies
  – Allows pay-as-you-go improvement




                                                  7
Extraction Ontologies




                        8
Data Frames




              9
   Indexing Architecture
                  Document
                  Collection



Keyword Indexer                Semantic Indexer




Keyword Index                  Semantic Index



                                                  10
Indexing Architecture Implementation
                                                Ontology
                                                 Library
           Lucene

                                  OntoES



          Document
          Collection

Keyword                Semantic
Indexer                 Indexer
                                       Sesame

Keyword                Semantic
 Index                   Index
                                                           11
          Query Processing
                     Free Form Query
Keyword Processing                     Semantic Processing

    Pre-Process                           Pre-Process
       Query                                 Query



  Execute Query                          Execute Query



   Post-Process                           Post-Process
      Query                                  Query



                     Combine Results
                                                             12
   Keyword Query Pre-Processing
• Remove Lucene special characters (except quotes)
• Remove (inequality) comparison constraints
• Remove non-phrase stopwords



hondas in "excellent condition" in orem for under 12 grand



          hondas “excellent condition” orem
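
The three pre-processing steps can be sketched in Python as follows. This is a minimal illustration, not the HyKSS implementation: the comparison pattern and the stopword list are small hypothetical stand-ins, and the slides do not say how HyKSS actually detects comparison constraints.

```python
import re

# Lucene query-syntax characters, minus the double quote (kept for phrases).
LUCENE_SPECIAL = re.compile(r'[+\-!(){}\[\]^~*?:\\/]|&&|\|\|')

# Hypothetical comparison pattern: comparison phrase, number, optional unit.
COMPARISON = re.compile(
    r'\b(?:under|over|less than|more than|faster than|at least|at most)\s+'
    r'\d[\d,.]*k?(?:\s+(?:grand|miles|mph|meters|dollars)\b)?',
    re.IGNORECASE,
)

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def preprocess_keyword_query(query: str) -> str:
    """Return the keyword-only form of a free-form query."""
    query = query.replace("“", '"').replace("”", '"')  # normalize curly quotes
    query = COMPARISON.sub(" ", query)       # drop comparison constraints
    query = LUCENE_SPECIAL.sub(" ", query)   # drop special characters, keep quotes
    # Split into quoted phrases and bare words; unbalanced quotes are dropped.
    tokens = re.findall(r'"[^"]*"|[^\s"]+', query)
    # Stopwords are removed only outside quoted phrases.
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'))
# prints: hondas "excellent condition" orem
```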
                                                         13
   Keyword Query Execution and
         Post-Processing
• Executed by Lucene
• Empty Post-Processing step




                                 14
  Semantic Query Pre-Processing
    Individual Ontology Scoring
hondas in "excellent condition" in orem for under 12 grand




                                                             15
  Semantic Query Pre-Processing
     Ontology Set Creation
• For each ontology sorted by score:
  – For each remaining ontology:
     • Add point for each new or subsuming match
     • If added points > 0 add ontology
• Completely subsumed ontologies are removed
  during query generation
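
One possible reading of the loop above, sketched in Python. Everything here is an assumption made for illustration: matches are modeled as (start, end) token spans in the query, each ontology is tried in turn as the seed of a set, and a set's score is the seed's score plus the points added by the other ontologies. Removal of completely subsumed ontologies (done later, during query generation) is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredOntology:
    name: str
    score: float
    matches: frozenset  # (start, end) token spans matched in the query


def added_points(covered, candidate_matches):
    """Count candidate matches that are new or that subsume a covered match."""
    points = 0
    for start, end in candidate_matches:
        if (start, end) in covered:
            continue  # identical match already in the set: no point
        overlaps = [c for c in covered if c[0] < end and start < c[1]]
        is_new = not overlaps
        is_subsuming = any(start <= c[0] and c[1] <= end for c in overlaps)
        if is_new or is_subsuming:
            points += 1
    return points


def build_ontology_sets(ontologies):
    """Return one (set_score, member_names) pair per seed ontology."""
    ranked = sorted(ontologies, key=lambda o: o.score, reverse=True)
    results = []
    for seed in ranked:
        members, covered, set_score = [seed.name], set(seed.matches), seed.score
        for other in ranked:
            if other is seed:
                continue
            points = added_points(covered, other.matches)
            if points > 0:  # the ontology contributes something new
                members.append(other.name)
                covered |= other.matches
                set_score += points
        results.append((set_score, members))
    return results


# Hypothetical scores and spans for the running example query.
candidates = [
    ScoredOntology("Vehicle", 2.0, frozenset({(7, 10)})),              # under 12 grand
    ScoredOntology("ContractualServices", 1.0, frozenset({(7, 10)})),  # under 12 grand
    ScoredOntology("Location", 1.0, frozenset({(5, 6)})),              # orem
]
best_score, best_members = max(build_ontology_sets(candidates), key=lambda s: s[0])
print(best_members)  # prints: ['Vehicle', 'Location']
```

Picking the highest-scoring set with `max` is also an assumption; the slide does not state how the final set is chosen.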




                                                   16
Semantic Query Pre-Processing
   Ontology Set Creation
• Vehicle (Price < 12000) + Location (US_City=“orem”)
  – Vehicle_Score + 1
• ContractualServices (Price < 12000) + Location (US_City=“orem”)
  – ContractualServices_Score + 1
• Vehicle (Price < 12000) + ContractualServices
  – Vehicle_Score
                                                                         17
  Semantic Query Pre-Processing
   Structured Query Generation
• Open world assumption
• SPARQL query
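
A hedged sketch of what query generation could look like in Python. It assumes that the open world assumption is realized by wrapping each constraint in an OPTIONAL block, so a document with a missing annotation is still returned and can be ranked lower later. The `ex:` namespace, the `ex:Document` class, and the property names are hypothetical, not the HyKSS schema.

```python
def build_sparql(constraints, namespace="http://example.org/ontology#"):
    """Build a SPARQL SELECT query from (concept, operator, value) triples.

    operator and value may be None for a concept that is only projected.
    """
    select_vars = ["?doc"]
    optional_blocks = []
    for concept, op, value in constraints:
        var = "?" + concept.lower()
        select_vars.append(var)
        pattern = f"?doc ex:{concept} {var} ."
        if op is not None:
            literal = f'"{value}"' if isinstance(value, str) else str(value)
            pattern += f" FILTER({var} {op} {literal})"
        # OPTIONAL keeps documents that lack this annotation (open world).
        optional_blocks.append(f"  OPTIONAL {{ {pattern} }}")
    return (
        f"PREFIX ex: <{namespace}>\n"
        f"SELECT {' '.join(select_vars)}\n"
        "WHERE {\n"
        "  ?doc a ex:Document .\n"  # hypothetical anchor pattern
        + "\n".join(optional_blocks)
        + "\n}"
    )


query = build_sparql([
    ("Make", None, None),
    ("Price", "<", 12000),
    ("US_City", "=", "orem"),
])
print(query)
```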




                                  18
    Semantic Query Execution and
          Post-Processing
• Sesame query execution
• Semantic ranking:
   – 1 point for each requested projection satisfied
   – Normalized by # of projections requested

hondas in "excellent condition" in orem for under 12 grand
   – Projections on Make, Price and US_City
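
The ranking rule can be written as a short Python function. This is a sketch under the assumption that each result row maps a projected concept to its extracted value, or to None when the document has no such annotation; the sample values are made up.

```python
def semantic_score(requested, row):
    """Fraction of requested projections that the result row satisfies."""
    if not requested:
        return 0.0
    satisfied = sum(1 for concept in requested if row.get(concept) is not None)
    return satisfied / len(requested)


requested = ["Make", "Price", "US_City"]
row = {"Make": "Honda", "Price": 9500, "US_City": None}  # illustrative values
score = semantic_score(requested, row)  # 2 of 3 projections satisfied
```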



                                                             19
       Hybrid Query Processing
• Linear interpolation:
  – (kw_weight * kw_score) + (sm_weight * sm_score)
• Dynamic solution:
  – # keywords remaining (#kw)
  – concept match score (cms)
        = ½ * (selections + projections)
  – kw_weight = #kw/(#kw + cms)
  – sm_weight = cms/(#kw + cms)
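
A small Python sketch of the dynamic weighting formulas above. The counts and scores in the usage lines are illustrative, not taken from the experiments, and the equal-weight fallback for an empty query is an assumption that the slides do not cover.

```python
def dynamic_weights(num_keywords, selections, projections):
    """Return (kw_weight, sm_weight) from the query's keyword and concept counts."""
    cms = 0.5 * (selections + projections)  # concept match score
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: split evenly when nothing was recognized
    return num_keywords / total, cms / total


def hybrid_score(kw_score, sm_score, num_keywords, selections, projections):
    """Linear interpolation of keyword and semantic scores."""
    kw_weight, sm_weight = dynamic_weights(num_keywords, selections, projections)
    return kw_weight * kw_score + sm_weight * sm_score


# Illustrative counts: 3 keywords remain, 1 selection, 3 projections.
kw_weight, sm_weight = dynamic_weights(3, 1, 3)  # cms = 2, so weights 3/5 and 2/5
combined = hybrid_score(0.5, 1.0, 3, 1, 3)
```

A query with more remaining keywords than matched concepts leans on the keyword score, and the reverse holds when most of the query is recognized semantically.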

                                                      20
Basic Search




               21
Results Display




                  22
Form Based Search




                23
Results Display
         Experimental Setup –
          Ontology Libraries
• 5 Ontology Levels
  – Number
  – Generic Units
  – Vehicle Units
  – Vehicle
  – Vehicle+




                                25
 Experimental Setup – Query Sets
• 113 syntactically unique queries from
  database students
• 60 syntactically unique queries from linguistics
  students




                                                    26
          Experimental Setup –
          Document Collection
• 250 vehicle advertisements (Craigslist)
  – 100 training, 50 validation, 100 test
• 318 mountain pages (Wikipedia)
• 66 roller coaster pages (Wikipedia)
• 88 video game advertisements (Craigslist)




                                              27
                  Experiments
1) Training queries over test vehicle documents
2) Test queries over test vehicle documents
3) Training queries over test vehicle documents +
   additional noise
4) Test queries over test vehicle documents + additional
   noise
5) 5 queries over noisy data (Generic Units only)



                                                       28
         Experiments - Metric
• Mean Average Precision
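
For reference, the standard definition of Mean Average Precision can be sketched in Python as below. The document identifiers are made up, and the slides do not show how the evaluation was actually coded.

```python
def average_precision(ranked, relevant):
    """Average of precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),              # AP = 1/2
]
map_score = mean_average_precision(runs)
```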




                                29
Experimental Results




                       30
Experimental Results




                       31
Experimental Results




                       32
               Conclusions
• Hybrid search outperforms keyword and
  semantic search
• HyKSS’s dynamic query weighting approach
  outperforms various other weighting
  techniques
• Using multiple ontologies does not outperform
  selecting and using a single ontology


                                                 33
              External Image Citations
•   Slide 2 Google search screenshot: http://www.google.com (07/30/11)
•   Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
•   Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
•   Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
•   Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
•   Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
•   Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)




                                                                                        34

				