

Efficient Keyword Search over DBLife & DBLP Data

CS511 (In Progress) Project Presentation, Dec-09-2005

Mayssam Sayyadian
Nhung Nguyen
Hieu Li

Introduction
• DBLife manages unstructured data
  - People are familiar with keyword search over unstructured data
• ...but DBLife also builds an ER graph
  - Entities, mentions, etc.: extracted structured data
• DBLP: a well-known, available, enriched database of publications
  - DBLife does not cover all the data in DBLP

Assumptions
• Data is in relational format, not XML
• The DBMS provides text indexing at the column level
  - e.g., Oracle, SQL Server, DB2, MySQL, PostgreSQL
• Support for XML data is the subject of future work

Basic Model
• The database is modeled as a graph
  - Nodes = tuples
  - Edges = references between tuples
    - foreign keys, inclusion dependencies, ...
    - Edges are directed

[Figure: example data graph. The paper tuples "eTuner: Tuning Schema …" and "iMAP: Discovering …" are linked through writes tuples to the author tuples Mayssam Sayyadian, AnHai Doan, and Pedro Domingos]
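To make the graph model concrete, here is a minimal Python sketch that turns a toy author/writes/paper schema into a directed graph. The relation and column names (author, writes, paper, aid, pid, wid) and all tuple ids are hypothetical, chosen to mirror the figure; which author is linked to which paper is illustrative only. Each tuple becomes a node, and each foreign-key value in a writes tuple becomes a directed edge to the tuple it references.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # (relation name, tuple key)

# Hypothetical toy relations mirroring the figure:
# author(aid, name), paper(pid, title), writes(wid, aid, pid).
AUTHORS = {"a1": "Mayssam Sayyadian", "a2": "AnHai Doan", "a3": "Pedro Domingos"}
PAPERS = {"p1": "eTuner: Tuning Schema ...", "p2": "iMAP: Discovering ..."}
WRITES = [("w1", "a1", "p1"), ("w2", "a2", "p1"), ("w3", "a2", "p2"), ("w4", "a3", "p2")]


def build_graph() -> Dict[Node, Set[Node]]:
    """Map each tuple (node) to the set of tuples it references (directed edges)."""
    graph: Dict[Node, Set[Node]] = {}
    # Author and paper tuples reference nothing in this toy schema.
    for aid in AUTHORS:
        graph[("author", aid)] = set()
    for pid in PAPERS:
        graph[("paper", pid)] = set()
    # Each writes tuple holds two foreign keys, hence two outgoing edges.
    for wid, aid, pid in WRITES:
        graph[("writes", wid)] = {("author", aid), ("paper", pid)}
    return graph


g = build_graph()
print(len(g), "nodes,", sum(len(out) for out in g.values()), "edges")  # 9 nodes, 8 edges
```

Edges point from the referencing tuple to the referenced one, so in this schema only writes tuples have outgoing edges.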

Answer Example
Query: Mayssam AnHai
[Figure: answer tree. The paper tuple "eTuner: Tuning Schema …" is connected through two writes tuples to the author tuples Mayssam and AnHai Doan]

Answer Model
• Query: a set of keywords {k1, k2, ..., kn}
  - Each keyword ki matches a set of nodes Si
• Answer: a rooted, directed tree connecting nodes, with one node from each Si
  - The root node (we call it an information node) has special significance and may be restricted to some relations
    - e.g., relations representing entities, not relationships
• Multiple answers are ranked by a scoring function
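The answer model can be illustrated with a small Python sketch for the query "Mayssam AnHai": for each candidate root it finds the closest node from each Si and unions the paths into a tree. This is a simplified BFS heuristic for illustration, not the project's algorithm. It assumes a hypothetical toy graph, uses substring matching as a stand-in for the IR index, traverses references in both directions, restricts roots to the paper relation (standing in for information nodes), and ranks answers by tree size only.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (relation name, tuple id)

# Hypothetical toy graph: writes tuples reference author and paper tuples.
EDGES: List[Tuple[Node, Node]] = [
    (("writes", "w1"), ("author", "a1")), (("writes", "w1"), ("paper", "p1")),
    (("writes", "w2"), ("author", "a2")), (("writes", "w2"), ("paper", "p1")),
    (("writes", "w3"), ("author", "a2")), (("writes", "w3"), ("paper", "p2")),
]
TEXT: Dict[Node, str] = {
    ("author", "a1"): "Mayssam Sayyadian", ("author", "a2"): "AnHai Doan",
    ("paper", "p1"): "eTuner: Tuning Schema ...", ("paper", "p2"): "iMAP: Discovering ...",
}


def answer_trees(keywords: List[str], root_relations=("paper",)):
    """Return (size, root, nodes) answers, smallest trees first."""
    # Undirected adjacency: references are traversed in both directions here.
    adj: Dict[Node, Set[Node]] = {}
    for u, v in EDGES:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # S_i = nodes whose text contains keyword k_i (case-insensitive).
    match_sets = [{n for n, t in TEXT.items() if k.lower() in t.lower()}
                  for k in keywords]
    answers = []
    for root in adj:
        if root[0] not in root_relations:  # only information nodes may be roots
            continue
        # BFS from the root; parent pointers define a tree over reached nodes.
        parent: Dict[Node, Node] = {}
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in sorted(adj[u]):
                if v not in dist:
                    dist[v], parent[v] = dist[u] + 1, u
                    queue.append(v)
        # Pick the closest node from each S_i; skip roots that miss a keyword.
        chosen = []
        for s in match_sets:
            reachable = sorted((dist[n], n) for n in s if n in dist)
            if not reachable:
                break
            chosen.append(reachable[0][1])
        else:
            nodes = {root}
            for n in chosen:
                while n != root:  # walk up to the root, adding the path
                    nodes.add(n)
                    n = parent[n]
            answers.append((len(nodes), root, nodes))
    return sorted(answers, key=lambda a: (a[0], a[1]))


for size, root, nodes in answer_trees(["Mayssam", "AnHai"]):
    print(root, size, sorted(nodes))
```

On this toy graph the smallest answer is rooted at paper p1 and has 5 nodes (the paper, two writes tuples, and the two authors), matching the answer example; the tree rooted at p2 has to pass through p1 and ranks lower with 7 nodes.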

Score of Result T
• A combining function Score combines the scores of the attribute values of T
• One reasonable choice: $Score(T) = \sum_{a \in T} Score(a) / size(T)$
• Attribute value scores Score(a) are calculated using the DBMS's IR index
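A worked version of this combining function, as a minimal Python sketch. The per-tuple scores stand in for what the DBMS's IR index would return; the score values, the tuple labels, and the simplification of one text attribute per tuple are all assumptions.

```python
from typing import Dict, List


def tree_score(tree: List[str], attr_scores: Dict[str, float]) -> float:
    """Score(T) = sum over attribute values a in T of Score(a), divided by size(T).

    tree lists the tuples of T; attr_scores holds the IR score of each tuple's
    text attribute. Tuples with no keyword match are absent and contribute 0.
    """
    return sum(attr_scores.get(t, 0.0) for t in tree) / len(tree)


# Hypothetical IR scores for the two matching author tuples.
scores = {"author:a1": 0.8, "author:a2": 0.6}
small = ["paper:p1", "writes:w1", "writes:w2", "author:a1", "author:a2"]
large = small + ["writes:w3", "paper:p2"]
print(round(tree_score(small, scores), 2), round(tree_score(large, scores), 2))  # 0.28 0.2
```

Dividing by size(T) means that, with the same matched attribute values, the 5-tuple tree outranks the 7-tuple one (1.4/5 = 0.28 versus 1.4/7 = 0.2).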

Implementation
[Figure: EasyDB components. Browser/client connected over HTTP to a web server with JSPs, servlets, and Java beans; Java API; JDBC access to the DBLP and DBLife databases]

Searching over Multiple Databases: System Architecture
• Preprocessing (offline): an Index Builder creates the DBLife and DBLP IR indexes; Schema Matching + Join Discovery find foreign-key joins
• Querying (online): the user's query Q goes to the IR Engine, which produces tuplesets; the Top-k Generator turns them into SQL queries, which a Distributed SQL Query Processor runs over DBLife and DBLP
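The step from tuplesets to SQL queries can be sketched as follows: a candidate network (CN), written here as a path of relations in which some nodes are keyword-restricted tuplesets, becomes one join query. The schema (author, writes, paper and their aid, pid, name, title columns) and the CN are hypothetical, and LIKE is a portable stand-in for the DBMS-specific full-text predicate the column-level IR index would supply. Keywords are passed as bound `?` parameters rather than spliced into the SQL text.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, Optional[str]]  # (relation, keyword or None for a free tuple set)

# Hypothetical schema: author(aid, name), writes(aid, pid), paper(pid, title).
TEXT_COLUMN = {"author": "name", "paper": "title"}
JOIN_COLUMN = {frozenset({"author", "writes"}): "aid",  # assumed FK joins
               frozenset({"paper", "writes"}): "pid"}


def cn_to_sql(cn: List[Node]) -> Tuple[str, List[str]]:
    """Translate a path-shaped candidate network into one SQL join query."""
    aliases = [f"t{i}" for i in range(len(cn))]
    tables = [f"{rel} {alias}" for (rel, _), alias in zip(cn, aliases)]
    conditions, params = [], []
    # Join adjacent relations on their (assumed) foreign-key column.
    for i in range(len(cn) - 1):
        col = JOIN_COLUMN[frozenset({cn[i][0], cn[i + 1][0]})]
        conditions.append(f"{aliases[i]}.{col} = {aliases[i + 1]}.{col}")
    # Keyword-restricted nodes become text predicates (LIKE as a stand-in).
    for (rel, keyword), alias in zip(cn, aliases):
        if keyword is not None:
            conditions.append(f"{alias}.{TEXT_COLUMN[rel]} LIKE ?")
            params.append(f"%{keyword}%")
    sql = f"SELECT * FROM {', '.join(tables)} WHERE {' AND '.join(conditions)}"
    return sql, params


# CN for the query "Mayssam AnHai": author - writes - paper - writes - author.
CN = [("author", "Mayssam"), ("writes", None), ("paper", None),
      ("writes", None), ("author", "AnHai")]
sql, params = cn_to_sql(CN)
print(sql)
print(params)
```

The `?` placeholder style works with Python's sqlite3 module and with JDBC prepared statements; other drivers may expect a different parameter style.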

Top-K Generator
• Contributions:
  - Iterative Refinement Algorithm (IRA): a unifying framework to search for the top-k best tuple-trees
  - Cast previous algorithms into IRA
  - Improve them substantially

IRA Framework
• Concepts: abstract state, concrete state, score interval
• IRA algorithm: branch-and-bound search
1. Abstraction: create initial abstract states
2. While fewer than k states have been output, iterate:
   (a) Evaluation: update the score intervals
   (b) Elimination: eliminate (prune) parts of the state space
   (c) Refinement: select an abstract state and refine it
   (d) If the goal state (the top-1 state) is found, output it and remove it

IRA - Example
• Iteration 1: abstract states P [0.6, 1], Q [0.5, 0.7], R [0.4, 0.9]
• Iteration 2: P is refined into P1 [0.6, 0.8], P2 = 0.9, P3 = 0.7; R [0.4, 0.9] remains; K = {P2, P3}, min score = 0.7
• Iteration 3: R is refined into R1 [0.4, 0.6], R2 = 0.85; Res = {P2, R2}, min score = 0.85
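The IRA steps can be sketched in Python and run on this example. The score intervals and the refinements of P and R are copied from the slide and hard-coded; how real states are built, how intervals are computed in step (a), and the selection rule (refine the abstract state with the highest upper bound) are assumptions of this sketch. Elimination keeps the m states with the best lower bounds (the current K) and prunes any other state whose upper bound does not exceed min score; a concrete state is the goal state when its score is at least every other state's upper bound.

```python
from typing import Callable, Dict, List, NamedTuple


class State(NamedTuple):
    name: str
    lo: float  # lower end of the score interval
    hi: float  # upper end; lo == hi means the state is concrete

    @property
    def concrete(self) -> bool:
        return self.lo == self.hi


def ira(initial: List[State], k: int,
        refine: Callable[[State], List[State]]) -> List[State]:
    """Branch-and-bound top-k search over abstract and concrete states."""
    states = list(initial)  # 1. Abstraction: the initial abstract states
    results: List[State] = []
    while len(results) < k and states:
        # (a) Evaluation: intervals are fixed in this sketch; a real system
        #     would tighten them here.
        # (b) Elimination: K = the m states with the best lower bounds; any
        #     other state whose upper bound cannot exceed min score is pruned.
        m = k - len(results)
        if len(states) > m:
            K = sorted(states, key=lambda s: s.lo, reverse=True)[:m]
            min_score = K[-1].lo
            states = [s for s in states if s in K or s.hi > min_score]
        # (c) Refinement: replace the abstract state with the highest
        #     upper bound by its children.
        abstract = [s for s in states if not s.concrete]
        if abstract:
            target = max(abstract, key=lambda s: s.hi)
            states.remove(target)
            states.extend(refine(target))
        # (d) Goal test: a concrete state whose score is at least every other
        #     state's upper bound is the current top-1; output and remove it.
        best = max(states, key=lambda s: (s.hi, s.concrete))
        if best.concrete:
            results.append(best)
            states.remove(best)
    return results


# Intervals and refinements copied from the IRA example. Only P and R have
# refinements on the slide, so this toy run is only meaningful for k <= 2.
CHILDREN: Dict[str, List[State]] = {
    "P": [State("P1", 0.6, 0.8), State("P2", 0.9, 0.9), State("P3", 0.7, 0.7)],
    "R": [State("R1", 0.4, 0.6), State("R2", 0.85, 0.85)],
}
initial = [State("P", 0.6, 1.0), State("Q", 0.5, 0.7), State("R", 0.4, 0.9)]
top = ira(initial, k=2, refine=lambda s: CHILDREN[s.name])
print([s.name for s in top])  # ['P2', 'R2']
```

With k = 2 the sketch returns P2 and then R2, the same Res = {P2, R2}. Its intermediate steps differ slightly from the slide: it outputs P2 as soon as P2's score (0.9) reaches every remaining upper bound, and it prunes Q (upper bound 0.7) once P3 = 0.7 is the best remaining lower bound.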

IRA Algorithms
• Kite: a straightforward adaptation of the state-of-the-art algorithm (Hybrid) to IRA
• aKite: adaptive Kite, able to change and adapt over time
• daKite: adaptive Kite armed with more sophisticated refinement rules (i.e., more cost-effective search heuristics)

Preliminary Experiments
• Experiments are currently over DBLP data
[Figure: query time t (sec, axis 0 to 6) versus max CN size (1 to 9) on a single DBLP database, for Kite, aKite, and daKite]

Future Work
• Better UI and browsing facilities
• User feedback
• Extend to handle XML data

References
• V. Hristidis, L. Gravano, Y. Papakonstantinou, "Efficient IR-Style Keyword Search over Relational Databases"
• S. Agrawal, S. Chaudhuri, G. Das, "DBXplorer: A System for Keyword Search over Relational Databases"
• G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, "Keyword Searching and Browsing in Databases using BANKS"