Docstoc

How to Develop UDFs_ - Database Group

Document Sample
How to Develop UDFs_ - Database Group Powered By Docstoc
					    DBease: Making Databases
 User-Friendly and Easily Accessible

   Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng

Database Group, Department of Computer Science and Technology,
           Tsinghua University, Beijing 100084, China
How to Access Databases?
• Traditional database-access methods:
  – SQL
     Select title, author, booktitle, year From dblp
     Where title Contains “search” And booktitle Contains “cidr”
  – Query-by-exmaple (Form)

            cidr

  – Keyword Search
    “search cidr”

                              CIDR'11 - DBease (2)
Comparison of Different Methods

      Usability




                  CIDR'11 - DBease (3)
Keyword Search
• Is traditional keyword search good enough?




              Too many
               results!
                     No result!




                             CIDR'11 - DBease (4)
Form-based Search
• Form-based Search has the same problem.



                                            Complicated
                                              and still
                                             no result!




                     CIDR'11 - DBease (5)
Our Solution
                  Type-Ahead
                    Search


                   Type-Ahead Search in Forms
      Usability




                                                 SQL Suggestion




                               CIDR'11 - DBease (6)
What is Type-Ahead Search?




                  CIDR'11 - DBease (7)
Type-Ahead Search
• Advantages
  – On-the-fly giving users instant feedback
  – Helping users navigate the underlying data
  – Tolerating inconsistencies between query and data
  – Supporting Synonyms
  – Supporting XML data
  – Supporting Multiple tables




                          CIDR'11 - DBease (8)
Problem Formulation
• Data: A set of records                                     Edit Distance:
• Query                                              The number of edit operations
                                                   (insertion, deletion, substitution)
  – Q = {p1, p2, …, pl}: a set of prefixes          transformed a string to another
  – δ: Edit-distance threshold
                                                    ed(string, stang) =2
• Result
  – A set of records having all query prefixes or their similar forms (conjunctive)




                                      CIDR'11 - DBease (9)
Indexing
• Trie Index
• Words: root to leaves
• Inverted lists on leaves




                             CIDR'11 - DBease (10)
    Algorithm
•    Step 1: Find similar prefixes incrementally
•    Step 2: Retrieve the leaf nodes of similar prefixes
•    Step 3: Compute union lists of inverted lists of leaf nodes
•    Step 4: Intersect the union lists of query keywords

                                                             =cid r




                                     CIDR'11 - DBease (11)
Type-Ahead Search in Forms
                  Type-Ahead
                    Search


                    Type-Ahead Search in Forms
      Usability




                               CIDR'11 - DBease (12)
What is Type-Ahead Search in Forms?




                  CIDR'11 - DBease (13)
Type-Ahead Search in Forms
• Problem Formulation
  – Data: A relation with multiple attributes
  – Query: A set of prefixes on attributes in a form interface
  – Answers:
     • Local results of the focused attribute
     • Global results of the relation
• Advantages
  – On-the-fly Faceted Search
  – Supporting Aggregation

                                CIDR'11 - DBease (14)
Data Partition
• Global Table à Local Tables


ID   Title          Conf.    Author
1    xml database   VLDB     albert
2    xml database   SIGMOD   bob
                                                                ID Author
3    xml search     VLDB     albert
                                                                A1 albert
4    xml security   VLDB     alice
                                                   ID Conf.     A2 bob
5    rdbms          SIGMOD   charlie
                                                   C1   VLDB    A3 alice
                                                   C2   SIGMO   A4 charlie
                                                        D
                                CIDR'11 - DBease (15)
Indexing
• Each attribute
  – Trie
  – Mapping Tables
     • Local àGlobal
     • Global à Local




                        CIDR'11 - DBease (16)
Our Solution
 Title:         xml database
                (albert)
 Author         xml database (bob)
 :              xml search (albert)
 xml database   xml security (alice)
 xml search
 xml security




                          CIDR'11 - DBease (17)
Our Solution
                                                           a   Trie
Title:    xml   xml database, albert
                xml search, albert
Author al       xml security, alice                        l              L-G Mapping Table
:
 albert                                                                     T1   1, 2
                                                       b         i          T2   3
 alice
                                                                            T3   4
                                                   e                  c
                                                                            T4   5

                                                       r              e   G-L Mapping Table

                                                                            1    T1
                                                       a       5: alice     2    T1
                                                                            3    T2
                                                                            4    T3
                                                   4: albert                5    T4

                           CIDR'11 - DBease (18)
SQL Suggestion
                  Type-Ahead
                    Search


                   Type-Ahead Search in Forms
      Usability




                                                   SQL Suggestion




                               CIDR'11 - DBease (19)
What is SQL Suggestion?




                  CIDR'11 - DBease (20)
SQL Suggestion
• Problem Formulation
  – Data: A database with multiple tables
  – Query: A set of keywords
  – Answers: Relevant SQL queries
• Advantages
  –   Suggest SQL queries based on keywords
  –   Help users formulate SQL queries to find accurate results
  –   Designed for both SQL programmers and Internet users
  –   Group answers based on SQL structures
  –   Support Aggregation
  –   Support Range queries

                                   CIDR'11 - DBease (21)
Our Solution
• Suggest Templates from Keywords                            keyword paper
   – A template is a structure in the
     databases
                                                             ir (a) Query
   – Modeled as a graph
      • Nodes: entities (table names or
         attribute names)
      • Edges: foreign keys or
                                                                (b) Template
         membership
• Suggest SQL queries from Templates
   – Mapping between keywords and
     templates


                                                                (c) SQL


                                     CIDR'11 - DBease (22)
Template Suggestion
• Template Generation
  – Extension from basic entities
    (tables)

  • Template Ranking
  – Template weight
      • Pagerank
  – Relevancy between a keyword and
    an entity
      • Tf*idf

  • Algorithms
  – Fagin algorithms
  – Threshold-based pruning techniques

                                    CIDR'11 - DBease (23)
SQL Suggestion
• SQL suggestion model
  – Mapping from keywords to templates
  – Matching is a set of mappings with all keywords
  – Weighted set-covering problem (NP-hard)
• SQL ranking
  – Relevancy between keywords and attributes
  – Attribute weight
• Algorithms
  – Greedy algorithms

                           CIDR'11 - DBease (24)
        Search: dbease
http://dbease.cs.tsinghua.edu.cn
Differences to Google Instant Search
• Fuzzy prefix matching
• Google firstly predicts queries, and then use the top
  queries to search the documents. Google may
  involve false negatives, while we can find the
  accurate top-k answers.




                         CIDR'11 - DBease (27)
Differences to Complete Search
• Fuzzy prefix matching
• Different index structures
• More efficient




                         CIDR'11 - DBease (28)
Differences to Keyword Search
• Effectiveness
  – SQL Suggestion supports range queries, and aggregation
    functions.
  – SQL Suggestion can group answers.
  – SQL Suggestion can help users to express their query
    intent more accurately.
• Efficiency
  – Faster



                          CIDR'11 - DBease (29)

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:0
posted:4/3/2014
language:Unknown
pages:29