Google Query Language -- a DSL for Advanced Google Searching


                Xiaoqing Wu
        Advisor: Dr. Barrett R. Bryant
Department of Computer and Information Science

• PhD research: Compiler Development
  Environment (CDE)
  – Automatic generation of compilers, interpreters,
    and integrated development environments
  – Several domain-specific languages (DSLs) have been
    developed on top of CDE
• GQL: an application based on CDE
  – Internet -- Database
  – Google -- Database Management System (DBMS)
  – GQL -- Structured Query Language (SQL)
    more than keyword searching

• Language preference
• File format, date, occurrences, domain
• Image, forum, shopping search

        Query customization in Google

• Filling forms

• Writing meta-tokens directly
   – allintext: Xiaoqing Wu filetype:pdf
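
A hand-written meta-token query such as the one above is ultimately sent to Google as a URL parameter. A minimal Python sketch of that step, assuming the public `https://www.google.com/search?q=` endpoint (the slides do not name one); the helper name `google_search_url` is hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlparse


def google_search_url(meta_query: str) -> str:
    """Return a search URL carrying the raw meta-token query.

    Assumption: the endpoint below is the ordinary Google web-search URL;
    it is not taken from the slides.
    """
    # urlencode percent-escapes ':' and encodes spaces as '+'.
    return "https://www.google.com/search?" + urlencode({"q": meta_query})


url = google_search_url("allintext: Xiaoqing Wu filetype:pdf")
print(url)

# Decoding the URL again recovers the original meta-token string.
recovered = parse_qs(urlparse(url).query)["q"][0]
print(recovered)
```

Nothing here validates the meta-tokens themselves: a misspelled `filetype:` is passed through unchanged.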

                    Why GQL (I)?

• Forms are not flexible
  –   Fixed
  –   Can’t be saved and reused
  –   Filling multiple forms is time-consuming
  –   Mouse operation is slower than keyboard operation

                    Why GQL (II)?

• Meta-tokens are not designed for end-users
  –   Not user friendly
  –   No syntax provided
  –   No type-checking
  –   Ambiguous
       keyword1 keyword3 OR keyword4 "keyword2"

           GQL: A well-formed DSL

• User friendly grammar
  – Natural, SQL-like syntax rules, easy to follow
  – No ambiguity
• IDE support
  – Automatic syntax and type checking
• Program-based queries
  – Queries can be saved and reused
  – Search within the results of an old query
• Flexible: one language covers numerous forms

            No more forms!

   search {key}*
   from file
   where {constraint}*

       GQL Syntax Grammar
[1] query ::= (SEARCH | IMAGE) o_keylist occurrence constraints withinstmt
[2] o_keylist ::= keylist |
[3] keylist ::= key | keylist COMMA key
[4] key ::= word | noword | orwordlist | exactword
[5] word ::= STRING
[6] noword ::= NOT word
[7] orwordlist ::= orword OR orword | orwordlist OR orword
[8] orword ::= word | exactword
[9] exactword ::= QSTRING
[10] occurrence ::= FROM OCCVALUE |
[11] constraints ::= WHERE constraintlist |
[12] constraintlist ::= constraint | constraintlist constraint
[13] constraint ::= domain | filetype
[14] domain ::= indomain | outdomain
[15] indomain ::= DOMAIN EQ url
[16] outdomain ::= DOMAIN NE url
[17] url ::= QSTRING
[18] filetype ::= acceptfiletype | rejectfiletype
[19] acceptfiletype ::= TYPE EQ TYPEVALUE
[20] rejectfiletype ::= TYPE NE TYPEVALUE
[21] withinstmt ::= WITHIN QSTRING |
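
To make the grammar concrete, here is a small recursive-descent sketch in Python that translates a subset of it (rules [2]-[20], without IMAGE and WITHIN) into Google meta-tokens. It is not the JLex/CUP compiler described in these slides. Every token spelling is an assumption: case-insensitive keywords, `=` for EQ, `!=` for NE, `,` for COMMA, double quotes for QSTRING, and `title`, `text`, `url` as OCCVALUE literals. The mapping to `allintext:`, `site:` and `filetype:` is likewise a guess at what the code generator emits.

```python
import re

# Assumed token spellings; the slides name the tokens but not their lexemes.
TOKEN_RE = re.compile(r'\s*(?:"([^"]*)"|(!=|=|,)|([^\s,="!]+))')
KEYWORDS = {"search", "not", "or", "from", "where", "domain", "type"}
OCCURRENCE = {"title": "allintitle:", "text": "allintext:", "url": "allinurl:"}
FIELD_TOKEN = {"domain": "site:", "type": "filetype:"}


def tokenize(src):
    """Split GQL source into (kind, text) pairs, ending with an EOF marker."""
    src, pos, tokens = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        quoted, op, word = m.groups()
        if quoted is not None:
            tokens.append(("QSTRING", quoted))
        elif op is not None:
            tokens.append((op, op))
        elif word.lower() in KEYWORDS:
            tokens.append((word.lower(), word))
        else:
            tokens.append(("STRING", word))
        pos = m.end()
    return tokens + [("EOF", "")]


def compile_gql(src):
    """Translate one GQL query into a Google meta-token string."""
    toks, i = tokenize(src), 0

    def peek():
        return toks[i][0]

    def take(kind):
        nonlocal i
        found, text = toks[i]
        if found != kind:
            raise SyntaxError(f"expected {kind}, found {text!r}")
        i += 1
        return text

    def orword():  # rule [8]: a plain word or a quoted exact phrase
        if peek() == "QSTRING":
            return '"' + take("QSTRING") + '"'
        return take("STRING")

    def key():  # rules [4], [6], [7]
        if peek() == "not":
            take("not")
            return "-" + take("STRING")
        alternatives = [orword()]
        while peek() == "or":
            take("or")
            alternatives.append(orword())
        return " OR ".join(alternatives)

    def constraint():  # rules [13]-[20]
        if peek() not in FIELD_TOKEN:
            raise SyntaxError(f"expected DOMAIN or TYPE, found {toks[i][1]!r}")
        field = take(peek()).lower()
        negate = "-" if peek() == "!=" else ""
        take("!=" if negate else "=")
        value = take("QSTRING" if field == "domain" else "STRING")
        return negate + FIELD_TOKEN[field] + value

    take("search")
    prefix, keys, constraints = [], [], []
    if peek() in ("STRING", "QSTRING", "not"):  # rule [2]: key list is optional
        keys.append(key())
        while peek() == ",":
            take(",")
            keys.append(key())
    if peek() == "from":  # rule [10]
        take("from")
        occ = take("STRING").lower()
        if occ not in OCCURRENCE:
            raise ValueError(f"unknown occurrence value {occ!r}")
        prefix.append(OCCURRENCE[occ])
    if peek() == "where":  # rules [11], [12]: one or more constraints
        take("where")
        constraints.append(constraint())
        while peek() != "EOF":
            constraints.append(constraint())
    take("EOF")
    return " ".join(prefix + keys + constraints)


print(compile_gql("search Xiaoqing, Wu from text where type = pdf"))
print(compile_gql('search compiler, not interpreter, "aspect oriented" or AspectJ '
                  'where domain != "example.com"'))
```

Under these assumptions the first call reproduces the hand-written query shown earlier, `allintext: Xiaoqing Wu filetype:pdf`. A malformed program such as `search x where type` is rejected with a `SyntaxError` before anything reaches the search engine.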

               GQL IDE structure

  [Figure: GQL IDE data flow]
  Query program -> GQL compiler -> Google-recognizable tokens
    -> Google search engine -> Query result

 Compiler implementation in CDE

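  [Figure: compiler generation pipeline]
  GQL specification -> JLex specification, CUP specification
  JLex specification -> JLex -> lexer in Java
  CUP specification -> CUP -> parser in Java
  Parser -> AST nodes
  AST nodes -> type checking in AspectJ, code generation in AspectJ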
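
The point of the pipeline above is that the AST node classes stay free of behavior, while type checking and code generation are added from outside as separate concerns. Python has no AspectJ, so the sketch below substitutes `functools.singledispatch` to get a similar separation; it illustrates the idea and is not the CDE implementation. The node classes, the `KNOWN_TYPES` set, and the domain check are all hypothetical.

```python
from dataclasses import dataclass
from functools import singledispatch


# AST nodes carry data only; no checking or generation logic lives here.
@dataclass
class FileType:
    value: str
    accept: bool = True


@dataclass
class Domain:
    url: str
    accept: bool = True


# Hypothetical TYPEVALUE literals; the slides do not list the real ones.
KNOWN_TYPES = {"pdf", "ps", "doc", "xls", "ppt", "rtf"}


# Concern 1: type checking, defined outside the node classes.
@singledispatch
def typecheck(node):
    raise TypeError(f"no type rule for {type(node).__name__}")


@typecheck.register(FileType)
def _check_filetype(node):
    return [] if node.value in KNOWN_TYPES else [f"unknown file type {node.value!r}"]


@typecheck.register(Domain)
def _check_domain(node):
    ok = "." in node.url and " " not in node.url
    return [] if ok else [f"malformed domain {node.url!r}"]


# Concern 2: code generation, also defined outside the node classes.
@singledispatch
def generate(node):
    raise TypeError(f"no generator for {type(node).__name__}")


@generate.register(FileType)
def _gen_filetype(node):
    return ("" if node.accept else "-") + "filetype:" + node.value


@generate.register(Domain)
def _gen_domain(node):
    return ("" if node.accept else "-") + "site:" + node.url


program = [FileType("pdf"), Domain("example.com", accept=False), FileType("exe")]
errors = [msg for node in program for msg in typecheck(node)]
tokens = [generate(node) for node in program if not typecheck(node)]
print(errors)
print(tokens)
```

Adding a new pass, for example a pretty-printer, means adding another dispatched function and leaves the node classes untouched.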

                 Current status

• Basic GQL compiler
• IDE supporting multiple document management
  – Program storage
  – Editing
  – Compiling, type-checking and execution
• Functionality covers all features of Google
  web and image search
• Search within old queries

                  Future work

• Extending the grammar to cover all the
  functionality Google provides
• Adding stricter type checking for source
  programs written in GQL
• Integrating search results

• To make online search more flexible, a
  SQL-like query language was developed for the
  Google query domain.
• GQL programs replace the query forms that
  Google provides, analogous to SQL versus the
  query forms of a DBMS such as MS Access.
• The idea could be generalized to other domains,
  especially in online searching, e.g. airfare