
Introduction to Databases


Variations in Searching for Information

    CMPT 455/826 - Week 11, Day 2

       Approximate Query Processing
• Abstract1
      – This article describes query processing in the DBO database system.

      – Like other database systems designed for ad hoc analytic processing, DBO is
        able to compute the exact answers to queries over a large relational database in
        a scalable fashion.

      – Unlike any other system designed for analytic processing, DBO can constantly
        maintain a guess as to the final answer to an aggregate query throughout
        execution, along with statistically meaningful bounds for the guess’s accuracy.

      – As DBO gathers more and more information, the guess gets more and more
        accurate, until it is 100% accurate as the query is completed.

      – This allows users to stop the execution as soon as they are happy with the query
        accuracy, and thus encourages exploratory data analysis.

1.   Scalable Approximate Query Processing with the DBO Engine by Chris Jermaine, Subramanian Arumugam, Abhijit
     Pol, and Alin Dobra
   Approximate Query Processing
• Purpose:
  – To get fast intermediate results on queries that could take longer
    than the extra precision is worth

• Technique:
  – Uses random sampling rather than sequential processing to
    keep accumulating more and more exact information

• Comments:
  – The paper is very technical, but the concept is what is important
    to consider
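
The concept can be illustrated with a minimal sketch. It is not DBO's algorithm (which handles joins over large relations); it only shows, for a single in-memory column, how scanning rows in random order lets a running guess for SUM tighten until it is exact. The function name, the normal-approximation bound (z = 1.96), and the finite population correction are assumptions made for this illustration.

```python
import math
import random


def running_sum_estimates(values, report_every, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) for SUM(values).

    Rows are visited in random order, so the rows seen so far form a
    simple random sample of the whole column.
    """
    n = len(values)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random scan order
    total = 0.0
    total_sq = 0.0
    for k, idx in enumerate(order, start=1):
        v = values[idx]
        total += v
        total_sq += v * v
        if k % report_every == 0 or k == n:
            mean = total / k
            # sample variance of the rows seen so far
            var = (total_sq - k * mean * mean) / (k - 1) if k > 1 else 0.0
            var = max(var, 0.0)  # guard against tiny negative rounding error
            # finite population correction: becomes 0 once every row is seen
            fpc = (n - k) / n
            half_width = z * n * math.sqrt(var / k * fpc)
            yield k, n * mean, half_width


if __name__ == "__main__":
    column = [float(i % 10) for i in range(1000)]
    for seen, guess, bound in running_sum_estimates(column, report_every=200):
        print(f"{seen:5d} rows: SUM is about {guess:.1f} +/- {bound:.1f}")
```

A user watching these reports could stop the scan as soon as the bound is narrow enough; the last report always has a bound of zero because every row has been read.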
                 Inconsistent Databases
• Abstract2
      – Query answering from inconsistent databases
           • amounts to finding “meaningful” answers to queries posed over databases
           • that do not satisfy integrity constraints specified over their schema.

      – A declarative approach to this problem relies on
           •   the notion of repair,
           •   that is, a database that satisfies integrity constraints
           •   and is obtained from the original inconsistent database
           •   by “minimally” adding and/or deleting tuples.

2.   Repair Localization for Query Answering from Inconsistent Databases by Thomas Eiter, Michael
     Fink, Gianluigi Greco, and Domenico Lembo
         Inconsistent Databases
• Purpose:
  – A database may become inconsistent in many ways
     • This is particularly challenging in the context of data integration,
         – where a number of data sources, heterogeneous and widely
           distributed, must be presented to the user as if they were a single
           (virtual) centralized database, which is often equipped with a rich set of
           constraints expressing important semantic properties of the application
           at hand.
         – Since, in general, the integrated sources are autonomous, the data
           resulting from the integration are likely to violate these constraints.

  – The standard approach through data cleaning
     • may be insufficient
     • even if only few inconsistencies are present in the data
          Inconsistent Databases
• Technique:
  – The notion of a repair for an inconsistent database
      • a repair is a new database which satisfies the constraints in the
        schema and minimally differs from the original one.
          – The suitability of a possible repair depends on
              » the underlying semantics adopted for the inconsistent database,
              » and on the kinds of integrity constraints allowed on the schema.
      • multiple repairs might be possible
      • the standard way of answering a user query is
          – to compute the answers that are true in every possible repair
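
The definition above can be sketched in a few lines. This is a brute-force illustration, not the paper's localization method: it assumes a single primary-key constraint, so that each repair keeps exactly one tuple from every group of tuples sharing a key value, and it takes the intersection of the query's answers over all repairs. The function names and the sample relation are hypothetical.

```python
from itertools import product


def repairs(rows, key):
    """Enumerate the repairs of `rows` under a primary-key constraint.

    `key` maps a tuple to its key value. Each repair keeps exactly one
    tuple from every group of tuples that share a key value.
    """
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(rows, key, query):
    """Return the answers that `query` produces in every repair."""
    answer_sets = [query(repair) for repair in repairs(rows, key)]
    return set.intersection(*answer_sets) if answer_sets else set()


if __name__ == "__main__":
    # (name, department); "bob" violates the key constraint on name
    employees = [("ann", "sales"), ("bob", "it"), ("bob", "hr")]
    by_name = lambda row: row[0]
    print(consistent_answers(employees, by_name, lambda db: set(db)))
    print(consistent_answers(employees, by_name, lambda db: {r[0] for r in db}))
```

In this sample, the full tuple for "bob" is not a consistent answer because the two repairs disagree on his department, while his name is, because it appears in both. The number of repairs grows exponentially with the number of conflicting groups, so enumeration like this only illustrates the definition.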

            Inconsistent Databases
• Comments:
   – The major problem here is having inconsistent information in a database
       • A more important problem is the reason behind the inconsistency in
         information throughout the database.
   – It is difficult to decide what form information should be represented in
     when combining differing database schemes.
       • If this is not done carefully it is likely that the database will end up with
         misleading or inconsistent data.
   – The query is checked against all the possible repairs to the database.
       • The answer is based on some evaluation between the repairs that are
         available, but how likely is it that the query was answered in the desired
         way?
   – Instead of doing extra work with rewriting queries as they are asked
       • why not use the information found out by these techniques to determine a
         more permanent fix for the inconsistency of the data?
            – If a consistent answer can be determined from an inconsistent database, then it
              seems likely that the information could be made consistent in the database for
              future queries.
               Dynamic Spatial Queries
• Abstract3
      – Conventional spatial queries are usually meaningless in dynamic environments
           • since their results may be invalidated
           • as soon as the query or data objects move.
      – In this paper we formulate two novel query types,
           • A time-parameterized query
           • A continuous query

3.   Spatial Queries in Dynamic Environments by Yufei Tao and Dimitris Papadias
         Dynamic Spatial Queries
• Purpose:
  – As opposed to traditional, “instantaneous”, queries
      • that are evaluated only once to return a single result,
  – continuous queries
      • may require constant evaluation and updates of the results
      • as the query conditions or database contents change
          Dynamic Spatial Queries
• Technique:
   – A time-parameterized query returns:
       • the objects that satisfy the corresponding spatial query at the time when the
         query is issued
       • the expiry time of the result given the current motion of the query and
         database objects
       • the change that causes the expiration of the result

   – A continuous query retrieves
       • tuples of the form <result, interval>,
       • where each result is accompanied by a future interval, during which it is valid

       • NOTE: A continuous query can be answered by repetitive execution of TP
         queries until some termination clause is satisfied.
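
A minimal sketch of both query types follows. It is a simplification, not the paper's method: it assumes objects move along a line at constant velocity, a static range [lo, hi] as the query, and exact rational arithmetic so that boundary crossings are computed without rounding. All function names and the tie-breaking rule at the boundaries are assumptions made for this illustration.

```python
from fractions import Fraction


def inside(obj, lo, hi, t):
    """True if the object is in [lo, hi] at time t.

    An object sitting exactly on a boundary and heading outward
    is treated as already outside.
    """
    x0, v = obj
    p = x0 + v * t
    if (p == hi and v > 0) or (p == lo and v < 0):
        return False
    return lo <= p <= hi


def tp_range_query(objects, lo, hi, now):
    """Time-parameterized range query: return (result, expiry, change).

    `objects` maps a name to (x0, v), meaning position x0 + v * t.
    `expiry` is None if the result never changes under current motion.
    If several objects cross at the same instant, only one is reported.
    """
    result = {n for n, o in objects.items() if inside(o, lo, hi, now)}
    expiry, change = None, None
    for name, (x0, v) in objects.items():
        if v == 0:
            continue  # a stationary object never changes the result
        p = x0 + v * now
        if name in result:
            boundary, kind = (hi if v > 0 else lo), "leaves"
        elif p < lo and v > 0:
            boundary, kind = lo, "enters"
        elif p > hi and v < 0:
            boundary, kind = hi, "enters"
        else:
            continue  # outside and moving away from the range
        t = now + Fraction(boundary - p) / v
        if expiry is None or t < expiry:
            expiry, change = t, (kind, name)
    return result, expiry, change


def continuous_range_query(objects, lo, hi, start, end):
    """Yield <result, interval> tuples by repeating TP queries until `end`."""
    now = start
    while now < end:
        result, expiry, _ = tp_range_query(objects, lo, hi, now)
        until = end if expiry is None or expiry > end else expiry
        yield result, (now, until)
        now = until


if __name__ == "__main__":
    cars = {"a": (0, 1), "b": (5, -1)}
    for result, (t_from, t_to) in continuous_range_query(cars, 2, 4, 0, 10):
        print(sorted(result), "valid from", t_from, "to", t_to)
```

Here the termination clause is simply reaching the time `end`. Each expiry time is derived from the velocities known when the TP query runs, so it holds only while those velocities stay unchanged.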
         Dynamic Spatial Queries
• Comments:
  – In addition to getting the correct result from the spatial queries, the
    paper should have addressed how a dynamic database could be updated
      • E.g., a dynamic environment such as an automated car park involves
        both vehicles moving in and out of the parking lot and the database
        being updated on the number of available spaces at a given time.

  – There are issues with how expiry time is dealt with:
      • what happens when the entity changes direction or velocity? Does
        the expiry time remain valid?
           Querying the Semantic Web
• Abstract4
      – The Resource Description Framework (RDF)
           • enables the creation and exchange of metadata as any other Web data.
      – There is a need for sufficiently expressive declarative query languages
           • for querying Web pages that make use of RDF
      – We propose RQL, a new query language
           •   adapting the functionality of semistructured or XML query languages
           •   to the peculiarities of RDF
           •   but also extending this functionality
           •   in order to uniformly query both RDF descriptions and schemas.

4.   Querying the Semantic Web with RQL by G. Karvounarakis, A. Magganaraki, S. Alexaki, V.
     Christophides, D. Plexousakis, M. Scholl, and K. Tolle
     Querying the Semantic Web
• Purpose:
  – RQL adapts the functionality
     •   of semistructured or XML query languages
     •   to the peculiarities of RDF
     •   but also extends this functionality in order
     •   to uniformly query both RDF descriptions and schemas.

  – With RQL users are able to query resources
     • described according to their preferred schema,
     • while discovering how the same resources
     • are also described using another classification schema.
     Querying the Semantic Web
• Technique:
  – We introduce a formal data model and type system
     • for description bases created according to the RDF Model & Syntax
       and Schema specifications

  – In order to support superimposed RDF descriptions,
     • the main modeling challenge is
         – to represent properties as self-existent individuals,
         – as well as to introduce a graph instantiation mechanism permitting
           multiple classification of resources.
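
The idea of querying schema and descriptions together, with resources classified under more than one class, can be sketched over plain triples. This is ordinary Python, not RQL syntax or the paper's data model; the `ex:` names and the helper functions are hypothetical, and only `rdf:type` and `rdfs:subClassOf` come from the RDF and RDF Schema vocabularies.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclasses(triples, cls):
    """Schema query: every class at or below `cls` in the hierarchy."""
    found, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances(triples, cls):
    """Data query that uses the schema: resources typed with `cls`
    or with any of its subclasses."""
    classes = subclasses(triples, cls)
    return {s for s, p, o in triples if p == TYPE and o in classes}


def classes_of(triples, resource):
    """All classes a resource is directly classified under."""
    return {o for s, p, o in triples if s == resource and p == TYPE}


if __name__ == "__main__":
    graph = [
        ("ex:Painter", SUBCLASS, "ex:Artist"),
        ("ex:Sculptor", SUBCLASS, "ex:Artist"),
        ("ex:picasso", TYPE, "ex:Painter"),
        ("ex:picasso", TYPE, "ex:Sculptor"),  # multiple classification
        ("ex:rodin", TYPE, "ex:Sculptor"),
    ]
    print(sorted(instances(graph, "ex:Artist")))
    print(sorted(classes_of(graph, "ex:picasso")))
```

Because schema statements and resource descriptions live in the same list of triples, one query can move between the two, which is the uniformity the slide refers to.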
     Querying the Semantic Web
• Comments:
  – The type system used for RQL is extremely useful
      • in that it is actually read from the RDF schema - the type system is
        specific to the schema being used.
  – However all types fit into a finite list of types,
      • which contains literal types, resource types, class types, property
        types and others.
  – The discussion on typing as it relates to RDF
      • would be useful in considering various other approaches to typing
        for other means of modeling (ER or class diagrams).
  – In ER modeling this could be achieved
      • through choosing property names/attributes for a relationship and
        including them in the diagram (and not just “is-a”).
                     Entity Search Engine
• Abstract5
      – The Web has become a rich collection of data-rich pages,
            • on the “surface Web” of static URLs
            • as well as the “deep Web” of database-backed contents

      – The richness of data,
            • while a promising opportunity,
            • has challenged us to effectively find data we need,
            • from one or multiple sources.

      – We are motivated by the need of
            • large scale on-the-fly integration for online structured data.

5.   Entity Search Engine: Towards Agile Best Effort Information Integration over the Web by Tao
     Cheng and Kevin Chen-Chuan Chang
             Entity Search Engine
• Purpose:
  – How do we identify and integrate the structured data
      • embedded in unstructured result pages?
             Entity Search Engine
• Technique:
  – Search engines, such as Google, Yahoo, or MSN, search for pages by keywords.
      • while being “IR-style” with a scalable text processing framework,
      • they are not data aware.

  – Integration services exist online for specific domains.
          – such as or
      • They provide “DB-style” precise querying,
      • but they can hardly scale to the amount of data and the number of
        sources on the Web.

  – We propose a solution
      • where the two extremes meet,
      • with a synergistic “marriage” in the middle.
            Entity Search Engine
• Comments:
  – There are still problems with sites that embed their data in
    inaccessible formats that cannot be queried
