Automatic Database Integration by akm33296


									Automatic Data
 Ramon Lawrence
   University of Manitoba
base Integration
    Ken Barker
   University of Calgary
 Database  integration allows systems within a
  company or on the WWW to interoperate.
 Current integration solutions require manual
  coding which is costly and time-consuming.
 Our integration architecture uses XML and a
  standard dictionary to automate integration.
 Database systems are integrated using 3
  processes: capture, integration, and query.
 The architecture is implemented in a software
  package called Unity.
 Unity is capable of automatic database
  integration and querying using ODBC.
 Applications include data warehouse design,
  web site integration, and systems integration.
Integration Architecture
   Client                          Client
                                                  Architecture Components:
                                                   1) Integrated Context View
          Multidatabase Layer                        • user’s view of integration
            Integrated Context View                2) X-Spec Editor
                                                     • stores schema & metadata
                                                     • uses XML
  X-Spec        Standard            Integration    3) Standard Dictionary
   Editor       Dictionary           Algorithm       • terms to express semantics
                                                   4) Integration Algorithm
                                                     • combines X-Specs into
    Query Processor and ODBC Manager                 integrated context view
                                                   5) Query Processor
                                                     • accepts query on view
 X-Spec                                 X-Spec       • determines data source
                                                     mappings and joins
          Database           Database                • executes queries and
                                                     formats results

             Local Transactions
Integration Processes
 Theintegration architecture consists of three
 separate processes:
          process: independently extracts database
   Capture
    schema information and metadata into a XML
    document called a X-Spec.

   Integration process:combines X-Specs into a
    structurally-neutral hierarchy of database concepts
    called an integrated context view.

   Query  process: allows the user to formulate queries on
    the integrated view that are mapped by the query
    processor to structural queries (SQL) and the results
    are integrated and formatted.
Architecture Components
  The   architecture consists of four components:
   A    standard dictionary (SD) to capture data semantics
          SD terms are used to build semantic names describing
           semantics of schema elements.
    X-Specs for        storing data semantics.
          relational database info. stored and transmitted using XML
          stores semantic names to describe schema elements
    Integration       Algorithm
          Identical concepts in different databases are identified by
           similar semantic names.
          Produces an integrated view of all database concepts.
    Query       Processor
          allows the user to formulate queries on the view
          translates from semantic names in integrated view to SQL
           queries and integrates and formats results
                involves determining correct field and table mappings
                and discovery of join conditions and join paths
Architecture Discussion
  The  architecture automatically integrates
   relational schemas into a multidatabase
  Desirable properties:
     individual   mappings - information sources integrated
      one-at-a-time and independently
     integrated view constructed for query transparency -
      user queries system by semantics instead of structure
     handles schema conflicts - including semantic,
      structural, and naming conflicts
     automated integration - integrated view constructed
      efficiently and automatically
     no wrapper or mediator software is required
     transparent querying - users issue semantic queries
      which are translated to SQL by the query processor
     no changes are performed on underlying databases
 Automatic  database integration is possible by
  using a standard term dictionary and defining
  semantic names for schema elements.
 Integration of data sources has applications to
  the WWW and construction of data warehouses.
 Users are able to transparently query integrated
  systems by concept instead of structure.
 Future Work:
   Test the system in large industrial projects.
   Improve the Unity implementation and develop an
    integration component for a web browser.
   Increase the efficiency of the query processor.
   Allow distributed updates and global updates on all
Background &
Why is Integration Required?
  There   are many integration environments:
    operational systems within an organization
    system integration during company merger
    data warehouses, Intranets, and the WWW

  Users require information from many data
   sources which often do not work together.
  Companies require a global view of their entire
   operations which may be present in numerous
   operational databases for different departments
   and distributed geographically.
  Business-to-Business (B-to-B) and Business-to-
   Consumer (B-to-C) E-commerce demand
   integration of web databases with production
   systems for efficiency and productivity.
What is Integration?
  Database   integration requires conflict resolution
   during schema integration.
  A schema is a description of the data
   organization and format in a database.
  Schema integration combines database schemas
   into a unified view of the data.
     Constructing an integrated view is difficult because
     databases will store different types of data, in varying
     formats, with different meanings, and reference it
     using different names. Subsequently, the construction
     of the integrated view must handle the different
     mechanisms for storing data (structural conflicts), for
     referencing data (naming conflicts), and for attributing
     meaning to the data (semantic conflicts). Data is
     integrated at both the schema level (the description of
     the data) and the data level (individual data instances).
What is the Current Solution?
  SQL   and multidatabase query languages:
     querying   is difficult for numerous databases
  Research     systems:
     do   not use standardization so cannot be automated
  Industrial   standards:
     XML, BizTalk, E-commerce portals
     apply to limited domains/industries
     require standard structures and database changes

         techniques for database integration have
  Current
  some of these problems:
     require integrator to understand all databases
     integration process is manual
     do not hide system complexity from the user
     force changes on the existing database systems
Unity Overview
 Unity is a software package that implements the
  integration architecture with a GUI.
 Developed using Microsoft Visual C++ 6 and
  Microsoft Foundation Classes (MFC).
 Unity allows the user to:
   construct and modify standard dictionaries
   build X-Specs to describe data sources
   integrate X-Specs into an integrated view
   transparently query integrated systems using ODBC
    and automatically generate SQL transactions
A  screen-shot of Unity editing a standard
  dictionary is below.
 Unity is available for demonstration and
 Publications:
   Unity - A Database Integration Tool, R. Lawrence and
    K. Barker, TRLabs Emerging Technology Bulletin,
    January 2000.
   Multidatabase Querying by Context, R. Lawrence and
    K. Barker, DataSem2000, pages 127-136, Oct. 2000.
   Integrating Relational Database Schemas using a
    Standardized Dictionary, To appear in SAC’2001 - ACM
    Symposium on Applied Computing, March, 2001.

 Sponsors:
   NSERC,    TRLabs
 Further   Information:

To top