Information Presentation in Data Warehouse - DOC

Document Sample
Information Presentation in Data Warehouse - DOC Powered By Docstoc
					  Arizona Education
   Data Warehouse
Technical Architecture
Overview
This document provides a technical architecture for the development of the
Arizona Education Data Warehouse (AEDW).


Conceptual Architecture
The term Business Intelligence (BI) incorporates the concept of deriving useful
information from the data in an organization. Designing a BI application involves
multiple layers. The goal of this section is to provide a common framework for
architects and developers. This framework is conceptual, technology agnostic,
and covers the major phases, features, and functionality required to effectively
implement a BI solution. The conceptual architecture in Figure 1 is comprised of
five major areas and a set of cross-cutting concerns.
                             Data Integration




                                                                                  Data Presentation
                                                                  Data Analysis
                                                   Data Storage
               Data Source




              Meta data, Security, Performance & Operations


Figure 1: Conceptual system architecture
Data Storage is the end result of Data Source and Data Integration layers. Data
Storage can be termed as Data Warehouse (DW)/Data Mart (DM). Data
Analysis and Data Presentation will leverage the information stored in Data
Storage.


Data Extraction
                                                PULL by issuing SQL
                                                statement
               OLTP                                                                                     Staging
               (Data)
                                                Native Data (binary &                                 (Code +Data)

                                                compact)
We will be implementing the Pull Model for data extraction from the OLTP
systems primarily to reduce the impact on the source data since it is in a
production environment. Not that in the pull method, the staging server initiates
the extraction, prepares extraction queries, tracks the failures and maintains
necessary metadata. Furthermore, this method will allow us to minimize changes
to the source applications.

We will follow these guidelines for the extraction of data:
   Use the Job Scheduler to launch the extraction process during off-peak
      hours or at specific periodic intervals as is acceptable by the user
      community and our operations group.
   Store metadata required for the extraction on the staging server. Use the
      staging server to maintain the status of all extractions.
      Configure the extraction process to read the metadata for configuration
       parameters and to configure source connections. Make the extraction
       process begin by checking the schema to see if there have been any
       modifications to the source since last extraction process.
      Configure the extraction process to prepare the extraction queries based
       on the metadata. This includes information such as list of the tables to
       extract and how to identify the appropriate delta period.
      Configure the extraction process to keep track of the status for each
       source table. Create a status code—such as initialized, in progress,
       completed, and failed—for each stage of the extraction. During each
       extraction update the status code in the metadata table. Use the status
       code to manage recovery, whether immediately, at a specified recovery
       interval, or during the next normal load process. A simple restart may be
       sufficient if the problem was a failed connection, but actual changes to the
       source may require manual intervention to change the extraction process
       or the staging database.
      After the extraction, check to make sure all the data transferred
       successfully, perhaps by comparing total row counts. In addition, you can
       check the data for errors such as primary key violations, null violations,
       and date time violations (or you can defer quality checks to the
       transformation process.) Also, after the extraction, update the metadata
       information needed to manage the next extraction.

       A pull method extraction may be significantly more complex than
       suggested in the above guidelines. For example, if you extract data from
       multiple sources asynchronously, the extract from one server may
       succeed while the one other fails, which would require special attention to
       the recovery process. Likewise if extractions from different servers happen
    at different times (for example, one at 10:00 PM, and one at 6:00 AM) then
    the extraction process needs to track each extraction separately.


				
DOCUMENT INFO
Description: Information Presentation in Data Warehouse document sample