This document provides a technical architecture for the development of the
Arizona Education Data Warehouse (AEDW).
The term Business Intelligence (BI) refers to the practice of deriving useful
information from an organization's data. Designing a BI application involves
multiple layers. The goal of this section is to provide a common framework for
architects and developers. This framework is conceptual and technology agnostic,
and it covers the major phases, features, and functionality required to
effectively implement a BI solution. The conceptual architecture in Figure 1
comprises five major areas and a set of cross-cutting concerns.
Figure 1: Conceptual system architecture (cross-cutting concerns: Metadata, Security, Performance & Operations)
Data Storage is the end result of the Data Source and Data Integration layers;
it can take the form of a Data Warehouse (DW) or Data Mart (DM). Data
Analysis and Data Presentation will leverage the information stored in Data
Storage.
We will implement the Pull Model for data extraction from the OLTP
systems, primarily to reduce the impact on the source systems, since they run
in a production environment. Note that in the pull method, the staging server
initiates the extraction, prepares the extraction queries, tracks failures, and
maintains the necessary metadata. Furthermore, this method will allow us to
minimize changes to the source applications.
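As a rough illustration of the pull model, the following Python sketch uses in-memory sqlite3 databases to stand in for the OLTP source and the staging server; the table and column names are illustrative, not taken from the AEDW schema.

```python
import sqlite3  # in-memory databases stand in for the OLTP source and staging server

def pull_extract(source_conn, staging_conn, table, columns):
    """Staging-initiated pull: query the source, land the rows in staging.

    The source application runs no extraction code of its own; the staging
    server prepares the query and drives the transfer, so source changes
    are minimized.
    """
    col_list = ", ".join(columns)
    rows = source_conn.execute(f"SELECT {col_list} FROM {table}").fetchall()
    placeholders = ", ".join("?" for _ in columns)
    staging_conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    staging_conn.commit()
    return len(rows)
```

Because the staging server owns the query text and the connection, it is also the natural place to record failures and extraction metadata, as the guidelines below describe.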
We will follow these guidelines for the extraction of data:
Use the Job Scheduler to launch the extraction process during off-peak
hours, or at specific periodic intervals acceptable to the user
community and our operations group.
Store metadata required for the extraction on the staging server. Use the
staging server to maintain the status of all extractions.
Configure the extraction process to read the metadata for configuration
parameters and to configure source connections. Make the extraction
process begin by checking the schema to see if there have been any
modifications to the source since the last extraction.
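One minimal sketch of that schema check, assuming the staging metadata stores the column list recorded at the previous extraction (the function and field names are hypothetical):

```python
def schema_changes(current_columns, recorded_columns):
    """Compare the source table's current columns against the column list
    recorded in staging metadata at the last extraction, and report what
    was added or dropped since then."""
    current, recorded = set(current_columns), set(recorded_columns)
    return {
        "added": sorted(current - recorded),    # new columns in the source
        "dropped": sorted(recorded - current),  # columns the source removed
    }
```

A non-empty result would signal that the extraction queries (and possibly the staging tables) need attention before the load proceeds.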
Configure the extraction process to prepare the extraction queries based
on the metadata. This includes information such as the list of tables to
extract and how to identify the appropriate delta period.
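As a sketch of how such queries might be composed from metadata, the snippet below assumes a hypothetical metadata record holding the table name, the columns to extract, and the column used to identify the delta period:

```python
def build_extraction_query(meta, since):
    """Compose a parameterized delta-extraction query for one table from
    its metadata record (keys 'table', 'columns', 'delta_column' are
    illustrative, not from the AEDW metadata model)."""
    query = (
        f"SELECT {', '.join(meta['columns'])} "
        f"FROM {meta['table']} "
        f"WHERE {meta['delta_column']} >= ?"
    )
    return query, (since,)  # query text plus its bind parameters
```

Keeping the delta boundary as a bind parameter lets the same metadata-driven query be reused for every extraction interval.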
Configure the extraction process to keep track of the status for each
source table. Create a status code—such as initialized, in progress,
completed, and failed—for each stage of the extraction. During each
extraction update the status code in the metadata table. Use the status
code to manage recovery, whether immediately, at a specified recovery
interval, or during the next normal load process. A simple restart may be
sufficient if the problem was a failed connection, but actual changes to the
source may require manual intervention to change the extraction process
or the staging database.
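The status-code lifecycle above can be sketched as follows; the in-memory dictionary stands in for the metadata table, and the failure-cause values are illustrative:

```python
# Status codes as described in the guideline above.
STATUS_CODES = ("initialized", "in_progress", "completed", "failed")

def record_status(status_table, table_name, status):
    """Update the per-table status code in the metadata table (here, a dict)."""
    assert status in STATUS_CODES
    status_table[table_name] = status

def recovery_action(status, failure_cause):
    """Decide how to recover: a failed connection merits a simple restart,
    while actual changes to the source call for manual intervention."""
    if status != "failed":
        return "none"
    return "restart" if failure_cause == "connection" else "manual"
```

In practice the status table would live on the staging server so that recovery can run immediately, at a specified recovery interval, or during the next normal load.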
After the extraction, check to make sure all the data transferred
successfully, perhaps by comparing total row counts. In addition, you can
check the data for errors such as primary key violations, null violations,
and date/time violations (or you can defer quality checks to the
transformation process). Also, after the extraction, update the metadata
information needed to manage the next extraction.
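The post-extraction checks above can be sketched as a single validation pass; the key-column name and row representation here are assumptions for illustration:

```python
def validate_extraction(source_row_count, staged_rows, key_column):
    """Post-extraction checks: compare total row counts, then run simple
    quality checks (null and primary-key violations) on the key column.
    Returns a list of problems; an empty list means the transfer looks
    complete."""
    problems = []
    if len(staged_rows) != source_row_count:
        problems.append(
            f"row count mismatch: source {source_row_count}, "
            f"staged {len(staged_rows)}"
        )
    keys = [row.get(key_column) for row in staged_rows]
    if any(k is None for k in keys):
        problems.append(f"null violation in key column {key_column}")
    if len(set(keys)) != len(keys):
        problems.append(f"primary key violation in {key_column}")
    return problems
```

As the guideline notes, these quality checks could instead be deferred to the transformation process; only the row-count comparison is needed to confirm the transfer itself.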
A pull method extraction may be significantly more complex than
suggested in the above guidelines. For example, if you extract data from
multiple sources asynchronously, the extract from one server may
succeed while another fails, which would require special attention in
the recovery process. Likewise, if extractions from different servers happen
at different times (for example, one at 10:00 PM and one at 6:00 AM), then
the extraction process needs to track each extraction separately.
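Per-source tracking of that kind can be sketched as below; a real implementation would persist these records in the staging metadata rather than in memory:

```python
from datetime import datetime

class ExtractionLog:
    """Track each source server's extraction independently, so one
    server's failure (or a different schedule) does not obscure the
    state of the others."""

    def __init__(self):
        self._runs = {}  # source name -> latest status and timestamp

    def record(self, source, status, at=None):
        self._runs[source] = {"status": status, "at": at or datetime.now()}

    def needs_recovery(self):
        """Sources whose most recent extraction failed."""
        return sorted(
            s for s, run in self._runs.items() if run["status"] == "failed"
        )
```

With separate records per source, a 10:00 PM extraction from one server and a 6:00 AM extraction from another can each be recovered on its own schedule.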