Data Warehouse - Architecture - Best Practices - text version page.
Begin page content:
Best Practice 1: Do not wait for an enterprise information model to start
data warehouse efforts.
Enterprise models often take years to complete. Meanwhile, business requirements are
changing rapidly. Data Warehouse efforts begin with a strategic set of information,
preferable targeting a cross section of users.
Develop the data warehouse incrementally.
Best Practice 2: Identify specific requirements for data availability,
freshness (i.e., live, 24 hours old, etc.), and recoverability.
Some data warehouses need to be updated more frequently than others. When the
original data is frequently changing or is more volatile, it may be necessary to update the
data warehouse on a near real time basis.
Best Practice 3: Perform benchmarks on the database design before
constructing the database.
Expect to make changes and adjustments throughout development.
Changes during the early cycles up to, and including implementation, are a primary
mechanism of performance tuning.
Best Practice 4: Use a Federated metadata repository system to provide
managed, current, standard, consistent, and critical business definitions
for the information in the data warehouse.
A managed repository system ensures that data is defined accurately so that it is used in
the manner intended both by end users and application systems from which it was
A repository is able to store versions so that changes to data meanings may be retained
for reference to older information.
Best Practice 5: Ensure that the appropriate security is applied to the data
Prevent unauthorized users from tampering with the data.
Control access to portions of the database on a user-by-user basis. Certain types of data
that is stored in a data warehouse may be sensitive. Providing protection for privacy of
individual and confidentiality of data must be considered.
Best Practice 6: Allow only read only access to end users of data
Updates should only occur to the operational (O L T P) source where the data originates.
Best Practice 7: Direct all information queries against decision support
databases, not O L T P databases. Conversely, operational transactions
should be directed to operation databases only, not O L A P databases.
Data warehouses, and data marts contain data that has been checked for consistency
and integrity, and represents a cross-functional view of data.
Data in transaction (O L T P) systems typically support a specific business group or
O L T P transactions should not depend on a data warehouse database. They require a
stable operational environment that is not affected by ad hoc usage or external data.
Best Practice 8: Establish the data warehouse as the authoritative source
for all other decision support databases.
Data administration policies, procedures and tools are required.
Project design reviews are required. A mandatory deliverable must be identification of all
data sources for the project.
Distributed data warehouse servers and data marts should use an authoritative source
for data feeds.
Best Practice 9: Store atomic-level data in the data warehouse in addition
to summary data.
Atomic data is transaction-level data. It contains much more detail than summary data.
Atomic-level data addresses the business need to recast history. Due to the fast pace of
business change, many organizations are going through multiple reorganizations. After
reorganization, many decision makers want to recast history (e.g., to get a feel for what
test scores would have been like if the number of school districts was already reduced to
respond to legislation or funding).
If only summary level historical data is kept in the data warehouse, it is not possible to
Best Practice 10: Perform periodic validity audits against the data
warehouse information model to ensure a high level of confidence in the
quality and integrity of the data.
Accelerated decision-making requires high quality data. If operational data has changed
or additional data is needed, changes must be made in the information model and in the
data warehouse itself.
The data stored in a data warehouse should conform to the information model.
The source data populating a data warehouse should be verified for consistency and
The data warehouse should still correspond to business needs.
Ensuring the integrity and quality of data is the responsibility of both the business users
and I S.
Best Practice 11: The implementation plan should match critical business
Start with strategic business needs.
Optimize critical data access before less-critical data access.
Optimize high-volume data access before low-volume data access.
Optimize applications and data with high-service-level requirements before those with
Best Practice 12: Plan and budget for the ongoing operations,
administration, and maintenance of a data warehouse.
A support plan for the data warehouse should be documented and implemented.
The data warehouse should be scalable to meet future demands.
Best Practice 13: Use an iterative data warehouse methodology for
development and implementation.
A data warehouse is built a subject area at a time.
A data warehouse is never complete. Change and additions are continual.
Best Practice 14: Load only clean data into the data warehouse.
Data warehouse data must be valid information. It must also represent the intent of the
data from the target system.
Cleansing data for the data warehouse is a major function of an iterative data warehouse