Tour of Data Warehousing
• The ultimate goal of advanced data analysis
is to make decisions that lead as directly as
possible to benefits.
• Data warehouses aim at physically framing
multiple sources of data (e.g., databases
and file collections) in an architecture that
requires the mapping of data from one or
more operational data sources to a target
database management system that supports
the many decision making processes and
business intelligence (BI) systems of an
The Value Chain
Data to Decisions
• Historically, BI systems have evolved through
three main phases.
• Phase 1. Enterprises started to put in place
structured data stores filing data relevant to
their needs. Typically a snapshot of data from
a single operational source (i.e., information
center) or sometimes directly against the
operational source. In the 1980s, decision
support tasks were performed centrally, with
highly skilled individuals analyzing mainframe-
resident data. The results were delivered to
management as hard-copy reports and graphs.
Data to Decisions
• Phase 2. Competitive factors pushed
enterprises toward a better leverage of
the data and they adopted augmented
Decision Support Systems (DSS).
• With data analysis techniques such as
statistics, creating a focused store of
data with subject matter from two or
three operational sources.
• Along with the storage, tools taking
advantage of the structure have been
setup for enterprises to question these
Data to Decisions
• Phase 3. Finally, to cope with the
multiplicity and diversity of the data
stores, enterprises are starting to unify
and rationalize these through data
• In addition to this effort, and due to an
increased competitive pressure,
advanced data analysis techniques are
also implemented to leverage further -
and gain business advantages - the
resulting and ever growing amount of
Three Phases of Data
Data Warehouse Definition
• A data warehouse is a process and architecture
that requires robust planning to implement a
platform, which consists of the selection,
conversion, transformation, consolidation,
integration, cleansing and mapping of data (i.e.,
recent and historical) from multiple operational
data sources to a target DBMS that supports an
enterprise’s decision-making processes and BI
• As an architecture, a data warehouse is more
than a single product and requires significant
planning of five essential components:
operational data sources, data conversion and
extraction, data warehouse DBMS, data
warehouse administration, and BI tools.
Data Mining Definition
• Data mining is the process of
discovering meaningful correlations,
patterns, and trends by sifting through
large amounts of data stored in
• Whether or not this discovery or
exploration is performed by human
analysts, software agents or machine
learning techniques, it is important that
the results provide enterprises with
insights not available through traditional
techniques or predefined relationships
(e.g., relational tables).
• Business Intelligence (BI) is the user-centered
process of exploring data, data relationships
and trends - thereby helping to improve overall
• This involves an iterative process of accessing
data (ideally stored in the data warehouse) and
analyzing it - thereby deriving insights, drawing
conclusions and communicating findings - to
effect change positively within the enterprise.
• BI is an application of a data warehouse, but
does not necessitate a data warehouse. BI is
comprised of four major product segments:
interactive query tools, reporting tools,
advanced DSSs, and EISs.
Decision Support Systems
• DSSs, as a subset of BI, utilizes
organized collections of data,
systems, applications, tools and
techniques by which enterprises
gather and interpret relevant
information about the business and
turn it into highly quantifiable plans,
policies, procedures and metrics
(see Figure 1).
• Data warehousing technology and architectures are
becoming a mainstream activity
• However, many enterprises are balking at the high cost of
implementation and experience difficulty in quantifying
benefits and the return on investment (ROI).
• In addition, the vast majority of efforts to build and manage a
data warehouse still require in-house customization of
products, thus consuming significant resources.
• Many enterprises will need to turn to outside help to obtain
the expertise necessary to perform the architecture
• There are several key trends that indicate that data
warehousing is a market that is growing and will produce
Data Warehousing cont’d
• Increased investment by portable RDBMS vendors is improving support
for data warehousing and complex DSS applications.
• Niche tool vendors are moving to provide comprehensive solutions;
those that do not could be acquired.
• Increasing consulting business, with systems integrators bringing
vertical expertise and data models to the table.
• Specific industries (e.g., retail) will increase granularity and the amount
of historical data, creating 100 terabyte databases easily.
• Inclusion of external data (e.g., demographic) that occurs in a more
flexible and integrated way
• Query monitors that give administrators some relief in managing the
data warehouse and anticipate enterprise’s data needs, but many
problems for administrators in managing the data warehouse
environment (e.g., metadata redundancy and summary table
maintenance) continue with tools slow to emerge and mature.
Supporting Multiple BI Applications
• A data warehouse is more than a single
product and requires significant planning
of five essential components:
– Operational data source
– Data conversion and extraction
– Data warehouse DBMS
– Data warehouse administration
– BI tools
Operational Data Source
• Data administration must take an
active role to help plan extracts and
to work with the business
administrator to define the data
• The data administrator can help
gather the information about the
operational data and assist in
designing the data model that will be
used for the data warehouse DBMS.
Data Conversion and Extraction
• A data warehouse architecture should include
extracts of operational data that are “frozen views of
information” trapped in time capsules, which in
some cases have some level of summarization and
history associated with the view of information.
• The extracts are created either by handcrafted
programs that take time and expense to maintain, or
through tools that help automate the generation of
the extract applications or processes.
• These applications should provide the capabilities to
perform the complex task of integrating data from
multiple sources to create a consolidated view of the
data, as well as the transformation of data for use
by BI applications.
Data Warehouse DBMS
• The RDBMS vendors (e.g., Oracle, IBM, and
Microsoft) have significantly increased the amount of
research and development to improve support for data
warehousing and complex DSS applications.
• This investment is geared toward providing strong
support of complex database schemas with databases
approaching several hundred terabytes.
• Some of the returns (e.g., improved parallel
techniques, bitmap indexing and improved query
algorithms) from this investment are beginning to
emerge out of the labs and into the products.
Data Warehouse Administration
• Data warehousing brings many complex administration
issues that are much different from handling
• With the increasing number of subject areas and the
increasing amount of historical data (i.e., more than
five years) typically maintained, a data warehouse
requires significant amounts of disk storage, memory
and processing power.
• These new administration requirements need
extensive planning to provide data usage auditing,
business data model, directory management,
chargeback, summary tables, security, request control,
query catalog, subscription services, and managing of
operational data extracts.
Business Intelligence (BI)
• BI empowers enterprises with systems that facilitate the
access and analysis of data contained in the data
• A data warehouse in combination with the right BI tools
can be an important part of supporting a business
• The selection of BI tools needs to be done after an
analysis of the business and enterprise needs have been
• It is likely that a single tool will not support the needs of
all users. Therefore, many times several tools will be
selected to perform each of the following functions: ad
hoc queries and report writing, OLAP, advanced DSS
The Data Warehouse Market: Clarifying the
Hype and Confusion
• With most market size estimates reaching as
high as $6 billion, it is clear that data
warehousing has captured the imagination of
• How these estimates have been drawn, or that
they cover a laundry list of sundry items, is less
clear. This is not surprising since the data
warehouse market is a challenging one to
• It is difficult to ascertain definitively whether a
particular product is being used in a structured
data warehouse environment or merely in a
conventional DSS context
Virtual Data Warehouse
• The virtual data warehouse (VDW) is a concept
being touted by some as a more timely and less
risky approach for implementing DSSs than
traditional data warehousing.
• VDW is an architecture for accessing data directly
from data stores, regardless of the data source,
thus eliminating the need to construct a separate
database of redundant data.
• This is not a new idea; Information Builders
incorporated the concept with the Enterprise Data
Access/SQL (EDA/SQL) product in the early 1990s,
and it was previously known as universal data
Virtual Data Warehouse cont’d
• Proponents of VDW tout the speed of deployment
over the traditional processes required for building a
data warehouse, by focusing on user data access
without the heavy analysis and infrastructure
• However, this is misleading because many of the
planning activities (e.g., identifying source data and
transformation of the data) are still required with the
VDW architecture to ensure that the correct data is
available for user access.
• The identification of source data is a complicated
and time-consuming process.
These challenges are plentiful:
– Data integration, consolidation and transformation
– Performance issues of SQL access
– Impact on performance of operational systems
– The cost of repeated extracts to obtain the same data
vs. a single extract to create a database
– Unification of different data management systems
– Schema integration and reconciliation of the different
data models implemented with each database or
– Distributed query processing (i.e., decompose,
distribute and determine where the query should be
processed based on the cost of the query)
• A data mart is a subset of data from the data warehouse,
designed to support the unique business unit
requirements of a specific application.
• If the right planning is done to incorporate the data
mart into the overall data warehouse architecture, data
marts can be beneficial.
• This may save valuable server resources and give the
business analysts more freedom to perform heavy
• A user’s data does not have to reside physically in
their departments for them to be empowered, but
instead could be a set of aggregate tables within the
data warehouse designed to support a specific set of
users or application.
Data Mart Challenges
• The need to create multiple data extraction
and copy management applications could
consume a great deal of IS resources and
prove problematic to manage.
• Improved data quality, consolidation and
sharing are primary goals for creating a
data warehouse, a strategy that could be
jeopardized without centralized control for
the distribution of data to the data marts.
Data Warehouse/Mart Architecture
Data Mart Considerations
• The data mart and data warehouse are not mutually
exclusive. A data mart should be a component of a data
• The data mart should not be used as a replacement for
accessing the data warehouse.
• The cost and benefits of implementing each data mart,
which makes the data architecture a great deal more
complex, needs to be understood and compared with
directly accessing the data warehouse. This is analogous
to the higher cost of items found at a small convenience
store, over those at a large grocery store.