
SEMINAR ON
DATA WAREHOUSING

PRESENTED BY
T. SURESH
REGD. NO: Y1MC09053


CONTENTS

1. What is Data Warehousing?
2. Why to build a Data Warehouse?
3. Microsoft Data warehousing framework.
4. Data warehousing framework components.
5. Data warehousing architecture.
6. Data Marts.
7. Building a Data Warehouse from Data Marts.
8. OLAP.
9. Integrated OLAP Analytical capabilities.
10. Data Warehousing and OLAP.
11. Data Warehousing components.
12. Data Warehousing characteristics.
13. Designing and building a Data Warehouse.
14. Data Granularity.
15. Building the OLAP Data model.
16. Process flow within a data warehouse.
17. Data Warehousing and Online Analytical Processing.
What is Data Warehousing?

Online analytical processing (OLAP) and data mining represent
some of the latest trends in computing environments and IT
applications for large-scale processing and analysis of data.
Data warehouses, along with OLAP tools, are increasingly being
used to analyze historical data in order to identify past patterns
or trends that may be useful in forecasting the future.



A data warehouse is an integrated store of information
collected from other systems that becomes the foundation
for decision support and data analysis. Although there are
many types of data warehouses, based on different design
methodologies and philosophical approaches, they all have
these common traits:
     Information is organized around the major subjects
        of the enterprise (for example, customers, products,
        sales, or vendors), reflecting a data-driven design.
     Raw data is gathered from nonintegrated operational
        and legacy applications, cleansed, and then
        summarized and presented in a way that makes
        sense to end users.
     Based on feedback from end users and discoveries in
        the data warehouse, the data warehouse architecture
        will change over time, reflecting the iterative nature
        of the process.
The data warehousing process is inherently complex and, as a
result, is costly and time-consuming.

In simpler terms, a data warehouse is a copy of the data in a
database, created specifically to allow users to query that data.
Creating a data warehouse allows you to structure the data for
easier reporting and better security; for example, by pre-joining
tables to assist with queries by novice users, or by removing
fields that contain sensitive data. It also lets you move the data
to another server to minimize the performance impact that
queries can cause. Because the data in a data warehouse is a
snapshot of a database, it must be refreshed periodically; the
exact interval depends on your application needs.
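The following minimal sketch illustrates this refresh idea in
Python, using the built-in sqlite3 module. It is illustrative only:
the database files, the orders source table, and the
sales_snapshot layout are assumptions made for this example,
not part of any particular product.

# refresh_snapshot.py -- a minimal sketch of a periodic warehouse refresh.
# Illustrative only: database files, tables, and columns are hypothetical.
import sqlite3
from datetime import datetime, timezone

def refresh_snapshot(source_db: str, warehouse_db: str) -> None:
    src = sqlite3.connect(source_db)
    dwh = sqlite3.connect(warehouse_db)

    # Rebuild the reporting copy; sensitive columns are simply not copied.
    dwh.execute("DROP TABLE IF EXISTS sales_snapshot")
    dwh.execute("""CREATE TABLE sales_snapshot (
                       order_id INTEGER, region TEXT,
                       amount REAL, loaded_at TEXT)""")

    loaded_at = datetime.now(timezone.utc).isoformat()
    rows = src.execute(
        "SELECT order_id, region, amount FROM orders").fetchall()
    dwh.executemany(
        "INSERT INTO sales_snapshot VALUES (?, ?, ?, ?)",
        [(oid, region, amount, loaded_at) for oid, region, amount in rows])

    dwh.commit()
    src.close()
    dwh.close()

# Run on whatever interval the application needs, for example nightly:
# refresh_snapshot("operational.db", "warehouse.db")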

Why to Build a Data Warehouse?
Business strategy requires answers to questions about the
business and its future direction. This means that decisions must
be taken quickly and correctly, using all the available data. As
data volumes grow continuously and data must be processed
faster and faster, the need for data warehousing technology
arises: the ability to organize and maintain large amounts of
data, and to analyze it in seconds, in the manner and depth
required. Conventional information systems have not succeeded
in meeting these requirements. Conventional systems and data
warehousing address two different activity domains: OLTP and
OLAP.

Besides this, a conventional information system is not capable
of analyzing a large number of past transactions or a large
number of data records. The cost of processing the data also
increases as the volume of data increases. As a result, analysts
find data marts extremely useful for fast and easy analysis.
Since the data flows into a data mart (from the data warehouse),
the department that owns it can easily customize the data.

Data marts are of two types:
1. Multidimensional OLAP (MOLAP)
2. Relational OLAP (ROLAP)



Data Warehousing Framework
The goal of the Data Warehousing Framework is to simplify
the design, implementation, and management of data
warehousing solutions. This framework has been designed
to provide:
     Open architecture that is easily integrated with and
       extended by third-party vendors.
     Heterogeneous data import, export, validation, and
       cleansing services with optional data lineage.
     Integrated metadata for data warehouse design, data
       extraction/transformation, server management, and
       end-user analysis tools.
     Core management services for scheduling, storage
        management, performance monitoring, alerts/events,
        and notification.
The Data Warehousing Framework has been designed from
the ground up to provide an open architecture that can be
extended easily by Microsoft customers and third-party
businesses using industry-standard technology. This allows
organizations to choose best-of-breed components and still
be assured of integration.
Ease of use is a compelling reason for customers and
independent software vendors (ISVs) to choose the Data
Warehousing Framework. Microsoft provides an object-
oriented set of components designed to manage information
in the distributed environment. Microsoft also provides both
entry-level and best-of-breed products to address the many
steps in the data warehousing process.
Determining Business, User, and Technical Requirements
Before a data warehouse can be built, a detailed project and
implementation plan should be written. The project and
implementation plan includes:
     Building a business case.
     Gathering user requirements.
     Determining the technical requirements.
     Defining standard reports required by users.
     Analyzing client application tools being used.
Building the business case is common at the beginning of any
project. It involves determining the business needs solved by
the project, the costs of the project, and the return on the
investment.
Gathering user requirements largely involves interviewing the
intended users of the data warehouse. The user requirements
determine:
     Data requirements (level of granularity).
     Operational systems within the enterprise containing the
        data.
     Business rules followed by the data.
     Queries required to provide the users with data.
The technical requirements may involve determining:
     Hardware architecture and infrastructure (for example,
        links to remote geographical regions where data marts
        might be located).
     Backup and recovery mechanisms.
     Security guidelines.
     Methods of loading and transforming data from
        operational systems to the data warehouse.
Standard reports required by users should be analyzed to
determine the tables, columns, and selection criteria necessary
to create the reports, and the frequency with which they are
generated. Provisions should also be made for expanding or
modifying the scope of reports as required.
Client application tools should be analyzed to determine if they
can provide enhanced processing capabilities that help in
processing data, performing queries, or generating reports.



Data Warehousing Framework Components
Building the data warehouse requires a set of components
for describing the logical and physical design of the data
sources and their destinations in the enterprise data
warehouse or data mart.
To conform to definitions laid out during the design stage,
operational data must pass through a cleansing and
transformation stage before being placed in the enterprise
data warehouse or data mart. This data staging process can
be many levels deep, especially with enterprise data
warehousing architectures.




End-user tools, including desktop productivity products,
specialized analysis products, and custom programs, are
used to gain access to information in the data warehouse.
Ideally, user access is through a directory facility that
enables end-user searches for appropriate and relevant data
to resolve questions and that provides a layer of security
between the end users and the data warehouse systems.
Finally, a variety of components can come into play for the
management of the data warehousing environment, such as
for scheduling repeated tasks and managing multiserver
networks.
The Data Warehousing Framework describes the
relationships between the various components used in the
process of building, using, and managing a data warehouse.
Two enabling technologies comprise the core Data
Warehousing      Framework:     the   integrated   metadata
repository and the data transport layer (OLE DB). These
technologies make possible the interoperability of many
products and components involved in data warehousing.
OLE DB provides for standardized, high-performance access
to a wide variety of data, and allows for integration of
multiple data types.
Microsoft Repository provides an integrated metadata
repository that is shared by the various components used in
the data warehousing process. Shared metadata allows for
the transparent integration of multiple products from a
variety of vendors, without the need for specialized
interfaces between each of the products.
In addition to relational sources, OLE DB allows access to
nonrelational data such as:
     Mainframe indexed sequential access method/virtual
        storage access method (ISAM/VSAM) and
        hierarchical databases
     E-mail and file system stores
     Text, graphical, and geographical data
     Custom business objects
OLE DB defines a collection of COM interfaces that
encapsulates various database management system
services. These interfaces enable the creation of software
components that implement such services. OLE DB
components consist of data providers (which contain and
expose data), data consumers (which use data), and service
components (which process and transport data).
OLE DB interfaces are designed to integrate components
smoothly so that vendors can bring high-quality OLE DB
components to the market quickly. In addition, OLE DB
includes a bridge to ODBC that enables continued support
for the broad range of ODBC relational database drivers
available today.
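As a small, hedged illustration of reaching a relational source
over the ODBC route, the Python sketch below uses the pyodbc
package; the server name, database name, table, and installed
ODBC driver are assumptions made for the example, not a
definitive configuration.

# Read from a relational source through ODBC -- a sketch, not a recipe.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};"       # assumes this ODBC driver is installed
    "SERVER=dwh-server;"         # hypothetical server name
    "DATABASE=SalesDW;"          # hypothetical warehouse database
    "Trusted_Connection=yes;")

cursor = conn.cursor()
cursor.execute("SELECT region, SUM(amount) FROM sales_fact GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)

conn.close()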



Data Warehousing Architecture

Many methodologies have been proposed to simplify the
information technology efforts required to support the data
warehousing process on an ongoing basis. This has led to
debates about the best architecture for delivering data
warehouses in organizations.
Two basic types of data warehouse architecture exist:
enterprise data warehouses and data marts.
The enterprise data warehouse contains enterprise-wide
information integrated from multiple operational data
sources for consolidated data analysis. Typically, it is
composed of several subject areas, such as customers,
products, and sales, and is used for both tactical and
strategic decision making. The enterprise data warehouse
contains both detailed point-in-time data and summarized
information, and can range in size from 50 gigabytes (GB)
to more than 1 terabyte. Enterprise data warehouses can be
very expensive and time-consuming to build and manage.
They are usually created from the top down by centralized
information services organizations.
The data mart contains a subset of enterprise-wide data that
is built for use by an individual department or division in an
organization. Unlike the enterprise data warehouse, the data
mart is usually built from the bottom up by departmental
resources for a specific decision-support application or group
of users. Data marts contain summarized and often detailed
data about a subject area. The information in the data mart
can be a subset of an enterprise data warehouse (dependent
data mart) or can come directly from the operational data
sources (independent data mart).
Enterprise data warehouses and data marts are constructed
and maintained through the same iterative process
described earlier. Furthermore, both approaches share a
similar set of technological components.

                    The understandable preference for, and
proliferation of, data marts instead of enterprise data
warehouses is causing a future systems integration issue for
some organizations. Creation of a data warehousing
architecture for the enterprise saves each data mart project team
from having to recreate the elements of the architecture. If an
organization's structure is one that may require future data mart
integration, it is less work to plan and architect for this before
the data marts are created than after the fact.
It behooves organizations that will operate multiple data marts
to establish an information architecture or model. The goal of
the data warehousing architecture is to identify the following
common items:
     source systems for data to populate the data marts
     business dimensions
     semantics (data definitions, measurements, and
        aggregation formulas)
If an organization requires, or will require, that multiple data
marts each download similar data from production source
systems, then the most maintainable architecture may be to
move data first from the source systems to a well designed
enterprise data warehouse or operational data store. Then, each
data mart can be populated from the enterprise data warehouse,
inheriting in the process all of the architectural elements that
were implemented in the enterprise data warehouse. This
approach also reduces the number of interface links between
data marts and source systems that must be maintained.
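Under this arrangement, populating a dependent data mart can
be little more than a filtered copy of the enterprise data
warehouse. The Python sketch below (sqlite3 again; the database
files, the sales_snapshot table, and the region value are
hypothetical) pulls one region's rows into a mart.

# Populate a regional data mart from the enterprise data warehouse.
# Minimal sketch; databases, tables, and the region value are hypothetical.
import sqlite3

def load_region_mart(warehouse_db: str, mart_db: str, region: str) -> None:
    dwh = sqlite3.connect(warehouse_db)
    mart = sqlite3.connect(mart_db)

    mart.execute("DROP TABLE IF EXISTS sales")
    mart.execute(
        "CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")

    # The mart inherits cleansed, conformed data from the warehouse copy.
    rows = dwh.execute(
        "SELECT order_id, region, amount FROM sales_snapshot"
        " WHERE region = ?", (region,)).fetchall()
    mart.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

    mart.commit()
    dwh.close()
    mart.close()

# load_region_mart("warehouse.db", "west_mart.db", "West")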

  Data Marts
A data mart is typically defined as a subset of the contents of a
data warehouse, stored within its own database. A data mart
tends to contain data focused at the department level, or on a
specific business area. The data can exist at both the detail and
summary levels. The data mart can be populated with data
taken directly from operational sources, similar to a data
warehouse, or data taken from the data warehouse itself.
Because the volume of data in a data mart is less than that in a
data warehouse, query processing is often faster.
Characteristics of a data mart include:

       Quicker and simpler implementation.
       Lower implementation cost.
       Needs of a specific business unit or function met.
       Protection of sensitive information stored elsewhere in
        the data warehouse.
     Faster response times due to lower volumes of data.
     Distribution of data marts to user organizations.
     Built from the bottom upward.
Departmental or regional divisions often determine whether
data marts or data warehouses are used. For example, if
managers in different sales regions require data from only their
region, then it can be beneficial to build data marts containing
specific regional data. If regional managers require access to all
the organization’s data, then a larger data warehouse is usually
necessary.
Although data marts are often designed to contain data relating
to a specific business function, there can be times when users
need a broader level of business data. However, because this
broader-level data is often only needed in summarized form, it
is acceptable to store it within each data mart rather than
implementing a full data warehouse.
Building a Data Warehouse from Data
Marts
Data warehouses can be built using a top-down or bottom-up
approach. Top-down describes the process of building a data
warehouse for the entire organization, containing data from
multiple, heterogeneous, operational sources. The bottom-up
approach describes the process of building data marts for
departments, or specific business areas, and then joining them
to provide the data for the entire organization. Building a data
warehouse from the bottom-up, by implementing data marts, is
often simpler because it is less ambitious.
A common approach to using data marts and data warehouses
involves storing all detail data within the data warehouse, and
summarized versions within data marts. Each data mart
contains summarized data per functional split within the
business, such as sales region or product group, further
reducing the data volume per data mart.

Data Mart Considerations
Data marts can be useful additions or alternatives to the data
warehouse, but issues to consider before implementation
include:
     Additional hardware and software.
     Time required to populate each data mart regularly.
     Consistency with other data marts and the data
       warehouse.
     Network access (if each data mart is located in a
       different geographical region).


Designing and Processing Aggregations
OLAP tools are typically used to create and manage summary
data. OLAP Services allows aggregations to be stored in a
variety of formats and locations, with dynamic connections to
underlying details in the data warehouse. Summary data is
often generated to satisfy the commonly executed queries in the
data warehouse. Storing preaggregated data increases query
performance, and reduces the load on the data warehouse.
If a data warehouse is built so the data in it does not change,
then preaggregating data in the fact table saves only the disk
space required by the fact table. OLAP Services uses the
processing time that would have been used to preaggregate in
the fact table when it processes the fact table as it builds a cube.
However, precalculated aggregations are stored in the cube and
do not need to be recalculated for each query. If a hybrid OLAP
(HOLAP) or relational OLAP (ROLAP) cube is used, the fact
table is not copied into the cube as it is in multidimensional
OLAP (MOLAP) cubes, so the overhead required to retain
availability of the detail data is only the fact table size, not
processing time or query response time.
Preaggregation strategy when designing a data warehouse for
use by OLAP Services depends on the following variables:
     Stability of the data.
       If the source data changes, the preaggregations have to
       be performed each time, whether preaggregated in the
       fact table or in the OLAP cubes that have to be rebuilt
       from the fact table.
     Query response time.
       With properly designed OLAP cubes, the granularity of
       detail in the fact table has no effect on query response
       time for queries that do not access detail facts.
     Storage requirements.
       A finer level of granularity in the fact table requires
       more storage for the fact table and for MOLAP cubes.
       This is a trade-off against detail availability and choice
       of OLAP cube storage mode. OLAP cubes tend to be
       large regardless of the storage type; therefore the
       storage required to retain fine granularity in the fact
       table may not be particularly significant when compared
       to OLAP storage needs.
When designing the data warehouse for OLAP, the user's needs
should drive the preaggregation strategy. The fact table should
only be preaggregated to the level of granularity below which
no user would want to access detail.

OLAP
Online analytical processing (OLAP) systems, in contrast to
conventional OLTP systems, are capable of analyzing online a
large number of transactions or data records (ranging from
megabytes to terabytes). Such data is usually multidimensional,
and multidimensionality is the key driver for OLAP technology,
which is central to data warehousing.
Multidimensional data cannot be processed easily by a
conventional SQL-based DBMS. For complex real-world
problems the data is usually multidimensional in nature, and
SQL is not capable of handling it effectively even if the data
can be forced into normalized tables in a conventional
relational database.
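To make the idea of multidimensional data concrete, the short
Python sketch below (using pandas, with invented sample
figures) cross-tabulates sales by two dimensions, region and
quarter; this is the kind of view an OLAP tool serves directly.

# A two-dimensional (region x quarter) view of sales -- invented data.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "quarter": ["Q1",   "Q2",   "Q1",   "Q1",   "Q2"],
    "amount":  [1200,   900,    700,    300,    1500],
})

cube_view = sales.pivot_table(index="region", columns="quarter",
                              values="amount", aggfunc="sum", fill_value=0)
print(cube_view)
# quarter    Q1    Q2
# region
# East     1200   900
# West     1000  1500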

Integrated OLAP Analytical Capabilities
OLAP is an increasingly popular technology that can
dramatically improve business analysis. Historically, OLAP
has been characterized by expensive tools, difficult
implementation, and inflexible deployment. OLAP Services is
a new, fully featured OLAP capability provided as a
component of SQL Server 7.0. OLAP Services includes a
middle-tier server that allows users to perform sophisticated
analysis on large volumes of data with exceptional results.
OLAP Services also includes a client-side cache and
calculation engine called Microsoft PivotTable Service, which
helps improve performance and reduce network traffic.
PivotTable Service allows end users to conduct analyses
while disconnected from the network.
OLAP Services is a middle-tier OLAP server that simplifies
user navigation and helps improve performance for queries
against information in the data warehouse.

OLAP is a key component of data warehousing, and OLAP
Services provides essential functionality for a wide array of
applications ranging from reporting to advanced decision
support. OLAP functionality within SQL Server 7.0 helps
make multidimensional analysis much more affordable and
bring the benefits of OLAP to a wider audience, from smaller
organizations to groups and individuals within larger
corporations. Coupled with the wide variety of tools and
applications supporting OLAP applications through Microsoft
OLE DB for OLAP, OLAP Services helps increase the number
of organizations that have access to sophisticated analytical
tools and can help reduce the costs of data warehousing.
For more information about Microsoft SQL Server OLAP
Services, see SQL Server Books Online.




Data Warehousing and OLAP
Data Transformation Services (DTS) can function
independently of SQL Server and can be used as a stand-alone
tool to transfer data from Oracle to any other ODBC or OLE
DB-compliant database.
Accordingly, DTS can extract data from operational
databases for inclusion in a data warehouse or data mart for
query and analysis.

Consider, for example, a scenario in which the transaction data resides on an IBM
DB2 transaction server. A package is created using DTS to
transfer and clean the data from the DB2 transaction server
and to move it into the data warehouse or data mart. In this
example, the relational database server is SQL Server 7.0,
and the data warehouse is using OLAP Services to provide
analytical capabilities. Client programs (such as Excel)
access the OLAP Services server using the OLE DB for OLAP
interface, which is exposed through a client-side component
called Microsoft PivotTable service. Client programs using
PivotTable service can manipulate data in the OLAP server
and can even change individual cells.
SQL Server OLAP Services is a flexible, scalable OLAP
solution, providing high-performance access to information
in the data warehouse. OLAP Services supports all
implementations of OLAP equally well: multidimensional
OLAP (MOLAP), relational OLAP (ROLAP), and a hybrid
(HOLAP). OLAP Services addresses the most significant
challenges in scalability through partial preaggregation,
smart client/server caching, virtual cubes, and partitioning.
DTS and OLAP Services offer an attractive and cost-effective
solution. Data warehousing and OLAP solutions using DTS
and OLAP Services are developed with point-and-click
graphical tools that are tightly integrated and easy to use.
Furthermore, because the PivotTable service client is using
OLE DB, the interface is more open to access by a variety of
client applications.


Data Warehousing Components
A data warehouse always consists of a number of
components, including:
    Operational data sources.
    Design/development tools.
    Data extraction and transformation tools.
    Database management system (DBMS).
    Data access and analysis tools.
    System management tools.
Several years ago, Microsoft recognized the need for a set of
technologies that would integrate these components. This
led to the creation of the Microsoft Data Warehousing
Framework, a roadmap not only for the development of
Microsoft products such as SQL Server 7.0, but also for the
technologies necessary to integrate products from other
vendors.


Data Warehouse Characteristics
A data warehouse can assist decision support and online
analytical processing (OLAP) applications because it provides
data that is:
     Consolidated and consistent.
     Subject-oriented.
     Historical.
     Read-only.
Consolidated and Consistent Data
A data warehouse consolidates operational data from a variety
of sources with consistent naming conventions, measurements,
physical attributes, and semantics.
For example, in many organizations, applications can often use
similar data in different formats: dates can be stored in Julian or
Gregorian format; true/false data can be represented as
one/zero, on/off, true/false, or positive/negative. Different
applications can also use different terms to describe the same
type of data. One application can use the term “balance”
instead of “total amount” to represent the amount of money in a
bank account.
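A hedged sketch of this kind of consolidation is shown below in
Python; the accepted target formats, the true/false vocabularies,
and the source date formats are assumptions chosen purely for
illustration.

# Normalize inconsistent source representations before loading.
# Illustrative only: target formats and value vocabularies are assumptions.
from datetime import datetime

TRUE_VALUES = {"1", "on", "true", "y", "yes", "positive"}
FALSE_VALUES = {"0", "off", "false", "n", "no", "negative"}

def to_bool(value: str) -> bool:
    v = value.strip().lower()
    if v in TRUE_VALUES:
        return True
    if v in FALSE_VALUES:
        return False
    raise ValueError(f"Unrecognized boolean value: {value!r}")

def to_iso_date(value: str, source_format: str) -> str:
    # e.g. "%d/%m/%Y" for one application, or "%Y%j" (year plus day of
    # year, a Julian-style date) for another.
    return datetime.strptime(value, source_format).date().isoformat()

print(to_bool("On"), to_bool("negative"))      # True False
print(to_iso_date("2024187", "%Y%j"))          # 2024-07-05
print(to_iso_date("05/07/2024", "%d/%m/%Y"))   # 2024-07-05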
Data should be stored in the data warehouse in a single,
acceptable format agreed to by business analysts, despite
variations in the external operational sources. This allows data
from across the organization, such as legacy data on
mainframes, data in spreadsheets, or even data from the
Internet, to be consolidated in the data warehouse, and
effectively cross-referenced, giving the analysts a better
understanding of the business.
Subject-oriented Data
Operational data sources across an organization tend to hold a
large amount of data about a variety of business-related
functions, such as customer records, product information, and
so on. However, most of this information is also interspersed
with data that has no relevance to business or executive
reporting, and is organized in a way that makes querying the
data awkward. The data warehouse organizes only the key
business information from operational sources so that it is
available for business analysis.
Historical Data
Data in OLTP systems correctly represents the current value at
any moment in time. For example, an order-entry application
always shows the current value of stock inventory; it does not
show the inventory at some time in the past. Querying the stock
inventory a moment later may return a different response.
However, data stored in a data warehouse is accurate as of
some past point in time because the data stored represents
historical information.
The data stored in a data warehouse typically represents data
over a long period of time; perhaps up to ten years or more.
OLTP systems often contain only current data, because
maintaining large volumes of data used to represent ten years of
information in an OLTP system can affect performance. In
effect, the data warehouse stores snapshots of a business’s
operational data generated over a long period of time. It is
accurate for a specific moment in time and cannot change. This
contrasts with an OLTP system where data is always accurate
and can be updated when necessary.
Read-only Data
After data has been moved to the data warehouse successfully,
it typically does not change unless the data was incorrect in the
first place. Because the data stored in a data warehouse
represents a point in time, it must never be updated. Deletes,
inserts, and updates (other than those involved in the data
loading process) are not applicable in a
data warehouse. The only operations that occur in a data
warehouse, when it has been set up, are loading and querying
data.


Designing and Building a Data Warehouse and OLAP System
The steps required to build a data warehouse include:
    Determining business, user, and technical requirements.
    Designing and building the database.
    Extracting and loading data into the data warehouse.
    Designing and processing aggregations using OLAP
       tools.
    Querying and maintaining the data warehouse and
       OLAP databases.

Data Granularity
A significant difference between an OLTP or operational
system and a data warehouse is the granularity of the data
stored. An operational system typically stores data at the lowest
level of granularity: the maximum level of detail. However,
because the data warehouse contains data representing a long
period in time, simply storing all detail data from an
operational system can result in an overworked system that
takes too long to query.
A data warehouse typically stores data in different levels of
granularity or summarization, depending on the data
requirements of the business. If an enterprise needs data to
assist strategic planning, then only highly summarized data is
required. The lower the level of granularity of data required by
the enterprise, the higher the number of resources (specifically
data storage) required to build the data warehouse. The
different levels of data, in increasing order of summarization,
are:
     Current operational data
     Historical operational data
     Aggregated data
     Metadata
Current and historical operational data are taken, unmodified,
directly from operational systems. Historical data is operational
level data no longer queried on a regular basis, and is often
archived onto secondary storage.
Aggregated, or summary, data is a filtered version of the
current operational data. The design of the data warehouse
affects how the current data is aggregated. Considerations for
generating summary data include the period of time used to
aggregate the data (for example, weekly, monthly, and so on),
and the parts of the operational data to be summarized. For
example, an organization can choose to aggregate at the part
level the quantity of parts sold per sales representative per
week.
There may be several levels of summary data. It may be
necessary to create summary level data based on an aggregated
version of existing summary data. This can give an
organization an even higher level view of the business. For
example, an organization can choose to aggregate summary
level data further by generating the quantity of parts sold per
month.
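A minimal Python sketch of these two summary levels (using
pandas; the sample order data and column names are invented)
might look like this:

# Roll detail data up to weekly, then monthly, summary levels.
# Invented sample data; column names are hypothetical.
import pandas as pd

detail = pd.DataFrame({
    "sales_rep": ["Ann", "Ann", "Bob", "Bob", "Ann"],
    "part":      ["P-10", "P-10", "P-10", "P-20", "P-20"],
    "sold_on":   pd.to_datetime(["2024-01-02", "2024-01-04",
                                 "2024-01-09", "2024-01-10", "2024-02-01"]),
    "quantity":  [5, 3, 7, 2, 4],
})

# First summary level: quantity of parts sold per sales rep per week.
weekly = (detail.assign(week=detail["sold_on"].dt.to_period("W"))
          .groupby(["sales_rep", "part", "week"])["quantity"]
          .sum().reset_index())

# Second summary level: aggregate the weekly summary further, per month
# (each week is assigned here to the month in which it starts).
monthly = (weekly.assign(month=weekly["week"].dt.start_time.dt.to_period("M"))
           .groupby(["sales_rep", "part", "month"])["quantity"]
           .sum().reset_index())

print(weekly)
print(monthly)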
Metadata does not contain any operational data, but is used to
document the way the data warehouse is constructed. Metadata
can describe the structure of the data warehouse, source of the
data, rules used to summarize the data at each level, and any
transformations of the data from the operational systems.




Building the OLAP Data Model
A fundamental challenge in OLAP implementation is mapping
the initial database schema to the multidimensional model.
This requires a significant programming effort with many of
the products on the market today. In the evolution of OLAP
products, OLAP database design has become a specialized
and arcane process, intricately linked to the specific OLAP
technology being deployed. Consequently, OLAP database
developers are specialized, which has led to high costs in
developing applications, concentrated at the data design
stage.
In most OLAP implementations, it is assumed that the data
has been prepared for analysis through data warehousing,
whereby information has been extracted from operational
systems, cleansed, validated, and summarized prior to
incorporation into an OLAP application. This is a vital step in
the process, which ensures that the data being viewed by
the OLAP user is correct, consistent, and matches
organizational definitions for the data.
Increasingly, information in a data warehouse is organized
in star (or snowflake) schemas, which simplify user
understanding of the data, maximize performance for
decision support applications, and require less storage for
large databases.
A typical star schema has a central fact table linked to related
dimension tables, as in the following example.
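This minimal sketch (Python with sqlite3; table names, columns,
and sample data are hypothetical) builds such a schema and runs
a typical decision-support query that joins the fact table to its
dimension tables.

# A tiny star schema: one fact table keyed to two dimension tables.
# Table names, columns, and data are illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales   (product_id INTEGER, customer_id INTEGER,
                           quantity INTEGER, amount REAL);

INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_customer VALUES (10, 'East'), (11, 'West');
INSERT INTO fact_sales   VALUES (1, 10, 5, 50.0), (2, 10, 2, 40.0),
                                (1, 11, 3, 30.0);
""")

# A typical decision-support query: measures from the fact table,
# grouped by attributes drawn from the dimension tables.
query = """
SELECT p.product_name, c.region, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product  p ON p.product_id  = f.product_id
JOIN dim_customer c ON c.customer_id = f.customer_id
GROUP BY p.product_name, c.region
"""
for row in db.execute(query):
    print(row)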

A star (snowflake) schema is a relational approximation of
the OLAP data model and can be an excellent starting point
for building OLAP cube definitions. Few OLAP products,
however, have taken advantage of this trend. Generally,
they have not provided easy tools to map a star schema to
an OLAP model, and as a result keep the cost of building
OLAP models extremely high and the development time
unnecessarily long.
Intuitive User Interfaces
One of the key differences in Microsoft SQL Server OLAP
Services version 7.0 is the OLAP Manager user interface,
which has been created with the infrequent OLAP database
administrator in mind. The OLAP Manager is delivered as a
snap-in to the Microsoft Management Console (MMC), and it
shares the same administrative user interface as the entire
Microsoft BackOffice family of products. The obvious benefit
is that the OLAP database administrator is better able to
translate skills from SQL Server and other Microsoft
products.
More value becomes apparent when the power and flexibility
of MMC are understood. OLAP Services includes a full range
of taskpads that guide the novice or infrequent user through
common tasks. OLAP Services also includes a full tutorial on
OLAP concepts and a step-by-step guide to building an OLAP
cube. A full complement of wizards is available for
automating the most common activities, such as creating a
dimension definition.
Furthermore, OLAP Services is optimized for developing data
warehouses in which star or snowflake schemas have been
designed. The Cube Wizard is especially suited to these
prebuilt schemas, and translation to the multidimensional
model is extremely rapid. OLAP Services can easily
accommodate other source schemas should they be
encountered.
To ensure successful interpretation of the OLAP Services
user interface concepts, Microsoft conducted usability tests.
Finally, large-scale beta testing has provided broad
exposure and customer input to OLAP Services. As a result
of the energy spent on the database administrator
requirements, most users are able to build their first cube in
less than 30 minutes.
Process Flow Within a Data Warehouse

The processes within a data warehouse are (a minimal
end-to-end sketch follows this list):

1. Extract and load data.
2. Clean and transform data into a form that can cope with large
data volumes.
3. Back up and archive data.
4. Manage queries and direct them to the appropriate data
sources.
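The Python sketch below walks through the extract,
clean/transform, load, and query steps in miniature; the source
file, cleansing rules, and table layout are assumptions made for
illustration, and the backup/archive step is omitted.

# End-to-end flow in miniature: extract, clean/transform, load, then query.
# A sketch only: the source file, rules, and table layout are hypothetical.
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def clean_and_transform(rows: list[dict]) -> list[tuple]:
    out = []
    for r in rows:
        if not r.get("amount"):           # drop incomplete records
            continue
        out.append((r["region"].strip().title(), float(r["amount"])))
    return out

def load(db: sqlite3.Connection, rows: list[tuple]) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    db.commit()

def run_query(db: sqlite3.Connection):
    return db.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()

# db = sqlite3.connect("warehouse.db")
# load(db, clean_and_transform(extract("daily_sales.csv")))
# print(run_query(db))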




Data Warehousing and Online Analytical Processing




A data warehouse is often used as the basis for a decision-
support system (also referred to from an analytical perspective
as a business intelligence system). It is designed to overcome
some of the problems encountered when an organization
attempts to perform strategic analysis using the same database
that is used to perform online transaction processing (OLTP).

A typical OLTP system is characterized by having large
numbers of concurrent users actively adding and modifying
data. The database represents the state of a particular business
function at a specific point in time, such as an airline
reservation system. However, the large volume of data
maintained in many OLTP systems can overwhelm an
organization. As databases grow larger with more complex
data, response time can deteriorate quickly due to competition
for available resources. A typical OLTP system has many users
adding new data to the database while fewer users generate
reports from the database. As the volume of data increases,
reports take longer to generate.

As organizations collect increasing volumes of data by using
OLTP database systems, the need to analyze data becomes
more acute. Typically, OLTP systems are designed specifically
to manage transaction processing and minimize disk storage
requirements by a series of related, normalized tables.
However, when users need to analyze their data, a myriad of
problems often prohibits the data from being used:

   Users may not understand the complex relationships among
    the tables, and therefore cannot generate ad hoc queries.
   Application databases may be segmented across multiple
    servers, making it difficult for users to find the tables in the
    first place.
   Security restrictions may prevent users from accessing the
    detail data they need.
   Database administrators prohibit ad hoc querying of OLTP
    systems, to prevent analytical users from running queries
    that could slow down the performance of mission-critical
    production databases.
By copying an OLTP system to a reporting server on a
regularly scheduled basis, an organization can improve
response time for reports and queries. Yet a schema optimized
for OLTP is often not flexible enough for decision support
applications, largely due to the volume of data involved and the
complexity of normalized relational tables.

For example, each regional sales manager in a company may
wish to produce a monthly summary of the sales per region.
Because the reporting server contains data at the same level of
detail as the OLTP system, the entire month’s data is
summarized each time the report is generated. The result is
longer-running queries that lower user satisfaction.

Additionally, many organizations store data in multiple
heterogeneous database systems. Reporting is more difficult
because data is not only stored in different places, but in
different formats.

Data warehousing and online analytical processing (OLAP)
provide solutions to these problems. Data warehousing is an
approach to storing data in which heterogeneous data sources
(typically from multiple OLTP databases) are migrated to a
separate homogenous data store. Data warehouses provide
these benefits to analytical users:

   Data is organized to facilitate analytical queries rather than
    transaction processing.
   Differences among data structures across multiple
    heterogeneous databases can be resolved.
   Data transformation rules can be applied to validate and
    consolidate data when data is moved from the OLTP
    database into the data warehouse.
   Security and performance issues can be resolved without
    requiring changes in the production systems.

              Sometimes organizations maintain smaller, more
topic-oriented data stores called data marts. In contrast to a data
warehouse which typically encapsulates all of an enterprise’s
analytical data, a data mart is typically a subset of the enterprise
data targeted at a smaller set of users or business functions.

Whereas a data warehouse or data mart are the data stores for
analytical data, OLAP is the technology that enables client
applications to efficiently access the data. OLAP provides these
benefits to analytical users:

   Pre-aggregation of frequently queried data, enabling a very
    fast response time to ad hoc queries.
   An intuitive multidimensional data model that makes it easy
    to select, navigate, and explore the data.
   A powerful tool for creating new views of data based upon
    a rich array of ad hoc calculation functions.
   Technology to manage security, client/server query
    management and data caching, and facilities to optimize
    system performance based upon user needs.

The terms data warehousing and OLAP are sometimes used
interchangeably. However, it is important to understand their
differences because each represents a unique set of
technologies, administrative issues, and user implications.
CONCLUSION:

Data warehousing with OLAP technology is useful for handling
large amounts of data. It is also useful for analyzing past data
for forecasting, for graphical analysis, and for handling
multidimensional data. It enables faster analysis than
conventional databases with OLTP.