Data Warehouse

Document Sample
Data Warehouse Powered By Docstoc
					Data Warehousing
  Introduction
 Basic Elements
 Basic Processes
Evolution of Decision Support Systems
                    1960’s

u   Magnetic Tape
u   Master Files
u   Reports
Evolution of Decision Support Systems
                               1970’s
u   DASD
    –   Direct Access Storage Devices
u   DBMS
    –   Data Base Management Systems
u   OLTP – On Line Transaction Processing
    –   A type of computer processing in which the computer responds
        immediately to user requests
u   OLTP on Databases
u   Database defined as a single source of data for all
    processing
Evolution of Decision Support Systems
                                          1980’s
u   PC/4GL Technology
    –   end-users able to directly control data and systems (notion that data can
        be used for more than OLTP)

u   Management Information System
    –   MIS (later known as Decision Support Systems-DSS) defined as processing
        used to drive management decisions (as opposed to processing exclusively
        used to drive detailed operational decisions)

u   single-database-serving-all-purpose paradigm
    –   One DBMS supporting both regular tr. processing and MIS/DDS
u   Extract Processing
    –   Extract Program
         •   rummages through a file or database, uses some criteria for selection, and, upon
             finding qualified data, transports the data over onto another file or database
         •   used to move data used for DDS out of the way of high performance on line
             transaction processing and to give the DSS user the control over that data
Evolution of Decision Support Systems
                                 Early 1990’s
u   Proliferation of Extracts
    –   Extracts everywhere (and extracts of extracts, and extracts of
        extracts of extracts, and so forth)
u   Productivity Problems
    –   In order to write a corporate report
         •   Data must be located
         •   Lots of customized programs must be written
         •   The programs must cross every technology that the company has
u   Inability to go from Data to Information
    –   Due to the following:
         •   Data Applications were built to service the needs of current transaction
             processing
         •   They were never designed to hold the historical data needed for DSS
             analysis
Evolution of Decision Support Systems
                              Early 1990’s
u   Lack of Credibility of Data
E.g. Account activity report done by 2 analysts
    – No time basis for data (e.g. one analyst extracts data on Sunday evening, another
      extracts data on Wednesday afternoon)
    – Algorithmic differential (e.g. one analyst extracts data on all large accounts,
      another extracts data on all backed-up accounts)
    – The levels of extraction (each additional level of extraction increases the
      probability of discrepancy)
    – External Data (e.g. one analyst is bringing in Wall Street Journal data and
      another is bringing in Business Week data, and they both strip the data identity)
    – No common source of data to begin with (analysts are extracting data from
      different databases within the company)

E.g. Final Result
                          Analyst A                  Analyst B
                          Activity is up 10%         Activity is down 15%
Evolution of Decision Support Systems
                    Late 1990’s - Current
u   Approach Change
    –   Realization that Naturally Evolving Architecture (direct result of
        extract processing method) is not sufficient
u   Architected Environment
    –   Recognition that there are fundamentally two kinds of data
         •   OPERATIONAL DATA (PRIMITIVE DATA)
         •   ANALYTICAL DATA (DERIVED DATA, DSS DATA)
    –   Data Warehouses emerged as the New DSS Architecture
     Need for Data Warehousing

u   Integrated, company-wide view of high-
    quality information.
u   Separation of operational and analytical
    systems and data.
OPERATIONAL vs. ANALYTICAL DATA

Operational Data                        Analytical Data
Application Oriented                    Subject Oriented
Detailed                                Summarized (and/or Detailed)
Current                                 Values over time (snapshots)
Can be updated                          Read (and Append) Only
Control of Update: Major Issue          Control of Update: No Issue
For Clerical Community                  For Managerial Community
Supports Day-to-Day Operations          Supports Managerial Needs
Small Amounts used in a Process         Large Amounts used in a Process
Non-redundant                           Redundancy not an issue
High frequency of Access                Low to modest frequency of Access
Typical time-horizon: days and months   Typical time-horizon: years
OPERATIONAL vs. ANALYTICAL DATA
OPERATIONAL vs. ANALYTICAL DATA
               Hardware Utilization
               (Frequency of Access)


     Operational              Data Warehouse
       Data Warehouse: Definition

u   Data Warehouse: An enterprise-wide structured
    repository of subject-oriented, time-variant,
    historical data used for information retrieval and
    decision support. The data warehouse stores
    atomic and summary data.
    (Oracle Data Warehouse Method)
         Data Warehouse: Definition

u   Data Warehouse: An enterprise-wide structured
    repository of subject-oriented, time-variant, historical data used
    for information retrieval and decision support. The data
    warehouse stores atomic and summary data.
    (Oracle Data Warehouse Method)


u   enterprise-wide refers to the fact that a DW
    provides a company-wide view of the information it
    contains
        Data Warehouse: Definition

u   Data Warehouse: An enterprise-wide structured
    repository of subject-oriented, time-variant, historical data
    used for information retrieval and decision support. The data
    warehouse stores atomic and summary data.
    (Oracle Data Warehouse Method)


u   structured repository refers to the fact that a DW
    is a structured data repository like any other data base
          Data Warehouse: Definition
u   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary data.
    (Oracle Data Warehouse Method)
u   subject-oriented refers to the fundamental difference
    in the purpose of a traditional Database System and a
    DW.
    –   Traditional Database System is developed in order to
        support a specific business process (e.g. shipping company order
        -entry database, dental office appointment management database).
    –   DW is developed to analyze a specific subject area (e.g.
        sales, profit).
         Data Warehouse: Definition
u   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary data.
    (Oracle Data Warehouse Method)


u   time-variant refers to the fact that a DW contains
    slices of data across different periods of time. With
    these data slices, the user can view reports from now
    and in the past.
         Data Warehouse: Definition
u   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary data.
    (Oracle Data Warehouse Method)


u   historical refers to the fact that a DW typically
    contains several years worth of data (as opposed to 60-
    90 days typical time horizon for data in traditional
    operational databases). In fact, DW often does not
    contain the latest transactional data.
        Data Warehouse: Definition
u   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support.
    The data warehouse stores atomic and summary data.
    (Oracle Data Warehouse Method)


u   information retrieval and decision support
    refers to the fact that a DW is a facility for getting
    information to answer questions. It is not meant for
    direct data entry; batch updates are the norm for
    refreshing data warehouses.
         Data Warehouse: Definition
u   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary           data.
    (Oracle Data Warehouse Method)


u   atomic and summary data refers to the fact that a
    DW, depending on purpose, may contain atomic data,
    summary data, or both.
                    DW is Subject Oriented!

u   Data is organized around major subject areas of an enterprise, and is
    therefore useful for an enterprise-wide understanding of those subjects
u   E.g. a banking operational system keeps independent records of
    customer savings, loans, and other transactions. A warehouse pulls
    this independent data together to provide customer financial
    information (fees, profits, losses …)
u   Examples of subject areas
    –   Customer Financial Information
    –   Toll calls made in the telecommunications industry company
    –   Airline passenger booking information
u   Data from operational systems must be transformed so that is
    consistent and meaningful in the DW
                         Subject Orientation
Insurance Co. Operational Databases            Data Warehouse

              auto


               life

                                                            claim
               health
                                                  Subject

              casualty



  Applications
                         Subject Orientation
Insurance Co. Operational Databases            Data Warehouse

              auto


                                                        premium
               life



               health
                                                           claim


              casualty



  Applications                                  Subjects
                      DW is Integrated!
u   In many organizations, data resides in diverse independent systems,
    making it difficult to acquire meaningful information for analysis.
u   In DW data is completely integrated, even when the underlying
    sources store data differently
u   There is no magic stick: the transformation and integration process
    (which involves ETT – extraction, transformation, and transportation-
    load) can be time consuming and costly, and it requires commitment
    from every part of the organization, particularly top-level managers
              Data Warehouse Purpose
u   Data warehouse is designed and implemented to answer
    these TWO fundamental questions:
    –   Who is buying what?
    –   When and where are they doing so?
u   More specific
    –   Who [which customer] is buying [buying / using / delivering /
        shipping / ordering / returning] what [products / services] from
        where [outlet / store / clinic / branch] on what occasion [when],
        how [credit card / cash / check / exchange / debit] and why
        [causation]?
                   Some Uses of a Data
                      Warehouse
u   Airlines for aircraft deployment, analysis of route profitability,
    frequent flyer promotions, and maintenance
u   Banking for promotion of products and services, and customer service
u   Health care for cost reduction
u   Investment and insurance companies for customer analysis, risk
    assessment, and portfolio management
u   Retail stores for buying pattern analysis, promotions, customer
    profiling, and pricing
u   Telecommunications for product and service promotions.
Source: Oracle Corp. 1999
            Basic Elements of the Data
u   Source System
                   Warehouse
    –   An operational system whose function is to capture the transactions of the
        business
u   Data Staging Area
    –   A storage area and a set of processes used to clean, transform, combine, de-
        duplicate, household, archive, and prepare source data for use in the data
        warehouse
    –   No query and presentation services
u   Presentation Server
    –   Physical machine on which the data warehouse data is organized and stored
u   Dimensional Model
    –   A specific discipline for modeling data that is an alternative to entity-
        relationship model
    –   Used for modeling data warehouses
u   Business Process
    –   A coherent set of business activities which are supported by operational and data
        warehousing systems
           Basic Elements of the Data
                  Warehouse

                      DW                 Data Mart
Scope                 Enterprise          Department
Subjects              Multiple            Single
Data Sources          Many                Few
Typical Size          100 GB to > 1 TB    < 100 GB
Implementation Time   Months, Years       Months
               Basic Elements of the Data
                      Warehouse
u   Data Mart
    –   Dependent Data Mart Vs. Independent Data Mart
    –   Webopedia: “Data Mart: A database designed to help managers make strategic
        decisions about their business. Whereas a data warehouse combines databases
        across an entire enterprise, data marts are usually smaller and focus on a
        particular subject or department. Some data marts, called dependent data
        marts, are subsets of larger data warehouses”
    –   Kimball: “Data Mart:          A logical subset of the complete data warehouse”
                 “Data Warehouse: Union of its constituent data marts”
u   Operational Data Store (ODS)
    –   ODS stores subject-oriented and integrated data from transaction systems in
        order to address operational needs (and possibly current-data analytical needs)
    –   ODS objectives:
         •   to integrate information from day-to-day systems and allow operational lookup
         •   to relieve day-to-day systems of reporting and current-data analysis demands
  Source Systems


  ORACLE
    File
                                                       User A
     DB2
     File

     FLAT                                               User B
      File

   EXCEL
    File




Possible Scenario Within
an Enterprise (Blue – Operational, Red – Analytical)
  Source Systems


  ORACLE
    File

     DB2
     File

     FLAT
      File

   EXCEL
    File




Possible Scenario Within
an Enterprise (Blue – Operational, Red – Analytical)
                                INDEPENDENT
                                                           DEPENDENT
                                 DATA MART
                                                           DATA MART
  ORACLE
    File

     DB2
     File
                                    ETT                  DATA
     FLAT                                              WAREHOUSE
      File

   EXCEL
    File                      OPERATIONAL
                              DATA STORE                  DEPENDENT
                                                          DATA MART


Possible Scenario Within
an Enterprise (Blue – Operational, Red – Analytical)
                            INDEPENDENT
                                              DEPENDENT
                             DATA MART
                                              DATA MART
  ORACLE
    File

     DB2
     File
                              ETT           DATA
    FLAT                                  WAREHOUSE
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
                            INDEPENDENT
                                              DEPENDENT
                             DATA MART
                                              DATA MART
  ORACLE
    File
                                                          User A
     DB2
     File
                              ETT           DATA
    FLAT                                  WAREHOUSE       User B
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
                            INDEPENDENT
                                              DEPENDENT
                             DATA MART
                                              DATA MART
  ORACLE
    File
                                                          User A
     DB2
     File
                              ETT           DATA
    FLAT                                  WAREHOUSE
                                                           User B
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
                            INDEPENDENT
                                              DEPENDENT
                             DATA MART
                                              DATA MART
  ORACLE
    File
                                                          User A
     DB2
     File
                              ETT           DATA
    FLAT                                  WAREHOUSE
                                                           User B
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
               INDEPENDENT
                                 DEPENDENT
                DATA MART
                                 DATA MART
  ORACLE
    File
                                             User A
    DB2
    File
                 ETT           DATA
    FLAT                     WAREHOUSE
                                              User B
     File

   EXCEL
    File      OPERATIONAL
              DATA STORE        DEPENDENT
                                DATA MART


ODS Purpose
               INDEPENDENT
                                 DEPENDENT
                DATA MART
                                 DATA MART
  ORACLE
    File
                                             User A
    DB2
    File
                 ETT           DATA
    FLAT                     WAREHOUSE
                                              User B
     File

   EXCEL
    File      OPERATIONAL
              DATA STORE        DEPENDENT
                                DATA MART


ODS Purpose
              Basic Elements of the Data
                     Warehouse
u   OLAP (On-Line Analytic Processing)
    –   OLAP: The general activity of querying and presenting text and number data
        from data warehouses for analytical purposes
    –   OLTP: The general activity of updating, querying and presenting text and
        number data from databases for operational purposes
u   End User Applications and Data Access Tools
    –   OLAP applications and tools
u   Metadata
    –   All of the information in the data warehouse environment that is not the actual
        data itself
              Basic Processes of the Data
                      Warehouse
u   Extracting
    –   Reading and understanding the source data, and copying the parts that are
        needed to the data staging area
u   Transforming
    –   Cleaning data (correcting, resolving conflicts, dealing with missing data, etc.)
    –   Purging data (eliminating extracted data not useful for data warehousing)
    –   Combining data sources (matching key values, fuzzy matches on non-key
        values, etc.)
    –   Creating surrogate keys (in order to avoid dependence on legacy keys)
    –   Building aggregates
u   Loading and Indexing
    –   Bulk loading
u   Quality Assurance Checking
    –   Making sure all the newly loaded date can be properly used
              Basic Processes of the Data
                      Warehouse
u   Release/Publishing
    –   Notifying users that the new data is ready
u   Querying
    –   Using the data warehouse (using OLAP tools, data mining, etc.)
u   Data Feedback/Feeding in Reverse
    –   Uploading the clean data from the data warehouse back to a source system
    –   Uploading the clean data from the data warehouse into a data mart
u   Auditing
    –   Establishing where the data came from and what were the calculations
        performed
u   Securing
    –   Access control for ensuring the security of the data warehouse
u   Backing Up and Recovering
    –   System for back up and recovery of data warehouse data and metadata for
        archival purposes and disaster recovery

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:0
posted:12/10/2013
language:Unknown
pages:41
Abbydoc Abbydoc
About Abbydoc@163.com