Documents
Resources
Learning Center
Upload
Plans & pricing Sign in
Sign Out
Get this document free

Data Warehouse - Download Now PowerPoint

VIEWS: 179 PAGES: 41

									Data Warehousing
  Introduction
 Basic Elements
 Basic Processes
Evolution of Decision Support Systems
                    1960’s

   Magnetic Tape
   Master Files
   Reports
Evolution of Decision Support Systems
                               1970’s
   DASD
    –   Direct Access Storage Devices
   DBMS
    –   Data Base Management Systems
   OLTP – On Line Transaction Processing
    –   A type of computer processing in which the computer responds
        immediately to user requests
   OLTP on Databases
   Database defined as a single source of data for all
    processing
Evolution of Decision Support Systems
                                          1980’s
   PC/4GL Technology
    –   end-users able to directly control data and systems (notion that data can
        be used for more than OLTP)

   Management Information System
    –   MIS (later known as Decision Support Systems-DSS) defined as processing
        used to drive management decisions (as opposed to processing exclusively
        used to drive detailed operational decisions)
   single-database-serving-all-purpose paradigm
    –   One DBMS supporting both regular tr. processing and MIS/DDS
   Extract Processing
    –   Extract Program
         •   rummages through a file or database, uses some criteria for selection, and, upon
             finding qualified data, transports the data over onto another file or database
         •   used to move data used for DDS out of the way of high performance on line
             transaction processing and to give the DSS user the control over that data
Evolution of Decision Support Systems
                                 Early 1990’s
   Proliferation of Extracts
    –   Extracts everywhere (and extracts of extracts, and extracts of
        extracts of extracts, and so forth)
   Productivity Problems
    –   In order to write a corporate report
         •   Data must be located
         •   Lots of customized programs must be written
         •   The programs must cross every technology that the company has
   Inability to go from Data to Information
    –   Due to the following:
         •   Data Applications were built to service the needs of current transaction
             processing
         •   They were never designed to hold the historical data needed for DSS
             analysis
Evolution of Decision Support Systems
                              Early 1990’s
   Lack of Credibility of Data
E.g. Account activity report done by 2 analysts
    – No time basis for data (e.g. one analyst extracts data on Sunday evening, another
      extracts data on Wednesday afternoon)
    – Algorithmic differential (e.g. one analyst extracts data on all large accounts,
      another extracts data on all backed-up accounts)
    – The levels of extraction (each additional level of extraction increases the
      probability of discrepancy)
    – External Data (e.g. one analyst is bringing in Wall Street Journal data and
      another is bringing in Business Week data, and they both strip the data identity)
    – No common source of data to begin with (analysts are extracting data from
      different databases within the company)

E.g. Final Result
                          Analyst A                  Analyst B
                          Activity is up 10%         Activity is down 15%
Evolution of Decision Support Systems
                    Late 1990’s - Current
   Approach Change
    –   Realization that Naturally Evolving Architecture (direct result of
        extract processing method) is not sufficient
   Architected Environment
    –   Recognition that there are fundamentally two kinds of data
         •   OPERATIONAL DATA (PRIMITIVE DATA)
         •   ANALYTICAL DATA (DERIVED DATA, DSS DATA)
    –   Data Warehouses emerged as the New DSS Architecture
    Need for Data Warehousing

 Integrated, company-wide view of high-
  quality information.
 Separation of operational and analytical
  systems and data.
OPERATIONAL vs. ANALYTICAL DATA

Operational Data                        Analytical Data
Application Oriented                    Subject Oriented
Detailed                                Summarized (and/or Detailed)
Current                                 Values over time (snapshots)
Can be updated                          Read (and Append) Only
Control of Update: Major Issue          Control of Update: No Issue
For Clerical Community                  For Managerial Community
Supports Day-to-Day Operations          Supports Managerial Needs
Small Amounts used in a Process         Large Amounts used in a Process
Non-redundant                           Redundancy not an issue
High frequency of Access                Low to modest frequency of Access
Typical time-horizon: days and months   Typical time-horizon: years
OPERATIONAL vs. ANALYTICAL DATA
             Application:
  Health Club Members-Visit Database
                                                       Subject:
     HEALTHCLUBMEMEBRS                           Health Club Revenue
    MembId Name MembLevel
    111    Joe  A                                     REVENUE
    222    Sue  B                      Rid    Date       GeneratedBy   Amount
    333    Pat  A                      …      …          …             …
    …      …    …                      7234   12/31/1999 NonMember     10
                                       7235   01/01/2000 NonMember     $15
  DAILYVISITSFROMNONMEMBERS            7236   01/01/2000 Member        $100
    Trid   VisitType VisitDate         7237   01/01/2000 Member        $50
    11xx22 YP        01/01/2000        7238   01/01/2000 Member        $100
    11xx23 NP        02/01/2000        7239   02/01/2000 NonMember     $10
    11xx24 YP        02/01/2000        7240   02/01/2000 NonMember     $15
    …      …         …                 …      …          …             …

        MEMBRSHPLEVELS
          ID Type Fee
          A Gold $100
          B Basic $50

           VISITLEVELS
  ID Type                   Fee
  YP With Pool Usage        $15
  NP Without Pool Usage     $10
OPERATIONAL vs. ANALYTICAL DATA
                                                                                  Hardware Utilization
                                                                                  (Frequency of Access)



120
                                      Operational                                                                    120
                                                                                                                                              Data Warehouse
100                                                                                                                  100




 80                                                                                                                  80




 60                                                                                                                  60




 40                                                                                                                  40




 20                                                                                                                  20




  0                                                                                                                   0
      1   2   3   4   5   6   7   8   9   10   11 12   13 14   15 16   17 18   19 20 21 22 23 24 25 26 27 28 29 30         1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 2 21 2 2 2 25 2 27 2 2 3 31 3 3 3 35 3 37 3 3 4 41 4 4 4 45 4 47
       Data Warehouse: Definition

   Data Warehouse: An enterprise-wide structured
    repository of subject-oriented, time-variant,
    historical data used for information retrieval and
    decision support. The data warehouse stores
    atomic and summary data.
    (Oracle Data Warehouse Method)
         Data Warehouse: Definition

   Data Warehouse: An enterprise-wide structured
    repository of subject-oriented, time-variant, historical data used
    for information retrieval and decision support. The data
    warehouse stores atomic and summary data.
    (Oracle Data Warehouse Method)


   enterprise-wide refers to the fact that a DW
    provides a company-wide view of the information it
    contains
        Data Warehouse: Definition

   Data Warehouse: An enterprise-wide structured
    repository of subject-oriented, time-variant, historical data
    used for information retrieval and decision support. The data
    warehouse stores atomic and summary data.
    (Oracle Data Warehouse Method)


   structured repository refers to the fact that a DW
    is a structured data repository like any other data base
          Data Warehouse: Definition
   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary data.
    (Oracle Data Warehouse Method)
   subject-oriented refers to the fundamental difference
    in the purpose of a traditional Database System and a
    DW.
    –   Traditional Database System is developed in order to
        support a specific business process (e.g. shipping company
        order-entry database, dental office appointment management database).
    –   DW is developed to analyze a specific subject area (e.g.
        sales, profit).
         Data Warehouse: Definition
   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary data.
    (Oracle Data Warehouse Method)


   time-variant refers to the fact that a DW contains
    slices of data across different periods of time. With
    these data slices, the user can view reports from now
    and in the past.
         Data Warehouse: Definition
   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary data.
    (Oracle Data Warehouse Method)


   historical refers to the fact that a DW typically
    contains several years worth of data (as opposed to 60-
    90 days typical time horizon for data in traditional
    operational databases). In fact, DW often does not
    contain the latest transactional data.
        Data Warehouse: Definition
   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support .
    The data warehouse stores atomic and summary data.
    (Oracle Data Warehouse Method)


   information retrieval and decision support
    refers to the fact that a DW is a facility for getting
    information to answer questions. It is not meant for
    direct data entry; batch updates are the norm for
    refreshing data warehouses.
         Data Warehouse: Definition
   Data Warehouse: An enterprise-wide structured repository of
    subject-oriented, time-variant, historical data used for
    information retrieval and decision support. The data warehouse
    stores atomic and summary           data.
    (Oracle Data Warehouse Method)


   atomic and summary data refers to the fact that a
    DW, depending on purpose, may contain atomic data,
    summary data, or both.
                    DW is Subject Oriented!

   Data is organized around major subject areas of an enterprise, and is
    therefore useful for an enterprise-wide understanding of those subjects
   E.g. a banking operational system keeps independent records of
    customer savings, loans, and other transactions. A warehouse pulls
    this independent data together to provide customer financial
    information (fees, profits, losses …)
   Examples of subject areas
    –   Customer Financial Information
    –   Toll calls made in the telecommunications industry company
    –   Airline passenger booking information
   Data from operational systems must be transformed so that is
    consistent and meaningful in the DW
                         Subject Orientation
Insurance Co. Operational Databases            Data Warehouse

              auto


               life

                                                            claim
               health
                                                  Subject

              casualty


  Applications
                         Subject Orientation
Insurance Co. Operational Databases            Data Warehouse

              auto


               life                                     premium



               health
                                                           claim


              casualty


  Applications                                  Subjects
                      DW is Integrated!
   In many organizations, data resides in diverse independent systems,
    making it difficult to acquire meaningful information for analysis.
   In DW data is completely integrated, even when the underlying
    sources store data differently
   There is no magic stick: the transformation and integration process
    (which involves ETT – extraction, transformation, and transportation-
    load) can be time consuming and costly, and it requires commitment
    from every part of the organization, particularly top-level managers
              Data Warehouse Purpose
   Data warehouse is designed and implemented to answer
    these TWO fundamental questions:
    –   Who is buying what?
    –   When and where are they doing so?
   More specific
    –   Who [which customer] is buying [buying / using / delivering /
        shipping / ordering / returning] what [products / services] from
        where [outlet / store / clinic / branch] on what occasion [when],
        how [credit card / cash / check / exchange / debit] and why
        [causation]?
                   Some Uses of a Data
                      Warehouse
   Airlines for aircraft deployment, analysis of route profitability,
    frequent flyer promotions, and maintenance
   Banking for promotion of products and services, and customer service
   Health care for cost reduction
   Investment and insurance companies for customer analysis, risk
    assessment, and portfolio management
   Retail stores for buying pattern analysis, promotions, customer
    profiling, and pricing
   Telecommunications for product and service promotions.
Source: Oracle Corp. 1999
            Basic Elements of the Data
   Source System
                   Warehouse
    –   An operational system whose function is to capture the transactions of the
        business
   Data Staging Area
    –   A storage area and a set of processes used to clean, transform, combine, de-
        duplicate, household, archive, and prepare source data for use in the data
        warehouse
    –   No query and presentation services
   Presentation Server
    –   Physical machine on which the data warehouse data is organized and stored
   Dimensional Model
    –   A specific discipline for modeling data that is an alternative to entity-
        relationship model
    –   Used for modeling data warehouses
   Business Process
    –   A coherent set of business activities which are supported by operational and data
        warehousing systems
           Basic Elements of the Data
                  Warehouse

                      DW                 Data Mart
Scope                 Enterprise          Department
Subjects              Multiple            Single
Data Sources          Many                Few
Typical Size          100 GB to > 1 TB    < 100 GB
Implementation Time   Months, Years       Months
               Basic Elements of the Data
                      Warehouse
   Data Mart
    –   Dependent Data Mart Vs. Independent Data Mart
    –   Webopedia: “Data Mart: A database designed to help managers make strategic
        decisions about their business. Whereas a data warehouse combines databases
        across an entire enterprise, data marts are usually smaller and focus on a
        particular subject or department. Some data marts, called dependent data
        marts, are subsets of larger data warehouses”
    –   Kimball: “Data Mart:          A logical subset of the complete data warehouse”
                 “Data Warehouse: Union of its constituent data marts”
   Operational Data Store (ODS)
    –   ODS stores subject-oriented and integrated data from transaction systems in
        order to address operational needs (and possibly current-data analytical needs)
    –   ODS objectives:
         •   to integrate information from day-to-day systems and allow operational lookup
         •   to relieve day-to-day systems of reporting and current-data analysis demands
  Source Systems


  ORACLE
    File
                                                       User A
     DB2
     File

     FLAT                                               User B
      File

    EXCEL
     File




Possible Scenario Within
an Enterprise (Blue – Operational, Red – Analytical)
  Source Systems


  ORACLE
    File

     DB2
     File

     FLAT
      File

    EXCEL
     File




Possible Scenario Within
an Enterprise (Blue – Operational, Red – Analytical)
                                INDEPENDENT
                                                           DEPENDENT
                                  DATA MART
                                                           DATA MART
  ORACLE
    File

     DB2
     File
                                     ETT                  DATA
     FLAT                                              WAREHOUSE
      File

    EXCEL
     File                     OPERATIONAL
                              DATA STORE                  DEPENDENT
                                                          DATA MART


Possible Scenario Within
an Enterprise (Blue – Operational, Red – Analytical)
                            INDEPENDENT
                                              DEPENDENT
                              DATA MART
                                              DATA MART
  ORACLE
    File

     DB2
     File
                              ETT            DATA
    FLAT                                  WAREHOUSE
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
                            INDEPENDENT
                                              DEPENDENT
                              DATA MART
                                              DATA MART
  ORACLE
    File
                                                          User A
     DB2
     File
                              ETT            DATA
    FLAT                                  WAREHOUSE       User B
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
                            INDEPENDENT
                                              DEPENDENT
                              DATA MART
                                              DATA MART
  ORACLE
    File
                                                          User A
     DB2
     File
                              ETT            DATA
    FLAT                                  WAREHOUSE
                                                           User B
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
                            INDEPENDENT
                                              DEPENDENT
                              DATA MART
                                              DATA MART
  ORACLE
    File
                                                          User A
     DB2
     File
                              ETT            DATA
    FLAT                                  WAREHOUSE
                                                           User B
     File

   EXCEL
    File                   OPERATIONAL
                           DATA STORE        DEPENDENT
                                             DATA MART


Possible Scenario Within
an Enterprise
               INDEPENDENT
                                 DEPENDENT
                 DATA MART
                                 DATA MART
  ORACLE
    File
                                             User A
    DB2
    File
                 ETT            DATA
    FLAT                     WAREHOUSE
                                              User B
     File

   EXCEL
    File      OPERATIONAL
              DATA STORE        DEPENDENT
                                DATA MART


ODS Purpose
               INDEPENDENT
                                 DEPENDENT
                 DATA MART
                                 DATA MART
  ORACLE
    File
                                             User A
    DB2
    File
                 ETT            DATA
    FLAT                     WAREHOUSE
                                              User B
     File

   EXCEL
    File      OPERATIONAL
              DATA STORE        DEPENDENT
                                DATA MART


ODS Purpose
              Basic Elements of the Data
                     Warehouse
   OLAP (On-Line Analytic Processing)
    –   OLAP: The general activity of querying and presenting text and number data
        from data warehouses for analytical purposes
    –   OLTP: The general activity of updating, querying and presenting text and
        number data from databases for operational purposes
   End User Applications and Data Access Tools
    –   OLAP applications and tools
   Metadata
    –   All of the information in the data warehouse environment that is not the actual
        data itself
              Basic Processes of the Data
                      Warehouse
   Extracting
    –   Reading and understanding the source data, and copying the parts that are
        needed to the data staging area
   Transforming
    –   Cleaning data (correcting, resolving conflicts, dealing with missing data, etc.)
    –   Purging data (eliminating extracted data not useful for data warehousing)
    –   Combining data sources (matching key values, fuzzy matches on non-key
        values, etc.)
    –   Creating surrogate keys (in order to avoid dependence on legacy keys)
    –   Building aggregates
   Loading and Indexing
    –   Bulk loading
   Quality Assurance Checking
    –   Making sure all the newly loaded date can be properly used
              Basic Processes of the Data
                      Warehouse
   Release/Publishing
    –   Notifying users that the new data is ready
   Querying
    –   Using the data warehouse (using OLAP tools, data mining, etc.)
   Data Feedback/Feeding in Reverse
    –   Uploading the clean data from the data warehouse back to a source system
    –   Uploading the clean data from the data warehouse into a data mart
   Auditing
    –   Establishing where the data came from and what were the calculations
        performed
   Securing
    –   Access control for ensuring the security of the data warehouse
   Backing Up and Recovering
    –   System for back up and recovery of data warehouse data and metadata for
        archival purposes and disaster recovery

								
To top