Docstoc

Data Warehouse class

Document Sample
Data Warehouse class Powered By Docstoc
					  DATA
WAREHOUSE
CONCEPTS
A Definition
 A Data Warehouse:

  • Is a repository for collecting, standardizing, and
    summarizing snapshots of transactional data contained in
    an organization’s operations or production systems

  • provides a historical perspective of information

  • is most often, but not exclusively, used for decision support
    applications and business information queries

  • can be more than one database

  • Is not a new concept
Another Definition

 Decision Support:

   • is a set of tools to easily access data
   • is becoming a critical business tool
   • is usually graphically oriented
   • is empowering end users with tools to access vital business
     information
   • is moving lots of data down to the end user workstation
   • is a rapidly expanding area because of data warehousing
     efforts and projects
Why a warehouse?

 For analysis and decision support, end users
  require access to data captured and stored in an
  organization’s operational or production
  systems

 This data is stored in multiple formats, on
  multiple platforms, in multiple data structures,
  with multiple names, and probably created
  using different business rules
Why do we want a central data store
                                          Lif e              LTD


                            Voluntary
                                                                   Non-Medical
                             Benf its




              Indiv idual
                                                                                 Financial
              Disability




 Account
Management
                                                                                        Underwriting




                                                                     Customer Service
             Sales/Marketing
                                              Financial Analysis




                                  A Good Reason for a Central Data Store
Interesting Statistics

 85% of the Fortune 1000 companies have, are
  implementing, or are looking at, data warehouses (Meta
  Group)

 90% of all information processing organizations will be
  pursuing a data warehouse strategy in the next three
  years (Meta Group)

 The Decision Support industry will be a $1 Billion
  industry by 1997 (IDC & Forrester)
Data Warehouse Evolution - Stage 0
                     End User
    Application      Reports




                                No end user access to
                                production files
  Production Files
                                “What we print” is
                                “what you get”
                     End User
    Application      Reports




  Production Files
 Data Warehouse Evolution - Stage 1
                                    End users denied direct
                     End User
  Application        Reports

                                    access to production
                                    files

                                    Snapshots or copies of
Production Files   Snapshot File
                                    production files are
                                    made available instead

                                    Solution: Provide end
                     End User       users access to
  Application        Reports
                                    production systems




Production Files   Snapshot Files
No Integration Between Systems

                                        a

                              System developed in 1979



                                                                          c
        b
                                                                   Purchased company
                                                                        system

   Purchased Package




                                                         New Application development
             Rebuilt Application
   Data Warehouse Evolution - Stage 2
                                                 Document
                                              Document
                                          Document
       Document
                      Desktop computer
    Document
Document
                                                                                       Document
                                                                                    Document
                  A                                                             Document
                                                            Desktop computer




       Document
    Document
Document

                  b          Mainf rame
                                                                     Document
                                                                Document
                                                            Document
                                                                                          Serv er or Midreange




       Document
    Document
Document          c                           4GL




                                                            Desktop computer
Data Characteristics
Type                    Production         Warehouse

 Data Use              Operational        Mgt Reporting
 Level of detail       Detailed           Summary
 Currency              Real time,         Multiple
                        Latest value       generations
 Longevity             Relatively brief   Forever
 Stability             Dynamic            Static
 Scope of definition   Application wide   Enterprise wide
 Data Operations       Capture/update     Read only
 Data values           Coded              Decoded
Transforming the Logical Model
                                                                                                Brand Group




                                                                          Brand
                                                                                                                      Product
                                   Shipment




                                                                                          SKU


                   Day                                                                                                  Size
                                                  Order Item




   Week                    Month                                                                              Ty pe


                                                               customer




                                                                                                                                Class
          Calendar Y ear
                                              Sales Rep                           Route




                         time
                                              marke t
                                                                    District


                                                                                                  Region
 Key Differences - Part 1
 Key differences between “data jails”
  (operational database) & warehouses

   • Subject orientation - operational systems are application-
     segmented (i.e. banks = auto loan, demand deposit
     accounting or mortgages). Subject areas for banks would be
     customer and each financial product
   • Level of integration - warehouses resolve years of
     application inconsistency in encoding/decoding, data name
     rationalization, etc
   • Update volatility - record at a time updates in operational
     database vs bulk loads in data warehouse
   • Time variance norms include: 30-90 days of transactions for
     operational system, 1-10 years for data warehouses
     Key Differences - Part 2
Characteristic       Operational         Warehouse

Transaction volume   High                Low to huge
Response time        Very fast           Reasonable
Updating             High volume         Very Low
Time Period          Current Period      Past to Future
Scope                Internal            External
Activities           Focused, clerical   Exploratory,
                     operational         analytical, managerial
Queries              Predictable,        Can be Unpredictable,
                     periodic            Ad hoc
Types of Warehouse Configurations

 Enterprise
 Division
 Functional
   • Financial
   • Personnel
   • Engineering/Product
 Departmental
 Special Project
            What’s Really Involved?
                           Data Warehouse
                            Components

 DB/2       VSAM
                                                        M anage me nt Re porting
             Mainframe                                     Sale s/M arke ting
IMS         Applications                                 Custome r Re lations
                                                           Re se rv e Analysis
                                                            Risk Analysis



  DB2/2

                PC
            Applications
                                 Extract Programs                    Reserv es     Customers
                             Data Cleansers/Scrubbers
                             Translators/Transformers
                                   Timing Tools                             Combine d
                                   Data Loading                  Rates
                                                                              Data        Policies
      ???                          File Transfer                            Ware house
             External
             Sources

                                                                         Claims    Premiums




 DB/6000


             Midrange
                                                                            De cision
                                                                          Support Tools
  DB/400
 Typical Users of a Data Warehouse
 Decision Support Analysts, Business Analysts
   • Marketing, Actuaries, Financial, Sales, Executive
 Grocery Store attitudes
   • Going to the store, not knowing what they want
   • Close proximity says give me “everything”
 Explorers
   • Don’t know what they want
   • Search on a random basis, non-repetitively
   • Frequently finds nothing, but when they do, there are huge
     rewards
 Farmers
   • Know what they want
   • Non random searches, finds frequent “flakes of gold”
   • Finds small amounts of data
Advanced Warehouse Topics
 Metadata repositories
   • Information about the data in the warehouse
       • Like a library card catalog

       • Data about when the information was created, what files
         accessed, how much data
       • Data about changes in business rules, processes

       • Context versus Content
           • “What does it mean?”

 Data Mining
   • Drilling down into databases with tools to find specific
     anomolies
 Online Analysis Processing (OLAP)
   • Really means summary data