Data Warehousing by 6cZrt5I

VIEWS: 11 PAGES: 30

									Data Warehousing



                   1
Managerial Questions

   What is a data warehouse?
   Why build one?
   What are the differences between a data warehouse,
    data mart, OPS, and real-time data warehouses?
   How expensive is it?
   What can you use it for?


                                                         2
Definitions
   Data Warehouse:
       A subject-oriented, integrated, time-variant, non-updatable collection
        of data used in support of management decision-making processes
       Subject-oriented: e.g. customers, patients, students, products
       Integrated: Consistent naming conventions, formats, encoding
        structures; from multiple data sources
       Time-variant: Can study trends and changes
       Nonupdatable: Read-only, periodically refreshed
   Data Mart:
       A data warehouse that is limited in scope




                                                                                 3
Need for Data Warehousing
   Integrated, company-wide view of high-
    quality information (from disparate databases)
   Separation of operational and informational
    systems and data (for improved performance)




                                                 4
Evolution
   Evolved out of the early MIS functions
   A new solution was needed because
       Volume and complexity of data was increasing
       Existing systems were designed on an ad hoc
        basis for operational needs
       A system for analytic purposes was needed



                                                       5
The ETL Process
   Capture/Extract
   Scrub or data cleansing
   Transform
   Load and Index


                              6
Static extract = capturing a             Incremental extract = capturing
snapshot of the source data at a point   changes that have occurred since the
                                                                              7
in time                                  last static extract
Fixing errors: misspellings,              Also: decoding, reformatting, time
erroneous dates, incorrect field usage,   stamping, conversion, key generation,
mismatched addresses, missing data,       merging, error detection/logging, locating
                                                                                  8

duplicate data, inconsistencies           missing data
Record-level:                      Field-level:
Selection – data partitioning      single-field – from one field to one field
Joining – data combining           multi-field – from many fields to one, or9

Aggregation – data summarization   one field to many
                                    Update mode: only changes in
Refresh mode: bulk rewriting of
                                    source data are written to data   10
target data at periodic intervals
                                    warehouse
Major Warehouse Applications

   Online analytic processing (OLAP)
   Data mining
   Customer relationship management
   Supply chain management
   Business intelligence


                                        11
Motivating Examples
   Forecasting
   Comparing performance of units
   Monitoring, detecting fraud
   Visualization




                                     12
Informational vs. Operational Systems




                                        13
Structure
   The basic 3 tier database structure:
       Data base
       Model base
       Interface




                                           14
Conceptual Structure of the Warehouse




                                    15
Warehouse Architecture
      Client                       Client

                Query & Analysis



     Metadata     Warehouse



                   Integration



   Source           Source           Source
                                              16
Metadata
Data about data
   What the user can access / needs to know
   How does data get mapped from operational activities
   Rules for data summarization, calculation and presentation


Where do you find it?
   In the warehouse
   In warehouse tools/applications
   In external programs

                                                                 17
Why a Separate Data Warehouse?

   Performance
   Data Access
   Data formats
   Data quality



                                 18
Data Flow




            19
Other Forms of Data Warehouses

   Operational data store (ODS)
   Real-time warehouse
   Prototype warehouse
   Exploration warehouse



                                   20
Warehouse versus ODS
      OPERATIONAL DATA STORE               DATA WAREHOUSE

      Subject oriented                     Subject oriented


      Integrated                           Integrated


      Can be updated                       Cannot be updated


      Current and near-current data        Historical data plus detail data


      Contains some data that is never     Contains much more data than
      included in the warehouse            the ODS

      Serves clerks, day-to-day            Serves analysts and middle and
      decision makers                      upper managers for long-term
                                           decisions
      Full function, update record         Simpler load-access technology
      environment
      Typical update cycle: 2 to 4 hours   Typical update cycle: 24 hours

                                                                              21
Structure of ODS




                   22
Warehouse Applications

   Online Analytic Processing (OLAP)
   Data mining
   Customer relationship management
   Supply chain management
   Business intelligence


                                        23
OLAP




       24
Data Mining
   Associations
   Sequences
   Classifications
   Clusters
   Forecasting



                      25
Why CRM?
   Segmenting customers
   Increased cross-selling
   Increased customer loyalty (potential)
   Improved target marketing




                                             26
Business Intelligence
Measure
 Sales trends

 Buying habits

 Performance metrics

 “What–if “ analyses




                        27
Strategy and the Warehouse




                             28
Managing the Warehouse
   Monitoring activity       Managing redundancy
   Data requirements         Managing staff
   Security                  Training
   Determining the data      Managing user
    model                      responsibilities
   Monitoring metadata



                                                     29
Reasons for Warehouse Failures
             Watson and             Stackoviak                Love
               Haley
         Poor data quality      Failure to involve    Diffusion of
                                everyone in process   leadership

         Not enough money       Failure to            No relation
                                understand why        between business
                                warehouse exists      and IT benefits
         Poor objectives        Incorrect             No senior
                                assumptions            management
                                                      input
         Technical limits       Unrealistic time      Poor use of data
                                frames and scope

         Misunderstanding       Poorly managed        Inappropriate
         legacy data            expectations          systems
                                                      development
         Lack of user support   Short-term            Lack of control
                                rather than
                                long-term view
         Missed deadlines       Failure to learn      Lack of up-front
                                from experience       MIS
                                of others             involvement
         Lack of training,
         expertise

         Poorly managed
         expectations
                                                                         30

								
To top