Learning Center
Plans & pricing Sign in
Sign Out

Size of Data Warehousing Project Scope and scale of projects


  • pg 1
									Data Warehousing
                                Organization of Concepts

                             Large-Scale Data Management for
                              Organizational Decision-Making

 Data Warehouse                      Multidimension Databases                      Data Mining
(Data Integration)                    (Enabling Technology)                 (Relationship Discovery)

       Purpose                                 Purpose                                Purpose

    Basic Features                          Basic Features                          Applications

    Definition                               MDDB vs. Relational Database
    Data Requirements                        Rotation
                                             Ranging                               Association
        Transaction Processing vs.           Roll-Up
            Data Warehousing                 Drill-Down                               Availability Bias
    Data Structures                          Computations                             Sequencial Pattern
    Example                                                                           Classifying
     Architecture                                                                     Hybrid
                                                                               Emergent Applications
    Data Mapping
    Example Architecture of an IBM           PowerPlay
          Data Warehouse

Preparing Data for DW

 Reasons for Failure
           Learning Objectives
• Rationale for Data Warehousing
• Definition and Illustration
• Data Requirements and Data Structure
• Architecture of a Data Warehouse
   – An Example from IBM
• Data Mapping
   – An Example from IBM
• Process of Getting Data into DW
• Common Reasons for Project Failure
          Learning Objectives

• Multidimensional Databases
  – What are they? Why are they needed?
• Contrast MDD and Relational Databases
• When is MDD (In)appropriate?
• MDD Features
  – Rotation, Ranging, Roll-Up and Drill-Down,
• Benefits of MDD
• PowerPlay as an OLAP tool
          Learning Objectives
•   What is data mining?
•   Applications of data mining
•   Data mining functions
•   The SAS data mining method - An example
       The New Technological Environment



Enterprise Network Computing and Client/Server Technology are
changing the way organizations look at all of their information systems
 Implications of Technological Environment
• Emergence of Tools
   – Emergence of large-scale data management tools
   – Emergence of several end-user oriented data
     exploration tools
• Increased flexibility and integration across
   – Transaction environments
   – Decision support environment
• Increased flexibility and integration between
   – Improved interface between transaction environments
     and decision support environments
                             Online Transaction Processing
External Entities                                            Organization

  Extract and Load

Knowledge                                                    Subject Areas
Workers              Online Analytical Processing
             Data Resource Bottleneck
• Knowledge workers need to gather and analyze
  business data to generate business information
• “Data in jail” syndrome
• Data quality problems, including:
   –   missing data
   –   unreliable data
   –   invalid data
   –   inappropriate granularity
   –   problematic semantics
• Data structures for transaction systems designed
  for capture and storage, but not amenable for
               Goals of Warehousing
• Performance (Canned queries, MD Analysis, Ad hoc, Min.
  Impact on Operational System)

• Flexibility (MD Flex, Ad hoc, Change data structure)

• Scalability (No. of Users, Volume of Data)

• Ease of Use (Location, Formulation, Navigation,

• Data Quality (Consistent, Correct, Timely, Integrated)

• Connection to the Detail Business Transactions
           Definition and Illustration

• A data warehouse is a central source of data, stocked with
  data extracted from different operational systems and
• Some organizations, such as Grand Metropolitan PLC,
  consider their data warehouse essentially as front-ends to
  existing production systems
• Other organizations such as Owens-Corning Fiberglass
  Corp. view their data warehouse as a collection of subject-
  or application-oriented databases that are kept separate
  from the operational applications
         Definition and Illustration

• New hybrid information systems which bridge two
  broad classes of systems, namely: storage and
  retrieval systems, and decision support systems
• The objective is not just the storage and retrieval
  of data objects, but also the programmed and ad-
  hoc transformation of data to information for its
  defined user base
         Definition of a Data Mart

• A data mart is defined as “A subset of a data
  warehouse for a single department or function. A
  data mart may have tens of gigabytes of data
  rather than hundreds of gigabytes for the entire
  enterprise.” TechEncyclopedia,
   DW Deals with Four Levels of Data
• Operational data - fundamental to DW data
   – Ex. sales of XYZ at the end of today is 2345 units
• Atomic data - DW data - individual data item and the
  lowest level of data in DW. Ex. sales of XYZ at the end of
  Sept. is 2045 units
   – … at the end of August 1743
   – … at the end of July 1345, etc…
• Summary data - DW data - these summaries are
  calculated ahead of time and stored for recall. Ex. At the
  end of third quarter sales of XYZ for Southeastern region
  was 23,456; at the end of second quarter sales of XYZ…
• Answers to specific question - processed and stored
  for individual user’s pc or workstation.
   – How does the growth of XYZ sales in SE compares with NW
    Database Structures for DW
• DW data requirements are different from
  traditional transaction processing (TP) data. For
      Characteristics TP data                DW data
      Volatility       Dynamic               Static
      Currency         Current               Historical
      Time dimension   Implicitly “now”      Explicit, visible
      Granularity      Primitive, detailed   Detailed and derived
      Updates          Continuous,           Periodic, scheduled
      Tasks            Repetitive            Unpredictable
      Flexibility      Low                   High
      Performance      High performance      Lower performance may be
                       mandatory             acceptable
   Database Structures for DW
• Hierarchical and network structures not
• Simple Relational with some ‘fixing’ can do
  the job
• A new structure - which is quite unsuitable
  for TP data - has emerged for DW use:
  “Multidimensional DB”
  – Time, itself, is a dimension in MDDB
   Database Structures for DW
• All databases have two types of data
  – Dimension data (also called Perspectives)
     •   Tens to a few mil. Rows
     •   One primary key
     •   Textually described
     •   Frequently modified

  – Fact data (also called Measurements)
     •   Millions or billions of rows
     •   Multiple foreign key
     •   Numeric
     •   Don’t change
          Pet store example
• Multi-dimensional Database Structure
              Cats Dogs Fish Gerbils
     Adams 3        12    5    4
     Baker    2     4     22   5
     Carstens 6     2     3    16
 Relational Equivalence
Salesperson   Pet Type   No. Sold
Adams         Cats       3
Adams         Dogs       12
Adams         Fish       5
Adams         Gerbils    4
Baker         Cats       2
Baker         Dogs       4
Baker         Fish       22
Baker         Gerbils    5
Carstens      Cats       6
Carstens      Dogs       2
Carstens      Fish       3
Carstens      Gerbils    16
Information Deployment Architecture at IBM’s Network Systems Division

             Management                             Knowledge Workers                               Executives

                                               Common Access Interface

                          Query Support                                      Information Delivery

                                                         Automated                                Business
                     Business Data
                                                        Chart Facility                          Information

        Automated Extract Applications

            Development       Manufacturing   Finance            Service   Personnel                      Automated

From Loeb, K., Rai, A., Ramaprasad, A., and Sharma, S., “Design, Development, and Implementation of a
Global Information Warehouse: A Case Study at IBM,” Information Systems Journal, 1998.
           Data Mapping Strategy at IBM







                  Support Tools


From Loeb, K., Rai, A., Ramaprasad, A., and Sharma, S., “Design, Development, and Implementation of a
Global Information Warehouse: A Case Study at IBM,” Information Systems Journal, 1998.
    Data Mapping Strategy at IBM -
• Business segment is defined as any
  functional area of IBM such as marketing,
  finance, manufacturing, etc.
• Business processes is any process used by
  any business segment.
• A business segment owner may own
  multiple business processes such as
  software and hardware development.
Getting Data into the Data Warehouse
• Extraction -- extract data
• Cleansing -- to achieve consistent format,
  valid values
• Loading -- input data from multiple sources;
  decide which set of data “rules”
• Transformation
   – Change data type of field from integer to character
   – Summarization -- aggregation
Example of Summarization

Four cells added for summaries by pet:

            Cats Dogs Fish Gerbils
 Adams      3      12      5      4
 Baker      2      4       22     5
 Carstens   6      2       3      16
 Total      11     18      30     25
– Human
   • “Natural” assumptions are not always correct
   • What to do with same item - shoe size - length vs.
   • Is ‘pay-rate’ same as ‘salary’ in another database
   • Validation rules for common human items such as
     U.S. zip code
– Computer-based metadata for people to use
   • create metadata warehouse
      – data descriptions, data structure
      – A business thesaurus; etc.
– Computer-based metadata for computer to use
Why do Data Warehousing Projects Fail?
• Size of Data Warehousing Project
   – Scope and scale of projects
• Financial and Political Support
   – Difficult to show savings in tangible terms
   – Time-lags before productivity improvements realized
   – Threat of downsizing
• Enterprise Culture
   – Ownership of data - whose data is it anyway?
   – Mutual trust among business segment executives
   – Can data inform knowledge creation?
• Skills
   – Ability to integrate business and technical knowledge
   – Knowledge workers skill-level with decision support tools
  Why do Data Warehousing Projects Fail?

• Technology Issues
  – Data warehouse NOT isolated from OLTP or older
    DSS environments
  – Failure to plan database transform
  – Semantics of data not reconciled
        Anatomy of a Failure

                  Please see:

To top