Documents
Resources
Learning Center
Upload
Plans & pricing Sign in
Sign Out

Size of Data Warehousing Project Scope and scale of projects

VIEWS: 6 PAGES: 28

  • pg 1
									Data Warehousing
                                Organization of Concepts

                             Large-Scale Data Management for
                              Organizational Decision-Making


 Data Warehouse                      Multidimension Databases                      Data Mining
(Data Integration)                    (Enabling Technology)                 (Relationship Discovery)


       Purpose                                 Purpose                                Purpose

    Basic Features                          Basic Features                          Applications

                                                                                     Functions
    Definition                               MDDB vs. Relational Database
    Data Requirements                        Rotation
                                             Ranging                               Association
        Transaction Processing vs.           Roll-Up
            Data Warehousing                 Drill-Down                               Availability Bias
    Data Structures                          Computations                             Sequencial Pattern
    Example                                                                           Classifying
                                               Benefits
                                                                                      Clustering
     Architecture                                                                     Hybrid
                                                 Tools
                                                                               Emergent Applications
    Data Mapping
    Example Architecture of an IBM           PowerPlay
          Data Warehouse

Preparing Data for DW

 Reasons for Failure
           Learning Objectives
• Rationale for Data Warehousing
• Definition and Illustration
• Data Requirements and Data Structure
• Architecture of a Data Warehouse
   – An Example from IBM
• Data Mapping
   – An Example from IBM
• Process of Getting Data into DW
• Common Reasons for Project Failure
          Learning Objectives

• Multidimensional Databases
  – What are they? Why are they needed?
• Contrast MDD and Relational Databases
• When is MDD (In)appropriate?
• MDD Features
  – Rotation, Ranging, Roll-Up and Drill-Down,
    Computations
• Benefits of MDD
• PowerPlay as an OLAP tool
          Learning Objectives
•   What is data mining?
•   Applications of data mining
•   Data mining functions
•   The SAS data mining method - An example
       The New Technological Environment



                                  HP
                                                  Compaq

                                                 DEC
                           IBM




Enterprise Network Computing and Client/Server Technology are
changing the way organizations look at all of their information systems
 Implications of Technological Environment
• Emergence of Tools
   – Emergence of large-scale data management tools
   – Emergence of several end-user oriented data
     exploration tools
• Increased flexibility and integration across
  environments
   – Transaction environments
   – Decision support environment
• Increased flexibility and integration between
  environments
   – Improved interface between transaction environments
     and decision support environments
                             Online Transaction Processing
External Entities                                            Organization

  Extract and Load
  Clean
  Integrate
  Validate
  Transform




Knowledge                                                    Subject Areas
Workers              Online Analytical Processing
             Data Resource Bottleneck
• Knowledge workers need to gather and analyze
  business data to generate business information
• “Data in jail” syndrome
• Data quality problems, including:
   –   missing data
   –   unreliable data
   –   invalid data
   –   inappropriate granularity
   –   problematic semantics
• Data structures for transaction systems designed
  for capture and storage, but not amenable for
  analysis
               Goals of Warehousing
• Performance (Canned queries, MD Analysis, Ad hoc, Min.
  Impact on Operational System)

• Flexibility (MD Flex, Ad hoc, Change data structure)

• Scalability (No. of Users, Volume of Data)

• Ease of Use (Location, Formulation, Navigation,
  Manipulation)

• Data Quality (Consistent, Correct, Timely, Integrated)

• Connection to the Detail Business Transactions
           Definition and Illustration

• A data warehouse is a central source of data, stocked with
  data extracted from different operational systems and
  standardized.
• Some organizations, such as Grand Metropolitan PLC,
  consider their data warehouse essentially as front-ends to
  existing production systems
• Other organizations such as Owens-Corning Fiberglass
  Corp. view their data warehouse as a collection of subject-
  or application-oriented databases that are kept separate
  from the operational applications
         Definition and Illustration

• New hybrid information systems which bridge two
  broad classes of systems, namely: storage and
  retrieval systems, and decision support systems
• The objective is not just the storage and retrieval
  of data objects, but also the programmed and ad-
  hoc transformation of data to information for its
  defined user base
         Definition of a Data Mart

• A data mart is defined as “A subset of a data
  warehouse for a single department or function. A
  data mart may have tens of gigabytes of data
  rather than hundreds of gigabytes for the entire
  enterprise.” TechEncyclopedia,
  http://www.techweb.com/encyclopedia/.
   DW Deals with Four Levels of Data
• Operational data - fundamental to DW data
   – Ex. sales of XYZ at the end of today is 2345 units
• Atomic data - DW data - individual data item and the
  lowest level of data in DW. Ex. sales of XYZ at the end of
  Sept. is 2045 units
   – … at the end of August 1743
   – … at the end of July 1345, etc…
• Summary data - DW data - these summaries are
  calculated ahead of time and stored for recall. Ex. At the
  end of third quarter sales of XYZ for Southeastern region
  was 23,456; at the end of second quarter sales of XYZ…
• Answers to specific question - processed and stored
  for individual user’s pc or workstation.
   – How does the growth of XYZ sales in SE compares with NW
    Database Structures for DW
• DW data requirements are different from
  traditional transaction processing (TP) data. For
  example:
      Characteristics TP data                DW data
      Volatility       Dynamic               Static
      Currency         Current               Historical
      Time dimension   Implicitly “now”      Explicit, visible
      Granularity      Primitive, detailed   Detailed and derived
                                             summaries
      Updates          Continuous,           Periodic, scheduled
                       random
      Tasks            Repetitive            Unpredictable
      Flexibility      Low                   High
      Performance      High performance      Lower performance may be
                       mandatory             acceptable
   Database Structures for DW
          continued...
• Hierarchical and network structures not
  suited
• Simple Relational with some ‘fixing’ can do
  the job
• A new structure - which is quite unsuitable
  for TP data - has emerged for DW use:
  “Multidimensional DB”
  – Time, itself, is a dimension in MDDB
   Database Structures for DW
          continued...
• All databases have two types of data
  – Dimension data (also called Perspectives)
     •   Tens to a few mil. Rows
     •   One primary key
     •   Textually described
     •   Frequently modified

  – Fact data (also called Measurements)
     •   Millions or billions of rows
     •   Multiple foreign key
     •   Numeric
     •   Don’t change
          Pet store example
• Multi-dimensional Database Structure
              Cats Dogs Fish Gerbils
     Adams 3        12    5    4
     Baker    2     4     22   5
     Carstens 6     2     3    16
 Relational Equivalence
Salesperson   Pet Type   No. Sold
Adams         Cats       3
Adams         Dogs       12
Adams         Fish       5
Adams         Gerbils    4
Baker         Cats       2
Baker         Dogs       4
Baker         Fish       22
Baker         Gerbils    5
Carstens      Cats       6
Carstens      Dogs       2
Carstens      Fish       3
Carstens      Gerbils    16
Information Deployment Architecture at IBM’s Network Systems Division

             Management                             Knowledge Workers                               Executives




                                               Common Access Interface

                          Query Support                                      Information Delivery




                                                                                                 Common
                       Common
                                                         Automated                                Business
                     Business Data
                                                        Chart Facility                          Information
                      Warehouse
                                                                                                Warehouse




        Automated Extract Applications




External
Databases
            Development       Manufacturing   Finance            Service   Personnel                      Automated
                                                                                                           Charts




From Loeb, K., Rai, A., Ramaprasad, A., and Sharma, S., “Design, Development, and Implementation of a
Global Information Warehouse: A Case Study at IBM,” Information Systems Journal, 1998.
           Data Mapping Strategy at IBM

                                                        s
                                                     sse




                                                                      Bu
                                                    e
                                                roc




                                                                        sin
                                              P
                                         ss




                                                                           es
                                      ine




                                                                              s
                                    s
                                  Bu




                                                                             Se
                                                                               gm
                                                                                 en
                                                                                    ts
                       Data


                    Decision
                  Support Tools

                    Information




From Loeb, K., Rai, A., Ramaprasad, A., and Sharma, S., “Design, Development, and Implementation of a
Global Information Warehouse: A Case Study at IBM,” Information Systems Journal, 1998.
    Data Mapping Strategy at IBM -
             Continued
• Business segment is defined as any
  functional area of IBM such as marketing,
  finance, manufacturing, etc.
• Business processes is any process used by
  any business segment.
• A business segment owner may own
  multiple business processes such as
  software and hardware development.
Getting Data into the Data Warehouse
• Extraction -- extract data
• Cleansing -- to achieve consistent format,
  valid values
• Loading -- input data from multiple sources;
  decide which set of data “rules”
• Transformation
   – Change data type of field from integer to character
   – Summarization -- aggregation
Example of Summarization

Four cells added for summaries by pet:

            Cats Dogs Fish Gerbils
 Adams      3      12      5      4
 Baker      2      4       22     5
 Carstens   6      2       3      16
 Total      11     18      30     25
                    Metadata
– Human
   • “Natural” assumptions are not always correct
   • What to do with same item - shoe size - length vs.
     width
   • Is ‘pay-rate’ same as ‘salary’ in another database
   • Validation rules for common human items such as
     U.S. zip code
– Computer-based metadata for people to use
   • create metadata warehouse
      – data descriptions, data structure
      – A business thesaurus; etc.
– Computer-based metadata for computer to use
Why do Data Warehousing Projects Fail?
• Size of Data Warehousing Project
   – Scope and scale of projects
• Financial and Political Support
   – Difficult to show savings in tangible terms
   – Time-lags before productivity improvements realized
   – Threat of downsizing
• Enterprise Culture
   – Ownership of data - whose data is it anyway?
   – Mutual trust among business segment executives
   – Can data inform knowledge creation?
• Skills
   – Ability to integrate business and technical knowledge
   – Knowledge workers skill-level with decision support tools
  Why do Data Warehousing Projects Fail?

• Technology Issues
  – Data warehouse NOT isolated from OLTP or older
    DSS environments
  – Failure to plan database transform
  – Semantics of data not reconciled
        Anatomy of a Failure


                  Please see:
www.cio.com/archive/enterprise/111597_data.html

								
To top