Documents
Resources
Learning Center
Upload
Plans & pricing Sign in
Sign Out

Designing the data warehouse data marts

VIEWS: 23 PAGES: 45

									Designing the data warehouse
        / data marts
   Methodologies and Techniques
Basic principles
         Life cycle of the DW
                   First time load

Operational Databases             Warehouse Database


                        Refresh



                        Refresh


                                                Purge or Archive

                        Refresh
       Oracle Warehouse
Any Source Components
              Any Data Any Access


                                            Relational
                    Relational /              tools
Operational       Multidimensional
   data
                Oracle Medi`




                                              OLAP
              Text, image      Spatial
                                              tools


                               Audio,
 External         Web          video
   data                                  Applications/ Web
 Oracle Intelligence Tools



 IS develops      Business users        Analysts
 user’s Views
   Current           Tactical          Strategic




Oracle Reports   Oracle Discoverer   Oracle Express
       Oracle Data Mart Suite


                          Data Modeling
                      Oracle Data Mart Designer
OLTP                                 Data Mart
Databases                            Database
             OLTP          Ware-
            Engines       housing     Oracle8
                          Engines
                                     SQL*PLUS


      Data                  Data                  Data Access
    Extraction           Management                & Analysis
  Oracle Data Mart       Oracle Enterprise         Discoverer &
      Builder                Manager              Oracle Reports
       “Big Bang” Approach:
          Advantages and
•          Disadvantages
    Advantages:
    – warehouse built as part of major project
      (eg: BPR)
    – Having a “big picture” of the data
      warehouse before starting the data
      warehousing project
• Disadvantages:
    – Involves a high risk, takes a longer time
    – Runs the risk of needing to change
      requirements
Incremental Approach to
Warehouse Development
 Strategy             • Multiple iterations
                      • Shorter
  Definition
    Analysis
       Design           implementations
            Build
                      • Validation of each
         Production
                        phase
  Benefits of an Incremental
          Approach
• Delivers a strategic data warehouse
  solution through incremental development
  efforts
• Provides extensible, scalable architecture
• Quickly provides business benefits and
  ensures a much earlier return of
  investment
• Allows a data warehouse to be built based
  on a subject or application area at a time
• Allows the construction of an integrated
  data mart environment
                Data Mart
• A subset of a data warehouse that
  supports the requirements of a
  particular department or business
  function.

• Characteristics include:
  – Do not normally contain detailed operational
    data unlike data warehouses.
  – May contain certain levels of aggregation
   Dependent Data Mart
 Flat Files
Operational
 Systems                              Marketing



                       Marketing
                         Sales          Sales
                        Finance
                    Human Resources

                        Data           Finance
                      Warehouse

                                      Data Marts


    External Data
Independent Data Mart
 Operational     Flat Files
  Systems




                   Sales or Marketing


 External Data
  Reasons for Creating a Data
             Mart
• To give users more flexible access to
  the data they need to analyse most
  often.
• To provide data in a form that matches
  the collective view of a group of users
• To improve end-user response time.
• Potential users of a data mart are
  clearly defined and can be targeted for
  support
  Reasons for Creating a Data
             Mart
• To provide appropriately structured data as
  dictated by the requirements of the end-user
  access tools.

• Building a data mart is simpler compared with
  establishing a corporate data warehouse.

• The cost of implementing data marts is far
  less than that required to establish a data
  warehouse.
         Data Marts Issues
• Data mart functionality
• Data mart size
• Data mart load performance
• Users access to data in multiple data
  marts
• Data mart Internet / Intranet access
• Data mart administration
• Data mart installation
  Example of DW tool OLAP
• Rotate and drill down to successive
  levels of detail.
• Create and examine calculated data
  interactively on large volumes of data.
• Determine comparative or relative
  differences.
• Perform exception and trend analysis.
• Perform advanced analytical functions
  for example forecasting, modeling, and
  regression analysis
     Original OLAP Rules

1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
     Original OLAP Rules

6. Multiuser support
7. Unrestricted cross-dimensional
  operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and
  aggregation levels
 Relational Database Model

        Attribute 1 Attribute 2 Attribute 3 Attribute 4
           Name        Age        Gender      Emp No.
Row 1     Anderson         31       F     1001
Row 2     Green            42       M     1007
Row 3     Lee              22       M     1010
Row 4     Ramos            32       F     1020

        The table above illustrates the employee relation.
   Multidimensional Database
Customer  Store
                Model
                                      Store

                     Time                     Time


          SALES                  FINANCE




Product                     GL_Line

The data is found at the intersection of
 dimensions.
Two dimensions
Three dimensions
Specialised Multidimensional tool
• Benefits:
  – Quick access to very large volumes of data
  – Extensive and comprehensive libraries of
    complex functions
     • analysis
     • Strong modeling and forecasting capabilities
  – Can access multidimensional and relational
    database structures
  – Caters for calculated fields
• Disadvantages:
  – Difficulty of changing model
  – Lack of support for very large volumes of data
  – May require significant processing power
               MOLAP Server
• The application layer
  stores data in a
  multidimensional structure        DSS client
• The presentation layer
  provides the
                                     MOLAP
  multidimensional view              Engine
• Efficient storage and processing Application
• Complexity hidden from the          layer

  user
• Analysis using preaggregated
  summaries and precalculated Warehouse
  measures
           ROLAP Server

• The warehouse stores           DSS client
  atomic data.
• The application layer
                                  ROLAP
  generates SQL for the           engine
  three- dimensional view.      Application
• The presentation layer Multiple layer
                           SQL
  provides the
  multidimensional view.
                                  Warehouse
                                    server
                  MOLAP

                       MDDB

                                 Query


            Periodic
             load                Data
Warehouse              Express           Express
                        Server            user
             ROLAP

                    Cache
             Live
            fetch             Query



             Data             Data
            cache

                    Express           Express
Warehouse
                     Server            user


 Also Hybrid (HOLAP)
          Choosing a Reporting
              Architecture
•   Business needs            Good

•   Potential for growth                          MOLAP

•   interface          Query
                       Performance

•   enterprise architecture           ROLAP
                                OK
•   Network architecture
                                     Simple           Complex
•   Speed of access                           Analysis

•   Openness
          Data Acquisition
• Identify, extract, transform, and transport
  source data
• Consider internal and external data
• Perform gap analysis between source data
  and target database objects
• Plan move of data between sources and target
• Define first-time load and refresh strategy
• Define tool requirements
• Build, test, and execute data acquisition
  modules
                Modeling
• Warehouses differ from operational
  structures:
  – Analytical requirements
  – Subject orientation
• Data must map to subject oriented
  information:
  – Identify business subjects
  – Define relationships between subjects
  – Name the attributes of each subject
• Modeling is iterative
• Modeling tools are available
  Modeling the Data Warehouse
                               1
1. Defining the business
   model                              Select a
2. Creating the dimensional           business
                                      process
   model
                               2, 3
3. Modeling summaries
4. Creating the physical model
         4


             Physical model
Identifying Business Rules

      Location                       Product

Geographic proximity     Type      Monitor      Status

      0 - 1 miles        PC        15 inch      New
      1 - 5 miles        Server    17 inch      Rebuilt
       > 5 miles                   19 inch      Custom
                                   None


         Time                         Store
Month > Quarter > Year      Store > District > Region
Creating the Dimensional Model
  Identify fact tables
   – Translate business measures into fact
     tables
   – Analyze source system information for
     additional measures
   – Identify base and derived measures
   – Document additivity of measures
  Identify dimension tables
  Link fact tables to the dimension
  tables
  Create views for users
         Dimension Tables
Dimension tables have the following
  characteristics:
• Contain textual information that
  represents the attributes of the business
• Contain relatively static data
• Are joined to a fact table through a
  foreign key reference     Product              Channel


                                        Facts
                                       (units,
                                       price)


                            Customer              Time
              Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the
  business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have key value that is typically a concatenated
  key composed of the primary keys of the
  dimensions
• Joined to dimension tables through foreign
  keys that reference primary keys in the
  dimension tables
Dimensional Model (Star
      Schema)
               Fact table



 Product                       Channel


                  Facts
                 (units,
                 price)


 Customer                       Time



            Dimension tables
Star Schema Model
                Product Table             Store Table
                Product_id                Store_id
                Product_desc              District_id
                …                         ...

                             Sales Fact Table
• Central fact table         Product_id
                             Store_id
• Radiating dimensions       Item_id
                             Day_id
• Denormalized model         Sales_dollars
                             Sales_units
                             ...
                Time Table                Item Table
                Day_id                    Item_id
                Month_id                  Item_desc
                Period_id                 ...
                Year_id
        Star Schema Model

•   Easy for users to understand
•   Fast response to queries
•   Simple metadata
•   Supported by many front end tools
•   Less robust to change
•   Slower to build
•   Does not support history
  Snowflake Schema Model
Product Table       Store Table
                                    District Table
 Product_id          Store_id
                                     District_id
Product_desc        Store_desc
                                    District_desc
                    District_id


         Sales Fact Table
             Item_id
             Store_id
          Sales_dollars
           Sales_units

Time Table          Item Table    Dept Table     Mgr Table
 Week_id              Item_id      Dept_id        Dept_id
Period_id           Item_desc     Dept_desc       Mgr_id
  Year_id             Dept_id      Mgr_id        Mgr_name
    Snowflake Schema Model

• Direct use by some tools
• More flexible to change
• Provides for speedier data loading
• May become large and
  unmanageable
• Degrades query performance
• More complex metadata
       Using Summary Data
 Phase 3: Modeling summaries

• Provides fast access to precomputed
  data
• Reduces use of I/O, CPU, and memory
• Is distilled from source systems and
  precalculated summaries
• Usually exists in summary fact tables
Designing Summary Tables
• Average             • Total
• Maximum             • Percentage
              Units    Sales(€)   Store

 Product A
      Total
 Product B
      Total
 Product C
      Total
  Summary Tables Example
SALES FACTS           SALES BY MONTH/REGION
Sales Region Month    Month Region Tot_Sales$
10,000 North Jan 99   Jan 99 North 41,000
12,000 South Feb 99   Jan 99 East 10,000
11,000 North Jan 99   Feb 99 South 40,000
15,000 West Mar 99    Mar 99 West 17,000
18,000 South Feb 99
20,000 North Jan 99
10,000 East Jan 99
2,000 West Mar 99       SALES BY MONTH
                        Month Tot_Sales
                        Jan 99 51,000
                        Feb 99 40,000
                        Mar 99 17,000
   Summary Management
       in Oracle8i
                               Sales               Sales
                                                 summary
Region

         State
                 City

                  Product        Time


                        Summary advisor
          Summary                             Space
           usage             Summary       requirements
                         recommendations
     The Time Dimension

• Time is critical to the data warehouse.
• A consistent representation of time is
  required for extensibility.


                                       Time
    Sales fact
                                    dimension


       How and where should it be stored?

								
To top