Data Warehousing Overview Issues, Terminology, Products and

Document Sample
Data Warehousing Overview  Issues, Terminology, Products and Powered By Docstoc
					   Cube Computation and
Indexes for Data Warehouses
         CPS 196.03
          Notes 7



                              1
              Processing
l   ROLAP servers vs. MOLAP servers
l   Index Structures
l   Cube computation
                                                     Client
    What to Materialize?
                        Client
l                                 Query & Analysis


l   Algorithms         Metadata     Warehouse



                                     Integration



                     Source           Source           Source




                                                                2
                   ROLAP Server
l   Relational OLAP Server

                          tools


                        ROLAP        Special indices, tuning;
       utilities                     Schema is “denormalized”
                        server

                        relational
                         DBMS


                                                          3
                 MOLAP Server
l   Multi-Dimensional OLAP Server
                                                        Sales




                                              ty
                                                B




                                            Ci
                                               A
                                            milk




                                  Product
                     M.D. tools             soda
                                            eggs
                                            soap
                                                   1   2 3 4
                                                       Date

     utilities
                       multi-                 could also
                    dimensional                 sit on
                                              relational
                       server                   DBMS




                                                                4
                     MOLAP
                                                Total annual sales
                           Date                  of TV in U.S.A.
             1Qtr   2Qtr    3Qtr   4Qtr   sum
   t
  uc


        TV
  od




      PC                                        U.S.A
Pr




    VCR




                                                         Country
 sum
                                                Canada

                                                Mexico


                                                 sum




                                                                     5
               MOLAP


    C      c3 61
         c2 45
                       62       63       64
                    46       47       48
       c1 29     30       31       32
     c0
         B13     14      15     16               60
    b3                                      44
B                                      28        56
    b2   9
                                            40
                                       24        52
    b1   5
                                            36
                                       20
    b0   1       2        3     4
         a0     a1       a2    a3
                     A



                                                      6
          Challenges in MOLAP
l   Storing large arrays for efficient access
    u   Row-major, column major
    u   Chunking
    u   Compressing sparse arrays
l   Creating array data from data in tables
l   Efficient techniques for Cube computation

Topics are discussed in the paper for reading

                                                7
               Index Structures
l   Traditional Access Methods
    u   B-trees, hash tables, R-trees, grids, …
l   Popular in Warehouses
    u   inverted lists
    u   bit map indexes
    u   join indexes
    u   text indexes



                                                  8
        Inverted Lists




                           ...
           inverted        data
 age
             lists       records
index
                                   9
            Using Inverted Lists
l   Query:
    u   Get people with age = 20 and name = “fred”
l   List for age = 20: r4, r18, r34, r35
l   List for name = “fred”: r18, r52
l   Answer is intersection: r18




                                                     10
        Bit Maps




                     ...
 age      bit        data
index    maps      records

                             11
                      Bitmap Index
l    Index on a particular column
l    Each value in the column has a bit vector: bit-op is fast
l    The length of the bit vector: # of records in the base table
l    The i-th bit is set if the i-th row of the base table has the
     value for the indexed column
l    not suitable for high cardinality domains
    Base table         Index on Region              Index on Type




                                                                     12
                Using Bit Maps
l   Query:
    u   Get people with age = 20 and name = “fred”
l   List for age = 20: 1101100000
l   List for name = “fred”: 0100000001
l   Answer is intersection: 010000000000

l   Good if domain cardinality small
l   Bit vectors can be compressed

                                                     13
                     Join
• “Combine” SALE, PRODUCT relations
• In SQL: SELECT * FROM SALE, PRODUCT WHERE ...




                                                  14
Join Indexes
        join index




                     15
Cube Computation for Data
      Warehouses




                            16
              Counting Exercise
l   How many cuboids are there in a cube?
    u   The full or nothing case
    u   When dimension hierarchies are present
l   What is the size of each cuboid?




                                                 17
          Lattice of Cuboids
                     129
                           all


         city          product         date



 city, product        city, date       product, date



 day 2           city, product, date
day 1




                                                       18
Dimension Hierarchies

 all


state



city




                        19
                Dimension Hierarchies
                       all

      city           product           date


city, product       city, date        product, date


                                                      state
                city, product, date
                                                                    state, date
                                              state, product

                                                               state, product, date

  not all arcs shown...

                                                                                      20
        Efficient Data Cube Computation

l   Data cube can be viewed as a lattice of cuboids
    u   The bottom-most cuboid is the base cuboid
    u   The top-most cuboid (apex) contains only one cell
    u   How many cuboids in an n-dimensional cube with L
        levels?


l   Materialization of data cube
    u   Materialize every (cuboid) (full materialization), none
        (no materialization), or some (partial materialization)
    u   Selection of which cuboids to materialize
        è   Based on size, sharing, access frequency, etc.
                                                                  21
                  Derived Data
l   Derived Warehouse Data
    u   indexes
    u   aggregates
    u   materialized views (next slide)
l   When to update derived data?
l   Incremental vs. refresh



                                          22
     Idea of Materialized Views
l   Define new warehouse tables/arrays




                                     does not exist
                                     at any source




                                                      23
          Efficient OLAP Processing
l   Determine which operations should be performed on available cuboids
    u   Transform drill, roll, etc. into corresponding SQL and/or OLAP operations,
        e.g., dice = selection + projection

l   Determine which materialized cuboid(s) should be selected for OLAP:
    u   Let the query to be processed be on {brand, province_or_state} with the
        condition “year = 2004”, and there are 4 materialized cuboids available:
         1) {year, item_name, city}
         2) {year, brand, country}
         3) {year, brand, province_or_state}
         4) {item_name, province_or_state} where year = 2004
         Which should be selected to process the query?

l   Explore indexing structures & compressed vs. dense arrays in MOLAP
                                                                                     24
             What to Materialize?
l    Store in warehouse results useful for
     common queries
l    Example:                        total sales
                   day 2             ...
                  day 1




                                              129

    materialize

                                                    25
        Materialization Factors
l   Type/frequency of queries
l   Query response time
l   Storage cost
l   Update cost

      Will study a concrete algorithm later



                                              26
                  Iceberg Cube
l       Computing only the cuboid cells
        whose count or other aggregates
        satisfying the condition like
                 HAVING COUNT(*) >= minsup
    l   Motivation
         u   Only a small portion of cube cells may be “above the
             water’’ in a sparse cube
         u   Only calculate “interesting” cells—data above certain
             threshold



                                                                     27
          Challenges in MOLAP
l   Storing large arrays for efficient access
    u   Row-major, column major
    u   Chunking
    u   Compressing sparse arrays
l   Creating array data from data in tables
l   Efficient techniques for Cube computation

Topics are discussed in the paper for reading

                                                28

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:8
posted:7/22/2013
language:English
pages:28
Lingjuan Ma Lingjuan Ma MS
About work for China Compulsory Certification. Some of the documents come from Internet, if you hold the copyright please contact me by huangcaijin@sohu.com