Introduction to Data Warehouse

Document Sample
Introduction to Data Warehouse Powered By Docstoc
					   Introduction to
  Data Warehouse

                          (slides in this section
                          are used courtesy of
                          Carrig Emerging Technology
                          Ph: 410- 553- 6760
                          www.c a r r i g e t. c o m )

                                                         1




Introduction to Data Warehousing and Data
 Introduction to Data Warehousing and Data
                  Mining
                   Mining
1) Data Warehouse Introduction
2) Engineering Conflicts
3) OLTP and DSS
4) Stovepipe vs. Integration
5) Data Warehouse Solution
6) Enterprise Information System
7) Security in a Data Warehouse
8) Moving Data to a Data Warehouse
9) Data Marts
10) Data Mining
                                                         2




                                                             1
                      Introduction
                       Introduction

• Key topics for this course include:
   – Data Warehouse
   – Data Mart
   – Data Mining

• Background and review of relational database
  systems
• Main focus on data warehouse and data mining




                                                        3




         Data Warehouse Introduction
         Data Warehouse Introduction
• A data warehouse is a single source for key,
  corporate information needed to enable business
  decisions

• A database application is a piece of software that
  provides a user interface for users to add, delete,
  query and update data

• Typically, a database management system is
  used to actually do the work of adding, deleting,
  querying or updating data

                             Database         Data
    Application              System

                                                        4




                                                            2
 Engineering Conflicts, Query and Update
 Engineering Conflicts, Query and Update

• It is often an engineering problem when data is
  updated and long-running queries occur at the
  same time
• In some cases, the users who are doing updates
  must wait for queries to complete
• One way to avoid this is to make a read-only
  copy of data


                             Database System
       Application
                               Data          Data
                            for update     for query




                                                              5




              OLTP and DSS Defined
              OLTP and DSS Defined
• An application that updates is called an on-line
  transaction processing (OLTP) application

• An application that issues queries to the read-
  only database is called a decision support system
  (DSS)


         OLTP
       Application
                                         Database System


                                         OLTP          DSS
         DSS                             Data          Data
      Application


                                                              6




                                                                  3
     Applications in a Typical Enterprise
     Applications in a Typical Enterprise
• Most organizations have several disparate
  OLTP/DSS applications in several databases

  Finance                     Inventory                 Sales
   OLTP                         OLTP                    OLTP
 Application                 Application              Application



  Finance                    Inventory                  Sales
    DSS                         DSS                      DSS
 Application                Application               Application


                           DATABASE SYSTEM
   Finance                    Inventory                    Sales
    OLTP                        OLTP                       OLTP
    Data                        Data                       Data

   Finance                    Inventory                    Sales
     DSS                         DSS                       DSS
    Data                        Data                       Data

                                                                    7




                Stovepipe vs Integration
                Stovepipe vs Integration
• When systems stand by themselves they are
  often referred to as “stovepipes”

• Systems that easily share data are called “well
  integrated systems”

              Finance
               OLTP                           Inventory
             Application                        OLTP
                                             Application




              Finance
                DSS                           Inventory
             Application                         DSS
                                             Application

                                                                    8




                                                                        4
   Problems with Stovepipe Architecture
   Problems with Stovepipe Architecture
• Problems:
   – Users who wish to access data must query several different DSS to
     find it
   – Data may have fundamental conflicts between DSS
       – a department code table in one DSS may differ in another DSS
       – a measurement may be stored in meters in one DSS and yards in
         another


• Solution:
   – Use a data warehouse, where data is integrated
      from the several different stovepipe systems
   – Data warehouse is really sharing-lite -- you
      don’t have to co-ordinate as much when applications are built
     and you still reap the benefits of data sharing


                                                                         9




              Data Warehouse Solution
              Data Warehouse Solution

• A data warehouse is an attempt to integrate
  separate DSS so that users can query one place
  to find the answers to their questions


• A data warehouse has the key, corporate data in
  the organization

• A data warehouse tracks historical
     data




                                                                      10




                                                                             5
       Data Warehouse -- A Success Story
       Data Warehouse A Success Story

• Largest data warehouse is Wal-Mart (9 TB)

• Uses for Wal-Mart data warehouse
   – Identifies where a new store should be built based on customer demand
   – Identifies how stores are performing across the nation
   – Contains every “scan” from every purchase



• Benefits Wal-Mart gained from their data warehouse
   – Provided competitive advantage over K-Mart
   – Reduced excess inventory in individual stores
   – Avoided wasted funds in building stores which would fail


                                                                     11




            Selling the Data Warehouse
            Selling the Data Warehouse
• A data warehouse project will fail without
  corporate sponsorship
   – Preferably, the project should be sponsored by the CEO
   – The CEO must be sold on the value to the business to
      improve competitive advantage by deploying a data warehouse

• If an active, corporate sponsor does not exist,
  data sources will be very difficult to identify

• Only add data to the warehouse
     that will answer key,
     corporate questions asked
     by the corporate sponsor.
  Otherwise, you will have a data dump
                                                                     12




                                                                             6
      Building a Useful Data Warehouse
      Building a Useful Data Warehouse
• You really need:
   – strong executive sponsorship
   – good knowledge of the data
   – sound software engineering
   – stability from source systems
   – users who want a success


• A 75 percent failure rate is often cited

• It is WORTH the effort!!!



                                                     13




         Enterprise Information System
         Enterprise Information System
• An EIS (Enterprise Information System) allows
  users to query data in a data warehouse

• Users can access key, corporate data in the data
  warehouse
                           Enterprise
                          Information
                            System




                    Data Warehouse

                                                     14




                                                          7
Users of an Enterprise Information System
Users of an Enterprise Information System
• Frequently, multiple EIS are needed to satisfy
  different types of users
   – Some users only want a system that has pre-defined reports so they
     only need to “click one button” to see data they need. These users
     want the system to be no harder to use than a “coffee pot”
   – Other users want to delve into the data and build their own queries


• Executives want a high-level, summary
  data and a simple tool
   – Must be VERY easy to use, users want to click a few
      buttons and get data they want
   – Results must be graphs
   – Users should be able to drill-down into key areas.



                                                                      15




Users of an Enterprise Information System
Users of an Enterprise Information System
• Analysts want a flexible, more detailed tool
   – Often very knowledgeable about the data
   – Willing to do more work to learn about the data
   – Sometimes even learn SQL to issue their own
      ad-hoc queries


• General users want a tool that provides detailed
  data, but is very easy to use
   – Want access to the data warehouse to do
      routine tasks such as
      “Find me Hank’s phone number”, etc.
   – Simple application, but not so focused
     on large reports



                                                                      16




                                                                           8
                          Data Warehouse // EIS
                          Data Warehouse EIS
                                                          Inventory
                                                            OLTP
             Finance                   Inventory            Data
              OLTP                                                      Sales
                                         OLTP                           OLTP
            Application               Application                     Application




Finance                                             Enterprise            S a lle s
                                                                          Sa es
 OLTP                                              Information            OLTP
                                                                          OLTP
 Data                                                                     Data
                                                     System               Data




                                    Data Warehouse

                          Finance      Inventory          Sales
                          Subject       Subject          Subject
                           Area          Area              Area



                                                                                      17




                    Need for Data Warehouses
                    Need for Data Warehouses

     • Data warehouses provide a single place to store key
       corporate data
          – The idea is that users can go one place to find this key data using an
            enterprise information system (EIS)


     • Data warehouse is also a place to store and access
       historical data
          – Users measure performance goals for their company over a period of
            time
          – Company statistics are available
          – Data not stored in the same place is difficult to locate
            and compare, easily lost
          – Single query can be used to access key data


                                                                                      18




                                                                                           9
             Security in Data Warehouse
             Security in Data Warehouse
• Building a data warehouse does increase security
  risk because key, corporate information is all in
  one place

• To mitigate that risk, database system
  components can be used to protect the data
  warehouse. These include
   –   Views
   –   Access control
   –   Security Administration
   –   Encryption
   –   Audit




                                                     19




   Moving Data into the Data Warehouse
   Moving Data into the Data Warehouse
• Moving data from source OLTP systems to the
  data warehouse is the hard part of data
  warehousing

• Updates to the data warehouse are performed
  periodically
   – weekly
   – nightly
   – monthly


• Occasionally, real-time
      data is needed in a data warehouse, but this
  is not very common


                                                     20




                                                          10
        Using Middleware to Move Data
        Using Middleware to Move Data
• Data can be moved to the warehouse via data
  migration software

• This is often called “middleware” because it sits
  between the source OLTP and the data
  warehouse


                               Data
                            Warehouse
    Source                  Migration                                  Data
     OLTP                    Software                                Warehouse
    System                 “Middleware”




                                                                                      21




                 Need for a Data Mart
                 Need for a Data Mart
• A data mart is a subset of the data warehouse
  that may make it simpler for users to access key
  corporate data
   – Sometimes, users only need a piece of data from the data
     warehouse

• The data mart is typically fed from the data
  warehouse
                                            Data Warehouse

                                 Finance               Inventory             Sales
                                 Subject                Subject             Subject
                                  Area                   Area                Area




                                           New York                California
                                           Data Mart               Data Mart


                                                                                      22




                                                                                           11
                             Data Mart in Action
                             Data Mart in Action
                                                            Inventory
                                                              OLTP
               Finance                   Inventory            Data
                OLTP                                                      Sales
                                           OLTP                           OLTP
              Application               Application                     Application




Finance                                               Enterprise              S a lle s
                                                                              Sa es
 OLTP                                                Information              OLTP
                                                                              OLTP
 Data                                                                         Data
                                                       System                 Data




                                      Data Warehouse

                            Finance      Inventory          Sales
                            Subject       Subject          Subject            California
New York                     Area          Area              Area             Data Mart
Data Mart


                                                                                          23




                       Data Mining Introduction
                       Data Mining Introduction
     • Data Mining is done by running software that
       examines a database and looks for patterns in the
       data

     • A data warehouse by itself will respond to queries
       from users
            – It will not tell users about patterns in data that users may not have
              thought about
            – To find patterns in data, data mining is
                used to try and mine key information from
              a data warehouse




                                                                                          24




                                                                                               12
             Advantages of Data Mining
             Advantages of Data Mining
• Data mining allows companies to collect
  information and make them more productive and
  beat their competition

• Data mining helps identify
   – why customers buy certain products
   –   ideas for very direct marketing
   –   ideas for shelf placement
   –   training of employees vs. employee retention
   –   employee benefits vs. employee retention




                                                      25




             Implementing Data Mining
              Implementing Data Mining
• Apply data mining tools to run data mining
  algorithms against data
• There are two approaches:
   – Copy data from the Data Warehouse and mine it
   – Mine the data in the Data Warehouse

• Popular tools use a variety of different data
  mining algorithms:
   – association rules
   – genetic algorithms
   – decision trees
   – neural networks




                                                      26




                                                           13
       Data Mining using Separate Data
       Data Mining using Separate Data
• You can move data from the data warehouse to
  data mining tools
   – Advantages
      – Data mining tools may organize data so they can run faster
   – Disadvantages
      – Could be very expensive to move large
        amounts of data
                                                         Data Warehouse




                                      Data Mining Tool
          Copy of data made
                by the
           Data Mining Tool


                                                                     27




 Data Mining Against the Data Warehouse
 Data Mining Against the Data Warehouse

• Data mining tools can access data directly in the
  Data Warehouse
   – Advantages
      – No copy of data is needed for data mining
   – Disadvantages
      – Data may not be organized in a way that is
        efficient for the tool
                                                     Data Warehouse




                         Data Mining Tool




                                                                     28




                                                                          14
                Data Mining: Summary
                Data Mining: Summary
• Data mining attempts to find patterns in data that
  we did not know about
• Often data mining is just a new buzzword for
  statistics
• Data mining differs from statistics in that large
  volumes of data are used
• Many different data mining algorithms exist and
  we will discuss them in the course
• Examples
   – identify users who are most likely to commit credit card fraud
   – identify what attributes about a person most results in them buying
     product x.
                                                                              29




           SQL Review

                                               (slides in this section
                                               are used courtesy of
                                               Carrig Emerging Technology
                                               Ph: 410- 553- 6760
                                               www.c a r r i g e t. c o m )


                                                                              30




                                                                                   15
                   Introduction to SQL
                    Introduction to SQL



  1) Introduction to SQL
  2) Data Definition Language (DDL)
  3) Data Manipulation Language (DML)
  4) SELECT Construct
  5) SELECT Operators
  6) Wildcard Searches
  7) Aggregate Operators
  8) Calculated Attributes
  9) Sorting Results


                                                             31




Introduction to Structured Query Language
 Introduction to Structured Query Language

• Structured Query Language (SQL) is the language
  used to communicate with a relational database
   – Industry standard
   – Based on set theory


• SQL composed of two types of constructs:
   – Data Definition Language (DDL)
      – Defines the structure of the database
   – Data Manipulation Language (DML)
      – Provides the constructs to input and retrieve data




                                                             32




                                                                  16
                 SQL Overview -- DDL
                 SQL Overview DDL
• Data Definition Language (DDL) is used to
  describe the structure of the database
   – Create tables, indexes, etc.
   – Typical Operations are:
      – CREATE TABLE defines what columns are in the table and
        establishes the table
      – CREATE INDEX defines an index for the table. Indexes are used
        to improve database performance




                                                                         33




                 SQL Overview -- DML
                 SQL Overview DML
• Data Manipulation Language (DML) is used for
  storing, updating, and retrieving data.

• Typical operations include:
   – SELECT is used to retrieve data.
      – Ex: SELECT * FROM PRODUCTS
   – INSERT is used to add new rows to the database.
      – INSERT INTO PRODUCTS VALUES ('food',
        'hardware', 'housewares')
   – UPDATE is used to change rows that already exist in the database.
      – UPDATE PRODUCTS SET PRICE = PRICE + 4
   – DELETE is used to eliminate rows of data from the database.
      – DELETE FROM PRODUCTS


                                                                         34




                                                                              17
                   SELECT Overview
                   SELECT Overview
• SELECT is used to retrieve records from the
  database.

• Single table SELECT constructs:
   –   WHERE
   –   IN
   –   BETWEEN
   –   LIKE
   – Aggregate Operators
   – DISTINCT
   – ORDER BY




                                                    35




                 SELECT Examples
                 SELECT Examples
• Query Purpose: Retrieve names and prices of all
  products

SELECT ProductName, Price
 FROM TinyProducts

• Query Purpose: Retrieve all information for all
  employees from the TinyProducts table

SELECT *
 FROM TinyProducts


                                                    36




                                                         18
               SELECT with WHERE
               SELECT with WHERE
• The WHERE clause is used to filter which
  information is returned from a SELECT

• Query Purpose: Retrieve all information only for
  product type of “food”

SELECT *
 FROM TinyProducts
 WHERE ProductType = ‘Food’




                                                     37




          Use of Boolean Operators
          Use of Boolean Operators
• Conditions can be separated by Boolean
  operators:
   – AND, OR, NOT


• Query Purpose: List all information about food
  products that are either cereal or fruit

SELECT *
FROM TinyProducts
WHERE (ProductName = 'Cereal')
OR (ProductName = 'Fruit')



                                                     38




                                                          19
            Boolean Operator Example
            Boolean Operator Example
• Query Purpose: List the names of all products
  that the type is fruit and the price is less than
  $2.00

SELECT ProductType, ProductName
   FROM TinyProducts
 WHERE Price < 2
   AND ProductName = 'Fruit'




                                                                       39




                         IN Operator
                         IN Operator
• The IN operator allows a search for records that
  match one value in a set of unordered values

• Example questions to use IN:
   – 'Find all products whose type is Food, Hardware, or Housewares'
   – 'Find all food whose type is Meat, Fish, Vegetables, or Fruit'




                                                                       40




                                                                            20
                         IN Example
                         IN Example
• Query Purpose: List the name of Housewares that
  are Cookware, Linens, or Dishes

SELECT ProductName, ProductType
   FROM TinyProducts
  WHERE ProductName in
   ('Cookware', 'Linens', 'Dishes')

instead of:

SELECT ProductName, ProductType
   FROM TinyProducts
 WHERE (ProductName = ’Cookware')
     OR (ProductName = 'Linens')
      OR (ProductName = 'Dishes')                                     41




                    BETWEEN Operator
                    BETWEEN Operator
• The BETWEEN operator allows a search for a range
  of values

• Example Queries:
   – 'Find all fruit between Bananas and Grapes'
   – 'Find all cereals whose price is between $1.50 and $4.00 a box




           1.50                                     4.00




                                                                      42




                                                                           21
                    BETWEEN Example
                    BETWEEN Example
• Query Purpose: Find all products whose price is
  between $2.00 and $8.00

SELECT ProductName, Price
   FROM TinyProducts
  WHERE Price BETWEEN 2.00 AND 8.00

instead of:

SELECT ProductName, Hardware
   FROM TinyProducts
  WHERE (Price >= 2.00) OR (Price <= 8.00)

                                                    43




           Wildcard Searches of Strings
           Wildcard Searches of Strings
• The LIKE operator is used to search parts of a
  string

• The following wildcard characters are used:
   % to match any zero or more characters
   _ to match exactly one character




                                                    44




                                                         22
             Wildcard Search Examples
             Wildcard Search Examples

• Query Purpose: List all products whose name
  starts with an ’C'
SELECT *
   FROM TinyProducts
  WHERE ProductName LIKE 'C%'

• Query Purpose: List all products that have a SKU
  number with the last 2 characters of ’23' when
  you don't know the first character

SELECT *
   FROM TinyProducts
  WHERE SKUNumber LIKE '_23'
                                                                          45




                  Aggregate Operators
                  Aggregate Operators
• MIN, MAX, and AVERAGE are used when computing
  statistics on a range of data

• Query Examples:
   – 'What is the highest batting average on the team?'
   – 'What is the average number of hits for all the little league teams in
     the National League?'
   – 'What are the names of the players that had the lowest average on
     the little league team?'




                                                                          46




                                                                               23
        Aggregate Operators Example
        Aggregate Operators Example

• Query Purpose: Find the minimum, maximum,
  and average batting average of all players in the
  National League of Little League

SELECT MIN(Average), MAX(Average),
       AVG(Average)
   FROM PLAYERS
  WHERE League = 'National'




                                                      47




           SUM and COUNT Operators
           SUM and COUNT Operators
• Use the SUM operator to total the results of a
  query

• COUNT will count the total number of occurrences
  of an item in a search




       1+2+3+4
                                                      48




                                                           24
                SUM And COUNT Examples
                SUM And COUNT Examples
•   Query Purpose: Find the total number of
    homeruns hit by all players in the American
    League?
    SELECT SUM(HomeRuns)
    FROM PLAYERS
    WHERE League='American'

•   Query Purpose: List the names of players that
    have hit 3 home runs in the National League?

    SELECT COUNT(*)
    FROM PLAYERS
    WHERE HomeRuns = '3'
    AND League = 'National'
                                                                     49




                  Calculated Attributes
                  Calculated Attributes
• A new attribute can be obtained by using
  arithmetic operators (+,-, *, /) on other
  numeric attributes

• All operators follow standard precedence:
    – Multiplication and division are computed first left to right
    – Addition and subtraction are computed last left to right
    – Use parenthesis to override the standard precedence




(+,-, *, /)
                                                                     50




                                                                          25
        Calculated Attributes Example
        Calculated Attributes Example
Query Purpose: List all players with their hits, at
bats, and their batting average

SELECT Name, Hits, AtBats,
     (Hits / AtBats)
FROM PLAYERS




                                                      51




               DISTINCT Operator
               DISTINCT Operator
• DISTINCT is used to exclude duplicate
  occurrences in the result of a query

• Query Purpose: List all distinct batting averages

  SELECT DISTINCT(Average)
    FROM PLAYERS




                                                      52




                                                           26
            Sorting Query Results
            Sorting Query Results
• The ORDER BY clause is used at the end of the
  SELECT statement to sort the results of a query

• Use DESC on the end of the ORDER BY clause to
  sort the data in descending order. Otherwise, the
  result will be in ascending order




                                                      53




                Sorting Example
                Sorting Example
• Query Purpose: List all players in ascending
  order of their batting average

  SELECT Name, Average
  FROM PLAYERS
  ORDER BY Average

• For descending order add the keyword DESC

  SELECT Name, Average
  FROM PLAYERS
  ORDER BY Name DESC
                                                      54




                                                           27
        Sorting Calculated Attributes
        Sorting Calculated Attributes
• To refer to a computed attribute in the ORDER BY,
  use its position in the list of columns following
  SELECT

• Query Purpose: List all players in descending
  order of their batting average (here we assume
  batting average is computed at the time of the
  query)

  SELECT Name, Hits, AtBats,
         Hits / AtBats
  FROM PLAYERS
  ORDER BY 3 DESC
                                                   55




                    More SQL
                    More SQL


   1) GROUP BY Construct
   2) HAVING Filter
   3) Multiple Tables
   4) Joins
   5) Equijoins
   6) Cartesian Product
   7) Nulls
   8) OUTER JOIN

                                                   56




                                                        28
                  GROUP BY Clause
                  GROUP BY Clause
• GROUP BY will partition a table into multiple
  groups of related rows.

• As an example, consider the EMPLOYEE table
  where Department partitions the EMPLOYEE set
  into subsets:

                     Engineering

      Marketing                        Customer

                       Finance


                                                  57




               GROUP BY Example
               GROUP BY Example
• Query Purpose: For each department, list the
  average salary using the EMPLOYEE table

SELECT Department, AVG(Salary)
  FROM EMPLOYEE
  GROUP BY Department




                                                  58




                                                       29
           GROUP BYWith WHERE
           GROUP BY With WHERE
            GROUP BY With WHERE
             GROUP BY     WHERE
• To filter data further, we can use the WHERE
  clause with GROUP BY clause

Query Purpose: For each department, list the
  highest salary of their administrative assistants.

  SELECT Department, MAX(Salary)
  FROM EMPLOYEE
  WHERE Title='administrative assistant'
  GROUP BY Department



                                                       59




               HAVING Construct
               HAVING Construct
• HAVING is used to restrict the output of aggregate
  functions, such as SUM, MIN, MAX and AVG, to only
  those groups of rows that meet some condition.

Query Purpose: List the average salary for all
  departments that have more than three
  employees.

  SELECT Department, AVG(Salary)
     FROM EMPLOYEE
  GROUP BY Department
  HAVING COUNT(*) > 3


                                                       60




                                                            30
                    Multi-Table SQL
                    Multi-Table SQL
    • It is often necessary to combine data into multiple
      tables.

   EMPLOYEE                          ATTENDS
EmpID Name Salary              EmpID     Name
1         Fred    200          1           Harvard
2         Ethel 300            2           GMU
3         Mike    400          2           Yale
4          David 100           3           MIT
                               3           Stanford
                               3           GMU

                                                         61




                           Joins
                           Joins
    • Joins are the means by which multiple tables can
      be combined.

    • A join allows us to combine data from different
      tables. A join operation is done through the
      SELECT construct.

    • Types of Joins: Equijoin, Outer Join, Inner Join




                                                         62




                                                              31
                    Equijoin
                    Equijoin
• Joins only those rows where a foreign key
  matches the primary key

• Allows information from multiple tables to be
  linked together in a single query

• Can be used to link as many tables as needed in a
  single query




                                                  63




           Equijoin Query Example
           Equijoin Query Example

• Query Purpose: List the names of all colleges
  attended by Ethel

SELECT b.Name
 FROM EMPLOYEE a, ATTENDS b
 WHERE a.EmpID = b.EmpID
   AND a.Name = 'Ethel'




                                                  64




                                                       32
                      Equijoin Example
                      Equijoin Example

       EMPLOYEE
 EmpID     Name    Salary

 1         Fred    200
 2         Ethel   300
 3         Mike    400



                            ATTENDS
             EmpID             College          GPA

             1                 Harvard          2.45
             2                 GMU              3.79
             2                 Nova             3.65
             3                 Yale             2.85
             3                 Nova             2.65
             3                 GMU              4.0



                                                                         65




            Warning about Joining Tables
            Warning about Joining Tables
• A join is really just a subset of a cartesian
  product. When no fields are 'joined' in the WHERE
  clause, a cartesian product is produced
     – Restated in English: When the linking condition is omitted from
       the WHERE clause, you get a lot of excess garbage that you
       probably do not want.

Sample Query:

     SELECT b.Name
     FROM EMPLOYEE a, ATTENDS b
     WHERE a.Name = 'Ethel'


                                                                         66




                                                                              33
               Cartesian Product
               Cartesian Product
• Each row in one table with every other row in other
  table
a.EmpID a.Name a.Salary         b.EmpID     b.GPA

2        Ethel     300          1           3.4
2        Ethel     300          2           2.8
2        Ethel     300          3           3.7
2        Ethel     300          4           3.5
         ....




                                                    67




                       Nulls
                       Nulls
• An attribute may be defined as null.

• This indicates that the value is unknown and
  avoids the need for user-defined special
  indicators.

• To prevent a column from having nulls, specify
  NOT NULL on the column in the CREATE TABLE
  statement when setting up the database.




                                                    68




                                                         34
                 Nulls Examples
                 Nulls Examples
Statement Purpose: Add an employee whose salary
  is unknown

INSERT INTO EMPLOYEE (3,'Hank', NULL)

Query Purpose: Find all employees whose salary is
  unknown (or null)

SELECT *
 FROM EMPLOYEE
 WHERE Salary IS NULL


                                                      69




                    OUTER JOIN
                    OUTER JOIN
• An OUTER JOIN is used when the query should
  return a result row even for rows that do not have
  corresponding data in one of the tables.

• A LEFT OUTER JOIN returns all rows from the
  'left' table.

• Nulls are returned when a row in the 'left' table
  has no corresponding rows in the right table.




                                                      70




                                                           35
            LEFT OUTER JOIN Example
            LEFT OUTER JOIN Example
• Query Purpose: List the college GPAs for each
  employee. Include employees who have not
  attended any colleges

SELECT a.Name, b.GPA
 FROM EMPLOYEE a
 LEFT OUTER JOIN ATTENDS b
 on a.EmpID = b.EmpID




                                                                      71




           LEFT OUTER JOIN Example
           LEFT OUTER JOIN Example
• Result of the outer join
   – All employees are listed.
   – For an equijoin, only those who attended a college would be listed
   – Here, employee number 4 did not attend college, but is still
     retrieved by the outer join.

          Name                   GPA

              ---------- -----
                Fred       2.45
                Ethel      3.79
                Ethel      3.65
                Mike       2.85
                Mike       2.65
                Mike       4.00
                David      NULL                                       72




                                                                           36
  Advanced SQL

                        (slides in this section
                        are used courtesy of
                        Carrig Emerging Technology
                        Ph: 410- 553- 6760
                        www.c a r r i g e t. c o m )

                                                       73




            Advanced SQL
            Advanced SQL



1) Finding the nth element in a list
2) Finding the median
3) Correlated subquery
4) Data Definition Language Constructs




                                                       74




                                                            37
                 Find the Nth Element
                 Find the Nth Element
• It is very common to try to find the nth element in
  a list.
   – Examples:
       – Who makes the second highest salary in marketing department?
       – What is the fifth best product in sales?
   – This can be done with a program that uses SQL to access the
     database: SQL is sent to the database and the program keeps
     retrieving the result set until the threshold is crossed.

• We show another way of
  doing this using standard SQL.




                                                                        75




   Find the Nth Element: Example Table
   Find the Nth Element: Example Table
• Consider a table, called TEST, with just one
  column, x, with the following values:

                                    X
                                    4
                                    5
                                    8




                                                                        76




                                                                             38
         Find the Nth Element: Step 1
         Find the Nth Element: Step 1
• First join TEST with itself, this yields each
  element matched with every other element:

                   4                   4
                   4                   5
                   4                   8
                   5                   4
                   5                   5
                   5                   8
                   8                   4
                   8                   5
                   8                   8
                                                                               77




         Find the Nth Element: Step 2
         Find the Nth Element: Step 2
• Next keep only those rows where the first column
  is greater than or equal the second column.
4          4                       4                    4
4          5                       5                    4
4          8                       5                    5
5          4                       8                    4
5          5                       8                    5
5          8                       8                    8
8          4
8          5       Notice the pattern that just developed, each number on the
                   list now has a certain number of values that match on the
8          8       right. This number matches the position of this value in
                   the list. For example, 4 has only one match as it is the first
                   number in the list, 5 has two matches, 8 has three matches.

                                                                               78




                                                                                    39
        Find the Nth Element: Step 3
        Find the Nth Element: Step 3
• Now group by the column on the left and identify
  the size of each group.

• The same ideas can be applied to any SELECT
  statement output.

         4          4        4          1
         5          4        5          2
         5          5        8          3
         8          4
         8          5
         8          8

                                                     79




     Finding the Nth Element: Example
     Finding the Nth Element: Example
• Query Purpose: Find the information about the
  product with the second highest price.

SELECT a.ProductName, a.ProductType,
       a.Price, a.SKUNumber
  FROM TinyProducts a, TinyProducts b
 WHERE a.Price >= b.Price
 GROUP BY a.ProductName,a.ProductType,
          a.Price, a.SKUNumber
HAVING COUNT(*) =
      (SELECT COUNT(*)-1 FROM TinyProducts)




                                                     80




                                                          40
    Finding the Top N Elements: Example
    Finding the Top N Elements: Example
• To ask for the top n values instead of the nth
  value, specify a range (>=) instead of just an
  equality (=) in the HAVING.

• Query Purpose: Find information about the
  products with the two highest prices.

  SELECT a.ProductName, a.ProductType, a.Price, a.SKUNumber
       FROM TinyProducts a, TinyProducts b
     WHERE a.Price >= b.Price
        GROUP BY a.ProductName,a.ProductType,
                a.Price, a.SKUNumber
    HAVING COUNT(*) >=
        (SELECT COUNT(*)-1 FROM TinyProducts)
      ORDER BY a.Price


                                                                81




                   Finding the Median
                   Finding the Median
• The median is defined as the element in the
  middle of the list.

• Query Purpose: Find the median price in
  TinyProducts.

SELECT   a.ProductName, a.ProductType, a.Price, a.SKUNumber
  FROM   TinyProducts a, TinyProducts b
 WHERE   a.Price >= b.Price
 GROUP   BY a.ProductName,a.ProductType, a.Price, a.SKUNumber
HAVING   COUNT(*) = (SELECT (COUNT(*)/2)+1 FROM TinyProducts)




                                                                82




                                                                     41
                       Using Subqueries
                       Using Subqueries
• A subquery may be used in the middle of a query.

• Query Purpose: Find the information about the
  highest priced product, using a simple subquery.

SELECT a.ProductName, a.ProductType, a.Price, a.SKUNumber
  FROM TinyProducts a
 WHERE Price = (SELECT MAX(PRICE) FROM TinyProducts)




                                                                         83




                    Correlated Subquery
                    Correlated Subquery
• If the subquery references a data element from
  outside of the subquery, it is called a correlated
  subquery.
    – For each row in the outer part of the query, the correlated subquery
      is executed.

The following query will indicate who makes more money than ‘Ethel’
SELECT a.Name, a.Salary
  FROM Employee a WHERE EXISTS
      (SELECT     b.Salary
         FROM     Employee b
        WHERE     a.Salary > b.Salary
          AND     b.Name = 'Ethel')



                                                                         84




                                                                              42
            Other Data Manipulation
            Other Data Manipulation

• INSERT
   – Add rows to a single table


• UPDATE
   – Modify rows in a single table


• DELETE
   – Remove rows from a single table




                                                    85




                  INSERT Examples
                  INSERT Examples
• Statement Purpose: Add a record for employee
  #1, ’Fred' with a salary of 200 to the EMPLOYEE
  table

INSERT INTO Employee VALUES
    (1, ’Fred', 200)

• Statement Purpose: Copy all rows in the
  EMPLOYEE table and place them in NEW_EMPLOYEE

INSERT INTO New_Employee
 SELECT * FROM Employee


                                                    86




                                                         43
                UPDATE Example
                UPDATE Example
• Statement Purpose: Modify Fred’s salary to 150

UPDATE Employee
  SET Salary = 150.00
  WHERE Name = 'Fred'

• Statement Purpose: Give all employees a ten
  percent raise

UPDATE Employee
  SET Salary = Salary * 1.10


                                                   87




               DELETE Examples
               DELETE Examples
• Statement Purpose: Remove all employees who
  have a salary higher than 100.

DELETE FROM Employee
  WHERE Salary > 100

• To remove all employees:

DELETE FROM Employee




                                                   88




                                                        44
            CREATE TABLE Example
            CREATE TABLE Example
• Statement Purpose: Create a table to store
  employee information

  CREATE TABLE EMPLOYEE
       (EmpId       SMALLINT,
        Name        CHAR(10),
        Salary      DECIMAL(5,2))

To drop the EMPLOYEE table

      DROP TABLE EMPLOYEE


                                                               89




   Data Warehouse
       Security

                                (slides in this section
                                are used courtesy of
                                Carrig Emerging Technology
                                Ph: 410- 553- 6760
                                www.c a r r i g e t. c o m )

                                                               90




                                                                    45
              Data Warehouse Security
              Data Warehouse Security


   1) Key Security Services
   2) Views
   3) Access Control
   4) Roles
   5) Encryption
   6) Audit Trails
   7) Security Holes
   8) Intrusion Detection
   9) Misuse Detection

                                                                         91




                        Introduction
                         Introduction
• A key feature provided by database systems is
  good security services.
   – In a database system with good security, applications do not have
     to worry about problems that arise with security violations.


• A data warehouse also requires good security
  services because it holds key, corporate data.

                                         Database System


           EIS
                                            Security
                                            Services



                                                                         92




                                                                              46
                 Key Security Services
                 Key Security Services
• Access Control
   – Controls who accesses what data


• Administration of Access Control
   – Used to give access to users as well as track who has various
     accesses and what kind of accesses are given to a user or group of
     users
   – Audit tracks the usage of the data warehouse




                                                                          93




          Security in a Data Warehouse
          Security in a Data Warehouse
• A data warehouse consolidates organizations key
  data in one place.
   – A data warehouse increases the security risk that unauthorized
     users will try to obtain this data


• Security aspects of EIS applications must be
  designed and implemented very thoroughly.

• Access control and audits are
     two of the critical components
     of security.




                                                                          94




                                                                               47
   Data Warehouse Security Components
   Data Warehouse Security Components
• Database system components that can be used
  to protect a data warehouse include:
   – Views
      – Allow users to only see certain rows or columns of data
   – Access control
      – Indicate which users have access to what data
   – Administration
      – This component is used to actually give access to groups of users
        and to define the accesses given to either an individual or a group.
      – Encryption
      – Protect data from access outside of the DBMS
   – Audit
      – Track what users are doing



                                                                           95




               Views in Data Warehouse
               Views in Data Warehouse
• A view is a logical view into one or more tables.
  Users may be given access to the view without
  access to the base table.

• Views provide some security assistance because
  they can hide data from users.
        EMPLOYEE
        Name        Address                        Salary
          Hank       1 South Street                 $50,000
          Esther     2 North Street                 $80,000
          Tom        34 Main Street                 $90,000
          Sue        45 Easy Street                 $28,500
          Dave       56 5th Avenue                  $35,000
          Pete       7 Broadway                     $60,000
          Kathy      89 Western Avenue              $85,000




                                                                           96




                                                                                48
                            View Example
                            View Example
• A view called SAFE_EMPLOYEE may be created
  as:
       CREATE VIEW SAFE_EMPLOYEE AS
          (SELECT name, address FROM EMPLOYEE)

   Now users of the view SAFE_EMPLOYEE will not
   even know that salary exists.

SAFE_EMPLOYEE
Name        Address                      Salary
 Hank        1 South Street
 Esther      2 North Street
 Tom         34 Main Street                                VIEW (SAFE_EMPLOYEE)
 Sue         45 Easy Street
 Dave                                                “Salary” is effectively hidden
             56 5th Avenue
 Pete        7 Broadway
 Kathy       89 Western Avenue




                                                                            97




                           Updating Views
                           Updating Views
• Restrictions exist on updating views. For the
  EMPLOYEE table, it is possible to insert into the
  SAFE_EMPLOYEE view.
       – Example :
       INSERT INTO SAFE_EMPLOYEE VALUES (‘Hank’, 300)

       This will insert a NULL into the SALARY column of the base table
         EMPLOYEE.


• Other restrictions to view updates exist:
       – Cannot update a view that is defined with an aggregate
       – Cannot update a view that is defined with a GROUP BY



                                                                            98




                                                                                      49
         Data Warehouse Access Control
         Data Warehouse Access Control
• Access control is implemented in a data
  warehouse with the SQL Grant and Revoke
  commands.
• Syntax
   – GRANT <ALL|UPDATE|DELETE|INSERT|SELECT> ON
     <object-name> TO <user name>
   – Example: GRANT SELECT ON EMPLOYEE TO MARY

• Access control is done by DBAs and creators of
  tables.

• To remove access the REVOKE command is used.
   – Example: REVOKE SELECT ON EMPLOYEE FROM MARY

                                                                           99




                       Database Roles
                       Database Roles
• Roles provide security administration by allowing
  users to be grouped into roles. Accesses may
  then be given to a group of users.
   – As an example, some roles for a company might be:
      – Administrative assistant
      – Loan officer
      – Salesperson

• Accesses may be assigned based on roles.
   – This dramatically simplifies administration.
   – If new tables are created, it is not necessary to add thousands of
     new accesses.
   – Examples:
       CREATE ROLE loan_officer AS (Hank, John, Mike)
       GRANT SELECT ON LOAN TO LOAN_OFFICER


                                                                          100




                                                                                50
     Example of Application-based Roles
     Example of Application-based Roles
• Consider:

                                  Database
 Users            Applications                 Data
                                  System



• If the database system controls accesses than it
  does not matter what the application does,
  accesses are controlled consistently (same for
  SALES as MARKETING)

• However, more fine-grained access control can
  be granted in the application.


                                                      101




                   Application Roles
                   Application Roles
• The application can restrict:
   – Data entry screens
   – Reports


• Care must be taken to restrict users in a
  consistent fashion so that a user cannot jump to
  a different application and avoid security set up
  by another application.




                                                      102




                                                            51
  Role Based Security in a Data Warehouse
  Role Based Security in a Data Warehouse
• Both application and database level security are
  useful in a data warehouse.

• Database level security is needed so that users
  are only allowed to see data they
      need to see.

• Application level security can
     be used to control access to
     certain menus so that users do
     not even know what reports exist.




                                                                        103




                          Encryption
                          Encryption
• Encryption is the process of coding data so that it
  can only be read by users who have the key that
  allows them to decrypt the data.
   – Example: A message “sell 500 shares” would appear as “xyzzy”
     without the key. Once the key is paired with the encrypted string
     “xyzzy”, it can then be decrypted.
   – The size of the key is a factor in how difficult it is to attack the
     encryption scheme.

• Three places where encryption might be used in a
  data warehouse:
   – Network
   – Data
   – Tape backups


                                                                        104




                                                                              52
                  Network Encryption
                  Network Encryption
• In a data warehouse application, data and queries
  are transmitted through a network.
   – Attackers might be able to steal network traffic just by breaking
     into the network medium.

• One way to reduce the risk of this threat is to
  encrypt traffic on the network.
                                                      User



                                         Network




                                                               Data
                                                             Warehouse
 Application                    Database
                                 System
                                                             Tape Backup

                                                                           105




                  Network Encryption
                  Network Encryption
• Network encryption is critical because the
  network connects all of the key components in a
  data warehouse.

• Encrypting network traffic mitigates the risk that
  an attacker could succeed with the “man in the
  middle” attack.

• Without this, it may be possible for the
  “man in the middle” to masquerade as
  another user and circumvent existing
  application and database security.



                                                                           106




                                                                                 53
                Data Encryption
                Data Encryption
• Data encryption refers to encrypting the actual
  data in the data warehouse.

• If the attackers were to retrieve data from
  the warehouse, they would have to
  decrypt it in order to read it.




     EIS              Database            Data
                       System           Warehouse




                                                          107




               Backup Encryption
               Backup Encryption
• Periodically, databases are copied to some kind
  of long-term storage (usually tapes).

• If the database is encrypted, but the tapes are not
  encrypted, the risk exists of someone walking off
  with the tapes.


      EIS             Database            Data
                      System            Warehouse




                                            Tape Backup




                                                          108




                                                                54
                          Audit Trails
                          Audit Trails
• Audit trails are a means of tracking queries,
  updates, deletes, and additions of new data to the
  data warehouse.
   – Audit trails are turned on when the DBMS is started and all
     activity that uses the data warehouse is tracked in the audit trail.

• If a user is suspected of an evil deed, the audit
  trail can be examined to identify what data has
  been accessed by users.




                                                                            109




              Details of DW Audit Trails
              Details of DW Audit Trails
• An audit trail of a database system typically
  includes the following information:
   – User ID, Date, Time, Object that has been accessed (table or view),
     Action that accessed the object (INSERT, UPDATE, DELETE,
     SELECT)
   – For UPDATE, the old value and new value is tracked.

• For data warehouses, the SELECT is often used
  to track the queries that have
       been run against the warehouse.




                                                                            110




                                                                                  55
         Other Uses for DW Audit Trails
         Other Uses for DW Audit Trails
• Audit trails can be used to identify the most
  popular data in the warehouse.
   – This information can be used to optimize queries

• An additional use for audit trails is performance
  tuning of the data warehouse.
   – Administrators know where to focus their efforts
   – Reduces administrative overhead




                                                                       111




     Dealing with Known Security Holes
     Dealing with Known Security Holes
• Commercial database systems and operating
  systems are often filled with holes that allow
  users to obtain unauthorized access.
   – To reduce the risk of these known holes, vendors often provide
     “fixes” to their products as soon as these holes become public.


• It is important to constantly keep up with known
  security holes and apply the latest fixes as soon
  as they are released.

• One of the key risks surrounding a data
  warehouse is that privileged users have the “keys
  to the kingdom”.

                                                                       112




                                                                             56
          The Risk of “Privileged Users”
          The Risk of “Privileged Users”
• "Privileged users" include:
   – Data warehouse administrators
   – Operating system programmers
   – Operators in the computer center

   – These users can:
      – Modify, delete and query any data in the warehouse
      – Modify the audit trail to mask their actions
      – Give other users unauthorized access


• Numbers of "privileged users" could
     be anywhere from 20 to 30 in some
  organizations.


                                                                        113




   Reducing the Risk of Privileged Users
   Reducing the Risk of Privileged Users
• One way to reduce the risk of privileged users is
  to separate security administration from database
  administration.
   – This would separate the task of giving accesses and managing the
     audit trail from the task of making sure the data in the warehouse
     was correct and properly optimized.




     Security Services
        Access Control                  Security             Database
            Audit                       Services             Services

     Database Services                                   Database Tuning
                                      Access Control
      Database Tuning                                   Query Optimization
                                          Audit
     Query Optimization                                      Backups
          Backups



                                                                        114




                                                                              57
           Information Security Attacks
            Information Security Attacks
• Two types of Information security attacks on data
  warehouses are:
   – Intrusion
       – An intrusion occurs when an unauthorized user gains access to the
         data warehouse.
   – Misuse
       – Misuse, often referred to as the insider
                problem occurs when a user who has access
                 to the warehouse uses that access for an
                unauthorized purpose


• Audit Trails can be used to
      identify either type of attack,   but
  identification of misuse is typically MUCH harder
  to do than intrusion.
                                                                        115




                   Intrusion Detection
                    Intrusion Detection
• An intrusion is defined as an unauthorized
  access to a system. The assumption is the user is
  external to the environment (e.g.; a hacker).
• To reduce the risk of intrusion, intrusion
  detection tools are used.
   – These tools monitor access to the data warehouse and sound an
     alarm if unauthorized accesses are detected.

                           INTRUSION DETECTION SYSTEM



                            DATA
    USER                  WAREHOUSE




                                                                        116




                                                                              58
                    Misuse Detection
                    Misuse Detection
• Unwanted access by a user that has the ability to
  access data is referred to as misuse.
   – This is also known as the insider problem.
   – Some estimates have shown that 80 % of computer crime is a
     result of misuse.

• For data warehouses the threat of misuse is high
  especially by privileged users.




                                                                  117




                          Summary
                          Summary
• DBMS Security is useful for data warehouses to
  hide data from users with views and to restrict
  access to data with GRANT and REVOKE.

• Application Level Security assists EIS that
  access data warehouses by hiding certain reports
  from users.
• Encryption can be used to further protect against
  the risk of someone walking off with the data
  warehouse.
• Audit Trails are useful for:
   – Catching attackers
   – Identifying usage trends of the data warehouse
                                                                  118




                                                                        59
 Moving Data
     to the
Data Warehouse
                        (slides in this section
                        are used courtesy of
                        Carrig Emerging Technology
                        Ph: 410- 553- 6760
                        www.c a r r i g e t. c o m )

                                                       119




 Moving Data to the Data Warehouse
 Moving Data to the Data Warehouse


1) Moving Data into the Data Warehouse
2) Updating the Data Warehouse
3) Full Refresh
4) Copy Only the Changes
5) BCP
6) Simple Transformations
7) Complex Transformations
8) Commercial ETL Tools

                                                       120




                                                             60
   Moving Data into the Data Warehouse
   Moving Data into the Data Warehouse
• Data must be moved to the data warehouse from
  source systems.

• Some key issues:
   – Determine the frequency of data updates -- how often should data
     be moved from source systems to the data warehouse.
   – Various means of updating data in the warehouse exist:
      – SQL Commands
      – Database system load programs (e.g.; SQL Server’s BCP)
      – Commercial tools




                                                                    121




          Updating the Data Warehouse
          Updating the Data Warehouse
• OLTP (On-Line Transaction Processing) Systems
  have to send their updates to the data warehouse.


  Finance                  Inventory                    Sales
   OLTP                      OLTP                       OLTP
 Application              Application                 Application




                          Data Warehouse

       Finance                 Inventory                Sales
       Subject                  Subject                Subject
        Area                     Area                   Area


                                                                    122




                                                                          61
       Frequency of Updates to the Data
       Frequency of Updates to the Data
                 Warehouse
                 Warehouse
• Updates may occur daily, weekly, monthly, or in
  real-time.

  Finance                   Inventory                      Sales
   OLTP                       OLTP                         OLTP
 Application               Application                   Application




                                                        Mo
                             Wee
      Da




                                                          nth
                                kly
        ily




                                                             ly U
            Up




                                    Upd
              da




                                                                 pda
                te




                                       ate




                                                                    te
                           Data Warehouse

       Finance                      Inventory              Sales
       Subject                       Subject              Subject
        Area                          Area                 Area


                                                                         123




   Determining the Frequency of Updates
   Determining the Frequency of Updates
• Requirements should drive update frequency

• Range of updates runs from real-time, to
  quarterly.
   – Real time update
      – Expensive
      – Requires update of warehouse while users are
        querying
   – Daily update
      – Somewhat cheaper than real time, but significant
        maintenance required if the warehouse has lots of tables.
   – Monthly or weekly update
      – Much more manageable




                                                                         124




                                                                               62
                     Updating the Warehouse
                     Updating the Warehouse
• Full Refresh vs. Only the Changes
                                                                          Sales
                           Inventory                                      OLTP
                             OLTP                                       Application
  Finance                 Application
   OLTP




                                                            F u l l ges for
                                                             cha
 Application




                             Ch




                                                                   n
                                                                   refr
                               an
                                 ges




                                                                        e s h ther ta
                                  sin




                                                                             of s
      Fu




                                                                              o
                                     ce
        ll R




                                                                                  o m e les
                                        las
            efr




                                           tu
               esh




                                                                                       tabl
                                                                                       b
                                             pd
                                               ate




                                                                                            es
                           Data Warehouse

       Finance                   Inventory                           Sales
       Subject                    Subject                           Subject
        Area                       Area                              Area


                                                                                                 125




                          Full Refresh
                          Full Refresh
• Copy the entire source table in the OLTP system
  to the destination table in the Data Warehouse.



     Source OLTP                                     Target Data Warehouse




                                                           Target
          Source                                           Table
           Table




                                                                                                 126




                                                                                                       63
                         Copy Only the Changes
                         Copy Only the Changes
      • Copy only the changes to the source table in the
        OLTP system to the destination table in the data
        warehouse.

                    Source OLTP                             Target Data Warehouse

                     Source
                      Table

                                                                       Target
                                                                       Table



Modified data
since last update
to the warehouse
                                                         Data from two updates ago.
                                                         Historical data no longer in
                                                         source OLTP.
                                                                                        127




                Full Refresh vs. Only the Changes
                Full Refresh vs. Only the Changes
      • Full Refresh
            – Pros
               – Much easier to implement
               – Less chance of messing up your database (good data integrity)
            – Cons
               – Can take a lot longer to actually do -- may “run out of night”
               – Can lose out on warehouse ability to track historical data.


      • Only the Changes (DELTA)
            – Pros
               – Tracks historical data
            – Cons
               – Can be very hard to implement
               – Can require changes in source applications (more on this later)
                                                                                        128




                                                                                              64
    Full Refresh Using INSERT-SELECT
    Full Refresh Using INSERT-SELECT
• One way to move data from one table to another
  is via the INSERT-SELECT.
   – Syntax: INSERT INTO <target_table>
                 <any sql SELECT statement>


• Example:
   INSERT INTO DW_EMPLOYEE
          SELECT *
              FROM EMPLOYEE




                   TARGET

                                                   129




  Updating Changes Using INSERT-SELECT
  Updating Changes Using INSERT-SELECT
• Changes may be moved by adding a WHERE
  clause to the INSERT-SELECT.

• Example:
   – INSERT INTO DW_EMPLOYEE
          SELECT *
              FROM EMPLOYEE
             WHERE DATE-UPDATED =
              DATEPART(m, CURRENT_TIMESTAMP)




                                                   130




                                                         65
                  Updating Using BCP
                  Updating Using BCP
• BCP is the bulk copy program that comes with MS
  SQL Server.
   – Bulk copy (BCP) moves data to or from a flat file to a SQL table.


• Syntax:
  bcp <table> [in | out] <data file>

                                                   Target Data
       Source OLTP                                  Warehouse

                     Unload                Load
         Source                Temporary              Target
                                  Flat
          Table                   File                Table




                                                                     131




                       BCP Example
                       BCP Example
• To bulk copy data from the publishers table in
  the pubs database to the publishers.txt data file
  in ASCII text format, execute from the command
  prompt:
   bcp pubs..publishers out publishers.txt -c
      -Sservername -Usa -Ppassword


• To bulk copy data from the publishers.txt file
  into the pub2 table in the pubs database, execute
  from the command prompt:
   bcp pubs..pub2 in publishers.txt -c
      -Sservername -Usa -Ppassword



                                                                     132




                                                                           66
                       Simple Transformation
                       Simple Transformation
 • In addition to moving data from OLTP to the
   warehouse, it is often necessary to transform
   data.
      – Example: System A stores TOTAL_CLOTH in meters and system
        B stores TOTAL_CLOTH in yards. Before the data is moved from
        system A, we need to transform the data.



                                                                                            Store 32
         Store 31                                                                     (Pattern = 32,
      (Pattern = 31,
                                           TRANSFORMATION                      Total Cloth = 20    meters)
TOTAL_CLOTH = 50    yards )


                                         Data Warehouse
                              P a t t e r n = 3 1 , T o t a l C l o t h = 5 0 yards
                              P a t t e r n = 3 2 , T o t a l C l o t h = 7 0 yards

                                                                                                         133




                       Complex Transformation
                       Complex Transformation
 • More complex transformations occur when a
   value in a source table must be moved to several
   locations in a data warehouse.

                                                                  BLUE3 4 8 4
                                                        (Color = Blue, 34 Inches, LS)



                                                                     34 in                84
                                                                                                 TO
                                                                                              RT
                                                           CONVERT TO                      VE    4
                                                                                        ON DE 8 and
                                                                                       C O
                              BLUE
                                                          CENTIMETERS                     C eves) bles
                                                                                          g sle o ta
                                                                                      (lon in tw
                                                                                        put


                                                        TABLE 2
            TABLE 1                                                                   TABLE 3      TABLE 4
                                                       86.36 cm
                                                                                        Long         Long
             COLOR                     Data Warehouse                                  Sleeves      Sleeves




                                                                                                         134




                                                                                                               67
                 Commercial ETL Tools
                 Commercial ETL Tools
• Key tools in the marketplace
   –   Informatica
   –   Ardent
   –   DecisionBase (Platinum)
   –   Microsoft Data Transformation Services


• All provide libraries of common transformations.

• All provide the ability to
       code complex transformations.




                                                 135




           Data Transformation Services
           Data Transformation Services




                                                 136




                                                       68
  Choose a Source
  Choose a Source




                       137




Choose a Destination
Choose a Destination




                       138




                             69
Choose to use a Query for Transfer
Choose to use a Query for Transfer




                                     139




        Enter SQL Query
        Enter SQL Query




                                     140




                                           70
Choose Destination TableName
Choose Destination TableName




                               141




   Verify Transformation
   Verify Transformation




                               142




                                     71
Decide When to Run Transformation
Decide When to Run Transformation




                                    143




        Final Verification
        Final Verification




                                    144




                                          72
                           Run Transformation
                           Run Transformation




                                                                                                 145




                                    Check Results
                                    Check Results

select *
   from orderfact

orderid   orderdate           productid productname                  quantity unitprice   discount

10248     1996-07-04 00:00:00.000   11   Queso Cabrales                 12    14.0000          0.0
10248     1996-07-04 00:00:00.000   42   Singaporean Hokkien Fried      10     9.8000          0.0
10248     1996-07-04 00:00:00.000   72   Mozzarella di Giovanni          5    34.8000          0.0
10249     1996-07-05 00:00:00.000   14   Tofu                            9    18.6000          0.0
10249     1996-07-05 00:00:00.000   51   Manjimup Dried Apples          40    42.4000           0.0




                                                                                                 146




                                                                                                       73
                     Summary
                     Summary
• ETL is one of the hard parts of building a data
  warehouse.

• Either full refreshes of data or just the changes
  may be done.

• Doing full refresh is easy, but historical data is
  lost and it may take a lot of time.

• Tracking changes is a tough business.

• ETL commercial tools are beginning to mature
  and can lessen the pain of this task.



                                                                  147




    More Ways of
   Moving Data to
         the
   Data Warehouse
                                   (slides in this section
                                   are used courtesy of
                                   Carrig Emerging Technology
                                   Ph: 410- 553- 6760
                                   www.c a r r i g e t. c o m )

                                                                  148




                                                                        74
          More Ways of Moving Data
          More Ways of Moving Data
           to the Data Warehouse
            to the Data Warehouse


  1) Determining What Data Has Changed
  2) Recovery Logs
  3) Triggers
  4) Insert Triggers
  5) Delete Triggers
  6) Update Triggers
  7) Manual Detection



                                                   149




         More Ways of Moving Data
         More Ways of Moving Data
          to the Data Warehouse
           to the Data Warehouse
• There is a need to move data into the data
  warehouse from OLTP and DSS applications

• The problem is detecting what data needs to be
  moved into the data warehouse

• Three methods:
      – Recovery Logs
      – Triggers
      – Manual Techniques




                                                   150




                                                         75
           Determining What Data Has Changed
           Determining What Data Has Changed

   • Problem: How to get updates made to the source
     to the same information in the data warehouse?



       SOURCE                       How to get updates from
                                      Source Table A to           DATA WAREHOUSE




                                                       ?
                                    Data Warehouse Table B
             A
          LE                                                               LE
                                                                              B
       TAB                                                              TAB
                             UP
                               DA
                                 TE
                                   S




                                               P
                                            OLT




                                                                                             151




   Determining What Data Has Changed (cont.)
   Determining What Data Has Changed (cont.)
   • Problem: How to get updates made to multiple
     sources to the same information in the data
     warehouse?

               SOURCE                                             DATA WAREHOUSE

             A
          LE
       TAB                                                       LE
                                                                    A             LE
                                                                                     B
           “ROW X”
                     Employee
                      NAME DEPT.    SALARY
                                                   ?          TAB
                                                              “ROW X”
                                                                               TAB
                                                                                   “ROW X”
  ROWTES
       X




                      Fred Mktg     35000
                                                         EmployeeCount
     A




                      Hank Sales    60000



                                                   ?
  UPD




                                                          DEPT    COUNT     SalaryInfo
                      Sue   IT      71000                 Mktg    1
                                                                            DEPT   AVG SAL TOT SAL
                      Joe   Sales   50000                 Sales   1   2
                                                          IT      1         Mktg 35000      35000
            UPDATES                                       HR      0         IT    71000     71000
                                                                            HR        0         0
   P
OLT
            Insert into
            Employee Values                                                 Sales 60000     60000
            (‘Joe’,’Sales’,’50000)                                            55000      110000



                                                                                             152




                                                                                                     76
              What is the Recovery Log?
              What is the Recovery Log?
• Recovery log is used for transaction processing
   – Used to handle errors
   – Does contain before and after image.


• Recovery log can be used to
       identify the data to be updated
  in the data warehouse.
   – Change Data Capture Utility
      – This scans the database log and identifies all changes that the user
        is interested in and either writes them to a file or stores them in
        another table.




                                                                           153




   Change Data Capture Utility in Action
   Change Data Capture Utility in Action


 SOURCE


  OLTP                                                  DATA
                                                                   All changes
                                                                    to DBMS
                              DBMS
                                                           LOG

                                                     RECOVERY LOG


                                        S
                                      AD
                                    RE
                   CHANGE
                     DATA
                   CAPTURE                                  DATA WAREHOUSE
                    UTILITY            WRITES




                                                                           154




                                                                                 77
        Example of Using Recovery Log
        Example of Using Recovery Log
• Consider an update to the Employee table
   – The information is recorded in the log
   – The change data capture reconstructs update
                                                       LOG
   – Can then be sent to the data warehouse
                                                   TABLE=EMPLOYEE
                                                       SSN=10
      UPDATE EMPLOYEE
       Where SSN=10                                OldSalary=100,
 SET Salary=Salary*2.0
                                                   NewSalary=200



                      CHANGE
                       DATA
                     CAPTURE                          DATA
                   RECONSTRUCTS                     WAREHOUSE
                      UPDATE

                                                               155




               Using the Recovery Log
               Using the Recovery Log
• Recovery logs are usually in proprietary format.
  Use commercial tools to read the log and identify
  the changes.

• Commercial tools such as CA’s log analyzer can
  place the results of their work in a table.




                                                               156




                                                                     78
      Summary of Change Data Capture
      Summary of Change Data Capture
• Pro
   – Log exists anyway, might as well use it to find what has changed


• Con
   – Some difficult scenarios may occur where it is hard to see what the
     new update should be in the Data Warehouse.

   – Proprietary format, may not be supported in many DBMS and will
     always lag behind DBMS development.

   – Many tables will be in the source that have nothing to do with the
     data warehouse, but change data capture will process their changes
     as well.




                                                                        157




                            Triggers
                            Triggers
• Triggers allow DBA’s to specify that when an
  “event” such as an INSERT, UPDATE, or DELETE
  occurs on a table, another event is triggered.
   – Triggers are used to identify changes that are needed by the
     warehouse.
   – A trigger can be added to a source table and whenever the source
     table is updated, an update can be placed either directly in the
     warehouse or in a staging table that tracks all updates.


• Triggers can be used to detect the
      changes and perform data
      warehouse updates.
   – A different trigger might be run on key updates so that the data
     warehouse nightly process would know what data has changed.

                                                                        158




                                                                              79
                         Example of a Trigger
                         Example of a Trigger

                                                                           STAGING
STEP 2                                                      X, Y
                     A                 When values are
                  LE
               TAB
                                       inserted, sets off           TRIGGER inserts
                                         the TRIGGER                values (X, Y) into
                                                                   a “STAGING” area       STEP 3
          Values (X, Y) are inserted

                                                                             Nightly
                                                                             Process



                                                                 Nightly Process inserts
                                                                    values (X, Y) into
 STEP 1                                                           the Data Warehouse

                                                        STEP 4        DATA WAREHOUSE


                                                                                A
           INSERT into                                                       LE
             TABLE A                                                      TAB
          VALUES (X, Y)                                                   Values (X, Y)




                                                                                             159




                   Real-Life Trigger Example
                   Real-Life Trigger Example
 • OLTP/DSS Data - Employee table:
     –Employee (ssn, name, salary)


 • DW Data - Summary table:
     –EmployeeStatistics (total number employees,
        total salary paid, average salary).


 • When a row is inserted in the employee table, we
   need to do an insert into the EmployeeStatistics
   table.
     – Shown on the next page




                                                                                             160




                                                                                                   80
                 Insert Trigger Example
                  Insert Trigger Example
CREATE TRIGGER EmployeeInsertTrigger
ON Employee
FOR INSERT AS
    BEGIN
    UPDATE EmployeeStatistics
            SET NoEmployee = NoEmployee +
                              (SELECT COUNT(*) FROM INSERTED)

           UPDATE EmployeeStatistics
               SET TotSalary = TotSalary +
                               (SELECT SUM(Salary) FROM INSERTED)
          UPDATE EmployeeStatistics
                SET AvgSalary = TotSalary / NoEmployee
    END




                                                                161




                Insert Trigger in Action
                 Insert Trigger in Action
      COMMANDS                             RESULTS

 INSERT INTO EMPLOYEE
VALUES (1, 'John', 300)                (1 ROW(S) AFFECTED)



 INSERT INTO EMPLOYEE
VALUES (2,'Mike', 400)                 (1 ROW(S) AFFECTED)



                                Employee
SELECT * FROM EMPLOYEE          EmpId Name             Salary
                                ------ --------------------------
                                1      John            300.00
                                2      Mike            400.00

                                EmployeeStatistics
    SELECT * FROM               NoEmployee TotSalary    AvgSalary
  EMPLOYEESTATISTICS            ---------- ----------   ---------
                                2              700.00      350.00

                                                                162




                                                                      81
                    Delete Trigger Example
                    Delete Trigger Example
CREATE TRIGGER EmployeeDeleteTrigger
ON Employee
FOR DELETE AS
  BEGIN
    DECLARE @numberEmployee int

      UPDATE EmployeeStatistics
       SET NoEmployee = NoEmployee - (SELECT COUNT(*) FROM DELETED)

      UPDATE EmployeeStatistics
        SET TotSalary = TotSalary - (SELECT SUM(Salary) FROM DELETED)

   SELECT @numberEmployee = NoEmployee FROM EmployeeStatistics

   IF @numberEmployee > 0
      BEGIN
          UPDATE EmployeeStatistics
            SET AvgSalary = TotSalary / NoEmployee
      End
    ELSE
       UPDATE EmployeeStatistics SET AvgSalary = 0.0
  END


                                                                        163




                   Update Trigger Example
                   Update Trigger Example
CREATE TRIGGER EmployeeUpdateTrigger
ON Employee

FOR UPDATE AS
BEGIN

 IF UPDATE (Salary)
     UPDATE EmployeeStatistics
       SET TotSalary = TotSalary -
           (SELECT SUM(Salary) FROM DELETED) +
           (SELECT SUM(Salary) FROM INSERTED)

                 UPDATE EmployeeStatistics
                    SET AvgSalary = TotSalary / NoEmployee
END




                                                                        164




                                                                              82
            Summary of Using Triggers
            Summary of Using Triggers
• Pro
   – Only needed for tables whose data is going to go to the DW


• Con
   – Additional work needed to create detailed triggers

   – Non-trivial to generate a trigger to implement appropriate action

   – May not be acceptable for commercial software on source system




                                                                         165




Other Ways to Determine What Has Changed
Other Ways to Determine What Has Changed
• There are other manual ways of detecting the
  change and doing DW updates
   – Look at each row of OLTP and the data in the warehouse
   – Compare the differences between the two files, if the data is not in
     the warehouse, add it!



            OLTP                                 DATA WAREHOUSE

     Hank                                        Hank
     John
                                                           RE
                                                         PA
                                                 John
     Mike                                               M
                                                      CO
                                                 Mike
     Sam
                       ADD THE DIFFERENCES



                                                                         166




                                                                               83
 Manually Identifying What Has Changed
 Manually Identifying What Has Changed

• Pro
  – Flexible



• Con
  – Very expensive

  – Could take a long time




                                     167




                         Summary
                         Summary

• Recovery Logs
• Triggers
• Manual Detection




                                     168




                                           84
Data Warehouse
    Design

                        (slides in this section
                        are used courtesy of
                        Carrig Emerging Technology
                        Ph: 410- 553- 6760
                        www.c a r r i g e t. c o m )

                                                       169




       Data Warehouse Design
       Data Warehouse Design


1) Overview
2) Describing a Design - ER Diagrams
3) Design Normalization
4) Star Schema Design




                                                       170




                                                             85
                           Overview
                           Overview
• How to describe a design
   – Entity Relationship (ER) Diagram


• Types of Designs

   – Normalized
   – Star Schema
   – Snowflake




                                                                          171




                   Describing a Design
                   Describing a Design
• Different techniques exist, the most prevalent is
  the ER (Entity-Relationship) Diagram

• Entities
   – Things that occur in the real world, usually nouns e.g.; employee,
     part, product, etc.

• Relationships
   – How entities interact, example: one employee may attend many
     colleges -- usually verbs
   – Types of relationships
      – 1-1
      – 1-Many
      – Many-1
      – Many-Many

                                                                          172




                                                                                86
                Examples of Relationships
                Examples of Relationships
1-1                                      1-MANY




MANY-1                                  MANY- MANY




                                                                               173




                       Normalized Design
                       Normalized Design
• Methodology
      – All 1-1 relationships are placed in a single table.
      – Many-many relationships require two tables that store the single-
        valued relationships and one linking table that indicates how the
        entities are related. The relationship is represented in the linking
        table by referencing keys in the two tables that represent each
        entity in the relationship.

• Checking the design
      – In a Normalized Design, there are many different normalized
        forms. Each normal form (NF) builds on the previous one so that a
        table in 2NF is, by definition, in 1NF.
          – 1NF
          – 2NF
          – 3NF

                                                                               174




                                                                                     87
  Dealing With Many-Many Relationships
  Dealing With Many-Many Relationships
• For Many-Many
   – Two 1-1 Tables (SUPPLIER, PARTS)
   – One linking table (SP)
   – Ex: Suppliers, Parts are the 1-1, SP is the linking table that says
     who sells what parts.



     SUPPLIER                                           PARTS
      S#     SNAME                                      P#      PNAME
      1    SEARS                                        1       HAMMERS
      2    OFFICE DEPOT                                 2       NAILS

                            SP
                             S#   P#
                             1    1
                             1    2
                             2    1
                             2    2



                                                                           175




             Normalized Design: Example
             Normalized Design: Example
• A store sells a product which is supplied by a
  given vendor. The product is purchased by a
  customer at a certain time.
   – Entities: Customer, Product, Store
   – Relationships: Customer buys Product
      – Product is located in Store
      – Product is supplied By a Vendor

                                              VENDOR




  CUSTOMER                   PRODUCT                         STORE




                  BUYS                  IS-LOCATED-IN


                                                                           176




                                                                                 88
         Checking a Normalized Design
         Checking a Normalized Design
• Normalization
  – Used to reduce data insertion, delete, and update anomalies caused
    by bad designs.
  – Enables users to quickly check a design and make sure there are no
    glaring holes in the design.
  – 1NF
      – All “cells” are atomic -- i.e. each entry in a column contains only
        one value
  – 2NF
     – All non-key values are functionally dependent upon the entire
       primary key -- i.e. if the primary key changes, all other columns
       change.
  – 3NF
     – No transitive dependencies -- i.e. all keys are completely
       dependent on the primary key. If the primary key changes, all
       non-key columns are affected.
                                                                           177




        Overview of Normalized Design
        Overview of Normalized Design

• Pro
  – Relatively easy to change


• Con
  – Queries can involve numerous joins
  – The massive number of tables and links between tables makes it
    hard for customers to build their own queries




                                                                           178




                                                                                 89
                         Star Schema
                         Star Schema
• Methodology
   – Single fact table in the middle describing a key event (e.g. sale)
     surrounded by dimension tables (i.e. location, time, employee)
                         D = DIMENSIONS



                                           D2
           D1


                           FACT
                                                           D3
         D5
                                      D4

                                                                          179




             Star Schema: Methodology
             Star Schema: Methodology
• Identify a key fact that occurs.
   – Usually some event creates a real fact. Selling a product in a store
     on Wednesday, patient visiting a hospital, etc.

• Identify all the dimensions of the data being used.
  Think of a dimension as a way to slice the data.
   – Ex: by time, by product, by customer, etc.

• Drill down operations are very well supported




                                                                          180




                                                                                90
                          Star Schema: Example
                          Star Schema: Example
• A store sells a product which is supplied by a
  given vendor. The product is purchased by a
  customer at a certain time.

• Fact
   – CustomerPurchase

• Dimensions are
   – Customer
   – Product
   – Time
   – Vendor




                                                                                                 181




                   Star Schema: Example (cont.)
                   Star Schema: Example (cont.)

                                                                Time
                          Customer



                                                Sale                           Price

                     Store
                                                       Product




  SALE     SALE ID          CUST. ID       STORE ID    PROD. ID         PRICE          TIME

           1                3              7           4                $3.00          4/24/99


CUSTOMER       CUST. ID         NAME           PHONE        Buys Apples         Has Big Car

               3                FRED           1234         Y                   Y


                   TIME         DAY        MONTH           QTR          YEAR

                                24     4               2Q              99


                                                                                                 182




                                                                                                       91
                        Star Schema: Overview
                        Star Schema: Overview

• Pro
        – Easy for users to navigate and understand


• Con
        – Performance
           – Can end up with one monster fact table, millions of rows
        – Flexibility
           – Not as easy for customers to change the design




                                                                                                183




                             Snowflake Schema
                             Snowflake Schema
• Several stars can be connected to form a snowflake

                                      MARKETING

                                          Distrib-            Ad
                                           ution



                                                Direct Mail
                                                                   Price

                                      Sales                                         PRODUCT
         SALES                                                                             Manu-
                                                       Location                          facturing
                Marketing                                                   Parts
Revenue


                                                                                     Make
            Sale                                                                     Chips
                                                                                                      Cost
                            Product
                                                                           Price
Price
                                                                                        Labor
               Vendor




                                                                                                184




                                                                                                             92
                   Summary
                   Summary
• Two basic types of design
  – Star Schema
  – Normalized


• Many Data Warehouse vendors sell products built
  specifically for the star schema

• Some data warehouses insist that normalization
  is the way to build the data warehouse.




                                                              185




     Building a
   Data Warehouse

                               (slides in this section
                               are used courtesy of
                               Carrig Emerging Technology
                               Ph: 410- 553- 6760
                               www.c a r r i g e t. c o m )

                                                              186




                                                                    93
             Building a Data Warehouse
             Building a Data Warehouse



   1) Top Down Approaches
   2) Enterprise Data Model Approach
   3) "Let Data Users Decide"
   4) "Let Data Warehouse Builders Decide"
   5) "Let Senior Management Decide"
   6) Bottom Up Approach




                                                        187




          Building the Data Warehouse
          Building the Data Warehouse
• How to decide what data goes into the data
  warehouse?

• Methods:
   – Top Down
      – Using Enterprise Data Models
      – "Let data users decide" approach
      – "Let data warehouse builders decide" approach
      – "Let senior management decide" approach

   – Bottom Up
      – Combine data marts into a data warehouse




                                                        188




                                                              94
          Using Enterprise Data Models
          Using Enterprise Data Models
• Use the Enterprise Data Model to decide what
  data goes into the data warehouse.
   – Model key processes. This approach says let the business decide.
   – Identify key data used by these processes in an enterprise data
     model -- might be a giant Entity-Relationship diagram.

• Put data in the warehouse based on the
  enterprise data model.




                                                                         189




     An Enterprise Data Model Example
     An Enterprise Data Model Example

                                       PUT IN                   SELL CHIPS
   MAKE                                 BAGS
   CHIPS




                                                BUY MORE
   COUNT                                        POTATOES

     $$




                               CHIP
                             SUPPLIERS


       CHIP
      RECIPES                                              INGREDIANTS



                                                                         190




                                                                               95
        "Enterprise Data Model" Approach
         "Enterprise Data Model" Approach

• Pro
  – All inclusive -- no chance of leaving key data out.


• Con
  – Very difficult to build an EDM.
  – If the business model changes, you may have to rebuild the
    Enterprise Data Model and the data warehouse.


• Ways of Avoiding the Con
  – In some cases you can buy an EDM -- if the business is common
    enough the packaged EDM might be very close and then you just
    have to modify it to fit your business.


                                                                          191




               "Let Data Users Decide"
                "Let Data Users Decide"
• Let the users of the data warehouse choose what
  data will go into the warehouse.
   – The data users deciding the data warehouse data and design will
     pay for it as well.
   – Also, you can charge users who
      query the data as well.
                                            SOURCE




                                                                      USERS




                                                     DATA WAREHOUSE

                                                                          192




                                                                                96
        "Let Data Users Decide": An Example
         "Let Data Users Decide": An Example

                                  DATA WAREHOUSE




   DATA
                                         DATA                     DATA
demographics
                                              Ethnic
                                                                 budget
                         trends               group

           Advertising
               ?                                 Age        spending
                                                                       Revenue
                                                                          ?
                                  education       ?


      MARKETING                           HUMAN
                                        RESOURCES              FINANCE



                                                                          193




           "Let Data Users Decide" Approach
            "Let Data Users Decide" Approach
 • Pro
      – Reduces budget problems
      – Users know best!

 • Con
      – Requires marketing
      – Could end up with data in the warehouse that is meaningless to the
        people who run the place.
      – Users may not place important data in the warehouse because their
        budget is small.
      – Users who need the data may not use the DW because of budget
        concerns.

 • Ways of Mitigating the Con
      – Do not just take money -- try to determine if data is really
        corporate.
                                                                          194




                                                                                 97
     Pay As You Go Warehouse Analogy
     Pay As You Go Warehouse Analogy




  I-495




                                                   195




   "Let Data Warehouse Builders Decide"
    "Let Data Warehouse Builders Decide"
• The technical staff who is building the warehouse
  decides what data gets put in the warehouse.

              LETS PUT
          INFORMATION ON
            HOW TO BUILD
           VIRUSES IN THE
          DATA WAREHOUSE




                                  DATA WAREHOUSE




                                                   196




                                                         98
  "Let Data Warehouse Builders Decide"
   "Let Data Warehouse Builders Decide"
                Approach
                Approach
• Pro
   – Very easy to design
   – Does not take much time
   – Do not have to deal with users

• Con
   – Could easily result in data DUMP not data warehouse

• Ways to mitigate the con
   – Talk to lots of users to help you guess what should go in the DW




                                                                        197




        “Let Senior Management Decide”
        “Let Senior Management Decide”
• The senior management decides what data goes
  into the warehouse.
• Asking the senior management is the safest way
  to build a data warehouse.
• Identify the key questions on senior
  management’s mind and get the data to answer
  these questions.




                                                                        198




                                                                              99
“Let Senior Management Decide” Approach
“Let Senior Management Decide” Approach
• Pro
   – Ensures executive support for the project

• Con
   – Senior management does not have much time for this -- you will
     have to only get a few questions at a time
   – This dramatically increases visibility - if you do not move quickly
     senior management will become very angry with the DW.

• Ways to mitigate the con
   – Do your homework before talking to the senior management -- talk
     to the aides of senior management to find out what is on their
     mind.
   – Allocate resources so you can plan to move very quickly once you
     hear from the senior management.

                                                                       199




                  Bottom-Up Approach
                  Bottom-Up Approach
• Move data from existing OLTP Applications to
  data marts.

• Combine data marts into a data warehouse.

                                 DATA
                               WAREHOUSE




               DATA               DATA             DATA
               MART               MART             MART
                25                 50               200
               YARDS             METERS             CM


               OLTP               OLTP             OLTP
                APP               APP               APP


                                                                       200




                                                                             100
                 Bottom-Up Approach
                 Bottom-Up Approach

• Pro
   – Data marts are much easier to build than full-fledged DW.


• Con
   – Could end up with a bunch of stove pipe data marts.


• Ways to mitigate the con
   – Develop standards for data when building the data marts so that
     you can glue data from different data marts together.




                                                                       201




     Recommendations for an Approach
     Recommendations for an Approach


        "Let senior management decide"




                                                                       202




                                                                             101
User Interface to
       the
Data Warehouse
                         (slides in this section
                         are used courtesy of
                         Carrig Emerging Technology
                         Ph: 410- 553- 6760
                         www.c a r r i g e t. c o m )

                                                        203




  User Interface to the Data Warehouse
  User Interface to the Data Warehouse



1) Introduction
2) Types of Users
3) Functions Users Want to Do
4) Approaches to Building a User Interface
5) Hand Built
6) Class Libraries
7) OLAP Tools
8) Types of User Interfaces

                                                        204




                                                              102
                           Introduction
                            Introduction
• A User Interface (UI) is a front end application
  designed for the user that presents information in
  a simplified manner.
   – Data in a data warehouse does nothing if users cannot access it
   – Users do not want to learn SQL to drive DW applications
   Finance                       Inventory                  Sales
    OLTP                           OLTP                     OLTP
  Application                    Application              Application




                              DATA WAREHOUSE
    Finance                      Inventory                     Sales
     OLTP                          OLTP                        OLTP
     Data                          Data                        Data




                            USER INTERFACE


                                                                        205




                  Building User Interfaces
                  Building User Interfaces
• DW applications have different types of users
  with different functionality requirements.
   – It is critical to identify the key users.
   – Once you do this, you need to identify their functional
     requirements.

• There are three main approaches to building UI’s
         – Build your own entirely
         – Use commercial Class Libraries
         – Using OLAP Tools




                                                                        206




                                                                              103
                        Types of Users
                        Types of Users


                                CEO


           Executive         Executive         Executive


           Marketing            Sales           Finance


            Analysts          Analysts          Analysts


           Everyone          Everyone          Everyone

                                                                       207




                    Types of Users (cont.)
                    Types of Users (cont.)
• Executives
  –   People who run the place
  –   Need answers quickly
  –   May not be very technical
  –   Expect UI to get them what they
      want quickly and efficiently without
      any need for special training

• Analysts
  – Have time to really analyze data and think about it
  – May have strong statistical and IT background
    (i.e. Power user of Excel)
  – Expect UI to have many complex features, and
      provide the ability to generate new queries and perform statistical
    analysis of the data.

                                                                       208




                                                                             104
                   Types of Users (cont.)
                   Types of Users (cont.)
• Regular User
  – All other users
  – Just need some simple answers to simple questions like “What is
    Hank’s phone number)
  – Expect UI to be simplistic, easy to understand, and provide access
    to basic information.




                                                                     209




            Subject Matter Experts Expect
            Subject Matter Experts Expect
• Query data in the data warehouse
• Trend analysis
  – “show me how much money we have spent on computers in the
    last four years”
Trend



  Sales




          1995                                                    1999

• Benchmark to competitors
  – “what are all our competitors charging for product X”

                                                                     210




                                                                           105
               Subject Matter Experts Expect (cont.)
               Subject Matter Experts Expect (cont.)
• Drill Down
   – “on that chart you just showed me, I noticed that revenue was
     down in Region #4. Please drill down and show me the breakdown
     of each area in Region #4.”

                        DRILL DOWN
                        WAL-MART

               20
     REVENUE




               15                        Y Values
               10
                5                        X Values
                0
                    1   2     3      4
                        REGIONS

                                                             DRILL DOWN




                                              Revenue
                                                                     DC
                                                        MD     VA

                                                                Region 4
                                                                           211




     Approaches to Building User Interfaces
     Approaches to Building User Interfaces
• Hand-Built
   – Write all of your own code


• Use Class Libraries
   – Use an object oriented approach and buy the CLASS libraries that
     do all the hard work


• OLAP
   – Use an On-Line Analytical Processing package to build user
     interfaces for you.




                                                                           212




                                                                                 106
          Architecture of User Interfaces (cont.)
          Architecture of User Interfaces (cont.)
• Hand Built

                                                              DATA
   USER INTERFACE
                                                            WAREHOUSE
      i.e. JAVA

                                                              DBMS


                                                                        Commercial
                                                                        Off The Shelf
• Class Libraries                                                         (COTS)
                                        GRAP
                                             HIC
                                          CLAS S
                                              S
                                        LIBR
                                            ARY                OLAP
           USER
                                                              CLASS
        INTERFACE
                                                             LIBRARY


                                              USER E
                                                 FAC
                                            INTER SS
Hand                                           CLA
                                                   RY
Built                                         LIBRA



                                                                                213




          Architecture of User Interfaces (cont.)
          Architecture of User Interfaces (cont.)
• OLAP                                                              E
                                     YEAR                         OR
                                                                ST

                                                                               REGION



                                              Result Cube


                     Commercial
                     Off The Shelf
                        (COTS)

                                            REVENUE
   USER INTERFACE




                    DBMS


                                                                                214




                                                                                        107
            Hand-Building User Interfaces
            Hand-Building User Interfaces
• Write all the code yourself
   – Requires many design documents, coding and testing for all of the
     code components.


• Pros

   – Very flexible

• Cons
   – Could take a long time to develop
   – Requires substantial resources
   – May need lots of testing and debugging




                                                                            215




  Using Class Libraries to Build User Interfaces
  Using Class Libraries to Build User Interfaces
• Write initial user dialog yourself and call class
  libraries for the hard part (graphics and data
  access functionality).

• Pro
   – Many class libraries available -- avoid doing a lot of coding
     yourself

• Con
   – Not as flexible -- if the class library does not do what you want it
     to do you have to
       – Find a new class library
       – Live without the functionality
   – Can take a while to find the class library you need and learn how
     to interface to it
                                                                            216




                                                                                  108
   Using OLAP Tools to Build User Interfaces
   Using OLAP Tools to Build User Interfaces
• Many different OLAP tools
   –   Need to survey an OLAP tool
   –   Buy an OLAP tool
   –   Install it
   –   If it does not match all requirements some code may be needed to
       communicate with the OLAP tool.


• Three types multi-dimensional OLAP

   –   Relational OLAP (ROLAP)
   –   Multi-dimensional (MOLAP)
   –   Hybrid (HOLOP)
   –   Distributed (DOLAP)


                                                                      217




  Summary of Tools for UI Development of DW
  Summary of Tools for UI Development of DW
• Tools that may be used include:
   – Development of in-house software
      – Do it all yourself
      – Use Class Libraries
   – OLAP
      – ROLAP
      – MOLAP
      – HOLAP
      – DOLAP
• Different tools or techniques may be useful
  depending upon what kind of user interface is
  being developed.
   – Executive Information Systems
   – Analytical Systems
   – Enterprise Information Systems

                                                                      218




                                                                            109
                     Types of User Interfaces
                     Types of User Interfaces
 • Executive Information System
     – Developed for the person who runs the place
 • Analytical System
     – Developed for business analysts
 • Enterprise Information System
     – Developed for users throughout the organization
                                     CEO
  EXECUTIVE
  INFORMATION SYSTEM

                       Executive   Executive   Executive


                       Marketing    Sales      Finance


                       Analysts    Analysts    Analysts
                                                           ANALYTICAL SYSTEM

ENTERPRISE
INFORMATION SYSTEM     Everyone    Everyone    Everyone

                                                                               219




               Executive Information System
               Executive Information System
 • The Executive IS is developed specifically for
   people who run the organization.
 • Development process:
     – No clean life cycle
     – Prototype constantly. Usually have to guess at
       what executives will want to see
     – Show executives let them come up with ideas
       for revisions
     – Drill down functionality required

 • Tools
     – Frequently hand-built, but purchasing a class library can help
       lower the development cost.
     – May just want to use tools that allow development of a
       subscription service in which users may “Subscribe” to a few
       canned reports.
                                                                               220




                                                                                     110
                     Analytical System
                     Analytical System
• Analytical systems are user interfaces developed
  for business analysts in an organization.

• Development process:
   – Allow users to drag-and-drop data around to further the analysis of
     this data.
   – More complex interface is acceptable
   – Users may be required to know some SQL knowledge

• Tools:
   – OLAP Tools are frequently used to build the interface




                                                                      221




            Enterprise Information System
            Enterprise Information System
• Enterprise IS is written for the general user to
  retrieve simple, key information.
• Development process:
   – Frequently developed in-house
   – So many users around that you really cannot pick a few and ask
     what they need.
   – Simpler than Executive IS as it does not require drill down
     functionality.

• Tools
   – Place some simple, key information
     on a few screens and control
     access and then deploy.


                                                                      222




                                                                            111
      Summary of Types of User Interfaces
      Summary of Types of User Interfaces
• Executive Information System
  – For the senior executives
  – Use in-house development or in -house development augmented by
    class libraries

• Analytical System
  – OLAP may make sense here as the interface is more complicated,
    but OLAP has drawbacks due to:
     – Data sparseness
     – No well accepted query language

• Enterprise Information System
  – Much simpler than executive system
  – Good candidate for in-house development

                                                                 223




                                                                       112

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:5
posted:10/9/2011
language:English
pages:112