Docstoc

Business Intelligence

Document Sample
Business Intelligence Powered By Docstoc
					   Understand Information Systems
   Understand Data Warehouses
   Case in Point –”Wal-Mart “
   BI Tools & Technologies




               Introduction BI Session II 23.10.06Smita   1
                                Bhatia
Where   do you fit in this BI picture ?




              Introduction BI Session II 23.10.06Smita   2
                               Bhatia
                    BI Vendors
   Business Objects
   Cognos
   Actuate
   Applix
   Information Builders
   Informatica
   MicroStrategy
   Microsoft
   Oracle
   SAS



                    Introduction BI Session II 23.10.06Smita   3
                                     Bhatia
Business Processes
             (Ex.:
           Banking)


                                               Decisional
 (agreement with a credit                      processes
         card)
                                             Management
          (grant a loan)
                                              processes

       (transaction on              Operational processes
        bank account




                Introduction BI Session II 23.10.06Smita    4
                                 Bhatia
Types of Information Systems




    Introduction BI Session II 23.10.06Smita   5
                     Bhatia
MAJOR TYPES OF SYSTEMS IN ORGANIZATIONS




             Introduction BI Session II 23.10.06Smita   6
                              Bhatia
               MAJOR TYPES OF SYSTEMS IN
                    ORGANIZATIONS



Transaction Processing Systems
 (TPS):

• Basic business systems that serve the operational
  level

• A computerized system that performs and records
  the daily routine transactions necessary to the
  conduct of the business




                 Introduction BI Session II 23.10.06Smita   7
                                  Bhatia
MAJOR TYPES OF SYSTEMS IN
    ORGANIZATIONS

           Payroll TPS




   Introduction BI Session II 23.10.06Smita   8
                    Bhatia
MAJOR TYPES OF SYSTEMS IN
     ORGANIZATIONS


Types of TPS Systems




    Introduction BI Session II 23.10.06Smita   9
                     Bhatia
               MAJOR TYPES OF SYSTEMS IN
                    ORGANIZATIONS



     Knowledge Work Systems (KWS):

Knowledge level
 Inputs: Design specs
 Processing: Modeling
 Outputs: Designs, graphics
 Users: Technical staff and professionals


Example: Engineering work station


                 Introduction BI Session II 23.10.06Smita   10
                                  Bhatia
             MAJOR TYPES OF SYSTEMS IN
                  ORGANIZATIONS



 Management Information System (MIS):

Management level
 Inputs: High volume data
 Processing: Simple models
 Outputs: Summary reports
 Users: Middle managers


Example: Annual budgeting


               Introduction BI Session II 23.10.06Smita   11
                                Bhatia
  MAJOR TYPES OF SYSTEMS IN
       ORGANIZATIONS


Management Information System
           (MIS)




     Introduction BI Session II 23.10.06Smita   12
                      Bhatia
              MAJOR TYPES OF SYSTEMS IN
                   ORGANIZATIONS


          Management Information System (MIS)

   Structured and semi-structured decisions

   Report control oriented

   Past and present data

   Internal orientation

   Lengthy design process


                  Introduction BI Session II 23.10.06Smita   13
                                   Bhatia
              MAJOR TYPES OF SYSTEMS IN
                   ORGANIZATIONS



      Decision Support System (DSS):

Management level
 Inputs: Low volume data
 Processing: Interactive
 Outputs: Decision analysis
 Users: Professionals, staff


Example: Contract cost analysis

                 Introduction BI Session II 23.10.06Smita   14
                                  Bhatia
  MAJOR TYPES OF SYSTEMS IN
       ORGANIZATIONS
Decision Support System (DSS)




           Figure 2-
     Introduction BI Session II 23.10.06Smita   15
                      Bhatia
    MAJOR TYPES OF SYSTEMS IN
         ORGANIZATIONS


Decision Support System (DSS)




        Introduction BI Session II 23.10.06Smita   16
                         Bhatia
              MAJOR TYPES OF SYSTEMS IN
                   ORGANIZATIONS



     Executive Support System (ESS):

Strategic level
 Inputs: Aggregate data
 Processing: Interactive
 Outputs: Projections
 Users: Senior managers


Example: 5-year operating plan


                Introduction BI Session II 23.10.06Smita   17
                                 Bhatia
   MAJOR TYPES OF SYSTEMS IN
        ORGANIZATIONS
Executive Support System (ESS)




      Introduction BI Session II 23.10.06Smita   18
                       Bhatia
              MAJOR TYPES OF SYSTEMS IN
                   ORGANIZATIONS

             Executive support system (ESS)

   Top level management

   Designed to the individual

   Ties CEO to all levels

   Very expensive to keep up

   Extensive support staff


                  Introduction BI Session II 23.10.06Smita   19
                                   Bhatia
INTERRELATIONSHIPS AMONG SYSTEMS




       Introduction BI Session II 23.10.06Smita   20
                        Bhatia
Let's imagine a conversation between the Chief
Information Officer of Wal-Mart and a sales guy from
Sybase.

       Wal-Mart: "I want to keep track
         of sales in all of my stores
              simultaneously."


                                         Sybase:     "You   need   our
                                         wonderful RDBMS software.
                                         You can stuff data as sales
                                         are rung up at cash registers
                                         and     simultaneously query
                                         data out right here in your
                                         office.




                  Introduction BI Session II 23.10.06Smita               21
                                   Bhatia
                 Wal -Mart



So Wal-Mart buys
 a    $1    million
 Hewlett-Packard
 multi-CPU server
 and a $500,000
 Sybase     license
 and           build
 themselves a nice
 normalized     SQL
 data model
                             22
SALES table
                                                                  date/time of
  product id     store id     quantity sold
                                                              sale

                                                                       1997-10-22
                      17                   1
                                                                        09:35:14
       567
                                                                       1997-10-22
       219            16                   4
                                                                        09:35:14
                                                                       1997-10-22
       219            17                   1
                                                                        09:35:17
  PRODUCTS table

 store id      Prod id      location                             Ref No
       16        34         510 Main Street                        415-555-1212
       17        58         13 Maple Avenue                        914-555-1212
 ...


                            Introduction BI Session II 23.10.06Smita                23
                                             Bhatia
STORES table



store id    city id   store location                           phone number
      16        34         510 Main Street                         415-555-1212
      17        58        13 Maple Avenue                          914-555-1212
...

 CITIES table


city id     city name                         state                    population
      34        San Francisco                      California              700,000
      58         East Fishkill                      New York                30,000




                        Introduction BI Session II 23.10.06Smita                     24
                                         Bhatia
After a few months of stuffing data
into these tables…..

                                  I noticed that there
                                       was a Colgate
Mary :A Wal                       promotion recently,
                                 directed at people who
-Mart Executive                    live in small towns.
                                    How much Colgate
                                 toothpaste did we sell
                                      in those towns
                                  yesterday? And how
                                 much on the same day
                                      a month ago?"



             Introduction BI Session II 23.10.06Smita     25
                              Bhatia
Mary’s Query is executed…………
   Select sum(sales.quantity_sold) from sales,
    products,       stores,       cities     where
    products.manufacturer_id = 68 -- restrict
    to           Colgate-Palmolive             and
    products.product_category = 'toothpaste'
    and    cities.population    <      40000   and
    sales.datetime_of_sale::date                 =
    'yesterday'::date -- restrict to yesterday
    and sales.product_id = products.product_id
    and sales.store_id = stores.store_id and
    stores.city_id = cities.city_id


                  Introduction BI Session II 23.10.06Smita   26
                                   Bhatia
What Happens then ?




The $1 million HP Unix box crawls to a near-halt.
The cash registers stop being able to ring up
customers. Eventually the dbadmins realize that
the system collapses every time Ms. Mary’s
toothpaste query gets run. They complain to
Sybase tech support.



                                                    27
   Wal-Mart: "We type in the toothpaste query and
    our             system                wedges."


   Sybase: "Of course it does! You built an on-line
    transaction processing (OLTP) system. You can't
    feed it a decision support system (DSS) query
    and       expect      things     to      work!"




   Wal-Mart: "But I thought the whole point of SQL
    and your RDBMS was that users could query
    and           insert           simultaneously."

                     Introduction BI Session II 23.10.06Smita   28
                                      Bhatia
 Sybase: "Uh, not exactly.“


Wal-Mart: "Can you fix your system so
 that it doesn't lock up?“

 Sybase: "No. But we made this great
 loader tool so that you can copy
 everything from your OLTP system
 into a separate DSS system at 100
 GB/hour."


             Introduction BI Session II 23.10.06Smita   29
                              Bhatia
What next for Wal-Mart ?
   Basically what Sybase wants Walmart to do
    is set up another RDBMS installation on a
    separate computer. Walmart needs to buy
    another $1 million monster HP Unix box.
    They need to buy another $500,000 RDBMS
    license.   They   also    need    to   hire
    programmers to make sure that the OLTP
    data is copied out nightly and stuffed into
    the DSS system. Walmart is now building
    the data warehouse.


                 Introduction BI Session II 23.10.06Smita   30
                                  Bhatia
   Wal-Mart: "You said SQL was great and would
    solve our problems. But we can't ask our most
    important       questions       in      SQL."




   Sybase: "We only sold you an OLTP system
    and then a DSS system. These questions
    you've brought to us are online analytical
    processing (OLAP) queries. You can't expect
    to run these against a relational database.
    You need an OLAP system. It will only cost
    you another $1 million in hardware and
    $500,000 in software licenses.


                  Introduction BI Session II 23.10.06Smita   31
                                   Bhatia
A Data Warehouse is ...
A data warehouse is a
     subject-oriented,
     integrated,
     time-variant, and
     nonvolatile
collection of data in support of
  management’s decisions



                Introduction BI Session II 23.10.06Smita   32
                                 Bhatia
… subject-oriented ...
 The data in the warehouse is defined and
  organized in business terms, and is grouped
  under business-oriented subject headings, such
  as
    customers

    products

    sales

rather than individual transactions.
 Normalization is not relevant.




                 Introduction BI Session II 23.10.06Smita   33
                                  Bhatia
… integrated ...
   The data warehouse contents are defined such that
    they are valid across the enterprise and its
    operational and external data sources




                                                                   Data warehouse
                      Operational systems
       The data in the warehouse should be
          clean
          validated
          properly integrated



                        Introduction BI Session II 23.10.06Smita               34
                                         Bhatia
… time-variant ...
   All data in the data warehouse is time-stamped at
    time of entry into the warehouse or when it is
    summarized within the warehouse.
   This chronological recording of data provides
    historical and trend analysis possibilities.
   On the contrary, operational data is overwritten,
    since past values are not of interests.




                  Introduction BI Session II 23.10.06Smita   35
                                   Bhatia
… nonvolatile ...
   Once loaded into the data warehouse,
    the data is not updated.
   Data acts as a stable resource for
    consistent reporting and comparative
    analysis.
   On the contrary, operational data is
    updated (inserted, deleted, modified).


                Introduction BI Session II 23.10.06Smita   36
                                 Bhatia
Which Data in the Warehouse?
   A data warehouse contains five types
    of data:
     Current detail data
     Old detail data

     Lightly summarized data

     Highly summarized data

     Metadata

   Granularity of the data: a key design issue

                  Introduction BI Session II 23.10.06Smita   37
                                   Bhatia
Flow of Data
   Operational
   Environmen
        t
         Clean the
           data
                                                          Purge
                       Reside in
                                                         Summari
                      warehouse
                                                           ze
                                                         Archive

              Introduction BI Session II 23.10.06Smita            38
                               Bhatia
An Example of Data Integration

                                                       Operational
     Checking Account System                                        data
          Jyoti Das (name)
          Female (gender)                                        Customer
    Bounced check #145 on 1/5/95
        Opened account 1994                                       Jyoti Das
                                                                   Female
     Savings Account System                                 Bounced check #145
             Jyoti Das                                             Married
            F (gender)
       Opened account 1992                                 Owns 25 Shares Exxon
                                                            Customer since 1992
    Investment Account System
             Jyoti Das
       Owns 25 Shares Exxon
                                                                            data
        Opened account 1995                                            warehouse

                         Introduction BI Session II 23.10.06Smita                   39
                                          Bhatia
Cost and Size of a Data Warehouse
   Data warehouses are expensive undertakings (
    Avg cost: $2.2 million).
   Since a data warehouse is designed for the
    enterprise it has a typical storage size running
    from 50 Gb to over a Terabyte.
   Parallel computing to speed up data retrieval




                     Introduction BI Session II 23.10.06Smita   40
                                      Bhatia
                   The Data Mart
   A lower-cost, scaled-down version of the
    data warehouse designed for the strategic
    business unit (SBU) or department level.
   An excellent first step for many
    organizations.
   Main problem: data marts often differ from
    department to department.
   Two approaches:
       data marts  enterprise-wide system
       data warehouse  data marts

                     Introduction BI Session II 23.10.06Smita   41
                                      Bhatia
       An Architecture for Data Warehousing
                                metadata

                                                                         EIS


                                                                DSS

external sources
                   extraction         used                            OLAP
                   cleaning           by
                                data
                   validation warehouse
                                                                data
                   summariza
                   tion.                                        mining


  operational                                                         query
  databases


                                 data mart
                     Introduction BI Session II 23.10.06Smita             42
                                      Bhatia
On-Line Analytical Processing
(OLAP)
   Term introduced by E.F. Codd (1993) in
    contrast to On-Line Transaction Processing
    (OLTP)
   The OLAP Council’s definition:
“A category of software technology that enables
    analysts, managers and executives to gain insight
    into data through fast, consistent, interactive
    access to a wide variety of possible views of
    information that have been transformed from raw
    data to reflect the real dimensionality of the
    enterprise as understood by the user”


                    Introduction BI Session II 23.10.06Smita   43
                                     Bhatia
On-Line Analytical Processing
(OLAP)
   Basic idea: users should be able to
    manipulate enterprise data models across
    many dimensions to understand changes
    that are occurring.
   Data used in OLAP should be in the form of
    a multi-dimensional cube.


                                             Market
                                                      Product
                 Introduction BI Session II 23.10.06Smita       44
                                  Bhatia
Dimensional Hierarchies

   Each dimension can be hierarchically structured
                                    Year                       Country

          Type of
          product                 Month                         State

          Product
                                   Week                         City

            Item
                                     Day                        Store

                    Introduction BI Session II 23.10.06Smita             45
                                     Bhatia
OLAP Operations
   Rollup: decreasing the level of detail
   Drill-down: increasing the level of
    detail
   Slice-and-dice: selection and projection


   Pivot: re-orienting the multidimensional
    view of data

                 Introduction BI Session II 23.10.06Smita   46
                                  Bhatia
Implementing Multi-dimensionality

   Multi-dimensional databases (MDDB)
   To make relational databases handle
    multidimensionality, two kinds of tables are
    introduced:
      Fact table: contains numerical facts. It is long
       and thin.
       Dimension tables: contain pointers to the fact table.
        They show where the information can be found. A
        separate table is provided for each dimension.
        Dimension tables are small, short, and wide.



                       Introduction BI Session II 23.10.06Smita   47
                                        Bhatia
Data Warehouse
     Use
     Wal-Mart
     Fingerhut




    Introduction BI Session II 23.10.06   48
Some insights on Wal-Mart
   A 200,000- square-foot Wal-Mart opens every few days;
   A Barbie doll is sold at a Wal-Mart every two seconds.
   The company credits much of its success to the control of its
    data. Many of Wal-Mart’s large-volume suppliers, such as Procter
    & Gamble, have direct access to the Wal-Mart data warehouse, so
    they deliver goods to specific stores as needed.
   Wal-Mart pays such companies for their product only when it is
    sold.
   Procter & Gamble ships most of its items this way, eliminating
    paperwork and sales calls on both sides, a clear benefit of
    aggregate targeting.
   The Wal-Mart data warehouse is also the source for corporate
    wide decisions. The company can effect price changes globally
    and instantaneously at its more than 3000 locations.




                        Introduction BI Session II 23.10.06Smita   49
                                         Bhatia
         Wal-Mart Data Warehouse

   Wal-Mart dominates retail market
   Heavy user of information technology
   Supply chain distribution to 2,900 outlets
       A critical success factor
   Data warehouse of 101 terabytes
       Possibly world’s largest
       Investment over $1 billion
       Can handle 35,000 queries per week
            Benefits over $12,000 per query




                         Introduction BI Session II 23.10.06Smita   50
                                          Bhatia
                         Wal-Mart
   Initial data warehouse
       point-of-sale & shipment data
   Added data
       Inventory
       Forecast
       Demographic
       Markdown
       Return
       Market basket information


                      Introduction BI Session II 23.10.06Smita   51
                                       Bhatia
Wal-Mart Data Warehouse
   Process 65 million transactions per
    week
   65 weeks of data per item
       By store
       By day
   Support decision making
   Many users have access
       Including 3,500 vendor partners

                   Introduction BI Session II 23.10.06Smita   52
                                    Bhatia
 Business Intelligence Technologies
Increasing potential
                                                               End User
to support business    Decision
     decisions         Making
                   Data Presentation                           Business
                   Visualization Techniques                     Analyst
                       Data Mining                                Data
                   Information discovery                        Analyst
                    Data Exploration
           OLAP, DSS, EIS, Querying and Reporting

             Data Warehouses / Data Marts                 DB
                       Data Sources                    Admin
   Paper, Files, Information Providers, Database Systems,
                             OLTP

                    Introduction BI Session II 23.10.06Smita         53
                                     Bhatia
        BI Activities & Tools

Dash Board
                 OLTP                        ETL                   OLAP




                  Data Visualization                           Reporting
DW & Data Mart
                                                              & Query Tools




                   Introduction BI Session II 23.10.06Smita                   54
                                    Bhatia
        BI Activities & Tools

 Balanced                   Alerts                             Notification
Score Card                                                      Systems




             Data Mining
                                                     Analytics




                    Introduction BI Session II 23.10.06Smita                  55
                                     Bhatia