Docstoc

Microsoft PowerPoint - presentation_nazirmcdonald_final

Document Sample
Microsoft PowerPoint - presentation_nazirmcdonald_final Powered By Docstoc
					  Lessons learned in developing a data
warehouse for a tertiary institution in South
                  Africa
       Amer Nazir                           Supervisor
 M. Sc. Computer Science              Prof Theo McDonald



        Department of Computer Science and Informatics,
                    University of the Free State
                                                                                                       Outline
Outline

  Introduction
  Background
     Business intelligence tools and technologies
     Business intelligence tools and technologies
     Data warehousing basic concepts
     Data warehousing basic concepts
  Lessons learned
  Conclusion




                                                                                                           2
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                          Introduction
Introduction

  Industry in first world countries is experiencing a dramatic
  increase in the use of data warehousing (DW) tools and
  technologies.
  In South Africa pioneering organizations like Electricity
  Companies and Banks set up very large databases.
  Tertiary institutions have been slow to follow suit.
  Tertiary institutions have developed into large business in and
  of themselves.
  This change has resulted in a more business-like management.




                                                                                                       3
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                          Introduction
Introduction

  The environmental factors that encourage tertiary institutions
  to investigate DW options are:
     Decreases in governmental financial support
     Decreases in governmental financial support
     Faculty supplies
     Faculty supplies
     Research funding
     Research funding
     Increases in student tuitions
     Increases in student tuitions
     Competition
     Competition
     Faculty salaries
     Faculty salaries
     Faculty support and the expectations from the students, parents
     Faculty support and the expectations from the students, parents
     and employers.
     and employers.
     Universities as businesses need strategic information in order to
     Universities as businesses need strategic information in order to
     survive (Wierschem et al., 2003).
     survive (Wierschem et al., 2003).



                                                                                                       4
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                                    Background
 Evolving role of information


 Historically                          Data
                                      Data
                                    Processing
                                   Processing

DSS             1960
 Beginning

 Theory Development      1970


    DSS Implementation
                                  1980                                             Knowledge
                                                                                   Knowledge
                                                                                      Work
                                                                                     Work
        EIS


              Business
              Intelligence
                              Currently
                                                                                                                5
                    Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                           Background
Business intelligence (BI)

  Business intelligence is the process of getting enough of the
  right information in a timely manner and usable form.
  Analyzing this information in a way so that it can have a
  positive impact on company:
     Business strategy
     Tactics
     Operations (Wally, 2003).




                                                                                                       6
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                           Background
Business intelligence tools


Back-end tools
  Infrastructure tools that deal with:
  Extracting data
  Cleaning data
  Transforming data
  Re-organizing data
  Optimizing data for use in decision making (Lokken, 2001).




                                                                                                       7
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                            Background
Business intelligence tools

Front-end tools
  Designed to extract knowledge
  Designed to extract knowledge
  and insight from the data once
  and insight from the data once
  it has been prepared. These
  it has been prepared. These
  include:
  include:
  Reporting
  Reporting
  Querying
  Querying
  On-Line analysis
  On-Line analysis
  Exploration
  Exploration
  Visualization
  Visualization
  Decision modeling and
  Decision modeling and
  planning
  planning
  Data mining
  Data mining
  Portals, dashboards, and
  Portals, dashboards, and
  scorecards (Lokken, 2001).
  scorecards (Lokken, 2001).
                                                                                                        8
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                Background
Difference between information and BI

               Information                                 Business Intelligence
                                                                                Predict the success of
                                                                                Predict the success of
  Add a new student                                                             first-year students.
                                                                                first-year students.
                                    Student
  How many student
                                    Database
  enrolled in last five
  years

                                                                                What is the
                                                                                 What is the
  Add a new course                                       Business               percentage in the
                                                                                 percentage in the
                                    Courses
                                                                                increase of student
                                                                                 increase of student
  Print the list of                 Database
                                                        Intelligence            from last 5 years?
                                                                                 from last 5 years?
  courses department                                        Tools
  wise


  Add a new staff
                                        HR
                                                                               Print out the list of
                                                                               Print out the list of
  Print out the list of             Database                                   students, who are
                                                                                students, who are
  staff                                                                        more likely to pass?
                                                                                more likely to pass?



    Online Transaction Processing                         Online Analytical Processing (OLAP)
                                                                                                        9
                  (OLTP)
              Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                           Background
Data warehousing is the basis for BI


  Data warehouse empowers institutional decision makers by
  placing inquiry and analysis tools at their fingertips.
  Users can produce customized reports anytime, anywhere.
  Easy access and quick information delivery support
  administrative decisions at all levels and improve the way the
  organization does business.
  Reduces the effort once spent by the MIS department
  developing ad-hoc programs and answering questions
  (Donhardt et al., 2001).




                                                                                                       10
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                            Background
Data warehousing basis concepts

Online transaction processing (OLTP) systems
  The systems that are used to run the day-to-day business of company,
  The systems that are used to run the day-to-day business of company,
  the so called bred-and-butter systems (Ponniah, 2001).
  the so called bred-and-butter systems (Ponniah, 2001).
  OLTP systems are the backbone of the DW systems.
  OLTP systems are the backbone of the DW systems.
Data warehousing (DW) systems
  A DW is a copy of transaction data specifically structured for querying
  A DW is a copy of transaction data specifically structured for querying
  and analysis (Kimball, 2006).
  and analysis (Kimball, 2006).
  Provides an integrated and total view of the enterprise.
  Provides an integrated and total view of the enterprise.
  Makes the enterprise’s current and historical information easily
  Makes the enterprise’s current and historical information easily
  available for decision making.
  available for decision making.
  Makes decision-support transactions possible without hindering OLTP
  Makes decision-support transactions possible withoutis a subset of a
                                              A data mart hindering OLTP
  systems.
  systems.                                    data warehouse for use by a
  Presents a flexible and interactive source of strategic information.
  Presents a flexible and interactive source of strategic information.
                                              single department or function.


                                                                                                        11
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                                Background
Difference between OLTP and DW systems

    Traditional projects start with requirements and end with data.
    Data warehousing projects start with data and end with
    requirements.


Feature                      OLTP Systems                                         Data warehousing
                                                                                      systems
Data content       Current values                                        Archived, derived, summarized
Data structure     Optimized for transaction                             Optimized for complex queries
Access freelance   High                                                  Medium to low
Access type        Read, update, delete                                  Read
Usage              Predictable, repetitive
                   Predictable,                                          Adhoc, random, heuristic
                                                                         Adhoc, random,
Response time      Sub-seconds
                   Sub-seconds                                           Several seconds to minutes
Users              Large number                                          Relatively small number
                                                                                                            12
                Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                            Background
DW architecture components

  Data acquisition
  Data acquisition
  Data storage
  Data storage
  Information delivery.
  Information delivery.




                                                                                                          Report
                                    OLTP



                                                                                    Data
                                                            Staging
                                   Legacy
                                                              area             warehouse
                                                                                                          OLAP

                                    Other



                                                                                                        Data mining
                                                                                                               13
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                             Background
Data acquisition

   Data is extracted from different sources
   Data is extracted from different sources
   Moved to staging area for data preparation
   Moved to staging area for data preparation
   Finally moved to DW for loading.
   Finally moved to DW for loading.
Staging area
In staging area file is extracted by performing transformations
In staging area file is extracted by performing transformations
    like:
    like:
      • Sort
      • Sort
      • Merge
      • Merge                               OLTP
      • Resolving inconsistencies
      • Resolving inconsistencies
      • Data cleansing etc.
      • Data cleansing etc.                                                                                Data
                                                        Staging
                                                               Legacy
                                                                                          area           warehouse


                                                                Other



                                                                                                                  14
             Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                              Background
Data storage

   Data is stored using a star schema //
   Data is stored using a star schema                                                         Dimension
   dimensional model.
   dimensional model.
   One large central table called fact
   One large central table called fact
   table.
   table.
   Smaller attendant tables called                                      Dimension                 Fact    Dimension
   Smaller attendant tables called
   dimensional tables.
   dimensional tables.
 Fact table
 Fact table
  Stores the measures of
  Stores the measures of                                                                      Dimension
  the business.
  the business.
                      Dimension tables
                       Dimension tables
  Store aggregate or
  Store aggregate or
  summarized data.
  summarized data.      Represents business
                         Represents business                                                     Data
                        dimensions.
                         dimensions.
                                                                                            warehouse
                                   Attributes are of textual
                                   Attributes are of textual
                                   format.
                                   format.

                                                                                                             15
              Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                            Background
Information delivery

  Data warehouse can satisfy users of all levels of
  Data warehouse can satisfy users of all levels of
  management by providing data in the form of:
  management by providing data in the form of:
     Reports
     Reports
     Excel pivot tables
     Excel pivot tables
     Online analytical processing (OLAP)
     Online analytical processing (OLAP)
     Data mining etc.
     Data mining etc.
                                                                                                    Report


                                                                          Data
                                                                     warehouse                      OLAP




                                                                                               Data mining
                                                                                                             16
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                         Background
OLAP

   OLAP technology organizes data in multidimensional tables
   OLAP technology organizes data in multidimensional tables
   also called cubes.
   also called cubes.
   Provides access to the DW through an interactive graphical
   Provides access to the DW through an interactive graphical
   user interface.
   user interface.
   Users can drill seamlessly into information when additional
   Users can drill seamlessly into information when additional
   details are required.
   details are required.




                                                                                                     Prod uct
                                                                                                     n
                                                                                      Revenu Locatio
                                                                                            e
                                                                                                                17
         Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                          Background
Data mining

    Data mining refers to extracting or mining knowledge from
    Data mining refers to extracting or mining knowledge from
    large amounts of data.
    large amounts of data.
    Provides deeper understanding of the patterns previously
    Provides deeper understanding of the patterns previously
    unseen using current available reporting capabilities.
    unseen using current available reporting capabilities.




                                                                                                      18
          Lessons learned in developing a data warehouse for a tertiary institution in South Africa
Lessons learned
Lessons learned
  Presentation
                                                                                Lessons learned
OLTP system is not suitable for strategic information

     In 1986 in-house system was developed using IBM platform
     In 1986 in-house system was developed using IBM platform
         Programming languages that became obsolete
         Programming languages that became obsolete
         Developers having left the team
         Developers having left the team
         System was designed in patches.
         System was designed in patches.
     In 2003 purchased a new OLTP system with huge cost
     In 2003 purchased a new OLTP system with huge cost
         Numerous data errors due to data transfer
         Numerous data errors due to data transfer
         Lack of data integrity
         Lack of data integrity
         1000 of tables
         1000 of tables
         Needs customization.
         Needs customization.




                                                                                                       20
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                 Lessons learned
Third party Software is not suitable for strategic information


    VALPAC system
      Users find the interface unfriendly.
      Users find the interface unfriendly.
      Allows reporting on only two years data that is minimal for
      Allows reporting on only two years data that is minimal for
      institutional forecasting and planning.
      institutional forecasting and planning.
    HEMIS systems
      Designed basically according to the format specified by the
      Designed basically according to the format specified by the
      DOE.
      DOE.
      Does not covers institutional all level of activities.
      Does not covers institutional all level of activities.
      Complex procedures must be written for reporting.
      Complex procedures must be written for reporting.
      OLTP based database structure complex for querying and
      OLTP based database structure complex for querying and
      reporting.
      reporting.
    Data cleansing
      These systems only validate data without the support of data
      These systems only validate data without the support of data
      cleansing tools.
      cleansing tools.
                                                                                                        21
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                     Lessons learned
Problems in OLTP database can be solved in the staging area


OLTP data (20213)
 Plan Semester           Module           Location                                  Staging area
  4500      2041          RIS615            SOMPG                                         Data
            2041          RIS691              BFN                                Transformations
  4500      2042          RIS616            SOMPG
  4792      2042          RIS700              BFN
DW -- Student enrollment fact
DW Student enrollment fact
 Plan    Semester      Module             Location          Primary            Primary             Entrance
                                                            Location           Plan                Category
 4500    2041          RIS615             SOMPG             SOMPG              Primary             Transfer Postgraduate


 4500    2041          RIS691             BFN               SOMPG              Primary             Transfer Postgraduate

 4500    2042          RIS616             SOMPG             SOMPG              Secondary           Transfer Postgraduate

 4792    2042          RIS700             BFN               BFN                Primary             Entering Postgraduate




                                                                                                                           22
                Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                  Lessons learned
Data warehouse simplifies database structure


      OLTP system is designed to optimize transactional processing.
      OLTP system is designed to optimize transactional processing.
      To optimize update operations data redundancy is minimized.
      To optimize update operations data redundancy is minimized.
      Normalized database structure makes extracting data complex.
      Normalized database structure makes extracting data complex.
      Linking also creates a sever load on the database.
      Linking also creates a sever load on the database.



                                               ACAD_CAR_TBL                             STDNT_CAR_TERM
    ACAD_PROG_TBL

                              ACAD_PROG                              ACAD_GROUP_TBL
    PROG_REASON
                                                TERM_TBL                                ACAD_PLAN_TBL

    GRADING_BASIS
                                                                     ACAD_PLAN
                              TERM_VAL_TBL

                                                                                                         23
             Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                 Lessons learned
Data warehouse simplifies database structure


      The dimensional model is simple and easy to extract data
      The dimensional model is simple and easy to extract data
      without writing complex queries.
      without writing complex queries.
      It is also optimized to speed up queries.
      It is also optimized to speed up queries.

                                                  LOCATION
                                                  DIMENSION                              ENTRANCE
             STUDENT                                                                     CATEGORY
            DIMENSION                                                                    DIMENSION

                                           STUDENT ENROLLMENT
                                                  FACT


            PROGRAM                                                                      STUDENT
            DIMENSION                                                                ENROLLMENT FLAGS
                                                                                        DIMENSION
                                                     TIME
                                                  DIMENSION


                                                                                                        24
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                   Lessons learned
Start small


      The development cost of a DW can be reduced by reducing the
      The development cost of a DW can be reduced by reducing the
      overall scope of the project.
      overall scope of the project.
      The project can also be broken down into smaller components
      The project can also be broken down into smaller components
      and developed over a longer period of time.
      and developed over a longer period of time.
      Thomas (1997) research shows that higher education adopt
      Thomas (1997) research shows that higher education adopt
      bottom-up approach with:
      bottom-up approach with:
          Student data mart at first
          Student data mart at first
          Financial data mart second
          Financial data mart second
                                                           Almuni DM
          Human resources data mart third etc.
          Human resources data mart third etc.
                                                                                       Facilities DM

                                                                                               Building DM

                                                                                 Human Resources DM

                                                                                                      Financial DM

                                                                                        Student DM


                                                                                                                     25
              Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                 Lessons learned
Use what is available


      To cut down costs no special hardware or software need to be
      To cut down costs no special hardware or software need to be
      purchased.
      purchased.
      Search in-house for developers.
      Search in-house for developers.
      Gathered detail requirements before and during the
      Gathered detail requirements before and during the
      development process.
      development process.




                                                                                                        26
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                 Lessons learned
Make use of summarized granularity


      The data grain is an important characteristic of the fact table
      The data grain is an important characteristic of the fact table
      and is the level of detail for the measurements or metrics.
      and is the level of detail for the measurements or metrics.
      Some fact tables may just contain summary data and are called
      Some fact tables may just contain summary data and are called
      aggregate fact tables.
      aggregate fact tables.




 Student enrollment fact
  Student    Semester               Location            Qualification Enrolled_ Success_FTE
                                                                        FTE
   20123        2041                   BFN                    4500                     .06              .06
   20123        2042                   BFN                    4500                     .04              .04


                                                                                                              27
            Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                Lessons learned
Make use of junk dimensions


     A junk dimension is a convenient grouping of flags and
     A junk dimension is a convenient grouping of flags and
     indicators.
     indicators.
     In a student data mart junk dimensions are very useful to
     In a student data mart junk dimensions are very useful to
     enrich data for providing certain statistics.
     enrich data for providing certain statistics.




                                                                                                       28
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                Lessons learned
Get management involved


     To have a successful DW it is imperative to get management
     To have a successful DW it is imperative to get management
     involved as soon as possible.
     involved as soon as possible.
     To get support of management gradually make them
     To get support of management gradually make them
     knowledgeable of what a DW can do for them.
     knowledgeable of what a DW can do for them.
     During demonstrations give them opportunity to ask for any
     During demonstrations give them opportunity to ask for any
     student statistics.
     student statistics.
     Provide the answer immediately by just dragging, dropping and
     Provide the answer immediately by just dragging, dropping and
     drilling down in the pivot tables.
     drilling down in the pivot tables.
                                                                                   Directors
                                                               Admin                                   Managers

                                                       Business
                                                       Sponsor

                                                            User
                                                           Group
                                                                                                              29
           Lessons learned in developing a data warehouse for a tertiary institution in South Africa
                                                                                               Conclusion
Conclusion


     Current situations in South Africa are enforcing tertiary
     institutions to enter the DW arena.
     Several different challenges like:
        Low budgets
        Low budgets
        Dirty OLTP
        Dirty OLTP
        Requirement to provide statistics to the government.
        Requirement to provide statistics to the government.
        Information required for in-house strategic planning and
        Information required for in-house strategic planning and
        decision making.
        decision making.
     Lessons were shared in this study in the hope that it will
     be valuable to other tertiary institutions.




                                                                                                         30
             Lessons learned in developing a data warehouse for a tertiary institution in South Africa
Q U E S T I O N S
 A N S W E R S

				
DOCUMENT INFO