
Deviation Detection for Educational OLAP

Fei He
Supervisor: Dr Martin Brown
Control Systems Centre
The University of Manchester

[Figure: database cube with dimensions Learner, Question and Event]
y = a learner's feedback on one question for an event
Is it unusual (good or bad)?
E-Learning
 • Educational learning activities are carried out over networks (Internet, LAN, or WAN) using electronic platforms (WebCT, Blackboard, etc.)
 • A tool for helping users find the best and most appropriate ways of learning
 • Feedback (learner/event portfolio: assessments, appraisals, reflections, event quality, etc.; compare book ratings on amazon.co.uk) gives useful information to the administrator or analyst

Challenges
 • A large amount of feedback information needs to be stored (Foundation Training in the North West: ~1000 learners, 20 assessments, 12 appraisals, 50 reflections, 40 teaching quality ratings)
 • How to extract the most useful information and support "free style" exploration


CSC 17th May 2006                                                                                    2/16
Example: Work-based Learning System

[Figure: The iSUS (intelligent signup system) learning cycle]

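Deviation Detection Scenarios
Deviation Detection: a technique for identifying abnormal or unusual data

Technique:
$d_{i_1 i_2 \cdots i_n} = \dfrac{\left| y_{i_1 i_2 \cdots i_n} - \hat{y}_{i_1 i_2 \cdots i_n} \right|}{\sigma_{i_1 i_2 \cdots i_n}}$

[Figure: Deviation Detection in the iSUS Learning Cycle; two scenarios are marked 1 and 2]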
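
The deviation measure can be illustrated with a minimal Python sketch. It assumes the measure is the absolute residual divided by the standard deviation of the prediction; the function name and the sample numbers are hypothetical and not taken from the slides.

```python
def standardized_deviation(y, y_hat, sigma):
    """Return d = |y - y_hat| / sigma for a single cell of the cube."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return abs(y - y_hat) / sigma


# Hypothetical cell: observed rating 5, predicted 3.5, prediction std 0.5.
d = standardized_deviation(5, 3.5, 0.5)
print(d)
```

A large value of d marks a cell whose feedback is far from what the model expects, in either direction (unusually good or unusually bad).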


OLAP (On-Line Analytical Processing)
Why Use It?
 • Large amounts of data need to be stored in a structured and efficient way

Definition:
 • Fast analysis of multi-dimensional information
 • Providing database structures and tools so that users can explore and analyze the relationships in the database in an intuitive, efficient way

Important Elements
 • Dimensions: categorical information (learner, event, time, teacher, question, etc.), usually organized in hierarchies
 • Facts (Measures): figures related to these dimensions (the feedback information to be analyzed, e.g. attendance, quality rating, etc.)
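
To make the distinction between dimensions and facts concrete, here is a small Python sketch. The records, the field names and the `roll_up` helper are hypothetical; the sketch only shows how one measure (a rating) can be aggregated over any chosen subset of dimensions.

```python
from collections import defaultdict

# Each fact row holds dimension values plus one measure (a quality rating).
facts = [
    {"learner": "Lisa", "event": "E1", "question": "Q1", "rating": 4},
    {"learner": "Lisa", "event": "E1", "question": "Q2", "rating": 2},
    {"learner": "Mike", "event": "E1", "question": "Q1", "rating": 4},
    {"learner": "Mike", "event": "E2", "question": "Q1", "rating": 3},
]


def roll_up(rows, dims, measure):
    """Average `measure` over all rows that share the same values of `dims`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dims)
        groups[key].append(row[measure])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


by_learner = roll_up(facts, ["learner"], "rating")
by_learner_event = roll_up(facts, ["learner", "event"], "rating")
print(by_learner)
print(by_learner_event)
```

Each call corresponds to one group-by of the cube: fewer dimensions in `dims` means a more aggregated view.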



OLAP Data Schemas

ROLAP (Relational OLAP)
 • Star schema
 • Stores data in relations; mature technology
 • Good for sparse data: no fact, no storage
[Figure: star schema with a Fact Table (holding the Measure) linked to Dimension Tables]

MOLAP (Multidimensional OLAP)
 • Data are stored in a multidimensional cube
 • Fast data retrieval and querying
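
The storage trade-off between the two schemas can be sketched in Python. This is only an illustration under simplifying assumptions: the fact rows, the dimension lists and both lookup functions are hypothetical, and a real ROLAP system would use indexed relational tables rather than a list scan.

```python
learners = ["Lisa", "Mike", "Kate"]
questions = ["Q1", "Q2"]

# ROLAP-style: one row per existing fact; a missing fact takes no storage.
fact_table = [
    ("Lisa", "Q1", 4),
    ("Mike", "Q2", 3),
]

# MOLAP-style: a dense cube with one cell per combination of dimension values.
cube = [[None for _ in questions] for _ in learners]
for learner, question, rating in fact_table:
    cube[learners.index(learner)][questions.index(question)] = rating


def rolap_lookup(learner, question):
    """Scan the fact rows for a matching pair of dimension values."""
    for row_learner, row_question, rating in fact_table:
        if (row_learner, row_question) == (learner, question):
            return rating
    return None


def molap_lookup(learner, question):
    """Address the cell directly by its position in the cube."""
    return cube[learners.index(learner)][questions.index(question)]


print(len(fact_table), "stored rows versus", len(learners) * len(questions), "cube cells")
```

With sparse data the fact table stays small, while the cube reserves a cell for every combination but answers a lookup by direct addressing.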


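Linear Modeling
Additive Fit Model

$\hat{Y}_{ij} = \mu + \alpha_i + \beta_j$

$\min \sum_{i=0}^{m} \sum_{j=0}^{n} (Y_{i,j} - \hat{Y}_{i,j})^2$ subject to $\sum_{i=0}^{m} \alpha_i = 0$ and $\sum_{j=0}^{n} \beta_j = 0$

Observed feedback (rows: Question, columns: People)

             Lisa   Mike   Kate   Peter   Sally
Question1     4      4      5      4       3
Question2     2      3      3      2       1
Question3     1      1      2      1       2
Question4     3      3      4      3       3

Model predictions with row and column effects

                 Lisa              Mike              Kate              Peter             Sally             Row Effects
Question1        $\hat{Y}_{1,1}$   $\hat{Y}_{1,2}$   $\hat{Y}_{1,3}$   $\hat{Y}_{1,4}$   $\hat{Y}_{1,5}$   $\alpha_1$
Question2        $\hat{Y}_{2,1}$   $\hat{Y}_{2,2}$   $\hat{Y}_{2,3}$   $\hat{Y}_{2,4}$   $\hat{Y}_{2,5}$   $\alpha_2$
Question3        $\hat{Y}_{3,1}$   $\hat{Y}_{3,2}$   $\hat{Y}_{3,3}$   $\hat{Y}_{3,4}$   $\hat{Y}_{3,5}$   $\alpha_3$
Question4        $\hat{Y}_{4,1}$   $\hat{Y}_{4,2}$   $\hat{Y}_{4,3}$   $\hat{Y}_{4,4}$   $\hat{Y}_{4,5}$   $\alpha_4$
Column Effects   $\beta_1$         $\beta_2$         $\beta_3$         $\beta_4$         $\beta_5$

For example: $\hat{Y}_{2,4} = \mu + \alpha_2 + \beta_4$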
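
The additive fit can be reproduced on the ratings table with a short Python sketch. It relies on a standard result: for a complete table, the least-squares solution under the sum-to-zero constraints is the grand mean plus the row-mean and column-mean offsets. The function name `fit_additive` is hypothetical.

```python
# Rows are Question1..Question4; columns are Lisa, Mike, Kate, Peter, Sally.
ratings = [
    [4, 4, 5, 4, 3],
    [2, 3, 3, 2, 1],
    [1, 1, 2, 1, 2],
    [3, 3, 4, 3, 3],
]


def fit_additive(table):
    """Least-squares additive fit for a complete two-way table."""
    m, n = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (m * n)
    # Row effects: row mean minus grand mean.
    alpha = [sum(row) / n - mu for row in table]
    # Column effects: column mean minus grand mean.
    beta = [sum(table[i][j] for i in range(m)) / m - mu for j in range(n)]
    return mu, alpha, beta


mu, alpha, beta = fit_additive(ratings)
# Prediction for Question2 / Peter (row 2, column 4 in 1-based indexing).
y_hat_24 = mu + alpha[1] + beta[3]
print(round(mu, 3), round(y_hat_24, 3))
```

Because both sets of effects are offsets from the grand mean, they sum to zero by construction, which matches the constraints of the model.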


Non-linear Modeling
Multiplicative Fit Model

$\hat{Y}_{i,j} = h \cdot a_i \cdot b_j$

$\min \sum_{i=0}^{m} \sum_{j=0}^{n} (Y_{i,j} - \hat{Y}_{i,j})^2$ subject to $\sum_{i=0}^{m} a_i = 0$ and $\sum_{j=0}^{n} b_j = 0$

More General:

$\hat{Y}_{i,j} = \mu + h \cdot a_i \cdot b_j$

Nonlinear Transformation Model

$\hat{Y}_{i,j} = \dfrac{1}{1 + \exp(-(\mu + \alpha_i + \beta_j))}$

Hybrid (Additive-Plus-Multiplicative) Model

$\hat{Y}_{i,j} = m + \alpha_i + \beta_j + k\,\alpha_i \beta_j$
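
The model forms on this slide can be written as plain prediction functions. The Python sketch below only evaluates the formulas for given parameters; it does not fit them. The function names are hypothetical, and the formulas follow a reading of the slide in which the multiplicative model is a product of effects, the transformation is logistic, and the hybrid model adds an interaction term.

```python
import math


def multiplicative(h, a_i, b_j):
    """Multiplicative fit: product of a scale and two effects."""
    return h * a_i * b_j


def logistic_transform(mu, alpha_i, beta_j):
    """Nonlinear transformation of the additive predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(mu + alpha_i + beta_j)))


def hybrid(m, alpha_i, beta_j, k):
    """Additive terms plus a multiplicative interaction weighted by k."""
    return m + alpha_i + beta_j + k * alpha_i * beta_j


print(multiplicative(2.0, 0.5, -0.5))
print(logistic_transform(0.0, 0.0, 0.0))
print(hybrid(2.0, 0.5, -0.5, 2.0))
```

Setting k to zero in the hybrid model recovers the additive model, which is one way to see it as a strict extension.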




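Prediction Standard Deviation (Confidence Intervals)

For Linear Additive Model
 Observation: $Y = X\theta + e$
 Model Prediction: $\hat{Y}_{ij} = \mu + \alpha_i + \beta_j$, that is, $\hat{Y} = X\hat{\theta}$
 Standard Deviation of Prediction: $\mathrm{Var}(\hat{y}) = \sigma_n^2\, x^T (X^T X)^{-1} x$

 The standard deviation of the prediction is the denominator $\sigma$ of the deviation measure:
 $d_{i_1 i_2 \cdots i_n} = \dfrac{\left| y_{i_1 i_2 \cdots i_n} - \hat{y}_{i_1 i_2 \cdots i_n} \right|}{\sigma_{i_1 i_2 \cdots i_n}}$

For Nonlinear Models
 Locally Approximated by: $\hat{Y}_i = f(x_i, \hat{\theta}) \approx f(x_i, \theta) + g_i^T (\hat{\theta} - \theta)$
 According to delta method: $\mathrm{Var}(\hat{y}) = g^T \Sigma\, g \approx \sigma_n^2\, g^T H^{-1} g$
 Where,
 $H_{i,j} = \dfrac{\partial^2 Err(\theta)}{\partial \theta_i\, \partial \theta_j} = \sum_{k=1}^{N} \left( \dfrac{\partial \hat{y}_k}{\partial \hat{\theta}_i} \dfrac{\partial \hat{y}_k}{\partial \hat{\theta}_j} - [y_k - \hat{y}_k] \dfrac{\partial^2 \hat{y}_k}{\partial \hat{\theta}_i\, \partial \hat{\theta}_j} \right) \approx \sum_{k=1}^{N} \dfrac{\partial \hat{y}_k}{\partial \hat{\theta}_i} \dfrac{\partial \hat{y}_k}{\partial \hat{\theta}_j}$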
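
The delta-method variance can be sketched in Python for a model with two parameters. This is an illustration under stated assumptions: H is approximated by the sum of outer products of the prediction gradients (the last form of H on this slide), the noise variance is passed in rather than estimated, and all function names are hypothetical. A linear model is used as a check because its gradient is simply the input row, so the result should equal the linear formula.

```python
def invert_2x2(h):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("H is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def gauss_newton_hessian(gradients):
    """Approximate H by the sum of outer products g_k g_k^T."""
    h = [[0.0, 0.0], [0.0, 0.0]]
    for g in gradients:
        for r in range(2):
            for c in range(2):
                h[r][c] += g[r] * g[c]
    return h


def prediction_variance(gradients, g_new, noise_var):
    """Var(y_hat) ~= noise_var * g^T H^{-1} g for a new prediction."""
    h_inv = invert_2x2(gauss_newton_hessian(gradients))
    quad = sum(g_new[r] * h_inv[r][c] * g_new[c]
               for r in range(2) for c in range(2))
    return noise_var * quad


# Check with f(x, theta) = theta0 + theta1 * x, whose gradient is (1, x).
train_gradients = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
var = prediction_variance(train_gradients, (1.0, 1.0), noise_var=1.0)
print(var)
```

The square root of this variance is the prediction standard deviation that the deviation measure divides by.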

Missing Data

             Lisa   Mike   Kate   Peter   Sally
Question1     4      4      5      4       3
Question2     2      3      -      2       1
Question3     -      1      2      1       2
Question4     3      3      4      3       3

(- marks a missing value)

 • Listwise Deletion: omit the cells with missing data and analyze the remainder. Easy to implement and leads to unbiased parameter estimates, but substantially decreases the sample size available for analysis.

 • Mean Substitution: replace all missing data in a variable by the mean of that variable over the remaining data. Reduces the variance of the variable.

 • Expectation Maximization: an iterative procedure based on the principle of maximum likelihood estimation. Expectation: compute the expected value of the missing data given the parameters. Maximization: maximize the likelihood function and estimate new parameters given the completed data set.
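
The Expectation Maximization idea can be sketched in Python for the additive model. This is a simplified, EM-style illustration and not necessarily the procedure used in the original work: it assumes Gaussian noise with a fixed variance, so the expectation step fills each missing cell with the current model prediction and the maximization step refits the additive model by least squares. The function names are hypothetical.

```python
def fit_additive(table):
    """Least-squares additive fit for a complete two-way table."""
    m, n = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (m * n)
    alpha = [sum(row) / n - mu for row in table]
    beta = [sum(table[i][j] for i in range(m)) / m - mu for j in range(n)]
    return mu, alpha, beta


def em_impute(table, max_iter=100, tol=1e-9):
    """Alternate between refitting the model and re-filling missing cells."""
    missing = [(i, j) for i, row in enumerate(table)
               for j, v in enumerate(row) if v is None]
    observed = [v for row in table for v in row if v is not None]
    start = sum(observed) / len(observed)
    # Initial guess: the mean of the observed cells.
    filled = [[start if v is None else float(v) for v in row] for row in table]
    for _ in range(max_iter):
        mu, alpha, beta = fit_additive(filled)      # maximization step
        change = 0.0
        for i, j in missing:                        # expectation step
            new_value = mu + alpha[i] + beta[j]
            change = max(change, abs(new_value - filled[i][j]))
            filled[i][j] = new_value
        if change < tol:
            break
    return filled


# The table from this slide, with None for the two missing cells.
ratings = [
    [4, 4, 5, 4, 3],
    [2, 3, None, 2, 1],
    [None, 1, 2, 1, 2],
    [3, 3, 4, 3, 3],
]
completed = em_impute(ratings)
print([round(v, 3) for v in completed[1]])
```

Unlike mean substitution, the imputed values respect both the row effect and the column effect of the missing cell.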

High Dimensional Model
Real data usually have more than 2 or 3 dimensions.

 • Additive Model
   $\hat{Y}_{i,j} = \mu + \alpha_i + \beta_j$
   $\hat{Y}_{i,j,k} = \mu + \alpha_i + \beta_j + \gamma_k$

 • Multiplicative Model
   $\hat{Y}_{i,j} = q + h\,a_i b_j$
   $\hat{Y}_{i,j,k} = q + h\,a_i b_j c_k$

 • Hybrid Model (Additive-plus-Multiplicative)
   $\hat{Y}_{i,j} = m + \alpha_i + \beta_j + k\,\alpha_i \beta_j$
   $\hat{Y}_{i,j,k} = p + a_i + b_j + c_k + h\,a_i b_j + l\,a_i c_k + m\,b_j c_k + n\,a_i b_j c_k$
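
The three-dimensional additive model can be fitted in the same way as the two-dimensional one. The Python sketch below assumes a complete cube, for which the least-squares effects are the marginal means minus the grand mean. The function name and the synthetic cube are hypothetical.

```python
from itertools import product


def fit_additive_3d(cube):
    """Additive fit for a complete cube indexed as cube[i][j][k]."""
    ni, nj, nk = len(cube), len(cube[0]), len(cube[0][0])
    cells = list(product(range(ni), range(nj), range(nk)))
    mu = sum(cube[i][j][k] for i, j, k in cells) / len(cells)
    # Each effect is the mean of one slice of the cube minus the grand mean.
    alpha = [sum(cube[i][j][k] for j in range(nj) for k in range(nk)) / (nj * nk) - mu
             for i in range(ni)]
    beta = [sum(cube[i][j][k] for i in range(ni) for k in range(nk)) / (ni * nk) - mu
            for j in range(nj)]
    gamma = [sum(cube[i][j][k] for i in range(ni) for j in range(nj)) / (ni * nj) - mu
             for k in range(nk)]
    return mu, alpha, beta, gamma


# Synthetic cube built from known effects that each sum to zero.
true_a, true_b, true_c = [-1.0, 1.0], [-0.5, 0.0, 0.5], [0.25, -0.25]
cube = [[[3.0 + a + b + c for c in true_c] for b in true_b] for a in true_a]

mu, alpha, beta, gamma = fit_additive_3d(cube)
print(mu, alpha, beta, gamma)
```

On data that are exactly additive the fit recovers the generating effects, so every residual is zero and no cell would be flagged as a deviation.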



Defining Different Deviations

Real users are interested in:
 • Which data are deviations within their respective group-bys
 • Which data need to be drilled down further to find deviations beneath them
 • Which path beneath the data contains the largest deviation and most needs to be drilled down
These questions can be answered by the following three definitions: SelfDev, InDev and PathDev




Defining Different Deviations

 • SelfDev: denotes the exception value of a cell relative to other cells in the same group-by,

   $\mathrm{SelfDev}(y_{i_1 i_2 \cdots i_N}) = \max\left( \dfrac{\left| y_{i_1 i_2 \cdots i_N} - \hat{y}_{i_1 i_2 \cdots i_N} \right|}{\sigma_{i_1 i_2 \cdots i_N}} - t,\ 0 \right)$

 • InDev: denotes the total degree of surprise of all cells reachable by drill-down from this cell. InDev can be calculated as the maximum SelfDev value of all elements beneath this cell.

 • PathDev: denotes the degree of surprise of each drill-down path from this cell. For one particular path, its PathDev is defined as the maximum SelfDev of all cells reachable by drilling down along this path.
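
The three definitions can be sketched in Python as follows. The sketch reads t as a threshold on the standardized deviation, which is an assumption since the slides do not define it, and it treats InDev as the maximum over all drill-down paths. The function names and the sample values are hypothetical.

```python
def self_dev(y, y_hat, sigma, t):
    """Standardized deviation of a cell in excess of the threshold t."""
    return max(abs(y - y_hat) / sigma - t, 0.0)


def path_dev(self_devs_on_path):
    """Maximum SelfDev of the cells reachable along one drill-down path."""
    return max(self_devs_on_path, default=0.0)


def in_dev(self_devs_by_path):
    """Maximum SelfDev over every cell beneath this cell, on any path."""
    return max((path_dev(v) for v in self_devs_by_path.values()), default=0.0)


# Hypothetical SelfDev values beneath one cell, grouped by drill-down path.
beneath = {
    "Question": [1, 1, 3, 2, 1],
    "Event": [0, 2, 1],
}
print(self_dev(5, 3.5, 0.5, 2.5))
print({path: path_dev(values) for path, values in beneath.items()})
print(in_dev(beneath))
```

In this example the Question path has the larger PathDev, so it is the more promising direction to drill down.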



Example
A three-dimensional data cube with dimensions People, Question and Event

1. Question = All, Event = All

   People     Peter   David   Kate   Daniel   James   Nick
   SelfDev      0       0      2       0        0      0

2. Event = All (SelfDev by People and Question)

   People     Q1   Q2   Q3   Q4   Q5
   Peter       0    0    1    0    0
   David       0    0    0    1    0
   Kate        1    1    3    2    1
   Daniel      0    0    0    0    0
   James       2    0    1    1    0
   Nick        0    0    0    1    0

3. People = Kate (SelfDev of Q3 by Event)

              E1   E2   E3   E4   E5
   Q3          0    2    2    4    0
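
The drill-down in this example can be traced with a short Python sketch that always follows the largest SelfDev value. The numbers are the ones shown in the three tables; the greedy navigation rule and the helper name `largest` are assumptions made for illustration.

```python
# Step 1: SelfDev per person (Question = All, Event = All).
people_dev = {"Peter": 0, "David": 0, "Kate": 2, "Daniel": 0, "James": 0, "Nick": 0}

# Step 2: SelfDev per person and question (Event = All).
question_names = ["Q1", "Q2", "Q3", "Q4", "Q5"]
question_dev = {
    "Peter": [0, 0, 1, 0, 0],
    "David": [0, 0, 0, 1, 0],
    "Kate": [1, 1, 3, 2, 1],
    "Daniel": [0, 0, 0, 0, 0],
    "James": [2, 0, 1, 1, 0],
    "Nick": [0, 0, 0, 1, 0],
}

# Step 3: SelfDev per event for Kate and Q3.
kate_q3_event_dev = {"E1": 0, "E2": 2, "E3": 2, "E4": 4, "E5": 0}


def largest(values):
    """Return the key with the largest SelfDev value."""
    return max(values, key=values.get)


person = largest(people_dev)
question = largest(dict(zip(question_names, question_dev[person])))
event = largest(kate_q3_event_dev)
print(person, question, event)
```

Following the largest value at each level leads from Kate to Q3 and then to event E4, the most unusual cell in the tables.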



Considering Hierarchies
General Approach
 1. Calculate all the possible aggregated models
    a) Dimension indices
    b) Level indices
    c) Model's prediction
    d) SelfDev value of each cell

 2. Construct the relationships between the different kinds of models
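
One possible reading of the two steps is sketched below in Python. The hierarchies are hypothetical (each dimension has only an "All" level and a detailed level), an aggregated model is identified by one level index per dimension, and the relationship between models is taken to be a one-level drill-down in a single dimension. None of these choices are confirmed by the slides.

```python
from itertools import product

# Hypothetical hierarchies: level 0 is the most aggregated level.
hierarchies = {
    "People": ["All", "Person"],
    "Question": ["All", "Question"],
    "Event": ["All", "Event"],
}
dims = list(hierarchies)

# Step 1: every combination of one level index per dimension is one model.
models = list(product(*(range(len(hierarchies[d])) for d in dims)))


def children(model):
    """Step 2: models reached by drilling down one level in one dimension."""
    result = []
    for pos, level in enumerate(model):
        if level + 1 < len(hierarchies[dims[pos]]):
            result.append(model[:pos] + (level + 1,) + model[pos + 1:])
    return result


def describe(model):
    """Translate level indices back into level names."""
    return {d: hierarchies[d][level] for d, level in zip(dims, model)}


print(len(models))
print(describe(models[0]), "->", [describe(c) for c in children(models[0])])
```

With deeper hierarchies the number of aggregated models grows as the product of the number of levels per dimension, which is why the models are enumerated systematically.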




Summary

 • Apply OLAP techniques to the e-learning problem.
 • Discuss the deviation detection technique, propose a method for modeling and for calculating prediction confidence intervals, and extend it to the high-dimensional case with missing data.
 • Develop a general approach for the analysis and exploration of high-dimensional OLAP data.


