The MV Plot

A Plot for Visualizing Multivariate Data
Rida E. A. Moustafa

George Mason University
ADM Group, AAL

rmoustaf@galaxy.gmu.edu
rmustafa@aalcpas.com
Talk Outline

• The theory of the MV-plot.
• Detecting linear structures with the MV-plot.
• Detecting nonlinear structures with the MV-plot.
• Comparisons with other methods and applications to real data.
MV-Plot Theory

Given an observation $x = (x_1, x_2, \dots, x_d)$, we define $m$ and $v$ as follows:

$$m = f(x) = \frac{1}{d}\sum_{j=1}^{d} |x_j|$$

$$v = g\big(x, f(x)\big) = \sqrt{\frac{1}{d}\sum_{j=1}^{d} \big|x_j - f(x)\big|^2}$$



Computing m and v for every observation produces a vector of m values and a vector of v values.
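The two definitions translate directly into code. The sketch below is a minimal pure-Python version under stated assumptions: the helper names `mv` and `mv_transform` are hypothetical, and v is taken as the root-mean-square deviation of the coordinates from m, which is the form the 2-d and sphere derivations in this talk rely on.

```python
import math
from typing import List, Sequence, Tuple


def mv(x: Sequence[float]) -> Tuple[float, float]:
    """Return (m, v) for one observation x = (x_1, ..., x_d).

    m: mean of the absolute coordinate values.
    v: root-mean-square deviation of the coordinates from m (assumed form).
    """
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def mv_transform(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Map an n-by-d data set to the vectors (m_1, ..., m_n) and (v_1, ..., v_n)."""
    ms: List[float] = []
    vs: List[float] = []
    for row in rows:
        m, v = mv(row)
        ms.append(m)
        vs.append(v)
    return ms, vs


ms, vs = mv_transform([[0.2, 0.6], [0.5, 0.5]])
print(ms, vs)  # plotting vs against ms gives the MV-plot
```

Each observation is reduced independently, so the cost is one pass over the data.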

What is the relationship between m and v?
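MV-Relationship in 2-d

$$m_i = \frac{1}{2}\sum_{j=1}^{2} |x_{ij}| = \frac{1}{2}\big(|x_{i1}| + |x_{i2}|\big)$$

$$v_i = \sqrt{\frac{1}{2}\sum_{j=1}^{2} |x_{ij} - m_i|^2} = \frac{1}{2}\,|x_{i1} - x_{i2}| \qquad (x_{ij} \ge 0)$$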


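• Normalizing the data to the range [0, 1] makes every coordinate nonnegative, so the absolute value in m is no longer needed.
• In 2-d, the MV-plot is close to a principal-component (PC) view: m and v are the projections of $x_i$ onto the diagonal directions $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$, rescaled by $1/\sqrt{2}$ (with the absolute value taken for v).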
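A small sketch of both points above, assuming per-column min-max scaling as the normalization (the talk does not say which scheme it uses). It checks that in 2-d, m and v are the diagonal projections rescaled by $1/\sqrt{2}$. The diagonals coincide with the principal axes only for some data, for example when the two normalized variables have equal variance and nonzero correlation, which is presumably why the slide says "close to". All function names and the raw data are illustrative.

```python
import math


def minmax_normalize(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def mv_2d(x1, x2):
    """Closed-form (m, v) in 2-d, valid once both coordinates are nonnegative."""
    return (x1 + x2) / 2, abs(x1 - x2) / 2


def diagonal_projections(x1, x2):
    """Projections onto the unit diagonals (1, 1)/sqrt(2) and (1, -1)/sqrt(2)."""
    s = math.sqrt(2)
    return (x1 + x2) / s, (x1 - x2) / s


raw = [[3.0, 10.0], [5.0, 30.0], [9.0, 20.0]]  # hypothetical raw data
norm = minmax_normalize(raw)
for x1, x2 in norm:
    m, v = mv_2d(x1, x2)
    p1, p2 = diagonal_projections(x1, x2)
    # m = p1 / sqrt(2) and v = |p2| / sqrt(2): a rotated, rescaled view
    print(f"m={m:.3f}  v={v:.3f}  p1={p1:.3f}  p2={p2:.3f}")
```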
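MV-Plot Detects Linear Structure(s)

If the data are linear in the original space, $x_{i2} = w_1 x_{i1} + w_0$, then

$$m_i = \frac{1}{2}\big[(w_1 + 1)\,x_{i1} + w_0\big], \qquad v_i = \frac{1}{2}\big|(w_1 - 1)\,x_{i1} + w_0\big|.$$

If $w_1 \gg 1$, then $(w_1 + 1) \approx (w_1 - 1) \approx a_1$; with $w_0 = a_0$,

$$m_i \approx \frac{1}{2}\big(a_1 x_{i1} + a_0\big), \qquad v_i \approx \frac{1}{2}\big|a_1 x_{i1} + a_0\big|.$$

The structure is therefore linear in MV-space too. More generally, both $m_i$ and $v_i$ are affine in $x_{i1}$, so $v_i$ is a linear function of $m_i$ wherever $(w_1 - 1)x_{i1} + w_0$ keeps one sign.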
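A quick numerical check of the 2-d argument: points on a hypothetical line $x_2 = 3x_1 + 0.1$ (chosen so every coordinate is nonnegative) land exactly on a straight line in the MV-plane. The sketch uses the closed forms $m = (x_1 + x_2)/2$ and $v = |x_1 - x_2|/2$; `mv_2d` and the line parameters are illustrative.

```python
def mv_2d(x1, x2):
    """(m, v) in 2-d for nonnegative coordinates: the mean and half the gap."""
    return (x1 + x2) / 2, abs(x1 - x2) / 2


# Hypothetical line x2 = w1*x1 + w0, chosen so every coordinate stays nonnegative.
w1, w0 = 3.0, 0.1
xs = [k * 0.03 for k in range(11)]           # x1 in [0, 0.3]
points = [(x1, w1 * x1 + w0) for x1 in xs]   # x2 in [0.1, 1.0]
mv_points = [mv_2d(x1, x2) for x1, x2 in points]

# Here (w1 - 1)*x1 + w0 > 0, so m = ((w1+1)x1 + w0)/2 and v = ((w1-1)x1 + w0)/2.
# Eliminating x1 gives the straight line v = slope*m + intercept in MV-space.
slope = (w1 - 1) / (w1 + 1)
intercept = w0 * (1 - slope) / 2
for m, v in mv_points:
    print(f"m={m:.4f}  v={v:.4f}  line={slope * m + intercept:.4f}")
```

Note that the line is exact here, not only for $w_1 \gg 1$; the large-slope assumption only makes the MV-line approach $v = m$.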
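MV-Plot Detects Linear Structure(s) in d Dimensions

If $x_{id} = \sum_{j=1}^{d-1} w_j x_{ij} + w_0$, then for nonnegative data

$$m_i = \frac{1}{d}\left[\sum_{j=1}^{d-1}(w_j + 1)\,x_{ij} + w_0\right]$$

$$v_i \approx \frac{1}{d\sqrt{d-1}}\left|\sum_{j=1}^{d-1}\big((d-1)\,w_j - 1\big)\,x_{ij} + (d-1)\,w_0\right| \qquad (w_j \gg 1;\ \text{exact for } d = 2)$$

so, up to constant factors,

$$m_i \approx \sum_{j=1}^{d-1} a_j x_{ij} + a_0, \qquad v_i \approx \sum_{j=1}^{d-1} a_j x_{ij} + a_0.$$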
Detecting Linear Structure(s): Example I [figure]
Detecting Linear Structure(s): Example II [figure]
Detecting Linear Structure(s): Example III [figure]
Detecting Nonlinear Data with the MV-Plot

• The MV-plot can detect nonlinear structure in a data set with no change to the equations for m and v.
Detecting Nonlinear Structure

• $(x, \cos x) \mapsto m = \tfrac{1}{2}(x + \cos x),\ v = \tfrac{1}{2}|x - \cos x|$
• $(x, \sin x) \mapsto m = \tfrac{1}{2}(x + \sin x),\ v = \tfrac{1}{2}|x - \sin x|$
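To see the nonlinear case numerically, the sketch samples the curve $(t, \cos t)$ on $[0, 1]$, where both coordinates are positive, and feeds it through the same m and v equations. The sampling grid and helper name are illustrative. The MV-path is visibly bent: v dips toward 0 near the fixed point $t = \cos t$ (about 0.739) and then rises again.

```python
import math


def mv(x):
    """(m, v): mean absolute value and RMS deviation from m (assumed form of v)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


# Sample the curve (t, cos t) on [0, 1]; cos t > 0 there, so no sign issues.
ts = [k / 10 for k in range(11)]
curve = [mv((t, math.cos(t))) for t in ts]
for t, (m, v) in zip(ts, curve):
    print(f"t={t:.1f}  m={m:.4f}  v={v:.4f}")
```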
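Detecting Sphere(s)

Case I:
• The sphere has radius $R$.
• The sphere is centered at the origin.

For nonnegative coordinates (so that $m_i$ is the coordinate mean),

$$v_i^2 = \frac{1}{d}\sum_{j=1}^{d}(x_{ij} - m_i)^2 = \frac{1}{d}\left(\sum_{j=1}^{d} x_{ij}^2 - d\,m_i^2\right),$$

and since $\sum_{j} x_{ij}^2 = R^2$ on the sphere,

$$v_i^2 + m_i^2 = \frac{R^2}{d}.$$

The sphere therefore maps onto an arc of the circle of radius $R/\sqrt{d}$ centered at the origin of the MV-plane.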
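A numerical check of Case I, assuming points are drawn on the part of the sphere that lies in the nonnegative orthant (sampled by normalizing absolute Gaussian vectors, a standard trick, not the talk's procedure). Every point should satisfy $v^2 + m^2 = R^2/d$. The dimension, radius, seed, and helper names are illustrative.

```python
import math
import random


def mv(x):
    """(m, v): mean absolute value and RMS deviation from m (assumed form of v)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def sphere_points(n, d, radius, rng):
    """Points on the sphere of the given radius, restricted to the nonnegative orthant."""
    pts = []
    for _ in range(n):
        g = [abs(rng.gauss(0.0, 1.0)) for _ in range(d)]
        norm = math.sqrt(sum(gj * gj for gj in g))
        pts.append([radius * gj / norm for gj in g])
    return pts


rng = random.Random(0)
d, R = 5, 2.0
for x in sphere_points(5, d, R, rng):
    m, v = mv(x)
    print(f"m={m:.4f}  v={v:.4f}  v^2+m^2={v * v + m * m:.6f}  R^2/d={R * R / d:.6f}")
```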
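Detecting Sphere(s)

Case II:
• The sphere has radius $R$.
• The sphere is centered at $x_c$, not at the origin.

$$v_i^2 = \frac{1}{d}\sum_{j=1}^{d}\big(x_{ij} - x_{cj} + x_{cj} - m_i\big)^2 = \frac{1}{d}\sum_{j=1}^{d}(x_{ij} - x_{cj})^2 + \frac{2}{d}\sum_{j=1}^{d}(x_{ij} - x_{cj})(x_{cj} - m_i) + \frac{1}{d}\sum_{j=1}^{d}(x_{cj} - m_i)^2$$

If the center lies on the main diagonal, $x_c = (c, \dots, c)$, and the coordinates are nonnegative, the first term is $R^2/d$, the cross term equals $-2(m_i - c)^2$, and the last term equals $(m_i - c)^2$, so

$$v_i^2 + (m_i - c)^2 = \frac{R^2}{d}.$$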
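A numerical check of Case II under the diagonal-center assumption, $x_c = (c, \dots, c)$, which is our assumption for the closed form $v^2 + (m - c)^2 = R^2/d$. Choosing $c \ge R$ keeps every coordinate nonnegative so that m is the coordinate mean. The parameters, seed, and helper names are illustrative.

```python
import math
import random


def mv(x):
    """(m, v): mean absolute value and RMS deviation from m (assumed form of v)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def diagonal_sphere_points(n, d, radius, c, rng):
    """Points on the sphere of the given radius centered at (c, ..., c)."""
    pts = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(gj * gj for gj in g))
        pts.append([c + radius * gj / norm for gj in g])
    return pts


rng = random.Random(0)
d, R, c = 4, 1.0, 2.0  # c >= R keeps every coordinate nonnegative
for x in diagonal_sphere_points(5, d, R, c, rng):
    m, v = mv(x)
    lhs = v * v + (m - c) ** 2
    print(f"m={m:.4f}  v={v:.4f}  v^2+(m-c)^2={lhs:.6f}  R^2/d={R * R / d:.6f}")
```

In the MV-plane this is a circle of radius $R/\sqrt{d}$ centered at $(c, 0)$, a shifted copy of the Case I picture.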
Detecting Sphere(s) [figure]
Application to Real Data

• Fisher's Iris data (150 × 4): 3 classes of 50 points each
• Process control data (600 × 60): 6 classes of 100 points each
• Pollen data (3,848 × 5) (Wegman's data): 2 classes (linear and nonlinear)
Related Dimension-Reduction Methods

• Multidimensional Scaling
• Fisher Discriminant Analysis
• Principal Component Analysis
Iris (R. A. Fisher) Dataset: 150 cases in 4 dimensions [figure]
Time Series Dataset: 600 cases in 60 dimensions [figure]
Pollen Dataset: 3,848 points in 5 dimensions [figure]

Other methods require more storage and computation time. Even if they run, we expect poor results on this particular data set.

(Wegman, 2002)
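The storage argument can be made concrete. Classical multidimensional scaling works from an n × n dissimilarity matrix (about 14.8 million entries for the 3,848 pollen points), whereas m and v need only one pass over the rows with O(d) working memory per observation. The sketch below streams rows from any CSV file object; `stream_mv` is a hypothetical helper name, and the three-row string merely stands in for a large file.

```python
import csv
import io
import math


def mv(x):
    """(m, v): mean absolute value and RMS deviation from m (assumed form of v)."""
    d = len(x)
    m = sum(abs(xj) for xj in x) / d
    v = math.sqrt(sum((xj - m) ** 2 for xj in x) / d)
    return m, v


def stream_mv(rows):
    """Yield (m, v) row by row without holding the whole data set in memory."""
    for row in rows:
        yield mv([float(value) for value in row])


# Stand-in for a large file on disk; an open() file object works the same way.
text = "0.2,0.6,0.1\n0.5,0.5,0.5\n0.9,0.0,0.3\n"
for m, v in stream_mv(csv.reader(io.StringIO(text))):
    print(f"{m:.4f} {v:.4f}")
```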
Pollen Dataset: Mixed Linear and Nonlinear Structures [figure]
The Linear Structure in the Pollen Data Set [figure]

17 + 16 + 18 + 17 + 14 + 16 = 98 linear points and 3,750 nonlinear points (98 + 3,750 = 3,848).
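Summary

• The MV-algorithm can discover linear and nonlinear patterns at the same time.
• The MV-algorithm can detect symmetric structures such as spheres.
• The MV-algorithm handles large multivariate data sets: computing m and v takes one pass over the data and O(d) work per observation.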

				