Basic principles of probability theory

					                   Proximity matrices and scaling
•   Purpose of scaling
•   Similarities and dissimilarities
•   Classical Euclidean scaling
•   Non-Euclidean scaling
•   Horseshoe effect
•   Non-Metric Scaling
•   Example
                                   Proximity matrices
There are many situations in which dissimilarities (or similarities) between individuals are
      measured. The purpose of multidimensional scaling is to find the relative positions (in
      some sense) of the individuals. If the dissimilarities are ordinary Euclidean distances,
      the purpose can be to find relative coordinates of the individuals. Scaling is also used
      for seriation, e.g. in archaeology and history. In this case only a one-dimensional
      configuration is sought.
In general a dissimilarity matrix is called a proximity matrix, i.e. it describes how close the
      individuals are to each other.
Euclidean distances are the simplest of all and they are relatively easy to work with. Distances
      are defined by the conditions:
                              1) $d_{ii} = 0$,   2) $d_{ij} = d_{ji}$
The matrix whose elements are the squares of these distances is denoted by D. This distance
      matrix is said to be Euclidean if the n points corresponding to it can be embedded in a
      Euclidean space so that the distances between the points in that space are equal to the
      corresponding $d_{ij}$ (the definition can be modified to include weights). In an
      orthogonal coordinate system the Euclidean distances can be written as:
      $$d_{ij}^2 = \sum_{k=1}^{n} (x_{ik} - x_{jk})^2$$
If the distances are Euclidean then there is an elegant solution to the problem of finding a
      configuration from the distance matrix.
                       Similarity and dissimilarity
There are situations when similarities, rather than distances (dissimilarities), between the
   objects are given. Similarities can be defined as a matrix with elements $c_{rs}$ that
   satisfy the conditions:
                  1) $c_{rs} = c_{sr}$,   2) $c_{rs} \le c_{rr}$ for any r and s
For example, a correlation matrix can be considered as a similarity matrix. We can
   convert similarities to dissimilarities (distances) using:
   $$d_{rs} = (c_{rr} + c_{ss} - 2c_{rs})^{1/2}$$
The resulting matrix obeys the conditions required of a distance matrix ($d_{rr} = 0$ and
   $d_{rs} = d_{sr}$). If the similarity matrix C is positive semidefinite then the derived
   distance matrix is Euclidean.
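The conversion can be tried out with a short R sketch. The function name sim_to_dist and the random data are hypothetical, chosen only for illustration; a correlation matrix is used as the similarity matrix, as suggested above.

```r
# Convert a similarity matrix C into distances d_rs = sqrt(c_rr + c_ss - 2 * c_rs).
sim_to_dist <- function(C) {
  cd <- diag(C)
  d2 <- outer(cd, cd, "+") - 2 * C   # squared distances c_rr + c_ss - 2 c_rs
  d2[d2 < 0] <- 0                    # guard against tiny negative rounding errors
  sqrt(d2)
}

set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)  # hypothetical data: 10 cases, 4 variables
C <- cor(X)                                   # correlation matrix used as similarities
d <- sim_to_dist(C)
print(round(d, 3))

# C is positive semidefinite, so d should be Euclidean: the doubly centred
# matrix of -d^2/2 should have no clearly negative eigenvalues.
H <- diag(4) - 1 / 4                          # I - 11^T/n with n = 4
Q <- H %*% (-0.5 * d^2) %*% H
min_eig <- min(eigen(Q, symmetric = TRUE)$values)
print(min_eig)
```

The eigenvalue check at the end uses the double centring that is derived in the metric scaling section.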
                                                           Metric scaling
Suppose that we have an n×n matrix of squared pairwise distances that are Euclidean. Denote
     this matrix by D and its elements by $d_{rs}^2$. We want to find n points in k-dimensional
     Euclidean space so that the squared distances between these points are equal to the
     elements of D. Denote the n×k matrix of point coordinates by X and define $Q = XX^T$.
     Then for the distances and the elements of Q we can write:
     $$d_{rs}^2 = \sum_{j=1}^{k} (x_{rj} - x_{sj})^2, \qquad q_{rs} = \sum_{j=1}^{k} x_{rj} x_{sj}$$
Then the following relation between elements of D and Q can be written:
     $$d_{rs}^2 = q_{rr} + q_{ss} - 2q_{rs}$$
We can find the positions of the points if we assume that the centroid of the rows of X is 0, i.e.
     $$\sum_{r=1}^{n} x_{rj} = 0 \quad \text{for each } j$$
Then we can write the following relations:
     $$\sum_{r=1}^{n} d_{rs}^2 = \sum_{r=1}^{n} q_{rr} + n q_{ss} - 2\sum_{r=1}^{n} q_{rs} = \sum_{r=1}^{n} q_{rr} + n q_{ss}$$
Here we used the fact that the centroid of the points is 0, so that $\sum_{r=1}^{n} q_{rs} = 0$.
     Furthermore we can write:
     $$\sum_{r=1}^{n}\sum_{s=1}^{n} d_{rs}^2 = n\sum_{r=1}^{n} q_{rr} + n\sum_{s=1}^{n} q_{ss} = 2n\sum_{r=1}^{n} q_{rr}$$
Using these identities we can express the diagonal elements of Q using the elements of D:
     $$\sum_{r=1}^{n} q_{rr} = \frac{1}{2n}\sum_{r=1}^{n}\sum_{s=1}^{n} d_{rs}^2, \qquad q_{ss} = \frac{1}{n}\sum_{r=1}^{n} d_{rs}^2 - \frac{1}{2n^2}\sum_{r=1}^{n}\sum_{t=1}^{n} d_{rt}^2$$
                                    Metric scaling: Cont.
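Let us denote the diagonal matrix of the diagonal elements of Q by $E = \mathrm{diag}(q_{11}, q_{22}, \ldots, q_{nn})$
      and the column vector of n ones by $\mathbf{1}$. Then the relation between elements of D
      and Q can be written in matrix form:
      $$D = \mathbf{1}\mathbf{1}^T E + E\,\mathbf{1}\mathbf{1}^T - 2Q$$
If we use the relation between the diagonal terms of Q and the elements of D we can write:
      $$\mathbf{1}\mathbf{1}^T E = \mathbf{1}\mathbf{1}^T D/n - \mathbf{1}\mathbf{1}^T D\,\mathbf{1}\mathbf{1}^T/(2n^2)$$
      $$E\,\mathbf{1}\mathbf{1}^T = D\,\mathbf{1}\mathbf{1}^T/n - \mathbf{1}\mathbf{1}^T D\,\mathbf{1}\mathbf{1}^T/(2n^2)$$
Then we can write the relation between Q and D:
      $$(I - \mathbf{1}\mathbf{1}^T/n)\, D\, (I - \mathbf{1}\mathbf{1}^T/n) = -2Q = -2XX^T$$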
Thus we see that if we have the matrix of squared dissimilarities D we can find the matrix Q
     using the relation:
     $$Q = (I - \mathbf{1}\mathbf{1}^T/n)\,\Delta\,(I - \mathbf{1}\mathbf{1}^T/n), \qquad \Delta = -\frac{1}{2} D$$
Since D is Euclidean, the matrix Q is positive semi-definite and symmetric. Then we can use the
      spectral decomposition of Q (decomposition using eigenvalues and eigenvectors):
      $$Q = T \Lambda T^T$$
where $\Lambda$ is the diagonal matrix of eigenvalues and the columns of T are the
      corresponding eigenvectors. If Q is positive semi-definite then all eigenvalues are
      non-negative. We can write (recall the form $Q = XX^T$):
      $$Q = T\Lambda^{1/2} (T\Lambda^{1/2})^T \quad\Rightarrow\quad X = T\Lambda^{1/2}$$
This gives the matrix of coordinates of the n points (principal coordinates). Of course this
      configuration is not unique: it is determined only up to rotation and reflection.
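The derivation can be checked numerically. The following R sketch uses a small random configuration (hypothetical data, and the variable names are chosen only for illustration): it builds the squared distance matrix, applies the double centring and recovers coordinates from the eigen decomposition.

```r
set.seed(42)
n <- 6; k <- 2
X <- matrix(rnorm(n * k), n, k)               # hypothetical configuration
X <- scale(X, center = TRUE, scale = FALSE)   # put the centroid at the origin
D <- as.matrix(dist(X))^2                     # matrix of squared distances

H <- diag(n) - matrix(1, n, n) / n            # I - 11^T/n
Q <- H %*% (-0.5 * D) %*% H                   # Q = -1/2 H D H

e <- eigen(Q, symmetric = TRUE)
Xhat <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))   # X = T Lambda^(1/2)

err_Q <- max(abs(Q - X %*% t(X)))             # Q should equal X X^T
err_d <- max(abs(as.vector(dist(Xhat)) - as.vector(dist(X))))
print(c(err_Q = err_Q, err_d = err_d))        # both should be near zero
```

The recovered coordinates Xhat need not equal X itself. Only the distances between the points are compared, because the configuration is determined up to rotation and reflection.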
                                 Metric scaling: Cont.
     The algorithm for metric scaling (finding principal coordinates) can be written as
          follows:
     1)   Given a matrix of dissimilarities, form the matrix $\Delta$ with elements $-\frac{1}{2} d_{ij}^2$
     2)   Subtract from each element of this matrix the averages of the row and the column
          in which it is located. Add to each element the total average. Denote the result by Q
     3)   Find the eigenvalues and corresponding eigenvectors of Q. The dimensionality
          of the representation corresponds to the number of non-zero eigenvalues of this
          matrix.
     4)   Normalise the eigenvectors so that:
          $v_i^T v_i = \lambda_i$, where the $\lambda_i$ are the eigenvalues and the $v_i$ are the eigenvectors

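The four steps above can be sketched in R as follows. The function name principal_coordinates is hypothetical, and the built-in eurodist data set (road distances between European cities) is used only as a convenient test input. The result is compared with cmdscale through the inter-point distances, since the signs of the coordinates may differ.

```r
principal_coordinates <- function(d, k = 2) {
  d <- as.matrix(d)
  A <- -0.5 * d^2                                   # step 1
  Q <- sweep(A, 1, rowMeans(A))                     # step 2: subtract row means,
  Q <- sweep(Q, 2, colMeans(A)) + mean(A)           #   column means, add total mean
  e <- eigen(Q, symmetric = TRUE)                   # step 3
  lambda <- e$values[seq_len(k)]
  # step 4: eigen() returns unit-length eigenvectors; rescale each one
  # by sqrt(lambda_i) so that its squared length equals lambda_i
  points <- sweep(e$vectors[, seq_len(k), drop = FALSE], 2, sqrt(lambda), "*")
  rownames(points) <- rownames(d)
  list(points = points, eig = e$values)
}

pc  <- principal_coordinates(eurodist, k = 2)
ref <- cmdscale(eurodist, k = 2)
diff_max <- max(abs(as.vector(dist(pc$points)) - as.vector(dist(ref))))
print(diff_max)    # should be near zero
```

The sketch assumes that the first k eigenvalues are positive. It does not handle the non-Euclidean case in which some of them are negative.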



                                                  Dimensionality
“Goodness of fit” of a k-dimensional configuration to the original matrix of dissimilarities is
     measured using the “per cent trace”, defined as:
     $$\alpha_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \times 100\%$$
Note that principal coordinate analysis and principal component analysis are similar in some
      sense. If X is an n×p data matrix and we have the corresponding n×n matrix of Euclidean
      distances calculated from it, then the principal component scores coincide with the
      coordinates calculated using scaling.
It might happen that the dissimilarity matrix is not Euclidean. Then some of the calculated
      eigenvalues may be negative and the coordinate representation may include imaginary
      coordinates. If these eigenvalues are small in absolute value they can be ignored and set to 0.
If some eigenvalues are negative then the goodness of fit can be modified in two different ways:
     $$\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} |\lambda_i|} \times 100\%, \quad\text{or}\quad \frac{\sum_{i=1}^{k} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2} \times 100\%$$
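As a small illustration, the per cent trace and its two modified versions can be computed from a vector of eigenvalues. The function name fit_measures and the eigenvalues below are hypothetical, made up for the example.

```r
# Goodness-of-fit measures for a k-dimensional configuration,
# given all eigenvalues sorted in decreasing order.
fit_measures <- function(lambda, k) {
  top <- lambda[seq_len(k)]
  c(trace   = 100 * sum(top)   / sum(lambda),       # per cent trace
    abs     = 100 * sum(top)   / sum(abs(lambda)),  # absolute values in the denominator
    squared = 100 * sum(top^2) / sum(lambda^2))     # squared eigenvalues
}

lambda <- c(5, 3, 1, 0.5, -0.5)   # hypothetical eigenvalues, one of them negative
res <- fit_measures(lambda, k = 2)
print(round(res, 2))
```

With a negative eigenvalue present, the plain per cent trace overstates the fit relative to the version with absolute values, because the negative term shrinks its denominator.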
                               Horseshoe effect
In many situations distances between objects that are close to each other are measured more
   accurately, while other distances are usually measured less accurately. As a result scaling
   techniques usually pull together objects that are in reality far from each other.
   Relative positions of objects close to each other are defined better than those of objects
   far from each other. This is called the horseshoe effect. This effect is also present in
   many dimension reduction techniques.
Example: Define a 51×51 similarity matrix between objects using:
   $$c_{rs} = \begin{cases} 8 & 1 \le |r-s| \le 3 \\ \vdots & \\ 1 & 22 \le |r-s| \le 24 \\ 0 & |r-s| \ge 25 \end{cases}$$
Then the distance matrix is derived from this matrix using the relation given above. For this
   matrix the two-dimensional result of metric scaling is a horseshoe-shaped curve rather than
   a straight line.
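The example can be reproduced approximately with the R sketch below. Two things are assumptions, because the definition above elides them: the diagonal similarity is taken to be 9, and the omitted intermediate values are taken to fall by one for every three steps of |r - s|.

```r
n <- 51
lag <- abs(outer(1:n, 1:n, "-"))                 # |r - s|
# Assumed banding: 9 on the diagonal, one unit less for every 3 steps of |r - s|,
# and 0 once |r - s| reaches 25
C <- ifelse(lag >= 25, 0, 9 - ceiling(lag / 3))
cd <- diag(C)
d <- sqrt(outer(cd, cd, "+") - 2 * C)            # d_rs = (c_rr + c_ss - 2 c_rs)^(1/2)
conf <- cmdscale(d, k = 2)                       # metric scaling in two dimensions
if (interactive()) {
  plot(conf, type = "b", xlab = "coordinate 1", ylab = "coordinate 2")
}
```

Joining the points in order (type = "b") makes it easy to see whether the objects lie on a curve instead of along a line.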
                                     Other types of scaling
Classical scaling is one of the techniques used to find a configuration from the dissimilarity
      matrix. There are several other techniques. Although they do not give a direct algebraic
      solution, with modern computers they can be implemented and used. One of these techniques
      minimises the function:
      $$\sum_{i=1}^{n}\sum_{j=1}^{n} (\delta_{ij} - d_{ij})^2$$
where the $\delta_{ij}$ are the distances calculated from the configuration and the $d_{ij}$ are
      the observed dissimilarities. There is no algebraic solution. In this case 1) an initial
      dimension is chosen, 2) a starting configuration is postulated and 3) this function is
      minimised. If the dimension is changed then the whole procedure is repeated. The values
      of the function at the minimum for different dimensions can then be used in a scree plot
      to decide the dimensionality.
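A minimal sketch of this procedure in R is given below. It uses the general-purpose optimiser optim, takes the classical scaling solution as the starting configuration, and runs on hypothetical random data. The function names loss and fit_dimension are made up for the illustration.

```r
set.seed(1)
d <- dist(matrix(rnorm(30), 10, 3))   # hypothetical observed distances: 10 points in 3-D
n <- attr(d, "Size")

# Sum of squared differences between configuration distances (delta) and observed d.
# dist() holds each pair once, so this is half of the double sum over i and j.
loss <- function(par, k) {
  delta <- dist(matrix(par, n, k))
  sum((as.vector(delta) - as.vector(d))^2)
}

fit_dimension <- function(k) {
  start <- cmdscale(d, k = k)                                   # 2) starting configuration
  optim(as.vector(start), loss, k = k, method = "BFGS")$value   # 3) minimise
}

values <- sapply(1:3, fit_dimension)   # 1) repeat the procedure for each dimension
print(round(values, 4))
if (interactive()) plot(1:3, values, type = "b", xlab = "dimension", ylab = "minimum")
```

Because the data here are exactly three-dimensional, the minimum for k = 3 should be essentially zero. With real dissimilarities the scree plot is read for the dimension after which the minimum stops decreasing appreciably.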
One of the techniques uses the following modification of the calculated distances:
      $$\hat{d}_{ij} = a + b\,\delta_{ij}$$
It is then used in two types of functions. The first of them is the standardised residual sum
      of squares:
      $$\mathrm{STRESS} = \left( \sum_{i<j} (d_{ij} - \hat{d}_{ij})^2 \Big/ \sum_{i<j} d_{ij}^2 \right)^{1/2}$$
Another one is a modified version of this:
      $$\mathrm{SSTRESS} = \left( \sum_{i<j} (d_{ij}^2 - \hat{d}_{ij}^2)^2 \Big/ \sum_{i<j} d_{ij}^4 \right)^{1/2}$$
Both these functions must be minimised iteratively using one of the optimisation techniques.
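The two functions can be evaluated for a given configuration as in the sketch below. How a and b are chosen is not stated above, so fitting them by least squares is an assumption. The function name stress_measures and the random data are hypothetical.

```r
stress_measures <- function(d, conf) {
  d     <- as.vector(d)                  # observed dissimilarities, pairs i < j
  delta <- as.vector(dist(conf))         # distances calculated from the configuration
  ab    <- unname(coef(lm(d ~ delta)))   # assumption: a and b fitted by least squares
  dhat  <- ab[1] + ab[2] * delta
  c(STRESS  = sqrt(sum((d - dhat)^2) / sum(d^2)),
    SSTRESS = sqrt(sum((d^2 - dhat^2)^2) / sum(d^4)))
}

set.seed(1)
d <- dist(matrix(rnorm(30), 10, 3))      # hypothetical observed distances
res2 <- stress_measures(d, cmdscale(d, k = 2))
res3 <- stress_measures(d, cmdscale(d, k = 3))
print(rbind(k2 = res2, k3 = res3))
```

In a full implementation the configuration itself would be varied by an optimiser to minimise one of these values. Here they are only evaluated for two classical scaling solutions.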
                                      Non-metric scaling
Although Euclidean and metric scaling can be applied in a wide range of cases, there might be
      cases when the requirements of these techniques are not satisfied. For example:
1)    The dissimilarities of the objects may not be true distances but only an order of
      similarity, i.e. we can only say that for the M = n(n-1)/2 pairs of objects we should have
      $$d_{i_1 j_1} \le d_{i_2 j_2} \le \ldots \le d_{i_M j_M}$$
2)    Measurements may have such large errors that we can be sure only of the order of the
      distances
3)    When we use metric scaling we assume that a true configuration exists and that what we
      measure is an approximation to the interpoint distances of this configuration. It may
      happen that we measure some function of the interpoint distances.
These cases are usually handled using non-metric scaling. One of the techniques is to use the
      function STRESS with the constraints:
      $$d_{ij} < d_{rs} \;\Rightarrow\; \delta_{ij} \le \delta_{rs}, \quad \text{for all } i, j, r, s$$
This technique is called non-metric to distinguish it from the previously described techniques.
If observed distances are equal then the usual technique is to exclude them from the constraints.
Non-metric scaling techniques can handle missing distances simply by ignoring them.
                        Example: U.K. distances
Here is a table of some of the intercity distances:
          Bristol  Cardiff  Dover  Exeter  Hull  Leeds  London
Bristol         0
Cardiff        47        0
Dover         210      245      0
Exeter         84      121    250       0
Hull          231      251    266     305     0
Leeds         220      240    271     294    61      0
London        120      155     80     200   218    199       0
                                         Example:
Result of the metric scaling: two-dimensional coordinates (rows are in the order of the table:
      Bristol, Cardiff, Dover, Exeter, Hull, Leeds, London):



                                [,1]    [,2]
                      [1,] -66.96555 42.04003
                      [2,] -76.31011 67.37324
                      [3,] -21.27937 -162.65703
                      [4,] -136.59765 58.13424
                      [5,] 163.00725 31.67896
                      [6,] 152.04793 40.75884
                      [7,] -13.90250 -77.32827
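Coordinates of this kind can be obtained by entering the table of intercity distances as a symmetric matrix and calling cmdscale. This is a sketch, and the object names cities, uk and cc are chosen here for illustration. The signs of the columns may differ from the values printed above, since the configuration is determined only up to reflection.

```r
cities <- c("Bristol", "Cardiff", "Dover", "Exeter", "Hull", "Leeds", "London")
uk <- matrix(0, 7, 7, dimnames = list(cities, cities))
# Lower triangle of the table, filled column by column
uk[lower.tri(uk)] <- c(47, 210, 84, 231, 220, 120,   # distances from Bristol
                       245, 121, 251, 240, 155,      # from Cardiff
                       250, 266, 271, 80,            # from Dover
                       305, 294, 200,                # from Exeter
                       61, 218,                      # from Hull
                       199)                          # Leeds to London
uk <- uk + t(uk)                                     # make the matrix symmetric

cc <- cmdscale(as.dist(uk), k = 2)                   # two-dimensional metric scaling
print(cc)
if (interactive()) {
  plot(cc, type = "n", xlab = "x", ylab = "y")
  text(cc, labels = rownames(cc))
}
```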
                                                           Example: Plot
The two-dimensional plot of these coordinates gives a representation of the UK map (up to
     rotation and reflection).
[Figure: scatter plot of the two principal coordinates (y on the vertical axis), with points
     labelled Bristol, Cardiff, Dover, Exeter, Hull, Leeds and London.]
                        R commands for scaling
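In current versions of R classical metric scaling (cmdscale) is in the package stats, which is
      loaded by default; older versions of R needed library(mva).
cc = cmdscale(d, k = 2, eig = FALSE, add = FALSE, x.ret = FALSE)
d is a distance matrix, k is the number of dimensions required.
You can plot using
plot(cc)
or using the following set of commands:
x = cc[,1]   # use cc$points[,1] if eig, add or x.ret is TRUE
y = cc[,2]   # use cc$points[,2] if eig, add or x.ret is TRUE
plot(x, y, main = "Metric scaling results")
text(x, y, labels = rownames(cc))   # row names are taken from the labels of d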

It is a good idea to check whether the number of dimensions requested is
      sufficient. This can be done by requesting the eigenvalues (eig = TRUE)
      and comparing them.
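For instance, the eigenvalues can be requested as in the sketch below. The built-in eurodist data set stands in for the distance matrix d, which is an assumption made only so that the example runs on its own.

```r
cc <- cmdscale(eurodist, k = 2, eig = TRUE)   # with eig = TRUE a list is returned
print(round(cc$eig, 1))                        # all n eigenvalues, largest first
print(cc$GOF)                                  # two goodness-of-fit values reported by cmdscale
head(cc$points)                                # the coordinates are now in cc$points
```

If the first two eigenvalues are much larger than the rest, two dimensions are likely to be enough. According to the cmdscale documentation, the two GOF values divide the sum of the first k eigenvalues by the sum of the absolute values and by the sum of the positive eigenvalues respectively.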
                           R commands for scaling
Non-metric scaling can be performed using isoMDS from the library MASS
library(MASS)
?isoMDS
then you can use
cc1 = isoMDS(a, cmdscale(a, k = 2), k = 2)
Here a is a distance matrix. The second argument is the initial configuration. Then we can plot using
x = cc1$points[,1]
y = cc1$points[,2]
plot(x, y, main = "isoMDS scaling")
text(x, y, labels = rownames(cc1$points))
Another non-metric scaling command is
sammon(a, cmdscale(a, k = 2), k = 2)
If you have a data matrix X then you can calculate distances using the command dist
dist(X, method = "euclidean")
The result of this command can be used for analysis with cmdscale, isoMDS or sammon.
                                References
1)   Krzanowski, W.J. and Marriott, F.H.C. (1994) Multivariate analysis. Kendall’s
     library of statistics
2)   Mardia, K.V., Kent, J.T. and Bibby, J.M. (2003) Multivariate analysis