Two-Way MDS

            Simplest Case

Given Δ = (δ_ij), i, j = 1, 2, ..., n,
a matrix of proximities (similarities or
dissimilarities).

We seek D = (d_ij) such that F(δ_ij) ≅ d_ij,
where F is a linear, monotone, or other
specified function.

           d_ij = [ Σ_{r=1}^{R} |x_ir - x_jr|^p ]^(1/p)

                     with p ≥ 1

This is the definition of the Minkowski-p,

or Lp, metric.
         

If p = 1 : "city block"
   p = 2 : Euclidean
   p = ∞ : "maximum" or dominance metric

(A modified Minkowski-p metric is defined
for 0 < p < 1 if the 1/p power is not taken.
It does not satisfy "segmental additivity.")
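The definition above can be sketched in Python (a minimal illustration, not part of the original notes; the function name minkowski is ours, and NumPy is assumed):

```python
import numpy as np

def minkowski(x_i, x_j, p):
    """Minkowski-p (Lp) distance between two R-dimensional points.

    p = 1 gives the city-block metric, p = 2 the Euclidean metric,
    and p -> infinity the "maximum" (dominance) metric.
    """
    diffs = np.abs(np.asarray(x_i, float) - np.asarray(x_j, float))
    if np.isinf(p):
        return diffs.max()              # limiting "max" metric
    return (diffs ** p).sum() ** (1.0 / p)

x_i, x_j = [0.0, 0.0], [3.0, 4.0]
print(minkowski(x_i, x_j, 1))           # city block: 7.0
print(minkowski(x_i, x_j, 2))           # Euclidean: 5.0
print(minkowski(x_i, x_j, np.inf))      # max metric: 4.0
```

The p = ∞ case is handled as the limit, which is the componentwise maximum.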

Minkowski-p, and all metrics, satisfy the metric axioms:

positivity:    d_ij ≥ 0             ∀ i, j

reflexivity:   d_ii = 0             ∀ i

symmetry:      d_ij = d_ji          ∀ i, j

triangle
inequality:    d_ik ≤ d_ij + d_jk   ∀ i, j, k

    Segmental additivity (which is not
necessarily true for all metrics) states that,
for any pair of points i and k, there exists a
third point, j, distinct from i and k, such that

                d_ik = d_ij + d_jk

[Figure: isosimilarity contours for Minkowski-p metrics: city block, Euclidean, and "maximum" metrics]

    Segmental additivity is not satisfied by the
"modified" Minkowski-p metric mentioned
above, or by some other metrics, e.g. the arc-
length metric defined on points restricted to
a closed curve.


[Figure: points i and k, with distances d_ij, d_jk, and d_ik illustrated]

   Other metrics are possible, such as
Riemannian metrics, satisfying metric
axioms and, possibly, segmental additivity.

Definition of "Stress"

    Two forms:

    S1 = [ Σ_{i,j} (d_ij - d̂_ij)² / Σ_{i,j} d_ij² ]^(1/2)

(where summations are over the i and j for
which δ_ij is defined), and

    S2 = [ Σ_{i,j} (d_ij - d̂_ij)² / Σ_{i,j} (d_ij - d̄)² ]^(1/2),

where d̄ is the mean of the d_ij's (over "defined"
index pairs i, j).
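The two Stress formulas can be sketched in Python (a minimal illustration, not from the original notes; it assumes the distances d_ij and disparities d̂_ij are given as flat vectors over the defined pairs):

```python
import numpy as np

def stress(d, d_hat, form=1):
    """Kruskal's Stress over the defined (i, j) pairs.

    d      -- vector of configuration distances d_ij
    d_hat  -- vector of fitted disparities d^_ij = F(delta_ij)
    form=1 normalizes by sum of d_ij^2 (S1);
    form=2 normalizes by sum of (d_ij - mean d)^2 (S2).
    """
    d, d_hat = np.asarray(d, float), np.asarray(d_hat, float)
    num = ((d - d_hat) ** 2).sum()
    den = (d ** 2).sum() if form == 1 else ((d - d.mean()) ** 2).sum()
    return np.sqrt(num / den)

print(stress([1, 2, 3], [1, 2, 3], form=1))   # perfect fit: 0.0
print(stress([1, 2, 3], [1, 2, 2], form=2))
```

A perfect fit gives Stress 0 in either form; the two forms differ only in the normalization factor.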


                          
                             2
            dij dij)2
                          
                          
                          
     Sa  i, j
                          
                         
                d d 
                     
            ij a       
          i, j 
                         
                          

                 a = 1 or 2

     where d1  0
              d2  d
and d ij = F(ij)

    F is monotonic in non-metric MDS,
linear (with or without constant term) in
metric MDS, or "multivariate regression"
may be used to define F.

F defined by                           data (δ's) are
monotone regression                    ordinal scale
linear regression (with constant)      interval scale
linear regression (without constant)   ratio scale

Linear and "multivariate" (e.g. polynomial)
regression are performed by standard O.L.S.
regression methods, with δ's playing the role of
independent variables, d's that of dependent
variables.

   Monotone regression is done via the least
squares monotone regression algorithm
(MFIT) described in Kruskal (1964b).

MDSCAL and KYST use a gradient (steepest
descent) procedure to minimize Stress.

   Given X = (x_ir),

define the (negative) gradient matrix

    G = ( -∂S/∂x_ir ).

   Given X_I on the Ith iteration,

   X_{I+1} = X_I + α_I G_I,

where the "step size" α_I is defined by
procedures described in Kruskal (1964b).

Definition of Negative Gradient

    g_ir = -∂S_a/∂x_ir = Σ_j m_ij |x_jr - x_ir|^(p-2) (x_jr - x_ir),

where

    m_ij = K [ (d_ij - d̂_ij) - S_a² (d_ij - d̄_a) ] / d_ij^(p-1),

while

    K = 2 / [ S_a Σ_{i,j} (d_ij - d̄_a)² ].

    If p = 2 (Euclidean case):

    g_ir = Σ_j m_ij (x_jr - x_ir).
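The Euclidean-case gradient can be sketched in Python. This is a minimal illustration, not MDSCAL/KYST code: it assumes the form m_ij = K[(d_ij - d̂_ij) - S_a²(d_ij - d̄_a)]/d_ij with K = 2/[S_a Σ(d_ij - d̄_a)²], and the function name neg_gradient is ours:

```python
import numpy as np

def neg_gradient(X, d_hat, a=1):
    """Negative Stress gradient G = (-dS_a/dx_ir), Euclidean (p = 2) case.

    X      -- n x R configuration matrix
    d_hat  -- n x n symmetric matrix of disparities d^_ij
    a=1 uses d-bar_1 = 0; a=2 uses d-bar_2 = mean of the d_ij.
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # d_ij
    iu = np.triu_indices(n, 1)                   # defined pairs i < j
    d, dh = D[iu], d_hat[iu]
    dbar = 0.0 if a == 1 else d.mean()
    den = ((d - dbar) ** 2).sum()
    S = np.sqrt(((d - dh) ** 2).sum() / den)     # Stress S_a
    K = 2.0 / (S * den)
    M = np.zeros((n, n))
    M[iu] = K * ((d - dh) - S ** 2 * (d - dbar)) / d
    M = M + M.T                                  # m_ij is symmetric
    # g_ir = sum over j of m_ij (x_jr - x_ir)
    G = (M[:, :, None] * (X[None, :, :] - X[:, None, :])).sum(axis=1)
    return G, S
```

A small step along G (the negative gradient) should reduce the Stress of the configuration.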

   Geometric Interpretation of Gradient
Method for Optimizing Stress

   i is:       somewhat too far from j

               slightly too close to k

               too far from l

               much too far from m

[Figure: points j, k, l, m around point i, with "force vectors" m_ij v_ij pulling i toward or away from each]

    Geometric Interpretation of Gradient
Method (Euclidean case)
    Focusing on a single point i, we first
define difference vectors v_ij = x_j - x_i to the other points.

[Figure: difference vectors v_il, v_ij, v_ik from point i to points l, j, k]

   Then each difference vector, vij , is
multiplied by mij

    In general (particularly when S_a is small),
m_ij > 0 if d_ij > d̂_ij, i.e. if d_ij is too large,

while m_ij < 0 if d_ij is too small.

    (Or, more generally, m_ij tends to be
larger in algebraic value the larger d_ij is
relative to d̂_ij.)

     Thus the multiplication of vij by mij
tends to produce a force vector pulling point
i toward j if dij is too large, or away from j if
dij is too small. The greater the magnitude
of the discrepancy the greater that of the
force vector.

    Geometrically this can be pictured as:

[Figure: force vectors m_ij v_ij, m_ik v_ik, m_il v_il acting on point i (here m_ij < 0, m_ik > 0, m_il < 0), together with the resolution vector for i]

    Thus the force vectors pushing i toward
or away from each other point are added to
produce a resolution force vector for i, whose
coordinates are contained in the ith row of
the matrix G.

Then α (the step size) times this resolution
vector is added to x_i, simultaneously for all
i, by the operation X_new = X_old + αG.
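The update X_new = X_old + αG can be illustrated with a toy loop (a sketch, not the MDSCAL/KYST implementation: the gradient here is taken numerically by finite differences, and a fixed step size α stands in for Kruskal's step-size procedure):

```python
import numpy as np

def stress1(X, d_hat):
    """Stress S1 of configuration X against fixed disparities d_hat."""
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    iu = np.triu_indices(len(X), 1)
    return np.sqrt((((D - d_hat)[iu]) ** 2).sum() / (D[iu] ** 2).sum())

def iterate(X, d_hat, alpha=0.05, iters=100, h=1e-6):
    """Repeat X_new = X_old + alpha * G, with G the negative numerical gradient."""
    for _ in range(iters):
        G = np.zeros_like(X)
        for idx in np.ndindex(*X.shape):      # finite-difference -dS/dx_ir
            Xp = X.copy()
            Xp[idx] += h
            G[idx] = -(stress1(Xp, d_hat) - stress1(X, d_hat)) / h
        X = X + alpha * G
    return X
```

Starting from a perturbed configuration and iterating should drive the Stress down toward a (local) minimum.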

                   Data Options

      Full matrix, diagonal present

Lower (Upper) half matrix, diagonal present

Lower (Upper) half matrix, diagonal absent

Lower (Upper) corner matrix

[Figure: M x N DATA block forming the lower corner of an (M + N) x (M + N) matrix]

   A lower corner matrix is an M x N matrix,
treated as a submatrix of a larger
(M + N) x (M + N) matrix.

    The larger matrix is treated as
symmetric, with missing data in the N x N
and M x M diagonal submatrices. The
N x M upper corner submatrix can be filled
by symmetry, if desired. In using the corner
matrix option one must provide only the
M x N lower (N x M upper) corner matrix,
the computer program treating this as if the
larger square matrix with blocks of missing
data had been provided as input.
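The corner-matrix treatment can be sketched in Python (an illustration, not the program's code; NaN stands in for missing data, and the helper name embed_corner is ours):

```python
import numpy as np

def embed_corner(corner):
    """Treat an M x N lower corner matrix as the lower-left block of a
    symmetric (M + N) x (M + N) matrix, with missing data (NaN) in the
    N x N and M x M diagonal submatrices."""
    M, N = corner.shape
    full = np.full((M + N, M + N), np.nan)
    full[N:, :N] = corner            # M x N lower corner block
    full[:N, N:] = corner.T          # N x M upper corner, filled by symmetry
    return full
```

For example, a 3 x 2 corner matrix yields a 5 x 5 matrix whose diagonal blocks are entirely missing.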
    NOTE: In all these data options, it is
possible to indicate specific cell entries as
missing data.

          Multiple Data Matrices
            and Split Options
    It is possible to provide more than one
proximity matrix as input. (One could be a
lower corner matrix, another a full
nonsymmetric matrix, still another an upper
half matrix without diagonals, etc.)
    Multiple data matrices combined with
split options provide a great deal of
flexibility.

                  Split options
1) Split by row
    Combined with lower (upper) corner
matrix option allows multidimensional
unfolding. This combination of options
treats M x N lower corner matrix (say) as an
off-diagonal conditional proximity matrix
(data values are comparable only within
rows). This allows multidimensional
unfolding via MDSCAL/KYST.
    Data might be preference judgments of M
subjects for N stimuli, in the form of an
M x N conditional proximity matrix, since
order of stimuli (preferences) is meaningful
only within rows (subjects). Treating the data
as an M x N lower corner matrix and splitting
by rows fits Coombs's unfolding model, in
which a subject's preference is assumed
monotonically related to distance of stimulus
from subject's "ideal point" (in Joint space).
    Strictly speaking, unfolding should be
done using split by rows option with
(descending) monotone regression, but it is
sometimes done using a lower (upper) corner
matrix but not splitting by rows, and/or using
linear regression (with or without constant
term).
    WARNING! Degenerate solutions result if the
analysis is done improperly!

Internal UNFOLDING
   1) Degeneracy if S1 is used (rather than S2)
      - Can be constructed in one
        dimension! (And whether or not
        you split by rows.)

[Figure: all M subjects at one point, all N stimuli at another, so that]

           d_ij = constant
        for all values of i (subject index)
        and j (stimulus index)

   d̂_ij = same constant, ∀ i, j

so S1 = [ Σ_{i,j} (d_ij - d̂_ij)² / Σ_{i,j} d_ij² ]^(1/2) = [ 0 / non-zero ]^(1/2) = 0!

    Ergo, if S1 is used, a trivial degenerate
    solution with perfect (0) Stress (S1) is
    always possible, with either monotone
    regression or linear regression (with or
    without constant term).
    This will not work with S2, since the
normalization factor in the denominator is

    Σ_{i,j} (d_ij - d̄)²,

which will equal 0 in this case, since d_ij is
constant for all i, j (i.e., for all values of i and
j for which data were defined).

    Thus S2 would = 0/0 (undefined), but it can
be shown to approach a non-zero value (in
the limit) as this degenerate configuration is
approached. (In fact, S2 will approach a

limiting value of 1.0, which is the maximal
value, and thus distinctly non-optimal.)
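The degeneracy argument above can be checked numerically (an illustration; the variable names are ours):

```python
import numpy as np

# Degenerate one-dimensional unfolding configuration: all M subjects at one
# point, all N stimuli at another, so every subject-stimulus distance is the
# same constant.
M, N = 4, 5
d = np.ones(M * N)                    # d_ij = 1 for every defined pair
d_hat = np.ones(M * N)                # F can fit the constant exactly

s1_num = ((d - d_hat) ** 2).sum()     # numerator: 0
s1_den = (d ** 2).sum()               # S1 denominator: non-zero
print(np.sqrt(s1_num / s1_den))       # S1 = 0: "perfect" but trivial fit

s2_den = ((d - d.mean()) ** 2).sum()  # S2 denominator: 0, so S2 is 0/0
print(s2_den)
```

S1 reports a perfect fit for this trivial configuration, while the S2 denominator collapses to 0, which is exactly why S2 penalizes (in the limit) this degeneracy.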

    2) Degenerate solution is also possible if
S2 is used with (descending) monotone
regression, but without splitting by rows.

    Let i, j represent index pair
corresponding to smallest preference (largest
dissimilarity) in M x N matrix.

   Again, a degeneracy is possible in one
            1         1         1

       i                           j

                all j  j    all i  i

        dij = 1 for all i and j except i, j

while dij = 3

[Figure: distances (1 and 3) plotted against dissimilarity rank; a step-function monotone regression fits them exactly: Zero Stress!]


    (The numerator of S2 is zero, the
denominator is non-zero!)

    If there are ties for smallest preference
(largest dissimilarity) this will not work
precisely, but can probably be approached.

   It will work precisely even in this case,
   however, if the primary option for ties is
   used (rather than secondary option) in
   monotone regression.

   Options for ties

        Primary option: if δ_ij = δ_kl, then
there is no penalty in the stress function if
d̂_ij ≠ d̂_kl.

     Secondary option: in the case above, there
is a penalty, since d̂_ij must = d̂_kl.

  Conclusions re use of KYST or
MDSCAL5 for unfolding:

   1) Use S2, not S1!
   2) If using monotone regression, split by
rows!

   (Probably would be a good idea to use
secondary option for ties as well.)

Other split options

    split by groups (of rows)
    split by decks - used primarily
to allow separate regressions for
different data matrices.

Note: Regression types can be different for
each block.

   (e.g., linear with constant for one,
ascending monotone for another, and linear
without constant for a third.)

But - If multiple matrices are input, with no
split options, same regression is fit over all
data - i.e., a single regression equation is fit
to all data. Proximity data are thus treated as
comparable both within and between all
matrices.

Definition of Stress when split options are used

  Let S_ab be the Stress (type a) for block b. Then

    S_a* = [ (1/N_B) Σ_{b=1}^{N_B} S_ab² ]^(1/2),

where N_B is the number of blocks.
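Assuming the combined Stress is the root mean square of the per-block Stress values S_ab (our reading of the formula above), a one-line check:

```python
import numpy as np

def combined_stress(block_stresses):
    """Overall Stress when split options are used: the root mean square of
    the per-block Stress values S_ab (N_B = number of blocks)."""
    s = np.asarray(block_stresses, float)
    return np.sqrt((s ** 2).mean())

print(combined_stress([0.1, 0.3]))
```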

Weights for data values
    In addition to missing data options
already discussed, it is possible to provide
continuously valued weights for each data
value. These can either be input as a
separate data array, or computed as a
function of data values.

Use of split by block option for hybrid analysis:


   - "hybrid" analysis midway
     between metric and nonmetric
   - Input two copies of the same data
     matrix; use monotone
     regression on one, linear
     regression on the other.
Monotone Regression Algorithm
    Assume ascending (non-decreasing)
monotone regression.

        1) Order the distances (dependent
variable) in the same order as the dissimilarities.
        2) Start with the finest possible partition
(each distance in a block of one).

        3) Beginning with the first block
(corresponding to the smallest dissimilarity):
            3-a) Check if the block is "down
satisfied" (mean value of the block ≥ that of the
next lower block). [The first block is down
satisfied by definition.]
            3-b) If the block is not down
satisfied, merge it with the immediately lower
block. The d̂ value within the merged block is then
defined to be the mean of all d values in the block.
            3-c) Check whether new block is
down satisfied. Continue this process until
resulting block is down satisfied.
           3-d) Go through analogous
process checking on "up satisfaction."
Continue until resulting block is up satisfied.
            3-e) Check again on down
satisfaction of the block. Continue until the
resulting block is both down- and up-
satisfied.
           3-f) Proceed to next higher block,
go through same process.
          3-g) Continue until all blocks are
up and down satisfied.
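The steps above can be sketched as a pool-adjacent-violators routine (a simplified illustration, not MFIT itself; it ignores the tie options discussed earlier, and the function name is ours):

```python
def monotone_regression(d):
    """Least-squares ascending monotone regression (pool adjacent violators).

    d is the list of distances, already ordered by increasing dissimilarity.
    Any block that is not "down satisfied" (mean >= previous block's mean)
    is merged with its lower neighbor; block means are then expanded back
    to full length as the fitted d^ values.
    """
    blocks = [[v] for v in d]        # finest partition: one value per block
    i = 0
    while i < len(blocks) - 1:
        a = sum(blocks[i]) / len(blocks[i])
        b = sum(blocks[i + 1]) / len(blocks[i + 1])
        if b < a:                    # next block not down satisfied: merge
            blocks[i] += blocks.pop(i + 1)
            if i > 0:
                i -= 1               # re-check down satisfaction of the merge
        else:
            i += 1
    return [sum(b) / len(b) for b in blocks for _ in b]

print(monotone_regression([3, 5, 3, 8, 6, 4, 8, 9, 8, 6]))
```

Run on the distances of the illustrative example below (ranks 1-10), this reproduces the final fitted values 3, 4, 4, 6, 6, 6, 7.75, 7.75, 7.75, 7.75.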

           Illustrative Example of Monotone
                  Regression Algorithm

Stimulus                              Final
  Pair     Rank    Distance d          d̂
   ED        1         3               3
   EB        2         5               4
   AB        3         3               4
   CD        4         8               6
   AC        5         6               6
   AD        6         4               6
   AE        7         8               7.75
   CE        8         9               7.75
   CB        9         8               7.75
   BD       10         6               7.75

Stages: the second block {EB} is not up
satisfied, so blocks 2 and 3 merge (mean 4);
the third block {CD} is not up satisfied, so it
merges with {AC} and then {AD} (mean 6);
{CE} and {CB} merge (mean 8.5), then {BD}
joins (mean 7.67); the result is not down
satisfied against {AE}, so {AE} joins (mean 7.75).

NOTE: This monotone regression algorithm
is central to many other "nonmetric"
techniques, such as MONANOVA and the
nonmetric options in PREFMAP, ALSCAL, etc.

