# Two-Way MDS (MDSCAL5 and KYST)

Simplest Case

Given $\Delta = (\delta_{ij})$, $i, j = 1, 2, \dots, n$,
a matrix of proximities (similarities or
dissimilarities).

We seek $D = (d_{ij})$ such that $F(\delta_{ij}) \approx d_{ij}$,
where $F$ is a linear, monotone, or other
specified function.

$$
d_{ij} = \left( \sum_{r=1}^{R} \lvert x_{ir} - x_{jr} \rvert^{p} \right)^{1/p},
\qquad p \ge 1
$$

This is the definition of the Minkowski-$p$,
or $L_p$, metric.
      

If $p = 1$: "city block"
$p = 2$: Euclidean
$p = \infty$: "maximum" or dominance metric

(A modified Minkowski-$p$ metric is defined
for $0 < p < 1$ if the $1/p$ power is not taken.)
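As a concrete illustration, here is a minimal Python sketch of the Minkowski-$p$ distance; the function name and test points are illustrative, not from the original.

```python
# A minimal sketch of the Minkowski-p (L_p) distance; `x` and `y` are
# coordinate sequences (rows x_i, x_j of the configuration matrix X).
def minkowski(x, y, p):
    """Minkowski-p distance; p = float('inf') gives the dominance metric."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float('inf'):
        return max(diffs)                 # "maximum" / dominance metric
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 1))                 # city block: 7.0
print(minkowski(x, y, 2))                 # Euclidean: 5.0
print(minkowski(x, y, float('inf')))      # dominance: 4.0
```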

The Minkowski-$p$, and all metrics, satisfy the metric
axioms:

positivity: $d_{ij} \ge 0 \quad \forall\, i, j$

reflexive minimality: $d_{ii} = 0 \quad \forall\, i$

symmetry: $d_{ij} = d_{ji} \quad \forall\, i, j$

triangle inequality: $d_{ik} \le d_{ij} + d_{jk} \quad \forall\, i, j, k$

Segmental additivity (a further property, not
necessarily true for all metrics) states that,
for any pair of points i and k, there exists a
third point, j, distinct from i and k, such that

$$d_{ik} = d_{ij} + d_{jk}$$
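The axioms can be spot-checked numerically. This sketch (a check, not a proof) verifies them for Minkowski-$p$ distances with $p \ge 1$ on a few random points:

```python
# Numerical spot-check of the four metric axioms for Minkowski-p, p >= 1.
import itertools
import random

def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

random.seed(0)
pts = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(8)]

ok = True
for p in (1, 2, 4):
    for i, j, k in itertools.product(range(len(pts)), repeat=3):
        d_ij = minkowski(pts[i], pts[j], p)
        ok &= d_ij >= 0.0                                  # positivity
        ok &= minkowski(pts[i], pts[i], p) == 0.0          # reflexive minimality
        ok &= abs(d_ij - minkowski(pts[j], pts[i], p)) < 1e-12  # symmetry
        # triangle inequality (small tolerance for rounding)
        ok &= minkowski(pts[i], pts[k], p) <= d_ij + minkowski(pts[j], pts[k], p) + 1e-12
print(ok)   # True
```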

[Figure: isosimilarity contours for Minkowski-$p$ metrics,
showing $p = \infty$ (max metric), $p = 2$ (Euclidean metric),
$p = 1$ (city block metric), and $p < 1$.]

Segmental additivity is not satisfied by the
"modified" Minkowski-$p$ metric mentioned
above, or by some other metrics; e.g., the arc-length
metric defined on points restricted to
a closed curve.

[Figure: points i, j, k with distances $d_{ij}$, $d_{jk}$,
and $d_{ik}$, illustrating segmental additivity.]

Other metrics are possible, such as
Riemannian metrics, satisfying the metric
axioms.

Definition of "Stress"

Two forms:

$$
S_1 = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} d_{ij}^2} \right]^{1/2}
$$

(Where summations are over i and j for
which $\delta_{ij}$ is defined.)

$$
S_2 = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} (d_{ij} - \bar{d})^2} \right]^{1/2}
$$

where $\bar{d}$ is the mean of the $d_{ij}$'s (over "defined"
index pairs i, j).
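The two stress forms can be sketched as follows, assuming `d` holds the configuration distances $d_{ij}$ and `dhat` the fitted values $\hat{d}_{ij} = F(\delta_{ij})$, both as flat lists over the defined index pairs (the function name is illustrative):

```python
# Minimal sketch of the two stress forms S1 and S2.
def stress(d, dhat, form=2):
    num = sum((a - b) ** 2 for a, b in zip(d, dhat))
    if form == 1:
        den = sum(a * a for a in d)              # S1: raw sum of squares
    else:
        dbar = sum(d) / len(d)                   # S2: deviations about the mean
        den = sum((a - dbar) ** 2 for a in d)
    return (num / den) ** 0.5

d, dhat = [1.0, 2.0, 3.0], [1.0, 2.0, 2.0]
print(round(stress(d, dhat, 1), 4))   # sqrt(1/14) ~ 0.2673
print(round(stress(d, dhat, 2), 4))   # sqrt(1/2)  ~ 0.7071
```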

or

$$
S_a = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} (d_{ij} - \bar{d}_a)^2} \right]^{1/2},
\qquad a = 1 \text{ or } 2
$$

where $\bar{d}_1 \equiv 0$,
$\bar{d}_2 \equiv \bar{d}$,

and $\hat{d}_{ij} = F(\delta_{ij})$.

F is monotonic in non-metric MDS,
linear (with or without constant term) in
metric MDS, or "multivariate regression"
may be used to define F.

| F defined by | data (δ's) are |
|---|---|
| monotone regression | ordinal scale |
| linear regression (with constant) | interval scale |
| linear regression (without constant) | ratio scale |

Linear and "multivariate" (e.g., polynomial)
regression are performed by standard O.L.S.
regression methods, with the δ's playing the role of
independent variables and the d's that of dependent
variables.

Monotone regression is done via the least
squares monotone regression algorithm
(MFIT) described in Kruskal (1964b).

MDSCAL and KYST use a gradient
method.

Given $X = (x_{ir})$,

$$
G = \left( -\frac{\partial S}{\partial x_{ir}} \right)
$$

Given $X_I$ on the $I$th iteration,

$$
X_{I+1} = X_I + \alpha_I G_I
$$

where the "step size" $\alpha_I$ is defined by
procedures described in Kruskal (1964b).

$$
g_{ir} = -\frac{\partial S_a}{\partial x_{ir}}
= \sum_j m_{ij}\, \lvert x_{jr} - x_{ir} \rvert^{\,p-2} (x_{jr} - x_{ir})
$$

where

$$
m_{ij} = K\, \frac{(d_{ij} - \hat{d}_{ij}) - (d_{ij} - \bar{d}_a)\, S_a^2}{d_{ij}^{\,p-1}}
$$

while

$$
K = \frac{2}{S_a \sum_{i,j} (d_{ij} - \bar{d}_a)^2}
$$

If $p = 2$ (Euclidean case),

$$
g_{ir} = \sum_j m_{ij} (x_{jr} - x_{ir})
$$
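The gradient formulas can be sketched for the Euclidean case ($p = 2$) with the $S_2$ stress form. This is a sketch under assumptions: the configuration `X`, the fitted values `dhat`, and the step size `alpha` below are illustrative, and `dhat` is taken as fixed for the single iteration shown.

```python
import math

# One gradient iteration, Euclidean case, S2 stress. X is a list of
# coordinate tuples; dhat maps each pair (i, j), i < j, to d-hat_ij.
def dist(X, i, j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))

def stress2(X, dhat):
    pairs = sorted(dhat)
    d = {p: dist(X, *p) for p in pairs}
    dbar = sum(d.values()) / len(d)
    num = sum((d[p] - dhat[p]) ** 2 for p in pairs)
    den = sum((d[p] - dbar) ** 2 for p in pairs)
    return math.sqrt(num / den)

def gradient_step(X, dhat, alpha):
    n, R = len(X), len(X[0])
    pairs = sorted(dhat)
    d = {p: dist(X, *p) for p in pairs}
    dbar = sum(d.values()) / len(d)
    S = stress2(X, dhat)
    K = 2.0 / (S * sum((d[p] - dbar) ** 2 for p in pairs))
    def m(i, j):
        p = (i, j) if i < j else (j, i)
        return K * ((d[p] - dhat[p]) - (d[p] - dbar) * S * S) / d[p]
    # g_ir = sum_j m_ij (x_jr - x_ir);  X_new = X_old + alpha * G
    return [tuple(X[i][r] + alpha * sum(m(i, j) * (X[j][r] - X[i][r])
                                        for j in range(n) if j != i)
                  for r in range(R))
            for i in range(n)]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
dhat = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): math.sqrt(2.0)}
X_new = gradient_step(X, dhat, 0.01)
print(stress2(X_new, dhat) < stress2(X, dhat))   # True: the step reduces S2
```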

Method for Optimizing Stress

i is: somewhat too far from j,
slightly too close to k,
too far from l,
much too far from m.

[Figure: points i, j, k, l, m and the resolution of
"force vectors" acting on i.]

Method (Euclidean case)

Focusing on a single point i, we first
define difference vectors $v_{ij}$ to the other points
(j).

[Figure: point i with difference vectors $v_{ij}$, $v_{ik}$,
$v_{il}$ to points j, k, l.]

Then each difference vector, $v_{ij}$, is
multiplied by $m_{ij}$.

In general (particularly when $S_a$ is
"small"),

$m_{ij} > 0$ if $d_{ij} > \hat{d}_{ij}$; i.e., if $d_{ij}$ is too large,

while $m_{ij} < 0$ if $d_{ij}$ is too small.
(Or, more generally, $m_{ij}$ tends to be
larger in algebraic value the larger $d_{ij}$ is
relative to $\hat{d}_{ij}$.)

Thus the multiplication of $v_{ij}$ by $m_{ij}$
tends to produce a force vector pulling point
i toward j if $d_{ij}$ is too large, or away from j if
$d_{ij}$ is too small. The greater the magnitude
of the discrepancy, the greater that of the
force vector.

Geometrically this can be pictured as
follows:

[Figure: point i with $m_{ij} < 0$, $m_{ik} > 0$, $m_{il} < 0$,
and the resolution vector $g_i$ for i.]

Thus the force vectors pushing i toward
or away from each other point are added to
produce a resolution force vector for i, whose
coordinates are contained in the ith row of
the matrix G.

Then $\alpha$ (the step size) times this resolution
vector is added to $x_i$, simultaneously for all
i, by the operation $X_{\text{new}} = X_{\text{old}} + \alpha G$.

Data Options

Full matrix, diagonal present

Lower (upper) half matrix, diagonal present

Lower (upper) half matrix, diagonal absent

Lower (upper) corner matrix

[Diagram: an (M + N) x (M + N) matrix with the
M x N DATA block in the lower-left corner.]

The lower corner matrix is an M x N matrix,
treated as a submatrix of the larger
(M + N) x (M + N) matrix.

The larger matrix is treated as
symmetric, with missing data in the N x N
and M x M diagonal submatrices. The
N x M upper corner submatrix can be filled
by symmetry, if desired. In using the corner
matrix option one must provide only the
M x N lower (N x M upper) corner matrix,
the computer program treating this as if the
larger square matrix with blocks of missing
data had been provided as input.
NOTE: In all these data options, it is
possible to indicate specific cell entries as
missing data.
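The corner-matrix embedding can be sketched as follows; the function name and the use of `None` for missing cells are illustrative assumptions.

```python
# Sketch of the lower-corner-matrix option: an M x N data block embedded
# in a symmetric (M + N) x (M + N) matrix whose N x N and M x M diagonal
# blocks are missing (None).
def embed_corner(data, fill_upper=True):
    M, N = len(data), len(data[0])
    size = M + N
    big = [[None] * size for _ in range(size)]
    for i in range(M):
        for j in range(N):
            big[N + i][j] = data[i][j]        # M x N lower corner
            if fill_upper:
                big[j][N + i] = data[i][j]    # N x M upper corner, by symmetry
    return big

B = embed_corner([[1, 2], [3, 4], [5, 6]])    # M = 3 subjects, N = 2 stimuli
print(len(B), B[2][0], B[0][2], B[0][1])      # 5 1 1 None
```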

Multiple Data Matrices
and Split Options
It is possible to provide more than one
proximity matrix as input. (One could be a
lower corner matrix, another a full
nonsymmetric matrix, still another an upper
half matrix without diagonals, etc.)
Multiple data matrices combined with
split options provide a great deal of
flexibility.
Split options
1) Split by row
Combined with lower (upper) corner
matrix option allows multidimensional
unfolding. This combination of options
treats M x N lower corner matrix (say) as an
off-diagonal conditional proximity matrix
(data values are comparable only within
rows). This allows:

INTERNAL UNFOLDING
via MDSCAL/KYST
Data might be preference judgments of M
subjects for N stimuli, in the form of an
M x N conditional proximity matrix, since
the order of stimuli (preferences) is meaningful
only within rows (subjects). Treating the data
as an M x N lower corner matrix and splitting
by rows fits Coombs's unfolding model, in
which a subject's preference is assumed
monotonically related to the distance of a stimulus
from the subject's "ideal point" (in the joint space).

Strictly speaking, unfolding should be
done using the split-by-rows option with
(descending) monotone regression, but it is
sometimes done using a lower (upper) corner
matrix without splitting by rows, and/or using
linear regression (with or without constant
term).

WARNING! Degenerate solutions can result if the
analysis is done improperly!

DEGENERACIES IN KYST-MDSCAL
INTERNAL UNFOLDING

1) Degeneracy if S1 is used (rather than
S2).
- Can be constructed in one
dimension! (And whether or not
you split by rows.)

[Diagram: M subjects collapsed onto one point and
N stimuli onto another, so that $d_{ij}$ = constant
for all values of i (subject index) and
j (stimulus index).]

With $\hat{d}_{ij}$ = the same constant $\forall\, i, j$,

$$
S_1 = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} d_{ij}^2} \right]^{1/2}
= \left[ \frac{0}{\text{non-zero}} \right]^{1/2} = 0\,!
$$

Ergo, if S1 is used, a trivial degenerate
solution with perfect (0) Stress (S1) is
always possible, with either monotone
regression or linear regression (with
constant).

This will not work with S2, since the
normalization factor in the denominator is

$$
\sum_{i,j} (d_{ij} - \bar{d})^2
$$

which will equal 0 in this case, since $d_{ij}$ is
constant for $i \ne j$ (i.e., for all values of i and
j for which data are defined).

Thus S2 would equal 0/0 (undefined) - but it can
be shown to approach a non-zero value (in
the limit) as this degenerate configuration is
approached. (In fact, S2 will approach a
limiting value of 1.0, which is the maximal
value, and thus distinctly non-optimal.)
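The S1 degeneracy can be verified numerically with a toy configuration (the sizes M, N and coordinates below are assumed for illustration):

```python
# Toy illustration of degeneracy 1: collapse the M subjects onto one point
# and the N stimuli onto another in one dimension, so every defined
# distance d_ij equals the same constant.
M, N = 4, 3
subject_x = [0.0] * M
stimulus_x = [1.0] * N
d = [abs(s - t) for s in subject_x for t in stimulus_x]   # all equal to 1.0

dhat = [sum(d) / len(d)] * len(d)     # F fits the same constant everywhere

S1_num = sum((a - b) ** 2 for a, b in zip(d, dhat))
S1_den = sum(a * a for a in d)
print(S1_num, S1_den)                 # 0.0 12.0 -> S1 = 0, "perfect" stress

dbar = sum(d) / len(d)
S2_den = sum((a - dbar) ** 2 for a in d)
print(S2_den)                         # 0.0 -> S2 = 0/0, undefined
```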

2) A degenerate solution is also possible if
S2 is used with (descending) monotone
regression, but without splitting by rows.

Let i*, j* represent the index pair
corresponding to the smallest preference (largest
dissimilarity) in the M x N matrix.

Again, a degeneracy is possible in one
dimension!

[Diagram: a one-dimensional configuration with
unit spacing, placing i* at one end and j* at the
other, with all $j \ne j^*$ and all $i \ne i^*$ between them,
so that $d_{ij} = 1$ for all i and j except $(i^*, j^*)$,
while $d_{i^*j^*} = 3$.]

[Figure: distance plotted against preference; a step
function fits the data exactly, so the monotone
regression yields zero Stress. The numerator of S2 is
zero, while the denominator is non-zero!]

If there are ties for smallest preference
(largest dissimilarity) this will not work
precisely, but can probably be approached.

It will work precisely even in this case,
however, if the primary option for ties is
used (rather than secondary option) in
monotone regression.

Options for ties

Primary option: If $\delta_{ij} = \delta_{kl}$, then
there is no penalty in the stress function if
$d_{ij} \ne d_{kl}$.

Secondary option: In the case above, there
is a penalty, since $d_{ij}$ must equal $d_{kl}$.

Conclusions re use of KYST or
MDSCAL5 for unfolding:

1) Use S2, not S1!
2) If using monotone regression, split by
rows!

(It would probably be a good idea to use the
secondary option for ties as well.)

Other split options

Split by groups (of rows)

Split by decks - used
primarily to allow separate regressions for
different data matrices.

Note: Regression types can be different for
each block.

(E.g., linear with constant for one,
ascending monotone for another, and linear
without constant for a third.)

But - if multiple matrices are input with no
split options, the same regression is fit over all
the data - i.e., a single regression equation is fit
to all the data. Proximity data are thus treated as
comparable both within and between all
matrices.

Definition of Stress when split options are used

Let $S_{ab}$ be the Stress (type a) for block b.

Then

$$
S_a^{*} = \left[ \frac{1}{N_B} \sum_{b=1}^{N_B} S_{ab}^2 \right]^{1/2}
$$
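A minimal sketch of this combined stress, i.e. the root mean square of the per-block stress values $S_{ab}$ (the three block values below are made-up examples):

```python
import math

# Combined stress over blocks: RMS of the per-block stresses S_ab.
def combined_stress(block_stresses):
    NB = len(block_stresses)
    return math.sqrt(sum(s * s for s in block_stresses) / NB)

print(round(combined_stress([0.1, 0.2, 0.2]), 4))   # sqrt(0.03) ~ 0.1732
```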

Weights for data values
In addition to missing data options
already discussed, it is possible to provide
continuously valued weights for each data
value. These can either be input as a
separate data array, or computed as a
function of data values.

Use of split by block option for hybrid
analysis

E.g.:

- "hybrid" analysis midway
between metric and nonmetric
- Input two copies of the same data
matrix; use monotone
regression on one, linear
regression on the other.

Monotone Regression Algorithm

Assume an ascending (non-decreasing)
regression.

1) Order the distances (dependent
variable) in the same order as the
dissimilarities (data).

2) Begin with each distance in a block of
one.

3) Beginning with the first block
(corresponding to the smallest dissimilarity):

3-a) Check if the block is "down
satisfied" (mean value of the block ≥ that of the
next lower block). [The first block is down
satisfied by definition.]

3-b) If the block is not down
satisfied, merge it with the immediately lower
block. The $\hat{d}$ value within the merged block is then
defined to be the mean of all d values in the block.

3-c) Check whether the new block is
down satisfied. Continue this process until
the resulting block is down satisfied.

3-d) Go through the analogous
process checking on "up satisfaction."
Continue until the resulting block is up satisfied.

3-e) Check again on down
satisfaction of the block. Continue until
the resulting block is both down- and up-satisfied.

3-f) Proceed to the next higher block,
and go through the same process.

3-g) Continue until all blocks are
up and down satisfied.
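The up-and-down merging above can be sketched with the standard pool-adjacent-violators algorithm, which produces the same least-squares ascending fit; the input data are the nine distances from the worked example that follows.

```python
# Ascending least-squares monotone regression via pool adjacent violators,
# equivalent in result to the up/down block-merging described above.
def monotone_fit(d):
    blocks = []                        # each block is [sum, count]
    for v in d:
        blocks.append([float(v), 1])
        # merge while the newest block's mean is below its left
        # neighbour's (i.e., while it is not "down satisfied")
        while len(blocks) > 1 and \
                blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)        # fitted d-hat is the block mean
    return fit

# distances ordered by increasing dissimilarity rank
# (pairs ED, EB, AB, CD, AC, AE, CE, CB, BD in the worked example)
print(monotone_fit([3, 5, 3, 8, 6, 8, 9, 8, 6]))
# [3.0, 4.0, 4.0, 7.0, 7.0, 7.75, 7.75, 7.75, 7.75]
```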

Illustrative Example of the Monotone
Regression Algorithm

| Stimulus Pair | Rank | Distance (first stage) | Final d̂'s |
|---|---|---|---|
| ED | 1 | 3 | 3 |
| EB | 2 | 5 | 4 |
| AB | 3 | 3 | 4 |
| CD | 4 | 8 | 7 |
| AC | 5 | 6 | 7 |
| AE | 7 | 8 | 7.75 |
| CE | 8 | 9 | 7.75 |
| CB | 9 | 8 | 7.75 |
| BD | 10 | 6 | 7.75 |

(Intermediate stages: the second block is not up
satisfied, so blocks 2 and 3 merge to 4; the third
block is not up satisfied, so blocks 3 and 4 merge
to 7; CE and CB merge to 8.5, which is not up
satisfied against BD, giving 7.67, and a further
down merge with AE gives the final 7.75.)

NOTE: This monotone regression algorithm
is central to many other "nonmetric"
techniques, such as MONANOVA,
nonmetric options in PREFMAP, ALSCAL,
etc.
