Two-Way MDS (MDSCAL5 and KYST)

Simplest Case

Given $\Delta = (\delta_{ij})$, $i, j = 1, 2, \ldots, n$, a matrix of proximities (similarities or dissimilarities), we seek $D = (d_{ij})$ such that

$$F(\delta_{ij}) \approx d_{ij},$$

where $F$ is a linear, monotone, or other specified function. (1)

The distances are given by

$$d_{ij} = \left( \sum_{r=1}^{R} |x_{ir} - x_{jr}|^{p} \right)^{1/p}, \qquad p \ge 1.$$

This is the definition of the Minkowski-p or $L_p$ metric:

if p = 1: "city block" metric
if p = 2: Euclidean metric
if p = ∞: "maximum" or dominance metric

(A modified Minkowski-p metric is defined for 0 < p < 1 if the 1/p power is not taken. It does not satisfy "segmental additivity.") (2)

The Minkowski-p metric, and all metrics, satisfy the metric axioms:

positivity: $d_{ij} \ge 0$ for all i, j
reflexive minimality: $d_{ii} = 0$ for all i
symmetry: $d_{ij} = d_{ji}$ for all i, j
triangle inequality: $d_{ik} \le d_{ij} + d_{jk}$ for all i, j, k

Segmental additivity (which is not necessarily true for all metrics) states that, for any pair of points i and k, there exists a third point, j, distinct from i and k, such that $d_{ik} = d_{ij} + d_{jk}$. (3)

[Figure: isosimilarity contours for Minkowski-p metrics, with panels for p = 1 (city block metric), p = 2 (Euclidean metric), p = ∞ (max metric), and p < 1.] (4)

Segmental additivity is not satisfied by the "modified" Minkowski-p metric mentioned above, nor by some other metrics, e.g., the arc-length metric defined on points restricted to a closed curve. [Figure: three points i, j, k on a closed curve with arc-length distances $d_{ij}$, $d_{jk}$, and $d_{ik}$.] Other metrics are possible, such as Riemannian metrics, satisfying the metric axioms and, possibly, segmental additivity. (5)

Definition of "Stress"

Two forms:

$$S_1 = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} d_{ij}^2} \right]^{1/2}$$

(where summations are over the i and j for which $\delta_{ij}$ is defined), and

$$S_2 = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} (d_{ij} - \bar{d})^2} \right]^{1/2},$$

where $\bar{d}$ is the mean of the $d_{ij}$'s (over "defined" index pairs i, j). (6)

Or, in general,

$$S_a = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} (d_{ij} - \bar{d}_a)^2} \right]^{1/2}, \qquad a = 1 \text{ or } 2,$$

where $\bar{d}_1 \equiv 0$, $\bar{d}_2 \equiv \bar{d}$, and $\hat{d}_{ij} = F(\delta_{ij})$. F is monotonic in nonmetric MDS, linear (with or without a constant term) in metric MDS, or "multivariate regression" may be used to define F. (7)

F is defined by the data ($\delta$'s) as follows:

monotone regression: ordinal scale
linear regression (with constant): interval scale
linear regression (without constant): ratio scale

Linear and "multivariate" (e.g., polynomial) regressions are performed by standard O.L.S. regression methods, with the $\delta$'s playing the role of independent variables and the d's that of dependent variables. Monotone regression is done via the least squares monotone regression algorithm (MFIT) described in Kruskal (1964b). (8)

MDSCAL and KYST use a gradient method. Given $X = (x_{ir})$, define the (negative) gradient matrix $G = (g_{ir})$ with

$$g_{ir} = -\frac{\partial S}{\partial x_{ir}}.$$

Given $X^{I}$ on the Ith iteration,

$$X^{I+1} = X^{I} + \alpha_I G^{I},$$

where the "step size" $\alpha_I$ is defined by procedures described in Kruskal (1964b). (9)

Definition of the Negative Gradient

$$g_{ir} = -\frac{\partial S_a}{\partial x_{ir}} = \sum_{j} m_{ij}\, |x_{jr} - x_{ir}|^{p-2} (x_{jr} - x_{ir}),$$

where

$$m_{ij} = K \left[ \frac{d_{ij} - \hat{d}_{ij}}{S_a^2} - (d_{ij} - \bar{d}_a) \right] \frac{1}{d_{ij}^{\,p-1}}, \qquad K = \frac{S_a}{\sum_{i,j} (d_{ij} - \bar{d}_a)^2}.$$

If p = 2 (the Euclidean case),

$$g_{ir} = \sum_{j} m_{ij} (x_{jr} - x_{ir}).$$

(10)

Geometric Interpretation of the Gradient Method for Optimizing Stress

Suppose point i is: somewhat too far from j, slightly too close to k, too far from l, and much too far from m. [Figure: points i, j, k, l, m, with the corresponding "force vectors" acting on i and their resolution into a single vector.] (11)
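To make the iteration concrete, here is a minimal sketch of one gradient step for the Euclidean case (p = 2) in Python with NumPy. The function and variable names are my own, and the algebraic arrangement of $m_{ij}$ follows the reconstruction above; this is not MDSCAL/KYST's actual FORTRAN code. The fitted values $\hat{d}_{ij}$ are taken as given.

    import numpy as np

    def stress(D, Dhat, a=2):
        # S_a over off-diagonal pairs; a=1 normalizes by sum d^2,
        # a=2 by sum (d - dbar)^2.
        mask = ~np.eye(D.shape[0], dtype=bool)
        d, dhat = D[mask], Dhat[mask]
        dbar = d.mean() if a == 2 else 0.0
        return np.sqrt(np.sum((d - dhat) ** 2) / np.sum((d - dbar) ** 2))

    def gradient_step(X, Dhat, alpha=0.2, a=2):
        # One iteration X_new = X_old + alpha * G for p = 2, where G is the
        # negative gradient of S_a and Dhat holds the fitted values F(delta).
        n = X.shape[0]
        diff = X[:, None, :] - X[None, :, :]       # diff[i, j] = x_i - x_j
        D = np.sqrt((diff ** 2).sum(axis=-1))
        mask = ~np.eye(n, dtype=bool)
        S = stress(D, Dhat, a)                     # assumes S > 0
        dbar = D[mask].mean() if a == 2 else 0.0
        K = S / np.sum((D[mask] - dbar) ** 2)
        with np.errstate(divide="ignore", invalid="ignore"):
            M = K * ((D - Dhat) / S**2 - (D - dbar)) / D   # m_ij; d^(p-1) = d
        np.fill_diagonal(M, 0.0)                   # no self-forces
        G = (M[:, :, None] * -diff).sum(axis=1)    # g_i = sum_j m_ij (x_j - x_i)
        return X + alpha * G

In the full algorithm this step alternates with re-fitting $\hat{d}$ by monotone or linear regression, and the step size $\alpha_I$ is adjusted over iterations according to the procedures of Kruskal (1964b).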
Geometric Interpretation of the Gradient Method (Euclidean case)

Focusing on a single point i, we first define difference vectors $v_{ij}$ from i to the other points j. [Figure: point i with difference vectors $v_{ij}$, $v_{ik}$, $v_{il}$ to points j, k, l.] Each difference vector $v_{ij}$ is then multiplied by $m_{ij}$. In general (particularly when $S_a$ is "small"), $m_{ij} > 0$ if $d_{ij} > \hat{d}_{ij}$, i.e., if $d_{ij}$ is too large, while $m_{ij} < 0$ if $d_{ij}$ is too small. (12)

(Or, more generally, $m_{ij}$ tends to be larger in algebraic value the larger $d_{ij}$ is relative to $\hat{d}_{ij}$.) Thus the multiplication of $v_{ij}$ by $m_{ij}$ tends to produce a force vector pulling point i toward j if $d_{ij}$ is too large, or pushing it away from j if $d_{ij}$ is too small. The greater the magnitude of the discrepancy, the greater that of the force vector. [Figure: points j, k, l exerting force vectors on i, with $m_{ij} < 0$, $m_{ik} > 0$, $m_{il} < 0$, and the resolution vector $g_i$ for i.] (13)

Thus the force vectors pushing i toward or away from each of the other points are added to produce a resolution force vector $g_i$, whose coordinates are contained in the ith row of the matrix G. Then $\alpha$ (the step size) times this resolution vector is added to $x_i$, simultaneously for all i, by the operation $X_{\text{new}} = X_{\text{old}} + \alpha G$. (14)

Data Options

Full matrix, diagonal present
Lower (upper) half matrix, diagonal present
Lower (upper) half matrix, diagonal absent
Lower (upper) corner matrix

[Figure: an (M + N) × (M + N) matrix partitioned into blocks, with the M × N lower left block labeled DATA.]

A lower corner matrix is an M × N matrix, treated as a submatrix of the larger (M + N) × (M + N) matrix. (15)

The larger matrix is treated as symmetric, with missing data in the N × N and M × M diagonal submatrices. The N × M upper corner submatrix can be filled by symmetry, if desired. In using the corner matrix option one must provide only the M × N lower (N × M upper) corner matrix, the computer program treating this as if the larger square matrix with blocks of missing data had been provided as input. NOTE: in all these data options, it is possible to indicate specific cell entries as missing data. (16)

Multiple Data Matrices and Split Options

It is possible to provide more than one proximity matrix as input. (One could be a lower corner matrix, another a full nonsymmetric matrix, still another an upper half matrix without diagonals, etc.) Multiple data matrices combined with split options provide a great deal of flexibility.

Split options:

1) Split by rows. Combined with the lower (upper) corner matrix option, this allows multidimensional unfolding. This combination of options treats the M × N lower corner matrix (say) as an off-diagonal conditional proximity matrix (data values are comparable only within rows). This allows: (17)

INTERNAL UNFOLDING via MDSCAL/KYST

The data might be preference judgments of M subjects for N stimuli, in the form of an M × N conditional proximity matrix, since the order of stimuli (preferences) is meaningful only within rows (subjects). Treating the data as an M × N lower corner matrix and splitting by rows fits Coombs's unfolding model, in which a subject's preference is assumed monotonically related to the distance of the stimulus from the subject's "ideal point" (in the joint space). Strictly speaking, unfolding should be done using the split-by-rows option with (descending) monotone regression, but it is sometimes done using a lower (upper) corner matrix without splitting by rows, and/or using linear regression (with or without a constant term). WARNING! Degenerate solutions result if the analysis is done improperly! (18)

DEGENERACIES IN KYST/MDSCAL INTERNAL UNFOLDING

1) Degeneracy if $S_1$ is used (rather than $S_2$). One can be constructed in one dimension! (And whether or not you split by rows.)

[Figure: all M subjects placed at one point on a line and all N stimuli at another, so that $d_{ij}$ equals the same constant for all values of i (subject index) and j (stimulus index).]

Let $\hat{d}_{ij}$ = the same constant, for all i, j. Then

$$S_1 = \left[ \frac{\sum_{i,j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i,j} d_{ij}^2} \right]^{1/2} = \left[ \frac{0}{\text{nonzero}} \right]^{1/2} = 0\,!$$

(19)
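A quick numeric check of this degeneracy, as a sketch under the same conventions as the code above (all names are mine; the sums run only over the defined subject-stimulus pairs of the corner matrix):

    import numpy as np

    # Degenerate 1-D configuration: all M subjects at 0, all N stimuli at 1,
    # so every subject-stimulus distance d_ij is the same constant.
    M, N = 5, 4
    subjects, stimuli = np.zeros(M), np.ones(N)
    D = np.abs(subjects[:, None] - stimuli[None, :])  # M x N corner matrix, all 1s
    Dhat = np.full_like(D, D.mean())                  # fitted values: same constant

    num = np.sum((D - Dhat) ** 2)        # 0: the fit is "perfect"
    den1 = np.sum(D ** 2)                # nonzero
    den2 = np.sum((D - D.mean()) ** 2)   # 0: all distances equal
    print("S1 =", np.sqrt(num / den1))   # 0.0 -- zero Stress, but meaningless
    print("S2 = 0/0:", num, den2)        # undefined at the degenerate point

The last two lines show why $S_2$ blocks this degeneracy: its normalizer vanishes along with the numerator, and, as discussed next, $S_2$ tends to its maximal value of 1.0, not to 0, as the configuration merely approaches degeneracy.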
Ergo, if $S_1$ is used, a trivial degenerate solution with perfect (zero) Stress $S_1$ is always possible, with either monotone regression or linear regression (with constant).

This will not work with $S_2$, since the normalization factor in the denominator,

$$\sum_{i,j} (d_{ij} - \bar{d})^2,$$

will equal 0 in this case, since $d_{ij}$ is constant for all values of i and j for which data are defined. Thus $S_2$ would equal 0/0 (undefined), but $S_2$ can be shown to approach a nonzero value in the limit as this degenerate configuration is approached. (In fact, $S_2$ will approach a limiting value of 1.0, which is the maximal value, and thus distinctly non-optimal.) (20)

2) A degenerate solution is also possible if $S_2$ is used with (descending) monotone regression, but without splitting by rows. Let $i^*, j^*$ represent the index pair corresponding to the smallest preference (largest dissimilarity) in the M × N matrix. Again, a degeneracy is possible in one dimension!

[Figure: four positions on a line at unit intervals: stimulus $j^*$; all subjects $i \ne i^*$; all stimuli $j \ne j^*$; subject $i^*$. Then $d_{ij} = 1$ for all i and j except $(i^*, j^*)$, while $d_{i^*j^*} = 3$.] (21)

[Figure: distance plotted against preference; the fitted (descending) monotone regression is a step function, yielding zero Stress.] (The numerator of $S_2$ is zero, and the denominator is nonzero!) If there are ties for the smallest preference (largest dissimilarity), this will not work precisely, but can probably be approached. It will work precisely even in this case, however, if the primary option for ties is used (rather than the secondary option) in monotone regression. (22)

Options for ties:

Primary option: if $\delta_{ij} = \delta_{kl}$, then there is no penalty in the stress function if $d_{ij} \ne d_{kl}$.
Secondary option: in the case above, there is a penalty, since $d_{ij}$ must equal $d_{kl}$.

Conclusions regarding the use of KYST or MDSCAL5 for unfolding:
1) Use $S_2$, not $S_1$!
2) If using monotone regression, split by rows! (It would probably also be a good idea to use the secondary option for ties.) (23)

Other split options:

split by groups (of rows)
split by decks - used primarily to allow separate regressions for different data matrices.

Note: regression types can be different for each block (e.g., linear with constant for one, ascending monotone for another, and linear without constant for a third). But if multiple matrices are input with no split options, the same regression is fit over all the data, i.e., a single regression equation is fit to all the data. Proximity data are thus treated as comparable both within and between all matrices. (24)

Definition of Stress when split options are used

Let $S_{ab}$ be the Stress (type a) for block b. Then

$$S_a^{*} = \left[ \frac{1}{N_B} \sum_{b=1}^{N_B} S_{ab}^2 \right]^{1/2}.$$

Weights for data values

In addition to the missing data options already discussed, it is possible to provide continuously valued weights for each data value. These can either be input as a separate data array, or computed as a function of the data values. (25)

Use of the split-by-block option for hybrid analysis

E.g., a "hybrid" analysis midway between metric and nonmetric: input two copies of the same data matrix, and use monotone regression on one, linear regression on the other.

Monotone Regression Algorithm

Assume an ascending (non-decreasing) regression. (A compact sketch implementing these steps follows the list.)

1) Order the distances (the dependent variable) in the same order as the dissimilarities (the data).
2) Start with the finest possible partition (each distance in a block of one). (26)
3) Beginning with the first block (corresponding to the smallest dissimilarity):
3-a) Check whether the block is "down-satisfied" (the mean value of the block is ≥ that of the next lower block). [The first block is down-satisfied by definition.]
3-b) If the block is not down-satisfied, merge it with the immediately lower block. The $\hat{d}$ value within the merged block is then defined to be the mean of all d values in the block.
3-c) Check whether the new block is down-satisfied. Continue this process until the resulting block is down-satisfied.
3-d) Go through the analogous process checking for "up-satisfaction." Continue until the resulting block is up-satisfied.
3-e) Check again for down-satisfaction of the block. Continue until the resulting block is both down- and up-satisfied. (27)
3-f) Proceed to the next higher block and go through the same process.
3-g) Continue until all blocks are up- and down-satisfied. (28)
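Here is the promised sketch in Python (naming is my own). It uses a single left-to-right pool-adjacent-violators pass, which yields the same least-squares monotone fit as the up-and-down checking described in steps 3-a through 3-g:

    def monotone_regression(d):
        # Ascending least-squares monotone regression of the distances d,
        # already ordered by increasing dissimilarity. Each block is kept
        # as [sum, count]; we start from the finest partition (step 2).
        blocks = []
        for v in d:
            blocks.append([v, 1])
            # Merge downward while the current block is not down-satisfied,
            # i.e. its mean is below the mean of the preceding block
            # (steps 3-a to 3-c); the merged block's fitted value is the
            # mean of all its d values (step 3-b).
            while (len(blocks) > 1 and
                   blocks[-1][0] / blocks[-1][1] < blocks[-2][0] / blocks[-2][1]):
                s, c = blocks.pop()
                blocks[-1][0] += s
                blocks[-1][1] += c
        # Expand the block means back into the fitted values dhat.
        return [s / c for s, c in blocks for _ in range(c)]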
Illustrative Example of the Monotone Regression Algorithm

Distances are listed in rank order of the dissimilarities; each stage shows the block means after one merge (7.67 is 23/3, rounded):

    Stimulus pair   Rank   Stage 1    2     3     4     5      6      7      Final d-hat
    ED                1      3        3     3     3     3      3      3       3
    EB                2      5        4     4     4     4      4      4       4
    AB                3      3        4     4     4     4      4      4       4
    CD                4      8        8     7     6     6      6      6       6
    AC                5      6        6     7     6     6      6      6       6
    AD                6      4        4     4     6     6      6      6       6
    AE                7      8        8     8     8     8      8      7.75    7.75
    CE                8      9        9     9     9     8.5    7.67   7.75    7.75
    CB                9      8        8     8     8     8.5    7.67   7.75    7.75
    BD               10      6        6     6     6     6      6      7.75    7.75

Stages: (2) blocks 2 and 3 (EB, AB) merged; (3) blocks 3 and 4 (CD, AC) merged; (4) the new block 3 and AD merged; (5) CE and CB merged (block not up-satisfied); (6) BD merged in (still not up-satisfied); (7) the resulting block, being no longer down-satisfied (7.67 < 8), merged with AE, giving 7.75; all blocks are now both up- and down-satisfied.

NOTE: This monotone regression algorithm is central to many other "nonmetric" techniques, such as MONANOVA, the nonmetric options in PREFMAP, ALSCAL, etc. (29)
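Running the earlier sketch on the example's distances (in rank order of the dissimilarities) reproduces the final column:

    d = [3, 5, 3, 8, 6, 4, 8, 9, 8, 6]   # ED, EB, AB, CD, AC, AD, AE, CE, CB, BD
    print(monotone_regression(d))
    # [3.0, 4.0, 4.0, 6.0, 6.0, 6.0, 7.75, 7.75, 7.75, 7.75]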