Modified Multi-Dimensional Scaling (MDS)
Algorithm for Mining Gene Expression Patterns
X.J. Ge*, S. Yonamene*, Y.M. Mi*, S. Tsutsumi**, Y. Kobune**,
H. Aburatani** and S. Iwata*
*Research into Artifacts, Center for Engineering (RACE), The University of Tokyo,
Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
**Department of Life Sciences, Research Center for Advanced Science and Technology (RCAST), The
University of Tokyo, Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
ABSTRACT: The dataset of Golub et al. is analyzed using
dimensionality-reduction techniques, including principal component
analysis (PCA), multi-dimensional scaling (MDS) and a modified
MDS algorithm. These methods produce snapshots that are helpful
for class discovery.
Data Set 2: Golub et al., Science 286: 531 (1999).
Gene expression patterns can be considered as points in a
multi-dimensional Euclidean space. Because the high dimensionality
makes analysis difficult, it is helpful to have a low-dimensional
representation that captures some characteristics of the raw data.
In principal component analysis (PCA), the raw data points are
linearly projected onto the plane along which their variance is
largest.
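As a rough illustration of the projection described above, the following Python sketch computes the first two principal components with NumPy's singular value decomposition. The function name pca_project, the synthetic 38-by-100 matrix and the artificial class shift are assumptions made for this example; they are not the Golub et al. data and not the authors' code.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X (samples x genes) onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center each gene (column)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # coordinates in the principal plane

# Synthetic stand-in for expression data (hypothetical, not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(38, 100))
X[:19] += 2.0  # shift half of the samples to mimic two classes

Y = pca_project(X)
print(Y.shape)  # (38, 2)
```

Each row of Y is one sample placed in the 2-D plane; plotting Y[:, 0] against Y[:, 1] gives the kind of map shown under "Results of PCA".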
In multi-dimensional scaling (MDS), data points are
represented in a low-dimensional space such that the distances
between points are preserved as closely as possible. MDS is nonlinear.
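One common way to make "distances are preserved" concrete is to minimize the raw stress, the sum over all pairs of (d_ij - D_ij)^2, where d_ij is the distance in the 2-D map and D_ij is the distance in the original space. The Python sketch below assumes that objective and minimizes it by plain gradient descent; the exact objective and optimizer used in this work are not recoverable from the text, so the function names, the step size and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def pairwise_distances(P):
    """Euclidean distance matrix between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(Y, D):
    """Raw stress: sum over pairs i<j of (d_ij - D_ij)^2 (assumed objective)."""
    iu = np.triu_indices(len(Y), k=1)
    return ((pairwise_distances(Y)[iu] - D[iu]) ** 2).sum()

def mds(D, n_components=2, n_iter=300, seed=0):
    """Gradient-descent MDS on the raw stress; a sketch, not a tuned solver."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, n_components))  # random initial layout
    step = 1.0 / (2.0 * n)  # assumed step size, chosen so the stress does not increase
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]      # y_i - y_j for all pairs
        d = np.sqrt((diff ** 2).sum(axis=-1))
        coeff = (d - D) / np.maximum(d, 1e-12)    # guard against division by zero
        grad = 2.0 * (coeff[:, :, None] * diff).sum(axis=1)
        Y -= step * grad
    return Y

# Synthetic stand-in for expression data (hypothetical, not the real dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))
D = pairwise_distances(X)

Y = mds(D)
print(stress(Y, D))
```

Because the map coordinates enter the objective through the distances d_ij, the resulting layout is not a linear projection of the input, which is the sense in which this kind of MDS is nonlinear.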
Figure: n-D data points mapped to a 2-D map
Results of PCA
A linear projection of gene expression patterns
using the first two principal components. Samples
of ALL and AML are roughly mapped into
different clusters.
Results of conventional MDS
MDS minimizes an objective function defined in terms of two
quantities: the distance between points in the x-y plot and the
Euclidean distance between the corresponding gene expression
patterns.
Mapping of gene expression patterns by multi-dimensional scaling
(MDS). AML and two subtypes of ALL samples are found in different
regions. But
the classification is difficult without clinical
Goal: Enlarge trans-cluster distances
to make separation easier.
Physics background: condensation of
atoms to form solids with minimum
Mapping of gene expression patterns by a
modified multi-dimensional scaling (MDS)
algorithm. AML and two subtypes of ALL
samples are found in different regions.
• The difference between AML and ALL can be discovered even
using linear methods like principal component analysis (PCA). But
for more complicated data structures, such as the difference
between subtypes of ALL, PCA is not sufficient.
• Multi-dimensional scaling (MDS) can produce 2-D maps of gene
expression patterns that reveal more complicated data structures.