					  Modified Multi-Dimensional Scaling (MDS)
Algorithm for Mining Gene Expression Patterns
        X.J. Ge*, S. Yonamene*, Y.M. Mi*, S. Tsutsumi**, Y. Kobune**,
                         H. Aburatani** and S. Iwata*
        *Research into Artifacts, Center for Engineering (RACE), The University of Tokyo,
                         Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
**Department of Life Sciences, Research Center for Advanced Science and Technology (RCAST), The
             University of Tokyo, Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan


ABSTRACT: The dataset of Golub et al. is analyzed with
dimensionality-reduction techniques: principal component analysis
(PCA), multi-dimensional scaling (MDS) and a modified MDS
algorithm. These methods produce 2-D snapshots of the data that
are helpful for class discovery.
Data Set 2: Golub et al., Science 286: 531 (1999).
Background
  Gene expression patterns can be regarded as points in a
multi-dimensional Euclidean space. Because high dimensionality
makes analysis difficult, it helps to have a low-dimensional
representation that captures some characteristics of the raw
dataset.
  In principal component analysis (PCA), the raw data points are
projected linearly onto the plane along which their variance is
largest.
  In multi-dimensional scaling (MDS), the data points are placed
in a low-dimensional space so that the distances between them are
preserved as closely as possible. MDS is nonlinear.

  n-D data points  →  similarity matrix  →  2-D map
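The first step of this pipeline, turning n-dimensional samples into a matrix of pairwise relationships, can be sketched in plain Python. The slide calls this a similarity matrix; the sketch assumes Euclidean distance, which the MDS slide names as the measure between expression patterns. The function name `distance_matrix` and the toy values are invented for illustration and are not taken from the Golub dataset.

```python
import math

def distance_matrix(points):
    """Return the symmetric matrix of Euclidean distances between the rows of `points`."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # requires Python 3.8+
            dist[i][j] = dist[j][i] = d
    return dist

# Toy data (made up): 3 samples, 4 "genes" each.
samples = [
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 6.0, 8.0],
]
delta = distance_matrix(samples)
```

Each row of `samples` plays the role of one expression pattern; the resulting `delta` is the only input a distance-based mapping method needs.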
Results of PCA

Figure: A linear projection of gene expression patterns using the
first two principal components. Samples of ALL and AML are roughly
mapped into different clusters.
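A projection onto the first two principal components could be computed roughly as follows. This is a minimal sketch, not the authors' code: it assumes power iteration with deflation instead of a library eigen-solver so that it stays dependency-free, and the names `center`, `leading_direction` and `pca_2d` as well as the toy data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract the per-column mean so the data cloud sits at the origin."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    return [[r[k] - means[k] for k in range(m)] for r in rows]

def leading_direction(rows, iters=500):
    """Approximate the top eigenvector of X^T X by power iteration."""
    m = len(rows[0])
    # Arbitrary deterministic start; it only fails if it happens to be
    # exactly orthogonal to the leading direction.
    v = [float(k + 1) for k in range(m)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        scores = [dot(r, v) for r in rows]                       # X v
        w = [sum(s * r[k] for s, r in zip(scores, rows)) for k in range(m)]  # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Return (PC1, PC2) coordinates for each sample."""
    x = center(rows)
    pc1 = leading_direction(x)
    s1 = [dot(r, pc1) for r in x]
    # Deflation: remove the PC1 part of every sample, then repeat.
    residual = [[r[k] - s * pc1[k] for k in range(len(pc1))] for r, s in zip(x, s1)]
    pc2 = leading_direction(residual)
    s2 = [dot(r, pc2) for r in residual]
    return list(zip(s1, s2))

# Toy data (made up): two groups of three samples, three "genes" each.
toy = [
    [1.0, 0.0, 0.0], [1.2, 0.1, 0.0], [0.9, -0.1, 0.1],
    [-1.0, 0.0, 0.0], [-1.1, 0.1, 0.0], [-0.9, -0.1, -0.1],
]
coords = pca_2d(toy)
```

On this toy input the two groups end up on opposite sides of the first axis, which mirrors the rough ALL/AML separation described in the caption.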
Results of conventional MDS

MDS minimizes the objective function:

  [equation not preserved]

where, for each pair of samples, one quantity is the distance
between the corresponding points in the x-y plot and the other is
the Euclidean distance between the gene expression patterns.

Figure: Mapping of gene expression patterns by multi-dimensional
scaling (MDS). AML and two subtypes of ALL samples are found in
different regions, but the classification is difficult without
clinical information.
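To make the idea of minimizing a mismatch between plotted distances and original distances concrete, here is a minimal sketch. It assumes the common raw-stress form, the sum over pairs of (plotted distance minus original distance) squared, which may differ from the exact objective on the slide. Plain gradient descent, the circular starting layout, the step size, and the names `stress` and `mds_2d` are all illustrative choices.

```python
import math

def stress(coords, delta):
    """Assumed objective: sum over pairs of (2-D distance - original distance)^2."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def mds_2d(delta, steps=2000, lr=0.01):
    """Place n points in the plane by gradient descent on `stress`."""
    n = len(delta)
    # Deterministic, non-degenerate start: points spread on a unit circle.
    coords = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
              for i in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j])
                if d < 1e-12:
                    continue  # skip coincident points to avoid dividing by zero
                coef = 2.0 * (d - delta[i][j]) / d
                grads[i][0] += coef * (coords[i][0] - coords[j][0])
                grads[i][1] += coef * (coords[i][1] - coords[j][1])
        for i in range(n):
            coords[i][0] -= lr * grads[i][0]
            coords[i][1] -= lr * grads[i][1]
    return coords

# Toy target distances (made up): a 3-4-5 right triangle.
delta = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
coords = mds_2d(delta)
final_stress = stress(coords, delta)
```

Because these toy distances can be realized exactly in the plane, the stress should approach zero; for real expression data a residual mismatch would remain, and the fixed step size would likely need tuning.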
Modified MDS

Goal: Enlarge trans-cluster distances to make separation easier.
Physics background: condensation of atoms to form solids with
minimum free energy.

Objective function: [equation not preserved]
Dissimilarity: [equation not preserved]

Figure: Mapping of gene expression patterns by a modified
multi-dimensional scaling (MDS) algorithm. AML and two subtypes of
ALL samples are found in different regions.
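The modified objective function and dissimilarity are not available in this copy, so the following does not reproduce the authors' method. It only illustrates, under a purely hypothetical assumption, what "enlarging trans-cluster distances" can mean: a power transform that stretches large dissimilarities relative to small ones before any mapping step. The function `stretch_dissimilarities`, its `power` parameter and the toy matrix are all invented.

```python
def stretch_dissimilarities(delta, power=2.0):
    """Hypothetical transform (NOT the authors' formula): raise each distance
    to `power`, rescaled so that the largest distance stays unchanged."""
    largest = max(max(row) for row in delta)
    if largest == 0.0:
        return [row[:] for row in delta]
    return [[largest * (d / largest) ** power for d in row] for row in delta]

# Toy distances (made up): samples 0 and 1 are close, sample 2 is far from both.
delta = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 4.0],
    [4.0, 4.0, 0.0],
]
stretched = stretch_dissimilarities(delta)

ratio_before = delta[0][2] / delta[0][1]          # far / near before the transform
ratio_after = stretched[0][2] / stretched[0][1]   # far / near after the transform
```

With `power` greater than 1 the ratio of far to near distances grows, so a map built from the transformed matrix would tend to pull well-separated groups further apart.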
Conclusions

• The difference between AML and ALL can be discovered even with
linear methods such as principal component analysis (PCA). For
more complicated data structures, such as the differences between
subtypes of ALL, PCA is not sufficient.
• Multi-dimensional scaling (MDS) can produce 2-D maps of gene
expression patterns that reveal more complicated data structures.