(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 1, January 2011
ANALYSIS OF DATA MINING VISUALIZATION TECHNIQUES
USING ICA AND SOM CONCEPTS
K. S. RATHNAMALA (1), Dr. R. S. D. WAHIDA BANU (2)
(1) Research Scholar, Mother Teresa Women’s University, Kodaikanal
(2) Professor & Head, Dept. of Electronics & Communication Engg., GCE.
This research paper is about data mining (DM) and visualization methods that use independent component analysis (ICA) and the self-organizing map (SOM) for gaining insight into multidimensional data. A new method is presented for interactive visualization of cluster structures in a self-organizing map. By using a contraction model, the regular grid of the self-organizing map visualization is smoothly changed toward a presentation that better shows the proximities in the data space. A novel visual data mining (VDM) method is proposed for investigating the reliability of estimates resulting from a stochastic independent component analysis algorithm. Two algorithms presented in this paper can be used in a general context. FastICA for independent binary sources is described. The model resembles the ordinary ICA model, but the summation is replaced by the Boolean operator OR and the multiplication by AND. A heuristic method for estimating the binary mixing matrix is also proposed. Furthermore, the differences in the results when using different objective functions in the FastICA estimation algorithm are discussed.

KEY WORDS: Independent component analysis, Self-organizing map, Vector quantization, Patterns, Agglomerative hierarchical methods, Time series segmentation, Finding patterns by proximity, Clustering validity indices, Feature selection and weighting, FastICA.

1. INTRODUCTION
The tasks encountered within data mining research are predictive modeling, descriptive modeling, discovering rules and patterns, exploratory data analysis, and retrieval by content. Predictive modeling includes many typical tasks of machine learning, such as classification and regression. Descriptive modeling is ultimately about modeling all of the data, e.g., estimating its probability distribution. Finding a clustering, a segmentation or an informative linear representation are common subtasks of descriptive modeling. Particular methods for discovering rules and patterns emphasize finding interesting local characteristics and patterns instead of global models.
Descriptive data mining techniques for data description can be divided roughly into three groups:
- Proximity preserving projections for (visual) investigation of the structure of the data.
171 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
- Partitioning the data by clustering and segmentation.
- Linear projections for finding interesting linear combinations of the original variables using principal component analysis and independent component analysis.
A clustering is a partition of the set of all data items C = {1, 2, ..., N} into K disjoint clusters:
$$ C = \bigcup_{i=1}^{K} C_i $$

2. SELF-ORGANIZING MAP
The basic self-organizing map is formed of K map units organized on a regular k × l low-dimensional grid, usually 2D for visualization. Associated with each map unit i, there is
1. a neighborhood kernel $h(d_{ij}, \sigma(t))$, where the distance $d_{ij}$ is measured from map unit i to the others along the grid (output space), and
2. a codebook vector $c_i$ that quantizes the data space (input space).
The magnitude of the neighborhood kernel decreases monotonically with the distance $d_{ij}$. A typical choice is the Gaussian kernel.

Batch algorithm
One possibility to implement a batch SOM algorithm is to add an extra step to the batch K-means procedure:
$$ c_i := \frac{\sum_{j=1}^{K} |C_j| \, h(d_{ij}, \sigma(t)) \, c_j}{\sum_{j=1}^{K} |C_j| \, h(d_{ij}, \sigma(t))}, \quad \forall i $$
where $|C_j|$ is the number of data vectors associated with map unit j.
A relatively large neighborhood radius in the beginning gives a global ordering for the map. The kernel width σ(t) is then decreased monotonically along with the iteration steps, which increases the flexibility of the map to provide a lower quantization error in the end. If the radius is run to zero, the batch SOM becomes identical to K-means.
The batch SOM is a computational shortcut version of the basic SOM. Despite the intuitive clarity and elegance of the basic SOM, its mathematical analysis has turned out to be rather complex. This comes from the fact that there exists no cost function that the basic SOM would minimize for a probability distribution.
In general, the number of map codebook vectors governs the computational complexity of one iteration step of the SOM. If the size of the SOM is scaled linearly with the number of data vectors, the load scales as $O(MN^2)$. On the other hand, K can be chosen proportional to, e.g., $\sqrt{N}$, as has been suggested, and the load then decreases to $O(MN^{1.5})$. It is suggested that the SOM Toolbox applies to small to medium data sets of up to, say, 10 000-100 000 records. A specific problem is that the memory consumption in the SOM Toolbox grows quadratically with the map size K.
In practice, the SOM and its variants have been successful in a considerable number of application fields and individual applications. In the context of this paper, interesting application areas close to VDM include
- Visualization and UI techniques, especially in information retrieval, and exploratory data analysis in general.
- Context-aware computing.
- Industrial applications for process monitoring and analysis.
Visualization capabilities, data and noise reduction by topologically restricted vector quantization, and the practical robustness of the SOM are of benefit to data mining. There are also methods for additional speed-ups of the SOM, especially for large datasets in data mining and in document retrieval applications.
The SOM framework is not restricted to Euclidean space or real vectors. A variant of the SOM in a non-Euclidean space has been presented to enhance modeling and visualization of hierarchically distributed data. This method uses a fisheye distortion in the visualization. Self-organizing maps and similar structures for symbolic data also exist and have been applied to context-aware computation.

3. AGGLOMERATIVE HIERARCHICAL METHODS
Some clustering methods construct a model of the input data space that inherently allows classifying a new sample into one of the determined clusters. K-means partitions the input data space in this manner. Some other methods merely provide a partition of the items in the sample; the agglomerative hierarchical methods are an example of this case.
The family of partitional methods is often contrasted with the hierarchical methods. Agglomerative hierarchical methods do not aim at minimizing a global criterion for partitioning, but join data items into bigger clusters in a bottom-up manner. In the beginning, each sample is considered to form its own cluster. After this, in N-1 steps, the pair of clusters having the minimal pairwise dissimilarity δ is joined, which reduces the number of remaining clusters by one. The merging is repeated until all data is in one cluster. This gives a set of nested partitions, and a tree presentation is quite a natural way of representing the result.
Here we list the between-cluster dissimilarities δ of some of the most common agglomeration strategies: the single linkage (SL), complete linkage (CL) and average linkage (AL) criteria.
$$ \delta_1 = \delta_{SL} = \min d_{ij}, \quad i \in C_k, \; j \in C_l $$
$$ \delta_2 = \delta_{CL} = \max d_{ij}, \quad i \in C_k, \; j \in C_l $$
$$ \delta_3 = \delta_{AL} = \frac{1}{|C_k| \, |C_l|} \sum_{i \in C_k} \sum_{j \in C_l} d_{ij} $$
where $C_k$ and $C_l$ (k ≠ l) are any two distinct clusters. SL and CL are invariant under monotone transformations of the dissimilarity. SL is reported to be noise sensitive but capable of producing elongated or chained clusters, while CL and AL
tend to produce more spherical clusters. If similarities are used instead, the merging occurs for maximum pairwise cluster similarity.

4. TIME SERIES SEGMENTATION
In addition to the basic cluster analysis tasks, other clustering methods that include auxiliary constraints are also discussed here. In time series segmentation the data items have some natural order, e.g., time, which must be taken into account: a segment always consists of a sequence of consecutive samples of the time series.
A K-segmentation divides the series X into K segments $C_i$ with K-1 segment borders $c_1, \ldots, c_{K-1}$ so that
$$ C_1 = [x(1), x(2), \ldots, x(c_1)], \; \ldots, \; C_K = [x(c_{K-1}+1), x(c_{K-1}+2), \ldots, x(N)] \qquad \text{(Eq-1)} $$
This is the basic time series segmentation task, where each segment is considered to emerge from a different model. Furthermore, we consider the case where the data to be segmented is readily available.
As in the basic clustering task, we wish to minimize some adequate cost function by selection of the segment borders. We stay with costs that are sums of individual segment costs, each of which is not affected by changes in other segments. An example of such a function is the SSE cost function (the sum of squared errors given in Section 5), where $c_i$ is the mean vector of the data vectors in segment $C_i$. There is, of course, a fundamental difference between time series segmentation with SSE cost and vector quantization. In vector quantization, the borders of the nearest neighbor regions $V_i$ are defined by the codebook vectors, whereas in segmentation the mean vectors $c_i$ are determined by the segments $C_i$ but cannot directly be used to infer the segment borders.
Minimizing the SSE cost for segmentation aims at describing each segment by its mean value. It may also be seen as splitting the sequence so that the (biased) sample variance computed by pooling the sample variances of the segments together is minimal.

Algorithms
The basic segmentation problem can be solved optimally using dynamic programming. The dynamic programming algorithm also finds the optimal 1, 2, ..., K-1 segmentations while searching for an optimal K-segmentation. The computational complexity of dynamic programming is of order $O(KN^2)$ if the cost of a segmentation can be calculated in linear time. This may be too much when there are large amounts of data.
Another class consists of the merge-split algorithms, of which the local and global iterative replacement algorithms (LIR and GIR) resemble the batch K-means in the sense that at each step they change the descriptors of the partition (segment borders vs. codebook vectors) to match a necessary condition of a local optimum. The LIR gets more easily stuck in bad local minima, and the
GIR was considerably better in this sense, yet still sensitive to the initialization. The GIR and LIR algorithms can be seen as variants of the “Pavlidis algorithm” that changes the borders gradually toward a local optimum.
The test procedures use random initialization for the segments. As in the case of K-means, the initialization matters, and it might be advisable to try an educated guess for the initial positions. One possibility to create a more effective segmentation algorithm is to combine several greedy methods. For example, the basic bottom-up and top-down methods can be fine-tuned by merge-split methods.

Applications
Time series and other similar segmentation problems arise in different applications, e.g., in approximating functions by piecewise linear functions. This might be done for the purpose of simplifying or analyzing contour or boundary lines. Another aim, important in information retrieval, is to compress or index voluminous signal data. Other applications in data analysis range from phoneme segmentation to finding sequences in biological or industrial process data.

5. VECTOR QUANTIZATION
As suggested by the intuitive aim of the basic clustering task, adequate global clustering criteria can be obtained by minimizing or maximizing a function of the within-cluster dispersion (scatter) $D_W$, the between-cluster dispersion $D_B$, and their sum, the total dispersion $D_T$, which is constant and independent of the clustering. For data in a Euclidean space:
$$ D_W = \sum_{i=1}^{K} D_W(i), \qquad D_W(i) = \sum_{j \in C_i} (x(j) - c_i)(x(j) - c_i)^T $$
$$ D_B = \sum_{i=1}^{K} |C_i| \, (c_i - c)(c_i - c)^T $$
$$ D_T = D_W + D_B = \sum_{j=1}^{N} (x(j) - c)(x(j) - c)^T $$
where K is the number of clusters, $c_i$ is the average of the data in cluster $C_i$, and c is the average of all data. These quantities can also be formulated for a general dissimilarity matrix.
The dispersion matrices can be used as a basis for different cost functions. Two criteria that are invariant to (non-singular) linear transformations of the data are based on the dispersion matrices: maximizing trace($D_W^{-1} D_B$) and minimizing det($D_W$). Minimizing det($D_W$) gives the maximum likelihood solution for a model where all clusters are assumed to have a Gaussian distribution with the same covariance matrix.
The aforementioned criteria may be difficult to optimize. Therefore a scale-dependent criterion, minimization of trace($D_W$), has become popular, presumably because it can be (suboptimally) minimized with the fast and computationally light K-means algorithm, which is described shortly in more detail.
Minimization of trace($D_W$) is the same as minimizing the sum of squared errors (SSE)
between each data vector x(j) and its nearest cluster centroid $c_i$:
$$ SSE = \sum_{i=1}^{K} \sum_{x(j) \in C_i} \| x(j) - c_i \|^2 $$
The above equation is encountered in vector quantization, a form of clustering that is particularly intended for compressing data. In vector quantization, the cluster centroids appearing in the above equation are called codebook vectors. The codebook vectors partition the input space into nearest neighbor regions $V_i$. A region $V_i$ is associated with the cluster centroid $c_i$ by
$$ V_i = \{ x : \| x - c_i \| \le \| x - c_l \|, \; \forall l \} $$
(nearest neighbor condition).
Cluster $C_i$ in the SSE equation is now the set of input data points that belong to $V_i$.

K-means
K-means refers to a family of algorithms that appear often in the context of vector quantization. K-means algorithms are tremendously popular in clustering and are often used for exploratory purposes. As a clustering model, the vector quantizer has an obvious limitation: the nearest neighbor regions are convex, which limits the shape of clusters that can be separated.
We consider only the batch K-means algorithm; different sequential procedures are explained elsewhere in the literature. The batch K-means algorithm proceeds by applying alternately, in successive steps, the centroid and nearest neighbor conditions that are necessary for optimal vector quantization.
1. Given a codebook of vectors $c_i$, i = 1, 2, ..., K, associate the data vectors with the codebook vectors according to the nearest neighbor condition. Now, each codebook vector has a set of data vectors $C_i$ associated with it.
2. Update the codebook vectors to the centroids of the sets $C_i$ according to the centroid condition. That is, for all i set $c_i := (1/|C_i|) \sum_{j \in C_i} x_j$.
3. Repeat from step 1 until the codebook vectors $c_i$ do not change any more.
When the iteration stops, a local minimum of the SSE is achieved. K-means typically converges very fast. Furthermore, when K << N, K-means is computationally far less expensive than the hierarchical agglomerative methods, since computing the KN distances between the codebook vectors and the data vectors suffices.
Well-known problems with the K-means procedure are that it converges only to a local minimum and is quite sensitive to the initial conditions. A simple initialization is to start the procedure using K randomly picked vectors from the sample. A first-aid solution for trying to avoid bad local minima is to repeat K-means a couple of times from different initial conditions. More advanced solutions include using some form of stochastic relaxation, among other modifications.
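The three steps above, together with the restart remedy for bad local minima, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `batch_kmeans` and `kmeans_restarts`, the parameters `max_iter`, `n_restarts` and `seed`, and the choice to leave the codebook vector of an empty cluster unchanged are all assumptions made for the example.

```python
import numpy as np


def batch_kmeans(X, K, rng, max_iter=100):
    """One run of batch K-means from a random initialization (illustrative)."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    # Simple initialization: K distinct vectors picked at random from the sample.
    codebook = X[rng.choice(N, size=K, replace=False)].copy()
    for _ in range(max_iter):
        # Step 1: nearest neighbor condition, using K*N squared distances.
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 2: centroid condition. Assumption: an empty cluster keeps
        # its previous codebook vector.
        new_codebook = codebook.copy()
        for i in range(K):
            members = X[labels == i]
            if len(members) > 0:
                new_codebook[i] = members.mean(axis=0)
        # Step 3: stop when the codebook vectors no longer change.
        converged = np.allclose(new_codebook, codebook)
        codebook = new_codebook
        if converged:
            break
    # Final assignment and sum of squared errors for the returned codebook.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(N), labels].sum()
    return codebook, labels, sse


def kmeans_restarts(X, K, n_restarts=10, seed=0):
    """Repeat K-means from different initial conditions; keep the lowest SSE."""
    rng = np.random.default_rng(seed)
    runs = [batch_kmeans(X, K, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])


if __name__ == "__main__":
    # Toy data (made up for the example): two well separated 2-D groups.
    data_rng = np.random.default_rng(1)
    X = np.vstack([data_rng.normal(0.0, 0.1, (50, 2)),
                   data_rng.normal(5.0, 0.1, (50, 2))])
    codebook, labels, sse = kmeans_restarts(X, K=2)
    print("codebook vectors:", codebook)
    print("SSE:", sse)
```

Keeping the run with the lowest SSE is only one way to combine restarts; it does not guarantee a global minimum, which is consistent with the sensitivity to initial conditions noted above.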
6. CLUSTERING VALIDITY INDICES
The clustering methods in this paper do not directly decide the number of clusters but require it as a parameter. This raises the question of which number of clusters best fits the "natural structure" of the data. The problem is somewhat vaguely defined, since the utility of the clusters is not explicitly stated with any cost function. One approach to solving it is to use "add-on" relative clustering validity criteria. Basically, one first clusters the data with an algorithm for cluster numbers K = 2, 3, ..., $K_{max}$. Then the index is computed for the partitions, and (local) minima, maxima, or a knee of the index plot indicate the adequate choice(s) of K.
Two examples of such indices are given here. Davies-Bouldin type indices are among the most popular relative clustering validity criteria:
$$ I_{DB} = \frac{1}{K} \sum_{i=1}^{K} R_i, \qquad R_i = \max_{j, \, j \neq i} \frac{\Delta(C_i) + \Delta(C_j)}{\delta(C_i, C_j)} $$
where $\Delta(C_i)$ is some adequate scalar measure for within-cluster dispersion and $\delta(C_i, C_j)$ for between-cluster dispersion. A simplified variant of this, the R-index ($I_R$), is
$$ I_R = \frac{1}{K} \sum_{k=1}^{K} \frac{S_k^{in}}{S_k^{ex}} $$
where
$$ S_k^{in} = \frac{1}{|C_k|^2} \sum_{i, j \in C_k} d_{ij}, \qquad S_k^{ex} = \min_{l, \, l \neq k} \frac{1}{|C_k| \, |C_l|} \sum_{i \in C_k} \sum_{j \in C_l} d_{ij} $$
In preliminary experiments, the R-index gave reasonable suggestions for a sensible number of clusters with a given benchmarking data set.
The second example, which has the form of a generalized Dunn index, is
$$ \nu_{AB} = \min_{k, l; \, k \neq l} \left\{ \frac{\delta_A(C_k, C_l)}{\max_m \Delta_B(C_m)} \right\} $$
where $\delta_A$ is some between-cluster dissimilarity measure and $\Delta_B$ is some measure of within-cluster dispersion (diameter), e.g.,
$$ \Delta_1(C_k) = \max d_{ij}, \quad i, j \in C_k $$
$$ \Delta_2(C_k) = \frac{1}{|C_k|^2 - |C_k|} \sum_{i, j \in C_k} d_{ij} $$
There are literally dozens of relative cluster validity indices and, as is obvious, the selection of the R-index is hardly optimal but a working solution; it is only meant to roughly guide the exploration.

6.1. Finding interesting linear projections
Finding patterns in data can be assisted by searching for an informative recoding of the original variables by a linear transformation. The linearity is at the same time the power and the weakness of these methods. On one hand, a linear model is limited; on the other hand, it is potentially both computationally more tractable and intuitively more understandable than a non-linear method.

6.2. Independent component analysis
In the basic, linear and noise-free ICA model, we have M latent variables $s_i$, i.e., the unknown independent components (or source signals), that are mixed linearly to form M observed signals, the variables $x_i$. When X is the observed data, the model becomes
$$ X = AS \qquad \text{(Eq-2)} $$
where A is an unknown constant matrix, called the mixing matrix, and S contains the unknown independent components: S = [s(1) s(2) ... s(N)], consisting of vectors s(i), with $s = [s_1 \; s_2 \; \ldots \; s_M]^T$. The task is to estimate the mixing matrix A (and the realizations of the independent components $s_i$) using the observed data X alone. The independent components must have non-Gaussian distributions. However, what is often estimated in practice is the demixing matrix W for S = WX, where W is a (pseudo)inverse of A.
This kind of problem setting is pronounced in blind signal separation (BSS) problems, such as the "cocktail party problem", where one has to resolve the utterances of many nearby speakers in the same room. Several algorithms for performing ICA have been proposed, and the FastICA algorithm is briefly described in the next section.

6.3. FastICA
The FastICA algorithm is based on finding projections that maximize non-Gaussianity measured by an objective function. A necessary condition for independence is uncorrelatedness, and a way of making the basic ICA problem somewhat easier is to whiten the original signals X. Thereafter, it suffices to rotate the whitened data Z suitably, i.e., to find an orthogonal demixing matrix $W^{*}$ that produces the estimates for the independent components, $S = W^{*} Z$. When the whitening is performed, the demixing matrix for the original, centered data is $W = W^{*} \Lambda^{-1/2} E^{T}$, where E and Λ denote, as is usual for whitening, the eigenvector and eigenvalue matrices of the data covariance.
Here, we present the symmetrical version of the FastICA algorithm, where all independent components are estimated simultaneously:
1. Whiten the data. For simplicity, we denote here the whitened data vectors by x and the demixing matrix for whitened data by W.
2. Initialize the demixing matrix $W = [w_1 \; w_2 \; \ldots \; w_M]^T$, e.g., randomly.
3. Compute new basis vectors using the update rule
$$ w_j := E\{ g(w_j^T x) \, x \} - E\{ g'(w_j^T x) \} \, w_j $$
where g is a non-linearity derived from the objective function J; in the case of kurtosis it becomes $g(u) = u^3$, and in the case of skewness $g(u) = u^2$. Use sample estimates for the expectations.
4. Orthogonalize the new W, e.g., by $W := W (W^T W)^{-1/2}$.
5. Repeat from step 3 until convergence.
There is also a deflation version of the FastICA algorithm that finds the independent components one by one. It searches for a new component by using the fixed-point iteration (step 3 of the procedure above) in the remaining subspace that is orthogonal to the previously found estimates.
Both practical and theoretical reasons make FastICA an appealing algorithm. It has very competitive computational and convergence properties. Furthermore, FastICA is not restricted to resolving either super- or sub-Gaussian
sources, as is the case with many algorithms. However, the FastICA algorithm faces the same problems related to suboptimal local minima and random initialization that appear in many other algorithms, including K-means and GIR. Consequently, a special tool, Icasso, for VDM-style assessment of the results was developed in the course of this paper.

6.4. ICA and binary mixture of binary signals
Next, we consider a very specific non-linear mixture of latent variables: the problem of the Boolean mixture of latent binary signals and possibly binary noise. The mixing matrix $A^B$, the observed data vectors $x^B$ and the independent, latent source vectors $s^B$ all now consist of binary vectors in $\{0,1\}^M$. The basic model in Eq-2 is replaced by a Boolean expression
$$ x_i^B = \bigvee_{j=1}^{n} a_{ij}^B \wedge s_j^B, \qquad i = 1, 2, \ldots, M $$
where ∧ is Boolean AND and ∨ is Boolean OR. Instead of using Boolean operators, this could be written $x^B = U(A^B s^B)$ using a step function U as a post-mixture non-linearity. The mixture can be further corrupted by binary noise of the exclusive-OR type.
On one hand, the basic ICA cannot solve the problem in the above equation. The methods for post-non-linear mixtures that assume an invertible non-linearity cannot be directly applied either. On the other hand, it seems possible that the basic ICA could work for data emerging from sources and basis vectors that are "sparse enough". Consequently, we experimented with how far the performance of the basic ICA can be pushed, using reasonable heuristics, without elaborating something completely new. In this paper, the experiment can be seen as a feasibility study for using ICA where the data is close to binary. Furthermore, there are similar problems in other application fields, prominently in text document analysis, where such data is encountered. Since the basic ICA model is not the optimal choice for handling such problems in general, probabilistic models and algorithms have recently been developed for this purpose.
First, the estimated linear mixing matrix $\hat{A}$ is normalized by dividing each column by the element whose magnitude is largest in that column. Second, the elements below or equal to 0.5 are rounded to zero and those above 0.5 to one:
$$ \hat{A}^B = U(\hat{A} \Lambda - T) $$
where the diagonal scaling matrix Λ has elements
$$ \lambda_i = \frac{1}{s_{\max}(\hat{a}_i)}, \qquad s_{\max}(\hat{a}_i) = \begin{cases} \min \hat{a}_i & \text{if } |\min \hat{a}_i| > |\max \hat{a}_i| \\ \max \hat{a}_i & \text{otherwise} \end{cases} $$
where $\max \hat{a}_i$ means taking the maximum and $\min \hat{a}_i$ the minimum element of the column vector $\hat{a}_i$. Matrix T contains the thresholds; here we set $t_{ij}$ =
0.5, ∀ i, j. As supposed, this trick works quite well with sparse data, and skewness $E(y^3)$ works better than kurtosis as a basis for the objective function over a wide range of data sparsity, except for noisy data.

Conclusion:
In a nutshell, new ways have been presented to develop data mining techniques: using SOM and ICA as data visualization methods, e.g., for use in process analysis; an exploratory method for investigating the stability of ICA estimates; enhancements and modifications of algorithms, such as the fast fixed-point algorithm, for time series segmentation; and a heuristic solution to the problem of finding a binary mixing matrix and independent binary sources. Both time-series segmentation and PCA revealed meaningful contexts from the features in a visual data exploration.

REFERENCES:
1. Alhoniemi, E. (2000). Analysis of Pulping Data Using the Self-Organizing Map. Tappi Journal, 83(7):66.
2. Cheung, Y.-M. (2003). k*-Means: A New Generalized k-Means Clustering Algorithm. Pattern Recognition Letters, 24(15):2883-2898.
3. Grabmeier, J. and Rudolph, A. (2002). Techniques of Cluster Algorithms in Data Mining. Data Mining and Knowledge Discovery, 6(4):303-360.
4. Hoffman, P.E. and Grinstein, G.G. (2002). A Survey of Visualizations for High-Dimensional Data Mining. In Fayyad et al. (2002), chapter 2, pages 47-82.
5. Hyvärinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. Wiley Interscience.
6. Lampinen, J. and Kostiainen, T. (2002). Generative Probability Density Model in the Self-Organizing Map. In Seiffert and Jain (2002), chapter 4, pages 75-92.
7. Ultsch, A. (2003). Maps for the Visualization of High-Dimensional Data Spaces. In WSOM2003 (2003). CD-ROM.
8. WSOM2003 (2003). Proceedings of the Workshop on Self-Organizing Maps (WSOM2003), Hibino, Kitakyushu, Japan.
9. Grinstein, G.G. and Ward, M.O. (2002). Introduction to Data Visualization. In Fayyad et al. (2002), chapter 1, pages 21-45.
10. Kohonen, T. (2001). Self-Organizing Maps. Springer, 3rd edition.
11. Keim, D.A. and Kriegel, H.-P. (1996). Visualization Techniques for Mining Large Databases: A Comparison. IEEE Transactions on Knowledge and Data Engineering.
12. Vesanto, J. (2002). Data Exploration Process Based on the Self-Organizing Map.
13. Yin, H. (2001). Visualization Induced SOM (ViSOM). In Allinson, N., Yin, H., Allinson, L., and Slack, J., editors, Advances in Self-Organizing Maps.