# CLUSTERING


EE 7000-1
Class Presentation
TOPICS

- Clustering basics and types
- K-means, a type of unsupervised clustering
- Supervised clustering types
   - Vector Quantization
   - Fuzzy Identification
   - Artificial neural net
   - Fuzzy-neuro system
What is clustering?
- A technique that helps extract more information from data
- Clustering groups data points together according to some measure of similarity
- Clustering is a method by which large sets of data are grouped into clusters of smaller sets of similar data
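
One common measure of similarity, and the one the K-means slides rely on later, is Euclidean distance: the smaller the distance, the more similar two data points are. The following is a minimal sketch; the function name `euclidean` and the sample points are illustrative assumptions, not part of the presentation.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Illustrative points: a and b lie close together, c is far from both.
a, b, c = (1.0, 1.0), (1.5, 1.2), (8.0, 9.0)
print(euclidean(a, b) < euclidean(a, c))  # is a more similar to b than to c?
```

Other similarity measures are possible; Euclidean distance is used here only because the later slides name it.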
Uses of Clustering

- Engineering sciences such as pattern recognition and artificial intelligence use the concepts of cluster analysis.
- In the life sciences (biology, botany, zoology, entomology, cytology, microbiology), the objects of analysis are life forms such as plants, animals, and insects. Cluster analysis may range from developing complete taxonomies to classifying species into subspecies, which can in turn be subdivided further.
- Cluster analysis is also widely used in the information, policy, and decision sciences. Applications of cluster analysis to documents include votes on political issues, market surveys, product surveys, surveys of sales programs, and R&D.
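A Clustering Example

[Figure: four clusters (Cluster 1 to Cluster 4), each annotated with an income level (High, Medium, or Low), a number of children (0 to 3), and a car type (Luxury, Sedan, Truck, or Compact); for example, "Income: High, Children: 1, Car: Luxury" and "Income: Medium, Children: 0, Car: Compact".]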
Clustering in FDI?

- Used mainly to cluster, and thereby identify, data as faulty or non-faulty
- Can also distinguish different fault conditions
- Data from the system → processing (creating residuals, Fourier transform, ...) → clustering algorithm to identify the different conditions of the data
Properties of clustering

- Hierarchical: proceeds in multiple steps, fusing data until the desired number of clusters is reached.
- Flat: all clusters are at the same level, with no nesting.
- Non-hierarchical or iterative: assume a number of clusters, then assign instances to them.
- Hard: each instance is assigned to exactly one cluster.
- Soft: each instance is assigned a probability of belonging to each cluster.
- Disjunctive: instances can be part of more than one cluster.
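
The difference between hard and soft assignment can be sketched as follows. This is only an illustration: the function names are hypothetical, and turning distances into membership weights with a softmax over negative distances is an assumption chosen for brevity, one of several possible weighting schemes.

```python
import math

def hard_assign(point, centers):
    """Return the index of the single nearest center (hard clustering)."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

def soft_assign(point, centers):
    """Return one membership weight per center, summing to 1 (soft clustering).

    Assumption: weights are a softmax over negative Euclidean distances.
    """
    dists = [math.dist(point, c) for c in centers]
    nearest = min(dists)
    # Subtract the smallest distance before exponentiating for numerical stability.
    scores = [math.exp(-(d - nearest)) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]

centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # illustrative cluster centers
point = (1.0, 0.5)
print(hard_assign(point, centers))
print([round(m, 3) for m in soft_assign(point, centers)])
```

The hard result is a single cluster index, while the soft result is a row of weights like the rows of a membership table, with the largest weight on the nearest center.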
Properties of Clustering

[Figure: four panels showing the instances a to k grouped in different ways.]
(a) Hard, non-hierarchical
(b) Non-hierarchical, disjunctive
(c) Soft, non-hierarchical, disjunctive: membership probabilities for clusters 1, 2, 3
   a: 0.4, 0.1, 0.5
   b: 0.1, 0.8, 0.1
   c: 0.3, 0.3, 0.4
   ...
(d) Hierarchical, hard, non-disjunctive: tree over the instances in the order g, a, c, i, e, d, k, b, j, f, h
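
The hierarchical case, in which clusters are fused step by step until the desired number remains, can be sketched as below. This is a minimal illustration, not the method used in the presentation: it assumes single linkage (the distance between two clusters is the distance between their closest members), and the name `agglomerate` and the sample points are hypothetical.

```python
import math

def agglomerate(points, num_clusters):
    """Merge the two closest clusters repeatedly until num_clusters remain.

    Assumption: cluster distance is single linkage (closest pair of members).
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # fuse cluster j into cluster i
        del clusters[j]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for cluster in agglomerate(points, 3):
    print(cluster)
```

Recording the order of the merges, rather than only the final clusters, would give the tree structure that a hierarchical panel depicts.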
Types of Clustering

Supervised Clustering: the task is to assign the data to pre-defined classes (classification).
Example: cluster the data, given the classes blue, red, and yellow.

Unsupervised Clustering: the task is to learn a classification from the data; it discovers natural groupings.
Example: cluster the data, given the number of clusters = 3.
K-means algorithm
(a type of unsupervised clustering)

- Specify k, the number of clusters
- Choose k points randomly as cluster centers
- Assign each instance to its closest cluster center using Euclidean distance
- Calculate the mean (centroid) of each cluster and use it as the new cluster center
- Reassign all instances to the closest cluster center
- Iterate until the cluster centers no longer change
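
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the initial centers are drawn from the data points themselves, and the names `kmeans`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples.

    Returns (centers, labels). max_iter and seed are illustrative
    parameters that do not appear in the steps above.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers are data points
    labels = None
    for _ in range(max_iter):
        # Assign each instance to its closest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments, and therefore centers, no longer change
        labels = new_labels
        # Recompute each center as the mean of its assigned instances.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = kmeans(points, 2)
print(centers)
print(labels)
```

Because the starting centers are random, different seeds can give different final clusters; running the algorithm several times and comparing the results is a common precaution.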
[Flowchart of the K-means algorithm:]

1. Select the k cluster centers randomly.
2. Classify the entire training set: for each pattern $X_i$ in the training set, find the nearest cluster center $C$ and classify $X_i$ as a member of $C$.
3. For each cluster, recompute its center by finding the mean of the cluster:

$$M_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{jk}$$

   where $M_k$ is the new mean, $N_k$ is the number of training patterns in cluster $k$, and $X_{jk}$ is the $j$-th pattern belonging to cluster $k$.
4. Loop over steps 2 and 3 until the change in cluster means is less than the amount specified by the user.
5. Store the k cluster centers.
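
As a worked instance of the mean update, suppose (purely for illustration; these numbers are not from the presentation) that cluster $k$ holds $N_k = 3$ two-dimensional patterns $X_{1k} = (1, 2)$, $X_{2k} = (3, 4)$, and $X_{3k} = (5, 0)$. The new center is then:

```latex
M_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{jk}
    = \frac{1}{3} \bigl[ (1, 2) + (3, 4) + (5, 0) \bigr]
    = \frac{1}{3} (9, 6)
    = (3, 2)
```

If the previous center of this cluster had been $(2.5, 2)$, the center would have moved by $0.5$, so with a user-specified threshold of, say, $0.1$ the loop would run again.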
[Figure: initial k cluster centers and calculation of centers in the first iteration]
[Figure: changed cluster centers after the first iteration]
[Figure: change in clusters during the second iteration]
[Figure: final positions of the cluster centers]
Supervised Clustering
Vector Quantization

- Originated from Shannon's coding theory
- Instead of continuous levels, quantize the codes
- The quantized levels are called codewords; the collection of them is the codebook
- For transmission, approximate each code by its nearest codeword (Euclidean distance)
- Divide the space containing the codewords by the perpendicular bisectors of the lines joining pairs of codewords
- The region surrounding a codeword is called its Voronoi region
- In essence, a mapping of k-dimensional vectors in the vector space $R^k$ into a finite set of vectors
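
The nearest-codeword mapping can be sketched as follows. The names `encode` and `decode` and the four-entry codebook are hypothetical, chosen only to illustrate the idea; how the codebook itself is built is outside this sketch.

```python
import math

def encode(vector, codebook):
    """Return the index of the codeword nearest to vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def decode(index, codebook):
    """Return the codeword that stands in for the original vector."""
    return codebook[index]

# Hypothetical 2-D codebook with four codewords.
codebook = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
vector = (0.9, 0.2)
index = encode(vector, codebook)
print(index, decode(index, codebook))
```

Only the index needs to be transmitted. Every vector that falls in the same Voronoi region is encoded to the same index and is therefore reconstructed as the same codeword.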
Voronoi region formation illustration
