CLUSTERING
EE 7000-1
Class Presentation
TOPICS
Clustering basics and types
K-means, a type of unsupervised clustering
Supervised clustering
Vector Quantization
Fuzzy Identification
Artificial neural net
Fuzzy-neuro system
What is clustering?
A technique that helps to extract more out of data
Clustering involves grouping data points together according to some measure of similarity
Clustering is a method by which large sets of data are grouped into smaller clusters of similar data
Uses of clustering
Engineering fields such as pattern recognition and artificial intelligence use the concepts of cluster analysis.
In the life sciences (biology, botany, zoology, entomology, cytology, microbiology), the objects of analysis are life forms such as plants, animals, and insects. Clustering analysis may range from developing complete taxonomies to classifying species into subspecies, which can in turn be divided further.
Clustering analysis is also widely used in the information, policy and decision sciences. Its applications to documents include votes on political issues, surveys of markets, products and sales programs, and R&D.
A Clustering Example
[Figure: four clusters (Cluster 1 to Cluster 4) of instances described by Income (High, Medium, Low), Children (0, 1, 2, 3) and Car (Luxury, Sedan, Truck, Compact)]
Clustering in FDI (fault detection and isolation)?
Used to cluster (and thereby identify) data as faulty or non-faulty
Also used to distinguish between different fault conditions
Data from the system is first processed (creating residuals, Fourier transform, …); a clustering algorithm then identifies the different conditions in the data
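The two-stage pipeline above (process the data, then cluster it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenter's method: the signals are synthetic, a "fault" is modeled as an extra 120 Hz tone, the only processing step is an FFT (no residual generation), and the clustering step is a simple two-cluster 1-D k-means. All names (`make_record`, `fault_feature`, `two_means_1d`, `FAULT_HZ`) are hypothetical.

```python
import numpy as np

FS = 1000       # sampling rate in Hz (assumed)
N = 1000        # samples per record, so FFT bins are spaced 1 Hz apart
FAULT_HZ = 120  # hypothetical frequency at which a fault shows up

def make_record(faulty: bool) -> np.ndarray:
    """Synthetic signal: a 50 Hz component, plus a 0.5-amplitude 120 Hz tone if faulty."""
    t = np.arange(N) / FS
    x = np.sin(2 * np.pi * 50 * t)
    if faulty:
        x = x + 0.5 * np.sin(2 * np.pi * FAULT_HZ * t)
    return x

def fault_feature(x: np.ndarray) -> float:
    """Processing step: amplitude of the FFT bin at FAULT_HZ (bin index == Hz here)."""
    spectrum = np.abs(np.fft.rfft(x)) / (N / 2)
    return float(spectrum[FAULT_HZ])

def two_means_1d(f: np.ndarray, max_iter: int = 20) -> np.ndarray:
    """Cluster scalar features into 2 groups; label 1 is the group seeded at the max."""
    centers = np.array([f.min(), f.max()])
    for _ in range(max_iter):
        # Assign each feature value to the nearer of the two centers
        labels = np.argmin(np.abs(f[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its members (keep it if the group is empty)
        new = np.array([f[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(2)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

records = [make_record(False) for _ in range(3)] + [make_record(True) for _ in range(3)]
features = np.array([fault_feature(x) for x in records])
labels = two_means_1d(features)
print(labels)  # [0 0 0 1 1 1]: the last three (faulty) records form their own cluster
```

In a real FDI system the feature vector would usually contain several residual or spectral quantities, and the number of clusters would match the number of fault conditions of interest.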
Properties of clustering
Hierarchical: clusters are built in multiple steps by successively merging (fusing) data until the desired number of clusters is reached
Flat: all clusters are at the same level (there is no hierarchy)
Non-hierarchical or iterative: assume the number of clusters, then assign instances to them
Hard: each instance is assigned to exactly one cluster
Soft: each instance is assigned a probability of belonging to every cluster
Disjunctive: an instance can belong to more than one cluster
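The difference between hard and soft assignment can be shown with a minimal sketch. The hard rule picks the single nearest center; for the soft rule we assume the fuzzy c-means membership formula with fuzzifier m = 2, which is one common choice and is not specified on the slides. The function names and the example data are hypothetical.

```python
import numpy as np

def hard_assign(X, centers):
    """Hard: each instance gets the index of exactly one (the nearest) cluster."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def soft_assign(X, centers, m=2.0, eps=1e-12):
    """Soft: a membership degree for every instance in every cluster (rows sum to 1).
    Uses the fuzzy c-means rule u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) (assumed)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps  # avoid 0/0
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

centers = np.array([[0.0, 0.0], [4.0, 0.0]])
X = np.array([[1.0, 0.0], [3.0, 0.0]])
print(hard_assign(X, centers))           # [0 1]
print(soft_assign(X, centers).round(3))  # memberships ≈ [[0.9, 0.1], [0.1, 0.9]]
```

The point at distance 1 from the first center and 3 from the second is fully in cluster 0 under the hard rule, but only 90% in cluster 0 under the soft rule.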
Properties of Clustering
[Figure: instances a to k clustered in four different ways]
(a) Hard, non-hierarchical
(b) Non-hierarchical, disjunctive
(c) Soft, non-hierarchical, disjunctive; membership probabilities per cluster:
         1     2     3
    a   0.4   0.1   0.5
    b   0.1   0.8   0.1
    c   0.3   0.3   0.4
    ...
(d) Hierarchical, hard, non-disjunctive; dendrogram leaf order: g a c i e d k b j f h
Types of Clustering
Supervised Clustering: The task is to learn to assign instances to pre-defined classes (classification).
Example: cluster the data, given the classes blue, red and yellow
Unsupervised Clustering: The task is to learn a classification from the data; it discovers natural groupings.
Example: cluster the data, given the number of clusters = 3
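To make the supervised case concrete: the class labels come with the training data, and new instances are assigned to one of those pre-defined classes. The sketch below uses a nearest-class-mean classifier as a stand-in for "learn to assign instances to pre-defined classes"; that choice of classifier, the data points and the function names are assumptions for illustration only. In the unsupervised case no labels are given, only k = 3, which is what the K-means algorithm addresses.

```python
import numpy as np

def fit_class_means(X, y):
    """Supervised: the classes are given, so learn one mean vector per class label."""
    labels = sorted(set(y))
    y = np.asarray(y)
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    return labels, means

def predict(X_new, labels, means):
    """Assign each new instance to the pre-defined class whose mean is nearest."""
    d = np.linalg.norm(X_new[:, None, :] - means[None, :, :], axis=2)
    return [labels[i] for i in np.argmin(d, axis=1)]

# Hypothetical labeled training data for the blue / red / yellow example
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = ["blue", "blue", "red", "red", "yellow", "yellow"]
labels, means = fit_class_means(X, y)
print(predict(np.array([[1.0, 0.5], [9.0, 1.0]]), labels, means))  # ['blue', 'yellow']
```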
K-means algorithm
(a type of unsupervised clustering)
Specify k, the number of clusters
Choose k points randomly as cluster centers
Assign each instance to its closest cluster center using Euclidean distance
Calculate the mean of each cluster and use it as that cluster's new center
Reassign all instances to the closest cluster center
Iterate until the cluster centers no longer change
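The steps above can be sketched in NumPy as below. This is a minimal sketch, not a library API: the function name `kmeans`, the `tol`, `max_iter` and `seed` parameters, and the rule of keeping an empty cluster's old center are our assumptions. With `tol=0.0` it stops exactly when the centers no longer change; a positive `tol` gives the "change less than a user-specified amount" stopping rule used in the flowchart version.

```python
import numpy as np

def kmeans(X, k, tol=0.0, max_iter=100, seed=0):
    """Plain k-means following the listed steps (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Choose k distinct data points at random as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assign each instance to its closest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Recompute each center as the mean of its cluster; keep an empty cluster's old center
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.max(np.linalg.norm(new_centers - centers, axis=1))
        centers = new_centers
        # Stop once no center moves by more than tol
        if shift <= tol:
            break
    return centers, labels

X = np.array([[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 8.5], [8.5, 9]], dtype=float)
centers, labels = kmeans(X, k=2)
print(centers.round(3))
print(labels)
```

Because the initial centers are random, the cluster numbering (and, on less well-separated data, the final result) can depend on `seed`; running several seeds and keeping the best result is a common remedy.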
Select the k cluster centers randomly.
Classify the entire training set: for each pattern X_i in the training set, find the nearest cluster center C and classify X_i as a member of C.
For each cluster, recompute its center by finding the mean of the cluster:
$$M_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{jk}$$
where M_k is the new mean, N_k is the number of training patterns in cluster k, and X_{jk} is the j-th pattern belonging to cluster k.
Loop back to the classification step until the change in the cluster means is less than the amount specified by the user.
Store the k cluster centers.
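A small worked instance of the mean update, with made-up numbers purely for illustration: suppose cluster k currently holds N_k = 3 two-dimensional patterns, X_{1k} = (1, 2), X_{2k} = (3, 4) and X_{3k} = (5, 0). Then

$$M_k = \frac{1}{3}\left[(1, 2) + (3, 4) + (5, 0)\right] = \frac{1}{3}(9, 6) = (3, 2)$$

If this cluster's previous center was (2, 2), its mean moved by $\lVert (3, 2) - (2, 2) \rVert = 1$, so, considering this cluster alone, the loop continues unless the user-specified amount is greater than 1.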
[Figures: initial k cluster centers and calculation of centers in the first iteration; changed cluster centers after the first iteration; change in clusters during the second iteration; final positions of the cluster centers]
Supervised Clustering
Vector Quantization
Originated from Shannon’s coding theory
Instead of continuous levels, quantize the codes
The quantized levels are called codewords; the collection of codewords is called the codebook
For transmission, approximate each code by its nearest codeword (Euclidean distance)
The space containing the codewords is divided by the perpendicular bisectors of the lines joining pairs of codewords
The region of points closer to a codeword than to any other codeword is called its Voronoi region
In essence, VQ maps k-dimensional vectors in the vector space R^k into a finite set of vectors
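Encoding and decoding with a given codebook can be sketched as follows. This is a minimal illustration assuming the codebook is already known (how to design it, e.g. with a clustering algorithm, is not shown); the codebook, the input vectors and the function names are hypothetical. The transmitted index of each vector identifies the Voronoi region it falls in.

```python
import numpy as np

def vq_encode(X, codebook):
    """Map each k-dimensional vector to the index of its nearest codeword (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def vq_decode(indices, codebook):
    """Receiver side: replace each transmitted index by its codeword."""
    return codebook[indices]

codebook = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])  # hypothetical 3-codeword codebook in R^2
X = np.array([[0.5, 0.2], [3.1, 0.9], [1.0, 3.0], [2.1, 0.0]])
idx = vq_encode(X, codebook)
X_hat = vq_decode(idx, codebook)
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))  # average squared quantization error
print(idx)  # [0 1 2 1]
```

The last vector (2.1, 0) lies just past x = 2, the perpendicular bisector between codewords (0, 0) and (4, 0), so it falls in the Voronoi region of (4, 0) and is encoded as index 1.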
[Figure: illustration of Voronoi region formation]