# K-means Clustering Algorithm with Meliorated Initial Center


```
Computer Engineering, Vol.33, No.3, February 2007

Article ID: 1000-3428(2007)03-0065-02    Document code: A    CLC number: TP39

K-means Clustering Algorithm with Meliorated Initial Center
YUAN Fang, ZHOU Zhiyong, SONG Xin
College of Mathematics and Computer, Hebei University, Baoding 071002

Abstract: The traditional k-means algorithm is sensitive to the choice of initial centers. To address this problem, a new method for finding
the initial centers is proposed. It first computes the density of the region around each data object, then selects k data objects that all lie in
high-density regions and are as far from one another as possible, and uses these k objects as the initial centers. Experiments on standard data
sets from the UCI repository show that the proposed method produces clusterings of high purity and eliminates the sensitivity to the initial centers.
Key words: data mining; clustering; k-means algorithm; clustering center

Clustering algorithms cited in the introduction include k-means [2], STING [3], CLIQUE [4] and CURE [5]; reference [6] is also cited.

Data set:         X = \{ x_i \mid x_i \in R^p,\ i = 1, 2, \dots, n \}  [1]
Cluster centers:  z_1, z_2, \dots, z_k
Clusters:         w_j \ (j = 1, 2, \dots, k)

Distance between two data objects:
    d(x_i, x_j) = \sqrt{(x_i - x_j)^T (x_i - x_j)}

Center of cluster w_j, which contains N_j objects:
    z_j = \frac{1}{N_j} \sum_{x \in w_j} x

Objective function:
    J = \sum_{i=1}^{k} \sum_{j=1}^{n_j} d(x_j, z_i)

Received: 2006-04-27    E-mail: yuanfang@mail.hbu.edu.cn
Project numbers: 05213573, 2004406
The density around each data object x_i is defined using a parameter Minpts.

Table 1. Comparison of k-means with random initial centers (runs 1-10) and with the proposed initial centers

Run        Iris                    Balance-scale           New-thyroid             Haberman                Wine
           Centers       Purity    Centers       Purity    Centers       Purity    Centers       Purity    Centers       Purity
1          18,9,101      82.00%    466,561,599   49.92%    185,206,59    62.79%    198,291       51.96%    47,105,3      56.74%
2          35,6,21       88.67%    384,49,465    46.88%    110,25,122    78.14%    74,120        51.96%    25,117,141    70.22%
3          12,11,66      53.33%    616,496,589   50.88%    84,73,183     84.19%    19,182        50.00%    104,55,19     57.30%
4          143,36,85     57.33%    362,18,139    46.88%    204,202,91    65.12%    240,222       51.96%    35,119,34     57.30%
5          11,56,53      84.67%    22,115,401    50.72%    198,194,171   62.79%    161,216       50.98%    15,89,131     70.22%
6          16,2,29       57.33%    247,121,529   44.96%    94,200,46     86.05%    158,202       51.96%    133,35,101    70.22%
7          47,5,42       52.00%    472,111,1     48.48%    1,102,202     69.77%    78,246        51.63%    116,153,86    57.30%
8          14,10,127     82.00%    316,128,345   50.40%    21,207,152    62.79%    108,145       75.82%    113,18,96     57.87%
9          12,49,134     76.77%    281,510,362   47.04%    55,116,207    62.79%    18,176        50.00%    26,112,169    70.22%
10         13,10,96      66.67%    501,127,10    46.72%    160,6,39      84.19%    162,190       52.29%    31,154,59     57.30%
Average                  70.08%                  48.29%                  71.86%                  53.86%                  62.47%
Proposed   17,61,126     88.67%    1,475,601     51.20%    26,202,154    84.19%    146,63        75.82%    61,19,3       57.30%
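D denotes the set of data objects that lie in high-density regions. The initial centers are chosen from D as follows.

z_1 is the first center, taken from D. z_2 is the object of D farthest from z_1. z_3 is the object x_i \in D that attains

    \max_i \min( d(x_i, z_1), d(x_i, z_2) ),  i = 1, 2, \dots, n

In general, z_m is the object x_i \in D that attains

    \max_i \min( d(x_i, z_1), d(x_i, z_2), \dots, d(x_i, z_{m-1}) ),  i = 1, 2, \dots, n

and the selection continues until k centers have been found.

Input: the number of clusters k and a data set of n objects.
Output: k clusters.
(1) Compute the distance d(x_i, x_j) between every pair of data objects.
(2) Compute the density of the region around each data object and form the high-density set D.
(3) Take the 1st center z_1 from D.
(4) Take the object farthest from z_1 as the 2nd center z_2, z_2 \in D.
(5) Take as z_3 the object x_i that satisfies \max(\min(d(x_i, z_1), d(x_i, z_2))), i = 1, 2, \dots, n, z_3 \in D.
(6) Take as z_4 the object x_i that satisfies \max(\min(d(x_i, z_1), d(x_i, z_2), d(x_i, z_3))), i = 1, 2, \dots, n, z_4 \in D.
…
(7) Take as z_k the object x_i that satisfies \max(\min(d(x_i, z_j))), i = 1, 2, \dots, n, j = 1, 2, \dots, k-1, z_k \in D.
(8) Run k-means starting from these k centers.

3.1  The experiments use five data sets from the UCI repository: Iris, Balance-scale, New-thyroid, Haberman and Wine.

3.2  The results of k-means with random initial centers and with the proposed initial centers are compared in Table 1.

3.3  Wine has 178 objects, 3 classes and 14 attributes. The value ranges of four of its attributes differ widely:
    Alcohol (attribute 1):            11.03~14.83
    Flavanoids (attribute 8):         0.34~5.08
    Color_intensity (attribute 11):   1.28~13
    Proline (attribute 13):           278~1680

References
1  . [M]. : , 2002: 138-139.
2  MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations[C]//Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
3  Wang Wei, Yang Jiong, Muntz R. STING: A Statistical Information Grid Approach to Spatial Data Mining[C]//Proc. of the 23rd International Conference on Very Large Data Bases, 1997.
4  Agrawal R, Gehrke J, Gunopulos D. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications[C]//Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998: 94-105.
5  Guha S, Rastogi R, Shim K. CURE: An Efficient Clustering Algorithm for Large Databases[C]//Proc. of ACM SIGMOD Int. Conf. on Management of Data, Seattle, Washington, 1998: 73-84.
6  , . [J]. , 2003, 19(1).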


```
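The selection of initial centers described in the paper (k objects that lie in high-density regions and are as far from one another as possible) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_initial_centers` and the parameters `min_pts` and `density_quantile` are hypothetical, the density measure (distance to the `min_pts`-th nearest neighbour) and the rule for forming the high-density set D (a quantile cut-off) are assumptions because the extracted text does not preserve the paper's definitions, and taking the densest object as the first center is likewise an assumption. Only the max-min rule for the second and later centers follows the formulas in the text.

```python
import numpy as np


def select_initial_centers(X, k, min_pts=5, density_quantile=0.5):
    """Pick k initial centers from high-density objects by the max-min rule.

    Returns the row indices (into X) of the chosen objects.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= min_pts < n:
        raise ValueError("min_pts must satisfy 1 <= min_pts < n")

    # Pairwise Euclidean distances d(x_i, x_j).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # ASSUMPTION: density is measured by the distance to the min_pts-th
    # nearest neighbour (a small radius means a dense region).
    # Column 0 of each sorted row is the object itself (distance 0).
    radius = np.sort(dist, axis=1)[:, min_pts]

    # ASSUMPTION: D holds the objects whose radius is at or below a quantile.
    D = np.flatnonzero(radius <= np.quantile(radius, density_quantile))
    if len(D) < k:
        raise ValueError("high-density set D has fewer than k objects")

    # ASSUMPTION: the first center z_1 is the densest object of D.
    centers = [int(D[np.argmin(radius[D])])]

    # Each further center is the object of D that maximises the minimum
    # distance to all centers chosen so far (the max-min rule).
    min_dist = dist[D, centers[0]].copy()
    while len(centers) < k:
        nxt = int(D[np.argmax(min_dist)])
        centers.append(nxt)
        min_dist = np.minimum(min_dist, dist[D, nxt])

    return centers
```

The returned indices can seed any k-means implementation; with scikit-learn, for example, `KMeans(n_clusters=k, init=X[idx], n_init=1)` starts from the selected rows. The sketch builds the full n-by-n distance matrix, which suits data sets of the size used in the paper but not large ones.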