Vol.34 No.10 Computer Engineering May 2008
Article ID: 1000-3428(2008)10-0101-02    Document code: A    CLC number: TP311.13
Clustering Algorithm of High-dimensional Data Based on Unit Region
XIE Kun-wu, HU Jun-peng
(School of Information Engineering, Hubei Institute for Nationalities, Enshi 445000)
Abstract: This paper proposes a Clustering Algorithm of High-dimensional Data (CAHD). The algorithm uses a two-way search strategy to find unit regions that are dense in data points, either in the given n-dimensional space or in its subspaces, and then clusters these dense unit regions with a bitwise AND operation. The two-way search strategy reduces the search space and improves the efficiency of the algorithm, and clustering the dense unit regions requires only two machine instructions, bitwise AND and shift. Experimental results show that, for the same number of clusters found, CAHD runs in 30% less time than other algorithms.
Key words: clustering algorithm; high-dimensional data; unit
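Foundation items: projects No. 2002BA901A02 and No. 2004AA210B01
Author biography: XIE Kun-wu (1970- )
Received: 2007-06-28    E-mail: xiekunwu@163.com

1 Introduction
This paper proposes a clustering algorithm based on two-way search, the Clustering Algorithm of High-dimensional Data (CAHD). The search borrows from [2] and from the level-wise candidate generation of Apriori.

2 Definitions
Let P = {p1, p2, ..., pn} be a set of n data points in the k-dimensional space S = {1×2×...×k}, where pi = (d1, d2, ..., dk). S is partitioned into unit regions ur = {ur1, ur2, ..., urm}, with uri = ([ai1, Li1, Ri1], [ai2, Li2, Ri2], ..., [aik, Lik, Rik]), where aij is the j-th attribute of uri and [Lij, Rij] is the interval that uri covers on dimension j. A point pi belongs to uri if, on every dimension j, dj lies between Lij and Rij. P_num(uri) denotes the number of points in uri; when P_num(uri) reaches the threshold T, uri is a dense unit region (dur).

Two unit regions uri = ([ai1, Li1, Ri1], [ai2, Li2, Ri2], ..., [aik, Lik, Rik]) and urn = ([an1, Ln1, Rn1], [an2, Ln2, Rn2], ..., [ank, Lnk, Rnk]) are adjacent if, on one dimension j, [Lij] = [Rnj] or [Lnj] = [Rij], and on the other k-1 dimensions [Lij, Rij] = [Lnj, Rnj] [1]. Two dense unit regions uri and urj are connected if they are adjacent or are linked by a chain of adjacent dur. A cluster satisfies 2 conditions: (1) it is a set of connected dur in the k-dimensional space; (2) no dur outside the set is connected to a dur inside it.

Property 1. If a unit region is dense in the k-dimensional space S, its projection onto any (k-1)-dimensional subspace S' of S is also dense.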
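The definitions of unit region, dense unit region and adjacency can be illustrated with a short Python sketch. It is not taken from the paper and rests on several assumptions: the unit regions form a uniform grid whose cells all have side `width`, each interval is half-open, a cell counts as dense when its point count is at least `threshold`, and the names `dense_unit_regions` and `adjacent` are hypothetical.

```python
from collections import Counter


def dense_unit_regions(points, width, threshold):
    """Count points per grid cell and keep the cells that reach the threshold.

    Assumptions (not from the paper): cells are axis-aligned with side
    `width`, intervals are half-open [L, R), and a cell is dense when
    P_num >= threshold.
    """
    counts = Counter()
    for p in points:
        # On dimension j the cell index is floor(d_j / width); the cell then
        # covers the interval [index * width, (index + 1) * width).
        cell = tuple(int(d // width) for d in p)
        counts[cell] += 1
    return {cell: n for cell, n in counts.items() if n >= threshold}


def adjacent(a, b):
    """Two grid cells are adjacent when they differ by one step on exactly
    one dimension and coincide on all the others."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


points = [(0.1, 0.2), (0.3, 0.4), (0.9, 0.8), (1.5, 1.5), (0.2, 0.9)]
dur = dense_unit_regions(points, width=1.0, threshold=3)
print(dur)
print(adjacent((0, 0), (1, 0)), adjacent((0, 0), (1, 1)))
```

With the sample points, four points fall into cell (0, 0) and one into cell (1, 1), so only (0, 0) passes a threshold of 3. Integer cell indices stand in for the [L, R] intervals here; the interval on dimension j can be recovered as [index * width, (index + 1) * width).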
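3 CAHD Algorithm
3.1 Basic concepts
Let I = {i1, i2, ..., im} be a set of m items and D a transaction database in which every transaction T satisfies T ⊆ I. An association rule is an implication X ⇒ Y, where X, Y ⊂ I and X ∩ Y = ∅. A set of items is called an itemset, and an itemset that contains k items is a k-itemset. The support of an itemset X ⊂ I, written sup(X), is the fraction of transactions in D that contain X; if that fraction is s, then sup(X) = s. The support of the rule X ⇒ Y is sup(X ∪ Y), and its confidence is sup(X ∪ Y)/sup(X). Mining proceeds in 2 steps: (1) find all itemsets whose support is at least minsup; (2) from these itemsets, generate the rules whose confidence is at least minconf. Step (1) is the counterpart of finding dur.

3.1.1 Two-way candidate sets
CAHD searches in 2 directions and keeps 2 kinds of candidate sets, C and RC: Ck holds the k-dimensional candidates of the bottom-up search, and RCk holds the (n-k)-dimensional candidates of the top-down search in the n-dimensional space. At each level the sizes |Ck| and |Cn-k| are compared, and the members of one set are used to prune the other before the search moves on to dimensions k+1 and n-k-1.

3.1.2 From k to k+1 dimensions
Gen_bu(Dk-1)
    INSERT INTO Ck
    SELECT u1.[L1, R1], u1.[L2, R2], ..., u1.[Lk-1, Rk-1], u2.[Lk-1, Rk-1]
    FROM Dk-1 u1, Dk-1 u2
    WHERE u1.a1 = u2.a1, u1.L1 = u2.L1, u1.R1 = u2.R1,
          u1.a2 = u2.a2, u1.L2 = u2.L2, u1.R2 = u2.R2,
          ...,
          u1.ak-2 = u2.ak-2, u1.Lk-2 = u2.Lk-2, u1.Rk-2 = u2.Rk-2,
          u1.ak-1 < u2.ak-1
    return Ck
Here Ck is the set of candidate k-dimensional dur and Dk-1 is the set of (k-1)-dimensional dur.

The top-down step goes from n-k to n-k-1 dimensions; iteration i drops attribute ai:
Gen_ub(RCk)
    for i = 1 to n-k
        INSERT INTO RCk-1
        SELECT u1.[L1, R1], u1.[L2, R2], ..., u1.[Li-1, Ri-1], u1.[Li+1, Ri+1], ..., u1.[Lk-1, Rk-1], u2.[Ln-k, Rn-k]
        FROM RCk
    return RCk
Here RCk-1 is the set of (n-k-1)-dimensional candidate dur and RCk is the set of (n-k)-dimensional ur.

3.2 Algorithm description
CAHD(D)
1  DUR1 = 1, URn = {ur1, ur2, ..., urm}, k = 1, S = {}
2  repeat
3      DURk+1 = {}
4      DURk+1 = Gen_bu(DURk)
5      URn-k = Gen_ub(DURn-k+1)
6      Prunning(ref DURk+1, ref URn-k)
7      Mark(S)
8      k = k+1
9  until (n-k = k) or (DURk+1 = NULL) or (URn-k dur)
10 S'' = NULL
11 for i = 1 to |S| {
12     S' = NULL
13     for row = i+1 to |S|
14         S'[row] = And(S[i], S[row])
15         if S'[row] <> S'[row-1] then
16             S'' = S'' ∪ S'[row-1]
17         if S'[row] '0' then continue
18 }
19 return Merge(S'')
CAHD takes the n-dimensional data set D as input. Line 1 initializes the search with the 1-dimensional dur in DUR1, the m n-dimensional ur in URn, and the empty mark set S. Lines 2~9 perform the two-way search for dur: Prunning() removes candidates that cannot be dur, and Mark records the dur found at level k in S as Sk. Lines 10~19 cluster the dur recorded in S: the results of And are accumulated in S'' and combined by Merge.

4 Experiments
The experiments compare the 2 algorithms CLIQUE and CAHD. The data parameters [3] are:
(1) |D| = 100 000
(2) n = 10~50
(3) Cn = 5
(4) Dom(A) = 50
(5) C = 5 (107)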
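The bottom-up join performed by Gen_bu can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: each dense unit is modelled as a tuple of (attribute, L, R) triples sorted by attribute, the function name `gen_bu` and the sample data are hypothetical, and the pruning that CAHD performs afterwards (Prunning) is omitted.

```python
from itertools import combinations


def gen_bu(dense_prev):
    """Join (k-1)-dimensional dense units into k-dimensional candidates.

    Each unit is a tuple of (attribute, L, R) triples sorted by attribute.
    Two units are joined when their first k-2 triples are identical and the
    last attribute of the first unit is smaller than that of the second,
    mirroring the WHERE clause of Gen_bu.
    """
    candidates = set()
    # Sorting puts units with a shared prefix in ascending order of their last
    # triple, so each unordered pair needs to be examined only once.
    for u1, u2 in combinations(sorted(dense_prev), 2):
        if u1[:-1] == u2[:-1] and u1[-1][0] < u2[-1][0]:
            candidates.add(u1 + (u2[-1],))
    return candidates


# Hypothetical 1-dimensional dense units: (attribute, L, R)
d1 = {((0, 0, 10),), ((0, 10, 20),), ((1, 0, 10),), ((2, 5, 15),)}
c2 = gen_bu(d1)

# Hypothetical 2-dimensional dense units that survived pruning
d2 = {((0, 0, 10), (1, 0, 10)), ((0, 0, 10), (2, 5, 15))}
c3 = gen_bu(d2)
print(len(c2), c3)
```

In this example `c2` holds five 2-dimensional candidates, because the two units on attribute 0 are never joined with each other, and `c3` holds the single 3-dimensional candidate built from the two units that share the interval on attribute 0. Requiring a strictly smaller last attribute keeps every candidate's triples sorted and prevents the same candidate from being generated twice.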