(IJCSIS) International Journal of Computer Science and Information Security, Vol. 10, No. 9, September 2012

Additive Update Algorithm for Nonnegative Matrix Factorization

Do Van Tuan, Hanoi College of Commerce and Tourism, Hanoi, Vietnam (dvtuanest@gmail.com)
Pham Van At, Hanoi University of Communications and Transport (phamvanat83@vnn.vn)
Tran Dang Hien, Vietnam National University (hientd_68@yahoo.com)

Abstract—Nonnegative matrix factorization (NMF) is an emerging technique with a wide spectrum of potential applications in data analysis. Mathematically, NMF can be formulated as a minimization problem with nonnegative constraints. This problem currently attracts much attention from researchers, both for theoretical reasons and for its potential applications. At present, the most popular approach for solving NMF is the multiplicative update algorithm proposed by D. D. Lee and H. S. Seung. In this paper we propose an additive update algorithm that is computationally faster than the algorithm of D. D. Lee and H. S. Seung.

Keywords—nonnegative matrix factorization; Karush-Kuhn-Tucker optimality condition; stationary point; updating an element of a matrix; updating matrices

I. INTRODUCTION

Nonnegative matrix factorization (NMF) is an approximate representation of a given nonnegative matrix V \in R^{n \times m} by a product of two nonnegative matrices W \in R^{n \times r} and H \in R^{r \times m}:

  V \approx WH    (1.1)

Because r is usually chosen to be very small, the matrices W and H are much smaller than V. If V is the data matrix of some object, then W and H can be viewed as an approximate representation of V. Thus NMF can be considered an effective technique for representing and reducing data. Although this technique appeared only recently, it already has wide application, for example in document clustering [7, 11], data mining [8], object recognition [5] and forgery detection [10, 12].

To measure the quality of the approximation in (1.1), one usually uses the Frobenius norm of the difference matrix:

  f(W,H) = \frac{1}{2}\|WH - V\|_F^2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\big((WH)_{ij} - V_{ij}\big)^2    (1.2)

Thus NMF can be formulated as an optimization problem with nonnegative constraints:

  \min f(W,H) \quad \text{subject to } W \ge 0,\; H \ge 0    (1.3)

Since the objective function is not convex, most methods cannot find the global optimal solution of this problem; they only reach a stationary point, i.e. a matrix pair (W, H) satisfying the Karush-Kuhn-Tucker (KKT) optimality conditions [2]:

  W_{ia} \ge 0, \quad H_{bj} \ge 0
  \big((WH - V)H^T\big)_{ia} \ge 0, \quad \big(W^T(WH - V)\big)_{bj} \ge 0
  W_{ia}\,\big((WH - V)H^T\big)_{ia} = 0
  H_{bj}\,\big(W^T(WH - V)\big)_{bj} = 0, \quad \forall\, i, a, b, j    (1.4)

where

  \partial f(W,H)/\partial W = (WH - V)H^T
  \partial f(W,H)/\partial H = W^T(WH - V)

To update H or W (with the other matrix fixed), one usually moves in the direction opposite to the gradient with a suitable step size, so that the objective function decreases while H and W remain nonnegative. Among the known algorithms for solving (1.3), the LS algorithm of D. D. Lee and H. S. Seung [6] must be mentioned. It has a simple computation scheme, is easy to implement and gives quite good results, so it remains one of the most commonly used algorithms [10, 12]. The LS algorithm adjusts the factors using the following formulas:

  \tilde{H}_{ij} = H_{ij} + \eta_{ij}\,(W^T V - W^T W H)_{ij}    (1.5)
  \tilde{W}_{ij} = W_{ij} + \mu_{ij}\,(V\tilde{H}^T - W\tilde{H}\tilde{H}^T)_{ij}    (1.6)

By selecting \eta_{ij} and \mu_{ij} by the formulas

  \eta_{ij} = \frac{H_{ij}}{(W^T W H)_{ij}}, \qquad \mu_{ij} = \frac{W_{ij}}{(W\tilde{H}\tilde{H}^T)_{ij}}

the formulas (1.5) and (1.6) become:

  \tilde{H}_{ij} = H_{ij}\,\frac{(W^T V)_{ij}}{(W^T W H)_{ij}}    (1.7)
  \tilde{W}_{ij} = W_{ij}\,\frac{(V\tilde{H}^T)_{ij}}{(W\tilde{H}\tilde{H}^T)_{ij}}    (1.8)
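As an illustration, the multiplicative updates (1.7) and (1.8) can be sketched in a few lines of NumPy. This is a minimal sketch of the standard Lee-Seung iteration, not the authors' code; the small `eps` guarding against division by zero is our addition:

```python
import numpy as np

def lee_seung_step(V, W, H, eps=1e-12):
    """One multiplicative update sweep: first H by (1.7), then W by (1.8)."""
    H = H * (W.T @ V) / (W.T @ W @ H + eps)   # (1.7)
    W = W * (V @ H.T) / (W @ H @ H.T + eps)   # (1.8), using the updated H
    return W, H

# Usage: the Frobenius objective (1.2) does not increase from sweep to sweep.
rng = np.random.default_rng(0)
V = rng.random((20, 15))
W, H = rng.random((20, 4)), rng.random((4, 15))
f = lambda W, H: 0.5 * np.linalg.norm(W @ H - V, 'fro')**2
f0 = f(W, H)
for _ in range(50):
    W, H = lee_seung_step(V, W, H)
assert f(W, H) < f0
```

The updates stay nonnegative automatically because each factor is multiplied by a ratio of nonnegative quantities.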
Because the adjustment formulas use multiplication, this algorithm is called the multiplicative update method. With this adjustment, the nonnegativity of \tilde{W} and \tilde{H} is guaranteed. In [6] the monotonic decrease of the objective function after each adjustment is proved:

  f(\tilde{W}, \tilde{H}) \le f(W, H)

The LS algorithm has the advantage of being simple and easy to implement on a computer. However, the coefficients \eta_{ij} and \mu_{ij} are selected in a special way that does not reach the minimum in each adjustment. This limits the convergence speed of the algorithm.

To improve the convergence speed, E. F. Gonzalez and Y. Zhang [3] modified the LS algorithm by using one coefficient for each column of H and one coefficient for each row of W. In other words, instead of (1.5) and (1.6) they use the following formulas:

  \tilde{H}_{ij} = H_{ij} + \alpha_j\,\eta_{ij}\,(W^T V - W^T W H)_{ij}
  \tilde{W}_{ij} = W_{ij} + \beta_i\,\mu_{ij}\,(V\tilde{H}^T - W\tilde{H}\tilde{H}^T)_{ij}

The coefficients \alpha_j and \beta_i are calculated through the function g(A, b, x), where A is a matrix and b, x are vectors. This function is defined as follows:

  q = A^T(b - Ax), \qquad p = x \,./\, (A^T A x) \,.\!*\, q

where the symbols "./" and ".*" denote component-wise division and multiplication, respectively. Then g(A, b, x) is calculated by the formula:

  g(A, b, x) = \min\left( \frac{p^T q}{p^T A^T A p},\; 0.99\,\max\{\theta : x + \theta p \ge 0\} \right)

The coefficients \alpha_j and \beta_i are determined by the function g(A, b, x) as follows:

  \alpha_j = g(W, V_j, H_j), \quad j = 1, \dots, m
  \beta_i = g(\tilde{H}^T, V_i^T, W_i^T), \quad i = 1, \dots, n

where V_j and H_j denote the j-th columns of V and H, and V_i and W_i denote the i-th rows of V and W.

However, experiments showed that the improvement of E. F. Gonzalez and Y. Zhang does not bring a really obvious effect. Moreover, as remarked in [3], neither the LS algorithm nor the GZ algorithm (E. F. Gonzalez and Y. Zhang [3]) is guaranteed to converge to a stationary point.

In this paper we propose a new algorithm that updates each element of the matrices W and H, based on the idea of the nonlinear Gauss-Seidel method [4]. The new algorithm uses addition for updating, so it is called an additive update algorithm. Under some assumptions, the proposed algorithm is guaranteed to reach a stationary point (Theorem 2, subsection III.B). Experiments show that the proposed algorithm converges faster than the LS and GZ algorithms.

The content of the paper is organized as follows. In section 2 we present an algorithm for updating one element of the matrix W or H. This algorithm is used in section 3 to construct a new algorithm for the NMF problem (1.3); we also consider some convergence properties of the new algorithm. Section 4 presents a scheme for implementing the new algorithm on a computer. In section 5 we present experimental results comparing the computation speed of the algorithms. Finally, some conclusions are given in section 6.

II. ALGORITHM FOR UPDATING AN ELEMENT OF A MATRIX

A. Updating an element of matrix W

In this section we consider the algorithm for updating one element of W, while keeping the remaining elements of W and H fixed. Suppose W_{ij} is adjusted by adding a parameter \alpha:

  \tilde{W}_{ij} = W_{ij} + \alpha    (2.1)

If \tilde{W} is the matrix obtained in this way, then by some matrix operations we have:

  (\tilde{W}H)_{ab} = (WH)_{ab}, \quad a \ne i,\; b = 1, \dots, m
  (\tilde{W}H)_{ib} = (WH)_{ib} + \alpha H_{jb}, \quad b = 1, \dots, m

So from (1.2) it follows:

  f(\tilde{W}, H) = f(W, H) + g(\alpha)    (2.2)

where

  g(\alpha) = \frac{p}{2}\,\alpha^2 + q\,\alpha    (2.3)
  p = \sum_{b=1}^{m} H_{jb}^2    (2.4)
  q = \sum_{b=1}^{m} (WH - V)_{ib}\, H_{jb}    (2.5)

To minimize f(\tilde{W}, H), one needs to choose \alpha so that g(\alpha) achieves its minimum value under the condition \tilde{W}_{ij} = W_{ij} + \alpha \ge 0.
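The decomposition (2.2)-(2.5) is easy to check numerically. The following NumPy sketch (our illustration, with hypothetical variable names) verifies that changing a single entry W_{ij} by \alpha changes the objective by exactly g(\alpha) = (p/2)\alpha^2 + q\alpha:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 6, 5, 3
V = rng.random((n, m))
W = rng.random((n, r))
H = rng.random((r, m))

# Frobenius objective (1.2)
f = lambda W, H: 0.5 * np.linalg.norm(W @ H - V, 'fro')**2

i, j, alpha = 2, 1, 0.37                      # arbitrary entry and step
p = np.sum(H[j, :]**2)                        # (2.4)
q = np.sum((W @ H - V)[i, :] * H[j, :])       # (2.5)

W_new = W.copy()
W_new[i, j] += alpha                          # (2.1)

# (2.2)-(2.3): f(W~, H) - f(W, H) = (p/2)*alpha^2 + q*alpha
assert np.isclose(f(W_new, H) - f(W, H), 0.5 * p * alpha**2 + q * alpha)
```

The identity holds for any real \alpha, which is what makes the exact one-dimensional minimization of the next subsection possible.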
Because g(\alpha) is a quadratic function, \alpha can be defined as follows:

  \alpha = \begin{cases} 0, & q = 0 \\ \max(-q/p,\; -W_{ij}), & q > 0 \\ -q/p, & q < 0 \end{cases}    (2.6)

Formula (2.6) is always meaningful, because if q \ne 0 then by (2.4) and (2.5) we have p > 0.

From (2.3) and (2.6) we get:

  g(\alpha) = 0, if q = 0, or q > 0 and W_{ij} = 0    (2.7.a)
  g(\alpha) < 0, otherwise    (2.7.b)

Note that the KKT conditions (1.4) at the element W_{ij} mean:

  q = 0, \quad \text{or} \quad q > 0 \text{ and } W_{ij} = 0    (2.8)

By using the update formulas (2.1) and (2.6), the monotonic decrease of the objective function f(W, H) is confirmed in the following lemma.

LEMMA 1: If the KKT conditions are not satisfied at W_{ij}, then

  f(\tilde{W}, H) < f(W, H)

Otherwise, \tilde{W} = W.

Proof. From (2.4) and (2.5) it follows that

  q = \big((WH - V)H^T\big)_{ij}

Therefore, if the KKT conditions (1.4) are not satisfied at W_{ij}, then the properties

  W_{ij} \ge 0, \quad q \ge 0, \quad W_{ij}\, q = 0

cannot occur simultaneously. From this, and because W_{ij} \ge 0, it follows that case (2.7.a) cannot happen. So case (2.7.b) must occur, and we have g(\alpha) < 0. Therefore, from (2.2) we obtain

  f(\tilde{W}, H) < f(W, H)

Conversely, if (2.8) is satisfied, it means that q = 0, or q > 0 and W_{ij} = 0. So from (2.6) it follows that \alpha = 0. Therefore, by (2.1) we have \tilde{W}_{ij} = W_{ij}. Thus the lemma is proved.

B. Updating an element of matrix H

Let \tilde{H} be the matrix obtained from the update rule

  \tilde{H}_{ij} = H_{ij} + \beta    (2.9)

where \beta is defined by the formulas:

  u = \sum_{a=1}^{n} W_{ai}^2    (2.10)
  v = \sum_{a=1}^{n} W_{ai}\,(WH - V)_{aj}    (2.11)

  \beta = \begin{cases} 0, & v = 0 \\ \max(-v/u,\; -H_{ij}), & v > 0 \\ -v/u, & v < 0 \end{cases}    (2.12)

By the same arguments as in Lemma 1, we have:

LEMMA 2: If the KKT conditions are not satisfied at H_{ij}, then

  f(W, \tilde{H}) < f(W, H)

Otherwise, \tilde{H} = H.

III. THE PROPOSED ALGORITHM

A. Updating matrices W and H

In this section we consider the transformation T from (W, H) to (\tilde{W}, \tilde{H}), which modifies the elements of W by subsection II.A and the elements of H by subsection II.B. In other words, the transformation

  (\tilde{W}, \tilde{H}) = T(W, H)

is carried out as follows:

Step 1: Initialize \tilde{W} = W, \tilde{H} = H
Step 2: Update the elements of \tilde{W}: for j = 1, \dots, r and i = 1, \dots, n, set \tilde{W}_{ij} = \tilde{W}_{ij} + \alpha, where \alpha is computed by (2.4)-(2.6)
Step 3: Update the elements of \tilde{H}: for i = 1, \dots, r and j = 1, \dots, m, set \tilde{H}_{ij} = \tilde{H}_{ij} + \beta, where \beta is computed by (2.10)-(2.12)
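The optimal constrained step in (2.6), and its analogue (2.12), can be written as a small helper function. The sketch below is our illustration with a hypothetical function name; `x` stands for the current entry W_{ij} (or H_{ij}, with v and u in place of q and p):

```python
def optimal_step(q, p, x):
    """Optimal increment minimizing g(a) = (p/2)*a**2 + q*a subject to x + a >= 0,
    following (2.6). Assumes p > 0 whenever q != 0."""
    if q == 0:
        return 0.0
    if q > 0:                  # unconstrained minimizer -q/p is negative; clamp at -x
        return max(-q / p, -x)
    return -q / p              # q < 0: minimizer is positive, constraint inactive

# The updated entry is always nonnegative and g never increases:
for q, p, x in [(0.0, 2.0, 1.0), (3.0, 2.0, 0.5), (-3.0, 2.0, 0.5)]:
    a = optimal_step(q, p, x)
    assert x + a >= 0
    assert 0.5 * p * a**2 + q * a <= 0
```

This is the whole per-element computation: one quadratic in one variable, solved exactly rather than approximately as in the LS step sizes.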
From Lemmas 1 and 2, we easily obtain the following important property of the transformation T.

LEMMA 3: If the solution (W, H) does not satisfy the KKT conditions (1.4), then

  f(T(W, H)) < f(W, H)

In the contrary case, T(W, H) = (W, H).

The following property is directly obtained from Lemma 3.

COROLLARY 1: For any (W, H) \ge 0, if we set (\tilde{W}, \tilde{H}) = T(W, H), then (\tilde{W}, \tilde{H}) = (W, H) or f(\tilde{W}, \tilde{H}) < f(W, H).

B. Algorithm for NMF (1.3)

The algorithm is described through the transformation T as follows:

Step 1: Initialize W = W^1 \ge 0, H = H^1 \ge 0
Step 2: For k = 1, 2, \dots: \; (W^{k+1}, H^{k+1}) = T(W^k, H^k)

From Corollary 1, we obtain the following important property of the above algorithm.

THEOREM 1: Suppose (W^k, H^k) is the sequence of solutions created by Algorithm III.B. Then the sequence of objective function values f(W^k, H^k) decreases monotonically:

  f(W^{k+1}, H^{k+1}) \le f(W^k, H^k), \quad k \ge 1

Moreover, the sequence f(W^k, H^k) is bounded below by zero, so Theorem 1 implies the following corollary.

COROLLARY 2: The sequence f(W^k, H^k) is convergent. In other words, there exists a nonnegative value f^* such that:

  \lim_{k \to \infty} f(W^k, H^k) = f^*

Now we consider another convergence property of Algorithm III.B.

THEOREM 2: Suppose (W^*, H^*) is a limit point of the sequence (W^k, H^k) and

  \sum_{b=1}^{m} (H^*_{jb})^2 > 0, \quad j = 1, \dots, r    (3.1)
  \sum_{a=1}^{n} (W^*_{ai})^2 > 0, \quad i = 1, \dots, r    (3.2)

Then (W^*, H^*) is a stationary point of problem (1.3).

Proof. By assumption, (W^*, H^*) is the limit of some subsequence (W^{t_k}, H^{t_k}) of the sequence (W^k, H^k):

  \lim_{k \to \infty} (W^{t_k}, H^{t_k}) = (W^*, H^*)    (3.3)

By conditions (3.1) and (3.2), the transformation T is continuous at (W^*, H^*). Therefore, from (3.3) we get:

  \lim_{k \to \infty} T(W^{t_k}, H^{t_k}) = T(W^*, H^*)

Moreover, since T(W^{t_k}, H^{t_k}) = (W^{t_k+1}, H^{t_k+1}), it follows that:

  \lim_{k \to \infty} (W^{t_k+1}, H^{t_k+1}) = T(W^*, H^*)    (3.4)

Using the continuity of the objective function f(W, H), from (3.3) and (3.4) we have:

  \lim_{k \to \infty} f(W^{t_k}, H^{t_k}) = f(W^*, H^*)
  \lim_{k \to \infty} f(W^{t_k+1}, H^{t_k+1}) = f(T(W^*, H^*))

On the other hand, by Corollary 2 the sequence f(W^k, H^k) is convergent, so it follows that:

  f(T(W^*, H^*)) = f(W^*, H^*)

Therefore, by Lemma 3, (W^*, H^*) must be a stationary point of problem (1.3). Thus the theorem is proved.
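Putting Lemmas 1 and 2 together, Algorithm III.B can be sketched directly. The version below is a naive NumPy illustration (our names, not the authors' code) that recomputes WH - V on demand; the efficient residual-based scheme is the subject of the next section:

```python
import numpy as np

def step(q, p, x):
    # optimal constrained increment, formulas (2.6)/(2.12)
    if q == 0:
        return 0.0
    return max(-q / p, -x) if q > 0 else -q / p

def transform_T(V, W, H):
    """One pass of the transformation T: sweep all entries of W, then of H."""
    W, H = W.copy(), H.copy()
    n, r = W.shape
    m = H.shape[1]
    for j in range(r):                                # Step 2: update W
        p = np.sum(H[j, :]**2)
        if p == 0:
            continue                                  # q is 0 too, nothing to do
        for i in range(n):
            q = np.sum((W @ H - V)[i, :] * H[j, :])   # (2.5), recomputed naively
            W[i, j] += step(q, p, W[i, j])
    for i in range(r):                                # Step 3: update H
        u = np.sum(W[:, i]**2)
        if u == 0:
            continue
        for j in range(m):
            v = np.sum(W[:, i] * (W @ H - V)[:, j])   # (2.11), recomputed naively
            H[i, j] += step(v, u, H[i, j])
    return W, H

# Theorem 1 in action: the objective decreases monotonically.
rng = np.random.default_rng(2)
V = rng.random((8, 6)); W = rng.random((8, 2)); H = rng.random((2, 6))
f = lambda W, H: 0.5 * np.linalg.norm(W @ H - V, 'fro')**2
vals = [f(W, H)]
for _ in range(5):
    W, H = transform_T(V, W, H)
    vals.append(f(W, H))
assert all(b <= a + 1e-12 for a, b in zip(vals, vals[1:]))
```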
IV. SOME VARIATIONS OF THE ALGORITHM

In this section we provide some variations of the algorithm in subsection III.B (Algorithm III.B) that reduce the volume of calculations and make the algorithm more convenient to implement.

A. Evaluation of the computational complexity

To update one element W_{ij} by formulas (2.1), (2.4), (2.5) and (2.6), we need m multiplications for calculating p and m \cdot (n \cdot m \cdot r) multiplications for calculating q (recomputing the product WH). Similarly, to update one element H_{ij} by (2.9)-(2.12), we need n multiplications for computing u and n \cdot (n \cdot m \cdot r) multiplications for computing v. It follows that the number of multiplications needed for one loop (one transformation (\tilde{W}, \tilde{H}) = T(W, H)) of Algorithm III.B is:

  2 \cdot n \cdot m \cdot r \cdot (1 + n \cdot m \cdot r)    (4.1)

B. Some variations for updating W and H

1) Updating W_{ij}

If we set

  D = WH - V    (4.2)

then the formula (2.5) for q becomes:

  q = \sum_{b=1}^{m} D_{ib}\, H_{jb}    (4.3)

If D is considered as known, the calculation of q in (4.3) needs only m multiplications. After updating W_{ij} by formula (2.1), we need to recalculate the matrix D from \tilde{W}, to be used for the adjustment of the other elements of W: \tilde{D} = \tilde{W}H - V. From (2.1) and (4.2) it is seen that \tilde{D} is determined from D by the formula:

  \tilde{D}_{ab} = D_{ab}, \quad a \ne i,\; b = 1, \dots, m
  \tilde{D}_{ib} = D_{ib} + \alpha H_{jb}, \quad b = 1, \dots, m    (4.4)

So we only need to adjust the i-th row of D, which needs m multiplications.

From formulas (2.1), (2.4), (2.6), (4.3) and (4.4), we have a new scheme for updating the matrix W as follows.

2) Scheme for updating matrix W

For j = 1 To r
  p = \sum_{b=1}^{m} H_{jb}^2
  For i = 1 To n
    q = \sum_{b=1}^{m} D_{ib} H_{jb}
    \alpha = 0 if q = 0; \alpha = \max(-q/p, -W_{ij}) if q > 0; \alpha = -q/p if q < 0
    W_{ij} = W_{ij} + \alpha
    D_{ib} = D_{ib} + \alpha H_{jb}, \quad b = 1, \dots, m
  End For i
End For j

The total number of multiplications used to adjust the matrix W is 2 \cdot n \cdot m \cdot r + m \cdot r.

3) Updating H_{ij}

Similarly, the formula (2.11) for v becomes:

  v = \sum_{a=1}^{n} W_{ai}\, D_{aj}    (4.5)

According to this formula, we use only n multiplications to calculate v. After adjusting H_{ij} by formula (2.9), we need to recalculate the matrix D by the formula:

  \tilde{D}_{ab} = D_{ab}, \quad b \ne j,\; a = 1, \dots, n
  \tilde{D}_{aj} = D_{aj} + \beta W_{ai}, \quad a = 1, \dots, n    (4.6)

So we only need to adjust the j-th column of D, which needs n multiplications.
From formulas (2.9), (2.10), (2.12), (4.5) and (4.6), we have a new scheme for updating the matrix H as follows.

4) Scheme for updating matrix H

For i = 1 To r
  u = \sum_{a=1}^{n} W_{ai}^2
  For j = 1 To m
    v = \sum_{a=1}^{n} W_{ai} D_{aj}
    \beta = 0 if v = 0; \beta = \max(-v/u, -H_{ij}) if v > 0; \beta = -v/u if v < 0
    H_{ij} = H_{ij} + \beta
    D_{aj} = D_{aj} + \beta W_{ai}, \quad a = 1, \dots, n
  End For j
End For i

The total number of multiplications used to adjust the matrix H is 2 \cdot n \cdot m \cdot r + n \cdot r.

Using the above results together, we can construct a new calculating scheme for Algorithm III.B as follows.

C. New calculating scheme for Algorithm III.B

1. Initialize W = W^1 \ge 0, H = H^1 \ge 0, D = WH - V
2. For k = 1, 2, \dots:
   Update W by the scheme in subsection IV.B.2
   Update H by the scheme in subsection IV.B.4
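The complete scheme of subsection IV.C, maintaining the residual D = WH - V across single-entry updates, can be sketched as follows (our NumPy illustration; function and variable names are hypothetical):

```python
import numpy as np

def step(q, p, x):
    # optimal constrained increment, formulas (2.6)/(2.12)
    if q == 0:
        return 0.0
    return max(-q / p, -x) if q > 0 else -q / p

def nmf_additive(V, W, H, n_iters=20):
    """Additive-update NMF following scheme IV.C; D = W H - V is kept up to date."""
    W, H = W.copy(), H.copy()
    D = W @ H - V                                  # initialization: n*m*r multiplications
    n, r = W.shape
    m = H.shape[1]
    for _ in range(n_iters):
        for j in range(r):                         # scheme IV.B.2: update W
            p = np.sum(H[j, :]**2)
            if p == 0:
                continue
            for i in range(n):
                q = np.dot(D[i, :], H[j, :])       # (4.3): m multiplications
                a = step(q, p, W[i, j])
                W[i, j] += a
                D[i, :] += a * H[j, :]             # (4.4): adjust row i of D
        for i in range(r):                         # scheme IV.B.4: update H
            u = np.sum(W[:, i]**2)
            if u == 0:
                continue
            for j in range(m):
                v = np.dot(W[:, i], D[:, j])       # (4.5): n multiplications
                b = step(v, u, H[i, j])
                H[i, j] += b
                D[:, j] += b * W[:, i]             # (4.6): adjust column j of D
    return W, H

rng = np.random.default_rng(3)
V = rng.random((10, 8)); W0 = rng.random((10, 3)); H0 = rng.random((3, 8))
f = lambda W, H: 0.5 * np.linalg.norm(W @ H - V, 'fro')**2
W, H = nmf_additive(V, W0, H0)
assert f(W, H) <= f(W0, H0)
assert (W >= 0).all() and (H >= 0).all()
```

Note that each inner step costs O(m) or O(n) work instead of a full recomputation of WH, which is exactly the saving quantified above.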
The computational complexity of this scheme is as follows: the initialization step needs n \cdot m \cdot r multiplications for computing D, and each loop needs 4 \cdot n \cdot m \cdot r + r \cdot (n + m) multiplications (the sum of the counts for the two schemes above). Compared with (4.1), the number of operations is now greatly reduced.

V. EXPERIMENTS

In this section we present the results of two experiments on the algorithms: New NMF (the proposed additive update algorithm), GZ and LS. The programs are written in MATLAB and run on a machine with the following configuration: Intel Pentium Core 2 P6100 2.0 GHz, 3 GB RAM. New NMF is implemented according to the scheme in subsection IV.C.

A. Experiment 1

This experiment is used to compare the speed of convergence of the algorithms to a stationary point. First of all, the KKT condition (1.4) is equivalent to the condition \delta(W, H) = 0, where:

  \delta(W, H) = \sum_{i=1}^{n}\sum_{a=1}^{r} \big|\min\big(W_{ia},\, ((WH - V)H^T)_{ia}\big)\big| + \sum_{b=1}^{r}\sum_{j=1}^{m} \big|\min\big(H_{bj},\, (W^T(WH - V))_{bj}\big)\big|

Thus, the smaller \delta(W, H) is, the closer (W, H) is to a stationary point of problem (1.3). To get a quantity independent of the size of W and H, we use the following formula:

  \Delta(W, H) = \frac{\delta(W, H)}{\rho_W + \rho_H}

in which \rho_W is the number of elements of the set

  \{ \min(W_{ia},\, ((WH - V)H^T)_{ia}) \mid i = 1, \dots, n,\; a = 1, \dots, r \}

and \rho_H is the number of elements of the set

  \{ \min(H_{bj},\, (W^T(WH - V))_{bj}) \mid j = 1, \dots, m,\; b = 1, \dots, r \}

\Delta(W, H) is called a normalized KKT residual. Table 1 presents the values \Delta(W, H) of the solutions (W, H) obtained by each algorithm within given time periods, on a data set of size (n, m, r) = (200, 100, 10), in which V, W^1, H^1 were generated randomly with V_{ij} \in [0, 500], (W^1)_{ij} \in [0, 5] and (H^1)_{ij} \in [0, 5].

TABLE I. NORMALIZED KKT RESIDUAL VALUES \Delta(W, H)

Time (sec)   New NMF   GZ          LS
60           3.6450    3700.4892   3576.0937
120          1.5523    3718.2967   3539.8986
180          0.1514    3708.6043   3534.6358
240          0.0260    3706.4059   3524.6715
300          0.0029    3696.7690   3508.3239

The results in Table 1 show that the two algorithms GZ and LS cannot converge to a stationary point (the value \Delta(W, H) remains large). Meanwhile, the New NMF algorithm does converge to a stationary point, because the value \Delta(W, H) becomes approximately equal to 0.
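The diagnostic used in Experiment 1 can be reproduced with a short sketch of \delta(W, H) and the normalized residual \Delta(W, H). This is our NumPy illustration, normalizing by the total number of terms n·r + r·m:

```python
import numpy as np

def kkt_residual(V, W, H):
    """delta(W,H): sum of |min(entry, partial derivative)| over all entries of W and H,
    plus the normalized residual Delta = delta / (number of terms)."""
    GW = (W @ H - V) @ H.T            # df/dW
    GH = W.T @ (W @ H - V)            # df/dH
    delta = np.abs(np.minimum(W, GW)).sum() + np.abs(np.minimum(H, GH)).sum()
    return delta, delta / (W.size + H.size)

# At an exact nonnegative factorization the residual vanishes:
rng = np.random.default_rng(4)
W = rng.random((7, 2)); H = rng.random((2, 5))
V = W @ H
delta, Delta = kkt_residual(V, W, H)
assert np.isclose(delta, 0.0)
```

Each term is zero exactly when its KKT condition in (1.4) holds, so \delta(W, H) = 0 characterizes a stationary point.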
B. Experiment 2

This experiment is used to compare the convergence speed of the algorithms to the minimum value of the objective function f(W, H), within given time periods, on a data set of size (n, m, r) = (500, 100, 20), in which V, W^1, H^1 were generated randomly with V_{ij} \in [0, 500], (W^1)_{ij} \in [0, 1] and (H^1)_{ij} \in [0, 1]. The algorithms were run 5 times with 5 different pairs of W^1, H^1 generated randomly in the interval [0, 1]. The average values of the objective function after the 5 runs, for each given time period, are presented in Table 2.

TABLE II. AVERAGE VALUES OF THE OBJECTIVE FUNCTION

Time (sec)   New NMF   GZ        LS
60           57.054    359.128   285.011
120          21.896    319.674   273.564
180          18.116    299.812   267.631
240          17.220    290.789   264.632
300          16.684    284.866   262.865
360          16.458    281.511   261.914

The results in Table 2 show that the objective function values of the solutions generated by the two algorithms GZ and LS remain quite large, while the objective function values reached by the New NMF algorithm are much smaller.

VI. CONCLUSIONS

This paper proposed a new additive update algorithm for solving the nonnegative matrix factorization problem. Experiments show that the proposed algorithm converges faster than the LS and GZ algorithms. The proposed algorithm also has a simple calculation scheme, so it is easy to implement and use in applications.

REFERENCES

[1] D. P. Bertsekas, On the Goldstein-Levitin-Polyak gradient projection method, IEEE Transactions on Automatic Control, 21 (1976), pp. 174-184.
[2] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, second edition, 1999.
[3] E. F. Gonzalez and Y. Zhang, Accelerating the Lee-Seung algorithm for non-negative matrix factorization, Tech. report, Department of Computational and Applied Mathematics, Rice University, 2005.
[4] L. Grippo and M. Sciandrone, On the convergence of the block nonlinear Gauss-Seidel method under convex constraints, Operations Research Letters, 26 (2000), pp. 127-136.
[5] D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401 (1999), pp. 788-791.
[6] D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems 13, MIT Press, 2001, pp. 556-562.
[7] V. P. Pauca, J. Piper, and R. J. Plemmons, Nonnegative matrix factorization for spectral data analysis, Linear Algebra and Its Applications, 416 (2006), pp. 29-47.
[8] V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, Text mining using non-negative matrix factorizations, in Proceedings of the 2004 SIAM International Conference on Data Mining, 2004.
[9] L. F. Portugal, J. J. Judice, and L. N. Vicente, A comparison of block pivoting and interior-point algorithms for linear least squares problems with nonnegative variables, Mathematics of Computation, 63 (1994), pp. 625-643.
[10] Z. Tang, S. Wang, W. Wei, and S. Su, Robust image hashing for tamper detection using non-negative matrix factorization, Journal of Ubiquitous Convergence and Technology, vol. 2(1), 2008, pp. 18-26.
[11] W. Xu, X. Liu, and Y. Gong, Document clustering based on non-negative matrix factorization, in SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, New York, NY, USA, 2003, pp. 267-273.
[12] H. Yao, T. Qiao, Z. Tang, Y. Zhao, and H. Mao, Detecting copy-move forgery using non-negative matrix factorization, in IEEE Third International Conference on Multimedia Information Networking and Security, China, 2011, pp. 591-594.

ISSN 1947-5500, http://sites.google.com/site/ijcsis/