# Additive Update Algorithm for Nonnegative Matrix Factorization

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 10, No. 9, September 2012

Additive Update Algorithm for Nonnegative Matrix Factorization

Tran Dang Hien (Vietnam National University), hientd_68@yahoo.com
Do Van Tuan (Hanoi College of Commerce and Tourism), dvtuanest@gmail.com
Pham Van At (Hanoi University of Communications and Transport, Hanoi, Vietnam), phamvanat83@vnn.vn

Abstract—Nonnegative matrix factorization (NMF) is an emerging technique with a wide spectrum of potential applications in data analysis. Mathematically, NMF can be formulated as a minimization problem with nonnegative constraints. The problem currently attracts much attention from researchers, both for theoretical reasons and for its potential applications. At present, the most popular approach to NMF is the multiplicative update algorithm proposed by D.D. Lee and H.S. Seung. In this paper, we propose an additive update algorithm that is computationally faster than the algorithm of D.D. Lee and H.S. Seung.

Keywords — nonnegative matrix factorization; Karush-Kuhn-Tucker optimality conditions; stationary point; updating an element of a matrix; updating matrices

I. INTRODUCTION

Nonnegative matrix factorization (NMF) is an approximate representation of a given nonnegative matrix V ∈ R^{n×m} by a product of two nonnegative matrices W ∈ R^{n×r} and H ∈ R^{r×m}:

    V ≈ W H                                                      (1.1)

Because r is usually chosen to be very small, the matrices W and H are much smaller than V. If V is a data matrix of some object, then W and H can be viewed as an approximate representation of V; thus NMF can be considered an effective technique for representing and reducing data. Although the technique appeared only recently, it has found wide application, for example in document clustering [7, 11], data mining [8], object recognition [5] and forgery detection [10, 12].

To measure the quality of the approximation in (1.1), the Frobenius norm of the difference matrix is commonly used:

    f(W, H) = (1/2) ‖WH - V‖²_F = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{m} ((WH)_{ij} - V_{ij})²      (1.2)

Thus NMF can be formulated as an optimization problem with nonnegative constraints:

    min_{W ≥ 0, H ≥ 0} f(W, H)                                   (1.3)

Since the objective function is not convex, most methods cannot find the global optimum of the problem and only reach a stationary point, i.e. a matrix pair (W, H) satisfying the Karush-Kuhn-Tucker (KKT) optimality conditions [2]:

    W_{ia} ≥ 0,  H_{bj} ≥ 0
    ((WH - V)Hᵀ)_{ia} ≥ 0,  (Wᵀ(WH - V))_{bj} ≥ 0
    W_{ia} · ((WH - V)Hᵀ)_{ia} = 0,  H_{bj} · (Wᵀ(WH - V))_{bj} = 0,  ∀ i, a, b, j      (1.4)

where

    (WH - V)Hᵀ = ∂f(W, H)/∂W
    Wᵀ(WH - V) = ∂f(W, H)/∂H

To update H or W (the other being kept fixed), one typically moves in the direction opposite to the gradient with a suitably chosen step, so that the objective function decreases while H and W remain nonnegative. Among the known algorithms for (1.3), the LS algorithm of D.D. Lee and H.S. Seung [6] must be mentioned. It has a simple calculation scheme, is easy to implement and gives quite good results, so it remains one of the most commonly used algorithms [10, 12]. The LS algorithm updates by the following formulas:

    H̃_{ij} = H_{ij} - α_{ij} (∂f/∂H)_{ij} = H_{ij} + α_{ij} (WᵀV - WᵀWH)_{ij}      (1.5)

    W̃_{ij} = W_{ij} - β_{ij} (∂f/∂W)_{ij} = W_{ij} + β_{ij} (VH̃ᵀ - WH̃H̃ᵀ)_{ij}      (1.6)
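The objective (1.2) and the gradient formulas above can be sketched in NumPy (an illustrative sketch only; the paper's experiments use MATLAB, and all names here are ours):

```python
import numpy as np

# Objective (1.2) and its gradients, as stated after (1.4).
# Illustrative sketch with random data; not the authors' code.
rng = np.random.default_rng(0)
n, m, r = 6, 5, 3
V = rng.random((n, m))
W = rng.random((n, r))
H = rng.random((r, m))

def f(W, H):
    # f(W, H) = (1/2) * ||W H - V||_F^2
    return 0.5 * np.linalg.norm(W @ H - V, "fro") ** 2

grad_W = (W @ H - V) @ H.T      # df/dW = (WH - V) H^T
grad_H = W.T @ (W @ H - V)      # df/dH = W^T (WH - V)

# Central-difference check of one entry of df/dW.
eps = 1e-6
E = np.zeros_like(W)
E[0, 0] = eps
numeric = (f(W + E, H) - f(W - E, H)) / (2 * eps)
assert abs(numeric - grad_W[0, 0]) < 1e-5
```

The finite-difference check at the end confirms that the closed-form gradients match the definition of f.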
http://sites.google.com/site/ijcsis/
ISSN 1947-5500

By selecting α_{ij} and β_{ij} by the formulas

    α_{ij} = H_{ij} / (WᵀWH)_{ij},   β_{ij} = W_{ij} / (WH̃H̃ᵀ)_{ij}

the formulas (1.5) and (1.6) become:

    H̃_{ij} = H_{ij} (WᵀV)_{ij} / (WᵀWH)_{ij},   W̃_{ij} = W_{ij} (VH̃ᵀ)_{ij} / (WH̃H̃ᵀ)_{ij}      (1.7)

The adjustment formulas use multiplication, so this algorithm is called the multiplicative update method. This adjustment ensures the nonnegativity of W̃ and H̃. In [6] the monotonic decrease of the objective function after each adjustment is proved:

    f(W̃, H̃) ≤ f(W, H)                                           (1.8)

The LS algorithm has the advantage of being simple and easy to implement on a computer. However, the coefficients α_{ij} and β_{ij} are selected in a special way that need not reach the minimum in each adjustment. This limits the convergence speed of the algorithm.

To improve the convergence speed, E.F. Gonzalez and Y. Zhang refined the LS algorithm by using one coefficient for each column of H and one coefficient for each row of W. In other words, instead of (1.5) and (1.6), the following formulas are used:

    H̃_{ij} = H_{ij} + γ_j α_{ij} (WᵀV - WᵀWH)_{ij}              (1.9)

    W̃_{ij} = W_{ij} + μ_i β_{ij} (VH̃ᵀ - WH̃H̃ᵀ)_{ij}            (1.10)

The coefficients γ_j and μ_i are calculated through a function λ = g(A, b, x), where A is a matrix and b and x are vectors. This function is defined as follows:

    q = Aᵀ(b - Ax),   p = x ./ (AᵀAx) ⊙ q

where the symbols "./" and "⊙" denote component-wise division and multiplication, respectively. Then λ = g(A, b, x) is calculated by the formula:

    λ = min( pᵀq / (pᵀAᵀAp),  0.99 · max{λ : x + λp ≥ 0} )

The coefficients γ_j and μ_i are then determined as:

    γ_j = g(W, V_j, H_j),  j = 1, …, m
    μ_i = g(Hᵀ, (V_i)ᵀ, (W_i)ᵀ),  i = 1, …, n

where V_j, H_j denote the j-th columns of V and H, and V_i, W_i the i-th rows of V and W.
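The step-length function λ = g(A, b, x) of E.F. Gonzalez and Y. Zhang described above can be sketched as follows (our reading of the description: an exact line search along the scaled direction p, capped at 0.99 of the largest step keeping x nonnegative; names are illustrative):

```python
import numpy as np

# Sketch of the step length lambda = g(A, b, x) described above;
# not the authors' code.
def g(A, b, x):
    q = A.T @ (b - A @ x)            # q = A^T (b - Ax), the negative gradient
    p = x / (A.T @ (A @ x)) * q      # p = x ./ (A^T A x) (.) q, component-wise
    step = (p @ q) / (p @ (A.T @ (A @ p)))   # exact line-search minimizer
    neg = p < 0
    cap = np.min(-x[neg] / p[neg]) if neg.any() else np.inf  # max {lam : x + lam*p >= 0}
    return min(step, 0.99 * cap), p

rng = np.random.default_rng(1)
A = rng.random((8, 4))
b = rng.random(8)
x = rng.random(4) + 0.1              # strictly positive starting vector
lam, p = g(A, b, x)
x_new = x + lam * p

obj = lambda z: 0.5 * np.sum((A @ z - b) ** 2)
assert np.all(x_new >= 0)                # the cap preserves nonnegativity
assert obj(x_new) <= obj(x) + 1e-12      # the step does not increase the objective
```

Because x > 0, the direction p satisfies pᵀq ≥ 0, so any step in [0, pᵀq/(pᵀAᵀAp)] cannot increase the least-squares objective.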
However, experiments showed that the improvement of E.F. Gonzalez and Y. Zhang does not bring a really obvious effect. Moreover, as remarked in [3], neither the LS algorithm nor the GZ algorithm (E.F. Gonzalez and Y. Zhang [3]) is guaranteed to converge to a stationary point.

In this paper we propose a new algorithm that updates each element of the matrices W and H in turn, based on the idea of the nonlinear Gauss-Seidel method [4]. The new algorithm uses addition for updating, so it is called an additive update algorithm. Under some assumptions, the proposed algorithm is guaranteed to reach a stationary point (Theorem 2, subsection III.B). Experiments show that the proposed algorithm converges faster than the LS and GZ algorithms.

The content of the paper is organized as follows. In section 2, we present an algorithm to update one element of the matrix W or H. This algorithm is used in section 3 to construct a new algorithm for NMF (1.3); we also consider some convergence properties of the new algorithm. Section 4 presents a scheme for implementing the new algorithm on a computer. In section 5, we present experimental results comparing the calculation speed of the algorithms. Finally, some conclusions are given in section 6.

II. ALGORITHM FOR UPDATING AN ELEMENT OF MATRIX

A. Updating an element of matrix W

In this section, we consider the algorithm for updating one element of W while keeping the remaining elements of W and H fixed. Suppose W_{ij} is adjusted by adding a parameter δ:

    W̃_{ij} = W_{ij} + δ                                          (2.1)

If W̃ is the matrix obtained in this way, then by some matrix operations we have:

    (W̃H)_{ab} = (WH)_{ab},             a ≠ i, b = 1..m
    (W̃H)_{ib} = (WH)_{ib} + δ H_{jb},  b = 1..m

So from (1.2) it follows:

    f(W̃, H) = f(W, H) + g(δ)                                     (2.2)

where

    g(δ) = (1/2) p δ² + q δ                                      (2.3)

    p = Σ_{b=1}^{m} H_{jb}²                                      (2.4)

    q = Σ_{b=1}^{m} (WH - V)_{ib} · H_{jb}                       (2.5)

To minimize f(W̃, H), one needs to choose δ so that g(δ) achieves its minimum value under the condition W̃_{ij} = W_{ij} + δ ≥ 0. Because g(δ) is a quadratic function, δ can be defined as follows:

    δ = 0,                   if q = 0
    δ = max(-q/p, -W_{ij}),  if q > 0                            (2.6)
    δ = -q/p,                if q < 0

Formula (2.6) is always meaningful, because if q ≠ 0 then by (2.4) and (2.5) we have p > 0.

From (2.3) and (2.6), we get:

    g(δ) = 0, if (q = 0) or (q > 0 and W_{ij} = 0)               (2.7.a)
    g(δ) < 0, otherwise                                          (2.7.b)

Using the update formulas (2.1) and (2.6), the monotonic decrease of the objective function f(W, H) is confirmed in the following lemma.

LEMMA 1: If the KKT conditions are not satisfied at W_{ij}, then:

    f(W̃, H) < f(W, H)                                            (2.8)

Otherwise: W̃ = W.

Proof. From (2.4) and (2.5) it follows that

    q = ((WH - V)Hᵀ)_{ij}

Therefore, if the KKT conditions (1.4) are not satisfied at W_{ij}, then the properties

    W_{ij} ≥ 0,  q ≥ 0,  W_{ij} · q = 0

cannot hold simultaneously. From this, and because W_{ij} ≥ 0, case (2.7.a) cannot happen. So case (2.7.b) must occur and we have g(δ) < 0. Therefore, from (2.2) we obtain (2.8).

Conversely, if the KKT conditions are satisfied at W_{ij}, then q = 0, or q > 0 and W_{ij} = 0. So from (2.6) it follows that δ = 0, and by (2.1) we have W̃_{ij} = W_{ij}. Thus the lemma is proved.

B. Updating an element of matrix H

Let H̃ be the matrix obtained from the update rule:

    H̃_{ij} = H_{ij} + δ                                          (2.9)

where δ is defined by the formulas:

    u = Σ_{a=1}^{n} W_{ai}²                                      (2.10)

    v = Σ_{a=1}^{n} W_{ai} · (WH - V)_{aj}                       (2.11)

    δ = 0,                   if v = 0
    δ = max(-v/u, -H_{ij}),  if v > 0                            (2.12)
    δ = -v/u,                if v < 0

By the same arguments as in Lemma 1, we have

LEMMA 2: If the KKT conditions are not satisfied at H_{ij}, then:

    f(W, H̃) < f(W, H)

Otherwise: H̃ = H.
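The element update of subsection II.A can be sketched as follows (illustrative NumPy with the optimal δ from (2.4)-(2.6); the H update of subsection II.B is symmetric):

```python
import numpy as np

# Update one element W[i, j] by (2.1) with the optimal delta of (2.4)-(2.6).
# Illustrative sketch; not the authors' code.
def update_W_element(V, W, H, i, j):
    p = np.sum(H[j, :] ** 2)                    # (2.4)
    q = np.sum((W @ H - V)[i, :] * H[j, :])     # (2.5)
    if q == 0:
        delta = 0.0                             # KKT already holds at this element
    elif q > 0:
        delta = max(-q / p, -W[i, j])           # clamp to keep W[i, j] nonnegative
    else:
        delta = -q / p                          # unconstrained minimizer of g(delta)
    W2 = W.copy()
    W2[i, j] += delta                           # (2.1)
    return W2

rng = np.random.default_rng(2)
n, m, r = 5, 4, 2
V = rng.random((n, m))
W = rng.random((n, r))
H = rng.random((r, m))
f = lambda W, H: 0.5 * np.linalg.norm(W @ H - V, "fro") ** 2

W2 = update_W_element(V, W, H, 1, 0)
assert W2[1, 0] >= 0
assert f(W2, H) <= f(W, H) + 1e-12   # Lemma 1: the objective never increases
```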
III. THE PROPOSED ALGORITHM

A. Updating matrices W and H

In this section we consider the transformation T from (W, H) to (W̃, H̃) defined as follows:

    • Modify the elements of W as in subsection II.A

    • Modify the elements of H as in subsection II.B

In other words, the transformation (W̃, H̃) = T(W, H) is carried out as follows:

Step 1: Initialize

    W̃ = W,  H̃ = H

Step 2: Update the elements of W̃:

    For j = 1, …, r and i = 1, …, n
        W̃_{ij} = W̃_{ij} + δ
    where δ is computed by (2.4)-(2.6)

Step 3: Update the elements of H̃:

    For i = 1, …, r and j = 1, …, m
        H̃_{ij} = H̃_{ij} + δ
    where δ is computed by (2.10)-(2.12)

From Lemmas 1 and 2, we easily obtain the following important property of the transformation T.

LEMMA 3: If the solution (W, H) does not satisfy the KKT conditions (1.4), then

    f(W̃, H̃) = f(T(W, H)) < f(W, H)

In the contrary case: (W̃, H̃) = (W, H).

The following property is obtained directly from Lemma 3.

COROLLARY 1: For any (W, H) ≥ 0, if we set

    (W̃, H̃) = T(W, H)

then (W̃, H̃) = (W, H) or f(W̃, H̃) < f(W, H).

B. Algorithm for NMF (1.3)

The algorithm is described through the transformation T as follows:

Step 1: Initialize W = W¹ ≥ 0, H = H¹ ≥ 0

Step 2: For k = 1, 2, …

    (W^{k+1}, H^{k+1}) = T(W^k, H^k)

From Corollary 1, we obtain the following important property of the above algorithm.

THEOREM 1: Suppose (W^k, H^k) is the sequence of solutions created by Algorithm III.B. Then the sequence of objective function values f(W^k, H^k) decreases monotonically:

    f(W^{k+1}, H^{k+1}) ≤ f(W^k, H^k),  k ≥ 1

Moreover, the sequence f(W^k, H^k) is bounded below by zero, so Theorem 1 implies the following corollary.

COROLLARY 2: The sequence f(W^k, H^k) is convergent. In other words, there exists a nonnegative value f* such that:

    lim_{k→∞} f(W^k, H^k) = f*
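The transformation T and the iteration of Algorithm III.B can be sketched naively as follows (a slow sketch that recomputes WH for every element; the efficient scheme of section IV maintains the residual incrementally; names are ours):

```python
import numpy as np

# Naive sketch of transformation T (III.A) and Algorithm III.B.
# Each element of W, then of H, gets the optimal additive correction
# of section II. Illustrative only; not the authors' code.
def T(V, W, H):
    W, H = W.copy(), H.copy()
    n, r = W.shape
    m = H.shape[1]
    for j in range(r):                       # Step 2: update W element by element
        for i in range(n):
            p = np.sum(H[j, :] ** 2)
            q = np.sum((W @ H - V)[i, :] * H[j, :])
            if q > 0:
                W[i, j] += max(-q / p, -W[i, j])
            elif q < 0:
                W[i, j] += -q / p
    for i in range(r):                       # Step 3: update H element by element
        for j in range(m):
            u = np.sum(W[:, i] ** 2)
            v = np.sum(W[:, i] * (W @ H - V)[:, j])
            if v > 0:
                H[i, j] += max(-v / u, -H[i, j])
            elif v < 0:
                H[i, j] += -v / u
    return W, H

rng = np.random.default_rng(3)
n, m, r = 6, 5, 2
V = rng.random((n, m))
W = rng.random((n, r))
H = rng.random((r, m))
f = lambda W, H: 0.5 * np.linalg.norm(W @ H - V, "fro") ** 2

vals = [f(W, H)]
for _ in range(5):                           # Theorem 1: f decreases monotonically
    W, H = T(V, W, H)
    vals.append(f(W, H))
assert all(b <= a + 1e-12 for a, b in zip(vals, vals[1:]))
```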
Now we consider another convergence property of Algorithm III.B.

THEOREM 2: Suppose (W̄, H̄) is a limit point of the sequence (W^k, H^k) and

    Σ_{b=1}^{m} H̄_{jb}² > 0,  j = 1, …, r                        (3.1)

    Σ_{a=1}^{n} W̄_{ai}² > 0,  i = 1, …, r                        (3.2)

Then (W̄, H̄) is a stationary point of the problem (1.3).

Proof. By assumption, (W̄, H̄) is the limit of some subsequence (W^{t_k}, H^{t_k}) of the sequence (W^k, H^k):

    lim_{k→∞} (W^{t_k}, H^{t_k}) = (W̄, H̄)                       (3.3)

By conditions (3.1) and (3.2), the transformation T is continuous at (W̄, H̄). Therefore, from (3.3) we get:

    lim_{k→∞} T(W^{t_k}, H^{t_k}) = T(W̄, H̄)

Moreover, since T(W^{t_k}, H^{t_k}) = (W^{t_k+1}, H^{t_k+1}), it follows that:

    lim_{k→∞} (W^{t_k+1}, H^{t_k+1}) = T(W̄, H̄)                  (3.4)

Using the continuity of the objective function f(W, H), from (3.3) and (3.4) we have:

    lim_{k→∞} f(W^{t_k}, H^{t_k}) = f(W̄, H̄)
    lim_{k→∞} f(W^{t_k+1}, H^{t_k+1}) = f(T(W̄, H̄))

On the other hand, by Corollary 2 the sequence f(W^k, H^k) is convergent, so it follows that:

    f(T(W̄, H̄)) = f(W̄, H̄)

Therefore, by Lemma 3, (W̄, H̄) must be a stationary point of problem (1.3). Thus the theorem is proved.

IV. SOME VARIATIONS OF ALGORITHM

In this section we provide some variations of the algorithm in subsection III.B (Algorithm III.B) that reduce the volume of calculations and make the algorithm more convenient to implement.

A. Evaluating the computational complexity

To update an element W_{ij} by formulas (2.1), (2.4), (2.5) and (2.6), we need m multiplications for calculating p and m×(n×m×r) multiplications for calculating q (since the product WH has to be recomputed for each adjustment). Similarly, to update an element H_{ij} by (2.9), (2.10), (2.11) and (2.12), we need n multiplications for computing u and n×(n×m×r) multiplications for computing v. It follows that the number of multiplications needed for one loop (one transformation (W̃, H̃) = T(W, H)) of Algorithm III.B is

    2×n×m×r×(1 + n×m×r)                                          (4.1)
B. Some variations for updating W and H

1) Updating W_{ij}

If we set

    D = WH - V                                                   (4.2)

then the formula (2.5) for q becomes:

    q = Σ_{b=1}^{m} D_{ib} · H_{jb}                              (4.3)

If D is considered known, the calculation of q by (4.3) needs only m multiplications. After updating W_{ij} by formula (2.1), we need to recalculate D from W̃ so that it can be used for the adjustment of the other elements of W: D̃ = W̃H - V. From (2.1) and (4.2), it is seen that D̃ is determined from D by the formula:

    D̃_{ab} = D_{ab},             a ≠ i, b = 1..m
    D̃_{ib} = D_{ib} + δ H_{jb},  b = 1..m                        (4.4)

So we only need to adjust the i-th row of D, which takes m multiplications.

From formulas (2.1), (2.4), (2.6), (4.3) and (4.4), we have a new scheme for updating the matrix W as follows.

2) Scheme for updating matrix W

    For j = 1 To r
        p = Σ_{b=1}^{m} H_{jb}²
        For i = 1 To n
            q = Σ_{b=1}^{m} D_{ib} · H_{jb}
            δ = 0 if q = 0;  δ = max(-q/p, -W_{ij}) if q > 0;  δ = -q/p if q < 0
            W_{ij} = W_{ij} + δ
            D_{ib} = D_{ib} + δ H_{jb},  b = 1..m
        End For i
    End For j

The total number of multiplications used to adjust the matrix W is 2×n×m×r + m×r.

3) Updating H_{ij}

Similarly, the formula (2.11) for v becomes

    v = Σ_{a=1}^{n} W_{ai} · D_{aj}                              (4.5)

According to this formula, we use only n multiplications to calculate v. After adjusting H_{ij} by formula (2.9), we need to recalculate the matrix D by the following formula:

    D̃_{ab} = D_{ab},             b ≠ j, a = 1..n
    D̃_{aj} = D_{aj} + δ W_{ai},  a = 1..n                        (4.6)

So we only need to adjust the j-th column of D, which takes n multiplications.
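The scheme of subsection IV.B.2 can be sketched as follows (illustrative NumPy; D is the residual WH - V of (4.2), maintained in place by (4.4)):

```python
import numpy as np

# Sketch of scheme IV.B.2: update W column by column while maintaining
# the residual D = WH - V incrementally by (4.4). Illustrative only.
def update_W(V, W, H, D):
    n, r = W.shape
    for j in range(r):
        p = np.sum(H[j, :] ** 2)             # reused for the whole column j
        for i in range(n):
            q = np.sum(D[i, :] * H[j, :])    # (4.3): m multiplications
            if q > 0:
                delta = max(-q / p, -W[i, j])
            elif q < 0:
                delta = -q / p
            else:
                delta = 0.0
            W[i, j] += delta
            D[i, :] += delta * H[j, :]       # (4.4): only row i of D changes
    return W, D

rng = np.random.default_rng(4)
n, m, r = 6, 5, 2
V = rng.random((n, m))
W = rng.random((n, r))
H = rng.random((r, m))
D = W @ H - V
W, D = update_W(V, W, H, D)
assert np.allclose(D, W @ H - V)             # incremental residual stays exact
assert np.all(W >= 0)                        # nonnegativity is preserved
```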

From formulas (2.9), (2.10), (2.12), (4.5) and (4.6), we have a new scheme for updating the matrix H as follows.

4) Scheme for updating matrix H

    For i = 1 To r
        u = Σ_{a=1}^{n} W_{ai}²
        For j = 1 To m
            v = Σ_{a=1}^{n} W_{ai} · D_{aj}
            δ = 0 if v = 0;  δ = max(-v/u, -H_{ij}) if v > 0;  δ = -v/u if v < 0
            H_{ij} = H_{ij} + δ
            D_{aj} = D_{aj} + δ W_{ai},  a = 1..n
        End For j
    End For i

The total number of multiplications used to adjust the matrix H is 2×n×m×r + n×r.

Using the above results together, we can construct a new calculating scheme for Algorithm III.B.

C. New calculating scheme for Algorithm III.B

    1. Initialize W = W¹ ≥ 0, H = H¹ ≥ 0, D = WH - V
    2. For k = 1, 2, …
        Update W by the scheme of subsection IV.B.2
        Update H by the scheme of subsection IV.B.4

The computational complexity of this scheme is as follows: the initialization step needs n×m×r multiplications for computing D, and each loop needs 4×n×m×r + r×(n+m) multiplications. Compared with (4.1), the number of operations is now greatly reduced.

V. EXPERIMENTS

In this section, we present the results of two experiments on the algorithms: New NMF (the proposed additive update algorithm), GZ and LS. The programs are written in MATLAB and run on a machine with the following configuration: Intel Pentium Core 2 P6100 2.0 GHz, 3 GB RAM. New NMF is implemented according to the scheme in subsection IV.C.

A. Experiment 1

This experiment compares the speed of convergence of the algorithms to a stationary point. First of all, the KKT condition (1.4) is equivalent to the following condition:

    h(W, H) = 0

where:

    h(W, H) = Σ_{i=1}^{n} Σ_{a=1}^{r} |min(W_{ia}, ((WH - V)Hᵀ)_{ia})|
            + Σ_{b=1}^{r} Σ_{j=1}^{m} |min(H_{bj}, (Wᵀ(WH - V))_{bj})|

Thus, the smaller h(W, H) is, the closer (W, H) is to a stationary point of the problem (1.3). To get a quantity independent of the size of W and H, we use the following formula:

    ε(W, H) = h(W, H) / (‖W‖ + ‖H‖)

in which ‖W‖ is the number of elements of the set

    { min(W_{ia}, ((WH - V)Hᵀ)_{ia}) | i = 1…n, a = 1…r }

and ‖H‖ is the number of elements of the set

    { min(H_{bj}, (Wᵀ(WH - V))_{bj}) | j = 1…m, b = 1…r }

ε(W, H) is called a normalized KKT residual. Table 1 presents the value ε(W, H) of the solution (W, H) obtained by each algorithm within given time periods, on a data set of size (n, m, r) = (200, 100, 10), in which V, W¹, H¹ were generated randomly with V_{ij} ∈ [0, 500], (W¹)_{ij} ∈ [0, 5], (H¹)_{ij} ∈ [0, 5].

TABLE I. NORMALIZED KKT RESIDUAL VALUES ε(W, H)

    Time (sec)   New NMF   GZ          LS
    60           3.6450    3700.4892   3576.0937
    120          1.5523    3718.2967   3539.8986
    180          0.1514    3708.6043   3534.6358
    240          0.0260    3706.4059   3524.6715
    300          0.0029    3696.7690   3508.3239

The results in Table 1 show that the two algorithms GZ and LS do not converge to a stationary point within the given time (the value ε(W, H) remains large). Meanwhile, the New NMF algorithm does converge to a stationary point, since the value ε(W, H) approaches zero.

B. Experiment 2

This experiment compares the convergence speed of the algorithms to the minimum value of the objective function f(W, H) within given time periods, on a data set of size (n, m, r) = (500, 100, 20), in which V, W¹, H¹ were generated randomly with V_{ij} ∈ [0, 500], (W¹)_{ij} ∈ [0, 1], (H¹)_{ij} ∈ [0, 1]. The algorithms were run 5 times with 5 different pairs W¹, H¹ generated randomly in the interval [0, 1]. The average values of the objective function after the 5 runs of the algorithms in each given time period are presented in Table 2.
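The normalized KKT residual ε(W, H) used in Experiment 1 can be sketched as follows (a NumPy sketch; the counts ‖W‖ and ‖H‖ are taken literally as the sizes of the two index sets, i.e. n·r and r·m):

```python
import numpy as np

# Sketch of the normalized KKT residual eps(W, H) from Experiment 1.
# Illustrative only; the counts |W|, |H| are taken as n*r and r*m.
def kkt_residual(V, W, H):
    gW = (W @ H - V) @ H.T                  # df/dW
    gH = W.T @ (W @ H - V)                  # df/dH
    h = np.sum(np.abs(np.minimum(W, gW))) + np.sum(np.abs(np.minimum(H, gH)))
    return h / (W.size + H.size)

# At an exact stationary point the residual vanishes, e.g. W = V, H = I:
V = np.array([[1.0, 2.0], [3.0, 4.0]])
assert kkt_residual(V, V.copy(), np.eye(2)) == 0.0
```

Each term min(x, g) is zero exactly when x ≥ 0, g ≥ 0 and x·g = 0, which is the element-wise KKT condition (1.4).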

TABLE II. AVERAGE VALUES OF THE OBJECTIVE FUNCTION

    Time (sec)   New NMF   GZ        LS
    60           57.054    359.128   285.011
    120          21.896    319.674   273.564
    180          18.116    299.812   267.631
    240          17.220    290.789   264.632
    300          16.684    284.866   262.865
    360          16.458    281.511   261.914

The results in Table 2 show that the objective function values of the solutions generated by the two algorithms GZ and LS remain quite large, while the objective function value achieved by the New NMF algorithm is much smaller.

VI. CONCLUSIONS

This paper proposed a new additive update algorithm for solving the nonnegative matrix factorization problem. Experiments show that the proposed algorithm converges faster than the LS and GZ algorithms. The proposed algorithm also has a simple calculation scheme, so it is easy to implement and use in applications.

REFERENCES

[1] D. P. Bertsekas, On the Goldstein-Levitin-Polyak gradient projection method, IEEE Transactions on Automatic Control, 21 (1976), pp. 174-184.
[2] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, second edition, 1999.
[3] E. F. Gonzalez and Y. Zhang, Accelerating the Lee-Seung algorithm for non-negative matrix factorization, Tech. Report, Department of Computational and Applied Mathematics, Rice University, 2005.
[4] L. Grippo and M. Sciandrone, On the convergence of the block nonlinear Gauss-Seidel method under convex constraints, Operations Research Letters, 26 (2000), pp. 127-136.
[5] D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401 (1999), pp. 788-791.
[6] D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems 13, MIT Press, 2001, pp. 556-562.
[7] V. P. Pauca, J. Piper, and R. J. Plemmons, Nonnegative matrix factorization for spectral data analysis, Linear Algebra and Its Applications, 416 (2006), pp. 29-47.
[8] V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, Text mining using non-negative matrix factorizations, in Proceedings of the 2004 SIAM International Conference on Data Mining, 2004.
[9] L. F. Portugal, J. J. Judice, and L. N. Vicente, A comparison of block pivoting and interior-point algorithms for linear least squares problems with nonnegative variables, Mathematics of Computation, 63 (1994), pp. 625-643.
[10] Z. Tang, S. Wang, W. Wei, and S. Su, Robust image hashing for tamper detection using non-negative matrix factorization, Journal of Ubiquitous Convergence and Technology, vol. 2(1), 2008, pp. 18-26.
[11] W. Xu, X. Liu, and Y. Gong, Document clustering based on non-negative matrix factorization, in SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2003, ACM Press, pp. 267-273.
[12] H. Yao, T. Qiao, Z. Tang, Y. Zhao, H. Mao, Detecting copy-move forgery using non-negative matrix factorization, IEEE Third International Conference on Multimedia Information Networking and Security, China, 2011, pp. 591-594.

