Block Matching Using Fast Walsh Search

Ngai Li*, Chun Man Mak, Wai Kuen Cham
Department of Electronic Engineering, The Chinese University of Hong Kong

ABSTRACT

A fast block matching algorithm, namely Fast Walsh Search, is proposed for motion estimation in block-based video coding. Target blocks in the current frame and their candidates in the reference frame are transformed into Walsh Hadamard coefficients so that mismatched candidates can be rejected early, which reduces the computation requirement. A new matching algorithm is then proposed to search for the matching block among the remaining candidates by exploiting previous computations. The new matching algorithm not only keeps the computation requirement small but also provides high matching accuracy. Experimental results show that our algorithm achieves more accurate motion estimation than the popular three-step search and diamond search with only slightly higher computation time.

1. INTRODUCTION

Block-based motion estimation involves searching the reference frame for the candidate block that is closest to the target block in the current frame. The full search block matching (FSBM) algorithm exhaustively searches all candidates in a search window for the best candidate, i.e. the one with the least matching error. However, the computation requirement of FSBM is high, especially when the frame size is large. To reduce the computation time, fast algorithms such as three-step search (TSS) [4], four-step search (FSS) [5], new three-step search (NTSS) [6], and diamond search (DS) [7] were developed. These algorithms examine far fewer locations than FSBM while keeping an acceptable accuracy.

The Walsh Hadamard Transform (WHT) has recently drawn attention in video coding and pattern matching. The latest video coding standard, H.264/AVC [8], uses the WHT to compress DC coefficients. A real-time pattern matching algorithm working in the Walsh Hadamard (WH) domain was proposed by Hel-Or et al. in [1] and [2].
Their matching algorithm computes a distance from a few WHT coefficients to allow early rejection of mismatched patterns, and then focuses on the small number of remaining candidates that are more likely to be the best match. The algorithm reduces the WHT overhead with an efficient pruning scheme that effectively exploits previous calculations.

In this paper, we propose a fast Walsh search (FWS) algorithm that performs block-based motion estimation in the WH domain. The algorithm rejects most mismatched candidates early and so speeds up the matching process. In addition, it employs a novel matching algorithm that fully exploits previous calculations to perform accurate block matching among the remaining candidates. The proposed method achieves more accurate matching results in terms of mean square error (MSE) than TSS and DS with only a slight increase in computation time.

This paper is organized as follows. Section 2 introduces the proposed FWS. Experimental results are shown in Section 3, followed by conclusions in Section 4.

2. PROPOSED ALGORITHM

The WHT has attracted attention in image coding [3] because of its simplicity. Consider an N×N image block of pixel values $I(r,c)$, where $r, c \in [0, N-1]$, $N = 2^n$ and $n \in \mathbb{Z}^+$. We define the 2D WHT of the image block as

$$c(u,v) = \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} I(r,c)\,(-1)^{W(r,c;u,v;N)}, \qquad W(r,c;u,v;N) = \sum_{i=0}^{n-1}\big[b_i(r)\,p_i(u) \oplus b_i(c)\,p_i(v)\big] \tag{1}$$

W is calculated with modulo-2 arithmetic, where the addition ⊕ is the logical "exclusive or" and the multiplication is the logical "and" operation. $b_i(r)$ is the $i$th bit of $r$ when $r$ is written as a binary number ($b_0$ being the least significant bit), and $p_i(u)$ is obtained as follows: $p_0(u) = b_{n-1}(u)$, $p_1(u) = b_{n-1}(u) \oplus b_{n-2}(u)$, …, $p_{n-1}(u) = b_1(u) \oplus b_0(u)$. $b_i(c)$ and $p_i(v)$ are defined similarly.
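As a check of the notation, (1) can be evaluated literally. The sketch below is a deliberately slow, term-by-term Python transcription of (1), not the fast WH-tree computation of [1][2]; the function names `bit`, `p`, `W` and `wht2d` are illustrative choices, and bit 0 is assumed to be the least significant bit.

```python
def bit(x, i):
    """b_i(x): the i-th bit of x, with bit 0 the least significant (assumed convention)."""
    return (x >> i) & 1


def p(u, i, n):
    """p_i(u) as listed under (1): p_0(u) = b_{n-1}(u), p_i(u) = b_{n-i}(u) XOR b_{n-i-1}(u)."""
    if i == 0:
        return bit(u, n - 1)
    return bit(u, n - i) ^ bit(u, n - i - 1)


def W(r, c, u, v, n):
    """Exponent W(r,c;u,v;N) of (1) in modulo-2 arithmetic: AND for products, XOR for sums."""
    w = 0
    for i in range(n):
        w ^= (bit(r, i) & p(u, i, n)) ^ (bit(c, i) & p(v, i, n))
    return w


def wht2d(block):
    """Sequency-ordered 2D WHT c(u, v) of an N x N block, evaluated term by term from (1)."""
    N = len(block)
    n = N.bit_length() - 1
    if N != 1 << n:
        raise ValueError("N must be a power of two")
    return [[sum(block[r][c] * (-1) ** W(r, c, u, v, n)
                 for r in range(N) for c in range(N))
             for v in range(N)]
            for u in range(N)]


# Example: a constant 8 x 8 block has energy only in the DC coefficient c(0, 0)
flat = [[1] * 8 for _ in range(8)]
coeffs = wht2d(flat)
print(coeffs[0][0], sum(abs(x) for row in coeffs for x in row))  # prints 64 64
```

This direct form costs O(N^4) operations per block; it only mirrors the formula, whereas the WH tree mentioned below shares work between coefficients and neighboring blocks.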
Note that the 2D WHT coefficient $c(u,v)$, $u, v \in [0, N-1]$, is the projection of the image block onto a basis picture (BP) $B(u,v)$: an N×N matrix of pixel values that has $u$ zero-crossings in the horizontal direction and $v$ zero-crossings in the vertical direction. The BPs of the WHT consist solely of ±1 entries, so the coefficients $c(u,v)$ can be calculated with integer additions and subtractions only.

The benefits of employing the WHT in block matching are twofold. Firstly, the WHT compacts the information of highly correlated signals, such as image blocks, into a few transform coefficients without heavy distortion. Hence, target blocks in the current frame can be compared with their candidates in the reference frame using a few WHT coefficients, with little computation and only a small loss of matching accuracy. Secondly, the recursive structure of the WH tree [1][2] can be used to compute the WHT coefficients of blocks in both the reference and the current frame efficiently, because it reuses the computations of neighboring blocks as well as those of other coefficients of the same block.

Fig. 1. Block diagram of the proposed algorithm.

Our proposed algorithm first computes a new matching error, called the partial sum-of-absolute-difference (PSAD), between the target block in the current frame and each of its candidate blocks in the reference frame, using a few WHT coefficients. Candidates whose PSAD is larger than a threshold are rejected. Finally, the best match is searched for among the remaining candidates using another matching error, called the sum-of-absolute-difference of DC coefficients (SADDCC), which keeps the computation time small while matching accurately. The flowchart of the algorithm is shown in Fig. 1.

2.1 APPROXIMATING SUM-OF-ABSOLUTE DIFFERENCE USING PSAD

The sum-of-absolute-difference (SAD) is the most commonly used matching error in block matching. In this section, we propose approximating the SAD with a new matching error that requires significantly less computation.
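The two properties above (±1 basis pictures, and compaction of a smooth block into a few coefficients) can be illustrated with a short sketch. Because the exponent in (1) splits into a row part and a column part, $B(u,v)$ is taken here as the outer product of two 1D ±1 Walsh functions, and $c(u,v)$ is accumulated with additions and subtractions only. The helper names and the intensity-ramp test block are illustrative assumptions.

```python
def walsh_row(u, n):
    """1D sequency-ordered Walsh function of index u over N = 2**n samples (+1/-1 entries)."""
    def b(x, i):
        return (x >> i) & 1

    def p(i):
        return b(u, n - 1) if i == 0 else b(u, n - i) ^ b(u, n - i - 1)

    row = []
    for r in range(1 << n):
        w = 0
        for i in range(n):
            w ^= b(r, i) & p(i)
        row.append(-1 if w else 1)
    return row


def basis_picture(u, v, n):
    """B(u, v): outer product of Walsh function u (indexed by r) and v (indexed by c)."""
    wu, wv = walsh_row(u, n), walsh_row(v, n)
    return [[a * b for b in wv] for a in wu]


def coefficient(block, u, v):
    """c(u, v) as the projection of the block onto B(u, v), using additions/subtractions only."""
    n = len(block).bit_length() - 1
    B = basis_picture(u, v, n)
    total = 0
    for r, row in enumerate(block):
        for c, pixel in enumerate(row):
            total = total + pixel if B[r][c] > 0 else total - pixel
    return total


# A smooth 8 x 8 block (intensity ramp): most of the coefficient magnitude is in low sequencies
ramp = [[100 + r + c for c in range(8)] for r in range(8)]
coeffs = {(u, v): coefficient(ramp, u, v) for u in range(8) for v in range(8)}
low = sum(abs(x) for (u, v), x in coeffs.items() if u < 2 and v < 2)
print(low / sum(abs(x) for x in coeffs.values()))
```

For this ramp, the four coefficients with $u, v < 2$ carry about 97% of the total coefficient magnitude, which is the kind of compaction that PSAD relies on.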
Suppose an N×N target block at $(x,y)$ in the current frame is matched with candidates of the same size at $(x+m, y+n)$ in the reference frame, where $|m| \le R$, $|n| \le R$, and $R$ is the maximum search distance. Let $T_{(x,y)}$ and $C_{(x+m,y+n)}$ be the N×N matrices of pixel values of the target block at $(x,y)$ in the current frame and of one candidate block at $(x+m,y+n)$ in the reference frame, respectively. Define the difference between $T_{(x,y)}$ and $C_{(x+m,y+n)}$ as

$$D_{(x+m,y+n)} = T_{(x,y)} - C_{(x+m,y+n)} \tag{2}$$

Let $c_D(u,v)$, $c_T(u,v)$ and $c_C(u,v)$ be the WHT coefficients of $D_{(x+m,y+n)}$, $T_{(x,y)}$ and $C_{(x+m,y+n)}$, respectively. Applying the WHT to (2), we have

$$\begin{aligned} c_D(u,v) &= c_T(u,v) - c_C(u,v) = \sum_{r=0}^{N-1}\sum_{c=0}^{N-1}\big[T_{(x,y)}(r,c) - C_{(x+m,y+n)}(r,c)\big](-1)^{W(r,c;u,v;N)} \\ &= \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} D_{(x+m,y+n)}(r,c)\,(-1)^{W(r,c;u,v;N)} \end{aligned} \tag{3}$$

where $T_{(x,y)}(r,c)$, $C_{(x+m,y+n)}(r,c)$ and $D_{(x+m,y+n)}(r,c)$ are the elements at $(r,c)$ of $T_{(x,y)}$, $C_{(x+m,y+n)}$ and $D_{(x+m,y+n)}$, respectively. The SAD $d$ between the target block and the candidate block is defined as

$$d = \sum_{r=0}^{N-1}\sum_{c=0}^{N-1}\big|D_{(x+m,y+n)}(r,c)\big| \;\ge\; \frac{1}{N^2}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1}\big|c_D(u,v)\big| \tag{4}$$

The inequality holds because, by (3), each $c_D(u,v)$ is a ±1-weighted sum of the elements of $D_{(x+m,y+n)}$, so $|c_D(u,v)| \le d$ for each of the $N^2$ coefficients.

We define a new matching error, the partial sum-of-absolute-difference $d_{psa}(q)$, as the normalized sum of the magnitudes of $q$ WHT coefficients of $D_{(x+m,y+n)}$:

$$d_{psa}(q) = \frac{1}{N^2}\sum_{(u,v)\in S_q}\big|c_D(u,v)\big| \le d \tag{5}$$

where $q \ge 1$ and $S_q$ is a set of $q$ indices corresponding to $q$ of the $N^2$ BPs. Reference [2] describes how to select $q$ of the $N^2$ BPs along a zigzag path, similar to that of the JPEG image coding standard, so that the total energy of the $q$ WHT coefficients is likely to be the largest. Since $d_{psa}(q)$ captures a large proportion of $d$ even when $q$ is small, candidates can be rejected early based on $d_{psa}(q)$. Given another coefficient $c_D(u',v')$ corresponding to a BP $B(u',v')$ with $(u',v') \notin S_q$ and $0 \le u', v' \le N-1$, $d_{psa}(q+1)$ is obtained by updating $d_{psa}(q)$ iteratively as follows:
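A small numerical check of (3) to (6) is sketched below: it forms the difference block, accumulates $d_{psa}(q)$ one coefficient at a time as in (6), and lets the reader confirm that every partial sum stays below the SAD $d$. The JPEG-style zigzag order used for $S_q$ is an assumption standing in for the path of [2], and all function names are hypothetical.

```python
import random


def walsh_matrix(N):
    """N x N sequency-ordered Walsh matrix H[u][r] = (-1)^(sum_i b_i(r) p_i(u)), as in (1)."""
    n = N.bit_length() - 1

    def b(x, i):
        return (x >> i) & 1

    def p(u, i):
        return b(u, n - 1) if i == 0 else b(u, n - i) ^ b(u, n - i - 1)

    return [[-1 if sum(b(r, i) & p(u, i) for i in range(n)) % 2 else 1 for r in range(N)]
            for u in range(N)]


def wht2d(X):
    """Separable form of (1): c(u, v) = sum_r sum_c H[u][r] X[r][c] H[v][c]."""
    N = len(X)
    H = walsh_matrix(N)
    HX = [[sum(H[u][r] * X[r][c] for r in range(N)) for c in range(N)] for u in range(N)]
    return [[sum(HX[u][c] * H[v][c] for c in range(N)) for v in range(N)] for u in range(N)]


def zigzag(N):
    """JPEG-style zigzag order of (u, v) pairs; assumed here as the selection path for S_q."""
    return sorted(((u, v) for u in range(N) for v in range(N)),
                  key=lambda uv: (uv[0] + uv[1], uv[0] if (uv[0] + uv[1]) % 2 else uv[1]))


def sad(T, C):
    """SAD d of (4): sum of absolute pixel differences."""
    return sum(abs(t - c) for rt, rc in zip(T, C) for t, c in zip(rt, rc))


def psad_sequence(T, C, order):
    """Yield d_psa(1), d_psa(2), ... using the incremental update (6)."""
    N = len(T)
    cD = wht2d([[t - c for t, c in zip(rt, rc)] for rt, rc in zip(T, C)])
    d_psa = 0.0
    for u, v in order:
        d_psa += abs(cD[u][v]) / N ** 2
        yield d_psa


rng = random.Random(0)
T = [[rng.randrange(256) for _ in range(8)] for _ in range(8)]
C = [[rng.randrange(256) for _ in range(8)] for _ in range(8)]
partial = list(psad_sequence(T, C, zigzag(8)))
print(sad(T, C), partial[1], partial[-1])  # d, d_psa(2), d_psa(64)
```

Each step of the generator costs one coefficient, so a candidate can be dropped as soon as its running $d_{psa}(q)$ exceeds the threshold, without touching the remaining coefficients.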
$$d_{psa}(q+1) = d_{psa}(q) + \frac{1}{N^2}\big|c_D(u',v')\big| \le d \tag{6}$$

As $q$ increases, the PSAD gets closer to the SAD, but the computation requirement also increases. Candidates whose PSAD is greater than a threshold $T_{psa}$ are excluded from the next iteration, so that computation is focused on the candidates that are more likely to be the best match. Our experimental results show that $q = 2$ provides sufficient accuracy for $N = 8$.

2.2 TWO-LEVEL THRESHOLD SCHEME

Our proposed algorithm rejects candidates whose PSAD is greater than a threshold $T_{psa}$. If $T_{psa}$ is too small, the candidate with the least SAD may be rejected, so the matching error becomes large. However, if $T_{psa}$ is too large, most of the candidates in the reference frame remain and the computation burden is not significantly reduced. We observe that both the PSAD and the SAD of the best-matching candidate are small when the target block has few intensity changes. In other words, smooth target blocks require a small threshold, $T_{psas}$, while high-activity target blocks need a large threshold, $T_{psah}$, to maintain both efficiency and accuracy. We define the activity level $L_a(M)$ of a target block as

$$L_a(M) = \sum_{(u,v)\in S_M}\big|c_T(u,v)\big| \tag{7}$$

where $S_M$ contains the indices of the first $M$ AC coefficients along the zigzag path defined in [2]; that is, $L_a(M)$ is the sum of the magnitudes of the first $M$ 2D WHT AC coefficients of the target block. If $L_a(M)$ is larger than a threshold $T_f$, the block is classified as a high-activity block. $M$ should be small so that $L_a(M)$ is cheap to compute. $T_f$ should be small so that only blocks with almost no features are classified as smooth; this maintains good matching performance while speeding up the search in uniform areas.

2.3 BLOCK MATCHING USING SADDCC

A large number of candidate blocks are rejected using the PSAD $d_{psa}(q=2)$, which needs significantly less computation than the SAD.
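The two-level threshold can be sketched as follows. The defaults $T_{psas} = 2$, $T_{psah} = 30$, $M = 3$ and $T_f = 300$ are the parameter values reported in Section 3; the zigzag order that defines $S_M$ (so the first AC coefficients are (0,1), (1,0), (2,0)) is an assumed JPEG-style path, and the helper names are hypothetical.

```python
def walsh_matrix(N):
    """N x N sequency-ordered Walsh matrix H[u][r], built from the bit rules of (1)."""
    n = N.bit_length() - 1

    def b(x, i):
        return (x >> i) & 1

    def p(u, i):
        return b(u, n - 1) if i == 0 else b(u, n - i) ^ b(u, n - i - 1)

    return [[-1 if sum(b(r, i) & p(u, i) for i in range(n)) % 2 else 1 for r in range(N)]
            for u in range(N)]


def wht2d(X):
    """Separable form of (1): c(u, v) = sum_r sum_c H[u][r] X[r][c] H[v][c]."""
    N = len(X)
    H = walsh_matrix(N)
    HX = [[sum(H[u][r] * X[r][c] for r in range(N)) for c in range(N)] for u in range(N)]
    return [[sum(HX[u][c] * H[v][c] for c in range(N)) for v in range(N)] for u in range(N)]


def zigzag(N):
    """JPEG-style zigzag order of (u, v) pairs (assumed stand-in for the path of [2])."""
    return sorted(((u, v) for u in range(N) for v in range(N)),
                  key=lambda uv: (uv[0] + uv[1], uv[0] if (uv[0] + uv[1]) % 2 else uv[1]))


def activity(cT, M):
    """L_a(M) of (7): magnitudes of the first M AC coefficients along the zigzag path."""
    ac = [uv for uv in zigzag(len(cT)) if uv != (0, 0)][:M]
    return sum(abs(cT[u][v]) for u, v in ac)


def psad_threshold(target, M=3, T_f=300, T_psas=2, T_psah=30):
    """Two-level scheme: small threshold for smooth blocks, large one for high-activity blocks."""
    return T_psah if activity(wht2d(target), M) > T_f else T_psas


flat = [[128] * 8 for _ in range(8)]
ramp = [[100 + 4 * r + 4 * c for c in range(8)] for r in range(8)]
print(psad_threshold(flat), psad_threshold(ramp))  # prints 2 30
```

The uniform block has no AC energy and receives the small threshold, while the ramp's first AC coefficients sum to 1024 in magnitude, well above $T_f$, so it is treated as a high-activity block.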
In the next step, the remaining candidates must be evaluated with a high level of accuracy. From (6), $d_{psa}(q)$ gets closer to the SAD as $q$ increases. For example, when $q = N^2/4$ and $S_q = \{(u,v) \mid 0 \le u, v \le N/2-1\}$, $d_{psa}(q = N^2/4)$ provides a reasonably good approximation of the SAD owing to the energy compaction of the WHT. Generally, a larger $q$ gives a more accurate $d_{psa}(q)$ but also a larger computation requirement. In this section, we propose a new measure, the sum-of-absolute-difference of DC coefficients (SADDCC), which attains higher matching accuracy than $d_{psa}(q = N^2/4)$ with $S_q = \{(u,v) \mid 0 \le u, v \le N/2-1\}$ while requiring even less computation.

Let $T^{(N)}_{(x,y)}$ and $C^{(N)}_{(x+m,y+n)}$ be the N×N matrices of pixel values of the target block at $(x,y)$ in the current frame and of the candidate block at $(x+m,y+n)$ in the reference frame, respectively; the superscript indicates the block size explicitly. Similarly, let $c_T^{(N)}(u,v)$ and $c_C^{(N)}(u,v)$ denote the WHT coefficients of $T^{(N)}_{(x,y)}$ and $C^{(N)}_{(x+m,y+n)}$ corresponding to $B(u,v)$. From (3), the magnitude of the WHT coefficient of $D_{(x+m,y+n)}$ corresponding to $B(u,v)$ is

$$\big|c_D(u,v)\big| = \big|c_T^{(N)}(u,v) - c_C^{(N)}(u,v)\big| = \Bigg|\sum_{r=0}^{N-1}\sum_{c=0}^{N-1}\big[T^{(N)}_{(x,y)}(r,c) - C^{(N)}_{(x+m,y+n)}(r,c)\big](-1)^{W(r,c;u,v;N)}\Bigg| \tag{8}$$

where $T^{(N)}_{(x,y)}(r,c)$ and $C^{(N)}_{(x+m,y+n)}(r,c)$ are the $(r,c)$th elements of $T^{(N)}_{(x,y)}$ and $C^{(N)}_{(x+m,y+n)}$, respectively.

We now match the target block with the remaining candidate blocks in the reference frame using SADDCC. To define SADDCC, we divide each of these blocks into $k^2$ sub-blocks, where $k$ is a power of two with $2 \le k \le N$. The sub-blocks of the target block and of its candidates are denoted by $T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$, where $x' = x + iN_k$, $y' = y + jN_k$, $0 \le i, j < k$, and $N_k = N/k$ is the sub-block size. Fig. 2 shows an example with $k = 4$ and $N_k = 2$.
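Splitting a block into $k^2$ sub-blocks and taking their DC coefficients can be sketched as below. Because $B(0,0)$ is all +1, the DC coefficient of a sub-block is simply the sum of its $N_k \times N_k$ pixels. Indexing $i$ along rows and $j$ along columns (offsets $iN_k$, $jN_k$ from the block origin) is an assumed convention, and `subblock_dcs` is a hypothetical helper.

```python
def subblock_dcs(block, k):
    """Return the k x k grid of sub-block DC coefficients (pixel sums) of an N x N block.

    Entry [i][j] is the DC of the N_k x N_k sub-block whose top-left pixel is
    (i * N_k, j * N_k) relative to the block origin, with N_k = N / k.
    """
    N = len(block)
    if N % k:
        raise ValueError("k must divide N")
    Nk = N // k
    return [[sum(block[i * Nk + s][j * Nk + t] for s in range(Nk) for t in range(Nk))
             for j in range(k)]
            for i in range(k)]


# Example matching the Fig. 2 setting: N = 8, k = 4, so each sub-block is 2 x 2
block = [[8 * r + c for c in range(8)] for r in range(8)]
dcs = subblock_dcs(block, 4)
print(dcs[0])  # prints [18, 26, 34, 42]
```

The two extreme cases are useful sanity checks: $k = 1$ returns the DC of the whole block, and $k = N$ returns the pixels themselves.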
Fig. 2. (a) Original candidate block with N = 8. (b) The corresponding sub-blocks with k = 4 and N_k = 2.

With $N_k = N/k$ denoting the sub-block size, let $c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ be the DC coefficients of the sub-blocks $T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$, respectively. We define the SADDCC $d_{dcc}(k)$ as

$$d_{dcc}(k) = \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\big|c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\big| \tag{9}$$

In the following, we show that SADDCC is a matching error closer to the SAD than $d_{psa}(q = k^2)$ with $S_q = \{(u,v) \mid 0 \le u, v \le k-1\}$; for $k = N/2$ this is the $d_{psa}(q = N^2/4)$ with $S_q = \{(u,v) \mid 0 \le u, v \le N/2-1\}$ considered above. We start by stating without proof that $|c_D(u,v)|$ in (8), for $u, v \in [0, k-1]$, can be expressed as (10); it holds because, for $u, v < k$, the basis picture $B(u,v)$ is constant over each $N_k \times N_k$ sub-block. Note that $x'$ and $y'$ are functions of $i$ and $j$, respectively.

$$\big|c_D(u,v)\big| = \Bigg|\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\big[c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\big](-1)^{W(u,v;i,j;k)}\Bigg| \tag{10}$$

From (5) and (10), $d_{psa}(q = k^2)$ with $S_q = \{(u,v) \mid 0 \le u, v \le k-1\}$ can be written as

$$d_{psa}(q=k^2) = \frac{1}{N^2}\sum_{u=0}^{k-1}\sum_{v=0}^{k-1}\Bigg|\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\big[c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\big](-1)^{W(u,v;i,j;k)}\Bigg| \tag{11}$$

Applying the triangle inequality to (11), we have

$$d_{psa}(q=k^2) \le \frac{1}{N^2}\sum_{u=0}^{k-1}\sum_{v=0}^{k-1}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\Big|\big[c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\big](-1)^{W(u,v;i,j;k)}\Big| \tag{12}$$

Since $|(-1)^{W(u,v;i,j;k)}| = 1$ for all $u$, $v$, $i$, $j$ and $k$, the summand in (12) does not depend on $u$ and $v$, so summing over the $k^2$ pairs $(u,v)$ multiplies it by $k^2$. The right-hand side of (12) therefore equals $d_{dcc}(k)$:

$$d_{psa}(q=k^2) \le \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\big|c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\big| = d_{dcc}(k) \tag{13}$$

On the other hand, expressing $d_{dcc}(k)$ defined in (9) in terms of pixel values, we have

$$d_{dcc}(k) = \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\Bigg|\sum_{s=0}^{N_k-1}\sum_{t=0}^{N_k-1}\big[T^{(N_k)}_{(x',y')}(s,t) - C^{(N_k)}_{(x'+m,y'+n)}(s,t)\big]\Bigg| \tag{14}$$

where $T^{(N_k)}_{(x',y')}(s,t)$ and $C^{(N_k)}_{(x'+m,y'+n)}(s,t)$ are the $(s,t)$th pixel values of $T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$, respectively.
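The chain of bounds can be checked numerically. The pure-Python sketch below (hypothetical helper names, sequency-ordered WHT built from (1)) computes $d_{psa}(q = k^2)$ from the low-sequency coefficients, $d_{dcc}(k)$ from sub-block DCs as in (9), and the bound $(k^2/N^2)\,d$ that follows from (14), for random 8×8 blocks.

```python
import random


def walsh_matrix(N):
    """N x N sequency-ordered Walsh matrix H[u][r], built from the bit rules of (1)."""
    n = N.bit_length() - 1

    def b(x, i):
        return (x >> i) & 1

    def p(u, i):
        return b(u, n - 1) if i == 0 else b(u, n - i) ^ b(u, n - i - 1)

    return [[-1 if sum(b(r, i) & p(u, i) for i in range(n)) % 2 else 1 for r in range(N)]
            for u in range(N)]


def wht2d(X):
    """Separable form of (1): c(u, v) = sum_r sum_c H[u][r] X[r][c] H[v][c]."""
    N = len(X)
    H = walsh_matrix(N)
    HX = [[sum(H[u][r] * X[r][c] for r in range(N)) for c in range(N)] for u in range(N)]
    return [[sum(HX[u][c] * H[v][c] for c in range(N)) for v in range(N)] for u in range(N)]


def subblock_dcs(block, k):
    """k x k grid of sub-block DCs (pixel sums); entry [i][j] starts at (i*N_k, j*N_k)."""
    Nk = len(block) // k
    return [[sum(block[i * Nk + s][j * Nk + t] for s in range(Nk) for t in range(Nk))
             for j in range(k)] for i in range(k)]


def sad(T, C):
    """SAD d of (4)."""
    return sum(abs(t - c) for rt, rc in zip(T, C) for t, c in zip(rt, rc))


def psad_low(T, C, k):
    """d_psa(q = k^2) with S_q = {(u, v) | 0 <= u, v <= k - 1}, as in (11)."""
    N = len(T)
    cD = wht2d([[t - c for t, c in zip(rt, rc)] for rt, rc in zip(T, C)])
    return sum(abs(cD[u][v]) for u in range(k) for v in range(k)) / N ** 2


def saddcc(T, C, k):
    """SADDCC d_dcc(k) of (9)."""
    N = len(T)
    dT, dC = subblock_dcs(T, k), subblock_dcs(C, k)
    return k * k / N ** 2 * sum(abs(a - b) for ra, rb in zip(dT, dC) for a, b in zip(ra, rb))


rng = random.Random(2)
T = [[rng.randrange(256) for _ in range(8)] for _ in range(8)]
C = [[rng.randrange(256) for _ in range(8)] for _ in range(8)]
for k in (2, 4, 8):
    # Expected ordering: d_psa(q = k^2) <= d_dcc(k) <= (k^2 / N^2) * d
    print(k, psad_low(T, C, k), saddcc(T, C, k), k * k / 64 * sad(T, C))
```

The same setup also confirms the signed form of (10): for $u, v < k$, the N-point coefficient of the difference block equals the k-point WHT of its sub-block DC differences, and $d_{dcc}(N)$ reduces to the SAD.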
By the triangle inequality, (14) implies that $d_{dcc}(k)$ is no larger than $d$, i.e.

$$d_{dcc}(k) \le \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\sum_{s=0}^{N_k-1}\sum_{t=0}^{N_k-1}\big|T^{(N_k)}_{(x',y')}(s,t) - C^{(N_k)}_{(x'+m,y'+n)}(s,t)\big| = \frac{k^2}{N^2}\,d \le d$$

As a result, $d_{psa}(q = k^2) \le d_{dcc}(k) \le d$, and we conclude that the SADDCC $d_{dcc}(k)$ is a matching error closer to the SAD than $d_{psa}(q = k^2)$. SADDCC equals SAD when $k = N$. The SADDCC of each remaining candidate is evaluated, and the candidate with the least SADDCC is taken as the best match of the target block.

SADDCC can be computed efficiently because $c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ are intermediate results of the PSAD calculation used to reject mismatched candidates in the reference frame. Owing to the structure of the WH tree in [1] and [2], $c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ have already been computed when $c_T(0,0)$ and $c_C(0,0)$ are calculated for the PSAD of each candidate. Exploiting these intermediate data avoids repeated calculations and contributes to both the speed and the accuracy of our fast block matching. For example, Fig. 3 shows the WH tree for finding the first and second coefficients of the reference frame when N = 4 and k = 2. The black dot represents $c_C(0,0)$ of a candidate in the reference frame, and the gray dots represent the corresponding sub-block DC coefficients $c_C^{(N_k)}(0,0;x'+m,y'+n)$. When $c_C(0,0)$ (the black dot) is computed along the leftmost path, the $c_C^{(N_k)}(0,0;x'+m,y'+n)$ of the corresponding sub-blocks (the gray dots) are calculated along the way.

Fig. 3. WH tree for N = 4 and k = 2.

3. EXPERIMENTAL RESULTS

The performance of the proposed algorithm was evaluated on six standard test sequences: Football, Foreman, Stefan, Akiyo, Children, and Mother&Daughter. We compare our method with FSBM, TSS and DS. In the experiments, the block size is 8×8 (i.e. N = 8) and the maximum search range is R = 16. We observed that around 70% of the candidates were rejected using PSAD with two coefficients.
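A compact sketch of one target-block search, combining Sections 2.1 to 2.3 with N = 8 and R = 16, is given below. It is not the authors' implementation: a summed-area table (integral image) is swapped in for the WH tree of [1][2] to obtain every sub-block DC in constant time, the low-sequency coefficients needed for PSAD (q = 2, assumed to be c(0,0) and c(0,1)) and for $L_a(M)$ are rebuilt from those DCs via (10), and k = 4, the zigzag order, and the fallback used when every candidate is rejected are all assumptions. The threshold defaults follow the parameter values listed in Section 3. An exhaustive SAD search (FSBM) is included for comparison.

```python
import random


def walsh_matrix(k):
    """k-point sequency-ordered Walsh matrix (k a power of two), from the bit rules of (1)."""
    n = k.bit_length() - 1

    def b(x, i):
        return (x >> i) & 1

    def p(u, i):
        return b(u, n - 1) if i == 0 else b(u, n - i) ^ b(u, n - i - 1)

    return [[-1 if sum(b(r, i) & p(u, i) for i in range(n)) % 2 else 1 for r in range(k)]
            for u in range(k)]


def integral_image(F):
    """Summed-area table: S[r][c] is the sum of F over rows < r and columns < c."""
    S = [[0] * (len(F[0]) + 1) for _ in range(len(F) + 1)]
    for r, row in enumerate(F):
        for c, val in enumerate(row):
            S[r + 1][c + 1] = val + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S


def dc_grid(S, r0, c0, N, k):
    """Sub-block DCs (pixel sums) of the N x N block whose top-left pixel is (r0, c0)."""
    Nk = N // k

    def box(r, c):
        return S[r + Nk][c + Nk] - S[r][c + Nk] - S[r + Nk][c] + S[r][c]

    return [[box(r0 + i * Nk, c0 + j * Nk) for j in range(k)] for i in range(k)]


def low_coeff(dc, H, u, v):
    """c(u, v) for u, v < k, rebuilt from the k x k sub-block DCs as in (10)."""
    k = len(dc)
    return sum(H[u][i] * H[v][j] * dc[i][j] for i in range(k) for j in range(k))


def fws_search(cur, ref, x, y, N=8, R=16, k=4, T_f=300, T_psas=2, T_psah=30):
    """Sketch of one FWS target-block search; returns the displacement (m, n)."""
    H = walsh_matrix(k)
    S_cur, S_ref = integral_image(cur), integral_image(ref)  # once per frame in practice
    dcT = dc_grid(S_cur, x, y, N, k)
    # Activity L_a(M) with M = 3 AC coefficients, assumed zigzag order (0,1), (1,0), (2,0)
    activity = sum(abs(low_coeff(dcT, H, u, v)) for u, v in [(0, 1), (1, 0), (2, 0)])
    T_psa = T_psah if activity > T_f else T_psas
    cT00, cT01 = low_coeff(dcT, H, 0, 0), low_coeff(dcT, H, 0, 1)
    scored = []
    for m in range(-R, R + 1):
        for n in range(-R, R + 1):
            r0, c0 = x + m, y + n
            if not (0 <= r0 <= len(ref) - N and 0 <= c0 <= len(ref[0]) - N):
                continue  # candidate falls outside the reference frame
            dcC = dc_grid(S_ref, r0, c0, N, k)
            # PSAD (5) with q = 2, using c(0, 0) and c(0, 1)
            d_psa = (abs(cT00 - low_coeff(dcC, H, 0, 0))
                     + abs(cT01 - low_coeff(dcC, H, 0, 1))) / N ** 2
            scored.append((d_psa, (m, n), dcC))
    # Keep candidates under the threshold; if none survive, keep the lowest-PSAD one (assumption)
    survivors = [s for s in scored if s[0] <= T_psa] or [min(scored, key=lambda s: s[0])]

    def d_dcc(dcC):  # SADDCC (9), reusing the DCs already computed for PSAD
        return k * k / N ** 2 * sum(abs(a - b) for ra, rb in zip(dcT, dcC) for a, b in zip(ra, rb))

    return min(survivors, key=lambda s: d_dcc(s[2]))[1]


def fsbm_search(cur, ref, x, y, N=8, R=16):
    """Exhaustive SAD search (FSBM) over the same window, for comparison."""
    best = None
    for m in range(-R, R + 1):
        for n in range(-R, R + 1):
            r0, c0 = x + m, y + n
            if 0 <= r0 <= len(ref) - N and 0 <= c0 <= len(ref[0]) - N:
                d = sum(abs(cur[x + r][y + c] - ref[r0 + r][c0 + c])
                        for r in range(N) for c in range(N))
                if best is None or d < best[0]:
                    best = (d, (m, n))
    return best[1]


# Synthetic test: the current frame is the reference shifted by (m, n) = (3, -2)
rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [[ref[(r + 3) % 48][(c - 2) % 48] for c in range(48)] for r in range(48)]
print(fws_search(cur, ref, 16, 16), fsbm_search(cur, ref, 16, 16))
```

In this sketch every candidate's sub-block DCs come from one summed-area table, so PSAD and SADDCC share the same intermediate sums; this mirrors the reuse argument above even though the data structure differs from the WH tree.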
Experimental results show that PSAD with three coefficients does not reject many more candidates, so two coefficients were used for PSAD in the following experiments. The other experimental parameters are $T_{psas} = 2$, $T_{psah} = 30$, $M = 3$ and $T_f = 300$.

3.1. MSE PERFORMANCE

Table 1 shows the average MSE over 30 frames of each of the six sequences for the different algorithms, and Fig. 4 shows the MSE of each frame. The MSE performance of FWS is very close to that of FSBM, and in most sequences it is significantly better than that of fast algorithms such as TSS and DS (on Akiyo, DS is marginally better: 4.00 versus 4.12). FWS gives its largest MSE improvements in fast-motion scenes such as Stefan and Football, because TSS and DS are easily trapped in local minima in these sequences. In contrast, FWS considers all candidates in the search window of the reference frame and so avoids being trapped in local minima; FWS can be regarded as FSBM with an approximated SAD.

3.2. COMPUTATION REQUIREMENTS

The computation times of the different algorithms are shown in Table 1; the tests were run on a Pentium IV 1.7 GHz computer. In general, the computation time of TSS is about 7 to 8% of that of FSBM, while that of DS varies from sequence to sequence. In the Akiyo and Foreman sequences, DS runs faster than TSS because the motion in these scenes is relatively small. However, in the Football sequence there is a large translation in the background and each player moves vigorously, so DS is slower than TSS. Although our algorithm takes somewhat more computation time than the fast algorithms (roughly 1.2 to 1.5 times that of TSS in Table 1), the gain in matching accuracy is very significant, as shown in Fig. 4.
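The relative timings discussed above can be recomputed directly from Table 1. The short script below only divides the published run times (no new measurements); the dictionary name `TIMES` and the helper `time_ratios` are illustrative.

```python
# (FSBM, TSS, DS, FWS) computation times in seconds, copied from Table 1
TIMES = {
    "Football": (25.16, 1.84, 1.90, 2.73),
    "Foreman": (7.28, 0.59, 0.44, 0.72),
    "Stefan": (25.36, 1.91, 1.36, 2.63),
    "Akiyo": (25.53, 1.75, 1.25, 2.65),
    "Children": (25.21, 1.80, 1.25, 2.72),
    "Mother&Daughter": (6.61, 0.53, 0.425, 0.71),
}


def time_ratios(times):
    """Return {sequence: (TSS / FSBM, FWS / TSS)} from the measured run times."""
    return {seq: (tss / fsbm, fws / tss) for seq, (fsbm, tss, _ds, fws) in times.items()}


for seq, (tss_vs_fsbm, fws_vs_tss) in time_ratios(TIMES).items():
    print(f"{seq:16s} TSS/FSBM = {tss_vs_fsbm:.1%}  FWS/TSS = {fws_vs_tss:.2f}")
```

Across the six sequences, TSS takes roughly 7 to 8% of the FSBM time and FWS takes roughly 1.2 to 1.5 times the TSS time.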
Table 1. Average MSE and computation time of different methods

| Sequence | FSBM MSE | FSBM Time [s] | TSS MSE | TSS Time [s] | DS MSE | DS Time [s] | FWS MSE | FWS Time [s] |
|---|---|---|---|---|---|---|---|---|
| Football | 142.58 | 25.16 | 260.31 | 1.84 | 347.19 | 1.90 | 153.27 | 2.73 |
| Foreman | 28.37 | 7.28 | 37.94 | 0.59 | 33.57 | 0.44 | 29.84 | 0.72 |
| Stefan | 134.45 | 25.36 | 302.86 | 1.91 | 365.82 | 1.36 | 148.65 | 2.63 |
| Akiyo | 3.91 | 25.53 | 4.57 | 1.75 | 4.00 | 1.25 | 4.12 | 2.65 |
| Children | 50.31 | 25.21 | 80.84 | 1.80 | 64.98 | 1.25 | 56.53 | 2.72 |
| Mother&Daughter | 18.23 | 6.61 | 22.24 | 0.53 | 22.06 | 0.425 | 20.35 | 0.71 |

4. CONCLUSIONS

In this paper, we proposed a fast block matching method, Fast Walsh Search, which transforms the target blocks in the current frame and their candidates in the reference frame into WHT coefficients so that most mismatches can be rejected effectively at an early stage using PSAD. We further proposed using SADDCC, computed from intermediate results of the recursive WH tree, to search for the best match among the remaining candidates. Together, these measures significantly reduce the computation in the block matching process while maintaining high matching accuracy. Experimental results show that the computation requirement of FWS is comparable to that of TSS, while its accuracy in terms of MSE is significantly better than that of TSS and DS in most test sequences and comparable with that of FSBM. A demo program of FWS can be found at http://vspc.ee.cuhk.edu.hk/project/fws/

Fig. 4. Frame MSE for the test sequences: (a) Football, (b) Foreman, (c) Stefan, (d) Akiyo, (e) Children, (f) Mother&Daughter.

REFERENCES

[1] Y. Hel-Or and H. Hel-Or, "Real time pattern matching using projection kernels," in Proc. Ninth IEEE International Conference on Computer Vision, vol. 1, Oct. 2003, pp. 1486-1493.
[2] Y. Hel-Or and H. Hel-Or, "Real time pattern matching using projection kernels," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 9, Sept. 2005, pp. 1430-1445.
[3] W. K. Pratt, J. Kane, and H. C. Andrews, "Hadamard transform image coding," Proc. IEEE, vol. 57, Jan. 1969, pp. 58-68.
[4] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion compensated interframe coding for video conferencing," in Proc. National Telecommunications Conf., New Orleans, LA, Nov. 1981, pp. G5.3.1-G5.3.5.
[5] L.-M. Po and W.-C. Ma, "A novel four-step search algorithm for fast block motion estimation," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, June 1996, pp. 313-317.
[6] R. Li, B. Zeng, and M. L. Liou, "A new three-step search algorithm for block motion estimation," IEEE Trans. Circuits and Systems for Video Technology, vol. 4, no. 4, Aug. 1994, pp. 438-442.
[7] S. Zhu and K.-K. Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Trans. Image Processing, vol. 9, no. 2, Feb. 2000, pp. 287-290.
[8] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, July 2003, pp. 560-576.