					                        Block Matching Using Fast Walsh Search
                              Ngai Li*, Chun Man Mak, Wai Kuen Cham

                                Department of Electronic Engineering
                                The Chinese University of Hong Kong


A fast block matching algorithm, Fast Walsh Search, is proposed for motion estimation in block-based
video coding. Target blocks in the current frame and their candidates in the reference frame are
transformed into Walsh Hadamard coefficients so that mismatched candidates can be rejected early, which
reduces the computation requirement. A new matching algorithm then searches the remaining candidates
for the matching block by exploiting the previous computation. It keeps the computation requirement
small while providing high matching accuracy. Experimental results show that our algorithm achieves
more accurate motion estimation than the popular three-step search and diamond search, with only
slightly higher computation time.


       Block-based motion estimation involves searching the reference frame for a candidate block that

is closest to the target block in the current frame. Full search block matching (FSBM) algorithm

exhaustively searches through all candidates in a search window for the best candidate with the least

matching error. However, the computation requirement of FSBM is high, especially when the frame size

is large. To reduce the computation time, fast algorithms such as three-step search (TSS) [4], four-step

search (FSS) [5], new three-step search (NTSS) [6], and diamond search (DS) [7] were developed. These

algorithms search far fewer locations than FSBM while retaining acceptable accuracy.
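The exhaustive search described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the (column, row) convention for (x, y), and the choice to skip candidates that fall outside the frame are all assumptions made for the example.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def full_search(cur, ref, x, y, N=8, R=16):
    """Return ((m, n), best_sad) for the target block at (x, y).

    (x, y) is the top-left corner given as (column, row), an assumed
    convention. Candidates lying partly outside the frame are skipped.
    """
    target = cur[y:y + N, x:x + N]
    h, w = ref.shape
    best, best_mv = None, None
    for n in range(-R, R + 1):          # vertical displacement
        for m in range(-R, R + 1):      # horizontal displacement
            cx, cy = x + m, y + n
            if cx < 0 or cy < 0 or cx + N > w or cy + N > h:
                continue
            err = sad(target, ref[cy:cy + N, cx:cx + N])
            if best is None or err < best:
                best, best_mv = err, (m, n)
    return best_mv, best
```

With R=16 and no boundary clipping the loop visits (2R+1)^2 = 1089 candidates per block, which is the cost the fast algorithms try to avoid.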

       The Walsh Hadamard Transform (WHT) has recently drawn attention in video coding and pattern
matching. The latest H.264/AVC [8] video coding standard uses WHT to compress DC coefficients. A
real-time pattern matching algorithm working in the Walsh Hadamard (WH) domain was proposed by Hel-Or et

al. in [1] and [2]. Their matching algorithm computes a distance using a few WHT coefficients to allow
early rejection of mismatch patterns and then focuses on a small number of remaining candidates that are

more likely to be the best match of the pattern. Their proposed algorithm reduces WHT overheads by an

efficient pruning algorithm that exploits the previous calculation effectively.

         In this paper, we propose a fast Walsh search (FWS) algorithm that performs block-based
motion estimation in the WH domain. The algorithm rejects most mismatched candidates early and so
speeds up the matching process. In addition, it employs a novel matching algorithm that fully exploits
previous calculations to perform block matching among the remaining candidates with high accuracy.
The proposed method achieves more accurate matching results in terms of mean square error (MSE)
than TSS and DS with only a slight increase in computation time. This paper is organized as follows.
Section 2 introduces our proposed FWS. Experimental results are shown in Section 3, followed by
conclusions in Section 4.


         WHT has attracted attention in image coding [3] because of its simplicity. Consider an N×N

image block of pixel values I(r,c) where r, c∈[0, N-1], N=2^n and n∈Z+. We define the 2D WHT of the
image block as (1). W is calculated by modulo-2 arithmetic where the addition ⊕ is the logical ‘exclusive or’
and multiplication is the logical ‘and’ operation. b_i(r) is the ith bit when we consider r as a binary number.

\[
c(u,v) = \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} I(r,c)\,(-1)^{W(r,c;u,v;N)}
\quad\text{where}\quad
W(r,c;u,v;N) = \sum_{i=0}^{N-1}\left[\, b_i(r)\,p_i(u) \oplus b_i(c)\,p_i(v) \,\right].
\tag{1}
\]

p_i(u) is found as follows: p_0(u)=b_{n-1}(u), p_1(u)=b_{n-1}(u)⊕b_{n-2}(u), …, p_{n-1}(u)=b_1(u)⊕b_0(u).
b_i(c) and p_i(v) are defined similarly. Note that each 2D WHT coefficient c(u,v), where u, v∈[0, N-1],
is the projection of the image block onto a basis picture (BP) B(u,v), an N×N square matrix holding
the pixel values of a BP that has u and v zero-crossings in the horizontal and vertical directions
respectively. BPs of the WHT consist solely of ±1, so c(u,v) is calculated by integer additions
and subtractions only.
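The definition in (1) can be checked numerically with a short sketch. It assumes that the sum in W runs over the n bit positions i = 0, …, n-1 for which p_i is defined, and it uses the fact that (-1)^W factors into a row term and a column term, so the transform becomes two matrix products. The function names are illustrative only.

```python
import numpy as np

def bit(x, i):
    """b_i(x): the ith bit of x."""
    return (x >> i) & 1

def p(i, u, n):
    """p_0(u) = b_{n-1}(u); p_i(u) = b_{n-i}(u) XOR b_{n-i-1}(u) for i >= 1."""
    if i == 0:
        return bit(u, n - 1)
    return bit(u, n - i) ^ bit(u, n - i - 1)

def walsh_matrix(N):
    """H[u, r] = (-1)^(sum_i b_i(r) p_i(u)); row u has u sign changes."""
    n = N.bit_length() - 1
    assert 1 << n == N, "N must be a power of two"
    H = np.empty((N, N), dtype=np.int64)
    for u in range(N):
        for r in range(N):
            e = sum(bit(r, i) & p(i, u, n) for i in range(n))
            H[u, r] = -1 if e & 1 else 1
    return H

def wht2(block):
    """Unnormalised 2D WHT of a square block, c = H I H^T."""
    N = int(block.shape[0])
    H = walsh_matrix(N)
    return H @ block.astype(np.int64) @ H.T
```

Because every entry of H is ±1, each coefficient is a signed sum of pixels, and c(0,0) is simply the sum of all pixel values in the block.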

         The benefits of employing WHT in block matching are twofold. Firstly, WHT can compress

information of highly correlated signals, like image blocks, into a few transform coefficients without
heavy distortion. Hence, we may compare target blocks in the current frame and their candidates in the

reference frame using the WHT coefficients with little computation and small matching accuracy

degradation. Furthermore, the recursive structure of the WH tree [1][2] can be applied to compute the
WHT coefficients of blocks in both the reference and current frames efficiently. It reuses computation
done for the coefficients of neighboring blocks as well as for other coefficients of the same block.

                                    Fig. 1.   Block diagram of proposed algorithm

        Our proposed algorithm first computes a new matching error, called partial sum-of-absolute

difference (PSAD), between the target block in current frame and its candidate blocks in the reference

frame using a few WHT coefficients. We reject candidates whose PSAD is larger than a threshold. Finally,

we search for the best match from the remaining candidates using another matching error called sum-of-

absolute difference of DC Coefficients, which maintains small computation time and accurate block

matching simultaneously. The flowchart of our algorithm is shown in Fig. 1.


        Sum-of-absolute difference (SAD) is the most commonly used matching error in block matching.

In this section, we suggest to approximate SAD using a new matching error that requires significantly less

computation. Suppose an N×N target block at (x,y) in the current frame is matched with its candidates of

the same size at (x+m,y+n) in the reference frame where |m|≤R, |n|≤R, and R is the maximum search

distance. Let T(x,y) and C(x+m,y+n) be the N×N square matrices representing the pixel values of the target

block and one of the candidate blocks located at (x,y) in current frame and at (x+m,y+n) in

reference frame respectively. Define the difference between matrices T(x,y) and C(x+m,y+n) as
                                   D(x+m,y+n) = T(x,y) – C(x+m,y+n).                                   (2)
Let cD(u,v), cT(u,v) and cC(u,v) be the WHT coefficients of D(x+m,y+n), T(x,y) and C(x+m,y+n)
respectively. Applying the WHT to (2), we have

\[
\begin{aligned}
c_D(u,v) &= c_T(u,v) - c_C(u,v) \\
         &= \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} \left[ T_{(x,y)}(r,c) - C_{(x+m,y+n)}(r,c) \right] (-1)^{W(r,c;u,v;N)} \\
         &= \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} D_{(x+m,y+n)}(r,c)\,(-1)^{W(r,c;u,v;N)}
\end{aligned}
\tag{3}
\]

where T(x,y)(r,c), C(x+m,y+n)(r,c) and D(x+m,y+n)(r,c) are the elements of T(x,y), C(x+m,y+n) and D(x+m,y+n) at
(r,c) respectively. The SAD, d, between the target block and the candidate block is defined as:

\[
d = \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} \left| D_{(x+m,y+n)}(r,c) \right| \;\ge\; \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} \left| c_D(u,v) \right|
\tag{4}
\]

        We define a new matching error called the partial sum-of-absolute difference dpsa(q), which is

the sum of magnitudes of q WHT coefficients of D(x+m,y+n), i.e.
\[
d_{psa}(q) = \sum_{(u,v)\in S_q} \left| c_D(u,v) \right| \le d
\tag{5}
\]

where q≥1 and Sq is a set of q indices corresponding to q of the N^2 BPs. Reference [2] describes how to
select q of the N^2 BPs using a zigzag path similar to that of the JPEG image coding standard so that the
total energy of

the q WHT coefficients is likely to be the largest. Since dpsa(q) can capture a large proportion of d even

when q is small, we can reject candidates early based on dpsa(q). Given another coefficient cD(u’,v’)
corresponding to the BP B(u’,v’) where 0≤u’,v’≤N-1, we can obtain dpsa(q+1) by updating dpsa(q)
iteratively as follows.

\[
d_{psa}(q+1) = d_{psa}(q) + \left| c_D(u',v') \right| \le d.
\tag{6}
\]
As q increases, the PSAD gets closer and closer to the SAD but the computation requirement also

increases. The candidates with PSAD greater than a threshold Tpsa are rejected from further consideration

in the next iteration so that computations are focused on candidates that are more likely to be the best

match. Our experimental results show that q=2 can provide sufficient accuracy for N=8.
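The incremental update in (6) and the threshold test can be sketched as below. Several things here are assumptions made for illustration: coefficients are scaled by 1/N^2 so that the bound dpsa(q) ≤ d holds numerically, the ordering in `zigzag` is a simple low-frequency-first stand-in for the path of [2], and the difference block is transformed directly for clarity, which does not reproduce the savings of the WH tree.

```python
import numpy as np

def walsh_matrix(N):
    """Sequency-ordered Walsh matrix: Sylvester rows sorted by sign changes."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def zigzag(N):
    """Hypothetical low-frequency-first ordering, not the exact path of [2]."""
    return sorted(((u, v) for u in range(N) for v in range(N)),
                  key=lambda uv: (uv[0] + uv[1], uv[0]))

def psad(target, cand, q):
    """d_psa(q): sum of magnitudes of the first q coefficients of D."""
    N = target.shape[0]
    H = walsh_matrix(N)
    D = target.astype(np.int64) - cand.astype(np.int64)
    c_D = H @ D @ H.T / N**2            # assumed 1/N^2 scaling
    return float(sum(abs(c_D[u, v]) for u, v in zigzag(N)[:q]))

def psad_reject(target, candidates, q, threshold):
    """Indices of candidates whose running PSAD never exceeds the threshold."""
    N = target.shape[0]
    H = walsh_matrix(N)
    order = zigzag(N)[:q]
    survivors = []
    for idx, cand in enumerate(candidates):
        D = target.astype(np.int64) - cand.astype(np.int64)
        d_psa = 0.0
        for u, v in order:
            d_psa += abs(H[u] @ D @ H[v]) / N**2   # update as in (6)
            if d_psa > threshold:
                break                               # early rejection
        else:
            survivors.append(idx)
    return survivors
```

Since each added term is non-negative, the running value can only grow with q, so a candidate rejected at a small q would also be rejected at any larger q with the same threshold.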


        Our proposed algorithm rejects candidates whose PSADs are greater than a threshold Tpsa. If Tpsa

is too small, the candidate with the least SAD may be rejected, so the matching error will be large.
However, if Tpsa is too high, most of the candidates in the reference frame will still remain and the

computation burden cannot be significantly reduced. We observe that both the PSAD and the SAD of

the best match candidate of a target block are small when the target block has few intensity changes. In

other words, smooth target blocks require a small threshold called Tpsas, while high activity target blocks

need a large threshold called Tpsah to maintain both efficiency and accuracy. We define the activity level

La(M) of a target block as

\[
L_a(M) = \sum_{(u,v)\in S_M} \left| c_T(u,v) \right|
\tag{7}
\]

where SM contains the indices of the first M AC coefficients along the zigzag path defined in [2]. That is,
La(M) is the sum of the magnitudes of the first M 2D WHT AC coefficients. If La(M) is larger than a
threshold Tf, the block is classified as a high activity block. M should be small so that La(M) can be
computed cheaply. Tf should also be small so that only blocks with almost no features are classified as
smooth blocks; this maintains good matching performance while speeding up the search in uniform areas.
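The threshold selection can be sketched as follows, using the parameter values reported later in the experiments (M = 3, Tf = 300, Tpsas = 2, Tpsah = 30) as defaults. The coefficient scaling those values refer to is not stated, so the unnormalised transform of (1) is used here as an assumption, and the AC ordering is again a hypothetical stand-in for the zigzag path of [2].

```python
import numpy as np

def walsh_matrix(N):
    """Sequency-ordered Walsh matrix: Sylvester rows sorted by sign changes."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def activity_level(target, M=3):
    """L_a(M): sum of magnitudes of the first M AC WHT coefficients."""
    N = target.shape[0]
    H = walsh_matrix(N)
    c_T = H @ target.astype(np.int64) @ H.T       # unnormalised (assumption)
    order = sorted(((u, v) for u in range(N) for v in range(N)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))
    ac = order[1:M + 1]                           # skip the DC term (0, 0)
    return int(sum(abs(c_T[u, v]) for u, v in ac))

def select_threshold(target, M=3, T_f=300, T_psas=2, T_psah=30):
    """Pick the PSAD threshold: small for smooth blocks, large otherwise."""
    return T_psah if activity_level(target, M) > T_f else T_psas
```

A perfectly flat block has zero AC energy and gets the small threshold, while a block containing a strong edge is classified as high activity and gets the large one.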

2.3     Block Matching Using SADDCC

        A large number of candidate blocks are rejected using PSAD dpsa(q=2) which needs significantly

less computation than SAD. In the next step, we need to evaluate the remaining candidates with high
accuracy. From (6), we know that dpsa(q) gets closer to the SAD as q increases. For example, when
q=N^2/4 and Sq={(u,v)| 0≤u,v≤N/2-1}, dpsa(q=N^2/4) provides a reasonably good approximation of SAD due
to the energy compression ability of the WHT. Generally, we can achieve higher accuracy using
dpsa(q) with a larger q, but the computation requirement is also larger. In this section, we propose a
new measure called Sum of Absolute Difference of DC Coefficients (SADDCC) that attains higher
matching accuracy than dpsa(q=N^2/4) with Sq={(u,v)| 0≤u,v≤N/2-1} but requires even less computation.

        Let $T^{(N)}_{(x,y)}$ and $C^{(N)}_{(x+m,y+n)}$ be the N×N square matrices representing the pixel
values of the target block and candidate block located at (x,y) in the current frame and at (x+m,y+n) in
the reference frame respectively. The superscript indicates the block size explicitly. Similarly, we denote
$c_T^{(N)}(u,v)$ and $c_C^{(N)}(u,v)$ as the WHT coefficients of $T^{(N)}_{(x,y)}$ and $C^{(N)}_{(x+m,y+n)}$
corresponding to B(u,v). From (3), the magnitude of the WHT coefficient of the N×N square matrix D(x+m,y+n)
corresponding to B(u,v) is

\[
\left| c_D(u,v) \right| = \left| c_T^{(N)}(u,v) - c_C^{(N)}(u,v) \right|
= \left| \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} \left[ T^{(N)}_{(x,y)}(r,c) - C^{(N)}_{(x+m,y+n)}(r,c) \right] (-1)^{W(r,c;u,v;N)} \right|
\tag{8}
\]

where $T^{(N)}_{(x,y)}(r,c)$ and $C^{(N)}_{(x+m,y+n)}(r,c)$ are the (r,c)th elements of $T^{(N)}_{(x,y)}$ and
$C^{(N)}_{(x+m,y+n)}$ respectively. We are now going to match the target block with the remaining blocks in
the reference frame using SADDCC. In order to define SADDCC, we need to divide each of these blocks into
k^2 sub-blocks where k=2^n and n∈Z+. Fig. 2 shows an example where k equals 4 and Nk equals 2. Sub-blocks
from the target block and their candidates are denoted by $T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$
where x’=x+iNk, y’=y+jNk and 0≤i,j<k. Nk=N/k is the sub-block size.

           Fig. 2. (a) Original candidate block with N=8. (b) The corresponding sub-blocks with k=4 and Nk=2
           (sub-blocks of size Nk by Nk).

Let $c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ be the DC coefficients of the sub-blocks
$T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$ respectively. We define SADDCC ddcc(k) as

\[
d_{dcc}(k) = \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \left| c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n) \right|.
\tag{9}
\]

In the following, we show that SADDCC is a matching error closer to SAD than
dpsa(q=N^2/4) with Sq={(u,v)| 0≤u,v≤N/2-1}. We start by stating without proof that |cD(u,v)| for
u,v∈[0,k−1] given in (8) can be expressed as (10). Note that x’ and y’ are functions of i and j.

\[
\left| c_D(u,v) \right| = \left| \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \left[ c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n) \right] (-1)^{W(u,v;i,j;k)} \right|.
\tag{10}
\]

From (5) and (10), we can formulate dpsa(q=k^2) for Sq={(u,v) | 0≤u,v≤k-1} as:

\[
d_{psa}(q=k^2) = \sum_{u=0}^{k-1}\sum_{v=0}^{k-1} \left| \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \left[ c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n) \right] (-1)^{W(u,v;i,j;k)} \right|.
\tag{11}
\]

Applying the triangle inequality to (11), we have

\[
d_{psa}(q=k^2) \le \sum_{u=0}^{k-1}\sum_{v=0}^{k-1}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \left| \left[ c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n) \right] (-1)^{W(u,v;i,j;k)} \right|.
\tag{12}
\]

Since |(-1)^{W(u,v;i,j;k)}| equals 1 for all u, v, i, j and k, the R.H.S. of (12) equals ddcc(k), as shown in (13).

\[
d_{psa}(q=k^2) \le \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \left| c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n) \right| = d_{dcc}(k)
\tag{13}
\]

          On the other hand, expressing ddcc(k) defined in (9) in terms of pixel values, we have

\[
d_{dcc}(k) = \frac{k^2}{N^2} \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \left| \sum_{s=0}^{N_k-1}\sum_{t=0}^{N_k-1} \left[ T^{(N_k)}_{(x',y')}(s,t) - C^{(N_k)}_{(x'+m,y'+n)}(s,t) \right] \right|
\tag{14}
\]

where $T^{(N_k)}_{(x',y')}(s,t)$ and $C^{(N_k)}_{(x'+m,y'+n)}(s,t)$ represent the (s,t)th pixel values of
$T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$ respectively. By the triangle inequality, (14) implies
that ddcc(k) is no larger than d, i.e.

\[
d_{dcc}(k) \le \frac{k^2}{N^2} \sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\sum_{s=0}^{N_k-1}\sum_{t=0}^{N_k-1} \left| T^{(N_k)}_{(x',y')}(s,t) - C^{(N_k)}_{(x'+m,y'+n)}(s,t) \right| = \frac{k^2}{N^2}\, d.
\tag{15}
\]

As a result, we conclude that SADDCC ddcc(k) is a matching error closer to SAD than dpsa(q=k^2).
SADDCC is equal to SAD when k equals N. The SADDCC of each remaining candidate is evaluated, and
the one with the least SADDCC is taken as the best match of the target block.
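The ordering dpsa(q=k^2) ≤ ddcc(k) ≤ d can be checked numerically with the sketch below. It assumes that every coefficient is scaled by the area of the block it is taken over (1/N^2 for the full block, 1/Nk^2 for a sub-block, so each sub-block DC coefficient is the sub-block mean), which is the scaling suggested by the k^2/N^2 factor in (14). The function names are illustrative only.

```python
import numpy as np

def walsh_matrix(N):
    """Sequency-ordered Walsh matrix: Sylvester rows sorted by sign changes."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def sad(T, C):
    """Sum of absolute pixel differences, d."""
    return float(np.abs(np.asarray(T, float) - np.asarray(C, float)).sum())

def subblock_dc(block, k):
    """k x k array of sub-block means (assumed DC scaling of 1/N_k^2)."""
    block = np.asarray(block, float)
    Nk = block.shape[0] // k
    return block.reshape(k, Nk, k, Nk).mean(axis=(1, 3))

def saddcc(T, C, k):
    """d_dcc(k): sum of absolute differences of sub-block DC coefficients."""
    return float(np.abs(subblock_dc(T, k) - subblock_dc(C, k)).sum())

def psad_lowfreq(T, C, k):
    """d_psa(q = k^2) over S_q = {(u, v) | 0 <= u, v <= k-1}."""
    D = np.asarray(T, float) - np.asarray(C, float)
    N = D.shape[0]
    H = walsh_matrix(N)
    c_D = H @ D @ H.T / N**2            # assumed 1/N^2 scaling
    return float(np.abs(c_D[:k, :k]).sum())
```

With k = N every sub-block is a single pixel, so `saddcc` reduces to the SAD itself; with k = 1 it compares only the two block means.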

          SADDCC can be computed efficiently because $c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$
are intermediate results of calculating PSAD to reject mismatched candidates in the reference frame. Owing to
the structure of the WH tree in [1] and [2], these sub-block DC coefficients have already been computed
while we calculate cT(0,0) and cC(0,0) for the PSAD of each candidate. The exploitation of intermediate
data saves repeated calculations and contributes to the high speed and accuracy of our fast block
matching. For example, Fig. 3 shows the WH tree for finding the first and second coefficients of the reference
frame when N=4 and k=2. The black dot and the gray dots represent cC(0,0) of a candidate in the reference
frame and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ of the corresponding sub-blocks respectively. In computing cC(0,0)
(i.e. the black dot) along the leftmost path, we have already calculated $c_C^{(N_k)}(0,0;x'+m,y'+n)$ of the
corresponding sub-blocks (i.e. the gray dots).
                                      Fig. 3. WH tree for N=4 and k=2
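The reuse of intermediate results can be illustrated with a sketch that covers only the all-additions (leftmost) path of the tree, i.e. the DC terms. It is a simplified stand-in for the WH tree of [1] and [2], not a reproduction of it; the function name and the dictionary layout are assumptions. Window sums of size 2s are built from sums of size s, so the sub-block sums needed for SADDCC already exist when the full-block sum is reached.

```python
import numpy as np

def dc_pyramid(frame, N):
    """Window sums for sizes 1, 2, 4, ..., N at every pixel position.

    levels[s][y, x] is the sum of the s x s window whose top-left corner
    is at row y, column x. Each level is derived from the previous one
    with two additions per position.
    """
    levels = {1: frame.astype(np.int64)}
    s = 1
    while s < N:
        prev = levels[s]
        horiz = prev[:, :-s] + prev[:, s:]             # s rows by 2s columns
        levels[2 * s] = horiz[:-s, :] + horiz[s:, :]   # 2s by 2s windows
        s *= 2
    return levels
```

For N=4 and k=2, `levels[4]` holds the unnormalised DC term of every candidate position and `levels[2]` holds the sums of its four 2×2 sub-blocks, which were computed on the way.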


        The performance of the proposed algorithm has been evaluated with six standard test sequences,

namely Football, Foreman, Stefan, Akiyo, Children, and Mother&Daughter. We compare our method

with TSS and DS. In the experiment, the block size is 8×8 (i.e. N=8) and maximum search range R=16.

We observed that around 70% of candidates were rejected using PSAD with two coefficients.

Experimental results show that the use of PSAD with 3 coefficients cannot reject many more candidates

so the number of coefficients in PSAD was chosen to be two in the following experiments. Other

experimental parameters are Tpsas= 2, Tpsah =30, M =3 and Tf = 300.


         Table 1 shows the average MSE over 30 frames of the six sequences using different algorithms,

and Fig. 4 shows the MSE of each frame. The performance of FWS in terms of MSE is very close to that

of FSBM, and it is significantly better than that of fast algorithms such as TSS and DS. The proposed
FWS has significant MSE improvements in fast motion scenes, such as Stefan and Football, because TSS
and DS are easily trapped in a local minimum in these sequences. FWS, however, searches through all
candidates in the reference frame and so avoids being trapped in a local minimum. FWS can be regarded
as an FSBM with an approximated SAD.

         The computation times of the different algorithms are shown in Table 1. We ran tests on a Pentium

IV 1.7GHz computer. In general, the computation time of TSS is about 5% of FSBM, while that of DS

varies from sequence to sequence. In Akiyo sequence and Foreman sequence, DS works faster than TSS

because the motions in these scenes are relatively small. However, in Football sequence, there is a large

translation in the background and each player moves vigorously; therefore, DS performs slower than TSS.

Although the computation time of our algorithm is slightly more than that of the fast algorithms, the gain

in matching accuracy is very significant as shown in Fig. 4.

       Sequence           FSBM                       TSS                   DS                   FWS
                       MSE Time [s]         MSE        Time [s]   MSE       Time [s]   MSE        Time [s]
Football              142.58      25.16     260.31       1.84     347.19      1.90     153.27       2.73
Foreman               28.37        7.28     37.94        0.59     33.57       0.44     29.84        0.72
Stefan                134.45      25.36     302.86       1.91     365.82      1.36     148.65       2.63
Akiyo                  3.91       25.53      4.57        1.75      4.00       1.25      4.12        2.65
Children              50.31       25.21     80.84        1.80     64.98       1.25     56.53        2.72
Mother&Daughter       18.23        6.61     22.24        0.53     22.06       0.425    20.35        0.71
                           Table 1.         Average MSE and computation time of different methods
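The figures in Table 1 can be compared against FSBM with a few lines of arithmetic. The sketch below copies the table values verbatim; the dictionary layout and the helper name are assumptions made for the example.

```python
# (MSE, time in seconds) per algorithm, copied from Table 1
table1 = {
    "Football":        {"FSBM": (142.58, 25.16), "TSS": (260.31, 1.84),
                        "DS": (347.19, 1.90),    "FWS": (153.27, 2.73)},
    "Foreman":         {"FSBM": (28.37, 7.28),   "TSS": (37.94, 0.59),
                        "DS": (33.57, 0.44),     "FWS": (29.84, 0.72)},
    "Stefan":          {"FSBM": (134.45, 25.36), "TSS": (302.86, 1.91),
                        "DS": (365.82, 1.36),    "FWS": (148.65, 2.63)},
    "Akiyo":           {"FSBM": (3.91, 25.53),   "TSS": (4.57, 1.75),
                        "DS": (4.00, 1.25),      "FWS": (4.12, 2.65)},
    "Children":        {"FSBM": (50.31, 25.21),  "TSS": (80.84, 1.80),
                        "DS": (64.98, 1.25),     "FWS": (56.53, 2.72)},
    "Mother&Daughter": {"FSBM": (18.23, 6.61),   "TSS": (22.24, 0.53),
                        "DS": (22.06, 0.425),    "FWS": (20.35, 0.71)},
}

def relative_to_fsbm(sequence, algorithm):
    """Return (MSE ratio, time ratio) of an algorithm relative to FSBM."""
    mse, t = table1[sequence][algorithm]
    mse0, t0 = table1[sequence]["FSBM"]
    return mse / mse0, t / t0

for name in table1:
    mse_ratio, time_ratio = relative_to_fsbm(name, "FWS")
    print(f"{name:16s} MSE x{mse_ratio:.3f}  time {100 * time_ratio:.1f}% of FSBM")
```

Running the loop prints, for each sequence, how much larger the FWS error is than the full-search error and what fraction of the full-search time it needs.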


         In this paper, we proposed a fast block matching method, Fast Walsh Search, which is based on

transforming the target blocks in current frame and their candidates in reference frame into WHT

coefficients, so that most mismatches can be rejected in an early stage effectively with PSAD. We further

propose to use SADDCC from the intermediate results of the recursive WH tree to search for the best

match among the remaining candidates. All these measures significantly reduce computations in the block

matching process, but maintain high matching accuracy. Experimental results show that the computation
requirement of FWS is similar to that of TSS, while its accuracy in terms of MSE is significantly better
than that of TSS and DS and comparable with that of FSBM. A demo program of FWS can be found at
Fig. 4. Frame MSE for the test sequences: (a) Football, (b) Foreman, (c) Stefan, (d) Akiyo, (e) Children, (f) Mother&Daughter


[1] Y. Hel-Or and H. Hel-Or, “Real time pattern matching using projection kernels,” Proc. of Ninth IEEE
    International Conference on Computer Vision, Vol. 1, Oct. 2003, pp. 1486-1493.

[2] Y. Hel-Or and H. Hel-Or, “Real time pattern matching using projection kernels,” IEEE Trans. Pattern
    Analysis and Machine Intelligence, Vol. 27, No. 9, Sept. 2005, pp. 1430-1445.

[3] W. K. Pratt, J. Kane, and H. C. Andrews, “Hadamard transform image coding,” Proc. IEEE, Vol. 57, Jan.
    1969, pp. 58-68.

[4] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion compensated interframe coding for
    video conferencing,” in Proc. Nat. Tele. Conf., New Orleans, LA, Nov. 1981, pp. G5.3.1-5.3.5.

[5] Lai-Man Po and Wing-Chung Ma, “A novel four-step search algorithm for fast block motion
    estimation,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 3, June 1996,
    pp. 313-317.

[6] Reoxiang Li, Bing Zeng, and M. L. Liou, “A new three-step search algorithm for block motion
    estimation,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 4, Aug. 1994,

[7] Shan Zhu and Kai-Kuang Ma, “A new diamond search algorithm for fast block-matching motion
    estimation,” IEEE Trans. on Image Processing, Vol. 9, No. 2, Feb. 2000, pp. 287-290.

[8] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding
    standard,” IEEE Trans. Circuits Syst. Video Technol., Vol. 13, Jul. 2003, pp. 560-576.