Block Matching Using Fast Walsh Search
Ngai Li*, Chun Man Mak, Wai Kuen Cham
Department of Electronic Engineering
The Chinese University of Hong Kong
ABSTRACT
A fast block matching algorithm, Fast Walsh Search, is proposed for motion estimation in block-based
video coding. Target blocks in the current frame and their candidates in the reference frame are
transformed into Walsh Hadamard coefficients so that mismatched candidates can be rejected early, which
reduces the computation requirement. A new matching algorithm then searches for the matching block
among the remaining candidates by exploiting previous computation. It keeps the computation
requirement small while providing high matching accuracy. Experimental results show that our algorithm
achieves more accurate motion estimation than the popular three-step search and diamond search with
only slightly higher computation time.
1. INTRODUCTION
Block-based motion estimation involves searching the reference frame for a candidate block that
is closest to the target block in the current frame. Full search block matching (FSBM) algorithm
exhaustively searches through all candidates in a search window for the best candidate with the least
matching error. However, the computation requirement of FSBM is high, especially when the frame size
is large. To reduce the computation time, fast algorithms such as three-step search (TSS) [4], four-step
search (FSS) [5], new three-step search (NTSS) [6], and diamond search (DS) [7] were developed. These
algorithms search far fewer locations than FSBM while retaining acceptable accuracy.
The Walsh Hadamard Transform (WHT) has recently drawn attention in video coding and pattern
matching. The latest H.264/AVC [8] video coding standard uses the WHT to compress DC coefficients. A
real-time pattern matching algorithm working in the Walsh Hadamard (WH) domain was proposed by Hel-Or
et al. in [1] and [2]. Their matching algorithm computes a distance using a few WHT coefficients to allow
early rejection of mismatched patterns and then focuses on a small number of remaining candidates that
are more likely to be the best match of the pattern. Their algorithm reduces the WHT overhead by an
efficient pruning scheme that exploits previous calculations effectively.
In this paper, we propose a fast Walsh search (FWS) algorithm which performs block-based
motion estimation in the WH domain. The algorithm allows early rejection of most mismatched candidates
and thus speeds up the matching process. In addition, it employs a novel matching algorithm that fully
exploits previous calculations to perform block matching among the remaining candidates with a high
accuracy level. The proposed method achieves more accurate matching results in terms of mean square
error (MSE) than TSS and DS with only a slight increase in computation time. This paper is organized as
follows. Section 2 introduces our proposed FWS. Experimental results are shown in Section 3, followed by
conclusions in Section 4.
2. PROPOSED ALGORITHM
WHT has attracted attention in image coding [3] because of its simplicity. Consider an N×N
image block of pixel values I(r,c) where r, c∈[0, N-1], N=2^n and n∈Z+. We define the 2D WHT of the
image block as (1). W is calculated by modulo-2 arithmetic, where the addition ⊕ is the logical 'exclusive
or' and the multiplication is the logical 'and' operation. $b_i(r)$ is the ith bit of r written as a binary number.
$$c(u,v) = \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} I(r,c)\,(-1)^{W(r,c;u,v;N)} \quad\text{where}\quad W(r,c;u,v;N) = \sum_{i=0}^{n-1}\left[\,b_i(r)\,p_i(u) \oplus b_i(c)\,p_i(v)\,\right]. \tag{1}$$
$p_i(u)$ is found as follows: $p_0(u)=b_{n-1}(u)$, $p_1(u)=b_{n-1}(u)\oplus b_{n-2}(u)$, …,
$p_{n-1}(u)=b_1(u)\oplus b_0(u)$. $b_i(c)$ and $p_i(v)$ are defined similarly. Note that the 2D WHT
coefficient c(u,v), where u, v∈[0, N-1], is the projection of the image block onto a basis picture (BP)
B(u,v), an N×N square matrix of pixel values that has u and v zero-crossings in the horizontal and
vertical directions respectively. BPs of the WHT consist solely of ±1, so c(u,v) is calculated by integer
additions and subtractions only.
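As an illustration of definition (1), the following Python sketch builds the ±1 kernel directly from the bit functions and applies it to a block. It assumes NumPy, and the function names (`bit`, `p`, `wht_kernel`, `wht2d`) are chosen here for illustration only. It is a direct reference computation, not the fast WH tree of [1][2]. The sketch relies on the fact that the parity of W splits into a row part and a column part, so the 2D kernel is a product of two 1D kernels.

```python
import numpy as np

def bit(x, i):
    """b_i(x): the i-th bit of x (bit 0 is the least significant)."""
    return (x >> i) & 1

def p(u, i, n):
    """p_i(u) as defined after (1): p_0 = b_{n-1}, p_i = b_{n-i} XOR b_{n-i-1}."""
    if i == 0:
        return bit(u, n - 1)
    return bit(u, n - i) ^ bit(u, n - i - 1)

def wht_kernel(N):
    """H[u, r] = (-1)^(sum_i b_i(r) p_i(u)), the 1D factor of (-1)^W in (1)."""
    n = N.bit_length() - 1
    if N < 2 or (1 << n) != N:
        raise ValueError("N must be a power of two, N >= 2")
    H = np.empty((N, N), dtype=np.int64)
    for u in range(N):
        for r in range(N):
            exponent = sum(bit(r, i) * p(u, i, n) for i in range(n))
            H[u, r] = -1 if exponent % 2 else 1
    return H

def wht2d(block):
    """c(u, v) of (1) for a square block, using integer arithmetic only."""
    block = np.asarray(block, dtype=np.int64)
    H = wht_kernel(block.shape[0])
    # c(u, v) = sum_r sum_c H[u, r] * I(r, c) * H[v, c]
    return H @ block @ H.T
```

Row u of the kernel has exactly u sign changes, which matches the zero-crossing description of the basis pictures, and `wht2d(block)[0, 0]` is simply the sum of all pixels (the DC coefficient).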
The benefits of employing WHT in block matching are twofold. Firstly, WHT can compress the
information of highly correlated signals, such as image blocks, into a few transform coefficients without
heavy distortion. Hence, we may compare target blocks in the current frame and their candidates in the
reference frame using the WHT coefficients with little computation and only a small degradation in
matching accuracy. Secondly, the recursive structure of the WH tree [1][2] can be applied to compute the
WHT coefficients of blocks in both the reference and the current frame efficiently. It reuses computation
from the coefficients of neighboring blocks as well as from other coefficients of the same block.
Fig. 1. Block diagram of proposed algorithm
Our proposed algorithm first computes a new matching error, called partial sum-of-absolute
difference (PSAD), between the target block in current frame and its candidate blocks in the reference
frame using a few WHT coefficients. We reject candidates whose PSAD is larger than a threshold. Finally,
we search for the best match from the remaining candidates using another matching error called sum-of-
absolute difference of DC Coefficients, which maintains small computation time and accurate block
matching simultaneously. The flowchart of our algorithm is shown in Fig. 1.
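The two stages described above can be sketched for a single target block as follows. This is a minimal illustration under several assumptions that are not stated in the text: coefficients are computed by brute force rather than with the WH tree, x is treated as the column index and y as the row index, candidates falling outside the frame are skipped, the sub-block parameter k=4 is an arbitrary choice, and if every candidate is rejected the candidate with the smallest PSAD is returned as a fallback. The names `walsh_matrix` and `fws_block` are hypothetical.

```python
import numpy as np

def walsh_matrix(N):
    """Sequency-ordered +/-1 Walsh-Hadamard matrix (rows sorted by sign changes)."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def fws_block(cur, ref, x, y, N=8, R=4, t_psa=30.0, k=4):
    """Return the displacement (m, n) chosen for the N-by-N target block at (x, y)."""
    H = walsh_matrix(N)
    Nk = N // k
    T = cur[y:y + N, x:x + N].astype(np.int64)
    best, best_cost = None, np.inf      # survivors, ranked by SADDCC
    spare, spare_cost = None, np.inf    # assumed fallback, ranked by PSAD
    for n in range(-R, R + 1):
        for m in range(-R, R + 1):
            yy, xx = y + n, x + m
            if yy < 0 or xx < 0 or yy + N > ref.shape[0] or xx + N > ref.shape[1]:
                continue                # assumption: ignore out-of-frame candidates
            D = T - ref[yy:yy + N, xx:xx + N].astype(np.int64)
            # Stage 1: PSAD with q = 2 (DC coefficient plus the first AC coefficient).
            psad = (abs(D.sum()) + abs(H[0] @ D @ H[1])) / N**2
            if psad < spare_cost:
                spare, spare_cost = (m, n), psad
            if psad > t_psa:
                continue                # early rejection
            # Stage 2: SADDCC from the DC sums of the k*k sub-blocks.
            dc = D.reshape(k, Nk, k, Nk).sum(axis=(1, 3))
            saddcc = k * k / N**2 * np.abs(dc).sum()
            if saddcc < best_cost:
                best, best_cost = (m, n), saddcc
    return best if best is not None else spare
```

For example, if `cur` is a copy of `ref` shifted by two columns and one row, the exact match has zero PSAD and zero SADDCC, so the sketch returns that displacement.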
2.1 APPROXIMATING SUM-OF-ABSOLUTE DIFFERENCE USING PSAD
Sum-of-absolute difference (SAD) is the most commonly used matching error in block matching.
In this section, we propose approximating SAD with a new matching error that requires significantly less
computation. Suppose an N×N target block at (x,y) in the current frame is matched with its candidates of
the same size at (x+m,y+n) in the reference frame, where |m|≤R, |n|≤R, and R is the maximum search
distance. Let $T_{(x,y)}$ and $C_{(x+m,y+n)}$ be the N×N square matrices representing the pixel values of
the target block located at (x,y) in the current frame and of one of the candidate blocks located at
(x+m,y+n) in the reference frame respectively. Define the difference between the matrices $T_{(x,y)}$ and
$C_{(x+m,y+n)}$ as
$$D_{(x+m,y+n)} = T_{(x,y)} - C_{(x+m,y+n)}. \tag{2}$$
Let cD(u,v), cT(u,v) and cC(u,v) be the WHT coefficients of $D_{(x+m,y+n)}$, $T_{(x,y)}$ and $C_{(x+m,y+n)}$
respectively. Applying the WHT to (2), we have
$$\begin{aligned}
c_D(u,v) &= c_T(u,v) - c_C(u,v) \\
&= \sum_{r=0}^{N-1}\sum_{c=0}^{N-1}\left[T_{(x,y)}(r,c) - C_{(x+m,y+n)}(r,c)\right](-1)^{W(r,c;u,v;N)} \\
&= \sum_{r=0}^{N-1}\sum_{c=0}^{N-1} D_{(x+m,y+n)}(r,c)\,(-1)^{W(r,c;u,v;N)}
\end{aligned} \tag{3}$$
where $T_{(x,y)}(r,c)$, $C_{(x+m,y+n)}(r,c)$ and $D_{(x+m,y+n)}(r,c)$ are the elements of $T_{(x,y)}$,
$C_{(x+m,y+n)}$ and $D_{(x+m,y+n)}$ at (r,c) respectively. The SAD, d, between the target block and the
candidate block is defined as:
$$d = \sum_{r=0}^{N-1}\sum_{c=0}^{N-1}\left|D_{(x+m,y+n)}(r,c)\right| \;\ge\; \frac{1}{N^2}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1}\left|c_D(u,v)\right| \tag{4}$$
We define a new matching error called the partial sum-of-absolute difference dpsa(q), which is
the sum of magnitudes of q WHT coefficients of D(x+m,y+n), i.e.
$$d_{psa}(q) = \frac{1}{N^2}\sum_{(u,v)\in S_q}\left|c_D(u,v)\right| \le d \tag{5}$$
where q≥1 and Sq is a set of q indices corresponding to q of the N^2 BPs. [2] describes how to select q of
the N^2 BPs using a zigzag path similar to that of the JPEG image coding standard so that the total energy
of the q WHT coefficients is likely to be the largest. Since dpsa(q) can capture a large proportion of d even
when q is small, we can perform early rejection of candidates based on dpsa(q). Given another cD(u',v')
corresponding to the BP B(u',v') where 0≤u',v'≤N-1, we can obtain dpsa(q+1) by updating dpsa(q)
iteratively as follows.
$$d_{psa}(q+1) = d_{psa}(q) + \frac{1}{N^2}\left|c_D(u',v')\right| \le d. \tag{6}$$
As q increases, the PSAD gets closer and closer to the SAD but the computation requirement also
increases. The candidates with PSAD greater than a threshold Tpsa are rejected from further consideration
in the next iteration so that computations are focused on candidates that are more likely to be the best
match. Our experimental results show that q=2 can provide sufficient accuracy for N=8.
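The incremental update (6) and the bound in (5) can be checked numerically with a short Python sketch. It assumes NumPy and a JPEG-style zigzag ordering as a stand-in for the path of [2]. The coefficients are computed by a full brute-force transform here, whereas the actual algorithm would compute only the q coefficients it needs. The names `walsh_matrix`, `zigzag` and `psad_sequence` are illustrative.

```python
import numpy as np

def walsh_matrix(N):
    """Sequency-ordered +/-1 Walsh-Hadamard matrix (rows sorted by sign changes)."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def zigzag(N):
    """Index pairs (u, v) along a JPEG-style zigzag path (assumed ordering)."""
    idx = [(u, v) for u in range(N) for v in range(N)]
    return sorted(idx, key=lambda t: (t[0] + t[1],
                                      t[0] if (t[0] + t[1]) % 2 else t[1]))

def psad_sequence(target, cand, q_max):
    """Return [d_psa(1), ..., d_psa(q_max)], each obtained from the previous via (6)."""
    N = target.shape[0]
    H = walsh_matrix(N)
    D = target.astype(np.int64) - cand.astype(np.int64)
    cD = H @ D @ H.T                      # all WHT coefficients of the difference
    d_psa, out = 0.0, []
    for (u, v) in zigzag(N)[:q_max]:
        d_psa += abs(int(cD[u, v])) / N**2   # add one more coefficient magnitude
        out.append(d_psa)
    return out
```

Because every term added in (6) is non-negative, the sequence never decreases, and by (4) it never exceeds the SAD of the two blocks.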
2.2 TWO-LEVEL THRESHOLD SCHEME
Our proposed algorithm rejects candidates whose PSADs are greater than a threshold Tpsa. If Tpsa
is too small, the candidate with the least SAD may be rejected, so the matching error will be large.
However, if Tpsa is too large, most of the candidates in the reference frame will remain and the
computation burden cannot be significantly reduced. We observe that both the PSAD and the SAD of
the best match candidate of a target block are small when the target block has few intensity changes. In
other words, smooth target blocks require a small threshold, called Tpsas, while high activity target blocks
need a large threshold, called Tpsah, to maintain both efficiency and accuracy. We define the activity level
La(M) of a target block as
$$L_a(M) = \sum_{(u,v)\in S_M}\left|c_T(u,v)\right| \tag{7}$$
where SM contains the indices of the first M AC coefficients along the zigzag path defined in [2]. That is,
La(M) is the sum of the magnitudes of the first M 2D WHT AC coefficients. If La(M) is larger than a
threshold Tf, the block is classified as a high activity block; otherwise it is classified as a smooth block.
M should be small so that La(M) can be computed easily. Tf should be small so that only blocks with
almost no features are classified as smooth blocks; in this way we maintain good matching performance
and speed up the search in uniform areas.
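A minimal Python sketch of the threshold selection is given below. It assumes NumPy and a JPEG-style zigzag ordering for the AC coefficients, and its default values are the experimental settings reported in Section 3 (M=3, Tf=300, Tpsas=2, Tpsah=30). The function names are illustrative, and the full transform is used only for brevity.

```python
import numpy as np

def walsh_matrix(N):
    """Sequency-ordered +/-1 Walsh-Hadamard matrix (rows sorted by sign changes)."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def select_threshold(target, M=3, t_f=300, t_psas=2, t_psah=30):
    """Return (T_psa, L_a(M)) for a square target block, following (7)."""
    N = target.shape[0]
    H = walsh_matrix(N)
    cT = H @ target.astype(np.int64) @ H.T
    # Assumed zigzag ordering; the first entry is the DC term and is skipped.
    idx = [(u, v) for u in range(N) for v in range(N)]
    zz = sorted(idx, key=lambda t: (t[0] + t[1],
                                    t[0] if (t[0] + t[1]) % 2 else t[1]))
    activity = sum(abs(int(cT[u, v])) for (u, v) in zz[1:M + 1])
    threshold = t_psah if activity > t_f else t_psas
    return threshold, activity
```

A constant block has zero activity and receives the small threshold, whereas a block containing a strong vertical edge has a large first AC coefficient and receives the large threshold.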
2.3 BLOCK MATCHING USING SADDCC
A large number of candidate blocks are rejected using the PSAD dpsa(q=2), which needs significantly
less computation than SAD. In the next step, we need to evaluate the remaining candidates with a high
accuracy level. From (6), we know that dpsa(q) gets closer to the SAD as q increases. For example, when
q=N^2/4 and Sq={(u,v) | 0≤u,v≤N/2-1}, dpsa(q=N^2/4) provides a reasonably good approximation of SAD
due to the energy compaction ability of the WHT. Generally, we can achieve a higher accuracy level using
dpsa(q) with a larger q, but the computation requirement is also larger. In this section, we propose a
new measure called Sum of Absolute Difference of DC Coefficients (SADDCC) that can attain higher
matching accuracy than dpsa(q=N^2/4) with Sq={(u,v) | 0≤u,v≤N/2-1} but requires even less computation.
Let $T^{(N)}_{(x,y)}$ and $C^{(N)}_{(x+m,y+n)}$ be the N×N square matrices representing the pixel values of
the target block and candidate block located at (x,y) in the current frame and at (x+m,y+n) in the
reference frame respectively. The superscript indicates the block size explicitly. Similarly, we denote by
$c_T^{(N)}(u,v)$ and $c_C^{(N)}(u,v)$ the WHT coefficients of $T^{(N)}_{(x,y)}$ and $C^{(N)}_{(x+m,y+n)}$
corresponding to B(u,v). From (3), the magnitude of the WHT coefficient of the N×N square matrix
$D_{(x+m,y+n)}$ corresponding to B(u,v) is
$$\left|c_D(u,v)\right| = \left|c_T^{(N)}(u,v) - c_C^{(N)}(u,v)\right| = \left|\sum_{r=0}^{N-1}\sum_{c=0}^{N-1}\left[T^{(N)}_{(x,y)}(r,c) - C^{(N)}_{(x+m,y+n)}(r,c)\right](-1)^{W(r,c;u,v;N)}\right| \tag{8}$$
where $T^{(N)}_{(x,y)}(r,c)$ and $C^{(N)}_{(x+m,y+n)}(r,c)$ are the (r,c)th elements of $T^{(N)}_{(x,y)}$ and
$C^{(N)}_{(x+m,y+n)}$ respectively. We now match the target block with the remaining blocks in the
reference frame using SADDCC. In order to define SADDCC, we divide each of these blocks into k^2
sub-blocks, where k is a positive integer power of 2. Fig. 2 shows an example where k equals 4 and Nk
equals 2. Sub-blocks of the target block and of its candidates are denoted by $T^{(N_k)}_{(x',y')}$ and
$C^{(N_k)}_{(x'+m,y'+n)}$, where x'=x+iNk, y'=y+jNk and 0≤i,j<k. Nk=N/k is the sub-block size.
Fig. 2. (a) Original candidate block with N=8. (b) The corresponding sub-blocks of size Nk by Nk, with k=4 and Nk=2.
Let $c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ be the DC coefficients of the sub-blocks
$T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$ respectively. We define the SADDCC ddcc(k) as
$$d_{dcc}(k) = \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\left|c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\right|. \tag{9}$$
In the following, we show that SADDCC is a matching error closer to SAD than
dpsa(q=N^2/4) with Sq={(u,v) | 0≤u,v≤N/2-1}. We start by stating without proof that |cD(u,v)| for
u,v∈[0,k−1], given in (8), can be expressed as (10). Note that x' and y' are functions of i and j
respectively.
$$\left|c_D(u,v)\right| = \left|\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\left[c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\right](-1)^{W(u,v;i,j;k)}\right|. \tag{10}$$
From (5) and (10), we can formulate dpsa(q=k^2) for Sq={(u,v) | 0≤u,v≤k-1} as:
$$d_{psa}(q=k^2) = \frac{1}{N^2}\sum_{u=0}^{k-1}\sum_{v=0}^{k-1}\left|\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\left[c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\right](-1)^{W(u,v;i,j;k)}\right|. \tag{11}$$
Applying Triangle Inequality on (11), we have
$$d_{psa}(q=k^2) \le \frac{1}{N^2}\sum_{u=0}^{k-1}\sum_{v=0}^{k-1}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\left|c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\right|\,\left|(-1)^{W(u,v;i,j;k)}\right|. \tag{12}$$
Since |(-1)^W(u,v;i,j;k)| equals 1 for all u, v, i, j and k, the summand in (12) does not depend on u or v,
so the right-hand side of (12) equals ddcc(k), as shown in (13).
$$d_{psa}(q=k^2) \le \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\left|c_T^{(N_k)}(0,0;x',y') - c_C^{(N_k)}(0,0;x'+m,y'+n)\right| = d_{dcc}(k) \tag{13}$$
On the other hand, expressing ddcc(k) defined in (9) in terms of pixel values, we have
$$d_{dcc}(k) = \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\left|\sum_{s=0}^{N_k-1}\sum_{t=0}^{N_k-1}\left[T^{(N_k)}_{(x',y')}(s,t) - C^{(N_k)}_{(x'+m,y'+n)}(s,t)\right]\right| \tag{14}$$
where $T^{(N_k)}_{(x',y')}(s,t)$ and $C^{(N_k)}_{(x'+m,y'+n)}(s,t)$ represent the (s,t)th pixel values of
$T^{(N_k)}_{(x',y')}$ and $C^{(N_k)}_{(x'+m,y'+n)}$ respectively. By the Triangle Inequality, (14) implies
that ddcc(k) is no larger than d, i.e.
$$d_{dcc}(k) \le \frac{k^2}{N^2}\sum_{i=0}^{k-1}\sum_{j=0}^{k-1}\sum_{s=0}^{N_k-1}\sum_{t=0}^{N_k-1}\left|T^{(N_k)}_{(x',y')}(s,t) - C^{(N_k)}_{(x'+m,y'+n)}(s,t)\right| = \frac{k^2}{N^2}\,d \le d.$$
As a result, we conclude that SADDCC ddcc(k) is a matching error closer to SAD than dpsa(q=k^2).
SADDCC is equal to SAD when k equals N. The SADDCC of each remaining candidate is evaluated, and
the one with the least SADDCC is taken as the best match of the target block.
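The chain of inequalities derived above, dpsa(q=k^2) ≤ ddcc(k) ≤ (k^2/N^2)d, can be checked numerically with the following Python sketch. It assumes NumPy and computes everything by brute force from the pixel values rather than from the WH tree. The function names are illustrative.

```python
import numpy as np

def walsh_matrix(N):
    """Sequency-ordered +/-1 Walsh-Hadamard matrix (rows sorted by sign changes)."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < N:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes)]

def sad(target, cand):
    """d in (4): sum of absolute pixel differences."""
    return np.abs(target.astype(np.int64) - cand.astype(np.int64)).sum()

def psad_low(target, cand, k):
    """d_psa(q = k^2) over S_q = {(u, v) | 0 <= u, v <= k-1}, as in (11)."""
    N = target.shape[0]
    H = walsh_matrix(N)
    cD = H @ (target.astype(np.int64) - cand.astype(np.int64)) @ H.T
    return np.abs(cD[:k, :k]).sum() / N**2

def saddcc(target, cand, k):
    """d_dcc(k) in (9): scaled sum of absolute differences of sub-block DC sums."""
    N = target.shape[0]
    Nk = N // k
    diff = target.astype(np.int64) - cand.astype(np.int64)
    # The DC coefficient of a sub-block is the sum of its pixels, so the DC
    # difference is the sum of the pixel differences inside that sub-block.
    dc = diff.reshape(k, Nk, k, Nk).sum(axis=(1, 3))
    return (k * k) / (N * N) * np.abs(dc).sum()
```

With k=N every sub-block is a single pixel, so `saddcc(T, C, N)` reduces to `sad(T, C)`, in agreement with the statement that SADDCC equals SAD when k equals N.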
SADDCC can be computed efficiently because $c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ are
intermediate results of the PSAD calculation used to reject mismatched candidates in the reference
frame. Owing to the structure of the WH tree in [1] and [2], we have already computed
$c_T^{(N_k)}(0,0;x',y')$ and $c_C^{(N_k)}(0,0;x'+m,y'+n)$ while calculating cT(0,0) and cC(0,0) for the PSAD
of each candidate. The exploitation of intermediate data avoids repeated calculations and contributes
to the high speed and accuracy of our fast block matching. For example, Fig. 3 shows the WH tree for
finding the first and second coefficients of the reference frame when N=4 and k=2. The black dot
represents cC(0,0) of a candidate in the reference frame, and the gray dots represent the corresponding
$c_C^{(N_k)}(0,0;x'+m,y'+n)$. In computing cC(0,0) (i.e. the black dot) along the leftmost path, we have
already calculated $c_C^{(N_k)}(0,0;x'+m,y'+n)$ of the corresponding sub-blocks (i.e. the gray dots).
Fig. 3. WH tree for N=4 and k=2
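The reuse of intermediate DC values can be illustrated with a simplified Python sketch. It is not the WH tree of [1][2]; it mimics only the leftmost (DC) path by repeatedly combining window sums of size w into window sums of size 2w for every position in the frame. NumPy and the name `dc_sums` are assumptions made for this illustration.

```python
import numpy as np

def dc_sums(frame, N):
    """Return {w: S_w} for w = 1, 2, 4, ..., N.

    S_w[y, x] is the sum (DC coefficient) of the w-by-w window whose top-left
    pixel is at row y, column x. Each level is built from the previous one,
    so the sub-block DC values needed for SADDCC are by-products of computing
    the N-by-N DC values.
    """
    S = {1: frame.astype(np.int64)}
    w = 1
    while w < N:
        a = S[w]
        h, wd = a.shape
        # A 2w-by-2w window is the union of four w-by-w windows.
        S[2 * w] = (a[:h - w, :wd - w] + a[w:, :wd - w]
                    + a[:h - w, w:] + a[w:, w:])
        w *= 2
    return S
```

For N=4 and k=2, `S[4][y, x]` plays the role of cC(0,0) of the candidate at that position, and the four values `S[2][y, x]`, `S[2][y+2, x]`, `S[2][y, x+2]` and `S[2][y+2, x+2]` are the sub-block DC coefficients that were produced on the way.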
3. EXPERIMENTAL RESULTS
The performance of the proposed algorithm has been evaluated with six standard test sequences,
namely Football, Foreman, Stefan, Akiyo, Children, and Mother&Daughter. We compare our method
with TSS and DS. In the experiment, the block size is 8×8 (i.e. N=8) and maximum search range R=16.
We observed that around 70% of candidates were rejected using PSAD with two coefficients.
Experimental results show that the use of PSAD with 3 coefficients cannot reject many more candidates
so the number of coefficients in PSAD was chosen to be two in the following experiments. Other
experimental parameters are Tpsas=2, Tpsah=30, M=3 and Tf=300.
3.1. MSE PERFORMANCE
Table 1 shows the average MSE over 30 frames of the six sequences using different algorithms,
and Fig. 4 shows the MSE of each frame. The performance of FWS in terms of MSE is very close to that
of FSBM, and it is significantly better than that of fast algorithms such as TSS and DS. The proposed
FWS has significant MSE improvements in fast motion scenes, such as Stefan and Football, because TSS
and DS are easily trapped in local minima in these sequences. In contrast, FWS examines all
candidates in the reference frame, so it avoids being trapped in a local minimum. FWS can be regarded
as an FSBM with an approximated SAD.
3.2. COMPUTATION REQUIREMENTS
The computation times of the different algorithms are shown in Table 1. We ran the tests on a
Pentium IV 1.7GHz computer. In general, the computation time of TSS is about 5% of that of FSBM, while
that of DS varies from sequence to sequence. In the Akiyo and Foreman sequences, DS is faster than TSS
because the motion in these scenes is relatively small. However, in the Football sequence, there is a large
translation in the background and each player moves vigorously; therefore, DS is slower than TSS.
Although the computation time of our algorithm is slightly higher than that of the fast algorithms, the
gain in matching accuracy is very significant, as shown in Fig. 4.
Sequence         FSBM MSE  FSBM Time [s]  TSS MSE  TSS Time [s]  DS MSE  DS Time [s]  FWS MSE  FWS Time [s]
Football         142.58    25.16          260.31   1.84          347.19  1.90         153.27   2.73
Foreman          28.37     7.28           37.94    0.59          33.57   0.44         29.84    0.72
Stefan           134.45    25.36          302.86   1.91          365.82  1.36         148.65   2.63
Akiyo            3.91      25.53          4.57     1.75          4.00    1.25         4.12     2.65
Children         50.31     25.21          80.84    1.80          64.98   1.25         56.53    2.72
Mother&Daughter  18.23     6.61           22.24    0.53          22.06   0.425        20.35    0.71
Table 1. Average MSE of different methods
4. CONCLUSIONS
In this paper, we proposed a fast block matching method, Fast Walsh Search, which transforms
the target blocks in the current frame and their candidates in the reference frame into WHT
coefficients, so that most mismatches can be rejected effectively at an early stage with PSAD. We further
proposed using SADDCC, obtained from the intermediate results of the recursive WH tree, to search for
the best match among the remaining candidates. These measures significantly reduce the computation in
the block matching process while maintaining high matching accuracy. Experimental results show that
the computation requirement of FWS is similar to that of TSS, while its accuracy in terms of MSE is
significantly better than that of TSS and DS and comparable to that of FSBM. A demo program of FWS
can be found at
http://vspc.ee.cuhk.edu.hk/project/fws/
Fig. 4. Frame MSE for test sequences: (a) Football, (b) Foreman, (c) Stefan, (d) Akiyo, (e) Children, (f) Mother&Daughter
REFERENCES
[1] Y. Hel-Or and H. Hel-Or, "Real time pattern matching using projection kernels," Proc. of Ninth IEEE
International Conference on Computer Vision, Vol. 1, Oct. 2003, pp. 1486-1493.
[2] Y. Hel-Or and H. Hel-Or, "Real time pattern matching using projection kernels," IEEE Trans. Pattern
Analysis and Machine Intelligence, Vol. 27, No. 9, Sept. 2005, pp. 1430-1445.
[3] W. K. Pratt, J. Kane, and H. C. Andrew, "Hadamard transform image coding," IEEE Proc., Vol. 57, Jan.
1969, pp. 58-68.
[4] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion compensated interframe coding for
video conferencing," in Proc. Nat. Tele. Conf., New Orleans, LA, Nov. 1981, pp. G5.3.1-5.3.5.
[5] Lai-Man Po and Wing-Chung Ma, "A novel four-step search algorithm for fast block motion
estimation," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 3, June 1996,
pp. 313-317.
[6] Reoxiang Li, Bing Zeng, and M. L. Liou, "A new three-step search algorithm for block motion
estimation," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, No. 4, Aug. 1994,
pp. 438-442.
[7] Shan Zhu and Kai-Kuang Ma, "A new diamond search algorithm for fast block-matching motion
estimation," IEEE Trans. on Image Processing, Vol. 9, No. 2, Feb. 2000, pp. 287-290.
[8] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding
standard," IEEE Trans. Circuits Syst. Video Technol., Vol. 13, Jul. 2003, pp. 560-576.