2012 1st International Conference on Future Trends in Computing and Communication Technologies
Performance Analysis of Overlapped Motion Compensated Temporal Interpolation
using Open Multiprocessing
Madiha Sher (1), Muhammad Asif Manzoor (2), Munaza Sher (1), Nasru Minallah (1)
(1) Department of Computer Systems Engineering, University of Engineering & Technology, Peshawar, Pakistan
(2) Department of Computer Engineering, Umm Al-Qura University, Makkah, Kingdom of Saudi Arabia
madiha@nwfpuet.edu.pk, mamanzoor@uqu.edu.sa, engrmunaza@yahoo.com, nasru.minallah@nwfpuet.edu.pk
Abstract: Motion compensation is an essential part of many video compression techniques. Video compression is required for both the storage and the transmission of video. Motion compensation is a computationally complex and data-intensive process. Overlapped Motion Compensated Temporal Interpolation (OMCTI) is a block-based approach for the temporal interpolation of skipped frames. It generates interpolated frames with considerably improved quality. In this paper, an Open Multiprocessing (OpenMP) based multithreaded solution is proposed to reduce the computation time of Overlapped Motion Compensated Temporal Interpolation. The OpenMP-based solution is tested on a multi-core processor to evaluate its performance. The paper concludes with a discussion of the experimental results.
Index terms: Motion Compensated Temporal Interpolation (MCTI), Overlapped Motion Compensated Temporal Interpolation (OMCTI), Block based search, OpenMP, Multithreading.
I. INTRODUCTION
Many modern multimedia applications require low bit rate transmission of video sequences. This limitation is imposed by the limited bandwidth of the transmission channel. Video compression plays a very significant role in these multimedia applications. Every digitized video contains a substantial amount of redundant data, and compression can be achieved by taking advantage of these redundancies. The redundant data contained in a video is generally classified into two categories: statistical redundancy and subjective redundancy. The objective of video compression is to get rid of both spatial and temporal domain redundancies. Motion compensation is widely used in video compression because of its capability to exploit the high temporal correlation between consecutive frames of a video sequence. To meet a low bit rate encoding requirement, temporal subsampling is a very valuable technique, and it can be combined with any video compression method to gain a very high compression ratio. The skipped frames must be reconstructed at the receiver end in order to achieve a high frame rate there. Simple frame reconstruction methods such as frame repetition at the receiver end can produce undesirable results in the output video sequence.
Motion Compensated Temporal Interpolation [1] (MCTI) is a technique proposed by Chi-Kong Wong et al. for the generation of the skipped frames. MCTI is a block-based motion compensation algorithm. In MCTI, block motions are compensated by tracking the blocks between successive received frames at the receiver end. The trajectory of each block is calculated, and the blocks are then placed at the appropriate locations in the interpolated frames according to the calculated trajectories. Because MCTI is a block-based algorithm, the inserted frames tend to be blocky. To reduce this blocky effect, an Overlapped Motion Compensated Temporal Interpolation [2] (OMCTI) technique is proposed by Chi-Kong Wong et al. The main disadvantage of these motion compensation algorithms is their massive computational requirements.
Parallel programming approaches can be used to overcome the massive computational requirements of motion compensation. Multithreading is an efficient parallel programming model used to improve the computing capability of the underlying system. OpenMP (OMP) is a parallel programming API which is used to develop multithreaded applications for shared memory architecture systems. These systems can have one or multiple cores. In this paper, we discuss the OpenMP-based multithreaded implementation of Overlapped Motion Compensated Temporal Interpolation for the construction of skipped frames.
The rest of the paper is organized as follows. Section II reviews the state-of-the-art literature, including an overview of some motion compensation algorithms and some OpenMP applications. MCTI, OMCTI and the parallel implementation are discussed in Section III. The experimental results are discussed in Section IV. Finally, Section V concludes this work.
II. LITERATURE REVIEW
In [3], a zonal algorithm is used for motion compensation. This algorithm requires less computation and produces efficient results. The predicted frame is constructed from the current frame and the previous frame by tracking the macroblocks of the previous frame in the current frame. To improve the interpolated frame, Multihypothesis Motion Compensated Temporal Interpolation is used. The proposed technique reduces artifacts by identifying whether the examined frame can be predicted or not.
Wavelet-based compression produces better subjective quality compared to block transform coding, as it does not produce blocking artifacts [4]. In this approach, frames are split into blocks and a 2D motion vector is computed per block. For motion estimation, the center frame of the group is split into moving regions that have similar motion. Prediction of the frames adjacent to the central frame is based on these segments.
The use of a background frame as a reference frame to improve the accuracy of motion compensation for video compression is presented in [5]. Among a certain number of consecutive frames, the blocks that have remained unchanged are assembled into a background frame. Moving objects do not influence the background frame, so better prediction is obtained when it is used as a reference along with the traditional previous frame. This algorithm also increases the coding efficiency.
Temporal scalability is not supported where there is no motion, as it results in blurring of the low frequency frames [6]. The computational complexity of Motion Compensated Temporal Filtering (MCTF) can be reduced by applying MCTF only to those blocks which have some motion and applying temporal filtering to the remaining blocks. Temporal filtering is applied along the motion trajectory to remove the blurring effect in the low frequency frames and to reduce the energy in the high frequency frames, which results in compression efficiency.
In [7], simulation results for various motion compensation algorithms (blot search, cross search, logarithmic search) are obtained. On the basis of the results obtained from a software reference model, blot search proves to be ideal for hardware implementation on an FPGA. In [7], an RS232 interface with a maximum baud rate of 115200 bps is used for the experimental setup. The designed motion estimation unit is able to process 55 frames/sec of High Definition TV (HDTV) at a resolution of 1920 x 1080 pixels, at a frequency of 116 MHz and a power consumption of 543 mW.
In [8], a pipelined implementation of the Gauss-Jordan algorithm is done using OpenMP programming. In this implementation, among p threads each core is assigned [n/p] contiguous rows of the matrix, where n is the number of cores. Each thread executes n successive steps of the algorithm. The pipelined implementation proves to be better than other parallel versions of Gauss-Jordan such as RowBlock and RowCyclic.
With multicore technology being widely used, multithreading methods are used to improve the computation capacity by executing multiple threads of the same application on different cores. Chen Lin et al. [9] have discussed a multithreaded PDE-based image registration process using OpenMP on a dual core computer. In this work, the image is divided into four sub images and every sub image is treated as an independent task to be performed on a separate core to improve the efficiency of the underlying system.
III. METHODOLOGY
A. Motion Compensated Temporal Interpolation
MCTI is a technique used for up-sampling the received video sequence at the destination in order to interpolate the skipped frames. Generally, for the sake of improved compression efficiency, frames are skipped to meet the low bit rate requirement of the transmitted video sequence. The interpolated frames are inserted in between the successively received video frames. This insertion increases the effective frame rate of the received video, which results in smoother object motion. The MCTI algorithm assumes that the motion of an object is translational and quite slow, so that between successive video frames the object motion is almost linear with respect to time. For any kth frame, the objective of temporal interpolation is to construct a new frame to be added between the present kth and previous (k-1)th received frames at the decoder side, as shown in Fig. 1 [1].
In MCTI, all three frames (the inserted frame, the present kth frame and the previous (k-1)th frame) are partitioned into blocks of fixed size N x N. A motion vector is calculated for every block. To find this motion vector, backward and forward block-based motion estimation is performed. The Mean Absolute Difference (MAD) is selected as the matching criterion for the motion estimation of each block because of its simplicity. For a block A situated at position (x, y) in the currently received kth frame, a search area of size (2W + 1) x (2W + 1) is defined in the previously received (k-1)th frame. A motion search is performed in this search area with MAD as the distortion measure. The MAD between the currently received frame block at position (x, y) and the previously received frame block at position (x + m, y + n) is defined as:
$$\mathrm{MAD}_{(x,y)}(m,n) = \frac{1}{N^2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left| f_k(x+i,\,y+j) - f_{k-1}(x+m+i,\,y+n+j) \right| \qquad (1)$$
where f_k(i, j) denotes the pixel intensity at position (i, j) in the kth frame.
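As a rough illustration of the MAD criterion and the window search described in Section III-A, the following C sketch computes the MAD for one candidate displacement and scans the (2W + 1) x (2W + 1) search area for the minimum. The `Frame` struct, the 8-bit row-major pixel layout, the function names and the rule of skipping candidates that fall outside the frame are assumptions made for this example; the paper does not specify them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical frame layout: 8-bit luminance, row-major, width x height. */
typedef struct {
    const unsigned char *pix;
    int width;
    int height;
} Frame;

/* MAD between the N x N block at (x, y) in cur and the block at
 * (x + m, y + n) in prev. Returns -1.0 if the displaced block would
 * fall outside prev (an assumed boundary rule). */
double mad_block(const Frame *cur, const Frame *prev,
                 int x, int y, int m, int n, int N)
{
    assert(N > 0);
    assert(x >= 0 && y >= 0 && x + N <= cur->width && y + N <= cur->height);
    int px = x + m, py = y + n;
    if (px < 0 || py < 0 || px + N > prev->width || py + N > prev->height)
        return -1.0;
    long sum = 0;
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            int a = cur->pix[(y + j) * cur->width + (x + i)];
            int b = prev->pix[(py + j) * prev->width + (px + i)];
            sum += abs(a - b);
        }
    }
    return (double)sum / (double)(N * N);
}

/* Exhaustive scan of the (2W+1) x (2W+1) search area. Writes the
 * displacement with the smallest MAD to (*best_m, *best_n) and
 * returns that MAD. */
double full_search(const Frame *cur, const Frame *prev,
                   int x, int y, int N, int W, int *best_m, int *best_n)
{
    double best = -1.0;
    *best_m = 0;
    *best_n = 0;
    for (int n = -W; n <= W; n++) {
        for (int m = -W; m <= W; m++) {
            double d = mad_block(cur, prev, x, y, m, n, N);
            if (d < 0.0)
                continue;               /* candidate outside the frame */
            if (best < 0.0 || d < best) {
                best = d;
                *best_m = m;
                *best_n = n;
            }
        }
    }
    return best;
}
```

Calling `full_search` once per block of the current frame yields one motion vector per block. Ties are resolved in favour of the first candidate visited, which is an arbitrary choice for this sketch.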
Figure 1. Motion Estimation and Interpolation
The block having the minimum MAD_(x, y)(m, n) over all locations (m, n) within the defined (2W + 1) x (2W + 1) search area is considered the best matched block.
Thus the motion vector is calculated for every block of the inserted frame, and motion compensation is then used to construct the interpolated frame.
B. Block Based Search for Motion Vector Estimation
A block-based search algorithm determines the extent of displacement of an object on a block-by-block basis. For every block in the current kth frame, a block is matched from the previous (k-1)th frame according to a certain criterion. In the case of MCTI, the Mean Absolute Difference is used as the selection criterion.
Full search is the most basic algorithm for block-based motion compensation. In full search, every block of the current kth frame is compared with every possible block in the previous (k-1)th frame within the fixed-size window, and the best match is found. Although the full search algorithm gives the best Peak Signal-to-Noise Ratio among all the block-based search algorithms and is very simple in nature, it is the worst algorithm in terms of the computation required.
There are several other algorithms that reduce the computation time compared to full search at the expense of degraded quality. These are sub-optimal block-based search algorithms. One of them is the Diamond Search algorithm [10], which uses a diamond shape as its basic search pattern. The algorithm can take an unlimited number of steps to reach the solution. Two fixed patterns are used: the Large Diamond Search Pattern (LDSP) and the Small Diamond Search Pattern (SDSP). Fig. 2 [11] shows the two patterns and the search method.
Figure 2. Diamond Search Algorithm
The Diamond Search algorithm works as follows [11]:
1. Initially the LDSP is centered at the origin of the fixed-size search window, and the 9 points of the LDSP are tested. If the minimum MAD point is located at the center, go to Step 3; otherwise, go to Step 2.
2. The minimum MAD point found in the previous search step is re-positioned as the center point to form a new LDSP. If the new minimum MAD point is located at the center position, go to Step 3; otherwise, repeat Step 2.
3. Switch the search pattern from LDSP to SDSP. The minimum MAD point found in this step is the final motion vector, which points to the best matched block.
C. Overlapped Motion Compensated Temporal Interpolation
Frames interpolated using MCTI contain defects. Since MCTI is a block-based temporal interpolation scheme, it depends on only one motion vector for every block, and this motion vector is estimated using the MAD between the best matched blocks of the currently received frame and the previously received frame, as discussed in the previous section. So every block is created without considering its neighboring blocks. This dependence on a single motion vector yields the blocky effect in the newly constructed frame. To overcome this problem, overlapped motion compensated temporal interpolation is used. OMCTI uses the motion vector of the block itself along with the motion vectors of its neighboring blocks to interpolate the blocks of the inserted frame. A block A is accompanied by 4 neighboring blocks B, C, D and E, as shown in Fig. 3 [2].
Figure 3. Sub Blocks for overlapped MCTI
Let us assume that their corresponding motion vectors are calculated using MCTI and are represented by Va, Vb, Vc, Vd and Ve respectively. Every block is partitioned into 4 sub blocks. Block A is divided into A1, A2, A3 and A4. The block's own motion vector and the motion vectors of two adjacent blocks are used for the interpolation of a sub block in the inserted frame. Weighted masks are used in such a way that the weight of the block's own motion vector is the highest and the sum of all the weights for a sub block is unity. Va, Vb and Vc motion vectors
are used for the prediction of sub block A1. Similarly, the motion vectors Va, Vc and Vd, along with their respective weights, are used for the interpolation of sub block A2. This overlapping scheme reduces the blocky effect in the resulting video.
D. Parallel Implementation
Video coding techniques are very important for removing redundant information from video in order to reduce its memory requirements for transmission and storage. Motion compensation is a central element of different video codecs. Motion compensation algorithms rely on different searching algorithms to find the best match between two consecutive frames received at the destination and to generate an intermediate frame that increases the frame rate of the received video. The searching algorithms for block matching are the computationally expensive portion of motion compensation, and these computational requirements make it impractical [1], [12]. In this paper, we propose a parallel implementation of the search algorithms using OpenMP in order to reduce the computation time. OpenMP is a parallel programming API which is used to develop multithreaded applications for shared memory architecture systems. In this approach, multiple threads are created for searching; each thread performs the search on different blocks to find the best match, and the results are combined after all threads have completed. In this way the search is performed concurrently, resulting in an overall reduction in computation time.
The main theme of this work is to implement the basic algorithm of Overlapped Motion Compensated Temporal Interpolation using the OpenMP API to reduce its computation time. Performance in terms of timing is improved by this OpenMP-based technique without any modification to the algorithm. As discussed, the search algorithms used for temporal interpolation are the computationally intensive part of the algorithm; hence only the search algorithms are implemented using OpenMP directives. The searching process is handled by multiple threads, while the rest of the algorithm (before and after searching) is executed by a single thread. The number of threads varies from two to five. The time required by this multithreaded approach is compared with the time required by the sequential single-threaded approach. The change in required time is expressed as a percentage to indicate the performance improvement.
IV. EXPERIMENTAL RESULTS
A. System Platform and Experimental Process
For our experimental evaluation we used an Intel(R) Core i3-2310M CPU with two processor cores, a 2.10 GHz clock speed and 3 GB of memory. The system ran Windows 7 and Microsoft Visual Studio 2010 Ultimate. All programs were implemented in C using the OpenMP interface.
Several sets of test videos were used to evaluate the performance of the parallel algorithm. To compare the parallel algorithm, the practical execution time was used as a measure. Practical execution time is the total time in seconds an algorithm needs to complete the computation. To decrease random variation, the execution time was measured as an average of 10 runs.
B. Evaluation and Analysis of Performance Model
The proposed technique is applied to various video sequences of different formats, such as QCIF and CIF, to depict the actual performance improvement. The performance is improved by the OpenMP-based multithreaded implementation of the block search algorithm. Both search algorithms, full search and diamond search, are used for the evaluation. Creation of threads also incurs some computational cost. Different numbers of threads are generated to further evaluate the impact of multithreading on performance. The performance improvements for the different video sequences are shown in Fig. 4 to Fig. 7.
Figure 4. CIF format, Diamond search
Figure 5. QCIF format, Diamond search
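The multithreaded block search of Section III-D and the percentage measure used in the evaluation can be sketched roughly as follows. This is an illustrative sketch rather than the authors' code: the function names, the `MotionVector` struct, the 8-bit row-major frame layout, the static schedule and the boundary handling are all assumptions. Only the search loop over blocks carries an OpenMP directive, matching the description that the rest of the algorithm runs on a single thread. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs sequentially.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int m, n; } MotionVector;   /* hypothetical result type */

/* MAD between the N x N block at (x, y) in cur and (x+m, y+n) in prev. */
static double block_mad(const unsigned char *cur, const unsigned char *prev,
                        int width, int x, int y, int m, int n, int N)
{
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += abs((int)cur[(y + j) * width + (x + i)] -
                       (int)prev[(y + n + j) * width + (x + m + i)]);
    return (double)sum / (double)(N * N);
}

/* Full search for one block; candidates outside the frame are skipped. */
static MotionVector search_block(const unsigned char *cur,
                                 const unsigned char *prev,
                                 int width, int height,
                                 int x, int y, int N, int W)
{
    MotionVector best = { 0, 0 };
    double best_mad = block_mad(cur, prev, width, x, y, 0, 0, N);
    for (int n = -W; n <= W; n++) {
        for (int m = -W; m <= W; m++) {
            if (x + m < 0 || y + n < 0 ||
                x + m + N > width || y + n + N > height)
                continue;
            double d = block_mad(cur, prev, width, x, y, m, n, N);
            if (d < best_mad) {
                best_mad = d;
                best.m = m;
                best.n = n;
            }
        }
    }
    return best;
}

/* One motion vector per N x N block. Each block is searched independently,
 * so the loop iterations can be shared among `threads` OpenMP threads;
 * every iteration writes only its own mv[b] entry. */
void estimate_motion(const unsigned char *cur, const unsigned char *prev,
                     int width, int height, int N, int W,
                     int threads, MotionVector *mv)
{
    assert(N > 0 && threads > 0);
    int bx = width / N, by = height / N;
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (int b = 0; b < bx * by; b++) {
        int x = (b % bx) * N;
        int y = (b / bx) * N;
        mv[b] = search_block(cur, prev, width, height, x, y, N, W);
    }
}

/* Improvement of the multithreaded time over the sequential time,
 * as a percentage of the sequential time. */
double percent_improvement(double t_seq, double t_par)
{
    assert(t_seq > 0.0);
    return 100.0 * (t_seq - t_par) / t_seq;
}
```

For example, a sequential time of 10 s and a multithreaded time of 7 s give an improvement of 30%. Because each block's result depends only on the two input frames, the motion vectors are identical for any thread count; only the execution time changes.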
Figure 6. CIF format, Full Search
Figure 7. QCIF format, Full Search
The percentage performance improvement is shown on the y-axis, while the number of threads created is shown on the x-axis. The figures show that the proposed solution gives the best performance when three or four threads are created. The first two figures show the performance improvement using the diamond search algorithm, whereas the next two figures show the performance improvement using the full search algorithm. As full search is computationally more expensive, its performance improves more.
Fig. 4 shows the result of the proposed solution for the CIF video format using the diamond search algorithm. Fig. 5 shows the result for the QCIF video format using the diamond search algorithm. Fig. 6 shows the result for the CIF video format using the full search algorithm. Fig. 7 shows the result for the QCIF video format using the full search algorithm.
The figures show the percentage improvement of the OpenMP-based approach when it is executed with different numbers of threads. This percentage is calculated as the difference between the execution time of the sequential application and that of the multithreaded application, divided by the time required by the single-threaded application. The best results are achieved by using three or four threads. The proposed approach shows approximately 30% improvement when it is implemented using diamond search as the searching algorithm and about 50% improvement when it is implemented using the full search algorithm. The performance is improved in every case, but the magnitude of the improvement differs, and the main reason for this difference is the number of threads. When the number of generated threads is low, the application cannot fully utilize the multi-core architecture. But when the number of threads is increased, a major portion of the time is consumed by the creation of the threads and context switching between them, making the multithreading approach an overhead. Hence the number of threads must balance both problems; it can be neither very low nor very high. The optimal number of threads for our work is three to four.
V. CONCLUSION
In this paper, we have proposed an OpenMP-based multithreaded implementation of Overlapped Motion Compensated Temporal Interpolation. Motion Compensated Temporal Interpolation produces a blocky effect in the constructed video. Overlapping reduces this blocky effect, which results in better video output. With the inclusion of the multithreading approach, the computation time is reduced. Although some time is required for the creation of threads, the overall performance enhancement is sufficient to justify this overhead. The proposed scheme is tested on various video sequences of different formats and the results are presented in the paper.
REFERENCES
[1] Chi-kong Wong and Oscar C. Au, “Fast Motion Compensated Temporal Interpolation for Video”, Proc. SPIE: Visual Communication and Image Processing, 1995.
[2] Chi-kong Wong, Oscar C. Au, and C. W. Tang, “Motion Compensated Temporal Interpolation for video”, Proc. of IEEE ISCAS, 1996.
[3] Alexis M. Tourapis, Hye-Yeon Cheong, Ming L. Liou, and Oscar C. Au, “Temporal Interpolation Of Video Sequences Using Zonal Based-Algorithms”, Proc. of Conference on Image Processing, 2001.
[4] Danny Lazar and Amir Averbuch, “Wavelet Video Compression Using Region Based Motion Estimation And Compensation”, Proc. of IEEE International Conference on Acoustics, Speech and Video Processing, 2001.
[5] Rong Ding, Qionghai Dai, Wenli Xu, Dongdong Zhu and Hao Yin, “Background-frame Based Motion Compensation for Video Compression”, Proc. of IEEE International Conference on Multimedia and Expo, 2004.
[6] Karunakar A. K. and Manohara Pai M. M., “Computationally Efficient MCTF for Scalable Video Coding”, Proc. of IEEE International Conference on Advance Computing and Communication, 2006.
[7] Vikram Arkalgud Chandrasetty and Shridhar R Laddh, “A Novel Dual Processing Architecture for Implementation of Motion Estimation Unit of H.264 AVC on FPGA”, IEEE Symposium on Industrial Electronics and Applications, 2009.
[8] Panagiotis D. Michailidis and Konstantinos G. Margaritis, “Open Multi Processing (OpenMP) of Gauss-Jordan Method for Solving System of Linear Equations”, Proc. of IEEE 11th International Conference on Computer and Information Technology, 2011.
[9] Lin Yang, Leiguang Gong, Hong Zhang, John L. Nosher, and David J. Foran, “A Multicore Based Parallel Image Registration Method”, Proc. of 31st Annual International Conference of the IEEE EMBS, 2009.
[10] Shan Zhu and Kai-Kuang Ma, “A New Diamond Search Algorithm for Fast Block-Matching Motion Estimation”, IEEE Transactions on Image Processing, 2000.
[11] Sherief M. Hashimaa, Imbaby I. Mahmoud and Atef A. Elazm, “Hardware Implementation of Diamond Search Algorithm for Motion Estimation and Object Tracking”, Proc. of the 7th Conference on Nuclear and Particle Physics, 2009.
[12] C. W. Tang and O. C. Au, “Unidirectional Motion Compensated Temporal Interpolation”, IEEE International Symposium on Circuits and Systems, 1997.