					            2012 1st International Conference on Future Trends in Computing and Communication Technologies



     Performance Analysis of Overlapped Motion Compensated Temporal Interpolation
                                          using Open Multiprocessing

                    Madiha Sher (1), Muhammad Asif Manzoor (2), Munaza Sher (1), Nasru Minallah (1)

(1) Department of Computer Systems Engineering, University of Engineering & Technology, Peshawar, Pakistan
(2) Department of Computer Engineering, Umm Al-Qura University, Makkah, Kingdom of Saudi Arabia
       madiha@nwfpuet.edu.pk, mamanzoor@uqu.edu.sa, engrmunaza@yahoo.com, nasru.minallah@nwfpuet.edu.pk


Abstract -- Motion Compensation is an essential part of many video compression techniques. Video
compression is required for the storage and transmission of videos. Motion Compensation is a
computationally complex and data-intensive process. Overlapped Motion Compensated Temporal
Interpolation (OMCTI) is a block based approach for the temporal interpolation of skipped frames.
It generates interpolated frames with considerably improved quality. In this paper, an Open
Multiprocessing (OpenMP) based multithreaded solution is proposed to reduce the computation time
of Overlapped Motion Compensated Temporal Interpolation. The OpenMP based solution is tested on a
multi-core processor to evaluate its performance. The paper concludes with a discussion of the
experimental results.

    Index terms -- Motion Compensated Temporal Interpolation (MCTI), Overlapped Motion Compensated
Temporal Interpolation (OMCTI), Block based search, OpenMP, Multithreading.

                   I.   INTRODUCTION

    Many modern multimedia applications require low bit rate transmission of video sequences. This
limitation is imposed by the limited bandwidth of the transmission channel. Video compression
plays a very significant role in these multimedia applications. Every digitized video contains a
substantial amount of redundant data, and compression can be achieved by taking advantage of
these redundancies. The redundant data contained in a video is generally classified into two
categories: statistical redundancy and subjective redundancy. The objective of video compression
is to get rid of both spatial and temporal domain redundancies. Motion Compensation is widely
used in video compression because of its capability to exploit the high temporal correlation
between consecutive frames of a video sequence. To meet the low bit rate encoding requirement,
temporal subsampling is a very valuable technique, and it can also be combined with any video
compression method to gain a very high compression ratio. The skipped frames must be
reconstructed at the receiver end in order to achieve a high frame rate there. Simple frame
reconstruction methods such as frame repetition at the receiver end can produce undesirable
results in the output video sequence.

    Motion Compensated Temporal Interpolation [1] (MCTI) is a technique proposed by Chi-Kong Wong
et al. for the generation of the skipped frames. MCTI is a block based Motion Compensation
algorithm. In MCTI, block motions are compensated by tracking the blocks between successive
received frames at the receiver end. The trajectory of each block is calculated, and the blocks
are then placed at the appropriate locations in the interpolated frames according to the
calculated trajectories. MCTI is a block based algorithm, so the inserted frames tend to be
blocky. To reduce this blocky effect, the Overlapped Motion Compensated Temporal Interpolation
[2] (OMCTI) technique was proposed by Chi-Kong Wong et al. The main disadvantage of these motion
compensation algorithms is their massive computational requirements.

    Parallel programming approaches can be used to overcome the massive computational
requirements of motion compensation. Multithreading is an efficient parallel programming model
used to improve the computing capability of the underlying system. OpenMP (OMP) is a parallel
programming API used to develop multithreaded applications for shared memory architecture
systems. These systems can have one or multiple cores. In this paper, we discuss the OpenMP based
multithreaded implementation of Overlapped Motion Compensated Temporal Interpolation for the
construction of skipped frames.

    The rest of the paper is organized as follows. Section II reviews the state-of-the-art
literature and gives an overview of some motion compensation algorithms and some OpenMP
applications. MCTI, OMCTI and the parallel implementation are discussed in Section III. The
experimental results are discussed in Section IV. Finally, Section V concludes this work.






                II.   LITERATURE REVIEW

    In [3], a zonal algorithm is used for motion compensation. This algorithm requires less
computation and produces efficient results. The predicted frame is constructed from the current
frame and the previous frame by tracking the macroblocks of the previous frame in the current
frame. To improve the quality of the interpolated frame, Multihypothesis Motion Compensated
Temporal Interpolation is used. The proposed technique reduces artifacts by identifying whether
the examined frame can be predicted or not.

    Wavelet based compression produces better subjective quality than block transform coding, as
it does not produce blocking artifacts [4]. In this approach, frames are split into blocks and a
2D motion vector is computed per block. For motion estimation, the center frame of the group is
split into moving regions that have similar motion. Prediction of the frames adjacent to the
central frame is based on these segments.

    The use of a background frame as a reference frame to improve the accuracy of motion
compensation for video compression is presented in [5]. The blocks that remain unchanged across
a certain number of consecutive frames are assembled into the background frame. Moving objects
do not influence the background frame, so better prediction is obtained when it is used as a
reference along with the traditional previous frame. This algorithm also increases the coding
efficiency.

    Temporal scalability is not supported where there is no motion, as this results in blurring
of the low frequency frames [6]. The computational complexity of Motion Compensated Temporal
Filtering (MCTF) can be reduced by applying MCTF only to those blocks which have some motion,
and applying temporal filtering to the remaining blocks. Temporal filtering is applied along the
motion trajectory to remove the blurring effect in the low frequency frames and to reduce the
energy in the high frequency frames, which improves compression efficiency.

    In [7], simulation results are obtained for various motion compensation search algorithms
(blot search, cross search, logarithmic search). On the basis of results obtained from a software
reference model, blot search proves to be ideal for hardware implementation on an FPGA. An RS232
interface with a maximum baud rate of 115200 bps is used for the experimental setup in [7]. The
designed motion estimation unit is able to process 55 frames/sec of High Definition TV (HDTV) at
a resolution of 1920 x 1080 pixels, at a frequency of 116 MHz and a power consumption of 543 mW.

    In [8], a pipelined implementation of the Gauss-Jordan algorithm is developed using OpenMP.
In this implementation, each of the p threads is assigned [n/p] contiguous rows of the matrix,
where n is the number of matrix rows. Each thread executes n successive steps of the algorithm.
The pipelined implementation proves to be better than other parallel versions of Gauss-Jordan
such as Rowblock and RowCyclic.

    With multicore technology being widely used, multithreading methods are used to improve
computation capacity by executing multiple threads of the same application on different cores.
Lin Yang et al. [9] have discussed a multithreaded PDE-based image registration process using
OpenMP on a dual core computer. In this work, the image is divided into four sub-images and
every sub-image is treated as an independent task to be performed on a separate core to improve
the efficiency of the underlying system.

                III.   METHODOLOGY

A. Motion Compensated Temporal Interpolation
    MCTI is the technique used for up-sampling the received video sequence at the destination in
order to interpolate the skipped frames. Generally, for improved compression efficiency, frames
are skipped to meet the low bit rate requirement of the transmitted video sequence. The
interpolated frames are inserted between the successively received video frames. This insertion
increases the effective frame rate of the received video, which results in smoother object
motion. The MCTI algorithm assumes that the motion of an object is translational and quite slow,
hence between successive video frames the object motion is almost linear with respect to time.
For any frame k, the objective of temporal interpolation is to construct a new frame to be added
between the present kth and previous (k-1)th received frames at the decoder side, as shown in
Fig. 1 [1].

    In MCTI, all three frames (the inserted frame, the present kth frame and the previous (k-1)th
frame) are partitioned into blocks of fixed size N x N. A motion vector is calculated for every
block. To find this motion vector, backward and forward block based motion estimation is
performed. The Mean Absolute Difference (MAD) is selected as the matching criterion for the
motion estimation of each block because of its simplicity. For a block A situated at position
(x, y) in the currently received kth frame, a search area of size (2W + 1) x (2W + 1) is defined
in the previously received (k-1)th frame. A motion search is performed in this search area with
MAD as the distortion measure. The MAD between the currently received frame block at position
(x, y) and the previously received frame block at position (x + m, y + n) is defined as:

$$ \mathrm{MAD}_{(x,y)}(m,n) = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| f_k(x+i,\, y+j) - f_{k-1}(x+m+i,\, y+n+j) \right|, \quad -W \le m, n \le W \qquad (1) $$

where f_k(u, v) denotes the pixel intensity at position (u, v) of the kth frame.
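The MAD measure defined for Eq. (1) can be sketched in C as follows. This is a minimal illustration, assuming the usual form of MAD (the mean of the absolute pixel differences over the N x N block). The `Frame` structure, the 8-bit row-major pixel layout and the function name `block_mad` are assumptions made for this sketch and are not taken from the paper.

```c
#include <assert.h>
#include <stdlib.h>  /* abs */

/* Hypothetical frame layout: 8-bit luminance samples stored row by row. */
typedef struct {
    int width;
    int height;
    const unsigned char *pix;
} Frame;

/*
 * MAD between the N x N block at (x, y) in the current frame and the
 * N x N block at (x + m, y + n) in the previous frame.
 * The caller must keep both blocks inside their frames.
 */
double block_mad(const Frame *cur, const Frame *prev,
                 int x, int y, int m, int n, int N)
{
    assert(N > 0);
    long sum = 0;
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            int a = cur->pix[(y + j) * cur->width + (x + i)];
            int b = prev->pix[(y + n + j) * prev->width + (x + m + i)];
            sum += abs(a - b);  /* absolute difference of co-located samples */
        }
    }
    return (double)sum / (double)(N * N);  /* mean over the block */
}
```

Because the divisor N x N is the same for every candidate (m, n), a search that only needs the position of the minimum can compare the raw sums and skip the division.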



Figure 1. Motion Estimation and Interpolation

Figure 2. Diamond Search Algorithm

    The block having the minimum MAD_(x, y)(m, n) over all locations (m, n) within the defined
(2W + 1) x (2W + 1) search area is considered the best matched block.
    Thus the motion vector is calculated for every block of the inserted frame, and Motion
Compensation is then used to construct the interpolated frame.
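The selection of the best matched block can be sketched as an exhaustive scan of the (2W + 1) x (2W + 1) search area. The sketch below is illustrative only: the `Frame` and `MotionVector` types, the function names, the skipping of candidates that fall outside the previous frame, and the tie-breaking rule (the first minimum in scan order wins) are all assumptions, since the paper does not specify them. It compares sums of absolute differences, which have their minimum at the same (m, n) as the MAD.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical frame layout: 8-bit samples stored row by row. */
typedef struct { int width; int height; const unsigned char *pix; } Frame;

/* Displacement (m, n) of the best match and its sum of absolute differences. */
typedef struct { int m; int n; long sad; } MotionVector;

/* Sum of absolute differences; dividing by N*N would give the MAD. */
static long block_sad(const Frame *cur, const Frame *prev,
                      int x, int y, int m, int n, int N)
{
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += abs((int)cur->pix[(y + j) * cur->width + (x + i)] -
                       (int)prev->pix[(y + n + j) * prev->width + (x + m + i)]);
    return sum;
}

/* Test every displacement with |m|, |n| <= W and keep the minimum. */
MotionVector full_search(const Frame *cur, const Frame *prev,
                         int x, int y, int N, int W)
{
    assert(N > 0 && W >= 0);
    MotionVector best = {0, 0, LONG_MAX};
    for (int n = -W; n <= W; n++) {
        for (int m = -W; m <= W; m++) {
            /* Assumption: candidates outside the previous frame are skipped. */
            if (x + m < 0 || y + n < 0 ||
                x + m + N > prev->width || y + n + N > prev->height)
                continue;
            long sad = block_sad(cur, prev, x, y, m, n, N);
            if (sad < best.sad) {
                best.m = m;
                best.n = n;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

Each call evaluates up to (2W + 1)^2 candidate blocks of N^2 pixels each, which is where the cost of the search comes from.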
B. Block Based Search for Motion Vector Estimation
    A block based search algorithm determines the displacement of an object on a block-by-block
basis. For every block in the current kth frame, a block is matched from the previous (k-1)th
frame according to a certain criterion. In the case of MCTI, the Mean Absolute Difference is
used as the selection criterion.
    Full search is the most basic algorithm for block based Motion Compensation. In full search,
every block of the current kth frame is compared with every possible block in the previous
(k-1)th frame within the fixed size window, and the best match is found. Although the full
search algorithm gives the best Peak Signal-to-Noise Ratio among all the block based search
algorithms and is very simple in nature, it is the worst algorithm in terms of the computation
required.
    Several other algorithms reduce the computation time compared to the full search algorithm
at the expense of degraded quality. These are sub-optimal block based search algorithms. One of
them is the Diamond Search algorithm [10], which uses a diamond shape as its basic search
pattern. The number of steps the algorithm takes to reach a solution is not bounded in advance.
Two fixed patterns are used: the Large Diamond Search Pattern (LDSP) and the Small Diamond
Search Pattern (SDSP). Fig. 2 [11] shows the two patterns and the search method. The Diamond
Search algorithm works as follows [11]:
    1. Initially, the LDSP is centered at the origin of the fixed size search window, and the
9 points of the LDSP are tested. If the minimum MAD point is located at the center, go to
Step 3; otherwise, go to Step 2.
    2. The minimum MAD point found in the previous search step is re-positioned as the center
point to form a new LDSP. If the new minimum MAD point is located at the center position, go to
Step 3; otherwise, recursively repeat Step 2.
    3. Switch the search pattern from LDSP to SDSP. The minimum MAD point found in this step
gives the final motion vector, which points to the best matched block.

Figure 3. Sub Blocks for overlapped MCTI

C. Overlapped Motion Compensated Temporal Interpolation
    Frames interpolated using MCTI contain defects. As MCTI is a block based temporal
interpolation scheme, it depends on only one motion vector for every block, and this motion
vector is estimated using the MAD between the best matched blocks of the currently received
frame and the previously received frame, as discussed in the previous section. So every block
is created without considering the neighboring blocks. This single motion vector dependent
motion compensation yields the blocky effect in the newly constructed frame. To overcome this
problem, overlapped motion compensated temporal interpolation is used. OMCTI uses the motion
vector of the block itself along with the motion vectors of its neighboring blocks to
interpolate the inserted frame blocks. A block A is accompanied by 4 neighboring blocks B, C, D
and E, as shown in Fig. 3 [2]. Assume that their corresponding motion vectors are calculated
using MCTI and are represented by Va, Vb, Vc, Vd and Ve respectively. Every block is
partitioned into 4 sub blocks; block A is divided into A1, A2, A3 and A4. For each sub block,
the block's own motion vector and the motion vectors of two adjacent blocks are used for the
interpolation of that sub block in the inserted frame. Weighted masks are used in such a way
that the weight of the block's own motion vector is the highest and the weights for a sub block
sum to unity. Va, Vb and Vc motion vectors



are used for the prediction of sub block A1. Similarly, the Va, Vc and Vd motion vectors, along
with their respective weights, are used for the interpolation of sub block A2. This overlapping
scheme reduces the blocky effect in the resulting video.

D. Parallel Implementation
    Video coding techniques are very important for removing redundant information from video in
order to reduce its memory requirements for transmission and storage. Motion Compensation is a
central element of many video codecs. Motion Compensation algorithms rely on search algorithms
to find the best match between two consecutive frames received at the destination and generate
an intermediate frame in order to increase the frame rate of the received video. The search
algorithms for block matching are the computationally expensive portion of Motion Compensation,
and these computational requirements make it impractical [1], [12]. In this paper, we propose a
parallel implementation of the search algorithms using OpenMP in order to reduce the computation
time. OpenMP is a parallel programming API used to develop multithreaded applications for shared
memory architecture systems. In this approach, multiple threads are created for searching; each
thread performs the search on different blocks to find the best match, and the results are
combined after all threads have completed. In this way the search is performed simultaneously,
resulting in an overall reduction in computation time.
    The main theme of this work is to implement the basic algorithm of Overlapped Motion
Compensated Temporal Interpolation using the OpenMP API to reduce its computation time.
Performance in terms of timing is improved by this OpenMP based technique without any
modification to the algorithm. As discussed, the search algorithms used for temporal
interpolation are the computationally intensive part of the algorithm, hence only the search
algorithms are implemented using OpenMP directives. The searching process is handled by multiple
threads while the rest of the algorithm (before and after searching) is executed by only one
thread. The number of threads varies from two to five. The time required by this multithreaded
approach is compared with the time required by the sequential single threaded approach. The
change in required time is expressed as a percentage to indicate the performance improvement.

             IV.   EXPERIMENTAL RESULTS

A. System Platform and Experimental Process
    For our experimental evaluation we used an Intel(R) Core i3-2310M CPU with two processor
cores, a 2.10 GHz clock speed and 3 GB of memory. The system ran Windows 7 and Microsoft Visual
Studio 2010 Ultimate. All programs were implemented in C using the OpenMP interface.
    Several sets of test videos were used to evaluate the performance of the parallel algorithm.
To compare the parallel algorithm, the practical execution time was used as a measure. Practical
execution time is the total time in seconds an algorithm needs to complete the computation. To
decrease random variation, the execution time was measured as an average of 10 runs.

B. Evaluation and Analysis of Performance Model
    The proposed technique is applied to various video sequences of different formats such as
QCIF and CIF to show the actual performance improvement. The performance is improved because of
the OpenMP based multithreaded implementation of the block search algorithm. Both search
algorithms, full search and diamond search, are used for the evaluation. Thread creation also
incurs some computational cost. Different numbers of threads are generated to further evaluate
the impact of multithreading on performance. The performance improvements for different video
sequences are shown in Fig. 4 to Fig. 7.

Figure 4. CIF format, Diamond search

Figure 5. QCIF format, Diamond search
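The multithreaded block search described in Section III-D can be sketched with a single OpenMP work-sharing directive over the blocks of a frame. The paper does not list its actual directives, so this is one plausible arrangement and not the authors' code. The `Frame` and `MotionVector` types, the function names, the static schedule and the use of full search inside each thread are assumptions made for illustration. Each block writes only its own output slot, so no locking is needed, and the implicit barrier at the end of the loop is the point where the results of all threads are combined.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical frame layout: 8-bit samples stored row by row. */
typedef struct { int width; int height; const unsigned char *pix; } Frame;
typedef struct { int m; int n; } MotionVector;

/* Full search for one N x N block at (x, y); minimum SAD equals minimum MAD. */
static MotionVector search_block(const Frame *cur, const Frame *prev,
                                 int x, int y, int N, int W)
{
    MotionVector best = {0, 0};
    long best_sad = LONG_MAX;
    for (int n = -W; n <= W; n++) {
        for (int m = -W; m <= W; m++) {
            /* Assumption: candidates outside the previous frame are skipped. */
            if (x + m < 0 || y + n < 0 ||
                x + m + N > prev->width || y + n + N > prev->height)
                continue;
            long sad = 0;
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    sad += abs((int)cur->pix[(y + j) * cur->width + (x + i)] -
                               (int)prev->pix[(y + n + j) * prev->width + (x + m + i)]);
            if (sad < best_sad) {
                best_sad = sad;
                best.m = m;
                best.n = n;
            }
        }
    }
    return best;
}

/*
 * Estimate one motion vector per N x N block of the current frame.
 * mv must have room for (width / N) * (height / N) entries.
 * The loop over blocks is divided among num_threads threads.
 */
void estimate_motion(const Frame *cur, const Frame *prev, int N, int W,
                     int num_threads, MotionVector *mv)
{
    assert(N > 0 && W >= 0 && num_threads > 0);
    int blocks_per_row = cur->width / N;
    int nblocks = blocks_per_row * (cur->height / N);

    #pragma omp parallel for num_threads(num_threads) schedule(static)
    for (int b = 0; b < nblocks; b++) {
        int x = (b % blocks_per_row) * N;
        int y = (b / blocks_per_row) * N;
        mv[b] = search_block(cur, prev, x, y, N, W);  /* private output slot */
    }
}
```

OpenMP has to be enabled at compile time (for example `-fopenmp` with GCC or `/openmp` with Microsoft Visual C++); without it the pragma is ignored and the loop runs on a single thread with the same results. Calling `estimate_motion` with `num_threads` set from two to five would correspond to the thread counts evaluated in this section.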



Figure 6. CIF format, Full Search

Figure 7. QCIF format, Full Search

    The percentage performance improvement is shown on the y-axis while the number of threads
created is shown on the x-axis. It is evident from the figures that the proposed solution gives
the best performance when three or four threads are created. The first two figures show the
performance improvement with the diamond search algorithm, whereas the next two figures show the
performance improvement with the full search algorithm. As full search is computationally more
expensive, its performance improves more.
    Fig. 4 shows the result of the proposed solution for the CIF video format using the diamond
search algorithm.
    Fig. 5 shows the result of the proposed solution for the QCIF video format using the diamond
search algorithm.
    Fig. 6 shows the result of the proposed solution for the CIF video format using the full
search algorithm.
    Fig. 7 shows the result of the proposed solution for the QCIF video format using the full
search algorithm.
    The figures show the percentage improvement of the OpenMP based approach when executed with
different numbers of threads. This percentage is calculated as the difference between the
execution times of the sequential and the multithreaded applications, divided by the execution
time of the single threaded application. The best results are achieved using three or four
threads. The proposed approach shows approximately 30% improvement when implemented with diamond
search as the search algorithm and about 50% improvement when implemented with the full search
algorithm. The performance is improved in every case, but the magnitude of the improvement
differs, and the main reason for this difference is the number of threads. When few threads are
generated, the application cannot fully utilize the multi-core architecture. But when the number
of threads is increased, a major portion of the time is consumed in creating the threads and in
context switching between them, making the multithreading approach an overhead. Hence the number
of threads must balance both problems; it can be neither very low nor very high. The optimal
number of threads for our work is three to four.

                       V.     CONCLUSION

    In this paper, we have proposed an OpenMP based multithreaded implementation of Overlapped
Motion Compensated Temporal Interpolation. Motion Compensated Temporal Interpolation produces a
blocky effect in the constructed video. Overlapping reduces this blocky effect, which results in
a better video output. With the inclusion of the multithreading approach, the computation time
is reduced. Although some time is required for the creation of threads, the overall performance
enhancement is sufficient to bear this overhead. The proposed scheme is tested on various video
sequences of different formats and the results are presented in the paper.

                          REFERENCES
[1]   Chi-kong Wong and Oscar C. Au, “Fast Motion Compensated Temporal Interpolation for Video”,
      Proc. SPIE: Visual Communication and Image Processing, 1995.
[2]   Chi-kong Wong, Oscar C. Au, and C. W. Tang, “Motion Compensated Temporal Interpolation for
      video”, Proc. of IEEE ISCAS, 1996.
[3]   Alexis M. Tourapis, Hye-Yeon Cheong, Ming L. Liou, Oscar C. Au, “Temporal Interpolation Of
      Video Sequences Using Zonal Based-Algorithms”, Proc. of Conference on Image Processing, 2001.
[4]   Danny Lazar and Amir Averbuch, “Wavelet Video Compression Using Region Based Motion
      Estimation And Compensation”, Proc. of IEEE International Conference on Acoustics, Speech
      and Video Processing, 2001.
[5]   Rong Ding, Qionghai Dai, Wenli Xu, Dongdong Zhu and Hao Yin, “Background-frame Based Motion
      Compensation for Video Compression”, Proc. of IEEE International Conference on Multimedia
      and Expo, 2004.
[6]   Karunakar A. K. and Manohara Pai M. M., “Computationally Efficient MCTF for Scalable Video
      Coding”, Proc. of IEEE International Conference on Advance Computing and Communication, 2006.
[7]   Vikram Arkalgud Chandrasetty, Shridhar R Laddh, “A Novel Dual Processing Architecture for
      Implementation of Motion Estimation Unit of H.264 AVC on FPGA”, IEEE Symposium on Industrial
      Electronics and Applications, 2009.
[8]   Panagiotis D. Michailidis and Konstantinos G. Margaritis, “Open Multi Processing (OpenMP) of
      Gauss-Jordan Method for Solving System of Linear Equations”, Proc. of IEEE 11th International
      Conference on Computer and Information Technology, 2011.
[9]   Lin Yang, Leiguang Gong, Hong Zhang, John L. Nosher, and David J. Foran, “A Multicore Based
      Parallel Image Registration Method”, Proc. of 31st Annual International Conference of the
      IEEE EMBS, 2009.


[10]  Shan Zhu and Kai-Kuang Ma, “A New Diamond Search Algorithm for Fast Block-Matching Motion
      Estimation”, IEEE Transactions on Image Processing, 2000.
[11]  Sherief M. Hashimaa, Imbaby I. Mahmoud and Atef A. Elazm, “Hardware Implementation of Diamond
      Search Algorithm for Motion Estimation and Object Tracking”, Proc. of the 7th Conference on
      Nuclear and Particle Physics, 2009.
[12]  C. W. Tang and O. C. Au, “Unidirectional Motion Compensated Temporal Interpolation”, IEEE
      International Symposium on Circuits and Systems, 1997.



