Review and Implementation of DWT based Scalable Video Coding

					Project Title:

   Review and Implementation of DWT based Scalable
      Video Coding with Scalable Motion Coding.




Midterm Report
      CS 584
      Multimedia Communications




Submitted by:
     Syed Jawwad Bukhari
     2004-03-0028
About Project
Project Goals
Introduction
   Why Image/Video Compression?
   Error Metrics
   Why Scalable Video Coding?
Background Study
   Motion Prediction
   DWT
Video Compression Using DWT
   EZW
   TWAVIX
Project Work
Next Phase
Results
References
About Project
         In this project I review and analyze different scalable video coding schemes
with scalable motion coding that use the discrete wavelet transform (DWT) for video
compression.

         To this end I am implementing a scalable video coder based on the discrete
wavelet transform, following several well-known approaches described in the literature.
In the later part of the project I will look at scalable motion coding; for this I will
analyze in detail the approach described by Boisson [1].

         The main purpose of this project is to gain an understanding of wavelet-based
video compression and to carry out an analysis of scalable motion coding, so that I can
come up with optimal estimates of the parameters in a scalable video codec.

         This midterm report is organized as follows. First I describe what scalable
video coding is and why we need it; after that, scalable motion coding and its importance
in scalable video coding are discussed. A background study on the use of wavelets follows,
and finally the work done so far, along with the proposed implementation scheme, is
presented.

Project Goals
         The goal of this project is to explore some well-known DWT-based techniques for
scalable video compression. More precisely, the scope of the project is as follows:

            - Design and implement a scalable video codec.

            - Perform analyses to find a balance between the different parameters
              involved in a scalable codec.

            - Study and implement the scalable motion coding technique
              described in [1].
Introduction
Why Image/Video Compression?
         A video is a sequence of frames played back so that the viewer gets the
illusion of a moving scene. Each frame of a video is an image, and image compression is
necessary: each pixel of a color image requires 24 bits (8 bits for each of the three
color channels), so a single 800 by 600 frame takes almost 1.4 MB of memory, and playing
a sequence of such frames at 30 frames per second requires about 2.5 GB for one minute
of video. Image/video compression is therefore necessary to save memory as well as
bandwidth when the image or video is transferred over a network link.

         The Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT)
are well known to achieve image compression by exploiting the spatial redundancy of the
image, and both can reach very high compression ratios for natural images. The DWT has
been shown to achieve compression ratios well beyond those of the DCT: for high
resolution images it outperforms DCT-based methods while keeping the perceptual quality
of the image in an acceptable range. The trade-off in image/video compression is
compression ratio versus perceptual quality; at very high compression ratios we usually
get degradation in perceptual quality. In recent times wavelet-based techniques have
shown promising results, achieving very high compression within acceptable quality
ranges. I will describe them in the next sections, but first let us look at the error
metrics most commonly used to measure image quality.

Error Metrics
         Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR) are two error
metrics commonly used for measuring image quality. The mathematical formulae for
these are as follows:
                            M   N
                 MSE = (1/MN) ∑   ∑ (OrgPixelValue(i, j) − DecodedPixelValue(i, j))²
                           i=1 j=1

                 PSNR = 20 × log10( 255 / √MSE )
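As a concrete illustration, both metrics can be computed directly from pixel values. The following sketch (function names are my own) assumes 8-bit grayscale images represented as 2-D lists:

```python
import math

def mse(original, decoded):
    """Mean squared error between two equally sized grayscale images
    given as 2-D lists of pixel values."""
    rows, cols = len(original), len(original[0])
    total = sum((original[i][j] - decoded[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)

def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = mse(original, decoded)
    if e == 0:
        return float("inf")
    return 20 * math.log10(peak / math.sqrt(e))
```

For example, an image whose every pixel is off by one from the original has MSE = 1 and PSNR of about 48 dB.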
         Clearly, if the MSE between the original and the decompressed image is high,
then the quality is lower; conversely, a higher PSNR value means a higher quality of
compression/decompression, as the inverse relation between PSNR and MSE shows. Other
measures of perceptual quality exist as well, and some of them track perceived quality
better than MSE or PSNR alone.
Why Scalable Video Coding?
         Users on heterogeneous networks request different qualities for the same video.
For example, a user with a low-bandwidth connection may require a low-quality version,
while a user watching the same video on an HDTV requires much higher quality, and hence
much more bandwidth; a mobile user needs the same video at a lower resolution because of
the smaller screen size and limited memory. For a generic video coder to fulfill the
requirements of such a wide range of bit rates and qualities, the coder must be scalable.
Moreover, to achieve such a high degree of scalability it is essential that the motion
coding itself offer some scalability. This is discussed in more detail in the later part
of the report.
Background study
         First we briefly discuss the basics of video compression. To compress a video
we usually exploit spatial as well as temporal redundancies. Spatial redundancy can be
removed with conventional image compression methods, while temporal redundancy can be
exploited through various approaches; here I describe the use of motion prediction to
exploit temporal redundancy.

Motion Prediction
         In a natural scene, the motion in a frame can be estimated from the previous
frame (as well as from future frames): most of the frame content is repeated in
consecutive frames to give the impression of smooth motion, so the motion occurring in
one frame can be predicted from its neighboring frames. For this we use block-matching
algorithms that try to find similar small blocks in adjacent frames. Motion vectors
represent the displacement in the current frame with respect to some reference frame.
The video is sent as a sequence of GOPs (groups of pictures) whose first frame is
intra-coded (an I-frame), i.e. a frame with no motion vectors, compressed on its own by
an efficient image compression method. The decoder can then predict the following frames
from it with the help of the motion vectors. There are various approaches to motion
estimation and compensation; here I discuss the generic motion prediction mechanism.

         Consider the following GOP
    I1       B2         B3       P4        B5       B6        P7         B8   B9   I10
         P frames are predicted from the previous I or P frame, whereas B frames are
predicted from the I and P frames on both sides (forward and backward). We can use
different block-matching algorithms, ranging from exhaustive search to logarithmic
search, and from full-pixel to sub-pixel search spaces. Experiments have shown that
half- or quarter-pixel search gives a significant gain in quality.
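The prediction structure of such a GOP can be made concrete with a small sketch (the function name and GOP pattern are illustrative, not taken from any particular codec):

```python
def gop_references(gop):
    """For a GOP pattern such as 'IBBP', return for each frame the indices
    of its reference frames: none for an I frame, the previous I/P frame
    for a P frame, and the nearest I/P frame on each side for a B frame."""
    anchors = [i for i, t in enumerate(gop) if t in "IP"]  # I/P positions
    refs = []
    for i, t in enumerate(gop):
        if t == "I":
            refs.append([])
        elif t == "P":
            refs.append([max(a for a in anchors if a < i)])
        else:  # B frame: one anchor before, one after
            prev = max(a for a in anchors if a < i)
            nxt = min(a for a in anchors if a > i)
            refs.append([prev, nxt])
    return refs
```

For the GOP shown above (I1 B2 B3 P4 ...), frame B5 would be predicted from P4 and P7, matching the bidirectional prediction just described.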

         Motion vectors are produced by the motion estimation unit, and the difference
between the motion-compensated image and the original image is compressed and
transmitted. Motion estimation is thus quite useful for exploiting temporal redundancy,
at the expense of extra computation.
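As an illustration of exhaustive-search block matching, the following sketch (names and the sum-of-absolute-differences cost are my own choices; real coders add many optimizations) finds the motion vector for a single block:

```python
def best_match(ref, block, top, left, radius):
    """Exhaustive-search block matching: slide `block` (a 2-D list) over
    the reference frame within +/- `radius` pixels of (top, left) and
    return the motion vector (dy, dx) minimising the sum of absolute
    differences (SAD), together with that SAD."""
    bh, bw = len(block), len(block[0])
    h, w = len(ref), len(ref[0])
    best = (None, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > h or x + bw > w:
                continue  # candidate block falls outside the frame
            sad = sum(abs(ref[y + i][x + j] - block[i][j])
                      for i in range(bh) for j in range(bw))
            if sad < best[1]:
                best = (dy, dx), sad
    return best
```

The residual (the per-pixel difference at the best match) is what gets transform-coded; logarithmic and sub-pixel searches refine this same idea at lower cost or higher precision.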

DWT
         The DWT has proven very useful in transform-based image compression. Thanks to
its better energy compaction and its correspondence with the human visual system,
wavelet-transform methods for image compression are very successful. The Embedded
Zerotree Wavelet (EZW) algorithm, for instance, can provide a compression ratio of 100:1
while keeping the perceptual quality of the image in an acceptable range. I discuss
different wavelet-based algorithms in the next section.

         The following figure shows a simple image compression pipeline that uses the
DWT. We take the DWT of the whole image, which yields approximation coefficients on
which we can again take the 2-D wavelet transform; this can be repeated for as many
levels as required.

     Image -> Wavelet Transform -> Quantization -> Entropy Coding -> Compressed image

                Figure 1: Image Compression Using Wavelet Transform




               1 level 2D DWT                                      2 level 2D DWT
                                 Figure 2: 1 and 2 levels of DWT
         The figures above show one level and two levels of the DWT applied to the
cameraman image. Each application of the DWT divides the image into sub-band images;
the top left corner holds the approximation coefficients, i.e. the LL (low, low)
frequency region.
Video Compression Using DWT
         There are various approaches to video compression. One is to perform the DWT
on the intra-coded frames as well as on the residual between the motion-compensated and
the original frame; in this approach the DWT exploits only spatial redundancy, while
temporal redundancy is handled by block-matching motion prediction. Many variations and
techniques have been proposed and are in use under this framework. Another approach is
to use 3-D signal transforms that exploit spatial and temporal redundancy together, such
as 3-D DWT based video compression; I will not review such techniques, as they are out
of the scope of this project. Yet another approach is to perform motion estimation and
compensation on the wavelet-filtered image.

         Now we look at how the DWT is applied to an image to achieve compression.
Applying the 2-D DWT to an image over multiple levels yields approximation coefficients
and detail coefficients. The approximation coefficients are the LL frequencies produced
by the low-pass horizontal and vertical filters; most of the image energy is
concentrated there, while the detail coefficients carry the information about sharp
edges and similar features. Each application of the DWT splits the image into sub-bands,
each half the size of the image in each dimension. This is depicted in Figure 2, in
which the left image is the decomposition after one level of DWT and the right one after
two levels.
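As a minimal sketch of this multi-level decomposition, here is a one-level and a recursive multi-level 2-D Haar DWT (the simplest wavelet, chosen for clarity; practical coders use longer filters such as the 9/7):

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT on an image with even dimensions.
    Returns (LL, (LH, HL, HH)): the approximation band plus three detail
    bands, each a quarter of the original size."""
    img = np.asarray(img, dtype=float)
    # Filter along rows: low-pass = pairwise average, high-pass = difference.
    lo = (img[:, 0::2] + img[:, 1::2]) / 2
    hi = (img[:, 0::2] - img[:, 1::2]) / 2
    # Then along columns of each half.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2
    lh = (lo[0::2, :] - lo[1::2, :]) / 2
    hl = (hi[0::2, :] + hi[1::2, :]) / 2
    hh = (hi[0::2, :] - hi[1::2, :]) / 2
    return ll, (lh, hl, hh)

def haar_dwt2_multi(img, levels):
    """Apply haar_dwt2 recursively to the LL band, as in Figure 2."""
    details = []
    ll = np.asarray(img, dtype=float)
    for _ in range(levels):
        ll, d = haar_dwt2(ll)
        details.append(d)
    return ll, details
```

On a flat (constant) image all the energy stays in the LL band and every detail coefficient is zero, which illustrates why smooth regions compress so well under the DWT.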

         Once we have the coefficients, all we need to do is encode them efficiently,
so that the reconstructed image remains perceptually indistinguishable from the
original. The most famous approach that enabled the DWT to achieve very high image
compression ratios is embedded zerotree wavelet (EZW) coding. A notable property of EZW
coding is that it enables progressive image decompression, as we will see while
discussing it. There are several widely used improvements on EZW coding, including
SPIHT, WDR and ASWDR; we will discuss only EZW.

EZW
         Shapiro presented the EZW algorithm in [2]. Embedded coding is an approach to
encoding the transformed coefficients that achieves progressive transmission of the
compressed image: scalability in image compression is obtained by sending only those
coefficients that are needed to decompress the image at a specific bit rate. Zero-trees
provide an efficient coding of the coefficients that results in an embedded bit stream.

         Consider the following matrix, whose entries give the order in which
coefficients are read and encoded.

                              1       2    5    8    17   24   25   32
                              3       4    6    7    18   23   26   31
                              9       10   13   14   19   22   27   30
                              12      11   15   16   20   21   28   29
                              33      34   35   36   49   50   54   55
                              40      39   38   37   51   53   56   61
                              41      42   43   44   52   57   60   62
                              48      47   46   45   58   59   63   64


         The process of embedded coding used in EZW is also referred to as bit-plane
coding. The following five-step bit-plane coding process is used in EZW [3]:

Step 1: Set an initial threshold such that only the first (largest) coefficient is
greater than the threshold and no other coefficient is.

Step 2: Halve the threshold.

Step 3: Significance pass. Scan the still-insignificant values in the baseline scan
order shown in the figure above. For each value, if it is greater than the threshold,
output its sign and set its quantized value to the threshold; otherwise set its
quantized value to zero.

Step 4: Refinement pass. Scan the values found significant at higher thresholds. For
each such value, output a zero bit if its magnitude lies in the interval from its
quantized value up to the quantized value plus the threshold; otherwise output a one bit
and add the threshold to the quantized value.

Step 5: Repeat steps 2 to 4 until the desired bit rate is reached.

         In this way we obtain a bit stream from which the decoder only needs to
reproduce the quantized coefficients. Using a quad-tree structure (the embedded
zero-tree) we gain significant image compression, because long runs of zeros can be
coded very cheaply.
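The five steps above can be sketched as follows. This simplified coder deliberately drops the zero-tree structure (so it is plain bit-plane coding, not full EZW) and assumes power-of-two thresholds:

```python
def bitplane_code(coeffs, passes):
    """Simplified bit-plane coding in the spirit of EZW (zero-trees
    omitted). Each pass halves the threshold, flags newly significant
    coefficients, then refines the already-significant ones by one bit.
    Returns the symbol stream and the decoder-side reconstruction."""
    t = 1
    while t <= max(abs(c) for c in coeffs):
        t *= 2                                 # Step 1: t exceeds every magnitude
    mag = [0] * len(coeffs)                    # magnitudes the decoder rebuilds
    stream = []                                # emitted symbols
    for _ in range(passes):
        t //= 2                                # Step 2: halve the threshold
        newly = set()
        for i, c in enumerate(coeffs):         # Step 3: significance pass
            if mag[i] == 0 and abs(c) >= t:
                stream.append(("sig", i, "+" if c >= 0 else "-"))
                mag[i] = t
                newly.add(i)
        for i, c in enumerate(coeffs):         # Step 4: refinement pass
            if mag[i] and i not in newly:
                bit = int(abs(c) >= mag[i] + t)
                stream.append(("ref", i, bit))
                mag[i] += bit * t              # Step 5: loop repeats steps 2-4
    return stream, [m if c >= 0 else -m for m, c in zip(mag, coeffs)]
```

Running more passes tightens the reconstruction, which is exactly the progressive (embedded) property: truncating the stream after any pass still yields a usable, coarser image.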
         Next I discuss another framework that uses wavelets for video compression.
TWAVIX
         TWAVIX [1] stands for "The Wavelet-based video coder with scalability". The
following diagram depicts its architecture: the video first passes through a temporal
analysis unit where, based on the GOP format, each frame is either sent to the spatial
analysis unit or to the motion estimation unit for motion prediction. The rest of the
process is essentially JPEG2000 image compression.




                                  Figure 3: TWAVIX architecture




Project Work
         The architecture of my video codec is very similar to the TWAVIX architecture;
the slight variations can be seen in the following diagram.




                                     Figure 4: Planned Coder

         This diagram shows the architecture of the coder that uses EZW coding for
image compression. For testing and efficiency comparisons I am implementing both the
EZW-based compression and a JPEG2000-like architecture in which the approximation
coefficients are quantized and then entropy coded. Its block diagram is the same as the
first except that entropy coding replaces EZW coding.




                           Figure 5: Planned Coder with Entropy Coding



         So far I have completed the implementation of this coder in Matlab; what
remains is to split the code into a separate coder and decoder and to tune some
computationally expensive steps.

Next Phase
         In the next phase I will set up a server and make the encoder capable of
scalable coding of video for different specifications, and then finally perform
experiments to fix optimal values of, or find relationships between, the different
parameters for scalable video coding.
Results




          (Figure images omitted.) Shown: the actual P4 and B2 frames, the
motion-compensated P4 and B2 frames, and the difference between each compensated frame
and its corresponding actual frame.

References
   [1] Boisson, Edouard and Guillemot, "Accuracy-scalable motion coding for efficient
       scalable video compression," 2004.
   [2] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients,"
       IEEE Transactions on Signal Processing, 41(12): 3445-3462, 1993.
   [3] K. R. Rao and P. C. Yip (eds.), The Transform and Data Compression Handbook,
       CRC Press, 2001.