Image compression

					              Topic for lecture 2
• Topic: video compression
• The ultimate compression task?
• Color image (300 x 300 x 24bit):
  – 2.16Mbit/image x 30 image/s = 64.8Mbps
• Motion picture: 90min = 64.8Mbps x 60 x 90 =
  349.92Gbit
• 56.6K modem => Raw download time (excl.
  sound and overhead) ~ 1717 hours or ~ 72 days!!!
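The arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant names are illustrative, and the "56.6K" modem is assumed to mean 56,600 bit/s.

```python
# Raw (uncompressed) video figures from the slide.
WIDTH, HEIGHT, BITS_PER_PIXEL = 300, 300, 24
FRAMES_PER_SECOND = 30
MOVIE_MINUTES = 90
MODEM_BPS = 56_600  # assumption: "56.6K" read as 56,600 bit/s

bits_per_image = WIDTH * HEIGHT * BITS_PER_PIXEL   # 2.16 Mbit
bit_rate = bits_per_image * FRAMES_PER_SECOND      # 64.8 Mbps
movie_bits = bit_rate * 60 * MOVIE_MINUTES         # 349.92 Gbit

download_hours = movie_bits / MODEM_BPS / 3600
download_days = download_hours / 24

print(f"{bit_rate / 1e6:.1f} Mbps, {movie_bits / 1e9:.2f} Gbit")
print(f"{download_hours:.0f} hours, {download_days:.0f} days")
```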
        Agenda for lecture 2
• What makes video compression possible?
• Implementations of motion compensation
  – Block matching
• The YCbCr color representation
• MPEG
          Video compression
• A sequence of images that needs to be
  compressed: storage and/or transmission
• Ignore audio, since image data >> audio data
• Straightforward methods
  – Motion JPEG
  – 3D DCT
       Temporal redundancy
• Less than 10% of the pixels change by more than
  1% between frames
• Temporal redundancy or interframe correlation
• Temporal redundancy > spatial redundancy
• Origin: slow camera- and object movements
         Motion compensated coding
• Second generation of temporal compression method
• More efficient (especially with rapid changes) but also more
  complex:
   – Ok since the cost of computer power is decreasing faster than the
     cost of bandwidth
• Basic idea: the only difference between two images is the
  moving objects (draw)
• Estimate the motion and simply code this information
• From the prediction and the initial frame we can encode/decode
  all other frames
                   Practical issues
• Noise, camera movements, light changes etc. =>
  the object and background change =>
   – Calculate the prediction error (difference) and code this
• It is very hard to track and describe a general object
  (contour and texture); instead a block of pixels is used as
  ’object’
• The estimated motion is represented as pure translation:
  no rotation and scaling
   – This is justified since we have high frame rates and ’slow’
     changes
   – Denoted the displacement vector or motion vector
Procedure for motion compensated coding
• Image sequence => image => blocks of pixels
• Step 1: Motion analysis:
   – Estimate the motion vector of the current block, i.e. the
     position of the block in the previous image(s)
• Step 2: Prediction and differentiation
   – Predict how the block found in the previous image(s) will
     look in the current image
   – Subtract the predicted block from the current block =>
     difference
• Step 3: Entropy encoding of the difference and motion vector
• Encoded difference and motion vector << raw image =>
  video compression
• Step 3 (entropy encoding) is already known
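Steps 1 and 2 can be sketched for a single block as follows. This is a minimal illustration, not a codec: the function names are hypothetical, frames are plain lists of rows, the motion vector (dy, dx) is assumed to be already known and to point from the block's position in the current frame to its position in the previous frame, and entropy coding is left out.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def encode_block(current, previous, top, left, size, motion_vector):
    """Step 2: subtract the displaced block of the previous frame."""
    dy, dx = motion_vector
    predicted = get_block(previous, top + dy, left + dx, size)
    actual = get_block(current, top, left, size)
    return [[a - p for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(actual, predicted)]

def decode_block(previous, top, left, size, motion_vector, difference):
    """Decoder side: prediction + difference gives the block back."""
    dy, dx = motion_vector
    predicted = get_block(previous, top + dy, left + dx, size)
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(predicted, difference)]

# A 2 x 2 object moves one pixel down and right; one pixel also changes value.
previous = [[9, 9, 0, 0],
            [9, 9, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
current = [[0, 0, 0, 0],
           [0, 9, 9, 0],
           [0, 9, 10, 0],
           [0, 0, 0, 0]]

motion_vector = (-1, -1)
difference = encode_block(current, previous, 1, 1, 2, motion_vector)
restored = decode_block(previous, 1, 1, 2, motion_vector, difference)
print(difference)  # [[0, 0], [0, 1]]
```

The difference block is almost all zeros, which is what makes the subsequent entropy encoding compact.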
  Motion analysis and prediction
• In general we seek the trajectory of a block so we
  can predict its current position e.g. using weights
• In practice this is too complicated and instead a 0th
  order predictor is applied:
   – Predicted block(x,y,t) = block(a,b,t-1)
   – MPEG uses two 0th order predictors
• The only unknown issue: step 1: how do we find
  the block in the previous frame that best matches
  the block in the current frame?
• Three methods:
   – Block matching (by far the most applied method)
   – Pel-recursion (block = 1 pixel)
   – Optical flow (block = 1 pixel)
                 Block matching (1)
• Principle
• All pixels in a block
   are assumed to have the
   same motion vector
• Search window
   – Maximum displacement follows from frame rate and context
   – Usually a square region
• Block of p x q pixels; usually p=q => square block
• The smaller the block size => the better prediction, but
  more overhead (motion vectors)
• Usually block size = 16 x 16
              Block matching (2)
• Overlapping blocks improve reconstructed
  image quality but increase the bit-rate
  – Usually non-overlapping blocks are applied
• Block matching via a similarity measure between current block u and candidate block v:
  – Sum of squared differences (SSD): S(u,v) = sum over the block of (u-v)^2
  – Mean absolute differences (MAD): S(u,v) = mean over the block of |u-v|
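A minimal sketch of the two similarity measures, together with the simplest way to use them: testing every displacement inside a square search window. The function names and the (dy, dx) convention are assumptions made for this example; frames are plain lists of rows.

```python
def block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb))

def mad(a, b):
    """Mean absolute difference between two equally sized blocks."""
    count = len(a) * len(a[0])
    return sum(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb)) / count

def full_search(current, previous, top, left, size, max_disp, measure=ssd):
    """Test every displacement within +/- max_disp and keep the best one."""
    target = block(current, top, left, size)
    height, width = len(previous), len(previous[0])
    best_mv, best_cost = None, None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            cost = measure(target, block(previous, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

# A 2 x 2 object sits at row 1, column 1 in the previous frame
# and at row 2, column 3 in the current frame.
previous = [[0, 0, 0, 0, 0, 0],
            [0, 9, 9, 0, 0, 0],
            [0, 9, 9, 0, 0, 0],
            [0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0]]
current = [[0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 9, 9, 0],
           [0, 0, 0, 9, 9, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0]]

print(full_search(current, previous, top=2, left=3, size=2, max_disp=3))
# ((-1, -2), 0)
```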
              Searching strategies
• Full search:
   – Finds global minimum but requires heavy processing!
• Assume only one minimum in the search region => a less
  computationally demanding search strategy can be used
• Accept a local minimum =>
   – Larger difference but less processing
• Searching strategies with one (local) minimum:
   –   Coarse-fine three-step search
   –   2D logarithmic search
   –   Conjugate direction search
   –   Etc.
      Coarse-fine three-step search
•   Step 1) Test 9 points within a fixed pattern
•   Step 2+3) Centre the pattern around the best
    match and change the distance within the pattern
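A sketch of the three-step search, under some assumptions: the pattern is a 3 x 3 grid, the distance is halved in each step (4, 2, 1, giving a reach of +/- 7), and `cost(dy, dx)` is a hypothetical callable returning the similarity measure for a candidate displacement. The toy cost at the end has a single minimum, which is the case these strategies rely on.

```python
def three_step_search(cost, first_step=4):
    """Coarse-to-fine search for the displacement with the lowest cost."""
    best = (0, 0)
    step = first_step
    while step >= 1:
        # 9 points: the current best and its 8 neighbours at distance `step`.
        candidates = [(best[0] + i * step, best[1] + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        best = min(candidates, key=lambda mv: cost(*mv))
        step //= 2  # 4 -> 2 -> 1 -> stop
    return best

def toy_cost(dy, dx):
    """Stand-in for SSD/MAD: a single minimum at displacement (3, -2)."""
    return (dy - 3) ** 2 + (dx + 2) ** 2

print(three_step_search(toy_cost))  # (3, -2)
```

With these settings the search evaluates 3 x 9 = 27 candidates, against 15 x 15 = 225 for a full search over the same +/- 7 window.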
            YCbCr color representation
• A camera captures color in RGB format (show)
• We would like a representation where the intensity and
  color are separated:
   –   So we can transmit and decode both a color and gray-scale signal
   –   [R,G,B]: [50,50,50] same color as [100,100,100]
   –   HSI (hue-saturation-intensity)
   –   HSI is complex to calculate so we seek a simpler representation
• YUV-representation is a simple approximation:
   –   Y = Luminance (intensity) = 0.299 R + 0.587 G + 0.114 B
   –   The non-uniform weighting comes from the HVS (human visual system)
   –   U = B – intensity = ”pure” blue color = 0.492 (B - Y)
   –   V = R – intensity = ”pure” red color = 0.877 (R - Y)
   –   Rough approximation but very simple to compute
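The YUV formulas translate directly into code. A small sketch (the function name is illustrative); the two gray values from the slide end up with the same, zero, color components and differ only in Y.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB triple to YUV using the weights from the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance (intensity)
    u = 0.492 * (b - y)                    # blue minus intensity
    v = 0.877 * (r - y)                    # red minus intensity
    return y, u, v

for rgb in [(50, 50, 50), (100, 100, 100), (0, 0, 255)]:
    y, u, v = rgb_to_yuv(*rgb)
    print(rgb, "->", round(y, 1), round(u, 1), round(v, 1))
```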
       YCbCr color representation (3)
    • The HVS is more sensitive to intensity (Y)
      than to color (Cb and Cr) so more bits can
      be used to represent the intensity
    • Formats:
      – 4:4:4 (24 bits)
      – 4:2:2 (16 bits)
      – 4:2:0 (12 bits)
    • [Figure: positions of the Y samples and of the Cb and Cr samples on lines 1-4 for each format]
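The bits-per-pixel figures follow from how many chrominance samples are kept per pixel. The sketch below assumes 8 bits per sample and the usual reading of the formats: 4:2:2 halves Cb and Cr horizontally, 4:2:0 halves them in both directions. The names are illustrative, and averaging 2 x 2 groups is only one possible way to subsample.

```python
# Chroma subsampling factors (horizontal, vertical) per format.
FORMATS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def bits_per_pixel(fmt, bits_per_sample=8):
    """One Y sample per pixel plus the remaining share of Cb and Cr."""
    h, v = FORMATS[fmt]
    chroma_samples_per_pixel = 2 / (h * v)
    return bits_per_sample * (1 + chroma_samples_per_pixel)

def subsample_420(plane):
    """Average each 2 x 2 group of a Cb or Cr plane (even dimensions assumed)."""
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]

for fmt in FORMATS:
    print(fmt, bits_per_pixel(fmt))
```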
                             MPEG
• MPEG = Moving Picture Experts Group
• International standard for compression of video (image,
  sound, and system info.), due to growth in the digital media
  (e.g. CD-ROM, DVD) market. Both transmission and
  storage
• MPEG-1: 1991
• MPEG-2: 1994
   – MPEG-2 is MPEG-1 compatible, hence only MPEG-2 is used today
• MPEG is NOT an algorithm but rather a framework
  with several algorithms and MANY user-settings.
   – Fixed protocol, hence fixed decoders (encoder not specified! )
   – Asymmetrical codec: encoding ~100 times more demanding than decoding ( JPEG ~1:1 )
• MPEG compression is lossy
                       MPEG-1
• MPEG-2 is an ”add-on” to MPEG-1
• Typical bit rate for MPEG-1 = 1.5Mbps
   – Meaning that an MPEG-1 decoder can decode and show
     real-time video that has been compressed to 1.5Mbps.
     MPEG: Trade off between video quality and bandwidth
• Allows resolutions up to 4095 x 4095 at 60Hz
   – Most used is the CPB (constrained parameter bit stream)
      • Fixed resolutions and frame rates =>
        HW implementations
      • Max. resolution = 768 x 576 at 30Hz
      • Max. bit rate = 1.856Mbps
          MPEG-1 compression rate
•   BT.601 (digital TV-signal):
•   704 x 576 x 24bit x 25Hz = 243Mbps
•   Compression factor: 243Mbps / 1.5Mbps = 162
•   JPEG = 10-20
•   YCbCr 4:2:0 format: 12 bit per pixel
•   Basic operation: down-scale to SIF (source input format)
    – Fixed resolution => HW solutions
    – 352 x 288 (ignore lines and/or interpolate)
• 352 x 288 x 12 x 25Hz = 30.4Mbps => comp. factor = 20
• But can be higher or lower
• In general: Less input data => better image quality (for
  fixed bit rate)
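The compression factors can be reproduced as below. One assumption is made: SIF is taken as 352 x 288, i.e. half of 704 x 576 in each direction, which is the resolution that yields the 30.4Mbps figure. The names are illustrative.

```python
MPEG1_BIT_RATE = 1.5e6  # typical MPEG-1 bit rate in bit/s

def raw_bit_rate(width, height, bits_per_pixel, frame_rate):
    """Uncompressed bit rate in bit/s."""
    return width * height * bits_per_pixel * frame_rate

bt601 = raw_bit_rate(704, 576, 24, 25)  # digital TV signal, 24 bit RGB
sif = raw_bit_rate(352, 288, 12, 25)    # assumed SIF size, YCbCr 4:2:0

print(f"BT.601: {bt601 / 1e6:.0f} Mbps, factor {bt601 / MPEG1_BIT_RATE:.0f}")
print(f"SIF:    {sif / 1e6:.1f} Mbps, factor {sif / MPEG1_BIT_RATE:.0f}")
```

Down-scaling and the 4:2:0 format alone account for a factor of 8; the remaining factor of about 20 has to come from the coding itself.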
        MPEG-1 principle (1)
• Full-motion-compensated DCT and
  difference coding
• Frames: 1,2,3,4,5,6,7,8,9, …
• 1: (DCT-JPEG)
• 2,3,4,5,6,7,8,9, … : difference coding
  – The difference is DCT coded and quantized =>
    lossy compression
  – Problems?
  – Error propagation
  – No random access
               MPEG-1 principle (2)
• I-picture: intra-coded
   – Similar to JPEG
• P-picture: predictive
  coded via forward prediction
• B-picture: predictive coded via:
   – forward-, backward-, or bi-directional prediction
• Errors in I and P are limited to max one GOP (group of pictures)
• Errors in B are limited to one picture
• High N (pictures per GOP) and M (distance between successive I/P pictures) => good coding but error propagation.
   – Usually: 13<N<16 and 0<M<4
   – Recommended: I each ½ sec. and whenever scene changes
• Coding order vs. visualisation order
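Coding order differs from visualisation order because a B-picture can only be decoded once the I- or P-picture that follows it in display order is available. A minimal sketch of the reordering, with a hypothetical function name; it simply moves every I- or P-picture in front of the B-pictures that precede it, and ignores GOP boundary rules.

```python
def coding_order(display_types):
    """Return display indices in the order the pictures would be coded."""
    order, pending_b = [], []
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append(index)   # wait for the next I- or P-picture
        else:
            order.append(index)       # the I/P picture is coded first ...
            order.extend(pending_b)   # ... then the B-pictures before it
            pending_b = []
    order.extend(pending_b)           # trailing B-pictures, if any
    return order

display = "IBBPBBPBBI"
order = coding_order(display)
print(order)                               # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
print("".join(display[i] for i in order))  # IPBBPBBIBB
```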
• [Figure: data hierarchy: entire sequence => pictures (type: I, P, B) => MB (macro block) => blocks]
• Macro block in 4:2:0-format: Y 16 x 16, Cb 8 x 8, Cr 8 x 8 => 6 blocks of 8 x 8
            Coding one Block (8x8)
• Similar to JPEG except for adaptive quantization
  – DCT, quantization, zig-zag scan, entropy coding
  – Adaptive quantization controls the quality/amount of data
  – Intra vs. Inter coding:
     • I-blocks: Intra
     • P,B-blocks: Depending on DIFF: 0, motion vectors, Inter, Intra.
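The DCT, quantization and zig-zag stages for one block can be sketched in plain Python. Several simplifications are assumed: an orthonormal DCT-II, a single uniform quantizer step in place of adaptive quantization, and no entropy coding. The function names are illustrative.

```python
import math

def dct_2d(block):
    """Orthonormal 2D DCT-II of an n x n block (list of rows)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    return [[scale(u) * scale(v) * sum(
                block[y][x]
                * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                for y in range(n) for x in range(n))
             for v in range(n)]
            for u in range(n)]

def quantize(coefficients, step):
    """Uniform quantization: a larger step gives less data and lower quality."""
    return [[round(c / step) for c in row] for row in coefficients]

def zigzag(block):
    """Read an n x n block along anti-diagonals, alternating direction."""
    n = len(block)
    positions = sorted(
        ((y, x) for y in range(n) for x in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[y][x] for y, x in positions]

flat = [[16] * 8 for _ in range(8)]  # a block without any detail
scanned = zigzag(quantize(dct_2d(flat), step=16))
print(scanned[:4], sum(1 for value in scanned if value != 0))
```

For a flat block only the first (DC) coefficient is non-zero after quantization, so the zig-zag scan ends in a long run of zeros that is cheap to entropy code.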
     Coding one Block (8x8)
• Encoding [block diagram not reproduced]
• Decoding [block diagram not reproduced]
                   What to remember
• Video compression is done by removing the temporal redundancy
• Principle: (at block level)
   – Step 1: Motion analysis => motion vector
   – Step 2: Calculate the error/difference (subtraction)
   – Step 3: Entropy encoding of motion vector and difference
• Motion analysis:
   – Pel-recursion
   – Optical flow
   – Block matching (the currently applied method)
• Block matching
   – Block of pixels (16 x 16)
   – Similarity measure
   – Search region
   – Different search strategies to avoid the full search
                  What to remember
• Video compression is done by removing the temporal redundancy
• Principle: (at (macro)block level)
   – Step 1: Motion analysis (block matching) => motion vector
   – Step 2: Calculate the error/difference (subtraction)
   – Step 3: ’JPEG’-coding (DCT, quantization and entropy encoding)
• MPEG-1:
   – Bit rate ~1.5Mbps
   – Asymmetrical codec ~ 100:1 ( JPEG ~1:1 )
   – Compression rate < 400 (down scaling + YCbCr 4:2:0 => ~20)
   – Coding-style: I B B P B B P B B I
• Questions?
• Presentations: email me tbm@cvmt.dk
• The end
Xtras
                Pel-recursion (1)
• The block consists of only one pixel (= pel)
• Problem formulation:
   – Displaced frame difference function:
   – DFD(x,y,dx,dy) = i(x,y,t) – i(x-dx,y-dy,t-1)
   – Find (dx,dy) which minimises DFD^2 =>
     most similar pixel => best displacement vector
• Solution:
   – Setting the partial derivatives = 0
   – Non-linear programming problem:
      •   Iterative algorithm
      •   Steepest descent method
      •   Newton-Raphson’s method
      •   others
                 Pel-recursion (2)
• Algorithm:
• Find the motion vector (dx,dy) for the first pixel
• The motion vectors
  are correlated =>
  – Use ’old’ (dx,dy) as
    initial guess for the
    iterative algorithm =>
    recursion
               Optical flow
• The block consists of only one pixel
• Similar to pel-recursion but calculated in a
  different manner
Comparing the 3 types of motion analysis
• The three: pel-recursion, optical flow and block matching
• Optical flow and pel-recursion calculate one motion
  vector for each pixel =>
   – More precise => predicted block and current block are more
     similar => smaller difference => more compact coding of the
     difference.
   – More overhead as more motion vectors are to be coded
   – More complex to calculate
   – Pixel methods avoid the block artefacts of block matching
• Block matching is (at present) more suitable
   – Used in all coding standards
          Temporal methods
• Two methods which exploit both the spatial
  and temporal redundancies
  – Frame replenishment
  – Motion compensation
• Both utilise prediction => short summary
        Frame replenishment (1)

• Exploit the temporal redundancy
• First generation of temporal compression method
• If: value changed significantly:
  | i(x,y,t) – i(x,y,t-1) | > TH
• Then: code value and position: i(x,y,t) and (x,y)
• Else: code nothing => re-use i(x,y,t-1)
• Enhancements:
   – Send differences instead of values
   – Remove noise from the images prior to processing
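The if/then/else rule can be sketched as follows. The function names and the (y, x, value) update format are assumptions for this example; a real coder would compare against the frame as reconstructed by the decoder rather than the original previous frame.

```python
def replenish_encode(current, previous, threshold):
    """Code only the pixels that changed by more than the threshold TH."""
    return [(y, x, value)
            for y, row in enumerate(current)
            for x, value in enumerate(row)
            if abs(value - previous[y][x]) > threshold]

def replenish_decode(previous, updates):
    """Re-use the previous frame and overwrite the coded pixels."""
    frame = [row[:] for row in previous]
    for y, x, value in updates:
        frame[y][x] = value
    return frame

previous = [[10, 10], [10, 10]]
current = [[10, 11], [50, 10]]

updates = replenish_encode(current, previous, threshold=5)
print(updates)                              # [(1, 0, 50)]
print(replenish_decode(previous, updates))  # [[10, 10], [50, 10]]
```

The small change from 10 to 11 is below the threshold and is never transmitted, so the decoded frame differs slightly from the original.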
         Frame replenishment (2)
• A fixed bit rate of 1Mbps means that the decoder
  can only decode and play-back real-time video
  compressed to 1Mbps
• Many changes between two images =>
  many pixels to be coded.
• To achieve the same bit rate
  => TH is higher
  => only large changes are coded
  => poorer reconstruction
  aka. the dirty window effect
        2D logarithmic search
• Test 5 points within a fixed pattern
• Centre the pattern around the best match
• When best match is in the centre or on the
  border: reduce distance in pattern
      Conjugate direction search
• Step 1: Test 3 horizontal points next to each other
• Step 2: Move to minimum point
• Continue step 1 and 2 until a minimum is
  found. Then repeat the process in the vertical
  direction
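A sketch of the conjugate direction search, assuming the first pass runs along the horizontal direction and the second along the vertical one, each moving one pixel at a time. `cost(dy, dx)` is a hypothetical callable returning the similarity measure for a displacement.

```python
def conjugate_direction_search(cost, max_disp=7):
    """Minimise along the horizontal direction, then along the vertical."""

    def line_search(start, direction):
        current = start
        while True:
            # Three points next to each other along the given direction.
            points = [current] + [(current[0] + s * direction[0],
                                   current[1] + s * direction[1])
                                  for s in (-1, 1)]
            points = [(y, x) for y, x in points
                      if abs(y) <= max_disp and abs(x) <= max_disp]
            best = min(points, key=lambda mv: cost(*mv))
            if best == current:  # minimum along this direction found
                return current
            current = best       # move to the minimum point and repeat

    after_horizontal = line_search((0, 0), (0, 1))
    return line_search(after_horizontal, (1, 0))

def toy_cost(dy, dx):
    """Stand-in for SSD/MAD: a single minimum at displacement (3, -2)."""
    return (dy - 3) ** 2 + (dx + 2) ** 2

print(conjugate_direction_search(toy_cost))  # (3, -2)
```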
    YCbCr color representation (2)
           Y =  0.299 R + 0.587 G + 0.114 B
           U = -0.147 R - 0.289 G + 0.436 B
           V =  0.615 R - 0.515 G - 0.100 B

• YUV-representation can have negative values,
  so YUV-representation is scaled and shifted to
  avoid this => YCbCr-representation
• Cb and Cr are denoted the chrominances
• YCbCr is the representation utilised in
  image/video compression
      Y  =  0.257 R + 0.504 G + 0.098 B + 16
      Cb = -0.148 R - 0.291 G + 0.439 B + 128
      Cr =  0.439 R - 0.368 G - 0.071 B + 128
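The scaled and shifted representation is a matrix product plus an offset. A small sketch using the coefficients above (the names are illustrative): black and white land on Y = 16 and Y = 235, with both chrominances at the neutral value 128.

```python
# Rows: Y, Cb, Cr. Columns: R, G, B. Offsets are added afterwards.
MATRIX = [( 0.257,  0.504,  0.098),
          (-0.148, -0.291,  0.439),
          ( 0.439, -0.368, -0.071)]
OFFSET = (16, 128, 128)

def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB triple to (Y, Cb, Cr) as floats."""
    return tuple(wr * r + wg * g + wb * b + offset
                 for (wr, wg, wb), offset in zip(MATRIX, OFFSET))

for rgb in [(0, 0, 0), (255, 255, 255), (255, 0, 0)]:
    print(rgb, "->", tuple(round(c) for c in rgb_to_ycbcr(*rgb)))
```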
                 Audio in MPEG-1
•   16 bit sampled at: 16, 22.05, 24, 32, 44.1 and 48kHz
•   Stereo at 44.1kHz = 1.4Mbps
•   Compression based on psycho-acoustic redundancy [Figure: dB vs. Hz plots]
•   Three methods:
    – Layer 1: Target rate = 384Kbps
    – Layer 2: Target rate = 256Kbps
    – Layer 3: Target rate = 128Kbps
• Layer 3 is the most advanced
  and often applied
    – It has a nickname, which?
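The raw audio rate and the compression each layer has to deliver can be worked out as below, reading the sampling rate as 44.1 kHz and taking the target rates from the list above. The names are illustrative.

```python
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2  # stereo

raw_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS  # about 1.4 Mbps

LAYER_TARGETS_KBPS = {1: 384, 2: 256, 3: 128}
factors = {layer: raw_bps / (kbps * 1000)
           for layer, kbps in LAYER_TARGETS_KBPS.items()}

print(f"raw: {raw_bps / 1e6:.2f} Mbps")
for layer, factor in factors.items():
    print(f"Layer {layer}: compression factor {factor:.1f}")
```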
                          MPEG-2
•   Defined in 1994
•   Developed for DTV but has lots of other applications
•   Based on MPEG-1 (backward compatible)
•   Bit rates: 1.5Mbps – 60Mbps. Target: 2-15Mbps (best: 4)
•   Lots of new features including:
    – Support for fields, support for 4:4:4 and 4:2:2
    – Alternative zig-zag scan, better motion vectors
    – Scalability to allow any subset of a stream to be decoded and
      visualised, etc.
• MPEG-3: Purpose: HDTV
    – Merged with MPEG-2 => no MPEG-3 standard
                           MPEG-4
• Both for real video and synthetic video
• Very low bit rates < 64Kbps => efficient coding
• Content based coding: code the objects
   – Shape, texture and sprite (background objects)
• Interactivity
• Popular coding
  standards: