



    Pamela C. Cosman

Extra flavors and refinements

   Many different variations and improvements are
    possible for motion compensation:
       Increased accuracy of motion vectors
       Unrestricted motion vectors
       Multiple frame prediction
       Variable sized blocks
       Motion compensation for objects

Accuracy of Motion Vectors
   Digital images are sampled on a grid. What if the actual
    motion does not move in grid steps?
   Solution: interpolation of grid points in reference frame
    adds a half-pixel grid
   Reference frame effectively has 4 times as many
    positions for the best match block to be found

       A   h   B
       v   m                  h = (A + B)/2
       C       D              v = (A + C)/2
                              m = (A + B + C + D)/4

Unrestricted Motion Vectors

   Suppose the camera is panning to the left

      [Figure: reference frame and current frame during a leftward camera
       pan, with the lower-left macroblock marked]
   Now consider the lower left macroblock in the
    current frame.
   What is the best match for it in the reference frame?

Unrestricted Motion Vectors
   If the macroblock were allowed to hang over the
    edge, then the best match would be like this:

      [Figure: best-match block hanging over the left edge of the reference
       frame]
   But then the motion vector is pointing outside the
    reference frame
   The encoder and decoder can agree on some
    standard interpolation to deal with this case

Unrestricted Motion Vectors

   [Figure: reference frame with its edge pixels replicated beyond the
    frame boundary, next to the current frame]
     The edge pixels in the reference frame are just replicated
      outside the frame, for as many extra columns as necessary
     In this way, a motion vector pointing outside the frame is
      acceptable. Can get better matches!
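The edge-replication idea maps directly onto NumPy's edge padding; a minimal sketch (hypothetical helper name):

```python
import numpy as np

def pad_reference(ref, margin):
    """Replicate the reference frame's edge pixels outward by `margin`
    rows/columns, so a motion vector may point outside the original frame
    and still address well-defined sample values."""
    return np.pad(ref, margin, mode="edge")
```

For example, padding [[1, 2], [3, 4]] by one pixel gives [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]].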
Arbitrary Multiple Reference Frames

   In H.261, the reference frame for prediction is always
    the previous frame
   In MPEG and H.263, some frames are predicted from
    both the previous and the next frames (bi-prediction)
   In H.264, any frame may be designated for use as a
    reference
      Encoder and decoder maintain synchronized buffers
       of available frames (previously decoded)
      Reference frame is specified as index into this buffer
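The synchronized buffer can be sketched as a small class; this is a hypothetical illustration (real codecs add reference reordering and marking rules on top of this idea):

```python
from collections import deque

class ReferenceBuffer:
    """Sketch of the buffer of previously decoded frames. Encoder and
    decoder run the same logic, so an index into the buffer identifies
    the same reference frame on both sides."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)   # oldest frames fall off the end

    def add(self, decoded_frame):
        self.frames.appendleft(decoded_frame)  # index 0 = most recent frame

    def reference(self, index):
        return self.frames[index]
```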

Multiple Frame Prediction

   H.264 allows multiple frames to be used as references

Some Advantages of Multiple References
   If object leaves scene and then comes back,
    can have a reference for it in long term past
   Similarly, if the camera pans to the right, and
    then back to the left, then the scene that
    reappears has a reference
   If there’s an error, and the receiver sends
    feedback to say where the error is, then the
    encoder can use another reference frame
       Helpful even if there’s no feedback

Variable Block-Size MC

   Motivation: the size of moving/stationary objects varies
       Many small blocks may take too many bits to encode
       Few large blocks give lousy prediction
   Choices: In H.264, each 16x16 macroblock may be:
       Kept whole, or
       Divided horizontally (vertically) into two sub-blocks of size
        16x8 (8x16)
       Divided into 4 sub-blocks (8x8)
       In the last case, the 4 sub-blocks may be divided once
        more into 2 or 4 smaller blocks.
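A toy version of the split decision can be sketched as a quadtree driven by residual energy. The threshold and the square-splits-only restriction are simplifications for illustration (H.264 also allows the 16x8/8x16 and 8x4/4x8 shapes listed above):

```python
import numpy as np

def choose_partition(residual, min_size=4, thresh=500):
    """Toy tree-structured split decision: keep a block whole if its
    residual energy is small; otherwise split into four quadrants
    recursively (16x16 -> 8x8 -> 4x4). Returns (size, energy) leaves."""
    n = residual.shape[0]
    energy = (residual.astype(float) ** 2).sum()
    if n <= min_size or energy <= thresh:
        return [(n, energy)]
    h = n // 2
    parts = []
    for quad in (residual[:h, :h], residual[:h, h:],
                 residual[h:, :h], residual[h:, h:]):
        parts += choose_partition(quad, min_size, thresh)
    return parts
```

A flat residual stays one 16x16 block; a residual with energy concentrated in one corner splits only there, which is exactly the bit-saving behavior the slide motivates.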

H.264 Variable Block Sizes

   [Figure: tree-structured motion compensation. A 16x16 macroblock may
    be kept as 16x16 or split into 16x8, 8x16, or four 8x8 sub-blocks;
    each 8x8 sub-block may in turn be split into 8x4, 4x8, or four 4x4
    blocks]

Motion Scale Example
   [Figure: the same scene at T=1 and T=2]

H.264 Variable Block Size Example
   [Figure: the same scene at T=1 and T=2]

Variable Output Rate

   Suppose the control parameters of a video encoder
    are kept fixed:
       Quantization parameter
       Motion estimation search window size, etc.
   Then the # of coded bits per macroblock (and per
    frame) will vary
   Typically, more bits are produced when there is high
    motion or fine detail
   Example: # of bits per frame varies from 1300 to
    9000 (32-225 kbits per second)

   [Figure: bits per frame vs. frame number, varying between about 1000
    and 9000 over 200 frames]

Rate Control

   Streams are usually coded for target rates,
    for example, 3 Mbit/second
   How are bits allocated among frames?
   Macroblocks in I-frames are all intra coded
   Macroblocks in P/B frames can be coded as:
       Intra (DCT blocks)
       Motion vectors only
       Motion vectors and difference DCT blocks
       Nothing at all (skipped)

Rate Control
   The frames will have differing numbers of bits

   This variation in bit rate can be a problem for many
    practical delivery and storage mechanisms
       Constant bit rate channel (such as a circuit-switched
        channel) cannot transport a variable-bitrate data stream
       Even a packet-switched channel is limited by link rates and
        congestion at any point in time

Constant rate channel

   The variable data rate produced by an encoder can
    be smoothed by buffering prior to transmission

     ENCODER         Buffer             Buffer        DECODER

          Variable bit rate   Constant rate    Variable bit rate
          output from encoder   channel       input to decoder
   First In/First Out (FIFO) buffer at the output of the
    encoder; another one at the input to the decoder
   The decoder buffer fills at the constant channel rate
    and is emptied by the decoder at a variable rate

    Decoder Buffer Contents

   [Figure: decoder buffer fullness vs. time over 0-9 seconds; the buffer
    empties and the decoder stalls at about 4 seconds]

   Takes 0.5 sec before the first complete coded frame
    arrives; then the first frame is decoded
   Then, the decoder can extract and decode frames at
    the correct rate of 25 fps until…
   At about 4 sec, the buffer empties and the decoder
    stalls (pauses decoding)
   Problem: the video clip freezes until more data arrives
   Partial solution: add a deliberate delay at the decoder
    (e.g., 1 sec delay to decode frame 1, allowing the
    buffer to reach higher fullness)
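The stall behavior can be reproduced with a toy buffer model: bits arrive at a constant channel rate, and the decoder removes one coded frame per frame period after an initial startup delay. All names here are hypothetical, a sketch of the mechanism rather than any standard's buffer model:

```python
def simulate_decoder_buffer(frame_bits, channel_rate, fps, startup_delay_frames):
    """Return the index of the first frame that stalls the decoder
    (buffer underflow), or None if playback never stalls."""
    bits_per_tick = channel_rate / fps
    buffered = bits_per_tick * startup_delay_frames   # bits accumulated during startup
    for t, need in enumerate(frame_bits):
        if buffered < need:
            return t                                  # stall: frame not fully received
        buffered += bits_per_tick - need              # constant inflow, variable outflow
    return None
```

With a 2000-bit-per-frame channel, a burst frame of 5000 bits stalls playback unless the startup delay has built up enough slack, which is exactly the "deliberate delay" remedy above.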

Variable Bit Rate

   Example shows that variable coded bit rate can
    be adapted to a constant bit rate delivery
    medium using buffers. This entails
       Cost of buffer storage space
       Delay
   Not possible to cope with arbitrary variation of bit
    rate using this method, unless buffer size and
    decoding delay allowed to get arbitrarily large.
   So… the encoder needs to keep track of the buffer
    fullness and adjust its output rate accordingly
Rate Control
   Goal: with the transmission system at the
    target rate for the video sequence, the
    encoder & decoder buffers of fixed size never
    overflow or underflow
   This is the problem of rate control
   MPEG does not specify how to achieve this
   In addition to preventing overflow/underflow,
    the rate control algorithm should also make
    the sequence look good

Choice of Rate Control Algorithm

 Choice of rate control depends on application
1) Offline encoding of video for DVD storage
       Processing time not a constraint
       Complex algorithm can be employed
       Two-pass encoding:
           Encoder collects statistics about the video in the 1st pass
           Encoder encodes the video on the 2nd pass
       Goal is to “fit” the video on the DVD while:
           maximizing the overall quality of the video
           preventing buffer overflow or underflow during decoding
Choice of Rate Control

   2) Encoding of live video for broadcast
       One encoder and multiple decoders
       Decoder processing and buffering are limited
       Encoder may use expensive fast hardware
       Delay of a few seconds usually OK
       Medium-complexity rate-control algorithm
       Perhaps two-pass encoding of each frame

Choice of Rate Control

   3) Encoding for two-way videoconferencing
       Each terminal does both encoding and decoding
       Delay must be kept to a minimum (say <0.5 sec)
       Low-complexity rate control
       Buffering minimized to keep delay small
       Encoder must tightly control output rate
       This may cause the output quality to vary
        significantly, e.g., may drop when there is
        increased movement or detail in the scene

Rate Control

   Various possible approaches to rate control
   For example, calculate a target bit rate Ri for
    a frame based on
       The number of frames in the group of pictures
       The number of bits available for the remaining
        frames in the group
       The maximum acceptable buffer size contents
       The estimated complexity of the frame
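A minimal sketch of such a frame-level target: take a uniform share of the remaining group-of-pictures budget and weight it by estimated complexity. The weighting is a hypothetical illustration, not the formula from any particular standard:

```python
def frame_target_bits(bits_remaining, frames_remaining, complexity, mean_complexity):
    """Target bit budget for the next frame: uniform share of the
    remaining bits, scaled by relative frame complexity."""
    base = bits_remaining / frames_remaining
    return base * (complexity / mean_complexity)
```

A frame twice as complex as average gets twice the uniform share; a real controller would also clip this against the buffer-fullness limits listed above.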

    Rate Control: Example Algorithm
   Let S be the mean absolute value of the difference
    frame after motion compensation (a measure of
    frame complexity)
   Model the number of bits R produced for the frame:
       R = X1*S/Q + X2*S/Q^2
   Calculate S for the frame
   Compute the quantizer step size Q using the model
   Encode the current frame using parameter Q
   Update the model parameters X1 and X2 based on
    the actual number of bits generated for the frame
   There are also macroblock-level rate control
    algorithms when “tight” rate control is needed

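Given a target R for the frame, the model R = X1*S/Q + X2*S/Q^2 can be inverted for Q by solving the quadratic R*Q^2 - X1*S*Q - X2*S = 0 and taking the positive root (a sketch of this one step, not a full rate controller):

```python
import math

def target_quantizer(R, S, X1, X2):
    """Solve R = X1*S/Q + X2*S/Q**2 for the quantizer step Q > 0."""
    # Rearranged: R*Q^2 - X1*S*Q - X2*S = 0, quadratic in Q.
    return (X1 * S + math.sqrt((X1 * S) ** 2 + 4 * R * X2 * S)) / (2 * R)
```

As a sanity check, plugging Q = 10, S = 20, X1 = 3, X2 = 5 into the model gives R = 6 + 1 = 7, and inverting with R = 7 recovers Q = 10.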

   Standards Groups (MPEG, VCEG)
   H.261: Videophone/videoconferencing (1990)
   MPEG-1: Low bit rates for dig. storage (1992)
   MPEG-2: Generic coding algorithms (1994)
   H.263: Very low bit rate coding (1995)
   MPEG-4: Flexibility and computer vision
    approaches (1998)
   H.264: Recent improvements (2003)


   Disadvantages of standardization:
       Improvements in price and performance come
        from the battle to create and own a proprietary
        approach
       Proprietary codecs generally exhibit higher
        quality than a standard
       Standards are slow moving, developed by
        committee, and try to avoid risk

   Advantages of standardization:
       Interoperability
       Different platforms supported
       Vendors can compete for improved
        implementations
       The worldwide technical community can build on
        each other’s work
       Several standards have been hugely successful

H.261: real-time, low complexity, low delay
   Motivated by the definition and planned deployment
    of ISDN (Integrated Services Digital Network)
   Rate of p*64 kbits/s where p is an integer 1…30
   For example, p=2 → 128 kbits/s, with video coded at
    112 kbits/s and audio at 16 kbits/s
   Applications: videophone, videoconferencing

   Videoconferencing requirements:
       Operate in real time
       Not much coding delay
       Low complexity
       No particular advantage to shifting the complexity
        onto the encoder or decoder (each user will require
        both encoding and decoding capabilities)

H.261 Basics
   Standardization started 1984, finished 1990
   Uncompressed CIF (4:2:0 chrom. sampling,
    15 frames per sec.) requires 18.3 Mbps
   To get this down to p x 64 Kbps requires 10:1
    up to 300:1 compression
   H.261 achieves compression using the same
    basic elements discussed before:
       Motion compensation (for temporal redundancy)
       DCT + Quantization (for spatial redundancy)
       Variable length coding (run-length, Huffman)
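The 18.3 Mbps figure and the 10:1-300:1 range can be checked directly. CIF is 352x288 luma samples; 4:2:0 chroma adds two quarter-size planes:

```python
# Worked check of the "uncompressed CIF at 15 fps needs ~18.3 Mbps" claim.
width, height, fps = 352, 288, 15                 # CIF luma resolution
luma = width * height                             # Y samples per frame
chroma = 2 * (width // 2) * (height // 2)         # Cb + Cr at 4:2:0
bits_per_frame = (luma + chroma) * 8              # 8 bits per sample
mbps = bits_per_frame * fps / 1e6                 # about 18.25 Mbps
ratio_p1 = bits_per_frame * fps / 64_000          # p=1:  roughly 285:1
ratio_p30 = bits_per_frame * fps / (30 * 64_000)  # p=30: roughly 9.5:1
```

This matches the slide: about 18.25 Mbps raw, so p = 30 needs roughly 10:1 compression and p = 1 needs roughly 300:1.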
H.261 Motion Compensation
   Motion compensation done on macroblocks of size
    16 x 16, same as MPEG-1 and -2
   However, consider application fields:
    videoconferencing, videophone
       A call is set up, conducted, and terminated.
       These events always occur together, in sequence
       Don’t need random access into the video
       Need low delay
       Also, expect slow-moving objects
   Question: What features should these facts lead to?

H.261 Motion Estimation
   Slow movement: For each block of pixels in
    the current frame, the search window is only
    ± 15 pixels in each direction


      [Figure: previous frame (T=1) with the ±15-pixel search window, and
       the current frame (T=2) with the block being coded]
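Full-search block matching over a ±15-pixel window can be sketched as below, using the sum of absolute differences (SAD) as the matching cost. This is a hypothetical helper: H.261 standardizes the bitstream, not the search method:

```python
import numpy as np

def full_search(ref, cur_block, top, left, search=15):
    """Exhaustive search: try every offset within +/-search pixels of
    (top, left) and return the motion vector with the smallest SAD."""
    N = cur_block.shape[0]
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + N > ref.shape[0] or x + N > ref.shape[1]:
                continue                     # restricted MVs: stay inside the frame
            cand = ref[y:y + N, x:x + N].astype(int)
            sad = np.abs(cand - cur_block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

The 31x31 = 961 candidate offsets per block are why slow motion justifies the small window: a bigger window grows the search cost quadratically.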
H.261 Motion Compensation
   No B pictures: don’t want the delay or
    complexity associated with them
   H.261 uses forward motion compensation
    from the previous picture only
   First frame is an Intra frame. NO frame after
    that has to be Intra. Every subsequent frame
    may use prediction from the one before
       This means that to decode a particular frame in
        the sequence, it is possible that we will have to
        decode from the very beginning: no random access
MPEG
   Originally set up in 1988, committee had 3 work items:
       MPEG-1: targeted at 1.5 Mbps
       MPEG-2: targeted at 10 Mbps
       MPEG-3: targeted at 40 Mbps

   Later, became clear that algorithms developed for MPEG-2 would
    accommodate higher rates, so 3rd work item dropped
   Later MPEG-4 added
   Goals:
       MPEG-1: compression of video/audio for CD playback
       MPEG-2: storage and broadcast of TV-quality audio and video
       MPEG-4: coding of audio-visual objects
       Also MPEG-7 and MPEG-21 which are about multimedia content and
        not compression

MPEG-1 Audiovisual coder for digital
storage media
   Goal: Coding full-motion video & associated audio at
    bit rates up to about 1.5 Mbps
   Brief history of MPEG-1
       October 1988: working group formed
       September 1989: 14 proposals made
       October 1989: video subjective tests performed
       March 1990: simulation model
       November 1992: international standard
   Solution to a specific problem:
       Compress an audio-video source (~210 Mbps) to fit into a
        CD-ROM originally designed to handle uncompressed
        audio alone (requires aggressive compression 200:1)

MPEG-1 major differences

   Unlike videoconferencing, for digital storage
    media, random access capability is important
       INTRA frames
       In order to avoid a long delay between the frame a
        user is looking for, and the frame where decoding
        starts, INTRA frames should occur frequently
       But then the coding efficiency goes down
       Improve compression efficiency using B frames

B Frames
   B pictures are forward, backward, and interpolatively
    motion compensated from previous/next I/P frames
   Bidirectionally predicted blocks allow effective
    prediction of uncovered background
   Bidirectional prediction can reduce noise (if good
    predictions are available from both past and future)
   B pictures are not used for prediction → substantial
    reduction in bits (I:P:B  5:3:1)
   Increases motion estimation complexity in 2 ways:
       Search 2 frames
       Search a bigger window if the anchor frame is
        farther away

MPEG-2 Generic Coding Algorithms

   Goal: digital video transmission in range 2-15 Mbps
   Generic coding algorithms to support:
       Digital storage media, existing TV (PAL, SECAM, NTSC),
        cable, direct broadcast satellite, HDTV, computer graphics,
        video games
   Brief history:
       July 1990: working group established
       Nov 1991: Subjective tests on 32 proposals
       March 1993: technical contents of main level frozen
       Nov 1994: international standard (parts 1-3)

Main differences MPEG-1 and -2

   MPEG-2 aimed at higher bit rates
       Can be used for larger picture formats
   MPEG-2 has a wider range of bit rates
       Tool kit approach allows use of different subsets
        of algorithms
   MPEG-2 supports scalable coding
       SNR scalable, spatially scalable
   MPEG-2 supports interlacing
       This permeates everything: motion compensation,
        DCTs, ZigZag ordering for variable length coding

Overview of MPEG-4 Visual

   MPEG-4 Visual is meant to handle many
    types of data, including
       Moving video (rectangular frames)
       Video objects (arbitrary-shaped regions of a
        moving scene)
       2D and 3D mesh objects (representing
        deformable objects)
       Animated human faces and bodies
       Static texture (still images)

Video Objects

   MPEG-4 moves away from traditional view of video
    as a sequence of rectangular frames
   Instead, collection of video objects
   A video object is a flexible entity that a user can
    access (seek, browse) and manipulate (cut, paste)
   A video object (VO) is an arbitrarily-shaped area of a
    scene that may exist for an arbitrary length of time
   An instance of a VO at a particular time is called a
    video object plane (VOP)
   Definition encompasses traditional view of
    rectangular frames too

MPEG-4: Object-based motion

   [Figure: the same scene at T=1 and T=2, with motion compensated per
    object rather than per block]
Static Sprite Coding
   Background may be coded as a static sprite
   The sprite may be much larger than the visible area
    of the scene


Global Motion Compensation
   The encoder sends up to 4 global motion vectors (GMVs) for
    each VOP together with the location of each GMV in the VOP
   For each pixel position, an individual MV is calculated by
    interpolating between the GMVs and the pixel position is
    motion compensated according to this interpolated vector

   [Figures: the GMVs and interpolated vector field; GMC compensating
    for rotation; GMC compensating for camera zoom]
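The per-pixel interpolation between GMVs can be sketched as a bilinear blend of four corner vectors. This is a simplified stand-in for illustration; MPEG-4 defines its own warping-point equations:

```python
def pixel_mv(corner_mvs, x, y, w, h):
    """Bilinearly interpolate a per-pixel motion vector from four corner
    GMVs. corner_mvs = (top_left, top_right, bot_left, bot_right),
    each an (dx, dy) pair; (x, y) is the pixel, (w, h) the VOP size."""
    tl, tr, bl, br = corner_mvs
    ax, ay = x / w, y / h                 # fractional position in the VOP
    top = [(1 - ax) * tl[i] + ax * tr[i] for i in (0, 1)]
    bot = [(1 - ax) * bl[i] + ax * br[i] for i in (0, 1)]
    return tuple((1 - ay) * top[i] + ay * bot[i] for i in (0, 1))
```

With left corners at (0, 0) and right corners at (4, 0), the interpolated MV grows linearly from left to right, which is how a handful of GMVs can describe zoom- or rotation-like motion fields.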

Global Motion Estimation between 2
images assuming 2d affine motion

    Compression example: error images before and after global motion compensation
       (Soccer sequence: global motion estimation between 1st and 10th frame)

Coding Synthetic Visual Scenes
   Animated 2-D mesh coding:
       A 2-D mesh is made up of triangular patches
       Deformation or motion can be modelled by
        warping the mesh
       Surface texture may be compressed as a static
        image
       Mesh and texture information might both be
        transmitted for key frames
       No texture is transmitted for intermediate frames;
        only mesh parameters are transmitted, and the
        decoder animates the mesh

Motion Vectors for Meshes

   A mesh is warped by transmitting vectors which
    displace the nodes
   Mesh MVs are predictively coded
   The texture residual can be coded with a very
    small number of bits
   MPEG-4 also allows 3-D meshes:
       The vertices need not be in one plane
       A 3-D mesh samples the surface of a solid body

Shape-Adaptive DCT
    The shape-adaptive DCT uses one-dimensional DCTs, where the
     number of points in each transform matches the number of opaque
     values in the column (or row)
    Procedure: shift each column’s opaque samples vertically to the top
     and apply a 1-D column DCT; then shift horizontally and apply a 1-D
     row DCT, giving the final coefficients
    More complex than the normal 8x8 DCT, but improves coding
     efficiency for boundary MBs
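The first SA-DCT step (the vertical shift of opaque samples) can be sketched as follows; each returned count is then the length of that column's 1-D DCT. The helper is hypothetical, showing the shift only, not the transform itself:

```python
import numpy as np

def shift_opaque_up(block, mask):
    """Shift the opaque samples of each column to the top, so each
    column becomes a contiguous run. Returns the shifted block and
    the per-column run lengths (the 1-D DCT sizes)."""
    out = np.zeros_like(block)
    lengths = []
    for c in range(block.shape[1]):
        vals = block[mask[:, c], c]       # opaque samples in this column
        out[:len(vals), c] = vals         # packed to the top
        lengths.append(len(vals))
    return out, lengths
```

The analogous horizontal shift and row transforms follow the same pattern on the rows of the column-transformed block.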

Face and Body Animation

   Two basic steps:
       Define the basic shape of the face or body
        model (typically carried out once at the start
        of a session)
       Send animation parameters to animate the model
   Encoder has a choice of:
       Generic facial definition parameters (FDPs)
       Custom FDPs for a specific face
   In a similar way, a body object is rendered from a
    set of Body Definition Parameters (BDPs) and
    animated using Body Animation Parameters (BAPs)

Face Animation
   The generic face can be modified by Facial
    Definition Parameters (FDPs) into a particular face
   FDP decoder creates a neutral face: one which
    carries no expression
   Change expressions by moving the vertices
   Not necessary to transmit data for each vertex,
    instead use Facial Animation Parameters (FAPs)
   Some combinations of vectors are common in
    expressions such as a smile, so these are coded as
    single high-level parameters:
       Can be used alone
       Can be used as predictions for more accurate FAPs
   Resulting data rate is small, e.g., 2-3 kbps

H.264 Brief history

   The work started in VCEG (in 1998) as a parallel
    activity with the final version of H.263
   First test model produced in 1999. Many small
    steps over the next 4 years:
       Many tweaks to the integer transform and to the variable
        block size
       1/8 pixel accurate MVs added in and then dropped
       Many tweaks on the deblocking filter
       Etc. etc.
   Final version March 2003
   Final results: 2-fold improvement in compression
    (compared to H.263 and MPEG-2) & significantly
    better than MPEG-4 ASP
Comparison of H.264 and MPEG-4
Comparison                    MPEG-4 Visual                             H.264
Supported data types          Rectangular video frames and              Rectangular video
                              fields, arbitrary-shaped objects, still   frames and fields
                              texture and sprites, synthetic
                              objects, 2D and 3D mesh objects
# profiles                    19                                        3

Compression efficiency        medium                                    high

Support for video streaming   Scalable coding                           Switching slices

Motion comp. min block size   8x8                                       4x4

MV accuracy                   ½ or ¼ pixel                              ¼ pixel
Transform                     8 x 8 DCT                                 4 x 4 DCT approx.

Built-in deblocking filter    No                                        Yes
License payments required     Yes                                       Probably no for
                                                                        baseline profile

   [Figure: a 16x16 macroblock to be motion
    compensated, shown above, and the search
    window, shown below]
   Which block(s) in the
    search window will
    provide the best match
       With MAE error metric?
       With MSE error metric?
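The point of asking about both metrics is that they can rank candidate matches differently; a small numeric illustration with hypothetical error values:

```python
import numpy as np

def mae(a, b):
    return np.abs(a - b).mean()   # mean absolute error

def mse(a, b):
    return ((a - b) ** 2).mean()  # mean squared error

block = np.zeros(4)
cand1 = np.array([2.0, 2.0, 2.0, 2.0])   # four small errors
cand2 = np.array([0.0, 0.0, 0.0, 5.0])   # one large outlier
# MAE prefers cand2 (1.25 vs 2.0) while MSE prefers cand1 (4.0 vs 6.25),
# because squaring penalizes the single large error more heavily.
```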


   A sequence of frames is being coded by an MPEG-
    style coder that searches for best-match
    macroblocks using full search with an MSE criterion
   Frames are I, B, B, P, B, B, …
   The camera is moving horizontally by 10 pixels per
    frame during this sequence, so 30 pixels of offset
    between I frame and subsequent P frame
   Many macroblocks in the P frame might get coded
    using MV=(30,0) with no difference block
   Why might some MBs not get coded precisely this
    way? List all the reasons you can think of.

