# VIDEO COMPRESSION FUNDAMENTALS AND ALGORITHMS

```
VIDEO COMPRESSION
FUNDAMENTALS, part 2

Pamela C. Cosman

1
Extra flavors and refinements

   Many different variations/improvements
possible for motion compensation
   Increased accuracy of motion vectors
   Unrestricted motion vectors
   Multiple frame prediction
   Variable sized blocks
   Motion compensation for objects

2
Accuracy of Motion Vectors
   Digital images are sampled on a grid. What if the actual
motion does not move in grid steps?
   Solution: interpolation of grid points in reference frame
adds a half-pixel grid
   Reference frame effectively has 4 times as many
positions for the best match block to be found

   A   h   B
   v   m
   C       D

   h = (A + B)/2
   v = (A + C)/2
   m = (A + B + C + D)/4

3
Unrestricted Motion Vectors

   Suppose the camera is panning to the left

[Figure: reference frame and current frame, camera panning left;
lower-left macroblock marked]

   Now consider the lower left macroblock in the
current frame.
   What is the best match for it in the reference frame?

4
Unrestricted Motion Vectors
   If the macroblock were allowed to hang over the
edge, then the best match would be like this:

[Figure: best match for the lower-left macroblock hangs over the
edge of the reference frame]
   But then the motion vector is pointing outside the
frame!
   The encoder and decoder can agree on some
standard extrapolation (padding) to deal with this case

5
Unrestricted Motion Vectors

[Figure: reference frame with replicated edge pixels; current frame]
   The edge pixels in the reference frame are just replicated
outside the frame, for as many extra rows and columns as necessary
   In this way, a motion vector pointing outside the frame is
acceptable. Can get better matches!
6
Arbitrary Multiple Reference Frames

   In H.261, the reference frame for prediction is always
the previous frame
   In MPEG and H.263, some frames are predicted from
both the previous and the next frames (bi-prediction)
   In H.264, any frame may be designated to be used as
reference:
 Encoder and decoder maintain synchronized buffers
of available frames (previously decoded)
 Reference frame is specified as index into this buffer

7
Multiple Frame Prediction

   H.264 allows multiple frames to be used as references

8
Some Advantages of Multiple References
   If an object leaves the scene and then comes back,
there can be a reference for it in the long-term past
   Similarly, if the camera pans to the right, and
then back to the left, then the scene that
reappears has a reference
   If there’s an error, and the receiver sends
feedback to say where the error is, then the
encoder can use another reference frame
   Helpful even if there’s no feedback

9
Variable Block-Size MC

   Motivation: size of moving/stationary objects is
variable
   Many small blocks may take too many bits to encode
   Few large blocks give lousy prediction
   Choices: In H.264, each 16x16 macroblock may be:
   Kept whole, or
   Divided horizontally (vertically) into two sub-blocks of size
16x8 (8x16)
   Divided into 4 sub-blocks (8x8)
   In the last case, each 8x8 sub-block may be divided once
more into two 8x4, two 4x8, or four 4x4 blocks.

10
H.264 Variable Block Sizes

Tree-Structured Motion Compensation
   Macroblock partitions:       16x16   8x16   16x8   8x8
   Sub-macroblock partitions:   8x8     8x4    4x8    4x4

11
Motion Scale Example
T=1             T=2

12
H.264 Variable Block Size Example
T=1                 T=2

13
Variable Output Rate
   Suppose the control parameters of a video encoder are
kept constant:
      Quantization parameter
      Motion estimation search window size, etc.
   Then the # of coded bits per macroblock (and per frame)
will vary
   Typically, more bits are produced when there is high
motion or fine detail
   Example: # of bits per frame varies from 1300 to 9000
(32-225 kbits per second)
[Plot: bits per frame (1000 to 9000) vs. frame number (0 to 200)]

14
Rate Control

   Streams are usually coded for target rates,
for example, 3 Mbit/second
   How are bits allocated among frames?
   Macroblocks in I-frames are all intra coded
   Macroblocks in P/B frames can be coded as:
   Intra (DCT blocks)
   Motion vectors only
   Motion vectors and difference DCT blocks
   Nothing at all (skipped)

15
Rate Control
   The frames will have differing numbers of bits

   This variation in bit rate can be a problem for many
practical delivery and storage mechanisms
   Constant bit rate channel (such as a circuit-switched
channel) cannot transport a variable-bitrate data stream
   Even a packet-switched channel is limited by link rates and
congestion at any point in time

16
Constant rate channel

   The variable data rate produced by an encoder can
be smoothed by buffering prior to transmission

ENCODER → Buffer → constant-rate channel → Buffer → DECODER
(variable bit rate output from encoder; constant rate
channel; variable bit rate input to decoder)
   First In/First Out (FIFO) buffer at the output of the
encoder; another one at the input to the decoder
   Decoder buffer is filled at the constant channel rate and
emptied by the decoder at a variable rate

17
Decoder Buffer Contents
   Takes 0.5 sec before first complete coded frame is
received; first frame decoded
   Then, decoder can extract and decode frames at correct
rate of 25 fps until…
   At about 4 sec, buffer empties, decoder stalls (pauses
decoding)
   Problem: video clip freezes until more data arrives
   Partial solution: add deliberate delay at decoder (e.g.,
1 sec delay to decode frame 1, allow buffer to reach higher
fullness)
[Plot: decoder buffer contents vs. time (0 to 9 seconds), stall marked]

18
Variable Bit Rate

   Example shows that variable coded bit rate can
be adapted to a constant bit rate delivery
medium using buffers. This entails
   Cost of buffer storage space
   Delay
   It is not possible to cope with arbitrary variation of bit
rate using this method, unless the buffer size and
decoding delay are allowed to get arbitrarily large.
   So… encoder needs to keep track of buffer
fullness…
19
Rate Control
   Goal: with the transmission system at the
target rate for the video sequence, the
encoder & decoder buffers of fixed size never
overflow or underflow
   This is the problem of rate control
   MPEG does not specify how to achieve this
   In addition to preventing overflow/underflow,
the rate control algorithm should also make
the sequence look good

20
Choice of Rate Control Algorithm

 Choice of rate control depends on application
1) Offline encoding of video for DVD storage
   Processing time not a constraint
   Complex algorithm can be employed
   Two-pass encoding:
   Encoder collects statistics about the video in the 1st pass
   Encoder encodes the video on the 2nd pass
   Goal is to “fit” the video on the DVD while:
   maximizing the overall quality of the video
   preventing buffer overflow or underflow during decoding
21
Choice of Rate Control

   2) Encoding of live video for broadcast
   One encoder and multiple decoders
   Decoder processing and buffering are limited
   Encoder may use expensive fast hardware
   Delay of a few seconds usually OK
   Medium-complexity rate-control algorithm
   Perhaps two-pass encoding of each frame

22
Choice of Rate Control

   3) Encoding for two-way videoconferencing
   Each terminal does both encoding and decoding
   Delay must be kept to a minimum (say <0.5 sec)
   Low-complexity rate control
   Buffering minimized to keep delay small
   Encoder must tightly control output rate
   This may cause the output quality to vary
significantly, e.g., may drop when there is
increased movement or detail in the scene

23
Rate Control

   Various possible approaches to rate control
   For example, calculate a target bit rate Ri for
a frame based on
   The number of frames in the group of pictures
   The number of bits available for the remaining
frames in the group
   The maximum acceptable buffer contents
   The estimated complexity of the frame

24
Rate Control: Example Algorithm
   Let S be the mean absolute value of the difference
frame after motion compensation (a measure of frame
complexity)
   Model: R = X1*S/Q + X2*S/Q^2
   Calculate S for the frame
   Compute the quantizer step size Q using the model
   Encode the current frame using parameter Q
   Update the model parameters X1 and X2 based on the
actual number of bits generated for the frame
   There are also macroblock-level rate control algorithms
when “tight” rate control is needed

25
Standards

   Standards Groups (MPEG, VCEG)
   H.261: Videophone/videoconferencing (1990)
   MPEG-1: Low bit rates for dig. storage (1992)
   MPEG-2: Generic coding algorithms (1994)
   H.263: Very low bit rate coding (1995)
   MPEG-4: Flexibility and computer vision
approaches (1998)
   H.264: Recent improvements (2003)

26
Advantages/Disadvantages

   Disadvantages of standardization:
      Improvements in price and performance come from battle
to create and own proprietary approach
      Proprietary codecs generally exhibit higher quality
than a standard
      Standards are slow moving, developed by committee, try
to avoid patents
   Advantages of standardization:
      Interoperability
      Different platforms supported
      Vendors can compete for improved implementations
      Worldwide technical community can build on each other’s
work
      Several standards have been hugely successful

27
H.261: real-time, low complexity, low delay
   Motivated by the definition and planned deployment of
ISDN (Integrated Services Digital Network)
   Rate of p*64 kbits/s where p is integer 1…30
   For example, p=2 → 128 kbits/s with video coding at
112 kbits/s and audio at 16 kbits/s
   Applications: videophone, videoconferencing
   Videoconferencing compression:
      Operate in real time
      Not much coding delay
      Low complexity
      No particular advantage to shifting the complexity onto
encoder or decoder (each user will require both encoding
and decoding capabilities)

28
H.261 Basics
   Standardization started 1984, finished 1990
   Uncompressed CIF (352x288, 4:2:0 chroma subsampling,
15 frames per sec.) requires about 18.3 Mbps
   To get this down to p x 64 kbps requires about 10:1
up to 300:1 compression
   H.261 achieves compression using the same
basic elements discussed before:
   Motion compensation (for temporal redundancy)
   DCT + Quantization (for spatial redundancy)
   Variable length coding (run-length, Huffman)
29
H.261 Motion Compensation
   Motion compensation done on macroblocks of size
16 x 16, same as MPEG-1 and -2
   However, consider application fields:
videoconferencing, videophone
   A call is set up, conducted, and terminated.
   These events always occur together, in sequence
   Don’t need random access into the video
   Need low delay
   Also, expect slow-moving objects
   Question: What features should these facts lead to?

30
H.261 Motion Estimation
   Slow movement: For each block of pixels in
the current frame, the search window is only
± 15 pixels in each direction

[Figure: ±15-pixel search window around the block position in
T=1 (previous frame); block in T=2 (current frame)]
31
H.261 Motion Compensation
   No B pictures: don’t want the delay or
complexity associated with them
   H.261 uses forward motion compensation
from the previous picture only
   First frame is Intra-frame. NO frame after
that has to be Intra. Every subsequent frame
may use prediction from the one before
   This means that to decode a particular frame in
the sequence, it is possible that we will have to
decode from the very beginning. No random
access.
32
ISO MPEG
   Originally set up in 1988, committee had 3 work items:
      MPEG-1: targeted at 1.5 Mbps
      MPEG-2: targeted at 10 Mbps
      MPEG-3: targeted at 40 Mbps
   Later, it became clear that algorithms developed for MPEG-2 would
accommodate higher rates, so the 3rd work item was dropped
   Later MPEG-4 added
   Goals:
   MPEG-1: compression of video/audio for CD playback
   MPEG-2: storage and broadcast of TV-quality audio and video
   MPEG-4: coding of audio-visual objects
   Also MPEG-7 and MPEG-21 which are about multimedia content and
not compression

33
MPEG-1 Audiovisual coder for digital
storage media
   Goal: Coding full-motion video & associated audio at
bit rates up to about 1.5 Mbps
   Brief history of MPEG-1
   October 1988: working group formed
   September 1989: 14 proposals made
   October 1989: video subjective tests performed
   March 1990: simulation model
   November 1992: international standard
   Solution to a specific problem:
   Compress an audio-video source (~210 Mbps) to fit into a
CD-ROM originally designed to handle uncompressed
audio alone (requires aggressive compression 200:1)

34
MPEG-1 major differences

   Unlike videoconferencing, for digital storage
media, random access capability is important
   INTRA frames
   In order to avoid a long delay between the frame a
user is looking for, and the frame where decoding
starts, INTRA frames should occur frequently
   But then the coding efficiency goes down
   Improve compression efficiency using B frames

35
B Frames
   B pictures: forward, backward, & interpolatively motion
compensated from previous/next I/P frames
   Bidirectionally predicted blocks allow effective prediction
of uncovered background
   Bidirectional prediction can reduce noise (if good
predictions are available both past and future)
   B pictures not used for prediction → substantial
reduction in bits (I:P:B 5:3:1)
   Increases motion estimation complexity in 2 ways:
      Search 2 frames
      Search bigger window if anchor frame farther away

36
MPEG-2 Generic Coding Algorithms

   Goal: digital video transmission in range 2-15 Mbps
   Generic coding algorithms to support:
   Digital storage media, existing TV (PAL, SECAM, NTSC),
cable, direct broadcast satellite, HDTV, computer graphics,
video games
   Brief history:
   July 1990: working group established
   Nov 1991: Subjective tests on 32 proposals
   March 1993: technical contents of main level frozen
   Nov 1994: international standard (parts 1-3)

37
Main differences MPEG-1 and -2

   MPEG-2 aimed at higher bit rates
   Can be used for larger picture formats
   MPEG-2 has a wider range of bit rates
   Tool kit approach allows use of different subsets
of algorithms
   MPEG-2 supports scalable coding
   SNR scalable, spatially scalable
   MPEG-2 supports interlacing
   This permeates everything: motion compensation,
DCTs, ZigZag ordering for variable length coding

38
Overview of MPEG-4 Visual

   MPEG-4 Visual is meant to handle many
types of data, including
   Moving video (rectangular frames)
   Video objects (arbitrary-shaped regions of moving
video)
   2D and 3D mesh objects (representing
deformable objects)
   Animated human faces and bodies
   Static texture (still images)

39
Video Objects

   MPEG-4 moves away from traditional view of video
as a sequence of rectangular frames
   Instead, collection of video objects
   A video object is a flexible entity that a user can
access (seek, browse) and manipulate (cut, paste)
   A video object (VO) is an arbitrarily-shaped area of
scene that may exist for an arbitrary length of time
   An instance of a VO at a particular time is called a
video object plane (VOP)
   Definition encompasses traditional view of
rectangular frames too

40
MPEG-4: Object-based motion
compensation

T=1                  T=2
41
Static Sprite Coding
   Background may be coded as a static sprite
   The sprite may be much larger than the visible area
of the scene

Source: http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm

42
Global Motion Compensation
   The encoder sends up to 4 global motion vectors (GMVs) for
each VOP together with the location of each GMV in the VOP
   For each pixel position, an individual MV is calculated by
interpolating between the GMVs and the pixel position is
motion compensated according to this interpolated vector

[Figure panels: GMVs and interpolated vector; GMC compensating
for rotation; GMC compensating for camera zoom]

43
Global Motion Estimation between 2
images assuming 2d affine motion

Compression example: error images before and after global motion compensation
(Soccer sequence: global motion estimation between 1st and 10th frame)

44
Coding Synthetic Visual Scenes
   Animated 2D mesh coding
      A 2-D mesh is made up of triangular patches
      Deformation or motion can be modelled by warping the
triangles
      Surface texture may be compressed as static texture
      Mesh and texture information might both be transmitted
for key frames
   No texture transmitted for intermediate frames
      Mesh parameters transmitted
      Decoder animates mesh

45
Motion Vectors for Meshes

   A mesh is warped by transmitting vectors which displace
the nodes
   Mesh MVs are predictively coded
   Texture residual can be coded with a very small number
of bits
   MPEG-4 also allows 3-D meshes
      The vertices need not be in one plane
      3-D mesh samples the surface of a solid body

46
Shape-Adaptive DCT
   The shape-adaptive DCT uses one-dimensional DCTs, where the
number of points in the transform matches the number of opaque
values in each column (or row)
[Figure: shift vertically → 1-D column DCT → shift horizontally
→ 1-D row DCT → final coefficients]
   More complex than normal 8x8 DCT, but improves coding
efficiency for boundary MBs

47
Face and Body Animation

   Two basic steps:
      Define basic shape of face or body model (typically
carried out once at start of session)
      Send animation parameters to animate the model
   Encoder has choice of:
      Generic facial definition parameters (FDPs)
      Custom FDPs for a specific face
   In similar way, a body object is rendered from a set of
Body Definition Parameters (BDPs) and animated using Body
Animation Parameters

48
Face Animation
   The generic face can be modified by Facial
Definition Parameters (FDPs) into a particular face
   FDP decoder creates a neutral face: one which
carries no expression
   Change expressions by moving the vertices
   Not necessary to transmit data for each vertex,
instead use Facial Animation Parameters (FAPs)
   Some combinations of vertex movements are common, e.g., mouth
shapes for speech (visemes) and expressions such as a smile,
so these are coded as high-level parameters
   Can be used alone
   Can be used as predictions for more accurate FAPs
   Resulting data rate is small, e.g., 2-3 kbps

49
H.264 Brief history

   The work started in VCEG (in 1998) as a parallel
activity with the final version of H.263
   First test model produced in 1999. Many small
steps over the next 4 years:
   Many tweaks to the integer transform and to the variable
block size
   1/8 pixel accurate MVs added in and then dropped
   Many tweaks on the deblocking filter
   Etc. etc.
   Final version March 2003
   Final results: 2-fold improvement in compression
(compared to H.263 and MPEG-2) & significantly
better than MPEG-4 ASP
50
Comparison of H.264 and MPEG-4
Comparison                   MPEG-4 Visual                          H.264
Supported data types         Rectangular video frames and fields,   Rectangular video frames
                             arbitrary-shaped objects, still        and fields
                             texture and sprites, synthetic
                             objects, 2D and 3D mesh objects
# profiles                   19                                     3
Compression efficiency       medium                                 high
Support for video streaming  Scalable coding                        Switching slices
Motion comp. min block size  8x8                                    4x4
MV accuracy                  ½ or ¼ pixel                           ¼ pixel
Transform                    8x8 DCT                                4x4 DCT approx.
Built-in deblocking filter   No                                     Yes
License payments required    Yes                                    Probably no for baseline

51
Question

   A 16 by 16 MB to be
motion compensated is
shown above
   The search window is
shown below
   Which block(s) in the
search window will
provide the best match
   With MAE error metric?
   With MSE error metric?

52
Question

   A sequence of frames is being coded by an MPEG-style
coder that searches for best-match
macroblocks using full search with an MSE criterion
   Frames are I, B, B, P, B, B, …
   The camera is moving horizontally by 10 pixels per
frame during this sequence, so 30 pixels of offset
between I frame and subsequent P frame
   Many macroblocks in the P frame might get coded
using MV=(30,0) with no difference block
   Why might some MBs not get coded precisely this
way? List all the reasons you can think of.

53

```
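The half-pixel grid on the "Accuracy of Motion Vectors" slide can be sketched in a few lines of Python. This is a minimal illustration, not a codec's normative filter: the function name `half_pel_grid` is hypothetical, and it uses floating-point averages, whereas the standards define exact integer arithmetic with specific rounding.

```python
def half_pel_grid(frame):
    """Upsample a frame (list of rows of integer-pel values) to the
    half-pixel grid. Even (row, col) positions hold the original pixels;
    odd positions hold the averages h, v, m from the slide."""
    rows, cols = len(frame), len(frame[0])
    out = [[0.0] * (2 * cols - 1) for _ in range(2 * rows - 1)]
    for y in range(2 * rows - 1):
        for x in range(2 * cols - 1):
            y0, x0 = y // 2, x // 2          # top-left integer-pel neighbour (A)
            y1, x1 = y0 + y % 2, x0 + x % 2  # same pixel, or the next one (B, C, D)
            # These four terms average to A, (A+B)/2, (A+C)/2 or (A+B+C+D)/4
            out[y][x] = (frame[y0][x0] + frame[y0][x1]
                         + frame[y1][x0] + frame[y1][x1]) / 4
    return out

grid = half_pel_grid([[10, 20],
                      [30, 40]])
for row in grid:
    print(row)
# [10.0, 15.0, 20.0]
# [20.0, 25.0, 30.0]
# [30.0, 35.0, 40.0]
```

An H x W frame becomes a (2H-1) x (2W-1) grid, which is where the "about 4 times as many positions" for the best-match search comes from.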
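The "Unrestricted Motion Vectors" slides replicate the reference frame's edge pixels so that a motion vector may point outside the frame. Clamping the sample coordinates to the frame gives exactly that padding without storing the extra rows and columns. In this sketch the helper names are hypothetical, and the `(dx, dy)` order for a motion vector is an assumption chosen to match the Question slide's MV=(30,0) for horizontal motion.

```python
def ref_pixel(ref, y, x):
    """Read reference pixel (y, x); coordinates outside the frame are clamped
    to the nearest edge, which is the same as replicating the edge pixels."""
    rows, cols = len(ref), len(ref[0])
    return ref[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

def predict_block(ref, top, left, mv, size):
    """Motion-compensated size x size prediction for the block whose top-left
    corner in the current frame is (top, left). mv = (dx, dy) may point
    partly or wholly outside the reference frame."""
    dx, dy = mv
    return [[ref_pixel(ref, top + dy + r, left + dx + c) for c in range(size)]
            for r in range(size)]

ref = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
# Lower-left 2x2 block, MV pointing one column past the left edge:
print(predict_block(ref, 1, 0, (-1, 0), 2))  # [[4, 4], [7, 7]]
```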
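The "Arbitrary Multiple Reference Frames" slide says encoder and decoder keep synchronized buffers of decoded frames and name a reference by its index. The sketch below shows that idea as a simple sliding window; the class `ReferenceBuffer` is hypothetical, and H.264's actual reference management (short-term and long-term references) is more involved. Frames are stand-in numbers so the example stays tiny.

```python
from collections import deque

class ReferenceBuffer:
    """Buffer of previously decoded frames. Encoder and decoder each hold one
    and update it identically, so an index means the same frame on both sides.
    Simplified sliding window: the oldest frame is dropped when full."""

    def __init__(self, size):
        self.frames = deque(maxlen=size)

    def add(self, frame):
        self.frames.appendleft(frame)  # index 0 = most recently decoded frame

    def get(self, ref_idx):
        return self.frames[ref_idx]

    def best_index(self, cost):
        """Encoder side: index of the stored frame with the lowest cost, e.g.
        the block-matching error of the best motion vector in that frame."""
        return min(range(len(self.frames)), key=lambda i: cost(self.frames[i]))

# Toy "frames" are single numbers; the cost is the distance to the current one.
buf = ReferenceBuffer(size=3)
for frame in [50, 10, 20, 30]:
    buf.add(frame)
print(list(buf.frames))                        # [30, 20, 10]
print(buf.best_index(lambda f: abs(f - 11)))   # 2
```

Only the index (here 2) would need to be transmitted; the decoder calls `get(2)` on its own copy. This mirrors the "object leaves and comes back" case, where the best match sits in an older frame.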
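The tree-structured partitions on the "Variable Block-Size MC" and "H.264 Variable Block Sizes" slides can be enumerated directly. This hypothetical helper only lists the block geometry (each block carries its own motion vector); deciding which partition to use is an encoder choice not modelled here.

```python
# (width, height) of each partition choice named on the slide
MB_PARTITIONS = {"16x16": (16, 16), "16x8": (16, 8), "8x16": (8, 16), "8x8": (8, 8)}
SUB_PARTITIONS = {"8x8": (8, 8), "8x4": (8, 4), "4x8": (4, 8), "4x4": (4, 4)}

def tile(x0, y0, w, h, bw, bh):
    """Cover the w x h region at (x0, y0) with bw x bh blocks (x, y, bw, bh)."""
    return [(x0 + x, y0 + y, bw, bh)
            for y in range(0, h, bh) for x in range(0, w, bw)]

def partition_macroblock(mode, sub_modes=None):
    """List the motion-compensated blocks of one 16x16 macroblock; each block
    gets its own motion vector. sub_modes (4 names) applies only to mode '8x8'."""
    bw, bh = MB_PARTITIONS[mode]
    blocks = tile(0, 0, 16, 16, bw, bh)
    if mode != "8x8":
        return blocks
    sub_modes = sub_modes or ["8x8"] * 4
    result = []
    for (x, y, _, _), sub in zip(blocks, sub_modes):
        sw, sh = SUB_PARTITIONS[sub]
        result += tile(x, y, 8, 8, sw, sh)
    return result

print(partition_macroblock("16x8"))  # [(0, 0, 16, 8), (0, 8, 16, 8)]
print(len(partition_macroblock("8x8", ["8x8", "8x4", "4x8", "4x4"])))  # 9
```

The count of blocks is the count of motion vectors to send, from 1 (16x16) up to 16 (all 4x4), which is the bits-versus-prediction trade-off the slide describes.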
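The "Decoder Buffer Contents" slide shows a decoder stalling when its buffer runs dry and a deliberate start-up delay helping. A rough simulation of that behaviour follows; the function name and the simplifying assumptions (bits arrive back to back at the channel rate, decoding is instantaneous, and a late frame pushes back the schedule) are illustrative, not taken from any standard.

```python
def simulate_decoder(frame_bits, channel_bps, fps, start_delay):
    """Return (frame_index, seconds_late) for every frame that is not fully in
    the decoder buffer when it is due; each one is a stall (frozen video).

    Simplifying assumptions: the encoder buffer is never empty, so bits arrive
    back to back at channel_bps; decoding a frame takes no time; after a stall
    the decoder resumes as soon as the late frame has arrived.
    """
    stalls = []
    due = start_delay        # time at which the next frame should be decoded
    cumulative_bits = 0      # bits that must arrive before frame i is complete
    for i, bits in enumerate(frame_bits):
        cumulative_bits += bits
        arrival = cumulative_bits / channel_bps   # last bit of frame i arrives
        if arrival > due:
            stalls.append((i, arrival - due))
            due = arrival                         # decoder waits for the data
        due += 1 / fps                            # next frame one period later
    return stalls

# 25 fps over a 100 kbit/s channel carries 4000 bits per frame period;
# frame 3 is a burst of 100000 bits.
frames = [4000, 4000, 4000, 100000, 1000, 1000, 1000]
print(simulate_decoder(frames, 100_000, 25, start_delay=0.2))  # frame 3 about 0.8 s late
print(simulate_decoder(frames, 100_000, 25, start_delay=1.2))  # []
```

With the longer start-up delay the buffer has time to fill, so the same burst no longer freezes the video, at the cost of extra latency.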
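The "Rate Control" slide lists four inputs to a frame's target R_i: frames left in the group of pictures, bits left, buffer limits, and estimated complexity. The function below is one hypothetical way to combine them, not a rule from any standard (the slides note MPEG leaves rate control unspecified). It shares the remaining budget by relative complexity, then clips the result so the encoder buffer neither overflows nor underflows.

```python
def frame_target_bits(remaining_bits, remaining_frames, complexity,
                      avg_complexity, buffer_fullness, buffer_size,
                      channel_bits_per_frame):
    """Hypothetical frame-level bit target R_i.

    Start from an equal share of the bits left in the group of pictures,
    scale it by the frame's estimated complexity relative to the average,
    then clip it so the encoder buffer neither overflows nor underflows
    after this frame is added and one frame period of bits is sent.
    """
    share = remaining_bits / remaining_frames * complexity / avg_complexity
    max_bits = buffer_size - buffer_fullness + channel_bits_per_frame  # overflow limit
    min_bits = max(0, channel_bits_per_frame - buffer_fullness)        # underflow limit
    return min(max(share, min_bits), max_bits)

# 90 kbit left for 9 frames, 10 kbit sent per frame period, 30 kbit buffer
print(frame_target_bits(90_000, 9, 2.0, 1.0, 0, 30_000, 10_000))       # 20000.0
print(frame_target_bits(90_000, 9, 2.0, 1.0, 25_000, 30_000, 10_000))  # 15000
```

In the second call the nearly full buffer caps the target below the complexity-based share.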
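For the "Rate Control: Example Algorithm" slide, the model R = X1*S/Q + X2*S/Q^2 becomes a quadratic in Q once multiplied through by Q^2, so Q follows from the positive root. The slide does not say how X1 and X2 are updated; refitting them by least squares over recent frames is an assumption here, and both function names are hypothetical.

```python
import math

def quantizer_step(target_bits, s, x1, x2):
    """Solve the model R = x1*S/Q + x2*S/Q**2 for Q.
    Multiplying by Q**2 gives R*Q**2 - x1*S*Q - x2*S = 0; take the positive root."""
    a, b, c = target_bits, -x1 * s, -x2 * s
    disc = b * b - 4 * a * c
    if x2 == 0 or disc < 0:      # degenerate fit: fall back to R = x1*S/Q
        return x1 * s / target_bits
    return (-b + math.sqrt(disc)) / (2 * a)

def update_model(history):
    """Refit x1, x2 from (Q, S, R_actual) of previously coded frames by least
    squares on R/S = x1*(1/Q) + x2*(1/Q**2). Needs at least two distinct Q."""
    suu = suv = svv = suy = svy = 0.0
    for q, s, r in history:
        u, v, y = 1 / q, 1 / q ** 2, r / s
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y
    det = suu * svv - suv * suv          # determinant of the normal equations
    if det <= 1e-12 * suu * svv:
        raise ValueError("need frames coded with at least two different Q values")
    return (suy * svv - svy * suv) / det, (svy * suu - suy * suv) / det

print(quantizer_step(1500, 10, 1000, 5000))   # 10.0
```

The loop on the slide then becomes: compute S, call `quantizer_step` with the frame's target, encode, append the actual (Q, S, R) to the history, and call `update_model`.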
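The numbers on the "H.261 Basics" slide can be checked with a short calculation, assuming 8 bits per sample. With 4:2:0 sampling, each chroma plane is a quarter the size of the 352 x 288 luma plane, which gives 1.5 samples per pixel. The result is 18.25 Mbps, which the slide rounds to 18.3.

```python
def raw_bitrate(width, height, fps, bits_per_sample=8, chroma_factor=1.5):
    """Uncompressed bit rate in bits/s. chroma_factor=1.5 models 4:2:0
    sampling: two chroma planes, each a quarter the size of the luma plane."""
    return width * height * chroma_factor * bits_per_sample * fps

cif = raw_bitrate(352, 288, 15)                  # CIF luma: 352 x 288
print(f"CIF, 15 fps: {cif / 1e6:.2f} Mbps")      # CIF, 15 fps: 18.25 Mbps
for p in (1, 30):
    print(f"p={p}: {cif / (p * 64_000):.0f}:1")  # p=1: 285:1, then p=30: 10:1
```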
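The "H.261 Motion Estimation" slide uses a ±15-pixel search window, and the first Question slide asks how the MAE and MSE criteria compare. A plain full search with a selectable metric covers both. The function names are hypothetical, candidates are restricted to lie inside the frame, and `(dx, dy)` is the assumed vector order. Note that a ±15 full search evaluates (2*15+1)^2 = 961 candidate blocks per macroblock.

```python
def block_error(cur, ref, top, left, dx, dy, n, metric):
    """Mean error between the n x n current block at (top, left) and the
    reference block displaced by (dx, dy); metric is 'mae' or 'mse'."""
    total = 0
    for r in range(n):
        for c in range(n):
            d = cur[top + r][left + c] - ref[top + dy + r][left + dx + c]
            total += abs(d) if metric == "mae" else d * d
    return total / (n * n)

def full_search(cur, ref, top, left, n=16, search=15, metric="mae"):
    """Try every motion vector with |dx|, |dy| <= search whose reference block
    lies inside the frame; return (best_mv, best_error). Ties keep the first MV
    found in raster order of the search window."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy <= rows - n and 0 <= left + dx <= cols - n):
                continue  # restricted MVs: candidate must stay inside the frame
            err = block_error(cur, ref, top, left, dx, dy, n, metric)
            if best is None or err < best[1]:
                best = ((dx, dy), err)
    return best

# Synthetic frames: the current frame is the reference shifted so that the
# true motion vector is (dx, dy) = (3, -2).
ref = [[100 * y + x for x in range(48)] for y in range(48)]
cur = [[100 * (y - 2) + (x + 3) for x in range(48)] for y in range(48)]
print(full_search(cur, ref, 16, 16, n=8, search=7))  # ((3, -2), 0.0)
```

Passing `metric="mse"` instead penalizes a few large pixel errors more heavily than many small ones, which is why the two criteria can pick different blocks.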
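The "B Frames" slide lists forward, backward and interpolative prediction. The sketch below builds all three and keeps the one with the smallest sum of absolute differences. The `(a + b + 1) // 2` rounding and the SAD criterion are assumptions for illustration, and the function names are hypothetical.

```python
def predict_b(fwd, bwd, mode):
    """B-picture block prediction from the motion-compensated block of the
    previous anchor (fwd) and of the next anchor (bwd)."""
    if mode == "forward":
        return [row[:] for row in fwd]
    if mode == "backward":
        return [row[:] for row in bwd]
    # interpolative: rounded average of the two predictions
    return [[(a + b + 1) // 2 for a, b in zip(ra, rb)] for ra, rb in zip(fwd, bwd)]

def choose_b_mode(cur, fwd, bwd):
    """Pick the mode whose prediction has the lowest sum of absolute differences."""
    def sad(pred):
        return sum(abs(c - p) for rc, rp in zip(cur, pred) for c, p in zip(rc, rp))
    return min(("forward", "backward", "interpolative"),
               key=lambda m: sad(predict_b(fwd, bwd, m)))

# Noisy anchors on either side of the true block: averaging wins.
print(choose_b_mode([[10, 10]], [[8, 12]], [[12, 8]]))  # interpolative
# Uncovered background: only the next anchor shows it.
print(choose_b_mode([[10, 10]], [[0, 0]], [[10, 10]]))  # backward
```

The first case shows the noise-reduction effect: if the two predictions carry independent noise of equal variance, their average has half that variance.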
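The "Global Motion Compensation" slide describes a per-pixel motion vector interpolated from up to four GMVs. This simplified sketch assumes the four GMVs sit at the corners of the VOP and uses plain bilinear interpolation; MPEG-4's normative GMC warping equations differ in detail, and the function name is hypothetical.

```python
def interpolated_mv(x, y, width, height, gmv):
    """Per-pixel motion vector from four corner GMVs by bilinear interpolation.
    gmv maps 'tl', 'tr', 'bl', 'br' (VOP corners) to (dx, dy)."""
    u = x / (width - 1)    # 0.0 at the left edge, 1.0 at the right edge
    v = y / (height - 1)   # 0.0 at the top edge, 1.0 at the bottom edge

    def lerp(a, b, t):
        return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

    top = lerp(gmv["tl"], gmv["tr"], u)
    bottom = lerp(gmv["bl"], gmv["br"], u)
    return lerp(top, bottom, v)

# Camera zoom: corners move outward, the centre stays put.
zoom = {"tl": (-5, -5), "tr": (5, -5), "bl": (-5, 5), "br": (5, 5)}
print(interpolated_mv(50, 50, 101, 101, zoom))   # (0.0, 0.0)
print(interpolated_mv(25, 50, 101, 101, zoom))   # (-2.5, 0.0)
```

If all four GMVs are equal, every pixel gets the same vector and the model reduces to a simple pan; unequal corners let a handful of vectors describe zoom or rotation for the whole object.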
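The "Shape-Adaptive DCT" slide shifts opaque pixels up, applies column DCTs whose length matches each column's opaque count, shifts the results left, and applies row DCTs. A minimal sketch of those steps follows. It uses an orthonormal DCT-II, and the standard's exact normalization may differ; both function names are hypothetical.

```python
import math

def dct_1d(x):
    """Orthonormal N-point DCT-II, N = len(x) >= 1."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, v in enumerate(x))
        out.append(s * math.sqrt((1 if k == 0 else 2) / n))
    return out

def sa_dct(block, mask):
    """Shape-adaptive DCT of a block; mask[r][c] is True for opaque pixels.
    Returns rows of coefficients, left-aligned; the total count equals the
    number of opaque pixels."""
    rows, cols = len(block), len(block[0])
    # 1) Shift each column's opaque pixels to the top; N-point DCT per column.
    col_coefs = []
    for c in range(cols):
        column = [block[r][c] for r in range(rows) if mask[r][c]]
        col_coefs.append(dct_1d(column) if column else [])
    # 2) Row r collects coefficient r of every column that has one (a shift
    #    to the left); then a 1-D DCT of matching length along each row.
    result = []
    for r in range(rows):
        row = [cc[r] for cc in col_coefs if len(cc) > r]
        if row:
            result.append(dct_1d(row))
    return result

# An L-shaped object: 3 opaque pixels (the 0 in the corner is transparent).
coefs = sa_dct([[4, 0], [4, 4]], [[True, False], [True, True]])
print(sum(len(row) for row in coefs))   # 3
```

A boundary block thus yields exactly as many coefficients as it has object pixels, and no bits are spent on the transparent area.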