
									  Università degli studi Roma Tre

Overview of the H.264/AVC video coding standard


                Maiorana Emanuele




                          Università degli studi Roma Tre – Introduction to H.264/AVC coding
                                                                                  Agenda

• Introduction

• Project overview & timeline

• Standardization concepts

• Codec technical design

• New Features
     Prediction
     De-Blocking
     Entropy Coding

• Profiles & levels

• Comparisons




                                                                           Introduction
H.264/AVC is the newest video coding standard developed by the
ITU-T/ISO/IEC Joint Video Team (JVT), consisting of experts from:

   • ITU-T Video Coding Experts Group (VCEG)
   • ISO/IEC Moving Picture Experts Group (MPEG)

Its design represents a delicate balance between:

   • Coding Gain (improved efficiency by a factor of two over MPEG-2)
   • Implementation complexity
   • Costs based on state of VLSI (ASICs and Microprocessors) design
     technology




                                                                           Terminology
The following terms are used interchangeably:

   •   H.26L
   •   The Work of the JVT or “JVT CODEC”
   •   JM2.x, JM3.x, JM4.x
   •   The Thing Beyond H.26L
   •   The “AVC” or Advanced Video CODEC

Proper Terminology going forward:

   • MPEG-4 Part 10 (Official MPEG Term)
       ISO/IEC 14496-10 AVC
   • H.264 (Official ITU Term)




                                                                                            History
The digital representation of television signals enabled many services
for content delivery:

    •   Satellite
    •   Cable TV
    •   Terrestrial Broadcasting
    •   ADSL and Fiber on IP

To optimize these services, there is a need for:

    • High Quality of Service (QoS)
    • Low Bit-Rate
    • Low Power Consumption

These requirements conflict with one another.

The source coding is responsible for the reduction of the bit-rate.

Example: the complete transmission of the TV signal, as in
Recommendation ITU-R BT.601, would require:

              [720 × 576 + 2 × (360 × 576)] × 25 × 8 ≈ 166 Mbit/s
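
The arithmetic of this example can be checked with a short sketch. The function name `raw_bit_rate` and its parameters are illustrative assumptions, not part of the slides; the sketch assumes one full-resolution luma plane plus two chroma planes at half the horizontal resolution, 25 frames per second and 8 bits per sample, as in the example.

```python
def raw_bit_rate(width=720, height=576, fps=25, bits_per_sample=8):
    """Uncompressed bit rate in bit/s (hypothetical helper).

    Counts one luma plane of width x height samples plus two chroma
    planes of (width / 2) x height samples each.
    """
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * height
    return (luma_samples + chroma_samples) * fps * bits_per_sample


rate = raw_bit_rate()
print(f"{rate / 1e6:.1f} Mbit/s")  # prints "165.9 Mbit/s"
```

The result, 165 888 000 bit/s, is what the slide rounds to 166 Mbit/s.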
                                                      Video Coding History
Efforts focus on maximizing coding efficiency while dealing with:

    • the diversification of network types
    • their characteristic formatting and loss/error robustness requirements.

ITU standards for video telephony:

    • H.261, H.263, H.263+

ISO MPEG standards:

    • MPEG-1: medium quality, physical support
    • MPEG-2: medium/high quality, physical and transmission support
    • MPEG-4: audio-video objects




                                                                                 Evolution (1/2)
[Figure: bit rate (Mbit/s, 0 to 6) versus year (1994 to 2005). Panel "MPEG-2 Introduction": MPEG-2 curve. Panel "MPEG-4 in Comparison": MPEG-2, MPEG-4 and H.263 curves.]

                                                                                  Evolution (2/2)
[Figure: bit rate (Mbit/s, 0 to 6) versus year (1994 to 2005). Panels "H.26L Provides Focus" and "MPEG-4 “Adopts” H.26L": MPEG-2, MPEG-4, H.26L and H.263 curves.]

                                                       Codec Defects

[Figure: blocking artifacts; panels labelled Low, Original, High]




                                              Codec Defects

[Figure: packet loss artifacts]




                                                    Video Coding History
• Early 1998: Started as ITU-T Q.6/SG16 (VCEG - Video Coding Experts
  Group) “H.26L” standardization activity

• August 1999: first draft design

• July 2001: MPEG open call for “AVC” technology: H.26L wins

• December 2001: Formation of the Joint Video Team (JVT) between VCEG
  and MPEG to finalize H.26L as a joint project similar to MPEG-2/H.262

• July 2002: Final Committee Draft status in MPEG

• March 2003: formal approval submission

• October 2004: final ITU-T and ISO approval




                                               Versions




                                JVT Project Technical Objectives
Primary technical objectives:

    • Significant improvement in coding efficiency: Average bit rate
      reduction of 50% compared to any other video standard
    • Network-friendly video representation for “conversational” (video
      telephony) and “non-conversational” (storage, broadcast or
      streaming) applications
    • Error resilient coding
    • Simple syntax specification and targeting simple and clean solutions

The scope of the standardization is only the central decoder, by:

    • imposing restrictions on the bitstream and syntax
    • defining the decoding process such that every conforming decoder
      will produce similar output when given an encoded bitstream that
      conforms to the standard
[Figure: Source → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Destination. Only the decoding stage is within the scope of the standard.]
                                                                            Applications
The new standard is designed to provide technical solutions for at least
the following application areas:

   • Broadcast over cable, satellite, Cable Modem, DSL, terrestrial, etc.
   • Interactive or serial storage on optical and magnetic devices, DVD,
     etc.
   • Conversational services over ISDN, Ethernet, LAN, DSL, wireless and
     mobile networks, modems, etc. or mixtures of these.
   • Video-on-demand or multimedia streaming services over ISDN,
     Cable Modem, DSL, LAN, wireless networks, etc.
   • Multimedia Messaging Services (MMS) over ISDN, DSL, Ethernet,
     LAN, wireless and mobile networks, etc.



How to handle this variety of applications and networks?




                                                                             H.264 Design
To address this need for flexibility and customizability, the H.264/AVC
design covers a:

   • Video Coding Layer (VCL): representation of the video content
     (performing all the classic signal processing tasks)

   • Network Abstraction Layer (NAL): adaptation of VCL representations
     in a manner appropriate for conveyance by a variety of transport
     layers or storage media
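[Figure: H.264/AVC layer structure. The Video Coding Layer produces coded macroblocks, data partitioning produces coded slices/partitions, and the Network Abstraction Layer maps them, together with control data, onto H.320, MP4FF, H.323/IP, MPEG-2, etc.]
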


                   Features enhancing coding efficiency (1)
Enhanced coding efficiency is enabled by better methods for predicting
the values of the picture content:

   • Variable block-size motion compensation with small block sizes

   • Quarter-sample-accurate motion compensation

   • Multiple reference picture motion compensation

   • Weighted prediction

   • Directional spatial prediction for intra coding

   • In-the-loop deblocking filtering




                  Features enhancing coding efficiency (2)
Enhanced coding efficiency is also enabled by high-performance
tools:

   • Small block-size transform

   • Hierarchical block transform

   • Short word-length transform

   • Exact-match inverse transform

   • Arithmetic entropy coding

   • Context-adaptive entropy coding




                                 Features enhancing Robustness
Enhanced robustness to data errors/losses and flexibility for
operation over a variety of network environments are enabled by new
design aspects:

   • Parameter set structure

   • NAL unit syntax structure

   • Flexible slice size

   • Flexible macroblock ordering (FMO)

   • Redundant pictures

   • SP/SI synchronization/switching pictures




                                      Network Abstraction Layer

The Network Abstraction Layer (NAL) is designed in order to provide
“network friendliness”, facilitating the ability to map H.264/AVC VCL data
to transport layers such as:

   • RTP/IP for any kind of real-time wire-line and wireless Internet
     services (conversational and streaming)
   • File formats, e.g. ISO MP4 for storage and MMS
   • H.32X for wireline and wireless conversational services
   • MPEG-2 systems for broadcasting services, etc.

Some key concepts of the Network Abstraction Layer are:

   • NAL Units
   • Use of NAL Units in:
        Byte stream format systems
        Packet-Transport systems
   • Parameter Sets
   • Access Units


                                                                                    NAL Units
The coded video data is organized into NAL units, each of which is
effectively a packet that contains an integer number of bytes

NAL units are classified into

    • VCL NAL units: contain the data associated to the video pictures
    • non-VCL NAL units: contain any associated additional information


Header byte: the first byte of each NAL unit; it contains an indication of
the type of data in the NAL unit. The remaining bytes contain payload data
of the type indicated by the header

Emulation Prevention Bytes: bytes inserted in the payload data to prevent
the accidental generation of a particular pattern of data called a start
code prefix

NAL unit stream: series of NAL units generated by an encoder
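
The header byte and the emulation prevention mechanism can be sketched as follows. The slides do not give the bit layout or the byte values; the sketch assumes the layout of the H.264 specification (1-bit `forbidden_zero_bit`, 2-bit `nal_ref_idc`, 5-bit `nal_unit_type`) and the usual rule of inserting a 0x03 byte after two zero bytes whenever the next byte is 0x03 or smaller. The function names are hypothetical.

```python
def parse_nal_header(header_byte):
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,
        "nal_ref_idc": (header_byte >> 5) & 0x03,
        "nal_unit_type": header_byte & 0x1F,
    }


def add_emulation_prevention(payload):
    """Insert 0x03 so the payload never contains 00 00 00/01/02/03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for byte in payload:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data):
    """Inverse operation: drop every 0x03 that follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for byte in data:
        if zeros >= 2 and byte == 0x03:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


print(parse_nal_header(0x67))
print(add_emulation_prevention(b"\x00\x00\x01").hex())
```

With these assumptions, the header byte 0x67 decodes to `nal_ref_idc` 3 and `nal_unit_type` 7, and a payload containing the bytes 00 00 01 is sent as 00 00 03 01, so it can no longer be mistaken for a start code prefix.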


                                                               Use of NAL Units
Bitstream-oriented transport systems (H.320, MPEG-2 systems)

   • Delivery of the entire or partial NAL unit stream as an ordered
     stream of bytes or bits → the locations of NAL unit boundaries need
     to be identifiable


   • In the byte stream format, each NAL unit is prefixed by a specific
     pattern of three bytes called a start code prefix

Packet-oriented transport systems (IP, RTP systems)

   • The coded data is carried in packets that are framed by the system
     transport protocol → the boundaries of NAL units within the packets
     can be established without use of start code prefix patterns


   • The NAL units can be carried in data packets without start code
     prefixes.
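
For the byte stream format, locating NAL unit boundaries amounts to searching for the start code prefix. The sketch below is a simplified illustration, not a complete byte stream parser: it assumes the three-byte prefix has the value 00 00 01 (taken from the H.264 specification, not from the slides) and treats any zero bytes immediately before a prefix as padding. The name `split_byte_stream` is hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"  # assumed value of the three-byte prefix


def split_byte_stream(stream):
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        start = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes in front of the next prefix are padding, not payload.
        units.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return units


stream = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce"
print([unit.hex() for unit in split_byte_stream(stream)])  # ['6742', '68ce']
```

In a packet-oriented system this search is unnecessary, because each packet already delimits the NAL unit it carries.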

                                                                         Parameter Sets
A Parameter Set contains information that is expected to rarely change.

Types of parameter sets:

   • sequence parameter sets: relative to a series of consecutive coded
     video pictures (coded video sequence)
   • picture parameter sets: relative to one or more individual pictures.

Parameter sets can be sent:

   •   Once            (ahead of the VCL NAL units)
   •   Many times      (to provide robustness)
   •   In-band         (same channel as the VCL NAL units)
   •   Out-of-band     (different channel)




[Figure: out-of-band transmission of parameter sets]
                                                                              Access Units
A set of NAL units in a specified form is referred to as an Access Unit.
The decoding of each access unit results in one decoded picture.

An access unit can be composed of:

    • access unit delimiter: to aid in locating the start
      of the access unit.
    • supplemental enhancement information (SEI):
      containing data such as picture timing information
    • primary coded picture: set of VCL NAL units that
      represent the samples of the video picture.
    • redundant coded pictures: for use by a decoder
      in recovering from loss or corruption
    • end of sequence: if the coded picture is the last
      picture of a coded video sequence
    • end of stream: if the coded picture is the last
      coded picture in the entire NAL unit stream



                                                                       Video Coding Layer
The VCL design follows the so-called block-based hybrid video coding
approach
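[Figure: block diagram of the hybrid encoder. The input video signal is split into macroblocks of 16x16 pixels; the prediction (intra-frame prediction or motion compensation, selected by an intra/inter switch) is subtracted from it and the residual passes through transform, scaling and quantization. An embedded decoder (scaling and inverse transform, de-blocking filter) reconstructs the output video signal used for prediction and motion estimation. Control data, quantized transform coefficients and motion data are entropy coded under the supervision of the coder control.]
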

There is no single coding element in the VCL that provides the majority of
the significant improvement in compression efficiency in relation to prior
video coding standards.
                                                             Video Coding Layer
• The picture is split into blocks.

• Intra coding of the first picture or of a
  random access point

• Inter coding for all remaining pictures
  or between random access points

• Transmission of the motion data
  as side information

• Transform of the residual
  of the prediction (Intra or Inter)

• Quantization of the transform coefficients

• Entropy coding and transmission of the
  quantized transform coefficients, together with the side information


                                       Pictures, frames, and fields
A coded picture can represent either an entire frame or a single field

A frame of video can be considered to contain two interleaved fields

    • interlaced frame: the two fields of a frame were captured at different
      time instants
    • progressive frame: the two fields were captured at the same time
      instant

The coding representation in H.264/AVC is primarily agnostic with respect
to this video characteristic




                      Adaptive frame/field coding operation
In interlaced frames with regions of moving objects, two adjacent rows
tend to show a reduced degree of statistical dependency

H.264/AVC design allows any of the following decisions for coding a frame:

   • Frame mode: combine the two fields and code them as a single frame
   • Field mode: do not combine the two fields; code them separately

The choice can be made adaptively for each frame and is referred to as
Picture Adaptive Frame/Field (PAFF) coding

Field mode:

   • Motion compensation utilizes reference fields
   • De-blocking filter is not used for horizontal edges of macroblocks

               Moving region → field mode

               Non-moving region → frame mode

                                                                                  Sampling
YCbCr color space:

   • Y is called luma, and represents brightness.
   • Cb and Cr are called chroma, and represent the deviation from gray
     toward blue and red.


H.264/AVC uses a sampling structure called 4:2:0 sampling with 8 bits of
precision per sample

Each chroma component has one fourth of the number of samples of the
luma component (half the number of samples in both the horizontal and
vertical dimensions)
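
The sample counts implied by 4:2:0 sampling can be worked out with a small sketch. The helper names are hypothetical, and the sketch assumes even picture dimensions and 8 bits per sample as stated above.

```python
def plane_sizes_420(width, height):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for 4:2:0 sampling."""
    luma = (width, height)
    chroma = (width // 2, height // 2)  # halved horizontally and vertically
    return luma, chroma


def bits_per_frame_420(width, height, bit_depth=8):
    """Bits needed for one uncompressed frame: Y plus Cb plus Cr."""
    (lw, lh), (cw, ch) = plane_sizes_420(width, height)
    return (lw * lh + 2 * cw * ch) * bit_depth


luma, chroma = plane_sizes_420(720, 576)
print(luma, chroma)                   # (720, 576) (360, 288)
print(bits_per_frame_420(720, 576))   # 4976640
```

Each chroma plane holds a quarter of the luma samples, so a 4:2:0 frame needs 1.5 times the luma sample count, which is 12 bits per pixel on average at 8 bits per sample.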




                                                Macroblocks and Slices
Each picture is partitioned into fixed-size macroblocks, each covering
16x16 samples of the luma component and 8x8 samples of each of the two
chroma components.

A slice is a sequence of macroblocks which are processed in the order of a
raster scan when not using Flexible Macroblock Ordering (FMO).

A picture is a collection of one or more slices in H.264/AVC.




Independence: each slice can be correctly decoded without use of data
from other slices.



                                            Flexible Macroblock Ordering
FMO uses the concept of slice groups


A slice group is a set of macroblocks defined by a macroblock-to-slice-group
map, specified in the picture parameter set


A slice is a sequence of macroblocks within the same slice group




   Useful for concealment in video conferencing applications
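
A macroblock-to-slice-group map can be illustrated with a minimal sketch. The checkerboard pattern with two slice groups used here is an assumption chosen for illustration; the slides do not say which map their figure shows, and the function names are hypothetical.

```python
def checkerboard_slice_group_map(mb_cols, mb_rows):
    """Assign macroblocks alternately to slice group 0 and slice group 1."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def macroblocks_in_group(group_map, group):
    """Raster-scan addresses of the macroblocks that belong to one group."""
    cols = len(group_map[0])
    return [
        y * cols + x
        for y, row in enumerate(group_map)
        for x, g in enumerate(row)
        if g == group
    ]


group_map = checkerboard_slice_group_map(4, 2)
print(group_map)                           # [[0, 1, 0, 1], [1, 0, 1, 0]]
print(macroblocks_in_group(group_map, 0))  # [0, 2, 5, 7]
```

With such a map, losing the slices of one group still leaves the horizontal and vertical neighbours of every missing macroblock available, which is what makes concealment easier.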


                                                                                Slice Types
I slice: a slice in which all macroblocks of the slice are coded using intra
prediction → it is coded exploiting only the spatial correlation

P slice: In addition to the coding types of the I slice, some macroblocks of
the P slice can also be coded using inter prediction with backward
references (I or P slices)

B slice: In addition to the coding types available in a P slice, some
macroblocks of the B slice can also be coded using inter prediction with
forward references (I, P or B slices)

The following two coding types for slices are new:

SP slice: a slice that is coded such that efficient switching between
different pre-coded pictures becomes possible

SI slice: a slice that allows an exact match of a macroblock in an SP slice
for random access and error recovery purposes


                                  Motivation for SP and SI slices
The best-effort nature of today’s networks causes variations of the
effective bandwidth available to a user

For video streaming, the server should adjust the source encoding
parameters on the fly

Representation of each sequence using multiple and independent streams



Prior video encoding standards: switching is possible only at I-frames.

H.264: identical SP-frames can be obtained even when they are predicted
using different reference frames.



                                                  Intra-frame Prediction
In all slice-coding types, the following types of intra coding are supported:

    • Intra_4x4 with chroma prediction: areas of a picture with significant
      detail
    • Intra_16x16 with chroma prediction: very smooth areas of a picture
    • I_PCM: values of anomalous picture content (accurate
      representation)

Intra prediction in H.264/AVC is always conducted in the spatial domain

IDR: picture composed of I slices only

    • can be decoded without any reference
    • no subsequent picture in the stream will require reference to
      pictures prior to IDR

Chroma samples: similar prediction technique as for the luma component
in Intra_16x16 macroblocks


                                                                               Intra_4x4 mode
                      M   A   B   C   D E F G H
                      I   a   b   c   d
                      J   e   f   g   h
                      K   i   j   k   l
                      L   m   n   o   p
                  Labelling of prediction samples(4x4)


   0 (vertical)
   1 (horizontal)
   2 (DC): every sample is predicted by (A+B+C+D+I+J+K+L)/8
   3 (diagonal down-left)



                                                                                 Intra_4x4 mode
   4 (diagonal down-right)
   5 (vertical-right)
   6 (horizontal-down)
   7 (vertical-left)




                                                                              Intra_4x4 mode
   8 (horizontal-up)




When samples E-H are not available, they are replaced by D
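
The three simplest Intra_4x4 modes (vertical, horizontal and DC) can be sketched as below, using the sample labels A to D (above) and I to L (left) from the labelling figure. The function name is hypothetical, and the `+ 4` rounding term in the DC mode is an assumption taken from the H.264 specification; the slide writes the DC value simply as a division by 8. The directional modes 3 to 8 are omitted.

```python
def predict_intra_4x4(mode, above, left):
    """Return the 4x4 prediction block as a list of four rows.

    above = [A, B, C, D]: reconstructed samples above the block
    left  = [I, J, K, L]: reconstructed samples left of the block
    """
    if mode == 0:  # vertical: each column repeats the sample above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: each row repeats the sample to its left
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:  # DC: mean of the eight neighbours, rounded to nearest
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
print(predict_intra_4x4(0, above, left)[3])  # [10, 20, 30, 40]
print(predict_intra_4x4(1, above, left)[2])  # [70, 70, 70, 70]
print(predict_intra_4x4(2, above, left)[0])  # [45, 45, 45, 45]
```

The encoder subtracts the chosen prediction block from the original samples and codes only the residual together with the mode number.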




                                   Intra_16x16 and I_PCM mode

Intra_16x16
    • Mode 0: vertical
    • Mode 1: horizontal
    • Mode 2: DC
    • Mode 3: plane (a linear “plane” function is fitted to the upper and
      left-hand samples → works well in areas of smoothly-varying luminance)




I_PCM → sends the values of the encoded samples directly, in order to
represent them precisely


                           Inter-frame Prediction in P Slices
Partitions with luma block sizes of 16x16, 16x8, 8x16, and 8x8 samples
are supported by the syntax.

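In case partitions with 8x8 samples are chosen, one additional syntax
element for each 8x8 partition is transmitted, specifying whether that
partition is further split into blocks of 8x4, 4x8, or 4x4 luma samples.

The prediction signal for each block is specified by
   • a translational motion vector
   • a picture reference index

Since each of the four 8x8 partitions can be split into four 4x4 blocks, a
maximum of sixteen motion vectors may be transmitted for a single P
macroblock.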




                                                                                         Algorithm
The encoder selects the “best” partition size for each part of the frame, to
minimize the coded residual and motion vectors.

The macroblock partitions chosen for each area are shown superimposed
on the residual frame.
    • little change between the frames (residual appears grey) → a 16x16
      partition is chosen
    • detailed motion (residual appears black or white) → smaller
      partitions are more efficient.

[Figure: residual (no motion compensation)]

                                                                       Effects (1/2)




[Figure panels: Frame Fn; Frame Fn-1; Residual (no motion compensation); Residual (16x16 block size)]

                                                                          Effects (2/2)




[Figure panels: Residual (8x8 block size); Residual (4x4 block size); Residual (4x4 block size; half pixel); Residual (4x4 block size; quarter pixel)]

                                                                              Example (1/2)




[Figure panels: Frame Fn; Reconstructed reference Frame F’n-1; Residual Fn – F’n-1 (no motion compensation); 16x16 Motion Vector Field]

                                                                     Example (2/2)




[Figure panels: Motion compensation reference frame; Motion compensation residual frame]




                                    Motion Estimation Accuracy
The accuracy of motion compensation is in units of one quarter of the
distance between luma samples.

Integer-sample position → the prediction signal consists of the
corresponding samples of the reference picture

Non-integer-sample position → the corresponding sample is obtained by
interpolation


The prediction values at half-sample positions are obtained by applying a
one-dimensional 6-tap FIR Wiener filter horizontally and vertically




                 Motion Estimation Accuracy
• Half sample positions (aa, bb, b, s, gg, hh and cc, dd, h, m, ee, ff)
  are derived by first calculating intermediate values

     Ex.:      b1 = ( E – 5 F + 20 G + 20 H – 5 I + J )
               h1 = ( A – 5 C + 20 G + 20 M – 5 R + T )


               b = (b1 + 16) >> 5
               h = (h1 + 16) >> 5


• Position j
               j1 = cc1 – 5 dd1 + 20 h1 + 20 m1 – 5 ee1 + ff1


               j = ( j1 + 512) >> 10

• Quarter sample positions (a, c, d, n, f, i, k, q) are derived by
  averaging with upward rounding of the two nearest samples at
  integer and half sample positions

      Ex.: a = ( G + b + 1 ) >> 1

• Quarter sample positions (e, g, p, r) are derived by averaging with
  upward rounding of the two nearest samples at half sample
  positions in the diagonal direction as, for example, by

      Ex.: e = ( b + h + 1 ) >> 1
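
The interpolation rules above can be sketched in a few lines of Python. This is a minimal illustration, not the normative process: the function names are hypothetical, and the clipping to the 8-bit range [0, 255] is an assumption (the formulas above do not show it).

```python
def clip8(x):
    """Clip a value to the 8-bit sample range (assumed sample depth)."""
    return max(0, min(255, x))


def six_tap(s0, s1, s2, s3, s4, s5):
    """Unnormalised 6-tap filter with weights (1, -5, 20, 20, -5, 1)."""
    return s0 - 5 * s1 + 20 * s2 + 20 * s3 - 5 * s4 + s5


def half_sample(s0, s1, s2, s3, s4, s5):
    """Half-sample value such as b or h: (x1 + 16) >> 5."""
    return clip8((six_tap(s0, s1, s2, s3, s4, s5) + 16) >> 5)


def centre_sample(i0, i1, i2, i3, i4, i5):
    """Position j, filtered from six intermediate values: (j1 + 512) >> 10."""
    return clip8((six_tap(i0, i1, i2, i3, i4, i5) + 512) >> 10)


def quarter_sample(p, q):
    """Quarter-sample value: average of two neighbours with upward rounding."""
    return (p + q + 1) >> 1


# b between G and H on a row E F G H I J
b = half_sample(10, 10, 10, 200, 200, 200)
a = quarter_sample(10, b)  # a lies between G (= 10) and b
```

In a flat area every interpolated value equals the surrounding sample value, which is a quick sanity check on the filter gain (the weights sum to 32, and 32 x 32 = 1024 for position j).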



                                        Motion Estimation Accuracy

The prediction values for the chroma component are always obtained by
bi-linear interpolation.

For chroma the resolution is halved (4:2:0); therefore quarter-sample luma
motion vectors correspond to one-eighth-sample accuracy on the chroma grid




                         a = round{[(8-dx)·(8-dy)·A + dx·(8-dy)·B + (8-dx)·dy·C + dx·dy·D] / 64}

                         Ex. (dx = 2, dy = 3):       a = round[(30A+10B+18C+6D)/64]
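
A minimal Python sketch of the bilinear chroma interpolation follows. The function names are hypothetical, and rounding is implemented as adding 32 before a right shift by 6, which equals round(x/64) for non-negative sample values.

```python
def chroma_weights(dx, dy):
    """Bilinear weights for the four neighbours A, B, C, D.

    dx and dy are the fractional offsets in eighth-sample units (0..7).
    """
    return ((8 - dx) * (8 - dy), dx * (8 - dy), (8 - dx) * dy, dx * dy)


def chroma_sample(A, B, C, D, dx, dy):
    """Interpolated chroma sample at offset (dx, dy) from sample A."""
    wa, wb, wc, wd = chroma_weights(dx, dy)
    return (wa * A + wb * B + wc * C + wd * D + 32) >> 6


weights = chroma_weights(2, 3)  # the weights used in the example above
```

The four weights always sum to 64, so a flat area is reproduced exactly.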




                                                Multi-picture Prediction
Multi-picture motion compensation using previously-encoded pictures as
references allows up to 32 reference pictures to be used in some cases


Very significant bit rate reduction for scenes with
    • rapid repetitive flashing
    • back-and-forth scene cuts
    • uncovered background areas




                              Inter-frame Prediction in B Slices
The concept of B slices is generalized in H.264/AVC

Other pictures can reference pictures containing B slices for motion-
compensated prediction

Some macroblocks or blocks may use a weighted average of two distinct
motion-compensated prediction values for building the prediction signal

In B slices, four different types of inter-picture prediction are supported:
    • list 0 (typically past reference pictures)
    • list 1 (typically future reference pictures)
    • bi-predictive: weighted average of motion-compensated list 0 and
       list 1 prediction signals
    • direct prediction: inferred from previously transmitted syntax
       elements

It is also possible to have both motion predictions from past, or both
motion predictions from future.


                                                              Transform: types
Each residual macroblock is transformed, quantized and coded

H.264 uses a smaller transform (4x4) than the 8x8 DCT of earlier standards

H.264 uses three transforms depending on the type of residual data that
has to be coded

   • a 4x4 transform for the luma DC coefficients in Intra_16x16
     macroblocks
   • a 2x2 transform for the chroma DC coefficients
   • a 4x4 transform for all other blocks


Adaptive block size transform mode

Further transforms may optionally be chosen depending on the motion
compensation block size (4x8, 8x4, 8x8, 16x8, etc.)



                                                              Transform: Order
 For a 16x16 Intra mode coded Macroblock

     •   “-1” Block       DC coefficient of each 4x4 luma block
     •   “0-15” Blocks    Luma residual blocks
     •   “16-17” Blocks   DC coefficients from the Cb and Cr components
     •   “18-25” Blocks   Chroma residual blocks


Coding of smooth areas




                                                4x4 residual transform
This transform operates on 4x4 blocks of residual data (labelled 0-15 and
18-25)

Fundamental differences from the DCT:

   • integer transform
   • fully specified inverse transform, so mismatch between encoders and
     decoders should not occur
   • requires only additions and shifts
   • scaling multiplication integrated into the quantizer
   • transform and quantization can be carried out using 16-bit integer
     arithmetic

Reasons

   • better prediction method, hence less spatial correlation in the residual
   • visual benefits: less noise around edges
   • fewer computations and a smaller processing wordlength


                                                               Transform: Order
The 4x4 DCT of an input array X is given by:




                               d=c/b




• C X C^T: “core” 2-D transform
• E: matrix of scaling factors

                                                             Transform: Order
To simplify the implementation of the transform, while keeping it
orthogonal, the parameters must be modified
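
A minimal sketch of the modified forward core transform W = Cf X Cf^T in Python. It assumes the integer matrix Cf commonly quoted for H.264 (obtained with d = 1/2 and with the second and fourth rows scaled by 2); the matrix values and the helper names are assumptions, not taken from the text above. The post-scaling by Ef is left to the quantizer and is not shown.

```python
# Assumed integer core matrix Cf (entries are only 1 and 2, so the products
# can be implemented with additions and shifts).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Forward core transform W = Cf * X * Cf^T (no scaling applied)."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[1, 1, 1, 1] for _ in range(4)]
w = core_transform(flat)   # only the DC term w[0][0] is non-zero
gram = matmul(CF, transpose(CF))  # diagonal: the rows stay orthogonal
```

The rows of Cf remain mutually orthogonal but no longer have equal norms (4 and 10), which is why a separate scaling matrix is needed.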




The inverse transform is given by:




                                                                          Quantization
Requirements

   • avoid division and/or floating point arithmetic
   • incorporate the post- and pre-scaling matrices Ef and Ei described
     above




A total of 52 values of Qstep are supported by the standard and these are
indexed by a Quantization Parameter, QP.

   • Qstep doubles in size for every increment of 6 in QP
   • Qstep increases by approximately 12% (a factor of 2^(1/6)) for each increment of 1 in QP
   • QP may be different for luma and chroma (an offset can be signalled
     in a Picture Parameter Set)
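
The doubling rule can be sketched as a small lookup in Python. The six base step sizes for QP = 0..5 are an assumption here (they are the values commonly tabulated for H.264, not given above), and the function name is hypothetical.

```python
# Assumed step sizes for QP = 0..5; larger QPs follow from the doubling rule.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a QP in the range 0..51 (52 values)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # Qstep doubles for every increment of 6 in QP.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


steps = [qstep(qp) for qp in range(52)]
```

With these base values the individual steps between consecutive QPs vary slightly around the 2^(1/6) average, while the factor of two over six steps is exact.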




                                        DC Components Transform
Luma DC coefficients (Intra_16x16 macroblock)



   • 4x4 Hadamard transform:




Chroma DC coefficient (any block)




   • 2x2 Hadamard transform:




                                                                                    Scanning
The quantized transform coefficients of a block generally are scanned in a
zig-zag fashion and transmitted using entropy coding methods




The 2x2 DC coefficients of the chroma component are scanned in raster-
scan order and transmitted using entropy coding methods
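
A minimal Python sketch of the zig-zag scan of a 4x4 block. The scan table lists raster indices (row * 4 + column) in the order usually given for frame-coded 4x4 blocks; it is stated here as an assumption because the scan figure is not reproduced, and the names are hypothetical.

```python
# Raster index (row * 4 + column) of each coefficient, in zig-zag order.
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def zigzag_scan(block):
    """Reorder a 4x4 block (list of four rows) into a 16-element list."""
    return [block[i // 4][i % 4] for i in ZIGZAG_4X4]


def raster_scan_2x2(block):
    """Raster scan of the 2x2 chroma DC coefficients."""
    return [block[0][0], block[0][1], block[1][0], block[1][1]]


example = [
    [0, 3, -1, 0],
    [0, -1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reordered = zigzag_scan(example)
```

The scan places the low-frequency coefficients first, so the non-zero values of a typical quantized block cluster at the start of the reordered array.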




                                                                       De-Blocking Filter
Benefits of the application to every decoded macroblock:

    • reduced blocking distortion at block edges: improved image appearance
    • better motion-compensated prediction of further frames

Filtering is applied to vertical or horizontal edges of 4x4 blocks:

    1) Filter 4 vertical boundaries of the luma component (a,b,c,d)
    2) Filter 4 horizontal boundaries of the luma component (e,f,g,h)
    3) Filter 2 vertical boundaries of each chroma component (i,j)
    4) Filter 2 horizontal boundaries of each chroma component (k,l)




                                                                                        Filtering Choice
The choice of filtering outcome depends on

      • boundary strength
      • gradient across the boundary




 BS   Rule
 4    p or q is intra coded and the boundary is a macroblock boundary
 3    p or q is intra coded and the boundary is not a macroblock boundary
 2    neither p nor q is intra coded; p or q contains coded coefficients
 1    neither p nor q is intra coded; neither p nor q contains coded coefficients;
      p and q have different reference frames, a different number of reference
      frames, or different motion vector values
 0    neither p nor q is intra coded; neither p nor q contains coded coefficients;
      p and q have the same reference frame and identical motion vectors (no filtering)

The filter is “stronger” where there is likely to be significant blocking
distortion (high values of BS)
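
The rule table can be read as a cascade of tests, sketched below in Python. The function and parameter names are hypothetical; each argument is a boolean describing the two blocks p and q on either side of the boundary.

```python
def boundary_strength(p_intra, q_intra, mb_boundary,
                      p_has_coeffs, q_has_coeffs,
                      same_refs, same_mvs):
    """Return the boundary strength BS (0..4) following the rule table."""
    if p_intra or q_intra:
        # Intra-coded neighbours: strongest filtering at macroblock edges.
        return 4 if mb_boundary else 3
    if p_has_coeffs or q_has_coeffs:
        return 2
    if not (same_refs and same_mvs):
        return 1
    return 0  # same prediction on both sides: no filtering


bs_intra_edge = boundary_strength(True, False, True, False, False, True, True)
bs_skip = boundary_strength(False, False, True, False, False, True, True)
```

The order of the tests matters: each rule only applies when all stronger rules have failed.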
                                                                        Filter decision
A group of samples is filtered only if BS > 0 and all of the following conditions are
satisfied:

    • |p0-q0| < α(QP)
    • |p1-p0| < β(QP)
    • |q1-q0| < β(QP)     with β(QP) < α(QP)




Small QP
Anything other than a very small gradient across the boundary is likely to
be due to image features: low α(QP) and β(QP) values

Large QP
Blocking distortion is likely to be more significant: high α(QP) and β(QP)
values, so that more filtering takes place.
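
A minimal Python sketch of the filter decision. The two QP-dependent thresholds are passed in as `alpha` and `beta` because their tables are defined in the standard and are not reproduced here; the threshold values in the usage lines are purely illustrative, and the function name is hypothetical.

```python
def should_filter(bs, p1, p0, q0, q1, alpha, beta):
    """Decide whether the samples across a block edge are filtered.

    p1, p0 are the two samples on one side of the edge, q0, q1 on the other.
    alpha and beta are the QP-dependent thresholds (supplied by the caller).
    """
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)


# Illustrative thresholds only, not values from the standard.
small_step = should_filter(2, 60, 62, 66, 67, alpha=10, beta=4)
real_edge = should_filter(2, 60, 62, 120, 121, alpha=10, beta=4)
```

A small step across the edge is treated as a blocking artefact and smoothed, while a large step is assumed to be a genuine image feature and left untouched.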

                                                    Filter implementation
I. BS=1,2,3

   • A 4-tap linear filter is applied with inputs p1, p0, q0 and q1, producing
     filtered outputs P0 and Q0
   • For luma only
        o If |p2-p0| < β(QP), a 4-tap linear filter is applied with inputs p2,
          p1, p0 and q0, producing filtered output P1
        o If |q2-q0| < β(QP), a 4-tap linear filter is applied with inputs q2,
          q1, q0 and p0, producing filtered output Q1
II. BS=4

   If |p2-p0| < β(QP) and |p0-q0| < α(QP)/4
        • P0 is produced by 5-tap filtering of p2, p1, p0, q0 and q1
        • P1 is produced by 4-tap filtering of p2, p1, p0 and q0
        • Luma only: P2 is produced by 5-tap filtering of p3, p2, p1, p0 and q0
   else:
        • P0 is produced by 3-tap filtering of p1, p0 and q1

   (the same for qi pixels)
                                               De-blocking: Example




                 Reconstructed Frame; QP=32; No Filter       Reconstructed Frame; QP=32; With Filter




Original Frame




                 Reconstructed Frame; QP=36; No Filter       Reconstructed Frame; QP=36; With Filter

                                                                     Entropy Coding
Parameters that need to be encoded and transmitted

   •   Macroblock Type (Prediction method)
   •   QP
   •   Motion data: Reference frame and Motion Vector
   •   Coded block pattern (blocks containing coded coefficients)
   •   Residual Data


H.264/MPEG-4 AVC uses a number of techniques for entropy coding:

   • Exp-Golomb codes: all syntax elements except the quantized transform
     coefficients

   • Context Adaptive Variable Length Coding (CAVLC): quantized transform
     coefficients

   • Context Adaptive Binary Arithmetic Coding (CABAC)
                                                          Exp-Golomb Codes
Exp-Golomb codes (Exponential Golomb codes) are variable length codes
with a regular construction.

   • uses a single infinite-extent codeword table
   • only the mapping to the single codeword table is customized
     according to the data statistics

Each codeword is constructed as follows:

       [M zeros][1][INFO]             codeword length: (2M+1) bits, where INFO is an M-bit field


                                            Encoding
                                 M = floor[log2(code_num+1)]
                                 INFO = code_num + 1 - 2^M


                                          Decoding
                                 code_num = 2^M + INFO - 1
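
A minimal Python sketch of Exp-Golomb encoding and decoding, with codewords represented as strings of "0" and "1" for readability. The function names are hypothetical. The encoder uses the fact that the binary representation of code_num + 1 is exactly the "1" marker followed by the M-bit INFO field.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    m = value.bit_length() - 1          # M = floor(log2(code_num + 1))
    # format(value, "b") is "1" followed by INFO written on M bits.
    return "0" * m + format(value, "b")


def exp_golomb_decode(bits):
    """Decode the first codeword of a bit string and return code_num."""
    m = bits.index("1")                 # count the leading zeros
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1          # code_num = 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(5)]
```

Small values of code_num receive the shortest codewords, and each codeword has length 2M + 1 bits.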
                                 Exp-Golomb Codes: Value mapping
Each element v is assigned a type reflecting how data is to be mapped

   • Unsigned Exponential: ue(v)
          Mapping: code_num = v
          Used for macroblock type, reference frame index


   • Signed Exponential: se(v)
          Mapping
             o code_num = 2|v|        (v ≤ 0)
             o code_num = 2|v| - 1    (v > 0)
          Used for motion vector difference, delta QP


   • Mapped Exponential: me(v)
          Mapping specified in the standard
          Used for the coded block pattern
          (table for Inter predicted macroblocks: coded_block_pattern indicates
          which 8x8 blocks in a macroblock contain non-zero coefficients)

   • Truncated Exponential: te(v)

Each mapping is designed to produce short codewords for frequently
occurring values and longer codewords for less common parameter values.
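
The unsigned and signed mappings can be sketched in Python as follows (hypothetical function names; the me(v) and te(v) mappings depend on tables and ranges in the standard and are not shown). Zero is mapped to code_num 0, positive values to odd code numbers and negative values to even ones.

```python
def ue_to_code_num(v):
    """ue(v): the value is used directly as code_num."""
    if v < 0:
        raise ValueError("ue(v) requires a non-negative value")
    return v


def se_to_code_num(v):
    """se(v): interleave positive and negative values."""
    return 2 * abs(v) - 1 if v > 0 else 2 * abs(v)


def code_num_to_se(code_num):
    """Inverse of se_to_code_num."""
    magnitude = (code_num + 1) // 2
    return magnitude if code_num % 2 else -magnitude


mapped = [se_to_code_num(v) for v in (0, 1, -1, 2, -2)]
```

Values close to zero, which are the most frequent for motion vector differences, therefore receive the smallest code numbers and the shortest codewords.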
                                                                                            CAVLC
Method for coding residual, zig-zag ordered transformed blocks

VLC tables for various syntax elements are switched depending on already
transmitted syntax elements

CAVLC takes advantage of several characteristics of quantized 4x4 blocks:

   • Sparse blocks (containing mostly zeros)

   • Highest non-zero coefficients after the zig-zag scan are often
     sequences of +/-1 (“Trailing 1s” or “T1s”)

   • The number of non-zero coefficients in neighbouring blocks is
     correlated

   • The level (magnitude) of non-zero coefficients tends to be higher at
     the start of the reordered array (near the DC coefficient) and lower
     towards the higher frequencies


                                                                  CAVLC encoding
CAVLC encoding of a block of transform coefficients proceeds as follows:


   1. Encode the number of coefficients and trailing ones in a 4x4 block


   2. Encode the sign of each T1


   3. Encode the levels of the remaining non-zero coefficients


   4. Encode the total number of zeros before the last coefficient


   5. Encode each run of zeros




                                                               CAVLC: Steps (1/3)
Step 1: Encode the number of coefficients (TotalCoeffs) and T1s

    • TotalCoeffs range from 0 to 16 (in a 4x4 block)
    • T1s range from 0 to 3 (max value allowed)

Coding: 4 choices for look-up table (coeff_token)

    •   Num_VLC0             biased towards small numbers of coefficients
    •   Num_VLC1             biased towards medium numbers of coefficients
    •   Num_VLC2             biased towards higher numbers of coefficients
    •   Num_FLC              fixed 6-bit length code

The choice depends on the number of non-zero coefficients in upper and
left-hand previously coded blocks NU and NL (context adaptivity)

•   U,L available            N=(NU+NL)/2
•   U available              N=NU
•   L available              N=NL
•   U,L unavailable          N=0
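
A minimal Python sketch of the context-adaptive table choice. Two details are assumptions not stated above: the average is rounded upwards, and the ranges of N that select each table (0-1, 2-3, 4-7, 8 and above) are the ones commonly tabulated for H.264. The function names are hypothetical, and `None` stands for an unavailable neighbouring block.

```python
def predicted_count(nu, nl):
    """Combine the coefficient counts of the upper (nu) and left (nl) blocks."""
    if nu is not None and nl is not None:
        return (nu + nl + 1) >> 1   # average, rounded up (assumption)
    if nu is not None:
        return nu
    if nl is not None:
        return nl
    return 0


def coeff_token_table(n):
    """Select the look-up table for coeff_token (assumed ranges of N)."""
    if n < 2:
        return "Num_VLC0"
    if n < 4:
        return "Num_VLC1"
    if n < 8:
        return "Num_VLC2"
    return "Num_FLC"


table = coeff_token_table(predicted_count(1, 2))
```

Blocks whose neighbours contain many coefficients are thus coded with tables that assign shorter codes to large coefficient counts.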

                                                            CAVLC: Steps (2/3)
Step 2: Encode the sign of each T1.

For each T1 up to three, a single bit encodes the sign (0=+, 1=-).
Encoding is in reverse order, starting with the highest-frequency T1

Step 3: Encode the levels of the remaining non-zero coefficients.

The level (sign and magnitude) of each remaining non-zero coefficient in
the block is encoded in reverse order

The choice of VLC table depends on the magnitude of each successively coded level (context adaptivity)

1. Initialise the table to Level_VLC0

2. Encode highest-frequency coefficient

3. If magnitude is larger than a
   threshold, move up to the next VLC
   table
                                                          CAVLC: Steps (3/3)
Step 4: Encode the total number of zeros before the last coefficient

Coding with a VLC of the number of all zeros preceding the highest non-
zero coefficient.

Step 5: Encode each run of zeros

The number of zeros preceding each non-zero coefficient (run_before) is
encoded in reverse order.

The VLC for each run of zeros is chosen depending on

   • the number of zeros that have not yet been encoded (ZerosLeft)
   • value of run_before parameter.




                                                CAVLC: example (1/2)

4x4 Block

Reordered Block: 0,3,0,1,-1,-1,0,1,0,0,0,0,0,0,0,0
TotalCoeff = 5;        TotalZeros=3;           T1s =3 (max value)
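
The quantities that CAVLC encodes can be derived from the reordered block with a few lines of Python. This sketch (hypothetical function name) only extracts the symbols; mapping them to codewords requires the VLC tables of the standard, which are not reproduced here.

```python
def cavlc_symbols(coeffs):
    """Return (TotalCoeffs, T1s, TotalZeros, run_before list) for a block.

    coeffs is the zig-zag reordered coefficient list; run_before values are
    listed in reverse order, starting with the highest-frequency coefficient.
    """
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: consecutive +/-1 values from the high-frequency end,
    # at most three of them.
    t1s = 0
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and t1s < 3:
            t1s += 1
        else:
            break

    # Zeros located before the last non-zero coefficient.
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0

    # Zeros immediately preceding each non-zero coefficient, reverse order.
    runs = []
    for k in range(len(nonzero) - 1, -1, -1):
        previous = nonzero[k - 1] if k > 0 else -1
        runs.append(nonzero[k] - previous - 1)

    return total_coeffs, t1s, total_zeros, runs


block = [0, 3, 0, 1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = cavlc_symbols(block)
```

For the reordered block above the run_before values, in coding order, are 1, 0, 0, 1, 1; their sum equals TotalZeros.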

Encoding (steps 1 to 5)

Transmitted bitstream: 000010001110010111101101 (24 bits)
                                                  CAVLC: example (2/2)
Decoding

Input string:   000010001110010111101101
Values added to the output array at each stage are underlined




The decoder has inserted two zeros; however, TotalZeros is equal to 3, so
one more zero is inserted before the lowest coefficient, making the final
output array: 0,3,0,1,-1,-1,0,1,…
                                                                                             CABAC
CABAC uses arithmetic coding

   • effective use of probability models of symbol occurrence
   • particularly beneficial for symbol probabilities greater than 0.5

The use of adaptive codes permits adaptation to non-stationary symbol
statistics, i.e. to changing statistical characteristics of the data to be coded

Context modelling: The statistics of already-coded syntax elements are
used to estimate the conditional probabilities


CABAC provides a bit-rate reduction of between 5% and 15% over CAVLC
when coding TV signals at the same quality.


                                                                        CABAC basis
Encoding with CABAC consists of three stages

   • Binarization
     needed for syntax elements that are non-binary valued

   • Context modeling
     a model is selected such that the choice may depend on previous
     encoded syntax elements or bins

   • Adaptive binary arithmetic coding
     coding of bin sequence with updating of probabilities takes place




                                                                          Binarization
Requirements for a successful context coding method

   • fast and accurate estimation of conditional probabilities
   • the computational complexity must be kept at a minimum


“Pre-processing” step to reduce the alphabet size of the syntax elements

The design of binarization schemes in CABAC (mostly) relies on a few basic
code trees

   • Unary or Truncated Unary code
     x ≥ 0: x “1” bits plus a terminating “0” bit (unary)
     0 ≤ x ≤ S: for x = S the terminating “0” bit is omitted (truncated)
   • Exp-Golomb code
     codes are constructed by a concatenation of a prefix and a suffix
     code word
   • Fixed-length code
     a finite alphabet of values of the corresponding syntax element is
     assumed.
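
The unary and truncated unary binarizations can be sketched in Python as follows, with bin strings written as text. The function names are hypothetical; the Exp-Golomb and fixed-length schemes are not shown.

```python
def unary(x):
    """Unary code: x "1" bits followed by a terminating "0" bit."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return "1" * x + "0"


def truncated_unary(x, s):
    """Truncated unary code with cut-off S: no terminating bit when x == S."""
    if not 0 <= x <= s:
        raise ValueError("x must satisfy 0 <= x <= S")
    return "1" * x if x == s else unary(x)


bins = [truncated_unary(x, 3) for x in range(4)]
```

Dropping the terminating bit for x = S is safe because the decoder knows that no larger value can occur.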
                                                                      Context Modeling
A model probability distribution is assigned to the given symbols, to drive
the actual coding engine to generate a sequence of bits as a coded
representation of the symbols

   • Define a modeling function F: T → C operating on a set T of past
     symbols
   • For each symbol x to be coded, a conditional probability p(x|F(z)) is
     estimated according to the already coded neighboring symbols z ∈ T
   • After encoding x, the probability model is updated with the value of
     the encoded symbol x




             Context template consisting of two neighboring syntax elements
             A and B to the left and on top of the current syntax element C


                                                       Profiles and Levels
Profile
Defines a set of coding tools or algorithms that can be used in generating a
conforming bit-stream

Level
Places constraints on certain key parameters of the bitstream

Decoders conforming to a profile must support all features in that profile

In H.264/AVC, three profiles are defined (plus extensions)
    • Baseline
    • Main
    • Extended Profile
    • Fidelity Range Extensions: four High Profile versions (High, High 10,
      High 4:2:2, and High 4:4:4), for high-quality uses

Fifteen levels are defined specifying upper limits for
     • picture size                        • video bit rate
     • decoder-processing rate             • video buffer size
     • size of the multi-picture buffers
                                                                         Profile features
Baseline profile supports all features in H.264/AVC except the following
two feature sets:

   • Set 1
       o   B slices
       o   Weighted prediction
       o   CABAC
       o   Field coding
       o   Picture or macroblock adaptive switching between frame and field coding


   • Set 2
       o SP/SI slices


Main profile supports Set 1, and does not support the FMO and redundant
pictures features

Extended Profile supports all features except for CABAC.




                                     Profile Map




                                                                    Application Areas
Conversational services
   • operate typically below 1 Mbps; low latency requirements
   • Baseline profile for the following applications
       •   H.320 conversational services with circuit-switched ISDN-based video conferencing
       •   3GPP conversational H.324/M services
       •   H.323 conversational services over the Internet with best effort IP/RTP protocols.
       •   3GPP conversational services using IP/RTP for transport and SIP for session set-up.


Entertainment video applications
    • operate between 1-8 Mbps; moderate latency (0.5 to 2 seconds)
    • Main profile for the following applications
       • Broadcast via satellite, cable, terrestrial, or DSL
       • DVD for standard and high-definition video
       • Video on demand via various channels.


Streaming services
    • operate at 50 kbps–1.5 Mbps; latency of 2 or more seconds
    • Baseline or Extended profile; differences based on the use for wired
      or wireless environments as follows:
       • 3GPP streaming using IP/RTP for transport and RTSP for session set-up: Baseline profile
       • Streaming over the wired Internet using IP/RTP protocol and RTSP for session
         set-up: Extended profile
                                     Differences




                                Complexity of Codec Design
Codec design includes relaxation of traditional bounds on complexity
(memory & computation) – rough guess 2-3x decoding power increase
relative to MPEG-2, 3-4x encoding

Problem areas:
    • Smaller block sizes for motion compensation (cache access issues)
    • Longer filters for motion compensation (more memory access)
    • Multi-frame motion compensation (more memory for reference
      frame storage)
    • More segmentations of macroblock to choose from (more searching
      in the encoder)
    • More methods of predicting intra data (more searching)
    • Arithmetic coding (adaptivity, computation on output bits)




                                                            Comparison (1/2)
[Figure: Foreman, QCIF, 10 Hz. Quality (Y-PSNR [dB]) versus bit-rate [kbit/s] for JVT/H.264/AVC, MPEG-4, MPEG-2 and H.263]




                                                               Comparison (2/2)
[Figure: Tempete, CIF, 30 Hz. Quality (Y-PSNR [dB]) versus bit-rate [kbit/s] for JVT/H.264/AVC, MPEG-4, MPEG-2 and H.263]



                 Test Set Results for Perceptual Quality
Informal perceptual tests

At the same PSNR, people generally prefer JVT

   • Small motion compensation block size (breaks up block structure)
   • Small transform block size (breaks up block structure, reduces
     ringing)
   • In-loop deblocking filter

By how much?

   • Needs further study
   • No rigorous testing reported
   • 10-15% might be a good guess




                                                                            References
IEEE Transactions on Circuits and Systems for Video Technology (2003):
Special Issue on the H.264/AVC Video Coding Standard


Signal Processing: Image Communication (2004):
Video coding using the H.264/MPEG-4 AVC compression standard


http://www.vcodex.com/h264.html



