Participation: Companies and Institutions Having Contributed an MPEG Video Proposal

Company             Country   Proposer
AT&T                USA       AT&T
Bellcore            USA       Bellcore
Intel               USA       Bellcore
GCT                 Japan     Bellcore
C-Cube Micro.       USA       C-Cube Micro.
DEC                 USA       DEC
France Télécom      France    France Télécom
Cost 211 Bis        EUR       France Télécom
IBM                 USA       IBM
JVC Corp            Japan     JVC Corp
Matsushita EIC      Japan     Matsushita EIC
Mitsubishi EC       Japan     Mitsubishi EC
NEC Corp.           Japan     NEC Corp.

Schedule. The schedule of MPEG was derived with the goal of obtaining a draft of the standard (Committee Draft) by the end of 1990. Although the amount of work was considerable and staying on schedule meant many meetings, the members of MPEG-Video were able to reach an agreement on a Draft in September 1990. The content of the draft has been “frozen” since then, indicating that only minor changes will be accepted, i.e., editorial changes and changes only meant to correct demonstrated inaccuracies. Figure 1 illustrates the MPEG schedule for the competitive and convergence phases.

Figure 1. MPEG schedule for the competitive and convergence phases
  September 1988: Proposal registration
  October 1989: Subjective test
  March 1990: Definition of video algorithm (Simulation Model 1)
  September 1990: Draft proposal

MPEG-Video Requirements

A Generic Standard

Because of the various segments of the information processing industry represented in the ISO committee, a representation for video on digital storage media has to support many applications. This is expressed by saying that the MPEG standard is a generic standard. Generic means that the standard is independent of a particular application; it does not mean, however, that it ignores the requirements of the applications. A generic standard possesses features that make it somewhat universal (e.g., it follows the toolkit approach); it does not mean that all the features are used all the time for all applications, which would result in dramatic inefficiency. In MPEG, the requirements on the video compression algorithm have been derived directly from the likely applications of the standard.

Many applications have been proposed based on the assumption that an acceptable quality of video can be obtained for a bandwidth of about 1.5 Mbits/second (including audio). We shall review some of these applications because they put constraints on the compression technique that go beyond those required of a videotelephone or a videocassette recorder (VCR). The challenge of MPEG was to identify those constraints and to design an algorithm that can flexibly accommodate them.

Applications of Compressed Video on Digital Storage Media

Digital Storage Media. Many storage media and telecommunication channels are perfectly suited to a video compression technique targeted at the rate of 1 to 1.5 Mbits/s (see Table 2). CD-ROM is a very important storage medium because of its large capacity and low cost. Digital audio tape (DAT) is also perfectly suitable for compressed video; the recordability of the medium is a plus, but its sequential nature is a major drawback when random access is required. Winchester-type computer disks provide a maximum of flexibility (recordability, random access) but at a significantly higher cost and with limited portability. Writable optical disks are expected to play a significant role in the future because they have the potential to combine the advantages of the other media (recordability, random accessibility, portability and low cost). The compressed bit rate of 1.5 Mbits/s is also perfectly suitable for computer and telecommunication networks, and the combination of digital storage and networking can be at the origin of many new applications, from video on local area networks (LANs) to distribution of video over telephone lines [1].

Table 2. Storage Media and Channels where MPEG could have [...]
  DAT
  Winchester Disk
  Writable Optical Disks
  ISDN
  LAN
  Other Communication Channels

Asymmetric Applications. In order to find a taxonomy of applications of digital video compression, the distinction between symmetric and asymmetric applications is most useful. Asymmetric applications are those that require frequent use of the decompression process, but for which the compression process is performed once and for all at the production of the program. Among asymmetric applications, one could find an additional subdivision into electronic publishing, video games and delivery of movies. Table 3 shows the asymmetric applications of digital video.

Table 3. Asymmetric Applications of Digital Video
  Electronic Publishing
  Education and Training
  Travel Guidance
  Point of Sale
  Entertainment (Movies)

Symmetric Applications. Symmetric applications require essentially equal use of the compression and the decompression process. In symmetric applications there is always production of video information, either via a camera (video mail, videotelephone) or by editing prerecorded material. One major class of symmetric application is the generation of material for playback-only applications (desktop video publishing); another class involves the use of telecommunication, either in the form of electronic mail or in the form of interactive face-to-face applications. Table 4 shows the symmetric applications of digital video.

Table 4. Symmetric Applications of Digital Video
  Electronic Publishing (Production)
  Video Mail
  Video Conferencing

Features of the Video Compression Algorithm

The requirements for compressed video on digital storage media (DSM) have a natural impact on the solution. The compression algorithm must have features that make it possible to fulfill all the requirements. The following features have been identified as important in order to meet the needs of the applications of MPEG.

Random Access. Random access is an essential feature for video on a storage medium, whether or not the medium is a random access medium such as a CD or a magnetic disk, or a sequential medium such as a magnetic tape. Random access requires that a compressed video bit stream be accessible in its middle and any frame of video be
decodable in a limited amount of           Coding/Decoding      Delay. As men-
time. Random access implies the            tioned previously, applications such
existence of access points, i.e., seg-     as videotelephony    need to maintain
ments of information     coded only        the total system delay under 150 ms
with reference    to themselves.    A      in order to maintain the converse-
random      access time of about
second should be achievable with-
                                     112   tional, “face-to-face” nature of the
                                           application.    On the other hand,         The requirements on
                                                                                      the MPEG video com-
out significant quality degradation.       publishing applications could con-
                                           tent themselves with fairly long
Fast FommrdlReverse     Searches.   De-    encoding delays and strive to main-
pending on the storage media, it
should be possible to scan a com-
                                           tain the total decoding delay below
                                           the “interactive threshold” of about       pression algorithm
pressed bit stream (possibly with
the help of an application-specific
                                           one second. Since quality and delay
                                           can be traded-off to a certain ex-         have been derived
                                                                                      directly from
directory structure) and, using the        tent, the algorithm should perform
appropriate   access points, display       well over the range of acceptable
selected pictures to obtain a fast         delays and the delay is to be consid-
forward or a fast reverse effect.
This feature is essentially a more
                                           ered a parameter.
                                                                                      the likely
                                                                                      applications of the
demanding form of random acces-            Editability.  While it is understood
sibility.                                  that all pictures will not be com-
                                           pressed independently     (ix., as still
Reverse Ployback.    Interactive appli-
cations might require the video sig-
                                           images), it is desirable to be able to
                                           construct editing units of a short         standard.
nal to play in reverse. While it is not    time duration and coded only with
necessary for all applications        to   reference to themselves so that an         high compression         associated with
maintain    full quality in reverse        acceptable level of editability      in    interframe coding, while not com-
mode or Eden to have a reverse             compressed form is obtained.               promising random access for those
mode at all, it was perceived that                                                    applications that demand it. This
this feature should be possible with-      Format  Flexibility. The computer          requires a delicate balance between
out an extreme additional cost in          paradigm of “video in a window”            in%- and interframe coding, and
memory.                                    supposes a large flexibility of for-       between recursive and nonrecur-
                                           mats in terms of raster size (width,       sive temporal redundancy            reduc-
Audio-Visual   Synchronization.    The     height) and frame rate.                    tion. In order to answer this chal-
video signal should be accurately                                                     lenge, the members of MPEG have
synchronizable     to an associated        Cost Tradeoffs. All the proposed           resorted to using two interframe
audio source. A mechanism should           algorithmic  solutions were evalu-         coding techniques:        predictive and
be    provided      to    permanently      ated in order to verify that a de-         interpolative.
resynchronize    the audio and the         coder is implementable    in a small          The MPEG video compression
video should the two signals be de-        number of chips, given the technol-        algorithm      [3] relies on two basic
rived from slightly different clocks.      ogy of 1990. The proposed algo-            techniques:     blxk-based       motion
This feature is addressed by the           rithm also had to meet the con-            compensation for the reduction of
MPEG-System group whose task is            straint that the encoding process          the temporal        redundancy      and
to define the tools for synchroniza-       could be performed     in real time.       transform     domain-(DCT)        based
tion as well as integration of multi-                                                 compression for the reduction of
ple audio and video signals.               Overview   of the MPEC                     spatial     redundancy.        Motion-
                                           Compression    Algorithm                   compensated      techniques are ap-
Robushess    to Errors. Most digital       The difficult challenge in the de-         plied with both causal (pure predic-
storage media and communication            sign of the MPEG algorithm is the          tive coding) and noncausal predic-
channels are not error-free,      and      following: on one hand the quality         tors (interpolative     coding).   The
while it is expected that an appro-        requirements   demand a very high          remaining signal (prediction error)
priate channel coding scheme will          compression    not achievable with         is further compressed with spatial
be used by many applications, the          intraframe   coding alone; on the          redundancy reduction (DCT). The
source coding scheme should be             other hand, the random access re-          information relative to motion is
robust to any remaining        uncor-      quirement    is best satisfied with        based on I6 X I6 blocks and is
rected errors;     thus catastrophic       pure intraframe coding. The algo-          transmitted together with the spa-
behavior in the presence of errors         rithm can satisfy all the require-         tial information. The motion infor-
should be avoidable.                       ments only insofar as it achieves the      mation is compressed using vari-
                                                                                 subsignal with low temporal resolu-
                                                                                 tion (typically 112 or Ii3 of the
                                                                                 frame rate) is coded and the full-
                                                                                 resolution signal is obtained by in-
                                                                                 terpolation of the low-resolution
                                                                                 signal and addition of a correction
                                                                                 term. The signal to be recon-
                                                                                 structed by interpolation is ob-
                                                                                 tained by adding a correction term
                                                                                 to a combination of a past and a fu-
                                                                                 ture reference.
                                                                                    Motion-compensated      interpola-
                        Biiimnional Pmdicfion                                    tion (also called bidirectional pre-
                                                                                 diction in MPEG         terminology)
                                                                                 presents a series of advantages, not
                                         MPEG is quite flexible and will de-     the least of which is that the com-
                                         pend on application-specific pa-        pression obtained by interpolative
                                         rameters such as random accessibil-     coding is very high. The other ad-
able-length codes        to   achieve    ity and coding delay. As an example     vantages of bidirectional prediction
maximum efficiency.                      in Figure 2, an intracoded picture is   (temporal interpolation) are:
                                         inserted every 8 frames, and the
TempOral   Redundancy    Reduction       ratio of interpolated pictures to       l   It deals properly with uncovered
Because of the importance of ran-                                                    areas, since an area just uncov-
                                         intra- or predicted pictures is three
dom access for stored video and the                                                  ered is not predictable from the
                                         ““t of four.
significant bit-rate reduction af-                                                   past reference, but can be prop-
forded by motion-compensated in-         Motion Compensation.                        erly predicted from the “future”
terpolation, three types of pictures     Prediction. Among the techniques            reference.
are     considered     in    MPEG.*      that exploit the temporal redun-        l   It has better statistical properties
Intrapictures (I), Predicted pictures    dancy of video signals, the most            since more information is avail-
       and   Interpolated    pictures    widely used is motion-compensated           able: in particular, the effect of
                                         prediction. It is the basis of most         noise can be decreased by averag-
(B-for      bidirectional prediction).
                                         compression algorithms for visual           ing between the past and the fu-
lntrapictures provide access points
                                         telephony such as the CCITT stan-           ture reference pictures.
for random access but only with
                                         dard H.261. Motion-compensated          l   It allows decoupling between
moderate compression; predicted
                                         prediction assumes that “locally”           prediction and coding (no error
pictures are coded with reference
                                         the current picture can be modeled          propagation).
to a past picture (Intra- or Pre-
                                         as a translation of the picture at      l   The trade-off associated with the
dicted) and will in general be used
                                         some previous time. Locally means           frequency of bidirectional pic-
as a reference for future predicted
                                         that the amplitude and the direc-           tures is the following: increasing
pictures; bidirectional pictures pro-
                                         tion of the displacement need not           the number of B-pictures be-
vide the highest amount of com-
                                         be the same everywhere in the pic-          tween references decreases the
pression but require both a past
                                         ture. The motion information is             correlation of B-pictures with the
and a future reference for predic-
                                         part of the necessary information to        references as well as the correla-
tion; in addition, bidirectional pic-
tures are never used as reference.       recover the picture and has to be           tion between       the references
 In all cases when a picture is coded    coded appropriately.                        themselves. Although this trade-
                                                                                     off varies with the nature of the
with respect to a reference, motion
                                         Interpolation. Motion-compensated           video scene, for a large class of
compensation is used to improve
 the coding efficiency. The relation-    interpolation is a key feature of           scenes it appears reasonable to
                                         MPEG. It is a technique that helps          space references at about l/lOth
ship between the three picture
                                         satisfy some of the application-            second interval resulting in a
 types is illustrated in Figure 2. The
                                         dependent requirements since it             combination of the type I !%B P B
organization of the pictures in
                                         improves random access and re-              BPBB..IBBPBB.
                                         duces the effect of errors while at
                                         the same time contributing signifi      Motion Representation, Macroblock.
                                         cantly to the image quality.            There is a trade-off between the
                                            In the temporal dimension, mo-       coding gain provided by the motion
                                         tion-compensated interpolation is a     information and the cost associated
                                         multiresolution     technique:     a    with coding the motion informa-

tion. The choice of 16 x 16 blocks                spatial correlation    of the motion
for the motion-compensation               unit    vector field (the differential motion
is the result of such a trade-off,                vector is likely to be very small ex-
such      motion-compensation            units    cept at object boundaries).
are called Macroblocks.         In the more

                                                                                                    The freedom
general     case of a bidirectionally             Motion Estimation.         Motion estima-
coded picture,       each 16 x 16 mac-            tion covers a set of techniques         used
roblock can be of ‘ype Intra, For-                to extract the motion information
              or Average. As expressed
                                                  from a video sequence.
                                                  syntax specifies
                                                                                  The MPEG
                                                                          how to represent          left to
in Table 5, the expression
predictor      for a given macroblock
                                      for the     the motion information:
                                                  motion     vectors
                                                                                   one or two
                                                                         per 16 x 16 sub-           manufacturers...
                                                                                                    means the existence
depends on reference         pictures (past       block of the picture depending             on
and future) as well as the motion                 the type of motion compensation:
vectors: X is the coordinate           of the     forward-predicted.               hackward-
picture element,       iiiVol the motion
vector relative to the reference           pic-
                                                  predicted,      average.      The
                                                  draft does not specify how such
                                                                                                    of a standard
ture IO, mvp, the motion vector rel-
                                                                                                    does not prevent
                                                  vectors are to be computed,             how-
ative to the reference        picture II.         ever. Because of the block-based

                                                                                                    creativity and
    The motion information            consisrs    motion      representation         however,
of one vector for forward-predicted               block-matching           techniques       are
macroblocks           and         backward-       likely to be used; in a hlock-match-
predicted      macroblocks,
vectors for bidirectionally
and of two macroblocks. The motion information associated with each 16 x 16 block is coded differentially with respect to the motion information present in the previous adjacent block. The range of the differential motion vector can be selected on a picture-by-picture basis, to match the spatial resolution, the temporal resolution and the nature of the motion in a particular sequence; the maximal allowable range has been chosen large enough to accommodate even the most demanding situations. The differential motion information is further coded by means of a variable-length code to provide greater efficiency by taking advantage of the strong

Prediction Modes for Macroblock in B-Picture

  Mode                 Predictor                                                                  Prediction Error
  Intracoded           $\hat{I}_1(x) = 128$                                                       $I_1(x) - \hat{I}_1(x)$
  Forward Predicted    $\hat{I}_1(x) = \hat{I}_0(x + mv_{01})$                                    $I_1(x) - \hat{I}_1(x)$
  Backward Predicted   $\hat{I}_1(x) = \hat{I}_2(x + mv_{21})$                                    $I_1(x) - \hat{I}_1(x)$
  Average              $\hat{I}_1(x) = \frac{1}{2}[\hat{I}_0(x + mv_{01}) + \hat{I}_2(x + mv_{21})]$   $I_1(x) - \hat{I}_1(x)$

ing technique, the motion vector is obtained by minimizing a cost function measuring the mismatch between a block and each predictor candidate. Let $M_i$ be a macroblock in the current picture $I_1$ and $v$ the displacement with respect to the reference picture $I_2$; then the optimal displacement ("motion vector") is obtained by the formula:

$$v_i = \arg\min_{v \in V} \sum_{x \in M_i} D\left[I_1(x) - I_2(x + v)\right]$$

where the search range $V$ of the possible motion vectors and the selection of the cost function $D$ are left entirely to the implementation. Exhaustive searches, where all the possible motion vectors are considered, are known to give good results, but at the expense of a very large complexity for large ranges: the trade-off between the quality of the motion vector field and the complexity of the motion estimation process is left for the implementer to decide.

Spatial Redundancy Reduction
Both still-image and prediction-error signals have a very high spatial redundancy. The redundancy reduction techniques usable to this effect are many, but because of the block-based nature of the motion-compensation process, block-based techniques are preferred. In the
field of block-based spatial redundancy techniques, transform coding techniques and vector quantization coding are the two likely candidates. Transform coding techniques with a combination of visually weighted scalar quantization and run-length coding have been preferred because the DCT presents a certain number of definite advantages and has a relatively straightforward implementation; the advantages are the following:

• The DCT is an Orthogonal Transform: Orthogonal Transforms are filter-bank-oriented (i.e., have a frequency domain interpretation). Locality: the samples on an 8 x 8 spatial window are sufficient to compute 64 transform coefficients (or subbands). Orthogonality guarantees well-behaved quantization in
• The DCT is the best of the orthogonal transforms with a fast algorithm, and a very close approximation to the optimal for a large class of images.
• The DCT basis function (or subband decomposition) is sufficiently well-behaved to allow effective use of psychovisual criteria. (This is not the case with "simpler" transforms such as

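The locality and orthogonality points above can be seen in a direct, deliberately slow evaluation of the 8 x 8 two-dimensional DCT. This is a minimal pure-Python sketch assuming the orthonormal normalization with the factor (1/4) C(u) C(v), as commonly written for JPEG and H.261; the function names `dct_8x8` and `idct_8x8` are illustrative, and a practical codec would use one of the fast factorized algorithms rather than this O(N^4) sum.

```python
import math

N = 8  # MPEG, JPEG and H.261 all transform 8 x 8 blocks


def _c(k: int) -> float:
    """Normalization factor C(k) of the orthonormal DCT-II."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


# Basis table: _COS[u][x] = cos((2x + 1) * u * pi / 16)
_COS = [[math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
        for u in range(N)]


def dct_8x8(block):
    """Forward 2-D DCT: the 64 samples of one 8 x 8 window give 64 coefficients."""
    return [[0.25 * _c(u) * _c(v) *
             sum(block[x][y] * _COS[u][x] * _COS[v][y]
                 for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct_8x8(coeffs):
    """Inverse 2-D DCT; orthogonality makes it the transpose of the forward transform."""
    return [[0.25 *
             sum(_c(u) * _c(v) * coeffs[u][v] * _COS[u][x] * _COS[v][y]
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


if __name__ == "__main__":
    flat = [[100] * N for _ in range(N)]
    coeffs = dct_8x8(flat)
    print(round(coeffs[0][0], 6))  # DC term = 8 x mean -> 800.0
```

For a flat block of value 100, only the DC coefficient is nonzero (800, i.e., 8 times the mean), and applying `idct_8x8` to the coefficients returns the original samples up to floating-point rounding, which is the orthogonality property at work.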
DIGITAL MULTIMEDIA SYSTEMS

[Figure 4. Transform Coding, Quantization and Run-Length Coding: DCT, followed by quantization, zig-zag scan and run-length coding]

[Figure 5. Quantizer Characteristics for Intra and Non-Intra Blocks (stepsize = 2): quantizer with deadzone (nonintra macroblocks) and quantizer with no deadzone (intra macroblocks), each shown with its reconstructed values]

In the standards for still image coding (JPEG) and for visual telephony (CCITT H.261), the 8 x 8 DCT has also been chosen for similar reasons. The technique to perform intraframe compression with the DCT is essentially common in
the three standards and consists of three stages: computation of the transform coefficients; quantization of the transform coefficients; and conversion of the transform coefficients into {run-amplitude} pairs after reorganization of the data in a zigzag scanning order (see Figure 4).

Discrete Cosine Transform. The Discrete Cosine Transform has inputs in the range [-255, 255] and output signals in the range [-2048, 2047], providing enough accuracy even for the finest quantizer. In order to control the effect of rounding errors when different implementations of the inverse transform are in use, the accuracy of the inverse transform is determined according to the CCITT H.261 standard specification [9].

Quantization. Quantization of the DCT coefficients is a key operation, because the combination of quantization and run-length coding contributes most of the compression; it is also through quantization that the encoder can match its output to a given bit rate. Finally, adaptive quantization is one of the key tools to achieve visual quality. Because the MPEG standard has both intracoded pictures, as in the JPEG standard, and differentially coded pictures (i.e., pictures coded by a combination of temporal prediction and DCT of the prediction error, as in CCITT Recommendation H.261), it combines features of both standards to achieve a set of very accurate tools to deal with the quantization of DCT coefficients.

Visually weighted quantization. Subjective perception of quantization error varies greatly with frequency, and it is advantageous to use coarser quantizers for the higher frequencies. The exact "quantization matrix" depends on many external parameters such as the characteristics of the intended display, the viewing distance and the amount of noise in the source. It is therefore possible to design a particular quantization matrix for an application or even for an individual sequence. A customized matrix can be stored as context together with the compressed video.

Quantization of Intra vs. Nonintra Blocks. The signal from intracoded blocks should be quantized differently from the signal resulting from prediction or interpolation. Intracoded blocks contain energy in all frequencies and are very likely to produce "blocking effects" if too coarsely quantized; on the other hand, prediction error-type blocks contain predominantly high frequencies and can be subjected to much coarser quantization. It is assumed that the coding process is capable of accurately predicting low frequencies, so that the low frequency content of the prediction error signal is minimal; if that is not the case, the intracoded block type should be preferred at encoding. This difference between intracoded blocks and differentially coded blocks results in the use of two different quantizer structures: while both quantizers are near uniform (have a constant stepsize), their behavior around zero is different. Quantizers for intracoded blocks have no deadzone (i.e., the region that gets quantized to the level zero is smaller than a stepsize), while quantizers for nonintra blocks have a large deadzone. Figure 5 illustrates the behavior of the two quantizers for the same stepsize of 2.

Modified Quantizers. Not all spatial information is perceived alike by the human visual system, and some blocks need to be coded more accurately than others: this is particularly true of blocks corresponding to very smooth gradients, where a very slight inaccuracy could be perceived as a visible block boundary (blocking effect). In order to deal with this inequality between blocks, the quantizer stepsize can be modified on a block-by-block basis if the image content makes it necessary. This mechanism can also be used to provide a very smooth adaptation to a particular bit rate (rate control).

Entropy coding. In order to further increase the compression inherent in the DCT and to reduce the impact of the motion information on the total bit rate, variable-length coding is used. A Huffman-like table for the DCT coefficients is used to code events corresponding to a pair {run, amplitude}. Only those codes with a relatively high probability of occurrence are coded with a variable-length code. The less-likely events are coded with an escape symbol followed by fixed-length codes, to avoid extremely long code words and reduce the cost of implementation. The variable-length code associated with DCT coefficients is a superset of the one used in CCITT Recommendation H.261, to avoid unnecessary costs when implementing both standards on a single processor.

Layered Structure, Syntax and Bit Stream
Goals. The goal of a layered structure is to separate entities in the bitstream that are logically distinct, prevent ambiguity and facilitate the

decoding process. The separation in layers supports the claims of genericity, flexibility and efficiency.

Genericity. The generic aspect of the MPEG standard is nowhere better illustrated than by the MPEG bit stream. The syntax allows for provision of many application-specific features without penalizing applications that do not need those features. Two examples of such "bitstream customization" illustrate the potential of the syntax:

Example 1: Random access and editability of video stored on a computer hard disk. Random accessibility and easy editability require many access points; groups of pictures are of short duration (e.g., 6 pictures, 1/5 second) and coded with a fixed amount of bits (to make editability possible). The granularity of the editing units (group of pictures only coded with reference to pictures within the group) allows editability to one-fifth of a second accuracy.

Example 2: Broadcast over noisy channel. There are occasional remaining uncorrected errors. In order to provide robustness, the predictors are frequently reset and each intra and predicted picture is segmented in many slices. In addition, to support "tuning in" in the middle of the bit stream, frequent repetitions of the coding context (Video Sequence Layer) are provided.

Flexibility. The flexibility of the MPEG standard is illustrated by the large number of parameters defined in the Video Sequence Header. Table 6 shows the video sequence header. The range of those parameters is fairly large, and while the MPEG standard is focused on bit rates of about 1.5 Mbits/s and resolutions of about 360 pels/line, higher resolution and higher bit rates are not precluded.

Table 6. Video Sequence Header
  Picture Height
  Pel Aspect Ratio
  Frame Rate
  Bit Rate
  Buffer Size

Efficiency. A compression scheme such as the MPEG algorithm needs to provide efficient management of the overhead information (displacement fields, quantizer stepsize, type of predictor or interpolator). The robustness of the compressed bit stream also depends to a large extent on the ability to quickly regenerate lost context after an error.

Layered Syntax. The syntax of an MPEG video bit stream contains six layers (see Table 7); each layer supports a definite function: either a signal-processing function (DCT, Motion Compensation) or a logical function (Resynchronization, Random access point).

Table 7. Six Layers of Syntax of the MPEG Video Bit Stream
  Sequence Layer            (Random Access Unit: Context)
  Group of Pictures Layer   (Random Access Unit: Video Coding)
  Picture Layer             (Primary Coding Unit)
  Slice Layer               (Resynchronization Unit)
  Macroblock Layer          (Motion Compensation Unit)
  Block Layer               (DCT Unit)

Bit Stream. The MPEG syntax [3] defines an MPEG bit stream as any sequence of binary digits consistent with the syntax. In addition, the bit stream must satisfy particular constraints so that it is decodable with a buffer of an appropriate size. These additional constraints preclude coded video bit streams that have "unreasonable" buffering requirements. Every bit stream is characterized (at the sequence layer) by two fields: bit rate and buffer size. The buffer size specifies the minimum buffer size necessary to decode the bit stream within the context of the video buffer verifier.

Video Buffer Verifier. The video buffer verifier [3] is an abstract model of decoding used to verify that an MPEG bit stream is decodable with reasonable buffering and delay requirements, expressed in the sequence header in the fields bit rate and buffer size. The model of the video buffer verifier is that of a receiving buffer for the coded bit stream and an instantaneous decoder, so that all the data for a picture is instantaneously removed from the buffer. Within the framework of this model, the MPEG Committee Draft establishes constraints on the bit stream (by way of the buffer occupancy) so that decoding can occur without buffer underflow or overflow.

Decoding Process. The MPEG draft standard defines the decoding process, not the decoder. There are many ways to implement a decoder, and the standard does not recommend a particular way. The decoder structure of Figure 6 is a typical decoder structure with a buffer at the input of the decoder. The bit stream is demultiplexed into overhead information such as motion information, quantizer stepsize, macroblock type and quantized DCT coefficients. The quantized DCT coefficients are dequantized and are input to the Inverse Discrete Cosine Transform (IDCT). The reconstructed waveform from the IDCT is added to the result of the prediction. Because of the particular nature of bidirectional prediction, two reference pictures are used to form the predictor.

[Figure 6. Schematic Block Diagram of the Decoding Process]

Standard and Quality Conformance: Encoders and Decoders
Bit Stream and Decoding Process. The MPEG standard specifies a syntax for video on digital storage media and the meaning associated with this syntax: the decoding process. A decoder is an MPEG decoder if it decodes an MPEG bit stream to a result that is within acceptable bounds (still to be determined) of the one specified by the decoding process; an encoder is an MPEG encoder if it can produce a legal MPEG bit stream.

Encoders and Decoders. The standard defines only the bitstream syntax and the decoding process; manufacturers are entirely free to make good use of the flexibility of the syntax to design very high-quality encoders and very low-cost decoders. The freedom left to manufacturers at the encoder covers such important quality factors as motion estimation, adaptive quantization and rate control. This means that the existence of a standard does not prevent creativity and inventive spirit in implementing encoders.

Resolution, Bit Rates and Quality
The quality of video compressed with the MPEG algorithm at rates of about 1.2 Mbits/s has often been compared to VHS recording [1]. The qualifiers "VHS-like" and "better than VHS" have been used. The spatial resolution is limited to 360 samples per video line, and the video signal at the input of the source coder has 30 frames/s, noninterlaced. For most source material, artifact-free renditions can be obtained, but for the most demanding material, it is at times necessary to trade resolution for impairments.

Table 8. Parameters of the MPEG Constrained Parameter Set
  Horizontal size                       <= 720 pels
  Vertical size                         <= 576 pels
  Total number of macroblocks/picture   <= 396
  Total number of macroblocks/second    <= 396*25 = 330*30
  Picture rate                          <= 30 frames/second
  Bit rate                              <= 1.86 Mbits/second
  Decoder buffer                        <= 376832 bits
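The constrained parameter set of Table 8 is a list of upper bounds, so checking a stream against it is mechanical. The sketch below is a minimal illustration under stated assumptions: the `StreamParams` record and its field names are hypothetical (they are not MPEG syntax element names), and macroblock counts assume 16 x 16 macroblocks with picture dimensions rounded up to a multiple of 16.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class StreamParams:
    """Hypothetical summary of sequence-header values (names are illustrative)."""
    width: int            # horizontal size in pels
    height: int           # vertical size in pels
    picture_rate: float   # pictures per second
    bit_rate: float       # bits per second
    buffer_size: int      # decoder buffer size in bits


def macroblocks_per_picture(p: StreamParams) -> int:
    # Assumption: a macroblock covers 16 x 16 pels; partial edge macroblocks count as whole.
    return math.ceil(p.width / 16) * math.ceil(p.height / 16)


def constraint_violations(p: StreamParams) -> list[str]:
    """Return the Table 8 limits that p exceeds (an empty list means it fits)."""
    mb = macroblocks_per_picture(p)
    checks = [
        (p.width <= 720, "horizontal size > 720 pels"),
        (p.height <= 576, "vertical size > 576 pels"),
        (mb <= 396, "more than 396 macroblocks/picture"),
        (mb * p.picture_rate <= 396 * 25, "more than 9900 macroblocks/second"),
        (p.picture_rate <= 30, "picture rate > 30 frames/second"),
        (p.bit_rate <= 1.86e6, "bit rate > 1.86 Mbits/second"),
        (p.buffer_size <= 376832, "decoder buffer > 376832 bits"),
    ]
    return [msg for ok, msg in checks if not ok]


if __name__ == "__main__":
    sif = StreamParams(352, 240, 30, 1.15e6, 327680)
    print(macroblocks_per_picture(sif), constraint_violations(sif))
```

Under these assumptions, a 352 x 240 picture at 30 pictures/s uses 330 macroblocks per picture and a 352 x 288 picture at 25 pictures/s uses 396; both land exactly on the 9900 macroblocks/second bound that Table 8 writes as 396*25 = 330*30. A 720 x 486 CCIR 601 frame needs 1395 macroblocks and falls outside the set.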
The flexibility of the video sequence parameters in MPEG is responsible for these characteristics: a wide range of spatial and temporal resolution is supported, and it has the capability of using a large range of bit rates. It is, however, important to guarantee interoperability of equipment using MPEG without forcing the equipment manufacturers to build very overdesigned systems. For this reason a special subset of the parameter space has been defined that represents a reasonable compromise well within the prime target of MPEG of addressing video coded at about 1.5 Mbits/s. A "constrained parameter bit stream" was defined [3] with the parameters shown in Table 8.

It is expected that all "MPEG" decoders be capable of decoding a constrained parameter "Core" bit stream. Beyond the "Core" bitstream parameters, the MPEG algorithm can be applied to a wide range of video formats. It can be argued, however, that at those higher resolutions and those higher bit rates, the MPEG algorithm is not necessarily optimal, since the technical trade-offs have been widely discussed mostly within the range of the "Core" bit stream (see Table 9).

Table 9. Perspectives of Application of the MPEG Algorithm beyond the Constrained Parameter Set
  Format     Video Parameters       Compressed Bit Rate
  SIF        352 x 240 30Hz         1.2-3 Mbps
  CCIR 601   720 x 486 30Hz         5-10 Mbps
  EDTV       960 x 486 30Hz         7-15 Mbps
  HDTV       1920 x 1080 30Hz       20-40 Mbps

A new phase of activities of the MPEG committee (ISO-IEC/JTC1/SC2/WG11) has been started to study video compression algorithms for higher resolution signals (typically CCIR 601) at bit rates up to 10 Mbits/s.

Conclusion
It is anticipated that the work of the MPEG committee will have a very significant impact on the industry and that products based on MPEG are expected as early as 1992. Indeed, the concept that a video signal and its associated audio can be compressed to a bit rate of about 1.5 Mbits/s with an acceptable quality has been proven, and the solution appears to be implementable at low cost with today's technology. The consequences for computer systems and computer and commu-

ate stage as a Draft International Standard (DIS) and a second review process. Prior to the review process itself, it is expected that a real-time MPEG decoder will be demonstrated.

In addition to the ongoing effort, the algorithmic and technical avenues opened by MPEG are making the concepts of digital videotape recorders and digital video broadcasting more likely to occur quite soon. A second phase of work has been started in the MPEG committee to address the compression of video for digital storage media in the range of 5 to 10 Mbits/s.

Acknowledgments
Now that MPEG is widely recognized as an important milestone in the evolution of digital video, the author would like to acknowledge Hiroshi Yasuda, Convener of WG8, under whose guidance both JPEG and MPEG were started, and Leonardo Chiariglione, Convener of WG11, without whose vision there would have been no MPEG. The author would also like to thank all the technical teams that contributed proposals to the MPEG-Video test, and most of all, the people that contributed to putting together the MPEG Simulation Models and Committee Drafts.

References
1. Anderson, M. "VCR quality video at 1.5 Mbits/s." National Communication Forum (Chicago, Oct. 1990).
5. Hidaka, T., Ozawa, K. Subjective assessment of redundancy-reduced moving images for interactive applications: Test methodology and report. Signal Processing: Image Commun. 2, 2 (Aug. 1990).
6. JPEG digital compression and coding of continuous-tone still images. Draft ISO 10918, 1991.
7. Liou, M.L. Overview of the px64 kbps video coding standard. Commun. ACM 34, 4 (Apr. 1991).
8. MPEG proposal package description. Document ISO/WG8/MPEG/89/128 (July 1989).
9. Video codec for audio visual services at px64 kbits/s. CCITT Recommendation H.261, 1990.
10. Wallace, G.K. The JPEG still-picture compression standard. Commun. ACM 34, 4 (Apr. 1991).

CR Categories and Subject Descriptors: C.2.0 [Computer-Communication Networks]: General - Data communications; I.4.2 [Image Processing]: Compression (coding) - Approximate methods
General Terms: Design, Standardization
Additional Key Words and Phrases: MPEG, multimedia, video compression

DIDIER LE GALL is Director of Research at C-Cube Microsystems. He has been involved with the MPEG standardization effort since its beginning and is currently serving as chairperson of the MPEG-Video group at C-Cube Microsystems. His current research interests include signal processing, video compression algorithms and architecture of digital video compression systems.
nication networks are likely to open
the way to a wealth of new applica-               2. Chen, CT. and Le Gall, D.,. A Kth            Author’s Present Address: C-Cube
tions loosely labeled “multimedia,”                  order adaptive transform coding
                                                                                                Microsystems, 399-A W. Trimble Road,
                                                     algorithm for high-fidelity recon-         San Jose. CA 9513,. emai,: djl@c3.
because they integrate         text, graph-
                                                     struction of still images. In Proceed-
ics, video, and audio. The exact
                                                     ings of the SITE   (San Diego,   Aug.
impact of “multimedia” is of course                  1989,.
yet to be determined, but is likely to
lx very great.                                    5. Coding of moving Pictures and as-
    MPEG has a Committee Draft;                     sociated audio. Committee Draft of
                                                    Standard    *SO, ,172: ISOiMPEG
the path to an International           Stan-
                                                    901176. Dec. 1990.
dard calls for an extensive          review
process      by the National        Member
                                                  4. Digital transmission of component
Bodi&,       followed bv an intermedi-
                                                     coded r&vision signals at 30-34
                                                     Mbitsis and 45 Mbits/s using the dis-
                                                     crete cosine transform.       CCIR-
                                                     CMTIX      Document CMTTR. July

