Methods and Standards for Lossless Compression

CHAPTER 6
Department of Electronic Engineering, FJU
VLSI Architectures for Motion Estimation

Video Coding Techniques and Hardware Architectures Design

1-D Systolic Array

A Family of VLSI Designs for the Motion Compensation Block-Matching Algorithm
K. M. Yang, M. T. Sun, and L. Wu
IEEE Transactions on Circuits and Systems, vol. 36, no. 10, pp. 1317-1325, Oct. 1989

Main Features

They allow full search, which is the optimal solution in block matching (every candidate displacement in the tracking range is evaluated).
They accept sequential inputs to save pin count, yet perform parallel processing.
They use common buses for data transfers, which saves silicon area.
They are flexible, modular designs capable of processing different block sizes, e.g., 8 × 8, 16 × 16, or 32 × 32.
They are cascadable, i.e., cascaded chips provide a larger tracking area.
They contain test circuitry to increase testability.
They were the first chip designs for block-matching motion estimation in the world.

Architecture Design

To fully utilize the processing power of the processing elements (PEs), a special data flow must be derived that keeps the PEs as busy as possible.
The same data are used repeatedly at different search positions.
In the following, two data-flow techniques that allow the designs to achieve 100 percent efficiency are described: one broadcasts the previous-frame data, and the other broadcasts the current-block data.
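
Notations

Figure: the current block a(i, j), with upper-left corner a(Ia, Ja) and indices 0 to 15, and the previous-frame search area b(k, l), with upper-left corner b(Ib, Jb) and indices running to 31; the positions b(Ib, Jb) and b(Ib, Jb+15) are marked, and the labels p and p' also appear.
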
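$$S(m_j) = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| a(I_a + i, J_a + j) - b(I_b + k, J_b + l + m_j) \right|, \qquad m_j = 0, 1, \ldots, 15$$
where the running index (k, l) steps through the same sequence as (i, j).

$$E = \min_{m_i, m_j} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| a(i, j) - b(i + m_i, j + m_j) \right|^{q}$$
With q = 1, E is the sum of absolute differences that the PEs accumulate.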
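
As a software reference for the criterion E above, the sketch below performs an exhaustive (full) search over every displacement that keeps the block inside the search area. It is a plain Python model under our own assumptions: frames are nested lists of integers, displacements are counted from the upper-left corner of b, and q is a parameter. The names `block_error` and `full_search` are hypothetical, not taken from the paper.

```python
import random


def block_error(a, b, mi, mj, q=1):
    """Sum over the block of |a(i, j) - b(i + mi, j + mj)|**q (q = 1 gives SAD)."""
    return sum(abs(a[i][j] - b[i + mi][j + mj]) ** q
               for i in range(len(a)) for j in range(len(a[0])))


def full_search(a, b, q=1):
    """Return (E, mi, mj): the minimum error and the displacement that gives it."""
    best = None
    for mi in range(len(b) - len(a) + 1):            # every vertical displacement
        for mj in range(len(b[0]) - len(a[0]) + 1):  # every horizontal displacement
            e = block_error(a, b, mi, mj, q)
            if best is None or e < best[0]:          # keep the first minimum found
                best = (e, mi, mj)
    return best


# Usage: cut a 16 x 16 block out of a random 31 x 31 search area at (5, 9).
rng = random.Random(0)
b = [[rng.randrange(256) for _ in range(31)] for _ in range(31)]
a = [row[9:25] for row in b[5:21]]
print(full_search(a, b))   # (0, 5, 9)
```

A 31 × 31 area gives 16 displacements per axis, i.e. m = 0 to 15, matching the 16 search positions per coordinate used in these slides.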

Broadcasting the Previous Frame Data

While b(Ib, Jb+15), which is used by all 16 search positions, is being input, it can be broadcast to all the processors that need it.
This relieves the burden of repeatedly accessing the same previous-frame data.
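
The saving from broadcasting can be counted for one row of the search area. The sketch assumes a 16-pixel block row and 16 horizontal search positions, so one previous-frame row holds 16 + 16 - 1 = 31 pixels; the function name `uses_per_column` is ours.

```python
BLOCK = 16      # pixels per current-block row
POSITIONS = 16  # horizontal search positions m_j = 0..15


def uses_per_column(block=BLOCK, positions=POSITIONS):
    """How many search positions read each previous-frame pixel of one row."""
    counts = [0] * (block + positions - 1)   # 31 previous-frame columns
    for m in range(positions):
        for j in range(block):
            counts[j + m] += 1               # position m compares a(., j) with b(., j + m)
    return counts


counts = uses_per_column()
reads_without_broadcast = sum(counts)  # every position fetches its own copy
reads_with_broadcast = len(counts)     # each pixel fetched once, shared by all PEs
print(counts[15], reads_without_broadcast, reads_with_broadcast)  # 16 256 31
```

Column 15, i.e. b(Ib, Jb+15), is the one pixel needed by all 16 positions, and broadcasting cuts the per-row fetches from 256 to 31 in this model.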





Broadcasting Reference Frame

The 16 PE columns compute the error measure for 16 search positions.
Except for a very short initial delay, all the PEs are busy all the time, so the utilization is 100%.
The address generator generates each address by adding a running index to a base address.
The base address, (Ia, Ja) or (Ib, Jb), which is defined as the upper-left corner of a block, remains the same for the entire processing of that block, and the running indexes (i, j) and (k, l) follow an identical sequence for all blocks.
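
The data flow can be checked with a small cycle-level model. This is our own behavioural reconstruction, not the paper's circuit: we assume the current block a is streamed row by row through a delay line so that PE m sees pixel number t - m at cycle t, and that previous-frame pixels arrive on two broadcast buses, called p and p' here after the labels in the notation figure, with p' one row behind p. The function name `simulate_two_bus` is hypothetical.

```python
import random

N = 16  # block size (a is N x N)
P = 16  # number of PEs = horizontal search positions m_j = 0..P-1


def simulate_two_bus(a, b):
    """Cycle-level model for one vertical offset.

    a: N x N current block; b: N rows of N + P - 1 previous-frame pixels.
    Returns the P accumulated errors, the busy PE-cycles and the cycle count.
    """
    acc = [0] * P
    busy = 0
    cycles = N * N + P - 1                    # stream length plus pipeline fill
    for t in range(cycles):
        row, col = divmod(t, N)
        # Assumed buses: p carries b[row][col]; p' carries b[row - 1][col + N].
        p = b[row][col] if row < N else None
        p_prime = b[row - 1][col + N] if 1 <= row <= N and col < P - 1 else None
        for m in range(P):
            idx = t - m                       # index of the a pixel now at PE m
            if 0 <= idx < N * N:
                i, j = divmod(idx, N)
                # PE m needs b[i][j + m]: it is on p when col >= m, otherwise on p'.
                b_val = p if col >= m else p_prime
                acc[m] += abs(a[i][j] - b_val)
                busy += 1
    return acc, busy, cycles


rng = random.Random(1)
a = [[rng.randrange(256) for _ in range(N)] for _ in range(N)]
b = [[rng.randrange(256) for _ in range(N + P - 1)] for _ in range(N)]
acc, busy, cycles = simulate_two_bus(a, b)
direct = [sum(abs(a[i][j] - b[i][j + m]) for i in range(N) for j in range(N))
          for m in range(P)]
print(acc == direct, busy, cycles)   # True 4096 271
```

In this model each PE is idle only during the P - 1 = 15 cycles of pipeline fill and drain, giving 4096 busy PE-cycles out of 16 × 271 = 4336 (about 94%) for a single row of search positions; if further positions are streamed back to back, as we assume the chip does, those slots overlap and the utilization approaches 100%.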




Basic Data Flow

Architecture of PE

Figure: inputs a and b pass through a subtractor and a latch (a − b), an absolute-value function and a latch (|a − b|), and an accumulator and a latch that holds the accumulated |a(i, j) − b(k, l)|.

These sub-operations are performed in a pipelined fashion, which reduces the cycle time.
The accumulator in the last stage of the PE has 16-bit precision to accommodate the largest possible error measure.
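
The three-stage pipeline and the accumulator width can be sketched in Python. The model assumes 8-bit pixels (not stated on the slide) and one (a, b) pair entering per clock; the latches are ordinary variables updated from the last stage backwards, and the function names are illustrative.

```python
def pe_pipeline(pairs):
    """Subtract -> latch -> |.| -> latch -> accumulate -> latch, one pair per cycle."""
    diff_latch = None   # holds a - b
    abs_latch = None    # holds |a - b|
    acc_latch = 0       # holds the running error measure
    cycles = 0
    for item in list(pairs) + [None, None]:   # two extra cycles drain the pipeline
        # Update from the last stage backwards so each stage reads last cycle's latch.
        if abs_latch is not None:
            acc_latch += abs_latch
        abs_latch = abs(diff_latch) if diff_latch is not None else None
        diff_latch = item[0] - item[1] if item is not None else None
        cycles += 1
    return acc_latch, cycles


def accumulator_bits(pixels, pixel_bits=8):
    """Bits needed for the worst-case error sum of a block with `pixels` pixels."""
    worst = pixels * (2 ** pixel_bits - 1)
    return worst.bit_length()


worst_pairs = [(255, 0)] * 256                          # 16 x 16 block, maximal difference
print(pe_pipeline(worst_pairs))                         # (65280, 258)
print([accumulator_bits(n * n) for n in (8, 16, 32)])   # [14, 16, 18]
```

Under the 8-bit assumption a 16 × 16 block fits exactly in 16 bits (65,280 < 65,536), while 8 × 8 and 32 × 32 blocks need 14 and 18 bits, consistent with the later remark that the internal dynamic range differs with block size.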



Broadcasting the Current Frame Data

Figure (labels only): parallel-in-parallel-output shift registers; parallel-in-parallel-output shift registers with multiplexers.




Basic Dataflow for Broadcasting Current Block Data
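
The dual scheme can also be modelled in a few lines. In this sketch, which is our own simplification, each current-block pixel a(i, j) is broadcast to all 16 PEs, while one row of previous-frame data is loaded in parallel into a 31-cell register that shifts left once per cycle, so PE m always reads register cell m. Row-to-row transitions are idealised as a single parallel load, and the function name `simulate_current_broadcast` is hypothetical.

```python
import random

N = 16  # block size
P = 16  # PEs = horizontal search positions


def simulate_current_broadcast(a, b):
    """a: N x N current block; b: N rows of N + P - 1 previous-frame pixels."""
    acc = [0] * P
    busy = 0
    for i in range(N):
        reg = list(b[i])                 # parallel load of one previous-frame row
        for j in range(N):
            pixel = a[i][j]              # broadcast a(i, j) to every PE
            for m in range(P):
                acc[m] += abs(pixel - reg[m])   # cell m currently holds b[i][j + m]
                busy += 1
            reg = reg[1:] + [0]          # shift the register left by one cell
    return acc, busy


rng = random.Random(2)
a = [[rng.randrange(256) for _ in range(N)] for _ in range(N)]
b = [[rng.randrange(256) for _ in range(N + P - 1)] for _ in range(N)]
acc, busy = simulate_current_broadcast(a, b)
direct = [sum(abs(a[i][j] - b[i][j + m]) for i in range(N) for j in range(N))
          for m in range(P)]
print(acc == direct, busy)   # True 4096
```

Here every PE does useful work in each of the N × N = 256 cycles (4096 = 16 × 256 busy PE-cycles), because the broadcast value, not the PE's position in a delay line, fixes which pixel each PE sees.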

Flexible Block Size

Different motion-compensation schemes may use different block sizes and require large tracking ranges, so it is very desirable for a chip to be flexible enough for use in different systems.
Consider a block size of 8 × 8: the computation required for each block is ¼ of that required for a 16 × 16 block.
However, each frame contains 4 times as many 8 × 8 blocks as 16 × 16 blocks.
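
The cancellation can be checked numerically. The sketch counts absolute-difference operations per frame for full search with a fixed 16 × 16 set of search positions; the CIF frame size (352 × 288) is only an example we chose, and the function name `ops_per_frame` is ours.

```python
def ops_per_frame(width, height, block, positions_per_axis=16):
    """Absolute differences per frame for full search with a fixed tracking range."""
    blocks = (width // block) * (height // block)
    ops_per_block = block * block * positions_per_axis ** 2
    return blocks * ops_per_block


loads = {n: ops_per_frame(352, 288, n) for n in (8, 16, 32)}
print(loads)   # {8: 25952256, 16: 25952256, 32: 25952256}
```

Each count equals width × height × 256: halving the block side quarters the work per block but quadruples the number of blocks.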


Flexible Block Size (Cont.)

The computational load per frame is the same for different block sizes, except that the internal dynamic range is slightly different (the tracking range is fixed).
Both architectures discussed are flexible enough to process 8 × 8, 16 × 16, or 32 × 32 blocks, as long as the tracking range is fixed at 16 search positions per coordinate.
The same hardware containing 16 PEs can be reconfigured to process different block sizes by a very simple control signal (to the address generator).
The above discussion can be generalized to other block sizes that are powers of 2.
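
A minimal address-generator sketch, assuming a row-major frame memory of width `frame_width` and a raster-order running index (both our assumptions): the block size plays the role of the control signal, the base address (the block's upper-left corner) stays fixed while a block is processed, and the running-index sequence is the same for every block. The function names are hypothetical.

```python
def running_index(block):
    """Raster-order running index (i, j); identical for every block of this size."""
    for i in range(block):
        for j in range(block):
            yield i, j


def block_addresses(base, block, frame_width):
    """Memory addresses of one block: base corner plus running index."""
    base_i, base_j = base
    return [(base_i + i) * frame_width + (base_j + j) for i, j in running_index(block)]


addrs = block_addresses((16, 32), block=8, frame_width=352)
print(len(addrs), addrs[:3], addrs[8])   # 64 [5664, 5665, 5666] 6016
```

Switching between 8 × 8, 16 × 16 and 32 × 32 changes only the `block` argument in this model; the PE array itself is untouched.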


Larger Tracking Ranges

The tracking range is basically limited by the computation power of the PEs. If a tracking range of -16 to +15 is needed, the computation load increases by a factor of 4 (32 × 32 = 1,024 search positions instead of 16 × 16 = 256).
Assuming each PE is already operating at the limit of its capability, 4 times as many PEs will be needed.
To double the horizontal tracking range, two chips are essentially cascaded to provide 32-stage input registers and 32 PEs.
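
The scaling argument can be written as a small calculation. It assumes, as the slide does, that per-PE throughput is fixed and that one chip holds 16 PEs covering 16 × 16 positions; the function name `cascade_requirements` is ours.

```python
def cascade_requirements(low, high, base_span=16, pes_per_chip=16):
    """Search span, load ratio, PEs and chips needed for the range [low, high]."""
    span = high - low + 1                           # positions per coordinate
    load_ratio = (span * span) // (base_span * base_span)
    pes_needed = pes_per_chip * load_ratio          # throughput per PE is fixed
    chips = pes_needed // pes_per_chip
    return span, load_ratio, pes_needed, chips


print(cascade_requirements(-16, 15))   # (32, 4, 64, 4)
```

Four chips, i.e. 64 PEs, matches the four-chip arrangement in the block diagram that follows.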


Block Diagram for Cascading Four Chips to Achieve a Tracking Range of -16 to +15

Figure: chips A, B, C and D, with signals labelled C1, C2, p1, p1', p2 and p2'; a comparator (CMP) produces the motion vector.

Overlapped Search Area

Figure: the search area, with coordinates marked at 0, 16, 32 and 47 along each axis, divided into four overlapping sub-tracking areas, I to IV.

Overlapped Search Area (Cont.)

The cascaded-chip design can also be implemented easily by assigning each chip to process one portion of the tracking area.
While the data from an overlapped area are being input, they can be broadcast to two chips to save bandwidth. This avoids a proportional increase of the memory requirement in a cascaded-chip system.
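
The partitioning can be sketched as four independent full searches, one per sub-tracking area, followed by a comparator. Assumptions (ours): a 16 × 16 block, positions -16 to +15 per axis stored as indices 0 to 31 into a 47 × 47 search area, chips assigned to the four quadrants of the position grid, and ties resolved toward the smaller index. `chip_search` and `cascaded_search` are hypothetical names. The last line also counts how many columns two horizontally adjacent chip windows share.

```python
import random

N = 16   # block size
R = 16   # search positions per axis handled by one chip


def sad(a, area, y, x):
    return sum(abs(a[i][j] - area[y + i][x + j]) for i in range(N) for j in range(N))


def chip_search(a, area, oy, ox):
    """One chip: full search over the R x R positions starting at (oy, ox)."""
    best = None
    for y in range(oy, oy + R):
        for x in range(ox, ox + R):
            cost = sad(a, area, y, x)
            if best is None or cost < best[0]:
                best = (cost, y, x)
    return best


def cascaded_search(a, area):
    """Four chips on the four quadrants; CMP keeps the smallest (cost, y, x)."""
    results = [chip_search(a, area, oy, ox) for oy in (0, R) for ox in (0, R)]
    cost, y, x = min(results)
    return cost, y - R, x - R            # motion vector in the range -16..+15


rng = random.Random(3)
size = N + 2 * R - 1                     # 47 pixels per axis
area = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
a = [row[25:25 + N] for row in area[7:7 + N]]   # true position (7, 25)
window = N + R - 1                       # 31-pixel window per chip and axis
result = cascaded_search(a, area)
print(result, window - R)                # (0, -9, 9) 15
```

Adjacent chip windows share 15 columns (or rows) in this layout; those are the pixels that, per the slide, can be broadcast to both chips instead of being stored twice.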





Motion Estimation with Fractional Precision

Quarter-pel precision
                                                                                             Fractional Motion Estimation Chip-Pair
                                                                                                             Design
Methods and Standards for Lossless Compression




[Figure: chip-pair block diagram. Blocks shown: Current Frame Storage Memory (video in); Previous Frame Storage Memory (reconstructed video in); Motion Compensation Chip I (integer precision, output (m_i, m_j)); Tracking Area Storage Memory; Current Block Storage Memory; Motion Compensation Chip F (fractional precision, output (m_i + \tilde{m}_i, m_j + \tilde{m}_j)).]

Block Diagram of a Fractional Motion Estimation Chip




                                                                                                                       Interpolation




The combination of IP1 and IP2 lowers the required input rate
and keeps the PEs performing an operation every cycle.




The interpolated values at the outputs of IP1 and IP2 can be expressed as

$$ b_l^k(i,j) = \frac{4-l}{4}\left[\frac{4-k}{4}\,b(i,j) + \frac{k}{4}\,b(i+1,j)\right] + \frac{l}{4}\left[\frac{4-k}{4}\,b(i,j+1) + \frac{k}{4}\,b(i+1,j+1)\right] $$

$$ b_l^3(i,j) = b_l^0(i,j+1), \qquad b_3^k(i,j) = b_0^k(i+1,j), \qquad k, l = 0, 1, 2, 3 $$
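The bilinear quarter-pel formula on this slide can be checked numerically. The following is a minimal Python sketch, assuming the integer-pel frame is stored as a 2-D list indexed `b[i][j]` (a hypothetical layout) and using exact fractions rather than the fixed-point rounding a hardware interpolator would apply. The function name `quarter_pel` is illustrative only.

```python
from fractions import Fraction
from typing import List


def quarter_pel(b: List[List[int]], i: int, j: int, k: int, l: int) -> Fraction:
    """Return b_l^k(i, j): the sample offset by k/4 along i and l/4 along j.

    Exact bilinear interpolation; real hardware would round to fixed point.
    """
    if not (0 <= k <= 3 and 0 <= l <= 3):
        raise ValueError("k and l must be in 0..3")
    wk0, wk1 = Fraction(4 - k, 4), Fraction(k, 4)  # weights along i
    wl0, wl1 = Fraction(4 - l, 4), Fraction(l, 4)  # weights along j
    # Inner brackets: interpolate along i at j and at j + 1.
    at_j = wk0 * b[i][j] + wk1 * b[i + 1][j]
    at_j1 = wk0 * b[i][j + 1] + wk1 * b[i + 1][j + 1]
    # Outer combination: interpolate along j.
    return wl0 * at_j + wl1 * at_j1


if __name__ == "__main__":
    frame = [[0, 4], [8, 12]]
    # All 16 quarter-pel positions in the 2 x 2 integer-pel cell at (0, 0).
    for k in range(4):
        print([str(quarter_pel(frame, 0, 0, k, l)) for l in range(4)])
```

With k, l ranging over 0 to 3, each integer-pel position contributes 16 candidate quarter-pel positions; k = l = 0 returns the integer-pel sample itself.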
Basic Data Flow for Fractional Motion Vector Estimator




Basic Data Flow for Fractional Motion Vector Estimator (cont.)





                                                                                                                           Schematic Diagram of IP1





                                                                                                                           Schematic Diagram of IP2





                                                                                                                           Chip Layout




                                                                                                                    Testability




The motion vector calculated by the chip is a function of the current block data and the previous-frame data within the tracking range. Since the number of possible combinations of these input data is extremely large, exhaustive testing of the chip is infeasible.
To make the chip testable, it is highly desirable to include an on-chip test circuit that neither uses excessive chip area nor degrades performance.
The proposed chip operates in two modes, normal mode and test mode, which are selected by an external signal named “test.”
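A rough count shows why exhaustive testing is out of reach. The sketch below assumes, purely for illustration, a 16 × 16 current block, a maximum displacement of p = 8, and 8-bit pixels; the chip's actual tracking-area size may differ, and `input_bits` is a hypothetical helper.

```python
import math


def input_bits(n: int = 16, p: int = 8, pixel_bits: int = 8) -> int:
    """Input bits feeding one motion-vector decision (illustrative sizes only).

    Assumes an n x n current block plus an (n + 2p) x (n + 2p) search area.
    """
    current_block = n * n
    search_area = (n + 2 * p) ** 2
    return pixel_bits * (current_block + search_area)


bits = input_bits()
# There are 2**bits distinct input combinations; count their decimal digits.
digits = math.floor(bits * math.log10(2)) + 1
print(f"{bits} input bits -> a {digits}-digit number of possible input vectors")
```

Under these assumptions the number of input combinations has over 3,000 decimal digits, far beyond any practical test budget.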
                                                                                                               Testability (Cont.)




By using tri-state buses and a decoder, the test vectors for the whole chip are reduced to much smaller sets, one for each functionally divided module.
In test mode, a test pattern is applied through data pins that are normally used for one of the previous-frame data inputs, and is then decoded by the Test Pattern Decoder.
Only one module is tested at a time, and only its results are routed to the output bus and observed at the output pins.
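The module-at-a-time test mode can be modelled behaviourally. This Python sketch is hypothetical: the codes and functions in `MODULES` stand in for the chip's real functional blocks, which are not listed here, and the single select code simplifies what the Test Pattern Decoder derives from the test pattern.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical functional modules, keyed by the code the decoder recognises.
MODULES: Dict[int, Callable[[List[int]], int]] = {
    0b00: lambda v: sum(v),            # e.g. an accumulator
    0b01: lambda v: min(v),            # e.g. a minimum-search comparator
    0b10: lambda v: abs(v[0] - v[1]),  # e.g. an absolute-difference unit
}


def decode(select: int) -> Dict[int, bool]:
    """Model of the Test Pattern Decoder: enable exactly one tri-state driver."""
    if select not in MODULES:
        raise ValueError(f"no module with code {select:#04b}")
    return {code: code == select for code in MODULES}


def run_test_mode(select: int, stimulus: List[int]) -> Optional[int]:
    """Apply a stimulus and read the shared output bus (None = high impedance)."""
    enables = decode(select)
    bus: Optional[int] = None
    for code, module in MODULES.items():
        if enables[code]:  # disabled drivers stay in high impedance
            if bus is not None:
                raise RuntimeError("bus contention: two drivers enabled")
            bus = module(stimulus)
    return bus


print(run_test_mode(0b10, [7, 3]))
```

Because the decoder enables one driver at a time, each module's output can be checked in isolation against its own small set of test vectors.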





Array Architectures for Block Matching Algorithms

T. Komarek and P. Pirsch

IEEE Transactions on Circuits and Systems, vol. 36, no. 10, pp. 1301-1308, Oct. 1989


                                                                                                           Block Matching Algorithm




$$ s(m,n) = \sum_{i=1}^{N}\sum_{k=1}^{N} \left| x(i,k) - y(i+m,\,k+n) \right|, \qquad -p \le m,n \le p $$

$$ u = \min_{(m,n)} \{ s(m,n) \}, \qquad v = (m,n)\big|_{u} \quad \text{(motion vector)} $$




The BMA is defined over a four-dimensional index space because it has four indices: i, k, m, and n.
For example, the BMA can be decomposed into two parts, each defined over a two-dimensional index space.
– The first is spanned by the indices i and k and consists of accumulating the sum s(m, n).
– The second, defined over m and n, performs the minimum search and selects the displacement vector components.
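A direct software model of the full-search BMA is a useful reference when checking any of the array architectures. This sketch uses 0-based Python indices (the slide uses 1-based i and k) and assumes the search area is stored as an (N + 2p) × (N + 2p) list offset by p, so `y[i + m + p][k + n + p]` plays the role of y(i + m, k + n). The function name `full_search` is illustrative.

```python
from typing import List, Tuple


def full_search(x: List[List[int]], y: List[List[int]], p: int) -> Tuple[Tuple[int, int], int]:
    """Return (v, u): the displacement (m, n) minimising s(m, n), and that minimum.

    x: N x N reference block, x[i][k].
    y: (N + 2p) x (N + 2p) search area; y[i + m + p][k + n + p] stands for y(i + m, k + n).
    """
    size = len(x)
    best_v, best_u = None, None
    for m in range(-p, p + 1):
        for n in range(-p, p + 1):
            # s(m, n): sum of absolute differences over the whole block
            s = sum(abs(x[i][k] - y[i + m + p][k + n + p])
                    for i in range(size) for k in range(size))
            if best_u is None or s < best_u:  # strict '<' keeps the first minimum found
                best_u, best_v = s, (m, n)
    return best_v, best_u


if __name__ == "__main__":
    N, p = 3, 2
    y = [[(7 * r + 13 * c) % 97 for c in range(N + 2 * p)] for r in range(N + 2 * p)]
    # Cut the reference block out of the search area at true displacement (1, -2).
    x = [[y[i + 1 + p][k - 2 + p] for k in range(N)] for i in range(N)]
    print(full_search(x, y, p))
```

The search evaluates all (2p + 1)² candidates, each costing N² absolute differences, which is the workload the systolic arrays below distribute over their PEs.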
Derivation of Systolic Arrays for Full Search BMA




The accumulation of s(m, n) runs first over the index k and then continues over the index i, for fixed m and n.




$$ s_i(m,n) = \sum_{k=1}^{N} \left| x(i,k) - y(i+m,\,k+n) \right|, \qquad m, n = \text{const.} $$

$$ s(m,n) = \sum_{i=1}^{N} s_i(m,n), \qquad m \text{ and } n \text{ fixed} $$


The second part of the decomposed BMA is given by

$$ u_n = \min_{(m)} \{ s(m,n) \}, \qquad 0 \le m,n \le 2p $$

$$ v_n = (m,n)\big|_{u_n} $$
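The two-part decomposition can be sketched the same way. This version follows the slide's shifted indexing 0 ≤ m, n ≤ 2p, so the search area is indexed directly as `y[i + m][k + n]` (0-based); the helper names are hypothetical. For each fixed n it forms the row sums s_i(m, n), adds them into s(m, n), and takes u_n as the minimum over m. The final comparison of the u_n over n is added here to complete the search.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def row_partial_sums(x: Matrix, y: Matrix, m: int, n: int) -> List[int]:
    """s_i(m, n) for every row i: the sum over k for fixed m and n."""
    size = len(x)
    return [sum(abs(x[i][k] - y[i + m][k + n]) for k in range(size)) for i in range(size)]


def decomposed_bma(x: Matrix, y: Matrix, p: int) -> Tuple[int, Tuple[int, int]]:
    """Return (u, v) using the two-part decomposition with 0 <= m, n <= 2p."""
    best = None
    for n in range(2 * p + 1):
        # s(m, n) = sum_i s_i(m, n) for each m; u_n is the minimum over m.
        u_n, v_n = min((sum(row_partial_sums(x, y, m, n)), (m, n)) for m in range(2 * p + 1))
        # Final comparison over n (added to complete the search).
        if best is None or u_n < best[0]:
            best = (u_n, v_n)
    return best


if __name__ == "__main__":
    N, p = 3, 2
    y = [[(7 * r + 13 * c) % 97 for c in range(N + 2 * p)] for r in range(N + 2 * p)]
    x = [[y[i + 3][k] for k in range(N)] for i in range(N)]  # block taken at (m, n) = (3, 0)
    print(decomposed_bma(x, y, p))
```

Because addition is associative, summing over k first and over i second gives exactly the same s(m, n) as the double sum, so the result matches a direct full search.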

DG in the i, k-Plane




[Figure: DG displayed for a block size of N = 3 and a maximum displacement of p = 2 in the i, k-plane of the decomposed full search BMA. AD nodes (subtraction, magnitude operation, addition) take x(i, k) and y(i+m, k+n) and produce the partial sums s_1(m, n), s_2(m, n), s_3(m, n); A nodes (addition) combine them into s(m, n); the M node performs the minimum search (inputs s(m, n) and s(m-1, n)) and outputs the displacement vector. Time schedule steps 4 to 7 are marked.]
                                                                                             Systolic Architecture AB1 for N = 3, p = 2




[Figure: systolic architecture AB1 for N = 3, p = 2. A column of AD processing elements receives search area data (index sequences 41 31 21 31 21 11, 42 32 22 32 22 12, 43 33 23 33 23 13) and reference data (11 21 31 11 21 31, 12 22 32 12 22 32, 13 23 33 13 23 33); an M element outputs the displacement vector.]

Number of time instances necessary to determine a displacement vector:

$$ N \cdot (2p+1)(2p+1+N-1) = N \cdot (2p+1)(2p+N) $$
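The AB1 time-instance count on this slide is easy to tabulate. This helper simply evaluates N · (2p + 1)(2p + N); the N = 16, p = 8 case is an illustrative assumption, not a configuration from the source, and the function name is hypothetical.

```python
def ab1_time_instances(n: int, p: int) -> int:
    """Time instances AB1 needs per displacement vector: N * (2p + 1) * (2p + N)."""
    # The slide's (2p + 1 + N - 1) simplifies to (2p + N).
    return n * (2 * p + 1) * (2 * p + 1 + n - 1)


for n, p in [(3, 2), (16, 8)]:
    print(f"N = {n:2d}, p = {p}: {ab1_time_instances(n, p)} time instances")
```

For the slide's example (N = 3, p = 2) this gives 3 · 5 · 7 = 105 time instances per displacement vector.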
Three-Dimensional Index Space Spanned by the Indices i, k, and m




                                                                                                                Systolic Array AS2




Systolic architecture AS2 with processing elements AD, A, and M, derived from the previous DG, with the indices of the input data x(i, k)
and y(i + m, k + n). The indices enclosed by the dashed lines belong
to the data of one search area line and one reference block.




Projection onto the i, m-plane




                                                                                                         Systolic Architecture AB2




Systolic architecture AB2 with the indices of the search area data
y(i + m, k + n). The reference block data x(i, k) remain fixed
in the AD PEs. The indices of the data of one search area line are
enclosed by the dashed line.




                                                                                             Projection along the i, k-plane
                                                                                                                  Processing Element




[Figure: processing element. Recoverable labels: inputs R0 to R7 and C0 to C7 with subtractors (-) and adders (+) producing sad0 to sad7; Sign, cin, and cout signals; Full Adder blocks with signals Ci, Di, Ri, SADi, and \hat{SAD}_i; an ABS unit acting on C - R, followed by an adder (+) forming a Partial sum.]

                                                                                              SAD                                      SAD
                                                                                                                                                                     cout
                                                                                                                                                       Sign

                                                                                                                Bit-Level Cell Array
[Figure: 4x4 PE array of AD cells fed by reference rows R0 to R5 and partial sums SAD0 to SAD5, with mode and carry-in (Cin1, Cin2) controls, a final row of A cells, and an M cell that outputs the motion vector]
                                                                                                     Bit-Level PE Array (Cont.)
[Figure: two-dimensional array of bit-level PEs (8 columns shown)]

                                                                                                                        Systolic Array AS1
Systolic architecture AS1 for N = 3 and p = 2, with the indexes of the search area data y(i + m, k + n) and the reference block data x(i, k).
[Figure: AS1 array with a chain of D elements, five A cells, and a chain of M cells producing the displacement vector; the reference data x(i, k) and the search area data y(i + m, k + n) enter as serial streams labeled by pixel index (11, 21, 31, ...)]
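The following Python sketch is a behavioral model of the AS1 idea, not a cycle-accurate description of the array. It assumes that the array makes one pass per displacement n, that each of the 2p + 1 A cells accumulates the SAD of one displacement m, and that the M cells keep the running minimum. The function name `as1_search` and the storage layout of the search area are hypothetical.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]


def as1_search(x: Block, y: Block, p: int) -> Tuple[int, Tuple[int, int]]:
    """Behavioral sketch of a 1-D array with 2p + 1 'A' cells and an 'M' chain.

    x: N x N reference block, indexed x[i][k].
    y: (N + 2p) x (N + 2p) search area; y(i + m, k + n) is stored at
       y[i + m + p][k + n + p] for displacements m, n in [-p, p].
    Returns (minimum SAD, (m, n)).
    """
    n_blk = len(x)
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for n in range(-p, p + 1):  # assumed: one array pass per displacement n
        # One accumulator per A cell; cell index c handles displacement m = c - p.
        acc = [0] * (2 * p + 1)
        for i in range(n_blk):
            for k in range(n_blk):
                for c in range(2 * p + 1):
                    m = c - p
                    acc[c] += abs(x[i][k] - y[i + m + p][k + n + p])
        # M chain: each A-cell result is compared with the running minimum.
        for c, sad in enumerate(acc):
            if best is None or sad < best[0]:
                best = (sad, (c - p, n))
    assert best is not None
    return best


y = [[7 * r + c for c in range(7)] for r in range(7)]          # N = 3, p = 2
x = [[y[i + 3][k + 1] for k in range(3)] for i in range(3)]   # block at m = 1, n = -1
print(as1_search(x, y, 2))  # (0, (1, -1))
```

With N = 3 and p = 2 this gives five A cells, matching the five A cells in the AS1 figure.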


Efficient Hybrid Tree/Linear Array Architectures for Block-Matching Motion Estimation Algorithms
M.-J. Chen, L.-G. Chen, K.-N. Cheng, M. C. Chen


                                                                                              IEE Proc.-Vis. Image Signal Process., vol. 143,
                                                                                                      no. 4, pp. 217-222, Aug. 1996


Illustration of One-Dimensional Full Search Algorithm
                                                                                             Tree-Type Array Architecture with N = 4
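As a generic sketch of a tree-type array, assuming the leaves are absolute-difference PEs (one per pixel of the current block) and a binary adder tree reduces their outputs to one SAD per search candidate, the Python below models the tree level by level. Whether N = 4 here means a 1-D block of four pixels or a 4 × 4 block is not stated in this text, so the usage covers both; `tree_sad` is a hypothetical name.

```python
from typing import List, Tuple


def tree_sad(cur: List[int], ref: List[int]) -> Tuple[int, int]:
    """Sum |cur - ref| for one candidate with a binary adder tree.

    The leaf PEs compute absolute differences in parallel; each iteration of
    the while loop models one level of two-input adders.
    Returns (SAD, number of adder levels).
    """
    if not cur or len(cur) != len(ref):
        raise ValueError("cur and ref must be non-empty and of equal length")
    level = [abs(c - r) for c, r in zip(cur, ref)]  # leaf PEs
    depth = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd level with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


print(tree_sad([1, 2, 3, 4], [0, 0, 0, 0]))  # (10, 2): 4 leaves, 2 adder levels
print(tree_sad(list(range(16)), [0] * 16))    # (120, 4): 16 leaves, 4 adder levels
```

A tree of this kind can deliver one complete SAD per cycle once pipelined, at the cost of one leaf PE per pixel; that hardware cost is what the hybrid tree/linear and tree-cut variants address.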
                                                                                               Hybrid Tree/Linear Architecture
                                                                                                Tree-Cut Technique: Direct Form
Image Pel Distribution for Memory Interleaving
                                                                                                 Chip Layout and Characteristics
                                                                                             Analysis and Architecture Design of Variable
                                                                                             Block Size Motion Estimation for H.264/AVC
Ching-Yeh Chen, Shao-Yi Chien, Yu-Wen Huang, Tung-Chien Chen, Tu-Chih Wang, and Liang-Gee Chen


                                                                                                      IEEE Trans. Circuits Syst. Video Technology



                                                                                                                     Abstract
                                                                                              Variable block size motion estimation (VBSME) has
                                                                                               become an important video coding technique, but it
                                                                                               increases the difficulty of hardware design.
                                                                                              We use inter/intra-level classification and various
                                                                                               data flows to analyze the impact of supporting
                                                                                               VBSME in different hardware architectures.
We propose two hardware architectures that support both traditional fixed block size motion estimation and VBSME, with less chip area overhead than previous approaches.

                                                                                                                Abstract (Cont.)
By broadcasting reference pixel rows and propagating partial SADs, the first design has fewer reference pixel registers and a shorter critical path.
The second design uses a 2-D distortion array and one adder tree with a reference buffer, which maximizes data reuse between successive search candidates.
We demonstrate a 720p, 30 fps solution at 108 MHz with a gate count of 330.2K and 208 Kbits of on-chip memory.
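As a rough budget check, assuming 720p means 1280 × 720 pixels and 16 × 16 macroblocks (MBs), a 108 MHz clock leaves exactly 1000 cycles per MB at 30 fps. The hypothetical helper below only does that arithmetic.

```python
def cycles_per_mb(width: int, height: int, fps: int, clock_hz: float, mb: int = 16) -> float:
    """Clock cycles available per macroblock for real-time processing."""
    mbs_per_frame = (width // mb) * (height // mb)  # 80 * 45 = 3600 for 1280 x 720
    return clock_hz / (mbs_per_frame * fps)


print(cycles_per_mb(1280, 720, 30, 108e6))  # 1000.0
```

The ME engine must finish the search for every MB within this cycle budget, whatever its internal data flow.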

                                                                                                                 Introduction (Cont.)
$$ SAD(m, n) = \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{Distortion}(i, j, m, n), \qquad -P \le m, n < P, $$
$$ \mathrm{Distortion}(i, j, m, n) = \lvert \mathrm{cur}(i, j) - \mathrm{ref}(i + m, j + n) \rvert . $$
                                                                                              The row (column) SAD is the summation of N distortions
                                                                                               in a row (column).
Although the full-search block-matching algorithm (FSBMA) provides the best quality among ME algorithms, it has the highest computational cost. In general, ME accounts for 50% to 90% of the computation of a typical video coding system; hence a hardware accelerator for ME is required.
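The SAD definition and full search above can be sketched in Python as follows. The sketch uses 0-based indices i, j (the formula counts from 1), assumes the search window is stored so that ref(i + m, j + n) sits at `win[i + m + P][j + n + P]`, and computes each SAD as the sum of N row SADs; the names are hypothetical. Each block needs N² · (2P)² absolute differences, which is why ME dominates the encoder's workload.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]


def full_search(cur: Block, win: Block, P: int) -> Tuple[int, Tuple[int, int]]:
    """Exhaustive block matching over -P <= m, n < P.

    cur: N x N current block, 0-based indices i, j.
    win: (N + 2P - 1) x (N + 2P - 1) search window; ref(i + m, j + n) is
         stored at win[i + m + P][j + n + P].
    Returns (minimum SAD, (m, n)).
    """
    N = len(cur)
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for m in range(-P, P):
        for n in range(-P, P):
            # Row SAD: sum of the N distortions in row i.
            row_sads = [
                sum(abs(cur[i][j] - win[i + m + P][j + n + P]) for j in range(N))
                for i in range(N)
            ]
            sad = sum(row_sads)  # SAD(m, n) = sum of the N row SADs
            if best is None or sad < best[0]:
                best = (sad, (m, n))
    assert best is not None
    return best


win = [[7 * r + c for c in range(7)] for r in range(7)]          # N = 4, P = 2
cur = [[win[i + 1][j + 3] for j in range(4)] for i in range(4)]  # block at m = -1, n = 1
print(full_search(cur, win, 2))  # (0, (-1, 1))
```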

                                                                                                                      VBSME
Variable block size motion estimation (VBSME) is a new coding technique that provides more accurate predictions than traditional fixed block size motion estimation (FBSME).
With FBSME, if a macroblock (MB) contains two objects moving in different directions, the coding performance of that MB degrades.
With VBSME, in the same situation, the MB can be divided into smaller blocks that fit the different motion directions.
                                                                                              VBSME has been adopted in the latest video coding
                                                                                               standards, including H.263, MPEG-4, WMV9.0, and
                                                                                               H.264/AVC.
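Because a SAD is a sum over pixels, the SADs of larger partitions can be formed by adding 4 × 4 SADs instead of being recomputed. The Python sketch below assumes the 16 SADs of the 4 × 4 sub-blocks of one 16 × 16 MB are already available for a single search candidate and derives the SADs of the seven H.264/AVC block sizes, 41 in total (16 + 8 + 8 + 4 + 2 + 2 + 1). The names `vbs_sads` and `SIZES` are hypothetical.

```python
from typing import Dict, List, Tuple

# H.264/AVC partition sizes (width, height) in pixels.
SIZES = [(4, 4), (8, 4), (4, 8), (8, 8), (16, 8), (8, 16), (16, 16)]


def vbs_sads(sad4: List[List[int]]) -> Dict[Tuple[int, int], List[int]]:
    """Derive the SADs of all partitions of a 16 x 16 MB from its 4 x 4 SADs.

    sad4[r][c] is the SAD of the 4 x 4 sub-block in row r, column c (0..3)
    for ONE search candidate.
    """
    out: Dict[Tuple[int, int], List[int]] = {}
    for w, h in SIZES:
        bw, bh = w // 4, h // 4  # partition size in 4 x 4 units
        out[(w, h)] = [
            sum(sad4[r][c] for r in range(r0, r0 + bh) for c in range(c0, c0 + bw))
            for r0 in range(0, 4, bh)
            for c0 in range(0, 4, bw)
        ]
    return out


sads = vbs_sads([[4 * r + c for c in range(4)] for r in range(4)])
print(sum(len(v) for v in sads.values()))  # 41
print(sads[(8, 8)])                        # [10, 18, 42, 50]
print(sads[(16, 16)])                      # [120]
```

Reusing the 4 × 4 results this way adds only adders, not extra distortion computations, for the larger block sizes.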
                                                                                                                   VBSME (Cont.)
In H.264/AVC, an MB can be divided into blocks of seven sizes: 4 × 4, 4 × 8, 8 × 4, 8 × 8, 8 × 16, 16 × 8, and 16 × 16.
Although VBSME achieves a higher compression ratio, it not only greatly increases computational complexity but also makes the hardware implementation of ME more difficult.
                                                                                              Traditional ME hardware architectures are designed for FBSME,
                                                                                               and they can be classified into two categories.
  – One is an inter-level architecture, where each processing element (PE) is responsible for the SAD of one specific search candidate.
  – The other is an intra-level architecture, where each PE is responsible for the distortion of one specific pixel of the current MB for all search candidates.
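The two classes differ in what each PE owns, not in the result. In the Python sketch below (a behavioral model with hypothetical names, using the window layout `win[i + m + P][j + n + P]`), the inter-level version gives each candidate its own accumulator, while the intra-level version gives each current pixel its own PE that emits one distortion per candidate, and the distortions are then summed across PEs for each candidate.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]
Cand = Tuple[int, int]


def inter_level(cur: Block, win: Block, cands: List[Cand], P: int) -> Dict[Cand, int]:
    """One PE per candidate: each PE accumulates the whole SAD of its candidate."""
    N = len(cur)
    sads = {}
    for m, n in cands:  # in hardware, these PEs run in parallel
        acc = 0         # the PE's local SAD register
        for i in range(N):
            for j in range(N):
                acc += abs(cur[i][j] - win[i + m + P][j + n + P])
        sads[(m, n)] = acc
    return sads


def intra_level(cur: Block, win: Block, cands: List[Cand], P: int) -> Dict[Cand, int]:
    """One PE per current pixel: PE (i, j) keeps cur[i][j] and outputs its
    distortion for every candidate; the outputs are summed across PEs."""
    N = len(cur)
    pe_out = {
        (i, j): [abs(cur[i][j] - win[i + m + P][j + n + P]) for m, n in cands]
        for i in range(N)
        for j in range(N)
    }
    # Per candidate, add the distortions of all N * N PEs (e.g. by an adder tree).
    return {cand: sum(out[k] for out in pe_out.values()) for k, cand in enumerate(cands)}


cur = [[1, 2], [3, 4]]                                   # N = 2
win = [[3 * r + c for c in range(3)] for r in range(3)]  # P = 1, window 3 x 3
cands = [(m, n) for m in range(-1, 1) for n in range(-1, 1)]
print(inter_level(cur, win, cands, 1) == intra_level(cur, win, cands, 1))  # True
```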
Yang, Sun, and Wu’s Architectures
A 1-D inter-level hardware architecture (1DInterYSW).
The number of PEs equals the number of search candidates in the horizontal direction, 2P_h.
The key concept is data broadcasting. With broadcasting, the memory bandwidth, defined as the number of bits of reference data required in one cycle, is significantly reduced, although some global routing is required.
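The effect of broadcasting can be sketched for one block row. In the hypothetical model below, one reference pixel is read and broadcast to all PEs per cycle, and PE m (one per horizontal candidate) pairs it with the current pixel cur_row[t - m]; how that current pixel reaches the PE (assumed here: a chain of delay registers) is an assumption, not taken from the slide.

```python
from typing import List, Tuple


def broadcast_row_sads(cur_row: List[int], ref_row: List[int], num_pe: int) -> Tuple[List[int], int]:
    """Cycle-level sketch of one block row in a 1-D inter-level array.

    PE m (0 <= m < num_pe) accumulates the SAD of horizontal candidate m:
    sum_j |cur_row[j] - ref_row[j + m]|. In cycle t the single pixel
    ref_row[t] is broadcast to all PEs; PE m pairs it with cur_row[t - m],
    which is assumed to reach it through a chain of delay registers.
    Returns (partial SAD per PE, number of reference reads).
    """
    N = len(cur_row)
    acc = [0] * num_pe
    reads = 0
    for t, r in enumerate(ref_row):  # cycle t: one reference read
        reads += 1
        for m in range(num_pe):      # all PEs see the same broadcast pixel r
            j = t - m
            if 0 <= j < N:
                acc[m] += abs(cur_row[j] - r)
    return acc, reads


print(broadcast_row_sads([1, 2, 3], [1, 2, 3, 4, 5], 3))  # ([0, 3, 6], 5)
```

Here nine absolute differences (3 candidates × 3 pixels) are served by five reference reads, one per cycle. Without broadcasting, each PE would fetch its own reference pixel, so with 8-bit pixels (an assumption) the bandwidth would grow from 8 bits to up to 8 × 2P_h bits per cycle. For a 2-D block, the row results for a fixed vertical displacement are summed over the N block rows.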
                                                                                                                           Yeo and Hu’s Architectures




Lai and Chen’s Architecture
Reference pixels are propagated through propagation registers, and current pixels are broadcast to the PEs.
The partial SADs are still stored and accumulated in the PEs.
In addition, 2DInterLC must load reference pixels into the propagation registers before computing SADs. This loading latency can be reduced by partitioning the search range.
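A behavioral Python sketch of this data flow follows: one PE per candidate keeps its partial SAD locally, and each cycle one current pixel is broadcast to every PE. The reference pixel a PE would hold in its propagation register is modeled by direct indexing into the window, and register loading and its latency are not modeled. The `parts` parameter illustrates processing the search range in strips one after another; this particular partitioning scheme and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Block = List[List[int]]


def lc_array(cur: Block, win: Block, P: int, parts: int = 1) -> Tuple[int, Tuple[int, int]]:
    """Behavioral sketch: current pixels broadcast, partial SADs kept in PEs.

    One PE per candidate (m, n), -P <= m, n < P. The search range is split
    into `parts` strips of m values processed one after another (hypothetical
    partitioning scheme). win[i + m + P][j + n + P] stands for the reference
    pixel assumed to sit in the PE's propagation register when cur[i][j] is
    broadcast; register loading and its latency are not modeled.
    """
    N = len(cur)
    ms = list(range(-P, P))
    size = -(-len(ms) // parts)  # ceiling division
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for s in range(0, len(ms), size):
        # PE-local partial SAD registers for the candidates of this strip
        acc: Dict[Tuple[int, int], int] = {(m, n): 0 for m in ms[s:s + size] for n in range(-P, P)}
        for i in range(N):
            for j in range(N):
                pixel = cur[i][j]  # broadcast to every PE in this cycle
                for (m, n) in acc:
                    acc[(m, n)] += abs(pixel - win[i + m + P][j + n + P])
        for cand, sad in acc.items():
            if best is None or sad < best[0]:
                best = (sad, cand)
    assert best is not None
    return best


win = [[7 * r + c for c in range(7)] for r in range(7)]          # N = 4, P = 2
cur = [[win[i + 1][j + 3] for j in range(4)] for i in range(4)]  # block at m = -1, n = 1
print(lc_array(cur, win, 2), lc_array(cur, win, 2, parts=2))     # (0, (-1, 1)) (0, (-1, 1))
```

Partitioning changes only how many PEs work at once and how much must be loaded per strip; the motion vector found is the same.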
                                                                                                 Vos and Stegherr’s Architecture
                                                                                              Vos and Stegherr’s Architecture (Cont.)
Methods and Standards for Lossless Compression




• A 2-D intra-level architecture.
• The number of PEs equals the number of pixels in the block (N × N). Each PE corresponds to one current pixel, which is stored in that PE.
• The key concept of 2DIntraVS is the order in which search candidates are scanned: a snake scan.
• The computation flow is as follows.
  – First, the distortion is computed in each PE, and N partial row SADs are propagated and accumulated in the horizontal direction.
  – Second, an adder tree accumulates the N row SADs into the SAD. The accumulations of the row SADs and of the SAD are done in one cycle, so no partial SAD needs to be stored.
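The snake scan and the row-first SAD computation can be sketched as follows. This is a hypothetical behavioral model (`snake_scan` and `row_first_sad` are illustrative names). The single `sum` over the row SADs stands in for the adder tree, and the model does not capture cycle timing. Consecutive candidates in snake order differ by a one-pixel shift. This is presumably what lets most reference pixels already in the array be reused, and the final assertion checks that property.

```python
def snake_scan(rows, cols):
    """Yield candidate offsets (dy, dx) in snake order: left to right on
    even rows, right to left on odd rows."""
    for dy in range(rows):
        xs = range(cols) if dy % 2 == 0 else range(cols - 1, -1, -1)
        for dx in xs:
            yield (dy, dx)


def row_first_sad(cur, ref, dy, dx):
    """SAD of one candidate: N row SADs first (horizontal accumulation),
    then a single sum standing in for the adder tree."""
    n = len(cur)
    row_sads = [sum(abs(cur[y][x] - ref[dy + y][dx + x]) for x in range(n))
                for y in range(n)]
    return sum(row_sads)


order = list(snake_scan(2, 3))
print(order)  # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
# Consecutive candidates are always exactly one pixel apart.
assert all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
           for a, b in zip(order, order[1:]))
```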

Komarek and Pirsch’s Architecture

[Figure panels: Komarek and Pirsch’s Architecture; Hsieh and Lin’s Architecture]

• Komarek and Pirsch contributed a detailed systolic mapping procedure based on the dependence graph (DG).
• AB2 (2DIntraKP) is a 2-D intra-level architecture.
• Current pixels are stored in the corresponding PEs. Reference pixels are propagated PE by PE in the horizontal direction.
• The N partial column SADs are first propagated and accumulated in the vertical direction.
• After the vertical propagation, these N column SADs are propagated in the horizontal direction.
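Below is a minimal sketch of the column-first accumulation order. It assumes that the horizontal pass sums the N column SADs into the final SAD, although the slide only says they are propagated. `column_first_sad` is a hypothetical name, and the model ignores the systolic timing skew between PEs.

```python
def column_first_sad(cur, ref, dy, dx):
    """SAD of one candidate with column-first accumulation.

    Step 1 (vertical): row y of PEs adds its N distortions to the N partial
    column SADs arriving from the row above.
    Step 2 (horizontal): the N column SADs are passed PE to PE and summed
    (assumption: the horizontal pass accumulates them into the final SAD).
    Returns (sad, column_sads).
    """
    n = len(cur)
    col_sads = [0] * n
    for y in range(n):                  # vertical propagation, one PE row per step
        for x in range(n):
            col_sads[x] += abs(cur[y][x] - ref[dy + y][dx + x])
    sad = 0
    for x in range(n):                  # horizontal propagation of column SADs
        sad += col_sads[x]
    return sad, col_sads


ref = [[1, 2, 0], [3, 4, 0], [0, 0, 0]]
print(column_first_sad([[1, 2], [3, 4]], ref, 0, 1))  # (8, [2, 6])
```

Because addition is associative, the column-first order gives the same SAD as the row-first order of 2DIntraVS. The architectures differ in which partial sums travel through registers, not in the result.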

Hsieh and Lin’s Architecture

• 2DIntraHL consists of N PE arrays stacked in the vertical direction, and each PE array is composed of N PEs in a row.
• In 2DIntraHL, reference pixels are propagated through propagation registers one by one, which provides serial data input and increases data reuse.
• Current pixels are still stored in the PEs. The N partial column SADs are propagated in the vertical direction, from bottom to top.
• In each computing cycle, each PE array generates N distortions of a searching candidate and accumulates them with the N partial column SADs during the vertical propagation.
• After the vertical accumulation, the N column SADs are summed by the top adder tree in one cycle. A longer latency for loading reference pixels and large propagation registers are the price paid for reducing memory bitwidth and memory bandwidth.
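The top adder tree that sums the N column SADs in one cycle can be modeled as a balanced binary reduction. In this hypothetical helper, `levels` is the tree depth, which sets the combinational delay, and `adders` is the number of 2-input adders. For N = 16 the tree needs 15 adders in log2 16 = 4 levels.

```python
def adder_tree(values):
    """Sum values with a binary tree of 2-input adders.

    Returns (total, levels, adders): levels is the tree depth and adders the
    number of 2-input adders used."""
    level = list(values)
    if not level:
        raise ValueError("adder tree needs at least one input")
    levels = adders = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(nxt)
        if len(level) % 2:        # an odd input passes through to the next level
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels, adders


# Hypothetical column SADs from N = 16 PE columns.
print(adder_tree(range(16)))  # (120, 4, 15)
```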

Proposed Propagate Partial SAD

• The architecture is composed of N PE arrays with a 1-D adder tree in the vertical direction.
• Current pixels are stored in each PE, and two sets of N continuous reference pixels in a row are broadcast to the N PE arrays at the same time.




Data Flow of Propagate Partial SAD
[Figure]

Proposed SAD Tree
[Figure]

Scan Order and Memory Access
[Figure]

Variable Block Size Motion Estimation
[Figure]

The Impact of Variable Block Size Motion Estimation in Hardware Architectures

• There are many methods to support variable block size motion estimation (VBSME) in hardware architectures.
• For example, we can increase the number of PEs or the operating frequency to perform ME for each block size separately. Another method is to reuse the SADs of the smallest blocks, that is, the blocks obtained with the smallest partition size, to derive the SADs of larger blocks.
• With this method, the only overhead of supporting VBSME is a slight increase in gate count. The other factors, such as frequency, hardware utilization, and memory usage, are the same as those of fixed block size ME (FBSME).
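The SAD-reuse method can be sketched as below. The sketch assumes the seven H.264/AVC partition sizes (16×16 down to 4×4) and a 16×16 MB with 4×4 smallest blocks. Only the 16 smallest-block SADs are computed from pixels, and every larger SAD is the sum of two smaller ones. The function `vbs_sads` and its "WxH" keys (width × height) are illustrative choices, not part of the original design.

```python
def vbs_sads(sad4):
    """Derive all variable-block-size SADs from the 16 SADs of 4x4 blocks.

    sad4[r][c] is the SAD of the 4x4 block at block-row r, block-column c of a
    16x16 MB. Partition sizes follow H.264/AVC (assumption). Returns a dict
    mapping 'WxH' to a 2-D list of SADs (rows x cols of blocks).
    """
    def merge_h(grid):  # add horizontally adjacent pairs: width doubles
        return [[row[c] + row[c + 1] for c in range(0, len(row), 2)] for row in grid]

    def merge_v(grid):  # add vertically adjacent pairs: height doubles
        return [[grid[r][c] + grid[r + 1][c] for c in range(len(grid[0]))]
                for r in range(0, len(grid), 2)]

    out = {"4x4": sad4}
    out["8x4"] = merge_h(sad4)             # 4 rows x 2 cols of blocks
    out["4x8"] = merge_v(sad4)             # 2 rows x 4 cols
    out["8x8"] = merge_v(out["8x4"])       # 2 x 2
    out["16x8"] = merge_h(out["8x8"])      # 2 x 1
    out["8x16"] = merge_v(out["8x8"])      # 1 x 2
    out["16x16"] = merge_v(out["16x8"])    # 1 x 1
    return out


sads = vbs_sads([[1] * 4 for _ in range(4)])
print(sads["16x16"])                                        # [[16]]
print(sum(len(g) * len(g[0]) for g in sads.values()))      # 41
```

The 16 computed SADs expand to 41 SADs using 25 extra additions. This is consistent with the claim that the overhead is only a slight increase in gate count.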

Data Flow I–Storing in PEs (Inter-Level Architecture)

[Figure panels: FBSME, N = 16; VBSME, N = 16, n = 4]

• The number of bits for the data buffer in each PE increases from $\log_2 N^2 + 8$ to $n^2 \times (\log_2 (N/n)^2 + 8)$, where $N^2$ and $(N/n)^2$ are the numbers of pixels in one block (the MB and the smallest block, respectively), and 8 is the wordlength of one pixel.
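The buffer formula can be checked numerically. Each accumulator must hold the SAD of a block, which is at most (number of pixels) × 255. It therefore needs log2(pixels) + 8 bits. The helper name `pe_buffer_bits` is hypothetical, and rounding up for non-power-of-two sizes is our generalization.

```python
import math


def pe_buffer_bits(N, n=1, pixel_bits=8):
    """Partial-SAD buffer bits per PE for Data Flow I.

    n = 1 gives the FBSME case: one accumulator for the N x N block.
    n > 1 gives VBSME: n*n accumulators, one per (N/n) x (N/n) smallest block.
    Each accumulator needs log2(pixels) + pixel_bits bits, because a block of
    that many pixels has SAD <= pixels * (2**pixel_bits - 1).
    """
    pixels = (N // n) ** 2
    # ceil() only matters for non-power-of-two block sizes (our generalization).
    return n * n * (math.ceil(math.log2(pixels)) + pixel_bits)


print(pe_buffer_bits(16), pe_buffer_bits(16, 4))  # 16 192
```

For N = 16 and n = 4, the per-PE buffer grows from 16 bits to 192 bits, a factor of 12. This helps explain why inter-level architectures with Data Flow I grow so much when VBSME is added.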

Data Flow II–Propagating with Propagation Registers (Intra-Level Architecture)

• In intra-level architectures, partial SADs can be accumulated and propagated through propagation registers.
• Each PE computes the distortion of one corresponding current pixel in the current MB.
• Through propagation adders and registers, the partial SAD is accumulated with these distortions.
• When supporting VBSME, more propagation registers are required to store the partial SADs of the smallest blocks. In each propagating direction, the number of propagation registers is n times that of the original design, to hold the partial SADs of the n smallest blocks in the other direction.

The Proposed Propagate Partial SAD Architecture with Data Flow II
[Figure]

Data Flow III–No Partial SADs
[Figure: The proposed SAD Tree architecture with Data Flow III, where N = 16 and n = 4.]

• In intra-level architectures such as SAD Tree, it is possible that no partial SADs need to be stored.
• Each PE computes the distortion of one current pixel for a searching candidate, and the total SAD is accumulated by an adder tree in one cycle, as shown in Fig. 5(a).
• Because there is no partial SAD in this architecture, there is no register overhead for storing partial SADs when supporting VBSME.
• Instead, the adder tree is reorganized to support VBSME. That is, we partition the 2-D adder tree to obtain the SADs of the smallest blocks first, and then derive the SADs of larger blocks from them. Although there is no additional register overhead, the extra adders required to support VBSME do increase the area.
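To give a feel for the adder overhead, the count below assumes one particular reorganization. There is a separate tree per 4×4 block, followed by one adder for every larger-block SAD, with H.264/AVC partition sizes assumed. It counts only 2-input adders after the absolute-difference stage and ignores adder bit widths, so it is not an area estimate. `sad_tree_adders` and `H264_PARTITIONS` are hypothetical names.

```python
# Blocks per 16x16 MB for each partition size; H.264/AVC sizes are assumed.
H264_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4,
                   "8x4": 8, "4x8": 8, "4x4": 16}


def sad_tree_adders(mb=16, small=4, partitions=H264_PARTITIONS):
    """Count 2-input adders after the absolute-difference stage.

    Returns (fbsme, vbsme): one binary tree over all mb*mb distortions, versus
    a tree per smallest block followed by one adder per larger-block SAD.
    """
    fbsme = mb * mb - 1                        # a k-input binary tree uses k - 1 adders
    n_small = (mb // small) ** 2               # number of smallest blocks
    subtrees = n_small * (small * small - 1)   # one tree per smallest block
    # Each larger-block SAD is the sum of two smaller SADs: one adder each.
    merges = sum(partitions.values()) - n_small
    return fbsme, subtrees + merges


print(sad_tree_adders())  # (255, 265)
```

Under these assumptions the VBSME tree needs 10 more adders than the FBSME tree (265 versus 255, about 4%).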

[Table: THE PARALLELISM, CYCLES, LATENCY, AND DATA FLOW OF EIGHT HARDWARE ARCHITECTURES]

[Table: THE DATA BUFFER AND MEMORY BITWIDTH OF EIGHT HARDWARE ARCHITECTURES]

An Example

• The ME specifications are as follows. The MB size is 16×16, and the search range is Ph = 64 and Pv = 32.
• The frame size is D1, 720 × 480.
• When VBSME is supported, an MB can be partitioned into at most 16 4×4 blocks.
• We use Verilog-HDL and SYNOPSYS Design Compiler with the ARTISAN UMC 0.18 µm cell library to implement each hardware architecture.
• Because the critical path in some architectures is too long, their maximum operating frequency is limited unless the architecture is modified. Therefore, the frame rate is set to only 10 frames per second (fps).
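A rough estimate of the clock frequency these specifications imply is sketched below. It assumes full search over horizontal offsets [-Ph, Ph) and vertical offsets [-Pv, Pv), which gives 8,192 candidates per MB. It also assumes one candidate SAD per cycle and no latency or bubble cycles. These assumptions, and the helper `required_frequency_hz`, are ours. The actual required frequencies of the eight architectures come from the comparison tables and are not reproduced here.

```python
import math


def required_frequency_hz(width=720, height=480, mb=16, ph=64, pv=32,
                          fps=10, sads_per_cycle=1):
    """Cycles per second for full-search ME at 100% utilization (assumption)."""
    mbs_per_frame = (width // mb) * (height // mb)           # 45 * 30 = 1350 for D1
    candidates = (2 * ph) * (2 * pv)                         # assumes [-Ph, Ph) x [-Pv, Pv)
    cycles_per_mb = math.ceil(candidates / sads_per_cycle)  # no latency or bubbles
    return mbs_per_frame * fps * cycles_per_mb


print(required_frequency_hz())  # 110592000
```

With D1 frames (45 × 30 = 1,350 MBs) at 10 fps, this gives about 110.6 MHz. Any latency or bubble cycles raise the requirement further.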


Area and Required Frequency

• Among these eight hardware architectures, all inter-level architectures with Data Flow I increase the gate count dramatically: their chip area is at least five times that of the corresponding FBSME design.

Latency

• The latency is defined as the number of start-up cycles that a hardware module takes to generate the first SAD.
• If a module has a long latency that cannot be shortened by parallel architectures, the benefit of parallel computation is reduced. That is, a shorter latency is better for video coding systems.
• Two factors affect the latency:
  – Hardware architecture
  – Memory bandwidth
• Compared with these hardware architectures, the other intra-level architectures, such as the proposed Propagate Partial SAD and SAD Tree, have shorter latencies.

Utilization

• In general, inter-level architectures can compute continuously, MB by MB, so the initial cycles can be neglected and the utilization is 100%.
• Therefore, we define the utilization for one MB as computing cycles / operating cycles.
• The operating cycles include three parts: latency, computing cycles, and bubble cycles. Computing cycles are the cycles in which at least one SAD is produced. That is, if the utilization is 100%, at least one SAD is obtained in every cycle. The fewer the operating cycles, the more apparent the latency penalty becomes.
• The more bubble cycles there are, the lower the utilization.
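The utilization definition translates directly into a short function. The cycle counts in the example are hypothetical. They only illustrate the last point: the same latency costs much more when the number of computing cycles is small.

```python
def utilization(computing, latency=0, bubbles=0):
    """Per-MB utilization = computing cycles / operating cycles, where
    operating cycles = latency + computing cycles + bubble cycles."""
    operating = latency + computing + bubbles
    return computing / operating


# Hypothetical cycle counts: the same 256-cycle latency hurts a design with
# few computing cycles much more than one with many.
print(round(utilization(8192, latency=256), 3))  # 0.97
print(round(utilization(512, latency=256), 3))   # 0.667
```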
                                                                                                Video Coding Techniques and Hardware Architectures Design
                                                                                                                Memory Usage
Memory usage consists of two parts: memory bitwidth and memory bandwidth.
Memory bitwidth is defined as the number of bits that the hardware must access from memory in each cycle, and memory bandwidth is redefined here as the number of bits that the hardware must access from memory for one MB.
Without on-chip memory, memory bandwidth determines the load on the system bus; with on-chip memory, it determines the power consumption of that memory. Memory bitwidth is the key factor in arranging data across on-chip memories.
Both memory bitwidth and bandwidth are affected by the data reuse scheme and the number of operating cycles.
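To make the two definitions concrete, the sketch below compares two extreme reuse schemes for one 16×16 MB and one reference frame with a [-32,32) × [-16,16) search range: reloading each candidate's 16×16 reference block from memory, versus loading the whole search window once. The 8-bit pixels, the 2048 operating cycles, and the even spreading of accesses over those cycles are assumptions for illustration only; practical architectures use intermediate reuse schemes.

```python
def search_window_bits(h_cands: int, v_cands: int, block: int = 16, bpp: int = 8) -> int:
    """Bits per MB if the whole search window is loaded once (full reuse).

    h_cands, v_cands: number of candidate positions, e.g. 64 and 32 for a
    search range of [-32,32) x [-16,16).
    """
    return (h_cands + block - 1) * (v_cands + block - 1) * bpp


def no_reuse_bits(h_cands: int, v_cands: int, block: int = 16, bpp: int = 8) -> int:
    """Bits per MB if every candidate reloads its own block x block reference pixels."""
    return h_cands * v_cands * block * block * bpp


def average_bitwidth(bits_per_mb: int, operating_cycles: int) -> int:
    """Bits per cycle if the accesses are spread evenly over the cycles (rounded up)."""
    return -(-bits_per_mb // operating_cycles)


full_reuse = search_window_bits(64, 32)
no_reuse = no_reuse_bits(64, 32)
print(full_reuse, no_reuse)                  # 29704 4194304
print(average_bitwidth(full_reuse, 2048))    # 15
print(average_bitwidth(no_reuse, 2048))      # 2048
```

Here full reuse cuts both the bandwidth per MB and the average bitwidth by more than two orders of magnitude, which illustrates why the data reuse scheme dominates memory usage.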
                                                                                                                  Hexagonal Plot
The closer a point is to the center, the worse the performance on that axis.
Note that the weighting of each axis can differ greatly between video coding systems and hardware platforms.
These hexagonal plots can be used to select the best architecture under the constraints of a given system integration.
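One simple way to apply a system-specific weighting to the hexagonal plots is a weighted sum of the axis values. The sketch below is an assumption, not the method used in these slides: it treats each axis value as already normalized so that a larger value means farther from the center (better, e.g. a lower memory bandwidth maps to a larger value), and the architecture names, axis names, values, and weights are all hypothetical.

```python
from typing import Dict

Scores = Dict[str, float]


def weighted_score(scores: Scores, weights: Scores) -> float:
    """Weighted sum of normalized axis values (larger = farther from center = better)."""
    return sum(w * scores[axis] for axis, w in weights.items())


def select_architecture(candidates: Dict[str, Scores], weights: Scores) -> str:
    """Return the name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


# Hypothetical normalized values on two of the six axes only.
archs = {
    "A": {"utilization": 0.9, "bandwidth": 0.4},
    "B": {"utilization": 0.6, "bandwidth": 0.9},
}
print(select_architecture(archs, {"utilization": 1.0, "bandwidth": 3.0}))  # B
print(select_architecture(archs, {"utilization": 3.0, "bandwidth": 1.0}))  # A
```

Changing only the weights flips the choice, mirroring the point that the best architecture depends on the constraints of the target system.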
Hexagonal Plots
[Figures: hexagonal plots over four slides; images not reproduced]
                                                                                               Hardware Architecture of H.264 Integer
                                                                                                        Motion Estimation
Based on the above analysis, we propose an ME hardware architecture for H.264/AVC integer-pixel motion estimation (IME) as an example.
Our specification supports two frame sizes:
 – D1 format with four reference frames at 30 fps. In the previous frame, the search range is [-64,64) horizontally and [-32,32) vertically. In the remaining reference frames, the search range is [-32,32) horizontally and [-16,16) vertically.
 – 720p with one reference frame at 30 fps. The search range is the same as that of the previous frame in D1 format.
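The specification translates into a required search throughput. The sketch below counts full-search candidate positions per second, assuming D1 is 720×480, 720p is 1280×720, every integer position in the search range is evaluated, and one candidate is one 16×16 position from which all variable block-size SADs are derived.

```python
def mbs_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks in a frame."""
    return (width // 16) * (height // 16)


def candidates(h_range: int, v_range: int) -> int:
    """Candidate positions for a search range [-h_range, h_range) x [-v_range, v_range)."""
    return (2 * h_range) * (2 * v_range)


# D1: previous frame uses [-64,64) x [-32,32); the other three use [-32,32) x [-16,16).
d1_per_mb = candidates(64, 32) + 3 * candidates(32, 16)
# 720p: one reference frame with [-64,64) x [-32,32).
hd_per_mb = candidates(64, 32)

FPS = 30
d1_rate = d1_per_mb * mbs_per_frame(720, 480) * FPS
hd_rate = hd_per_mb * mbs_per_frame(1280, 720) * FPS
print(d1_per_mb, d1_rate)   # 14336 580608000
print(hd_per_mb, hd_rate)   # 8192 884736000
```

Under these assumptions, 720p with a single reference frame needs about 885 million candidates per second, more than the roughly 581 million needed for D1 with four reference frames.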
                                                                                               Hardware Architecture of H.264 Integer
                                                                                                    Motion Estimation (Cont.)
For our specification in D1 format, the computational complexity of H.264 encoding is 2.4 tera instructions per second with 3.8 terabytes per second of data access, dominated by IME; both figures are estimated by instruction profiling of the reference software, JM7.3.
The enormous computational complexity can be handled by parallel computation, but the huge external memory bandwidth cannot. Memory bandwidth is therefore a difficult challenge for hardware design.
Two further problems remain:
 – First, with VBSME and Lagrangian mode decision, the data dependency introduced by the motion vector predictor prevents the smaller blocks within an MB from being computed in parallel.
 – Second, when high processing capability is required, the hardware cost of ME architectures with high degrees of parallelism must also be considered.
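The data dependency in the first problem comes from the rate term of the motion cost. The sketch below uses the common form J = SAD + λ·R(MV − MVP) and approximates R by the length of the signed Exp-Golomb code se(v) for each MV difference component; integer-pel MVs (JM works in quarter-pel units) and λ = 4 are simplifying, hypothetical choices.

```python
from typing import Tuple

MV = Tuple[int, int]


def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb code se(v)."""
    k = 2 * v - 1 if v > 0 else -2 * v          # map signed value to codeNum
    return 2 * ((k + 1).bit_length() - 1) + 1   # ue(k) length = 2*floor(log2(k+1)) + 1


def motion_cost(sad: int, mv: MV, mvp: MV, lam: int) -> int:
    """Rate-constrained cost J = SAD + lambda * R(mv - mvp)."""
    rate = se_bits(mv[0] - mvp[0]) + se_bits(mv[1] - mvp[1])
    return sad + lam * rate


# Same candidate (SAD = 100, MV = (1, 0)) under two different predictors:
print(motion_cost(100, (1, 0), (0, 0), lam=4))  # 116
print(motion_cost(100, (1, 0), (1, 0), lam=4))  # 108
```

The same SAD and MV give different costs under different predictors, so a block's best MV cannot be fixed until the MVs of its neighbors, and hence its MVP, are known.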
                                                                                                              Modified Algorithm
First, we divide the computation of ME into two parts, integer-pixel ME (IME) and fractional-pixel ME (FME), and propose an individual hardware accelerator for each. This significantly improves the utilization of the hardware accelerators.
Second, in the original Lagrangian mode decision, the MV predictor of a block is the median of the MVs of its left, top, and top-right neighboring 4×4 blocks. In a parallel hardware architecture, however, the coding modes of the neighboring 4×4 blocks cannot be decided in parallel, especially when the block size is 4×4.
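The median predictor described above can be sketched as follows. This is a simplified model: it ignores the H.264 availability rules (including substituting the top-left neighbor when the top-right one is unavailable) and the directional predictors used for 16×8 and 8×16 partitions.

```python
from typing import Tuple

MV = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def median_mvp(left: MV, top: MV, top_right: MV) -> MV:
    """Component-wise median of the left, top, and top-right neighbor MVs."""
    return (median3(left[0], top[0], top_right[0]),
            median3(left[1], top[1], top_right[1]))


print(median_mvp((1, 2), (3, -1), (2, 5)))  # (2, 2)
```

For a 4×4 block inside the MB, the left and top neighbors are often other 4×4 blocks of the same MB, whose MVs are only known after their own mode decision; this serial chain is what prevents evaluating all blocks in parallel.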
                                                                                             The motion vector predictor for (a) the 4×8 block,
                                                                                             (b) the 16×16 block, and (c) the modified motion
                                                                                                       vector predictor for all blocks.
                                                                                             Hardware Architecture with M-parallelism
Our specification requires eight sets of Propagate Partial SAD or SAD Tree to achieve real-time computation.
Eight sets of Propagate Partial SAD, or of SAD Tree, process eight successive candidates in a row simultaneously; combined, they form the Eight-Parallel Propagate Partial SAD and the Eight-Parallel SAD Tree, respectively.
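A rough estimate shows why eight-way parallelism is needed. The sketch assumes that each set of Propagate Partial SAD or SAD Tree completes one 16×16 candidate per cycle, uses the 720p case (the higher candidate rate in this specification, counted with 1280×720 frames and full search), and derives the clock rate needed for M parallel sets; these are back-of-the-envelope numbers, not figures reported for this design.

```python
def required_clock_hz(candidates_per_s: int, parallelism: int,
                      utilization: float = 1.0) -> float:
    """Clock rate needed if each parallel set finishes one candidate per cycle."""
    return candidates_per_s / (parallelism * utilization)


# 720p: MBs per frame * frames per second * candidates per MB ([-64,64) x [-32,32)).
HD_RATE = (1280 // 16) * (720 // 16) * 30 * (128 * 64)   # 884,736,000 candidates/s

for m in (1, 4, 8):
    print(m, required_clock_hz(HD_RATE, m) / 1e6, "MHz")
```

With M = 8 and full utilization, about 110.6 MHz suffices, while a single set would need about 885 MHz; any loss of utilization raises the required clock proportionally.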




                                                                                             Hardware Architecture of H.264 Integer
                                                                                                      Motion Estimation.
                                                                                             Comparison of RD Curves Between JM7.3
                                                                                                  and Our Proposed Encoder
                                                                                                Memory Reduction of H.264 IME