EE 5359

					Swaminathan Sridhar      EE 5359 Project          AVS to VP6

                      EE 5359 SPRING
                INSTRUCTOR: Dr. K. R. RAO

           AVS China to VP6 transcoder

                                      SWAMINATHAN SRIDHAR
                                                MS EE, UTA



I would sincerely like to thank Dr. Rao for his constant support and guidance
throughout the duration of my project.
I would also like to thank Dr. Yong Li and Mr. Cui Bin for assisting me in my


                                 List of Acronyms

AVS                   Audio Video coding Standard
B-Frame               Interpolated frame
CAVLC                 Context Adaptive Variable Length Coding
CIF                   Common Intermediate format
DIP                   Direct Intra Prediction
EOB                   End of Block
HD                    High Definition
ICT                   Integer Cosine Transform
I-Frame               Intra frame
MB                    Macro Block
ME                    Motion Estimation
MPEG                  Moving Picture Experts Group
MV                    Motion Vector
P-Frame               Predicted Frame
PIT                   Prescaled Integer Transform
QCIF                  Quarter Common Intermediate Format
SD                    Standard Definition
VLC                   Variable Length Coding

Project Proposal
Title: AVS China to VP6 transcoder


TrueMotion VP6, developed by On2 Technologies, is one of the best video codecs
available on the market today. It offers better image quality and faster decoding
than the Windows Media 9 video, Real 9 video, H.264 and QuickTime
MPEG-4 video codecs. AVS China is a new, streamlined and highly efficient video
coder developed in China that employs the latest coding tools and is dedicated to
coding HDTV content. AVS applications include broadcast TV, HD-DVD and
broadband video networking. It is therefore increasingly important to
transmit AVS coded signals over the Internet, and one possible way
of achieving this is to develop an AVS to VP6 transcoder. This proposal is
submitted with the aim of developing a transcoder with reduced computational
complexity, using the available reference material to study the various
transcoding algorithms and implement them effectively.

Student: Swaminathan Sridhar
Student ID: 1000612948
Date: February 19, 2009


Current Research
1.a An Overview of AVS Coding Standard
At present there are four main audio and video coding standards, namely MPEG-2,
MPEG-4 Part 2 Visual, MPEG-4 Part 10 (AVC) and AVS China. In terms of coding
efficiency, MPEG-4 is nearly 1.4 times as efficient as MPEG-2, while AVC and AVS
are more than twice as efficient as MPEG-2 [5]. AVC is only a video coding
standard, whereas AVS China is a complete set of standards covering systems,
audio, video and media copyright management, and thus forms a second generation
source coding standard.
On 30th April, 2005 the video part of the AVS standard was approved as China's
national standard [2]. The main characteristics of AVS China are that it is
a technically advanced second generation source coding standard and that it is
formulated and controlled entirely by China. At present AVS China is being used in IP
television, where TV programs are transmitted over IP, and is also
being tested for Chinese mobile multimedia broadcasting [2]. AVS China primarily
aims at providing high definition and high quality video services. Since the basic
syntax structure of AVS China is very similar to that of the MPEG-2 standard, it can
easily be used in the widely deployed MPEG-2 systems, while offering
a higher coding efficiency [5]. This means that AVS China is compatible with
existing MPEG-2 systems and has an architecture model very similar to that of
the H.264 codec [5]. AVS China has a coding efficiency similar to that of H.264
but a lower computational complexity. The AVS China standard is divided
into several parts, each covering one sub-field of the architecture.
The different parts of AVS China are as follows [1]:

AVS parts                                 Contents
Part 1                                    System for broadcasting
Part 2                                    SD/HD video
Part 3                                    Audio
Part 4                                    Conformance test
Part 5                                    Reference software
Part 6                                    Digital rights management
Part 7                                    Mobility video
Part 8                                    System over IP
Part 9                                    File format


1.b Data formats [3]
1.b.i Progressive scan format
 AVS codes data in progressive scan format. This format is compatible with all
content that originates in film and can accept inputs directly from progressive
telecine machines [3]. It is also compatible with the emerging new standard known
as “24p” that would be the future digital film standard. AVS codes progressive
content at higher frame rates, which suits televised sports. One of the
benefits of the progressive scan format is the efficiency with which motion
compensation operates on it. Progressive scan
content can be coded at a significantly lower bit rate than interlaced content
with the same image quality, and furthermore motion compensation
for the progressive scan format is less complex than for the interlaced
format [3]. This is one of the major advantages of the AVS coding technique.
1.b.ii Interlaced scan format
AVS also supports coding tools for the interlaced scan format.

                         Figure 1.a Different scan formats

1.c Picture format [3]
AVS applications are primarily focused on broadcast TV, with an emphasis on the
1080p HDTV format. Since AVS is a generic standard, it can code
rectangular pictures of up to 16K x 16K pixels in size [3]. Pixels are
coded in a standard YUV format such as YUV 4:2:0. AVS supports the 4:2:0 and
4:2:2 chroma formats.

Figure 1.b A standard YUV color plane with Y = 0.5, represented within the RGB
                              color format


1.c Data structure [3]

                      Figure 1.c AVS layered data structure [3]
As shown in Fig. 1.c, AVS implements a layered data structure consisting of the
Sequence at the highest layer, followed by Picture/Frame, Slice, Macro block and
Block. The sequence, picture and slice begin with unique start codes that allow
the decoder to find them within a bit stream, as shown in Fig. 1.d.
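
The way a decoder can locate start codes may be illustrated with a minimal Python sketch. It assumes, as in MPEG-2 style bit streams, that every start code is the three-byte prefix 0x000001 followed by one byte identifying the unit; this prefix, the function name find_start_codes and the sample bytes are assumptions made for illustration and are not taken from [3]. The mapping of each value to a sequence, picture or slice is defined by the standard and is not reproduced here.

```python
def find_start_codes(stream: bytes):
    """Return (offset, code_value) for every 0x000001 prefix found in the stream.

    Assumption: a start code is the prefix 00 00 01 followed by one value byte.
    """
    prefix = b"\x00\x00\x01"
    found = []
    pos = stream.find(prefix)
    # Stop if a prefix sits at the very end with no value byte after it.
    while pos != -1 and pos + 3 < len(stream):
        found.append((pos, stream[pos + 3]))
        pos = stream.find(prefix, pos + 3)
    return found


# Hypothetical byte string: three start codes separated by payload bytes.
demo = b"\x00\x00\x01\xb0\x12\x34\x00\x00\x01\xb3\x56\x00\x00\x01\x00\x78"
for offset, value in find_start_codes(demo):
    print(f"start code at byte {offset}: value 0x{value:02X}")
```

A real parser would then dispatch on the value byte to the sequence, picture or slice header parser.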

                       Figure 1.d video sequence example [3]


1.c.i Sequence
The sequence layer provides an entry point into the coded video. Sequence headers
should be placed in the bit stream to support the appropriate
transmission of video. Repeated sequence headers may be inserted to provide random
access, and a sequence is terminated with a sequence end code [3].
1.c.ii Picture

Three types of pictures are defined by AVS namely
           Intra pictures (I)
           Predicted pictures (P)- At most two reference frames (P or I)
           Interpolated pictures (B)- two reference frames (I or P or both)

                            Figure 1.e I,P,B frame format
1.c.iii Slice
The slice structure provides the lowest-layer mechanism for re-synchronizing the
bit stream in case of transmission error. Slices comprise an arbitrary number of
raster-ordered rows of macro blocks. Slices must be contiguous, must begin and
terminate at the left and right edges of the picture and must not overlap. It is
possible for a single slice to cover the entire picture. The slice structure is optional.
Slices are independently coded – no slice can refer to another slice during the
decoding process.


                            Figure 1.f Slice layer [3]
1.c.iv Macro Blocks
A macro block contains the luminance and chrominance samples that represent a 16x16
region of the picture. In 4:2:0 format the chrominance is sub-sampled by a factor
of 2 in both directions, so in this format each chrominance component contains one 8x8 block.
In 4:2:2 format the chrominance is sub-sampled by a factor of 2 in the
horizontal direction only, so each chrominance component contains two 8x8 blocks.
This block structure is shown in Fig. 1.g.
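
The block counts that follow from the two chroma formats can be checked with a short Python sketch. The function name blocks_per_macroblock and the returned dictionary layout are illustrative assumptions; only the 16x16 macro block size, the 8x8 block size and the sub-sampling factors come from the text above.

```python
def blocks_per_macroblock(chroma_format: str) -> dict:
    """Count the 8x8 blocks in one 16x16 macro block for a given chroma format."""
    luma_blocks = (16 // 8) * (16 // 8)  # four 8x8 luminance blocks
    if chroma_format == "4:2:0":
        chroma_w, chroma_h = 8, 8    # halved horizontally and vertically
    elif chroma_format == "4:2:2":
        chroma_w, chroma_h = 8, 16   # halved horizontally only
    else:
        raise ValueError(f"unsupported chroma format: {chroma_format}")
    per_component = (chroma_w // 8) * (chroma_h // 8)
    return {
        "luma": luma_blocks,
        "cb": per_component,
        "cr": per_component,
        "total": luma_blocks + 2 * per_component,
    }


for fmt in ("4:2:0", "4:2:2"):
    print(fmt, blocks_per_macroblock(fmt))
```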


                        Figure 1.g Macro Block format [3]
1.c.v Block
A block is the smallest unit of the data structure. It contains the transform coefficient
data for prediction errors. For instance, for an intra-predicted block, intra
prediction is performed from the neighboring blocks [3].
2.a The family of AVS-video
So far four profiles have been defined, namely the Jizhun (base) profile, the Jiben
(basic) profile, the Shenzhan (extended) profile and the Jiaqiang (enhanced) profile.
2.a.i AVS-video Jizhun profile (base profile)
The Jizhun profile is defined as the first profile in the
national standard AVS-Part2 [10], approved as a national standard in 2006. It
mainly targets digital video applications such as commercial broadcasting and
storage media, including high-definition applications. It favors
high coding efficiency on video sequences of higher resolutions, at the expense of
moderate computational complexity.


2.a.ii AVS-video Jiben profile (basic profile)
The Jiben profile is defined in AVS-Part7 [2] and targets mobile video applications
with smaller picture resolutions, where computational complexity becomes
a critical issue. In addition, error resilience is needed because of the
wireless transport environment. AVS-Part7 reached the final committee draft stage at
the end of 2004.
2.a.iii AVS-Shenzhan profile(extended profile)
The AVS-Shenzhan standard focuses exclusively on standardized solutions for
video surveillance applications. Surveillance sequences have special features:
random noise appears in the pictures, only relatively
low encoding complexity is affordable, and support for event
detection and searching is required. Techniques that handle these
features properly are therefore encouraged, provided that they remain
compatible with AVS-Part2.
2.a.iv AVS-Jiaqiang profile (enhanced profile)
 To fulfill the needs of multimedia entertainment, one of the major concerns of
Jiaqiang profile is movie compression for high-density storage. Relatively higher
computational complexity can be tolerated at the encoder side to provide higher
video quality, with compatibility to AVS-Part2 as well.

        Figure 2.a Different profiles of AVS and their applications [11]


Figure 2.b Summary of AVS profiles and their performance [11]
3.a An overview on AVS encoder
With the appearance of new video coding standards such as AVS and H.264,
improvements in encoder performance have come with increasing
design complexity. Like other video coding standards (H.264/MPEG-4 part
10), AVS has adopted hybrid video coding technology based on block
motion estimation/compensation [9]. Input pictures can be coded in intra
(I), inter (P) or bidirectional (B) modes. To improve the coding efficiency, motion
estimation and motion compensation have been implemented in AVS.
The encoder supports up to 2 reference frames, several
block sizes (16x16, 16x8, 8x16, 8x8), quarter-pixel motion vector resolution with
finite impulse response (FIR) interpolation filtering, a purely integer spatial
transform and quantization, rate-distortion optimization and highly efficient
variable length coding (VLC) [9].


                          Figure 3.a AVS encoder [3]
4.a. Key techniques on coding efficiency in AVS-video
4.a.i Intra-frame prediction
 Intra-frame prediction (intra-prediction) uses decoded information in the current
frame as the reference of prediction, exploiting statistical spatial dependencies
between pixels within a picture. If MBPAFF is applied, intra-frame prediction can
only take the macro blocks within the same stage as reference.
4.a.ii 8X8 intra-prediction
The 8X8 intra-prediction technique in AVS-video allows five prediction modes
(DC, horizontal, vertical, down-left and down-right) for the luminance component
and four prediction modes (DC, horizontal, vertical and plane) for the chrominance
components. Each of the four 8X8 luminance blocks can be predicted using one of
the five intra-prediction modes. Ahead of prediction in DC mode (Mode 2),
diagonal down-left mode (Mode 3) and diagonal down-right mode (Mode 4), a
three-tap low-pass filter (1, 2, 1) is applied to the samples that will be used as
references of prediction. Note that in DC mode each pixel of the
current block is predicted by an average of the vertically and horizontally
corresponding reference pixels. Hence, the prediction values of different pixels in a
block might be different [12]. This results in a fine prediction for a large block.
The most probable mode is predicted from the intra-prediction modes of the
neighboring blocks, which reduces the average number of bits needed to describe the
intra-prediction mode in the video bit stream.
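
The reference filtering and the per-pixel DC prediction described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the normative procedure: the rounding offsets, the repetition of edge samples in the filter, and the function names lowpass_121 and dc_predict_8x8 are all assumptions, and the real standard also draws on reference samples beyond the eight used here.

```python
def lowpass_121(samples):
    """Apply the (1, 2, 1) low-pass filter to a list of reference samples.

    Assumptions: edge samples are repeated, and the result is normalized
    by 4 with rounding.
    """
    n = len(samples)
    out = []
    for i in range(n):
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, n - 1)]
        out.append((left + 2 * samples[i] + right + 2) >> 2)
    return out


def dc_predict_8x8(top, left):
    """Predict each pixel as the average of its filtered top and left references."""
    top_f = lowpass_121(top)
    left_f = lowpass_121(left)
    return [[(top_f[x] + left_f[y] + 1) >> 1 for x in range(8)] for y in range(8)]


# Hypothetical reference samples: a horizontal ramp above, a flat column to the left.
top_refs = [0, 8, 16, 24, 32, 40, 48, 56]
left_refs = [0] * 8
prediction = dc_predict_8x8(top_refs, left_refs)
print(prediction[0])  # values vary across the row, unlike a single-value DC mode
```

Because each pixel uses its own pair of reference samples, the predicted block follows gradients in the neighborhood instead of being flat.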
4.a.iii 4X4 intra-prediction
In lower resolution applications, a smaller block size leads to better coding
efficiency, so AVS-video also defines 4X4 intra-prediction. Some specific
techniques work together with 4X4 intra-prediction, such as direct
intra-prediction (DIP), padding before prediction (PBP) and simplified chrominance
intra-prediction (SCI). Prediction of the most probable mode from neighboring blocks
is also used. Fig. 5.b illustrates all the available directional modes for 4X4
intra-prediction for both the luminance and chrominance components. One flag
at macro block level indicates the use of DIP [13, 14].

                  Figure 4.a Non sampling (NS) block pair [11]


                  Figure 4.b Vertical sampling block pair [11]
If a macro block is marked as DIP mode, each of the 16 luminance
4X4 sub-blocks in this macro block takes the most probable mode as its
intra-prediction mode, even though the intra-prediction mode for each 4X4 sub-block
might be different, and no further mode information is transmitted in the bit stream. PBP
is applied to both the luminance and chrominance components: the
reference pixels r5, r6, r7 and r8 are padded from r4, and c5, c6, c7 and c8 are
padded from c4, so that the conditional test for the availability of the up-right and
down-left reference pixels can be skipped. SCI means that only the DC, vertical and
horizontal modes are available for the chrominance components [15].
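
Padding before prediction amounts to repeating the last available reference sample, which a few lines of Python can show. The function name pad_before_prediction and the list representation of r1..r4 (or c1..c4) are assumptions made for illustration.

```python
def pad_before_prediction(available):
    """Extend four reference samples (r1..r4 or c1..c4) to eight.

    Samples 5..8 are copies of sample 4, so no availability test is needed
    for the up-right or down-left neighbors.
    """
    if len(available) != 4:
        raise ValueError("expected exactly four reference samples")
    return list(available) + [available[3]] * 4


# Hypothetical top reference samples r1..r4.
print(pad_before_prediction([10, 20, 30, 40]))
```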
4.a.iv Adaptive block size of intra prediction
Adaptive intra prediction enables 4X4 intra prediction to be applied along with
8X8 intra prediction, using an indicator in the macro block header [16]. In addition, a
mapping is needed between the modes used in 4X4 intra prediction and those in 8X8
intra prediction before the most probable mode is predicted, if the current block
and its neighboring blocks use different intra-prediction block sizes.
4.a.v Inter-frame prediction
To remove temporal redundancy in video sequence, inter-frame prediction (inter-
prediction) predicts from previously decoded frames/fields. A number of
techniques jointly contribute to coding efficiency of inter-prediction in AVS-video.


4.b P-prediction and bi-prediction
P-prediction restricts the prediction references to decoded pictures that
precede the current picture in display order. It uses one motion vector
and one reference index to locate the reference block. The motion-compensated
prediction includes five macro block modes, with partitioning down to 8X8 blocks.
Bi-prediction stands for inter-frame prediction from both forward and backward
decoded reference pictures in display order. It uses two motion vectors and
reference indices to locate the reference blocks. Direct prediction and symmetric
prediction are the two techniques unique to bi-prediction in AVS-video. In direct
prediction [17], both the forward and backward motion vectors of the current block are
derived from the motion vector of its collocated block in the backward reference,
according to the temporal distance between the predicted and reference blocks.
In symmetric prediction [18], a forward motion vector is transmitted for
each partition of the current macro block, while the backward motion vector is derived
from the forward motion vector by a symmetric rule.

4.c Interpolation
Since the motion-compensated prediction in AVS-video allows motion vector
accuracy down to one-quarter pixel, corresponding reference pixel values of
fractional motion vectors are obtained by sub-pixel interpolation. Default sub-pixel
interpolation in AVS-video is called two-step four-tap (TSFT) interpolation
[19], and three kinds of filters are applied, respectively, to sub-pixels of different

Fig. 5.a Modes for luminance component of 8X8 intra-prediction (a)
directions and neighbor pixels used as reference in 8X8 intra-prediction for
luminance component (b) mode 0, (c) mode1, (d) mode 3, (e) mode 2, (f) mode
4 [11]


Figure 5.b Simplified 4X4 intra-prediction (left: directional modes for
luminance component; right: directional modes for chrominance components)

Figure 5.c Mapping between 4X4 intra-prediction modes and 8X8 intra-
prediction modes in AVS-video [11]


       Figure 5.d Macro block modes of P-prediction in AVS-video [11]

                  Figure 5.e Example for symmetric mode [11]


Figure 5.f Positions of integer pixels (indicated by upper case letters), half-
pixels and quarter-pixels (indicated by lower case letters) [11]


Figure 5.g Locations of current block and its neighboring blocks in motion
vector prediction [11]
The two-step four-tap interpolation applies a filter of (-1, 5, 5, -1) to get the
half-pel reference pixel values as the first step, and a filter of (1, 7, 7, 1) is
applied for quarter-pel reference pixel values either
horizontally or vertically as the second step. The exception in the second step is
that a diagonal bilinear filter is used for the quarter-pel reference pixel values
e, g, p and r. Additionally, an adaptive interpolation filter [20] is available, which may
involve higher computational complexity. For compatibility, the default
interpolation filters are first applied to get the initially filtered half-pixels.
Afterwards, all half-pixels and quarter-pixels are generated by adaptive filters that
are transmitted to the decoder slice by slice.
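
The first interpolation step can be illustrated with a one-dimensional Python sketch of the (-1, 5, 5, -1) half-pel filter. The normalization by the tap sum (8) with rounding, the clipping to 8-bit range, the repetition of edge pixels and all function names are assumptions for illustration; the standard keeps higher intermediate precision across the two steps, and the quarter-pel step is not shown.

```python
def clip_pixel(value, bit_depth=8):
    """Clamp a value to the valid pixel range."""
    return max(0, min(value, (1 << bit_depth) - 1))


def half_pel(p0, p1, p2, p3):
    """Half-pel sample between p1 and p2 using taps (-1, 5, 5, -1)."""
    return clip_pixel((-p0 + 5 * p1 + 5 * p2 - p3 + 4) >> 3)


def interpolate_row_half_pel(row):
    """Half-pel samples between each pair of neighbors; edge pixels are repeated."""
    n = len(row)
    out = []
    for i in range(n - 1):
        p0 = row[max(i - 1, 0)]
        p3 = row[min(i + 2, n - 1)]
        out.append(half_pel(p0, row[i], row[i + 1], p3))
    return out


# Hypothetical row of integer pixels forming a ramp.
print(interpolate_row_half_pel([0, 10, 20, 30]))
```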
5.a Transform and quantization algorithms
5.a.i Transform algorithm
In AVS, the forward transform is defined as [21]
where $HXH^{T}$ is the “core” 2-D transform and H is the forward transform matrix. The
coefficients of the 1-D forward transform H are shown in Fig. 6.a. The core 2-D
transform can be realized with the 1-D forward transform implementation. The transform
matrix contains only integer coefficients, so it can be implemented using
only addition and shift operations.
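
The core transform H X H^T can be sketched in pure Python. Since Fig. 6.a is not reproduced in this text, the matrix values below are the commonly cited AVS 8x8 integer transform coefficients and should be treated as an assumption to be checked against [21]; the helper names are also illustrative. The scaling that normally follows the core transform is omitted.

```python
# Assumed 8x8 forward transform matrix (check against Fig. 6.a / [21]).
H = [
    [8,   8,   8,   8,   8,   8,   8,   8],
    [10,  9,   6,   2,  -2,  -6,  -9, -10],
    [10,  4,  -4, -10, -10,  -4,   4,  10],
    [9,  -2, -10,  -6,   6,  10,   2,  -9],
    [8,  -8,  -8,   8,   8,  -8,  -8,   8],
    [6, -10,   2,   9,  -9,  -2,  10,  -6],
    [4, -10,  10,  -4,  -4,  10, -10,   4],
    [2,  -6,   9, -10,  10,  -9,   6,  -2],
]


def matmul(a, b):
    """Multiply two integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_2d(x):
    """Core 2-D transform: 1-D transform on columns (H X), then on rows (... H^T)."""
    return matmul(matmul(H, x), transpose(H))


# A flat block concentrates all its energy in the DC coefficient.
flat_block = [[1] * 8 for _ in range(8)]
coeffs = forward_core_2d(flat_block)
print(coeffs[0][0], sum(abs(c) for row in coeffs for c in row))
```

Applying the same 1-D routine twice is what allows a single 1-D hardware unit to serve the whole 2-D transform.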


Figure 6.a Forward transform matrix [21]
5.b Inverse transform algorithm
The inverse transform includes four steps [21]:
A. $W' = C \times H_i^{T}$
where $H_i$ is the inverse transform matrix, $H_i^{T}$ is its transpose, and $W'$ is the
result of the 1-D inverse transform. The coefficients of the 1-D inverse transform $H_i$ are
shown in Fig. 6.b.

Figure 6.b Hi matrix [21]

Where W’’’ is a result after 2-D inverse transform.

5.c Quantization algorithm
In AVS, the quantization step size is controlled by the value of a
quantization parameter (QP). QP varies from 0 to 63 in increments of one, and
every increase of eight in QP doubles the quantization
step size. The scale matrix M is incorporated in the quantization.
The quantizer mechanisms are complicated by the requirement to
avoid division and floating point arithmetic. The forward quantization is
defined as [21]

where F holds the unscaled coefficients after the forward transform; (u,v) are the
row and column indices; M is the scaling factor of the forward transform;
L = 15 + floor(QP/8); and QC is the quantization factor.
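
The two QP relations stated above, the doubling of the step size every eight QP values and the shift L = 15 + floor(QP/8), can be tabulated with a short Python sketch. Only relative step sizes are computed, because the absolute step size at QP = 0 is not given in this text; the function names are illustrative.

```python
def relative_step_size(qp: int) -> float:
    """Step size relative to QP = 0, doubling for every increase of eight in QP."""
    if not 0 <= qp <= 63:
        raise ValueError("QP must be in the range 0..63")
    return 2.0 ** (qp / 8)


def quant_shift(qp: int) -> int:
    """Shift amount L = 15 + floor(QP / 8) used in the forward quantization."""
    if not 0 <= qp <= 63:
        raise ValueError("QP must be in the range 0..63")
    return 15 + qp // 8


for qp in (0, 8, 16, 63):
    print(qp, relative_step_size(qp), quant_shift(qp))
```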
5.c Inverse quantization algorithm
The inverse quantization is defined as [21]:

where Fiq holds the transform coefficients after inverse quantization; (i,j) are
the row and column indices; QM holds the coefficients after quantization; D is the
inverse quantization factor table; and S is the shift amount [21].
5.d Transform implementation
There are numerous ways to implement matrix multiplications in hardware. The
overall aim is to reduce computation time. AVS supports 8-bit pixel data,
but this is likely to be extended in the future, which is why this
implementation uses 16-bit pixel data [22]. The transform matrix contains only
integer coefficients, so the matrix multiplications can be implemented purely
with addition/subtraction and shift operations. The 2-D transform can be implemented
with a single 1-D transform architecture by performing the 1-D transform
operation twice. The transform architecture is shown in
Figure 7.a. The “multiplier1” shown in Figure 7.a calculates each of the 64
results separately, and each result takes four cycles to calculate. A very important
advantage of using “multiplier1” is its ability to reduce the computational complexity.


                      Figure 7.a Transform Architecture [22]
The 1-D transform can be written as [23], [24].


5.e Coding tools in AVS China [3]

Motion compensated prediction

Motion compensated interpolation

Intra prediction

DCT coding and uniform quantization

Deblocking (loop filter)

VLC coding

Rate buffering


                      Figure 7.b Flowchart of C2DVLC encoding


6.a AVS decoder

Figure 8.a AVS decoder [3]
As a high-performance video coding standard, AVS has adopted a series of
advanced techniques to achieve a high compression ratio. The key
techniques include the 8X8 integer transform, entropy coding, intra prediction, inter
prediction, 1/4 pixel interpolation and the loop filter [2]. Following the format
of the stream, the decoder extracts the various kinds of data from the bit stream to
carry out compensation and reconstruction of the video [3]. The AVS encoder
exploits temporal and spatial redundancy to improve compression, so the
decoding process is usually made up of two parts: residual data and prediction
data. The decoder first reads the header information from the video bit stream, which
contains the sequence header, frame header and macro block header information, through
entropy decoding. Then it decodes macro block by macro block according to the procedure
illustrated in Fig. 8.a. The basic steps are: read the luminance and chrominance
coefficients of one macro block through VLD, then apply the inverse zig-zag
scan, inverse quantization and inverse integer cosine transform, from which the
decoder gets the macro block residual data. The prediction data is obtained from intra
prediction or inter prediction according to the macro block type. Finally, the
macro block is reconstructed from the residual data and the prediction data.
Entropy coding adopts an adaptive technique based on Exp-Golomb coding. An Exp-Golomb
code word can be divided into a prefix and a suffix [4]. The prefix is formed by n
consecutive “0” bits followed by one “1” bit, and the suffix contains n+k bits. Here, k is
the order of the Exp-Golomb code. When parsing a k-th order Exp-Golomb code, the decoder
first finds the first nonzero bit from the present position in the stream and records
the number of “0” bits as n, then calculates CodeNum from n, k and the suffix value.
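
The parsing procedure described above can be sketched in Python. The CodeNum equation is missing from this text, so the sketch uses the usual k-th order Exp-Golomb mapping, CodeNum = 2^(n+k) - 2^k + suffix, as an assumption; the function name and the use of a string of "0" and "1" characters as the bit stream are also illustrative choices.

```python
def decode_exp_golomb(bits: str, pos: int = 0, k: int = 0):
    """Decode one k-th order Exp-Golomb code word starting at bit index pos.

    Returns (code_num, next_pos). Assumed mapping:
    CodeNum = 2**(n + k) - 2**k + value of the (n + k)-bit suffix.
    """
    n = 0
    while bits[pos] == "0":      # count the leading zero bits of the prefix
        n += 1
        pos += 1
    pos += 1                     # skip the terminating "1" of the prefix
    width = n + k
    suffix = int(bits[pos:pos + width], 2) if width else 0
    code_num = (1 << (n + k)) - (1 << k) + suffix
    return code_num, pos + width


# Hypothetical code words: order 0 on the first line, order 1 on the second.
print([decode_exp_golomb(w)[0] for w in ("1", "010", "011", "00100")])
print([decode_exp_golomb(w, k=1)[0] for w in ("10", "11", "0100")])
```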


DCT stands for discrete cosine transform. It transforms data from the spatial
domain to the frequency domain, where most high-frequency coefficients become 0. The
human eye is not very sensitive to high-frequency data, so quantization can
remove some unimportant high-frequency coefficients to improve the compression ratio.
7.a An overview on VP6 coding technique [7]
VP6 is a coding technique developed by On2 Technologies. Flash Media is
emerging as the new preferred solution over the existing Windows Media Player,
Apple QuickTime and Real Network Player for providing video services over the
Internet. Macromedia adopted VP6 as the video coding algorithm for its Flash
player in 2005 [7]. VP6 on Flash 8 provides better performance than the existing
standards with smoother streaming and low color contrast video images [7]. This
creates an opening for developing an AVS to VP6 transcoder to transmit
AVS coded signals over the Internet.
7.b Performance of VP6 coding technique [25]
The purpose of a video compressor is to take raw video and compress it into a
more manageable form for transmission or storage. A matching decompressor is
then used to convert the video back into a form that can be viewed. Most modern
codecs, including VP6, are "lossy" algorithms, meaning that the decoded video
does not exactly match the raw source. Some information is selectively sacrificed
in order to achieve much higher compression ratios. The art of the codec designer
is to minimize this loss, whilst maximizing the compression.
At first glance, VP6 has a lot in common with other leading codecs. It uses motion
compensation to exploit temporal redundancy, DCT to exploit spatial redundancy,
a loop filter to deal with block transform artifacts, and entropy encoding to exploit
statistical correlation. However, the "devil is in the details," so to speak, and in this
paper I will discuss a few of the features that set VP6 apart.
One of the problems with algorithms that use frequency based block transforms is
that the reconstructed video sometimes contains visually disturbing discontinuities
along block boundaries. These "blocking artifacts" can be suppressed by means of
post processing filters. However, this approach does not address the fact that these
artifacts reduce the value of the current decompressed frame as a predictor for
subsequent frames.


An alternative or complementary approach is to apply a filter within the
reconstruction loop of both the encoder and decoder. Such "loop filters" smooth
block discontinuities in the reconstructed frame buffers that will be used to predict
subsequent frames. In most cases this technique works well, but in some situations
it can cause problems. Firstly, loop filtering a whole frame consumes a lot of CPU
cycles. Secondly, when there is no significant motion in a region of the image,
repeated application of a filter over several frames can lead to problems such as
blurring.
VP6 takes an unusual approach to loop filtering. In fact, some would say that it is
not a loop filter at all but rather a prediction filter. Instead of filtering the whole
reconstructed frame, VP6 waits until a motion vector is coded that crosses a block
boundary. At this point in time it copies the relevant block of image data and filters
any block edges that pass through it, to create a filtered prediction block (Fig. 9.a).

                       Figure 9.a. VP6 prediction loop filter
Because the reconstruction buffer itself is never filtered, there is no danger of
cumulative artifacts such as blurring. Also, because the filter is only applied where
there is significant motion, this approach reduces computational complexity for
most frames. When we first implemented this approach in VP6, we saw an
improvement of up to 0.25 dB above a traditional loop filter on some clips.
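
The condition that triggers the prediction filter, a motion vector that makes the reference block straddle a block boundary, can be sketched in Python. This is a simplified illustration and not On2's implementation: the 8x8 block size, full-pel motion vectors and the function name edges_inside_prediction_block are assumptions.

```python
BLOCK = 8  # assumed block size in pixels


def edges_inside_prediction_block(block_x, block_y, mv_x, mv_y):
    """Locate block edges that fall inside a motion-compensated reference block.

    block_x, block_y: block coordinates in units of blocks.
    mv_x, mv_y: full-pel motion vector (assumption).
    Returns (vertical_edge_offset, horizontal_edge_offset) measured inside the
    reference block, with None where no edge passes through it.
    """
    x0 = block_x * BLOCK + mv_x
    y0 = block_y * BLOCK + mv_y
    rx = x0 % BLOCK
    ry = y0 % BLOCK
    vertical = BLOCK - rx if rx else None
    horizontal = BLOCK - ry if ry else None
    return vertical, horizontal


# Zero motion leaves the block aligned, so nothing needs filtering.
print(edges_inside_prediction_block(2, 1, 0, 0))
# A shift of 3 pixels puts a vertical block edge 5 pixels into the block.
print(edges_inside_prediction_block(2, 1, 3, 0))
```

Only blocks for which at least one offset is not None would be copied and filtered, which is why static regions incur no filtering cost.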
7.c VP6 golden frames [25]
In addition to the previous frame, some codecs retain additional frames that can be
used as predictors. VP6 and other codecs in the VPx range support a special kind of
second reference frame which we call a Golden Frame. This frame can be from the
arbitrarily distant past (or at least as far back as the previous Golden Frame) and is
usually encoded at a higher than average quality.
7.d Background / foreground segmentation [25]
 One use for Golden Frames is segmentation of the foreground and background in
a video. For example, in most video conferencing applications the background is
static. As the speaker moves around, parts of the background are temporarily
obscured and then uncovered again. By creating and maintaining a high quality
image of the background in the Golden Frame buffer, it is possible to cheaply re-
instate these regions as they are uncovered. This allows the quality of the
background to be maintained even when there is rapid movement in the foreground.
Furthermore, the cost savings can be used to improve the overall encoding quality.
The VP6 encoder also uses the Golden Frame to improve quality in certain types of
scenes. In slow moving pans or zooms, for example, a periodic high-quality golden
frame can improve image quality by restoring detail lost because of repeated
application of a loop filter or sub-pixel motion filters. This high quality frame
remains available as an alternate reference buffer until explicitly updated. As long
as the speed of motion is not too fast, this frame can help stabilize the image and
improve quality for a significant number of frames after the update. The VP6
encoder monitors various factors to determine the optimum frequency and quality
boost for golden frame updates. These factors include the speed of motion, how
well each frame predicts the next and how frequently the golden frame is selected
as the best choice reference for encoding macroblocks.
The results of this process can be quite dramatic for some clips, as shown below.

Figure 9.b. Quality improvement with (left) vs. without (right) golden frames
7.e Context predictive entropy encoding [25]
Some other advanced video codecs use an entropy coding technique known as
"Context Adaptive Binary Arithmetic Coding" (CABAC). This technique, while
quite efficient from a compression point of view, is expensive in terms of CPU
cycles because the context needs to be recalculated each time a token is decoded.
VP6 employs a proprietary "Context Predictive Binary Arithmetic Coding"
technique that relies upon sophisticated adaptive modeling at the frame level. This
technique assumes that information from spatially-correlated blocks is relevant
when considering the likelihood of a particular outcome for the current block. For
example, when considering the probability that a particular DCT coefficient is
non-zero, information about the equivalent coefficient in neighboring blocks may be
important. An important point here is that the encoder performs heuristic modeling
at the frame level and passes relevant context information to the decoder in the bit
stream. This means that it is not necessary to compute contexts in the decoder on a
token by token basis.
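A small sketch can make the benefit of frame-level context modeling concrete. It is not the proprietary VP6 coder: the context used here (whether the equivalent coefficient in a neighboring block is non-zero), the sample data and the function names are all assumptions made for illustration. The encoder gathers statistics over one frame, builds a probability table per context, and the ideal coding cost with that table is compared against a single context-free probability.

```python
import math
from collections import Counter


def frame_level_model(samples):
    """samples: list of (context, is_nonzero) pairs gathered over one frame.

    Returns {context: P(nonzero | context)}. In this sketch the table stands
    in for the side information an encoder would send once per frame.
    """
    totals, nonzero = Counter(), Counter()
    for context, is_nonzero in samples:
        totals[context] += 1
        nonzero[context] += int(is_nonzero)
    return {c: nonzero[c] / totals[c] for c in totals}


def cost_bits(samples, table):
    """Ideal cost in bits of coding every decision with the given table."""
    total = 0.0
    for context, is_nonzero in samples:
        p = table[context] if is_nonzero else 1.0 - table[context]
        total += -math.log2(p)
    return total


# Hypothetical frame statistics. Context 1: neighbor coefficient is non-zero.
samples = ([(1, True)] * 8 + [(1, False)] * 2 +
           [(0, True)] * 1 + [(0, False)] * 9)

table = frame_level_model(samples)
bits_with_context = cost_bits(samples, table)

# Same decisions, but every sample mapped to one shared context.
flat_samples = [("any", nz) for _, nz in samples]
flat_table = frame_level_model(flat_samples)
bits_without_context = cost_bits(flat_samples, flat_table)

print(table, bits_with_context, bits_without_context)
```

Because the table is fixed for the whole frame, a decoder following this scheme only looks up a probability per token instead of recomputing a context model after each one.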
7.f Bitstream partitions [25]
VP6's bitstream is partitioned to provide flexibility in building a fast decoder. All
of the prediction modes and motion vectors are stored in one data partition, and the
residual error information is stored in another. The jobs of creating a predictor
frame and decoding the residual error signal can thus be easily separated and run
on different cores with minimal overhead. Alternatively a VP6 decoder can decode
and reconstruct macroblocks one at a time, by pulling the mode and motion vector
information from one substream, and the residual error signal for that macroblock
from the other. Any compromise between these two extremes is possible, allowing
maximum flexibility when trying to optimize performance and minimize data and
instruction cache misses.
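The two decoding strategies described above can be sketched with toy data. Everything in this example is hypothetical: each macroblock is reduced to a single number, `predict` is a stand-in for real motion compensation, and the partitions are plain Python lists rather than entropy-coded streams. The point is only that both orders of work give the same reconstruction.

```python
# Partition 1: one (mode, motion vector) entry per macroblock.
modes_partition = [("inter", (1, 0)), ("intra", (0, 0)), ("inter", (-2, 3))]
# Partition 2: one residual value per macroblock (a single number here).
residual_partition = [5, -3, 0]


def predict(mode, mv):
    """Toy predictor: a constant for intra, a value shifted by the MV for inter."""
    if mode == "intra":
        return 128
    return 100 + mv[0] + mv[1]


def decode_two_pass(modes, residuals):
    """Build all predictors first, then add all residuals.

    The two list comprehensions are independent, so in a real decoder they
    could run on different cores.
    """
    predictors = [predict(mode, mv) for mode, mv in modes]
    errors = list(residuals)
    return [p + e for p, e in zip(predictors, errors)]


def decode_interleaved(modes, residuals):
    """Reconstruct one macroblock at a time, pulling from both partitions."""
    mode_stream, residual_stream = iter(modes), iter(residuals)
    output = []
    for mode, mv in mode_stream:
        output.append(predict(mode, mv) + next(residual_stream))
    return output


two_pass = decode_two_pass(modes_partition, residual_partition)
interleaved = decode_interleaved(modes_partition, residual_partition)
print(two_pass, interleaved)
```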
7.g Dual mode arithmetic and VLC encoding [25]
In addition to its proprietary "Context Predictive Binary Arithmetic Coding"
algorithm, VP6 also supports "Variable Length Coding" (VLC). As with the
arithmetic coder, the VLC coder makes use of predictive contexts to improve
compression efficiency. The efficiency of the VLC method compared to the
arithmetic coding method depends substantially on the data rate. At very high data
rates, where most of the DCT coefficients in the residual error signal are non-zero,
the difference between the VLC coder and the arithmetic coder is small (≤ 2%).
However, at low data rates, the arithmetic coder may deliver a very substantial
improvement in compression efficiency (>20%).
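The dependence on data rate can be illustrated with a generic sketch. It uses a plain Huffman code as a stand-in for a VLC coder and the source entropy as a stand-in for an ideal arithmetic coder; neither is the actual VP6 coder, and the token distributions are invented, so the percentages it produces are not the figures quoted above. A VLC must spend at least one whole bit per token, which is costly when one token (for example a zero coefficient) dominates, as it does at low data rates.

```python
import heapq
import math


def entropy_bits(probs):
    """Ideal average cost per token, approached by arithmetic coding."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Code length in bits of each token under a Huffman (VLC) code."""
    if len(probs) == 1:
        return [1]
    # Heap items: (probability, unique tie-breaker, tokens in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, tokens1 = heapq.heappop(heap)
        p2, _, tokens2 = heapq.heappop(heap)
        for token in tokens1 + tokens2:
            lengths[token] += 1  # one level deeper in the code tree
        heapq.heappush(heap, (p1 + p2, next_id, tokens1 + tokens2))
        next_id += 1
    return lengths


def vlc_overhead(probs):
    """Relative extra cost of the Huffman code over the entropy."""
    average = sum(p * n for p, n in zip(probs, huffman_lengths(probs)))
    return average / entropy_bits(probs) - 1.0


high_rate = [0.25, 0.25, 0.25, 0.25]  # hypothetical: tokens equally likely
low_rate = [0.90, 0.05, 0.03, 0.02]   # hypothetical: one token dominates

print(vlc_overhead(high_rate), vlc_overhead(low_rate))
```

With these toy distributions the overhead is zero in the evenly spread case and large in the skewed case, which matches the direction of the trend described in the text.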

Because of the way the bitstream is partitioned between the prediction modes and
motion vectors on the one hand and the residual error signal on the other, VP6 can
support mixed VLC and arithmetic coding. Here one partition is encoded using
arithmetic coding (typically the modes and motion vectors) while the other uses
the VLC method. This allows the encoder to trade off decoder complexity and
quality in a very efficient way. Below we show how we used this approach in the
recently announced VP6-S profile in Flash.
7.h Adaptive sub-pixel motion estimation [25]
One very unusual feature of VP6 is the way that it uses multiple 2- and 4-tap filters
when creating the prediction block for sub-pixel motion vectors (for example 1/2
and 1/4 pixel vectors). Traditionally, codecs typically use a single filter for all
blocks. In contrast, VP6 supports 16 different 4-tap filters, all with different
characteristics, as well as a 2-tap bilinear filter. The encoder can either select a
particular filter at the frame level, or signal that the choice should be made at the
8x8 block level according to a heuristic algorithm implemented in both the encoder
and decoder. This algorithm examines the characteristics of the reference frame at
the selected location and attempts to choose an optimal filter for each block, one
that will neither over-blur nor over-sharpen. The bitstream even allows the
parameters of the filter selection algorithm to be tweaked, so a user can specify a
preference for sharper video or less noisy and blocky video at encode time. This
feature is provided in recognition of the fact that attitudes to and acceptance of
different types of compression artifacts vary considerably from person to person
and between different cultures.
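The difference between a 2-tap and a 4-tap interpolation filter can be shown on a single row of pixels. This is only a sketch: the 4-tap kernel (-1, 9, 9, -1)/16 is a hypothetical example and is not one of the 16 filters defined by VP6, and the function names are invented. Both functions compute the half-pixel sample between `row[i]` and `row[i + 1]`.

```python
def clamp_pixel(value):
    """Keep a filtered value inside the 8-bit pixel range."""
    return max(0, min(255, value))


def half_pel_bilinear(row, i):
    """2-tap bilinear filter: rounded average of the two nearest pixels."""
    return (row[i] + row[i + 1] + 1) >> 1


# Hypothetical 4-tap kernel (taps sum to 16); NOT taken from the VP6 bitstream.
FOUR_TAP = (-1, 9, 9, -1)


def half_pel_four_tap(row, i, taps=FOUR_TAP):
    """4-tap filter over row[i - 1] .. row[i + 2], rounded and clamped."""
    acc = sum(t * row[i - 1 + k] for k, t in enumerate(taps))
    return clamp_pixel((acc + 8) >> 4)


ramp = [0, 10, 20, 30]  # smooth gradient
peak = [0, 0, 100, 0]   # isolated bright detail

print(half_pel_bilinear(ramp, 1), half_pel_four_tap(ramp, 1))
print(half_pel_bilinear(peak, 1), half_pel_four_tap(peak, 1))
```

On the smooth ramp both filters agree, while next to the isolated peak the 4-tap kernel preserves more of the detail (56 against 50 in this toy case). This is the kind of sharpness versus smoothness trade-off that a per-block filter choice is meant to balance.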
7.i VP6-E and VP6-S encoder profiles [25]
Adobe recently announced support for a new VP6 profile in Flash called VP6-S.
(The new support is on the encoding side. On the decoding side, both VP6-S and
the original profile (VP6-E) have been fully supported since the launch of VP6
video in Flash 8, so there are no problems of backwards compatibility.) The
principal difference between the two profiles comes down to decisions made by the
encoder in regard to sub-pixel motion estimation, loop filtering and entropy
encoding. As mentioned previously, VP6 allows for considerable flexibility in all
of these areas.
VP6-S targets HD content, which is characterized by high data rates. At these rates,
the difference from a compression efficiency standpoint between VP6's "Context
Predictive Binary Arithmetic Coding" and its "Context Predictive VLC" coder is
less pronounced. However, at high data rates the number of CPU cycles used in the
entropy decoding stage rises substantially. To address this problem VP6-S
selectively uses the VLC method for the residual error partition (DCT coefficients)
if the size of that partition rises above a pre-determined level. This compromise is
made possible by VP6's use of two bitstream partitions as described above.
In addition, VP6-S is restricted to using bilinear sub-pixel filters, whereas VP6-E
automatically chooses an optimal 4-tap or 2-tap filter for each macroblock. This
significantly reduces decode complexity for VP6-S. Although bilinear filtering can
cause some loss of sharpness and detail, this is much less pronounced for HD video.
The loss of quality is more pronounced for smaller image formats, making VP6-E
the better choice in such cases.
A final important difference is that the loop filter is disabled in VP6-S, giving rise
to a further reduction in decoder complexity. As with the use of bilinear filtering,
the detrimental effect of this from a quality standpoint is much less pronounced for
HD video. However, this difference makes VP6-S much less suitable for smaller
image formats such as QCIF and QVGA, where the lack of loop filtering may
result in a very noticeable drop in perceived quality.
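The profile differences described above can be summarized as a small configuration sketch. The class, field and function names are hypothetical, the threshold value is an arbitrary placeholder for the "pre-determined level" (which is not given here), and VP6-E is assumed to keep the loop filter on and to always use arithmetic coding for the residual partition.

```python
from dataclasses import dataclass

# Placeholder only: the actual level used by the encoder is not stated.
VLC_THRESHOLD_BYTES = 20000


@dataclass(frozen=True)
class Vp6Profile:
    name: str
    loop_filter: bool
    four_tap_filters_allowed: bool
    vlc_allowed_for_residual: bool


VP6_E = Vp6Profile("VP6-E", loop_filter=True,
                   four_tap_filters_allowed=True,
                   vlc_allowed_for_residual=False)  # assumption for VP6-E
VP6_S = Vp6Profile("VP6-S", loop_filter=False,
                   four_tap_filters_allowed=False,
                   vlc_allowed_for_residual=True)


def residual_entropy_coder(profile, residual_partition_bytes,
                           threshold=VLC_THRESHOLD_BYTES):
    """Pick the entropy coder for the residual (DCT coefficient) partition."""
    if profile.vlc_allowed_for_residual and residual_partition_bytes > threshold:
        return "vlc"
    return "arithmetic"


def sub_pixel_filter(profile):
    """Describe which sub-pixel filters the profile may use."""
    if profile.four_tap_filters_allowed:
        return "adaptive 2-tap/4-tap"
    return "bilinear"


print(residual_entropy_coder(VP6_S, 50000), sub_pixel_filter(VP6_S))
print(residual_entropy_coder(VP6_E, 50000), sub_pixel_filter(VP6_E))
```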
The tradeoffs described above make possible the smooth playback of HD video
encoded using the VP6-S profile on much less powerful legacy computers, without
too big a hit on quality. However, the original VP6-E profile should be used for
smaller image formats and at low data rates, where it will deliver noticeably better
quality.
7.j Device ports and hardware implementations [25]
 In addition to implementations for Windows, Mac and Unix based PCs, VP6 has
been ported to a wide variety of devices, chipsets and processors from leading
companies including: ARM, TI (OMAP & DaVinci), Philips, Freescale, Marvell,
C2, Videantis, Sony, Yamaha and Archos. Furthermore, On2 is currently working
on a highly optimized hardware implementation of VP6, which is due to start
shipping later this year. This implementation will be used in SoCs for mobile
handsets and other low-power applications. This implementation will enable HD
playback of VP6 video on mobile phones.

8.a Proposed Research
A new method for developing the AVS to VP6 transcoder is proposed, based on the
available reference material: the H.263 to VP6 transcoder [8], the AVS to MPEG-2
transcoder [5] and the performance comparison of the AVS and H.264/AVC video
coding standards [6].

Figure 10.a. Implementation of the AVS to VP6 transcoder
9.a Simulation results
As already mentioned, the video input to the AVS encoder is in YUV format. The
performance of the AVS encoder can therefore be evaluated by testing the encoder
on various QCIF and CIF video sequences.

The above example shows the foreman.qcif video sequence which is played using
a YUV player. The above picture shows the first frame of the video sequence.

The above picture shows the details of the video sequence format. We can notice
that the sequence is in YUV format and is played at one frame per second. This
sequence is input to the AVS encoder.
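For reference, the layout of such a raw input file can be sketched as follows. This assumes 8-bit planar YUV 4:2:0 sampling, which is common for QCIF test sequences but is not stated for the files used here, and the helper names are hypothetical. QCIF is 176x144 pixels.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144


def frame_size_420(width, height):
    """Bytes per frame for 8-bit planar YUV 4:2:0 (assumed layout)."""
    luma = width * height
    chroma = (width // 2) * (height // 2)  # each chroma plane is quarter size
    return luma + 2 * chroma


def split_frame_420(data, width, height):
    """Split one raw frame into its Y, U and V planes."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    if len(data) != luma + 2 * chroma:
        raise ValueError("unexpected frame length")
    return data[:luma], data[luma:luma + chroma], data[luma + chroma:]


def frame_count(file_size, width, height):
    """Number of whole frames contained in a raw file of the given size."""
    return file_size // frame_size_420(width, height)


qcif_bytes = frame_size_420(QCIF_WIDTH, QCIF_HEIGHT)
y_plane, u_plane, v_plane = split_frame_420(bytes(qcif_bytes),
                                            QCIF_WIDTH, QCIF_HEIGHT)
print(qcif_bytes, len(y_plane), len(u_plane), len(v_plane))
```

Under this assumption one QCIF frame occupies 38016 bytes, so the number of frames in a sequence follows directly from the file size.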

From the above figure it can be seen that the input file to the AVS encoder is
foreman.qcif and the output of the encoder is stored as foreman2.avs. The above
screenshot is taken from the Visual C++ window used to run the reference
software for AVS China [C].

The above screenshot shows the command prompt output window of the AVS
encoder. The output of the encoder is stored as foreman211.avs.

The above window shows the AVS decoder output window. We can see that the
decoded output is saved as foreman_dec211.yuv.

The above frame shows the decoded sequence played using a YUV player.
Container.qcif sequence
Test frame

Decoded frame

Claire.qcif sequence
Test frame

Decoded frame

News.qcif sequence
Test frame

Decoded frame

The above two images show a comparison between the VP6 coding technique and the
MX version, which uses the H.263 coding technique. It can be seen that VP6 does a
fairly good job in the image on the left, while the image on the right remains
quite blocky [7].

1] L. Yu et al., “An Overview of AVS-Video: tools, performance and complexity”,
Visual Communications and Image Processing 2005, Proc. of SPIE, vol. 5960,
pp.596021, July 31, 2006.

2] Zhang and L. Yu, “An area-efficient VLSI architecture for AVS intra frame
encoder”, Visual Communications and Image Processing 2007, Proc. of SPIE-IS &
T Electronic Imaging, SPIE vol. 6508, pp. 650822, Jan. 29, 2007.

3] W. Gao et al., “AVS - The Chinese Next-Generation Video Coding Standard”,
NAB, Las Vegas, 2004.

4] T. Wiegand and J. Sullivan “Overview of the H.264/AVC Coding Standard”,
IEEE Trans. Circuits Syst. Video Technol., vol.13, pp.560-576, July 2003.

5] J. Wang et al., “An AVS-to-MPEG2 Transcoding System”, China Proceedings
of 2004 International Symposium on Intelligent Multimedia, Video and Speech
Processing , Hong Kong, pp. 302-305, Oct. 20-22, 2004.
6] X. Wang and D. Zhao, “Performance comparison of AVS and H.264/AVC
video coding standards”, J. Comput. Sci. & Technol., Vol. 21, No. 3, pp. 310-314,
May 2006.

7] On2 Technologies, Inc. “WHITE PAPER On2 VP6 for Flash8 Video”, Sept.
12, 2005.

8] C. Holder and H. Kalva, “H.263 to VP6 transcoder”, SPIE, vol. 6822 (VCIP),
pp. 68222B, San Jose, CA, Jan. 2008.
9] B. Tang et al., “AVS Encoder Performance and Complexity Analysis Based
on Mobile Video Communication”, WRI International Conference on
Communications and Mobile Computing, CMC '09, Vol. 3, pp. 102 – 107, 6-8
Jan. 2009.
10] W. Gao et al., “A Fast Intra Mode Decision Algorithm for AVS to H.264
Transcoding”, 2006 IEEE International Conference on Multimedia and Expo, pp.
61 – 64, 9-12 July 2006.
11] C. Guanghua et al., “An efficient VLSI architecture of sub-pixel interpolator
for AVS encoder”, 9th International Conference on Signal Processing, 2008, ICSP
2008, pp. 1255 – 1258, 26-29 Oct. 2008.

12] W. Gao et al. , “A Real-Time Full Architecture for AVS Motion Estimation”,
IEEE Trans. on Consumer Electronics, Vol. 53, Issue 4, pp. 1744 –1751, Nov.

13] W. Gao, “AVS standard - Audio Video Coding Standard Workgroup of
China”, International Conference on Wireless and Optical Communications, 14th
Annual WOCC 2005, pp. 54, 22-23 April 2005.

14] L. Miao et al., “Context-dependent bitplane coding in China AVS audio”
Proceedings of 2005 International Symposium on Intelligent Signal Processing and
Communication Systems, ISPACS 2005, pp. 765 - 768, 13-16 Dec. 2005.

15] Guo-An Su et al., “Low-Cost Hardware-Sharing Architecture of Fast 1-D
Inverse Transforms for H.264/AVC and AVS Applications”, IEEE Trans. on
Circuits and Systems II: Express Briefs, Vol. 55, Issue 12, pp. 1249 – 1253, Dec.
16] Y. Qu et al., “A Cost-Effective VLSI Architecture of VLD for MPEG-2 and
AVS”, IEEE International Conference on Multimedia and Expo 2007, pp. 1619 –
1622, 2-5 July 2007.

17] Lu Yu et al., “Overview of AVS-video coding standards”, Signal Processing:
Image Communication, Vol. 24, Issue 4, pp. 247-262, April 2009.

18] W. Gao et al., “Context-based entropy coding in AVS video coding standard”,
Signal Processing: Image Communication, Vol. 24, Issue 4, pp. 263-276, April

19] X. Jin et al., “Platform-independent MB-based AVS video standard
implementation”, Signal Processing: Image Communication, Vol. 24, Issue 4, pp.
312-323, April 2009.
20] D. Ding et al. “Reconfigurable video coding framework and decoder
reconfiguration instantiation of AVS”, Signal Processing: Image Communication,
Vol.24, Issue 4, Pages 287-299, April 2009.
21] J. Zheng et al. “An efficient VLSI architecture for CBAC of AVS HDTV
decoder”, Signal Processing: Image Communication, Vol. 24, Issue 4, Pages 324-
332, April 2009.

22] H. S. Malvar et al. “Low-Complexity Transform and Quantization in
H.264/AVC”, IEEE Trans. on Circuits and Systems for video technology, vol. 13,
pp. 598-603, July 2003.

23] A.M. Patino et al. “2D-DCT on FPGA by polynomial transformation in two-
dimensions,” Proceedings of the 2004 International Symposium on Circuits and
Systems, vol. 3, pp. III – 365–8, May 2004.

24] Q. Wang et al. “Context Based 2D-VLC Entropy Coder in AVS Video Coding
Standard”, J. Comput. Science & Technol., vol. 21, No.3, pp. 315-322, May 2006.

25] Reference from Paul Wilkins Chief Technology Officer and Senior Vice
President of R&D at On2 Technologies.
Web References:
AVS China software


