Performance Analysis of AAC audio coder and AAC with SBR technology


Project Proposal:

Title: Study and comparison of AC-3, AAC and HE-AAC audio codecs


Spectral band replication (SBR) is an advance in the field of low bit rate audio coding that
enhances the performance of traditional audio coders. SBR was developed and marketed by Coding
Technologies, an international company in the audio coding field. MPEG-AAC, which belongs to
the ISO-MPEG standard, shows a substantial improvement when combined with SBR.[1] Adding SBR
increases the coding efficiency of traditional audio coders by at least 30%.[7] SBR is a
bandwidth extension technique that exploits the strong correlation between the low and high
frequency content of an audio signal. In this project, the performance of the MPEG advanced
audio coding (AAC) coder with and without SBR will be analyzed, including a comparison of the
coding efficiency.

Student: Dhatchaini Rajendran
Student ID: 1000636681
Date: September 28, 2010

AAC – Advanced audio coding

AC-3 – Audio codec 3

AES – Audio Engineering Society

ATSC – Advanced television systems committee

HE-AAC – High efficiency advanced audio coding

IMDCT – Inverse modified discrete cosine transform

ISO – International organization for standardization

LC – Low complexity

LFE – Low frequency effects

LTP – Long term prediction

MDCT – Modified discrete cosine transform

MPEG – Moving picture experts group

PCM – Pulse code modulation

SBR – Spectral band replication

SSR – Scalable sampling rate

TNS – Temporal noise shaping

An Overview of Perceptual Audio Coding

Audio coding algorithms aim to represent the audio signal with the minimum number of bits
while reproducing the signal with minimum error.

Perceptual audio coding algorithms exploit the insensitivity of the human ear to frequencies
outside the range of roughly 20 Hz to 20 kHz, together with the redundancy in audio signals, to
maximize compression of the audio signal. The irrelevant information in the signal is identified
using psychoacoustic principles such as absolute hearing thresholds, simultaneous masking,
critical band frequency analysis, temporal masking and the spread of masking along the basilar
membrane.
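
To make the absolute hearing threshold concrete, the sketch below evaluates a widely used closed-form approximation of the threshold in quiet (usually attributed to Terhardt). The formula and the function name `threshold_in_quiet_db` are not taken from the sources cited in this proposal; they are assumptions used only for illustration.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL at freq_hz.

    Assumption: the common closed-form approximation
    3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 1e-3 f^4, with f in kHz.
    """
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The ear is most sensitive around 3 to 4 kHz and much less sensitive
# at the low and high ends of the audible range.
for freq in (100, 1000, 3300, 10000, 18000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):7.2f} dB SPL")
```

Signal components whose level falls below this curve can, under this model, be discarded without audible effect.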

[Figure 1 blocks: audio input → analysis filter bank → quantization and coding → encoding of bitstream]

Figure 1: Block diagram of perceptual encoding/decoding scheme [1]

The blocks in Fig. 1 are explained below:

        • The filter bank decomposes the digital input signal into subsampled spectral
          components (a time/frequency domain representation).
        • The perceptual model uses the time domain input signal or, more commonly, the output
          of the analysis filter bank, together with psychoacoustic rules, to estimate the
          actual masking threshold.
        • The spectral components are quantized and coded so that the noise introduced by
          quantization stays below the masking threshold. There are several ways of
          accomplishing this step, from simple block companding to analysis-by-synthesis
          systems using additional noiseless compression.
        • A bitstream formatter assembles the bitstream, which is made up of the quantized
          and coded spectral coefficients and some side information such as the bit allocation.
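
The quantization step in the list above can be illustrated with a uniform quantizer. This is a minimal sketch, assuming the standard model in which a uniform quantizer with step size Δ introduces noise of power Δ²/12; the coefficient values, the masking threshold value and the helper names are hypothetical and do not come from any particular coder.

```python
import math

def step_for_noise_power(allowed_noise_power):
    """Step size whose modelled noise power (step**2 / 12) equals the allowance."""
    return math.sqrt(12.0 * allowed_noise_power)

def quantize(values, step):
    """Uniform quantizer: round each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]

# Hypothetical spectral coefficients and a hypothetical masking threshold
# expressed as an allowed noise power.
coeffs = [0.001 * i - 0.5 for i in range(1000)]
masking_threshold = 1e-4

step = step_for_noise_power(masking_threshold)
decoded = quantize(coeffs, step)
noise_power = sum((c - d) ** 2 for c, d in zip(coeffs, decoded)) / len(coeffs)
print(f"step = {step:.4f}, measured noise power = {noise_power:.2e}")
```

A larger masking threshold permits a larger step, and therefore fewer bits per coefficient, which is where the compression gain of a perceptual coder comes from.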

An Overview of AC-3 Audio Codec

AC-3 is an audio codec developed by Dolby Laboratories. The Dolby AC-3 audio compression
algorithm is an Advanced Television Systems Committee (ATSC) standard for digital audio
compression.[2] It is a lossy, multichannel audio compression format used in a variety of
applications, including digital television and DVD.

AC-3 carries five full range channels (3 Hz to 20,000 Hz): three in the front (left, right and
centre) and two surround channels. A sixth channel, covering 3 Hz to 120 Hz, is known as the
low frequency effects (LFE) channel. This set of channels is known as “5.1” channels.

                         Figure 2: Block diagram of AC-3 encoder [2]

The working of the AC-3 encoder blocks in Fig. 2 is explained here [2]. The first step in the
encoding process transforms the audio from a sequence of PCM time samples into a sequence of
blocks of frequency coefficients. This is accomplished by the analysis filter bank. Overlapping
blocks of 512 time samples are multiplied by a time window and transformed into the frequency
domain. Because the blocks overlap, each PCM input sample is represented in two sequential
transformed blocks. The frequency domain representation can therefore be decimated by a factor
of two, so that each block contains 256 frequency coefficients. Each frequency coefficient is
represented by a binary exponent and a mantissa. The set of exponents is encoded into a coarse
representation of the signal spectrum, which is referred to as the spectral envelope. The core
bit allocation routine determines the number of bits used to encode each individual mantissa.
The mantissas are then quantized according to the bit allocation information. The spectral
envelope and the coarsely quantized mantissas for 6 audio blocks (1536 audio samples) are
formatted into an AC-3 frame. The AC-3 bit stream (from 32 to 640 kbps) is a sequence of AC-3
frames. The AC-3 decoder performs the inverse of the encoding process.
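
The block and frame sizes quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the 48 kHz sampling rate and 384 kbps bit rate are example values assumed here, the helper names are hypothetical, and the exponent cap of 24 is an assumption about the exponent range rather than something stated in this proposal.

```python
import math

BLOCK_LENGTH = 512       # windowed time samples per transform block
BLOCKS_PER_FRAME = 6     # audio blocks packed into one AC-3 frame

def frame_stats(sample_rate_hz, bit_rate_bps):
    """Return (samples per frame, frame duration in ms, frame size in bytes)."""
    coeffs_per_block = BLOCK_LENGTH // 2              # decimation by two
    samples = coeffs_per_block * BLOCKS_PER_FRAME     # 256 * 6 = 1536
    duration_ms = 1000.0 * samples / sample_rate_hz
    # Integer division: assumes the bit rate yields a whole number of bytes.
    frame_bytes = bit_rate_bps * samples // sample_rate_hz // 8
    return samples, duration_ms, frame_bytes

def to_exponent_mantissa(coef, max_exponent=24):
    """Split coef (|coef| < 1) so that coef == mantissa * 2**-exponent."""
    if coef == 0.0:
        return max_exponent, 0.0
    _, e = math.frexp(abs(coef))   # abs(coef) = m * 2**e with 0.5 <= m < 1
    exponent = min(max(-e, 0), max_exponent)
    return exponent, coef * 2.0 ** exponent

print(frame_stats(48000, 384000))
print(to_exponent_mantissa(0.1875))
```

Small coefficients receive large exponents, so the exponents alone trace the coarse shape of the spectrum, which is why they can serve as the spectral envelope.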

An overview of MPEG – Advanced Audio Coding

The advanced audio coding (AAC) scheme was jointly developed by Dolby, Fraunhofer, AT&T, Sony
and Nokia.[9] It is a digital audio compression scheme for medium to high bit rates and is not
backward compatible with the earlier Moving Picture Experts Group (MPEG) audio standards. AAC
encoding follows a modular approach, and the standard defines four profiles that can be chosen
based on factors such as the complexity of the bitstream to be encoded and the desired
performance and output:

      • Low complexity (LC)
      • Main profile (MAIN)
      • Scalable sampling rate (SSR)
      • Long term prediction (LTP)

AAC provides excellent audio quality and is suitable for high quality audio applications at
low bit rates. The MPEG-AAC audio coder uses the AAC scheme.

HE-AAC, also known as aacPlus, is a low bit rate audio coder. It is an AAC LC audio coder
enhanced with SBR technology.

A generic block diagram of an AAC encoder is shown in Fig. 3.[3] AAC is a second generation
coding scheme used for stereo and multichannel signals. Compared to earlier perceptual
coders, AAC provides more flexibility and uses more coding tools [3].

The following tools enhance the coding efficiency and help attain higher quality at
lower bit rates.[3]

      • The frequency resolution is higher, with the number of spectral lines increased to
        1024 from 576.
      • Joint stereo coding has been improved. The flexibility of mid/side coding and
        intensity coding allows them to be applied more often to reduce the bit rate.
      • Huffman coding is applied to the coder partitions.
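
The mid/side part of joint stereo coding mentioned in the list above can be sketched as follows. This assumes the common definition mid = (L + R) / 2 and side = (L - R) / 2; the function names and sample values are hypothetical, and intensity coding is not shown.

```python
def ms_encode(left, right):
    """Convert left/right channel samples to mid/side."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right channel samples from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Nearly identical channels give a side signal that is mostly zero,
# which costs very few bits to code.
left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = ms_encode(left, right)
print(mid, side)
print(ms_decode(mid, side))
```

The transform is lossless by itself; the saving comes from the side channel carrying much less energy than either original channel when the two are highly correlated.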

The following tools are used to improve the audio quality:
   • Enhanced block switching: A switched modified discrete cosine transform (MDCT)
     filterbank with an impulse response (for short blocks) of 5.3 ms at 48 kHz sampling
     frequency is used. This helps reduce pre-echo artifacts.[3]

     The MDCT is a lapped, Fourier-related transform based on the type-IV DCT. Because
     consecutive blocks overlap by half, it produces half as many outputs as inputs: each
     block of 2N time samples yields N coefficients. This transform is very useful in signal
     compression applications and is used in the AAC and AC-3 audio codecs. The MDCT is
     computed using the equation below [11].

     $$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, 1, \ldots, N-1$$

     where X_k is the MDCT coefficient in the frequency domain
           x_n is the sample in the time domain

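     The inverse MDCT (IMDCT) is applied to each block and the consecutive overlapping
     output blocks are added, which cancels the time-domain aliasing errors and retrieves the
     original signal. The formula used to compute the IMDCT is given below [11].

     $$y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad n = 0, 1, \ldots, 2N-1$$

     where X_k is the MDCT coefficient in the frequency domain
           y_n is the sample in the time domain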

   • Temporal noise shaping (TNS): An open loop prediction is performed in the frequency
     domain, which shapes the quantization noise in the time domain. This technique enhances
     the quality of speech at low bit rates.
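
The MDCT and IMDCT used by the switched filterbank described above can be checked numerically. The sketch below is a direct, unoptimized evaluation under an assumed convention: X_k = Σ x_n cos[(π/N)(n + 1/2 + N/2)(k + 1/2)] over 2N inputs, an IMDCT scaled by 1/N, and no window function. Real coders apply a window and a fast algorithm; the function names and the test signal are hypothetical.

```python
import math

def mdct(block):
    """MDCT of one block of 2N samples, returning N coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """IMDCT of N coefficients, returning 2N time-aliased samples."""
    N = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N)) / N
            for n in range(2 * N)]

def overlap_add_roundtrip(signal, N):
    """Transform blocks that overlap by N samples and overlap-add the outputs."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        y = imdct(mdct(signal[start:start + 2 * N]))
        for n in range(2 * N):
            out[start + n] += y[n]
    return out

N = 4
signal = [math.sin(0.3 * i) + 0.1 * i for i in range(6 * N)]
rebuilt = overlap_add_roundtrip(signal, N)
# Only samples covered by two blocks (all but the first and last N) are recovered.
max_err = max(abs(a - b) for a, b in zip(signal[N:-N], rebuilt[N:-N]))
print("max reconstruction error:", max_err)
```

A single IMDCT output block does not match its input block; the aliasing terms only cancel once neighbouring blocks are added, which is why the first and last N samples are excluded from the comparison.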

Figure 3: Block diagram of MPEG-2 AAC [3]
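
The open loop prediction behind temporal noise shaping can be sketched with a first-order predictor running across the spectral coefficients. This is a simplified illustration, not the filter specified by the AAC standard: the predictor order, the way the coefficient `a` is estimated and the sample spectrum are all assumptions.

```python
def tns_analysis(coeffs):
    """Predict each spectral coefficient from its lower-frequency neighbour.

    Returns the predictor coefficient a and the prediction residual.
    Open loop: the prediction uses the original, unquantized coefficients.
    """
    r0 = sum(c * c for c in coeffs)
    r1 = sum(coeffs[k] * coeffs[k - 1] for k in range(1, len(coeffs)))
    a = r1 / r0 if r0 > 0 else 0.0
    residual = [coeffs[0]] + [coeffs[k] - a * coeffs[k - 1]
                              for k in range(1, len(coeffs))]
    return a, residual

def tns_synthesis(a, residual):
    """Invert tns_analysis by running the predictor recursively."""
    out = [residual[0]]
    for k in range(1, len(residual)):
        out.append(residual[k] + a * out[k - 1])
    return out

# Hypothetical smooth spectrum, as produced by a transient in the time domain.
spectrum = [1.0, 0.9, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]
a, residual = tns_analysis(spectrum)
restored = tns_synthesis(a, residual)
print("a =", a)
print("residual energy:", sum(r * r for r in residual))
```

In a coder the residual, not the spectrum, would be quantized; the decoder's synthesis filter then shapes the quantization noise so that it follows the temporal envelope of the signal.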

An Overview of AAC audio coder with Spectral Band Replication

SBR is an add-on to the audio coder. It acts as a preprocess on the encoder side and a
postprocess on the decoder side. The SBR data accounts for only a fraction of the data rate of
the combined system. The audio encoder codes the lower band of frequencies up to a certain
cutoff frequency. The higher frequencies above the cutoff are recreated from the lower band.
This reconstructed band, together with the low band, forms the full decoded audio signal. The
core encoder operates at half the sampling rate of the SBR stage, which increases the
frequency resolution of its filter bank.
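
The idea of recreating the high band from the low band can be sketched as a copy-and-rescale operation. This is a toy illustration under stated assumptions, not the standardized SBR algorithm: it works on a plain list of spectral coefficients, the high band is assumed to have the same width as the low band, and the `envelope` argument stands in for the per-band energies that an encoder would transmit as side information.

```python
import math

def replicate_high_band(low_band, envelope):
    """Copy low-band coefficients upward and rescale each group to a target energy.

    low_band: decoded low-band spectral coefficients
    envelope: target energy for each group of the reconstructed high band
    """
    width = len(low_band) // len(envelope)
    high_band = []
    for b, target_energy in enumerate(envelope):
        patch = low_band[b * width:(b + 1) * width]
        energy = sum(c * c for c in patch)
        gain = math.sqrt(target_energy / energy) if energy > 0 else 0.0
        high_band.extend(gain * c for c in patch)
    return high_band

# Hypothetical decoded low band and transmitted high-band envelope.
low_band = [1.0, -2.0, 0.5, 0.5]
envelope = [0.5, 0.02]
high_band = replicate_high_band(low_band, envelope)
full_spectrum = low_band + high_band
print(full_spectrum)
```

Only the few envelope values need to be transmitted for the high band, which is why the SBR data is a small part of the total bit rate.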

In general, a signal composed of a strong harmonic series up to a cutoff frequency continues
the same harmonic series in its higher band of frequencies. This property is the principle
behind SBR. For signals that do not follow this property, tools such as inverse filtering,
adaptive noise addition and sinusoidal regeneration are used to improve the reconstructed
signal. A block diagram of the audio codec with SBR is shown in Fig. 4 [4].

Figure 4: AAC codec with SBR technology [4]

Proposed Study

In this project, the AC-3, AAC and HE-AAC audio coders will be studied. A performance
analysis of the respective coders will be carried out to verify the increase in coding
efficiency when SBR technology is added to AAC.

[1] K. Brandenburg and M. Bosi, “Overview of MPEG audio: current and future standards
for low-bit-rate audio coding,” JAES, vol.45, pp.4-21, Jan./Feb. 1997.

[2] A/52 B ATSC Digital Audio Compression Standard:

[3] D. Meares, K. Watanabe and E. Scheirer, “Report on the MPEG-2 AAC stereo verification
tests”, ISO/IEC JTC1/SC29/WG11, Feb. 1998.

[4] M. Dietz, L. Liljeryd and K. Kjörling, “Spectral band replication, a novel approach in
audio coding,” in 112th AES Convention, Munich, May 2002.

[5] F. Henn, R. Böhm and S. Meltzer, “Spectral band replication technology and its
application in broadcasting”, International Broadcasting Convention, 2003.

[6] M. Dietz and S. Meltzer, “CT-aacplus – a state of the art audio coding scheme”, Coding
Technologies, EBU Technical review, Jul. 2002.

[7] P. Ekstrand, “Bandwidth extension of audio signals by spectral band replication”, IEEE
Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), Nov. 15, 2002.

[8] International Standard ISO/IEC 11172-3:1993, “Information technology – Coding of
moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s – Part
3: Audio,” ISO/IEC, 1993.

[9] ISO/IEC IS 13818-7, “Information technology – Generic coding of moving pictures and
associated audio information Part 7: advanced audio coding (AAC)”, 1997.

[10] M. Bosi and R.E. Goldberg, “Introduction to digital audio coding standards”, Norwell,
MA: Kluwer, 2003.

[11] H.S. Malvar, “Signal processing with lapped transforms”, Artech House: Norwood MA,
