EE5359 MULTIMEDIA PROCESSING









STUDY AND COMPARISON OF AC-3, AAC AND HE-AAC AUDIO CODECS









Under the guidance of Dr.K.R.Rao



Submitted by,

Dhatchaini Rajendran

M.S.E.E

ID # 1000636681



December 8, 2010








Acknowledgement



I would sincerely like to thank Dr. K.R.Rao for his constant guidance, support and motivation

which led to the successful completion of this project.



I would also like to thank all my friends for their support and encouragement.










Abstract:





Spectral band replication (SBR) is an advancement in the field of low bit rate audio coding that enhances the performance of traditional audio coders. Coding Technologies [6], an international company in the audio coding field, developed and marketed SBR. MPEG-AAC, which belongs to the ISO-MPEG standard, has shown a tremendous improvement with SBR [1]. The coding efficiency of traditional audio coders increases by at least 30% with SBR [7]. SBR is a bandwidth extension technique which exploits the strong correlation between the low and high frequency contents of an audio signal. In this project, a performance analysis of MPEG-AAC audio coders and advanced audio coding (AAC) audio coders with SBR is carried out, including a comparison of their coding efficiency.










Table of Contents



Abstract
List of acronyms
List of figures
List of tables
1. Overview of Perceptual Audio Coding
    1.1 Psychoacoustic parameters
2. Overview of AC-3 Audio Codec
    2.1 AC-3 encoder
    2.2 AC-3 decoder
3. Overview of Advanced Audio Coding
    3.1 Basic Profiles in AAC codec
    3.2 AAC encoder and decoder
    3.3 AAC Bit stream Multiplexing
4. Overview of HE-AAC
    4.1 Spectral Band Replication
5. Performance analysis of the audio codecs
    5.1 MUSHRA test
    5.2 AAC codec
    5.3 HE-AAC codec
    5.4 AC-3 codec
References










List of acronyms





AAC - Advanced audio coding



AC-3 - Audio codec 3



AES - Audio Engineering Society



ADIF - Audio data interchange format.



ADTS - Audio data transport stream.



ATSC - Advanced television systems committee



CT - Coding technologies



HE-AAC - High efficiency advanced audio coding



IMDCT - Inverse modified discrete cosine transform



ISO - International organization for standardization



ITU - International telecommunication union



JAES - Journal of the Audio Engineering Society



KBD - Kaiser-Bessel derived



LC - Low complexity



LFE - Low frequencies enhancement



LTP - Long term prediction



MDCT - Modified discrete cosine transform



MPEG - Moving pictures experts group



MUSHRA - Multiple stimuli with hidden reference and anchor



PCM - Pulse code modulation



PNS - Perceptual noise substitution



SBR - Spectral band replication




SSR - Scalable sample rate



TNS - Temporal noise shaping










List of figures





Figure 1a: Block diagram of perceptual encoding/decoding scheme



Figure 1b: Graph illustrating the triangular spreading function



Figure 2a: Six channels in AC-3 codec



Figure 2b: Block diagram of AC-3 encoder



Figure 2c: Frame structure and window function of AC-3



Figure 2d: Flow diagram of the AC-3 encoding process



Figure 2e: Flow diagram of the AC-3 decoding process



Figure 3a: Block diagram of AAC encoder



Figure 3b: Block Switching and the window function



Figure 3c: Block diagram of AAC decoder



Figure 4a: AAC audio codec family



Figure 4b: original audio signal



Figure 4c: High band reconstruction through SBR



Figure 4d: AAC codec with SBR technology










List of tables:



Table 3a: ADTS header format

Table 3b: ADTS profile bits in header

Table 5a: Performance of AAC audio codec



Table 5b: Performance of HE-AAC audio codec



Table 5c: Performance of AC-3 audio codec










1. Overview of Perceptual Audio Coding





Audio coding algorithms aim to represent the audio signal with the minimum number of bits while reproducing the signal with minimum error.

Perceptual audio coding algorithms exploit facts such as the insensitivity of the human ear to frequencies above 20 kHz and the redundancy in audio signals to achieve maximum compression of the audio signal. The irrelevant information in the signal is identified using several psychoacoustic parameters: absolute hearing thresholds, simultaneous masking, critical band frequency analysis, temporal masking and the spread of masking along the basilar membrane.







[Block diagram: Digital audio input -> Analysis filter bank -> Quantization and coding -> Encoding of bitstream, with a perceptual model alongside the signal path]





Figure 1a: Block diagram of perceptual encoding/decoding scheme [1]







The blocks in Fig.1a are explained below:



• The filter bank decomposes the digital input signal into its subsampled spectral components in the time or frequency domain.

• The perceptual model uses either the time domain input signal or, more often, the output of the analysis filter bank, together with the psychoacoustic rules, to calculate the actual masking threshold.

• The spectral components are quantized and coded so that the noise introduced by quantization stays below the masking threshold. There are several ways of accomplishing this step, from simple block companding to analysis-by-synthesis systems using additional noiseless compression.

• A bitstream formatter assembles the bitstream, which is made up of the quantized and coded spectral coefficients and side information such as the bit allocation.









1.1 Psychoacoustic parameters:





• Absolute threshold of hearing is the amount of energy a pure tone needs in order to be heard by a listener in a noiseless environment. It gives the maximum allowable energy level for coding distortions in the frequency domain, and this information is used to keep the noise introduced during quantization below the threshold. It is a non-linear function of frequency.

• Critical band is a parameter used in the spectral analysis of a tone. It arises from the fact that the human ear behaves as a set of band pass filters covering the entire 20 kHz range. The inner ear, known as the cochlea, acts as the spectrum analyzer, as it contains the frequency sensitive regions. The cochlea moves on reception of a tone and continues until it resonates. The cochlear filter bands are quantified by the critical bandwidth, which is a function of frequency measured in Bark.

• Masking, as the name suggests, is a process in which one sound makes another sound inaudible. There is simultaneous masking in the frequency domain and non-simultaneous masking in the time domain. Three cases of masking are distinguished:
    • Noise masking tone
    • Tone masking noise
    • Noise masking noise

• Spread of masking is the predictable effect that a masker centered within one critical band has on the detection thresholds in other critical bands. It is usually modeled in coding applications by an approximately triangular spreading function, shown in Fig. 1b.










Figure 1b: Graph illustrating the triangular spreading function [18]
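The psychoacoustic parameters described above can be illustrated numerically. The following is a minimal sketch, not taken from the report: it assumes the commonly quoted closed-form approximation of the threshold in quiet, a Zwicker-style frequency-to-Bark mapping, and a triangular spreading function with slopes of +25 dB/Bark below the masker and -10 dB/Bark above it. The function names and the slope values are assumptions chosen for illustration.

```python
import math

def absolute_threshold_db(freq_hz):
    """Approximate threshold in quiet (dB SPL) for a positive frequency in Hz."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def hz_to_bark(freq_hz):
    """Map frequency in Hz to critical-band rate in Bark (assumed approximation)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def triangular_spread_db(delta_bark, lower_slope=25.0, upper_slope=10.0):
    """Masking level relative to the masker at a distance of delta_bark.

    Negative distances lie below the masker frequency. The slopes (dB/Bark)
    are illustrative assumptions, not values from the report.
    """
    if delta_bark < 0:
        return lower_slope * delta_bark
    return -upper_slope * delta_bark

if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000):
        print(f, round(hz_to_bark(f), 2), round(absolute_threshold_db(f), 2))
```

The ear is most sensitive around 3 to 4 kHz, which is where this approximation of the threshold reaches its minimum, and masking spreads further towards higher critical bands than towards lower ones.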










2. Overview of AC-3 Audio Codec





AC-3 is an audio codec developed by Dolby Laboratories. The Dolby AC-3 audio compression algorithm is an Advanced Television Systems Committee (ATSC) standard for digital audio compression [2]. It is a lossy, multi-channel audio compression format used in a variety of applications, including digital television and DVD.









Figure 2a: Six channels in AC-3 codec [2]



There are five full range channels (3 Hz to 20,000 Hz). Three of them are in the front (left, right and centre) and the other two are surround channels, as depicted in Fig. 2a. The sixth channel ranges from 3 Hz to 120 Hz and is known as the low frequencies enhancement (LFE) channel. This set of channels is known as “5.1” channels.









Figure 2b: Block diagram of AC-3 encoder [2]


The working of the AC-3 encoder blocks in Fig. 2b is explained here [2]. The first step in the encoding process is to transform the representation of audio from a sequence of PCM time samples into a sequence of blocks of frequency coefficients. This is accomplished by the analysis filter bank. Overlapping blocks of 512 time samples are multiplied by a time window and transformed into the frequency domain. Because the blocks overlap, each PCM input sample is represented in two sequential transformed blocks, as shown in Fig. 2c. The frequency domain representation can therefore be decimated by a factor of two, so that each block contains 256 frequency coefficients. Each frequency coefficient is represented by a binary exponent and a mantissa. The set of exponents is encoded into a coarse representation of the signal spectrum, referred to as the spectral envelope. The core bit allocation routine determines the number of bits used to encode each individual mantissa. The mantissas are then quantized according to the bit allocation information. The spectral envelope and the coarsely quantized mantissas for six audio blocks (1536 audio samples) are formatted into an AC-3 frame. The AC-3 bit stream (from 32 to 640 kbps) is a sequence of AC-3 frames. The AC-3 decoder essentially performs the inverse of the encoding process.
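The frame figures quoted above follow from simple arithmetic, sketched below. The 48 kHz sample rate and the 384 kbps bit rate in the usage lines are example values assumed for illustration (the bit rate lies inside the 32 to 640 kbps range given in the text), and the function names are hypothetical.

```python
BLOCKS_PER_FRAME = 6      # audio blocks per AC-3 frame (from the text)
COEFFS_PER_BLOCK = 256    # new samples represented by each block (from the text)
SAMPLES_PER_FRAME = BLOCKS_PER_FRAME * COEFFS_PER_BLOCK

def frame_duration_ms(sample_rate_hz):
    """Duration of one frame in milliseconds at the given sample rate."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz

def frame_size_bytes(bit_rate_bps, sample_rate_hz):
    """Bytes per frame when the bit rate is locked to the sample rate."""
    return bit_rate_bps * SAMPLES_PER_FRAME / sample_rate_hz / 8

if __name__ == "__main__":
    print(SAMPLES_PER_FRAME)                  # samples per channel per frame
    print(frame_duration_ms(48000))           # frame duration at 48 kHz
    print(frame_size_bytes(384000, 48000))    # frame size at 384 kbps, 48 kHz
```

Because each frame always carries 1536 samples per channel, the frame size in bytes is fixed once the bit rate and sample rate are chosen.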









Figure 2c: Frame structure and window function of AC-3[17]







2.1 AC-3 encoder[2]



A detailed description of the AC-3 encoder is given in this section. A flow diagram of the

encoding process is shown in fig. 2d.


Input PCM: The AC-3 encoder accepts audio in the form of PCM words with lengths of up to 24 bits. The output bit rate and the input sample rate are locked in order for each AC-3 sync frame to contain 1536 samples of audio per channel. The individual input channels are high pass filtered to remove DC components for efficient coding. The LFE channel is low pass filtered.









Figure 2d: Flow diagram of the AC-3 encoding process [2]







Transient Detection: Transients are detected in the full-bandwidth channels in order to decide when to switch to short length audio blocks, which improves pre-echo performance. Block switching to shorter blocks is employed if a transient is detected. The transient detector operates on 512 samples for every audio block and processes 256 samples in each pass. The four steps of transient detection are:



• High pass filtering is done with a cascaded biquad direct form IIR filter with a cutoff of 8 kHz [2].

• Block segmentation represents the block of 256 samples at three levels, where level 1 is the full 256-sample block, level 2 splits it into two segments of length 128 and level 3 splits it into four segments of length 64.

• Peak detection identifies the largest magnitude in each segment on every level of the hierarchy.

• Threshold comparison checks whether there is a significant signal level in the current block.
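The segmentation, peak detection and threshold comparison steps can be sketched as follows. This is a simplified illustration rather than the encoder of [2]: the 8 kHz high pass filter is omitted (the input is assumed to be already filtered), and the silence floor and the per-level ratios are assumed values. The comparison of each segment peak with the preceding one is also an assumption about how the relative test could be carried out.

```python
# Illustrative constants (assumptions for this sketch, not quoted from the report).
SILENCE_FLOOR = 100 / 32768
LEVEL_RATIOS = {1: 0.1, 2: 0.075, 3: 0.05}

def segment_peaks(block):
    """Peak magnitude of every segment on the three hierarchy levels."""
    if len(block) != 256:
        raise ValueError("expected a 256-sample block")
    peaks = {}
    for level, count in ((1, 1), (2, 2), (3, 4)):
        size = 256 // count
        peaks[level] = [
            max(abs(s) for s in block[i * size:(i + 1) * size])
            for i in range(count)
        ]
    return peaks

def detect_transient(block, previous_peaks=None):
    """Return (transient_found, peaks) for one high-pass-filtered block.

    previous_peaks is the peaks dictionary of the preceding block, if any,
    so that the first segment of each level has something to compare with.
    """
    peaks = segment_peaks(block)
    if peaks[1][0] < SILENCE_FLOOR:          # no significant signal level
        return False, peaks
    for level, current in peaks.items():
        last = previous_peaks[level][-1] if previous_peaks else None
        for peak in current:
            # A sudden rise relative to the preceding segment marks a transient.
            if last is not None and peak * LEVEL_RATIOS[level] > last:
                return True, peaks
            last = peak
    return False, peaks
```

A block that jumps from near silence to a loud level halfway through is flagged on level 2, whereas a steady block is not flagged on any level.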



Forward transform: Windowing is done to reduce transform boundary effects and improve frequency selectivity in the filter bank. A 512-point symmetrical window is formed from 256 coefficients used back-to-back. The time-to-frequency transformation is done either by one long transform (N = 512) or by two short transforms (N = 256).



Coupling strategy: A basic encoder uses a static coupling strategy, and the coupling coordinates for all channels are transmitted for every other block. Advanced encoders use dynamically variable coupling parameters, and the coupling frequencies can be varied based on the psychoacoustic model and the bit demand. Channels with a rapidly time-varying power level may be removed from coupling, whereas slowly varying channels may have their coupling coordinates sent less often.



Form coupling channel: In a basic encoder, the coupling channel is formed by adding the individual channel coefficients and dividing by 8, so that the coupling channel does not exceed the value of 1. Coupling coordinates are obtained by taking magnitude ratios within each coupling band. The coordinates are then converted to floating point format and quantized.
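A minimal sketch of this step is given below. It assumes that each channel is a list of frequency coefficients with magnitude below 1, and that a coupling coordinate is the ratio of a channel's summed magnitude to the coupling channel's summed magnitude within one band. The direction of the ratio and the function names are assumptions made for illustration; the floating point conversion and quantization of the coordinates are not shown.

```python
def form_coupling_channel(channels):
    """Add the coefficients of the coupled channels and divide by 8."""
    return [sum(column) / 8.0 for column in zip(*channels)]

def coupling_coordinate(channel_band, coupling_band):
    """Magnitude ratio of one channel to the coupling channel in one band."""
    coupling_magnitude = sum(abs(c) for c in coupling_band)
    if coupling_magnitude == 0.0:
        return 0.0  # nothing to scale in an empty band
    return sum(abs(c) for c in channel_band) / coupling_magnitude

if __name__ == "__main__":
    left = [0.5, -0.25]
    right = [0.5, 0.25]
    coupling = form_coupling_channel([left, right])
    print(coupling, coupling_coordinate(left, coupling))
```

With at most five full range channels whose coefficients are each below 1 in magnitude, the sum stays below 5, so dividing by 8 keeps the coupling channel below 1.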



Rematrixing: Power measurements are made on L, R, L+R, L-R signals within each rematrixing

band. The rematrix flag is set whenever the maximum power is found in L+R or L-R signal.

When the flag is set L+R and L-R are encoded.



Extract exponents: The number of leading zeros in the binary representation of the frequency

coefficient becomes the initial exponent value. The exponent sets are used in determining the

appropriate strategies.



Exponent strategy: The variation in exponents over frequency and time for each channel is

analyzed. It is necessary to trade off time versus frequency resolution while operating at low bit

rates. In general there is a tradeoff between the fine frequency resolution, fine time resolution

and number of bits involved in sending the exponents.


Dither Strategy: The coefficients that are quantized to zero bits will be reproduced with dither

and this will be controlled by the encoder. The purpose is to maintain the same energy in the

reproduced spectrum.



Encode exponents: The exponents of each set are preprocessed and then encoded for transmission in the bit stream. In this step, a second set of exponents is generated that is equal to the set the decoder will have available after decoding.



Normalize mantissas: The normalization process is done by left shifting the number of times

obtained from the corresponding exponent. These mantissas are then quantized.
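Exponent extraction and mantissa normalization can be sketched as below. The sketch assumes that a coefficient is a fraction with magnitude below 1, so that counting the leading zeros of its binary representation is the same as counting how many doublings bring the magnitude to at least 0.5. The cap of 24 on the exponent and the function names are assumptions for illustration.

```python
MAX_EXPONENT = 24  # assumed upper limit on the exponent value

def extract_exponent(coeff):
    """Number of leading zeros in the binary fraction of |coeff|."""
    magnitude = abs(coeff)
    if magnitude >= 1.0:
        raise ValueError("coefficient magnitude must be below 1")
    exponent = 0
    while exponent < MAX_EXPONENT and magnitude < 0.5:
        magnitude *= 2.0      # one left shift
        exponent += 1
    return exponent

def normalize_mantissa(coeff, exponent):
    """Left shift the coefficient by the number of places given by the exponent."""
    return coeff * (2 ** exponent)

def reconstruct(mantissa, exponent):
    """Decoder-side inverse: right shift the mantissa by the exponent."""
    return mantissa * (2.0 ** -exponent)

if __name__ == "__main__":
    for c in (0.75, 0.3, -0.001):
        e = extract_exponent(c)
        print(c, e, normalize_mantissa(c, e))
```

After normalization the mantissa magnitude lies between 0.5 and 1 unless the exponent has reached its cap, so the available mantissa bits are spent on significant digits rather than on leading zeros.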



Core bit allocation: This routine is used by a basic encoder. The parameters involved are sent

only during block 0 as the bit allocation parameters are static. The core bit allocation is done and

the SNR is tuned till all the bits in the frame are used up.



Quantize mantissas: Each normalized mantissa is quantized by rounding it to the number of bits indicated by the corresponding bap (bit allocation pointer), giving the quantized mantissa.



Pack AC-3 frame: This is the encoded AC-3 frame with all the data. This frame can be output

as a burst or in serial format.







2.2 AC-3 decoder[2]



The steps involved in the AC-3 decoder are shown in Fig. 2e. The input is either continuous or in burst format. The bitstream is bit or word aligned, which makes the decoder simpler, and rapid synchronization is possible with the AC-3 bitstream. The next step is to extract the information in the bitstream, which can be copied to a specific memory location or an input buffer. The exponents are then decoded, which requires the number of exponents and the exponent strategy to be known. Bit allocation, mantissa processing, decoupling, rematrixing, dynamic range compression and the inverse transform reverse the corresponding steps in the encoder. Adjacent blocks are overlapped and added together to reconstruct the final continuous time output PCM audio signal. Downmixing is then performed if the number of channels encoded in the bitstream is higher than the number of channels in the decoder. The PCM samples are held in a buffer before being output, and can be passed to a digital to analog converter.










Figure 2e: Flow diagram of the AC-3 decoding process [2]










3. Overview of Advanced Audio Coding





Advanced audio coding (AAC) was jointly developed by Dolby, Fraunhofer, AT&T, Sony and Nokia, and standardized by ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) as part of the MPEG-2 and MPEG-4 specifications [9]. It is a digital audio compression scheme for medium to high bit rates which is not backward compatible with earlier moving pictures experts group (MPEG) audio standards. It is a wide band audio coding algorithm that supersedes its predecessor MP3 (MPEG Layer 3 audio) by providing better audio quality at the same bit rates as the previous standards, or the same audio quality at lower bit rates. The main features of this standard are [7]:



• Sample frequencies from 8 kHz to 96 kHz (MP3: 16 kHz to 48 kHz) and support for up to 48 channels.

• Higher efficiency and a simpler filter bank (MDCT, modified discrete cosine transform).

• Better handling of frequencies above 16 kHz, superior performance at bit rates above 64 kbps, and bit rates reaching as low as 16 kbps.

• AAC meets the requirements for stereo quality sound at 128 kbps and 5.1 channel audio at 320 kbps.





3.1 Basic Profiles in AAC codec :[13]



AAC encoding follows a modular approach, and the standard defines four profiles which can be chosen based on factors such as the complexity of the bitstream to be encoded and the desired performance and output.

• Low-complexity (LC) profile is the most widely used. It omits the prediction tool and reduces the complexity of the temporal noise-shaping tool.

• Main profile uses all tools except the gain control module and provides the highest quality for applications where the amount of random access memory (RAM) needed is not a constraint.

• Scalable sample rate (SSR) profile adds the gain control tool to the low complexity profile and allows the least complex decoder.

• Long term prediction (LTP) profile was newly introduced in MPEG-4 and reduces the redundancy of a signal between successive coding frames.




3.2 AAC encoder and decoder [16]:



A generic block diagram of an AAC encoder is shown in fig. 3a. [3]









Figure 3a: Block diagram of AAC encoder [4]







Filterbank and block switching: The MDCT (modified discrete cosine transform) is the standard transform used to convert the incoming audio signal from the time domain to the frequency domain. The MDCT is a Fourier-related lapped transform based on the type-IV DCT. Since it is a lapped transform, the number of outputs is half the number of inputs. This transform is very useful in signal compression applications and is used in both the AAC and AC-3 audio codecs. The MDCT is computed using the equation below [11].







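\[ X_k = \sum_{n=0}^{2N-1} x_n \cos\left[ \frac{\pi}{N} \left( n + \frac{1}{2} + \frac{N}{2} \right) \left( k + \frac{1}{2} \right) \right], \qquad k = 0, 1, \ldots, N-1 \tag{3.1} \]

where X_k is the MDCT coefficient in the frequency domain and x_n is the sample in the time domain.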



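The inverse MDCT (IMDCT) outputs of consecutive overlapping blocks are added together, which cancels the aliasing errors and retrieves the original signal. The formula used to compute the IMDCT is given below [11].

\[ y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos\left[ \frac{\pi}{N} \left( n + \frac{1}{2} + \frac{N}{2} \right) \left( k + \frac{1}{2} \right) \right], \qquad n = 0, 1, \ldots, 2N-1 \tag{3.2} \]

where X_k is the MDCT coefficient in the frequency domain and y_n is the sample in the time domain.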
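The cancellation of errors through overlap and add can be checked numerically. The sketch below assumes the common textbook form of the transform pair, X_k = sum over n of x_n cos[(pi/N)(n + 1/2 + N/2)(k + 1/2)] for the MDCT and the same cosine kernel scaled by 1/N for the IMDCT, with no window applied. It is a direct evaluation for small N, not an efficient implementation, and the function names are hypothetical.

```python
import math

def mdct(block):
    """Forward MDCT: 2N time samples in, N coefficients out."""
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out."""
    n_half = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)) / n_half
        for n in range(2 * n_half)
    ]

def overlap_add_roundtrip(signal, n_half):
    """Transform 50% overlapping blocks and add the inverse outputs together."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        restored = imdct(mdct(signal[start:start + 2 * n_half]))
        for i, value in enumerate(restored):
            out[start + i] += value
    return out

if __name__ == "__main__":
    signal = [math.sin(0.3 * i) for i in range(32)]
    rebuilt = overlap_add_roundtrip(signal, 4)
    # Samples covered by two blocks are recovered; the first and last N are not.
    print(max(abs(a - b) for a, b in zip(signal[4:28], rebuilt[4:28])))
```

A single inverse-transformed block still contains time domain aliasing; only the sum of two neighbouring blocks restores the samples, which is why the first and last N samples of the test signal are excluded from the comparison.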





The audio signal is first broken into segments called blocks. The data in these blocks are modified to provide a smooth transition between blocks by applying a time domain filter called a window [10], and the MDCT is then applied to the windowed blocks. One of the challenges faced by audio coders is the selection of the optimal block size.









Figure 3b: Block Switching and the window function [19]



Intermediate transition windows between the long and short windows smooth the window switching, as shown in Figure 3b. AAC handles the difficulty of coding audio material that vacillates between steady-state and transient signals by dynamically switching between two block lengths, 2048 samples and 256 samples, referred to as long blocks and short blocks respectively [10]. Long blocks offer improved coding efficiency for stationary signals, and short blocks provide optimized coding capabilities for transient signals. AAC also switches between two window shapes for long blocks, sine-function and Kaiser-Bessel derived (KBD), according to the complexity of the signal. The far-off rejection of the KBD window is higher than that of the sine shaped window.
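For the sine-function shape, the window and the condition it must satisfy for overlap-add reconstruction can be sketched as follows. The formula w(n) = sin[(pi/L)(n + 1/2)] for a window of length L is the commonly used definition and is assumed here; the KBD shape is not covered by this sketch, and the function names are hypothetical.

```python
import math

LONG_BLOCK = 2048   # long block length in samples (from the text)
SHORT_BLOCK = 256   # short block length in samples (from the text)

def sine_window(length):
    """Sine-shaped window of the given block length."""
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]

def satisfies_princen_bradley(window, tol=1e-12):
    """Check w(n)^2 + w(n + L/2)^2 == 1 over the first half of the window."""
    half = len(window) // 2
    return all(
        abs(window[n] ** 2 + window[n + half] ** 2 - 1.0) < tol
        for n in range(half)
    )

if __name__ == "__main__":
    for length in (LONG_BLOCK, SHORT_BLOCK):
        print(length, satisfies_princen_bradley(sine_window(length)))
```

Because the squared halves of the window sum to one, applying the window before the forward transform and again after the inverse transform does not disturb the cancellation of aliasing between overlapping blocks.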



This signal adaptive selection of the transform length is an important feature and is controlled by

analyzing the short time variance of the incoming audio signal. The block synchronicity between

two channels with different block length sequences is ensured by performing eight short

transforms in a row with 50% overlap and the transition windows are used at the start and end of

a short sequence. Thus the spacing between two consecutive blocks is maintained at a constant

level of 2048 input samples.



Filterbank and gain control: A gain control module and a processing block containing a uniformly spaced 4-band polyphase quadrature filter (PQF) precede the MDCT. The gain control block attenuates or amplifies the output of each PQF band and reduces pre-echo effects. After gain control, an MDCT is applied to each PQF band, with a length one quarter of that of the original MDCT.



Temporal noise shaping (TNS): Speech signals that vary with time are often a challenge to

conventional transform schemes owing to the fact that quantization noise is controlled over

frequency but is constant in a transform block. The TNS technique was introduced into MPEG-2

AAC to overcome this limitation. It is like a post processing step of the MDCT transform which

is used to create a continuous filter bank instead of a switched filter bank. This scheme provides

enhanced control of the location of quantization noise within a filter bank window in the time

domain. It uses the principle of duality of time and frequency domain. A prediction approach is

used in the frequency domain to shape the quantization noise over time. The original spectrum is filtered and then quantized, and the quantized filter coefficients are transmitted in the bitstream. The decoder uses these coefficients to undo the filtering, which results in a temporally shaped distribution of quantization noise in the decoded audio signal.



TNS handles signals that are between steady state and transient in nature. When a transient lies at one end of a long block, quantization noise is present throughout the audio block. TNS allows a greater amount of information to be used to describe the non-transient locations in the block. This results in an increase in quantization noise at the transient, where masking renders the noise inaudible, and a decrease in quantization noise in the steady-state region of the audio block [10].



Long term prediction (LTP): Redundancy reduction of stationary signal segments can be

improved by frequency domain prediction. Stationary signals are supported in long transform

blocks and not in short blocks. The predictor can be implemented by a second order backwards

adaptive lattice structure which is calculated independently for every frequency line. The use of

predicted values is controlled on a scale factor band basis and also depends on the prediction

gain in the band. A cyclic reset mechanism, synchronized between the encoder and decoder, is used to improve stability. A drawback of the backward-adaptive structure of the filter is that the bitstream is sensitive to transmission errors.



LTP is a very effective tool for frequency domain prediction, especially for signals which have a clear pitch property. It reduces the redundancy of the signal between successive coding frames. The LTP implementation is simpler and uses a forward adaptive predictor, making it less sensitive to round-off numerical errors in the decoder or bit errors in the transmitted spectral coefficients.



Intensity stereo: Intensity stereo coding is based on an analysis of high-frequency audio

perception specifically on the energy-time envelope of the region of the audio spectrum. This

allows a stereo channel pair to share a single set of spectral values for the high-frequency

components while preserving the sound quality. This is achieved by maintaining the unique

envelope for each channel by means of a scaling operation so that each channel produces the

original level after decoding [10]. In this method, the right and left signal is replaced by a signal

plus directional information thus reducing the bit rate. It is a lossy coding method used primarily

for low bit rates.



Prediction: The prediction module is used to represent stationary or semi-stationary parts of an audio signal. Repeated information in sequential windows can be represented by a repeat instruction, which reduces the redundancy of the signal. Short blocks are used for non-stationary or rapidly varying signals, so prediction is used only with long blocks. The prediction process is based on a second-order backward adaptive model in which the spectral component values of the two preceding blocks are used in conjunction with each predictor. The prediction parameters are adapted on a block-by-block basis [10].



Mid/Side (M/S) stereo coding: M/S stereo coding is another data reduction module based on

channel pair coding and is used to increase coding efficiency. In this case channel pair elements

are analyzed as left/right and sum/difference signals on a block-by-block basis. In cases where

the M/S channel pair can be represented by fewer bits, the sum/difference spectral coefficients are coded, and a bit is set to note that the block has used M/S stereo coding. M/S stereo achieves a significant saving in bit rate when the signal is concentrated in the middle of the stereo image. During decoding, the decoded channel pair is de-matrixed back to its original left/right state [10]. This scheme is used for coding at higher bitrates.
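The matrixing and de-matrixing of a channel pair can be sketched as below, assuming the usual definition in which the mid signal is half the sum and the side signal is half the difference of the left and right spectral coefficients. The function names are hypothetical, and the per-block decision between left/right and M/S coding is not shown.

```python
def ms_encode(left, right):
    """Convert left/right coefficients to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """De-matrix mid/side coefficients back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

if __name__ == "__main__":
    left = [0.5, 0.25, -0.125]
    right = [0.5, -0.25, -0.125]
    mid, side = ms_encode(left, right)
    print(mid, side)              # side is zero wherever left equals right
    print(ms_decode(mid, side))   # the original pair is recovered
```

When the signal sits in the middle of the stereo image the two channels are nearly identical, so the side signal is close to zero and costs very few bits.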



Scalefactors: The inherent noise shaping in the non-linear quantizer is not sufficient to achieve

acceptable audio quality. To improve audio quality the noise is shaped using scalefactors. The

scalefactors increase SNR (signal to noise ratio) in certain bands by amplifying the signal in

those spectral regions. The bit-allocation over frequency is modified as more bits are used to

code the higher spectral values. The scalefactors are transmitted within the bitstream so that the decoder can reconstruct the original spectral values. Huffman coding is used to reduce the redundancy within the scalefactor data.



Quantization and coding: The majority of the data reduction generally occurs in the quantization phase, after the data has already achieved a certain level of compression in the previous modules. In AAC, the spectral data is quantized under the control of the psychoacoustic model. The number of bits used must be below a limit determined by the desired bit rate. Huffman coding is also applied, in the form of twelve codebooks. In order to increase coding gain, scalefactors for bands whose spectral coefficients are all zero are not transmitted [10]. Adaptive quantization is the primary source of bit rate reduction, and the key components of the process are the quantization function and noise shaping. Non-linear quantization is used because, unlike the conventional linear quantizer, it has implicit noise shaping.
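The implicit noise shaping of a non-linear quantizer can be illustrated with a power-law quantizer of the kind used in AAC-style coders. The sketch below assumes a 3/4 power law, a step size of 2^(scalefactor/4) and a rounding offset of 0.4054. The scalefactor is treated as a relative value without the fixed offset a real bitstream would apply, and the function names are hypothetical.

```python
import math

ROUNDING_OFFSET = 0.4054  # assumed rounding constant for this sketch

def quantize(value, scalefactor):
    """Power-law quantization of one spectral value."""
    step = 2.0 ** (scalefactor / 4.0)
    magnitude = int((abs(value) / step) ** 0.75 + ROUNDING_OFFSET)
    return int(math.copysign(magnitude, value))

def dequantize(index, scalefactor):
    """Inverse of quantize: expand with the 4/3 power and rescale."""
    step = 2.0 ** (scalefactor / 4.0)
    return math.copysign(abs(index) ** (4.0 / 3.0), index) * step

if __name__ == "__main__":
    for value in (1.0, 16.0, 250.0, -250.0):
        index = quantize(value, 0)
        print(value, index, round(dequantize(index, 0), 3))
```

Because the magnitude is compressed before rounding, large spectral values are quantized more coarsely than small ones, and raising the scalefactor of a band by 4 doubles its step size.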



Noiseless Coding: This block is used to optimize the redundancy reduction. It is nested inside

the quantization and coding module. Noiseless dynamic range compression can be applied prior

to Huffman coding. A value of +1/- 1 is placed in the quantized coefficient array to carry sign,

while magnitude and an offset from base, to mark frequency location, are transmitted as side

information. This process is only used when there is a reduction in the number of bits [10]. An

efficient grouping algorithm is used to find an optimum tradeoff between the optimum table for

each scalefactor band and minimizing the number of data elements to be transmitted.





The AAC decoder is shown in fig. 3c



The coding efficiency is enhanced by the following tools and they help attain higher quality at

lower bit rates.[3]



• The scheme has higher frequency resolution, with the number of spectral lines increased from 576 to 1024.

• Joint stereo coding has been improved. The bit rate can often be reduced owing to the flexibility of mid/side coding and intensity coding.

• Huffman coding is applied to the coder partitions.



The following tools are used to improve the audio quality:



• Enhanced block switching: A switched MDCT filterbank with an impulse response of 5.3 ms at a 48 kHz sampling frequency is used. This helps reduce pre-echo artifacts [3].

• TNS: An open loop prediction in the frequency domain shapes the quantization noise in the time domain. This technique enhances the quality of speech at low bit rates.










Figure 3c: Block diagram of AAC decoder[15]









3.3 AAC Bit stream Multiplexing [8]:



AAC has very flexible bit stream syntax. A single transport is not ideally suited to all

applications, and AAC can support two basic bit stream formats: audio data interchange format

(ADIF) and audio data transport stream (ADTS).



• ADIF (audio data interchange format): This format has only one header at the beginning of the AAC file, and the rest of the data are consecutive raw data blocks. This file format is used for simple local storage, where breaking up the audio data is not necessary.

• ADTS (audio data transport stream): This format has one header for each frame, followed by a raw block of data. ADTS headers are present before each AAC raw data block, or block of 2 to 4 raw data blocks, in a frame to ensure better error robustness in streaming environments. For this study the ADTS bit stream format is adopted. The details of the ADTS header are given in Tables 3a and 3b.
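
The practical difference between the two formats can be sketched in a few lines. This is an illustration under stated assumptions, not a complete demultiplexer: the function names are hypothetical, the `ADIF` identifier at the start of an ADIF file comes from the AAC bit stream syntax, not from this report, and the frame length is read from the 13-bit aac_frame_length field of Table 3a.

```python
def is_adts_sync(data: bytes, pos: int) -> bool:
    """True if a 12-bit syncword (all ones) with layer '00' starts at pos."""
    return data[pos] == 0xFF and (data[pos + 1] & 0xF6) == 0xF0


def detect_aac_format(data: bytes) -> str:
    """Guess the container format from the first bytes of the stream."""
    if data[:4] == b"ADIF":
        return "ADIF"
    if len(data) >= 2 and is_adts_sync(data, 0):
        return "ADTS"
    return "unknown"


def split_adts_frames(data: bytes) -> list:
    """Walk an ADTS stream frame by frame, resynchronising on junk bytes."""
    frames = []
    pos = 0
    while pos + 7 <= len(data):
        if not is_adts_sync(data, pos):
            pos += 1  # not a header: slide forward and look for the next syncword
            continue
        # aac_frame_length: 13 bits spread over header bytes 3, 4 and 5
        length = ((data[pos + 3] & 0x03) << 11) | (data[pos + 4] << 3) | (data[pos + 5] >> 5)
        if length < 7 or pos + length > len(data):
            pos += 1  # implausible or truncated frame: keep searching
            continue
        frames.append(data[pos:pos + length])
        pos += length
    return frames
```

Because every ADTS frame carries its own header, a decoder can start at any syncword, which is what makes the format robust in streaming; an ADIF file can only be decoded from its single header at the start.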





Table 3a: ADTS header format[14]



Field name                        Field size in bits   Comment

ADTS fixed header: these do not change from frame to frame
Syncword                          12                   always: '111111111111'
ID                                1                    0: MPEG-4, 1: MPEG-2
Layer                             2                    always: '00'
protection_absent                 1
Profile                           2
Sampling_frequency_index          4
private_bit                       1
channel_configuration             3
original/copy                     1
Home                              1

ADTS variable header: this can change from frame to frame
Copyright_identification_bit      1
Copyright_identification_start    1
aac_frame_length                  13                   length of the frame including header (in bytes)
ADTS_buffer_fullness              11                   0x7FF indicates VBR
No_raw_data_blocks_in_frame       2

ADTS error check
crc_check                         16                   only if protection_absent == 0

Raw block of data                 Variable







Table 3b: ADTS profile bits in header[14]





Profile bits ID 1 (MPEG-2 profile)



00 (0) Main profile



01 (1) Low complexity profile (LC)



10 (2) Scalable sample rate profile (SSR)



11 (3) (reserved)
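
The header layout of Table 3a can be read with simple bit slicing. The sketch below is a minimal illustration, not a full ADTS parser: the function names `parse_adts_header` and `pack_adts_header` are hypothetical, field names follow Table 3a in lower case, and the fields are assumed to be packed most significant bit first in the order listed. The optional crc_check and the raw data block are not handled.

```python
# (field name, width in bits) in transmission order, following Table 3a.
ADTS_FIELDS = [
    ("syncword", 12), ("id", 1), ("layer", 2), ("protection_absent", 1),
    ("profile", 2), ("sampling_frequency_index", 4), ("private_bit", 1),
    ("channel_configuration", 3), ("original_copy", 1), ("home", 1),
    ("copyright_identification_bit", 1), ("copyright_identification_start", 1),
    ("aac_frame_length", 13), ("adts_buffer_fullness", 11),
    ("no_raw_data_blocks_in_frame", 2),
]
HEADER_BITS = sum(width for _, width in ADTS_FIELDS)  # 56 bits = 7 bytes

# Profile names from Table 3b.
PROFILE_NAMES = {0: "Main", 1: "Low complexity (LC)",
                 2: "Scalable sample rate (SSR)", 3: "(reserved)"}


def parse_adts_header(data: bytes) -> dict:
    """Split the first 7 bytes of an ADTS frame into the fields of Table 3a."""
    if len(data) < HEADER_BITS // 8:
        raise ValueError("need at least 7 bytes for an ADTS header")
    bits = int.from_bytes(data[:HEADER_BITS // 8], "big")
    remaining = HEADER_BITS
    fields = {}
    for name, width in ADTS_FIELDS:
        remaining -= width
        fields[name] = (bits >> remaining) & ((1 << width) - 1)
    if fields["syncword"] != 0xFFF:
        raise ValueError("syncword not found")
    if fields["layer"] != 0:
        raise ValueError("layer must be '00'")
    return fields


def pack_adts_header(fields: dict) -> bytes:
    """Inverse of parse_adts_header, used here to build a test header."""
    bits = 0
    for name, width in ADTS_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        bits = (bits << width) | value
    return bits.to_bytes(HEADER_BITS // 8, "big")


# Illustrative values only: an MPEG-2 LC frame of 371 bytes, VBR signalled.
example = {name: 0 for name, _ in ADTS_FIELDS}
example.update(syncword=0xFFF, id=1, protection_absent=1, profile=1,
               sampling_frequency_index=3, channel_configuration=2,
               aac_frame_length=371, adts_buffer_fullness=0x7FF)
header = parse_adts_header(pack_adts_header(example))
print(PROFILE_NAMES[header["profile"]], header["aac_frame_length"])
```

Driving both functions from one field table keeps the bit widths in a single place, so the code can be checked line by line against Table 3a.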










4. Overview of HE-AAC





High efficiency advanced audio coding (HE-AAC) is a low bit rate audio codec of the AAC family, defined as an MPEG-4 audio profile [4]. It combines AAC with SBR: AAC is the core audio codec, and SBR is a bandwidth extension technique that increases the coding gain. The family of advanced audio codecs is shown in Figure 4a.









[Figure: AAC, SBR and PS together form HE AAC v2; AAC and SBR alone form HE AAC]

Figure 4a. AAC audio codec family [20]







4.1 Spectral Band Replication (SBR):







SBR is one of the important advancements in the field of audio coding. It is a bandwidth

expansion technique which is based on the correlation between the energy components at the

high and low frequency bands of the audio signal.







SBR is an add-on to the audio coder: it acts as a pre-process on the encoder side and as a post-process on the decoder side. The audio encoder codes the lower band of frequencies up to a certain cutoff frequency. The higher frequencies above the cutoff are recreated from the lower band, and this reconstructed band together with the low band forms the full decoded audio signal. The core encoder operates at half the sampling rate of the SBR stage, which increases the frequency resolution of its filter bank. The parameters that guide the high band reconstruction are transmitted as side information; these are referred to as SBR data, and their data rate is only a fraction of that of the combined system. The original and the high band reconstructed audio signals are shown in figures 4b and 4c respectively.









Figure 4b: original audio signal [21].









Figure 4c: High band reconstruction through SBR [21].







SBR has enabled high-quality stereo sound at bit rates as low as 48 kbps. Parametric stereo (PS) coding efficiently codes a stereo audio signal as a monaural signal plus a small amount of stereo parameters. Combining PS with SBR and AAC gives the HE-AAC codec version 2, also known as the enhanced aacPlus codec.










In general, a signal composed of a strong harmonic series up to a cutoff frequency has the same harmonic series in its higher band of frequencies. This property is the principle behind SBR. For signals that do not follow this property, tools like inverse filtering, adaptive noise addition and sinusoidal regeneration are used to improve the reconstruction. SBR also exploits the fact that the psychoacoustic parameters of the high band are relatively less important, and uses a transposition technique to predict the energies of the high band from knowledge of the low band. A block diagram of the audio codec with SBR is shown in Fig. 4d [4].









Figure 4d: AAC codec with SBR technology [4]
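
The transposition idea can be illustrated with a toy example. The sketch below is not the SBR algorithm itself: it assumes the high band has the same number of spectral bins as the low band, copies the low band upward, and rescales each copied segment to a transmitted target energy. The function name and the equal-width envelope bands are hypothetical, and the inverse filtering, noise addition and sinusoidal regeneration tools are omitted.

```python
import math


def replicate_high_band(low_band, envelope):
    """Rebuild a full spectrum from the low band plus per-band target energies.

    low_band: spectral magnitudes below the cutoff frequency.
    envelope: target energy of each equal-width segment of the high band
              (this stands in for the transmitted SBR data).
    """
    n = len(low_band)
    bands = len(envelope)
    if bands == 0 or n % bands != 0:
        raise ValueError("envelope must divide the low band into equal segments")
    width = n // bands
    high_band = []
    for b, target in enumerate(envelope):
        patch = low_band[b * width:(b + 1) * width]  # transposed copy of the low band
        energy = sum(x * x for x in patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high_band.extend(gain * x for x in patch)    # match the target energy
    return list(low_band) + high_band


# Four low-band bins, two envelope values: only the two energies have to be
# sent to the decoder for the upper half of the spectrum.
print(replicate_high_band([4.0, 2.0, 2.0, 1.0], [5.0, 1.25]))
```

The point of the sketch is the asymmetry in data: the low band is coded in full, while the high band is described by a handful of energies.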










5. Performance analysis of the audio codecs







5.1 MUSHRA test [22]



This test is done to assess the quality of an audio compression algorithm. Multiple stimuli with hidden reference and anchor (MUSHRA), defined by the International Telecommunication Union (ITU), is a methodology for the subjective evaluation of audio quality. It is used to evaluate the perceived quality of the output of lossy audio compression algorithms, and is recommended for assessing "intermediate audio quality". The method requires fewer participants to obtain statistically significant results because all codecs are presented at the same time, on the same samples, so that a paired t-test can be used for statistical analysis. In MUSHRA, the listener is presented with the reference (labeled as such), a certain number of test samples, a hidden version of the reference and one or more anchors. The recommendation specifies that one anchor must be a 3.5 kHz low-pass version of the reference. The purpose of the anchor(s) is to bring the scale closer to an "absolute scale", making sure that minor artifacts are not rated as having very bad quality.









5.2 AAC codec



An analysis of AAC at constant bandwidth is done for different file formats.



Length of audio sequence = 45 seconds.

Bit rate before encoding = 1536 kbps



Table 5a. Performance of AAC audio codec



File format   Bit rate after     Encoding time   Decoding time   Original    Compressed   Compression
              encoding (kbps)    (seconds)       (seconds)       size (MB)   size (kB)    ratio

ADTS          64                 8.7             3.09            8.23        353          12:1
ADIF          64                 8.7             3.51            8.23        353          12:1
AAC           64                 8.7             3.07            8.23        353          12:1










The snapshots of the encoded and decoded audio sequences are shown below.










5.3 HE-AAC codec



An analysis of HE-AAC at constant bandwidth is done and the results are tabulated below.



Length of audio sequence = 45 seconds.

Bit rate before encoding = 1536 kbps







Table 5b. Performance of HE-AAC audio codec



Bit rate after     Encoding time   Decoding time   Original    Compressed   Compression
encoding (kbps)    (seconds)       (seconds)       size (MB)   size (kB)    ratio

48                 3.0             2.0             8.23        272          30:1
32                 3.0             2.0             8.23        184          45:1
24                 3.0             2.0             8.23        140          59:1
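
The compression ratios in Table 5b follow directly from the file sizes. The sketch below assumes decimal units (1 MB = 1000 kB), which reproduces the tabulated 30:1, 45:1 and 59:1; the helper names are hypothetical. It also estimates the file size expected from the bit rate and the 45 second duration, assuming 1 kB = 8000 bits.

```python
def compression_ratio(original_mb: float, compressed_kb: float) -> float:
    """Original size divided by compressed size, with 1 MB taken as 1000 kB."""
    return original_mb * 1000.0 / compressed_kb


def expected_size_kb(bit_rate_kbps: float, duration_s: float) -> float:
    """File size implied by a constant bit rate, ignoring any container overhead."""
    return bit_rate_kbps * duration_s / 8.0


ORIGINAL_MB = 8.23
DURATION_S = 45
# (bit rate in kbps, compressed size in kB) taken from Table 5b
TABLE_5B = [(48, 272), (32, 184), (24, 140)]

for bit_rate, size_kb in TABLE_5B:
    ratio = compression_ratio(ORIGINAL_MB, size_kb)
    estimate = expected_size_kb(bit_rate, DURATION_S)
    print(f"{bit_rate} kbps: ratio {ratio:.0f}:1, "
          f"measured {size_kb} kB, estimated {estimate:.0f} kB")
```

The measured sizes are a few kilobytes larger than the estimates, which is consistent with header overhead added on top of the audio payload.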










A snapshot of the encoded and decoded audio sequences is shown below:









5.4 AC-3 codec



An analysis of AC-3 at constant bandwidth is done and the results are tabulated below.



Length of audio sequence = 45 seconds.

Bit rate before encoding = 1536 kbps










Table 5c. Performance of AC-3 audio codec



Bit rate after     Encoding time   Original    Compressed   Compression
encoding (kbps)    (seconds)       size (MB)   size (kB)    ratio

32                 0.53            8.23        175          47:1
48                 0.41            8.23        263          31:1







The snapshot of the encoding and decoding process is shown below:










5.5 Conclusions:



In this project, a study of three audio codecs (AC-3, AAC and HE-AAC) has been done. A wave file was encoded and decoded using these codecs, and they were compared on the basis of compression ratio and encoding time. HE-AAC achieves a better compression ratio than AAC owing to the SBR technology it uses. AC-3 and HE-AAC have similar compression ratios, although they are based on different standards. The results are tabulated in Tables 5a, 5b and 5c.










References:

[1] K. Brandenburg and M. Bosi, “Overview of MPEG audio: current and future standards for low-bit-rate audio coding,” JAES, vol. 45, pp. 4-21, Jan./Feb. 1997.

[2] A/52B, ATSC Digital Audio Compression Standard: http://www.atsc.org/cms/standards/a_52b.pdf

[3] D. Meares, K. Watanabe and E. Scheirer, “Report on the MPEG-2 AAC stereo verification tests,” ISO/IEC JTC1/SC29/WG11, Feb. 1998.

[4] M. Dietz, L. Liljeryd and K. Kjörling, “Spectral band replication, a novel approach in audio coding,” in 112th AES Convention, Munich, May 2002.

[5] F. Henn, R. Böhm and S. Meltzer, “Spectral band replication technology and its application in broadcasting,” International Broadcasting Convention, 2003.

[6] M. Dietz and S. Meltzer, “CT-aacplus – a state of the art audio coding scheme,” Coding Technologies, EBU Technical Review, July 2002.

[7] P. Ekstrand, “Bandwidth extension of audio signals by spectral band replication,” IEEE Benelux Workshop on Model based Processing and Coding of Audio (MPCA-2002), Nov. 15, 2002.

[8] International Standard ISO/IEC 11172-3:1993, “Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s – Part 3: Audio,” ISO/IEC, 1993.

[9] ISO/IEC IS 13818-7, “Information technology – Generic coding of moving pictures and associated audio information Part 7: advanced audio coding (AAC),” 1997.

[10] M. Bosi and R.E. Goldberg, “Introduction to digital audio coding standards,” Norwell, MA: Kluwer, 2003.

[11] H.S. Malvar, “Signal processing with lapped transforms,” Artech House: Norwood MA, 1992.

[12] T. Ogunfunmi and M. Narasimha, “Principles of speech coding,” Boca Raton, FL: CRC Press, 2010.

[13] X. Hu et al., “An efficient low complexity encoder for MPEG advanced coding,” ICACT 2006, pp. 1501-1505, Feb. 20-22, 2006.

[14] H. Kalva et al., “Implementing multiplexing, streaming and server interaction for MPEG-4,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1299-1311, Dec. 1999.

[15] MPEG–2 Advanced audio coding, AAC. International Standard IS 13818–7, ISO/IEC JTC1/SC29 WG11, 1997.

[16] H. Murugan, “Multiplexing H264 video bit-stream with AAC audio bit-stream, demultiplexing and achieving lip sync during playback,” M.S.E.E Thesis, University of Texas at Arlington, TX, May 2007.

[17] Dr. O. Yamada, “Technologies and services on digital broadcasting – Source coding of audio,” CORONA publishing co., Ltd., 2002.

[18] GB/T 20090.1, “Information technology - advanced coding of audio and video – Part 1: system,” Chinese AVS standard.

[19] H.G. Ranjani and A. Kalagi, “Algorithmic delay and synchronization in MPEG audio codecs,” Ittiam Systems Pvt. Ltd., May 2010.

[20] “MPEG-4 HE-AAC v2 – audio coding for today's digital media world,” article in the EBU Technical Review (01/2006). Link: http://tech.ebu.ch/docs/techreview/trev_305-moser.pdf

[21] M. Modi, “Audio compression gets better and more complex,” link: http://www.eetimes.com/discussion/other/4025543/Audio-compression-gets-better-and-more-complex

[22] Recommendation ITU-R BS.1534, “Method for the subjective assessment of intermediate quality levels of coding systems.” Link: http://www.itu.int/rec/R-REC-BS.1534/en

[23] C.C. Todd, G.A. Davidson, M.F. Davis et al., “AC-3: Flexible perceptual coding for audio transmission and storage,” Dolby Laboratories. http://www.dolby.com/uploadedFiles/English_(US)/Professional/Technical_Library/Technologies/Dolby_Digital_(AC-3)/37_ac3-flex.pdf

Reference Web Sites:

[24] Audio coding website www.audiocoding.com


