Subband Coders
  Audio Coding
     S. R. M. Prasanna

       Dept of ECE,
       IIT Guwahati,

                            Subband Coders – p. 1/2
Subband coders like Tfm. coders exploit statistical
redundancy and psychoacoustic irrelevancy in FD
Freq. spectrum is divided into subbands using a bank of
band pass filters
O/P of each BPF is downsampled and encoded
At the receiver, the subband signals are decoded,
upsampled, filtered and summed to synthesize audio
Subband coders achieve significant coding efficiency by
efficient quantization of decimated o/p sequences from
the filter bank
Efficient quantization relies on psychoacoustically
controlled dynamic bit allocation rules such that
reconstructed o/p signal is free of audible distortion
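
The analysis, downsampling, quantization and synthesis chain above can be sketched with a two-band bank. This is only an illustration: the Haar filter pair, the test signal and the two step sizes are assumptions chosen for brevity, not the filters or bit allocation of any coder discussed here.

```python
import math

S = math.sqrt(0.5)  # Haar filter coefficient

def analysis(x):
    """Split x into low and high bands with 2:1 downsampling (Haar pair)."""
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def quantize(band, step):
    """Uniform quantizer: round each sample to the nearest multiple of step."""
    return [step * round(v / step) for v in band]

def synthesis(low, high):
    """Upsample, filter and sum the two bands back into one sequence."""
    out = []
    for l, h in zip(low, high):
        out.extend([S * (l + h), S * (l - h)])
    return out

x = [math.sin(2 * math.pi * 0.05 * n) for n in range(32)]
low, high = analysis(x)
# Hypothetical step sizes: fine for the low band, coarse for the high band.
y = synthesis(quantize(low, 0.01), quantize(high, 0.1))
max_err = max(abs(a - b) for a, b in zip(x, y))
print(f"max reconstruction error: {max_err:.4f}")
```

Without the quantizers the bank reconstructs x exactly, so all distortion comes from the per-band step sizes; a perceptual coder would derive those from masking thresholds.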

           MASCAM Coder
Masking Pattern Adapted Subband Coding (MASCAM)
Developed at IRT and PHILIPS independently
Based upon a tree str. QMF FB designed to mimic the
CB str of auditory FB
Coder has 24 nonuniform subbands
  0-1 kHz, 125 Hz BW (8 bands)
  1-2 kHz, 250 Hz, BW (4 bands)
  2-4 kHz, 500 Hz, BW (4 bands)
  4-8 kHz, 1 kHz, BW (4 bands)
  8-16 kHz, 2 kHz, BW (4 bands)
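
As a quick check of the layout above, the band edges can be generated from the five regions listed on the slide. The helper name `band_edges` is hypothetical.

```python
# (start_hz, stop_hz, bandwidth_hz) for each region, as listed on the slide.
REGIONS = [
    (0, 1000, 125),
    (1000, 2000, 250),
    (2000, 4000, 500),
    (4000, 8000, 1000),
    (8000, 16000, 2000),
]

def band_edges(regions):
    """Return one (low_hz, high_hz) tuple per subband."""
    bands = []
    for start, stop, bw in regions:
        for low in range(start, stop, bw):
            bands.append((low, low + bw))
    return bands

bands = band_edges(REGIONS)
print(len(bands))            # 24
print(bands[0], bands[-1])   # (0, 125) (14000, 16000)
```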

     MASCAM Coder (contd.)
Subband o/p seqs. are processed in 2 ms blocks
Subband bit allocation is derived from simplified
psychoacoustic analysis
Reported to achieve high quality results for 15 kHz b.w
i/p signals at bit rates between 80 and 100 kbps per
channel

           MUSICAM Coder
Masking Pattern Adapted Universal Subband Integrated
Coding and Multiplexing (MUSICAM)
Based primarily on MASCAM
MUSICAM coder eventually formed the basis for
MPEG-1 and MPEG-2 audio layers I and II.
Compared to MASCAM, MUSICAM makes several
practical tradeoffs between complexity, delay and
quality
By utilizing a uniform BW, 32 band Pseudo-QMF bank
instead of a tree str. QMF bank, both complexity and
delay are greatly reduced relative to MASCAM
Improved delay and complexity at the expense of
sub-optimal FB, i.e., filter BWs (750 Hz) no longer
correspond to critical band rate
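
The mismatch between the uniform 750 Hz bands and the critical bands can be made concrete with a small calculation. The sketch assumes a 48 kHz sampling rate (which yields the 750 Hz figure for 32 bands) and uses the Zwicker-Terhardt approximation for critical bandwidth; neither is stated on the slide.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker-Terhardt approximation of the critical bandwidth at f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

FS_HZ = 48000      # assumed sampling rate
NUM_BANDS = 32
band_width = FS_HZ / 2 / NUM_BANDS   # 750.0 Hz per subband

for band in (0, 1, 15, 31):
    centre = (band + 0.5) * band_width
    cb = critical_bandwidth_hz(centre)
    print(f"band {band:2d}: centre {centre:7.0f} Hz, "
          f"critical BW {cb:6.0f} Hz, ratio {band_width / cb:.2f}")
```

At low frequencies one subband spans several critical bands, while at high frequencies a subband is narrower than one critical band.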
    MUSICAM Coder (contd.)
Despite these excessive filter BWs, high quality coding
in MUSICAM is due to its enhanced psychoacoustic
analysis
High resolution spectral estimates are obtained using a
1024 point FFT in parallel with the PQMF bank.
Parallel str. allows improved estimation of masking
thresholds and hence more accurate min.
signal-to-mask ratios (SMRs) for each subband
MUSICAM psychoacoustic analysis procedure is
essentially the same as the MPEG-1 psychoacoustic
model

  MUSICAM Coder (contd. 2)
Subband o/p seqs. are processed in 8 ms blocks
Subband samples are quantized according to SMR
requirements for each subband
Bit allocations for each subband are transmitted as side
information
MUSICAM scored 4.6 on a 5 point MOS scale at 128
kb/s and 4.3 at 96 kb/s per channel, compared to 4.6 for
the uncoded original.
MUSICAM was selected for MPEG-1 audio due to
combination of high quality, reasonable complexity and
manageable delay
Also bit error robustness was found to be very good up
to a bit error rate of 10^-3
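
Quantizing according to SMR requirements can be illustrated with the common rule of thumb that each bit of a uniform quantizer buys about 6.02 dB of SNR. This is a simplified sketch, not the MUSICAM or MPEG-1 allocation procedure, and the SMR values are made up.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit for a uniform quantizer

def bits_for_smr(smr_db):
    """Smallest bit count whose SNR meets or exceeds the given SMR."""
    if smr_db <= 0:
        return 0   # signal is below the masking threshold: no bits needed
    return math.ceil(smr_db / DB_PER_BIT)

smr_per_subband = [22.0, 15.5, 6.0, -3.0]   # hypothetical SMRs in dB
allocation = [bits_for_smr(v) for v in smr_per_subband]
print(allocation)   # [4, 3, 1, 0]
```

The resulting per-subband bit counts are the kind of data a decoder needs as side information to dequantize each subband.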
DWT and DWPT based Coders
Previous subband coders utilize fixed resolution BP
QMF or Pseudo-QMF FIR filters
Alternatively, coders can rely on the filter bank
interpretation of the DWT
DWT based coders offer increased flexibility over the
subband coders described earlier
DWT provides identical FB magnitude responses for
many diff. choices of wavelet basis
This flexibility provides an opportunity for basis
optimization

DWT and DWPT Coders (contd.)
First desired FB mag. response can be established
This response might be matched to the auditory FB
Then for each segment of audio, one can adaptively
choose a wavelet basis that minimizes the rate for some
target distortion level
Given a psychoacoustically derived distortion target, the
encoding remains perceptually transparent
In DWT based applications, i/p data is effectively passed
through a tree str. cascade of LP and HP filters followed by
2:1 decimation at every node
Forward/inverse tfm. matrices of a particular wavelet
are associated with a corresponding QMF analysis/
synthesis FB
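
The tree str. cascade can be sketched as follows: at each level the LP branch is filtered, decimated 2:1 and split again, while the HP branch is kept as a detail band. The Haar pair is used only because it is the shortest QMF pair; the wavelet is an assumption, not one named on the slides.

```python
import math

S = math.sqrt(0.5)  # Haar filter coefficient

def split(x):
    """One tree node: LP and HP filtering followed by 2:1 decimation."""
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def merge(low, high):
    """Inverse of split: the matching synthesis stage."""
    out = []
    for l, h in zip(low, high):
        out.extend([S * (l + h), S * (l - h)])
    return out

def dwt(x, levels):
    """Octave-band tree: only the low-pass branch is split again."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = split(approx)
        details.append(d)
    return approx, details

def idwt(approx, details):
    """Rebuild the signal by merging from the deepest level upward."""
    for d in reversed(details):
        approx = merge(approx, d)
    return approx

x = [float(n % 5) for n in range(16)]
approx, details = dwt(x, 3)
lengths = [len(d) for d in details] + [len(approx)]
print(lengths)   # [8, 4, 2, 2]
```

The coefficient counts halve at each level, and `idwt(approx, details)` returns the original samples up to rounding error.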

DWT and DWPT Coders (contd.2)
 The usual octave band FB str using DWT is shown

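    [Figure 8.3: cascade of two-band splits (Q) applied to x, producing subbands
    y1 to y5. Frequency axis with band edges at 1.4, 2.8, 5.5, 11 and 22 kHz;
    y5 is the lowest band and y1 the highest.]
    Figure 8.3. Octave-band subband decomposition associated with a discrete wavelet
    transform (“DWT”).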

 Alternatively, in DWPT representation both the detail
 and approximation coeff. at each stage of the tree are
 decomposed further

  DWT and DWPT Coders (contd. 3)
[Figure 8.4: balanced binary tree of two-band splits (Q) applied to x, producing subbands y1 to y8.
Frequency axis with band edges at 2.8, 5.5, 8.3, 11, 13.8, 16.5, 19.3 and 22 kHz.]
Figure 8.4. Subband decomposition associated with a particular wavelet packet transform (“WPT” or “WP”). Although the picture illustrates
a balanced binary tree and the associated uniform bandwidth subbands, nodes could be pruned in order to achieve nonuniform frequency
subdivisions.

A FB interpretation of wavelets is attractive in audio
coding algs.
Wavelet or Wavelet Packet decomposition can be tree
structured as necessary to decompose i/p audio into a set of
freq. subbands tailored to some application
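
How the tree shape determines the subbands can be sketched without any filtering. The example idealizes each split as halving the band exactly and assumes 44.1 kHz sampling (Nyquist 22.05 kHz), which reproduces the band edges in Figures 8.3 and 8.4. The tree encoding (None for a leaf, a pair for a split) is a hypothetical convention, and the spectral reversal that real high-pass branches introduce is ignored.

```python
def leaf_bands(tree, lo, hi):
    """Return the (lo, hi) frequency range of every leaf of a split tree.

    tree is None (a leaf) or a pair (low_child, high_child).
    """
    if tree is None:
        return [(lo, hi)]
    mid = (lo + hi) / 2
    low_child, high_child = tree
    return leaf_bands(low_child, lo, mid) + leaf_bands(high_child, mid, hi)

def balanced(depth):
    """Full wavelet packet tree: both branches split at every level."""
    return None if depth == 0 else (balanced(depth - 1), balanced(depth - 1))

def octave(depth):
    """DWT tree: only the low-pass branch is split again."""
    return None if depth == 0 else (octave(depth - 1), None)

NYQUIST_KHZ = 22.05   # assumes 44.1 kHz sampling
wp_bands = leaf_bands(balanced(3), 0.0, NYQUIST_KHZ)    # 8 uniform bands
dwt_bands = leaf_bands(octave(4), 0.0, NYQUIST_KHZ)     # 5 octave bands
pruned = leaf_bands(((None, None), None), 0.0, NYQUIST_KHZ)  # nonuniform

for name, bands in (("WP", wp_bands), ("DWT", dwt_bands), ("pruned", pruned)):
    print(name, [f"{a:.1f}-{b:.1f}" for a, b in bands])
```

Pruning nodes of the balanced tree, as in the last case, is what produces nonuniform subbands.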

                      DWPT Coder

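    [Figure 8.5: block diagram. Psychoacoustic analysis supplies the threshold T;
    a dynamic dictionary search produces sd, which is subtracted from s to form the
    residual r. If d(s, sd) ≤ T, the index of sd is transmitted. A wavelet packet
    search/analysis block, also given T, leads to transmission of r or s.]
    Figure 8.5. Dynamic dictionary/optimal wavelet packet encoder (after [Sinh93a]).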

        DWPT Coder (contd.)
DWPT coder with globally adapted Daubechies
analysis wavelet
Variable rate wavelet-based coding scheme providing
nearly transparent coding of CD quality audio at 48-64
kb/s
Exploits signal redundancy using a VQ scheme and
irrelevancy using a WP signal decomposition combined
with masking thresholds
I/P audio is segmented into vectors
Dynamic Dictionary (DD), which is essentially an
adaptive VQ subsystem, then eliminates signal
redundancy

       DWPT Coder (contd.2)
Dictionary of N codewords is searched for the vector
perceptually closest to i/p vector
Codebook is systematically updated with incoming
audio vectors according to a perceptual distortion criterion
After DD procedure is completed, an optimized WP
decomposition is applied to the original signal as well
as the DD residual
Decomposition tree is structured such that its 29 freq
subbands roughly correspond to the CBs of the auditory
FB
Masking th. obtained is assumed to be const. within
each subband and then used to complete a perceptual
bit allocation.
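
The dictionary search and update described on this slide can be sketched as a nearest-codeword search. The weighted squared error stands in for the perceptual distance, and replacing the closest codeword when the distance exceeds the threshold is a hypothetical update rule; neither is taken from the coder itself.

```python
def distance(x, codeword, weights):
    """Stand-in perceptual distance: weighted squared error."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, codeword))

def search(dictionary, x, weights):
    """Return (index, distance) of the codeword closest to x."""
    idx = min(range(len(dictionary)),
              key=lambda i: distance(x, dictionary[i], weights))
    return idx, distance(x, dictionary[idx], weights)

def encode(dictionary, x, weights, threshold):
    """Search, then update the dictionary if the match is too far away."""
    idx, d = search(dictionary, x, weights)
    if d > threshold:
        dictionary[idx] = list(x)   # hypothetical update rule
    return idx, d

dictionary = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
weights = [1.0, 1.0]
idx, d = encode(dictionary, [0.9, 1.2], weights, threshold=0.1)
print(idx, round(d, 3))   # 1 0.05
```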
      DWPT Coder (contd. 3)
Encoder transmits particular combination of DD and
WP infmn. that minimizes bit rate while maintaining
perceptual quality.
Three combinations are possible
   DD index and time-warping factor are transmitted
   alone, if DD residual energy is below masking thr.
   If DD residual has audible noise energy, then WP
   coeffs. of DD residual are also quantized, encoded
   and transmitted
   If WP coefficients corresponding to signal are more
   compact than combination of DD plus WP residual
   infmn, then only quantized and encoded WP
   coefficients of signal are transmitted.
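
The three cases above amount to a small decision rule. The function below is a sketch of that logic only; the argument names and the example bit counts are hypothetical.

```python
def choose_mode(residual_energy, masking_threshold,
                bits_dd, bits_wp_residual, bits_wp_signal):
    """Pick what to transmit for one frame, following the three cases."""
    if residual_energy <= masking_threshold:
        return "dd_only"                 # DD residual is inaudible
    if bits_wp_signal < bits_dd + bits_wp_residual:
        return "wp_signal"               # WP coeffs. of signal are more compact
    return "dd_plus_wp_residual"         # send DD index and coded residual

print(choose_mode(0.2, 0.5, 12, 300, 400))   # dd_only
print(choose_mode(0.9, 0.5, 12, 300, 400))   # dd_plus_wp_residual
print(choose_mode(0.9, 0.5, 12, 300, 250))   # wp_signal
```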
Outperforms MPEG-1 layer II at 64 kbps

       Scalable DWPT Coder
Signal specific perceptual best basis is constructed by
adapting WP tree str. on each frame such that
perceptual entropy and bit rate are minimized
While tree str. is signal adaptive, analysis filters are
time-invariant and obtained from family of spline-based
biorthogonal wavelets.

          [Figure 8.6: block diagram. The input passes through an adaptive WPT, a zerotree
          quantizer and lossless coding, with a perceptual model block above the chain.]
          Figure 8.6. Masking-threshold adapted WP audio coder [Srin98]. On each frame, the WP
          tree structure is adapted in order to minimize a perceptually motivated rate constraint.

            SDWPT (contd.)
Incorporates mechanisms for bit rate and complexity
scaling
Masking thresholds are obtained for all subbands
For any WP tree, associated bit rate is computed using
the min. masking thr. in each subband and sufficient bits
to guarantee quantization noise does not exceed min. thr.
Objective of tree adaptation process is to construct min.
cost subband decomposition by maximizing min.
masking threshold
On each frame the tree adaptation process performs
top-down, iterative growing procedure.
During any iteration, existing subbands are assigned
individual costs based on bit allocation required for
transparent coding.
             SDWPT (contd. 2)
Subdivision occurs only if the associated bit rate
improvement exceeds a threshold.
Tree adaptation is also constrained by complexity
scaling mechanism.
Top-down tree growth is halted by the complexity
scaling constraint λ when estimated total cost of
computing the DWPT reaches predetermined limit.
Complexity constrained tree adaptation procedure is
shown to yield basis requiring the fewest bits for
perceptually transparent coding for given complexity
and temporal resolution
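
The top-down growth with a split threshold and a complexity limit can be sketched as a greedy loop. Everything specific here is an assumption for illustration: the cost model (bits so that noise stays under the minimum masking threshold in a band), the complexity charge (one unit per bin filtered) and the toy power and threshold values.

```python
import math

def band_cost(power, thr, lo, hi):
    """Bits to keep quantization noise under the min. threshold in bins lo..hi-1."""
    ratio = max(power[lo:hi]) / min(thr[lo:hi])
    return (hi - lo) * max(0.0, 0.5 * math.log2(ratio))

def grow_tree(power, thr, min_gain, complexity_limit):
    """Greedy top-down growth; returns the leaf bands as (lo, hi) bin ranges."""
    leaves, complexity = [(0, len(power))], 0
    while True:
        best = None
        for lo, hi in leaves:
            if hi - lo < 2:
                continue
            mid = (lo + hi) // 2
            gain = band_cost(power, thr, lo, hi) - (
                band_cost(power, thr, lo, mid) + band_cost(power, thr, mid, hi))
            if best is None or gain > best[0]:
                best = (gain, lo, hi, mid)
        if best is None or best[0] <= min_gain:
            break                        # no split saves enough bits
        gain, lo, hi, mid = best
        if complexity + (hi - lo) > complexity_limit:
            break                        # complexity budget exhausted
        complexity += hi - lo
        leaves.remove((lo, hi))
        leaves += [(lo, mid), (mid, hi)]
    return sorted(leaves)

power = [16.0] * 8
thr = [1.0, 1.0, 1.0, 1.0, 4.0, 4.0, 16.0, 16.0]
print(grow_tree(power, thr, min_gain=1.0, complexity_limit=12))
# [(0, 4), (4, 6), (6, 8)]
print(grow_tree(power, thr, min_gain=1.0, complexity_limit=10))
# [(0, 4), (4, 8)]
```

Splitting raises the minimum masking threshold seen by each child band, which is why finer subdivision lowers the bit cost where the threshold varies across frequency.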

             SDWPT (contd. 3)
After WP tree adaptation procedure has been
completed, the zerotree alg. is applied iteratively to
quantize coefficients and exploit remaining temporal
correlation until perceptual rate-distortion criteria are
satisfied
Complete bitstream consists of encoded tree str., the
no. of zerotree iterations, and a block of zerotree
encoded data.
These elements are encoded in lossless fashion to
remove any remaining redundancies and transmitted to
the decoder
Found to provide perceptual transparency at data rates
of 45 kb/s

DWPT Coder with Globally Adapted Wavelet

   Principle of SDWPT coder is adapting WP tree with
   time-invariant analysis wavelet
   Principle of this coder is time-invariant WP tree with
   globally adapted analysis wavelet
   Objective is to demonstrate that there exists a
   signal-specific best wavelet basis in terms of perceptual
   coding gain

      Basis for Hybrid Coders
WP coders provide good compression, but perform
poorly for harmonic signals.
This is because filters employed in WP decomposition are
characterized by poor freq. selectivity and hence do not
give a compact representation of sinusoidal signals
WP decomposition provides some control over time
resolution properties, leading to efficient
representations of transient signals.
This observation led to the development of several
hybrid coders

Hybrid WP and Adapted WP/Sinusoidal Algs

   Based on hybrid WP/sinusoidal signal analysis
   Often improve coder robustness.
   Wavelet portion better suited to transient like
   signals
   Sinusoidal portion better suited for tonal or steady-state
   signals
   Several signal-adaptive wavelet and WP subband
   coding systems discussed earlier were extended into hybrid
   coders
   Hybrid coding adapts the analysis properties of the
   coding alg. to the signal content.

Hybrid Sinusoidal/DWPT coder
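        [Figure 8.9: block diagram. Blocks: masking model, sinusoidal analysis, sinusoidal
        synthesis, a summing node (Σ) subtracting the synthesized signal from the input s(n),
        analysis and transient blocks after the summing node, encoder, side info. encoding,
        and bit packing.]
        Figure 8.9. Hybrid sinusoidal/wavelet encoder (after [Hamd96]).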

Hybrid Sinusoidal/DWPT Coder (contd.)
Exploit the efficiencies of both harmonic and wavelet
signal repns.
For each frame, coder chooses a compact signal repn.
from combined sinusoidal and wavelet basis.
Based on the notion that audio frames can be
decomposed into tonal, transient and noise
components
Assumes that tonal components are most compactly
represented by sinusoidal basis fns.
Transient and noise components are most efficiently
represented in terms of wavelet bases.

        Hybrid Coder (contd.)
First the audio frame is analyzed to extract sinusoidal
parameters (amp., freq., phase)
Audio frame is synthesized using sinusoidal parameters
Residual between original and synthesized version is
computed
Residual is decomposed into WP subbands
Hybrid harmonic-wavelet coder was reported to achieve
nearly transparent coding over a wide range of CD
quality source material at bit rates in the vicinity of 44
kb/s
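
The analyze, synthesize and subtract steps can be sketched on a toy frame holding one tone plus a click. The sketch extracts only the single strongest sinusoid, and it assumes that sinusoid lies exactly on a DFT bin; a real coder would estimate many partials. All names and signal values are hypothetical.

```python
import cmath
import math

def dft_bin(x, k):
    """k-th DFT coefficient of x (direct evaluation)."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))

def strongest_sinusoid(x):
    """Return (amp, freq in cycles/sample, phase) of the largest DFT peak."""
    n = len(x)
    k = max(range(1, n // 2), key=lambda b: abs(dft_bin(x, b)))
    c = dft_bin(x, k)
    return 2 * abs(c) / n, k / n, cmath.phase(c)

def synthesize(amp, freq, phase, n):
    """Rebuild the frame from the sinusoidal parameters."""
    return [amp * math.cos(2 * math.pi * freq * t + phase) for t in range(n)]

N = 64
tone = [0.8 * math.cos(2 * math.pi * 5 * t / N + 0.3) for t in range(N)]
click = [0.5 if t == 20 else 0.0 for t in range(N)]   # transient component
x = [a + b for a, b in zip(tone, click)]

amp, freq, phase = strongest_sinusoid(x)
residual = [a - b for a, b in zip(x, synthesize(amp, freq, phase, N))]
print(f"amp {amp:.2f}, freq {freq * N:.0f}/{N}, phase {phase:.2f}")
print("largest residual sample at t =",
      max(range(N), key=lambda t: abs(residual[t])))
```

The residual is dominated by the click, which is the kind of transient content that the WP subbands would then represent.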
