ADSP-10-AC-Psychoacoustics-EC623-ADSP

					                                          www.jntuworld.com
Audio Coding
Psychoacoustics
   S. R. M. Prasanna


     Dept of ECE,
     IIT Guwahati,
 prasanna@iitg.ernet.in




                          Audio Coding – p. 1/4
                  Motivation
Acoustics: Study of sounds
Psychoacoustics: Study of perception of sounds
Deals with characterizing human auditory perception
Particularly time-frequency analysis capabilities of inner
ear
Audio coders achieve significant compression by
exploiting the property that perceptually irrelevant
information cannot be heard
Perceptually irrelevant information is identified by
incorporating several psychoacoustic principles




 Human Speech Perception




Figure 1: Cross Section of Human Ear
      Functions of Human Ear
Mainly three regions - outer ear, middle ear & inner ear
Outer ear - directs speech pressure variations towards
the middle ear
Middle ear - transforms pressure variations into
mechanical motion
Inner ear - converts mechanical vibrations into electrical
firings in the auditory neurons, which lead to the brain
Language decoding and message understanding take place at the
higher centers of the brain; this stage is less
understood




            Inner Ear




Figure 2: Figures Related to Inner Ear
Frequency to Place Transformation
 Sound waves to mechanical vibrations by middle ear
 Mechanical vibrations to traveling waves by inner ear
 along the length of basilar membrane
 Neural receptors are connected along the length of the
 basilar membrane
 Traveling waves generate peak responses at frequency
 specific membrane positions
 Therefore different neural receptors are effectively
 tuned to different frequency bands according to their
 locations.




  Freq. to Place Tfmn. (contd.)
For sinusoidal stimuli, the peak response occurs near
the basilar membrane region with a resonant freq.
equal to input sinusoid freq.
Location of peak is characteristic place for the stimulus
Freq. that best excites a particular place is
characteristic frequency
Thus a frequency to place transformation takes place




  Signal Processing Perspective
Bank of highly overlapping band pass filters
Magnitude responses are asymmetric
Bandwidths increase with frequency




   Sound Pressure Level (SPL)
A std. metric that quantifies the intensity of an
acoustical stimulus
SPL gives the level (intensity) of sound pressure in dBs
relative to an internationally defined ref. level
$L_{SPL} = 20 \log_{10}(p/p_0)$ (dB)
where $L_{SPL}$ is the SPL of a stimulus, $p$ is its
sound pressure in pascals and $p_0$ is the std. ref. level of
20 µPa
About 150 dB SPL spans the dynamic range of intensity
for human auditory system
Min value is the limit of detection for low intensity (quiet)
stimuli
Max value is the threshold of pain for high intensity
(loud) stimuli
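The SPL definition above can be sketched in a few lines of Python. The function names `spl_db` and `pressure_from_spl` are illustrative and not from the slides; only the 20 µPa reference level comes from the text.

```python
import math

P0 = 20e-6  # standard reference pressure from the slide: 20 micropascals


def spl_db(p_pascal: float) -> float:
    """Sound pressure level in dB SPL for a sound pressure given in pascals."""
    return 20.0 * math.log10(p_pascal / P0)


def pressure_from_spl(level_db: float) -> float:
    """Inverse mapping: sound pressure in pascals for a level in dB SPL."""
    return P0 * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # A pressure equal to the reference level gives 0 dB SPL.
    print(spl_db(20e-6))
    # A pressure of 1 Pa is roughly 94 dB SPL.
    print(round(spl_db(1.0), 2))
```

Because the scale is logarithmic, every factor of 10 in pressure adds 20 dB, which is how a very wide pressure range fits into about 150 dB.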
Absolute Threshold for Hearing (ATH)
   Amount of energy needed in a pure tone such that it can
   be detected by a listener in a noiseless environment
   ATH is expressed in dB SPL
   ATH is frequency dependent parameter and is given by
   $T_q(f) = 3.64 (f/1000)^{-0.8} - 6.5 e^{-0.6 (f/1000 - 3.3)^2} + 10^{-3} (f/1000)^4$ (dB SPL)
   In the context of signal compression, Tq (f ) could be
   interpreted naively as a maximum allowable energy
   level for coding distortions introduced in the frequency
   domain (Fig 5.1 from Spanias book)
   Use of ATH to shape the coding distortion spectrum
   represents the first step towards perceptual coding.
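As a rough illustration, the ATH expression can be evaluated directly. This is a minimal sketch: the function name `ath_db_spl` is an assumption made here, and the formula is the one given on this slide with f in Hz.

```python
import math


def ath_db_spl(f_hz: float) -> float:
    """Absolute threshold of hearing T_q(f) in dB SPL, with f in Hz (f > 0)."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


if __name__ == "__main__":
    # The threshold is high at low frequencies, dips in the 3 to 4 kHz region,
    # and rises again toward high frequencies.
    for f in (100, 1000, 3300, 10000, 16000):
        print(f, round(ath_db_spl(f), 2))
```

A coder could, in the naive interpretation mentioned above, treat these values as a per-frequency ceiling for coding distortion.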
          ATH Diagram




Figure 3: Absolute Threshold for Hearing
           Critical Bands (CB)
Critical band is a function of frequency that quantifies
the cochlear filter passbands
CB tends to remain constant (about 100 Hz) up to 500
Hz and increases to approximately 20% of the center
frequency above 500 Hz
For an average listener the critical bandwidth is given
by $BW_c(f) = 25 + 75 [1 + 1.4 (f/1000)^2]^{0.69}$ (Hz)
The function
$Z_b(f) = 13 \tan^{-1}(0.00076 f) + 3.5 \tan^{-1}((f/7500)^2)$ (Bark)
is often used to convert frequency in Hz to Bark scale
Nonuniform Hz spacing of the filter bank is actually
uniform on a Bark scale
One critical band (CB) comprises one Bark. (Table 5.1
and Fig. 5.4)
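The two expressions on this slide can be checked numerically with a short sketch. The function names are illustrative. The bandwidth formula is written here with f/1000 (frequency in kHz), the commonly cited form, which gives about 100 Hz at low frequencies as the slide states.

```python
import math


def critical_bandwidth_hz(f_hz: float) -> float:
    """Critical bandwidth BW_c(f) in Hz for a center frequency f in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def hz_to_bark(f_hz: float) -> float:
    """Map frequency in Hz to the Bark scale, Z_b(f)."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f, round(critical_bandwidth_hz(f), 1), round(hz_to_bark(f), 2))
```

Printing both columns side by side shows the bandwidth staying near 100 Hz at low frequencies and growing with frequency, while the Bark value increases monotonically.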
         Critical Bands




Figure 4: Table Showing Critical Bands
  Mapping from Hz to Bark




Figure 5: Mapping from Hz to Bark Scale
       Simultaneous Masking
Masking: One sound is rendered inaudible because of
the presence of another sound
Simultaneous masking: When two or more stimuli are
simultaneously presented to the auditory system
Freq. Domain: Relative shapes of the masker and
maskee magnitude spectra determine to what extent
presence of certain spectral energy will mask the
presence of other spectral energy
Time Domain: Phase relationships between stimuli can
also affect masking outcomes
In simple words, the presence of a strong noise or tone
masker creates an excitation of sufficient strength on
the basilar membrane at the critical band location to
effectively block detection of a weaker (maskee) signal.
Types of Simultaneous Masking
Noise-Masking-Tone (NMT), Tone-Masking-Noise
(TMN) and Noise-Masking-Noise (NMN)
NMT:
  A NB noise (1 Bark) masks a tone within the same
  CB, provided intensity of masked tone is below a
  predictable threshold
  Signal-to-Mask Ratio (SMR) (dB) is the difference
  between the intensities of masker and maskee
  Min. SMR at the threshold of detection occurs when
  maskee freq is close to center freq of masker and
  will be about 5 dB




            TMN and NMN
TMN:
  Pure tone at the center of a CB masks noise of any
  subcritical BW, provided noise spectrum is below a
  predictable threshold
  Min SMR lies between 21 and 28 dB
NMN:
  A NB noise masks another NB noise
  Min SMR is about 26 dB




 Masking Schemes




Figure 6: Masking schemes
       Asymmetry of Masking
The NMT and TMN show asymmetry in masking power
between noise masker and tone masker
Even with both maskers at the same dB SPL, the associated
threshold SMRs differ by about 20 dB
Hence the interest in all types of masking
Knowledge of all three is critical to succeed in the task
of shaping coding distortion
For each temporal analysis interval, a codec’s
perceptual model should identify across the freq
spectrum noise-like and tone-like components within
both the audio signal and the coding distortion
Model should then apply appropriate masking
relationships to obtain global masking threshold

           Spread of Masking
Simultaneous masking is not bandlimited to within the
boundaries of a single CB
Interband masking also occurs, i.e., a masker centered
within one critical band has some predictable effect on
detection thresholds in other CBs.
This effect is known as spread of masking
Spread of masking is often modeled by a triangular spreading
function that has slopes of +25 and -10 dB per Bark.
$SF_{dB}(x) = 15.81 + 7.5 (x + 0.474) - 17.5 \sqrt{1 + (x + 0.474)^2}$ (dB)
where $x$ is in Barks and $SF_{dB}(x)$ is expressed in dB.
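The slopes quoted above can be verified numerically from the analytical expression. This is a small sketch assuming the square-root form of the spreading function; the name `spreading_db` is illustrative.

```python
import math


def spreading_db(x_bark: float) -> float:
    """Spreading function SF_dB(x) in dB, with x the Bark separation."""
    y = x_bark + 0.474
    return 15.81 + 7.5 * y - 17.5 * math.sqrt(1.0 + y * y)


if __name__ == "__main__":
    # Close to 0 dB at zero separation.
    print(round(spreading_db(0.0), 3))
    # Far from the peak, the slope approaches -10 dB/Bark on one side
    # and +25 dB/Bark on the other.
    print(round(spreading_db(10.0) - spreading_db(9.0), 2))
    print(round(spreading_db(-9.0) - spreading_db(-10.0), 2))
```

The asymmetry of the two slopes is what makes a masker affect bands above it more than bands below it.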



Just Noticeable Distortion (JND)
Global masking threshold comprises an estimate of the
level at which quantization noise becomes just
noticeable
Hence global masking threshold is sometimes referred
to as JND




    Nonsimultaneous Masking
Also termed temporal masking
Masking phenomenon extends beyond window of
simultaneous stimulus presentation
Masking occurs both prior to masker onset and also
after masker removal
Forward (post) and backward (pre) masking are the two types








                    Figure 7: Temporal Masking




          Perceptual Entropy
Entropy gives min. no. of bits/sample required to store
or transmit given message block
Johnston combined the notion of psychoacoustic masking
with signal quantization principles to define Perceptual
Entropy (PE).
Perceptual Entropy gives min. no. of bits/sample
required to store or transmit perceptually relevant
information in given audio message block.
While discussing PE, conventional entropy is termed as
statistical entropy.
Statistical entropy employs the statistical properties of
the signal for computing entropy
Perceptual entropy employs both statistical and
perceptual properties of signal for computing entropy.
                Basis for PE
Masking threshold indicates the amount of quantization noise that can be
introduced in the freq. dom. without perceptually corrupting the signal.
Assume that step size and no. of levels in the quantizer
for each spectral line could be set independently.
Further choice of step size is such that total noise
injected at each frequency corresponds to masking
threshold i.e., min no of quantization levels are used.
Then no. of bits required to encode entire transform
represents min. no. of bits necessary to transmit that
block of the signal.
The total number of bits divided by the no. of samples in
the transform represents per-sample rate.
This per-sample bit rate is Perceptual Entropy of signal.

                   PE v/s SE
Statistical entropy (SE) exploits signal statistics
Perceptual entropy (PE) exploits signal statistics and
also psychoacoustic masking
PE uses just enough quantization levels to avoid perceptual
distortion due to quantization, by exploiting masking
thresholds.




    Steps for PE Computation
DFT computation
Finding Masking thresholds
Calculating no. of bits to quantize DFT spectrum




           DFT Computation
Windowing and frequency transformation
2048 sample DFT by FFT
1024 are considered for further analysis




Calculation of Masking Threshold
Critical band analysis
Applying spreading function to critical band spectrum
Calculating Masking Thresholds
Accounting for absolute thresholds
Relating spread masking threshold to critical band
masking threshold




        Critical Band Analysis
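DFT spectrum is complex: $S(\omega) = Re(\omega) + j\,Im(\omega)$
Power Spectrum: $P(\omega) = Re^2(\omega) + Im^2(\omega)$
$P(\omega)$ is partitioned into CBs
Energy in each CB: $B_i = \sum_{\omega = bl_i}^{bh_i} P(\omega)$, where $bl_i$ and $bh_i$ are the lower and upper boundaries of CB $i$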
Bi represents CB spectrum
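The critical band analysis step can be sketched as follows. Two assumptions are made here and are not from the slides: band boundaries are taken as the integer part of the Bark value of each bin (instead of a tabulated band-edge list), and a naive O(N^2) DFT is used so the example needs only the standard library.

```python
import cmath
import math


def hz_to_bark(f_hz: float) -> float:
    """Hz to Bark mapping Z_b(f) from the critical bands slide."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def power_spectrum(x):
    """P(w) = Re^2 + Im^2 for bins 0 .. N/2 - 1 of a naive DFT."""
    n = len(x)
    spec = []
    for k in range(n // 2):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(s.real ** 2 + s.imag ** 2)
    return spec


def critical_band_energies(power, fs_hz):
    """B_i: sum of P(w) over the bins that fall in critical band i.

    Assumption: band index = integer part of the bin's Bark value.
    """
    n_fft = 2 * len(power)
    bands = {}
    for k, p in enumerate(power):
        band = int(hz_to_bark(k * fs_hz / n_fft))
        bands[band] = bands.get(band, 0.0) + p
    return [bands.get(i, 0.0) for i in range(max(bands) + 1)]


if __name__ == "__main__":
    fs = 8000
    x = [math.cos(2 * math.pi * 1000 * t / fs) for t in range(64)]
    b = critical_band_energies(power_spectrum(x), fs)
    print(b.index(max(b)))  # band that holds the 1 kHz tone
```

In practice an FFT and a fixed band-edge table would replace the naive DFT and the Bark rounding used in this sketch.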




      Spreading Function (SF)
CB spectrum threshold is also influenced by adjacent
CBs which is accounted using SF.
SF is used to estimate effects of masking across CBs
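SF is calculated for $|j - i| \le 25$, where $i$ is the Bark freq.
of the masked signal and $j$ is the Bark freq. of the masking signal, and placed into
a matrix $S_{ij}$
Spread CB Spectrum: $C_i = \sum_j S_{ij} B_j$ (the matrix $S_{ij}$ applied to the CB spectrum)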
Effect of spreading function is to spread peaks in Bi and
also raise threshold values, especially at higher
frequencies.
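A minimal sketch of the spreading step, assuming the matrix entries are the spreading function evaluated at the Bark distance i - j and converted from dB to power. The analytical spreading function is the square-root form from the spread of masking slide; function names are illustrative.

```python
import math


def spreading_db(x_bark: float) -> float:
    """Spreading function in dB for a Bark separation x."""
    y = x_bark + 0.474
    return 15.81 + 7.5 * y - 17.5 * math.sqrt(1.0 + y * y)


def spread_cb_spectrum(b):
    """C_i = sum_j S_ij * B_j, with S_ij = 10^(SF_dB(i - j) / 10)."""
    n = len(b)
    c = []
    for i in range(n):              # i: Bark index of the masked band
        total = 0.0
        for j in range(n):          # j: Bark index of the masking band
            if abs(j - i) <= 25:
                s_ij = 10.0 ** (spreading_db(i - j) / 10.0)
                total += s_ij * b[j]
        c.append(total)
    return c


if __name__ == "__main__":
    b = [0.0] * 10
    b[4] = 1.0                      # single energy peak in band 4
    print([round(v, 3) for v in spread_cb_spectrum(b)])
```

With a single peak as input, the output shows the peak smeared into neighboring bands, more strongly toward higher band indices.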




         Masking Thresholds
TMN is estimated as $14.5 + i$ dB below $C_i$, where $i$ is Bark
freq.
NMT is estimated as 5.5 dB below Ci uniformly across
CB spectrum




Tone Like and Noise Like Components
  Spect. Flatness Measure: $SFM = GM/AM$
  GM is the geometric mean of $P(\omega)$ and AM is the arithmetic mean
  of $P(\omega)$
  $SFM_{dB} = 10 \log_{10}(GM/AM)$
  Coeff. of tonality: $\alpha = \min(SFM_{dB}/SFM_{dB\,max}, 1)$
  $SFM_{dB\,max} = -60$ dB is used to estimate tonality
  $SFM_{dB} = 0$ dB indicates a completely noise-like signal ($\alpha = 0$)
  $SFM_{dB} = -30$ dB indicates $\alpha = 0.5$
  $SFM_{dB} = -75$ dB indicates $\alpha = 1.0$
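The tonality estimate can be sketched directly from these definitions. Function names are illustrative, and the sketch assumes all power values are strictly positive so that the geometric mean is defined.

```python
import math

SFM_DB_MAX = -60.0  # value given on the slide


def sfm_db(power):
    """Spectral flatness in dB: 10*log10(GM/AM) of strictly positive values."""
    n = len(power)
    gm = math.exp(sum(math.log(p) for p in power) / n)
    am = sum(power) / n
    return 10.0 * math.log10(gm / am)


def tonality_from_sfm(sfm_db_value: float) -> float:
    """Coefficient of tonality alpha = min(SFM_dB / SFM_dBmax, 1)."""
    return min(sfm_db_value / SFM_DB_MAX, 1.0)


if __name__ == "__main__":
    flat = [1.0] * 8                  # noise-like: flat spectrum
    peaky = [1e6] + [1e-6] * 7        # tone-like: one dominant line
    print(tonality_from_sfm(sfm_db(flat)))
    print(tonality_from_sfm(sfm_db(peaky)))
    print(tonality_from_sfm(-30.0))
```

A flat spectrum gives GM equal to AM and hence a tonality of zero, while a spectrum dominated by one line drives the flatness far below -60 dB and saturates the coefficient at one.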




    Offset for Masking Energy
$O_i = \alpha (14.5 + i) + (1 - \alpha) 5.5$ (dB), in each band $i$
Index $\alpha$ is used to geometrically weight the two
thresholds
$O_i$ is then subtracted from $C_i$ to yield the spread threshold
estimate $T_i = 10^{\log_{10}(C_i) - O_i/10}$
Since spectrum spread fns. do not have normalized
gain, it is normalized by the DC gain for each CB
After normalization, bark thresholds are compared to
absolute thresholds.
Any CB that has bark threshold lower than absolute
threshold is changed to the absolute threshold
This will be the threshold used for computing bit rate.
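The offset and threshold steps on this slide can be sketched as below. Function names are illustrative. The DC-gain normalization is deliberately left out because the slides do not spell out its form, and the absolute thresholds are assumed to be supplied in the same power-domain units as the spread thresholds.

```python
import math


def masking_offset_db(alpha: float, i: int) -> float:
    """O_i = alpha*(14.5 + i) + (1 - alpha)*5.5, in dB, for Bark band i."""
    return alpha * (14.5 + i) + (1.0 - alpha) * 5.5


def spread_threshold(c_i: float, alpha: float, i: int) -> float:
    """T_i = 10^(log10(C_i) - O_i/10): C_i lowered by O_i dB (power domain)."""
    return 10.0 ** (math.log10(c_i) - masking_offset_db(alpha, i) / 10.0)


def apply_absolute_threshold(thresholds, absolute):
    """Raise any band threshold that lies below the absolute threshold."""
    return [max(t, q) for t, q in zip(thresholds, absolute)]


if __name__ == "__main__":
    # Noise-like band (alpha = 0): threshold sits 5.5 dB below C_i.
    print(round(spread_threshold(100.0, 0.0, 3), 3))
    # Tone-like band (alpha = 1): threshold sits 14.5 + i dB below C_i.
    print(round(spread_threshold(100.0, 1.0, 3), 3))
    print(apply_absolute_threshold([1.0, 5.0], [2.0, 3.0]))
```

The weighting by alpha means tone-like bands get a much lower (stricter) threshold than noise-like bands, which reflects the asymmetry of masking discussed earlier.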

        Calculation of Bit Rate
No. of quantization levels to follow signal in freq domain
$T_i$ is in the power domain
Quantization energy must be spread across $k_i$ spectral
lines in each CB
Assuming noise to spread equally across the entire
band, noise energy of a quantizer with step size $\delta$ will be $\delta^2/12$
Energy at each spectral freq $= T_i/k_i$
Real and imaginary parts are quantized independently, so the energy
per component $= T_i/(2 k_i)$
$\delta^2/12 = T_i/(2 k_i) \implies \delta = T_i' = \sqrt{6 T_i / k_i}$
$T_i'$ is the step size.


               Computing PE
$N_{Re}(\omega) = |nint(Re(\omega)/T_i')|$ and
$N_{Im}(\omega) = |nint(Im(\omega)/T_i')|$ for each $\omega$ within CB $i$, where nint is the nearest-integer function.
Let $N_*$ represent the actual (integer) quantized value of
each line
If $N_{(Re\ or\ Im)}(\omega) = 0$, then $N'_{(Re\ or\ Im)}(\omega) = 0$
If $N_{(Re\ or\ Im)}(\omega) \ne 0$, then $N'_{(Re\ or\ Im)}(\omega) = \log_2(2 N_*(\omega) + 1)$
This operation assigns a bit rate of zero bits to any
signal with an amplitude that does not need to be
quantized and assigns a bit rate of $\log_2$(no. of levels) to
those that must be quantized.
Total bit rate $= \sum_{\omega=0}^{\pi} (N'_{Re}(\omega) + N'_{Im}(\omega))$
Rate per sample, $PE = $ Total bit rate $/ 2048$
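The step size and bit count can be sketched together as follows. The data layout (a list of threshold and spectral-line pairs per critical band) and all function names are assumptions made for illustration. Python's built-in `round` stands in for nint; it rounds exact halves to the nearest even integer.

```python
import math


def step_size(t_i: float, k_i: int) -> float:
    """Quantizer step T_i' = sqrt(6 * T_i / k_i) for a band with k_i lines."""
    return math.sqrt(6.0 * t_i / k_i)


def line_bits(value: float, step: float) -> float:
    """Bits for one real or imaginary component: log2(2N + 1), or 0 if N = 0."""
    n = abs(round(value / step))
    return math.log2(2 * n + 1) if n != 0 else 0.0


def perceptual_entropy(bands, n_samples: int = 2048) -> float:
    """PE in bits/sample.

    bands: list of (T_i, lines) pairs, where lines holds the complex DFT
    values that fall in critical band i (hypothetical layout).
    """
    total = 0.0
    for t_i, lines in bands:
        step = step_size(t_i, len(lines))
        for s in lines:
            total += line_bits(s.real, step) + line_bits(s.imag, step)
    return total / n_samples


if __name__ == "__main__":
    # One band, one line: step = 6, real part quantizes to N = 3.
    demo = [(6.0, [complex(18.0, 0.0)])]
    print(perceptual_entropy(demo, n_samples=1))  # log2(7) bits
```

Lines whose real and imaginary parts both fall below half a step cost zero bits, which is how masked components drop out of the total.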
Example codec perceptual model
ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model-1
Determines max. allowable quantization noise energy
in each CB such that it remains inaudible.
Blocking i/p audio into frames
High resolution spectral computation for each frame
For each frame tonal and noise maskers estimation
Decimation and reorganization of maskers
Calculation of individual masking thresholds for
components in each CB
Calculation of global masking thresholds for each CB



                             Spectral Analysis
            512 point DFT computation
            Power Spectral Density (PSD) P (k) estimation, where
            k = 1, 2, . . . , 512

[Figure: power spectral density; x-axis: Frequency (Hz), y-axis: SPL (dB)]


Identn. of Tonal and Noise Maskers
 $P(k)$, where $k = 1, 2, \ldots, 256$, are considered
 Local maxima in the PSD that exceed neighboring components within a
 certain Bark distance by at least 7 dB are classified as tonal
 Tonal set $S_T$ is defined as

   $S_T = \{ P(k) \mid P(k) > P(k \pm 1),\ P(k) > P(k \pm \Delta_k) + 7\ \mathrm{dB} \}$

 where

   $\Delta_k \in 2$ for $2 < k < 63$ (0.17 - 5.5 kHz)
   $\Delta_k \in [2, 3]$ for $63 \le k < 127$ (5.5 - 11 kHz)
   $\Delta_k \in [2, 6]$ for $127 \le k \le 256$ (11 - 20 kHz)



Tonal and Noise Maskers (contd.)
Tonal maskers $P_{TM}(k)$ are computed from spectral
peaks listed in $S_T$:

   $P_{TM}(k) = 10 \log_{10} \sum_{j=-1}^{1} 10^{0.1 P(k+j)}$ (dB)

For each neighborhood max, energy from three
adjacent spectral components is combined to form a single tonal masker
For each CB, a single noise masker $P_{NM}(\bar{k})$ is then computed
from (remaining) spectral lines not within the $\pm\Delta_k$
neighborhood of a tonal masker using the sum

   $P_{NM}(\bar{k}) = 10 \log_{10} \sum_j 10^{0.1 P(j)}$ (dB), $\forall P(j) \notin \{ P_{TM}(k, k \pm 1, k \pm \Delta_k) \}$

where $\bar{k}$ is the geometric mean spectral line of the CB
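The tonal masker search can be sketched as below. This is an illustration under stated assumptions, not the standard's reference code: the PSD is a Python list `p_db` in dB indexed directly by bin k, the interval for the neighborhood offsets is read as all integers in the stated range, and peaks too close to the end of the list are skipped. The noise masker computation is not covered.

```python
import math


def delta_range(k: int):
    """Neighborhood offsets for bin k, following the three frequency ranges."""
    if 2 < k < 63:
        return [2]
    if 63 <= k < 127:
        return [2, 3]
    if 127 <= k <= 256:
        return [2, 3, 4, 5, 6]
    return []


def tonal_set(p_db):
    """Bins k whose PSD value is a local max exceeding P(k +/- d) by 7 dB."""
    tonal = []
    for k in range(3, min(257, len(p_db))):
        deltas = delta_range(k)
        if k + max(deltas) >= len(p_db):
            continue  # not enough neighbors on the right (assumption: skip)
        if not (p_db[k] > p_db[k - 1] and p_db[k] > p_db[k + 1]):
            continue
        if all(p_db[k] > p_db[k + d] + 7.0 and p_db[k] > p_db[k - d] + 7.0
               for d in deltas):
            tonal.append(k)
    return tonal


def tonal_masker_db(p_db, k: int) -> float:
    """P_TM(k): power sum of bins k-1, k, k+1, expressed in dB."""
    return 10.0 * math.log10(sum(10.0 ** (0.1 * p_db[k + j])
                                 for j in (-1, 0, 1)))


if __name__ == "__main__":
    p = [0.0] * 300
    p[39], p[40], p[41] = 20.0, 30.0, 20.0   # narrow peak at bin 40
    print(tonal_set(p))
    print(round(tonal_masker_db(p, 40), 2))
```

A broad spectral hump fails the 7 dB test against its near neighbors and is therefore left for the noise masker computation.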
       Decimation of Maskers
No. of maskers are reduced using two criteria
First, any tonal or noise maskers below abs. threshold
are discarded, i.e., PT M,N M (k) ≥ Tq (k) are retained.
Next, a sliding 0.5 Bark-wide window is used to replace
any pair of maskers occurring within a distance of 0.5
Bark by the stronger of the two.
Masker freq. bins are reorganized using the decimation
scheme

   $P_{TM,NM}(i) = P_{TM,NM}(k)$
   $P_{TM,NM}(k) = 0$




          Decimation (contd.)

$i = k$ for $1 \le k \le 48$
$i = k + (k \bmod 2)$ for $49 \le k \le 96$
$i = k + 3 - ((k - 1) \bmod 4)$ for $97 \le k \le 232$

Net effect is 2:1 decimation of masker bins in CBs
18-22 and 4:1 decimation of masker bins in CBs 22-25,
with no loss of masking components.
Decimation reduces total no. of tone and noise masker
freq. bins under consideration from 256 to 106
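The bin mapping above is easy to check by counting distinct output bins. In this sketch the function names and the dictionary layout for maskers are illustrative; keeping the stronger masker when two land on the same bin is an assumption made here, not something the slides specify.

```python
def decimated_index(k: int) -> int:
    """Map masker bin k (1..232) to its decimated bin index i."""
    if 1 <= k <= 48:
        return k
    if 49 <= k <= 96:
        return k + (k % 2)
    if 97 <= k <= 232:
        return k + 3 - ((k - 1) % 4)
    raise ValueError("k must lie in 1..232")


def decimate(maskers):
    """Re-index maskers given as {bin k: level in dB}.

    Assumption: if two maskers map to the same bin, the stronger is kept.
    """
    out = {}
    for k, level in maskers.items():
        i = decimated_index(k)
        out[i] = max(level, out.get(i, float("-inf")))
    return out


if __name__ == "__main__":
    bins = {decimated_index(k) for k in range(1, 233)}
    print(len(bins))                 # number of bins left after decimation
    print(decimate({49: 40.0, 97: 35.0}))
```

Counting the distinct indices reproduces the figure of 106 bins: 48 unchanged, 24 from the 2:1 range, and 34 from the 4:1 range.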



Individual Masking Thresholds
Using decimated set of tonal and noise maskers,
individual tone and noise masking thresholds are
computed
Each individual threshold represents a masking
contribution at freq. bin i due to the tone or noise
masker located at bin j
Tonal Masking Threshold, $T_{TM}(i, j)$, is given by
$T_{TM}(i, j) = P_{TM}(j) - 0.275 Z_b(j) + SF(i, j) - 6.025$ (dB SPL)
where $P_{TM}(j)$ is the SPL of the tonal masker in freq. bin $j$,
$Z_b(j)$ is the Bark freq. of bin $j$ and $SF(i, j)$ is the spread of
masking from bin $j$ to bin $i$
Noise Masking Threshold, $T_{NM}(i, j)$, is given by
$T_{NM}(i, j) = P_{NM}(j) - 0.175 Z_b(j) + SF(i, j) - 2.025$ (dB SPL)
where $P_{NM}(j)$ is the SPL of the noise masker in freq. bin $j$
   Global Masking Thresholds
Individual masking thresholds are combined to estimate
a global masking threshold for each freq. bin
$T_g(i) = 10 \log_{10} \left( 10^{0.1 T_q(i)} + \sum_{l=1}^{L} 10^{0.1 T_{TM}(i,l)} + \sum_{m=1}^{M} 10^{0.1 T_{NM}(i,m)} \right)$ (dB SPL)
where $L$ and $M$ are the
number of tonal and noise maskers, respectively.
Bits are allocated based on the global
masking thresholds; this is termed perceptual bit
allocation.
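The combination rule for one frequency bin can be sketched as a power sum of dB values. The function name and argument layout are illustrative: `tq_db` is the absolute threshold at bin i, and the two lists hold the individual tonal and noise thresholds at that bin.

```python
import math


def global_threshold_db(tq_db: float, tonal_db, noise_db) -> float:
    """T_g(i): power-domain sum of the absolute threshold and all
    individual tonal and noise masking thresholds at one bin, in dB SPL."""
    power = 10.0 ** (0.1 * tq_db)
    power += sum(10.0 ** (0.1 * t) for t in tonal_db)
    power += sum(10.0 ** (0.1 * t) for t in noise_db)
    return 10.0 * math.log10(power)


if __name__ == "__main__":
    # Three equal 10 dB contributions add up to 10 + 10*log10(3) dB.
    print(round(global_threshold_db(10.0, [10.0], [10.0]), 3))
    # A single strong masker dominates a very low absolute threshold.
    print(round(global_threshold_db(-100.0, [40.0], []), 3))
```

Because the contributions add in the power domain, the result is dominated by the largest individual threshold and is never lower than the absolute threshold.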




Expt. 5-AC- Audio Synthesis using MSE
   Problem No. 2.25 (pp. 49) of Spanias book on Audio
   Signal Processing




 Expt. 6-AC- Audio Synthesis using Psychoacoustics


Problem No. 5.11 (pp. 142) of Spanias book on Audio
Signal Processing



