					MPEG/Audio Compression Tutorial

Mike Blackstock
CPSC 538a January 11, 2004
 Overview
 • Digital Sound
 • Psychoacoustics
 • Time to Frequency Domain Transformation
 • MPEG/Audio Basic Algorithm
 • Related Work
 • Web references



CPSC 538a MPEG Audio Tutorial January 12, 2004   2 of 17
 Digital Sound Basics
 • Sound is a continuous wave through the air
 • Made up of pressure differences, detected by
   measuring pressure levels at a location.
 • Microphone changes analog sound pressure
   to analog voltage levels.
 • To digitize sound, the signal must be
   sampled in time and encoded into numbers
 • Quantization divides signal strength into
   levels, linearly or logarithmically.
 • 8 bits -> 256 levels; 16 bits -> 65,536 levels
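 The difference between linear and logarithmic quantization can be sketched as follows. This is a minimal illustration, not part of the original slides: the mu-law companding curve with mu = 255 is assumed as one common logarithmic scheme, and the function names are hypothetical.

```python
import math

MU = 255  # companding constant; assumed here as one common choice (mu-law)

def quantize_linear(x, bits):
    """Map a sample x in [-1.0, 1.0] to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    index = round((x + 1.0) / 2.0 * (levels - 1))
    return max(0, min(levels - 1, index))

def compress_mu_law(x):
    """Logarithmic companding: stretches small amplitudes before quantizing."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def quantize_log(x, bits):
    """Quantize on a logarithmic scale by companding first."""
    return quantize_linear(compress_mu_law(x), bits)

for bits in (8, 16):
    print(bits, "bits ->", 2 ** bits, "levels")

# Two quiet samples: linear 8-bit quantization cannot tell them apart,
# while the logarithmic scale assigns them different levels.
quiet_a, quiet_b = 0.001, 0.004
print("linear:", quantize_linear(quiet_a, 8), quantize_linear(quiet_b, 8))
print("log:   ", quantize_log(quiet_a, 8), quantize_log(quiet_b, 8))
```

 Logarithmic spacing spends more levels on quiet sounds, which is roughly where hearing is most discriminating.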
 Digital Audio Questions
 • How often should sound be sampled?
        – Need to sample at a rate at least twice as high as the highest
          frequency; otherwise that frequency is lost (it aliases to a
          lower one). This is the Nyquist theorem.
 • What quality is required?
        – Telephone, radio, CD, different quality requirements.
        – Signal to Noise Ratio (SNR) is a measure of the quality
          of a signal
        – noise may be introduced during conversion from sound
          to voltage and due to sampling/quantization.
 • Format to use?
        – .au, .aiff, .wav, and of course .mp3
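 Both the sampling-rate and the quality questions can be checked numerically. The sketch below is illustrative only: the 8 kHz sample rate, the tone frequencies, and the helper names are assumptions chosen for the demonstration.

```python
import math

def sample_cosine(freq_hz, sample_rate, count):
    """Return `count` samples of a unit-amplitude cosine."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

def snr_db(clean, degraded):
    """Signal-to-noise ratio in dB, treating (clean - degraded) as noise."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - d) ** 2 for c, d in zip(clean, degraded))
    return 10 * math.log10(signal_power / noise_power)

def quantize(x, bits):
    """Round x in [-1, 1] to the nearest of 2**bits evenly spaced values."""
    steps = 2 ** bits - 1
    return round((x + 1) / 2 * steps) / steps * 2 - 1

SAMPLE_RATE = 8000  # Hz, so only frequencies up to 4000 Hz are representable

# Aliasing: a 7 kHz cosine sampled at 8 kHz gives the same samples as 1 kHz.
low = sample_cosine(1000, SAMPLE_RATE, 16)
high = sample_cosine(7000, SAMPLE_RATE, 16)
max_difference = max(abs(a - b) for a, b in zip(low, high))

# Quantization noise: compare 8-bit and 16-bit versions of a 440 Hz tone.
tone = sample_cosine(440, SAMPLE_RATE, SAMPLE_RATE)
snr_8 = snr_db(tone, [quantize(s, 8) for s in tone])
snr_16 = snr_db(tone, [quantize(s, 16) for s in tone])

print(f"largest difference between 1 kHz and 7 kHz samples: {max_difference:.2e}")
print(f"SNR at 8 bits: {snr_8:.1f} dB, at 16 bits: {snr_16:.1f} dB")
```

 The SNR gap between the two bit depths should come out near 6 dB per extra bit, the same rule of thumb the encoder relies on later.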



 Psychoacoustics
 • Principles of the human perception of sound
 • MPEG compression algorithm uses model of
   human hearing to remove data (perceptual
   coding algorithm)
 • Frequency range is about 20 Hz to 20 kHz,
   most sensitive at 2 to 4 kHz.
 • Dynamic range (quietest to loudest) is about
   96 dB
 • Normal voice range is about 500 Hz to 2 kHz
 • Low frequencies -> vowels, bass; High ->
   consonants
 Human Hearing Sensitivity
 • Experiment: Put a person in a quiet room.
   Raise the level of a 1 kHz tone until it is just barely
   audible. Repeat at other frequencies and plot
   the threshold level against frequency.




 Human Frequency Masking
 • Experiment: Play a 1 kHz tone (masking tone)
   at a fixed level (60 dB). Play a test tone at a
   different frequency (e.g., 1.1 kHz), and raise its
   level until it is just distinguishable.
 • Vary the frequency of the test tone and plot
   the threshold when it becomes audible




 Frequency Masking




 Temporal Masking
 • If we hear a loud sound and it then stops, it takes a little while before we can
   hear a soft tone at a nearby frequency.
 • Experiment: Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1
   kHz at 40 dB. Test tone can't be heard (it's masked). Stop masking tone,
   then stop test tone after a short delay.
 • Adjust the delay to the shortest time at which the test tone can be heard (e.g.,
   5 ms).
 • Repeat with different levels of the test tone and plot the delay against the level.




 Combination




 Time to Frequency Transform




 Transforms time/level input signals to frequency/power.
 FFT (shown here) is the most popular: fast, easy, and covered in most numerical methods texts.
 Used by the psychoacoustic model.
 DCT is often used for spatial frequency since it represents linear signals better.
 Something similar is used by the filter bank.
 Wavelets use non-sine/cosine functions for better performance on data with sharp
 discontinuities.
 Demo: http://www.jhu.edu/~signals/fourier2/index.html
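 To make the time-to-frequency step concrete, here is a minimal sketch that computes a power spectrum with a naive discrete Fourier transform. An FFT produces the same values faster; the naive form is used only to keep the example dependency-free. The sample rate, block size, and function name are assumptions for illustration.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive O(N^2) DFT; returns the power in frequency bins 0 .. N/2."""
    n = len(samples)
    powers = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        powers.append(abs(total) ** 2 / n)
    return powers

SAMPLE_RATE = 8000  # Hz (assumed)
N = 64              # samples per block, so each bin is 8000 / 64 = 125 Hz wide

# A 1 kHz sine: all of its power should land in bin 1000 / 125 = 8.
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
spectrum = power_spectrum(signal)

peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N
print(f"peak at bin {peak_bin} = {peak_hz:.0f} Hz")
```

 The time-domain samples say nothing obvious about pitch, while the spectrum isolates the tone in a single bin, which is the form the psychoacoustic model needs.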
 MPEG Basics


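  PCM audio input -> Time-to-frequency mapping filter bank
                  -> Bit/noise allocation, quantizer, and coding
                  -> Bitstream formatting -> Encoded bitstream
  PCM audio input -> Psychoacoustic model
                  -> Bit/noise allocation, quantizer, and coding
  Ancillary data (optional) -> Bitstream formatting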




 Algorithm overview
 1. Use convolution filters to divide the audio signal (e.g.,
    sound sampled at 48 kHz) into 32 frequency subbands (subband
    filtering). A 512-sample FIFO buffer is used.
 2. Determine the amount of masking for each band caused by
    nearby bands, using the psychoacoustic model shown
    above.
 3. If the power in a band is below the masking threshold,
    don't encode it.
 4. Otherwise, determine the number of bits needed to represent
    the coefficient such that the noise introduced by quantization
    is below the masking effect (recall that one fewer bit of
    quantization introduces about 6 dB of noise).
 5. Format the bitstream.
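 Steps 3 and 4 can be sketched as a small bit-allocation routine. This is a simplified illustration rather than the allocation procedure of the standard: it assumes the masking threshold of each band is already known, uses the 6 dB per bit rule of thumb, and rounds up. The band levels, thresholds, and function name are hypothetical.

```python
import math

DB_PER_BIT = 6.0  # rule of thumb from step 4: about 6 dB of noise per bit

def allocate_bits(levels_db, masks_db):
    """Return bits per band: 0 if masked, else enough to cover level - mask."""
    allocation = []
    for level, mask in zip(levels_db, masks_db):
        if level <= mask:
            # Step 3: the band is at or below its masking threshold, skip it.
            allocation.append(0)
        else:
            # Step 4: only the part above the threshold needs to be resolved.
            audible_db = level - mask
            allocation.append(math.ceil(audible_db / DB_PER_BIT))
    return allocation

# Hypothetical levels and masking thresholds (dB) for four subbands.
levels = [4, 50, 30, 20]
masks = [9, 0, 18, 21]
bits = allocate_bits(levels, masks)
print(bits)
```

 A real encoder also has to fit the total within the frame's bit budget, which this sketch ignores.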

 Example
 After analysis, the levels of the first 16 of the 32 bands
 are:
 Band        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
 Level (dB)  0   8  12  10   6   2  10  60  35  20  15   2   3   5   3   1

 If the level of the 8th band is 60 dB, it gives a masking
 of 12 dB in the 7th band and 15 dB in the 9th.
 Level in 7th band is 10 dB ( < 12 dB ), so ignore it.
 Level in 9th band is 35 dB ( > 15 dB ), so send it.
 Only the amount above the masking level needs to be
 sent, so instead of using 6 bits to encode it, we can
 use 4 bits, saving 2 bits (= 12 dB).
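 The bit counts for the 9th band follow from the 6 dB per bit rule of thumb. The arithmetic below assumes the result is rounded up to a whole number of bits, which the slide does not state explicitly.

```latex
\begin{align*}
\text{bits without masking} &= \left\lceil \frac{35\ \text{dB}}{6\ \text{dB/bit}} \right\rceil = \lceil 5.83 \rceil = 6 \\
\text{bits with masking} &= \left\lceil \frac{35\ \text{dB} - 15\ \text{dB}}{6\ \text{dB/bit}} \right\rceil = \lceil 3.33 \rceil = 4 \\
\text{saving} &= 6 - 4 = 2\ \text{bits} \approx 2 \times 6\ \text{dB} = 12\ \text{dB}
\end{align*}
```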
 MPEG layers
 Layer 1
        – DCT-type filter with one frame
        – equal frequency spread per band
        – Psychoacoustic model only uses frequency masking.
 Layer 2
        – Use three frames in filter
        – before, current, next, a total of 1152 samples
        – models a bit of temporal masking
 Layer 3 (mp3)
        – Better critical band filter is used (non-equal frequencies)
        – psychoacoustic model includes temporal masking effects
        – takes into account stereo redundancy
        – uses Huffman coder
 Related Work
 • MPEG phase 2
        – Multichannel (5.1) audio support
        – Significant in driving DVD sales
 • MPEG-4 Structured Audio
        – Efficient, flexible description of synthetic music
 • Copy protection and copyright
 • Speech Processing
        – Uses many similar techniques



 References
 • SFU CMPT 365 Course Contents, Spring 2003
 • Basics of Digital Audio,
   http://www.cs.sfu.ca/CC/365/mark/material/notes/Chap3/Chap3.1/Chap3.1.html,
   retrieved January 7, 2004
 • Audio Compression,
   http://www.cs.sfu.ca/CC/365/mark/material/notes/Chap4/Chap4.4/Chap4.4.html,
   retrieved January 7, 2004
 • Audio and Multimedia Layer 3,
   http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html,
   retrieved January 8, 2004
 • MP3 Backgrounder,
   http://www.audioactive.com/intro/papers/backbone.html,
   retrieved January 8, 2004
 • Scheirer, E. D., The MPEG-4 Structured Audio, Proceedings of ICASSP98
 • Scheirer, E. D., SAOL / MPEG-4 Structured Audio homepage,
   http://web.media.mit.edu/~eds/mpeg4-old/
 • Perkowski, M. A., Speech Signals in Time and Frequency Domain,
   http://www.ee.pdx.edu/~mperkows/CLASS_480/transmit1/A003.time-and-frequency-domain.pdf,
   retrieved January 11, 2004
 • Graps, A., "An Introduction to Wavelets", IEEE Computational Sciences and
   Engineering, Volume 2, Number 2, Summer 1995, pp. 50-61. Also available at
   http://www.amara.com/IEEEwave/IEEEwavelet.html
 • Signals Demonstrations, http://www.jhu.edu/~signals/index.html
