					CMPT 365 Multimedia Systems

    Media Representations
           - Audio

           Spring 2012

                         CMPT365 Multimedia Systems   1

 Audio Signals
    Sampling
    Quantization

 Audio file format

 Human auditory system

                    What is Sound?

 Sound is a wave phenomenon, involving molecules of
  air being compressed and expanded under the
  action of some physical device.
      A speaker in an audio system vibrates back and forth and
       produces a longitudinal pressure wave that we perceive as
       sound.
      Since sound is a pressure wave, it takes on continuous
       values, as opposed to digitized ones.
        • If we wish to use a digital version of sound waves, we must
          form digitized representations of audio information.


 Digitization means conversion to a stream of
  numbers, and preferably these numbers should be
  integers for efficiency.
 1-dimensional nature of sound: amplitude values
  (sound pressure/level) depend on a 1D variable,
  time.

                  Digitization        cont’d

 Digitization must be in both time and amplitude.
    Sampling: measuring the quantity we are interested in,
     usually at evenly spaced intervals.
 The first kind of sampling, using measurements
  only at evenly spaced time intervals, is simply
  called sampling. The rate at which it is performed
  is called the sampling frequency.
      For audio, typical sampling rates are from 8 kHz (8,000
       samples per second) to 48 kHz. This range is determined
       by the Nyquist theorem, discussed later.
 Sampling in the amplitude or voltage dimension is
  called quantization.
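
The two steps can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names, the 1 kHz test tone, and the choice of a symmetric signed 8-bit code range are all assumptions made for the example.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Sampling: evaluate the signal at evenly spaced times t = n / sample_rate."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

def quantize(value, bits):
    """Quantization: map a value in [-1, 1] to the nearest signed integer code."""
    max_code = 2 ** (bits - 1) - 1      # e.g. 127 for 8 bits
    min_code = -(2 ** (bits - 1))       # e.g. -128 for 8 bits
    code = round(value * max_code)
    return max(min_code, min(max_code, code))

# Assumed example: a 1 kHz tone sampled at 8 kHz and quantized to 8 bits.
samples = sample_sine(freq_hz=1000, sample_rate_hz=8000, num_samples=8)
codes = [quantize(s, bits=8) for s in samples]
print(codes)
```

The list `samples` is discrete in time but still continuous in amplitude; only after `quantize` does each value become an integer.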

Sampling and Quantization

    Audio Digitization (PCM)

PCM: Pulse code modulation

           Parameters in Digitizing

 To decide how to digitize audio data we need to
  answer the following questions:
      1. What is the sampling rate?
      2. How finely is the data to be quantized, and is
         quantization uniform?
      3. How is audio data formatted? (file format)

                  Sampling Rate
 Signals can be decomposed into a sum of sinusoids.
    -- weighted sinusoids can build up quite complex signals

                Sampling Rate              cont’d

 If the sampling rate just equals the actual frequency,
    a false signal (a constant) is detected.

 If we sample at 1.5 times the actual frequency, we obtain
    an incorrect (alias) frequency that is lower than the
     correct one.
       • It is half the correct one -- the wavelength, from peak to
         peak, is double that of the actual signal.
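
Both cases can be checked numerically. The sketch below is an illustration only: the helper name `sample_cosine` and the 1 kHz tone are assumptions, and a cosine is used so that the constant in the first case is visibly nonzero.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, num_samples):
    """Sample cos(2*pi*f*t) at times t = n / sample_rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

f = 1000.0  # assumed tone frequency in Hz

# Case 1: sampling rate equals the signal frequency, so every sample
# lands on the same point of the cycle and the result looks constant.
at_f = sample_cosine(f, sample_rate_hz=f, num_samples=6)

# Case 2: sampling at 1.5 times the signal frequency gives exactly the
# samples that a tone of half the frequency would produce.
at_1_5f = sample_cosine(f, sample_rate_hz=1.5 * f, num_samples=6)
half_tone = sample_cosine(0.5 * f, sample_rate_hz=1.5 * f, num_samples=6)

print([round(v, 3) for v in at_f])
print([round(v, 3) for v in at_1_5f])
print([round(v, 3) for v in half_tone])
```

From the samples alone, the 1 kHz tone in case 2 cannot be told apart from a 500 Hz tone, which is what the alias means in practice.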

                        Nyquist Theorem

 For correct sampling we must use a sampling rate
  equal to at least twice the maximum frequency
  content in the signal. This rate is called the
  Nyquist rate.
 Sampling theory – Nyquist theorem

          If a signal is band-limited (frequency-limited), i.e.,
      there is a lower limit f1 and an upper limit f2 of
       frequency components in the signal, then the
        sampling rate should be at least 2(f2 − f1).
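
As a small worked example, the helper below computes the Nyquist rate and the frequency a pure tone appears to have after sampling. The folding rule used in `apparent_frequency` is the standard result for real sinusoids rather than something stated on the slides, and both function names are hypothetical.

```python
def nyquist_rate(max_freq_hz):
    """Minimum sampling rate for a signal whose highest frequency is max_freq_hz."""
    return 2 * max_freq_hz

def apparent_frequency(freq_hz, sample_rate_hz):
    """Frequency a sampled pure tone appears to have, folded into [0, fs/2]."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_rate(20000))              # music up to 20 kHz
print(nyquist_rate(4000))               # speech up to 4 kHz
print(apparent_frequency(1000, 8000))   # sampled fast enough: unchanged
print(apparent_frequency(1000, 1500))   # undersampled: aliased
print(apparent_frequency(1000, 1000))   # rate equals frequency: looks constant
```

A tone is reproduced at its true frequency only when that frequency is at most half the sampling rate; otherwise it folds back to a lower one.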

Proof and more math:

 Quantization (Pulse Code Modulation)

 At every time interval the sound is converted to a
  digital equivalent
 Using 2 bits the following sound can be digitized
      Telephone: 8 bits
      CD: 16 bits

                      Digitize audio

 Each sample quantized, i.e., rounded
      e.g., 2^8 = 256 possible quantized values
 Each quantized value represented by bits
      8 bits for 256 values
 Example: 8,000 samples/sec, 256 quantized values -->
  64,000 bps
 Receiver converts it back to analog signal:
      some quality reduction
 Example rates
      CD: 1.411 Mbps
      MP3: 96, 128, 160 kbps
      Internet telephony: 5.3 - 13 kbps
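
The uncompressed rates follow directly from sampling rate, bits per sample, and channel count. A minimal sketch, with a hypothetical function name; the CD parameters (44.1 kHz, 16 bits, 2 channels) are the ones given later in these slides.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone_bps = pcm_bit_rate(8000, 8)            # 8,000 samples/sec, 256 levels
cd_bps = pcm_bit_rate(44100, 16, channels=2)     # stereo CD audio

print(telephone_bps)
print(cd_bps)
```

The MP3 and Internet telephony rates listed above are far lower than these figures because those formats compress the audio rather than storing raw PCM.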
Audio Quality vs. Data Rate

            More on Quantization

 Quantization is lossy!
 Roundoff errors => quantization noise/error

                 Quantization Noise

 Quantization noise: the difference between the
  actual value of the analog signal, for the particular
  sampling time, and the nearest quantization
  interval value.
      At most, this error can be as much as half of the
       interval.
 The quality of the quantization is characterized by
  the Signal to Quantization Noise Ratio (SQNR).
      A special case of SNR (Signal to Noise Ratio)
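
The half-interval bound on the error can be checked with a simple uniform quantizer. This is a sketch under stated assumptions: the quantizer rounds to the nearest multiple of the interval width, clipping at the range edges is ignored for simplicity, and the 4-bit setting and test signal are arbitrary choices.

```python
import math

def quantize_uniform(value, step):
    """Round a value to the nearest multiple of step (uniform quantizer)."""
    return round(value / step) * step

bits = 4
step = 2.0 / (2 ** bits)   # interval width when [-1, 1] is split into 2**bits intervals

# One cycle of a sine wave as the "analog" input (assumed test signal).
samples = [math.sin(2 * math.pi * n / 50) for n in range(50)]

# Quantization noise: difference between the true value and the quantized one.
errors = [abs(s - quantize_uniform(s, step)) for s in samples]
max_error = max(errors)

print(step / 2, max_error)
```

However the input is chosen, `max_error` never exceeds `step / 2`, and halving the interval width (one more bit) halves that bound.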

           Signal to Noise Ratio (SNR)

   Signal to Noise Ratio (SNR): the ratio of the
    power of the correct signal to that of the noise
       A common measure of the quality of the signal.
 SNR is usually measured in decibels (dB), where 1
    dB is 1/10 Bel. The SNR value, in units of dB, is
    defined in terms of base-10 logarithms of squared
    voltages, as follows:

    $$SNR = 10 \log_{10} \frac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \frac{V_{signal}}{V_{noise}}$$

    Signal to Noise Ratio (SNR)                      cont’d

 The actual power in a signal is proportional to the
  square of the voltage. For example, if the signal
  voltage Vsignal is 10 times the noise voltage, then the SNR
  is 20 log10(10) = 20 dB.

      In terms of power, if the power from ten violins is ten
       times that from one violin playing, then the ratio of
       power is 10 dB, or 1 B.
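
Both examples reduce to a one-line computation. The function names below are hypothetical; the factor 20 for voltages and 10 for powers follows from power being proportional to voltage squared.

```python
import math

def snr_db_from_voltages(v_signal, v_noise):
    """SNR in dB from a voltage ratio: 20 * log10(Vsignal / Vnoise)."""
    return 20 * math.log10(v_signal / v_noise)

def power_ratio_db(power, reference_power):
    """Level difference in dB from a power ratio: 10 * log10(P / Pref)."""
    return 10 * math.log10(power / reference_power)

voltage_example = snr_db_from_voltages(10.0, 1.0)   # signal voltage 10x the noise
violin_example = power_ratio_db(10.0, 1.0)          # ten violins vs. one violin

print(voltage_example)
print(violin_example)
```

A tenfold voltage ratio is a hundredfold power ratio, which is why the first example gives twice as many decibels as the second.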

Common sound levels

        Signal to Quantization Noise Ratio (SQNR) Revisited

 For a quantization accuracy of N bits per sample, the peak
   SQNR can be simply expressed:

   $$SQNR = 20 \log_{10} \frac{V_{signal}}{V_{quan\_noise}} = 20 \log_{10} \frac{2^{N-1}}{1/2} = 20 N \log_{10} 2 \approx 6.02 N \text{ (dB)}$$

 6.02N is the worst case.

 If the input signal is sinusoidal, the quantization error is
   statistically independent, and its magnitude is uniformly
   distributed between 0 and half of the interval, then it can
   be shown that the expression for the SQNR becomes:

   $$SQNR = 6.02 N + 1.76 \text{ (dB)}$$

                         Derive it by yourself!
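
As a numerical sanity check (not a derivation), the sketch below quantizes a full-scale sine wave and compares the measured SQNR with the commonly quoted result for a sinusoidal input, 6.02N + 1.76 dB. The quantizer, the sample count, and the phase increment of 0.1 radian per sample are assumptions chosen only to spread the samples evenly over the cycle.

```python
import math

def measured_sqnr_db(bits, num_samples=100_000):
    """Quantize a full-scale sine to 2**bits intervals and measure SQNR in dB."""
    step = 2.0 / (2 ** bits)            # interval width over the range [-1, 1]
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = math.sin(0.1 * n)           # assumed test signal
        xq = round(x / step) * step     # nearest quantization level
        signal_power += x * x
        noise_power += (x - xq) ** 2
    return 10 * math.log10(signal_power / noise_power)

def predicted_sqnr_db(bits):
    """Commonly quoted SQNR for a full-scale sinusoid with N-bit quantization."""
    return 6.02 * bits + 1.76

for bits in (8, 12):
    print(bits, round(measured_sqnr_db(bits), 2), round(predicted_sqnr_db(bits), 2))
```

The measured and predicted values should agree closely, and each extra bit should add roughly 6 dB.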

 Audio Signals
    Sampling
    Quantization

 Audio file format

 Human auditory system

           Audio File Format: .WAV
   Microsoft format: Interleaved multi-channel samples


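Create this figure in Matlab:
x = wavread('horn.wav');     % read the samples; one column per channel
plot(x(:, 1));               % plot the whole first channel
figure;                      % open a second window so the first plot is kept
plot(x(4000:10000, 1));      % zoom in on samples 4000 to 10000

Note:
wavread() normalizes the samples to the range of [-1, 1].
In recent Matlab releases, wavread has been replaced by audioread.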
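
For readers without Matlab, the same idea can be sketched with Python's standard `wave` module. This is an assumption-laden illustration: it handles only 16-bit PCM files, the function name is hypothetical, and dividing by 32768 is one common normalization convention. It also shows the interleaving explicitly: samples for the channels alternate in the file and are separated by slicing.

```python
import struct
import wave

def read_wav_normalized(path):
    """Read a 16-bit PCM WAV file.

    Returns (sample_rate, channels), where channels[c] is a list of floats
    in [-1, 1) holding the samples of channel c.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        num_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # WAV data is little-endian; samples are interleaved: L0 R0 L1 R1 ...
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    channels = [[v / 32768.0 for v in values[c::num_channels]]
                for c in range(num_channels)]
    return sample_rate, channels

# Usage, assuming a file named horn.wav exists in the working directory:
#   rate, channels = read_wav_normalized("horn.wav")
#   first_channel = channels[0]
```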

           Audio File Format: MIDI

 MIDI: Musical Instrument Digital Interface
   A simple scripting language and hardware setup
   MIDI Overview
   MIDI codes “events” that stand for the production of
    sounds. E.g., a MIDI event might include values for the
    pitch of a single note, its duration, and its volume.
   MIDI is a standard adopted by the electronic music
    industry for controlling devices, such as synthesizers and
    sound cards, that produce music.
   Supported by most sound cards
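
To make the idea of an "event" concrete, the sketch below builds the raw bytes of the two standard MIDI channel messages that start and stop a note. The function names are hypothetical. Note that in the wire format a note's duration is not a field of its own: it is the time that elapses between the Note On and the matching Note Off.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI Note On message.

    channel: 0-15, note: 0-127 (60 is middle C), velocity: 0-127 (loudness).
    """
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note, velocity=0):
    """Build a 3-byte MIDI Note Off message for the same channel and note."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")
    return bytes([0x80 | channel, note, velocity])

# Middle C on the first channel at moderate loudness.
start = note_on(channel=0, note=60, velocity=100)
stop = note_off(channel=0, note=60)
print(start.hex(), stop.hex())
```

Six bytes describe the whole note, whereas sampled audio of the same note would need thousands of samples; the synthesizer or sound card decides what the note actually sounds like.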


 Audio Signals
    Sampling
    Quantization

 Audio file format

 Human auditory system

                   Computer vs. Ear

 Multimedia signals are interpreted by humans!
      Need to understand human perception
 Almost all original multimedia signals are analog signals:
      A/D conversion is needed for computer processing

    Properties of HAS: Human Auditory System

 Range of human hearing: 20 Hz - 20 kHz
     Minimal sampling rate for music: 40 kHz (Nyquist rate)
     CD Audio:
        • 44.1 kHz sampling rate
        • each sample is represented by a 16-bit signed integer
        • 2 channels are used to create a stereo system
        • 44100 * 16 * 2 = 1,411,200 bits / second (bps)
      Speech signal: 300 Hz – 4 kHz
        • Minimum sampling rate is 8 kHz (as in the telephone system)

       Properties of Human Auditory System
 Hearing threshold varies dramatically at different
  frequencies
 Most sensitive around 2 kHz

        Properties of Human Auditory System
Critical Bands:
 Our brains perceive sounds through 25 distinct critical
   bands; the bandwidth grows logarithmically with frequency.
 At 100 Hz, the bandwidth is about 160 Hz;
 At 10 kHz it is about 2.5 kHz in width.

[Figure: critical bands numbered 1 to 25]


           Properties of Human Auditory System
 Masking effect:
      What we hear depends on what audio environment we are in
      One strong signal can overwhelm/hide another

The masking effects in the
frequency domain:

A masker inhibits perception
of coexisting signals
below the masking threshold.

        Properties of Human Auditory System
 Masking thresholds in the time domain:

  Simultaneous masking: Two sounds occur simultaneously and
                        one is masked by the other.
   Backward masking (Pre): A softer sound that occurs prior to
                         a loud one will be masked by the louder sound.
   Forward masking (Post): Softer sounds that occur as much as
                         200 milliseconds after the loud sound will
                         also be masked.
                HAS: Audio Filtering

 Prior to sampling and AD (Analog-to-Digital)
  conversion, the audio signal is also usually filtered
  to remove unwanted frequencies.
      For speech, typically from 50Hz to 10kHz is retained,
       and other frequencies are blocked by the use of a band-pass
       filter that screens out lower and higher frequencies.
      An audio music signal will typically contain from about
       20Hz up to 20kHz.
      At the DA converter end, high frequencies may reappear
       in the output (Why?)
        • because of sampling and then quantization, the smooth input
          signal is replaced by a series of step functions containing all
          possible frequencies
      So at the decoder side, a lowpass filter is used after the
       DA circuit.
         HAS: Perceptual audio coding
 The HAS properties can be exploited in audio coding:
      Different quantizations for different critical bands
        • Subband coding
      If you can’t hear the sound, don’t encode it
      Discard weaker signal if a stronger one exists in the same band
       (frequency-domain masking)
      Discard soft sound after a loud sound (time-domain masking)
      Stereo redundancy: At low frequencies, we can’t detect where
       the sound is coming from. Encode it mono.
 More on this later (MP3, APE…)

                Further Exploration

 Links for Chapter 6 in “Further Exploration” of
  the textbook page
      An extensive list of audio file formats.
      CD audio file formats are somewhat different. The main
       music format is called “red book audio.” A good
       description of various CD formats is on the website.
      A General MIDI Instrument Patch Map, along with a
       General MIDI Percussion Key Map.
      A link to a good tutorial on MIDI and wave table music.
      A link to a java program for decoding MIDI streams.
      A good multimedia/sound page, including a source for
       locating Internet sound/music materials.
