# 2-MediaRepresentation-Audio


CMPT 365 Multimedia Systems

Media Representations
- Audio

Spring 2012

Outline

- Audio Signals
  - Sampling
  - Quantization

- Audio file format
  - WAV/MIDI

- Human auditory system

What is Sound?

- Sound is a wave phenomenon, involving molecules of air being compressed and expanded under the action of some physical device.
  - A speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.
  - Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones.
    • If we wish to use a digital version of sound waves, we must form digitized representations of audio information.

Digitization

- Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
- 1-dimensional nature of sound: amplitude values (sound pressure/level) depend on a 1D variable, time.

Digitization (cont'd)

- Digitization must be in both time and amplitude.
  - Sampling: measuring the quantity we are interested in, usually at evenly spaced intervals.
- Sampling in the time dimension, using measurements only at evenly spaced time intervals, is simply called sampling. The rate at which it is performed is called the sampling frequency.
  - For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range follows from the Nyquist theorem, discussed later.
- Sampling in the amplitude or voltage dimension is called quantization.
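
The two steps can be sketched in a few lines of Python. This is only an illustration under assumed parameters: a 1 kHz test tone, an 8 kHz sampling rate, 8-bit uniform quantization, and the helper names `sample_sine` and `quantize` are all choices made for the example, not part of the course material.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples, amplitude=1.0):
    """Sampling in time: evaluate a sine wave at evenly spaced instants."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize(samples, n_bits):
    """Sampling in amplitude: map values in [-1, 1] to 2**n_bits integer codes."""
    levels = 2 ** n_bits
    codes = []
    for s in samples:
        # Scale [-1, 1] onto [0, levels - 1] and round to the nearest level.
        code = round((s + 1.0) / 2.0 * (levels - 1))
        # Clamp in case the input slightly exceeds the assumed range.
        codes.append(min(max(code, 0), levels - 1))
    return codes

# Assumed example values: 1 kHz tone, 8 kHz sampling, 8 bits per sample.
samples = sample_sine(freq_hz=1000, sample_rate_hz=8000, n_samples=8)
codes = quantize(samples, n_bits=8)
print(codes)
```

Each printed integer is one digitized sample; the list of integers is the "stream of numbers" that digitization produces.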

Sampling and Quantization

Audio Digitization (PCM)

PCM: Pulse code modulation

Parameters in Digitizing

- To decide how to digitize audio data, we need to answer the following questions:
  1. What is the sampling rate?
  2. How finely is the data to be quantized, and is the quantization uniform?
  3. How is the audio data formatted? (file format)

Sampling Rate

- Signals can be decomposed into a sum of sinusoids.
  - Weighted sinusoids can build up quite complex signals.

Sampling Rate (cont'd)

- If the sampling rate just equals the actual frequency,
  - a false signal (a constant) is detected.

- If we sample at 1.5 times the actual frequency,
  - we obtain an incorrect (alias) frequency that is lower than the correct one.
    • It is half the correct one: the wavelength, from peak to peak, is double that of the actual signal.
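
Both cases can be checked numerically. The sketch below assumes a 1 kHz cosine as the test signal; `alias_frequency` and `sample_cosine` are hypothetical helper names introduced for this example. The alias is found by folding the true frequency around the nearest multiple of the sampling rate.

```python
import math

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a sampled sinusoid, folded into [0, sample_rate / 2]."""
    k = round(signal_hz / sample_rate_hz)  # nearest multiple of the sampling rate
    return abs(signal_hz - k * sample_rate_hz)

def sample_cosine(signal_hz, sample_rate_hz, n_samples):
    """Evaluate a unit-amplitude cosine at evenly spaced sampling instants."""
    return [math.cos(2 * math.pi * signal_hz * n / sample_rate_hz)
            for n in range(n_samples)]

f = 1000.0  # assumed test frequency in Hz

# Sampling at exactly the signal frequency: every sample hits the same phase.
constant = sample_cosine(f, 1.0 * f, 5)
print(constant, alias_frequency(f, 1.0 * f))

# Sampling at 1.5 times the signal frequency: the alias is half the true frequency.
print(alias_frequency(f, 1.5 * f))

# Sampling above twice the signal frequency: the true frequency is preserved.
print(alias_frequency(f, 2.5 * f))
```

With sampling at the signal frequency the samples are all (numerically) equal, which is the false constant signal; at 1.5 times the frequency the computed alias is 500 Hz for the 1 kHz input.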

Nyquist Theorem

- For correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.
- Sampling theory: Nyquist theorem

If a signal is band-limited (frequency-limited), i.e., there is a lower limit f1 and an upper limit f2 of frequency components in the signal, then the sampling rate should be at least 2(f2 − f1).

Proof and more math: http://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem
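
As a small illustration, the two rules can be written as functions. The function names are made up for this sketch, and the example inputs (a 20 kHz upper limit for music, a 4 kHz upper limit for speech, 44.1 kHz CD sampling) are the figures quoted elsewhere in these slides.

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate: twice the maximum frequency in the signal."""
    return 2 * max_frequency_hz

def band_limited_min_rate(f1_hz, f2_hz):
    """Minimum rate for a signal confined to [f1, f2], per the band-limited form above."""
    return 2 * (f2_hz - f1_hz)

def is_adequately_sampled(sample_rate_hz, max_frequency_hz):
    """True if the sampling rate is at least the Nyquist rate."""
    return sample_rate_hz >= nyquist_rate(max_frequency_hz)

print(nyquist_rate(20_000))                   # music, up to 20 kHz
print(nyquist_rate(4_000))                    # speech, up to 4 kHz
print(is_adequately_sampled(44_100, 20_000))  # CD sampling rate vs. 20 kHz content
print(is_adequately_sampled(8_000, 20_000))   # telephone rate vs. 20 kHz content
```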

Quantization (Pulse Code Modulation)

- At every time interval the sound is converted to a digital equivalent.
- Using 2 bits, the following sound can be digitized (figure omitted):
  - Telephone: 8 bits per sample
  - CD: 16 bits per sample

Digitize audio

- Each sample is quantized, i.e., rounded
  - e.g., 2^8 = 256 possible quantized values
- Each quantized value is represented by bits
  - 8 bits for 256 values
- Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
- The receiver converts it back to an analog signal:
  - some quality reduction

Example rates
- CD: 1.411 Mbps
- MP3: 96, 128, 160 kbps
- Internet telephony: 5.3 - 13 kbps
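
The uncompressed rates follow directly from sampling rate times bits per sample times number of channels. A minimal sketch, with `pcm_bit_rate` as a name chosen for this example; the CD parameters (44.1 kHz, 16 bits, 2 channels) are the ones given later in these slides.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

telephone_bps = pcm_bit_rate(8_000, 8)          # 8,000 samples/sec, 8 bits each
cd_bps = pcm_bit_rate(44_100, 16, channels=2)   # CD audio, stereo

print(telephone_bps)        # bits per second
print(cd_bps / 1_000_000)   # megabits per second
```

The MP3 and Internet telephony rates in the list are far lower than these figures because those formats compress the signal rather than storing raw PCM samples.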
Audio Quality vs. Data Rate

More on Quantization

- Quantization is lossy!
  - Roundoff errors => quantization noise/error

Quantization Noise

- Quantization noise: the difference between the actual value of the analog signal, for the particular sampling time, and the nearest quantization interval value.
  - At most, this error can be as much as half of the interval.
- The quality of the quantization is characterized by the Signal to Quantization Noise Ratio (SQNR).
  - A special case of SNR (Signal to Noise Ratio)
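
The half-interval bound can be observed with a simple uniform quantizer. This is a sketch under assumptions: the input lies in [-1, 1], the levels are evenly spaced and include both endpoints, and 4 bits are used so the error is easy to see. `quantize_uniform` is a hypothetical helper, not a function from any library.

```python
import math

def quantize_uniform(x, n_bits, x_min=-1.0, x_max=1.0):
    """Round x to the nearest of 2**n_bits evenly spaced levels; return (value, step)."""
    levels = 2 ** n_bits
    step = (x_max - x_min) / (levels - 1)    # width of one quantization interval
    index = round((x - x_min) / step)        # nearest level index
    index = min(max(index, 0), levels - 1)   # keep the index in range
    return x_min + index * step, step

n_bits = 4  # assumed, small on purpose
errors = []
for n in range(1000):
    x = math.sin(2 * math.pi * n / 1000)     # one cycle of a test sine
    q, step = quantize_uniform(x, n_bits)
    errors.append(abs(x - q))                # quantization error for this sample

max_error = max(errors)
print(max_error, step / 2)
```

The largest observed error never exceeds half the step size, and adding one bit roughly halves the step and therefore the worst-case error.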

Signal to Noise Ratio (SNR)

- Signal to Noise Ratio (SNR): the ratio of the power of the correct signal to that of the noise.
  - A common measure of the quality of the signal.
- SNR is usually measured in decibels (dB), where 1 dB is 1/10 bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:

$$SNR = 10 \log_{10} \frac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \frac{V_{signal}}{V_{noise}}$$

Signal to Noise Ratio (SNR) (cont'd)

- The actual power in a signal is proportional to the square of the voltage. For example, if the signal voltage Vsignal is 10 times the noise voltage, then the SNR is 20 log10(10) = 20 dB.
  - If the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB, or 1 B.
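
Both examples can be reproduced with two one-line functions, one for voltage ratios (factor 20) and one for power ratios (factor 10). The function names are illustrative only.

```python
import math

def snr_db_from_voltage(v_signal, v_noise):
    """SNR in dB from a voltage ratio; power goes as voltage squared, hence 20."""
    return 20 * math.log10(v_signal / v_noise)

def db_from_power_ratio(p_signal, p_reference):
    """Level difference in dB from a power ratio."""
    return 10 * math.log10(p_signal / p_reference)

print(snr_db_from_voltage(10.0, 1.0))   # signal voltage 10 times the noise voltage
print(db_from_power_ratio(10.0, 1.0))   # ten violins vs. one violin (power ratio 10)
```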

Common sound levels

Signal to Quantization Noise Ratio (SQNR) Revisited

- For a quantization accuracy of N bits per sample, the peak SQNR can be simply expressed:

$$SQNR = 20 \log_{10} \frac{V_{signal}}{V_{quan\_noise}} = 20 \log_{10} \frac{2^{N-1}}{1/2} = 20 N \log_{10} 2 \approx 6.02 N \ \text{(dB)}$$

  - 6.02N dB is the worst case.

- If the input signal is sinusoidal, the quantization error is statistically independent, and its magnitude is uniformly distributed between 0 and half of the interval, then it can be shown that the expression for the SQNR becomes:

$$SQNR = 6.02 N + 1.76 \ \text{(dB)}$$

Derive it yourself!
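
Before deriving the result by hand, it can be checked numerically. The sketch below compares the worst-case figure 6.02N dB and the standard sinusoidal result 6.02N + 1.76 dB with a measured value. The setup is an assumption made for the example: a full-scale sine, a uniform quantizer with 2^N levels placed at interval midpoints, and an arbitrary phase increment so the samples cover the whole waveform.

```python
import math

def peak_sqnr_db(n_bits):
    """Worst-case figure: 20 * N * log10(2), about 6.02 dB per bit."""
    return 20 * n_bits * math.log10(2)

def sinusoid_sqnr_db(n_bits):
    """Full-scale sinusoid with uniformly distributed error: adds 10*log10(1.5), about 1.76 dB."""
    return peak_sqnr_db(n_bits) + 10 * math.log10(1.5)

def measured_sqnr_db(n_bits, n_samples=100_000):
    """Quantize a full-scale sine and measure signal power over noise power."""
    step = 2.0 / 2 ** n_bits              # interval width for the range [-1, 1]
    top = 2 ** (n_bits - 1) - 1           # highest interval index
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = math.sin(0.1234567 * n)       # arbitrary phase step (assumed)
        index = min(math.floor(x / step), top)
        q = (index + 0.5) * step          # reconstruct at the interval midpoint
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 12):
    print(bits, peak_sqnr_db(bits), sinusoid_sqnr_db(bits), measured_sqnr_db(bits))
```

The measured value should land close to the sinusoidal formula rather than the worst-case one, which is a useful sanity check on the derivation: the signal power of a sine is half its peak power, and the noise power of a uniformly distributed error is the interval width squared over 12.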
Audio File Format: .WAV

- Microsoft format: interleaved multi-channel samples
  http://ccrma.stanford.edu/courses/422/projects/WaveFormat/
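
Interleaving can be seen by writing and reading a tiny stereo file with Python's standard `wave` module. This is a sketch, not the course's example: the 440 Hz tone, the 8 kHz rate, the in-memory buffer, and the helper names `write_stereo_wav` and `read_wav_normalized` are all assumptions made here. The reader also scales 16-bit integers to floating-point values, similar in spirit to the normalized samples used in the Matlab example.

```python
import io
import math
import struct
import wave

def write_stereo_wav(sample_rate_hz=8000, n_frames=80):
    """Build a small 16-bit stereo WAV file in memory and return its bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)               # stereo
        wav.setsampwidth(2)               # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        frames = bytearray()
        for n in range(n_frames):
            left = int(32767 * math.sin(2 * math.pi * 440 * n / sample_rate_hz))
            right = -left                 # inverted copy, so the channels differ
            # Interleaved layout: left sample, then right sample, frame by frame.
            frames += struct.pack("<hh", left, right)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()

def read_wav_normalized(data):
    """Return (sample_rate, channels), with 16-bit samples scaled to [-1, 1)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # De-interleave: every n_channels-th value belongs to the same channel.
    channels = [[v / 32768.0 for v in values[c::n_channels]]
                for c in range(n_channels)]
    return sample_rate, channels

data = write_stereo_wav()
sample_rate, channels = read_wav_normalized(data)
print(data[:4], data[8:12], sample_rate, len(channels), len(channels[0]))
```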

Example

Create this figure in Matlab:
    % x: samples read from the .wav file, one column per channel (assumed)
    plot(x(:, 1));
    plot(x(4000:10000, 1));

[Figure: waveform of the first channel (top) and a zoomed-in segment of it (bottom)]

Note: Samples are normalized to the range [-1, 1].

Audio File Format: MIDI

- MIDI: Musical Instrument Digital Interface
  - A simple scripting language and hardware setup
- MIDI Overview
  - MIDI codes "events" that stand for the production of sounds. E.g., a MIDI event might include values for the pitch of a single note, its duration, and its volume.
  - MIDI is a standard adopted by the electronic music industry for controlling devices, such as synthesizers and sound cards, that produce music.
  - Supported by most sound cards
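
To make the idea of an "event" concrete, here is a sketch of the two most common MIDI channel messages, Note On and Note Off, built as raw bytes. The helper names are invented for this example. In these messages pitch is a note number (0 to 127) and volume is a velocity (0 to 127); duration is not a field of its own but is implied by the time between the Note On and the matching Note Off.

```python
def note_on(channel, note, velocity):
    """3-byte Note On message: status 0x90 + channel, then note number and velocity."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=0):
    """3-byte Note Off message: status 0x80 + channel, then note number and velocity."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_to_frequency_hz(note):
    """Equal-tempered pitch, assuming note 69 (A4) is tuned to 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)

# Middle C (note number 60) on the first channel, played moderately loud.
middle_c_on = note_on(channel=0, note=60, velocity=100)
middle_c_off = note_off(channel=0, note=60)
print(middle_c_on.hex(), middle_c_off.hex(), round(note_to_frequency_hz(60), 2))
```

Six bytes describe a whole note, whereas the same second of sound stored as CD-quality PCM would take about 176 kilobytes; the trade-off is that the receiving device has to synthesize the sound itself.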

Computer vs. Ear

- Multimedia signals are interpreted by humans!
  - Need to understand human perception
- Almost all original multimedia signals are analog signals:
  - A/D conversion is needed for computer processing

Properties of HAS: Human Auditory System

- Range of human hearing: 20 Hz - 20 kHz
  - Minimal sampling rate for music: 40 kHz (Nyquist rate)
  - CD Audio:
    • 44.1 kHz sampling rate
    • each sample is represented by a 16-bit signed integer
    • 2 channels are used to create a stereo system
    44100 * 16 * 2 = 1,411,200 bits / second (bps)
  - Speech signal: 300 Hz - 4 kHz
    • Minimum sampling rate is 8 kHz (as in the telephone system)

Properties of Human Auditory System

- Hearing threshold varies dramatically at different frequencies
- Most sensitive around 2 kHz

Properties of Human Auditory System

Critical Bands:
- Our brains perceive sounds through 25 distinct critical bands; the bandwidth grows logarithmically with frequency.
  - At 100 Hz, the bandwidth is about 160 Hz;
  - At 10 kHz it is about 2.5 kHz in width.

[Figure: critical bands 1 through 25 along the frequency axis]
Properties of Human Auditory System

- What we hear depends on what audio environment we are in
- One strong signal can overwhelm/hide another

Frequency domain:

[Figure: masking of coexisting signals]

Properties of Human Auditory System

- Masking thresholds in the time domain:
  - Simultaneous masking: two sounds occur simultaneously and one is masked by the other.
  - A softer sound that occurs prior to a loud one will be ...
  - ... softer sounds that occur as much as 200 milliseconds after the loud sound
HAS: Audio Filtering

- Prior to sampling and AD (Analog-to-Digital) conversion, the audio signal is also usually filtered to remove unwanted frequencies.
  - For speech, typically from 50 Hz to 10 kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies.
  - An audio music signal will typically contain from about 20 Hz up to 20 kHz.
  - At the DA converter end, high frequencies may reappear in the output (Why?)
    • Because of sampling and then quantization, the smooth input signal is replaced by a series of step functions containing all possible frequencies.
  - So at the decoder side, a lowpass filter is used after the DA circuit.
HAS: Perceptual audio coding

- The HAS properties can be exploited in audio coding:
  - Different quantizations for different critical bands
    • Subband coding
  - If you can't hear the sound, don't encode it
  - Discard the weaker signal if a stronger one exists in the same band
  - Stereo redundancy: at low frequencies, we can't detect where the sound is coming from. Encode it mono.
- More on this later (MP3, APE…)

Further Exploration

- Links for Chapter 6 in "Further Exploration" of the textbook page
  - An extensive list of audio file formats.
  - CD audio file formats are somewhat different. The main music format is called "red book audio." A good description of various CD formats is on the website.
  - A General MIDI Instrument Patch Map, along with a General MIDI Percussion Key Map.
  - A link to a good tutorial on MIDI and wave table music synthesis.
  - A link to a Java program for decoding MIDI streams.
  - A good multimedia/sound page, including a source for locating Internet sound/music materials.
