Digital Audio Signal Processing
Syracuse University, Fall 2007
Instructor: Jayant Datta
DIGITAL AUDIO
Basics and Overview
A day for BASICS
Basics of Audio Interconnects
Basics of Sampling
– Analog vs Digital + Aliasing + Quantization + Jitter + Dither
Basics of Audio
Basics of Digital Systems
– LTI + Convolution + Digital Filters
Basics of Pointers + AGU
Audio Interconnects :: DVD
Audio Interconnects :: Receiver
Audio Interconnects
Think about the audio interconnects between the
DVD Player and the Receiver.
Please consider the analog and the digital
connectivity.
Now, try to figure out how information is being
conveyed in each case (analog and digital).
Basics of Sampling
Sampling
– Continuous Time/Amplitude {Analog}
– Discrete Time {Sampled}
Sampling Theorem
– Aliasing -- spoked wheel, {Alias, NormAlias} spectra,
{CreatePlayAlias(fDuration, fGain, iFreqOrig, iFs)}
– Require an anti-aliasing filter before ADC stage
Quantization
– Discrete Amplitude {Quant}
– Effect at different signal levels {Quantizer, NormQuantErr [2,5,8]}
– How good is 16 bits of resolution? 24 bits?
– Dither {DispDither}
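The aliasing idea behind the spoked-wheel demo can be sketched in a few lines of Python. This is only an illustration: the names `alias_frequency` and `sample_tone` are hypothetical and are not the course's CreatePlayAlias routine. It shows that a tone above half the sampling rate produces exactly the same samples as a lower "alias" tone, which is why the anti-aliasing filter has to come before the ADC.

```python
import math

def alias_frequency(f_orig, fs):
    """Apparent frequency (Hz) of a tone f_orig after sampling at fs."""
    f = f_orig % fs              # sampling cannot tell f from f + k*fs
    return fs - f if f > fs / 2 else f   # fold into the range 0..fs/2

def sample_tone(freq, fs, n):
    """First n samples of a unit-amplitude cosine at freq, sampled at fs."""
    return [math.cos(2 * math.pi * freq * k / fs) for k in range(n)]

fs = 8000                        # assumed sampling rate for the example
f_orig = 5000                    # above fs/2, so it will alias
f_alias = alias_frequency(f_orig, fs)

a = sample_tone(f_orig, fs, 16)
b = sample_tone(f_alias, fs, 16)
max_diff = max(abs(x - y) for x, y in zip(a, b))

print(f_alias, max_diff)         # the two sample sequences coincide
```

Once sampled, nothing downstream can separate the two tones, so the filtering has to happen in the analog domain.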
Basics of Sampling
Analog / Digital Conversion
– Multi-bit Systems
– Low level signals -- could benefit from dithering
– Oversampled Systems / Single Bit Systems
• Noise Shaping
• Faster Clock Speed
• Low tolerance to Jitter -- {Jitter} [no quantization]
• Better anti-aliasing
• More linear
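The questions of how good 16 or 24 bits are, and why low-level signals benefit from dithering, can be explored with a small sketch. It assumes an ideal uniform quantizer and triangular (TPDF) dither spanning plus or minus one step; the function names are hypothetical, and 6.02 N + 1.76 dB is the usual textbook figure for a full-scale sine, not a measurement of any real converter.

```python
import math
import random

def ideal_snr_db(bits):
    """Textbook SNR of a full-scale sine in an ideal N-bit quantizer."""
    return 6.02 * bits + 1.76

def quantize(x, bits):
    """Round x (nominally in -1..1) to the nearest uniform N-bit step."""
    step = 2.0 / (2 ** bits)
    return step * round(x / step)

def tpdf_dither(step, rng):
    """Triangular-PDF dither: sum of two uniform values, each +/- step/2."""
    return (rng.random() - 0.5) * step + (rng.random() - 0.5) * step

bits = 8
step = 2.0 / (2 ** bits)
rng = random.Random(0)           # fixed seed so the run is repeatable

# Low-level tone whose peak is only 0.4 of one quantizer step.
signal = [0.4 * step * math.sin(2 * math.pi * k / 50) for k in range(2000)]

plain = [quantize(s, bits) for s in signal]
dithered = [quantize(s + tpdf_dither(step, rng), bits) for s in signal]

# Correlate each output with the input: zero means the tone was lost.
corr_plain = sum(s * p for s, p in zip(signal, plain))
corr_dithered = sum(s * d for s, d in zip(signal, dithered))

print(ideal_snr_db(16), ideal_snr_db(24))
print(corr_plain, corr_dithered)
```

Without dither the tone rounds to silence; with dither it survives as signal plus a benign noise floor.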
Analog versus Digital
Cost and Complexity versus Flexibility
– Analog
• inexpensive
• well understood technology
• natural inertia against change
– Digital
• almost by definition greater flexibility
• example of parametric EQ and reverb -- code change
• system design could be simpler and flexible
Programming Expertise -- bundled software
Just mimicking analog underutilizes digital
Analog versus Digital
Nature of Errors
– SNR (small versus large signals)
• Jitter
• Masking
• Distortion
– Non-linearity of magnetic tape -- soft versus hard clipping
– Quantization / Aliasing / Jitter / Dither / Psychoacoustic Masking
– Humans are analog; digital artifacts are not intuitive
Outline
Introduction
Consumer Audio Home Theater Overview
Guessing the Future
Introduction
How Technology advances
– Taking things for granted
– Recorded Sound
– Pursuing perfection
– Law of diminishing returns
Early entertainment
First ‘recording’ made in 1855 with a “Phonautograph” –
mouthpiece horn, membrane, stylus, rotating cylinder with
smoke-blackened paper – research and analysis
“Listening studios” where a dozen people could listen to the
same recording, each through a tube in the ear – Already
claims of “tell the difference”
Wax cylinders
Phonograph
Whole variety of media:
Vertical, horizontal bumps; wax
and metal, cylinders and
platters, hand cranked,
electrical, different sizes
If in perfect condition, some
people prefer the “richer”
analog sound, also people find
noise “warmer”
Incidentally, early recordings
were done from the center
outwards
Phonographs / Turntables
Open Reel / Reel to Reel
Open Reel Decks
Open Reel Decks at one point
defined the state of the art in
audio recording
8 Track
All 1966 Fords offered a factory
installed in-dash 8-track player. In
the 1967 model year, Chrysler and
GM offered the same. By the late
1960s, several companies were
making players for the other tape
loop systems, including 4-track, but
the only serious competition came
from cassette tapes (which
appeared at around the same time
as 8-tracks) and the almighty vinyl
records.
Eight-track tapes were with us for
quite a long time. 8-track was the
preeminent portable and car audio
format of the 1970s
8 Track
Head had to be perfectly calibrated; bulky; no recordable ones
Cassette Tape
Was originally intended as a
dictating format
Very convenient format, fits in
a shirt pocket
On introduction, had three main
problems: speed stability,
frequency response, and
background hiss
The first two were
surmountable, through better
tape drive mechanisms, heads,
and tape formulations
Dolby Alphabet Soup
Dolby A [1965]
Dolby B [1968]
Dolby C
Dolby HX
Dolby HX Pro
Dolby SR [1986]
Dolby S
Dolby AC-1 [1985]
Dolby AC-2 [1989]
Dolby AC-3 [1992]
Dolby E
Dolby Stereo / Surround
1975: 35 mm stereo optical release print format Dolby Stereo.
Conventional mono optical soundtrack replaced by two soundtracks
carrying not only left and right, but also a third center-screen and a
fourth surround channel for ambient sound and special effects --
compatible with mono.
1982: Consumer version of Dolby Stereo, called Dolby Surround
Compact Disc
In 1978 Sony teamed up with Philips to develop a standard, universal
compact disc to hold audio. Two years later a Philips/Sony Compact
Disc Digital Audio standard disc was officially announced.
The disc was 120 millimeters, made from a polycarbonate substrate,
and molded with a groove that provides timing and tracking
information for the compact disc player. It was released formally in
Europe and Japan in 1982 and in the United States the following year.
Sampling rate: 44.1 kHz
Maximum duration: 74 ~ 80 minutes
Quantization: 16-bit linear
Scanning velocity: 1.2–1.4 m/sec. (constant linear velocity, so the rotational speed varies across the disc)
Error correction code: Cross Interleave Reed-Solomon Code (with
25% redundancy)
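The figures above imply the raw audio data rate of a CD. The short calculation below is a sketch that assumes two channels (stereo), which the list does not state, and it ignores error-correction and subcode overhead.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2                     # assumption: stereo

def pcm_bit_rate(fs, bits, channels):
    """Raw linear-PCM bit rate in bits per second."""
    return fs * bits * channels

def pcm_bytes(minutes, fs, bits, channels):
    """Raw audio payload in bytes for a given playing time."""
    return pcm_bit_rate(fs, bits, channels) // 8 * minutes * 60

bit_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
bytes_74 = pcm_bytes(74, SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)

print(bit_rate)                  # bits per second of audio
print(bytes_74 / 1e6)            # megabytes of audio in 74 minutes
```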
Compact Disc
A CD is a fairly simple piece of plastic, about four one-hundredths (4/100) of an inch
(1.2 mm) thick. Most of a CD consists of an injection-molded piece of clear
polycarbonate plastic. During manufacturing, this plastic is impressed with
microscopic bumps arranged as a single, continuous, extremely long spiral track of data.
Once the clear piece of polycarbonate is formed, a thin, reflective aluminum layer is
sputtered onto the disc, covering the bumps. Then a thin acrylic layer is sprayed over the
aluminum to protect it. The label is then printed onto the acrylic. A cross section of a
complete CD (not to scale) looks like this:
Dolby Surround / Pro Logic
Dolby Surround only decoded left, right and surround
1987: Birth of Home theater -- now enjoyed by millions of consumers
worldwide -- was made possible by the inclusion of four-channel
Dolby Surround Pro Logic decoding in products such as A/V receivers
Digital Audio Tape (DAT)
Developed in the 1980s as a
successor to the analog cassette
tape.
Information is recorded using the
"helical scan" recording technique
which is the same method used in
VHS, Beta, and 8mm videocassette
recorders.
Compared to these other formats,
DAT tapes are much smaller and
the information is encoded
digitally.
Used for studio recording …
expensive…consumer versions
were less expensive
Old favorite of bootleg trading
ADAT
ADAT = Alesis DAT
ADATs use S-VHS tapes. A 120
minute tape provides 40 minutes
of recording on the ADAT.
• Storage of up to 999 address
locations – automation
• A greatly expanded dynamic
range
• Elimination of tape hiss and
frequency loss – generation loss
• Backups of master tapes sound
like the first generation master.
• Perfect track separation (no track
"bleed" on multi-channel mixes).
• Lower cost of recording tape.
• Less archiving space required.
Dolby Digital 5.1
1992: Dolby Digital (AC-3) is a multichannel digital audio
coding technology first used for cinema sound. Today it is
also used to bring multichannel sound into the home via a
wide variety of digital formats, including DVD, DTV, and
digital cable.
DCC: Digital Compact Cassette
DCC players are backward
compatible with analog tapes
Track and time codes are on the
tape. DCC decks can locate a
chosen track on either side of the
tape.
The DCC 900 can digitally record
music in 16-bit resolution and
supports sampling frequencies of
32, 44.1, and 48 kHz
Precision Adaptive Sub-band
Coding (PASC) compression to
code the digital information onto
tape. (4:1 ratio)
MD: Mini Disc
MiniDisc is like a floppy disk –
you can record and erase files
on a MiniDisc just as easily as
you can on a floppy disk. The
big difference between a
MiniDisc and a floppy disk is
that a MiniDisc can hold about
100 times more data
Durable because the disc sits in a
diskette-style cartridge … no scratching
ATRAC encoding (Adaptive
Transform Acoustic Coding),
lossy, compression 5:1
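To see what the quoted compression ratios mean in practice, the sketch below applies the 4:1 (PASC) and 5:1 (ATRAC) figures to CD-style stereo PCM at 44.1 kHz and 16 bits. The source format is an assumption made for illustration, and the results are nominal: the actual bit rates of the two formats differ somewhat from these rounded ratios.

```python
# Assumed source: 44.1 kHz, 16-bit, two-channel linear PCM.
PCM_BIT_RATE = 44_100 * 16 * 2   # bits per second

def compressed_rate(pcm_rate, ratio):
    """Nominal bit rate after compression by the given ratio."""
    return pcm_rate / ratio

pasc_rate = compressed_rate(PCM_BIT_RATE, 4)    # DCC, 4:1
atrac_rate = compressed_rate(PCM_BIT_RATE, 5)   # MiniDisc, 5:1

print(pasc_rate / 1000, atrac_rate / 1000)      # kbit/s, nominal
```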
DTS (Digital Theater Systems)
1993, Steven Spielberg's Jurassic Park introduced the
crisp, clear sound of DTS
Simple future-proof decoder
Intelligence is in encoding stage
Decoder follows instructions within coded bit stream
Encoding algorithm may be updated and modified, automatically
benefiting every consumer decoder
Syntax of data stream specification designed to provide room for
additional audio data – improvements in audio quality or changes
in audio format
DTS – CD
DVD: Digital Versatile Disc
Emerging technology that is extremely powerful.
Looks like a CD – but is a lot more.
Can store up to eight hours of CD quality sound.
Can store up to 133 minutes of high resolution video data.
Soundtrack is presented in up to eight different languages, and uses 5.1
channel Dolby Digital surround sound/DTS.
A single sided, single layer DVD can store 4.38 GB of data. A double
sided, double layer DVD can store 15.9 GB of data. Most CDs can only
store 700 MB of data.
Definitely revolutionizing home theater.
Bonus footage; director’s/actor’s commentary; different camera angles.
Multichannel “Enhancements”
5.1 / 6.1 / 7.1 / 10.2
1999: Dolby Digital Surround EX in Star Wars: Episode I:
The Phantom Menace – pseudo 6.1
DTS-ES (Extended Surround), ES 6.1 discrete, Neo:6,
96k/24-bit, 7.1
THX is a technology developed by Lucasfilm to allow the
home viewer to experience the sound of a movie theater at
home
mp3
Invented in 1989 in Erlangen, Germany, mp3 has quickly come to
symbolize a paradigm shift in the way many people access their music.
The home computer revolution, along with the Internet, has allowed
millions of Net-connected music fans to take advantage of the latest
audio medium.
Short for Moving Picture Experts Group, Audio Layer III, MP3 is a
compression format that shrinks digital audio files with negligible
sound-quality degradation.
In 1997, the format truly realized its potential, thanks to a man named
Tomislav Uzelac, who created the AMP MP3 playback engine.
The first mp3 player was invented just in time for the Napster
revolution in the form of 1998's Winamp – widely regarded as the first
free, consumer-ready mp3 player.
DVD Audio
Sampling rates: 96 / 192 kHz
24-bit resolution
Up to 6 channels of audio
Uses MLP (Meridian Lossless Packing) technology
Not compatible
Difference is fairly subtle
Competing against SACD
Audio only format
SACD: Super Audio CD
1-bit Direct Stream Digital
(DSD)
Sampling rate: 2.822 MHz
SACD player can play CD
4 times CD capacity
With this extra capacity, a
standard Super Audio CD will
provide space for 2-channel
stereo data, as well as an area
for up to 6-track multi-channel
data, storage capacity for text
and images, disc variations,
copyright protection and much
more.
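The DSD sampling rate quoted above is 64 times the CD rate, which a quick calculation makes concrete. The comparison below is a sketch of raw per-channel bit rates only; it says nothing about how the two encodings compare in audio quality.

```python
CD_FS_HZ = 44_100
CD_BITS = 16

DSD_FS_HZ = 64 * CD_FS_HZ        # 2_822_400 Hz, i.e. about 2.822 MHz
DSD_BITS = 1                     # single-bit stream

cd_rate = CD_FS_HZ * CD_BITS     # bits per second per channel
dsd_rate = DSD_FS_HZ * DSD_BITS  # bits per second per channel
ratio = dsd_rate / cd_rate

print(DSD_FS_HZ, cd_rate, dsd_rate, ratio)
```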
AAC : Advanced Audio Coding
AAC is an audio compression technology.
This standard, developed by Dolby, the Fraunhofer
Institute, and others, may become the major ingredient in
21st century digital music distribution.
The AAC codec was formally introduced to the world at
the CES 2001, along with dozens of new digital audio
players able to play AAC files.
Currently, companies such as Liquid Audio distribute
audio using AAC.
mp3 versus AAC
Lower complexity of mp3, makes it the system of choice
for current CD-quality applications.
AAC is the designated successor – providing up to 50%
more playing time, while maintaining the same quality and
allowing encoding of higher quality audio (96 kHz)
AAC also features built-in copyright protection – making it
the system of choice for Electronic Music Distribution
(EMD)
Key Recent Consumer Audio
Developments
Audio Completes the Home Theater Experience
Digital Surround Formats become Popular
Home Audio and Computer Technologies Converge
Music Companies Authorize Paid Downloads
Sales of Internet Audio Portables Increase
The Future
Future of listening to music -- expect the unexpected
What do we want?
– More channels
– Fewer channels
– Instant access
– Flexibility
How do we get it?
– HRTF
– Psychoacoustics
– DSP
– Metadata
Want: More channels => Realism
For home theater applications, the 5.1 format's two discrete digital
surround tracks are a major improvement over the original Dolby
Surround's single mono surround track.
But more discrete digital surrounds would be much better yet. The
ability to truly envelop the listener in a soundfield is a function of the
number of available channels.
Virtual reality, according to a German study, requires at least 18
channels to correctly reproduce the directionality of the diffuse field.
Of course, that amounts to an impractical data stream during encoding
and decoding and also a horrendous mess of speakers and amplifiers
for playback.
A more practical solution is a 10.2 system – in existence today
Want: Fewer channels => Ease
More than anything else, we seek ease of use and
convenience.
– We would want to have the experience of being surrounded by
sound
– We would want to experience this at any time
– Even when we are moving around
– We would like this to be inexpensive (not too many speakers)
– Portable application using headphones
Challenge is to squeeze in multiple channels into a stereo
pair without losing the directionality of the original 360
degree soundfield
Lake Technology, Sensaura, WaveSurround, MIT Media
Lab
Want: Instant Access => Demand
Digital Jukebox
We could all be "pierced" with microchips instead of earrings.
Those chips may enable you to walk into a music "station" and call out
the name of the song you want to hear, only to have it start to play
right into your inner ear.
The future of music no doubt involves convenience, and some form of
an “always on” jukebox
Access to every song ever recorded will be at your fingertips – or more
likely your vocal cords, as voice-recognition technology will likely
become seamlessly integrated into your music-selection process –
better yet, we may just want to think about a particular song, and it will
start playing in our ear!
Want: Flexibility => Control
Consider recording raw audio with control information relating to its
processing contained as part of the ‘audio packet’
This has no analogy in the analog world
Example 1: raw audio file accompanied by EQ information
Example 2: a multi-channel recording of a choir may be heard from the
vantage of the audience or from that of a performer
Requires more real time processing capability
Allows ‘Final audio’ to be determined at a very late stage -- depending
on user preference or the visual scene the audio is accompanying – for
example, use a raw audio file with or without ‘telephone’ filter
This is done by embedding metadata into the audio
MPEG streams; parameters in DCinema, THX Blackbird
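The idea of an 'audio packet' that carries its own processing instructions can be sketched as a small data structure. Everything here is hypothetical: the `AudioPacket` class, the `gain_db` key, and the `render` function are invented for illustration, with a simple gain standing in for the EQ or 'telephone' filter of the examples. It does not represent the MPEG, DCinema, or THX metadata formats.

```python
from dataclasses import dataclass, field

@dataclass
class AudioPacket:
    samples: list                                  # raw, unprocessed audio
    metadata: dict = field(default_factory=dict)   # processing hints

def render(packet, apply_metadata=True):
    """Produce the 'final audio' at playback time.

    The listener (or the playback system) decides whether the
    processing described in the metadata is applied at all.
    """
    if not apply_metadata:
        return list(packet.samples)
    gain = 10 ** (packet.metadata.get("gain_db", 0.0) / 20)
    return [gain * s for s in packet.samples]

packet = AudioPacket(samples=[1.0, -0.5], metadata={"gain_db": -20.0})
processed = render(packet)                        # metadata applied
untouched = render(packet, apply_metadata=False)  # raw audio as recorded

print(processed, untouched)
```

The raw samples are never altered, so the same recording can be rendered differently for each listener or scene.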
Enabling Technologies
HRTF: To find the sound pressure that an arbitrary source
produces at the ear drum, we need the impulse response from the
source to the ear drum. This is called the Head-Related Impulse
Response (HRIR), and its Fourier transform is called the Head Related
Transfer Function (HRTF). The HRTF captures all of the physical cues
to source localization. Once you know the HRTF for the left ear and
the right ear, you can synthesize a direction for a monaural source.
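Synthesizing a direction for a monaural source amounts to convolving it with the left-ear and right-ear impulse responses. The sketch below uses made-up toy HRIRs (a pure delay and attenuation at the far ear) purely as placeholders; real HRIRs are measured and are much longer, and the function name `convolve` is a hypothetical helper.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Toy placeholder HRIRs for a source on the listener's left:
# the right ear hears the sound three samples later and quieter.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.6]

mono = [1.0, 0.5, -0.25]         # monaural source signal

left = convolve(mono, hrir_left)     # signal at the left ear drum
right = convolve(mono, hrir_right)   # signal at the right ear drum

print(left)
print(right)
```

Played over headphones, the left/right pair carries the time and level differences that the toy responses encode.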
Psychoacoustics: Goal is to understand how the human brain
processes sound. Mathematical models of the perception process are
tested in experiments on human listeners.
DSP: Efficient Digital Signal Processing / Faster Digital Signal
Processors
Flexibility of Digital Systems
Digital Filters
– General Equation
– Hardware Elements
– Software Implementation
Coefficients
{FreqResp}
Hardware -- more dedicated HW available
Software -- always striving to do more; more
parallel processing, harnessing the power of PCs
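The slides name a general equation for digital filters without writing it out. A common form, assumed here, is the linear difference equation y[n] = sum of b[k] x[n-k] minus sum of a[k] y[n-k], with a[0] = 1. The sketch below is one possible software implementation together with a frequency-response evaluation; the function names and the example coefficients are hypothetical and are not the course's FreqResp routine.

```python
import cmath
import math

def iir_filter(b, a, x):
    """Apply y[n] = sum b[k] x[n-k] - sum a[k] y[n-k], assuming a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def freq_response(b, a, f, fs):
    """Complex gain of the filter at frequency f (Hz), sampling rate fs."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)     # z^-1 on the unit circle
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return num / den

# Example coefficients: one-pole low-pass, y[n] = 0.1 x[n] + 0.9 y[n-1]
b = [0.1]
a = [1.0, -0.9]
fs = 48000.0                     # assumed sampling rate

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
h = iir_filter(b, a, impulse)    # first samples of the impulse response

dc_gain = abs(freq_response(b, a, 0.0, fs))
nyquist_gain = abs(freq_response(b, a, fs / 2, fs))

print(h)
print(dc_gain, nyquist_gain)
```

Changing only the coefficient lists turns the same code into a different filter, which is the flexibility argument the slides make for digital systems.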