PowerPoint Presentation - Fundamentals of Digital Audio by dyr60218

Fundamentals of Digital Audio
The Central Problem
   Sound waves consist of air pressure changes
   This is what we see in an oscilloscope view:
    changes in air pressure over time
The Central Problem
   Waves in nature, including sound waves,
    are continuous:
                                Between any two
                                points on the curve,
                                no matter how close
                                together they are,
                                there are an infinite
                                number of points
The Central Problem
   Analog audio (vinyl, tape, analog synths, etc.)
    involves the creation or imitation of a continuous
    wave.
   Computers cannot represent continuity (or
    infinity).
   Computers can only deal with discrete values.
   Digital technology is based on converting
    continuous values to discrete values.
Digital Conversion
   The instantaneous amplitude of a continuous wave is
    measured (sampled) at regular intervals. The measured values,
    called samples, may be stored in a digital system.
    [Figure: the crest of the wave, with the sampled amplitude values marked along the curve]

[ 0.9925, 0.9945, 0.9961, 0.9975, 0.9986, 0.9993, 0.9998, 1.0, 0.9998, 0.9993, 0.9986, 0.9975, 0.9961, 0.9945, 0.9925 ]
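A minimal sketch of regular sampling, assuming the wave is a sine and that it is measured once per degree of phase from 83 to 97 degrees. The step size, the range, and the truncation to four decimals are assumptions chosen so that the output resembles the list above; the name `sample_sine` is hypothetical.

```python
import math

def sample_sine(start_deg, stop_deg, step_deg=1, decimals=4):
    """Measure sin() at regular intervals, truncating each value.

    The 1-degree interval and 4-decimal truncation are assumptions
    made for illustration.
    """
    scale = 10 ** decimals
    samples = []
    for deg in range(start_deg, stop_deg + 1, step_deg):
        value = math.sin(math.radians(deg))       # instantaneous amplitude
        samples.append(math.floor(value * scale) / scale)
    return samples

samples = sample_sine(83, 97)   # 15 measurements around the crest
print(samples)
```

Each entry is one measurement; the continuous curve between the measurements is not stored.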
Digital Audio
   Digital representation of audio is analogous to
    cinema representation of motion.
   We know that “moving pictures” are not really
    moving; cinema is simply a series of pictures of
    motion, sampled and projected fast enough that
    the effect is that of apparent motion.
   With digital audio, if a sound is sampled often
    enough, the effect is apparent continuity when the
    samples are played back.
Digital Audio
   Con:
    –   It is, at best, only an approximation of the wave
   Pros:
    –   Significantly lower background noise levels
    –   Sounds are more reliably stored and duplicated
    –   Sounds are easier to manipulate:
        Rather than worry about how to change the shape of a wave,
        engineers need only perform appropriate numerical operations.
        e.g., changing the volume level of a digital audio file is simply a
        matter of multiplication: each sample value is multiplied by a value
        that raises or lowers it by a certain percentage.
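The volume example above can be sketched in a few lines. The function name `apply_gain` and the sample values are hypothetical, chosen only to illustrate the multiplication.

```python
def apply_gain(samples, gain):
    """Multiply every sample by the same factor (gain < 1 lowers the level)."""
    return [s * gain for s in samples]

samples = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
quieter = apply_gain(samples, 0.5)   # lower every sample by 50 percent
print(quieter)
```

The shape of the wave is unchanged; only its amplitude is scaled.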
Digital Audio
 The theory behind digital representation has
  existed since the 1920s.
 It wasn’t until the 1950s that technology
  caught up to the theory, and it was possible to
  implement digital audio.
Digital Audio
   Bell Labs produced the first digital audio synthesis in
    the 1950s.
   For computer synthesis, a series of samples was
    calculated and stored in a wavetable.
   The wavetable described, in connect-the-dots fashion,
    the shape of a wave (i.e., its timbre).
   Reading through the wavetable at different rates
    (advancing by n samples at each step, where n is the
    sampling increment) allowed different pitches to be created.
   Audio was produced by feeding the samples that were
    to be audified through a digital to analog converter
    (DAC).
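A rough sketch of wavetable reading, assuming a 512-entry table holding one cycle of a sine wave and whole-number sampling increments. The table size and the names `wavetable` and `read_wavetable` are assumptions, not details of the Bell Labs systems.

```python
import math

TABLE_SIZE = 512  # assumed table length

# One cycle of a sine wave, stored as a list of samples.
wavetable = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def read_wavetable(table, increment, num_samples):
    """Step through the table by `increment` per output sample, wrapping around."""
    out = []
    index = 0
    for _ in range(num_samples):
        out.append(table[index % len(table)])
        index += increment
    return out

slow = read_wavetable(wavetable, 1, 1024)   # increment 1: two cycles in 1024 samples
fast = read_wavetable(wavetable, 2, 1024)   # increment 2: four cycles, an octave higher
```

Doubling the increment reads the same shape in half the time, so the timbre stays the same while the pitch rises.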
Digital Audio
   Contemporary computer sound cards often contain a
    set of wavetable sounds.
   The function is the same: a library of samples
    describing different waveforms.
   They are triggered by MIDI commands. (These will
    be covered fully in a few weeks.) For example, a
    given note number will translate to the table being
    read at a certain sampling increment to produce the
    desired pitch.
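One way the note-number-to-increment translation could work is sketched here. It assumes the common equal-temperament convention (MIDI note 69 is A at 440 Hz), a 512-entry table, and a 44,100 Hz output rate; none of these values come from the slides, and a particular sound card may work differently.

```python
def midi_to_frequency(note, a4=440.0):
    """Equal-tempered frequency in Hz for a MIDI note number (69 = A4, assumed)."""
    return a4 * 2 ** ((note - 69) / 12)

def sampling_increment(frequency, table_size, sample_rate):
    """Table positions to advance per output sample for the given pitch."""
    return frequency * table_size / sample_rate

inc = sampling_increment(midi_to_frequency(69), table_size=512, sample_rate=44100)
print(inc)
```

The increment is usually not a whole number, so a practical reader has to truncate, round, or interpolate between table entries.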
Digital Audio
 Digital recording became possible in the
  1970s.
 Voltage input from a microphone is fed to an
  analog to digital converter (ADC), which
  stores the signal as a series of samples.
 The samples can then be sent through a DAC
  for playback.
Digital Audio
 Thus, the ADC produces a “dehydrated”
  version of the audio.
 The DAC then “rehydrates” the audio for
  playback.
(Gareth Loy, Musimathics v. 2)
Characteristics of Digital Audio
   With digital audio, we are concerned with
    two measurements:
    –   Sampling rate
    –   Quantization
   With these measurements, we can describe
    how well a digitized audio file represents
    the analog original.
Sampling Rate
   This number tells us how often an audio signal is sampled,
    the number of samples per second.
   The more often an audio signal is sampled, the better it is
    represented in discrete form:

   Of course, this staircase-shaped wave needs to be smoothed.
    This process will be covered during the discussion on filtering.
Sampling Rate
 So we want to sample an audio wave every
  so often.
  The question is: how “often” is “often
  enough”?
 Harry Nyquist of Bell Labs addressed this
  question in a 1925 paper concerning
  telegraph signals.
Sampling Rate
   Given that a wave will be smoothed by a
    subsequent filtering process, it is sufficient
    to sample both its peak and its trough:
Sampling Rate

   Thus, we have the sampling theorem
    (also called the Nyquist theorem):
    To represent digitally a signal containing frequency
    components up to X Hz, it is necessary to use a
    sampling rate of at least 2X samples per second.

 Conversely, the maximum frequency
  contained in a signal sampled at a rate of SR
  is SR/2 Hz.
 The frequency SR/2 is also termed the
  Nyquist frequency.
Sampling Rate

   In theory, since the maximum audible
    frequency is 20 kHz, a sampling rate of
    40 kHz would be sufficient to re-create a
    signal containing all audible frequencies.
Sampling Rate
   For most frequencies, we will oversample
    (the audio frequency is below the Nyquist
    frequency):
Sampling Rate
   If the audio frequency is precisely the Nyquist frequency,
    our critically sampled signal runs the risk of
    missing peaks and troughs:
    [Figure: two examples of a critically sampled wave]
   This problem is also addressed by filtering.
Sampling Rate
   More serious is the problem of undersampling a
    frequency greater than the Nyquist frequency:
    –   Audio signal at 30 kHz, sampled at 40 kHz
    –   Result: the frequency is misrepresented at 10 kHz, in reverse phase
   Misrepresented frequencies are termed aliases.
Sampling Rate
   In general, if a frequency, F, sampled at a
    sampling rate of SR, exceeds the Nyquist
    frequency (but is below SR), that frequency will alias to a
    frequency of:
    -(SR - F)
   The minus sign indicates that the frequency is in opposite phase.
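The alias rule can be checked numerically. This sketch assumes a sine-wave input and only covers frequencies between the Nyquist frequency and SR; the function names are hypothetical.

```python
import math

def alias_frequency(f, sr):
    """Signed alias of f when nyquist < f < sr, using the -(SR - F) rule."""
    nyquist = sr / 2
    if f <= nyquist:
        return f                      # representable: no aliasing
    if f >= sr:
        raise ValueError("this sketch only covers nyquist < f < sr")
    return -(sr - f)

def sample_tone(f, sr, count):
    """Samples of sin(2*pi*f*t) taken at t = n / sr."""
    return [math.sin(2 * math.pi * f * n / sr) for n in range(count)]

alias = alias_frequency(30_000, 40_000)        # the 30 kHz / 40 kHz example
original = sample_tone(30_000, 40_000, 16)
aliased = sample_tone(alias, 40_000, 16)       # same values as `original`
```

The two lists agree to within floating-point rounding, which is why the sampled 30 kHz tone cannot be told apart from a phase-inverted 10 kHz tone.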
Sampling Rate
   It is useful to illustrate sampled frequencies on a polar diagram,
    with 0 Hz at 3:00 and the Nyquist frequency at 9:00:
    [Figure: polar diagram with 0 Hz at 3:00, the Nyquist frequency at 9:00, f on the upper half and -f on the lower half]
   The upper half of the circle represents frequencies from 0 Hz
    to the Nyquist frequency.
   The lower half of the circle represents negative frequencies from 0 Hz
    to the Nyquist frequency (there is no distinction in a digital audio
    system between ±NF).
   Any audio frequency above the Nyquist frequency will alias to a
    frequency shown on the bottom half of the circle, a negative
    frequency between 0 Hz and the Nyquist frequency.
   Frequencies above the Nyquist frequency do not exist in a digital
    audio system.
Sampling Rate
   In the recording process, filters are used to remove
    all frequencies above the Nyquist frequency
    before the audio signal is sampled.
   This step is critical since aliases cannot be
    removed later.
   Provided these frequencies are not in the sampled
    signal, the signal may be sampled and later
    reconverted to audio with no loss of frequency
    information.
Sampling Rate
   The sampling rate for audio CDs is 44.1 kHz.
   The origin of this rate lies in video formats.
   When digital audio recording began, audio tape was
    not capable of handling the density of digital signals.
   The first digital masters were stored on video tape as a
    pseudo video signal, in which binary values of 1 and
    0 were stored as video levels of black and white.
Sampling Rate
   Video is drawn left to right, starting from the top of the
    screen and moving down.
   First the odd numbered lines are drawn, then the even
    numbered lines.
   Each video frame has two fields: the odd field and the even field.
   The fields are adjacent to each other on the video tape.
    [Figure: scan lines alternating odd (O) and even (E); tape layout: Frame n odd, Frame n even, Frame n+1 odd, Frame n+1 even, Frame n+2 odd]
Sampling Rate
   There are two video formats:
    –   525 lines, 30 frames per second (USA)
        Minus 35 blank lines, leaving 490 lines per frame
        60 fields per second, 245 lines per field
    –   625 lines, 25 frames per second (European)
        Minus 37 blank lines, leaving 588 lines per frame
        50 fields per second, 294 lines per field
   Three samples could be stored on each line, allowing:
        60 x 245 x 3 = 44,100 samples per second
        or
        50 x 294 x 3 = 44,100 samples per second
   44.1 kHz remains the standard sampling rate for CD
    audio.
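The arithmetic for both video formats can be verified directly. This sketch uses only the figures given above; the function and variable names are hypothetical.

```python
def samples_per_second(fields_per_second, lines_per_field, samples_per_line=3):
    """Samples stored per second when each usable video line holds 3 samples."""
    return fields_per_second * lines_per_field * samples_per_line

usa = samples_per_second(60, (525 - 35) // 2)        # 245 lines per field
european = samples_per_second(50, (625 - 37) // 2)   # 294 lines per field
print(usa, european)
```

Both formats arrive at the same rate, which made 44,100 samples per second workable on either kind of video equipment.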
Quantization
   This has a few names:
    –   Sample size
    –   Bit depth
    –   Word size
   The term “quantization” takes its origin from
    quantum physics:
    –   Electrons orbit an atom’s nucleus in one of a number of
        well-defined layers;
    –   An electron may be knocked from one layer to another,
        but it can never stay between two of the layers.
Quantization
   In the discussion of sampling rate, we only considered how often
    the amplitude of the wave was measured.
   We did not discuss how accurate these measurements were.
   The effectiveness of any measurement depends on the precision
    of our ruler. (Measuring the thickness of something with many
    small indentations, using a ruler marked only in feet, will probably
    not give a very accurate measurement; we have to estimate many of
    the values.)
   Just as there are limits to how often we can sample, there are
    limits to the resolution of our ruler.
Quantization
   Like all numbers stored in computers, the amplitude values are stored as
    binary numbers.
   The value that gets stored is the closest available binary number - akin to
    the nearest marking on a ruler.
   The accuracy of our measurement depends on how many bits we have to
    represent these values.
   Clearly, the more bits we have, the finer the resolution of our ruler.
    [Figure: the same wave quantized with 2 bits, 3 bits, and 4 bits]
   Each change of bit represents a change in voltage level.
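A simplified sketch of the "nearest marking on the ruler" idea, assuming amplitudes in the range -1.0 to 1.0 and evenly spaced levels. The level layout and the name `quantize` are assumptions for illustration, not a description of any particular converter.

```python
def quantize(x, bits):
    """Round x (assumed within -1.0 to 1.0) to the nearest of 2**bits levels."""
    step = 2.0 / (2 ** bits)                    # spacing between adjacent levels
    level = round(x / step)                     # index of the nearest level
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    level = max(lowest, min(highest, level))    # clip to the levels that exist
    return level * step

for bits in (2, 3, 4):
    print(bits, [quantize(x, bits) for x in (-0.8, -0.3, 0.1, 0.45)])
```

With more bits the step between levels shrinks, so each stored value lands closer to the measured one.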
Quantization
   CD audio uses 16-bit quantization.
Quantization
 While aliasing is eliminated if our signal
  contains no frequencies above the Nyquist
  frequency, quantization error can never be
  completely eliminated.
 Every sample is within a margin of error
  that is half the quantization level (the
  voltage change represented by the least
  significant bit).
Quantization
   For a sine wave signal represented with n
    bits, the signal to error ratio is:
                S/E (dB) = 6.02n + 1.76
   The problem is that low-level signals do not
    use all available bits, and therefore the error
    level is greater.
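As a quick illustration, the formula can be evaluated for a few bit depths. The choice of 8, 16, and 24 bits is only an example; the function name is hypothetical.

```python
def signal_to_error_db(bits):
    """Signal-to-error ratio in dB for a sine wave represented with `bits` bits."""
    return 6.02 * bits + 1.76

for bits in (8, 16, 24):
    print(f"{bits} bits: {signal_to_error_db(bits):.2f} dB")
```

Each additional bit improves the ratio by about 6 dB.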
Quantization
   While quantization error may be masked at high
    audio levels, it can become audible at low levels:

    –   Worst case: a sine wave fluctuating within one quantization
        increment is stored as a square wave
   Unlike the constant hissing noise of analog recordings,
    quantization error is correlated with the signal, and is therefore a
    type of distortion rather than noise.
Quantization
   The problem of quantization distortion is
    addressed by dither.
   Dither is low-level noise added to the audio signal
    before it is sampled.

        Low level
        audio signal
        with dither
        added
Quantization
 Dither adds random errors to the signal,
  so that quantization results in added
  noise, rather than distortion.
 The noise is a constant factor, not correlated
  with the signal like quantization distortion.
 The result is a noisy signal, rather than a
  signal broken up by distortion.
Quantization
 The auditory system averages the signal at
  all times. We do not hear individual
  samples.
 With dither, this averaging allows the
  musical signal to co-exist with the noise,
  rather than be temporarily eliminated due to
  distortion.
Quantization
   Dither allows resolution below the least significant
    quantization bit.
   Without dither, digital recordings would be far less
    satisfactory than analog recordings. A plucked guitar
    string, for example, fades into something close to a
    sine tone; without dither, the guitar sound would
    gradually turn into the sound of a square wave.
   With dither, there is significantly less noise in digital
    recordings than in analog recordings.
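The claim that dither allows resolution below the least significant bit can be illustrated with a small experiment. This sketch assumes a quantization step of 1, a constant input of 0.3 (less than half a step), and uniformly distributed dither one step wide; the slides do not specify a dither distribution, so that choice is an assumption.

```python
import math
import random

def quantize(x):
    """Round to the nearest whole quantization level (step size of 1)."""
    return math.floor(x + 0.5)

def dithered_average(x, trials=20_000, seed=0):
    """Average of many quantized readings of x, each with fresh dither added."""
    rng = random.Random(seed)             # seeded so that runs are repeatable
    total = 0
    for _ in range(trials):
        noise = rng.uniform(-0.5, 0.5)    # assumed: uniform noise, one level wide
        total += quantize(x + noise)
    return total / trials

level = 0.3                               # smaller than half a quantization level
without_dither = quantize(level)          # always 0: the detail is lost
with_dither = dithered_average(level)     # averages out close to 0.3
print(without_dither, with_dither)
```

Individual dithered samples are noisier, but their average tracks the original value, which is the kind of averaging attributed to the auditory system above.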
Quantization and Sampling Rate
 The sampling rate determines the signal’s
  frequency content.
 The number of quantization bits determines
  the amount of quantization error.
Size of Audio Files

   44,100 samples per second
    x 2 bytes per sample (16 bits)
    x 2 channels (for stereo audio)
    x 60 seconds per minute
    = 10,584,000 bytes ≈ 10 MB/minute
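The same calculation as a short sketch, with CD-audio values as the defaults. The function name is hypothetical, and the result is for uncompressed audio.

```python
def bytes_per_minute(sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Uncompressed audio size for one minute of sound."""
    return sample_rate * bytes_per_sample * channels * 60

size = bytes_per_minute()
print(size, "bytes per minute")
print(round(size / 1_000_000, 2), "MB per minute")
```

Changing any one factor scales the size proportionally: mono audio, for example, needs half as much space.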

								