Posted on 2/12/2010. Public Domain.
Fundamentals of Digital Audio

The Central Problem

Sound waves consist of changes in air pressure. This is what we see in an oscilloscope view: changes in air pressure over time.

Waves in nature, including sound waves, are continuous: between any two points on the curve, no matter how close together they are, there are an infinite number of points.

Analog audio (vinyl, tape, analog synths, etc.) involves the creation or imitation of a continuous wave. Computers cannot represent continuity (or infinity); they can only deal with discrete values. Digital technology is based on converting continuous values to discrete values.

Digital Conversion

The instantaneous amplitude of a continuous wave is measured (sampled) at regular intervals. The measurement values, called samples, may be stored in a digital system:

[ 0.9925, 0.9945, 0.9961, 0.9975, 0.9986, 0.9993, 0.9998, 1.0, 0.9998, 0.9993, 0.9986, 0.9975, 0.9961, 0.9945, 0.9925 ]

Digital Audio

Digital representation of audio is analogous to cinema representation of motion. We know that “moving pictures” are not really moving; cinema is simply a series of pictures of motion, sampled and projected fast enough that the effect is that of apparent motion. With digital audio, if a sound is sampled often enough, the effect is apparent continuity when the samples are played back.
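The sampling step described under Digital Conversion can be sketched in a few lines of Python. This is only an illustration, not something taken from the slides: the function name `sample_sine`, the 1 Hz test tone, and the rate of 8 samples per second are all assumptions chosen to keep the printed list short.

```python
import math

def sample_sine(freq_hz, sample_rate, num_samples, amplitude=1.0):
    """Measure a sine wave's instantaneous amplitude at regular intervals.

    Sample number n is taken at time n / sample_rate seconds.
    """
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)
    ]

# One cycle of a 1 Hz sine wave, measured 8 times per second (hypothetical values).
samples = sample_sine(freq_hz=1.0, sample_rate=8, num_samples=8)
print([round(s, 4) for s in samples])
```

The resulting list is all the digital system keeps: the curve between the measured points is discarded.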
Digital Audio

Con:
– It is, at best, only an approximation of the wave

Pros:
– Significantly lower background noise levels
– Sounds are more reliably stored and duplicated
– Sounds are easier to manipulate: rather than worry about how to change the shape of a wave, engineers need only perform appropriate numerical operations. For example, changing the volume level of a digital audio file is simply a matter of multiplication: each sample value is multiplied by a value that raises or lowers it by a certain percentage.

The theory behind digital representation has existed since the 1920s. It wasn’t until the 1950s that technology caught up to the theory and it became possible to implement digital audio.

Bell Labs produced the first digital audio synthesis in the 1950s. For computer synthesis, a series of samples was calculated and stored in a wavetable. The wavetable described, in connect-the-dots fashion, the shape of a wave (i.e., its timbre). Reading through the wavetable at different rates (stepping ahead n samples at a time, where n is the sampling increment) allowed different pitches to be created. Audio was produced by feeding the samples that were to be audified through a digital-to-analog converter (DAC).

Contemporary computer sound cards often contain a set of wavetable sounds. The function is the same: a library of samples describing different waveforms. They are triggered by MIDI commands. (These will be covered fully in a few weeks.) For example, a given note number will translate to the table being read at a certain sampling increment to produce the desired pitch.

Digital recording became possible in the 1970s. Voltage input from a microphone is fed to an analog-to-digital converter (ADC), which converts the signal into a series of samples that can be stored. The samples can then be sent through a DAC for playback.

Thus, the ADC produces a “dehydrated” version of the audio. The DAC then “rehydrates” the audio for playback.
(Gareth Loy, Musimathics v. 2)

Characteristics of Digital Audio

With digital audio, we are concerned with two measurements:
– Sampling rate
– Quantization
With these measurements, we can describe how well a digitized audio file represents the analog original.

Sampling Rate

This number tells us how often an audio signal is sampled: the number of samples per second. The more often an audio signal is sampled, the better it is represented in discrete form.

Of course, the staircase-shaped wave that results needs to be smoothed. This process will be covered during the discussion on filtering.

So we want to sample an audio wave every so often. The question is: how “often” is “often enough”? Harry Nyquist of Bell Labs addressed this question in a 1925 paper concerning telegraph signals.

Given that a wave will be smoothed by a subsequent filtering process, it is sufficient to sample both its peak and its trough, that is, twice per cycle.

Thus, we have the sampling theorem (also called the Nyquist theorem):

To represent digitally a signal containing frequency components up to X Hz, it is necessary to use a sampling rate of at least 2X samples per second.

Conversely, the maximum frequency contained in a signal sampled at a rate of SR is SR/2 Hz. The frequency SR/2 is also termed the Nyquist frequency.

In theory, since the maximum audible frequency is about 20 kHz, a sampling rate of 40 kHz would be sufficient to re-create a signal containing all audible frequencies.
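The two directions of the sampling theorem stated above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and treating a component exactly at SR/2 as representable is an assumption that follows the "at least 2X" wording.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can represent: SR / 2."""
    return sample_rate_hz / 2

def minimum_sample_rate(max_frequency_hz):
    """Lowest sampling rate for a signal with components up to X Hz: 2X."""
    return 2 * max_frequency_hz

def is_representable(frequency_hz, sample_rate_hz):
    """True if the frequency does not exceed the Nyquist frequency.

    Assumption: the borderline case exactly at SR / 2 counts as representable.
    """
    return frequency_hz <= nyquist_frequency(sample_rate_hz)

print(minimum_sample_rate(20_000))        # 40000
print(nyquist_frequency(44_100))          # 22050.0
print(is_representable(30_000, 40_000))   # False
```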
Sampling Rate

For most frequencies, we will oversample (the audio frequency is below the Nyquist frequency).

If we sample at precisely the Nyquist frequency, our critically sampled signal runs the risk of missing peaks and troughs. This problem is also addressed by filtering.

More serious is the problem of undersampling a frequency greater than the Nyquist frequency.

[Figure: an audio signal at 30 kHz, sampled at 40 kHz. Result: the frequency is misrepresented at 10 kHz, at reverse phase.]

Misrepresented frequencies are termed aliases.

In general, if a frequency, F, sampled at a sampling rate of SR, exceeds the Nyquist frequency (but is below SR), that frequency will alias to a frequency of:

-(SR - F)

The minus sign indicates that the frequency is in opposite phase.

It is useful to illustrate sampled frequencies on a polar diagram, with 0 Hz at 3:00 and the Nyquist frequency at 9:00.

[Figure: polar diagram with 0 Hz at the right, the Nyquist frequency at the left, f on the upper half and -f on the lower half.]

– The upper half of the circle represents frequencies from 0 Hz to the Nyquist frequency.
– The lower half of the circle represents negative frequencies from 0 Hz to the Nyquist frequency (there is no distinction in a digital audio system between ±NF).
– Any audio frequency above the Nyquist frequency will alias to a frequency shown on the bottom half of the circle, a negative frequency between 0 Hz and the Nyquist frequency.
– Frequencies above the Nyquist frequency do not exist in a digital audio system.

In the recording process, filters are used to remove all frequencies above the Nyquist frequency before the audio signal is sampled. This step is critical since aliases cannot be removed later.
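The aliasing rule above can be checked numerically. The following Python sketch applies the -(SR - F) formula to the 30 kHz / 40 kHz example and compares the samples of the original tone with those of its alias. The function names are hypothetical, and restricting the input to 0 <= F < SR is an assumption made to stay within the case the formula covers.

```python
import math

def alias_frequency(freq_hz, sample_rate):
    """Frequency actually represented after sampling, for 0 <= F < SR.

    At or below the Nyquist frequency the frequency is unchanged;
    above it, the frequency aliases to -(SR - F).
    """
    if not 0 <= freq_hz < sample_rate:
        raise ValueError("this sketch only covers 0 <= F < SR")
    nyquist = sample_rate / 2
    if freq_hz <= nyquist:
        return freq_hz
    return -(sample_rate - freq_hz)

def sine_samples(freq_hz, sample_rate, num_samples):
    """Sample a unit-amplitude sine wave at regular intervals."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]

alias = alias_frequency(30_000, 40_000)
print(alias)  # -10000

# The 30 kHz tone and a -10 kHz tone produce the same sample values.
original = sine_samples(30_000, 40_000, 16)
aliased = sine_samples(alias, 40_000, 16)
max_difference = max(abs(a - b) for a, b in zip(original, aliased))
print(max_difference)  # rounding error only
```

Once sampled, the two tones cannot be told apart, so any component above the Nyquist frequency has to be dealt with before sampling takes place.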
Provided frequencies above the Nyquist frequency are not in the sampled signal, the signal may be sampled and later reconverted to audio with no loss of frequency information.

Sampling Rate

The sampling rate for audio CDs is 44.1 kHz. The origin of this rate lies in video formats. When digital audio recording began, audio tape was not capable of handling the density of digital signals. The first digital masters were stored on video tape as a pseudo-video signal, in which binary values of 1 and 0 were stored as video levels of black and white.

Video is drawn left to right, starting from the top of the screen and moving down. First the odd-numbered lines are drawn, then the even-numbered lines. Each video frame has two fields: the odd field and the even field. The fields are adjacent to each other on the video tape:

Frame n odd, Frame n even, Frame n+1 odd, Frame n+1 even, Frame n+2 odd

There are two video formats:
– 525 lines, 30 frames per second (USA)
  Minus 35 blank lines, leaving 490 lines per frame
  60 fields per second, 245 lines per field
– 625 lines, 25 frames per second (European)
  Minus 37 blank lines, leaving 588 lines per frame
  50 fields per second, 294 lines per field

Three samples could be stored on each line, allowing:
60 x 245 x 3 = 44,100 samples per second
or
50 x 294 x 3 = 44,100 samples per second

44.1 kHz remains the standard sampling rate for CD audio.

Quantization

This has a few names:
– Sample size
– Bit depth
– Word size

The term “quantization” takes its origin from quantum physics:
– Electrons orbit an atom’s nucleus in one of a number of well-defined layers;
– An electron may be knocked from one layer to another, but it can never stay between two layers.

In the discussion of sampling rate, we only considered how often the amplitude of the wave was measured. We did not discuss how accurate these measurements were. The effectiveness of any measurement depends on the precision of our ruler.
(Measuring the thickness of something with many small indentations using a ruler marked only in feet will probably not give a very accurate measurement; we have to estimate many measurements.) Just as there are limits to how often we can sample, there are limits to the resolution of our ruler.

Quantization

Like all numbers stored in computers, the amplitude values are stored as binary numbers. The value that gets stored is the closest available binary number, akin to the nearest marking on a ruler. The accuracy of our measurement depends on how many bits we have to represent these values. Clearly, the more bits we have, the finer the resolution of our ruler: n bits give 2^n available levels.

[Figures: the same wave quantized with 2 bits, 3 bits, and 4 bits. Each change of bit represents a change in voltage level.]

CD audio uses 16-bit quantization.

While aliasing is eliminated if our signal contains no frequencies above the Nyquist frequency, quantization error can never be completely eliminated. Every stored sample lies within a margin of error of the true value that is half the quantization level (the voltage change represented by the least significant bit).
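The "nearest marking on a ruler" idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: amplitudes are taken to lie between -1.0 and 1.0, the 2**bits levels are evenly spaced, and the function name `quantize` and the test values are hypothetical. Real converters differ in how they lay out their levels.

```python
def quantize(value, bits):
    """Round a value in [-1.0, 1.0) to the nearest of 2**bits evenly spaced levels."""
    step = 2.0 / (2 ** bits)              # spacing between adjacent levels
    index = round(value / step)           # nearest level, as an integer index
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    index = max(lowest, min(highest, index))  # stay within the available levels
    return index * step

# Hypothetical amplitudes, chosen to stay inside the range every bit depth can reach.
test_values = [0.7, 0.5, 0.3, -0.26, -0.8]

for bits in (2, 3, 4, 16):
    step = 2.0 / (2 ** bits)
    worst_error = max(abs(v - quantize(v, bits)) for v in test_values)
    # The worst error never exceeds half of one quantization step.
    print(bits, step, worst_error)
```

Each additional bit halves the step size, and with it the largest possible error.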
Quantization

For a sine wave signal represented with n bits, the signal-to-error ratio is:

S/E (dB) = 6.02n + 1.76

The problem is that low-level signals do not use all available bits, and therefore the error is greater relative to the signal.

While quantization error may be masked at high audio levels, it can become audible at low levels. Worst case: a sine wave fluctuating within one quantization increment is stored as a square wave.

Thus, unlike the constant hissing noise of analog recordings, quantization error is correlated with the signal, and is therefore a type of distortion rather than noise.

The problem of quantization distortion is addressed by dither. Dither is low-level noise added to the audio signal before it is sampled.

[Figure: low-level audio signal with dither added.]

Dither adds random errors to the signal, so quantization results in added noise rather than distortion. The noise is a constant factor, not correlated with the signal like quantization distortion. The result is a noisy signal, rather than a signal broken up by distortion.

The auditory system averages the signal at all times. We do not hear individual samples. With dither, this averaging allows the musical signal to co-exist with the noise, rather than be temporarily eliminated due to distortion.

Dither allows resolution below the least significant quantization bit. Without dither, digital recordings would be far less satisfactory than analog recordings. A plucked guitar string, for example, fades into something close to a sine tone; without dither, a guitar sound would gradually turn into the sound of a square wave. With dither, there is significantly less noise in digital recordings than in analog recordings.

Quantization and Sampling Rate

The sampling rate determines the signal’s frequency content. The number of quantization bits determines the amount of quantization error.
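Both the signal-to-error formula and the effect of dither can be tried out in Python. The sketch below is illustrative only: it works in units of one quantization step, uses a constant level of 0.3 steps as a stand-in for a very quiet signal, and assumes dither drawn uniformly from plus or minus half a step. None of these choices come from the slides, and the function names are hypothetical.

```python
import random

def signal_to_error_db(bits):
    """Signal-to-error ratio for a full-scale sine wave: 6.02n + 1.76 dB."""
    return 6.02 * bits + 1.76

def average_quantized(level, trials, dither, rng):
    """Quantize the same level many times and average the stored values.

    level is expressed in quantization steps; the dither (an assumption)
    is uniform noise of up to half a step either way, added before rounding.
    """
    total = 0
    for _ in range(trials):
        noise = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += round(level + noise)
    return total / trials

print(round(signal_to_error_db(16), 2))  # about 98.08 dB for CD audio

rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # a signal smaller than one quantization step
without_dither = average_quantized(level, 100_000, dither=False, rng=rng)
with_dither = average_quantized(level, 100_000, dither=True, rng=rng)
print(without_dither)    # the signal vanishes entirely
print(with_dither)       # close to 0.3: the average recovers the level
```

Without dither every measurement rounds to zero and the quiet signal is lost. With dither each individual sample is noisier, but the average, which is what the ear responds to, sits close to the true level.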
Size of Audio Files

44,100 samples per second x 2 bytes per sample (16 bits) x 2 channels (for stereo audio) x 60 seconds per minute ≈ 10 MB/minute
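The same arithmetic can be written as a small Python helper, which makes it easy to try other sampling rates or bit depths. The function name `audio_bytes` is hypothetical, and the figure covers raw sample data only, with no allowance for file headers.

```python
def audio_bytes(sample_rate, bytes_per_sample, channels, seconds):
    """Size in bytes of uncompressed audio sample data."""
    return sample_rate * bytes_per_sample * channels * seconds

# One minute of CD-quality stereo: 44.1 kHz, 16 bits (2 bytes), 2 channels.
cd_minute = audio_bytes(44_100, 2, 2, 60)
print(cd_minute)               # 10584000
print(cd_minute / 1_000_000)   # about 10.58 when 1 MB = 1,000,000 bytes
print(cd_minute / 2**20)       # about 10.09 when 1 MB = 1,048,576 bytes
```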