Hi-MD Recorder Manual
R. Danielson 06.2007
The Hi-MD “Disaster” List
Hi-MD Set-up Steps
The “EOR” Steps
USB Cable Transfer Steps with SonicStage v3.4
Analog Hi-MD Transfer Alternative
Making Mac & PC Compatible CDs with Nero
Changing File Permissions (Mac)
Hearing the Character of Sound
Recording Voice
Writing & Performing Narration
Sound Terms Glossary
The Hi-MD “Disaster List”
*1 GB Hi-MD discs are ready to use right out of the box, but recordings made on the older,
74 & 80 minute “MD” discs will not transfer digitally unless the disc is prepared using the
Edit, Option and Rec Set Set-Up Steps.
*Forgetting to set manual gain is an easy mistake to make because it looks and sounds like everything
is working okay. You must take the extra steps each time to place the recorder in manual gain mode or
the auto-gain feature will cause location recordings to sound jumpy and fluttery. The recorder
does remain in manual gain mode when stopped with “pause.”
*The recorder can fail to save the data to disc when pressing “stop” if the batteries are too
low to complete this crucial “save to disk” process. Always start recording with a fresh AA
alkaline or NiMH rechargeable battery. When it’s cold, put a fresh battery in every 90 minutes
to be safe. Do not use “heavy duty” batteries because their voltage can drop too quickly and
cause data loss.
*You can only run the required transfer software, SonicStage, on Windows XP. The only way to
achieve transfers with a Mac computer is with an Intel Mac running Windows on
Boot Camp or another Windows host.
*You must transfer your sound recordings to your computer using SonicStage software. You cannot
use the USB cable and drag-copy any useful files from a mounted Hi-MD disc like you would with
a digital camera.
Hi-MD and MD format Recorder
Set-Up Steps for Hi-MD Recording
The Sony Hi-MD recorders can record in two stereo modes:
Hi-SP Mode for Film 116: Standard Minidisc (74 minutes); 1 GB Hi-MD Disk (7 hours & 56 minutes)
PCM Mode for Highest Quality: Standard Minidisc (28 minutes); 1 GB Hi-MD Disk (92 minutes)
The “Hi-SP” mode, the one we will use in class, uses ATRAC3+ compression that is better than MP3
quality. The PCM mode produces uncompressed sound files with 44.1 kHz/16-bit resolution, the same as
pro quality DAT recorders of only a few years ago. The recorder’s very low-noise, high-gain mic
preamplifier has won respect among professional recordists since its introduction in 2004.
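As a rough check on the PCM recording times above, the size of an uncompressed recording can be estimated from the sample rate, bit depth and channel count. The sketch below is an illustration only: it assumes two channels (stereo) at 44.1 kHz/16-bit as described in the text, and the function name is hypothetical.

```python
# Uncompressed PCM size = sample rate x bytes per sample x channels x seconds.
SAMPLE_RATE_HZ = 44_100   # the "44.1K" in the text
BYTES_PER_SAMPLE = 2      # 16-bit resolution
CHANNELS = 2              # stereo

def pcm_bytes(minutes: float) -> int:
    """Approximate size in bytes of a stereo 44.1 kHz/16-bit recording."""
    return int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * minutes * 60)

# The two PCM recording times listed for this recorder.
for minutes in (28, 92):
    print(f"{minutes} min of PCM is about {pcm_bytes(minutes) / 1e6:.0f} MB")
```

By this estimate a 92 minute PCM recording comes to a little under 1 GB, which is consistent with the capacity of a 1 GB Hi-MD disc.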
The recorder will also record onto both older “MD” discs (Minidiscs) and the newer Hi-MD discs,
but both disc types must be formatted in the Hi-MD format in order to allow digital transfers to a PC
via USB cable. Note that the NH700, NH800, NH900, RH-910, and RH-10 recorders will not
make USB transfers on Macintosh computers.
Preparing a MZ-NH700 Recorder for Recording
It is very important to go through these steps before you start recording and, preferably, before you
leave to your location. Refer to this chart when following the steps to locate buttons, jacks, knobs
1. Insert a FRESH 1.5 V AA NiMH rechargeable or alkaline battery into the battery holder with polarity
as shown and close the lid. It will record about 3-4 hours on a fresh battery and is much more
prone to problems when operated on previously used batteries.
2. Slide the “Hold” switch on the side of the recorder to OFF. The correct position is to the LEFT--
opposite to that shown by the arrow.
3. Set to Advanced Menu Mode. Press and hold the Navi/Menu button (#10 upper right hand
corner) for 2 seconds until the menu window activates. Rotate the circular jog wheel until the
word, "Option," is blinking. Press down on the small ENTER button (the button in the middle of
the Jog wheel) to select “Option.” Rotate the jog wheel until the word "Menu Mode” is blinking and
select it by pressing ENTER. Rotate the Jog Wheel until “Advanced" is blinking and select it by
4. Insert a Hi-MD or older MD disc into the recorder with the metal slide on the right as you face the
The “EOR” Steps
EDIT, OPTION and RECORD SET
5. Depending on the type and state of your disk, follow one of these steps:
a) If you want to record onto new 1GB Hi-MD disc, go to step 7.
b) If you are recording on a 1GB Hi-MD MiniDisc or Standard MD disc that you wish to remove all
previous tracks from, press and hold the Navi/Menu button for 2 seconds until the menu
window activates. Rotate jog wheel to EDIT, select it by pressing the ENTER button in the
center of the Jog Wheel. Under Edit -> Erase -> All Erase? -> Yes. After a while it will
indicate "No Tracks."
6. Set Standard MD Disc Mode in Hi-MD format. Press and hold the Navi/Menu button for 2
seconds until the menu window activates. Rotate jog wheel to Option, select it by pressing the
ENTER button in the center of the Jog Wheel. Under Option-> select Disc Mode-> and blinking
7. There are two suggested recording modes: PCM and Hi-SP. It is suggested that you use Hi-SP
mode for the Film 116 class, but you can use the higher quality PCM mode if you wish.
a) To Set in Hi-SP Record Mode. Press and hold the Navi/Menu button for 2 seconds until the
menu window activates. Rotate Jog Wheel until "REC Set" is blinking and press Enter. Select
“Rec Mode” and then “Hi-SP.” Do not use the Hi-LP mode!
b) To Set PCM Record Mode. Press and hold the Navi/Menu button for 2 seconds until the
menu window activates. Rotate Jog Wheel until "REC Set" is blinking and press Enter. Select
“Rec Mode” and then “PCM.”
8. Look at the Menu window. The Record Mode should show “Hi-MD” and either “Hi-SP” or “PCM”
below. If the window shows “MD” or “SP,” this is not right! If so, re-do the above steps, 7-8.
9. Attach the 1/8” stereo mini-plug from your home-made mics into the RED mic jack and your
headphones or ear buds into the BLACK headphone jack.
10. Set/Check Mic Sensitivity (Low or Hi). Press and hold the Navi/Menu button for 2 seconds
until the menu window activates. Use Jog Wheel to navigate to "Rec Set" and then "Mic Sens."
Select "High" if you are recording ambience or "Low" if you are recording loud, close-mic’d voice
or very loud sound effects. (Set to “High” in case of doubt).
11. Set in Manual Record Level Mode. Manual record mode is essential for high quality!! Hold
down on the silver pause button and press the red REC button once. This puts the recorder into
“record-pause” mode (but it’s still in “auto record gain” mode). Press and hold the Navi/Menu
button for 2 seconds until the menu window activates. Rotate jog wheel until "REC Set" is
blinking. Select “REC Volume,” and then, “Manual.”
You can now adjust the record gain manually with the Jog Wheel. Set the level so the peaks (or
loudest moments) fall just short of the last segment. The last segment is right under the last digit
of the time counter (the same position as the “0” of the “30” in the below).
You can change the record level while the deck is recording.
12. To start recording, press again on the “Pause” button. When you are ready to stop this recording,
press on the PAUSE button again. Every time you use the pause button, it creates a new track.
When you finish recording a certain type of sound or location, you can group the pause-created
tracks by pressing STOP. Do not jostle the deck or disrupt the power after pressing stop. It’s
best to wait patiently while the recorder saves the recordings to disk. When you start another
recording after pressing STOP, you will have to go through the steps to place the deck into
manual record mode again.
13. To raise the headphone volume, lift up on the top edge of the ENTER button. To lower the
headphone volume lift up on the lower edge.
14. Another way to create a new track while you are recording is to press the "T mark" button (which
is the other function of the red REC button). This will create a new track at this point.
15. Allow the Recorder to Save the Files. After pressing STOP, the recorder needs to write the
file to the disk in order to save it. This step can take 10-15 seconds. Do not jostle or disrupt the
deck or power to it when it is writing because it might prevent the recordings from being written
correctly or at all.
16. Playing Recordings. Insert the disc, press the left or right edge of the ENTER button to shuffle
through your tracks.
17. Prevent accidental erasure of recorded discs by sliding the white tab on the rear edge of the disc
until the slot is open. You must, however, close the tab again before attempting to transfer files
18. Make sure to label the disc and keep it separate from other, un-recorded discs until the
recordings are transferred and backed-up on CD-R or DVD-R.
For ideas about mic mounting options for recording in the field, see
USB Cable Transfer of Hi-MD Sound files
using SonicStage 3.4
The following steps are for PC computers only. Search “SonicStage 3.4 download” or see syllabus for
the download link for the free Sony SonicStage software for installation on your PC. If you have a Mac
computer, there are PC computers in MIT B-18 and B-56 with this software installed.
1. Your recordings must have been made with a PC compatible Hi-MD recorder like the NH-700, 800,
900, 910 or RH-10 on a Hi-MD-formatted disk. Make sure the disk’s erase protection slot is closed. (If
the disc shows “SP” in the LCD window, it was not Hi-MD formatted and the recordings will have to be
transferred with an analog cable. See end of this section for more detail.)
2. Start or restart the PC. If you are at school, log on to the PC using the "student" option. Leave the password
3. Connect your Sony Hi-MD recorder to the PC using the USB cable that is with the platform or in your
Recorder Kit. The USB port is next to the mouse port on the rear of the PC.
4. If you are working on a PC in the Film Dept, open the folder on the desktop titled, "Converted Wav Files
here." If there are any files in this folder, drag them to the Recycle Bin.
5. Launch the SonicStage v3.4 software using the Icon on the desktop.
6. Click on the “TRANSFER” button towards the top right of the SS window. Slide in the Hi-MD formatted
disk you wish to transfer sound files from. After the disk mounts, your recordings should show as one
or more “Untitled” folders on the right side. Click on the folders and open them. If you scroll the
Transfer window view over to the right, you can see the duration of each track in minutes and the
mode (PCM or Hi-SP) it was recorded in.
7. To listen to a sound file, click on it once and press the Play “>” button. A pair of headphones needs to
be plugged into the jack in the rear of the PC in order to hear it. Stop playing the file. To give the
sound file a relevant name, click on a sound file's name in the Hi-MD disk list, press F2, and type the
new name in the title bar, e.g. “200609015MT_ScrapingFence.” Do not exceed 23 characters or it will
have to be renamed before importing it into the editing software.
8. In the top right hand of the My Library window there's a free space meter. You need at least 2GB to
transfer a 1GB disk recorded in PCM mode and 5GB to transfer a whole disk recorded in the Hi-SP
9. Hold down the Control key and individually select the files you wish to transfer from your Hi-MD disc in the
right panel. You can also use "select all" or click on the folder if you want every file in that folder.
When you are sure that you have selected all of the recordings you want, click the Red Transfer button
(left facing arrow). The transfer process takes about 30 minutes for the transfer and wav conversion of
a full 1GB disk and less for a standard MD disk.
10. Be very careful not to disconnect the Hi-MD recorder or the power to the PC during this transfer.
11. SonicStage v3.4 will first create ".oma" files and then convert them into .wav files automatically
(unless the recording is in excess of 180 minutes long, see below). You can play the oma files on the
My Library side to hear that they transferred okay. Note that ".wav" files are compatible with Mac and
Logic; ".oma" files are not.
12. Dividing a recording that is longer than 180 minutes. You'll need to "divide" any ".oma" file into
two smaller parts so none of the parts is longer than 180 minutes. OMA files are the ones that show
up on the "My Library" side of the SS window. Using the "divide" command in SonicStage doesn't
destroy any material, it splits the file into parts. (If you don't reduce length, the .wav file will be
longer than 2GB and will show as "damaged" when trying to import it into Logic. If you don't notice
this before, you can re-transfer the files and perform the below steps).
a) Look in the folder named “Converted Wav Files Here” for your files. If any of the files in this
folder is larger than 2GB (or >2,000,000 KB), make a note of its exact name and length in
minutes and drag it to the Recycle Bin. You'll probably need to empty the bin.
b) Look in the “My Library” side of SonicStage and find the (large) file with the same name as the
one larger than 2GB. If it is longer than 360 minutes, it will have to be divided into three parts.
c) To divide a file in half, make sure it's not playing and select it. Under Edit, select "Divide." This will
open up a work window. Slide the upper slider about midway and select "Start Divide." Next it
will show you the length of the first segment created by the divide. If it's less than 2GB, click on
OK. A new segment will be created in the Library. If the second segment is still longer than 180
minutes, divide the 2nd segment with the previous steps.
13. Converting the New, <2GB Segments into .wav files. Select all of the segments that need to be
converted into the .wav format. Under Tools, select "Save in .Wav Format." In the window that opens,
navigate to the place you would like the files placed and click OK.
14. To transport them from the computer or to make a back-up copy of them, copy all of the .wav files
you created to a data format CD-R or DVD-R.
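The 180 minute limit in step 12 can be worked out ahead of time. The sketch below is only an illustration: it assumes the converted .wav files are stereo, 16-bit, 44.1 kHz (the resolution the manual gives for PCM mode), and the function names are hypothetical. It estimates how many parts a long recording needs and suggests evenly spaced divide points.

```python
import math

PCM_BYTES_PER_MINUTE = 44_100 * 2 * 2 * 60  # assumed stereo, 16-bit, 44.1 kHz
MAX_PART_MINUTES = 180                      # limit given in step 12
TWO_GB = 2 * 1024**3

def parts_needed(total_minutes: float) -> int:
    """Smallest number of parts so that no part exceeds 180 minutes."""
    return max(1, math.ceil(total_minutes / MAX_PART_MINUTES))

def divide_points(total_minutes: float) -> list[float]:
    """Evenly spaced divide points, in minutes from the start of the file."""
    n = parts_needed(total_minutes)
    return [total_minutes * i / n for i in range(1, n)]

# Under the assumed format, a 180 minute part stays below 2GB once converted.
assert MAX_PART_MINUTES * PCM_BYTES_PER_MINUTE < TWO_GB

print(parts_needed(400), divide_points(400))
```

A recording longer than 360 minutes comes out as three parts, which matches step 12b.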
Analog Hi-MD Transfer Alternative
If you made a mistake and recorded in SP rather than Hi-SP mode, connect a cable with a 1/8” stereo
“mini-plug” on one end to the headphone output of the Hi-MD and turn up the deck’s headphone
volume all the way. Connect the other end to the LINE input on your computer (which is usually either
a 1/8” stereo input too or two phono plugs for the left and right channels, as pictured below).
1/8” stereo “miniplug” to 2-RCA “phono” plug adapter cable
Use the application Audacity (mac and pc) for making a digital recording from the analog input. Instructions on the
Making PC and Mac Compatible Data
Disks with Nero
These steps assume you have already created .wav files on a PC that are ready to be saved
to disk. If any of your sound files is larger than 700 MB, use a DVD-R blank. A DVD-R will hold a
little over 4.3 GB.
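The choice between a CD-R and a DVD-R can be sketched as a simple size check. This is an illustration only, using the capacities quoted above (700 MB and 4.3 GB, taken here as decimal units); the function name is hypothetical, and applying the 700 MB threshold to the total size rather than to single files is an assumption.

```python
CD_R_BYTES = 700 * 1000**2    # "max 700mb capacity"
DVD_R_BYTES = 4_300_000_000   # "a little over 4.3 GB", taken as a floor

def choose_disc(file_sizes: list[int]) -> str:
    """Suggest a blank disc type for a set of .wav file sizes in bytes."""
    total = sum(file_sizes)
    if total > DVD_R_BYTES:
        return "split across several discs"
    if total > CD_R_BYTES:  # also true whenever a single file exceeds 700 MB
        return "DVD-R"
    return "CD-R"

print(choose_disc([300_000_000, 200_000_000]))
print(choose_disc([900_000_000]))
```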
1. Launch Nero by double-clicking on its icon on the desktop.
2. Steps for making data CD-Rs (max 700 MB capacity)
a) In the New Compilation Window that opens, select CD in upper left hand corner.
b) In the vertical window on the left, select CD-ROM [ISO] (the top one in the list).
c) In the middle window, click on the ISO tab at the top. In the settings window, check
these: Data mode: Mode 1; File System: iso9660 only; File Name Length: Max of 11
=8+3 chars (level 1); Character Set: ISO9660 (standard ISO CD-ROM); Turn off all
d) Click on “New” in upper Right Corner.
e) Click on the title bar next to the disk icon in the window that opens and type in a name
that is not in excess of 11 characters in length, e.g. “CD001_Jones”
f) To add .wav files from SonicStage to this CD-R disk, open the folder on the desktop,
“Converted .WAV files here,” and drag all of the files that you wish to copy from this
folder into the far left panel of the Nero window. They will appear in the middle pane.
g) Click on the disk icon with the Red Flame to initiate the last step.
h) Click on the ISO tab to make sure your settings haven’t changed. Click “Burn.”
3. Steps for making data DVD-Rs (max 4.3 GB capacity)
a) In the New Compilation Window that opens, select DVD in upper left hand corner.
b) In the vertical window on the left, scroll and select DVD-ROM (UDF)
c) In the middle window, UDF partition type: Physical partition; File System Version: UDF
1.02. Force DVD: leave box unchecked.
d) Click on “New” in upper Right Corner.
e) Click on the title bar next to the disk icon in the window that opens and type in a name
that is not in excess of 11 characters in length, e.g. “DVD01_Jones”
f) To add .wav files from SonicStage to this DVD-R disk, open the folder on the desktop,
“Converted .WAV files here,” and drag all of the files that you wish to copy from this
folder into the far left panel of the Nero window. They will appear in the middle pane.
g) Click on the disk icon with the Red Flame to initiate the last step.
h) Click on the UDF tab to make sure your settings haven’t changed. Click “Burn.” Insert a
blank DVD-R disc when prompted.
i) At the prompt regarding burning option, select “Burn without multi-session.”
j) While it is burning, click in the box for “verify written data.”
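Both the transfer steps and the Nero steps set a length limit on names: 23 characters for sound file names and 11 characters for disc names. A quick way to catch names that are too long before burning might look like the sketch below. The function name and the sample names other than “CD001_Jones” are hypothetical.

```python
MAX_TRACK_NAME = 23  # sound file name limit from the SonicStage steps
MAX_DISC_NAME = 11   # disc name limit from the Nero steps

def too_long(names: list[str], limit: int) -> list[str]:
    """Return the names that exceed the given character limit."""
    return [name for name in names if len(name) > limit]

# Hypothetical examples for illustration.
tracks = ["060915MT_ScrapingFence", "060915MT_ScrapingFenceGate"]
discs = ["CD001_Jones", "FieldDisc_Jones"]

print(too_long(tracks, MAX_TRACK_NAME))
print(too_long(discs, MAX_DISC_NAME))
```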
Changing Read-Write Permissions
on Hi-MD Soundfiles
• For reasons we have not yet determined, the .wav sound files created by SonicStage can acquire “Read
Only” permission and will not import into your audio editor.
• To give them “Read and Write” permission, first drag-copy them from your data CD-R or FW Hard Drive to the
“Field Recordings” folder in your Audacity Session folder. Once in this folder, group
select ALL of them and press the “APPLE” and “I” keys. In the Multiple Item Information window that opens,
under “Ownership and Permissions,” select “Read & Write” from the pull-down menu.
This changes the permission of all the .wav files in this group.
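The same change can also be made with a short script instead of the Information window. This is a minimal sketch, not the method described above: it assumes a Mac or other Unix-like system, and the function name and folder path are hypothetical. It adds owner read and write permission to every .wav file in a folder.

```python
import os
import stat
from pathlib import Path

def make_writable(folder: str) -> list[str]:
    """Add owner read/write permission to each .wav file; return names changed."""
    changed = []
    for path in sorted(Path(folder).glob("*.wav")):
        mode = stat.S_IMODE(path.stat().st_mode)       # permission bits only
        wanted = mode | stat.S_IRUSR | stat.S_IWUSR    # add owner read + write
        if wanted != mode:
            os.chmod(path, wanted)
            changed.append(path.name)
    return changed

# Hypothetical usage; point this at your own "Field Recordings" folder:
# print(make_writable("/Users/yourname/Session/Field Recordings"))
```

Note that the pattern `*.wav` is case-sensitive on most Unix-like systems, so files ending in `.WAV` would need a second pattern.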
Hearing the Character of Sound
Sound is the sensation of feeling the vibrating patterns in air. All vibrating objects produce
some sort of sound and in typical city environments, there are many vibrating things creating
sounds. Perhaps because of the constancy and relative loudness of urban sounds, we rarely stop to
evaluate their qualities. These so-called "concrete" sounds have rhythms and tonalities too, but one
has to learn to discern them.
Musical instruments are made to produce clear, distinct notes all with similar volume. In
contrast, sounds in the environment have much greater range of tonality and volume.
Environmental or "concrete" sounds are more complex, but our brains must still use subtle
differences in tonality and loudness to identify them-- for example to distinguish the "crunch" of an
apple from the "crinkle" of cellophane. No matter the source, the identity of a sound is a unique
arrangement of two sonic characteristics: loudness and pitch. These characteristics can be portrayed
in relation to time through a graph called the "waveform."
Sound vibrations from many sources and directions arrive at our ears each and every
moment. To make sense of this din, the brain constantly discerns three basic levels in loudness. In
a typical living room, the loudest sounds might be one's own breathing, someone talking, a radio
playing or the pages of a notebook being turned. Of lesser volume might be sounds in the
immediate room like water running down a drainpipe, a furnace fan, a parakeet or a roommate's
computer keystrokes. If one stops to listen carefully, there's another level of sound coming through
the floor, ceiling, walls and windows. These softest sounds come from local street traffic, freeway
traffic, humming factories, airplanes, human voices, sirens, car alarms, snow shoveling, lawn
When two sources are generating sounds with similar loudness, the brain can also distinguish
them by sensing their location in space. This is done through the small differences in time (phasing)
as the sound reaches both ears. A major limitation of film and video soundtracks is that all of the
sounds (and all of the volume differences) come from only one or possibly two locations-- the
speakers. As a result, differences in loudness are the chief means of creating any sense of acoustic
space in a soundtrack.
Humans also hear a wider range of sound energies than most speaker systems can
reproduce. To aid in portraying this range, engineers have assigned a logarithmic unit called the
decibel or "dB" to the phenomenon of apparent volume. Each increase of 6 dB produces a doubling
of loudness to the human ear. On this scale, human hearing spans about 20 doublings of loudness
from the faintest detectable sounds to the most powerful one can withstand without distortion:
[Figure: loudness scale in 20 steps from 0 dB (threshold of human hearing) to 120 dB (onset of distortion & pain). Each 6 dB step doubles apparent loudness.]
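The arithmetic behind this scale can be sketched in a few lines. The example below simply applies the rule of thumb stated in the text (each 6 dB step doubles apparent loudness); the function names are hypothetical.

```python
DB_PER_DOUBLING = 6  # rule of thumb used in the text

def doublings(db: float) -> float:
    """Number of loudness doublings represented by a level difference in dB."""
    return db / DB_PER_DOUBLING

def loudness_ratio(db: float) -> float:
    """Apparent loudness ratio implied by a level difference in dB."""
    return 2 ** doublings(db)

print(doublings(120))      # the full span of human hearing on this scale
print(loudness_ratio(12))  # two 6 dB steps
```

Under this rule the 120 dB span of human hearing works out to 20 doublings, as stated above.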
One limitation in sound reproduction is similar to the problems associated with narrow latitude
film stocks. The range of sound energies in the real world is larger than any sound system can
reproduce. The range is limited both by a "noise bed" at the lowest level (often the room tone or the
hiss/hum of the audio amplifier) and distortion at the highest level. This range between the noise
bed and the peaks is called the signal-to-noise ratio and it determines the dynamic range of the
audio system. Digital recorders have recently boosted signal-to-noise ratios into the 90 dB
range, but "real world" home stereos, car radios, boom boxes and TV's are used in environments
with loud background ambiance.
In a typical, urban, living room, passing traffic, nearby factories and freeways create a
constant background "rumble" that penetrates the walls producing a loudness of around 36 dB.
("Residential Interior at Day" on chart below). Though we are usually unconscious of it, this
continual "presence" is about 50 times louder than the faintest level we could otherwise detect.
Even the thick stone walls of Mitchell Hall absorb about 50% more of this external sound energy so
the "quiet" of our B-61 studio is only relative.
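[Chart: loudness scale from 0 dB (threshold of hearing) to 120 dB (jet take-off), with marks at 30, 36, 76, 80 and 100 dB. Labeled points include B-61 Sound Studio at Night, Residential Interior at Day, Conversational Speech at .75' and Curbside Traffic; a bracket shows the Dynamic Range of TVs, boomboxes, car radios, etc.]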
Such loud ambiance in places where videos and films are played makes it difficult for sound
artists to create the illusion of space in soundtracks. Here's an example of how this happens:
A person conversing in a living room produces sound energy around 76 dB. By subtracting
the room ambiance of 36 dB from 76, we see that speech is about 40 dB louder. Note that this 40
dB is the same as the entire dynamic range of most "real world" sound reproduction settings.
[Chart: the same 0 dB to 120 dB scale with marks at 26, 36, 58, 66, 80 and 100 dB, showing Interior at Day and Speech at 4 feet against the Dynamic Range of TVs, boomboxes, car radios, etc.]
In order to hear background sounds in a soundtrack with a person speaking, the background
sounds must be mixed-in at a higher level in order to be heard above the loud room ambiance.
Middle ground effects must also be mixed-in higher. As a result, the levels of volume come closer
together than they are in the real world. The closer they are, the harder it is for the listener to
distinguish them and perceive the soundtrack "space." You can think of it like latitude in film: the
softest and loudest parts of soundtracks are lost during playback like the light and dark grays of film.
Everyone is familiar with the tonal variations of sounds produced by musical instruments and
the pleasures of hearing tones in rhythmic patterns. The low tones of a bass guitar can be easily
discerned from the higher ones of the soprano saxophone. In a typical quartet the tones of the
rhythm guitar and the vocals might be located somewhere in between.
Scientists have determined that the tonality of a sound is a function of the number of
vibrations made by the source object occurring per unit of time. The more vibrations produced per
unit of time, the higher the pitch. The vibrations are actually air molecules moving back and forth in
sympathy with the vibrating object (guitar string, drumhead, vocal cord, etc.). As the air molecules
vibrate back and forth, they change direction or polarity. One complete back and forth movement is
called a cycle. At first, scientists used the abbreviation c.p.s. for "cycles per second" but this term
has been slowly replaced by Hz, an abbreviation for Hertz, the last name of the researcher who
discovered the tone/vibration phenomenon.
Human ears are able to detect sound vibrations from around 20 cycles per second (or 20 Hz) to
about 20,000 Hz (or 20K Hz). Although most sounds have several pitches occurring at once, one
tone is usually louder. The lowest frequency produced, the fundamental, is often the loudest.
Below is a chart of the fundamental frequencies of common sound sources along this span:
[Chart: Locations of fundamentals of some common sounds on the Audio Frequency Spectrum. The axis runs from 20 Hz to 20K Hz; labeled sources include bass guitar, bass drum, rhythm guitar, piano, lead guitar, the human voice, traffic rumble, cymbal "sizzle," cassette tape "hiss" and wind rustling tree leaves.]
In addition to a fundamental frequency, most natural sounds have structured higher
frequencies as well. For example, a female voice may have a loud, fundamental frequency of 180 Hz
and several loudness peaks at 360 Hz, 540 Hz, 720 Hz and 1080 Hz. These higher frequency
intervals are called harmonics. They usually occur as even and/or odd number multiples of the
fundamental (e.g., 180 x 2 = 360; 180 x 3 = 540; 180 x 4 = 720; 180 x 6 = 1080). When even numbered
multiples of the fundamental (2, 4, 6, etc.) are pronounced, the sound tends to be fuller, with
more resonance or "ring." When odd numbered multiples of the fundamental (3, 5, 7, etc.) are
pronounced, a sound seems dissonant or harsh.
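The multiples in the 180 Hz example can be listed with a short sketch. It follows this text's convention of counting harmonics as whole-number multiples of the fundamental and labeling each as even or odd; the function names are hypothetical.

```python
def harmonic_series(fundamental_hz: int, highest_multiple: int) -> dict[int, int]:
    """Map each multiple (2, 3, ...) to its frequency in Hz."""
    return {n: fundamental_hz * n for n in range(2, highest_multiple + 1)}

def parity(multiple: int) -> str:
    """Label a multiple as 'even' or 'odd', as in the text."""
    return "even" if multiple % 2 == 0 else "odd"

for n, hz in harmonic_series(180, 6).items():
    print(f"{n} x 180 = {hz} Hz ({parity(n)})")
```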
Harmonics become an interesting factor when combining tones. If an instrument or other
sound has pronounced harmonics with pleasing even interval combinations, musicians say the
instrument has good color. Major chords, which some feel contribute a sense of optimism to a
composition, use even harmonic intervals. On the other hand, composers like Glenn Branca, La Monte
Young and Tony Conrad prefer the emotions and dense, shifting dissonances of odd numbered
harmonics. The ever-present hums of 60 cycle motors often produce strong even numbered
harmonics but the result is more sinister than uplifting. Natural and environmental sounds tend to
have more complex harmonics.
When the elements mixed together to create a soundtrack are spread across the frequency
spectrum, the soundtrack could be said to be of High Fidelity. The term doesn't imply that the
soundtrack is extraordinarily "truthful," rather that there is a co-mingling of a range of tones making
the soundtrack seem full. High Fidelity was coined 50 years ago when only the best audio systems
could reproduce this wide tonal range. Today's equipment can reproduce it, but it's up to the sound
artist to design and place the right elements into the soundtrack composition.
Soundtracks lacking high fidelity often have sound elements with frequencies that overlap and
obscure each other. This "muddiness" of two sounds competing for the same region of the
frequency spectrum is called masking. A rhythm guitar with a pronounced loudness at 240 Hz, for
example, can easily mask the vowel sounds of a female vocalist. An alert sound person might
reduce the volume of the guitar in this part of the frequency range to allow the vocal more space in
the mix. Adjusting the volume of a sound at a certain frequency is known as "equalization." or just
The ability to hear low, medium and high pitch sounds helps the recordist in the field as well.
The closer a microphone is to soft and moderate level sound sources, the more the low bass and the
high treble tones will be evident. As cassette recordings tend to naturally diminish the high tones,
making sure the microphone is close is a good way to maximize fidelity.
The last characteristic adds the dimension of time to the previous two characteristics: pitch
and loudness. Waveforms are increasingly used since the advent of Digital Audio Workstations,
which use waveforms to depict sound recordings.
Compared to other senses, the events we notice as "sounds" tend to be quite brief. Within
this generally brief range, minuscule differences in their duration can create substantial changes in
their sonic character. Sounds that are very brief are sometimes called "percussive" because of
their quick, drum-like quality. Sounds that last long enough to discern some of their tonal
character are called "sustained." Sustained sounds may be as brief as 1/4 second or they may
continue for many seconds or hours.
A graph that portrays the duration of a sound (time) in relation to its loudness (voltage) is
called a waveform. Below is a waveform of a brief sound called a "square wave." All waveforms
have up and down symmetry because the back and forth vibrations of the air translate into plus and
minus polarity changes of the electrical signal. Pictured here are two cycles of a square wave,
each wave cycle having a + (plus-up) and - (minus-down) section. The amplitude (loudness) of the
first cycle is greater than that of the second.
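[Figure: two cycles of a square wave plotted against time, with + above and - below the centerline; the first cycle has greater amplitude than the second.]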
If a similar kind of sound were a bit longer in time and had a continuous volume level, it would look
something like this:
[Figure: square wave of constant amplitude, with individual cycles marked, lasting 1/3 of a second.]
We can derive these characteristics from the above waveform:
1. We can see that 8 complete wave cycles occur over a duration of 1/3 second. To compute
this in cycles per second (Hz) and identify its pitch, we simply multiply 8 times 3 to
determine how many cycles occur in one second. This sound has a frequency of 24 Hz.
2. By the instantaneous, sharp rise in amplitude at the beginning of the waveform, we can
discern that this sound begins very abruptly. At moment "a" in time, the wave is at the
centerline where no positive or negative impulse occurs. But at moment "b," a fraction of
time later, there is full volume. The character of the beginning of a sound is very noticeable
to the ear and is called its attack. In this case, the slope of the attack is at its maximum, a
vertical line, where the sound begins instantaneously.
3. The ending is also abrupt. At moment "y" in time, there is volume but at moment "z," a
fraction of time later, there is no sound. The character of the ending of a sound is also very
noticeable and it's called the decay. In this case the slope of the decay is minimum and
the sound stops instantaneously.
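The frequency calculation in item 1 generalizes to any waveform where you can count cycles over a known duration. A minimal sketch, using exact fractions so that 1/3 of a second introduces no rounding error (the function name is hypothetical):

```python
from fractions import Fraction

def frequency_hz(cycles: int, duration_s: Fraction) -> Fraction:
    """Frequency in Hz = number of complete cycles / duration in seconds."""
    return cycles / duration_s

# 8 complete cycles counted over 1/3 of a second, as in the square wave example.
print(frequency_hz(8, Fraction(1, 3)))
```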
The square wave provides a good introduction to waveforms because it is simple in relation
to naturally occurring sounds. Let's evaluate another waveform: a single tom tom drum beat which
is complex by comparison.
[Figure: waveform of a single tom tom drum beat plotted against time, marking moment "D," the fundamental frequency, the higher pitched frequencies and five cycles of the longest, fundamental wave.]
1. The steep sloped attack to the maximum loudness indicates that the sound begins very
quickly towards a volume peak at the letter "D."
2. The diminishing height of the waves to the right indicates that loudness falls off (decays)
at a steady rate. The decay lasts a little over 1/40th of a second.
3. Much of the complexity of the waveform comes from the presence of the superimposed
waves. These multiple waves are the different tones or frequencies in the sound. The
longest wave (widest on the time axis) has five cycles over the duration of 1/40th second,
or 40 x 5 = 200 Hz. This is the fundamental. The higher pitched tone has 15 cycles over
the same duration, or 15 x 40 = 600 Hz. If we divide this tone by the fundamental,
600/200 = 3, we can see that this wave is the third harmonic. With this pronounced odd
harmonic we can see that the sound is fairly dissonant.
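The harmonic number in item 3 is the ratio of the higher tone to the fundamental. Because both tones are counted over the same stretch of time, the ratio of their cycle counts gives the same answer. The sketch below works through the tom tom figures from the text; the function name is hypothetical.

```python
from fractions import Fraction

def harmonic_number(tone_cycles: int, fundamental_cycles: int) -> Fraction:
    """Ratio of a tone to the fundamental when both are counted over the same duration."""
    return Fraction(tone_cycles, fundamental_cycles)

window_s = Fraction(1, 40)    # both waves are counted over 1/40th of a second
fundamental_hz = 5 / window_s # five cycles of the longest wave
overtone_hz = 15 / window_s   # fifteen cycles of the higher pitched tone

print(fundamental_hz, overtone_hz, harmonic_number(15, 5))
```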
Next let's look at the waveform for the sound of the letter "s" in human speech.
[Figure: the sibilant "s" of human speech with its complex mixture of high frequencies.]
The multitude of high frequencies in this sound is made evident by the numerous short
wave widths and overall complexity. Sounds with this appearance and "white noise" sonic quality
are often said to possess texture. The earlier square wave example with only one low pitch
conversely has very little texture.
With the waveform, some of the unique characteristics of sound become visible. When one
becomes familiar with the characteristics of sound, it’s possible to think more musically and less
literally when considering elements for soundtracks. If a mix with high fidelity, tonal variety and
depth is desired, it’s probably more efficient to select a sound for its character than for its source.
Recording Voice
The human voice is a wonderfully complicated, subtle instrument that we humans understand
very well. Our abilities to detect psychological and physical subtleties in a voice seem to be
expanding and the representation issues involved are certain to be of increasing interest to artists in
coming years. Working with voice also serves as an excellent introduction to the roles of rhythm and
expectation in sound composition.
To capture the subtleties and give maximum editing flexibility, a voice should always be
recorded as an isolated foreground element. Every effort should be taken to ensure close-micing with
the correct microphone and saturation peaks below the “0” maximum.
Here are a few other tips you might want to keep in mind:
1. Use a lavaliere mic clipped 9" from the mouth on the person's lapel, shirt, blouse etc.
Don't clip the mic onto jewelry or ruffles; attach it in an area free of material contact. (Ask
them to remove jewelry if it jingles.) Allow the mic cord to have enough slack for your
subject to move comfortably but not so much that their arms get tangled up in it constantly.
When the person turns their head to different positions, the omni-directional pattern of the
lavaliere will maintain a constant audio level.
2. Subject’s Position. Avoid positioning your subject next to a wall or table. These
surfaces create reflections that induce muddiness and adversely affect tonality. If they prefer
to sit at a table, place a soft cloth like a bath towel on the table right in front of them. If
there is something in the background that makes a sound you can't turn off and you can't
move to another room, position the subject away from the source with their back to it. Their
body will absorb some of the sound. If wind noise is a problem, have them turn their back to
the wind.
3. Your Position. If you are talking with your subject, the distance between you and them
will affect the loudness and character of their delivery. Speech has more subtleties at
comfortable, more intimate output levels. Generally, a distance of 2-5 feet creates this
kind of social distance and character. If desired, you can induce more strain in their voice by
positioning yourself further away, but don't move the mic.
4. Headphones. Use headphones to check for good separation from background sounds,
good cable connections and to make sure you are not picking up unwanted clothing sounds
etc. Enclosed headphones will make it easier to detect unwanted background sounds.
Writing & Performing Narration
If you are thinking about writing text for yourself or someone else to perform, here are some
stages of the writing process to consider:
1. Speak to one person. The possibility of a "general audience" is a prohibitive notion when
putting an idea into words. Who we talk with has a profound impact on what and how we speak.
One technique many writers and performers use is to think of a particular person and imagine
speaking intimately with that one person. Often the addressee is someone they would genuinely
like to share the information or concern with-- someone it would give them genuine satisfaction to
"reach." Some challenge in the communication helps. Though written for one person, others can
easily interpret the resulting message. The technique also lends consistent tone and guides the
author through many content decisions. Having difficulty getting started? What's the first thing
you would say to this person when broaching the topic?
2. Keep language conversational and simple. As viewers will be attending to images as well in
the final product, it’s important to keep sentence structure, word choice and overall logic as simple
as possible. Make your sentences compact. Remove all unnecessary phrases and qualifiers.
Avoid long words, jargon and tech talk unless you are critiquing these specific terms.
3. Speak it out loud. What you are writing will be spoken and rhythm is crucial in hearing key
words, constructing meaning and minimizing confusion. If a sentence doesn't flow easily from
your lips, it needs work. Often it is too long or is phrased awkwardly. Always read prior
sentences to establish flow. Short sentences that flow: this is what you want.
4. Address concepts one at a time. Determine the logical "steps" required to understand the
information, perception, or story you wish to convey. Form these mental steps into clear, one-
paragraph statements. Developing clarity and structure in narration is something like climbing the
trunk of a tree: encourage the viewer to step from central point to central point without
digressing out on any limb. Pauses in the delivery give the viewer time to think about what has
just been said. Pauses that fall between paragraphs create the opportunity for visual illustration.
5. The first and last things you say are the most memorable. The first utterance creates
expectation for what is to follow. The viewer also tends to "hang onto" or remember words that
occur at the ends of sentences. When a word is critical, try arrangements that place it at the end
of the sentence.
6. Consider the obvious. What are some common preconceptions about the topic or situation
you're introducing and discussing? In what ways are you in agreement with these attitudes and
perceptions? In what ways are you in disagreement? If the answer to the latter question is "I
don't know..." or "very little," it’s a good idea to stop and do some more thinking about your
perceptions. Art is concerned with the growth of ideas, but it’s generous to acknowledge common
ground first. You might notice that narration often begins with clichés, idioms and hearsay before
moving quickly towards a unique angle.
7. Ask questions. Dramatic structure depends on saying something curious-- something that flirts
with fresh possibilities of "truth." Listeners need to sense that there are important consequences
to what is being said. Statements of truth and opinion are off-putting. One technique is to
introduce topics with questions or descriptions that go against normal expectations. Use
experiences the viewer can readily identify with. Ideas can be complex, but delivery and word
choice should remain simple.
A good reading performance is usually based on connecting the ideas contained within the
text to actual experience. When this is accomplished, qualities in the performed voice reflect the
relevance. Here are some ways to encourage this process.
1. Type or clearly print the words with double-spacing on one page so that you can add notations
and not have to turn the page during the reading. It’s hard to deliver more than one page per
2. If you are asking someone else to read your writing, permit them to change words and phrasing
to establish rhythm and flow.
3. Readings sound more natural when delivered as if to only one person sitting only a few feet
away. Close-microphone technique helps to encourage this informal tone and is essential for
intelligibility and clarity. If you're using a lavaliere, position the microphone about 9" away
from the mouth. One can also use the built-in microphone on a camcorder to record narration.
Place the camcorder on a table or in your lap with the microphone no more than 6" away. Turn
off all background sounds like fans, radios, computers, etc. Unplug the phone, ask your
roommates to be quiet and close the windows and doors to the room. If possible, position yourself
in the middle of the room away from the walls.
4. Use headphones while you are rehearsing and recording to hear enunciation, tone of voice etc. Sit
with your back straight and the script directly ahead of you. A glass of water can help keep your
palate fresh and articulation clearer.
5. First, record 2-3 takes without trying to hide that you are reading a script. Read as if for a close
friend who is sitting only a few feet away. Pause a second or two between each take to allow for
editing flexibility. Sometimes these lower key readings seem quite honest even though they
may have miscues.
6. Before recording another batch of takes, spend some time working on voice clarity. Pay special
attention to the consonant sounds of speech, which require more articulation than vowels.
Read your script all the way through 2-3 times making each word very distinct and clear.
7. Next, practice 3-5 additional readings with increasing speed while maintaining clarity. Basically,
read it as fast as you can without mistakes. If a sentence or passage gives you trouble
repeatedly, try re-writing it. Dividing a complex sentence into two shorter sentences can help. Is the
point of the problem sentence or phrase clear? Is there a simpler way to say it?
8. Now you're ready to start recording again. Take a sip of water and a moment to recall how you
relate the content to your personal experience. Start the recorder. As you think about your
personal experience, begin reading the first line in a normal tone of voice at a relaxed speed.
Record a few more takes continuing to think about how the material relates to your own
experience as you read. If you make a mistake, go to the top of the paragraph, pause and start
the paragraph again.
9. After you have performed two or three complete takes which seem like they could be usable
material, rewind the tape and listen to all of the takes you've recorded thus far.
10. If the performances still seem to lack feeling and interest, read over the text and underline a
word or two in each paragraph which could benefit from special emphasis. Are there any
phrases that seem to sound better when delivered slower or faster? Underline phrases to be
sped up with a straight line and those to be slowed down with a squiggly line. Practice these
changes in emphasis and speed and record a few more takes.
11. If the writing seems okay but the performance is still lackluster, it’s probably a problem with the
emotional tone of the voice. Some performers use music or location recordings to get them in
the mood. If you have a suitable recording, play it a while and stop it just before you start reading.
If you have another tape player and another pair of headphones, you can play the music or
location recording in one side of one earphone pair while monitoring your voice through one side
of the other pair. Don't play the music or other sounds out loud, or they will be picked up
and recorded with the voice. This will create editing headaches or make the material unusable
even if you plan to add the same sound later.
12. Before you put away the recording equipment, record at least 30 seconds of "room tone" in case
the narration has to be edited. To record "room tone," leave the microphone and the record
level in place and sit quietly as the recorder runs for at least 30 seconds. If there is
considerable background sound like traffic where you are recording, let the recorder run for at
least 5 minutes. If you can, leave the recorder set-up as is until you have evaluated your takes.
If you have to re-record a few sections, the recordings will match much better.
Glossary of Sound Terms
Ambience The atmospheric sounds of a location usually created by many sound sources like automobile traffic or
insects at a distance. Sometimes less pleasant actual sounds are idealized by sustained strains in a
Background Sounds In soundtrack space, the least loud element(s). These low-volume sounds are often continuous
(sustained) and have emphasized lower tones. Though subtle, they can be very instrumental in
supporting a mood in the soundtrack.
Amplifier The electronic device that controls the loudness of the sound through a speaker system during audio
Amplitude A term for loudness or volume in electronics. An audio signal with high amplitude is relatively loud; an
audio signal with lower amplitude is softer in volume.
Analog Audio The older but still widely used kind of audio system in which reproduction is realized through voltage
variations in continuous time. Better analog recording systems exceed the abilities of common
playback settings like television and radio.
Bass A term borrowed from low tone musical sound sources used generally to describe the lowest portion of
the audio spectrum. On stereos, boom-boxes and the like, the knob labeled "bass" affects the loudness
of the low frequency range tones.
Channel/Track In audio systems, the location where the separate parts (or elements) of an audio recording are
stored. Stereo recordings, for example, have two channels which are stored on two separate tracks.
Close-Micing A technique of placing the microphone very close to the source of the sound to achieve more "warmth"
and a sense of intimacy. Very common for voice-over narration and other dubbed voices.
Decibel A unit used in measuring the relative loudness between two sounds.
Diegetic (On-Screen) Sound Sounds that a viewer can associate with on-screen objects, persons, etc. These sound
elements are not necessarily "original recordings" as they are still called diegetic even when added or
modified later in production.
Dialogue Conversational speech as a recorded sound element.
Digital Audio A newer audio system in which reproduction is realized through "sampling" audio signal voltage
conditions at very brief periods of time and converting (quantizing) the results to a numerical value.
Digital audio can provide greater fidelity but it requires elaborate playback systems for the listener to
discern the improvements.
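The sampling and quantizing described in the Digital Audio entry can be illustrated with a minimal sketch. It assumes a pure sine tone as a stand-in for the audio signal voltage and signed integer sample values; the function name, the 1,600 samples-per-second rate and the 16-bit depth are chosen for illustration and are not a description of how the Hi-MD recorder works internally.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits):
    """Measure a sine tone at regular intervals (sampling) and round each
    measurement to a signed integer of the given bit depth (quantizing)."""
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16-bit samples
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz               # time of the n-th sample, in seconds
        voltage = math.sin(2 * math.pi * freq_hz * t)   # assumed signal, from -1 to 1
        samples.append(round(voltage * max_level))      # nearest whole number
    return samples

# One cycle of a 200 Hz tone, sampled 8 times at 1,600 samples per second.
values = sample_and_quantize(200, 1600, 8, 16)
print(values)
```

Each number in the printed list is one "sample." A higher sampling rate gives more numbers per second, and a greater bit depth gives finer steps between the quietest and loudest values.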
Distortion Often a "crackling" deterioration of the sound quality that is audible when the recordist allows the
intensity of the audio signal to exceed the highest limits of the recording system.
Dynamic Range The range between the quietest possible sounds in a recording and the loudest possible sounds. This
range is a function of the recording system. Better analog systems have a range of about 60 decibels
while digital systems have a much wider range of around 90 decibels. However, everyday audio
playback systems like television and radio have an effective range of around 40 decibels.
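To get a feel for the decibel figures in the Dynamic Range entry, the sketch below converts them to amplitude ratios. It assumes the standard relation for signal amplitude, dB = 20 x log10(ratio); the function names are illustrative only.

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude ratio (loudest : quietest) for a given range in decibels."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Range in decibels for a given amplitude ratio."""
    return 20 * math.log10(ratio)

# The three ranges quoted in the glossary entry.
for label, db in [("better analog", 60), ("digital", 90), ("TV/radio playback", 40)]:
    print(f"{label}: {db} dB is an amplitude ratio of about {db_to_amplitude_ratio(db):,.0f} to 1")
```

Because the scale is logarithmic, each added 20 decibels multiplies the amplitude ratio by ten.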
Effects Most often effects are short duration (percussive), higher frequency sound elements that are located in
the middle plane (middle volume) in soundtrack space. Most effects are subtle, incidental sounds like
clicks, taps, scrapes, etc. which can be from on or off-screen sources.
Equalization The process of altering the tonality of sound through adjusting the loudness at one or more points along
the audio frequency spectrum.
Fade The lowering or raising of the volume of an element in a soundtrack. In addition to a change in volume
of the whole soundtrack, elements within a soundtrack are frequently faded. A sound element that
"fades-in" rises to audibility within the volumes of the other elements; one that "fades-out" sinks to
inaudibility within the existing volumes.
Foley A term derived from the name of an artist who developed an uncanny ability to simulate a wide variety
of sounds with assorted object, body and mouth sounds. "Foley" performances are recorded and
edited-into (dubbed) soundtracks in synchronization with on-screen events.
Foreground Sounds In soundtrack space, the loudest element(s). Because foreground sounds usually lead the
attention of the viewer, they work best when they have interesting sonic qualities. Voice and tonally
complex effects with considerable variation in amplitude (volume) and tonality (frequency) are most
Frequencies Sound is created when an object vibrates. The vibration rates (frequencies) determine the tonalities
produced and most sounds contain several frequencies. Human ears can detect frequencies from 20-
20,000 cycles per second. Lower vibration rates create low-pitched tones and higher vibration rates
create high-pitched tones.
Frequency Range The range of tones (frequencies) produced by a sound source or reproduced by a sound system.
Three regions of the range commonly referred to are: Low (20-250 Hz), Middle (250-4,000 Hz) and High
(4,000-20,000 Hz).
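The three regions in the Frequency Range entry can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and assigning the boundary values (250 Hz and 4,000 Hz) to the upper region is an assumption, since the glossary lists them in both.

```python
def frequency_band(freq_hz):
    """Name the glossary region that a frequency in Hz falls into."""
    if freq_hz < 20 or freq_hz > 20_000:
        return "outside the audible range"
    if freq_hz < 250:
        return "low"
    if freq_hz < 4_000:      # assumption: 250 Hz itself counts as "middle"
        return "middle"
    return "high"            # assumption: 4,000 Hz itself counts as "high"

# The tom-tom's 200 Hz fundamental and 600 Hz harmonic, plus a sibilant-range tone.
for hz in (200, 600, 8_000):
    print(hz, "Hz ->", frequency_band(hz))
```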
Hertz (Hz) A unit of measure of frequency. Same as cycles per second or (cps).
High Frequencies In the audio frequency range, those from around 4,000 Hz to 20,000 Hz. These frequencies add clarity
to recordings. Some common sounds with pronounced high frequencies are: consonant sounds of
human speech, sizzling, rain and cymbal percussion.
Locational Sound Called "Nat (Natural) Sound" by videographers and locational sound by film recordists, these
recordings can be of the background or middle-ground type depending on the micing distance from the
sound source(s). (Greater distance = more background). Locational sounds are often used at a low
volume to contribute realistic texture behind voice or mixed with music or middle-ground effects.
Loudness The subjective perception of amplitude or volume.
Low Frequencies In the audio frequency range, those from around 20 Hz to 250 Hz. These frequencies add bass
warmth to recordings. Some common sounds with pronounced low frequencies are: traffic and
machinery "rumble," bass musical instruments and the lowest tones produced by the human voice.
Microphone A device that converts air vibrations into electrical impulses at the input or beginning of the recording
Middle-Ground Sounds In soundtrack space, element(s) whose loudness is less than that of foreground sounds but
more than that of background sounds. Subtle percussive sounds with pronounced high frequencies are
common to this plane.
Mid-Frequencies The part of the frequency range from 250 Hz to 4,000 Hz. Most of the sound that we hear comes from
this portion of the audible range but if a soundtrack has only mid-frequencies it will lack warmth (low
frequencies) and clarity (high frequencies).
Mixing The process of combining several sound elements often from different tracks during re-recording. The
"space" of the soundtrack is achieved through establishing different volume levels for the elements.
Multi-Track Recording Systems Those which utilize separately recorded and edited sound elements on different
tracks or channels for greater flexibility in creating soundtrack or music space. These systems are in
common use today even in lower-cost music and media productions.
Narration From the Latin word, "to tell," a close-mic’d voice recording in the foreground plane directing the
attention of the viewer with language and musical qualities of speech. The voice of a narrator can be
included with or without an on-screen appearance. See Voice-Over.
Pitch The subjective perception of frequency.
Post-Production The stages of making a film or video after the photography or videography stage is complete--
principally involving the logging, edit-planning, editing and final sound mix.
Presence Track A locational recording of only the background sound that is used to fill-in the gaps created by edits in
the foreground elements recorded in the same location. For interior locations, the presence is
sometimes called, "room tone."
Projection Range A portion of the audio frequency range from around 1600 Hz to 2500 Hz which has a substantial
effect on the apparent loudness of a sound. Human ears seem to be more sensitive to frequencies
in this range and there is a pronounced "tinniness" that can fatigue the listener with prolonged
exposure. Bull-horn and many public address speakers emphasize the projection range frequencies for
Off-Screen Sound Related to off-screen space, sound elements that can be associated as coming from the space
outside of the visible limits of the film rectangle or video screen. Non-diegetic sound elements.
Resonance A phenomenon of one object being vibrated by the sound vibrations transmitted from another object. For
this to happen the "natural" frequencies of both objects must be similar.
Reverberation The audible effect of sound reflections from nearby surfaces adding a subtle echo-like quality to the
sound. Reverberation gives the listener cues about the physical surroundings but can also significantly
deteriorate clarity when the middle-range frequencies are over-emphasized. Audio engineers call
recordings with pronounced reverberation "wet," and since reverberation can be more easily
controlled electronically in post-production, the trend is to make the original recordings "dry" with as little
reverberation as possible.
Sound Elements In multi-track sound production, the individual elements (recordings) that are made for use in one of
the three planes of soundtrack space: Foreground, Middle-Ground and Background. See individual
definitions of these planes for the sonic characteristics they possess.
Soundtrack Space The illusion of a realistic sound environment in soundtrack production through the establishment of
at least two planes of loudness (Foreground [close] and Background [far]). Often a third plane is
established with intermittent sound elements of medium volume (Middle-Ground). The illusion of space
is often created by recording the sound element(s) of each plane separately in the field and carefully
controlling their relative volumes and tonalities with multi-track equipment in post-production.
Stereophonic Sound A two channel, two loudspeaker audio system that can give the listener an increased illusion of
depth in soundtrack space through subtle timing differences and element positioning.
Sustain A sense of continuation in a sound element.
Synchronization The simultaneity of a sound with a visual event in an audio/visual temporal form like film or video.
Note that the synchronization can occur with a range of possible relations between sound and image
including literal association and metaphor. With synchronized human speech, the illusion of the talking
image can be created.
Sync Sound A term used most with dialogue scenes in film or video indicating exact image to speech synchronization
or "lip sync." The phrase is also used to clearly distinguish this precise sound-image relationship from
voice-over and other approximate sync forms.
Treble The highest two octaves of the human hearing range-- roughly 4000 Hz to 16,000 Hz. For example, the
"treble" knob on basic audio equipment affects the volume over this frequency range with less effect on
volume below 4,000 Hz.
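The "two octaves" in the Treble entry can be verified with a short sketch, using the fact that one octave is a doubling of frequency. The function name is illustrative only.

```python
import math

def octaves_between(low_hz, high_hz):
    """Number of octaves (doublings of frequency) from low_hz up to high_hz."""
    return math.log2(high_hz / low_hz)

# 4,000 Hz doubled is 8,000 Hz; doubled again is 16,000 Hz.
print("Treble range:", octaves_between(4_000, 16_000), "octaves")
print("Full hearing range:", round(octaves_between(20, 20_000), 1), "octaves")
```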
Wild Sound A term used to distinguish sound recordings made with an audio recorder which does not have the ability to
run in exact synchronization with a motion picture film camera.
Wild-Sync A close approximation of "lip sync" achieved by carefully aligning a wild sound recording to match in the
editing stage. This dubbing process can also be applied to any sound including middle-ground effects.
Voice-Over A voice sound element in a soundtrack occurring without the image of the person producing the recording.
The person may be established visually and then continue speaking "over" other images, but
increasingly, voice-over is used without visual representation of the performer. Voice-over is common
Rob Danielson June 2007