3D AUDIO
In the past year, 3D audio has received plenty of hype and media coverage. But the question remains: Are
game developers using the technology or is it just a bullet point on the sound card box? The response from
developers to whom I spoke in the process of editing this article leads me to believe that 3D audio is
picking up steam. When asked if they were supporting 3D audio in their titles, most PC game developers
responded positively.
SingleTrac’s Sandi Geary said she was definitely supporting 3D audio. “In our upcoming PC title, we are
looking to support multiple 3D audio technologies. We will be exploiting Creative Labs’ solution as well as
providing support for A3D. We are also entertaining the idea of supporting additional software solutions.
On Outwars we used QSound only, and we are looking to expand our 3D support so more players can
appreciate the benefits of 3D audio.”
Over at EA, Alistair Hirst is also a proponent of 3D audio. “We support DirectSound3D as much as
possible. We also support Creative’s EAX. On the PlayStation, we support real-time Dolby Surround Sound
encoding,” Hirst said.
Raven’s Chia Chin Lee put it simply: “All future Raven Software games will most likely support 3D audio
in one way or another.”
Developers were less unanimous about how much 3D audio actually benefits the game player.
Sandi Geary feels that 3D audio is best used to enhance game-play cues. She said that SingleTrac
tries to use 3D audio for two main purposes. The first is to provide game-play cues that are enhanced by 3D
sound. These might include weapon fire coming from behind you or potential threats that can’t be seen.
Geary thinks this is by far the most compelling reason to use 3D audio. The second purpose for using 3D
audio is to place the player in an immersive 3D world — to help suspend disbelief. Geary says that to this
end, her company tries to use 3D sound to place critical ambient sounds around the player.
Alistair Hirst pointed out that 3D audio helps heighten the sense of realism in a game. “It is useful in
communicating information about events going on in the game, such as on which side of you a police car is
trying to pass,” he said.
Microsoft’s Matthew Lee Johnston explained that “traditional stereo has been used to localize a sound in
the player’s forward visual field. What 3D audio adds is the ability to localize the sound behind the player,
which is arguably way more important, since the sound is usually the only way to provide the player with
feedback about what’s going on behind them. Some games use maps and have ‘rear view’ options, or even
let you pan your visual field around to look, but using 3D audio to position an object behind the player is
not only more immediate and instinctual, but it allows the player to focus simultaneously on the fore and aft
perspective. This makes 3D gaming much more fun, and allows the designers to build a game-play
environment that’s 360 degrees wide.”
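The front/back limitation Johnston describes can be illustrated with a minimal sketch. It assumes a simple constant-power stereo pan driven only by the source's left/right offset, which is an illustrative model and not any particular sound card's algorithm. The function name and the angle convention are hypothetical.

```python
import math

def stereo_pan_gains(azimuth_deg):
    """Constant-power left/right gains for a source at the given azimuth.

    Azimuth convention (an assumption for this sketch): 0 = straight ahead,
    90 = hard right, 180 = directly behind, -90 = hard left.
    A plain stereo pan only sees the left/right component, sin(azimuth).
    """
    lateral = math.sin(math.radians(azimuth_deg))   # -1 (left) .. +1 (right)
    angle = (lateral + 1.0) * math.pi / 4.0         # 0 .. pi/2
    return math.cos(angle), math.sin(angle)         # (left gain, right gain)

front = stereo_pan_gains(30)    # ahead and to the right
behind = stereo_pan_gains(150)  # behind and to the right
print(front, behind)
```

Both positions produce essentially the same pair of gains, so stereo panning alone cannot tell the player whether the sound is ahead or behind.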
And there were those who were less enthusiastic about 3D audio.
Independent composer Kurt Harland declared simply, “I have always been and continue to be unimpressed
not only with the state of 3D audio technology, but the concept of it in general. I do believe that the advent
of left/right was a big advance. Games such as Descent and Doom benefited from this immensely. The
attempt to add a sense of in-front/in-back and up/down is, I believe, impractical and relatively unimportant.
“Currently, unless you have a perfectly placed surround sound system, the 3D aspects of sound are
essentially lost. Even if systems were developed to make 3D sound more practical, I doubt it would be
important enough to game play, and hence game players, to get consumers to spend money on the physical
equipment necessary, such as speakers and/or headgear.”
Over at DMA Design, Colin Anderson was also bearish about 3D audio, for different reasons. “Aureal,
Creative Labs and Dolby are doing great things to advance 3D audio in interactive titles. All of these
technologies are very good at the moment and will improve as hardware becomes faster. However, I can’t
help thinking that 3D sound, in some respects, is a bit of a distraction from what we should really be trying
to achieve. Manipulation of sounds in 3D is useful and can add a lot to an already good game, but I’d much
rather our industry concentrate its efforts on fully exploiting stereo sound first, before we start to move on
to multi-speaker sound. I believe there’s still a lot that can be done with stereo sound. It’s just that from a
hardware manufacturer’s point of view, the technology to make 3D sound happen is much cheaper to
produce than that needed to push stereo to its limits.”
SIDEBAR:
MULTI-SPEAKER SOUND
A topic of great interest to game audio developers and audio technology vendors is multi-channel
sound. Let’s find out whether or not the community believes this will take off in the next 18 months.
Independent programmer Martin Wilde observed that “one of the more interesting developments
has been the whole area of multi-speaker playback environments. What was once only a two-driver
(speaker or headphone) technology is now being adapted to include 4-channel, 5.1, and more
delivery arrangements. To some, this runs contrary to creating the virtual space entirely by
processing means, but I think this is a holdover from academia, and is caused by the belief that a
consumer would not pop for more than two speakers. As computers and entertainment systems
become more integrated, 3D technologies must adapt to fit what people have in their homes. It
becomes quite a challenge for a 3D provider to support all these different and often vague formats,
but that’s where we are. I think we will continue to see a number of new and improved hybrid
solutions to 3D audio, which include both 3D processing and sophisticated output device
coordination.”
Well-known game composer Tommy Tallarico is enthusiastic about multi-channel game audio. “I’ve
been working with DTS for over a year now trying to incorporate 5.1 (six-channel) interactive ‘on-
the-fly’ processing,” Tallarico said. “I’ve also spoken with Dolby concerning the same thing. It is my
opinion that 5.1 is going to be the way to go for interactive surround. One of the big reasons is that
DVDs already support this, the movie industry already supports this, and consumers are starting to
support this! Sony has already announced that their next machine will support this. (And I strongly
believe that Sony’s next machine will be in the marketplace for at least the next five to seven years.)
Hell, even Nintendo’s next machine is going to be DVD! The big question is how to deliver the
multiple channels. Where does the ‘on-the-fly’ decoding happen? A sound card, a separate box, a
piece of stereo equipment, or how about the speakers themselves? All ways are being looked at and
tried. It’s all about what consumers are willing to do or spend to enhance their multimedia
experience. However, I can tell you this: I’ve heard every damn 3D demo over the last ten years and
nothing has come close to six separate speakers flying sound around in real time. We have the
technology. It’s getting to the consumer that’s the hard part.”
Microsoft’s program manager for DirectSound and DirectMusic, Brian Schmidt, was candid about
his assessment of multi-channel systems, saying “I’d love to see multi-channel take off, but I just
don’t see it happening for the masses. We’ll have the technology (in several versions) that supports it,
and we will be able to provide the serious gamer with it, but within 18 months? It’s too soon.”
Some professionals believe that by requiring multi-channel support, games will help speed up
consumers’ adoption of the technology. Guy Whitmore of Whitmoreland Productions says he was
happy to see low-cost, surround-sound card and speaker combinations offered to consumers. “That
direction needs to continue if computers with surround capabilities are to become the norm for
gamers,” he said. “I would eventually like to count on most gamers having surround [sound
capabilities], much like our visual counterparts count on and even require a 3D graphics card. Then
the fun can begin: game-crucial information could rely on placement of the audio.”
And then there are the standards. Dolby’s John Loose is trying to push standards forward, and
wants standards to address multiple platforms. “Game developers need unified audio formats that
can travel between PCs and console devices,” Loose asserted. “The USB and Firewire developments
in the next few months will create more ways to get multichannel audio out of the box, be it PC or
console. DirectX 7 should give developers more access to multi-channel delivery of sound effects in
combination with linear Dolby Digital.”
(end sidebar)
From Past to Present in 3D Sound
In order to understand where we are headed, it is important to understand a bit about the history of 3D
audio in the PC world. In 1997, around the time of DirectX 3, there were two main APIs for interactive 3D
sound, A3D 1.0 and DirectSound3D. It was the intent of both APIs to let game developers place sounds —
mostly sound effects — in a (mostly) 360 degree sound field around the player. Both APIs expected only
two speakers and therefore “virtualized” the extended sound field through the use of various algorithms that
simulated the intricate cues that allow humans to place the source of a sound in space. These cues are
essentially made up of small time delays and frequency shifts caused by the shape of the human head and
ear and the distance between the two ears. A3D 1.0 was (and still is) a proprietary API that mainly
supported the small but growing base of sound cards with Aureal hardware (Vortex 1 chips, the predecessor
to the Vortex 2 chip as profiled in Table 1).
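To give a feel for the size of the time-delay cue mentioned above, here is a minimal sketch using Woodworth's classic spherical-head approximation. It covers only the arrival-time difference between the ears, not the frequency shaping, and it is not drawn from A3D or DirectSound3D. The head radius and the function name are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at about 20 C
HEAD_RADIUS = 0.0875     # m, a commonly used average; an assumption here

def interaural_time_difference(azimuth_deg, radius=HEAD_RADIUS):
    """Woodworth's spherical-head estimate of the arrival-time difference
    between the two ears, for a distant source at 0..90 degrees azimuth."""
    theta = math.radians(azimuth_deg)
    return (radius / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(az, round(interaural_time_difference(az) * 1e6), "microseconds")
```

For a source directly to one side, this estimate comes to roughly two thirds of a millisecond, which is the scale of delay a virtualization algorithm has to reproduce.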
The version of DirectSound3D that shipped with DirectX 3 was truly a creature of Microsoft’s past. The
idea was that Microsoft would supply the 3D virtualization algorithms as a software solution and hardware
companies could accelerate these — and only these — algorithms, if they chose to. This presented two
major problems. The first was that, by all accounts, the Microsoft algorithms were not very good, and
second, a number of companies would likely go out of business as their intellectual property was locked out
of the market. Microsoft’s contention was that it needed to provide a consistent experience for game
players, even if that consistency came at the expense of quality.
In response, the IA-SIG’s 3D Audio Working Group (a group of 3D hardware and software vendors
including Aureal Semiconductor, Creative Labs, Diamond Multimedia, DiamondWare, Gulbransen,
Spatializer Audio Labs, Texas Instruments, VLSI Technology, S3, QSound Labs, and Rockwell
Semiconductor) got together and created a work-around called 3Dxp. As stated in its June 1997 release
document, “3Dxp is a simple layer added to DirectSound 3D and to the game/application which enables
parameter passing to external hardware or software.” By the time 3Dxp was released, Microsoft had
already announced that DirectX 5 would support such functionality, so 3Dxp did not have a big impact on
developers directly. But the main purpose of 3Dxp was to open up DirectSound3D, and it accomplished the
group’s goals.
3Dxp subsequently evolved into a more general-purpose document called “The IA-SIG’s Interactive 3D
Audio Rendering and Evaluation Guide — Level 1” (I3DL1). The main purpose of I3DL1 was to help
define a consistent behavioral model for interactive 3D sound and to help consumers and magazine
reviewers differentiate between true 3D sound and the “pseudo-3D” sound (stereo enhancement) that was
popular at the time. This effort was quite successful, and the results were rolled into DirectX 5. As a
result, both the PC market for 3D audio acceleration hardware and game developers’ support for that
hardware have grown to the point where most current 3D games support at least a minimal level of
interactive 3D audio.
As is always the case, this standard did not call for an end to innovation. As the 3D Audio Working Group
reconvened to begin discussing enhancements to this basic form of 3D audio, Creative Labs and Aureal
were busy launching their own proprietary APIs. Creative launched its Environmental Audio Extensions
(EAX) and Aureal released A3D 2.0. It was the intent of both APIs to extend the basic positioning of
individual sounds in space by adding additional aural cues as to the nature of the environment in which the
sound was heard. In other words, both APIs attempted to model the acoustical properties of a space — what
musicians would call “reverb.” In general, reverberation can be thought of as the complex sound behavior
that is created by the many reflections of a source sound as it bounces off of the walls or surfaces in a space
and is returned to the listener after some delay. The main difference between the two APIs came in their
respective approaches.
EAX provided a simpler model in which acoustical spaces were abstracted into a small number of basic
preset algorithms (“presets”) such as “large hall” or “tiled bathroom.” The reverberant behavior of the
sound within these acoustical spaces could then be tailored through a small number of high-level
parameters that were exposed by the API. This model had a number of advantages. First, most sound
designers were already familiar with this level of abstraction through the use of reverb units which featured
the same type of abstraction and parameterization. Second, presets allowed complex spaces to be
represented with a small number of parameters, as long as an EAX reverb engine was present to fill in the
details.
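The preset-plus-parameters idea can be sketched as a small data structure. This is a hypothetical illustration only: the field names, units, and values below are assumptions made for the example and are not the EAX property set or its actual preset table.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ReverbPreset:
    """Hypothetical high-level reverb description (not the EAX data layout)."""
    name: str
    decay_time_s: float     # time for the tail to die away
    reverb_level_db: float  # overall level of the reverberation
    damping: float          # 0 = bright tail, 1 = heavily damped highs

# Illustrative values only; they are not taken from any shipping preset table.
PRESETS = {
    "large hall": ReverbPreset("large hall", 4.0, -10.0, 0.4),
    "tiled bathroom": ReverbPreset("tiled bathroom", 1.5, -4.0, 0.1),
}

def tailored(preset_name, **overrides):
    """Start from a named preset and adjust a few high-level parameters."""
    return replace(PRESETS[preset_name], **overrides)

cave_like_hall = tailored("large hall", decay_time_s=6.0, damping=0.2)
print(cave_like_hall)
```

The point of the design is that a handful of numbers travels with the game data, while the reverb engine on the player's machine supplies the detailed reflections.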
A3D 2.0, on the other hand, took a much more literal approach to acoustical modeling, using a technique
called “wave tracing.” In this model, each individual reflection or echo of the sound and its actual path to
the listener would be tracked separately. The advantage of this approach is that it is much more accurate in
terms of the specific geometry of the space than the generalized reverberation presets of EAX. It also
supports many complex behavioral attributes of sound, such as the effects of obstruction and occlusion. (In
this usage, obstruction and occlusion refer to modifications to the characteristics of a sound when it has to
pass through or around solid objects.) Unfortunately, this accuracy comes with a great deal of
complexity, and complexity generally translates into heavier processor or hardware acceleration
requirements. It also makes the API somewhat difficult to implement: for wave tracing to work, the entire
3D geometry database for the game must be exposed and fed to the A3D 2.0 driver. Finally, it is difficult
for sound designers to take creative liberties with the reverberation in order to augment reality, or to
audition the effects of A3D 2.0 without fully integrating both the sound design and the geometry into the
game engine.
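To show why tracking individual reflections requires the level geometry, here is a minimal sketch of a single first-order reflection. It uses the textbook image-source construction from geometric acoustics as a stand-in; it is not Aureal's wave-tracing implementation, and the function name and the axis-aligned wall are assumptions made to keep the example short.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def first_order_reflection(source, listener, wall_x):
    """Path length and delay of the single reflection off the plane x = wall_x.

    Uses the image-source construction: mirror the source through the wall,
    then the reflected path is as long as the straight line from that
    mirror image to the listener. Assumes both points lie on the same
    side of the wall.
    """
    sx, sy, sz = source
    image = (2.0 * wall_x - sx, sy, sz)
    path = math.dist(image, listener)
    return path, path / SPEED_OF_SOUND

source, listener = (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)
direct = math.dist(source, listener)
reflected, delay = first_order_reflection(source, listener, wall_x=5.0)
print(direct, reflected, delay - direct / SPEED_OF_SOUND)
```

Even this one-wall case needs the wall's position. A full level has thousands of surfaces, and each additional order of reflection multiplies the number of paths to consider.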
After fighting it out in the market for some time, both companies, along with the rest of the 3D Audio
Working Group (which then included Aureal Semiconductor, Euphonics, Microsoft, QSound Labs, Rockwell
Semiconductor/ Conexant Systems, Sensaura, Angel Studios, and Spatializer Audio Labs, among others)
reached an agreement and constructed a new guideline called Interactive 3D Audio Level 2 (I3DL2).
Commenting on the agreement, Sensaura’s Peter Clair lamented the fact that the battle had to take place at
all. He said, “It is always unfortunate to have competing standards for what is basically the same feature, a
good example of this being the ‘battle’ for environmental audio. This leads to industry-wide confusion that
may cause content authors either to be forced to support multiple standards or to choose to opt out of
support for 3D audio in the application. Ultimately, the ‘cost’ of multiple standards is always passed on to
the game consumer. Although we have seen API wars between vendors of competing 3D audio products,
the industry has shown that it can work together for the common good. The recently completed I3DL2
specification from the IA-SIG is a very positive example of such cooperation.... We would like to think
that, in the future, the industry will continue with such cooperative development of standards. This
approach still allows vendors to compete on features and performance but doesn’t force content providers
into considering (and possibly rejecting) support for multiple APIs.”
Scott Willing of QSound took a slightly more confrontational view. “I don’t think it’s any secret that
proprietary APIs are generally used not just to access vendor-specific features (some of which are valuable,
some not so valuable). They’re mainly a mechanism that allows the vendor to claim compliant titles as their
own — a classic marketing strategy to sell hardware. But we think it’s weak. In our view, if you have a
useful API extension, you should bring it to the IA-SIG and/or Microsoft and open it up for everyone to
support if they’re able or inclined to.”
The I3DL2 guideline was drafted in large part by Jean-Marc Jot of Creative Labs and owes much in its
form and format to Creative’s EAX technology. According to the specification, I3DL2 is essentially an
extension of I3DL1 that adds the following enhancements:
• An environment reverberation model (conveying a sense of the space where the listener is located).
• An enhanced distance model that takes advantage of the reverberation cues in the current environment.
• Occlusion and obstruction models for rendering the muffling effects of obstacles inside environments or
partitions between environments. The guideline defines obstruction as “the muffling (attenuation and
filtering) of sound by an object between the listener and the source, contained within a common
environment (room).” It defines occlusion as “the muffling of sound by a partition or wall separating two
environments.”
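The guideline's distinction between the two muffling effects can be sketched as a simple decision. This is an illustrative reading of the definitions quoted above, not code from the I3DL2 document; the function and its inputs are hypothetical values a game engine might already track.

```python
def classify_muffling(source_room, listener_room, line_of_sight_blocked):
    """Return which of the guideline's two muffling effects applies, if any.

    Inputs are hypothetical values a game engine might already know: the
    room each party is in, and whether an object blocks the direct path.
    """
    if source_room != listener_room:
        return "occlusion"      # a partition or wall separates two environments
    if line_of_sight_blocked:
        return "obstruction"    # an object in the shared room blocks the path
    return "none"

print(classify_muffling("hall", "hall", True))
print(classify_muffling("hall", "vault", False))
```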
The guideline’s Reverberation Response Model, shown in Chart 1, breaks reverberation down
into three sections:
1. Direct-path sound from a sound source. In other words, the sound that reaches the listener directly, via
free propagation in the air or other medium, possibly through or around an obstacle, but not via reflections
on walls or obstacles.
2. Early reflections. These are the reflected sounds that reach the listener first, which can be represented
as a set of successive discrete “echoes.”
3. Late reverberation. The reflected sound that follows the early reflections.
Chart 1. I3DL2 Reverberation Response Model (reprinted with permission from the IA-SIG
I3DL2 guidelines).
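The three sections of the response model can be pictured as one toy impulse response. The sketch below is an assumption-laden illustration, not the I3DL2 reference model: the echo delays, gains, decay time, and the use of decaying random noise for the late tail are all invented for the example.

```python
import math
import random

def sketch_impulse_response(sample_rate=8000, early=((0.012, 0.5), (0.019, 0.35)),
                            late_start_s=0.040, decay_time_s=0.5, length_s=0.6):
    """Build a toy impulse response with direct, early, and late sections."""
    n = round(sample_rate * length_s)
    ir = [0.0] * n
    ir[0] = 1.0                                        # 1. direct-path sound
    for delay_s, gain in early:                        # 2. discrete early echoes
        ir[round(delay_s * sample_rate)] += gain
    rng = random.Random(0)                             # fixed seed: repeatable noise
    for i in range(round(late_start_s * sample_rate), n):
        t = i / sample_rate - late_start_s
        envelope = 10.0 ** (-3.0 * t / decay_time_s)   # -60 dB after decay_time_s
        ir[i] += 0.2 * envelope * rng.uniform(-1.0, 1.0)   # 3. late reverberation
    return ir

ir = sketch_impulse_response()
print(len(ir), ir[0], ir[96], ir[152])
```

Convolving a dry sound with a response shaped like this yields the direct sound, then a few distinct echoes, then a dense tail that fades away.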
The I3DL2 guideline passed final review on September 1, 1999, and is available from the IA-SIG web site
(http://www.iasig.org) under the “Working Groups” section. As is consistent with the “new” Microsoft, it is
strongly suspected that I3DL2 will be adopted as the primary method for controlling reverberation in
upcoming versions of DirectX. This is further supported by Microsoft’s recent announcement that it will
license some of the EAX reverberation algorithms as a software fallback for unaccelerated machines.
Where Do We Go From Here?
With widespread support in place for I3DL2 and the growing base of 3D audio hardware acceleration, the
cycle now returns to vendor innovation. With this in mind, let’s look at the latest offerings from the key
technology providers.
Sensaura. Sensaura’s Peter Clair says that the most recent features and extensions to their technology help
content authors create accurate 3D sound effects that enhance the immersive qualities of a game. These
features are MultiDrive, MacroFX, EnvironmentFX and ZoomFX.
MultiDrive is part of Sensaura’s 3D positional audio technology, which incorporates a patented multi-
speaker capability for creating 3D sound fields using four or more loudspeakers. Clair says this is achieved
by using each speaker pair to create a complementary sound field, each addressing the frontal and rearward
hemispheres, respectively. “This creates robust 3D audio, even in difficult listening environments,” Clair
claims. “The frontal and rearward ‘sound-hemispheres’ are seamlessly integrated together, such that any
number of virtual sound sources can be made to move around the listener, controlled by DirectSound3D.”
The MacroFX algorithm lets content creators create near-field effects. Sounds can be made to seem very
close, appearing to move from the loudspeakers up close to the listener’s head — “ultimately even
whispering into the ear,” said Clair. This is achieved by carefully modeling the sound energy distribution in
three dimensions around the head from all spatial positions, and transforming this data into an algorithm.
The algorithm is integrated directly with the Sensaura processing and controlled by DirectSound3D, so it is
transparent to content authors, who can create new effects. “For example, in a flight simulator, one could
create the effect that the listener is a pilot hearing air-traffic information via headphones. In a combat game,
it could be used to make bullets and missiles appear to fly close by the listener’s head. These new effects
are unique to Sensaura,” Clair explained.
The Sensaura EnvironmentFX works with both the sounds themselves and the acoustic contributions of the
environment to give clues about the positions of sound sources. To use the Sensaura EnvironmentFX
reverb engine, a content provider uses either the EAX or I3DL2 property set extensions to DirectSound3D.
Clair said that over the next 18 months, Sensaura will introduce several extensions to its technology to
reflect the changes in the audio industry, based around observations of sound effects that increase what he
calls the “flinch factor” of a game — that is, sounds that make a player duck to avoid a bullet or jump to
avoid a huge thundering tank. ZoomFX is such an extension.
“Conventional 3D positional audio uses head-related transfer-function (HRTF) processing to create virtual
sound sources, but these synthesized virtual sources are point sources of sound,” Clair explained. “In
reality, sound is often emitted from large-area sources, or from composite sources which might contain
several individual sound generators. Large-area and composite sources enable much more realistic sound
effects than point sources can provide. We will enable programmers to use this feature via a property set
extension to the DirectSound3D API, in a similar manner to EAX.”
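One simple way to picture a large-area source is as several point sources spread across its extent. The sketch below takes that approach purely for illustration; it is not Sensaura's ZoomFX algorithm, and the function names, the straight-line emitter, and the equal-power gain split are all assumptions.

```python
import math

def spread_area_source(start, end, count):
    """Approximate a wide emitter by `count` point sources along a segment.

    Returns (x, y, gain) tuples in the listener's horizontal plane. Gains are
    1/sqrt(count) so the summed power matches a single unit-gain source.
    """
    gain = 1.0 / math.sqrt(count)
    points = []
    for i in range(count):
        t = (i + 0.5) / count                 # centre of each sub-segment
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        points.append((x, y, gain))
    return points

def azimuth_deg(x, y):
    """Angle from straight ahead (+y) toward the right (+x), listener at origin."""
    return math.degrees(math.atan2(x, y))

train = spread_area_source((-10.0, 5.0), (10.0, 5.0), 4)
print([round(azimuth_deg(x, y), 1) for x, y, _ in train])
```

Each point would then be positioned individually by the 3D engine, so the sound arrives from a range of directions instead of a single spot.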
QSound. QSound’s Scott Willing said that his company was taught a dramatic lesson about the importance
of vendor-neutral APIs in the early 1990s by the sound card and chipset vendors who initially licensed
QSound technology. At that time, the processing capability of first-generation PC digital signal processing
hardware was minimal, and each implementation was totally different. “The idea of a common API was
little more than a dream,” Willing said. “That sent us back to the drawing board to create hardware-
independent 3D audio rendering in software — a decision that has served us very well.”
The result of QSound’s labors in this area was QMixer, a licensed SDK that provides software 3D
rendering and a high-level API. QMixer talks to accelerator hardware, if available, through DirectSound3D
and EAX, and the company says that I3DL2 support is in the works. Willing says that to the extent that it is
practical, QMixer is “vendor neutral” and makes the best use of available acceleration, regardless of
manufacturer. QMixer can manage resources, for example by dynamically allocating hardware and internal
software mix/positioning/reverb engine channels and features. “The ultimate goal of QMixer is to provide the best
possible 3D experience with the hardware on a given system — even if it’s only a plain stereo sound card
— with the minimum of development hassle,” Willing said.
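The kind of resource management described here can be sketched as a priority-based split between hardware and software channels. This is a hypothetical illustration of the general idea, not QMixer's API or its actual allocation policy; the function name and the priority scheme are assumptions.

```python
def allocate_voices(sounds, hardware_channels):
    """Split active sounds between hardware and software mixing.

    `sounds` is a list of (name, priority) pairs; higher priority wins a
    hardware channel. Everything else falls back to the software mixer.
    """
    ranked = sorted(sounds, key=lambda s: s[1], reverse=True)
    hardware = [name for name, _ in ranked[:hardware_channels]]
    software = [name for name, _ in ranked[hardware_channels:]]
    return hardware, software

active = [("footsteps", 2), ("rocket", 9), ("wind", 1), ("enemy shout", 7)]
print(allocate_voices(active, hardware_channels=2))
print(allocate_voices(active, hardware_channels=0))   # plain stereo card
```

With no accelerated channels available, every sound falls through to the software path, which matches the stated goal of still working on a plain stereo card.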
QSound is also working with RAD Game Tools to license the Q3D software engine, in the form of a 3D
“provider” for the Miles Sound System, to developers. “Miles is obviously a very popular API, and in
addition to digital audio, it covers stuff like MIDI,” commented Willing. “Miles and QMixer tend to appeal
to different groups of developers, so both companies felt that it made perfect sense to provide an option for
those who wanted the Miles API and soft QSound.”
The company also provides a free SDK called QMDX, which is a special version of QMixer with stereo
(but not 3D) capabilities. Willing said that the recent addition of ambient acoustic rendering effects (such as
reverberation) and multi-speaker output capability are key new features across the board for his company,
but that they haven’t propagated into all products yet.
Aureal. This company is a long-time OEM supplier and licenser of 3D audio technology. But the big news
from Aureal is that they have moved to a direct-to-consumer business model. Aureal has a new series of
sound cards appearing with the Aureal brand name. The cards are based upon a new version of the Vortex 2
chip, which is reported to be much faster than the current model. The high-end SQ2500 hit retail shelves in
mid-September, and features the new Vortex 2 chip and a digital S/PDIF output. Also, Aureal will be
entering the powered-speaker market with 250W 2.1 and 4.1 powered-speaker systems. These systems are
slated to be sold through Aureal’s web site.
On the software side, work continues on the A3D 2.0 API. Suniel Mishra, Aureal’s A3D applications and
support manager, has been hard at work on tools for game programmers which will help process game-
level geometry into easy-to-use formats for A3D. Beyond the current wave-tracing model, Aureal also
reports that it will work on a geometry-based reverb engine which will, according to Suniel Mishra,
“generalize reflection beyond the second order… and catch all EAX calls.” These new features will be
available on current A3D 2.0 hardware through a software-based driver update. I3DL2 support is also
planned.
While Aureal is not planning to design any wave-tracing tools for game sound designers, it is continuing to
work with the 3D web community Flatland (http://www.flatland.com). Aureal’s goal is to provide simple
tools to add A3D wave tracing to Flatland’s VRML-like Sputnik authoring tools.
Creative Labs. George Thorn says that his company has a simple goal: “More and better!” What does that
translate into? Thorn explains: “Creative has proven its ability to innovate with the E-mu 8000 chip, and
more recently the E-mu 10K1 chip. You can expect to see further VLSI development in the coming
months. For developers, we are working on an exciting project called ‘Eagle.’ This will be an extremely
powerful software tool for audio designers and level designers, and it complements the EAX API, so we’ll
be working at both the software and hardware levels to develop new technologies.”
Eagle is an interesting tool that bridges the gap between programmers and sound designers for 3D audio
design. For more information about Eagle, see the section, “Creative Labs’ Eagle Prepares to Fly.”
While many people purport to be 3D audio experts, nobody truly knows just how far it will go. Is it just
marketing hype, or is it essential game technology? Only time and consumers will decide. What is known is
that vendors will continue to develop new and improved 3D audio technology, sound designers
will continue to experiment with it, and, if we are lucky, better tools will come on the scene to turn
that technology into satisfaction for game players.
SIDEBAR:
CREATIVE LABS’ EAGLE PREPARES TO FLY
Figure 1. The Eagle geometry editor.
Figure 2. A typical file hierarchy in Creative Labs’ Eagle.
The idea behind Creative Labs’ Eagle is to let game programmers export a game’s level geometry
into, essentially, an EAX editor for the sound designer. Currently the tool supports the DirectX .X
file format and Unreal files. Support for other common formats is planned. If the game uses a
proprietary format, Creative offers an SDK which game programmers can use to create Eagle
import components (.COM) that can easily be put in their data pipeline. Once the geometry is
imported, it appears as a wireframe rendering (Figure 1).
The sound designer uses the provided graphical tools to define various areas and portals (openings
between areas), and then assigns EAX parameters to them. The tool uses a hierarchical workspace
model that keeps all of the various EAX parameters organized and transportable (Figure 2). The
really exciting aspect to Eagle is that it lets the sound designer position and trigger sound effects in
the defined 3D audio environment. This feedback can then be used to tweak and iterate the 3D audio
environment without the intervention of the game programmer.
Once the sound designer is satisfied with the workspace, it is returned to the game programmer. The
game programmer then installs the accompanying EAX Manager API. The EAX Manager essentially
takes x, y, and z coordinates (such as those commonly passed to DirectSound3D) and returns the
appropriate EAX parameters to the game program. The game program passes them to the EAX-
enabled sound engine and everything works as the sound designer intended. This is exactly the kind
of tool that is needed to allow sound designers to get the most out of today’s audio technology.
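The position-to-parameters lookup described above can be sketched as follows. This is a hypothetical stand-in, not the EAX Manager API: the class, the function, the box-shaped areas, and the environment names are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    """Hypothetical axis-aligned region with an environment name attached."""
    name: str
    min_corner: tuple
    max_corner: tuple
    environment: str

    def contains(self, point):
        return all(lo <= p <= hi
                   for lo, p, hi in zip(self.min_corner, point, self.max_corner))

def environment_at(areas, point, default="generic"):
    """Return the environment of the first area containing the point."""
    for area in areas:
        if area.contains(point):
            return area.environment
    return default

level = [
    Area("corridor", (0, 0, 0), (20, 3, 4), "stone corridor"),
    Area("vault", (20, 0, 0), (40, 10, 30), "large hall"),
]
print(environment_at(level, (5.0, 1.5, 2.0)))
print(environment_at(level, (30.0, 2.0, 10.0)))
```

The game code only supplies coordinates it already has, while the mapping from space to reverb settings stays in data the sound designer controls.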
New Eagle File Types
.ASP (Audio Spatial Partition): A file representing the geometry of an imported “map.”
.ENV (Environment File): A unique reverb “Patch.”
.SEF (Source Environment File): A set of properties governing the behavior of a sound effect.
.EAW (Environmental Audio Workspace): A file containing all of the above, organized for a
particular project.
.EAX (Environmental Audio “X” File): The final instruction file delivered to the programming team.
(end of CREATIVE LABS EAGLE SIDEBAR)