      The importance of the direct to reverberant
    ratio in the perception of distance,
  localization, clarity, and envelopment
                    – or –
     Measuring Auditory Engagement
                    – or –

            David Griesinger
           Cambridge MA USA
•   Part one of this talk will consist of:
     – 1. A description of the sonic perception of near/far and its relevance to music
       and drama
     – 2. A plea for acoustic designs that utilize this perception to engage the audience
       by making the music so exciting and accessible that they listen closely – instead
       of sitting back and letting it wash by.
     – 3. A proposal that engagement is encouraged both by low perceived sonic
       distance, and the ease of localizing the azimuth of musicians in an ensemble.
     – 4. A proposal that ease of localization can be used as a proxy for measuring the
       engagement of an acoustic scene.
     – 5. The development and testing of an impulse response based measure for
       ease of localization.
          • The measure is based on how our hearing processes syllables or notes, and thus
            involves a double integral.
          • The measure integrates the log of sound pressure, not the pressure itself.
          • It includes both lateral and medial reflections.
     – 6. A proposal that envelopment is also enhanced by the presence of direct sound.
•   Part two of this talk will describe the implications of enhancing engagement
    in both large and small halls.
     – Experiments in a particular small hall will be discussed which reveal the
       usefulness of the new measure.
•   The ability to perceive direct sound (sound that travels to the listener without
    reflection) is the key to localization, perceived distance, engagement, and
•   This talk contains concepts that contradict deeply held convictions.
     – I propose that reflections (often early reflections) in the time range of 10 to
       100ms frequently reduce clarity, envelopment, and engagement.
          • Whether they are lateral reflections or not!
     – These detrimental effects are easy to demonstrate, and I will attempt to do so.

•   I am NOT saying all early reflections are bad!
     – The ability to detect direct sound in the presence of reverberation is frequency
       dependent, and frequencies above 700Hz are particularly important.
     – The critical issue is the amount of early energy and its time delay. If the energy
       above 700Hz is below a critical threshold, early energy and late reverberation
       can enhance the listening experience.

•   Reflectors directed into the audience (which absorbs the first-order
    reflection) often reduce the early energy above 700Hz in other areas of the
    hall – with beneficial results.
     – Reflectors placed near certain instruments can reduce disturbing late echoes, or
       reinforce low frequencies without increasing the energy above 700Hz.

•   The major point of the talk is clear: The ability to perceive direct sound in a
    large majority of seats is a vital component of a great hall.
     – And this perception requires close attention to the amount of reflected energy in
       the first 100ms after the direct sound.
•   The apparent closeness of a sound source is a fundamental perception for
    all of us.
     – We can tell instantly if a person talking is within a few feet of us, or further away
       – and this perception has survival value.
     – The perception of “Near” depends critically on our ability to perceive the direct
       sound – the sound that travels to the listener without reflecting.
     – Surprisingly, in a theater or hall it is possible to perceive the performers as both
       acoustically close to the listener and enveloped by the hall.
     – The best halls (Boston Symphony Hall, Concertgebouw, the front half of the
       Musikverein) provide both, but many, perhaps most, provide only reverberation.

•   Harmonic coherence of speech and music is a principal cue for perceiving
    near and far.
     – The audio examples for this slide show the decrease in apparent
       distance caused by increasing amounts of harmonic coherence.
     – Note that all of the examples have high intelligibility – but their emotional effect is
       quite different.
•   This perception correlates with musical clarity and the ability to localize
    sound sources.
[Figures: neural model analysis of the word “ten”: direct sound; with 88ms reflections; with 133ms reflections; direct sound with added reverberation at RT = 2s and at RT = 1s, both at D/R = -10dB]
   A slide from Asbjørn Krokstad - IoA,NAS Oslo 2008
                        [With permission]

To succeed:           [in bringing new audiences into concert halls…]


                 “Interesting”    “Nice”

 [We need to make the sonic impression of a concert engage the
 audience – not just the visual and social perceptions, especially
 since audiences are increasingly accustomed to recordings!]
            ENGAGEMENT, not NICE
•   At the IOA conference in Oslo, Asbjørn Krokstad (a musician, conductor,
    and Norway’s best-known acoustician) gave a lecture where he insisted that
    acousticians needed to provide engagement, not just pleasant music.
     – And not just for drama and opera, but for chamber music and symphony too.
     – At the end of the lecture he showed a picture of the Teatro Colón in Buenos
       Aires, Argentina. “Is this the concert hall of the future?” he asked.
         • This hall is not a shoebox, but a large semicircular theater with a high ceiling. It ranks
           at the top in Beranek’s surveys, and the reverberation time is 1.6 seconds occupied.
         • Krokstad may have conducted there.

•   Engagement requires the independent perception of the direct sound

•   We must learn how to provide this essential element in halls.

•   I have been fortunate to hear several of the live broadcasts of the
    Metropolitan Opera in a good theater. For example, the performance of
     – The sound was harsh and dry – radio mikes coupled to directional loudspeakers.
       But you could hear every syllable of Mattila’s impeccable German. The
       performance was totally gripping!
     – This is the dramatic and sonic experience audiences increasingly demand.
         What is “Auditory Engagement”?
•   “Engagement” is the perception that you are not just watching a scene from
    a distance, but present in the middle of it.
     – Thus lack of distance is a critical component of presence.
•   Auditory engagement is the perception that you are acoustically close to the
    sound sources.
     – Distance is perceived directly through harmonic coherence – but experiments to
       measure it directly with subjects are difficult. However, it correlates both with the
       ability to localize sound sources and with the perception of presence, or musical clarity.
     – To perceive presence you must be able to localize sound sources nearly all the time
       – and be able to distinguish them from one another nearly all the time.

•   Clear localization and the ability to hear most of the notes are key
    components of audience engagement.
     – Although particularly important in drama and opera, it should be (and often is not)
       a part of the emotional experience of music.
     – Being able to hear all the notes and localize the players draws the audience into
       the performance. They don’t just watch it.

•   This view of clarity is different from the one that equates clarity with
    intelligibility. Perhaps we need a new word for it.
                           Barron on Localization
    “Raising the Roof” NATURE Vol 4531 12 June 2008

•    “Much remains to be discovered about how our ears and brains process sound
     reflections. Understanding this has been complicated, for instance, by our remarkable
     ability to work out where a sound is coming from. This ability, called localization,
     works even when the sound arriving directly from the source represents only a small
     proportion of the total sound we receive, perhaps only 5% at the back of a concert hall.”
      –   [Without a visual reference precise localization is frequently not possible at this level of direct
          sound. With a visual reference we perceive what we do not hear.]

•    “Usually we are listening to speech or music, which have short elements such as
     syllables or notes that vary with time. Our brains use this time-varying information to
     extract where the initial sound comes from”.
      –   [but to do this we MUST be able to detect and process the direct sound!]

•    “The downside of this localization is that, in effect, our hearing suppresses awareness
     of sound reflections. We notice early sound reflections but are often not conscious of
     their effects - such as making sound seem clearer than it would be otherwise.” [italics

      –   [or less clear, as I believe is often the case. Barron is equating “clear” with “intelligible” –
          but that is different from engagement. “I would rather the audience not hear the words than
          have the actors sound far away,” said a well-known drama director in Copenhagen.]
        Experiment for threshold of azimuth
                detection in halls

A model is constructed with one source position on the left and another
source on the right. The source signal alternates between the left and
the right position. When the d/r is less than about minus 13dB, both
sources are perceived in the middle. The subject varies the d/r and
reports the value of d/r that separates the two sources by half the
actual angle. This is the threshold value for azimuth detection for
this model.

(Above this threshold the subject also reports a decrease in subjective distance)
        Threshold for azimuth detection as a
       function of frequency and initial delay

As the time gap between the direct sound and the reverberation increases,
the threshold for azimuth detection goes down. (The d/r scale on this old
slide is arbitrary.)

As the time gap between notes increases (allowing reverberation to decay),
the threshold goes down. To duplicate the actual perception in small halls
I need a 50ms gap between notes.
               An important caveat!
• All these thresholds were measured without visual cues

• The author has found that in a concert (with occasional visual input)
  instruments (such as a string quartet) are perceived as clearly
  localized and spread.

• When I record the sound with probes at my own eardrums, and play it
  back through calibrated earphones the sound seems highly accurate,
  but localization often disappears!
    – Without visual cues when the d/r is below threshold the individual
      instruments are localized and spread when they play solo, but collapse to
      the center when they play together.
    – My brain will not allow me to detect this collapse when I am in the concert
      hall – even if I close my eyes most of the time!
    – With eyes closed it is more difficult to separate the sounds of the
      individuals, such as the second violin and the viola. This difficulty persists
      in the binaural recording.
• For this paper we assume sound sources are localized by the direct sound.
    – In some cases localization is aided by early reflections – but these vary
      strongly from seat to seat, and are too complex to consider here.
• For localization to be successful the direct sound must be perceived.
    – Prompt strong reflections can – and do – mask the direct sound.
• Let’s propose that the brain detects the loudness – and the
  presence – of sounds by integrating nerve firings over a fixed time window (about 100ms).
    – If the integrated nerve firings from the direct sound exceed the
      integrated nerve firings from the reflections inside this time window, the
      direct sound will be perceived – and localized.
• We can calculate the threshold of perception by double integrating
  the impulse response over a fixed time window.
    The ear perceives notes – not the impulse
                 response itself.
•   Here is a graph of the ipsilateral binaural impulse response from spatially diffuse
    exponentially decaying white noise with an onset time of 5ms and an RT of 1 second.
    This is NOT a note, and NOT what the ear hears!

    Values for D/R = -10dB:
      RT = 2s:  C80 = 3.5dB,  C50 = 2.2dB,  IACC80 = .24
      RT = 1s:  C80 = 6.4dB,  C50 = 4.1dB,  IACC80 = .20
•   To visualize what the ear hears, we must convolve this with a sound.
     –   Let’s use a 200ms constant level as an example.

•   The nerve firings from the direct component of this note have a constant rate for the
    duration of the sound.

•   The nerve firings from the reverberant component steadily build up until the note ceases
    and then slowly stop as the sound decays.
Direct and reverberation for d/r = -10dB, and RT = 1s

The blue line shows the nerve firing rate for a constant direct sound 10dB less than
the total reverberation energy. The red line shows the rate of nerve firings for the
reverberation, which builds up for the duration of the note. The black line shows a time
window (100ms) over which to integrate the two rates. In this example the area in light
blue (reverberation) is larger than the area in pink (direct sound), so the direct sound is inaudible.
              Direct and build-up RT = 2s

If we hold the d/r constant, when the reverberation time is two seconds it takes
longer for the reverberation to build up, so the light blue area decreases, while the
pink area stays constant. This makes the direct sound more audible. In a large
hall the time delay between the direct sound and the reverberation also increases,
further reducing the area in light blue. The direct sound would be even more audible.
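The comparison of the pink (direct) and light blue (reverberant) areas in the two figures above can be sketched numerically. This is a minimal Python sketch under the simplified assumptions the slides describe: the direct sound fires at a constant rate for the whole note, reverberant energy builds up as 1 − exp(−13.8 t / RT) after an initial gap, the firing rate is the level in dB above a zero-firing line 20dB below the steady-state total, and the window is 100ms. The function name `firing_areas` and the integration step are illustrative choices, not the author's code.

```python
import math


def firing_areas(dr_db, rt_s, gap_s=0.005, window_s=0.100, range_db=20.0, dt=1e-5):
    """Return (direct_area, reverb_area) in dB-seconds over one window.

    Simplified model (assumption): levels are in dB relative to the
    steady-state total of direct plus reverberant energy, and the firing
    rate is the level above a zero-firing line range_db below that total.
    """
    d = 10 ** (dr_db / 10)               # direct / reverberant energy ratio
    direct_frac = d / (1 + d)            # steady-state share of direct sound
    reverb_frac = 1 / (1 + d)            # steady-state share of reverberation
    k = 6 * math.log(10) / rt_s          # energy decay rate: 60 dB in rt_s
    direct_rate = max(0.0, range_db + 10 * math.log10(direct_frac))
    direct_area = direct_rate * window_s  # constant rate for the whole window
    reverb_area = 0.0
    for i in range(round(window_s / dt)):
        t = (i + 0.5) * dt               # midpoint rule
        if t <= gap_s:
            continue                     # reverberation has not arrived yet
        buildup = reverb_frac * -math.expm1(-k * (t - gap_s))
        if buildup > 0:
            reverb_area += max(0.0, range_db + 10 * math.log10(buildup)) * dt
    return direct_area, reverb_area


for rt, gap in [(1.0, 0.005), (2.0, 0.005), (2.0, 0.050)]:
    pink, blue = firing_areas(-10.0, rt, gap_s=gap)
    print(f"RT={rt}s gap={gap * 1000:.0f}ms direct={pink:.3f} reverb={blue:.3f}")
```

At D/R = -10dB the reverberant area exceeds the direct area for RT = 1s; raising RT to 2s shrinks the reverberant area, and a 50ms gap shrinks it further, until the direct area is the larger of the two.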
    Equation for Localizability – 700 to 4000Hz
•   We can use this simple model to derive an equation that expresses the
    ease of perceiving the direction of direct sound as a decibel value. p(t) is the
    sound pressure of the ipsilateral channel of a binaural impulse response.
    With the previous simple assumptions, we propose the threshold for
    detection would be 0dB, and clear localization would occur at a localizability
    value of +3dB.

•   Where D is the window width (~0.1s), and S is a scale factor:

    $$ S = 20 - 10\log_{10}\int_{0}^{\infty} p(t)^{2}\,dt $$

    S is the zero nerve firing line in the previous two slides. It is 20dB below the maximum
    loudness. POS means ignore the negative values for the sum of S and the cumulative
    log pressure.

•   Localizability (LOC) in dB:

    $$ \mathrm{LOC} = S - 1.5 + 10\log_{10}\int_{0}^{0.005} p(t)^{2}\,dt
       - \frac{1}{D}\int_{0}^{D-0.005} \mathrm{POS}\!\left( S + 10\log_{10}\int_{0.005}^{\tau} p(t)^{2}\,dt \right) d\tau $$

•   The scale factor S and the window width D interact to set the slope of the
    threshold as a function of added time delay. The values I have chosen
    (100ms and -20dB) fit my personal data. The extra term of -1.5dB is
    included to match my personal thresholds.
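As a cross-check on the LOC equation, here is a minimal plain-Python transcription for a sampled ipsilateral impulse response. It was written for this illustration and is not the author's Matlab; it assumes the response is already band-limited and truncated so that the direct sound starts at sample zero. The names `loc_db` and `synthetic_ir`, and the synthetic test signal itself (a direct spike plus a smooth exponential energy decay), are hypothetical. Because LOC depends only on p(t)², a smooth envelope stands in for decaying noise.

```python
import math


def loc_db(p, sr, window_s=0.100, early_s=0.005, range_db=20.0, offset_db=-1.5):
    """Localizability (LOC) in dB for an ipsilateral impulse response p.

    p is a list of pressure samples, already band-limited and truncated so
    that the direct sound starts at index 0.
    """
    energy = [x * x for x in p]
    S = range_db - 10 * math.log10(sum(energy))       # zero nerve-firing line
    early_n = round(early_s * sr)                      # first 5 ms = direct sound
    D = round(window_s * sr)                           # window width in samples
    direct_db = 10 * math.log10(sum(energy[:early_n]))
    # Running sum of reflected energy = build-up under a steady excitation.
    # POS(): only levels above the zero-firing line contribute firings.
    running = 0.0
    reverb_firings = 0.0
    for e in energy[early_n:D]:
        running += e
        if running > 0:
            reverb_firings += max(0.0, S + 10 * math.log10(running))
    return S + offset_db + direct_db - reverb_firings / D


def synthetic_ir(sr, rt_s, dr_db, onset_s=0.005):
    """Hypothetical test IR: a direct spike at index 0 plus a smooth
    exponential energy decay starting at onset_s (must be > 0)."""
    k = 6 * math.log(10) / rt_s            # energy falls 60 dB in rt_s
    n = round(1.5 * rt_s * sr)             # 90 dB of decay
    onset = round(onset_s * sr)
    energy = [0.0] * n
    for i in range(onset, n):
        energy[i] = math.exp(-k * (i - onset) / sr)
    energy[0] = sum(energy) * 10 ** (dr_db / 10)   # set the D/R
    return [math.sqrt(e) for e in energy]


sr = 8000
for rt in (1.0, 2.0):
    print(rt, round(loc_db(synthetic_ir(sr, rt, -10.0), sr), 2))
```

With D/R = -10dB the result falls below the 0dB threshold for RT = 1s, and rises when RT is increased to 2s or when the reverberation onset is delayed, in line with the figures.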
         Some explanation of the equation
•   The equation as written in the previous slide simply calculates the ratios of
    the pink and blue areas shown in the previous pictures.
•   The first integral on the left in LOC is the “pink” area – the sum of the nerve
    firings for the direct sound. This area is the product of the normalized
    log sound pressure (the level above S) times the length of the window D.
     – However here we have divided through by D – so this factor is not shown.
•   The next two integrals represent the total nerve firings for the reverberation
    – the “blue” area.
     –   Since we have divided by D, a factor of 1/D is included at the beginning.
•   The second of the two integrals is the physical sum of the sound energy
    that would exist if the impulse response were convolved with a steady
    excitation. The first integral finds the area under this curve. In the second
    integral we have excluded the direct sound – assuming this will be in the
    first 5 milliseconds.
•   The limits of the integrals have been adjusted to account for this exclusion.
    Thus the second integral goes from .005 seconds to the end, and the first
    integral is from zero to the window width minus .005.
•   I have included the -1.5dB adjustment for my personal thresholds.
                                             Matlab code for LOC
% Load a .wav file containing a binaural impulse response, filter it,
% and truncate the beginning. ('binaural_ir.wav' is a placeholder name;
% 'native' keeps 16-bit integer values so the threshold of 500 applies.)
[data, sr] = audioread('binaural_ir.wav', 'native');
ir_left = double(data(:,1));   % the binaural IR (left channel taken as ipsilateral)
ir_right = double(data(:,2));
clear data

% Filter the IRs: 3rd-order elliptic band-pass, 1000-4000Hz
wb = [2*1000/sr 2*4000/sr];
[b, a] = ellip(3, 2, 30, wb);
ir_left = filter(b, a, ir_left);
ir_right = filter(b, a, ir_right);

% Truncate the beginning: find the first sample above 500 in either
% channel (searching the first 100ms) and make it sample 1.
for il = 1:round(0.1*sr)
    if abs(ir_left(il)) > 500 || abs(ir_right(il)) > 500
        break
    end
end
ir_left(1:il-1) = [];
ir_right(1:il-1) = [];

% ir_left is now an ipsilateral binaural impulse response,
% truncated to start at zero and filtered to 1000-4000Hz.
upper_scale = 20;                   % 20dB range for firings
box_length = round(100*sr/1000);    % proposed box length: try 100ms
early_time = round(5*sr/1000);      % early_time is 5ms in samples
D = box_length;                     % the window width, 100ms in samples

% Here starts the equation on the slide:
S = upper_scale - 10*log10(sum(ir_left.^2));
early = 10*log10(sum(ir_left(1:early_time).^2));

% The cumsum represents the build-up in energy when the IR is excited
% by a steady tone (reflections only, after the first 5ms):
ln = length(ir_left);
log_rvb = 10*log10(cumsum(ir_left(early_time+1:ln).^2));

% Look at positive values of S+log_rvb only (the POS operator)
log_rvb(S + log_rvb < 0) = -S;

LOC = S - 1.5 + early - (1/D)*sum(S + log_rvb(1:D-early_time))
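The truncation step in the Matlab listing (drop everything before the first sample whose magnitude exceeds a threshold in either channel) is easy to get off by one. Here is a minimal Python sketch of that step alone, assuming 16-bit sample values so that the threshold of 500 is meaningful; the function name `truncate_to_onset` and the `search_n` parameter are hypothetical. It keeps the first above-threshold sample, so the direct sound lands at index 0.

```python
def truncate_to_onset(left, right, threshold=500, search_n=None):
    """Drop leading samples so the first sample whose magnitude exceeds
    threshold in either channel becomes index 0 (the direct sound)."""
    limit = min(len(left), len(right))
    if search_n is not None:
        limit = min(limit, search_n)     # e.g. round(0.1 * sr): first 100 ms
    for i in range(limit):
        if abs(left[i]) > threshold or abs(right[i]) > threshold:
            return left[i:], right[i:]
    raise ValueError("no sample exceeds the threshold in the search range")


l, r = truncate_to_onset([0, 3, -10, 800, 5], [1, 2, 600, 4, 0])
print(l, r)
```

Both channels are cut at the same index, which preserves the interaural time difference of the direct sound.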
          Use of the localization equation
• Like RT or C80, LOC uses a measured impulse response as an
  input, with the direct sound starting at time zero. This is the only data a
  user needs to supply.
    – The measure is calibrated for a front facing binaural impulse response.
        • An omnidirectional impulse response will give lower values of LOC for the same
          seat position, due to the lack of head shadowing.
• The localization equation appears more complex than most current
  measures for room acoustics, but it has a simple, physiologically based
  interpretation:
    – It is the ratio in dB of the number of nerve firings received by the brain from
      the direct sound in a 100ms window to the number of nerve firings
      received from all reflections in the same time period.
    – It contains three experimentally based parameters: the window width D, the
      dynamic range of the nerve channels S, and the time window for separating
      direct sound from reflections (5ms). These parameters are not intended to
      be adjustable without further experimental work.
    – Matlab code for calculating LOC is simple, and available from the author.
                     Interpretation of LOC
•   LOC was developed and verified as a method for predicting when a sound
    will be accurately localized when the direct sound is much lower in total
    energy than the sum of all reflections.
•   Like C80, IACC80, and similar measures, LOC is based on a time window
    that begins with the onset of the direct sound.
     – In practice, whether a syllable or note behaves as any of these measures predict
       depends on its rise time (onset time).
     – If the sound starts gradually the precise moment of onset becomes
       indeterminate, and separating direct sound from reflections becomes impossible.
     – Thus LOC – and other such measures – are accurately predictive only for signals
       with sharp onsets.
     – Additionally, if the direct sound from a note or syllable is masked by
       reverberation from a previous sound, the direct sound will not be audible.
•   LOC predicts the audibility of the direct sound for a syllable or note with a
    rapid rise-time when there is sufficient freedom from masking by previous sounds.
     – Although musical signals often do not meet these criteria, in practice there are
       enough occasions that do meet the criteria that the LOC equation is useful.
•   Remember that for the purposes of this talk Localization is only a proxy for
    the main goal – predicting when the direct sound is sufficiently audible to
    produce engagement.
     – Preliminary results suggest LOC achieves this goal.
                Localization Equation Setup
•   The Localization Equation was developed and tested using binaural impulse
    responses generated using the author’s own HRTFs.
     – The source position was 15 degrees to the left (and right) of center. Only the
       ipsilateral channel was analyzed.
     – Male speech alternated from left to right with a time gap of 400ms, to allow for
       complete decay of the reverberation between each word.
     – The reverberation was generated using an independent decaying noise signal
       convolved with each of 54 HRTFs spaced equally around the listening position.
     – The HRTFs were equalized so that the azimuth zero elevation zero HRTF was
       flat from 40Hz to about 4kHz. The elevation notch at 7.8kHz was not equalized
       away, but was left in place.
     – Playback was done through headphones equalized to match a loudspeaker
       placed in front of the listener – again not equalizing the 7.8kHz notch from the
       listener’s frontal HRTF of the loudspeaker.
• Because my data show that the perception of both localization and
  near/far is mostly a high frequency phenomenon, the impulse
  response was bandpass filtered between 700Hz and 4000Hz before
  being analyzed for localization.
     – If a measured binaural impulse response is used as an input, care should be
       taken to ensure the dummy head is equalized as described above.
     – Because of the importance of upward masking in localization, if the low
       frequencies in the room signal are significantly stronger than those in the
       frequency range from 700 to 4000Hz, localization is likely to be poorer than the
       equation would predict.
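The band-limiting step can be sketched without a signal-processing toolbox. This is a swapped-in technique, not the author's filter: a single second-order band-pass biquad using the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook, centred at the geometric mean of the band edges, with Q chosen so the -3dB points fall near 700Hz and 4000Hz. Its skirts are far gentler than the elliptic filter in the Matlab listing (which also uses 1000Hz as its lower edge), so treat it as an illustration rather than a calibrated front end. The function names are hypothetical.

```python
import math


def bandpass_coeffs(f_lo, f_hi, sr):
    """Cookbook band-pass biquad (0 dB peak) with -3 dB edges near f_lo, f_hi."""
    f0 = math.sqrt(f_lo * f_hi)          # geometric centre frequency
    q = f0 / (f_hi - f_lo)               # bandwidth -> Q
    w0 = 2 * math.pi * f0 / sr
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha                       # normalise so a[0] == 1
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Direct-form I filtering of the sequence x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


sr = 48000
b, a = bandpass_coeffs(700.0, 4000.0, sr)
print(b, a)
```

In use, the same `b`, `a` pair would be applied to both channels of the impulse response before computing LOC.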
                     Comments on LOC
– LOC is based on the LOG of the build-up of reverberant energy.
     • This follows directly from the physiological model.
     • Current measures integrate the sound energy rather than the log of sound energy. But our
       physiology works differently. One of the consequences is that reflections that arrive early
       have more influence than reflections that arrive later.
           – As energy builds up additional reflections are not counted as strongly.
     • Reflections later than 100ms are ignored in calculating LOC.
– This is very different from C80 or C50, which count the earliest reflections as part of
  the direct sound, and compare that energy sum to the energy sum of all the later reflections.
     • In a small hall most of the energy arrives before 80ms regardless of the relative strength
       of the direct sound, so C80 and C50 are usually high.
     • But small halls can have high C80 or C50, poor localization, and a lack of clarity.
– LOC depends strongly on the delay between the direct sound and the build-up of
  the reverberation.
     • Late reverberation does not impair localization of short notes.
     • The principal difference between the localizability in small halls and large halls is the rate
       at which reflected energy builds up after the start of a note.
– LOC is NOT related to EDT – even if Jordan’s original definition of EDT is used.
     • EDT is relatively independent of the initial time delay
     • When D/R < -10dB, EDT and RT are the same, as there is insufficient direct sound to be
       detected in a reverse integrated impulse response.
– LOC correlates with IACC80 – but IACC is not sensitive to medial reflections.
     • IACC is sensitive to the sum of reflected energy – not the log of energy, and thus is
       insensitive to when the reflections arrive
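To illustrate the contrast with C80, here is a minimal Python sketch that applies a transcription of the LOC equation (written for this illustration, not the author's code) to two hypothetical impulse responses that differ only in when a single reflection arrives: 10ms or 60ms. Each has a unit direct sound, one reflection of equal energy, and a smooth exponential tail starting at 100ms; these signals and the function names are assumptions chosen to isolate the effect. Both reflections fall inside the 80ms window, so C80 is identical, while LOC comes out about 6dB lower when the reflection arrives early.

```python
import math


def loc_db(p, sr, window_s=0.100, early_s=0.005, range_db=20.0, offset_db=-1.5):
    """LOC in dB: direct level minus the mean log build-up of reflections."""
    e = [x * x for x in p]
    S = range_db - 10 * math.log10(sum(e))
    early_n, D = round(early_s * sr), round(window_s * sr)
    running, firings = 0.0, 0.0
    for v in e[early_n:D]:
        running += v                     # energy build-up (linear sum)
        if running > 0:                  # ...counted on a log scale
            firings += max(0.0, S + 10 * math.log10(running))
    return S + offset_db + 10 * math.log10(sum(e[:early_n])) - firings / D


def clarity_db(p, sr, limit_s):
    """C50 / C80 style ratio: energy before limit_s over energy after it."""
    e = [x * x for x in p]
    n = round(limit_s * sr)
    return 10 * math.log10(sum(e[:n]) / sum(e[n:]))


def ir_with_reflection(sr, refl_s, tail_start_s=0.100, rt_s=1.0):
    """Hypothetical IR: unit direct sound, one reflection of equal energy,
    and a smooth exponential tail (4x the direct energy) from tail_start_s."""
    k = 6 * math.log(10) / rt_s
    n = round((tail_start_s + 1.5 * rt_s) * sr)
    start = round(tail_start_s * sr)
    energy = [0.0] * n
    tail = [math.exp(-k * (i - start) / sr) for i in range(start, n)]
    scale = 4.0 / sum(tail)
    for i, v in zip(range(start, n), tail):
        energy[i] = v * scale
    energy[0] = 1.0                      # direct sound
    energy[round(refl_s * sr)] = 1.0     # single reflection
    return [math.sqrt(v) for v in energy]


sr = 8000
for t in (0.010, 0.060):
    ir = ir_with_reflection(sr, t)
    print(t, round(loc_db(ir, sr), 2), round(clarity_db(ir, sr, 0.080), 2))
```

The early reflection raises the running sum sooner, so it adds firings over more of the window; C80 only asks which side of 80ms the energy falls on.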
                      Tests with speech

A speech signal was convolved with a pair of binaural impulse responses,
such that the sound appears to come from +-15 degrees from the front.
Then a fully spatially diffuse reverberation was added, in such a way that the
D/R, the RT, and the time delay before the reverberation onset could be varied.
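The reverberant part of such a test signal can be sketched as independent exponentially decaying noise tails, one per direction (54, as in the setup described earlier), delayed by the chosen onset time and scaled together so the direct-to-reverberant ratio hits its target. This is a minimal sketch under stated assumptions: the convolution of each tail with the HRTF for its direction is omitted because the author's HRTFs are not available here, and the function name `diffuse_reverb` and its parameters are hypothetical.

```python
import math
import random


def diffuse_reverb(dr_db, rt_s, delay_s, sr, n_dirs=54, length_s=None,
                   direct_energy=1.0, seed=0):
    """Independent decaying noise tails, one per direction.

    The combined energy of all tails is direct_energy / 10**(dr_db / 10),
    i.e. the requested D/R. Each tail would then be convolved with the HRTF
    for its direction (not done here).
    """
    rng = random.Random(seed)
    n = round((length_s if length_s is not None else 1.5 * rt_s) * sr)
    onset = round(delay_s * sr)
    amp_k = 3 * math.log(10) / rt_s      # amplitude falls 60 dB in rt_s
    env = [math.exp(-amp_k * (i - onset) / sr) if i >= onset else 0.0
           for i in range(n)]
    tails = [[rng.gauss(0.0, 1.0) * g for g in env] for _ in range(n_dirs)]
    # Scale all tails together so the reverberant energy matches the D/R.
    total = sum(v * v for tail in tails for v in tail)
    target = direct_energy / 10 ** (dr_db / 10)
    gain = math.sqrt(target / total)
    return [[v * gain for v in tail] for tail in tails]


tails = diffuse_reverb(-10.0, 1.0, 0.02, 8000)
print(len(tails), len(tails[0]))
```

Scaling the tails jointly, rather than one by one, keeps their relative levels random while fixing the total D/R exactly.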
               Broadband Speech Data

Blue – experimental thresholds for the alternating speech with a 1 second reverb
time. Red – the threshold predicted by the localization equation. Black –
experimental thresholds for RT = 2 seconds. Cyan – thresholds predicted by the
localization equation.
Threshold Data from Other Subjects – 1s RT

Blue – new data using absence of any localization as a criterion for threshold.
Red – the author’s previous data based on a half-angle criterion.

• Seven subjects participated in a threshold experiment at Kyushu
   – In these experiments the threshold was defined by the extinction of
     localization, not by the reduction of angle by a factor of two.
   – Consequently the thresholds are lower than they were in my previous
     experiment, and they have more variation.
• However, the data is consistent to within 3dB
       Threshold data in Japan, 2s RT

Cyan – the author’s data with a half-angle criterion for threshold.

• When the RT was raised to 2s the subjects had great difficulty with
  determining the point of extinction, which appeared to be defined
  differently for each subject.
    – There is clearly more spread in the data, and for some subjects the
      effect of added delay is reduced.
    – The criterion of reducing the apparent separation by a factor of two
      seems to give more reliable results.
    Tests with Music – and the difference between
             localization and engagement
•   The gaps between words in the speech selection were deliberately chosen
    so there would be no masking of the direct sound from reverberation. This
    is NOT the case in real music.
•   Tapio Lokki kindly made anechoic music recordings available on the web. I
    used the violin1, violin2, cello, and viola tracks from the Mozart selection to
    form a string quartet. After a lot of noise reduction and balancing it worked
    quite well.
•   In music the direct sound of succeeding notes is frequently masked by
    reverberation from the previous notes.
•   When you first listen to the string quartet at low values of D/R localization is
    impossible, and all the instruments clump together in the middle of the
    sound field.
     – But if the value of the localization equation is above 0dB the localization is not
       always masked, and given time the brain can localize each instrument.
       Succeeding notes with the same timbre are localized to the correct position.
       Thus, given time (about two minutes for me), the localization equation predicts the
       localization threshold.
•   But it does NOT predict the sense of engagement. You can localize sounds
    (sort of, sometimes) but the music is not clear, and the instruments seem far
    away. (Here is where we need harmonic coherence.)
     – A value of the LOC equation of +3dB does predict engagement!
           Difficulties with the music tests
• Because localization of sound sources with music depends so
  strongly on masking, experiments to determine localization threshold
  and the threshold of engagement are difficult to perform.
• When you first start to listen the localization threshold is as much as
  5dB higher than will be achieved after a few minutes of listening.
    – This is why many (even most) concert halls can give the impression of
      localization, but lack the sense of engagement.
• I found that the adaptation process could be sped up by turning off
  the reverberation and just listening for 10 seconds or so to the direct
  sound alone. This teaches the brain where to expect to hear the
  sound of each instrument. When you turn the reverberation on,
  sounds of the same timbre will be perceived in the correct location.
• The same process occurs in concerts where the visual image is
  present. The eyes train the brain where to expect each sound – and
  this is where we hear it.
• But such a visually constructed sonic image DOES NOT produce
  the impression of engagement!
             Results of music experiments
• I have a lot of data on the music experiments – because of the
  adaptation problem it is not as consistent as I would like.

• But the results are easy to summarize:

• Sufficient localization and musical clarity result for the Mozart string
  quartet at values of the localization equation of +3dB or higher.

• These values are very seldom achieved in modern concert halls (or
  opera houses). They ARE achieved in Boston Symphony Hall over
  a wide range of seats, and in a number of other old houses.
    – The reasons for the lack of success in modern halls will be discussed in
      the remainder of this talk

• Old opera houses (with their surplus of velvet) achieve these values
  easily – but lack the late reverberation which is so popular these days.
    – Some opera fans – including myself – would rather have the dramatic
      intensity of the old halls, even without the reverberation.
    – This is the sound for which the operas were written.
          Direct sound and Envelopment
•   Recent work by the author, both in experiments with several subjects and in
    live lecture demonstrations with loudspeakers, has shown that the sense
    of both reverberance and envelopment increases when the direct sound is
    perceivable.
     – Where there is no perceivable direct sound, the sound can be reverberant, but
       it comes from the front.
     – When the direct sound is above the threshold of localization, the reverberation
       becomes louder and more spacious.
          • Envelopment and reverberance are created by late energy – at least 100ms after the
            direct sound.
          • When the direct sound is inaudible the brain cannot perceive when a sound has started.
               – So effectively the time between the onset of the direct sound and the reverberation is reduced,
                 and less reverberation is heard.
     – In the absence of direct sound, syllabic sound sources (speech, woodwinds,
       brass, solo instruments of all kinds) are perceived as in front of the listener, even
       if reflections come from all around.
          • The brain will not allow a singer (for example) to be perceived as all
            around the listener.
          • In addition, Barron has shown that reverberation is always stronger in front of a hall
            than in the rear – so in most seats sound decays are perceived as frontal.
     – But when direct sound is separately perceived, the brain can create two separate
       sound streams, one for the direct sound (the foreground) and one for the
       reverberation (the background).
          • A background sound stream is perceived as both louder and more enveloping than the
            reverberation in a single combined sound stream.
                  Part 2 - Main Points
• The ability to hear the Direct Sound – as measured by LOC – is a
  vital component of the sound quality in a great hall.
   – The ability to separately perceive the direct sound when the D/R is less
     than 0dB requires time. When the D/R is low there must be
     sufficient time between the arrival of the direct sound and the build-up of
     the reverberation if engagement is to be perceived.

• Hall shape does not scale
   – Our ability to perceive the direct sound – and thus localization,
     engagement, and envelopment – depends on the direct to reverberant
     ratio (D/R), and on the rate at which reverberation builds up with time.
   – Both the D/R and the rate of build-up change
     as the hall size scales – but human hearing (and the properties of
     music) do not change.
   – A hall shape that provides good localization in a high percentage of
     2000 seats may produce a much lower percentage of great seats if it is
     scaled to 1000 seats.
   – And a minuscule number of great seats if it is scaled to 500 seats.
         Diffusing elements do not scale
•   The audibility of direct sound, and thus the perceptions of both localization
    and engagement, is frequency dependent. Frequencies above 700Hz are
    particularly important.
     – Frequency dependent diffusing elements can cause the D/R to vary with
       frequency in ways that improve direct sound audibility.
     – The best halls (Boston, Amsterdam, Vienna) all have ceiling and side wall
       elements with box shape and a depth of ~0.4m.
         • These elements tend to send frequencies above 700Hz back toward the orchestra and
           the floor, where they are absorbed. (The absorption only occurs in occupied halls – so
           the effect will not show up in unoccupied measurements!)
         • The result is a lower early and late reverberant level above 700Hz in the rear of the hall.
         • This increases the D/R for the rear seats, and improves engagement.
              – The LOC equation is sensitive to all reflections in a 100ms window, which will include many
                second-order reflections, especially in small halls.
•   Replacing these elements with smooth curves or with smaller size features
    does not achieve the same result.
     – Some evidence of this effect can be seen in RT and IACC80 measurements
       when the hall and stage are occupied.
•   Measurements in Boston Symphony Hall (BSH) above 1000Hz show a clear
    double slope that is not visible at 500Hz.
     – The hall has high engagement in at least 70% of the seats.
            We need better measures
• Current acoustic measures ignore both the D/R and the time gap
  between the direct sound (the first wavefront) and the reverberation.
   – RT, C80, and EDT all ignore the strength of the direct sound and the
     effects of musical style on the audibility of the D/R.
   – IACC comes close, but measures something different.

• LOC is an attempt to supply a simple measure for a basic human
  perception which depends on direct sound.
   – But impulse response measurements under occupied conditions are
     notoriously difficult to obtain.

• We need measures that use binaural recordings of actual
  performances as inputs.
   – And the ability to listen to these recordings to test the validity of these
     measures against the true experience.
   – Methods for recording and reproducing binaurally will be discussed in
     the next paper.
   – We are working on ways to measure LOC from such recordings.
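To make the two neglected quantities concrete, here is a minimal Python sketch that reads D/R and the initial time gap from a single-channel impulse response. It is not LOC, and the function name is illustrative. The onset rule (first sample within 20 dB of the peak), the 5 ms direct-sound window, and the -20 dB threshold that defines the "first strong reflection" are assumptions of this sketch; the split of reflected energy at 100 ms follows the window mentioned earlier. In practice the response would also be band-limited above ~700Hz first.

```python
import numpy as np

def direct_reverb_summary(ir, fs, direct_ms=5.0, gap_threshold_db=-20.0):
    """Summarize a mono impulse response in terms of D/R and time gap.

    Hypothetical conventions (not taken from the talk):
    - the onset is the first sample within 20 dB of the overall peak;
    - the direct sound is everything within `direct_ms` after the onset;
    - the first reflection is the first later sample within
      `gap_threshold_db` of the direct-sound peak.
    """
    ir = np.asarray(ir, dtype=float)
    mag = np.abs(ir)
    onset = int(np.argmax(mag >= 0.1 * mag.max()))
    split = onset + int(round(direct_ms * 1e-3 * fs))
    cut100 = onset + int(round(0.100 * fs))  # end of the 100 ms window
    energy = ir ** 2
    direct = energy[onset:split].sum()

    def rel_db(e):
        # Energy relative to the direct sound, in dB.
        return 10.0 * np.log10(e / direct)

    # Initial time gap: delay from the onset to the first strong reflection.
    direct_peak = mag[onset:split].max()
    strong = np.flatnonzero(mag[split:] >= direct_peak * 10 ** (gap_threshold_db / 20))
    gap_ms = (split + strong[0] - onset) / fs * 1e3 if strong.size else float("nan")

    return {
        "dr_db": -rel_db(energy[split:].sum()),          # direct vs all reflected energy
        "gap_ms": gap_ms,
        "early_db": rel_db(energy[split:cut100].sum()),  # reflections before 100 ms
        "late_db": rel_db(energy[cut100:].sum()),        # energy after 100 ms
    }

# Usage with a toy response: direct sound, one reflection at 20 ms, one at 150 ms.
fs = 48000
ir = np.zeros(fs)
ir[0], ir[960], ir[7200] = 1.0, 0.5, 0.1
print(direct_reverb_summary(ir, fs))
```

The toy response is deliberately sparse so each number can be checked by hand; a measured response would also need the occupied-hall conditions stressed above.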
   Why do large halls sound different?

• In Boston Symphony Hall (BSH), and the Amsterdam
  Concertgebouw (CG) the reverberation decay is nearly
  identical, but the halls sound different.
   – The difference can be explained using the same model that was
     used to develop LOC.
   – Lacking good data from an occupied hall and stage, I used a
     binaural image-source model with HRTFs measured from my
     own eardrums.
     Reverberation build-up and decay – from models
   (Figure: modeled reverberation build-up and decay for Amsterdam, LOC = +6dB, and Boston, LOC = +4.2dB)

The seat position in the model has been chosen so that the D/R is -10dB for a continuous note.
The upward dashed curve shows the exponential rise of reverberant energy from a continuous
source assuming exponential decay with no time gap. The solid line shows the build up and
decay from a short note of 100ms duration. Note the actual D/R for the short note is only about
The initial time gap is shorter in Boston than in Amsterdam, but after about 50ms the curves are
nearly identical. (Without the direct sound they sound identical.) Both halls show a high value
of LOC, but the value in Amsterdam is significantly higher – and the sound is clearer.
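The dashed build-up curve can be approximated from first principles. A minimal sketch, assuming an ideal exponential decay (60 dB per RT), a steady-state D/R of -10dB as in the figure, and no initial time gap: after a note of length T only the fraction 1 - 10^(-6T/RT) of the steady-state reverberant energy has built up, so a short note has a higher D/R than a continuous one. The RT values of 2.0 s and 1.0 s are hypothetical stand-ins for a large hall and a half-size copy, not the values behind the figure.

```python
import math

def reverb_buildup_fraction(t_s, rt60_s):
    """Fraction of the steady-state reverberant energy present t_s seconds after a
    steady source starts, for an ideal exponential decay (60 dB per RT60) with no
    initial time gap."""
    return 1.0 - 10.0 ** (-6.0 * t_s / rt60_s)

def short_note_dr_db(steady_dr_db, rt60_s, note_s):
    """D/R at the end of a note of length note_s, given the steady-state D/R."""
    return steady_dr_db - 10.0 * math.log10(reverb_buildup_fraction(note_s, rt60_s))

for rt in (2.0, 1.0):  # a hypothetical hall and its half-size copy
    print(f"RT {rt:.1f} s: D/R at the end of a 100 ms note = {short_note_dr_db(-10.0, rt, 0.1):.1f} dB")
```

Under these assumptions the 100 ms note ends at about -7.0dB in the 2 s hall and about -8.7dB in the 1 s hall. Because the sketch has no time gap and no image sources, it only shows the direction of the effect, not the model's values.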
     Comparisons of C80, C50, IACC80, and LOC
•   Conventional measures for the models of Amsterdam Concertgebouw and Boston
    Symphony Hall give the following results:

•   Amsterdam:     C80 = 0.43dB, C50 = -2.8dB, IACC80 = 0.38, LOC = +6dB

•   BSH:           C80 = 0.65dB, C50 = -2.1dB, IACC80 = 0.22, LOC = +4.2dB

•   Half-Size BSH: C80 = 3.7dB, C50 = 1.7dB, IACC80 = 0.15, LOC = +0.5dB

•   Only the IACC80 shows that Amsterdam might have more direct sound than Boston.
    The standard Clarity measures predict the opposite – and predict that the small hall
    would have high clarity, and it does not.

•   But IACC80 is sensitive only to lateral reflections. Strong reflections from the front,
    overhead, or rear do not affect IACC.

•   An IACC of 0.22 would usually be considered low. In spite of this BSH has both
    clarity and good localization in this seat.
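For reference, C50 and C80 are conventionally defined (ISO 3382-1) as the ratio, in dB, of the energy arriving within 50 or 80 ms of the direct sound to the energy arriving later. The sketch below computes them from a mono impulse response; the onset rule is an assumption, and the function names are illustrative. It also gives the closed-form value for a pure exponential decay, which depends on the RT alone; this is one reason clarity measures tend to follow the reverberation time rather than the audibility of the direct sound.

```python
import numpy as np

def clarity_db(ir, fs, limit_ms):
    """C50 (limit_ms=50) or C80 (limit_ms=80): early-to-late energy ratio in dB,
    with time zero at the arrival of the direct sound. Onset detection (first
    sample within 20 dB of the peak) is an assumption of this sketch."""
    ir = np.asarray(ir, dtype=float)
    mag = np.abs(ir)
    onset = int(np.argmax(mag >= 0.1 * mag.max()))
    split = onset + int(round(limit_ms * 1e-3 * fs))
    energy = ir ** 2
    return 10.0 * np.log10(energy[onset:split].sum() / energy[split:].sum())

def exponential_clarity_db(rt60_s, limit_ms):
    """Clarity of an ideal exponential decay (60 dB per RT60) with no separate
    direct sound: the result depends on RT60 alone."""
    remaining = 10.0 ** (-6.0 * limit_ms * 1e-3 / rt60_s)  # energy fraction after the limit
    return 10.0 * np.log10((1.0 - remaining) / remaining)

# Usage: a synthetic exponential decay with RT60 = 1.1 s.
fs, rt60 = 8000, 1.1
n = np.arange(3 * fs)
decay = np.sqrt(10.0 ** (-6.0 * n / (fs * rt60)))  # pressure envelope of the energy decay
print(clarity_db(decay, fs, 80), exponential_clarity_db(rt60, 80))  # both about +2.4 dB
```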
                        Smaller halls
• What if we build a hall with the shape of BSH, but half
  the size?
   – The new hall will hold about 600 seats.
   – The RT will be half, or about 1 second.
   – We would expect the average D/R to be the same. Is it? How
     does the new hall sound?
   – If the client specifies a 1.7s RT will this make the new hall better,
     or worse?
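A textbook estimate helps frame these questions. Assuming Sabine's formula (RT = 0.161 V/A, metric, no air absorption), a diffuse reverberant field, and an omnidirectional source, the steady-state D/R at distance r is Q·A/(16πr²). Halving every dimension scales V by 1/8, A by 1/4 and r by 1/2, so RT halves while the steady-state D/R at corresponding seats is unchanged; restoring the RT by removing absorption then lowers D/R. The hall dimensions below are hypothetical. This steady-state model does not capture the faster build-up for short notes described for the half-size model.

```python
import math

def sabine_rt(volume_m3, absorption_m2):
    """Sabine reverberation time (s), metric form, ignoring air absorption."""
    return 0.161 * volume_m3 / absorption_m2

def absorption_for_rt(volume_m3, rt60_s):
    """Total absorption (m^2 sabins) needed for a target RT in the same model."""
    return 0.161 * volume_m3 / rt60_s

def classical_dr_db(absorption_m2, distance_m, q=1.0):
    """Steady-state D/R from diffuse-field theory: direct energy density goes as
    Q / (4 pi r^2), reverberant energy density as 4 / A, so D/R = Q A / (16 pi r^2)."""
    return 10.0 * math.log10(q * absorption_m2 / (16.0 * math.pi * distance_m ** 2))

# Hypothetical large hall and listener distance (illustrative, not measured data).
volume, absorption, distance = 18000.0, 1500.0, 20.0
k = 0.5  # halve every linear dimension: volume scales by k^3, surface area by k^2
half_volume, half_absorption, half_distance = volume * k ** 3, absorption * k ** 2, distance * k

rt_full = sabine_rt(volume, absorption)                    # about 1.93 s
rt_half = sabine_rt(half_volume, half_absorption)          # half of rt_full
dr_full = classical_dr_db(absorption, distance)
dr_half = classical_dr_db(half_absorption, half_distance)  # same as dr_full

# The client asks for a 1.7 s RT in the half-size hall, so absorption is removed.
client_absorption = absorption_for_rt(half_volume, 1.7)
dr_client = classical_dr_db(client_absorption, half_distance)
print(f"RT {rt_full:.2f} -> {rt_half:.2f} s; D/R {dr_full:.1f} -> {dr_half:.1f} dB; "
      f"with RT 1.7 s: {dr_client:.1f} dB")
```

With these numbers the RT drops from about 1.93 s to 0.97 s while the steady-state D/R stays at about -11.3dB; removing absorption to reach 1.7 s lowers D/R by about 2.5dB.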
                      Half-Size Boston
                                              The gap between the direct and the
                                              reverberation and the RT have become
                                              half as long.
                                              Additionally, in spite of the shorter RT,
                                              the D/R has decreased from about -6dB in
                                              the large BSH model to about -8.5dB in
                                              the half-size model.
                                              This is because the reverberation
                                              builds up more quickly and more strongly
                                              in the smaller hall.

The direct sound, which was distinct in more than 50% of the seats in the large hall
will be audible in fewer than 30% of the seats in the small hall.
If the client insists on increasing the RT by reducing absorption, the D/R will be
further reduced, unless the hall shape is changed to increase the cubic volume.
The client and the architects expect the new hall to sound like BSH – but they, and
the audience, will be disappointed. As Leo Beranek said about the Berlin
Philharmonie: “They can always sell the bad seats to tourists.”
An existing small hall – pictures

               Note the highly reflective stage and side
               walls, deeply coffered ceiling, and relatively
               low internal volume per seat.
               The sound in many seats is muddy. Adding
               reflections or decreasing absorption only
               increases the muddiness.
                               Hall data
• The pictures show a recital hall of 65000 cubic feet (1840 cubic
  meters). Designed for 350 seats, it currently has 300 seats, giving a
  volume per seat of about 6 cubic meters. There are 1400 square feet of carpet
  under the seats on the floor.

• Reverberation Time (RT) unoccupied is 1.1 seconds from 1000Hz to
  63Hz. C80, dominated by the reverberation time, is ~+5dB.

• The parallel side walls of the stage provide little diffusion.

• The hall is generally liked by the audience and players, although
  there are reports of loudness and balance problems on stage.

• Musicians desire more resonance and greater clarity in the middle of
  the hall.
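A quick arithmetic check of the hall data, assuming the exact foot-to-meter conversions and Sabine's formula. The total absorption inferred from the unoccupied RT of 1.1 s is a diffuse-field estimate, not a measured value, and the variable names are illustrative.

```python
FT3_TO_M3 = 0.028316846592  # cubic feet to cubic meters (exact)
FT2_TO_M2 = 0.09290304      # square feet to square meters (exact)

def sabine_absorption_m2(volume_m3, rt60_s):
    """Total absorption implied by Sabine's formula RT = 0.161 V / A (metric)."""
    return 0.161 * volume_m3 / rt60_s

volume_m3 = 65000 * FT3_TO_M3                         # about 1840.6 m^3
seats = 300
volume_per_seat = volume_m3 / seats                   # about 6.1 m^3 per seat
carpet_m2 = 1400 * FT2_TO_M2                          # about 130 m^2 of carpet
absorption_m2 = sabine_absorption_m2(volume_m3, 1.1)  # about 269 m^2, unoccupied
print(f"{volume_m3:.0f} m^3, {volume_per_seat:.1f} m^3/seat, "
      f"carpet {carpet_m2:.0f} m^2, total absorption {absorption_m2:.0f} m^2")
```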
       Experiments with absorption and
           acoustic enhancement
•   Measurements and experiments involving various combinations of
    fiberglass panels and electronic reverberation enhancement were
    conducted in January 2009.
     – Measurements were made with three loudspeakers, three dummy heads, and a
       Soundfield microphone.
     – All musical performances were recorded with the same microphones, and with
       an array of close microphones on stage.
•   About 30 musicians participated, including faculty, staff, students from all
    three divisions, and musicians from the wider community.

•   The goal was to improve the instrumental balance on stage, reduce excess
    stage loudness, and to increase resonance and the ability to hear individual
    instruments throughout the hall.

•   With both panels and enhancement in place, comments from the participants
    were favorable. Players and singers found balancing with piano was easier,
    and the middle registers of the piano were more easily heard both by the
    musicians and in the hall.
The absorptive curtains at the rear of the stage could be rapidly withdrawn. The blankets that
simulated audience could be removed in 5 minutes, along with the panels on stage. This
allowed prompt A/B comparisons. Some of the 25 LARES enhancement speakers are visible
           Results from the experiments
•   The experiments in January showed that adding fiberglass panels around
    the stage increased clarity and the ability to localize instruments in the hall,
    raising the measured value of LOC from an average of minus 1.5dB to +3dB
    or more.

•   Localization and clarity in the balcony were additionally improved by adding
    panels to the upper audience right side wall, which eliminated the strong
    lateral reflection from that surface.
     – The lower surface of this wall was already absorptive.

•   The electronic enhancement successfully compensated for the loss of
    resonance due to the panels. Without the enhancement the perceived
    resonance was reduced.

•   In a subsequent experiment with a violin-piano combination and no
    enhancement we found that just 12 fiberglass panels each 2’x6’x2” around
    the bottom of the stage noticeably improved the clarity on the floor of the
    hall, and also improved the balance for the players on stage. For this music
    the reduced resonance was not a problem.
     – These panels absorbed the first-order reflection from the back of the stage,
       which has the highest level and the shortest time delay. Absorbing this reflection
       contributed strongly to increasing LOC.
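Why the back-of-stage reflection matters so much can be estimated with a single image source. A minimal sketch, assuming a point source, a flat specular back wall, spherical spreading, c = 343 m/s, and a listener on axis; the 2 m wall distance, 10 m listening distance, and absorption coefficient of 0.9 for the panels are hypothetical values, and the function name is illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)

def back_wall_reflection(source_to_wall_m, source_to_listener_m, alpha=0.0):
    """Delay (ms) and level (dB relative to the direct sound) of the first-order
    reflection from the wall directly behind a source, for a listener on the axis
    in front of it. Image-source model: point source, flat specular wall,
    spherical spreading, wall energy absorption coefficient `alpha`."""
    direct = source_to_listener_m
    reflected = source_to_listener_m + 2.0 * source_to_wall_m  # path from the image source
    delay_ms = (reflected - direct) / SPEED_OF_SOUND * 1e3
    if alpha >= 1.0:
        return delay_ms, float("-inf")  # a perfect absorber removes the reflection
    level_db = 20.0 * math.log10(direct / reflected) + 10.0 * math.log10(1.0 - alpha)
    return delay_ms, level_db

# Hypothetical geometry: player 2 m in front of the back wall, listener 10 m away.
bare = back_wall_reflection(2.0, 10.0)
treated = back_wall_reflection(2.0, 10.0, alpha=0.9)
print(f"bare wall: {bare[0]:.1f} ms, {bare[1]:.1f} dB; with panels: {treated[1]:.1f} dB")
```

With these numbers the bare-wall reflection arrives about 11.7 ms after the direct sound and only about 2.9dB weaker; a panel with an absorption coefficient of 0.9 lowers it by a further 10dB.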
       Usefulness of the measure LOC
• LOC informs us that the primary contributors to difficulty in
  localization are the first strong reflections, regardless of the direction
  they come from.

• We initially thought that since the floor of the hall is not a significant
  source of these reflections, removing the carpet under the seats would
  probably raise the RT without decreasing LOC.
    – However LOC is also sensitive to reverberation which arrives before
      100ms, and this would be increased by removing the carpet.
    – A few later experiments suggest that removing the carpet will increase
      the reverberant level sufficiently to eliminate the improvement in LOC
      provided by the absorption on stage.

• The existence of LOC as a physical measure can help to answer
  these questions in advance – or at least suggest that an experiment
  is needed before drastic alterations are undertaken.
      Small shoebox halls can be OK

• If the client insists on a shoebox it can work by building a
  large hall and installing a small number of seats.
   – I was just in such a small hall in Helsinki, and at least half the
     seats were OK.
• But this is not the ideal solution.
   – With a different shape nearly all the seats could have been OK –
     and it might have been less expensive.
                  Great Small Halls Exist!
                                                Jordan Hall at New England
                                                Conservatory has 1200 seats, an
                                                RT of 1.3s fully occupied. The shape
                                                is half-octagonal, with a high ceiling.
                                                The audience surrounds the stage,
                                                with a single high balcony. The
                                                average seating distance is much
                                                shorter than a shoebox hall,
                                                increasing the direct sound.
                                                The high internal volume allows a
                                                longer RT with low reverberant level.

The sound in nearly every seat is clear and direct, with a marvelous surrounding
Although the hall is renowned as a chamber music hall, it is also ideal for small
orchestras and choral performances. It was built around 1905.
The hall is in constant use – with concerts nearly every night, (and many afternoons.)
              Williams Hall, NEC
• Williams Hall, in the same building, has ~350 seats in a square plan
  with a high ceiling.
• The sound of a piano is clear and reverberant in most, if
  not all, seats.

                                       (The audience usually sits where the
                                       orchestra is rehearsing in this picture.)
                                       The square plan keeps the average
                                       seating distance low.
                                       The high ceiling and high single balcony
                                       provide a long RT without a high
                                       reverberant level.
                                       The absorbent stage eliminates strong
                                       reflections from the back wall. By
                                       absorbing at least half the backward
                                       energy from the musicians, the stage
                                       increases the d/r.
                                       Note the coffered ceiling – similar to
                          Hard learned lessons
•   Where clarity is a problem in small halls, acousticians usually recommend adding
    early reflections – through a stage shell, side reflectors, etc.
     –   We tried this in the small hall experiments mentioned above. The sound became louder and
         less clear. Just the opposite of what was needed.
•   These measures reduce the gap between the direct sound and the reflected energy
    and decrease LOC.
     –   They increase loudness – which is usually already too high, while increasing the sense of
         distance to the performers.
     –   A better way is to add absorption, or perhaps means of deflecting the earliest reflections to
         the ceiling, or into the front of the audience where they can be absorbed.
           •   Re-direction tricks of this nature do not work well in small halls, as the second and third order
               reflections they create will arrive within the 100ms window that determines LOC.
     –   Small halls have strong direct sound and too many early reflections. The reflections also
         come too quickly. Adding more reflections is exactly the wrong thing to do.
     –   Adding absorption will improve clarity but reduce the late reverberant level and the RT.
         Electronics, or more cubic volume, can restore the longer RT without decreasing the D/R.
•   In practice, not everyone is aware of, or appreciates, engagement. It is mostly a
    subconscious perception. Reverberation or resonance is immediately apparent to
    everyone – which is why it has become so overemphasized in hall design.
     –   Adding absorption may not be appreciated by everyone unless the decrease in late
         reverberation can be compensated.
     –   Such compensation can be surprisingly easy. Adding a few tenths of a second to the
         reverberation time of a small hall can be accomplished electronically with very few
         loudspeakers. The result is completely transparent.
In the best halls the reverberant level is lower than
    would be expected from classical acoustics
• D/R is frequency dependent in halls, and frequencies above 700Hz
  are particularly important for engagement.
    – Surface features can be used to decrease the reflected energy level in
      the rear of the hall at higher frequencies.

• In addition, the distribution of absorption in a hall significantly alters
  the distribution of the reflected energy.
    – In a good hall absorption is highly non-uniform. A high ceiling with a lot
      of reflecting surfaces above the audience can increase RT without
      increasing the reflected energy level near the audience. The
      reverberation created tends to stay up near the ceiling.
    – This helps to keep the D/R above ~700Hz constant over a large number
      of seats.
    – Current modeling techniques may not properly calculate these effects.
        • Old fashioned light models might work better…
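The frequency dependence of D/R can be checked by band-limiting the impulse response before taking the ratio. A minimal sketch, assuming a crude zero-phase FFT brick-wall filter in place of a proper filter bank, the 700Hz lower edge from the text, a hypothetical 4000Hz upper edge, and a 5 ms direct-sound window (also an assumption). The toy response has its reflected energy concentrated at 200Hz, so the band-limited D/R comes out far higher than the broadband value.

```python
import numpy as np

def band_limit(ir, fs, lo_hz, hi_hz):
    """Zero-phase brick-wall band-pass via the FFT (a crude stand-in for a filter bank)."""
    spectrum = np.fft.rfft(ir)
    freqs = np.fft.rfftfreq(len(ir), d=1.0 / fs)
    spectrum[(freqs < lo_hz) | (freqs > hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(ir))

def dr_db(ir, fs, direct_ms=5.0):
    """D/R in dB: energy within direct_ms of the onset versus all later energy."""
    ir = np.asarray(ir, dtype=float)
    mag = np.abs(ir)
    onset = int(np.argmax(mag >= 0.1 * mag.max()))  # first sample within 20 dB of the peak
    split = onset + int(round(direct_ms * 1e-3 * fs))
    energy = ir ** 2
    return 10.0 * np.log10(energy[onset:split].sum() / energy[split:].sum())

def band_dr_db(ir, fs, lo_hz=700.0, hi_hz=4000.0, direct_ms=5.0):
    """D/R restricted to one frequency band."""
    return dr_db(band_limit(np.asarray(ir, dtype=float), fs, lo_hz, hi_hz), fs, direct_ms)

# Toy response: a direct impulse at 50 ms, then reflected energy concentrated at 200 Hz.
fs = 16000
ir = np.zeros(fs)
ir[800] = 1.0
t = np.arange(3200) / fs
ir[1600:4800] = 0.1 * np.sin(2 * np.pi * 200 * t)
print(f"broadband D/R {dr_db(ir, fs):.1f} dB, 700-4000 Hz D/R {band_dr_db(ir, fs):.1f} dB")
```

A brick-wall filter rings in time, so a real analysis would use proper band filters; the sketch only illustrates why a D/R quoted without a frequency range can hide the value that matters above 700Hz.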
Light models
        I ran across these pictures while
        cleaning out my office. The top
        one is a too-simple model of the
        Philadelphia Academy of Music.
        The bottom is intended to be
        BSH, but with a single balcony.
        I abandoned light modeling
        because it does NOT provide any
        information about the time delay
        gap – nor information about the
        effects of note length on D/R.
        But it DOES provide information
        about the total reverberant energy
        compared to the direct. And very
        complex hall shapes can be
        quickly modeled.
         Hall Shapes as a function of size

• A large hall like Boston has many seats above threshold, and many that are near threshold.
• If this hall is reduced in size while preserving the shape, many seats are below threshold.
• It is better to use a design that reduces the average seating distance, using a high
  ceiling to increase volume.

 Boston is blessed with two 1200 seat halls with the third shape, Jordan Hall at
 New England Conservatory, and Sanders Theater at Harvard. The sound for
 chamber music and small orchestras is fantastic. RT ~ 1.4 to 1.5 seconds.
 Clarity is very high – you can hear every note – and envelopment is good.
Retro reflectors above 1000Hz
                  Boston, Amsterdam, and
                  Vienna all have side-wall and
                  ceiling elements that reflect
                  frequencies above 1000Hz
                  back to the stage and to the
                  audience close to the stage.
                  This sound is absorbed –
                  reducing the reverberant level
                  in the rear of the hall without
                  changing the RT.
                  Another classic example is the
                  orchestra shell at the
                  Tanglewood Music Festival
                  Shed, designed by Russell
                  Johnson and Leo Beranek.
                  Many modern halls lack these
                  useful features!!!
High frequency retro reflectors
             Rectangular wall features scatter in three
             dimensions – visualize these with the
             underside of the first and second
             High frequencies are reflected back to the
             stage and to the audience in the front of
             the hall.
             The direct sound is strong there. These
             reflections are not easily audible, but they
             contribute to orchestral blend.
             But this energy is absorbed, and thus
             REMOVED from the late reverberation –
             which improves clarity for seats in the back
             of the hall.
             Examples: Amsterdam, Boston, Vienna
High frequency overhead filters
                                  A canopy made of surfaces separated by
                                  some distance becomes a high frequency filter.
                                  Low frequencies pass through, exciting the full
                                  volume of the hall.
                                  High frequencies are reflected down into the
                                  audience, where they are absorbed.
                                  Examples: Tanglewood Music Shed, Davies
                                  Hall San Francisco
                                  In my experience (and Beranek’s) these
                                  panels improve Tanglewood enormously.
                                  They reduce the HF reverberant level in the
                                  back of the hall, improving clarity. The sound
                                  is amazingly good, in spite of RT ~ 3s.
In Davies Hall the panels make the sound in the dress circle and balcony
both clear and reverberant at the same time. Very fine…
(But the sound in the stalls can be loud and harsh.)
 The necessity of occupied measurements
• The effects of frequency dependent reflecting elements depend on
  the presence of absorption on the stage and in the front of the hall.
• Measuring the halls without absorption in these areas will not detect
  these vital effects.
• In addition, engagement is highly dependent on the D/R ratio – and
  this is also not correctly measured in an unoccupied hall.

• Thus measurement of localization and engagement requires that
  both hall and stage be occupied!
    – Impulse response measurements under these conditions are difficult to obtain.

• Measures for engagement or localization which use binaural
  recordings of live music or speech would be highly desirable.
    – I believe this is possible. If you can hear an effect, you can measure it.
      We only need to figure out how to do it.
                 Binaural Measures
                                              The author has been recording
                                              performances binaurally for years.
                                              Current technology uses probe
                                              microphones at the eardrums.
                                              We can use these recordings to
                                              make objective measurements of
                                              halls and operas.

The methods use a hearing model where the binaural signal is first filtered into
1/3 octave bands, and then is rectified and filtered.
For measures of localization a running IACC is calculated in 10ms overlapping
windows. The maximum values of 1/(1-IACC) are then plotted as a surface
over time and frequency band.
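A minimal sketch of a running IACC of this kind. Assumptions not stated in the talk: the input is one 1/3-octave band per ear (the filter bank is not shown), rectification is half-wave, smoothing is a one-pole low-pass at 1 kHz, windows are 10 ms with a 5 ms hop, lags span ±1 ms, and each window is mean-removed before correlation. IACC is capped at 0.999 so that 1/(1-IACC) stays finite. The function names are illustrative.

```python
import numpy as np

def envelope(x, fs, cutoff_hz=1000.0):
    """Half-wave rectify, then smooth with a one-pole low-pass (assumed cutoff)."""
    x = np.maximum(np.asarray(x, dtype=float), 0.0)
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = a * acc + (1.0 - a) * v
        y[i] = acc
    return y

def running_iacc(left, right, fs, win_ms=10.0, hop_ms=5.0, max_lag_ms=1.0):
    """Running interaural cross-correlation of two ear signals.

    Returns window-centre times (s), the IACC of each window, and the lag (s) at
    which it occurs; a positive lag means the right ear lags, i.e. the source is
    toward the left.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    win = int(round(win_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    times, values, lags = [], [], []
    for start in range(max_lag, len(left) - win - max_lag + 1, hop):
        lw = left[start:start + win]
        lw = lw - lw.mean()
        best, best_lag = -1.0, 0
        for lag in range(-max_lag, max_lag + 1):
            rw = right[start + lag:start + lag + win]
            rw = rw - rw.mean()
            denom = np.sqrt(np.dot(lw, lw) * np.dot(rw, rw))
            if denom > 0.0:
                c = np.dot(lw, rw) / denom
                if c > best:
                    best, best_lag = c, lag
        times.append((start + win / 2) / fs)
        values.append(best)
        lags.append(best_lag / fs)
    return np.array(times), np.array(values), np.array(lags)

def localization_strength(iacc, cap=0.999):
    """The plotted quantity 1/(1 - IACC), with IACC capped so it stays finite."""
    return 1.0 / (1.0 - np.minimum(iacc, cap))

# Usage with synthetic signals: the right ear hears the left-ear signal 10 samples later.
fs = 48000
rng = np.random.default_rng(0)
left = envelope(rng.standard_normal(4800), fs)
right = np.concatenate([np.zeros(10), left[:-10]])
times, iacc, lags = running_iacc(left, right, fs)
print(f"{len(times)} windows, lag {lags[0] * 1e3:.3f} ms, "
      f"1/(1-IACC) = {localization_strength(iacc)[0]:.0f}")
```

Plotting `localization_strength(iacc)` against time for each band, as in the figures that follow, gives the kind of surface described above.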
         The figure shows the number of
         times per second that a solo violin
         can be localized from row 4 of a
         small shoebox hall (~500 seats)
         near Helsinki.
         It also shows the perceived
         azimuth of the violin.
         As can be seen, the localization –
         achieved at the onsets of notes –
         is quite good, and the azimuth,
         ~10 degrees to the left of center,
         is accurate.
Localization – Surface 1
                Here we plot the same
                data for the violin as a
                function of (inverse)
                azimuth, and the third
                octave frequency band.
                As can be seen, for this
                instrument the principal
                localization components
                come at about 1300Hz.
                Interestingly, human ability
                to detect azimuth, as
                shown in the threshold
                data, may be maximum at
                this frequency.
Localization, Surface 2
                 Here we plot 1/(1-IACC) as
                 a function of time and third
                 octave band.
                 Note that the IACC peaks at
                 the onset of notes can have
                 quite high values for a brief
                 This happens when there is
                 sufficient delay between the
                 direct and the reverberation,
                 and sufficient D/R.
Localization – a poor seat
                 Here is a similar diagram for
                 a solo violin in row 11 of the
                 same hall. The sound here
                 is unclear, and the
                 localization of the violin is
                 As can be seen, the number
                 of localizations per second is
                 low (in this case the value
                 really depends on the setting
                 of the threshold in the
                 Perhaps more tellingly, the
                 azimuth detected seems
                 This is really just noise, and
                 is perceived as such.
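Counting "localizations per second" needs a threshold on 1/(1-IACC), which is why the count depends on where the threshold is set. A minimal sketch, assuming each rising crossing of the threshold counts as one localization event; the data and threshold values are hypothetical.

```python
import numpy as np

def localization_events(times_s, strength, threshold):
    """Indices where `strength` first rises to `threshold`, and the event rate per second."""
    times_s = np.asarray(times_s, dtype=float)
    above = np.asarray(strength, dtype=float) >= threshold
    # A rising edge: this window is above threshold and the previous one is not.
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    if above[0]:
        onsets = np.concatenate(([0], onsets))
    duration = times_s[-1] - times_s[0]
    rate = onsets.size / duration if duration > 0 else float("nan")
    return onsets, rate

# Toy sequence of 1/(1-IACC) values, one per 100 ms window (hypothetical data).
times = np.arange(11) * 0.1
strength = np.array([1, 5, 5, 1, 1, 6, 1, 1, 7, 7, 1], dtype=float)
for threshold in (4.0, 6.0):
    onsets, rate = localization_events(times, strength, threshold)
    print(threshold, onsets, rate)  # 3 events/s at 4.0, 2 events/s at 6.0
```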
    Measures based on harmonic coherence
•   In the absence of reflections the formant frequencies above 1000Hz are
    amplitude modulated by the phase coherence of the upper harmonics. This
    modulation is easily heard, creating the perception of “roughness” (Zwicker).
     – Reflections randomize the phase of these harmonics.
•   The result is highly audible, and is a primary cue for the distance of an
    actor, singer, or soloist.
•   This effect can be measured with live recordings, and is sensitive both to
    medial and lateral reflections.
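The modulation cue can be illustrated with synthetic tones. A minimal sketch, assuming an equal-amplitude harmonic complex with f0 = 100 Hz, the 1/3-octave band around 2 kHz (about 1782 to 2245 Hz), and an FFT-based analytic-signal envelope in place of the rectify-and-filter model described above. Randomized phases stand in for the effect of reflections; this is a simplification, not a room simulation. With coherent phases the in-band harmonics add up to one sharp peak per period, so the envelope is strongly modulated at f0, while a single in-band harmonic gives a flat envelope.

```python
import numpy as np

def harmonic_complex(f0, fs, dur_s, n_harmonics, phases=None):
    """Equal-amplitude harmonic tone; zero phases make all harmonics coherent."""
    t = np.arange(int(round(fs * dur_s))) / fs
    if phases is None:
        phases = np.zeros(n_harmonics)
    return sum(np.cos(2.0 * np.pi * k * f0 * t + phases[k - 1])
               for k in range(1, n_harmonics + 1))

def modulation_depth(x, fs, f0, lo_hz, hi_hz):
    """Amplitude of the band envelope's f0 component, relative to its mean.

    The envelope is the magnitude of the analytic signal restricted to
    [lo_hz, hi_hz], computed with the FFT (a stand-in for rectify-and-filter).
    """
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    keep = (freqs >= lo_hz) & (freqs <= hi_hz)  # positive frequencies in the band only
    env = np.abs(np.fft.ifft(np.where(keep, 2.0 * spectrum, 0.0)))
    env_spec = np.fft.rfft(env)
    k = int(round(f0 * len(env) / fs))  # FFT bin of f0
    return 2.0 * np.abs(env_spec[k]) / np.abs(env_spec[0])

fs, f0 = 16000, 100.0
lo, hi = 1782.0, 2245.0  # 1/3 octave around 2 kHz
coherent = harmonic_complex(f0, fs, 1.0, 30)
rng = np.random.default_rng(1)
scrambled = harmonic_complex(f0, fs, 1.0, 30, rng.uniform(0.0, 2.0 * np.pi, 30))
print(f"coherent: {modulation_depth(coherent, fs, f0, lo, hi):.2f}, "
      f"random phases: {modulation_depth(scrambled, fs, f0, lo, hi):.2f}")
```

Randomizing the phases tends to flatten the envelope and weaken the modulation at f0, though the exact value depends on the random draw.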

                                         This graph shows the frequency and
                                         amplitude of the amplitude modulation of a
                                         voice fundamental in the 2kHz 1/3 octave
                                         band. The vertical axis shows the effective
                                         D/R ratio at the beginning of two notes from
                                         an opera singer in Oslo to the front of the
                                         third balcony (fully occupied.) The sound
                                         there is often muddy, but the fundamental
                                         pitch of this singer came through strongly
                                         at the beginning of these two notes. He
                                         seemed to be speaking directly to me, and
                                         I liked it.
Another singer
       From the same seat the king (in Verdi’s
       Don Carlos) was not able to reach the
       third balcony with the same strength.
       Like the localization graph shown in a
       previous slide, this graph seems to be
       mostly noise.
       The fundamental pitches are not well
       defined. The singer seemed muddy
       and far away.
       His aria can be heart-rending – but
       here it was somewhat muted by the
       acoustics. We were watching the king
       feel powerless and forlorn. But we
       were not engaged.
              Some demos of eardrum recordings
•   These recordings have been equalized for loudspeaker reproduction. You
    may be able to judge clarity and intelligibility over near-field loudspeakers.
     – Accurate headphone reproduction requires headphone equalization
     – If probes are available, the method described here will work.
     – A method which uses equal loudness curves will be described later in this paper.

•   opera balcony 2, seat 11
     – Moderate intelligibility, reverberant sound.
     – OK for non-Italian speakers with subtitles
•   opera balcony 3, seat 12
     – Poor intelligibility, very reverberant
•   opera standing room
     – Deep under balcony 2 – good intelligibility
     – This was preferred by Italian speakers
•   A concert hall – row 8 (quite close)
     – Very good sound. Not so good further back.
•   Performance venues should maximize engagement over a wide range of seats, not
    search for ideal values of RT or C80. To achieve this goal the direct sound must be
    perceived by the brain as distinct from the reflected energy – and this includes early
    reflections from all directions.
     –   Engagement is largely a monaural perception – a spatial property that is sensitive to medial
         reflections. It can be heard with only one ear (and measured with one microphone).
     –   But it is essential to measure with both hall and stage OCCUPIED!
•   The perception of reverberance and envelopment also depends on the audible presence
    of direct sound.
•   The audibility of direct sound depends on the D/R ratio above 700Hz, and the time delay
    of reflections in the first 100ms.
     –   Hall sound can often be improved by frequency dependent reflecting elements.
•   The optimum value for the D/R ratio depends on the hall size –
     –   The D/R ratio must increase as hall size is reduced if clarity, localization, and the sense of
         envelopment are to be maintained.
     –   D/R and engagement can be increased by decreasing the average seating distance, decreasing
         the reverberation time, increasing the hall volume, or by careful use of rectangular diffusing
         elements.
     –   This is particularly true in opera houses and halls designed for chamber music.
     –   A 1.8 second reverberation time is NOT necessarily ideal in a 1000 seat hall!!! Remember that
         changes in reverberant LEVEL (D/R) and initial time delay are more audible than changes in RT.
•   To maintain clarity, low sonic distance, azimuth detection and envelopment in a small
    hall (and many large halls) it is desirable to reduce the average seating distance, and
    widely diffuse or absorb the earliest reflections, whether lateral or not.
     –   The best small halls do this already.
•   Most current hall measurements ignore both the D/R and the time delay between direct
    sound and reverberation.
     –   LOC is an attempt to rectify this lack. Measures that use musical signals as input are being
         developed. They need to be used if the success rate of current hall design is to be improved.
