The importance of the direct to reverberant ratio in the perception of distance, localization, clarity, and envelopment
- or - Measuring Auditory Engagement
- or - Near/Far
David Griesinger
Consultant
Cambridge MA USA
www.DavidGriesinger.com

Introduction
• Part one of this talk will consist of:
– 1. A description of the sonic perception of near/far and its relevance to music and drama
– 2. A plea for acoustic designs that utilize this perception to engage the audience by making the music so exciting and accessible that they listen closely – instead of sitting back and letting it wash by.
– 3. A proposal that engagement is encouraged both by low perceived sonic distance, and the ease of localizing the azimuth of musicians in an ensemble.
– 4. A proposal that ease of localization can be used as a proxy for measuring the engagement of an acoustic scene.
– 5. The development and testing of an impulse response based measure for ease of localization.
• The measure is based on how our hearing processes Syllables or Notes and thus involves a double integral.
• The measure integrates the log of sound pressure, not the pressure itself.
• It includes both lateral and medial reflections.
– 6. A proposal that envelopment is also enhanced by the presence of direct sound.
• Part two of this talk will describe the implications of enhancing engagement in both large and small halls.
– Experiments in a particular small hall will be discussed which reveal the usefulness of the new measure.
• The ability to perceive direct sound (sound that travels to the listener without reflection) is the key to localization, perceived distance, engagement, and envelopment.

Warning!
• This talk contains concepts that contradict deeply held convictions.
– I propose that reflections (often early reflections) in the time range of 10 to 100ms often reduce clarity, envelopment, and engagement.
• Whether they are lateral reflections or not!
– These detrimental effects are easy to demonstrate, and I will attempt to do so.
• I am NOT saying all early reflections are bad!
– The ability to detect direct sound in the presence of reverberation is frequency dependent, and frequencies above 700Hz are particularly important.
– The critical issue is the amount of early energy and its time delay. If the energy above 700Hz is below a critical threshold, early energy and late reverberation can enhance the listening experience.
• Often reflectors directed into the audience, which absorbs the first-order reflection, have the effect of reducing the early energy above 700Hz in other areas of the hall – with beneficial results.
– Reflectors placed near certain instruments can reduce disturbing late echoes, or reinforce low frequencies without increasing the energy at 700Hz.
• The major point of the talk is clear: The ability to perceive direct sound in a large majority of seats is a vital component of a great hall.
– And this perception requires close attention to the amount of reflected energy in the first 100ms after the direct sound.

Near/Far
• The apparent closeness of a sound source is a fundamental perception for all of us.
– We can tell instantly if a person talking is within a few feet of us, or further away – and this perception has survival value.
– The perception of “Near” depends critically on our ability to perceive the direct sound – the sound that travels to the listener without reflecting.
– Surprisingly, in a theater or hall it is possible to perceive the performers as both acoustically close to the listener and enveloped by the hall.
– The best halls (Boston Symphony Hall, Concertgebouw, the front half of the Musikverein) provide both, but many, perhaps most, provide only reverberation.
• Harmonic coherence of speech and music is a principal cue for perceiving near and far.
– The audio examples in the click box above show the decrease in apparent distance caused by increasing amounts of harmonic coherence.
– Note that all of the examples have high intelligibility – but their emotional effect is quite different.
• This perception correlates with musical clarity and the ability to localize sound sources.

[Figure: Neural model analysis – direct sound]
[Figure: Neural analysis of “ten” with 88ms reflections]
[Figure: Neural analysis of “ten” with 133ms reflections]
[Figure: Neural model analysis – direct sound]
[Figure: Add reverb at 2s RT, -10dB D/R]
[Figure: Add reverb at 1s RT, -10dB D/R]

A slide from Asbjørn Krokstad - IoA, NAS Oslo 2008 [With permission]
To succeed: [in bringing new audience into concert halls…]
ENGAGING
“Interesting”
“Nice”
[We need to make the sonic impression of a concert engage the audience – not just the visual and social perceptions. Especially since audiences are increasingly accustomed to recordings!]

ENGAGEMENT, not NICE
• At the IOA conference in Oslo, Asbjørn Krokstad (a musician, conductor, and Norway’s best-known acoustician) gave a lecture where he insisted that acousticians needed to provide engagement, not just pleasant music.
– And not just for drama and opera, but for chamber music and symphony too.
– At the end of the lecture he showed a picture of the Teatro Colón in Buenos Aires, Argentina. “Is this the concert hall of the future?” he asked.
• This hall is not a shoebox, but a large semicircular theater with a high ceiling. It ranks at the top in Beranek’s surveys, and the reverberation time is 1.6 seconds occupied.
• Krokstad may have conducted there.
• Engagement requires the independent perception of the direct sound
• We must learn how to provide this essential element in halls.
• I have been fortunate to hear several of the live broadcasts of the Metropolitan Opera in a good theater. For example, the performance of Salome:
– The sound was harsh and dry – radio mikes coupled to directional loudspeakers.
But you could hear every syllable of Mattila’s impeccable German. The performance was totally gripping!
– This is the dramatic and sonic experience audiences increasingly demand.

What is “Auditory Engagement”
• “Engagement” is the perception that you are not just watching a scene from a distance, but present in the middle of it.
– Thus lack of distance is a critical component of presence.
• Auditory engagement is the perception that you are acoustically close to the sound sources.
– Distance is perceived directly through harmonic coherence – but experiments to directly measure it with subjects are difficult. However it correlates both with the ability to localize sound sources, and with the perception of presence, or musical clarity.
– To perceive presence you must be able to localize sound sources nearly all the time,
– and be able to distinguish them from one another nearly all the time.
• Clear localization and the ability to hear most of the notes are key components of audience engagement.
– Although particularly important in drama and opera, it should be (and often is not) a part of the emotional experience of music.
– Being able to hear all the notes and localize the players draws the audience into the performance. They don’t just watch it.
• This view of clarity is different from the one that equates clarity with intelligibility. Perhaps we need a new word for it.

Barron on Localization
“Raising the Roof” NATURE Vol 453, 12 June 2008
• “Much remains to be discovered about how our ears and brains process sound reflections. Understanding this has been complicated, for instance, by our remarkable ability to work out where a sound is coming from. This ability, called localization, works even when the sound arriving directly from the source represents only a small proportion of the total sound we receive, perhaps only 5% at the back of a concert hall.”
– [Without a visual reference precise localization is frequently not possible at this level of direct sound.
With a visual reference we perceive what we do not hear.]
• “Usually we are listening to speech or music, which have short elements such as syllables or notes that vary with time. Our brains use this time-varying information to extract where the initial sound comes from”.
– [but to do this we MUST be able to detect and process the direct sound!]
• “The downside of this localization is that, in effect, our hearing suppresses awareness of sound reflections. We notice early sound reflections but are often not conscious of their effects - such as making sound seem clearer than it would be otherwise.” [italics added]
– [or less clear, as I believe is often the case. Barron is equating “clear” with “intelligibility” – but that is different from engagement. “I would rather the audience not hear the words than have the actors sound far away” – said a well known drama director in Copenhagen.]

Experiment for threshold of Azimuth Detection in halls
A model is constructed with a source position on the left, and another source on the right.
The source signal alternates between the left and the right position.
When the d/r is less than about minus 13dB both sources are perceived in the middle.
The subject varies the d/r, and reports the value of d/r that separates the two sources by half the actual angle.
This is the threshold value for azimuth detection for this model.
(Above this threshold the subject also reports a decrease in subjective distance)

Threshold for azimuth detection as a function of frequency and initial delay
[Left figure] As the time gap between the direct sound and the reverberation increases, the threshold for azimuth detection goes down. (the d/r scale on this old slide is arbitrary)
[Right figure] As the time gap between notes increases (allowing reverberation to decay) the threshold goes down. To duplicate the actual perception in small halls I need a 50ms gap between notes.

An important caveat!
• All these thresholds were measured without visual cues
• The author has found that in a concert (with occasional visual input) instruments (such as a string quartet) are perceived as clearly localized and spread.
• When I record the sound with probes at my own eardrums, and play it back through calibrated earphones the sound seems highly accurate, but localization often disappears!
– Without visual cues when the d/r is below threshold the individual instruments are localized and spread when they play solo, but collapse to the center when they play together.
– My brain will not allow me to detect this collapse when I am in the concert hall – even if I close my eyes most of the time!
– With eyes closed it is more difficult to separate the sounds of the individuals, such as the second violin and the viola. This difficulty persists in the binaural recording.

Localization
• For this paper we assume sound sources are localized by the direct sound.
– In some cases localization is aided by early reflections – but these vary strongly from seat to seat, and are too complex to consider here.
• For localization to be successful the direct sound must be perceived.
– Prompt strong reflections can – and do – mask the direct sound.
• Let’s propose that the brain detects the loudness of – and the presence of – sounds by integrating nerve firings over a period of time.
– If the integrated nerve firings from the direct sound exceed the integrated nerve firings from the reflections inside this time window, the direct sound will be perceived – and localized.
• We can calculate the threshold of perception by double integrating the impulse response over a fixed time window.

The ear perceives notes – not the impulse response itself.
• Here is a graph of the ipsilateral binaural impulse response from spatially diffuse exponentially decaying white noise with an onset time of 5ms and an RT of 1 second. This is NOT a note, and NOT what the ear hears!
[Figure annotations] D/R = -10dB
RT = 2s: C80 = 3.5dB, C50 = 2.2dB, IACC80 = .24
RT = 1s: C80 = 6.4dB, C50 = 4.1dB, IACC80 = .20
• To visualize what the ear hears, we must convolve this with a sound.
– Let’s use a 200ms note of constant level as an example.
• The nerve firings from the direct component of this note have a constant rate for the duration of the sound.
• The nerve firings from the reverberant component steadily build up until the note ceases and then slowly stop as the sound decays.

Direct and reverberation for d/r = -10dB, and RT = 1s
The blue line shows the nerve firing rate for a constant direct sound 10dB less than the total reverberation energy.
The red line shows the rate of nerve firings for the reverberation, which builds up for the duration of the note.
The black line shows a time window (100ms) over which to integrate the two rates. In this example the area in light blue is larger than the area in pink, so the direct sound is inaudible.

Direct and build-up RT = 2s
If we hold the d/r constant, when the reverberation time is two seconds it takes longer for the reverberation to build up, so the light blue area decreases, while the pink area stays constant. This makes the direct sound more audible.
In a large hall the time delay between the direct sound and the reverberation also increases, further reducing the area in light blue. The direct sound would be even more audible.

Equation for Localizability – 700 to 4000Hz
• We can use this simple model to derive an equation that expresses the ease of perceiving the direction of direct sound as a decibel value. p(t) is the sound pressure of the ipsilateral channel of a binaural impulse response. With the previous simple assumptions, we propose the threshold for detection would be 0dB, and clear localization would occur at a localizability value of +3dB.
• Where D is the window width (~ 0.1s), and S is a scale factor: S is the zero nerve firing line in the previous two slides.
It is 20dB below the maximum loudness:

$$S = 20 - 10\log_{10}\int_{.005}^{\infty} p(t)^2\,dt$$

POS means ignore the negative values for the sum of S and the cumulative log pressure.
• Localizability (LOC) in dB:

$$\mathrm{LOC} = S - 1.5 + 10\log_{10}\int_{0}^{.005} p(t)^2\,dt \;-\; \frac{1}{D}\int_{0}^{D-.005} \mathrm{POS}\!\left(S + 10\log_{10}\int_{.005}^{.005+\tau} p(t)^2\,dt\right) d\tau$$

• The scale factor S and the window width D interact to set the slope of the threshold as a function of added time delay. The values I have chosen (100ms and -20dB) fit my personal data. The extra offset of -1.5dB is added to match my personal thresholds.

Some explanation of the equation
• The equation as written in the previous slide simply calculates the ratios of the pink and blue areas shown in the previous pictures.
• The first integral on the left in LOC is the “pink” area – the sum of the nerve firings for the direct sound. This area is the product of the normalized sound pressure times the length of the window D.
– However here we have divided through by D – so this factor is not shown.
• The next two integrals represent the total nerve firings for the reverberation – the “blue” area.
– Since we have divided by D, a factor of 1/D is included at the beginning.
• The second of the two integrals is the physical sum of the sound pressure that would exist if the impulse response were convolved with a steady excitation. The first integral finds the area under this curve. In the second integral we have excluded the direct sound – assuming this will be in the first 5 milliseconds.
• The limits of the integrals have been adjusted to account for this exclusion. Thus the second integral goes from .005 seconds to the end, and the first integral is from zero to the window width minus .005.
• I have included the -1.5dB adjustment for my personal thresholds.
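The ratio of the pink and blue areas can be checked numerically. The following Python sketch is not the author's code (the talk's own listing is in Matlab). It assumes an idealized room in which the reverberant energy excited by a steady note builds up as 1 - exp(-13.8 t / RT) once the reverberation starts, with the steady-state reverberant level taken as the maximum (0dB) and the direct level set by the D/R. The function name `loc_model` and all of its parameters are hypothetical.

```python
import math

def loc_model(dr_db, rt, gap=0.005, window=0.1, range_db=20.0,
              offset_db=-1.5, dt=1e-4):
    """Idealized LOC in dB for a steady note in exponential reverberation.

    Assumptions (not from the slides): reverberant energy builds up as
    1 - exp(-13.8155 * t / rt) once the reverberation starts `gap`
    seconds after the direct sound, and its steady-state level is 0 dB.
    """
    # "Pink" area divided by the window: constant direct firing level.
    direct_level = max(0.0, range_db + dr_db)
    # "Blue" area: firing level of the reverberant build-up, summed over
    # the window with a midpoint rule.
    area = 0.0
    for i in range(int(round(window / dt))):
        t = (i + 0.5) * dt - gap          # time since reverberation onset
        if t <= 0.0:
            continue                      # no reflections have arrived yet
        energy = 1.0 - math.exp(-13.8155 * t / rt)
        # POS(): levels below the zero-firing line count as zero.
        area += max(0.0, range_db + 10.0 * math.log10(energy)) * dt
    return direct_level + offset_db - area / window

# Same D/R, two reverberation times.
for rt in (1.0, 2.0):
    print(rt, round(loc_model(-10.0, rt), 1))
```

Under these assumptions, a longer RT or a longer gap before the reverberation gives a higher value at the same D/R, because the blue area shrinks while the pink area stays fixed.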
Matlab code for LOC
% load in a .wav file containing a binaural impulse response – filter it and truncate the beginning
% (data1, data2 and the sample rate sr are assumed to come from the loaded .wav file)
ir_left = data1; % the binaural IR
ir_right = data2;
clear data1 data2
% filter the IRs
wb = [2*1000/sr 2*4000/sr];
[b a] = ellip(3,2,30,wb);
ir_left = filter(b,a,ir_left);
ir_right = filter(b,a,ir_right);
% truncate the beginning: find the first sample above threshold in either channel
for il = 1:0.1*sr
    if abs(ir_left(il)) > 500
        break
    end
    if abs(ir_right(il)) > 500
        break
    end
end
ir_left(1:il) = [];
ir_right(1:il) = [];

% ir_left is an ipsilateral binaural impulse response,
% truncated to start at zero and filtered to 1000-4000Hz.
upper_scale = 20; % 20dB range for firings
% proposed box length
box_length = round(100*sr/1000); % try 100ms
% early_time is 5ms in samples, D is 100ms in samples.
early_time = round(5*sr/1000);
% here starts the equation on the slide:
D = box_length; % the window width
S = 20-10*log10(sum(ir_left.^2));
early = 10*log10(sum(ir_left(1:early_time).^2));
% first integral is a cumsum representing the build up in
% energy when the IR is excited by a steady tone:
ln = length(ir_left);
log_rvb = 10*log10(cumsum(ir_left(early_time:ln).^2));
% look at positive values of S+log_rvb only
for ix = 1:ln-early_time
    if S+log_rvb(ix) < 0
        log_rvb(ix) = -S;
    end
end
LOC = S-1.5+early -(1/D)*sum(S+log_rvb(1:D-early_time))

Use of the localization equation
• Just as with RT or C80, LOC uses a measured impulse response as an input, with the direct sound starting at time zero. This is the only data a user needs to supply.
– The measure is calibrated for a front facing binaural impulse response.
• An omnidirectional impulse response will give lower values of LOC for the same seat position, due to the lack of head shadowing.
• The localization equation appears more complex than most current measures for room acoustics, but it has a simple, physiologically based interpretation.
– It is the ratio in dB of the number of nerve firings received by the brain from the direct sound in a 100ms window, divided by the number of nerve firings received from all reflections in the same time period.
– It contains three experimentally based parameters: the window width D, the dynamic range of the nerve channels S, and the time window for separating direct sound from reflections (5ms). These parameters are not intended to be adjustable without further experimental work.
– Matlab code for calculating LOC is simple, and available from the author.

Interpretation of LOC
• LOC was developed and verified as a method for predicting when a sound will be accurately localized when the direct sound is much lower in total energy than the sum of all reflections.
• Like C80, IACC80, and similar measures, LOC is based on a time window that begins with the onset of the direct sound.
– In practice, syllables or notes that will be affected by any of these measures will depend on the rise time (onset time) of the sound.
– If the sound starts gradually the precise moment of onset becomes indeterminate, and separating direct sound from reflections becomes impossible.
– Thus LOC – and other such measures – are accurately predictive only for signals with sharp onsets.
– Additionally, if the direct sound from a note or syllable is masked by reverberation from a previous sound, the direct sound will not be audible.
• LOC predicts the audibility of the direct sound for a syllable or note with a rapid rise-time when there is sufficient freedom from masking from previous sounds.
– Although musical signals often do not meet these criteria, in practice there are enough occasions that do meet the criteria that the LOC equation is useful.
• Remember that for the purposes of this talk Localization is only a proxy for the main goal – predicting when the direct sound is sufficiently audible to produce engagement.
– Preliminary results suggest LOC achieves this goal.
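For readers without Matlab, the calculation can be sketched in plain Python. This is an illustrative port under stated assumptions, not the author's implementation. It takes one already band-limited channel that starts at the direct sound, so the bandpass filtering and onset truncation are omitted. It also starts the reverberant sum one sample after the 5ms direct window, rather than sharing that boundary sample. The names `loc_db` and `toy_ir` are hypothetical, and `toy_ir` is only a synthetic stand-in for a measured response (its `gap_s` should be at least 5ms).

```python
import math

def loc_db(ir, sr, window_s=0.1, early_s=0.005, range_db=20.0, offset_db=-1.5):
    """LOC in dB for one impulse-response channel starting at the direct sound."""
    early_n = int(round(early_s * sr))     # 5 ms direct-sound window, in samples
    window_n = int(round(window_s * sr))   # D, the 100 ms window, in samples
    total = sum(x * x for x in ir)
    direct = sum(x * x for x in ir[:early_n])
    s = range_db - 10.0 * math.log10(total)   # zero-firing line S
    early = 10.0 * math.log10(direct)         # direct-sound level
    cum = 0.0
    area = 0.0
    for x in ir[early_n:window_n]:            # D - early_n samples
        cum += x * x                          # build-up under steady excitation
        if cum > 0.0:
            area += max(0.0, s + 10.0 * math.log10(cum))  # POS(S + log energy)
    return s + offset_db + early - area / window_n

def toy_ir(sr, dr_db, rt, gap_s, length_s=1.0):
    """Synthetic IR: unit direct impulse plus a decaying tail after gap_s."""
    n = int(length_s * sr)
    gap_n = int(round(gap_s * sr))
    tail = [0.0] * n
    for i in range(gap_n, n):
        # amplitude falls 60 dB in rt seconds; sign alternates to avoid DC
        tail[i] = (-1.0) ** i * math.exp(-6.9078 * (i - gap_n) / (rt * sr))
    gain = math.sqrt(10.0 ** (-dr_db / 10.0) / sum(x * x for x in tail))
    ir = [gain * x for x in tail]
    ir[0] = 1.0                               # direct sound with unit energy
    return ir

sr = 8000
print(round(loc_db(toy_ir(sr, -10.0, 1.0, 0.005), sr), 1))
print(round(loc_db(toy_ir(sr, -10.0, 1.0, 0.030), sr), 1))
```

The leading S makes the result independent of the overall gain of the recording. In this toy case, delaying the tail raises the value, as the model predicts.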
Localization Equation Setup
• The Localization Equation was developed and tested using binaural impulse responses generated using the author’s own HRTFs.
– The source position was 15 degrees to the left (and right) of center. Only the ipsilateral channel was analyzed.
– Male speech alternated from left to right with a time gap of 400ms, to allow for complete decay of the reverberation between each word.
– The reverberation was generated using an independent decaying noise signal convolved with each of 54 HRTFs spaced equally around the listening position.
– The HRTFs were equalized so that the azimuth zero elevation zero HRTF was flat from 40Hz to about 4kHz. The elevation notch at 7.8kHz was not equalized away, but was left in place.
– Playback was done through headphones equalized to match a loudspeaker placed in front of the listener – again not equalizing the 7.8kHz notch from the listener’s frontal HRTF of the loudspeaker.
• Because my data show that the perception of both localization and near/far is mostly a high frequency phenomenon, the impulse response was bandpass filtered between 700Hz and 4000Hz before being analyzed for localization.
– If a measured binaural impulse response is used as an input, care should be taken to ensure the dummy head is equalized as described above.
– Because of the importance of upward masking in localization, if the low frequencies in the room signal are significantly stronger than those in the frequency range from 700 to 4000Hz, localization is likely to be poorer than the equation would predict.

Comments on LOC
– LOC is based on the LOG of the build-up of reverberant energy.
• This follows directly from the physiological model.
• Current measures integrate the sound energy rather than the log of sound energy. But our physiology works differently. One of the consequences is that reflections that arrive early have more influence than reflections that arrive later.
– As energy builds up additional reflections are not counted as strongly.
• Reflections later than 100ms are ignored in calculating LOC.
– This is very different from C80 or C50, which count the earliest reflections as part of the direct sound, and compare the energy sum to the energy sum of all the later reverberation.
• In a small hall most of the energy arrives before 80ms regardless of the relative strength of the direct sound, so C80 and C50 are usually high.
• But small halls can have high C80 or C50, poor localization, and a lack of clarity.
– LOC depends strongly on the delay between the direct sound and the build-up of the reverberation.
• Late reverberation does not impair localization of short notes.
• The principal difference between the localizability in small halls and large halls is the rate at which reflected energy builds up after the start of a note.
– LOC is NOT related to EDT – even if Jordan’s original definition of EDT is used.
• EDT is relatively independent of the initial time delay
• When D/R < -10dB, EDT and RT are the same, as there is insufficient direct sound to be detected in a reverse integrated impulse response.
– LOC correlates with IACC80 – but IACC is not sensitive to medial reflections.
• IACC is sensitive to the sum of reflected energy – not the log of energy, and thus is insensitive to when the reflections arrive

Tests with speech
A speech signal was convolved with a pair of binaural impulse responses, such that the sound appears to come from ±15 degrees from the front. Then a fully spatially diffuse reverberation was added, in such a way that the D/R, the RT, and the time delay before the reverberation onset could be varied.

Broadband Speech Data
Blue – experimental thresholds for the alternating speech with a 1 second reverb time.
Red – the threshold predicted by the localization equation.
Black – experimental thresholds for RT = 2 seconds.
Cyan – thresholds predicted by the localization equation.
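The contrast with C80 and C50 drawn in the comments on LOC can be made concrete. The Python sketch below uses the standard early-to-late energy ratio for clarity and applies it to a purely synthetic exponential decay. The helper names `clarity_db` and `exp_tail`, the 8kHz sample rate, and the way the direct sound is added are illustrative assumptions, not measurements from any hall.

```python
import math

def clarity_db(ir, sr, split_s):
    """C50 (split_s=0.05) or C80 (split_s=0.08): early/late energy ratio in dB."""
    k = int(round(split_s * sr))
    early = sum(x * x for x in ir[:k])
    late = sum(x * x for x in ir[k:])
    return 10.0 * math.log10(early / late)

def exp_tail(sr, rt, length_s=3.0):
    """Hypothetical reverberation: amplitude decaying 60 dB in rt seconds."""
    return [math.exp(-6.9078 * i / (rt * sr)) for i in range(int(length_s * sr))]

def add_direct(tail, dr_db):
    """Add direct-sound energy at sample 0 so that direct/reverberant = dr_db."""
    reverb_energy = sum(x * x for x in tail)
    ir = list(tail)
    ir[0] = math.sqrt(ir[0] ** 2 + reverb_energy * 10.0 ** (dr_db / 10.0))
    return ir

sr = 8000
for rt in (1.0, 2.0):
    tail = exp_tail(sr, rt)
    print(rt,
          round(clarity_db(tail, sr, 0.08), 2),                     # no direct sound
          round(clarity_db(add_direct(tail, -10.0), sr, 0.08), 2))  # D/R = -10 dB
```

In this idealized case C80 is set almost entirely by the decay rate. Adding or removing a direct sound at a D/R of -10dB changes it by less than 1dB, which is why a high clarity value says little about whether the direct sound can be detected.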
Threshold Data from Other Subjects – 1s RT
Blue – new data using absence of any localization as a criterion for threshold.
Red – the author’s previous data based on a half-angle criterion.
• Seven subjects participated in a threshold experiment at Kyushu University.
– In these experiments the threshold was defined by the extinction of localization, not by the reduction of angle by a factor of two.
– Consequently the thresholds are lower than they were in my previous experiment, and they have more variation.
• However, the data is consistent to within 3dB

Threshold data in Japan, 2s RT
Cyan – the author’s data with a half-angle criterion for threshold
• When the RT was raised to 2s the subjects had great difficulty with determining the point of extinction, which appeared to be defined differently for each subject.
– There is clearly more spread in the data, and for some subjects the effect of added delay is reduced.
– The criterion of reducing the apparent separation by a factor of two seems to give more reliable results.

Tests with Music – and the difference between localization and engagement
• The gaps between words in the speech selection were deliberately chosen so there would be no masking of the direct sound from reverberation. This is NOT the case in real music.
• Tapio Lokki kindly made anechoic music recordings available on the web. I used the violin1, violin2, cello, and viola tracks from the Mozart selection to form a string quartet. After a lot of noise reduction and balancing it worked quite well.
• In music the direct sound of succeeding notes is frequently masked by reverberation from the previous notes.
• When you first listen to the string quartet at low values of D/R localization is impossible, and all the instruments clump together in the middle of the sound field.
– But if the value of the localization equation is above 0dB the localization is not always masked, and given time the brain can localize each instrument.
Succeeding notes with the same timbre are localized to the correct position. Thus given time, about two minutes for me, the localization equation predicts the localization threshold.
• But it does NOT predict the sense of engagement. You can localize sounds (sort of, sometimes) but the music is not clear, and the instruments seem far away. (Here is where we need harmonic coherence.)
– A value of the LOC equation of +3dB does predict engagement!

Difficulties with the music tests
• Because localization of sound sources with music depends so strongly on masking, experiments to determine localization threshold and the threshold of engagement are difficult to perform.
• When you first start to listen the localization threshold is as much as 5dB higher than will be achieved after a few minutes of listening.
– This is why many (even most) concert halls can give the impression of localization, but lack the sense of engagement.
• I found that the adaptation process could be speeded by turning off the reverberation and just listening for 10 seconds or so to the direct sound alone. This teaches the brain where to expect to hear the sound of each instrument. When you turn the reverberation on, sounds of the same timbre will be perceived in the correct location.
• The same process occurs in concerts where the visual image is present. The eyes train the brain where to expect each sound – and this is where we hear it.
• But such a visually constructed sonic image DOES NOT produce the impression of engagement!

Results of music experiments
• I have a lot of data on the music experiments – because of the adaptation problem it is not as consistent as I would like.
• But the results are easy to summarize:
• Sufficient localization and musical clarity result for the Mozart string quartet at values of the localization equation of +3dB or higher.
• These values are very seldom achieved in modern concert halls (or opera houses.)
They ARE achieved in Boston Symphony Hall over a wide range of seats, and in a number of other old houses.
– The reasons for the lack of success in modern halls will be discussed in the remainder of this talk
• Old opera houses (with their surplus of velvet) achieve these values easily – but lack the late reverberation which is so popular these days.
– Some opera fans – including myself – would rather have the dramatic intensity of the old halls, even without the reverberation.
– This is the sound for which the operas were written.

Direct sound and Envelopment
• Recent work by the author, both in experiments with several subjects and in live lecture demonstrations with loudspeakers, has shown that the sense of both reverberance and envelopment increases when the direct sound is audible.
– Where there is no perceivable direct sound the sound can be reverberant, but comes from the front.
– When the direct sound is above the threshold of localization the reverberation becomes louder and more spacious.
• Envelopment and reverberance are created by late energy – at least 100ms after the direct sound.
• When the direct sound is inaudible the brain cannot perceive when a sound has started.
– So effectively the time between the onset of the direct sound and the reverberation is reduced, and less reverberation is heard.
– In the absence of direct sound syllabic sound sources (speech, woodwinds, brass, solo instruments of all kinds) are perceived as in front of the listener, even if reflections come from all around.
• The brain will not allow a singer (for example) to be perceived as all around the listener.
• In addition, Barron has shown that reverberation is always stronger in front of a hall than in the rear – so in most seats sound decays are perceived as frontal.
– But when direct sound is separately perceived, the brain can create two separate sound streams, one for the direct sound (the foreground) and one for the reverberation (the background).
• A background sound stream is perceived as both louder and more enveloping than the reverberation in a single combined sound stream.

Part 2 - Main Points
• The ability to hear the Direct Sound – as measured by LOC – is a vital component of the sound quality in a great hall.
– The ability to separately perceive the direct sound when the D/R is less than 0dB requires time. When the d/r ratio is low there must be sufficient time between the arrival of the direct sound and the build-up of the reverberation if engagement is to be perceived.
• Hall shape does not scale
– Our ability to perceive the direct sound – and thus localization, engagement, and envelopment - depends on the direct to reverberant ratio (d/r), and on the rate that reverberation builds up with time.
– Both the direct to reverberant ratio (d/r) and the rate of build-up change as the hall size scales – but human hearing (and the properties of music) do not change.
– A hall shape that provides good localization in a high percentage of 2000 seats may produce a much lower percentage of great seats if it is scaled to 1000 seats.
– And a minuscule number of great seats if it is scaled to 500 seats.

Diffusing elements do not scale
• The audibility of direct sound, and thus the perceptions of both localization and engagement, is frequency dependent. Frequencies above 700Hz are particularly important.
– Frequency dependent diffusing elements can cause the D/R to vary with frequency in ways that improve direct sound audibility.
– The best halls (Boston, Amsterdam, Vienna) all have ceiling and side wall elements with box shape and a depth of ~0.4m.
• These elements tend to send frequencies above 700Hz back toward the orchestra and the floor, where they are absorbed. (The absorption only occurs in occupied halls – so the effect will not show up in unoccupied measurements!)
• The result is a lower early and late reverberant level above 700Hz in the rear of the hall.
• This increases the D/R for the rear seats, and improves engagement.
– The LOC equation is sensitive to all reflections in a 100ms window, which will include many second-order reflections, especially in small halls.
• Replacing these elements with smooth curves or with smaller size features does not achieve the same result.
– Some evidence of this effect can be seen in RT and IACC80 measurements when the hall and stage are occupied.
• Measurements in Boston Symphony Hall (BSH) above 1000Hz show a clear double slope that is not visible at 500Hz.
– The hall has high engagement in at least 70% of the seats.

We need better measures
• Current acoustic measures ignore both the D/R and the time gap between the direct (the first wavefront) and the reverberation.
– RT, C80, and EDT all ignore the strength of the direct sound and the effects of musical style on the audibility of the D/R.
– IACC comes close, but measures something different.
• LOC is an attempt to supply a simple measure for a basic human perception which depends on direct sound.
– But impulse response measurements under occupied conditions are notoriously difficult to obtain.
• We need measures that use binaural recordings of actual performances as inputs.
– And the ability to listen to these recordings to test the validity of these measures against the true experience.
– Methods for recording and reproducing binaurally will be discussed in the next paper
– We are working on ways to measure LOC from such recordings.

Why do large halls sound different?
• In Boston Symphony Hall (BSH), and the Amsterdam Concertgebouw (CG) the reverberation decay is nearly identical, but the halls sound different.
– The difference can be explained using the same model that was used to develop LOC.
– Lacking good data with an occupied hall and stage I used a binaural image-source model with HRTFs measured from my own eardrums.
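The build-up part of that model can be illustrated for a note of finite length. The Python sketch below assumes an ideal exponential decay and is not the author's image-source model. The names `reverb_level_db` and `peak_level_db` are hypothetical, and levels are expressed relative to the steady-state reverberant level that a continuous note would reach.

```python
import math

K = 6.0 * math.log(10.0)   # about 13.8155: energy decays 60 dB in one RT

def reverb_level_db(t, note_s, rt, gap_s=0.0):
    """Reverberant level (dB re. steady state) t seconds after the direct onset.

    Assumes ideal exponential decay, a constant-level note of length note_s,
    and reverberation that starts gap_s seconds after the direct sound.
    """
    tau = t - gap_s
    if tau <= 0.0:
        return float("-inf")            # no reflections have arrived yet
    excited = min(note_s, tau)          # how long the note has fed the room
    energy = math.exp(-K * (tau - excited) / rt) - math.exp(-K * tau / rt)
    return 10.0 * math.log10(energy)

def peak_level_db(note_s, rt):
    """Highest reverberant level reached by a note of length note_s."""
    return reverb_level_db(note_s, note_s, rt)

for rt in (1.0, 2.0):
    print(rt, round(peak_level_db(0.1, rt), 2))
```

In this idealization a 100ms note in a room with a 2s RT builds the reverberation to only about 3dB below its continuous level, so the effective D/R for short notes is a few dB better than the continuous-note figure. With a 1s RT the build-up is faster and the short-note advantage is smaller.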
Reverberation build-up and decay – from models
Amsterdam: LOC = +6dB. Boston: LOC = +4.2dB.
The seat position in the model has been chosen so that the D/R is -10dB for a continuous note. The upward dashed curve shows the exponential rise of reverberant energy from a continuous source assuming exponential decay with no time gap. The solid line shows the build up and decay from a short note of 100ms duration. Note the actual D/R for the short note is only about -6dB. The initial time gap is less in Boston than in Amsterdam, but after about 50ms the curves are nearly identical. (Without the direct sound they sound identical.) Both halls show a high value of LOC, but the value in Amsterdam is significantly higher – and the sound is clearer.

Comparisons of C80, C50, IACC80, and LOC
• Conventional measures for the models of Amsterdam Concertgebouw and Boston Symphony Hall give the following results:
• Amsterdam: C80 = 0.43dB, C50 = -2.8dB, IACC80 = 0.38, LOC = +6dB
• BSH: C80 = 0.65dB, C50 = -2.1dB, IACC80 = 0.22, LOC = +4.2dB
• Half-Size BSH: C80 = 3.7dB, C50 = 1.7dB, IACC80 = 0.15, LOC = +0.5dB
• Only the IACC80 shows that Amsterdam might have more direct sound than Boston. The standard Clarity measures predict the opposite – and predict that the small hall would have high clarity, and it does not.
• But IACC80 is sensitive only to lateral reflections. Strong reflections from the front, overhead, or rear do not affect IACC.
• An IACC of 0.22 would usually be considered low. In spite of this BSH has both clarity and good localization in this seat.

Smaller halls
• What if we build a hall with the shape of BSH, but half the size?
– The new hall will hold about 600 seats.
– The RT will be half, or about 1 second.
– We would expect the average D/R to be the same. Is it? How does the new hall sound?
– If the client specifies a 1.7s RT will this make the new hall better, or worse?

Half-Size Boston
The gap between the direct and the reverberation and the RT have become half as long.
Additionally, in spite of the shorter RT, the D/R has decreased from about -6dB in the large BSH model to about -8.5dB in the half-size model, where LOC = 0.5dB. This is because the reverberation builds up more quickly and more strongly in the smaller hall. The direct sound, which was distinct in more than 50% of the seats in the large hall, will be audible in fewer than 30% of the seats in the small hall. If the client insists on increasing the RT by reducing absorption, the D/R will be further reduced, unless the hall shape is changed to increase the cubic volume. The client and the architects expect the new hall to sound like BSH – but they, and the audience, will be disappointed. As Leo Beranek said about the Berlin Philharmonie: “They can always sell the bad seats to tourists.”

An existing small hall – pictures
Note the highly reflective stage and side walls, deeply coffered ceiling, and relatively low internal volume per seat. The sound in many seats is muddy. Adding reflections or decreasing absorption only increases the muddiness.

Hall data
• The pictures show a recital hall of 65000 cubic feet (1840 cubic meters). Designed for 350 seats, it currently has 300 seats, giving a volume/seat of 6 cubic meters. There are 1400 square feet of carpet under the seats on the floor.
• Reverberation Time (RT) unoccupied is 1.1 seconds from 1000Hz to 63Hz. C80, dominated by the reverberation time, is ~+5dB everywhere.
• The parallel side walls of the stage provide little diffusion.
• The hall is generally liked by the audience and players, although there are reports of loudness and balance problems on stage.
• Musicians desire more resonance and greater clarity in the middle of the hall.

Experiments with absorption and acoustic enhancement
• Measurements and experiments involving various combinations of fiberglass panels and electronic reverberation enhancement were conducted in January 2009.
– Measurements were made with three loudspeakers, three dummy heads, and a Soundfield microphone.
– All musical performances were recorded with the same microphones, and with an array of close microphones on stage.
• About 30 musicians participated, including faculty, staff, students from all three divisions, and musicians from the wider community.
• The goal was to improve the instrumental balance on stage, reduce excess stage loudness, and to increase resonance and the ability to hear individual instruments throughout the hall.
• With both panels and enhancement in place comments from the participants were favorable. Players and singers found balancing with piano was easier, and the middle registers of the piano were more easily heard both by the musicians and in the hall.

The absorptive curtains at the rear of the stage could be rapidly withdrawn. The blankets that simulated audience could be removed in 5 minutes, along with the panels on stage. This allowed prompt A/B comparisons. Some of the 25 LARES enhancement speakers are visible.

Results from the experiments
• The experiments in January showed that adding fiberglass panels around the stage increased clarity and the ability to localize instruments in the hall, raising the measured value of LOC from an average of -1.5dB to +3dB or more.
• Localization and clarity in the balcony were additionally improved by adding panels to the upper audience right side wall, which eliminated the strong lateral reflection from that surface.
– The lower surface of this wall was already absorptive.
• The electronic enhancement successfully compensated for the loss of resonance due to the panels. Without the enhancement the perceived resonance was reduced.
• In a subsequent experiment with a violin-piano combination and no enhancement we found that just 12 fiberglass panels each 2’x6’x2” around the bottom of the stage noticeably improved the clarity on the floor of the hall, and also improved the balance for the players on stage. For this music the reduced resonance was not a problem.
– These panels absorbed the first-order reflection from the back of the stage, which has the highest level and the shortest time delay. Absorbing this reflection contributed strongly to increasing LOC.

Usefulness of the measure LOC
• LOC informs us that the primary contributors to difficulty in localization are the first strong reflections, regardless of the direction they come from.
• We initially thought that since the floor of the hall is not a significant source of these reflections, it would be likely that removing the carpet under the seats would raise the RT without decreasing LOC significantly.
– However LOC is also sensitive to reverberation which arrives before 100ms, and this would be increased by removing the carpet.
– A few later experiments suggest that removing the carpet will increase the reverberant level sufficiently to eliminate the improvement in LOC provided by the absorption on stage.
• The existence of LOC as a physical measure can help to answer these questions in advance – or at least suggest that an experiment is needed before drastic alterations are undertaken.

Small shoebox halls can be OK
• If the client insists on a shoebox it can work by building a large hall and installing a small number of seats.
– I was just in such a small hall in Helsinki, and at least half the seats were OK.
• But this is not the ideal solution.
– With a different shape nearly all the seats could have been OK – and it might have been less expensive.

Great Small Halls Exist!
Jordan Hall at New England Conservatory has 1200 seats, an RT of 1.3s fully occupied. The shape is half-octagonal, with a high ceiling. The audience surrounds the stage, with a single high balcony. The average seating distance is much shorter than a shoebox hall, increasing the direct sound. The high internal volume allows a longer RT with low reverberant level. The sound in nearly every seat is clear and direct, with a marvelous surrounding reverberation.
Although the hall is renowned as a chamber music hall, it is also ideal for small orchestras and choral performances. It was built around 1905. The hall is in constant use – with concerts nearly every night (and many afternoons).

Williams Hall, NEC
• Williams Hall, in the same building, has ~350 seats in a square plan with a high ceiling.
• The sound from a piano is clear and reverberant in most, if not all, seats.
(The audience usually sits where the orchestra is rehearsing in this picture.) The square plan keeps the average seating distance low. The high ceiling and high single balcony provide a long RT without a high reverberant level. The absorbent stage eliminates strong reflections from the back wall. By absorbing at least half the backward energy from the musicians, the stage increases the D/R. Note the coffered ceiling – similar to BSH.

Hard learned lessons
• Where clarity is a problem in small halls, acousticians usually recommend adding early reflections – through a stage shell, side reflectors, etc.
– We tried this in the small hall experiments mentioned above. The sound became louder and less clear. Just the opposite of what was needed.
• These measures reduce the gap between the direct sound and the reflected energy and decrease LOC.
– They increase loudness – which is usually already too high, while increasing the sense of distance to the performers.
– A better way is to add absorption, or perhaps means of deflecting the earliest reflections to the ceiling, or into the front of the audience where they can be absorbed.
• Re-direction tricks of this nature do not work well in small halls, as the second and third order reflections they create will arrive within the 100ms window that determines LOC.
– Small halls have strong direct sound and too many early reflections. The reflections also come too quickly. Adding more reflections is exactly the wrong thing to do.
– Adding absorption will improve clarity but reduce the late reverberant level and the RT. Electronics, or more cubic volume, can restore the longer RT without decreasing the D/R.
• In practice, not everyone is aware of, or appreciates, engagement. It is mostly a subconscious perception. Reverberation or resonance is immediately apparent to everyone – which is why it has become so overemphasized in hall design.
– Adding absorption may not be appreciated by everyone unless the decrease in late reverberation can be compensated.
– Such compensation can be surprisingly easy. Adding a few tenths of a second to the reverberation time of a small hall can be accomplished electronically with very few loudspeakers. The result is completely transparent.

In the best halls the reverberant level is lower than would be expected from classical acoustics
• D/R is frequency dependent in halls, and frequencies above 700Hz are particularly important for engagement.
– Surface features can be used to decrease the reflected energy level in the rear of the hall at higher frequencies.
• In addition, the distribution of absorption in a hall significantly alters the distribution of the reflected energy.
– In a good hall absorption is highly non-uniform. A high ceiling with a lot of reflecting surfaces above the audience can increase RT without increasing the reflected energy level near the audience. The reverberation created tends to stay up near the ceiling.
– This helps to keep the D/R above ~700Hz constant over a large number of seats.
– Current modeling techniques may not properly calculate these effects.
• Old-fashioned light models might work better…

Light models
I ran across these pictures while cleaning out my office. The top one is a too-simple model of the Philadelphia Academy of Music. The bottom is intended to be BSH, but with a single balcony.
I abandoned light modeling because it does NOT provide any information about the time delay gap – nor information about the effects of note length on D/R. But it DOES provide information about the total reverberant energy compared to the direct. And very complex hall shapes can be quickly modeled.

Hall Shapes as a function of size
(Figure legend: Above threshold, Near threshold, Below threshold)
A large hall like Boston has many seats above threshold, and many that are near threshold.
If this hall is reduced in size while preserving the shape, many seats are below threshold.
It is better to use a design that reduces the average seating distance, using a high ceiling to increase volume.
Boston is blessed with two 1200 seat halls with the third shape, Jordan Hall at New England Conservatory, and Sanders Theater at Harvard. The sound for chamber music and small orchestras is fantastic. RT ~ 1.4 to 1.5 seconds. Clarity is very high – you can hear every note – and envelopment is good.

Retro reflectors above 1000Hz
Boston, Amsterdam, and Vienna all have side-wall and ceiling elements that reflect frequencies above 1000Hz back to the stage and to the audience close to the stage. This sound is absorbed – reducing the reverberant level in the rear of the hall without changing the RT. Another classic example is the orchestra shell at the Tanglewood Music Festival Shed, designed by Russell Johnson and Leo Beranek. Many modern halls lack these useful features!!!

High frequency retro reflectors
Rectangular wall features scatter in three dimensions – visualize these with the underside of the first and second balconies. High frequencies are reflected back to the stage and to the audience in the front of the hall. The direct sound is strong there. These reflections are not easily audible, but they contribute to orchestral blend. But this energy is absorbed, and thus REMOVED from the late reverberation – which improves clarity for seats in the back of the hall.
Examples: Amsterdam, Boston, Vienna

High frequency overhead filters
A canopy made of surfaces separated by some distance becomes a high frequency filter. Low frequencies pass through, exciting the full volume of the hall. High frequencies are reflected down into the audience, where they are absorbed. Examples: Tanglewood Music Shed, Davies Hall San Francisco.
In my experience (and Beranek’s) these panels improve Tanglewood enormously. They reduce the HF reverberant level in the back of the hall, improving clarity. The sound is amazingly good, in spite of RT ~ 3s. In Davies Hall the panels make the sound in the dress circle and balcony both clear and reverberant at the same time. Very fine… (But the sound in the stalls can be loud and harsh.)

The necessity of occupied measurements
• The effects of frequency dependent reflecting elements depend on the presence of absorption on the stage and the front of the audience.
• Measuring the halls without absorption in these areas will not detect these vital effects.
• In addition, engagement is highly dependent on the D/R ratio – and this is also not correctly measured in an unoccupied hall.
• Thus measurement of localization and engagement requires that both hall and stage be occupied!
– Impulse response measurements under these conditions are difficult to obtain.
• Measures for engagement or localization which use binaural recordings of live music or speech would be highly desirable.
– I believe this is possible. If you can hear an effect, you can measure it. We only need to figure out how to do it.

Binaural Measures
The author has been recording performances binaurally for years. Current technology uses probe microphones at the eardrums. We can use these recordings to make objective measurements of halls and operas. The methods use a hearing model where the binaural signal is first filtered into 1/3 octave bands, and then is rectified and filtered.
For measures of localization a running IACC is calculated in 10ms overlapping windows. The maximum values of 1/(1-IACC) are then plotted as a surface over time and frequency band.

Localization
The figure shows the number of times per second that a solo violin can be localized from row 4 of a small shoebox hall (~500 seats) near Helsinki. It also shows the perceived azimuth of the violin. As can be seen, the localization – achieved at the onsets of notes – is quite good, and the azimuth, ~10 degrees to the left of center, is accurate.

Localization – surface 1
Here we plot the same data for the violin as a function of (inverse) azimuth, and the third octave frequency band. As can be seen, for this instrument the principal localization components come at about 1300Hz. Interestingly, human ability to detect azimuth, as shown in the threshold data, may be maximum at this frequency.

Localization – surface 2
Here we plot 1/(1-IACC) as a function of time and third octave band. Note that the IACC peaks at the onset of notes can have quite high values for a brief time. This happens when there is sufficient delay between the direct and the reverberation, and sufficient D/R.

Localization – a poor seat
Here is a similar diagram for a solo violin in row 11 of the same hall. The sound here is unclear, and the localization of the violin is poor. As can be seen, the number of localizations per second is low (in this case the value really depends on the setting of the threshold in the software). Perhaps more tellingly, the azimuth detected seems random. This is really just noise, and is perceived as such.

Measures based on harmonic coherence
• In the absence of reflections the formant frequencies above 1000Hz are amplitude modulated by the phase coherence of the upper harmonics. This modulation is easily heard, creating the perception of “roughness” (Zwicker).
– Reflections randomize the phase of these harmonics.
• The result is highly audible, and is a primary cue for the distance of an actor, singer, or soloist.
• This effect can be measured with live recordings, and is sensitive both to medial and lateral reflections.

This graph shows the frequency and amplitude of the amplitude modulation of a voice fundamental in the 2kHz 1/3 octave band. The vertical axis shows the effective D/R ratio at the beginning of two notes from an opera singer in Oslo to the front of the third balcony (fully occupied). The sound there is often muddy, but the fundamental pitch of this singer came through strongly at the beginning of these two notes. He seemed to be speaking directly to me, and I liked it.

Another singer
From the same seat the king (in Verdi’s Don Carlos) was not able to reach the third balcony with the same strength. Like the localization graph shown in a previous slide, this graph seems to be mostly noise. The fundamental pitches are not well defined. The singer seemed muddy and far away. His aria can be heart-rending – but here it was somewhat muted by the acoustics. We were watching the king feel powerless and forlorn. But we were not engaged.

Some demos of eardrum recordings
• These recordings have been equalized for loudspeaker reproduction. You may be able to judge clarity and intelligibility over near-field loudspeakers.
– Accurate headphone reproduction requires headphone equalization.
– If probes are available the method described here will work.
– A method which uses equal loudness curves will be described later in this paper.
• opera balcony 2, seat 11
– Moderate intelligibility, reverberant sound.
– OK for non-Italian speakers with subtitles.
• opera balcony 3, seat 12
– Poor intelligibility, very reverberant.
• opera standing room
– Deep under balcony 2 – good intelligibility.
– This was preferred by Italian speakers.
• A concert hall – row 8 (quite close)
– Very good sound. Not so good further back.
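The running IACC described under Binaural Measures (short overlapping windows, then 1/(1-IACC)) can be sketched as follows. This is a simplified illustration and not the author's software: it works on a single band that is assumed to be already filtered, it omits the 1/3 octave filter bank and the rectification stage, and the hop size, the ±1ms lag range, the 0.99 ceiling, and the function names are all assumptions made for the example.

```python
import numpy as np


def running_iacc(left, right, fs, win_ms=10.0, hop_ms=5.0, max_lag_ms=1.0):
    """Peak normalized interaural cross-correlation in overlapping windows.

    Returns one value between 0 and 1 per window. The lag search is
    limited to +/- max_lag_ms (an assumed interaural delay range)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    n = min(len(left), len(right))
    values = []
    for start in range(0, n - win + 1, hop):
        l = left[start:start + win]
        r = right[start:start + win]
        norm = np.sqrt(np.sum(l * l) * np.sum(r * r))
        if norm == 0.0:
            values.append(0.0)  # silent window: nothing to localize
            continue
        xcorr = np.correlate(l, r, mode="full")  # zero lag is at index win - 1
        centre = win - 1
        near_zero_lag = xcorr[centre - max_lag:centre + max_lag + 1]
        values.append(float(np.max(np.abs(near_zero_lag)) / norm))
    return np.array(values)


def localization_strength(iacc, ceiling=0.99):
    """1/(1-IACC), with IACC clipped so the result stays finite."""
    return 1.0 / (1.0 - np.clip(iacc, 0.0, ceiling))


if __name__ == "__main__":
    fs = 48000
    rng = np.random.default_rng(0)
    direct = rng.standard_normal(fs // 10)   # same signal at both ears
    diffuse = rng.standard_normal(fs // 10)  # unrelated signal at one ear
    print(localization_strength(running_iacc(direct, direct, fs)).mean())
    print(localization_strength(running_iacc(direct, diffuse, fs)).mean())
```

With identical signals at both ears the windowed IACC stays at 1 and the 1/(1-IACC) value sits at its ceiling, while uncorrelated signals give low values. In a real recording the peaks would be expected at note onsets, before the reverberation has built up.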
Conclusions
• Performance venues should maximize engagement over a wide range of seats, not search for ideal values of RT or C80. To achieve this goal the direct sound must be perceived by the brain as distinct from the reflected energy – and this includes early reflections from all directions.
– Engagement is largely a monaural perception – a spatial property that is sensitive to medial reflections. It can be heard with only one ear (and measured with one microphone).
– But it is essential to measure with both hall and stage OCCUPIED!
• The perception of reverberance and envelopment also depends on the audible presence of direct sound.
• The audibility of direct sound depends on the D/R ratio above 700Hz, and the time delay of reflections in the first 100ms.
– Hall sound can often be improved by frequency dependent reflecting elements.
• The optimum value for the D/R ratio depends on the hall size:
– The D/R ratio must increase as hall size is reduced if clarity, localization, and the sense of envelopment are to be maintained.
– D/R and engagement can be increased by decreasing the average seating distance, decreasing the reverberation time, increasing the hall volume, or by careful use of rectangular diffusing elements.
– This is particularly true in opera houses and halls designed for chamber music.
– A 1.8 second reverberation time is NOT necessarily ideal in a 1000 seat hall!!! Remember that changes in reverberant LEVEL (D/R) and initial time delay are more audible than changes in RT.
• To maintain clarity, low sonic distance, azimuth detection and envelopment in a small hall (and many large halls) it is desirable to reduce the average seating distance, and widely diffuse or absorb the earliest reflections, whether lateral or not.
– The best small halls do this already.
• Most current hall measurements ignore both the D/R and the time delay between direct sound and reverberation.
– LOC is an attempt to rectify this lack.
Measures that use musical signals as input are being developed. They need to be used if the success rate of current hall design is to be improved.
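The conclusion that D/R rises with shorter seating distance, shorter reverberation time, and larger volume can be checked against the classical diffuse-field estimate. The sketch below uses the standard Sabine relation and critical distance formula; it is only a steady-state estimate, and the talk argues that the best halls have a lower reverberant level than this kind of calculation predicts. The function names, the omnidirectional source (directivity of 1), and the 10m listening distance are assumptions for illustration; the 1840 cubic meter volume and 1.1 second RT are the recital hall figures given earlier.

```python
import math


def critical_distance_m(volume_m3, rt_s, directivity=1.0):
    """Distance at which direct and diffuse reverberant levels are equal.

    Uses Sabine absorption A = 0.161 * V / RT (metric units) and
    r_c = sqrt(Q * A / (16 * pi))."""
    absorption_m2 = 0.161 * volume_m3 / rt_s
    return math.sqrt(directivity * absorption_m2 / (16.0 * math.pi))


def classical_dr_db(distance_m, volume_m3, rt_s, directivity=1.0):
    """Steady-state D/R in dB at distance_m from the source."""
    r_c = critical_distance_m(volume_m3, rt_s, directivity)
    return 20.0 * math.log10(r_c / distance_m)


if __name__ == "__main__":
    # Recital hall figures from the talk; the 10 m distance is assumed.
    volume, rt = 1840.0, 1.1
    print(f"critical distance: {critical_distance_m(volume, rt):.2f} m")
    print(f"D/R at 10 m:       {classical_dr_db(10.0, volume, rt):.1f} dB")
    print(f"D/R at 5 m:        {classical_dr_db(5.0, volume, rt):.1f} dB")
    print(f"D/R, double volume: {classical_dr_db(10.0, 2 * volume, rt):.1f} dB")
```

In this estimate, halving the distance raises the D/R by about 6dB and doubling the volume at the same RT raises it by about 3dB. Scaling a hall to half size (one eighth the volume, half the RT, half the distance) leaves the classical D/R unchanged, which is why the steady-state figure alone does not reveal the loss of direct sound audibility in the smaller hall.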