
                      Automated Accident Detection in Intersections
                          via Digital Audio Signal Processing


                                          Lori Mann Bruce
                                     Navaneethakrishnan Balraj
                          Department of Electrical and Computer Engineering
                                    Mississippi State University,
                                        Starkville, MS, 39762
                                       Phone: (662) 325-8430
                                        Fax: (662) 325-2298
                                   Email: bruce@ece.msstate.edu


                                               Yunlong Zhang
                                                Qingyong Yu
                                       Department of Civil Engineering
                                        Mississippi State University,
                                           Starkville, MS, 39762
                                           Phone: (662) 325-9838
                                            Fax: (662) 325-7189
                                       Email: zhang@engr.msstate.edu




   Total number of words: 4616 (text) + 250 × 6 (tables and figures) = 6116




TRB 2003 Annual Meeting CD-ROM                                           Paper revised from original submittal.
   Lori Mann Bruce, Navaneethakrishnan Balraj, Yunlong Zhang, Qingyong Yu                                 1




   ABSTRACT
   In this paper, the authors design a system for automated traffic accident detection in intersections. The
   input to the system is a 3 second segment of audio signal. The system can be operated in two modes: two-
   class and multi-class. The output of the two-class mode is a label of “crash” or “non-crash”. In the
   multi-class mode of operation, the system identifies crashes as well as several types of non-crash
   incidents, including normal traffic and construction sounds. The system is composed of three main
   signal processing stages: feature extraction, feature reduction, and classification. Five methods of feature
   extraction are investigated and compared; these are based on the discrete wavelet transform, fast Fourier
   transform, discrete cosine transform, real cepstral transform, and mel frequency cepstral transform.
   Statistical methods are used for feature optimization and classification. Three types of classifiers are
   investigated and compared; these are the nearest mean, maximum likelihood, and nearest neighbor
   methods. The results of the study show that the wavelet-based features in combination with the maximum
   likelihood classifier constitute the optimum design. The system is computationally inexpensive relative to the
   other methods investigated, and it consistently results in accident detection accuracies of 95% to
   100% when the audio signal has a signal-to-noise ratio of at least 0 decibels.

   KEY WORDS
   Accident detection, Audio, Acoustic, Wavelet transform, Statistical classifier








   I. INTRODUCTION
   Traffic incidents are a major cause of congestion on the nation's highways. Approximately 50 to 60
   percent of the delay on urban freeways is associated with incidents [1, 2]. It is estimated that incident-
   related congestion will cost the U.S. public $75 billion in lost productivity and 8.4 billion gallons of
   wasted fuel in the year 2005 [3]. Among various types of traffic incidents, vehicle accidents play a
   significant role in causing non-recurring congestion. Thus, a key strategy for reducing delay and adverse
   impact of accidents on traffic in major urban areas is to handle accidents as quickly as possible to keep
   traffic flowing. Clearly, an accident must be detected and verified before any other incident
   management action can be taken; timely and accurate accident detection is therefore critical to the
   success of any incident management process.
            On urban surface streets, the majority of traffic accidents occur at or near intersections. While
   freeway incident detection research can be dated back to the early 1960's, accident detection research on
   surface streets has been lagging, and needs more attention.
            This paper describes the design and testing of a system for automated accident detection at
   intersections. The system takes advantage of audio signals to recognize accidents from the ambient noise.
   The system is comprised of feature extraction, feature reduction, and classification. In particular, the
   authors investigate the use of the discrete wavelet transform for feature extraction, an approach that has
   proven successful in speech recognition and computer vision [4]. This model is specifically
   developed to classify audio signals into crash and non-crash categories.

   II. BACKGROUND
   Various types of sensors have been used for the detection of traffic accidents, including video cameras,
   ultrasonic transmitter/receivers, and microwave transmitter/receivers [5, 6, 7]. Typically with video
   images, researchers use methods such as object segmentation and motion estimation to track moving
   vehicles. These methods are used to detect incidents that have caused the flow of traffic to stop.
   Recently, researchers have developed an algorithm that uses Gabor wavelet analysis and the Mallat
   wavelet transform for fast, accurate vehicle tracking [5]. The image flow vectors can be quickly
   estimated using low spatial resolution data, and vehicles can be accurately detected using high spatial
   resolution data. Even in complex motion scenarios, the system could detect arbitrary numbers of small,
   disconnected objects.
            Audio sensors, such as simple microphones, can be a cost effective, computationally efficient
   alternative. Researchers have utilized road-side microphones to collect audio signals, which were then
   used to detect passing vehicles, i.e. to act as vehicle counters [8, 9]. Chen et al. developed an algorithm using
   cross-correlation for the analysis of audio signals collected from microphone arrays [10]. The algorithm
   predicted road traffic conditions, including traffic speed and density. Vehicle classification has also been
   accomplished using audio signals. Nooralahiyan et al. developed a vehicle classification system that
   utilized linear predictive coding along with a time delay neural network [11]. The system classified audio
   signals from passing vehicles according to four basic vehicle types.
            Recently, Harlow and Wang utilized audio signals to detect traffic accidents in intersections [12].
   The researchers developed a two-phase automated detection system. First, mel frequency cepstral
   coefficients were extracted from the signal and used to form a feature vector. Second, the vector was
   then input to a neural network for classification. The final output was a label of “crash” or “non-crash”.
            For non-stationary events, such as accident sounds in a traffic audio signal, wavelet analysis
   could prove advantageous. Wavelet transforms have been used as a preprocessing step, or feature
   extraction technique, with audio signals in non-traffic applications. These include human speech
   recognition [13] and industrial machinery monitoring [14]. The wavelet transform of a signal results in a
   set of detail and approximation coefficients. These coefficients provide a means of separating fine scale
   (very localized) behavior from large scale (global) behavior in the audio signals. With the wavelet
   transform of traffic audio signals, the detail coefficients, representing fine scale behavior, could be used
   to identify non-stationary incidents, such as traffic accidents. Likewise, the approximation coefficients,
   or large-scale behavior, could be used to identify normal background traffic sounds.
            In this paper, the authors investigate the use of wavelet transforms for automated traffic accident
   detection in intersections. The discrete wavelet transform (DWT) is used for feature extraction, and
   statistical methods are used for feature optimization and classification. The input to the system is a three
   second segment of audio signal. The system can be operated in two modes: two-class and multi-class.
   The output of the two-class mode is a label of “crash” or “non-crash”. A system block diagram is shown
   in Figure 1. In the multi-class mode of operation, the system identifies crashes as well as several types of
   non-crash incidents, including normal traffic and construction sounds. The wavelet-based features are
   compared to various other methods including real cepstral coefficients, mel frequency cepstral
   coefficients, fast Fourier transform coefficients, and discrete cosine transform coefficients. Furthermore,
   various classifiers are investigated and compared; these included the nearest mean, maximum likelihood,
   and nearest neighbor methods.
            A brief background of wavelet analysis is provided in the next section. In Section IV, the authors
   provide a detailed description of the methodologies utilized in the automated system. The authors also
   provide a description of the data collection protocol and the types of data used to test the system.
   Classification results are provided in Section V. Finally, conclusions are drawn and suggestions for
   future research are provided in Section VI.

   III. WAVELET TRANSFORMS
   Wavelet analysis is based on the idea of projecting a signal onto a set of basis functions. A set of wavelet
   basis functions, \{\psi_{a,b}(t)\}, can be generated by shifting and scaling the basic or mother wavelet,
   \psi(t), according to the following:

       \psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left( \frac{t - b}{a} \right),                    (1)

   where a > 0 and b are real numbers. The variable a is the scaling factor of a particular basis function,
   and b is the translation variable along the function's range. When a > 1, the functions are dilated, and
   when a < 1, the functions are contracted. The coefficient 1/\sqrt{a} is included to normalize the energy of
   the wavelets. All of the wavelets \{\psi_{a,b}(t)\} generated by shifting and scaling the mother wavelet,
   \psi(t), have the same basic shape.







          All wavelet functions must oscillate, have an average value of zero, and have finite support. This
   “admissibility condition” can be represented by
       \int_{-\infty}^{+\infty} \frac{\left| \Im(\psi(t)) \right|^2}{|s|}\, ds < \infty,                    (2)

   where ℑ(⋅) denotes the Fourier transform and s is the Fourier domain variable. An important property
   of many wavelet systems is the multiresolution analysis (MRA) property, where the decomposition of a
   signal is in terms of the resolution of detail signals [15]. If a wavelet basis satisfies the MRA criteria, its
   transform can be implemented via multiresolutional filter trees. This type of implementation is very
   useful because it allows for fast algorithms similar to the well-known fast Fourier transform.
            There exist many different types of mother wavelets and wavelet bases. The Haar wavelet is one
   of the simplest examples. The Haar wavelet is discontinuous, and it resembles a step function:
       \psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise.} \end{cases}                    (3)
           Also, a well-known family of wavelets was developed by Ingrid Daubechies [16]; these are
   generally referred to as Daubechies-n, where n is the "order" of the mother wavelet. The order
   corresponds to the regularity of the mother wavelet, and the Daubechies-1 wavelet is equivalent to the
   Haar wavelet. For this study, the authors investigate the use of many different mother wavelets, including
   the Coiflet, symlet, biorthogonal, and Daubechies families.
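           To make equations (1) and (3) concrete, the snippet below evaluates the Haar mother wavelet and one
   shifted, scaled basis function numerically; the function names and the sampling grid are our own. Every basis
   function inherits the mother wavelet's zero mean and, thanks to the 1/\sqrt{a} factor, unit energy:

```python
import numpy as np

def haar_mother(t):
    """Haar mother wavelet psi(t) from equation (3)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    out[(t >= 0.0) & (t < 0.5)] = 1.0
    out[(t >= 0.5) & (t < 1.0)] = -1.0
    return out

def wavelet_basis(t, a, b):
    """Shifted and scaled basis function psi_{a,b}(t) from equation (1)."""
    return haar_mother((t - b) / a) / np.sqrt(a)

# Sample psi_{2,1}(t) on a fine grid and check its mean and energy by
# Riemann sums: both should match the mother wavelet (0 and 1).
t = np.linspace(-4.0, 4.0, 8001)
dt = t[1] - t[0]
psi = wavelet_basis(t, a=2.0, b=1.0)
print(np.sum(psi) * dt)       # approximately 0 (zero mean)
print(np.sum(psi**2) * dt)    # approximately 1 (unit energy)
```

   Dilation (a > 1) stretches the support, here to the interval [1, 3), while the 1/\sqrt{a} factor keeps the
   energy fixed.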
           The continuous wavelet transform (CWT), denoted by W_f(a,b), of a function f(t), with respect to
   the wavelet basis function \psi_{a,b}(t), can be defined as

       W_f(a,b) = \int_{-\infty}^{\infty} f(t)\, \psi_{a,b}(t)\, dt,                    (4)

   where the wavelet function \psi_{a,b}(t) is given by equation (1). For the CWT, the scale parameter, a, and
   the shift parameter, b, are specified as real numbers. Hence, the transform coefficients, W_f(a,b), are
   continuous with respect to the variables a and b. For the DWT, the discrete wavelet basis functions are
   represented as

       \psi_{j,k}(n) = 2^{-j/2}\, \psi\!\left( 2^{-j} n - k \right),                    (5)

   and the wavelet coefficients are obtained by the inner product

       W_{j,k} = \left\langle f(n), \psi_{j,k}(n) \right\rangle.                    (6)

           Thus, the scales are a = 2, 4, 8, \ldots, 2^j, \ldots, 2^p. Note that the audio signal and the wavelets
   are now functions of discrete time, n, rather than continuous time, t. In this project, f(n) is the digitized
   audio signal of traffic noises.
            The DWT has been extensively used in the development of fast wavelet algorithms. The most
   common implementation of the DWT is the well-known dyadic filter tree [17]. The filter tree is
   composed of highpass and lowpass filters corresponding to the user’s selection of mother wavelet




TRB 2003 Annual Meeting CD-ROM                                                                   Paper revised from original submittal.
   Lori Mann Bruce, Navaneethakrishnan Balraj, Yunlong Zhang, Qingyong Yu                                     5




   function. At each stage, j , of the filter tree, a set of approximation, A j , and detail coefficients, D j , are
   produced, corresponding to the input signal’s large and small-scale behavior, respectively.
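           The dyadic filter tree can be sketched directly in NumPy for the Haar wavelet. This is an illustrative
   implementation, not the authors' code: each stage filters the current approximation with the Haar lowpass and
   highpass filters and downsamples by two, producing A_j and D_j; because the Haar filters are orthonormal,
   signal energy is preserved across the decomposition.

```python
import numpy as np

h_lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar lowpass (scaling) filter
h_hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar highpass (wavelet) filter

def dwt_stage(x):
    """One stage of the filter tree: filter, then downsample by two."""
    a = np.convolve(x, h_lo)[1::2]   # approximation coefficients A_j
    d = np.convolve(x, h_hi)[1::2]   # detail coefficients D_j
    return a, d

def dwt(x, levels):
    """Full tree: the approximation branch is split recursively."""
    details = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = dwt_stage(a)
        details.append(d)
    return a, details   # A_M and [D_1, ..., D_M]

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
aM, Ds = dwt(x, levels=3)
```

   For a length-8 signal and the Haar wavelet, three levels exhaust the signal (M = log2(8) = 3).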

   IV. METHODOLOGIES

   A. Data Collection and Preprocessing
   Audio data was collected at intersections in Starkville, MS and Jackson, MS. Additional crash audio data was
   provided by Dr. Charles Harlow of Louisiana State University. The audio data was collected using a
   Sony TCD-D8 digital audio tape recorder. The sampling specifications were mono-channel, 22050 Hz
   sampling rate, and 8-bit resolution. The signals were partitioned using a 3-second rectangular window;
   this resulted in each audio signal having a length of 66176 samples.
            The training and test data consisted of 27 crash, 8 pile-drive, 2 brake, and 62 normal traffic
   signals. When the latter three classes of data are combined into a single category, this paper refers to the
   data as "crash" and "non-crash". Before any processing was conducted, all signals were normalized such
   that each signal's maximum amplitude was one.
            In order to mimic practical scenarios, the crash signals were combined with the non-crash
   signals. Assume f_{nc}(n) is the recorded non-crash signal, f_c(n) is the recorded crash signal, and \alpha
   is a weighting variable. Then the new, realistic crash signal, \tilde{f}_c(n), was computed as

       \tilde{f}_c(n) = f_{nc}(n) + \alpha \cdot f_c(n).                    (7)

           The authors investigated the automated system's sensitivity to crash proximity. It was assumed
   that crashes occurring nearer the microphone (directly in the intersection) would be louder
   than crashes occurring farther away (in a street entering the intersection). To model the various
   scenarios, the weight \alpha was varied. The resulting crash signal had a signal-to-noise ratio (SNR) ranging
   from –50 decibels (dB) to +50 dB. When computing the SNR of \tilde{f}_c(n), the "signal" component was
   the scaled crash signal, \alpha \cdot f_c(n), and the "noise" component was the background traffic, f_{nc}(n).
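           A sketch of the mixing in equation (7). The paper states only that α was varied; the closed-form choice
   of α for a target SNR below is our own assumption, with the scaled crash signal treated as the "signal" and the
   background traffic as the "noise":

```python
import numpy as np

def mix_at_snr(f_nc, f_c, snr_db):
    """Combine background traffic f_nc with crash signal f_c per equation (7),
    choosing the weight alpha so the crash-to-background SNR equals snr_db.
    (The closed-form alpha is illustrative; the paper does not give one.)"""
    p_nc = np.sum(f_nc**2)                       # background ("noise") power
    p_c = np.sum(f_c**2)                         # crash ("signal") power
    alpha = np.sqrt(10.0**(snr_db / 10.0) * p_nc / p_c)
    return f_nc + alpha * f_c

# Stand-in waveforms; 0 dB means crash and background are equally loud.
rng = np.random.default_rng(0)
f_nc = rng.standard_normal(1000)
f_c = rng.standard_normal(1000)
mixed = mix_at_snr(f_nc, f_c, snr_db=0.0)
```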

   B. Feature Extraction
   Various methods of feature extraction are investigated and compared. The first method of feature
   extraction is based on the DWT. A feature vector, \vec{F}, is formed by computing the root-mean-square
   (RMS) energy of the wavelet detail coefficients at each scale, ED_j, and the RMS energy of the
   wavelet approximation coefficients at the final scale, EA_M:

       \vec{F} = \left[\, EA_M \;\; ED_M \;\; ED_{M-1} \;\cdots\; ED_1 \,\right]^T,                    (8)

   where the superscript T denotes a vector transpose,

       EA_M = \sqrt{ \frac{1}{P_M^A} \sum_{i=0}^{P_M^A - 1} \left[ A_M(i) \right]^2 }                    (9)

   and
       ED_j = \sqrt{ \frac{1}{P_j^D} \sum_{i=0}^{P_j^D - 1} \left[ D_j(i) \right]^2 },                    (10)

   for j = 1, 2, 3, \ldots, M. The constant M is the maximum wavelet decomposition level; P_j^D is the number
   of detail coefficients at level j; and P_M^A is the number of approximation coefficients at level M.
   According to equation (8), after the DWT feature extraction, the data space dimension is reduced to
   M + 1. The value of M is determined by the wavelet filter length and the original signal length. At most,
   M = \log_2(N), where N is the length of the original signal; this maximum is attained when the mother
   wavelet's high-pass and low-pass filters have length 2, as is the case with the Haar mother wavelet.
   However, for longer mother wavelets, such as the higher-order Daubechies or biorthogonal mother
   wavelets, the maximum scale M can be much lower.
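            Given the coefficients of a DWT, the feature vector of equations (8)–(10) reduces to a handful of
   RMS computations. A minimal sketch with toy coefficient arrays (the function names are ours):

```python
import numpy as np

def rms(c):
    """Root-mean-square energy of a coefficient array, per equations (9)-(10)."""
    c = np.asarray(c, dtype=float)
    return np.sqrt(np.mean(c**2))

def dwt_feature_vector(approx_M, details):
    """Build F = [EA_M, ED_M, ED_{M-1}, ..., ED_1] from equation (8).
    `details` is assumed ordered [D_1, ..., D_M], as a filter tree produces them."""
    return np.array([rms(approx_M)] + [rms(d) for d in reversed(details)])

# Toy coefficients for M = 3 levels: the feature vector has M + 1 = 4 entries.
A3 = np.array([3.0])
D = [np.array([1.0, -1.0, 1.0, -1.0]), np.array([2.0, -2.0]), np.array([0.0])]
F = dwt_feature_vector(A3, D)
print(F)   # [3. 0. 2. 1.]
```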
            By computing the RMS of the DWT coefficients at each scale, the features represent a scalar
   energy partitioning of the original signal. From a digital signal processing perspective, scale is similar to
   frequency. However, scale is not equivalent to frequency since the basis functions of the DWT are not
   necessarily sinusoids. Based on one's selection of mother wavelet, the scalar energy partitioning may be
   optimized. However, is this advantageous compared to simply using frequency analysis of the audio signals? To
   answer this question, the DWT approach was compared to feature extraction based on the fast Fourier
   transform (FFT).
            The second method of feature extraction was to simply use the magnitude of FFT coefficients as
   features. To fairly compare the DWT-based features with FFT-based features, the order of the FFT was
   varied. The order of the FFT was selected such that the number of resulting FFT coefficients, and thus
   features, was equivalent to the number of DWT-based features.
            The third method of feature extraction was based on the discrete cosine transform (DCT) [18].
   The DCT coefficients were used directly as the features. Likewise, the order of the DCT was varied such
   that the number of resulting DCT coefficients, and thus features, was equivalent to the number of DWT-
   based features. The DCT was investigated because it is similar to the FFT in that it is used for frequency
   analysis of the input signal. However, its resulting coefficients are real, whereas the FFT's coefficients
   are complex.
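            The FFT- and DCT-based feature extraction can be sketched as follows, assuming SciPy for the DCT.
   Truncating to the first n coefficients is our simplification; the paper instead varies the transform order so
   the coefficient count matches the DWT feature count:

```python
import numpy as np
from scipy.fft import dct

def fft_features(x, n_feat):
    """First n_feat FFT magnitudes of the signal as features."""
    return np.abs(np.fft.rfft(x))[:n_feat]

def dct_features(x, n_feat):
    """First n_feat DCT-II coefficients; real-valued, unlike the FFT's."""
    return dct(x, type=2, norm='ortho')[:n_feat]

# A pure tone at bin 5: its energy shows up in the fifth FFT feature.
x = np.cos(2.0 * np.pi * 5.0 * np.arange(256) / 256.0)
fft_feat = fft_features(x, 8)
dct_feat = dct_features(x, 8)
```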
            The fourth and fifth methods of feature extraction were based on the cepstral transform [19].
   Cepstral analysis is based on homomorphic transformations, where logarithms are used to convert
   multiplicative terms into additive terms. Cepstral analysis has been utilized extensively in speech
   recognition, and more recently by Harlow and Wang for automated accident detection using audio signals
   [12]. Cepstral transforms require windowing of the input audio signal, which is three seconds long in this
   project. The real cepstral transform (RCT) utilizes Hamming windows to segment the input audio signal
   into 100 msec intervals with 50 msec overlap. For each signal segment, the RCT results in a set of ordered
   coefficients. Typically in speech processing, the first 12 to 20 coefficients are utilized [19]. In this
   project, the first ( M + 1) coefficients are used as features, since the DWT approach results in ( M + 1)
   features. Then for each signal segment, the RCT coefficients are used as features, and statistical feature
   reduction and classification is conducted. If a “crash” is detected within at least three of the signal
   segments, an overall classification of "crash" is assigned to the three-second signal. Note that this
   requires the RCT, as well as the feature reduction and classification, to be completed 60 times.
   Consequently, the computational requirements for the cepstral method are much greater than for the
   DWT, FFT, and DCT methods.
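            The RCT front end — Hamming-windowed 100 msec frames with 50 msec overlap, followed by a real
   cepstrum per frame — can be sketched as below. The framing arithmetic and function names are ours; the real
   cepstrum is computed as the inverse FFT of the log magnitude spectrum:

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spec = np.abs(np.fft.fft(frame)) + 1e-12   # epsilon guards against log(0)
    return np.fft.ifft(np.log(spec)).real

def rct_features(x, fs, n_coef, win_ms=100, hop_ms=50):
    """Hamming-windowed frames; the first n_coef cepstral coefficients of
    each frame form that frame's feature vector."""
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    w = np.hamming(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.array([real_cepstrum(f)[:n_coef] for f in frames])

fs = 22050
x = np.random.default_rng(1).standard_normal(3 * fs)  # 3 s stand-in "audio"
feats = rct_features(x, fs, n_coef=14)   # one row of features per frame
```

   Each row would then be reduced and classified separately, with the per-frame decisions voted into an overall
   "crash"/"non-crash" label as described above.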
            The fifth method of feature extraction was based on the mel frequency cepstral transform (MCT)
   [19]. The MCT uses a nonlinear frequency scale, whereas the RCT uses a linear frequency scale. The
   nonlinear scale approximates the behavior of the human auditory system. In speech recognition
   applications, the MCT has been shown to provide superior classification accuracies over the RCT [19].
   As with the RCT, the MCT method requires a windowing of the input audio signal. For the MCT, the
   three second audio signal is partitioned into 40 segments. For each segment, ( M + 1) of the MCT
   coefficients are used for classification. The method is repeated 40 times, and an overall classification of
   “crash” is assigned to the signal if at least three of the segments are classified as “crash”. Like the RCT,
   the MCT is much more computationally intensive than the DWT, FFT, or DCT methods.
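            The nonlinear mel scale is commonly implemented with the mapping below (O'Shaughnessy's formula;
   the paper does not specify which variant its MCT used, so this is an assumption). It is roughly linear below
   1 kHz and logarithmic above, so equal mel steps cover increasingly wide frequency bands:

```python
import numpy as np

def hz_to_mel(f):
    """Map frequency in Hz to the mel scale (a common convention;
    assumed here, not taken from the paper)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

# 1 kHz maps to roughly 1000 mel; the octave above it spans far fewer
# mels than the first 1000 Hz, reflecting the compression of highs.
lo_span = hz_to_mel(1000.0) - hz_to_mel(0.0)
hi_span = hz_to_mel(2000.0) - hz_to_mel(1000.0)
```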

   C. Feature Reduction
   Whether the feature extraction is based on DWT, FFT, DCT, RCT, or MCT, the input to the automated
   statistical classifier is a feature vector. In order to optimize the feature vector, it is reduced using
   Fisher’s linear discriminant analysis (LDA) [20]. The reduced feature vector is then the input to a
   statistical classifier.
            The input to LDA is the audio signal's feature vector, \vec{F}. The output from LDA is an optimal
   linear-combination weight matrix, W, in the sense of maximizing the interclass variance and minimizing
   the intraclass variance. The weight matrix has a size of (M+1) \times (C-1), where C is the number of
   classes. With the weight matrix, the reduced feature vector, \vec{F}_r, can be computed as an optimal linear
   combination of the elements in the original feature vector, \vec{F}:

       \vec{F}_r = W^T \cdot \vec{F}.                    (11)
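            Fisher's LDA reduction can be sketched with NumPy as follows — a textbook implementation, not the
   authors' code, with regularization and numerical safeguards omitted. The learned W has C − 1 columns, so two
   classes yield a single reduced feature:

```python
import numpy as np

def lda_weights(X, y):
    """Fisher LDA projection matrix W of size (n_features, C - 1):
    maximizes between-class scatter relative to within-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))   # within-class scatter
    Sb = np.zeros_like(Sw)                    # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # Directions are eigenvectors of Sw^{-1} Sb; keep the top C - 1.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:len(classes) - 1]]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
y = np.array([0] * 30 + [1] * 30)
W = lda_weights(X, y)   # (5, 1) for this two-class toy problem
Fr = X @ W              # reduced features, per equation (11)
```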

   D. Classification
   Three types of statistical classifiers are investigated: nearest mean, maximum likelihood, and nearest
   neighbor [20]. All three are supervised classifiers. That is, the classifier must first be trained using data
   for which the user knows the correct classification. The nearest mean classifier is the simplest of the
   three. It is a parametric classifier that requires only first order statistics of the training data, i.e. class
   means. The maximum likelihood classifier is a parametric classifier that requires second-order statistics of
   the training data, i.e. class means and variances. It is more complicated to implement than the nearest
   mean; however, if the data is not symmetrically distributed, the maximum likelihood method will
   outperform the nearest mean classifier. The nearest neighbor classifier is non-parametric. All training
   data must be stored for use during the testing phase of the classifier.
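            The three classifiers can be sketched compactly; this is an illustrative NumPy version (a small ridge
   term is added to each covariance for numerical stability, which the paper does not mention):

```python
import numpy as np

def train_stats(X, y):
    """Per-class means and covariances from labeled training data."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(axis=0), np.cov(Xc.T) + 1e-6 * np.eye(X.shape[1]))
    return stats

def nearest_mean(x, stats):
    """Label of the closest class mean (first-order statistics only)."""
    return min(stats, key=lambda c: np.linalg.norm(x - stats[c][0]))

def max_likelihood(x, stats):
    """Gaussian maximum likelihood: class with the highest log density."""
    def loglik(c):
        m, S = stats[c]
        d = x - m
        return -0.5 * (d @ np.linalg.solve(S, d) + np.log(np.linalg.det(S)))
    return max(stats, key=loglik)

def nearest_neighbor(x, X_train, y_train):
    """1-NN: label of the closest training sample (all training data kept)."""
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
stats = train_stats(X, y)
x_new = np.array([3.8, 4.2])   # clearly belongs to the second class
```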

   E. System Testing
   The overall system is evaluated using the leave-one-out test. Consider the case where we have a finite
   number of audio signals in a database. The automated detection system, the classifier, must be trained
   and tested. In order to have unbiased results, the training and testing data must be mutually exclusive.
   The leave-one-out approach maximizes the use of the limited database. For each signal in the database,
   the following test is conducted. Remove the signal under investigation. On the remaining signals, where
   the classifications are known, compute the features. Use LDA to determine the optimum linear
   combination and reduce the dimensionality of the feature vectors. Utilize the classifier to compare the
   test signal to the training signals. Refer to the true class of the test signal, and determine if the automated
   classification was correct. This process is repeated for each audio signal in the database. The percentage
   of correct classifications is determined resulting in a final classification accuracy. Since the test signal is
   not used in the training, the use of the leave-one-out test leads to an unbiased classification accuracy [20]. As
   compared to jack-knifing, the leave-one-out test makes full use of the test data. This is important when
   the amount of test data is limited, which is often the case when using crash signals. To further account
   for limited test data in this study, a 95% confidence interval [21] was also computed and reported with
   the classification accuracies.
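            The leave-one-out procedure itself is a short loop. In this sketch a nearest-mean rule stands in for
   the full pipeline and the LDA reduction step is omitted for brevity; in the paper, the features and LDA weights
   are recomputed inside every fold:

```python
import numpy as np

def leave_one_out_accuracy(X, y, classify):
    """Each sample is held out in turn, the classifier is trained on the
    rest, and the held-out sample is classified; the fraction of correct
    labels is the unbiased accuracy estimate."""
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        correct += classify(X[keep], y[keep], X[i]) == y[i]
    return correct / len(X)

def nearest_mean_rule(X_tr, y_tr, x):
    """Stand-in classifier: label of the nearest class mean."""
    means = {c: X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)}
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (25, 3)), rng.normal(5, 1, (25, 3))])
y = np.array([0] * 25 + [1] * 25)
acc = leave_one_out_accuracy(X, y, nearest_mean_rule)
```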

   V. RESULTS AND DISCUSSION
   Several design parameters must be determined to create the automated accident detection system. First,
   the authors considered the different types of feature extraction methods, i.e. whether to use the DWT, FFT,
   DCT, RCT, or MCT-based methods. Within the DWT approach, a type of mother wavelet must be
   specified, i.e. the Haar, Daubechies, Coiflet, or Symlet classes of mother wavelets. The authors also
   investigated which type of statistical classifier (nearest mean, maximum likelihood, or nearest neighbor)
   performs best. Finally, the audio signals could be classified using a two-class system (crash or non-
   crash) or a multi-class system (crash, normal traffic, pile drive, braking). Both approaches are studied.
            Tables 1 and 2 show the maximum likelihood classification results for the DWT method when
   varying the type of mother wavelet. The data used for this experiment was normalized such that all three-
   second signals in the database have a maximum amplitude of one. Table 1 is for the two-class system,
   and Table 2 is for the multi-class system. Clearly, the DWT approach provides excellent results. The
   Haar, Daubechies4, Coiflets2, and Symlets8 types of mother wavelets typically performed the best, and
   this is demonstrated with the results shown in Table 1.
            Figure 2 shows the classification results for the DWT(Haar), FFT, DCT, RCT, and MCT
   methods for both the two-class and the multi-class systems. Again, the data used for this experiment was
   normalized such that all three second signals in the database have a maximum amplitude of one. The
   maximum likelihood classifier was used. Note that the two-class system always outperforms the multi-
   class system. This result is intuitive since all of the non-crash signals are combined into one category,
   greatly reducing the chances for misclassification. Also note that the wavelet and cepstral transform
   methods perform the best, achieving greater than 98% and 94% accuracy for the two-class and multi-
   class systems, respectively.
            Figure 3 shows the two-class and multi-class classification accuracies for the DWT(Haar) feature
   extraction method for each of the three types of statistical classifiers. For this experiment, the audio
   signals are manipulated to model various ambient noise conditions, as described in Section IV,
   specifically equation (7). The SNR’s were varied from –50dB to 50dB. Note that the –50dB case
   represents a scenario where the crash audio signal is very low in amplitude (quiet) as compared to the
   non-crash audio signals (loud). The 50dB case represents a scenario where the crash audio signal is very
   high in amplitude (loud) as compared to the non-crash audio signals (quiet). The 0 dB case models a
   scenario where the crash signals and non-crash signals have the same volume. From Figure 3, we can
   see that the classification accuracies decrease with decreasing SNR. This is intuitive since with
   decreasing SNR, the crash audio signal is becoming more and more like a non-crash audio signal. Note
   that regardless of SNR, the maximum likelihood classifier performs best.
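Equation (7) itself is not reproduced in this excerpt, but one standard way to realize the SNR manipulation described above is to rescale the crash signal against the background recording before summing. A sketch (the function name is our own):

```python
import numpy as np

def mix_at_snr(crash, background, snr_db):
    """Scale the crash signal so that its power relative to the
    background traffic noise matches the requested SNR in dB,
    then add it to the background."""
    n = min(len(crash), len(background))
    crash, background = crash[:n], background[:n]
    p_crash = np.mean(crash ** 2)
    p_noise = np.mean(background ** 2)
    # gain that makes 10*log10(P_scaled_crash / P_noise) == snr_db
    gain = np.sqrt(p_noise / p_crash * 10.0 ** (snr_db / 10.0))
    return gain * crash + background
```

At 0 dB the scaled crash and the background carry equal power, matching the "same volume" scenario discussed above.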
            Figure 4 shows a comparison of the five different feature extraction methods with respect to
   SNR. The Haar mother wavelet is used with the DWT method, and the maximum likelihood classifier is
   applied. It is apparent that the wavelet and cepstral approaches outperform the FFT and DCT methods.
   As long as the SNR is ≥0dB, the DWT and RCT methods detect accidents with an accuracy ≥95% and
   ≥90% for the two-class and multi-class systems, respectively. This result is very promising
   considering that the 0dB case is a conservative scenario (the crash sound and the
   background traffic noise are equally loud). The RCT outperforms the DWT method in most cases.
   However, the DWT method is much more computationally efficient than the RCT. That is, the RCT
   method provides superior classification accuracies but at a very high computational cost. Also, note that
   for the two-class system with SNR’s greater than 0dB, the RCT and DWT methods perform equally well.
   This is significant since the goal of the system is to detect traffic accidents in intersections, and a SNR
   greater than 0dB is more realistic. Also, consider the case of implementing the classification algorithms
   in a real-time system. The DWT method, especially when using the Haar mother wavelet, would be a
   much more practical choice.
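The computational gap noted above is easy to see if, as we assume here, the RCT denotes the standard real cepstrum: it requires two length-N FFT passes plus a logarithm, i.e. O(N log N) multiplies, whereas the full Haar DWT costs only O(N) additions and subtractions. A sketch of the real cepstrum:

```python
import numpy as np

def real_cepstrum(signal, eps=1e-12):
    """Real cepstral transform: inverse FFT of the log magnitude
    spectrum. The eps guard avoids log(0) on silent bins."""
    spectrum = np.fft.fft(signal)
    return np.fft.ifft(np.log(np.abs(spectrum) + eps)).real
```

The two transform calls dominate the cost, which is why the paper's preference for the Haar DWT in a real-time setting is plausible.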

   VI. CONCLUSIONS
   A system was designed and tested that uses audio signals for the automated detection of traffic accidents
   in intersections. Various feature extraction methods were investigated, including techniques based on the
   DWT, FFT, DCT, RCT, and MCT. Various statistical classifiers were also investigated, including
   nearest mean, maximum likelihood, and nearest neighbor. The system was tested on recorded audio
   signals of normal traffic and traffic accidents. The results showed that the maximum likelihood classifier
   performed best overall. In terms of classification accuracy, the RCT feature extraction method
   worked best. However, when the audio signal’s SNR was greater than 0dB, the RCT and DWT methods
   produced comparable accuracies, ≈99%. Moreover, the DWT approach was much more computationally
   efficient than the RCT method. Thus, if the classification algorithms were used to develop a real-time
   system, the DWT method would be preferred, particularly when using the Haar mother wavelet and the
   maximum likelihood classifier.
            For future research, the authors recommend collecting additional audio signals. These signals
   should be collected under a variety of environmental conditions, for example various weather,
   construction, and traffic conditions. Once a larger database of audio signals has been constructed,
   alternative classifier methods
   could be investigated, such as neural networks. Finally, the system should be implemented using in situ
   sensors and hardware, so the system can be tested in a real-time scenario.

   VII. ACKNOWLEDGEMENTS
   The authors would like to sincerely thank Dr. Charles Harlow from Louisiana State University for
   providing audio signals of traffic accidents for this project.




   Table 1. Maximum likelihood classification accuracies for the two-class system using DWT-based features.

           Mother          Crash       Non-crash      Overall      Confidence Interval
           Wavelet        Accuracy     Accuracy      Accuracy            (95%)
           Haar              1             1             1                  0
           Daubechies4       1             1             1                  0
           Daubechies15      1          0.9722        0.9798             0.0232
           Coiflets2       0.9259         1           0.9798             0.0232
           Coiflets5         1             1             1                  0
           Symlets2        0.964       0.9861        0.9798             0.0232
           Symlets8        0.9259         1           0.9794             0.0237




   Table 2. Maximum likelihood classification accuracies for the multi-class system using DWT-based
   features.

           Mother                                                Overall      Confidence Interval
           Wavelet         Crash      Normal    Piledrive/Brake  Accuracy           (95%)
           Haar            0.963        1            0.5          0.9394           0.0393
           Daubechies4     0.9259       1            0.8          0.9697           0.0283
           Daubechies15    0.9259     0.9839         0.8          0.9495           0.0361
           Coiflets2       0.9259       1            0.9          0.9697           0.0283
           Coiflets5       0.9259       1            0.9          0.9697           0.0283
           Symlets2        0.963        1            0.8          0.9697           0.0283
           Symlets8        0.963      0.9839        0.875         0.9691           0.0288


   [Figure 1 shows a block diagram: physical phenomenon → Sensor (microphone) → digital signal →
   Feature Extraction (DWT, FFT, DCT, RCT, or MCT based method) → feature vector → Feature
   Optimization (Fisher's LDA) → reduced feature vector → Classifier (nearest mean, maximum
   likelihood, or nearest neighbor) → label: crash or non-crash]

                          Figure 1. Block diagram of automated accident detection.
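The Fisher's LDA block in Figure 1 reduces the feature vector before classification. For the two-class case, the textbook Fisher discriminant direction can be sketched as follows (this is the standard formulation; the paper's exact implementation is not shown in this excerpt, and the regularization term is ours):

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher's linear discriminant: the projection direction
    w = Sw^{-1} (m1 - m0) that maximizes between-class separation
    relative to within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrices (unnormalized covariances)
    S0 = np.cov(X0, rowvar=False) * (len(X0) - 1)
    S1 = np.cov(X1, rowvar=False) * (len(X1) - 1)
    Sw = S0 + S1 + 1e-6 * np.eye(X0.shape[1])   # ridge for stability
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting every feature vector onto this direction yields the reduced (here one-dimensional) feature passed on to the statistical classifier.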


   [Figure 2 shows a bar chart of overall classification accuracy (80% to 100%) versus feature extraction
   method (DWT, RCT, MCT, FFT, DCT), with bars for the two-class and multi-class systems]

       Figure 2. Maximum likelihood classification accuracies for two-class and multi-class systems.


   [Figure 3 shows two panels, "Two-Class System" and "Multi-Class System", plotting overall
   classification accuracy (40% to 100%) versus SNR (50 dB down to –50 dB) for the nearest mean,
   maximum likelihood, and nearest neighbor classifiers]

                 Figure 3. DWT-based feature extraction using Haar mother wavelet.


   [Figure 4 shows two panels, "Two-Class System" and "Multi-Class System", plotting overall
   classification accuracy (40% to 100%) versus SNR (50 dB down to –50 dB) for the DWT (Haar), FFT,
   DCT, RCT, and MCT feature extraction methods]

       Figure 4. Maximum likelihood classification accuracies for various feature extraction methods.
