Automated Accident Detection in Intersections via Digital Audio Signal Processing

Lori Mann Bruce, Navaneethakrishnan Balraj
Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39762
Phone: (662) 325-8430 Fax: (662) 325-2298 Email: bruce@ece.msstate.edu

Yunlong Zhang, Qingyong Yu
Department of Civil Engineering, Mississippi State University, Starkville, MS 39762
Phone: (662) 325-9838 Fax: (662) 325-7189 Email: zhang@engr.msstate.edu

Total number of words: 4616 (text) + 250*6 (tables and figures) = 6116

TRB 2003 Annual Meeting CD-ROM. Paper revised from original submittal.

ABSTRACT
In this paper, the authors design a system for automated traffic accident detection in intersections. The input to the system is a three-second segment of audio signal. The system can be operated in two modes: two-class and multi-class. The output of the two-class mode is a label of “crash” or “non-crash”. In the multi-class mode of operation, the system identifies crashes as well as several types of non-crash incidents, including normal traffic and construction sounds. The system is composed of three main signal processing stages: feature extraction, feature reduction, and classification. Five methods of feature extraction are investigated and compared; these are based on the discrete wavelet transform, fast Fourier transform, discrete cosine transform, real cepstral transform, and mel frequency cepstral transform. Statistical methods are used for feature optimization and classification. Three types of classifiers are investigated and compared: the nearest mean, maximum likelihood, and nearest neighbor methods. The results of the study show that the wavelet-based features in combination with the maximum likelihood classifier constitute the optimum design.
The system is computationally inexpensive relative to the other methods investigated, and it consistently achieves accident detection accuracies of 95% to 100% when the audio signal has a signal-to-noise ratio of at least 0 decibels.

KEY WORDS: Accident detection, Audio, Acoustic, Wavelet transform, Statistical classifier

I. INTRODUCTION
Traffic incidents are a major cause of congestion on the nation's highways. Approximately 50 to 60 percent of the delay on urban freeways is associated with incidents [1, 2]. It is estimated that incident-related congestion will cost the U.S. public $75 billion in lost productivity and 8.4 billion gallons of wasted fuel in the year 2005 [3]. Among the various types of traffic incidents, vehicle accidents play a significant role in causing non-recurring congestion. Thus, a key strategy for reducing delay and the adverse impact of accidents on traffic in major urban areas is to handle accidents as quickly as possible to keep traffic flowing. An accident must be detected and verified before any other incident management actions can be taken, so timely and accurate accident detection is central to the success of any incident management process. On urban surface streets, the majority of traffic accidents occur at or near intersections. While freeway incident detection research dates back to the early 1960's, accident detection research on surface streets has lagged and needs more attention. This paper describes the design and testing of a system for automated accident detection at intersections. The system takes advantage of audio signals to recognize accidents from the ambient noise. The system is comprised of feature extraction, feature reduction, and classification stages.
In particular, the authors investigate the use of the discrete wavelet transform for feature extraction, an approach that has proven successful in speech recognition and computer vision [4]. The model is developed specifically to classify audio signals into crash and non-crash categories.

II. BACKGROUND
Various types of sensors have been used for the detection of traffic accidents, including video cameras, ultrasonic transmitter/receivers, and microwave transmitter/receivers [5, 6, 7]. Typically with video images, researchers use methods such as object segmentation and motion estimation to track moving vehicles. These methods are used to detect incidents that have caused the flow of traffic to stop. Recently, researchers have developed an algorithm that uses Gabor wavelet analysis and the Mallat wavelet transform for fast, accurate vehicle tracking [5]. The image flow vectors can be quickly estimated using low spatial resolution data, and vehicles can be accurately detected using high spatial resolution data. Even in complex motion scenarios, the system could detect arbitrary numbers of small, disconnected objects. Audio sensors, such as simple microphones, can be a cost-effective, computationally efficient alternative. Researchers have utilized road-side microphones to collect audio signals which were then used to detect passing vehicles, i.e., to serve as vehicle counters [8, 9]. Chen et al. developed an algorithm using cross-correlation for the analysis of audio signals collected from microphone arrays [10]. The algorithm predicted road traffic conditions, including traffic speed and density. Vehicle classification has also been accomplished using audio signals. Nooralahiyan et al. developed a vehicle classification system that utilized linear predictive coding along with a time delay neural network [11]. The system classified audio signals from passing vehicles according to four basic vehicle types.
Recently, Harlow and Wang utilized audio signals to detect traffic accidents in intersections [12]. The researchers developed a two-phase automated detection system. First, mel frequency cepstral coefficients were extracted from the signal and used to form a feature vector. Second, the vector was input to a neural network for classification. The final output was a label of “crash” or “non-crash”. For non-stationary events, such as accident sounds in a traffic audio signal, wavelet analysis could prove advantageous. Wavelet transforms have been used as a preprocessing step, or feature extraction technique, with audio signals in non-traffic applications. These include human speech recognition [13] and industrial machinery monitoring [14]. The wavelet transform of a signal results in a set of detail and approximation coefficients. These coefficients provide a means of separating fine-scale (very localized) behavior from large-scale (global) behavior in the audio signals. With the wavelet transform of traffic audio signals, the detail coefficients, representing fine-scale behavior, could be used to identify non-stationary incidents, such as traffic accidents. Likewise, the approximation coefficients, representing large-scale behavior, could be used to identify normal background traffic sounds. In this paper, the authors investigate the use of wavelet transforms for automated traffic accident detection in intersections. The discrete wavelet transform (DWT) is used for feature extraction, and statistical methods are used for feature optimization and classification. The input to the system is a three-second segment of audio signal. The system can be operated in two modes: two-class and multi-class. The output of the two-class mode is a label of “crash” or “non-crash”. A system block diagram is shown in Figure 1.
In the multi-class mode of operation, the system identifies crashes as well as several types of non-crash incidents, including normal traffic and construction sounds. The wavelet-based features are compared to various other methods, including real cepstral coefficients, mel frequency cepstral coefficients, fast Fourier transform coefficients, and discrete cosine transform coefficients. Furthermore, various classifiers are investigated and compared; these include the nearest mean, maximum likelihood, and nearest neighbor methods. A brief background of wavelet analysis is provided in the next section. In Section IV, the authors provide a detailed description of the methodologies utilized in the automated system. The authors also provide a description of the data collection protocol and the types of data used to test the system. Classification results are provided in Section V. Finally, conclusions are drawn and suggestions for future research are provided in Section VI.

III. WAVELET TRANSFORMS
Wavelet analysis is based on the idea of projecting a signal onto a set of basis functions. A set of wavelet basis functions, {ψ_{a,b}(t)}, can be generated by shifting and scaling the basic or mother wavelet, ψ(t), according to

ψ_{a,b}(t) = (1/√a) ψ((t − b)/a),   (1)

where a > 0 and b are real numbers. The variable a is the scaling factor of a particular basis function, and b is the translation variable along the function's range. When a > 1, the functions are dilated, and when a < 1, the functions are contracted. The coefficient 1/√a is included to normalize the energy of the wavelets. All of the wavelets {ψ_{a,b}(t)} generated by shifting and scaling the mother wavelet, ψ(t), have the same basic shape. All wavelet functions must oscillate, have an average value of zero, and have finite support.
This “admissibility condition” can be represented by

∫_{−∞}^{+∞} |ℑ(ψ(t))|² / |s| ds < ∞,   (2)

where ℑ(·) denotes the Fourier transform and s is the Fourier domain variable. An important property of many wavelet systems is the multiresolution analysis (MRA) property, where the decomposition of a signal is in terms of the resolution of detail signals [15]. If a wavelet basis satisfies the MRA criteria, its transform can be implemented via multiresolutional filter trees. This type of implementation is very useful because it allows for fast algorithms similar to the well-known fast Fourier transform. There exist many different types of mother wavelets and wavelet bases. The Haar wavelet is one of the simplest examples. The Haar wavelet is discontinuous, and it resembles a step function:

ψ(t) = 1 for 0 ≤ t < 1/2; ψ(t) = −1 for 1/2 ≤ t < 1; ψ(t) = 0 otherwise.   (3)

Also, a well-known family of wavelets was developed by Ingrid Daubechies [16], and they are generally referred to as Daubechies-n, where n is the “order” of the mother wavelet. The order corresponds to the regularity of the mother wavelet, and the Daubechies-1 wavelet is equivalent to the Haar wavelet. For this study, the authors investigate the use of many different mother wavelets, including the Coiflet, Symlet, biorthogonal, and Daubechies families. The continuous wavelet transform (CWT), denoted W_f(a, b), of a function f(t) with respect to the wavelet basis function ψ_{a,b}(t) can be defined as

W_f(a, b) = ∫_{−∞}^{∞} f(t) ψ_{a,b}(t) dt,   (4)

where the wavelet function ψ_{a,b}(t) is given by equation (1). For the CWT, the scale parameter, a, and the shift parameter, b, are specified as real numbers. Hence, the transform coefficients, W_f(a, b), are continuous with respect to the variables a and b. For the DWT, the discrete wavelet basis functions are represented as

ψ_{j,k}(n) = 2^{−j/2} ψ(2^{−j} n − k),   (5)

and the wavelet coefficients are obtained by the inner product

W_{j,k} = ⟨f(n), ψ_{j,k}(n)⟩.
(6)

Thus, the scales are a = 2, 4, 8, …, 2^j, …, 2^p. Note that the audio signal and wavelets are now functions of discrete time, n, rather than continuous time, t. In this project, f(n) is the digitized audio signal of traffic noises. The DWT has been extensively used in the development of fast wavelet algorithms. The most common implementation of the DWT is the well-known dyadic filter tree [17]. The filter tree is composed of highpass and lowpass filters corresponding to the user's selection of mother wavelet function. At each stage, j, of the filter tree, a set of approximation coefficients, A_j, and detail coefficients, D_j, are produced, corresponding to the input signal's large- and small-scale behavior, respectively.

IV. METHODOLOGIES
A. Data Collection and Preprocessing
Audio data was collected at intersections in Starkville, MS and Jackson, MS. Also, crash audio data was provided by Dr. Charles Harlow of Louisiana State University. The audio data was collected using a Sony TCD-D8 digital audio tape recorder. The sampling specifications were mono-channel, 22050 Hz sampling rate, and 8-bit resolution. The signals were partitioned using a 3-second rectangular window; this resulted in each audio signal having a length of 66176 samples. The training and test data consisted of 27 crash, 8 pile drive, 2 brake, and 62 normal traffic signals. When the last three classes of data are combined, this paper refers to the data as “crash” and “non-crash”. Before any processing was conducted, all signals were normalized such that each signal's maximum amplitude was one. In order to mimic practical scenarios, the crash signals were combined with the non-crash signals. Assume f_nc(n) is the recorded non-crash signal, f_c(n) is the recorded crash signal, and α is a weighting variable.
Then the new, realistic crash signal, f̃_c(n), was computed as

f̃_c(n) = f_nc(n) + α · f_c(n).   (7)

The authors investigated the automated system's sensitivity to crash proximity. It was assumed that crashes occurring nearer the microphone (directly in the intersection) would be louder in volume than crashes occurring farther away (in a street entering the intersection). To model the various scenarios, the weight α was varied. The resulting crash signal had a signal-to-noise ratio (SNR) ranging from −50 decibels (dB) to +50 dB. When computing the SNR of f̃_c(n), the “signal” component was α · f_c(n) and the “noise” component was f_nc(n).

B. Feature Extraction
Various methods of feature extraction are investigated and compared. The first method of feature extraction is based on the DWT. A feature vector, F, is formed by computing the root-mean-square (RMS) energy of the wavelet detail coefficients at each scale, ED_j, and the RMS energy of the wavelet approximation coefficients at the final scale, EA_M:

F = [EA_M  ED_M  ED_{M−1}  …  ED_1]^T,   (8)

where the superscript T denotes a vector transpose,

EA_M = sqrt( (1/P^A_M) Σ_{i=0}^{P^A_M − 1} [A_M(i)]² ),   (9)

and

ED_j = sqrt( (1/P^D_j) Σ_{i=0}^{P^D_j − 1} [D_j(i)]² ),   (10)

for j = 1, 2, 3, …, M. The constant M is the maximum wavelet decomposition level; P^D_j is the number of detail coefficients at level j; and P^A_M is the number of approximation coefficients at level M. According to equation (8), after the DWT feature extraction, the data space dimension is reduced to M + 1. The value of M is determined by the wavelet filter length and the original signal length. At most, M = log₂(N), where N is the length of the original signal and the mother wavelet's high-pass and low-pass filters have length 2, as is the case with the Haar mother wavelet.
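As a concrete illustration, the Haar case of the feature construction in equations (8)-(10) can be sketched in a few lines of pure Python. This is a minimal sketch for illustration only, not the authors' implementation; the function names are the writer's own.

```python
import math

def haar_step(x):
    """One stage of the Haar dyadic filter tree: lowpass (approximation)
    and highpass (detail) outputs, each downsampled by two."""
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def rms(c):
    """Root-mean-square energy of a coefficient vector, as in eqs. (9)-(10)."""
    return math.sqrt(sum(v * v for v in c) / len(c))

def dwt_features(signal, levels):
    """Feature vector F = [EA_M, ED_M, ..., ED_1]^T of eq. (8): RMS energy of
    the detail coefficients at each scale plus the final approximation."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)                     # details[j-1] holds D_j
    ed = [rms(d) for d in reversed(details)]  # ED_M, ..., ED_1
    return [rms(approx)] + ed

# Toy example: a 16-sample signal with M = 4 levels gives M + 1 = 5 features.
sig = [math.sin(2 * math.pi * i / 8) for i in range(16)]
features = dwt_features(sig, 4)
print(len(features))  # 5
```

Because the Haar basis is orthonormal, the filter tree preserves the signal's total energy across scales, which is what makes the per-scale RMS values a meaningful energy partitioning.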
However, for longer mother wavelets, such as the higher order Daubechies or biorthogonal mother wavelets, the maximum scale M can be much lower. By computing the RMS of the DWT coefficients at each scale, the features represent a scalar energy partitioning of the original signal. From a digital signal processing perspective, scale is similar to frequency. However, scale is not equivalent to frequency, since the basis functions of the DWT are not necessarily sinusoids. Based on one's selection of mother wavelet, the scalar energy partitioning may be optimized. However, is this advantageous compared to simply using frequency analysis of the audio signals? To answer this question, the DWT approach was compared to feature extraction based on the fast Fourier transform (FFT). The second method of feature extraction was to simply use the magnitudes of the FFT coefficients as features. To fairly compare the DWT-based features with FFT-based features, the order of the FFT was varied. The order of the FFT was selected such that the number of resulting FFT coefficients, and thus features, was equivalent to the number of DWT-based features. The third method of feature extraction was based on the discrete cosine transform (DCT) [18]. The DCT coefficients were used directly as the features. Likewise, the order of the DCT was varied such that the number of resulting DCT coefficients, and thus features, was equivalent to the number of DWT-based features. The DCT was investigated because it is similar to the FFT in that it is used for frequency analysis of the input signal. However, its resulting coefficients are real, whereas the FFT's coefficients are complex. The fourth and fifth methods of feature extraction were based on the cepstral transform [19]. Cepstral analysis is based on homomorphic transformations, where logarithms are used to convert multiplicative terms into additive terms.
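This homomorphic idea, taking the logarithm of the spectrum magnitude so that multiplicative spectral components become additive, can be illustrated with a minimal real-cepstrum sketch: c = IDFT(log|DFT(x)|). The sketch below uses a direct O(N²) DFT in pure Python; the frame length, coefficient count, and lack of windowing are toy choices for illustration, not the project's settings.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(N^2), fine for a short sketch)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def real_cepstrum(frame, eps=1e-12):
    """c[n] = IDFT(log|DFT(x)|); eps guards against log of a zero magnitude."""
    spec = dft(frame)
    log_mag = [math.log(abs(X) + eps) for X in spec]
    N = len(frame)
    # Inverse DFT of the (real, even) log-magnitude sequence; keep the real part.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

# Toy 64-sample "frame"; the first few coefficients would serve as features.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
coeffs = real_cepstrum(frame)[:13]
print(len(coeffs))  # 13
```

In practice a fast Fourier transform would replace the direct DFT, and each windowed frame of the audio signal would be processed this way.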
Cepstral analysis has been utilized extensively in speech recognition, and more recently by Harlow and Wang for automated accident detection using audio signals [12]. Cepstral transforms require windowing of the input audio signal, which is three seconds long in this project. The real cepstral transform (RCT) utilizes Hamming windows to segment the input audio signal into 100 ms intervals with 50 ms overlap. For each signal segment, the RCT results in a set of ordered coefficients. Typically in speech processing, the first 12 to 20 coefficients are utilized [19]. In this project, the first (M + 1) coefficients are used as features, since the DWT approach results in (M + 1) features. Then, for each signal segment, the RCT coefficients are used as features, and statistical feature reduction and classification are conducted. If a “crash” is detected within at least three of the signal segments, an overall classification of “crash” is assigned to the three-second signal. Note that this requires the RCT, as well as the feature reduction and classification, to be completed 60 times. Consequently, the computational requirements for the cepstral method are much greater than for the DWT, FFT, and DCT methods. The fifth method of feature extraction was based on the mel frequency cepstral transform (MCT) [19]. The MCT uses a nonlinear frequency scale, whereas the RCT uses a linear frequency scale. The nonlinear scale approximates the behavior of the human auditory system. In speech recognition applications, the MCT has been shown to provide superior classification accuracies over the RCT [19]. As with the RCT, the MCT method requires windowing of the input audio signal. For the MCT, the three-second audio signal is partitioned into 40 segments. For each segment, (M + 1) of the MCT coefficients are used for classification.
The method is repeated 40 times, and an overall classification of “crash” is assigned to the signal if at least three of the segments are classified as “crash”. Like the RCT, the MCT is much more computationally intensive than the DWT, FFT, or DCT methods.

C. Feature Reduction
Whether the feature extraction is based on the DWT, FFT, DCT, RCT, or MCT, the input to the automated statistical classifier is a feature vector. In order to optimize the feature vector, it is reduced using Fisher's linear discriminant analysis (LDA) [20]. The reduced feature vector is then the input to a statistical classifier. The input to LDA is the audio signal's feature vector, F. The output from LDA is a linear combination weight matrix, W, that is optimal in the sense of maximizing the interclass variance and minimizing the intraclass variance. The weight matrix has a size of (M + 1) × (C − 1), where C is the number of classes. With the weight matrix, the reduced feature vector, F_r, can be computed as an optimal linear combination of the elements in the original feature vector, F:

F_r = W^T · F.   (11)

D. Classification
Three types of statistical classifiers are investigated: nearest mean, maximum likelihood, and nearest neighbor [20]. All three are supervised classifiers. That is, the classifier must first be trained using data for which the user knows the correct classification. The nearest mean classifier is the simplest of the three. It is a parametric classifier that requires only first-order statistics of the training data, i.e., the class means. The maximum likelihood classifier is a parametric classifier that requires second-order statistics of the training data, i.e., the class means and variances. It is more complicated to implement than the nearest mean; however, if the data is not symmetrically distributed, the maximum likelihood method will outperform the nearest mean classifier. The nearest neighbor classifier is non-parametric.
All training data must be stored for use during the testing phase of the classifier.

E. System Testing
The overall system is evaluated using the leave-one-out test. Consider the case where we have a finite number of audio signals in a database. The automated detection system, i.e., the classifier, must be trained and tested. In order to have unbiased results, the training and testing data must be mutually exclusive. The leave-one-out approach maximizes the use of the limited database. For each signal in the database, the following test is conducted. Remove the signal under investigation. On the remaining signals, for which the classifications are known, compute the features. Use LDA to determine the optimum linear combination and reduce the dimensionality of the feature vectors. Utilize the classifier to compare the test signal to the training signals. Refer to the true class of the test signal, and determine whether the automated classification was correct. This process is repeated for each audio signal in the database. The percentage of correct classifications is determined, resulting in a final classification accuracy. Since the test signal is not used in the training, the leave-one-out test leads to an unbiased classification accuracy [20]. As compared to jack-knifing, the leave-one-out test makes full use of the test data. This is important when the amount of test data is limited, which is often the case with crash signals. To further account for limited test data in this study, a 95% confidence interval [21] was also computed and reported with the classification accuracies.

V. RESULTS AND DISCUSSION
Several design parameters must be determined to create the automated accident detection system.
First, the authors considered the different feature extraction methods, i.e., whether to use the DWT, FFT, DCT, RCT, or MCT-based methods. Within the DWT approach, a type of mother wavelet must be specified, e.g., the Haar, Daubechies, Coiflet, or Symlet classes of mother wavelets. The authors also investigated which type of statistical classifier (nearest mean, maximum likelihood, or nearest neighbor) performs best. Finally, the audio signals could be classified using a two-class system (crash or non-crash) or a multi-class system (crash, normal traffic, pile drive, braking). Both approaches are studied. Tables 1 and 2 show the maximum likelihood classification results for the DWT method when varying the type of mother wavelet. The data used for this experiment was normalized such that all three-second signals in the database have a maximum amplitude of one. Table 1 is for the two-class system, and Table 2 is for the multi-class system. Clearly, the DWT approach provides excellent results. The Haar, Daubechies4, Coiflets2, and Symlets8 types of mother wavelets typically performed the best, as demonstrated by the results shown in Table 1. Figure 2 shows the classification results for the DWT (Haar), FFT, DCT, RCT, and MCT methods for both the two-class and the multi-class systems. Again, the data used for this experiment was normalized such that all three-second signals in the database have a maximum amplitude of one. The maximum likelihood classifier was used. Note that the two-class system always outperforms the multi-class system. This result is intuitive, since all of the non-crash signals are combined into one category, greatly reducing the chances for misclassification. Also note that the wavelet and cepstral transform methods perform the best, achieving greater than 98% and 94% accuracy for the two-class and multi-class systems, respectively.
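For readers who wish to reproduce the evaluation protocol, the leave-one-out test of Section IV.E paired with the simplest of the three classifiers (nearest mean) can be sketched as follows. The two-dimensional features and labels below are invented for illustration; the paper's actual features come from the transforms described above, after LDA reduction.

```python
import math

def nearest_mean_predict(train, test_vec):
    """Nearest mean classifier: assign the class whose training mean
    (first-order statistics only) is closest in Euclidean distance."""
    by_class = {}
    for label, vec in train:
        by_class.setdefault(label, []).append(vec)
    best, best_d = None, float("inf")
    for label, vecs in by_class.items():
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        d = math.dist(mean, test_vec)
        if d < best_d:
            best, best_d = label, d
    return best

def leave_one_out_accuracy(dataset):
    """Hold out each labeled feature vector in turn, train on the rest,
    and score the held-out prediction (Section IV.E)."""
    correct = 0
    for i, (label, vec) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]
        if nearest_mean_predict(train, vec) == label:
            correct += 1
    return correct / len(dataset)

# Toy two-class data: well-separated "crash" vs. "non-crash" feature clusters.
data = [("crash", [5.0, 5.1]), ("crash", [4.8, 5.3]), ("crash", [5.2, 4.9]),
        ("non-crash", [1.0, 1.2]), ("non-crash", [0.9, 0.8]), ("non-crash", [1.1, 1.0])]
print(leave_one_out_accuracy(data))  # 1.0
```

Swapping in a maximum likelihood or nearest neighbor rule only changes `nearest_mean_predict`; the leave-one-out loop, which is what keeps the accuracy estimate unbiased, stays the same.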
Figure 3 shows the two-class and multi-class classification accuracies for the DWT (Haar) feature extraction method for each of the three types of statistical classifiers. For this experiment, the audio signals are manipulated to model various ambient noise conditions, as described in Section IV, specifically equation (7). The SNR's were varied from −50 dB to 50 dB. Note that the −50 dB case represents a scenario where the crash audio signal is very low in amplitude (quiet) as compared to the non-crash audio signals (loud). The 50 dB case represents a scenario where the crash audio signal is very high in amplitude (loud) as compared to the non-crash audio signals (quiet). The 0 dB case models a scenario where the crash signals and non-crash signals have the same volume. From Figure 3, we can see that the classification accuracies decrease with decreasing SNR. This is intuitive, since with decreasing SNR, the crash audio signal becomes more and more like a non-crash audio signal. Note that regardless of SNR, the maximum likelihood classifier performs best. Figure 4 shows a comparison of the five feature extraction methods with respect to SNR. The Haar mother wavelet is used with the DWT method, and the maximum likelihood classifier is utilized. It is apparent that the wavelet and cepstral approaches outperform the FFT and DCT methods. As long as the SNR is ≥ 0 dB, the DWT and RCT methods detect accidents with an accuracy ≥ 95% and ≥ 90% for the two-class and the multi-class systems, respectively. This result is very promising, considering that the 0 dB case is a very conservative scenario (the crash sound and the background traffic noise are equally loud). The RCT outperforms the DWT method in most cases. However, the DWT method is much more computationally efficient than the RCT.
That is, the RCT method provides superior classification accuracies but at a very high computational cost. Also, note that for the two-class system with SNR's greater than 0 dB, the RCT and DWT methods perform equally well. This is significant since the goal of the system is to detect traffic accidents in intersections, and an SNR greater than 0 dB is more realistic. Also, consider the case of implementing the classification algorithms in a real-time system. The DWT method, especially when using the Haar mother wavelet, would be a much more practical choice.

VI. CONCLUSIONS
A system was designed and tested that uses audio signals for automated detection of traffic accidents in intersections. Various feature extraction methods were investigated, including techniques based on the DWT, FFT, DCT, RCT, and MCT. As well, various statistical classifiers were investigated, including nearest mean, maximum likelihood, and nearest neighbor. The system was tested on recorded audio signals of normal traffic and traffic accidents. The results showed that the maximum likelihood classifier produced the best results. In terms of overall classification accuracy, the RCT feature extraction method worked best. However, when the audio signal's SNR was greater than 0 dB, the RCT and DWT methods produced comparable accuracies, ≈99%. Moreover, the DWT approach was much more computationally efficient than the RCT method. Thus, if the classification algorithms were used to develop a real-time system, the DWT method would be preferred, particularly when using the Haar mother wavelet and the maximum likelihood classifier. For future research, the authors recommend collecting more audio signals. These signals should be collected under a variety of environmental conditions, for example various weather, construction, and traffic conditions. Once a larger database of audio signals has been constructed, alternative classifier methods could be investigated, such as neural networks.
Finally, the system should be implemented using in situ sensors and hardware, so the system can be tested in a real-time scenario.

VII. ACKNOWLEDGEMENTS
The authors would like to sincerely thank Dr. Charles Harlow of Louisiana State University for providing audio signals of traffic accidents for this project.

VIII. REFERENCES
[1] J. Lindley, “Urban Freeway Congestion: Quantification of the Problem and Effectiveness of Potential Solution,” ITE Journal, January 1987.
[2] Cambridge Systematics, “Incident Management,” October 1990.
[3] FHWA, “Incident Management Successful Practices: A Cross-Cutting Study,” April 2000.
[4] L. Tang, L.F. Tian, B.L. Steward, and J.F. Reid, “Texture-Based Weed Classification Using Gabor Wavelets and Neural Network for Real-time Selective Herbicide Applications,” Proc. American Society of Agricultural Engineers, Paper No. 991151 (UILU No. 99-7035).
[5] K. Subramaniam, S.S. Daly, and F.C. Rind, “Wavelet transforms for use in motion detection and tracking application,” Proc. Seventh Int. Conf. on Image Processing and Its Applications, vol. 2, pp. 711-715, 1999.
[6] K.W. Dickinson and C.L. Wan, “An evaluation of microwave vehicle detection at traffic signal controlled intersection,” Proc. Third Int. Conf. on Road Traffic Control, pp. 153-157, 1990.
[7] I. Ohe, H. Kawashima, M. Kojima, and Y. Kaneko, “A Method for Automatic Detection of Traffic Incidents Using Neural Networks,” Proc. Vehicle Navigation and Information Systems Conf., pp. 231-235, 1995.
[8] E.M. Brockmann, B.W. Kwan, and L.J. Tung, “Audio detection of moving vehicles,” Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics - Computational Cybernetics and Simulation, vol. 4, pp. 3817-3821, 1997.
[9] J.F. Forren and D. Jaarsma, “Traffic monitoring by tire noise,” Proc. IEEE Conf. on Intelligent Transportation Systems, pp. 177-182, 1997.
[10] S. Chen, Z.P. Sun, and B. Bridge, “Automatic traffic monitoring by intelligent sound detection,” Proc. IEEE Conf. on Intelligent Transportation Systems, pp. 171-176, 1997.
[11] A.Y. Nooralahiyan, H.R. Kirby, and D. McKeown, “Vehicle classification by acoustic signature,” Mathematical and Computer Modelling, vol. 27, no. 9-11, pp. 205-214, 1998.
[12] C. Harlow and Y. Wang, “Automated Accident Detection,” Proc. Transportation Research Board 80th Annual Meeting, pp. 90-93, 2001.
[13] S. Kadambe and G.F. Boudreaux-Bartels, “Application of the wavelet transform for pitch detection of speech signals,” IEEE Trans. on Information Theory, vol. 38, no. 2, part 2, pp. 917-924, 1992.
[14] M.J. Dowling, “Application of non-stationary analysis to machine monitoring,” Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, pp. 59-62, 1993.
[15] C.S. Burrus, R.A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms: A Primer, 1st ed., Prentice-Hall, New Jersey, 1998.
[16] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.
[17] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996.
[18] A.V. Oppenheim and R.W. Schafer, Discrete-Time Signal Processing, 2nd ed., pp. 589-598, Prentice-Hall, 1999.
[19] X. Huang, A. Acero, and H. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, pp. 306-318, Prentice-Hall, 2001.
[20] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley-Interscience, 2001.
[21] L.D. Fisher and G. van Belle, Biostatistics: A Methodology for the Health Sciences, John Wiley & Sons, Inc., 1993.

Table 1. Maximum likelihood classification accuracies for the two-class system using DWT-based features.
Mother Wavelet | Crash Accuracy | Non-crash Accuracy | Overall Accuracy | Confidence Interval (95%)
Haar | 1 | 1 | 1 | 0
Daubechies4 | 1 | 1 | 1 | 0
Daubechies15 | 1 | 0.9722 | 0.9798 | 0.0232
Coiflets2 | 0.9259 | 1 | 0.9798 | 0.0232
Coiflets5 | 1 | 1 | 1 | 0
Symlets2 | 0.964 | 0.9861 | 0.9798 | 0.0232
Symlets8 | 0.9259 | 1 | 0.9794 | 0.0237

Table 2. Maximum likelihood classification accuracies for the multi-class system using DWT-based features.

Mother Wavelet | Crash | Normal | Piledrive/Brake | Overall Accuracy | Confidence Interval (95%)
Haar | 0.963 | 1 | 0.5 | 0.9394 | 0.0393
Daubechies4 | 0.9259 | 1 | 0.8 | 0.9697 | 0.0283
Daubechies15 | 0.9259 | 0.9839 | 0.8 | 0.9495 | 0.0361
Coiflets2 | 0.9259 | 1 | 0.9 | 0.9697 | 0.0283
Coiflets5 | 0.9259 | 1 | 0.9 | 0.9697 | 0.0283
Symlets2 | 0.963 | 1 | 0.8 | 0.9697 | 0.0283
Symlets8 | 0.963 | 0.9839 | 0.875 | 0.9691 | 0.0288

Figure 1. Block diagram of automated accident detection: a sensor (microphone) produces a digital signal; feature extraction (DWT, FFT, DCT, RCT, or MCT based method) produces a feature vector; feature optimization (Fisher's LDA) produces a reduced feature vector; and a classifier (nearest mean, maximum likelihood, or nearest neighbor) outputs the label “crash” or “non-crash”.

Figure 2. Maximum likelihood classification accuracies for two-class and multi-class systems, by feature extraction method (DWT, RCT, MCT, FFT, DCT). [Bar chart; vertical axis: overall classification accuracy, 80%-100%.]
Figure 3. DWT-based feature extraction using the Haar mother wavelet, for the nearest mean, maximum likelihood, and nearest neighbor classifiers. [Two panels: two-class system and multi-class system; vertical axis: overall classification accuracy, 40%-100%; horizontal axis: signal-to-noise ratio, 50 dB down to -50 dB.]

Figure 4. Maximum likelihood classification accuracies for various feature extraction methods: DWT (Haar), FFT, DCT, RCT, and MCT. [Two panels: two-class system and multi-class system; vertical axis: overall classification accuracy, 40%-100%; horizontal axis: signal-to-noise ratio, 50 dB down to -50 dB.]