Acoustic Classification Using Independent Component Analysis
James Brock
May 17, 2006

Motivation
• Digital Media Organization
  • Online music database categorization
  • Content-based querying
  • Automatic generation of meta-data
• Scene Aspect Recognition
• World Language Recognition

Overview
• How to determine the class of acoustic information:
  • Get a representative sample of the data
  • Extract low-level features
  • Extract high-level patterns in those features
  • Separate the characteristic patterns of different groups of acoustic information

Outline
• Low-level feature extraction
  • Mel-Frequency Cepstral Coefficients
• High-level pattern extraction
  • Principal Component Analysis
  • Independent Component Analysis
• Previous Work
• My Implementation
• Conclusions / Future Work

Auditory Cognition
• Low-level auditory cognition
  • Outer, middle, and inner ear
  • The cochlea converts sound waves into electrical signals interpretable by the auditory cortex
• Humans can distinguish low-frequency sound with much better definition than high-frequency sound

Auditory Cognition (cont'd)
• How do humans interpret higher-level characteristics of sound?
• "What" & "Where" channels
  • "Where" interprets spatial information
  • "What" interprets frequency & temporal information

Low-level Acoustic Processing
• The simplest known functions of the human auditory system
• Occur when the cochlea converts sound waves into neurological signals
• Distinguish frequency & amplitude for a single instant of time
• Pass that information on to higher-level cognitive processes (auditory cortex)

Mel-Frequency Cepstral Coefficients (MFCCs)
• MFCCs have been widely used in audio processing to extract low-level features
• Extremely popular in speech recognition
• Based on the Mel scale of band-pass filter banks & the Discrete Fourier Transform (DFT)
• Produce a set of n coefficients for each window of time, corresponding to relative frequency response
• n is typically 13, but can be whatever is useful

MFCCs (cont'd)
• The signal is broken down into short, overlapping windows of time
• Windows represent frequency and amplitude information for an instant of the audio signal
• Overlap ensures that no features are missed
• An FFT extracts the frequency response of 256 component frequencies for each window

MFCCs (cont'd)
• The number of coefficients, n, determines the number of band-pass filters produced
• The filters are used to smooth the frequency response across wider frequency ranges

MFCCs (cont'd)
• The Discrete Cosine Transform of the log filter-bank energies gives the coefficients for a frame of data
• The coefficients depict the relative response of the band-pass filter ranges
• Features of the audio signal in the frequency domain

MFCCs (cont'd)
• The values in the output matrix of MFCCs will be larger for the frequency ranges that had the most influence over the audio signal at that point in time
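The windowing, FFT, mel filter-bank, log, and DCT steps above can be sketched as follows. This is a minimal illustration assuming NumPy/SciPy, a 16 kHz mono signal, 256-sample windows with 50% overlap, and 26 triangular mel filters; the parameter values and function names are illustrative, not the thesis's exact configuration.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular band-pass filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                  # rising edge of the triangle
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                  # falling edge of the triangle
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=256, hop=128, n_filters=26, n_coeffs=13):
    window = np.hamming(n_fft)
    fb = mel_filterbank(n_filters, n_fft, sr)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        power = np.abs(np.fft.rfft(frame)) ** 2   # frequency response per window
        energy = np.maximum(fb @ power, 1e-10)    # smooth across mel filter ranges
        frames.append(dct(np.log(energy), norm='ortho')[:n_coeffs])
    return np.array(frames)                       # shape: (n_frames, n_coeffs)

# Toy usage: one second of a 440 Hz tone.
t = np.linspace(0, 1, 16000, endpoint=False)
coeffs = mfcc(np.sin(2 * np.pi * 440 * t))
```

With 50% overlap, each second of audio at 16 kHz yields 124 frames of 13 coefficients each.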
The Next Step
• Take the extracted low-level features and try to model the higher-level acoustic processing of the auditory cortex
• Extract and separate temporal patterns and features from the simple frequency-domain features

Principal Component Analysis (PCA)
• A linear statistical transform for capturing the variance in a set of data
• Extracts structure (patterns) in the relationships between variables
• Used to reduce the dimensionality of data, summarizing the important parts (features, components)

PCA Background
• Based on statistics and matrix algebra
• Standard deviation and variance
  • Both are measures of the spread of a data set about its mean (1-dimensional)

PCA Background (cont'd)
• Covariance
  • Represents how much each dimension of data varies from the mean with respect to the other dimensions (n-dimensional)
  • Cov > 0: positive correlation
  • Cov < 0: negative correlation
  • Cov = 0: uncorrelated (independence implies zero covariance, but not the converse)

PCA Background (cont'd)
• Covariance matrix
  • Symmetric about the main diagonal
  • Values on the main diagonal are each dimension's variance

PCA Background (cont'd)
• Eigenvectors
  • a.k.a. latent vectors, characteristic vectors, latent dimensions
  • Only defined for square matrices (an n × n matrix has n eigenvectors; covariance matrices are always square)
  • Represent the directions that capture the most variance in the data
  • Orthogonal in n dimensions
• Eigenvalues
  • Each value corresponds to an eigenvector
  • Larger eigenvalues indicate the dimensions with the strongest correlation

Eigenvector & Eigenvalue Calculation
• Compute the eigenvalue matrix D and the eigenvector matrix V of the covariance matrix C:

  C V = V D

• D is an M × M diagonal matrix: D[p, q] = λ_m for p = q = m, where λ_m is the m-th eigenvalue of the covariance matrix C, and D[p, q] = 0 for p ≠ q
• V contains M column vectors, which are the eigenvectors of the covariance matrix C

Example PCA Data
• (Figure: the original data, and the normalized data with the first two eigenvectors overlaid)

PCA Process
1. Collect the data:

   X: 2.5  0.5  2.2  1.9  3.1  2.3  2.0  1.0  1.5  1.1
   Y: 2.4  0.7  2.9  2.2  3.0  2.7  1.6  1.1  1.6  0.9

2. Subtract the mean from each dimension:

   X: 0.69  -1.31  0.39  0.09  1.29  0.49   0.19  -0.81  -0.31  -0.71
   Y: 0.49  -1.21  0.99  0.29  1.09  0.79  -0.31  -0.81  -0.31  -1.01

3. Calculate the covariance matrix
4. Calculate the eigenvectors and eigenvalues of the covariance matrix
5. Choose components to form a feature vector

PCA Applications
• Feature extraction
  • Scene images
  • Face images (eigenfaces)
  • Audio
• Dimension reduction
  • Modeling multi-dimensional data with a smaller number of principal components
  • Simpler clustering of observations

PCA Example: Eigenfaces
1. Collect data: face images
   img1 = [row1, row2, row3, … , row n-1, row n];
2. Organize the data into rows
   imgData = [img1; img2; img3; … ; img n-1; img n];
   Each row (image) is a set of dimensional observations
3. Run the images through your favorite PCA algorithm!
   The output will be ghost-like faces representing the variance in facial features of the original images

PCA Example: Eigenfaces (cont'd)
• Each original image is a linear combination of the eigenfaces
• The images can be almost completely reconstructed from a smaller number of eigenfaces than original images

Independent Component Analysis (ICA)
• A more powerful statistical transform
• A special case of blind source separation
• Used to solve the cocktail-party problem
  • In the ideal case there are m microphones recording m physical processes producing sound
  • The original signals are extracted from the signal mixtures at the microphones
• (Diagram: three sound sources combine into three recorded mixtures, which ICA separates into three outputs)

PCA vs. ICA
• PCA extracts components representing the data's variance between observations (dimensions)
  • Linear statistical transform
  • Components are uncorrelated
• ICA extracts statistically independent components (non-Gaussian variables)
  • Estimated using non-linear (higher-order) statistics
  • Components are independent

ICA Theory
• Based on two equations:

  x = A s        s = W x

  where x = signal mixtures, A = the actual mixing matrix, s = the source signals, and W = the estimated un-mixing matrix
• Assuming all of the s come from different physical processes, they will be independent
• Goal of ICA: estimate the W that maximizes statistical independence of the recovered s

ICA Example
• (Figure: source signals, signal mixtures, and the extracted sources)

ICA Theory (cont'd)
• Independence
  • Two variables are independent if the value of one provides absolutely no information about the value of the other
• Non-Gaussianity
  • Used as a proxy for independence
  • If an estimated source is maximally non-Gaussian, it is taken to be independent
• Measures of non-Gaussianity
  • Kurtosis
  • Negentropy

ICA Theory (cont'd)
• Kurtosis
  • A normalized version of the 4th moment: kurt(y) = E{y^4} − 3(E{y^2})^2
  • Zero for a Gaussian variable, whose 4th moment equals 3(E{y^2})^2
• Negentropy
  • Based on a basic principle of information theory: the more unpredictable a variable, the higher its entropy
  • Among variables of a given variance, Gaussian variables have the highest entropy

ICA Theory (cont'd)
• Independent components are found by estimating the un-mixing matrix that maximizes non-Gaussianity
• Extracting one independent component at a time is known as projection pursuit
• Extracting all of the components at once is true Independent Component Analysis
• Preprocessing
  • Preprocessing the ICA data simplifies computations
  • PCA is often used to center and whiten the data

ICA Theory (cont'd)
• Ambiguities of ICA
  • ICA cannot extract the exact sources because the un-mixing matrix is an estimate
  • The structure of the components will be the same
  • Extracted components can be out of phase (inverted), can be scaled (amplitude), and their order is not known

FastICA Algorithm
• Developed at the Laboratory of Information and Computer Science
in Helsinki
• The most popular ICA algorithm in use
• Consists of 4 basic steps:
  1. Take a random initial vector w(0) of norm 1, and let k = 1
  2. Let w(k) = E{x (w(k−1)^T x)^3} − 3 w(k−1). The expectation can be estimated using a large sample of x vectors (~1,000 points)
  3. Let w(k) = w(k) − W W^T w(k) (deflation against the previously found vectors W), then normalize w(k) to unit norm
  4. If |w(k)^T w(k−1)| is not close enough to 1, let k = k + 1 and go back to step 2; otherwise output the vector w(k)

Temporal vs. Spatial ICA
• Temporal ICA
  • Extracts underlying source signals for each point in time, at only one point in space
  • Each extracted component is mutually independent across time
  • The cocktail-party problem
• Spatial ICA
  • Extracts signals for each point in space, at one point in time
  • Determines underlying spatial source signals for each space at all times
  • Transposed input

Previous Work
• Towards Cognitive Component Analysis [16]
  • Demonstrates that different genres of music exhibit signs of independence
  • Used MFCCs for low-level features and PCA for high-level features
  • Suggested ICA as a means of distinguishing between musical genres

Previous Work (cont'd)
• Building on this work and the suggested research into using ICA:
  • Generated a process for demonstrating acoustic separation using ICA
  • The process was tested with musical genres and spoken world languages

Process Implementation
• Get a representative sample of the data
  • Random 10-second samples were determined to be enough for genre classification [20]
  • 10 seconds to determine the class of audio information is intuitive, as the average human can classify acoustic information almost instantly
  • Because we are not using spatial information, the samples are mono, not stereo

Process Implementation (cont'd)
• Extract low-level features (preprocessing)
  • Generate 13 Mel-Frequency Cepstral Coefficients for each signal sample
  • Sparsify the MFCCs to make them independent of pitch and retain only the most significant features
  • The top 5% of coefficients by magnitude were kept; the rest were set to 0

Process Implementation (cont'd)
• Grouping the information for the most relevant extraction of independent source signals
  • The low-level features of each sample were grouped by class and passed to the FastICA algorithm
  • This process shows the separation of components across different groups, and the similarity of components within a group

Process Implementation (cont'd)
• Extract high-level patterns in those features
  • The groups of each MFCC for all samples within a class are passed through ICA
  • The output is a smaller number of independent temporal features that represent the behavior of that MFCC for that group of acoustic data

Testing
• Test tones
  • Sound-system test sounds, to establish how variations in frequency and amplitude affect the results
• Musical genres
  • 40 tracks: 10 tracks in each of 4 genres
  • Heavy Metal, Classical, Jazz, Electronica
• World languages
  • 10 tracks: 5 speakers in each of 2 languages
  • Hindi and English (all native speakers)
• Both PCA and ICA plots were generated

Results
• (Plots: test tones — ramp and burst signals; genres — PCA and ICA; languages — PCA and ICA)

Conclusions
• This process is good at determining the characteristic set of components for classes of acoustic information
• Computationally efficient (average run time for the genre set: 21 seconds)
• Demonstrates clear separation between classes of acoustic information
• The separation of musical genres explored in previous work can be extended to other acoustic data (world languages)

Future Work
• Clustering / classifying
  • Advanced multi-dimensional clustering algorithms and classifiers could be applied to automatically separate the groups
• New acoustic data / updating
  • A full system based on this technique could be set up to be updated with new information
  • Once a new piece of data is classified / clustered, the new, larger group's characteristic components could be recalculated quickly

Questions / Comments

References / Sources
• See thesis bibliography
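To make the FastICA iteration described earlier concrete, here is a small self-contained sketch of the kurtosis-based fixed-point update applied to a toy cocktail-party mixture x = A s: two sources are mixed, the mixtures are centered and whitened with PCA, and the sources are recovered one at a time (projection pursuit with deflation). The sources, mixing matrix, and all parameter values are illustrative, not the thesis's data.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),            # source 1: sine wave
               np.sign(np.sin(2 * np.pi * 3 * t))])  # source 2: square wave
A = np.array([[1.0, 0.6], [0.5, 1.0]])               # "unknown" mixing matrix
x = A @ s                                            # observed signal mixtures

# Preprocessing: center, then whiten with PCA so cov(z) = I.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = E @ np.diag(d ** -0.5) @ E.T @ x

W = []
for _ in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)                     # step 1: random unit vector
    for k in range(200):
        # step 2: kurtosis fixed-point update, expectation as a sample mean
        w_new = (z * (w @ z) ** 3).mean(axis=1) - 3 * w
        for v in W:                            # step 3: deflate against found vectors
            w_new -= (w_new @ v) * v
        w_new /= np.linalg.norm(w_new)         # ...then renormalize
        converged = abs(w_new @ w) > 1 - 1e-9  # step 4: convergence test
        w = w_new
        if converged:
            break
    W.append(w)

y = np.vstack(W) @ z   # estimated sources (sign, scale, and order are ambiguous)
```

Each row of y should match one of the original sources up to sign and scale, which is exactly the ambiguity noted on the "Ambiguities of ICA" slide.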