Acoustic Classification Using
Independent Component Analysis

James Brock
May 17, 2006
[Figures: Scene Aspect Recognition; Music Database Organization]
• Digital Media Organization
  • Online music database categorization
  • Content-based querying
  • Automatic generation of meta-data

• Scene Aspect Recognition
  • World Language Recognition
• How to determine the class of acoustic information
  •   Get a representative sample of the data
  •   Extract low-level features
  •   Extract high-level patterns in those features
  •   Separate characteristic patterns of different groups
      of acoustic information
• Low-level feature extraction
  • Mel-frequency Cepstral Coefficients
• High-level pattern extraction
  • Principal Component Analysis
  • Independent Component Analysis
• Previous Work
• My Implementation
• Conclusions / Future Work
Auditory Cognition
• Low-level auditory cognition
  • Outer, middle, inner ear
  • Cochlea converts sound waves to electrical signals
    interpretable by the auditory cortex
  • Humans can distinguish low-frequency sound with
    much finer resolution than high-frequency sound
Auditory Cognition

• How do humans interpret
  higher-level characteristics
  of sound?
   • “What” & “Where” channels
   • “Where” interprets spatial information
   • “What” interprets frequency &
     temporal information
Low-level Acoustic Processing
• The simplest known functions of the human
  auditory system
  • Occur when the cochlea converts sound waves to
    neurological signals
  • Distinguish frequency & amplitude for a single
    instance of time
  • Pass that information to higher-level cognitive
    processes (Auditory Cortex)
Mel-Frequency Cepstral Coefficients

• MFCC’s have been widely used in audio
  processing to extract low-level features
  • Extremely popular in speech recognition
• Based on a bank of band-pass filters spaced on the
  Mel scale, applied to the Discrete Fourier Transform (DFT)
• Produces a set of n coefficients for each
  window of time, corresponding to relative
  frequency response
  • n is typically 13, but can be whatever is useful
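The slides do not give the Mel-scale formula. A commonly used variant is m = 2595 · log10(1 + f/700); the sketch below assumes that variant (others exist) and shows why equal steps in Hz shrink on the Mel scale as frequency rises, matching the ear's finer low-frequency resolution described earlier.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# A 100 Hz step near 100 Hz spans far more mels than a 100 Hz step near 8 kHz.
low_step = hz_to_mel(200) - hz_to_mel(100)
high_step = hz_to_mel(8100) - hz_to_mel(8000)
print(round(low_step, 1), round(high_step, 1))
```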
MFCC’s (cont’d)
• The signal is broken down into short,
  overlapping windows of time
  • Windows represent frequency and amplitude
    information for an instant of the audio signal
  • Overlap ensures that features falling on window
    boundaries are not missed

• The FFT (a fast algorithm for the DFT) extracts the
  frequency response of 256 component frequencies for each window
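A minimal NumPy sketch of the windowing and FFT step, under stated assumptions: 512-sample frames with a 256-sample hop (50% overlap) and a Hamming window. These values are illustrative, not the thesis settings; a 512-point real FFT yields 257 bins (DC through Nyquist), roughly the 256 component frequencies mentioned above.

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Split a mono signal into overlapping frames (50% overlap by default)."""
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def power_spectrum(frames: np.ndarray) -> np.ndarray:
    """Apply a Hamming window to each frame and return its FFT power spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, axis=1)) ** 2

sr = 16000
t = np.arange(sr) / sr                    # 1 second of audio
tone = np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone
spec = power_spectrum(frame_signal(tone))
print(spec.shape)                         # (number of frames, frame_len // 2 + 1)
```

Each row of `spec` describes one instant of the signal; for the 1 kHz tone the strongest bin in every row is 1000 · 512 / 16000 = 32.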
MFCC’s (cont’d)
• Number of coefficients, n, determines the number of
  band-pass filters produced
• Filters are used to smooth the frequency response
  across wider frequency ranges
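The smoothing step is usually a set of triangular band-pass filters spaced evenly on the Mel scale, so each filter covers a wider Hz range as frequency rises. The sketch below is one common construction, not necessarily the one used in the thesis; the filter count (26), FFT size (512), and sample rate (16 kHz) are assumptions, and here the filter count is kept separate from the number of coefficients, as many implementations do.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters: int = 26, n_fft: int = 512, sr: int = 16000) -> np.ndarray:
    """Triangular band-pass filters spaced evenly on the Mel scale.

    Returns an (n_filters, n_fft // 2 + 1) matrix; multiplying it by a frame's
    power spectrum sums the energy inside each (increasingly wide) band.
    """
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

fbank = mel_filterbank()
widths = (fbank > 0).sum(axis=1)            # FFT bins covered by each filter
print(widths[0], widths[-1])
```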
MFCC’s (cont’d)
• The Discrete Cosine Transform of the log filter-bank
  energies gives the coefficients for a frame of data
• The coefficients will depict the relative response of the
  band-pass filter ranges
• Features of the audio signal in the frequency domain
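A short sketch of this last step: take the log of each frame's filter-bank energies, apply a DCT, and keep the first n coefficients. It assumes the energies are already computed (one row per frame) and writes out an orthonormal DCT-II in NumPy rather than calling a library DCT; the small offset inside the log is an assumption to avoid log(0).

```python
import numpy as np

def dct_ii(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II along the last axis."""
    m = x.shape[-1]
    k = np.arange(m)[:, None]
    n = np.arange(m)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * m))
    basis[0] *= 1 / np.sqrt(2)              # orthonormal scaling of the DC row
    return np.sqrt(2 / m) * x @ basis.T

def mfcc_from_energies(fbank_energies: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Log filter-bank energies -> DCT -> keep the first n_coeffs per frame."""
    log_e = np.log(fbank_energies + 1e-10)  # offset avoids log(0) for silent bands
    return dct_ii(log_e)[..., :n_coeffs]

frames = np.random.default_rng(0).uniform(0.1, 10.0, size=(5, 26))  # 5 frames, 26 bands
coeffs = mfcc_from_energies(frames)
print(coeffs.shape)
```

Because the DCT decorrelates the log band energies, the low-order coefficients summarize the overall spectral shape of the frame.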
MFCC’s (cont’d)
• The values in the output matrix of MFCC’s will be
  larger for the frequency ranges that had the most
  influence over the audio signal at that point in time.
The Next Step
• Take the extracted low-level features and try to
  model the higher-level acoustic processing of
  the auditory cortex

• Extracting and separating temporal patterns
  and features from the simple frequency domain
Principal Component Analysis

• Linear Statistical Transform for capturing the
  variance in a set of data
• Extracts structure (patterns) in the relationships
  between variables
• Used to reduce the dimensionality of data,
  summarizing the important parts (features)
PCA Background
• Based on statistical and matrix algebra
  • Standard deviation

  • Variance

  • Both are measures of the spread of the data set
    from the mean (1-dimensional)
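Both measures can be computed directly. The sketch below assumes the sample (n − 1) denominator, which matches the covariance values implied by the PCA example later in the deck, and uses the X row of that example as input.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance: average squared distance from the mean (n - 1 denominator)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    """Standard deviation: square root of the variance, in the data's own units."""
    return math.sqrt(variance(xs))

x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]   # X row of the PCA example
print(variance(x), std_dev(x))
```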
PCA Background
• Covariance
  •   Represents how much each dimension of data varies from
      the mean with respect to other dimensions (n-dimensional)

• Cov > 0: positive correlation
• Cov < 0: negative correlation
• Cov = 0: uncorrelated (no linear relationship; not
  necessarily independent)
PCA Background
• Covariance Matrix

• Symmetrical around the main diagonal
• Values on the main diagonal are that
  dimension’s variance
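A plain-Python sketch of covariance and the covariance matrix, again assuming the n − 1 denominator and reusing the X and Y rows from the PCA example. Entry [i][j] holds cov(dimension i, dimension j), so the matrix is symmetric and its diagonal holds each dimension's variance.

```python
def covariance(xs, ys):
    """Sample covariance: how two dimensions vary together around their means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(dims):
    """Entry [i][j] is cov(dims[i], dims[j]); the diagonal is each dimension's variance."""
    return [[covariance(a, b) for b in dims] for a in dims]

x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
C = covariance_matrix([x, y])
print(C)
```

The positive off-diagonal entry says X and Y tend to increase together.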
PCA Background
• Eigenvectors
  •   a.k.a. latent vectors, characteristic vectors
  •   Eigenvectors only exist for square matrices (an n × n symmetric
      matrix, such as a covariance matrix, has n eigenvectors;
      covariance matrices are always square)
  •   Represent the vectors that capture the most variance in the data
  •   Orthogonal in n dimensions
• Eigenvalues
  •   Each value corresponds to an eigenvector
  •   Larger eigenvalues indicate the eigenvectors (directions)
      that capture more of the data's variance
Eigenvector & Eigenvalue Calculation
•   Compute the eigenvalue matrix D and the eigenvector matrix V of the
    covariance matrix C:

                            $C V = V D$

•   Matrix D will take the form of an M × M diagonal matrix:

                 $D[p, q] = \lambda_m$   for   $p = q = m$

    where $\lambda_m$ is the mth eigenvalue of the covariance matrix C, and

                 $D[p, q] = 0$   for   $p \neq q$

•   Matrix V contains M column vectors, which represent the eigenvectors of
    the covariance matrix C.
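The relationship C V = V D can be checked numerically. This sketch uses NumPy's `np.linalg.eigh`, which is intended for symmetric matrices such as covariance matrices, on the X/Y example data from the PCA Process slides; the choice of library routine is ours, not the thesis's.

```python
import numpy as np

# Example data from the "PCA Process" slides: one row per dimension (X, Y).
data = np.array([[2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1],
                 [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]])
C = np.cov(data)                  # 2 x 2 covariance matrix (n - 1 denominator)

eigvals, V = np.linalg.eigh(C)    # eigenvalues ascending; columns of V are eigenvectors
D = np.diag(eigvals)              # M x M diagonal: D[m, m] = lambda_m, zero elsewhere

# Each column of V is an eigenvector, so C V = V D holds column by column.
assert np.allclose(C @ V, V @ D)
# A symmetric C has orthonormal eigenvectors: V^T V = I.
assert np.allclose(V.T @ V, np.eye(2))
print(eigvals)
```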
Example PCA Data
[Figure: original data; normalized data with the 1st two eigenvectors]
PCA Process
1. Collect data
       X    2.5    0.5    2.2    1.9    3.1    2.3    2.0    1.0    1.5    1.1
       Y    2.4    0.7    2.9    2.2    3.0    2.7    1.6    1.1    1.6    0.9

2. Subtract the mean from each dimension of the data (means: X = 1.81, Y = 1.91)
       X    0.69  -1.31   0.39   0.09   1.29   0.49   0.19  -0.81  -0.31  -0.71
       Y    0.49  -1.21   0.99   0.29   1.09   0.79  -0.31  -0.81  -0.31  -1.01

3. Calculate the covariance matrix
PCA Process
4. Calculate the eigenvectors and eigenvalues of
   the covariance matrix
PCA Process
5. Choose components to form a feature vector
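Steps 1 to 5 fit in a few lines of NumPy. This is a minimal sketch, assuming rows are observations and columns are dimensions, and that the "feature vector" is the matrix of the top-k eigenvectors sorted by eigenvalue; the function name `pca` is ours.

```python
import numpy as np

def pca(data: np.ndarray, n_components: int):
    """PCA following steps 1-5: rows are observations, columns are dimensions."""
    centered = data - data.mean(axis=0)          # step 2: subtract the mean
    cov = np.cov(centered, rowvar=False)         # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # step 4: eigenvectors and eigenvalues
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    feature_vector = eigvecs[:, :n_components]   # step 5: keep the top components
    return centered @ feature_vector, eigvals, feature_vector

# Step 1: the example data, one (X, Y) observation per row.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
projected, eigvals, fv = pca(data, n_components=1)
print(projected.ravel())
```

Keeping only the first component reduces the data to one dimension while retaining the direction of greatest variance; the variance of `projected` equals the largest eigenvalue.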
PCA Applications
• Feature extraction
  • Scene Images
  • Face Images (eigenfaces)
  • Audio

• Dimension reduction
  • Modeling multi-dimensional data by a smaller
    number of principal components
  • Simpler clustering of observations
PCA Example: Eigenfaces
1. Collect data: face images
                       img1 = [row1, row2, row3, … , row n-1, row n];

2. Organize data into rows
    imgData = [img1; img2; img3; … ; img n-1; img n];

  Each row (image) is one multi-dimensional
   observation (one dimension per pixel)
PCA Example: Eigenfaces
3. Run the images through your favorite PCA implementation
     Output will be ghost-like faces representing the variance in
      facial features of the original images
PCA Example: Eigenfaces
• Each original image is a linear combination of
  the eigenfaces (plus the mean face)
• The images can be almost completely reconstructed
  with a smaller number of eigenfaces than original images
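A small sketch of reconstruction from a few eigenfaces. The "images" here are hypothetical: 20 flattened 8×8 arrays built from 3 random patterns plus a little noise, standing in for real face photos. It computes the principal directions with an SVD of the centered data, which gives the same directions as the eigenvectors of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for face images: 20 flattened 8x8 "images" built from
# 3 underlying patterns plus a little noise (one image per row, like imgData).
patterns = rng.normal(size=(3, 64))
weights = rng.normal(size=(20, 3))
img_data = weights @ patterns + 0.01 * rng.normal(size=(20, 64))

mean_face = img_data.mean(axis=0)
centered = img_data - mean_face
# Rows of vt are the principal directions ("eigenfaces") of the centered data.
_, _, vt = np.linalg.svd(centered, full_matrices=False)

def reconstruct(k: int) -> np.ndarray:
    """Rebuild every image as the mean face plus a combination of k eigenfaces."""
    eigenfaces = vt[:k]
    coeffs = centered @ eigenfaces.T    # each image's weight on each eigenface
    return mean_face + coeffs @ eigenfaces

for k in (1, 2, 3):
    err = np.linalg.norm(img_data - reconstruct(k)) / np.linalg.norm(img_data)
    print(k, round(err, 4))
```

With only 3 eigenfaces (far fewer than the 20 images) the relative error drops to roughly the noise level, mirroring the claim above.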
Independent Component Analysis

• A more powerful statistical transform
• A special case of blind source separation
• Used to solve the cocktail party problem
  • Where in the ideal case there are m microphones
    recording m physical processes producing sound
  • The original signals are extracted from the signal
    mixtures of the microphones
[Figure: Sources 1-3 are mixed into Mixtures 1-3; ICA separates the mixtures into Outputs 1-3]
• PCA extracts components representing
  data variance between observations
  • Linear statistical transform
  • Components are uncorrelated
• ICA extracts statistically independent
  components (non-gaussian variables)
  • Linear transform, but estimated using higher-order (non-linear) statistics
  • Components are independent
ICA Theory
• Based on two equations
             $x = As$          $s = Wx$
  x: signal mixtures      A: actual mixing matrix
  s: source signals       W: estimated unmixing matrix

• Assuming that all of the s are from different
  physical processes, they will be independent
• Goal of ICA: Estimate W that maximizes
  statistical independence for all s
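A toy illustration of the two equations, assuming two synthetic sources (a square wave and uniform noise) and a made-up 2×2 mixing matrix A standing in for two microphones. Here W is obtained by inverting the known A only to show what the unmixing matrix must achieve; in real ICA, A is unknown and W has to be estimated from x alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
t = np.linspace(0, 1, n)
s = np.vstack([np.sign(np.sin(2 * np.pi * 5 * t)),   # source 1: square wave
               rng.uniform(-1, 1, n)])               # source 2: uniform noise
A = np.array([[1.0, 0.5],
              [0.7, 1.0]])    # hypothetical mixing matrix (two "microphones")
x = A @ s                     # observed mixtures: x = As

W = np.linalg.inv(A)          # ideal unmixing matrix; ICA must *estimate* this
s_hat = W @ x                 # recovered sources: s = Wx
```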
ICA Example

[Figure panels: source signals; signal mixtures; extracted sources]
ICA Theory
• Independence
  • Two variables are independent if the value of one
    variable provides absolutely no information about
    the value of the other variable
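• Non-gaussianity
  • Maximizing non-gaussianity is equivalent to maximizing
    independence: by the Central Limit Theorem, a mixture of
    independent sources is more Gaussian than the sources themselves
  • An estimated source with maximal non-gaussianity
    corresponds to one of the independent components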
• Measures of Non-gaussianity
  • Kurtosis
  • Negentropy
ICA Theory
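• Kurtosis
  •   Normalized version of
      the 4th moment
  •   4th moment for a Gaussian
      variable: $E\{y^4\} = 3(E\{y^2\})^2$, so
      $\mathrm{kurt}(y) = E\{y^4\} - 3(E\{y^2\})^2$
      is zero for Gaussian y
• Negentropy
  •   Based on entropy, a basic
      principle of information theory
  •   The more unpredictable
      a variable, the higher its entropy
  •   Gaussian variables have
      the highest entropy among
      variables of equal variance, so
      $J(y) = H(y_{gauss}) - H(y)$ is zero
      for Gaussian y and positive otherwise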
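A quick numerical check of the kurtosis measure, assuming unit-variance samples drawn from three distributions: Gaussian (kurtosis about 0), uniform (sub-Gaussian, about −1.2), and Laplace (super-Gaussian, about 3). Negentropy is not sketched here because it requires entropy estimates or approximations.

```python
import numpy as np

def kurtosis(y: np.ndarray) -> float:
    """kurt(y) = E{y^4} - 3 (E{y^2})^2 after centering; zero for a Gaussian."""
    y = y - y.mean()
    return float(np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2)

rng = np.random.default_rng(0)
n = 200_000
gauss = rng.normal(size=n)
uniform = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)   # unit variance, sub-Gaussian
laplace = rng.laplace(scale=1 / np.sqrt(2), size=n)      # unit variance, super-Gaussian

for name, y in (("gaussian", gauss), ("uniform", uniform), ("laplace", laplace)):
    print(name, round(kurtosis(y), 2))
```

The further an estimate's kurtosis is from zero in either direction, the less Gaussian it is, which is why ICA can use |kurt| as a contrast to maximize.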
ICA Theory
• Independent components are found by
  estimating the un-mixing matrix that maximizes
  non-gaussianity
  • Extracting one independent component at a time is known as
    Projection Pursuit
  • Extracting all of the components at once is true
    Independent Component Analysis
• Preprocessing
  • Preprocessing ICA data will simplify computations
  • PCA is often used to center and whiten the data
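A minimal sketch of PCA-based centering and whitening, assuming the mixtures are stored one signal per row and using a made-up mixing matrix for the example. After the transform z = D^(-1/2) E^T x (E: eigenvectors, D: eigenvalues of the covariance), the whitened signals are uncorrelated with unit variance, so ICA only has to find a rotation.

```python
import numpy as np

def whiten(x: np.ndarray) -> np.ndarray:
    """Center and whiten mixtures x (rows = signals, columns = samples) using PCA."""
    x = x - x.mean(axis=1, keepdims=True)          # centering
    d, E = np.linalg.eigh(np.cov(x))               # PCA of the mixtures
    return np.diag(1.0 / np.sqrt(d)) @ E.T @ x     # rotate, then rescale to unit variance

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(2, 2000))             # two independent sources
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])                         # hypothetical mixing matrix
z = whiten(A @ s)
print(np.round(np.cov(z), 6))
```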
ICA Theory
• Ambiguities of ICA
  • ICA cannot extract the exact sources because the
    un-mixing matrix is an estimate
  • The structure of the components will be the same

• Extracted Components
  • Can be out of phase (inverted)
  • Can be scaled (amplitude)
  • Order of components is not known
FastICA Algorithm
• Developed at the Laboratory of Information and CS
• Most popular ICA algorithm in use
• Consists of 4 basic steps
  • Take a random initial vector w(0) of norm 1, and let k = 1
  • Let $w(k) = E\{x\,(w(k-1)^T x)^3\} - 3\,w(k-1)$, where x is the
    whitened data. The expectation can be estimated using a large
    sample of x vectors (~1,000 samples)
  • Let $w(k) = w(k) - W W^T w(k)$, where the columns of W are the
    previously extracted vectors, then divide w(k) by its norm
  • If $|w(k)^T w(k-1)|$ is not close enough to 1, let k = k + 1 and go
    back to step 2. Otherwise output the vector w(k).
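The four steps translate almost line for line into NumPy. This is a minimal sketch, not the published FastICA package: it assumes the kurtosis-based update shown above, deflation to extract components one at a time, synthetic sources, and a made-up mixing matrix. W stores found vectors as rows, so the slide's $W W^T w(k)$ becomes `W.T @ (W @ w)`.

```python
import numpy as np

def whiten(x):
    """Center and whiten rows of x via PCA (see the preprocessing slide)."""
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    return np.diag(1.0 / np.sqrt(d)) @ E.T @ x

def fastica_deflation(z, n_components, tol=1e-6, max_iter=200, seed=0):
    """Kurtosis-based FastICA on whitened data z, one component at a time."""
    rng = np.random.default_rng(seed)
    W = np.zeros((0, z.shape[0]))                  # previously found vectors (rows)
    for _ in range(n_components):
        w = rng.normal(size=z.shape[0])
        w /= np.linalg.norm(w)                     # step 1: random unit vector
        for _ in range(max_iter):
            w_old = w
            # step 2: w = E{z (w^T z)^3} - 3w, expectation taken as a sample mean
            w = (z * (w_old @ z) ** 3).mean(axis=1) - 3 * w_old
            # step 3: remove projections onto earlier vectors, then renormalize
            w = w - W.T @ (W @ w)
            w /= np.linalg.norm(w)
            # step 4: stop once w no longer changes direction (sign may flip)
            if abs(w @ w_old) > 1 - tol:
                break
        W = np.vstack([W, w])
    return W

rng = np.random.default_rng(1)
n = 5000
t = np.linspace(0, 1, n)
s = np.vstack([np.sign(np.sin(2 * np.pi * 7 * t)), rng.uniform(-1, 1, n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])             # hypothetical mixing matrix
z = whiten(A @ s)
W = fastica_deflation(z, n_components=2)
s_est = W @ z                                      # sources up to sign, scale, and order
```

The recovered rows of `s_est` match the sources only up to the ambiguities listed on the previous slide (sign, scale, order), so a comparison should use absolute correlation.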
Temporal vs. Spatial ICA
• Temporal ICA
  • Extracts underlying source signals for each point in
    time, and only one point in space
  • Each of the extracted components will be mutually
    independent across time
  • Cocktail party problem
• Spatial ICA
  • Extracts signals for each point in space, at one
    point in time
  • Determines underlying spatial source signals for
    each space at all times
  • Transposed input
Previous Work
• Towards Cognitive
  Component Analysis [16]
  • Demonstrates that
    different genres of music
    exhibit signs of
  • Used MFCC’s for low-
    level features, and PCA
    for high-level features
  • Suggested ICA as a
    means of distinction
    between musical genres
Previous Work
• Using this work and the suggested research
  into using ICA
  • Generated a process for demonstrating acoustic
    separation using ICA
  • Process was tested with musical genres and
    spoken world languages
Process Implementation
• Get a representative sample of the data
  • Random 10-second samples were determined to be
    enough for genre classification [20]

  • 10 seconds to determine the class of audio
    information is intuitive, as the average human can
    classify acoustic information almost instantly

  • Because we’re not using spatial information, the
    samples are mono, not stereo
Process Implementation
• Extract low-level features (Preprocessing)
  • Generate 13 Mel-Frequency Cepstral Coefficients for each
    signal sample
  • Sparsify the MFCC’s to make them independent of pitch,
    and only retain the most significant features
  • The top 5% of coefficients by magnitude were kept; all others were set to 0
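A sketch of the sparsification step, assuming the 5% threshold is taken over the whole MFCC matrix of one sample; the slides do not say whether it was applied per matrix, per frame, or per coefficient. The random matrix is a stand-in for 13 MFCCs over 200 frames, not real data.

```python
import numpy as np

def sparsify(mfccs: np.ndarray, keep_fraction: float = 0.05) -> np.ndarray:
    """Keep the largest-magnitude keep_fraction of coefficients; set the rest to 0.

    Assumption: the threshold is computed over the whole MFCC matrix of one sample.
    """
    mags = np.abs(mfccs)
    k = max(1, int(round(keep_fraction * mags.size)))
    threshold = np.partition(mags.ravel(), -k)[-k]   # k-th largest magnitude
    return np.where(mags >= threshold, mfccs, 0.0)

rng = np.random.default_rng(0)
mfccs = rng.normal(size=(13, 200))   # stand-in for 13 MFCCs x 200 frames
sparse = sparsify(mfccs)
print(np.count_nonzero(sparse), "of", mfccs.size, "coefficients kept")
```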
Process Implementation
• Grouping the information for the most relevant
  extraction of independent source signals
  • The low-level features of each sample were grouped by
    class to be passed to the fastICA algorithm
  • This process shows separation of components across
    different groups, and the similarity of components within a group
Process Implementation
• Extract high-level patterns in those features
  • Groups of each MFCC for all of the samples within a group
    are then passed through ICA
  • The output is a smaller number of independent temporal
    features that represent the behavior of that MFCC for that
    group of acoustic data
• Test Tones
  • Sound system test sounds to establish how
    variations in frequency and amplitude affect results
• Musical Genres
  • 40 tracks, 10 tracks in 4 different genres
     • Heavy Metal, Classical, Jazz, Electronica
• World Languages
  • 10 tracks, 5 speakers in 2 different languages
     • Hindi and English (all native speakers)
• Both PCA and ICA plots were generated
Results (Test Tones)

Results (Genres - PCA)
Results (Genres - ICA)
Results (Languages - PCA)
Results (Languages - ICA)
• This process is good for determining the
  characteristic set of components for classes of
  acoustic information
  • Computationally efficient (Avg. run time for genres =
    21 seconds)
  • Demonstrates clear separation of classes of
    acoustic information
  • The separation of musical genres, explored in
    previous work, can be extended to other acoustic
    data (world languages)
Future Work
• Clustering / Classifying
  • Advanced, multi-dimensional clustering algorithms
    and classifiers could be applied to automatically
    separate the groups

• New Acoustic Data / Updating
  • A full system based on this technique could be set
    up to be updated with new information
  • Once a new piece of data is classified / clustered,
    the new, larger group’s characteristic components
    could be recalculated quickly.
Questions / Comments
References / Sources
• See Thesis Bibliography!