        Clap Detection and Discrimination for Rhythm Therapy
                     Nathan Lesser & Dan Ellis
  Laboratory for Recognition and Organization of Speech and Audio
     Dept. Electrical Engineering, Columbia University, NY USA
                    {nathan,dpwe}@ee.columbia.edu


     1.    “Rhythm Therapy”
     2.    Clap Range Estimation
     3.    Experiments
     4.    Conclusions

    Clap Detection - Lesser & Ellis           2005-03-22   p. 1 /14
            1. “Rhythm Therapy”
• Rhythmic clapping may
   help neural development
     sensori-motor planning
     focus and attention
• “Interactive metronome”
   devices
     give feedback on synchrony
     sensor-based
• Classroom deployment?
     acoustic-based?
     for multiple simultaneous users??
[Image from interactivemetronome.com]
            Clap Discrimination
• Scenario:
   Many students in same classroom
   each clapping in time to their own laptop
     students wear headphones (but no sensor)
     computer hears neighbors
• Goal:
   Discriminate between ‘near-field’
   and ‘far-field’ claps
     ‘near-field’ = ~1 meter, on-axis
     ‘far-field’ = > 2 meters, maybe off-axis

                  Data Collection
• Record isolated claps at various locations
   can superimpose them later...
• Grid of seats:
     claps from locations 0..9
     record at locations 5 & 9 only
• Multiple rooms
     pilot: 1 room,
     2 x 5 claps/location
     main data: 2 (+2) rooms,
     1 x 50 farfield claps/location
      + 300 nearfield claps/rec.loc.
           = 1500 claps/room

[Figure: Classroom Plan View, showing clapping locations 0..9 and recording locations 5 and 9]

                     0
              1      2      3
              4      5      6
              7      8      9
                Front of Room
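
The per-room total can be checked with a little arithmetic. The sketch below assumes (an interpretation, not something the slide states) that each recording location hears far-field claps from the nine other grid locations; all names are illustrative.

```python
# Assumption: far-field claps come from the 9 grid locations other than
# the recording location itself. The slide does not spell this out.
FARFIELD_LOCATIONS = 9
FARFIELD_CLAPS_PER_LOCATION = 50
NEARFIELD_CLAPS_PER_RECORDING_LOCATION = 300
RECORDING_LOCATIONS = 2  # locations 5 and 9


def claps_per_room():
    """Total claps recorded in one room under the assumption above."""
    per_recording_location = (
        FARFIELD_LOCATIONS * FARFIELD_CLAPS_PER_LOCATION
        + NEARFIELD_CLAPS_PER_RECORDING_LOCATION
    )
    return RECORDING_LOCATIONS * per_recording_location


print(claps_per_room())
```

Under that reading, 2 x (9 x 50 + 300) reproduces the 1500 claps/room quoted on the slide.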


      2. Clap Range Estimation
• Task:
   Discriminate claps from in front of rig
   from all others (more distant)
     main perceptual cue to distance (range):
     direct-to-reverberant ratio (DRR)
     how to differentiate direct and reverb?
• Novel problem: Acoustic range estimation
     define correlates of DRR
     exploit properties of claps (wideband, compact)
     .. then just feed to classifier

                         Clap Examples
• Absolute level varies
• Decay slopes ~ same
     reverberation (RT60 ~ 900ms)
• Initial burst for near-field
     “direct sound”

[Figure: Near-field (327MUDD nf50:4) and Far-field (327MUDD ff50:4) clap examples; rows show amplitude, energy (4ms) / dB, and spectrogram (freq / kHz), each vs. time / s over 0..0.7 s]

                                   Processing
• Detection → Features → Classifier

[Block diagram in two stages; block labels recovered from mis-encoded text]
   Stage I, Clap Detection: Raw Audio; Front End (Filter Bank, Windowing); Onset Energy; Thresholding; Clap Indices
   Stage II, Categorization: Get Features (Slope, Center of Mass, Cross Correlation); Training/Test split; Training Labels; RLSC; Model; Results

                  Clap Detection
• Simple transient detector
   limits feature calculation to ‘clap events’
• Adjust threshold on Δ(Energy20ms) to get desired number of claps
     known for our data
• Back up from maxima to find precise onset
     Fielded system will need to adapt threshold
     and reject non-claps

[Figure: Far-field (327MUDD ff50:1-5); spectrogram (freq / kHz, 0..10) above ratio / dB (0..40), vs. time / s over 0..5 s]
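
A minimal sketch of a detector along these lines, assuming NumPy. The function name `detect_claps`, the 10% onset fraction, and the peak-picking rule are illustrative choices, not the authors' implementation. Keeping the `n_claps` largest energy rises stands in for lowering a threshold on Δ(Energy20ms) until the known number of claps is found.

```python
import numpy as np


def detect_claps(x, sr, n_claps, win_s=0.020, onset_frac=0.1):
    """Return onset sample indices of the n_claps strongest energy rises."""
    win = int(round(win_s * sr))
    n_frames = len(x) // win
    frames = x[: n_frames * win].reshape(n_frames, win)

    # Frame energy in dB and its frame-to-frame change, Delta(Energy20ms).
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-10)
    delta = np.diff(energy_db, prepend=energy_db[0])

    # Candidate events: local maxima of the energy rise.
    idx = np.arange(1, n_frames - 1)
    is_peak = (delta[idx] > delta[idx - 1]) & (delta[idx] >= delta[idx + 1])
    peaks = idx[is_peak]

    # Equivalent to lowering the threshold until n_claps candidates pass.
    strongest = np.argsort(delta[peaks])[::-1][:n_claps]
    keep = np.sort(peaks[strongest])

    # Back up from each maximum: search from one frame earlier for the
    # first sample reaching onset_frac of the local peak amplitude.
    onsets = []
    for f in keep:
        start = (f - 1) * win
        seg = np.abs(x[start:(f + 1) * win])
        onsets.append(start + int(np.argmax(seg >= onset_frac * seg.max())))
    return np.array(onsets)
```

This relies on the clap count being known in advance, as it is for this data set; a fielded system would need an adaptive threshold and a way to reject non-clap transients, as the slide notes.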
                             Range Features
• Paper: Ctr. of Mass, Slope in 0..20ms , 0..100ms
[Figure: energy envelope (dB) vs. time / s (0..0.15 s) for Near-field (327MUDD nf50:4) and Far-field (327MUDD ff50:4), annotated with CoM20ms, CoM100ms, slope20ms, slope100ms]
• New: Slope in 0..20ms , 20..100ms
       + Energy Ratio 0..20ms / 20..100ms
[Figure: same two claps, annotated with energy ratio and slope20:100ms]
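
The features named above can be sketched as follows, assuming NumPy and a clap segment that starts at the detected onset. The exact definitions are assumptions: the 4 ms envelope frames are taken from the energy plots, the slopes are least-squares line fits to the dB envelope, and the per-band filtering (2-4 kHz, 4-8 kHz) used in the slides is omitted. The function and key names are illustrative.

```python
import numpy as np


def range_features(clap, sr, frame_s=0.004):
    """Range-related features for one clap (needs at least 100 ms of samples)."""
    n20 = int(round(0.020 * sr))
    n100 = int(round(0.100 * sr))
    e = np.asarray(clap[:n100], dtype=float) ** 2  # instantaneous energy

    # Energy ratio: early (0..20 ms) over late (20..100 ms) energy, in dB.
    eratio_db = 10.0 * np.log10(e[:n20].sum() / (e[n20:].sum() + 1e-12) + 1e-12)

    # Short-time energy envelope in dB, then straight-line fits (dB per second).
    hop = int(round(frame_s * sr))
    n_frames = n100 // hop
    env = e[: n_frames * hop].reshape(n_frames, hop).sum(axis=1)
    env_db = 10.0 * np.log10(env + 1e-12)
    t = (np.arange(n_frames) + 0.5) * frame_s
    early = t < 0.020
    slope20 = np.polyfit(t[early], env_db[early], 1)[0]
    slope20_100 = np.polyfit(t[~early], env_db[~early], 1)[0]

    # Centre of mass of the energy along the time axis (seconds).
    ts = np.arange(n100) / sr
    com20 = (ts[:n20] * e[:n20]).sum() / (e[:n20].sum() + 1e-12)
    com100 = (ts * e).sum() / (e.sum() + 1e-12)

    return {
        "eratio_db": eratio_db,
        "slope20": slope20,
        "slope20_100": slope20_100,
        "com20": com20,
        "com100": com100,
    }
```

A clap dominated by its direct sound puts most of its energy in the first 20 ms and so yields a large energy ratio, while a reverberation-dominated clap spreads energy into the 20..100 ms tail.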



             Range Feature Behavior
• Original 4 features
   good separation
    except CoM20
• New features
   Eratio excellent
   slope20:100 useless...
• Range estimation?
   CoM20, slope20
   show promise
  (each plot shows 4-8 kHz band vs. 2-4 kHz band)

[Figure: 327MUDD loc 5; six scatter plots (CoM 20, CoM 100, slope 20, slope100, Eratio, slope 20:100), 4-8kHz band on the x-axis vs. 2-4kHz band on the y-axis; legend marks source positions p0..p9 on the seat grid, with NF (near-field) at the recording position]

                   3. Experiments
• Build and test actual near/far-field classifier
• Feature experiments
     quantitative feature comparison
     best combinations
• Data experiments
     training data: amount, locations
     test data: same/different room/location
• Regularized Least-Squares Classifier (RLSC)
     find a hyperplane in (expanded) feature space
     ~ simplified Support Vector Machine - no QP
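
A minimal linear RLSC sketch, assuming NumPy. Labels are coded as ±1 and the hyperplane comes from a single regularized linear solve, which is why no quadratic programming is needed. The feature-space expansion mentioned above is omitted, the bias term is regularized along with the weights for simplicity, and the function names and the value of `lam` are illustrative.

```python
import numpy as np


def rlsc_train(X, y, lam=1e-3):
    """Fit weights w minimizing ||Xb w - y||^2 + lam ||w||^2, with y in {-1, +1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def rlsc_margin(w, X):
    """Signed distance-like score; positive means the +1 class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w


def rlsc_predict(w, X):
    """Class decisions in {-1, +1} (0 only if a point lies on the hyperplane)."""
    return np.sign(rlsc_margin(w, X))
```

For example, with rows of `X` holding per-band energy ratios and `y = +1` for near-field claps, `rlsc_predict(rlsc_train(X, y), X_test)` gives the near/far decisions, and `rlsc_margin` gives a confidence-like score for each clap.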
                            Feature Comparisons
• Train on room 327Mudd; Test on 627Mudd
[Figure: bar chart, "Feature comparison: All 3 bands, train on all M327, test on all M627"; clap error rate / % (axis 0..50) for feature sets CoM_20, CoM_100, slo_20, slo_100, slo_20:100, Eratio]
• Eratio alone (9/1500 = 0.6% errors) beats
   best combination of rest:
    (CoM20 + CoM100 + slo20 = 0.9% errors)
   difference of ~0.5% required for significance
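
The slides do not say how the ~0.5% significance figure was obtained. One common back-of-envelope check, sketched here as an assumption, treats each classifier's errors on the 1500 test claps as independent Bernoulli trials and asks how large a difference in error rate is needed at roughly 95% confidence. Because both classifiers are scored on the same claps, a paired test would be more appropriate; this is only a rough guide.

```python
import math


def error_rate_difference_threshold(p1, p2, n, z=1.96):
    """Approximate smallest significant difference between two error rates.

    Assumes two independent binomial samples of size n (normal approximation).
    """
    standard_error = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return z * standard_error


threshold = error_rate_difference_threshold(0.006, 0.009, 1500)
print(f"{100 * threshold:.2f}%")
```

Under these assumptions the threshold comes out at the same order as the ~0.5% quoted above, so the 0.6% vs. 0.9% gap between Eratio and the best combination of the other features sits near the edge of significance.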
 Generalizing Location, Room
• Matrix of 2 rooms x 2 recording locations

   CER% (rows: Train, columns: Test)
   Train \ Test    M627L5   M627L9   M327L5   M327L9
   M627L5            2.0      0.5      0.4      0.0
   M627L9            3.7      0.4      0.7      0.0
   M327L5            1.5      0.5      0.4      0.0
   M327L9            0.1      0.7      0.4      0.0

     627Mudd loc5 is hard data; 327Mudd loc9 is easy!
     Cross-room (shaded) cases generalize better !?
     Plenty of data: 5 claps/loc (20%) just as good
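
The cross-room observation can be checked directly from the table. This sketch, assuming NumPy, averages the CER% cells where training and test rooms match and where they differ. It takes rows as training conditions and columns as test conditions, and gives every cell equal weight, which assumes equal-sized test sets.

```python
import numpy as np

conditions = ["M627L5", "M627L9", "M327L5", "M327L9"]

# CER% copied from the table: rows = training condition, columns = test condition.
cer = np.array([
    [2.0, 0.5, 0.4, 0.0],
    [3.7, 0.4, 0.7, 0.0],
    [1.5, 0.5, 0.4, 0.0],
    [0.1, 0.7, 0.4, 0.0],
])

# The first four characters of each name identify the room (M627 or M327).
rooms = np.array([name[:4] for name in conditions])
same_room = rooms[:, None] == rooms[None, :]

same_room_mean = cer[same_room].mean()
cross_room_mean = cer[~same_room].mean()
print(same_room_mean, cross_room_mean)
```

The cross-room mean comes out lower than the same-room mean, largely because the hardest test set (M627L5) is also a same-room case for half of its rows.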
                    4. Conclusions
• Discriminating isolated near- and far-field
  claps is feasible (use Eratio 0..20/20..100ms)
• Detection of candidate claps likely to limit
   accuracy in practice
     but have ‘rhythmic’ expectations...
• Applicability to general range estimation?
     Eratio relies on short-duration direct-sound
     ..but other sounds have clicks (e.g. speech bursts)
     CoM20, slope20 closer to proportional to range


               Azimuth Features
• Cross-correlation of L and R for azimuth:
[Figure: ITD scatter vs. source (for MUDD327 pos 5); 4-8kHz on the x-axis (-1.5..1.5) vs. 2-4kHz on the y-axis; legend marks source positions p0..p9 on the seat grid, with NF at the recording position]

     nearby locations distinguished - useful
     distant locations (p2) give random results
     needs nonlinear feature space expansion!
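
A minimal sketch of an interaural time difference (ITD) estimate from the left/right cross-correlation, assuming NumPy. The function name, the 1 ms lag limit, and the sign convention are illustrative choices; the per-band (2-4 kHz, 4-8 kHz) filtering shown in the plot is omitted.

```python
import numpy as np


def estimate_itd(left, right, sr, max_lag_s=0.001):
    """Return the lag (seconds) maximizing the L/R cross-correlation.

    Positive values mean the left channel lags the right channel.
    """
    max_lag = int(round(max_lag_s * sr))
    xc = np.correlate(left, right, mode="full")
    # Lag axis for mode="full": -(len(right) - 1) .. len(left) - 1.
    lags = np.arange(-(len(right) - 1), len(left))
    keep = np.abs(lags) <= max_lag  # only physically plausible lags
    best_lag = lags[keep][np.argmax(xc[keep])]
    return best_lag / sr
```

Computed per band, the two ITD values give the pair of coordinates plotted for each clap; as the slide notes, a linear classifier would need a nonlinear expansion of these features to make use of them.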

                                                     Error Analysis
• 627Mudd (record loc 5) is the tough set;
   look at classifier margins:
     false accepts for loc 6 (ambiguous Eratio)
     a few solid false rejects... really look like far-field???

[Figure: classifier margins (-2..2) per clap group for "m4[14:16] vs f627" and "m6[14:16] vs f627", x-axis groups 50..59, 5n, 90..98, 9n; inset "Claps 33 and 34 from 627M:nf90" showing waveform, energy, and spectrogram (Frequency vs. Time, 0..2 s); seat-grid key 0 / 123 / 456 / 789]

  Usefulness of Each Position
• Train on 50 near-field claps + 50 far-field
   claps from a single location:
[Figure: bar chart, "Location comparison (Erat ftrs): train M327L5 one loc, test on all M627L5"; clap error rate / % (axis 0..7) vs. far-field training examples location p0, p1, p2, p3, p4, p6, p7, p8, p9, all; seat-grid key p0 / p1 p2 p3 / p4 p5 p6 / p7 p8 p9]
     all recorded at location 5
     ‘behind’ (p7-p9) less useful
     right-side (p3, p6) most useful !?