
                    On-Line Handwriting Recognition
• Transducer device (digitizer)
• Input: sequence of point coordinates with
  pen-down/up signals from the digitizer
• Stroke: sequence of points from pen-down
  to pen-up signals
• Word: sequence of one or more strokes.
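
The input representation above can be sketched as a small data model. The sampling format and all names below are hypothetical illustrations, not from the slides: each digitizer sample is taken to be an (x, y, pen_down) triple.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) digitizer coordinates

@dataclass
class Stroke:
    points: List[Point]  # sampled from pen-down to pen-up

@dataclass
class Word:
    strokes: List[Stroke]  # one or more strokes

def split_into_strokes(samples):
    """Group (x, y, pen_down) samples into strokes.

    A stroke is the run of points between a pen-down and the next
    pen-up signal, as defined above.
    """
    strokes, current = [], []
    for x, y, down in samples:
        if down:
            current.append((x, y))
        elif current:
            strokes.append(Stroke(points=current))
            current = []
    if current:  # pen still down at end of input
        strokes.append(Stroke(points=current))
    return strokes
```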

March 15-17, 2002   Work with student Jong Oh   Davi Geiger, Courant Institute, NYU
                    System Overview

[Block diagram: Input → Pre-processing (high-curvature points) → Segmentation → Recognition Engine → Word Candidates; the Dictionary, Character Recognizer, and Context Models feed the Recognition Engine.]

          Segmentation Hypotheses
• High-curvature points and segmentation points:

[Figure: example strokes with high-curvature points marked as candidate segmentation points.]
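One simple way to find high-curvature candidate points is to threshold the turning angle along the stroke. This is a sketch only; the slides do not specify the detector, and the threshold value is an assumption.

```python
import math

def high_curvature_points(points, angle_thresh_deg=60.0):
    """Return indices where the pen direction turns sharply.

    The turning angle at point i is the angle between the incoming
    direction (i-1 -> i) and the outgoing direction (i -> i+1);
    indices whose angle exceeds the threshold are candidate
    segmentation points.
    """
    idx = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        a = (x1 - x0, y1 - y0)
        b = (x2 - x1, y2 - y1)
        na, nb = math.hypot(*a), math.hypot(*b)
        if na == 0 or nb == 0:  # repeated sample point, no direction
            continue
        cosang = max(-1.0, min(1.0, (a[0] * b[0] + a[1] * b[1]) / (na * nb)))
        if math.degrees(math.acos(cosang)) > angle_thresh_deg:
            idx.append(i)
    return idx
```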
               Character Recognition I
• Fisher Discriminant Analysis (FDA): improves over PCA (Principal Component Analysis).

[Figure: a linear projection p = W^T x maps the original space to the projection space.]

• Training set: 1040 lowercase letters; test set: 520 lowercase letters
• Test results: 91.5% correct
      Fisher Discriminant Analysis
• Between-class scatter matrix

      Φ_B = Σ_{i=1}^{C} N_i (μ_i − μ)(μ_i − μ)^T

     – C: number of classes
     – N_i: number of data vectors in class i
     – μ_i: mean vector of class i; μ: overall mean vector

• Within-class scatter matrix

      Φ_W = Σ_{i=1}^{C} Σ_{j=1}^{N_i} (v_ij − μ_i)(v_ij − μ_i)^T

     – v_ij: j-th data vector of class i.
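The two scatter matrices can be computed directly from their definitions. A minimal NumPy sketch (the function and variable names are mine):

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (Phi_B) and within-class (Phi_W) scatter matrices.

    X: (num_samples, n) array of data vectors; y: class label per sample.
    Phi_B sums N_i (mu_i - mu)(mu_i - mu)^T over classes;
    Phi_W sums (v_ij - mu_i)(v_ij - mu_i)^T over each class's vectors.
    """
    mu = X.mean(axis=0)  # overall mean vector
    n = X.shape[1]
    phi_b = np.zeros((n, n))
    phi_w = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)          # class mean mu_i
        d = (mu_c - mu)[:, None]        # column vector (mu_i - mu)
        phi_b += Xc.shape[0] * d @ d.T  # N_i (mu_i - mu)(mu_i - mu)^T
        phi_w += (Xc - mu_c).T @ (Xc - mu_c)
    return phi_b, phi_w
```

A useful sanity check: Φ_B + Φ_W equals the total scatter Σ (v − μ)(v − μ)^T.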
Given a projection matrix W (of size n by m) and its linear transformation p = W^T v, the between-class scatter in the projection space is

    Ψ_B = Σ_{i=1}^{C} N_i (μ_i' − μ')(μ_i' − μ')^T
        = Σ_{i=1}^{C} N_i (W^T μ_i − W^T μ)(W^T μ_i − W^T μ)^T
        = Σ_{i=1}^{C} N_i (W^T μ_i − W^T μ)(μ_i^T W − μ^T W)
        = Σ_{i=1}^{C} W^T N_i (μ_i − μ)(μ_i^T − μ^T) W
        = W^T [ Σ_{i=1}^{C} N_i (μ_i − μ)(μ_i − μ)^T ] W
        = W^T Φ_B W

Similarly, Ψ_W = W^T Φ_W W.
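The identity Ψ_B = W^T Φ_B W can be checked numerically for arbitrary data and an arbitrary n-by-m projection matrix W (all values below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))       # 30 data vectors in a 4-D original space
y = np.repeat([0, 1, 2], 10)       # three classes, 10 vectors each
W = rng.normal(size=(4, 2))        # arbitrary n-by-m projection matrix

# Between-class scatter Phi_B in the original space.
mu = X.mean(axis=0)
phi_b = sum(X[y == c].shape[0]
            * np.outer(X[y == c].mean(0) - mu, X[y == c].mean(0) - mu)
            for c in np.unique(y))

# Project every sample: each row of P is p^T = v^T W, i.e. p = W^T v.
P = X @ W
mu_p = P.mean(axis=0)
psi_b = sum(P[y == c].shape[0]
            * np.outer(P[y == c].mean(0) - mu_p, P[y == c].mean(0) - mu_p)
            for c in np.unique(y))
```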
Fisher Discriminant Analysis (cont.)
 • Optimization formulation of the Fisher projection solution (Ψ_B, Ψ_W are the scatter matrices in the projection space):

      W_opt = arg max_W |Ψ_B| / |Ψ_W|
            = arg max_W |W^T Φ_B W| / |W^T Φ_W W|
                    FDA (continued)
• Construction of the Fisher projection matrix:
     – Compute the n eigenvalues and eigenvectors of the generalized eigenvalue problem:

           Φ_B y = λ Φ_W y.

     – Retain the m eigenvectors having the largest eigenvalues; they form the columns of the target projection matrix.
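A sketch of this construction using SciPy's generalized symmetric eigensolver. The small regularization of Φ_W is my assumption (it guards against a singular within-class scatter) and is not part of the slides:

```python
import numpy as np
from scipy.linalg import eigh

def fisher_projection(phi_b, phi_w, m, eps=1e-8):
    """Solve Phi_B y = lambda Phi_W y; keep the m leading eigenvectors.

    Returns the n-by-m projection matrix W whose columns are the
    eigenvectors with the largest eigenvalues.
    """
    n = phi_w.shape[0]
    # eigh(a, b) solves a y = lambda b y, eigenvalues in ascending order.
    vals, vecs = eigh(phi_b, phi_w + eps * np.eye(n))
    order = np.argsort(vals)[::-1]      # largest eigenvalues first
    return vecs[:, order[:m]]
```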
     Character Recognition Results
• Training set: 1040 lowercase letters
• Test set: 520 lowercase letters
• Test results:

                              FCM     ECV
     Recognition rate                 91.5%
     Avg. candidate set size          13.6
                              Challenge I
• The problem with the previous approach is that non-characters are classified as characters. Applied to cursive words, this produces too many nonsensical word hypotheses, extracting characters where none seem to exist.
• More generally, one wants to be able to generate shapes and their deformations.
                       Challenge II
• How can reliable local geometric features of images (corners, contour tangents, contour curvature, …) be extracted?
• How should those features be grouped?
• The database to match against a single input is large; how can matching be done fast?
• Hierarchical clustering of the database, possibly over a tree structure or a more general graph: how should it be built, which criteria should drive the clustering, and which methods should be used?
                    Recognition Engine
• Integrates all available information,
  generates and grows the word-level
  hypotheses.
• Most general form: graph and its search.
• Hypothesis Propagation Network




Hypothesis Propagation Network
• Recognition rate of 85% on 100 words (not good).

[Figure: the network plots character classes "a" … "z" against time t (up to T); a hypothesis H(t, m) for class m is grown from class m's legal predecessors within a look-back window (range 1–3), and each node keeps a bounded-length list of hypotheses.]
                       Challenge III
• How can one search more efficiently in this network and, more generally, in Bayesian networks?
     Visual Bigram Models (VBM)
• Some characters are very ambiguous in isolation: "9" and "g", "e" and "l", "o" and "0", etc., but become much clearer in context.

[Figure: character heights, relative height ratio, and positioning distinguish "go" from "90".]
                    VBM: Parameters

[Figure: two adjacent characters with heights h1, h2, tops top1, top2, bottoms bot1, bot2, and normalizing height h.]

• Height Diff. Ratio:   HDR = (h1 − h2) / h
• Top Diff. Ratio:      TDR = (top1 − top2) / h
• Bottom Diff. Ratio:   BDR = (bot1 − bot2) / h
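The three ratios can be computed from two character bounding boxes. The box format and the choice h = max(h1, h2) are assumptions of this sketch; the slides do not define h precisely.

```python
def visual_bigram_features(box1, box2):
    """Compute (HDR, TDR, BDR) for two adjacent character boxes.

    Each box is a hypothetical (top, bottom) pair in page coordinates
    with y increasing downward. The normalizing height h is taken as
    max(h1, h2) -- an assumption, not stated in the slides.
    """
    top1, bot1 = box1
    top2, bot2 = box2
    h1 = bot1 - top1          # first character's height
    h2 = bot2 - top2          # second character's height
    h = max(h1, h2)           # assumed normalizing height
    hdr = (h1 - h2) / h       # Height Diff. Ratio
    tdr = (top1 - top2) / h   # Top Diff. Ratio
    bdr = (bot1 - bot2) / h   # Bottom Diff. Ratio
    return hdr, tdr, bdr
```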
     VBM: Ascendancy Categories

     Category         Members
     Ascender (A)     b, d, f, h, k, l, t
     Descender (D)    f, g, j, p, q, y, z
     None (N)         a, c, e, i, m, n, o, r, s, u, v, w, x

• Total: 9 visual bigram categories (3×3, instead of 26×26 = 676).
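Mapping letter pairs to the 9 bigram categories follows directly from the table. Note that "f" appears in both the Ascender and Descender rows; this sketch arbitrarily resolves it to Ascender.

```python
# Category membership copied from the table above.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("fgjpqyz")

def ascendancy(ch):
    """Return the ascendancy category A, D, or N of a lowercase letter."""
    if ch in ASCENDERS:   # checked first, so "f" resolves to "A"
        return "A"
    if ch in DESCENDERS:
        return "D"
    return "N"

def bigram_category(c1, c2):
    """Map a letter pair to one of the 9 (3 x 3) visual bigram categories."""
    return ascendancy(c1) + ascendancy(c2)
```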
                    VBM: Test Results

     Using VBM           Yes     No
     Recognition rate    93%     85%
     Rank-1              4       8
     Rank-2              1       2
     Rank-3              1       2
     Rank-4              1       3