Graph-based Analysis of Semantic Drift in
Espresso-like Bootstrapping Algorithms
Mamoru Komachi†, Taku Kudo‡, Masashi Shimbo† and Yuji Matsumoto†
†Nara Institute of Science and Technology, Japan
‡Google Inc.

EMNLP 2008

2011/1/31

Bootstrapping and semantic drift
       Bootstrapping grows a small set of seed instances into a larger set

       Seed instance -> Co-occurrence pattern -> Extracted instances
         iPhone -> "Buy # at Apple Store" -> MacBook Air, iPod
         iPod   -> "# for sale"           -> car, house

       Generic patterns
         = Patterns which co-occur with many (irrelevant) instances

Main contributions of this work
1.   Suggest a parallel between semantic drift in Espresso-like
     bootstrapping and topic drift in HITS (Kleinberg, 1999)

2.   Solve semantic drift with graph kernels used in the link
     analysis community

Espresso Algorithm [Pantel and Pennacchiotti, 2006]

       Repeat
           Pattern extraction
           Pattern ranking
           Pattern selection
           Instance extraction
           Instance ranking
           Instance selection
       Until a stopping criterion is met



Espresso uses pattern-instance matrix A
for ranking patterns and instances
A is a |P|×|I|-dimensional matrix holding the (normalized)
pointwise mutual information (pmi) between patterns and
instances
Rows are indexed by patterns p = 1, ..., |P|; columns by instances i = 1, ..., |I|

     [A]_{p,i} = \frac{\mathrm{pmi}(p,i)}{\max_{p,i} \mathrm{pmi}(p,i)}
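
A minimal sketch of how such a matrix could be built from raw co-occurrence counts. The function name `normalized_pmi_matrix`, the toy counts, the use of the standard probability-based pmi, and the choice to set zero-count cells to 0 are all assumptions of this sketch, not details taken from the slides.

```python
import math

def normalized_pmi_matrix(counts):
    """Build A from counts[p][i], the co-occurrence count of pattern p and instance i.

    Assumptions of this sketch: pmi is the standard log(P(p,i) / (P(p) P(i))),
    and cells with a zero count get the value 0.
    """
    row_totals = [sum(row) for row in counts]        # count(p, *)
    col_totals = [sum(col) for col in zip(*counts)]  # count(*, i)
    total = sum(row_totals)
    pmi = [
        [math.log(c * total / (row_totals[p] * col_totals[i])) if c > 0 else 0.0
         for i, c in enumerate(row)]
        for p, row in enumerate(counts)
    ]
    max_pmi = max(max(row) for row in pmi)
    if max_pmi <= 0.0:
        raise ValueError("no positive pmi value to normalize by")
    # Divide every cell by the largest pmi so that the maximum entry is 1.
    return [[v / max_pmi for v in row] for row in pmi]

# Hypothetical toy counts: 2 patterns (rows) x 3 instances (columns).
counts = [[8, 1, 0],
          [1, 4, 6]]
A = normalized_pmi_matrix(counts)
```

Pairs that co-occur less often than chance get negative entries under this definition; whether to clip them to 0 is a design choice the slides do not specify.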


Pattern/instance ranking in Espresso
p = pattern score vector
i = instance score vector
A = pattern-instance matrix

Reliable instances are supported by reliable patterns, and vice versa

     p \leftarrow \frac{1}{|I|} A i      ... pattern ranking
     i \leftarrow \frac{1}{|P|} A^T p    ... instance ranking

 |P| = number of patterns
 |I| = number of instances
 1/|I| and 1/|P| are normalization factors that keep the score vectors from growing too large

Espresso Algorithm [Pantel and Pennacchiotti, 2006]

       Repeat
           Pattern extraction
           Pattern ranking
           Pattern selection
           Instance extraction
           Instance ranking
           Instance selection
       Until a stopping criterion is met

For graph-theoretic analysis, we will introduce 3 simplifications to Espresso

First simplification
Simplification 1
  Remove the pattern/instance extraction steps
  Instead, pre-compute all patterns and instances once at the beginning of the algorithm

       Compute the pattern-instance matrix
       Repeat
           Pattern extraction    (removed)
           Pattern ranking
           Pattern selection
           Instance extraction   (removed)
           Instance ranking
           Instance selection
       Until a stopping criterion is met

Second simplification
Simplification 2
  Remove the pattern/instance selection steps, which retain only the
  highest-scoring k patterns / m instances for the next iteration
  (i.e., reset the scores of all other items to 0)
  Instead, retain the scores of all patterns and instances

       Compute the pattern-instance matrix
       Repeat
           Pattern ranking
           Pattern selection     (removed)
           Instance ranking
           Instance selection    (removed)
       Until a stopping criterion is met

Third simplification
Simplification 3
  No early stopping, i.e., run until convergence

        Compute the pattern-instance matrix
        Repeat
            Pattern ranking
            Instance ranking
        Until score vectors p and i converge
        (replaces "Until a stopping criterion is met")

Simplified Espresso
Input
   Initial score vector of seed instances, e.g., i = (0, 1, 0, 0)
   Pattern-instance co-occurrence matrix A

Main loop
   Repeat
       p \leftarrow \frac{1}{|I|} A i      ... pattern ranking
       i \leftarrow \frac{1}{|P|} A^T p    ... instance ranking
   Until i and p converge

Output
   Instance and pattern score vectors i and p
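
The loop above can be sketched in plain Python as follows. The function names, the toy matrix, the tolerance, and the rescaling of both vectors to unit length after every update are assumptions of this sketch. The rescaling is added only so that "converge" is well defined; it changes the scale of the scores, not the ranking.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def unit(v):
    """Rescale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0.0 else list(v)

def simplified_espresso(A, seed, tol=1e-12, max_iter=10000):
    """Iterate p <- A i / |I| and i <- A^T p / |P| until both vectors stop changing."""
    n_patterns, n_instances = len(A), len(A[0])
    A_t = [list(col) for col in zip(*A)]
    i_vec = unit(seed)
    p_vec = [0.0] * n_patterns
    for _ in range(max_iter):
        p_new = unit([x / n_instances for x in mat_vec(A, i_vec)])   # pattern ranking
        i_new = unit([x / n_patterns for x in mat_vec(A_t, p_new)])  # instance ranking
        change = max(abs(a - b) for a, b in zip(i_new + p_new, i_vec + p_vec))
        i_vec, p_vec = i_new, p_new
        if change < tol:
            break
    return i_vec, p_vec

# Hypothetical toy matrix: 2 patterns x 3 instances.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
i_from_first, _ = simplified_espresso(A, [1.0, 0.0, 0.0])
i_from_last, _ = simplified_espresso(A, [0.0, 0.0, 1.0])
```

On this toy matrix both seeds end with the middle instance ranked first, even though neither seed vector gives it any initial score.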

Simplified Espresso is HITS
Simplified Espresso = HITS on a bipartite graph whose
                      adjacency matrix is A

Problem: The ranking vector i tends to the principal
         eigenvector of A^T A as the iteration proceeds,
         regardless of the seed instances!

No matter which seed you start with, the same instance is always
ranked topmost
This is semantic drift (also called topic drift in HITS)
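
Why the seed is forgotten can be sketched in a few lines. The symbols \lambda_j, v_j, c and the superscript (k) for the iteration count are introduced here for illustration; the sketch assumes that A^T A has a unique largest eigenvalue \lambda_1 and that the seed vector i^{(0)} is not orthogonal to the corresponding eigenvector v_1.

```latex
\begin{align*}
i^{(k+1)} &= \frac{1}{|P|} A^{T} p^{(k+1)} = \frac{1}{|P|\,|I|}\, A^{T} A \, i^{(k)}
  \quad\Longrightarrow\quad
  i^{(k)} = c^{k} (A^{T} A)^{k} \, i^{(0)}, \qquad c = \frac{1}{|P|\,|I|} \\
A^{T} A &= \sum_{j} \lambda_{j} \, v_{j} v_{j}^{T}, \qquad
  \lambda_{1} > \lambda_{2} \ge \dots \ge 0 \\
i^{(k)} &= (c \lambda_{1})^{k} \Bigl[ (v_{1}^{T} i^{(0)}) \, v_{1}
  + \sum_{j \ge 2} \Bigl( \tfrac{\lambda_{j}}{\lambda_{1}} \Bigr)^{k} (v_{j}^{T} i^{(0)}) \, v_{j} \Bigr]
\end{align*}
```

Since every ratio (\lambda_j / \lambda_1)^k tends to 0, the direction of i^{(k)} approaches v_1; the seed only affects the overall scale factor.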



How about the original Espresso?
Original Espresso has heuristics not present in Simplified
 Espresso

     Early stopping
     Pattern and instance selection

Do these heuristics really help reduce semantic drift?




Experiments on semantic drift

Do the heuristics in the original Espresso help reduce drift?

Word sense disambiguation task of
Senseval-3 English Lexical Sample

Predict the sense of “bank”

Training instances are annotated with their sense:

  … the financial benefits of the bank (finance) 's
  employee package ( cheap mortgages and pensions,
  etc ) , bring this up to …

  In that same year I was posted to South Shields on the
  south bank (bank of the river) of the River Tyne and
  quickly became aware that I had an enormous burden

Predict the sense of the target word in the test set:

  Possibly aligned to water a sort of bank (???) by a
  rushing river.

Word sense disambiguation by Original
Espresso

Seed instance = the instance whose sense is to be predicted
System output = k-nearest neighbor (k=3)
Heuristics of Espresso
 Pattern and instance selection
   # of patterns to retain p=20 (increase p by 1 on each iteration)
   # of instances to retain m=100 (increase m by 100 on each iteration)
 Early stopping
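
The selection heuristic can be sketched as below, under stated assumptions: the helper names `keep_top` and `selection_sizes` are hypothetical, iterations are counted from 0, and ties are broken by index. The slides specify only the starting sizes and the per-iteration increments.

```python
def keep_top(scores, k):
    """Keep the k highest scores and reset all others to 0 (ties: lower index wins)."""
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    kept = set(ranked[:k])
    return [s if j in kept else 0.0 for j, s in enumerate(scores)]

def selection_sizes(iteration, p0=20, m0=100, p_step=1, m_step=100):
    """Number of patterns and instances to retain at a 0-based iteration."""
    return p0 + p_step * iteration, m0 + m_step * iteration

pattern_scores = [0.1, 0.9, 0.5, 0.7]
print(keep_top(pattern_scores, 2))  # [0.0, 0.9, 0.0, 0.7]
print(selection_sizes(3))           # (23, 400)
```

Applying `keep_top` to the pattern scores after pattern ranking, and to the instance scores after instance ranking, gives the "selection" steps that Simplified Espresso drops.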

Convergence process of Espresso
[Figure: recall (y-axis, 0.5 to 1) vs. iteration (x-axis, 1 to 26) for Original Espresso, Simplified Espresso, and the most frequent sense baseline]
Heuristics in Espresso help reduce semantic drift
(However, early stopping is required for optimal performance)
Simplified Espresso: semantic drift occurs (always outputs the most frequent sense)
Most frequent sense (baseline): outputs the most frequent sense regardless of input

Learning curve of Original Espresso:
per-sense breakdown
[Figure: recall (y-axis, 0.3 to 1) vs. iteration (x-axis, 1 to 26), one curve for the most frequent sense and one for other senses]
# of most frequent sense predictions increases
Recall for infrequent senses worsens even with original Espresso

Summary: Espresso and semantic drift
Semantic drift happens because
 Espresso is designed like HITS
 HITS gives the same ranking list regardless of seeds


Some heuristics reduce semantic drift
 Early stopping is crucial for optimal performance


Still, these heuristics require
 many parameters to be calibrated
 but calibration is difficult

Main contributions of this work
1.    Suggest a parallel between semantic drift in Espresso-like
      bootstrapping and topic drift in HITS (Kleinberg, 1999)

2.    Solve semantic drift with graph kernels used in the link
      analysis community

Q. What caused drift in Espresso?
A. Espresso's resemblance to HITS

HITS is an importance computation method
(gives a single ranking list for any seeds)

Why not use another type of link analysis measure,
   one that takes seeds into account?
   A "relatedness" measure
(it gives different rankings for different seeds)

The regularized Laplacian kernel
    A relatedness measure
    Takes higher-order relations into account
    Has only one parameter

Graph Laplacian
    L = D - A
    A: adjacency matrix of the graph
    D: (diagonal) degree matrix

Regularized Laplacian matrix
    R_\beta = \sum_{n=0}^{\infty} \beta^n (-L)^n = (I + \beta L)^{-1}

    β: parameter
    Each column of R_β gives the rankings relative to a node
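
A minimal sketch of computing one column of R_β, assuming the graph is given as a symmetric, non-negative adjacency (similarity) matrix. The function name, the toy path graph, and the choice to solve the linear system (I + βL) x = e_seed by Gauss-Jordan elimination instead of forming the full inverse are assumptions of this sketch.

```python
def regularized_laplacian_column(S, seed_index, beta=0.01):
    """Return column seed_index of R_beta = (I + beta * L)^-1, where L = D - S.

    S is assumed symmetric and non-negative; D is its diagonal degree matrix.
    """
    n = len(S)
    # Build the augmented system [I + beta * (D - S) | e_seed].
    M = []
    for r in range(n):
        degree = sum(S[r])
        row = [-beta * S[r][c] for c in range(n)]
        row[r] += 1.0 + beta * degree
        row.append(1.0 if r == seed_index else 0.0)
        M.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
S = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
scores_seed0 = regularized_laplacian_column(S, 0)
scores_seed3 = regularized_laplacian_column(S, 3)
```

On this path graph the scores fall off with distance from the seed node, so seeds 0 and 3 produce opposite rankings. This is the seed-dependent behaviour that a relatedness measure is meant to provide.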

                 
Evaluation of regularized Laplacian

             Comparison to (Agirre et al. 2006)



WSD on all nouns in Senseval-3

      Algorithm                          F measure
      Most frequent sense (baseline)       54.5
      HyperLex                             64.6
      PageRank                             64.6
      Simplified Espresso                  44.1
      Espresso (after convergence)         46.9
      Espresso (optimal stopping)          66.5
      Regularized Laplacian (β=10^-2)      67.1

Espresso needs optimal stopping to achieve an equivalent performance
Regularized Laplacian outperforms other graph-based methods

Conclusions

    Semantic drift in Espresso parallels topic drift in HITS

    The regularized Laplacian reduces semantic drift
        it is inherently a relatedness measure (≠ importance measure)

Future work
    Investigate if a similar analysis is applicable to a wider
     class of bootstrapping algorithms (including co-training)




    Try other popular tasks of bootstrapping such as named
     entity extraction
        Selection of seed instances matters




Label prediction of “bank” (F measure)

Algorithm                          Most frequent sense   Other senses
Simplified Espresso                       100.0               0.0
Espresso (after convergence)              100.0              30.2
Espresso (optimal stopping)                94.4              67.4
Regularized Laplacian (β=10^-2)            92.1              62.8

Espresso suffers from semantic drift (unless stopped at optimal stage)
The regularized Laplacian keeps high recall for infrequent senses

Regularized Laplacian is mostly stable
across values of its parameter β

Pattern/instance ranking in Espresso

Score for pattern p
    r_\pi(p) = \frac{1}{|I|} \sum_{i \in I} \frac{\mathrm{pmi}(p,i)}{\max \mathrm{pmi}} \, r_\iota(i)

Score for instance i
    r_\iota(i) = \frac{1}{|P|} \sum_{p \in P} \frac{\mathrm{pmi}(p,i)}{\max \mathrm{pmi}} \, r_\pi(p)

Pointwise mutual information
    \mathrm{pmi}(p,i) = \log \frac{|x, p, y|}{|x, *, y| \, |*, p, *|}

p: pattern
i: instance
P: set of patterns
I: set of instances
pmi: pointwise mutual information
max pmi: max of pmi in all the patterns and instances

				