
A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch
Roberto Navigli, Paola Velardi and Stefano Faralli
{navigli,velardi,faralli}@di.uniroma1.it




                     http://lcl.uniroma1.it

ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI)
Roberto Navigli

Motivations


We present a graph-based approach to learn a
lexical taxonomy automatically starting from a domain
corpus and the Web.

Unlike other approaches, we learn both concepts
and relations entirely from scratch in 3 steps:
1) term extraction
2) definition and hypernym extraction
3) graph pruning



Taxonomy Learning Workflow

Domain Corpus → Terminology extraction → Domain terms → Definition & hypernym extraction (from Web glossaries & documents) → Domain filtering → Hypernym graph → Graph pruning → Induced taxonomy
Further input: Upper terms


                      Terminology Extraction

Domain Corpus → Domain terms, e.g.: maximum likelihood, flow network, mesh generation, hash function, pattern recognition, information processing


http://lcl.uniroma1.it/termextractor
Definition & Hypernym Extraction + Domain Filtering

Inputs: Domain terms (e.g. flow network), Domain Corpus, Web glossaries & documents.
Definition extraction (WCL) collects candidate definitions for each term, which are then labelled as domain or non-domain:
  domain:      In graph theory, a flow network is a directed graph.
  non-domain:  Global Cash Flow Network is a business opportunity to make money online.
  domain:      A flow network is a network with two distinguished vertices.
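The slides do not say how a definition is judged to be in-domain. The following is a minimal sketch under a hypothetical scoring rule: the domain weight of a definition is the fraction of its content words found in a domain vocabulary, and definitions below a threshold are dropped. `DOMAIN_VOCAB`, `STOPWORDS`, the 0.5 threshold and both function names are assumptions made for illustration.

```python
import re

# Hypothetical domain vocabulary and stopword list, for illustration only.
DOMAIN_VOCAB = {"graph", "directed", "network", "flow", "vertices", "theory"}
STOPWORDS = {"a", "an", "the", "in", "is", "of", "to", "with"}

def domain_weight(definition: str) -> float:
    """Fraction of content words in `definition` that belong to DOMAIN_VOCAB."""
    words = [w for w in re.findall(r"[a-z]+", definition.lower()) if w not in STOPWORDS]
    if not words:
        return 0.0
    return sum(w in DOMAIN_VOCAB for w in words) / len(words)

def filter_definitions(definitions, threshold=0.5):
    """Keep only the definitions whose domain weight reaches the threshold."""
    return [d for d in definitions if domain_weight(d) >= threshold]

candidates = [
    "In graph theory, a flow network is a directed graph.",
    "Global Cash Flow Network is a business opportunity to make money online.",
    "A flow network is a network with two distinguished vertices.",
]
for d in candidates:
    print(f"{domain_weight(d):.2f}  {d}")
print(filter_definitions(candidates))
```

With these assumed values the two graph-theory definitions score 1.00 and 0.67 and are kept, while the business definition scores 0.22 and is discarded. It still scores above zero because it shares the words of the term itself ("flow", "network"), which is why some threshold is needed.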




Definition & Hypernym Extraction + Domain Filtering

Hypernym extraction is then applied to the domain definitions of flow network:
  In graph theory, a flow network is a directed graph.           → hypernym: directed graph
  A flow network is a network with two distinguished vertices.   → hypernym: network
Hypernym graph edges added: flow network → directed graph, flow network → network.




Definition & Hypernym Extraction + Domain Filtering

The terms from the previous iteration (e.g. directed graph) become the new input, and definition extraction (WCL) is run again over the Domain Corpus and Web glossaries & documents:
  A directed graph is a graph where ...          → hypernym: graph
  A directed graph is a data structure ...       → hypernym: data structure
The hypernym graph grows accordingly: flow network → directed graph, network; directed graph → graph, data structure.
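The iteration can be sketched as a breadth-first expansion. Each round looks up hypernyms for the current frontier of terms and adds the corresponding edges. Hypernyms not seen before then become the next round's input. This is a minimal sketch, not the authors' implementation. `extract_hypernyms` is a hypothetical stand-in for definition extraction, domain filtering and hypernym extraction, implemented as a lookup table holding the slide's example. The rule that upper terms are kept but not expanded further is an assumption. The default of 5 iterations mirrors the ACL experiment.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical lookup table standing in for WCL + hypernym extraction;
# the entries reproduce the flow network example from the slides.
EXAMPLE_HYPERNYMS = {
    "flow network": ["directed graph", "network"],
    "directed graph": ["graph", "data structure"],
}

def extract_hypernyms(term: str) -> list:
    return EXAMPLE_HYPERNYMS.get(term, [])

def grow_hypernym_graph(
    domain_terms: Iterable[str],
    upper_terms: Set[str],
    extract: Callable[[str], list],
    max_iterations: int = 5,
) -> Set[Tuple[str, str]]:
    """Return the (hyponym, hypernym) edges collected over the iterations."""
    edges: Set[Tuple[str, str]] = set()
    seen = set(domain_terms)
    frontier = set(domain_terms)
    for _ in range(max_iterations):
        next_frontier = set()
        for term in frontier:
            for hyper in extract(term):
                edges.add((term, hyper))
                # Assumption: upper terms become nodes but are not expanded.
                if hyper not in seen and hyper not in upper_terms:
                    next_frontier.add(hyper)
                seen.add(hyper)
        if not next_frontier:
            break  # no new terms to define: the graph has stopped growing
        frontier = next_frontier
    return edges

# "data structure" is treated as a (hypothetical) upper term here.
edges = grow_hypernym_graph({"flow network"}, {"data structure"}, extract_hypernyms)
for hypo, hyper in sorted(edges):
    print(f"{hypo} -> {hyper}")
```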



      Hypernym Extraction Algorithm (1)

• A large training set containing many uncommon patterns, e.g. “X is a ADJ
  term that refers to a kind of Y”
• Each sentence is annotated with 4 fields: the definiendum (D), the definitor (V)
  containing the verbal pattern, the definiens (H) containing the hypernym,
  and the rest of the sentence (R):
   – [An <acid> (often represented by the generic formula HA)]D [is
     traditionally considered]V [any chemical compound]H [that, when dissolved
     in water, gives a solution with a hydrogen ion activity greater than in pure
     water]R.
• The algorithm builds a set of word lattices from the training set;
  independent lattices are created for each of the 3 basic fields (D, V and H).



      Hypernym Extraction Algorithm (2)

Lattice learning consists of three steps:
1. Star patterns: each sentence in the training set is preprocessed and
   each field is generalized to a star pattern, e.g.
“[In arts, a chiaroscuro]D [is]V [a monochrome
   picture]H.”
D=“In *, a <TARGET>”, V=“is”, H=“a * <HYPER>”
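A minimal sketch of the generalization step. It assumes that words outside a set of frequent words become `*`, the defined term becomes `<TARGET>` and the hypernym becomes `<HYPER>`. The `FREQUENT` set is hypothetical; in practice it would be derived from the training set. Collapsing a run of rare words into one `*` is a choice made here so that multiword spans such as "computer science" map to a single star. The target and hypernym are matched as single tokens only.

```python
import re
from typing import Optional

# Hypothetical set of frequent words kept verbatim in star patterns.
FREQUENT = {"in", "a", "an", "the", "is", "of", "that", ","}

def star_pattern(field: str, target: Optional[str] = None,
                 hyper: Optional[str] = None) -> str:
    """Generalize one annotated field (D, V or H) to a star pattern."""
    tokens = re.findall(r"\w+|[^\w\s]", field)  # words and punctuation marks
    out = []
    for tok in tokens:
        if tok == target:
            out.append("<TARGET>")
        elif tok == hyper:
            out.append("<HYPER>")
        elif tok.lower() in FREQUENT:
            out.append(tok)
        elif not out or out[-1] != "*":
            out.append("*")  # a run of rare words becomes a single star
    # Re-attach commas to the preceding token: "* ," -> "*,".
    return " ".join(out).replace(" ,", ",")

print(star_pattern("In arts, a chiaroscuro", target="chiaroscuro"))
print(star_pattern("is"))
print(star_pattern("a monochrome picture", hyper="picture"))
```

Under these assumptions the three calls reproduce the D, V and H patterns shown on the slide.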



   Hypernym Extraction Algorithm (3)

2. Clustering: for each field, the training sentences are
   then clustered according to the star patterns they
   belong to, e.g.:

In arts, a chiaroscuro is a monochrome picture.
In mathematics, a graph is a data structure that consists of . . .
In computer science, a pixel is a dot that is part of a computer image.


D: In * , a <TARGET>
V: is
H: a * <HYPER>
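The clustering step amounts to grouping training sentences by the star pattern of a given field. A minimal sketch, assuming the star patterns have already been computed. The D and V patterns of the three example sentences are taken from the slide. The fourth training item and its pattern "A <TARGET>" are hypothetical and are included only to produce a second cluster. That pattern treats the multiword term "flow network" as one target.

```python
from collections import defaultdict

# Training items with precomputed star patterns. The first three come from the
# slide; the last one is a hypothetical extra example.
TRAINING = [
    {"sentence": "In arts, a chiaroscuro is a monochrome picture.",
     "patterns": {"D": "In *, a <TARGET>", "V": "is"}},
    {"sentence": "In mathematics, a graph is a data structure that consists of ...",
     "patterns": {"D": "In *, a <TARGET>", "V": "is"}},
    {"sentence": "In computer science, a pixel is a dot that is part of a computer image.",
     "patterns": {"D": "In *, a <TARGET>", "V": "is"}},
    {"sentence": "A flow network is a directed graph.",
     "patterns": {"D": "A <TARGET>", "V": "is"}},
]

def cluster_by_field(training, field):
    """Group training sentences by the star pattern of one field (e.g. "D")."""
    clusters = defaultdict(list)
    for item in training:
        clusters[item["patterns"][field]].append(item["sentence"])
    return dict(clusters)

for pattern, sentences in cluster_by_field(TRAINING, "D").items():
    print(pattern, len(sentences))
```

Clustering is done separately per field. In this toy data the D field yields two clusters, while all four sentences fall into a single V cluster ("is").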

     Hypernym Extraction Algorithm (4)

3. Word-Class Lattice (WCL) construction: for each
  sentence cluster, a WCL is created by means of
  a greedy alignment algorithm.
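The slides do not detail the greedy alignment, so the following is not a WCL. It is a much simpler stand-in, sketched only to show what a learned pattern is used for. A single D/V/H star-pattern triple is compiled into a regular expression that must match a whole sentence. Here `*` matches one or more words, and `<TARGET>` and `<HYPER>` each capture a single word. A real WCL generalizes over all the sentences of a cluster and is far less rigid. `star_to_regex` and `match_definition` are hypothetical names.

```python
import re

def star_to_regex(pattern: str):
    """Compile a star pattern: '*' = one or more words, <TARGET>/<HYPER> = one word each."""
    parts = re.split(r"(\*|<TARGET>|<HYPER>)", pattern)  # keeps the placeholders
    out = []
    for part in parts:
        if part == "*":
            out.append(r"\w+(?: \w+)*?")
        elif part == "<TARGET>":
            out.append(r"(?P<target>\w+)")
        elif part == "<HYPER>":
            out.append(r"(?P<hyper>\w+)")
        else:
            out.append(re.escape(part))  # literal text such as ", a " or " is a "
    return re.compile("".join(out))

def match_definition(sentence: str, d: str, v: str, h: str):
    """Return (target, hypernym) if the whole sentence fits D V H, else None."""
    regex = star_to_regex(f"{d} {v} {h}")
    m = regex.fullmatch(sentence.strip().rstrip("."))
    return (m.group("target"), m.group("hyper")) if m else None

print(match_definition("In arts, a chiaroscuro is a monochrome picture.",
                       "In *, a <TARGET>", "is", "a * <HYPER>"))
```

Because `<HYPER>` captures one word, a multiword hypernym such as "data structure" would be truncated to "structure". This is one of the rigidities that lattices learned from many sentences are meant to avoid.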




  Performance in definition extraction




[Result tables: Wikipedia; UKWac corpus]

Our approach outperforms existing methods for definition extraction.




     Precision in hypernym extraction




[Result tables: Wikipedia; UKWac]

Pattern-based methods achieve much lower recall: they extract 62 hypernyms
from UKWac, against 383 for our approach.

The iterative growth of the hypernym graph

Graph Pruning

Given the hypernym graph:
1) We disconnect false roots and false leaves.
2) We weight edges and nodes with a novel weighting algorithm.
3) We apply the Chu-Liu/Edmonds algorithm to obtain an optimal branching.

As a result, we obtain a tree-like taxonomy.
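Step 3 can be illustrated with a compact, recursive version of the Chu-Liu/Edmonds algorithm for a maximum-weight arborescence (optimal branching) rooted at a given node. This is a sketch, not the authors' code. The edge weights below are made up, because the slides do not give the weighting formula. The function also assumes that every non-root node has at least one incoming edge. Edges go from hypernym to hyponym, so in the result each node keeps exactly one hypernym.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]  # (hypernym, hyponym)

def find_cycle(parent: Dict[Hashable, Hashable]) -> Optional[List[Hashable]]:
    """Return the nodes of a cycle in the child -> parent map, or None."""
    for start in parent:
        position: Dict[Hashable, int] = {}
        path: List[Hashable] = []
        node = start
        while node in parent and node not in position:
            position[node] = len(path)
            path.append(node)
            node = parent[node]
        if node in position:
            return path[position[node]:]
    return None

def max_arborescence(edges: Dict[Edge, float], root: Hashable) -> Dict[Hashable, Hashable]:
    """Chu-Liu/Edmonds: maximum-weight arborescence rooted at `root` (child -> parent)."""
    # 1. Pick the heaviest incoming edge of every non-root node.
    best: Dict[Hashable, Hashable] = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in best or w > edges[(best[v], v)]:
            best[v] = u
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # 2. Contract the cycle into one fresh node and re-weight the edges.
    in_cycle = set(cycle)
    merged = ("contracted", frozenset(in_cycle))
    new_edges: Dict[Edge, float] = {}
    origin: Dict[Edge, Edge] = {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            # Entering the cycle at v means dropping v's own cycle edge.
            key, w = (u, merged), w - edges[(best[v], v)]
        elif u in in_cycle:
            key = (merged, v)
        else:
            key = (u, v)
        if key not in new_edges or w > new_edges[key]:
            new_edges[key] = w
            origin[key] = (u, v)
    # 3. Solve the contracted problem, then expand the cycle again.
    sub = max_arborescence(new_edges, root)
    result: Dict[Hashable, Hashable] = {}
    for child, parent in sub.items():
        u, v = origin[(parent, child)]
        result[v] = u
    entry = origin[(sub[merged], merged)][1]  # cycle node where the tree enters
    for v in in_cycle:
        if v != entry:
            result[v] = best[v]
    return result

# Hypothetical noisy hypernym graph with a cycle between graph and network.
edges = {
    ("root", "graph"): 2, ("root", "network"): 1,
    ("graph", "network"): 5, ("network", "graph"): 5,
    ("graph", "flow network"): 3, ("network", "flow network"): 1,
}
print(max_arborescence(edges, "root"))
```

With these weights the cycle between graph and network is broken by attaching graph to root (total weight 10, against 9 for attaching network to root). Both network and flow network end up under graph.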

                 From the Noisy Hypernym Graph...




           Application to ACL Taxonomy

• ACL Anthology from 1979 to 2010 (4,176
  papers).
• 29 upper terms from WordNet’s abstraction hierarchy
• 10,000 terms extracted; the first 2,000 were inspected and
  1,006 selected (eliminated terms include, e.g., word pair, input
  sentence, human judgement)
• 5 iterations: 1,329 definitions, yielding a hypernym graph with 1,031 nodes
  and 1,274 edges
• After pruning: 936 nodes and 935 edges (one edge fewer than nodes, as
  expected for a tree)
                Evaluation (5 annotators)




      We also applied the algorithm starting from the
      IJCAI collection, with similar results.



Evaluation: WordNet reconstruction




• Same evaluation strategy as Kozareva & Hovy
  (EMNLP 2010)
• Only nodes that appear both in WordNet and in the acquired
  taxonomy are considered in the evaluation (as
  in K&H)
                                   Future work

• From a “strict” taxonomy to a lattice
• An in-house implementation of Google “define” to
  overcome search limitations (there is no API for Google
  define)
• Extension to other languages




http://lcl.uniroma1.it
      April 2011
Legend: initial terminology; upper terms; hypernyms from iterations I, II, III, IV and V

								