A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch
Roberto Navigli, Paola Velardi and Stefano Faralli
{navigli,velardi,faralli}@di.uniroma1.it
http://lcl.uniroma1.it
ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI)

Motivations
We present a graph-based approach to learn a lexical taxonomy automatically, starting from a domain corpus and the Web. Unlike other approaches, we learn both concepts and relations entirely from scratch in 3 steps:
1) term extraction
2) definition and hypernym extraction
3) graph pruning

Taxonomy Learning Workflow
[Figure: workflow diagram. Inputs: Domain Corpus, Web glossaries & documents, Upper terms. Stages: Terminology extraction (Domain terms), Definition & hypernym extraction, Domain filtering (Hypernym graph), Graph pruning (Induced taxonomy).]

Terminology Extraction
From the Domain Corpus to Domain terms, e.g.: maximum likelihood, flow network, mesh generation, hash function, pattern recognition, information processing.
http://lcl.uniroma1.it/termextractor

Definition & Hypernym Extraction + Domain Filtering
Inputs: Domain Corpus (domain terms, e.g. flow network); Web glossaries & documents.
Definition extraction (WCL) retrieves candidate definitions, which are labelled as domain or non domain:
• domain: In graph theory, a flow network is a directed graph.
• non domain: Global Cash Flow Network is a business opportunity to make money online.
• domain: A flow network is a network with two distinguished vertices.
Hypernym extraction on the two domain definitions yields directed graph and network as hypernyms of flow network.
In the next iteration, the terms from the previous iteration (directed graph, network) are processed in the same way:
• A directed graph is a graph where ...
• A directed graph is a data structure ...
Hypernym extraction then yields graph and data structure, which extend the hypernym graph.

Hypernym Extraction Algorithm (1)
• Large training set with many uncommon patterns, e.g. “X is a ADJ term that refers to a kind of Y”
• Annotated with 4 fields: definiendum (D), definitor (V) containing the verbal pattern, definiens (H) containing the hypernym, and the rest of the sentence (R).
– An <acid> (often represented by the generic formula HA) / is traditionally considered / any chemical compound / that, when dissolved in water, gives a solution with a hydrogen ion activity greater than in pure water
• The algorithm builds a set of word lattices from the training set.
Independent lattices are created for each of the 3 basic fields (D, V and H).

Hypernym Extraction Algorithm (2)
Lattice learning consists of three steps:
1. Star patterns: each sentence in the training set is pre-processed and each field is generalized to a star pattern.
“[In arts, a chiaroscuro]D [is]V [a monochrome picture]H.”
D = “In *, a <TARGET>”, V = “is”, H = “a * <HYPER>”

Hypernym Extraction Algorithm (3)
2. Clustering: for each field, the training sentences are then clustered according to the star patterns they belong to.
• In arts, a chiaroscuro is a monochrome picture.
• In mathematics, a graph is a data structure that consists of . . .
• In computer science, a pixel is a dot that is part of a computer image.
D: In *, a <TARGET>   V: is   H: a * <HYPER>

Hypernym Extraction Algorithm (4)
3. Word-Class Lattice construction: for each sentence cluster, a WCL is created by means of a greedy alignment algorithm.

Performance in definition extraction
[Results table not recoverable. Datasets: Wikipedia, UKWac corpus.]
Outperforms existing methods for definition extraction.

Precision in hypernym extraction
[Results table not recoverable. Datasets: Wikipedia, UKWac.]
Pattern-based methods achieve much lower recall: 62 vs. 383 hypernyms extracted from UKWac.

The iterative growth of the hypernym graph
[Figure not recoverable.]

Graph Pruning
Given the hypernym graph:
1) We disconnect false roots and false leaves.
2) We weight edges and nodes with a novel weighting algorithm.
3) We apply the Chu-Liu/Edmonds algorithm to obtain an Optimal Branching.
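
Step 3 of the pruning procedure can be illustrated with a small sketch. It is not the authors' implementation: it assumes edges point from hypernym to hyponym, that a single root is known in advance (which reduces optimal branching to a maximum-weight spanning arborescence), and that the edge weights have already been computed. The weights, the example edges and all function names below are hypothetical.

```python
from itertools import count

_fresh = count()  # ids for contracted cycle nodes


def _find_cycle(best):
    """Return the nodes of one cycle among the chosen incoming edges, or None."""
    seen = {}  # node -> start node of the walk that first reached it
    for start in best:
        path, v = [], start
        while v in best and v not in seen:
            seen[v] = start
            path.append(v)
            v = best[v][0]  # step to the chosen parent
        if v in best and seen[v] == start:  # walked back onto the current path
            return path[path.index(v):]
    return None


def _solve(edges, root):
    # Greedy step: keep the heaviest incoming edge of every non-root node.
    best = {}
    for e in edges:
        u, v, w, _ = e
        if v != root and u != v and (v not in best or w > best[v][2]):
            best[v] = e
    cycle = _find_cycle(best)
    if cycle is None:
        return list(best.values())
    # Contract the cycle into one node; an edge entering the cycle is
    # re-weighted by what is lost when it replaces the cycle edge into v.
    cyc = set(cycle)
    node = ("cycle", next(_fresh))
    contracted = []
    for e in edges:
        u, v, w, _ = e
        if u in cyc and v in cyc:
            continue
        if v in cyc:
            contracted.append((u, node, w - best[v][2], e))
        elif u in cyc:
            contracted.append((node, v, w, e))
        else:
            contracted.append((u, v, w, e))
    # Solve the smaller problem, then map its edges back to this level.
    result = [ce[3] for ce in _solve(contracted, root)]
    entered = {e[1] for e in result if e[1] in cyc}
    if not entered:  # cycle unreachable from the root: drop its weakest edge
        entered = {min(cyc, key=lambda v: best[v][2])}
    return result + [best[v] for v in cyc if v not in entered]


def max_arborescence(edges, root):
    """edges: iterable of (hypernym, hyponym, weight); returns the kept edges."""
    chosen = _solve([(u, v, w, None) for u, v, w in edges], root)
    return sorted((u, v, w) for u, v, w, _ in chosen)


# Hypothetical noisy hypernym graph with made-up weights.
edges = [
    ("data structure", "graph", 5.0),
    ("graph", "directed graph", 4.0),
    ("data structure", "directed graph", 2.0),
    ("directed graph", "flow network", 3.0),
    ("network", "flow network", 4.0),
    ("graph", "network", 2.0),
    ("flow network", "network", 2.5),  # noisy edge that creates a cycle
]
taxonomy = max_arborescence(edges, "data structure")
for hypernym, hyponym, weight in taxonomy:
    print(f"{hypernym} -> {hyponym} ({weight})")
```

In this toy graph the greedy choice alone would keep both network -> flow network and flow network -> network, which is a cycle. Contracting the cycle lets the sketch break it at the cheapest point, so every node except the root ends up with exactly one hypernym.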
As a result we obtain a tree-like taxonomy.

From the Noisy Hypernym Graph...
[Figure not recoverable.]

Application to ACL Taxonomy
• ACL Anthology from 1979 to 2010 (4176 papers).
• 29 upper terms from WordNet’s abstraction
• 10,000 terms extracted, first 2000 inspected, 1006 selected (eliminated e.g.: word pair, input sentence, human judgement)
• 5 iterations, 1329 definitions, 1031 nodes, 1274 edges
• After pruning: 936 nodes, 935 edges

Evaluation (5 annotators)
[Results table not recoverable.]
Another application of the algorithm, starting from the IJCAI collection, gave similar results.

Evaluation: WordNet reconstruction
• Same evaluation strategy as in Kozareva & Hovy (EMNLP 2010)
• Only nodes present both in WordNet and in the acquired taxonomy are considered in the evaluation (as in K&H)

Future work
• From “strict” taxonomy to lattice
• An in-house implementation of Google “define” to overcome search limitations (no API for Google define)
• Extension to other languages

http://lcl.uniroma1.it
April 2011

Figure legend: Initial terminology; Upper terms; Hypernyms from iteration I; Hypernyms from iteration II; Hypernyms from iteration III; Hypernyms from iteration IV; Hypernyms from iteration V