Rogelio Nazar & Maarten Janssen
IULA, Universitat Pompeu Fabra, Barcelona
• Dictionaries are a good source of information
• Long tradition of taxonomy extraction
    ◦ Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)
• Exploiting machine-readable dictionaries
    ◦ Parsing definitional phrases
    ◦ Pattern extraction, shallow parsing
    ◦ Full treatment of a single dictionary
• There is a lot of information available
    ◦ Hand-crafted, high-quality resources

• Combining dictionaries yields new data
• Taxonomy from multiple dictionaries
    ◦ Language-independent shallow method
    ◦ Combining definitions of the same word
    ◦ Various dictionaries, online versions
    ◦ DRAE, DGLE, Clave, DEM
    ◦ Frequency-based
• Dictionaries differ
    ◦ Different lexicon and definitions
    ◦ If only for legal reasons
• The hyperonym should be the same
    ◦ A cat is an animal
    ◦ Unless the hyperonym itself is uncertain
• Most dictionaries should use the same genus
    ◦ Statistically relevant
[Figure: frequency of words in the definitions of "ablandabrevas": 3x persona; 2x com., inútil; 1x substantivo, común, fig.]
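The "most dictionaries should use the same genus" intuition can be illustrated by counting, for each word, how many dictionaries use it when defining the same headword. This is a minimal sketch, not the authors' code: the definitions are invented toy strings for "cat" (not quotations from DRAE, DGLE, Clave or DEM), and the letters-only tokenizer is a simplifying assumption.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercased runs of letters (Unicode-aware, so accented letters are kept).
    return re.findall(r"[^\W\d_]+", text.lower())

def dictionary_support(definitions: dict[str, str]) -> Counter:
    """For each word, count how many dictionaries use it for the same headword."""
    support = Counter()
    for text in definitions.values():
        support.update(set(tokenize(text)))  # at most one vote per dictionary
    return support

# Invented toy definitions of "cat" from three hypothetical dictionaries.
toy = {
    "dict_a": "Small carnivorous animal kept as a pet.",
    "dict_b": "Domestic animal of the feline family.",
    "dict_c": "Feline animal that hunts mice.",
}
for word, n in dictionary_support(toy).most_common(2):
    print(f"{n}x {word}")
```

Run as is, it prints "3x animal" and then "2x feline": the shared genus gets the most votes even though no two definitions are worded alike.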
• Input taken directly from harvested text
    ◦ With begin/end tags
• No textual analysis
• More than definitions
    ◦ Examples, multiple senses, etc.
• Sense matching is impossible
    ◦ Entries are unsystematic
    ◦ Dictionaries do not match in senses
• Minimum number of dictionaries
• Raw frequency count
    ◦ The hyperonym tends to be repeated
• Candidates have to be words
    ◦ Of the same word class
• Use of a stop-list
    ◦ Generated from the dictionary
    ◦ Words that occur in more than 10% of entries
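The steps above can be sketched as a small pipeline: build a stop-list from a whole dictionary (words in more than 10% of its entries), pool raw token counts over the harvested text of one headword, keep only words of the right class that are not stop words, and require a minimum number of dictionaries. This is a sketch under stated assumptions, not the authors' implementation: the `nouns` set is a hypothetical stand-in for a word-class check, and reading "minimum number of dictionaries" as "the headword must appear in at least `min_dicts` dictionaries" (default 3) is an interpretation. The names `build_stop_list` and `rank_hyperonyms` are illustrative.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Letters only (Unicode-aware), lowercased: digits and punctuation are
    # dropped, so every candidate is at least a word.
    return re.findall(r"[^\W\d_]+", text.lower())

def build_stop_list(entries: dict[str, str], ratio: float = 0.10) -> set[str]:
    """Words occurring in more than `ratio` of a dictionary's entries."""
    entry_freq = Counter()
    for text in entries.values():
        entry_freq.update(set(tokenize(text)))  # count each word once per entry
    limit = ratio * len(entries)
    return {w for w, n in entry_freq.items() if n > limit}

def rank_hyperonyms(texts: list[str], stop: set[str], nouns: set[str],
                    min_dicts: int = 3, top_n: int = 5) -> list[tuple[str, int]]:
    """Rank hyperonym candidates for one headword by raw frequency.

    `texts` holds the whole harvested article from each dictionary (one string
    per dictionary, all senses and examples included, no sense matching).
    """
    if len(texts) < min_dicts:
        return []  # assumed rule: too few dictionaries to vote
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))  # raw counts: repeats within an article add up
    kept = Counter({w: n for w, n in counts.items() if w in nouns and w not in stop})
    return kept.most_common(top_n)

if __name__ == "__main__":
    # Invented toy data; in practice `stop` would come from build_stop_list().
    cat = ["Small animal kept as a pet.",
           "Domestic animal of the feline family.",
           "Feline animal that hunts mice."]
    stop = {"a", "as", "of", "the", "that"}
    nouns = {"animal", "pet", "family", "mice"}
    print(rank_hyperonyms(cat, stop, nouns))
```

Counting raw frequency (rather than one vote per dictionary) lets a genus that recurs across several senses of the same article weigh more, which matches the "hyperonym tends to be repeated" observation; the trade-off is that a word repeated in one long article can outrank a word shared by several dictionaries.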
# deconstrucción (3 dictionaries)
teoría     2    1
EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;


# descubrimiento (5 dictionaries)
acción     3    3
cosa       3    5
efecto     2    -
EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;


# cumbia (5 dictionaries)
danza      2    -
EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;


# asta (5 dictionaries)
mar        6    -
lanza      6    -
media      5    -
toro       5    -
cuerno     5    -
bandera    4    -
EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;
• WordNet is (still) the best available taxonomy
    ◦ But not the best resource for evaluation
• Automatic verification
    ◦ 100 random nouns
    ◦ Best 5 hyperonym candidates
    ◦ Match when a candidate is in the hyperonym chain
• Only about 50% accuracy
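The automatic verification can be sketched as: parse an EWN hyperonym chain in the "level.entry; ..." format used in the tables above, and count a headword as correct when at least one of its top-5 candidates appears in its chain. The slides only say "match when candidate in chain", so the matching rule here is an assumption: level 0 (the headword's own synset) is skipped, and a candidate matches an entry exactly or matches its first word (so "teoría" matches "teoría filosófica"). Function names are illustrative.

```python
from typing import Optional

def parse_chain(line: str) -> list[tuple[int, str]]:
    """Parse 'EWN: 0.a; 1.b c; ...' into (level, entry) pairs."""
    body = line.split(":", 1)[1]
    pairs = []
    for item in body.split(";"):
        item = item.strip()
        if item:  # skip the empty string after the trailing ';'
            level, _, entry = item.partition(".")
            pairs.append((int(level), entry.strip()))
    return pairs

def match_level(candidate: str, chain: list[tuple[int, str]]) -> Optional[int]:
    """Level of the first chain entry matched by `candidate`, or None.

    Assumed rule: skip level 0 (the headword's own synset); match an entry
    exactly or on its first word (e.g. 'teoría' ~ 'teoría filosófica').
    """
    for level, entry in chain:
        if level > 0 and candidate in (entry, entry.split()[0]):
            return level
    return None

def accuracy(results: dict[str, list[str]], chains: dict[str, str], top_n: int = 5) -> float:
    """Share of headwords with at least one top-n candidate in their chain."""
    hits = 0
    for word, candidates in results.items():
        chain = parse_chain(chains[word])
        if any(match_level(c, chain) is not None for c in candidates[:top_n]):
            hits += 1
    return hits / len(results)

if __name__ == "__main__":
    # Chains and candidates copied from the deconstrucción and descubrimiento examples.
    chains = {
        "deconstrucción": "EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; "
                          "1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; "
                          "5.conocimiento; 5.cognición; 6.rasgo psicológico;",
        "descubrimiento": "EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; "
                          "2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; "
                          "3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; "
                          "4.comunicación; 5.relación social; 6.relación; 7.abstracción;",
    }
    results = {"deconstrucción": ["teoría"], "descubrimiento": ["acción", "cosa", "efecto"]}
    print(accuracy(results, chains))
```

On this two-word sample the script prints 1.0 ("teoría" matches at level 1, "acción" at level 3); it is a toy run, not a reproduction of the reported figure.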
• WordNet
    ◦ Many intermediate/artificial levels
    ◦ Compulsory hyperonym
    ◦ Contains proper names
• Dictionaries
    ◦ More word senses
    ◦ Alternative definitions (synonymy, paraphrase, …)
• Differences
    ◦ Different choice of hyperonym
    ◦ Different lexicon
• Questions?
