Semantic relations

					           From WordNet,
          to EuroWordNet,
    to the Global Wordnet Grid:
anchoring languages to universal meaning

               Piek Vossen

         VU University Amsterdam



                                           1
 What kind of resource is wordnet?

• Most widely used database in language
  technology
• Enormous impact in language technology
  development
• Large
• Free and downloadable
• English
                                           2
                  WordNet
 http://wordnet.princeton.edu/
• Developed by George Miller and his team at
  Princeton University, as the implementation of
  a mental model of the lexicon
• Organized around the notion of a synset: a set
  of synonyms in a language that represent a
  single concept
• Semantic relations between concepts
• Covers over 117,000 concepts and over
  150,000 English words
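
A minimal sketch of the synset idea, assuming a toy in-memory model rather than the actual WordNet database format. The class `Synset` and the helper `senses` are hypothetical names introduced only for illustration; the two example synsets are taken from the car/taxi slide.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous words that together name one concept."""
    words: frozenset
    gloss: str = ""
    # relation name -> list of related Synset objects
    relations: dict = field(default_factory=dict)


def senses(word, synsets):
    """Return every synset (concept) that the word form belongs to."""
    return [s for s in synsets if word in s.words]


car = Synset(frozenset({"car", "auto", "automobile", "machine", "motorcar"}))
cab = Synset(frozenset({"cab", "taxi", "hack", "taxicab"}))
cab.relations["hypernym"] = [car]  # a cab is a kind of car

for s in senses("taxi", [car, cab]):
    print(sorted(s.words))
```

Relations are attached to the synset, not to the individual word, which is the point of the "semantic relations between concepts" bullet.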
Relational model of meaning
 [Figure: the words animal, man, woman, boy, girl, cat, dog, kitten and puppy placed in a network of semantic relations]

                                                                  4
 Wordnet: a network of semantically
           related words
              {conveyance;transport}


                      {vehicle}

                                                       {car mirror}   {armrest}
        {motor vehicle; automotive vehicle}

                                                       {car door}     {doorlock}
  {car; auto; automobile; machine; motorcar}
                                                       {bumper}
                                                                      {hinge;
                                                       {car window}   flexible joint}

{cruiser; squad car; patrol car;   {cab; taxi; hack; taxicab}
police car; prowl car}
  Wordnet Semantic Relations
WN 1.5 starting point

The ‘synset’ as a weak notion of synonymy:
             “two expressions are synonymous in a linguistic context C
             if the substitution of one for the other in C does not alter
              the truth value.” (Miller et al. 1993)

Relations between synsets:
Relation          POS-combination                   Example
ANTONYMY          adjective-to-adjective            good/bad
                  verb-to-verb                      open/ close
HYPONYMY          noun-to-noun                      car/ vehicle
                  verb-to-verb                      walk/ move
MERONYMY          noun-to-noun                      head/ nose
ENTAILMENT        verb-to-verb                      buy/ pay
CAUSE             verb-to-verb                      kill/ die

                                                                            6
    Wordnet Data Model
Relations       Concepts                 Vocabulary of a language
            rec: 12345               1
            - financial institute                bank
            rec: 54321               2
            - side of a river
            rec: 9876                    1       fiddle
            - small string instrument            violin
 type-of    rec: 65438                   2
                                                 fiddler
            - musician playing violin            violinist
            rec:42654
type-of     - musician
            rec:35576                1
part-of
            - string of instrument               string
            rec:29551                2
            - underwear
            rec:25876
            - string instrument




                                                                    7
  Some observations on Wordnet
• synsets are more compact representations for concepts than
  word meanings in traditional lexicons
• synonyms and hypernyms are substitutional variants:
   – begin – commence
   – I once had a canary. The bird got sick. The poor animal died.
• hyponymy and meronymy chains are important transitive
  relations for predicting properties and explaining textual
  properties:
   object -> artifact -> vehicle -> 4-wheeled vehicle -> car
• strict separation of part of speech although concepts are
  closely related (bed – sleep) and are similar (dead – death)
• lexicalization patterns reveal important mental structures
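
The transitivity of hyponymy can be sketched as a walk up the hypernym chain. This is an illustrative toy, assuming each concept has a single hypernym (real wordnets allow several); the mapping `HYPERNYM` and the function names are hypothetical, and the chain itself is the one from the slide.

```python
# Toy hypernym links taken from the chain on the slide:
# object -> artifact -> vehicle -> 4-wheeled vehicle -> car
HYPERNYM = {
    "car": "4-wheeled vehicle",
    "4-wheeled vehicle": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
}


def hypernym_chain(concept, hypernym=HYPERNYM):
    """Follow hypernym links upward until the top of the hierarchy."""
    chain = []
    while concept in hypernym:
        concept = hypernym[concept]
        chain.append(concept)
    return chain


def is_a(concept, ancestor):
    """True if `ancestor` is reachable from `concept` via hypernym links."""
    return ancestor in hypernym_chain(concept)


print(hypernym_chain("car"))
print(is_a("car", "artifact"))
```

Any property stated for `artifact` can then be predicted for `car`, which is how the chain supports inheritance of features.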

                                                                     8
            Lexicalization patterns
                         entity

                  object          organism                     25 unique
    garbage                                                    beginners
threat      artifact       animal        plant         waste
         building bird                   tree flower
                                                        basic level
         church canary dog crocodile      rose          concepts
                            • balance of two principles:
         abbey common           • predict most features
                canary          • apply to most subclasses
                            • where most concepts are created
                            • amalgamate most parts
                            • most abstract level at which to draw a picture
                                                               9
Wordnet top level




                    10
      Meronymy & pictures
         beak




                        tail

leg
                               11
Meronymy & pictures




                      12
      Co-reference constraint in wordnet:
         Cats cannot be a kind of cats
•   S: (n) cat, true cat (feline mammal usually having thick soft fur and no ability to roar:
    domestic cats; wildcats)
•   S: (n) guy, cat, hombre, bozo (an informal term for a youth or man) "a nice guy"; "the
    guy's only doing it for some doll"
•   S: (n) cat (a spiteful woman gossip) "what a cat she is!"
•   S: (n) kat, khat, qat, quat, cat, Arabian tea, African tea (the leaves of the shrub Catha
    edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric
    stimulant) "in Yemen kat is used daily by 85% of adults"
•   S: (n) cat-o'-nine-tails, cat (a whip with nine knotted cords) "British sailors feared the
    cat"
•   S: (n) Caterpillar, cat (a large tracked vehicle that is propelled by two endless metal
    belts; frequently used for moving earth in construction and farm work)
•   S: (n) big cat, cat (any of several large cats typically able to roar and living in the wild)
•   S: (n) computerized tomography, computed tomography, CT, computerized axial
    tomography, computed axial tomography, CAT (a method of examining body organs
    by scanning them with X rays and using a computer to construct a series of cross-
    sectional scans along a single axis)

•   S: (n) domestic cat, house cat, Felis domesticus, Felis catus (any domesticated member
    of the genus Felis)
                                                                                     13
14
               Wordnet 3.0 statistics

POS          Unique Strings    Synsets    Total Word-Sense Pairs
Noun                117,798     82,115                   146,312
Verb                 11,529     13,767                    25,047
Adjective            21,479     18,156                    30,002
Adverb                4,481      3,621                     5,580
Totals              155,287    117,659                   206,941


                                                            15
             Wordnet 3.0 statistics

POS          Monosemous Words and Senses    Polysemous Words    Polysemous Senses
Noun                             101,863              15,935               44,449
Verb                               6,277               5,252               18,770
Adjective                         16,503               4,976               14,399
Adverb                             3,748                 733                1,832
Totals                           128,391              26,896               79,450

                                                             16
             Wordnet 3.0 statistics

       POS       Average Polysemy            Average Polysemy
               Including Monosemous        Excluding Monosemous
                       Words                       Words
Noun                                1.24                        2.79
Verb                                2.17                        3.57
Adjective                            1.4                        2.71
Adverb                              1.25                         2.5
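
The "including monosemous words" column appears to be the number of word-sense pairs divided by the number of unique strings from the first statistics table. That reading is an inference from the figures, not something the slides state; the sketch below only checks that arithmetic, and the names `STATS` and `average_polysemy` are hypothetical.

```python
# (unique strings, word-sense pairs) per part of speech, copied from the
# WordNet 3.0 statistics table in the slides.
STATS = {
    "Noun": (117_798, 146_312),
    "Verb": (11_529, 25_047),
    "Adjective": (21_479, 30_002),
    "Adverb": (4_481, 5_580),
}


def average_polysemy(unique_strings, word_sense_pairs):
    """Average number of senses per word form."""
    return word_sense_pairs / unique_strings


averages = {
    pos: round(average_polysemy(strings, pairs), 2)
    for pos, (strings, pairs) in STATS.items()
}
for pos, value in averages.items():
    print(f"{pos:<10}{value:.2f}")
```

Under this assumption the computed values agree with the slide to two decimals.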




                                                                   17
http://www.visuwords.com




                           18
19
               Usage of Wordnet
• Improve recall of textual based analysis:
   – Query -> Index
      •   Synonyms: commence – begin
      •   Hypernyms: taxi -> car
      •   Hyponyms: car -> taxi
      •   Meronyms: trunk -> elephant
      •   Lexical entailments: gun -> shoot
• Inferencing:
   – what things can burn?
• Expression in language generation and translation:
   – alternative words and paraphrases
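
A minimal sketch of recall-oriented query expansion, assuming a toy relation table instead of a real wordnet lookup. `RELATIONS` and `expand_query` are hypothetical names; the word pairs are the ones listed on the slide.

```python
# Toy relation tables using the examples from the slide.
RELATIONS = {
    "synonym": {"commence": {"begin"}, "begin": {"commence"}},
    "hypernym": {"taxi": {"car"}},
    "hyponym": {"car": {"taxi"}},
}


def expand_query(terms, relations=("synonym", "hyponym")):
    """Add related words to the query so that more index entries match."""
    expanded = set(terms)
    for term in terms:
        for relation in relations:
            expanded |= RELATIONS[relation].get(term, set())
    return expanded


print(sorted(expand_query(["car", "begin"])))
```

Which relations to follow is a design choice: expanding with hyponyms keeps results on topic, while expanding with hypernyms raises recall further at a likely cost in precision.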

                                                   20
              Improve recall
• Information retrieval:
  – small databases without redundancy, e.g. image
    captions, video text
• Text classification:
  – small training sets
• Question & Answer systems
  – query analysis: who, whom, where, what, when

                                                 21
                  Improve recall
• Anaphora resolution:
   – The girl fell off the table. She....
   – The glass fell off the table. It...
• Coreference resolution:
   – When he moved the furniture, the antique table got
     damaged.
• Information extraction (unstructured text to
  structured databases):
   – generic forms or patterns "vehicle" -> text with
     specific cases "car"
                                                          22
             Improve recall
• Summarizers:
  – Sentence selection based on word counts ->
    concept counts
  – Avoid repetition in summary -> language
    generation
• Limited inferencing: detect locations,
  organisations, etc.

                                                 23
              Many others
• Data sparseness for machine learning:
  hapaxes can be replaced by semantic classes
• Use redundancy for more robustness:
  spelling correction and speech recognition
  can build semantic expectations using
  Wordnet and make better choices
• Sentiment and opinion mining
• Natural language learning
                                            24
             Recall & Precision

 [Figure: Venn diagram of the found and the relevant documents for the query “cell”, with their intersection; matched senses include “jail”, “police cell”, “nerve cell”, “neuron”, “cell phone” and “mobile phones”]

           recall    = intersection / relevant
           precision = intersection / found

           Recall < 20% for basic search engines! (Blair & Maron 1985)
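
The two measures can be worked through on a small example. The document identifiers below are invented for illustration and the function name `recall_precision` is hypothetical; only the two ratios come from the slide.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for a set of found and relevant documents."""
    found, relevant = set(found), set(relevant)
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


# Hypothetical result of the query "cell": four documents found,
# six documents actually relevant, two of them in both sets.
found = {"d1", "d2", "d3", "d4"}
relevant = {"d3", "d4", "d5", "d6", "d7", "d8"}

recall, precision = recall_precision(found, relevant)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```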
               EuroWordNet
• The development of a multilingual database with wordnets
  for several European languages
• Funded by the European Commission, DG XIII,
  Luxembourg as projects LE2-4003 and LE4-8328
• March 1996 - September 1999
• 2.5 Million EURO.
• http://www.hum.uva.nl/~ewn
• http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html

                                                       26
                  EuroWordNet
• Languages covered:
   – EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
   – EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.
• Size of vocabulary:
   – EuroWordNet-1: 30,000 concepts - 50,000 word meanings.
   – EuroWordNet-2: 15,000 concepts - 25,000 word meanings.
• Type of vocabulary:
   – the most frequent words of the languages
   – all concepts needed to relate more specific concepts




                                                                  27
EuroWordNet Model
 [Figure: four language-specific wordnets, each with its own Lexical Items Table, linked to the Inter-Lingual-Index, which is in turn linked to the Domains and the Ontology]

   English wordnet:  move, go; ride, drive
   Dutch wordnet:    bewegen, gaan; rijden, berijden
   Spanish wordnet:  mover, transitar; cabalgar, jinetear, conducir
   Italian wordnet:  andare, muoversi; guidare, cavalcare

   Inter-Lingual-Index:  ILI-record {drive}
   Domains:              Traffic (Air, Road)
   Ontology:             2OrderEntity (Location, Dynamic)

   I   = Language Independent link
   II  = Link from Language Specific to Inter-Lingual-Index
   III = Language Dependent Link

                                                                                                    28
    Differences in relations between
     EuroWordNet and WordNet

• Added Features to relations

• Cross-Part-Of-Speech relations

• New relations to differentiate shallow hierarchies

• New interpretations of relations

                                                  30
     EWN Relationship Labels
{airplane}    HAS_MERO_PART: conj1         {door}
              HAS_MERO_PART: conj2 disj1   {jet engine}
              HAS_MERO_PART: conj2 disj2   {propeller}

{door}        HAS_HOLO_PART: disj1         {car}
              HAS_HOLO_PART: disj2         {room}
              HAS_HOLO_PART: disj3         {entrance}

{dog}         HAS_HYPERONYM: conj1         {mammal}
              HAS_HYPERONYM: conj2         {pet}

{albino}      HAS_HYPERONYM: disj1         {plant}
              HAS_HYPERONYM: disj2         {animal}

Default Interpretation: non-exclusive disjunction
                                                          32
            EWN Relationship Labels
Factive/Non-factive CAUSES (Lyons 1977)

  factive (default interpretation):

         “to kill causes to die”:
          {kill}            CAUSES                             {die}

  non-factive: E1 probably or likely causes event E2 or E1 is intended to cause
  some event E2:

         “to search may cause to find”.
        {search}          CAUSES                      {find} non-factive




                                                                            33
     Cross-Part-Of-Speech relations

WordNet1.5: nouns and verbs are not interrelated by basic semantic
relations such as hyponymy and synonymy:

 adornment 2     change of state-- (the act of changing something)
 adorn 1         change, alter-- (cause to change; make different)

EuroWordNet: words of different parts of speech can be inter-linked with
explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:

 {adorn V}       XPOS_NEAR_SYNONYM                   {adornment N}
 {size N}        XPOS_NEAR_HYPONYM                   {tall A}
                                                     {short A}
                                                                           34
                          Role relations
In the case of many verbs and nouns the most salient relation is not the hyperonym
but the relation between the event and the involved participants. These relations
are expressed as follows:

{knife}           ROLE_INSTRUMENT                       {to cut}
{to cut}          INVOLVED_INSTRUMENT                   {knife}           reversed
{school}          ROLE_LOCATION                         {to teach}
{to teach}        INVOLVED_LOCATION                     {school}          reversed

These relations are typically used when other relations, mainly hyponymy, do not
clarify the position of the concept in the network, but the word is still closely related to
another word.
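
The "reversed" rows above can be derived mechanically from the forward relation. The sketch below assumes relations are stored as (source, relation, target) triples, which is an illustrative choice and not the EuroWordNet database format; `INVERSE` and `with_reversed` are hypothetical names.

```python
# Each ROLE_* relation has an INVOLVED_* counterpart in the other direction.
INVERSE = {
    "ROLE_INSTRUMENT": "INVOLVED_INSTRUMENT",
    "ROLE_LOCATION": "INVOLVED_LOCATION",
}
INVERSE.update({v: k for k, v in list(INVERSE.items())})


def with_reversed(triples):
    """Return the triples plus the reversed counterpart of each one."""
    result = set(triples)
    for source, relation, target in triples:
        if relation in INVERSE:
            result.add((target, INVERSE[relation], source))
    return result


links = [
    ("knife", "ROLE_INSTRUMENT", "to cut"),
    ("school", "ROLE_LOCATION", "to teach"),
]
for triple in sorted(with_reversed(links)):
    print(triple)
```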


                                                                               35
                Co_Role relations
guitar player   HAS_HYPERONYM           player
                CO_AGENT_INSTRUMENT     guitar
player          HAS_HYPERONYM           person
                ROLE_AGENT              to play music
                CO_AGENT_INSTRUMENT     musical instrument
to play music   HAS_HYPERONYM           to make
                ROLE_INSTRUMENT         musical instrument
guitar          HAS_HYPERONYM           musical instrument
                CO_INSTRUMENT_AGENT     guitar player
ice saw         HAS_HYPERONYM           saw
                CO_INSTRUMENT_PATIENT   ice
saw             HAS_HYPERONYM           saw
                ROLE_INSTRUMENT         to saw
ice             CO_PATIENT_INSTRUMENT   ice saw REVERSED

                                                        36
                     Co_Role relations

Examples of the other relations are:

criminal                 CO_AGENT_PATIENT       victim
novel writer/ poet       CO_AGENT_RESULT        novel/ poem
dough                    CO_PATIENT_RESULT      pastry/ bread
photographic camera      CO_INSTRUMENT_RESULT   photo




                                                            37
Overview of the Language Internal
    relations in EuroWordnet
Same Part of Speech relations:
NEAR_SYNONYMY                         apparatus - machine
HYPERONYMY/HYPONYMY                   car - vehicle
ANTONYMY                              open - close
HOLONYMY/MERONYMY                     head - nose

Cross-Part-of-Speech relations:
XPOS_NEAR_SYNONYMY                dead - death; to adorn - adornment
XPOS_HYPERONYMY/HYPONYMY          to love - emotion
XPOS_ANTONYMY                     to live - dead
CAUSE                             die - death
SUBEVENT                          buy - pay; sleep - snore
ROLE/INVOLVED           write - pencil; hammer - hammer
STATE                             the poor - poor
MANNER                            to slurp - noisily
BELONG_TO_CLASS                   Rome - city
                                                                38
 Horizontal & vertical semantic relations

  chronic patient;
  mental patient
HYPONYM                       ρ-PATIENT
                                              cure

        patient                         ρ-CAUSE
                                                                          doctor
                                              treat         ρ-AGENT      HYPONYM
    STATE                     ρ-PATIENT
                                                                      child doctor
                              ρ-PROCEDURE             ρ-LOCATION
  disease; disorder                                                   co-ρ-
                                                                      AGENT-PATIENT
HYPONYM
                        physiotherapy     hospital, etc.
  stomach disease,      medicine                                         child
  kidney disorder,      etc.
          The Multilingual Design
• Inter-Lingual-Index: unstructured fund of concepts to
  provide an efficient mapping across the languages;

• Index-records are mainly based on WordNet synsets and
  consist of synonyms, glosses and source references;

• Various types of complex equivalence relations are
  distinguished;

• Equivalence relations from synsets to index records: not on a
  word-to-word basis;

• Indirect matching of synsets linked to the same index items;
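
Indirect matching through the index can be sketched as a two-step lookup: from a synset to its index records, and from those records to the synsets of another language. The link table below is a toy assumption built around the {drive} record shown in the EuroWordNet model figure; `EQ_LINKS`, `build_index` and `match` are hypothetical names, and the {go} links are invented for illustration.

```python
from collections import defaultdict

# (language, synset, ILI-record) equivalence links; a toy sample.
EQ_LINKS = [
    ("en", "drive", "drive"),
    ("nl", "rijden", "drive"),
    ("es", "conducir", "drive"),
    ("it", "guidare", "drive"),
    ("en", "go", "go"),      # assumed link, for illustration only
    ("nl", "gaan", "go"),    # assumed link, for illustration only
]


def build_index(links):
    """Group synsets by ILI-record and then by language."""
    index = defaultdict(lambda: defaultdict(set))
    for language, synset, ili_record in links:
        index[ili_record][language].add(synset)
    return index


def match(synset, source, target, links=EQ_LINKS):
    """Find target-language synsets that share an ILI-record with `synset`."""
    matches = set()
    for by_language in build_index(links).values():
        if synset in by_language.get(source, set()):
            matches |= by_language.get(target, set())
    return matches


print(match("rijden", "nl", "es"))
```

No Dutch-Spanish link is stored anywhere; the pairing falls out of both synsets pointing at the same index record, so adding a language requires links to the index only.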
                                                                  40
           Equivalent Near Synonym
1. Multiple Targets (1:many)
   Dutch wordnet: schoonmaken (to clean) matches with 4
   senses of clean in WordNet1.5:
   • make clean by removing dirt, filth, or unwanted substances from
   • remove unwanted substances from, such as feathers or pits, as of chickens or fruit
   • remove in making clean; "Clean the spots off the rug"
   • remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
     Dutch wordnet: versiersel near_synonym versiering
     ILI-Record: decoration.
3. Multiple Targets and Sources (many:many)
     Dutch wordnet: toestel near_synonym apparaat
     ILI-records: machine; device; apparatus; tool
                                                                   41
       Equivalent Hyperonymy
Typically used for gaps in English WordNet:
• genuine, cultural gaps for things not known in
  English culture:
   – Dutch: klunen, to walk on skates over land from one
     frozen water to the other

• pragmatic, in the sense that the concept is known but
  is not expressed by a single lexicalized form in
  English:
   – Dutch: kunststof = artifact substance <=> artifact object

                                                             42
        Equivalent Hyponymy

has_eq_hyponym
Used when WordNet1.5 only provides narrower
  terms. In this case there can only be a pragmatic
  difference, not a genuine cultural gap, e.g.: Spanish
  dedo = either finger or toe.




                                                    43
Complex mappings across languages
 EN-Net                                   IT-Net

  toe                                      dito
  finger   { toe : part of foot }

  head     { finger : part of hand }
           { dedo , dito :
               finger or toe }
           { head : part of body }
 NL-Net    { hoofd : human head }         ES-Net

  hoofd    { kop : animal head }           dedo
  kop
                   = normal equivalence

                   = eq_has_hyponym

                   = eq_has_hyperonym
                                                   44
 Typical gaps in the (English) ILI
• Dutch:
doodschoppen (to kick to death):
         eq_hyperonym {kill}V and to {kick}V
aardig (Adjective, to like):
         eq_near_synonym {like}V
cassière (female cashier)
         eq_hyperonym {cashier}, {woman}
kunstproduct (artifact substance)
         eq_hyperonym {artifact} and to {product}

• Spanish:
alevín (young fish):
         eq_hyperonym {fish} and eq_be_in_state {young}
cajera (female cashier)
         eq_hyperonym {cashier}, {woman}

                                                          45
 Wordnets as semantic structures
• Wordnets are unique language-specific structures:
   –   different lexicalizations
   –   differences in synonymy and homonymy
   –   different relations between synsets
   –   same organizational principles: synset structure and
       same set of semantic relations.
• Language independent knowledge is assigned to
  the ILI and can thus be shared by all languages
  linked to the ILI: both an ontology and a domain
  hierarchy
                                                              46
       Autonomous & Language-Specific
             Wordnet1.5                          Dutch Wordnet
                object                                  voorwerp
                                                        {object}
artifact, artefact         natural object (an
(a man-made object)        object occurring
                           naturally)         blok   werktuig{tool}       lichaam
block         instrumentality         body {block}                        {body}

 implement                      device
               container
tool                             instrument     bak     lepel          tas
       box      spoon     bag                   {box}   {spoon}       {bag}

                                                                              47
Linguistic versus Artificial Ontologies
 Artificial ontology:
     • better control or performance, or a more compact and
     coherent structure.
     • introduce artificial levels for concepts which are not
     lexicalized in a language (e.g. instrumentality, hand tool),
     • neglect levels which are lexicalized but not relevant for the
     purpose of the ontology (e.g. tableware, silverware,
     merchandise).

 What properties can we infer for spoons?
 spoon -> container; artifact; hand tool; object; made of metal or
 plastic; for eating, pouring or cooking


                                                                       48
Linguistic versus Artificial Ontologies

Linguistic ontology:
   • Exactly reflects the relations between all the lexicalized words and
     expressions in a language.
   • Captures valuable information about the lexical capacity of
     languages: what is the available fund of words and expressions in a
     language.


What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery,



                                                                       49
       Wordnets versus ontologies
• Wordnets:
  • autonomous language-specific lexicalization
    patterns in a relational network.
  • Usage: to predict substitution in text for
    information retrieval,
  • text generation, machine translation, word-
    sense-disambiguation.
• Ontologies:
  • data structure with formally defined concepts.
  • Usage: making semantic inferences.

                                                 50
     Sharing world knowledge
• All wordnets in the world can be linked to
  the same ontology
• All wordnets in the world can be linked to
  the same thesaurus




                                               51
        Wordnet: Domain information
Vocabularies of languages                 Concepts             Relations                   Domains

                                 1   rec: 12345
                                                                        Clothing   Culture Sport Finance
                                     - financial institute
                                 2   rec: 54321
                       bank          - river side                                  Music    Ball  Winter
                                 1   rec: 9876                                             sports sports
                       violin        - small string instrument
                                 2   rec: 65438
                                     - musician playing a violin
                       violinist     rec:42654
                                     - musician              type-of
                                 1   rec:35576                                      type-of
                                     - string of an instrument         part-of
                       string
                                 2   rec:29551
                                     - underwear
                                     rec:25876
                                     - string instrument




                                                                                              52
   How to harmonize wordnets?
• Wordnets are unique language-specific
  lexicalization patterns
• Define universal sets of concepts that play a major
  role in many different wordnets: so-called Base
  Concepts
• Define base concepts in each language wordnet
   – High level in the hierarchy
   – Many hyponyms
• Provide the closest equivalent in English wordnet
• Determine the intersection of English
  equivalences
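
The last step can be sketched with plain set operations. The per-language selections below are invented for illustration (real selections contain hundreds of synsets), and `common_base_concepts` is a hypothetical helper; the synset labels follow the WordNet `word#sense` notation used in these slides.

```python
from collections import Counter

# Hypothetical base-concept selections, already mapped to their closest
# English WordNet equivalents.
selections = {
    "nl": {"animal#1", "plant#1", "feeling#1"},
    "es": {"animal#1", "food#1"},
    "it": {"animal#1", "plant#1"},
}


def common_base_concepts(selections, min_languages=2):
    """Concepts selected as base concept in at least `min_languages` wordnets."""
    counts = Counter(c for chosen in selections.values() for c in chosen)
    return {c for c, n in counts.items() if n >= min_languages}


# Strict intersection: selected in every language.
strict = set.intersection(*selections.values())

print(sorted(strict))
print(sorted(common_base_concepts(selections)))
```

The strict intersection tends to be small, so relaxing the criterion to "at least two languages" is one way to obtain a usable common set.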
                                                      53
            Lexicalization patterns
                         entity

                 object           organism              25 unique
    garbage                                             beginners
threat      artifact      animal         plant         1024 base
         building bird                   tree flower   concepts
                                                       basic level
         church canary dog crocodile         rose
                                                       concepts

         abbey common
               canary



                                                            54
         Base Concept Intersection
                                                 Nouns               Verbs
Intersection EN, NL, IT, ES                      24                  6
Intersection FR, DE, EE, CZ                      70                  30
Intersection All                                 13                  2

{human#1; individual#1; mortal#1; person#1; someone#1; soul#1}
{animal#1; animate being#1; beast#1; brute#1; creature#1; fauna#1}
{flora#1; plant#1; plant life#1}
{matter#1; substance#1}
{food#1; nutrient#1}
{feeling#1}
{act#1; human action#1; human activity#1}
{cause#6; get#9; have#7; induce#2; make#12; stimulate#3}
{create#2; make#13}
{go#14; locomote#1; move#15; travel#4}
{be#4; have the quality of being#1}
                                                                             55
 Explanations for low intersection of
           Base Concepts

• The individual selections are not representative
  enough.
• There are major differences in the way meanings are
  classified, which have an effect on the frequency of
  the relations.
• The translations of the selection to WordNet1.5
  synsets are not reliable
• The resources cover very different vocabularies

                                                     56
     Concepts selected by at least two
     languages: intersections of pairs
              NOUNS                        VERBS

        NL    ES    IT    EN        NL    ES    IT    EN
NL    1027   103   182   333       323    36    42    86
ES     103   523    45   284        36   128    18    43
IT     182    45   334   167        42    18   104    39
EN     333   284   167  1296        86    43    39   236



                                              57
           Common Base Concepts

                                Nouns         Verbs         Total
Physical objects & substances           491                          491
Processes and states                    272           228            500
Mental objects                           33                           33
Total                                   796           228           1024




                                                                      58
Table 4: Number of Common BCs represented in the local wordnets

        Related to CBCs Eq_synonym         Eq_near          CBCs Without
                                                            Direct Equivalent
NL      992              725               269              97
ES      1012             1009              0                15
IT      878              759               191              9



Table 5: BC4 Gaps in at least two wordnets (10 synsets)
body covering#1          mental object#1; cognitive content#1; content#2
body substance#1         natural object#1
social control#1         place of business#1; business establishment#1
change of magnitude#1 plant organ#1
contractile organ#1      plant part#1
psychological feature#1 spatial property#1; spatiality#1
                                                                           59
Table 6: Local senses with complex equivalence
relations to CBCs
                                 NL       ES       IT
Eq_has_hyperonym                 61       40       4
eq_has_hyponym                   34       14       20
Eq_has_holonym                   2        0
Eq_has_meronym                   3        2
Eq_involved                      3
Eq_is_caused_by                  3
Eq_is_state_of                   1

Example of complex relation

        CBC: cause to feel unwell#1, Verb
        Closest Dutch concept: {onwel#1}, Adjective (sick)

Equivalence relation: eq_is_caused_by
                                                             60
                          EuroWordNet data
           Synsets  No. of  Senses/  Entries  Senses/  LIRels  LIRels/  EQRels-  EQRels/  Synsets
                    senses  synset            entry            synset   ILI      synset   without ILI
Dutch        44015   70201  1.59      56283   1.25     111639  2.54     53448    1.21     7203
Spanish      23370   50526  2.16      27933   1.81      55163  2.36     21236    0.91        0
Italian      40428   48499  1.20      32978   1.47     117068  2.90     71789    1.78     1561
French       22745   32809  1.44      18777   1.75      49494  2.18     22730    1.00       20
German       15132   20453  1.35      17098   1.20      34818  2.30     16347    1.08        0
Czech        12824   19949  1.56      12283   1.62      26259  2.05     12824    1.00        0
Estonian      7678   13839  1.80      10961   1.26      16318  2.13      9004    1.17        0
English      16361   40588  2.48      17320   2.34      42140  2.58      n.a.    n.a.     n.a.
WN15         94515  187602  1.98     126617   1.48     211375  2.24      n.a.    n.a.     n.a.
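The ratio columns follow from the count columns by simple division. As a worked check, the sketch below recomputes the four ratios for the Dutch row of the table above; the dictionary keys and the helper name are illustrative, not part of any EuroWordNet tool.

```python
def ratio(numerator, denominator):
    """Ratio rounded to two decimals, as reported in the table."""
    return round(numerator / denominator, 2)


# Counts copied from the Dutch row of the EuroWordNet data table.
dutch = {
    "synsets": 44015,
    "senses": 70201,
    "entries": 56283,
    "li_rels": 111639,      # language-internal relations
    "eq_rels_ili": 53448,   # equivalence relations to the ILI
}

derived = {
    "senses_per_synset": ratio(dutch["senses"], dutch["synsets"]),
    "senses_per_entry": ratio(dutch["senses"], dutch["entries"]),
    "li_rels_per_synset": ratio(dutch["li_rels"], dutch["synsets"]),
    "eq_rels_per_synset": ratio(dutch["eq_rels_ili"], dutch["synsets"]),
}
```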



                                                                                          61
From EuroWordNet to Global WordNet

• Currently, wordnets exist for more than 50
  languages, including:
• Arabic, Bantu, Basque, Chinese, Bulgarian,
  Estonian, Hebrew, Icelandic, Japanese, Kannada,
  Korean, Latvian, Nepali, Persian, Romanian,
  Sanskrit, Tamil, Thai, Turkish, Zulu...

• Many languages are genetically and typologically
  unrelated
• http://www.globalwordnet.org

                                                    62
        Global Wordnet Association
• EuroWordNet: English, German, Spanish, French, Italian,
  Dutch, Czech, Estonian
• BalkaNet: Romanian, Bulgarian, Turkish, Slovenian, Greek,
  Serbian
• Further wordnets: Danish, Norwegian, Swedish, Portuguese,
  Korean, Russian, Basque, Catalan, Thai, Arabic, Polish,
  Welsh, Chinese, 20 Indian Languages, Brazilian Portuguese,
  Hebrew, Latvian, Persian, Kurdish, Avestan, Baluchi,
  Hungarian
• http://www.globalwordnet.org



                                                                  63
         Some downsides of the
          EuroWordnet model
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one
  another
• Proprietary rights restrict free access and usage
• A lot of semantics is duplicated
• Complex and obscure equivalence relations due to
  linguistic differences between English and other
  languages
                                                  64
       Next step: Global WordNet Grid
[Figure: the wordnets of eight languages, each linked to a shared
 Inter-Lingual Ontology containing the types Object, Device and
 TransportDevice]
  English Words:   vehicle; car, train
  Spanish Words:   vehículo; auto, tren
  Italian Words:   veicolo; auto, treno
  Czech Words:     dopravní prostředník; auto, vlak
  French Words:    véhicule; voiture, train
  Estonian Words:  liiklusvahend; auto, killavoor
  Dutch Words:     voertuig; auto, trein
  German Words:    Fahrzeug; Auto, Zug
                                                         65
      GWNG: Main Features
• Construct separate wordnets for each Grid
  language
• Contributors from each language encode the
  same core set of concepts plus
  culture/language-specific ones
• Synsets (concepts) can be mapped
  crosslinguistically via an ontology


                                           66
   The Ontology: Main Features
• Formal ontology serves as universal index of
  concepts
• List of concepts is not just based on the lexicon of
  a particular language (unlike in EuroWordNet) but
  uses ontological observations
• Ontology contains only upper and mid-level
  concepts
• Concepts are related in a type hierarchy
• Concepts are defined with axioms

                                                     67
     The Ontology: Main Features

• In addition to high-level (“primitive”) concepts, the
  ontology needs to express low-level concepts
  lexicalized in the Grid languages

• Additional concepts can be defined with
  expressions in Knowledge Interchange Format
  (KIF), based on first-order predicate calculus and
  atomic elements

                                                      68
   The Ontology: Main Features
• Minimal set of concepts (Reductionist view):
   – to express equivalence across languages
   – to support inferencing
• Ontology must be powerful enough to encode all concepts
  that are lexically expressed in any of the Grid languages
• Ontology need not and cannot provide a linguistic
  encoding for all concepts found in the Grid languages
   – Lexicalization in a language is not sufficient to warrant inclusion
     in the ontology
   – Lexicalization in all or many languages may be sufficient
• Ontological observations will be used to define the
  concepts in the ontology

                                                                           69
        Ontological observations
• Identity criteria as used in OntoClean (Guarino &
  Welty 2002):
  – rigidity: to what extent are properties true for entities
    in all worlds? You are always a human, but you can be
    a student for a short while.
  – essence: what properties are essential for an entity?
    Shape is essential for a statue but not for the clay it is
    made of.
  – unicity: what represents a whole and what entities are
    parts of these wholes? An ocean is a whole but the
    water it contains is not.

                                                           70
            Type-role distinction
• Current WordNet treatment:
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)
• What’s wrong?
(2) is defeasible, (1) is not:
*This husky is not a dog
This husky is not a working dog

Other roles: watchdog, sheepdog, herding dog, lapdog, etc.

                                                          71
          Ontology and lexicon
• Hierarchy of disjunct types:
      Canine → PoodleDog; NewfoundlandDog;
        GermanShepherdDog; Husky
• Lexicon:
   – NAMES for TYPES:
      {poodle}EN, {poedel}NL, {pudoru}JP
      (instance x Poodle)
   – LABELS for ROLES:
      {watchdog}EN, {waakhond}NL, {banken}JP
      ((instance x Canine) and (role x GuardingProcess))
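The distinction between NAMES and LABELS can be pictured as a small data structure. This is a sketch under assumptions, not the Grid database format: the `Synset` class and its fields are hypothetical, and the KIF strings are copied from the slide as plain text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Synset:
    words: Tuple[str, ...]
    lang: str
    type_name: Optional[str] = None  # set for NAMES: a rigid ontology type
    kif: Optional[str] = None        # set for LABELS: an expression over types

    @property
    def is_name(self) -> bool:
        """True if the synset directly names a type in the hierarchy."""
        return self.type_name is not None


WATCHDOG_KIF = "((instance x Canine) and (role x GuardingProcess))"

lexicon = [
    Synset(("poodle",), "EN", type_name="Poodle"),
    Synset(("poedel",), "NL", type_name="Poodle"),
    Synset(("watchdog",), "EN", kif=WATCHDOG_KIF),
    Synset(("waakhond",), "NL", kif=WATCHDOG_KIF),
]

names = [s.words[0] for s in lexicon if s.is_name]
labels = [s.words[0] for s in lexicon if not s.is_name]
```

Names point straight at a type; labels carry an expression that relates them to one or more types.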

                                                            72
          Ontology and lexicon
• Hierarchy of disjunct types:
      River; Clay; etc.
• Lexicon:
   – NAMES for TYPES:
      {river}EN, {rivier, stroom}NL
      (instance x River)
   – LABELS for dependent concepts:
      {rivierwater}NL (water from a river => water is not a unit)
      {kleibrok}NL (irregularly shaped piece of clay => non-essential)
      ((instance x Water) and (instance y River) and (portion x y))
      ((instance x Object) and (instance y Clay) and (portion x y)
        and (shape x Irregular))

                                                                   73
                  Rigidity
• The “primitive” concepts represented in the
  ontology are rigid types
• Entities with non-rigid properties will be
  represented with KIF statements

• But: ontology may include some universal,
  core concepts referring to roles like father,
  mother
                                                  74
    Properties of the Ontology
• Minimal: terms are distinguished by
  essential properties only
• Comprehensive: includes all distinct
  concept types of all Grid languages
• Allows definitions via KIF of all lexemes
  that express non-rigid, non-essential
  properties of types
• Logically valid, allows inferencing
                                              75
   Mapping Grid Languages onto
          the Ontology
• Explicit and precise equivalence relations among synsets in
  different languages:
   – type hierarchy is minimal
   – subtle differences can be encoded in KIF expressions
• Grid database contains wordnets with synsets that label
   – either “primitive” types in the hierarchies,
   – or words relating to these types in ways made explicit in
     KIF expressions
• If two languages create the same KIF expression, this is a statement
  of equivalence!
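The last point, that identical KIF expressions amount to a statement of equivalence, can be sketched as a grouping step. The version below assumes expressions are stored as strings and compares them after whitespace normalization only; a real comparison would need logical equivalence (variable renaming, order of conjuncts). The function names are illustrative.

```python
from collections import defaultdict


def normalize(kif):
    """Collapse whitespace so that layout differences do not matter."""
    return " ".join(kif.split())


def equivalence_classes(entries):
    """Group (lang, word, kif) triples by expression.

    Only groups that span at least two languages are returned.
    """
    groups = defaultdict(list)
    for lang, word, kif in entries:
        groups[normalize(kif)].append((lang, word))
    return {
        expr: members
        for expr, members in groups.items()
        if len({lang for lang, _ in members}) > 1
    }


entries = [
    ("EN", "watchdog", "((instance x Canine) and (role x GuardingProcess))"),
    ("NL", "waakhond", "((instance x Canine)  and\n (role x GuardingProcess))"),
    ("NL", "straathond", "((instance x Canine) and (habitat x Street))"),
]
classes = equivalence_classes(entries)
```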

                                                              76
   How to construct the GWNG
• Take an existing ontology as starting point;
• Use English WordNet to maximize the number of disjunct
  types in the ontology;
• Link English WordNet synsets as names to the disjunct
  types;
• Provide KIF expressions for all other English words and
  synsets
• Copy the relation to the ontology to other languages,
  including KIF statements built for English
• Revise KIF statements to make the mapping more precise
• Map all words and synsets that are not and cannot be mapped
  to English WordNet to the ontology:
   – propose extensions to the type hierarchy
   – create KIF expressions for all non-rigid concepts

                                                         77
      Initial Ontology: SUMO
                (Niles and Pease)

SUMO = Suggested Upper Merged Ontology
--consistent with good ontological practice
--fully mapped to WordNet(s): 1000 equivalence
   mappings, the rest through subsumption
--freely and publicly available
--allows data interoperability
--allows NLP
--allows reasoning/inferencing
                                                 78
                  SUMO
• 1,000 generic, abstract, high-level terms
• 4,000 definitional statements

• MILO (Mid-Level Ontology)
closer to lexicon, WordNet



                                              79
Mapping Grid languages onto the
          Ontology
• Check existing SUMO mappings to
  Princeton WordNet -> extend the ontology
  with rigid types for specific concepts
• Extend it to many other WordNet synsets
• Observe OntoClean principles! (Synsets
  referring to non-rigid, non-essential, non-
  unicitous concepts must be expressed in
  KIF)
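The OntoClean check in the last bullet can be read as a simple decision rule. The sketch below assumes that the three judgements (rigidity, essence, unicity) have already been made by hand for each synset; the `Judgement` class and the `placement` function are hypothetical, and the example judgements follow the earlier slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    rigid: bool      # property holds for the entity in all worlds
    essential: bool  # defining properties are essential to the entity
    whole: bool      # unicity: the entity is a whole, not a portion


def placement(j):
    """Return 'type' for the type hierarchy, otherwise 'kif'."""
    return "type" if (j.rigid and j.essential and j.whole) else "kif"


judgements = {
    "poodle": Judgement(rigid=True, essential=True, whole=True),
    "watchdog": Judgement(rigid=False, essential=True, whole=True),     # a role
    "rivierwater": Judgement(rigid=True, essential=True, whole=False),  # not a unit
    "kleibrok": Judgement(rigid=True, essential=False, whole=True),     # shape not essential
}
placements = {word: placement(j) for word, j in judgements.items()}
```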

                                                80
Lexicalizations not mapped to WordNet
 • Not added to the type hierarchy:
    {straathond}NL (a dog that lives in the streets)
    ((instance x Canine) and (habitat x Street))

 • Added to the type hierarchy:
    {klunen}NL (to walk on skates from one frozen body of water to
      the next over land)
    WalkProcess → KluunProcess
    Axioms:
    (and (instance x Human) (instance y Walk) (instance z
      Skates) (wear x z) (instance s1 Skate) (instance s2
      Skate) (before s1 y) (before y s2) etc…
 • National dishes, customs, games,....
                                                            81
 Most mismatching concepts are not
           new types
• They refer to sets of types in specific circumstances or
  to concepts that depend on these types. Next
  to {rivierwater}NL there are many others:
      {theewater}NL (water used for making tea)
      {koffiewater}NL (water used for making coffee)
      {bluswater}NL (water used for extinguishing fire)
• Relate to linguistic phenomena:
   – gender, perspective, aspect, diminutives, politeness,
     pejoratives, part-of-speech constraints

                                                                 82
KIF expression for gender marking

• {teacher}EN
((instance x Human) and (agent x
  TeachingProcess))

• {Lehrer}DE ((instance x Man) and (agent
  x TeachingProcess))
• {Lehrerin}DE ((instance x Woman) and
  (agent x TeachingProcess))
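These three expressions differ only in the type of the instance, so the relation between them can be computed from the type hierarchy. A minimal sketch, assuming Man and Woman are direct subtypes of Human (a fragment invented here, not taken from SUMO) and that each expression is reduced to an (instance type, process) pair:

```python
# Assumed fragment of the type hierarchy: child type -> parent type.
PARENT = {"Man": "Human", "Woman": "Human"}

# Each synset reduced to (type of x, process x is agent of).
LEXICON = {
    ("teacher", "EN"): ("Human", "TeachingProcess"),
    ("Lehrer", "DE"): ("Man", "TeachingProcess"),
    ("Lehrerin", "DE"): ("Woman", "TeachingProcess"),
}


def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or lies below it."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = PARENT.get(type_name)
    return False


def subsumes(general, specific):
    """True if every referent of `specific` is also one of `general`."""
    g_type, g_process = LEXICON[general]
    s_type, s_process = LEXICON[specific]
    return g_process == s_process and is_subtype(s_type, g_type)
```

Under these assumptions {teacher}EN subsumes both German synsets, while neither German synset subsumes the other.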
                                         83
  KIF expression for perspective
sell: subj(x), direct obj(z), indirect obj(y)
versus
buy: subj(y), direct obj(z), indirect obj(x)
(and (instance x Human) (instance y Human)
  (instance z Entity) (instance e FinancialTransaction)
  (source x e) (destination y e) (patient z e))

The same process but a different perspective through subject
  and object realization: marry corresponds to two verbs in Russian;
  apprendre in French can mean both teach and learn
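The point about perspective can be illustrated by mapping both verbs onto one event structure. This sketch assumes that the direct object z fills the patient role; the `PERSPECTIVE` table and the `to_event` function are invented for the example.

```python
# Grammatical function -> role in the shared FinancialTransaction event.
PERSPECTIVE = {
    "sell": {"subj": "source", "direct_obj": "patient", "indirect_obj": "destination"},
    "buy": {"subj": "destination", "direct_obj": "patient", "indirect_obj": "source"},
}


def to_event(verb, subj, direct_obj, indirect_obj):
    """Build the language-neutral event for a sell/buy clause."""
    arguments = {"subj": subj, "direct_obj": direct_obj, "indirect_obj": indirect_obj}
    event = {"type": "FinancialTransaction"}
    for function, role in PERSPECTIVE[verb].items():
        event[role] = arguments[function]
    return event


# "Ann sells a bike to Bob" versus "Bob buys a bike from Ann".
sold = to_event("sell", "Ann", "bike", "Bob")
bought = to_event("buy", "Bob", "bike", "Ann")
```

Both clauses yield the same event; only the assignment of participants to subject and object differs.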
                                                    84
             Aspectual variants
• Slavic languages: two members of a verb pair for an
  ongoing event and a completed event.
• English: can mark perfectivity with particles, as in the
  phrasal verbs eat up and read through.
• Romance languages: mark aspect by verb conjugations on
  the same verb.
• Dutch: verbs with marked aspect can be created by
  prefixing a verb with door: doorademen, dooreten,
  doorfietsen, doorlezen, doorpraten (continue to
  breathe/eat/bike/read/talk).
• These verbs are restrictions on phases of the same process
• Does NOT warrant the extension of the ontology with
  separate processes for each aspectual variant
                                                           85
    Kinship relations in Arabic
•   عَمّ (Eam~): father's brother, paternal uncle.
•   خَال (xaAl): mother's brother, maternal uncle.
•   عَمَّة (Eam~ap): father's sister, paternal aunt.
•   خَالَة (xaAlap): mother's sister, maternal aunt.
                                               86
     Kinship relations in Arabic
•   .........
•   شَقِيقَة ($aqiyqap): full sister, sister on the paternal and
    maternal side (as distinct from أُخْت (>uxot): 'sister',
    which may refer to a sister from the paternal or maternal
    side, or both sides).
•   ثَكْلان (vakolAna): father bereaved of a child (as
    opposed to يَتِيم (yatiym), or يَتِيمَة (yatiymap) for
    feminine: 'orphan', a person whose father or mother died
    or both father and mother died).
•   ثَكْلَى (vakolaYa): mother bereaved of a child (as
    opposed to يَتِيم, or يَتِيمَة for feminine: 'orphan', a person
    whose father or mother died or both father and mother
    died).
                                                                87
       Complex Kinship concepts
father's brother, paternal uncle

WORDNET
paternal uncle    => uncle
                  => brother of ....????

ONTOLOGY
(=>
 (paternalUncle ?P ?UNC)
 (exists (?F)
  (and
    (father ?P ?F)
    (brother ?F ?UNC))))
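The axiom can be made concrete over a small fact base. The sketch below reads `(father ?P ?F)` as "F is the father of P" and `(brother ?F ?UNC)` as "UNC is a brother of F", and it applies the definition in the converse direction, deriving paternal uncles from father and brother facts; both readings are assumptions, and the names are invented.

```python
# Fact base as sets of pairs.
father = {("Omar", "Khalid")}      # (person, father of that person)
brother = {("Khalid", "Yusuf")}    # (person, brother of that person)


def paternal_uncles(person):
    """All UNC such that some F has father(person, F) and brother(F, UNC)."""
    return {
        uncle
        for (child, dad) in father
        if child == person
        for (sibling_of, uncle) in brother
        if sibling_of == dad
    }


uncles_of_omar = paternal_uncles("Omar")
uncles_of_khalid = paternal_uncles("Khalid")
```

A hyponymy link alone (paternal uncle => uncle) cannot express this composition of two relations, which is the contrast the slide draws between WORDNET and ONTOLOGY.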

                                           88
         Universality as evidence
• The English verb cut abstracts from the precise process, but
  there are troponyms that imply the manner:
   – snip and clip imply scissors; chop and hack imply a large knife or an axe
• In Dutch there is no general verb, only specific verbs:
   knippen “clip, snip, cut with scissors or a scissor-like tool”, snijden
      “cut with a knife or knife-like tool”, hakken “chop, hack, cut
      with an axe or similar tool”.

• If lexicalization of the specific process is more universal it
  can be seen as evidence that the specific processes should
  be listed in the ontology and not the generic verb

                                                                              89
    Open Questions/Challenges
• What is a word, i.e., a lexical unit?
• What is the status of complex lexemes like
  English lightning rod, word of mouth, find
  out, kick the bucket?
• What is a semantic unit, i.e. a concept?



                                               90
    Open Questions/Challenges
• Is there a core inventory of concepts that are
  universally encoded?
• If so, what are these concepts?
• How can crosslinguistic equivalence be verified?
• Is there systematicity to the language-specific
  extensions?
• What are the lexicalization patterns of individual
  languages?
• Are lexical gaps accidental or systematic?
                                                       91
   Coverage: what belongs in a
   universal lexical database?
• Formal, linguistic criteria for inclusion
• Informal, cultural criteria
• Both are difficult to define and apply!




                                              92
 Advantages of the Global Wordnet
               Grid
• Shared and uniform world knowledge:
  – universal inferencing
  – uniform text analysis and interpretation
• More compact and less redundant databases
• Clearer notion of how languages map to
  the knowledge
  – better criteria for expressing knowledge
  – better criteria for understanding variation
                                                  93
Expansion with pure hyponymy
          relations
[Figure: hyponymy tree under dog, with the nodes hunting dog, puppy,
 dachshund, lapdog, poodle, bitch, street dog and watchdog; below
 dachshund: short hair dachshund and long hair dachshund]

            Expansion from a type to roles

                                                           94
Expansion with pure hyponymy
          relations
[Figure: hyponymy tree under dog, with the nodes hunting dog, puppy,
 dachshund, lapdog, poodle, bitch, street dog and watchdog; below
 dachshund: short hair dachshund and long hair dachshund]

Expansion from a role to types and other roles

                                                           95
Automotive ontology:
  (http://www.ontoprise.de)




                              96
Who uses ontologies?




                       97