Semantic relations - PowerPoint by pengxiang


									The Global Wordnet Grid: anchoring
  languages to universal meaning

                  Piek Vossen
Irion Technologies/Free University of Amsterdam
•   Wordnet, EuroWordNet background
•   Architecture of the Global Wordnet Grid
•   Mapping wordnets to the Grid
•   Advantages of shared knowledge structure
•   7th Framework project KYOTO
• Semantic network in which concepts are defined in
  terms of relations to other concepts.
• Structure:
      organized around the notion of synsets (sets
        of synonymous words)
      basic semantic relations between these
 Developed at Princeton by George Miller and his
  team as a model of the mental lexicon.
Relational model of meaning
                      kitten          animal

   man        woman            cat            dog         cat

   boy         girl        kitten         puppy
                            Structure of WordNet
                                  {conveyance; transport}



                                               hyperonym                              {bumper}           {hinge; flexible joint}
                        {motor vehicle; automotive vehicle}                                          meronym
                                                                                      {car door}         {doorlock}
                                                                      meronym                        meronym

                        {car; auto; automobile; machine; motorcar}
                                                                                      {car window}       {armrest}
                                                                                      {car mirror}
                                   hyperonym     hyperonym
{cruiser; squad car; patrol car; police car; prowl car}      {cab; taxi; hack; taxicab; }
      Wordnet Data Model
Relations       Concepts                 Vocabulary of a language
            rec: 12345               1
            - financial institute                bank
            rec: 54321               2
            - side of a river
            rec: 9876                    1       fiddle
            - small string instrument            violin
 type-of    rec: 65438                   2
            - musician playing violin            violinist
type-of     - musician
            rec:35576                1
            - string of instrument               string
            rec:29551                2
            - underwear
            - string instrument
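
The data model above can be sketched in Python. This is only an illustration under stated assumptions, not WordNet's actual storage format: the names `Synset`, `SYNSETS`, `TYPE_OF` and `senses` are hypothetical, and the record id 11111 for "string instrument" is made up because the slide gives none.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept record: a record id, a gloss, and the words naming it."""
    rec: int
    gloss: str
    words: list = field(default_factory=list)


# Record ids and glosses are copied from the slide.
SYNSETS = [
    Synset(12345, "financial institute", ["bank"]),
    Synset(54321, "side of a river", ["bank"]),
    Synset(9876, "small string instrument", ["fiddle", "violin"]),
    # Hypothetical id: the slide lists this concept without a record number.
    Synset(11111, "string instrument", []),
]

# Relations live apart from the vocabulary and connect concepts, not words.
# This particular type-of link is an assumption made for the example.
TYPE_OF = {9876: 11111}


def senses(word):
    """Return the record ids of every concept the word can name."""
    return [s.rec for s in SYNSETS if word in s.words]
```

A polysemous word such as "bank" maps to several records, while synonyms such as "fiddle" and "violin" share one.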
               Usage of Wordnet
• Improve recall of textual based analysis:
   – Query -> Index
      •   Synonyms: commence – begin
      •   Hypernyms: taxi -> car
      •   Hyponyms: car -> taxi
      •   Meronyms: trunk -> elephant
      •   Lexical entailments: gun -> shoot
• Inferencing:
   – what things can burn?
• Expression in language generation and translation:
   – alternative words and paraphrases
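
Query expansion for recall can be sketched as below. The relation tables are toy data built from the examples on this slide, and `expand_query` is a hypothetical helper, not part of any WordNet API.

```python
# Toy relation tables; the entries mirror the examples on the slide.
SYNONYMS = {"commence": {"begin"}, "begin": {"commence"}}
HYPERNYMS = {"taxi": {"car"}}
HYPONYMS = {"car": {"taxi"}}


def expand_query(terms, use_hypernyms=True, use_hyponyms=True):
    """Add related words to a query so that more indexed documents match."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
        if use_hypernyms:
            expanded |= HYPERNYMS.get(term, set())
        if use_hyponyms:
            expanded |= HYPONYMS.get(term, set())
    return expanded
```

Expanding with hyponyms usually keeps precision higher than expanding with hypernyms, which is why the two are switchable here.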
              Improve recall
• Information retrieval:
  – small databases without redundancy, e.g. image
    captions, video text
• Text classification:
  – small training sets
• Question & Answer systems
  – query analysis: who, whom, where, what, when
                  Improve recall
• Anaphora resolution:
   – The girl fell off the table. She....
   – The glass fell off the table. It...
• Coreference resolution:
   – When he moved the furniture, the antique table got
• Information extraction (unstructured text to
  structured databases):
   – generic forms or patterns "vehicle" -> text with
     specific cases "car"
             Improve recall
• Summarizers:
  – Sentence selection based on word counts ->
    concept counts
  – Avoid repetition in summary -> language
• Limited inferencing: detect locations,
  organisations, etc.
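
The move from word counts to concept counts can be sketched as follows. The mapping `CONCEPT_OF` is a toy table based on the {car; auto; automobile; motorcar} synset shown earlier; it ignores word sense disambiguation, and all names are hypothetical.

```python
from collections import Counter

# Map each word to a concept label; words outside the table count as themselves.
CONCEPT_OF = {word: "car" for word in ("car", "auto", "automobile", "motorcar")}


def concept_counts(tokens):
    """Count concepts instead of surface words."""
    return Counter(CONCEPT_OF.get(token, token) for token in tokens)
```

With word counts, "car", "auto" and "automobile" each score 1; with concept counts the shared concept scores 3, so a sentence selector sees it as the dominant topic.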
              Many others
• Data sparseness for machine learning:
  hapaxes can be replaced by semantic classes
• Use redundancy for more robustness:
  spelling correction and speech recognition
  can build semantic expectations using
  Wordnet and make better choices
• Sentiment and opinion mining
• Natural language learning
• The development of a multilingual database with wordnets
  for several European languages
• Funded by the European Commission, DG XIII,
  Luxembourg as projects LE2-4003 and LE4-8328
• March 1996 - September 1999
• 2.5 Million EURO.
• Languages covered:
   – EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
   – EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.
• Size of vocabulary:
   – EuroWordNet-1: 30,000 concepts - 50,000 word meanings.
   – EuroWordNet-2: 15,000 concepts - 25,000 word meanings.
• Type of vocabulary:
   – the most frequent words of the languages
   – all concepts needed to relate more specific concepts
  Wordnet family
• Princeton WordNet (Fellbaum 1998): 115,000 concepts
• EuroWordNet (Vossen 1998): 8 languages
• BalkaNet (Tufis 2004): 6 languages
• Global Wordnet Association: all languages
[Figure: EuroWordNet architecture. A central Inter-Lingual-Index (records such as Car, Train, Vehicle) is linked to Domains (Transport: Road, Air, Water) and to ontologies (SUMO, DOLCE: Object, Device, TransportDevice). The wordnet of each language links to the index: English Words (vehicle: car, train), Dutch Words (voertuig: auto, trein), German Words (Auto, Zug), Estonian Words (liiklusvahend: auto, killavoor), French Words (véhicule: voiture, train), Spanish Words (vehículo: auto, tren), Italian Words (auto, treno), Czech Words (dopravní prostředník: auto, vlak).]
• Wordnets are unique language-specific structures:
   –   different lexicalizations
   –   differences in synonymy and homonymy
   –   different relations between synsets
   –   same organizational principles: synset structure and
       same set of semantic relations.
• Language-independent knowledge is assigned to
  the ILI and can thus be shared by all languages
  linked to the ILI: both an ontology and domain
       Autonomous & Language-Specific
[Figure: two hierarchies side by side. Wordnet1.5: object at the top, with artifact, artefact (a man-made object) and natural object (an object occurring naturally) below it, followed by block, instrumentality, body, implement, device, tool, instrument, and at the bottom box, spoon, bag. Dutch Wordnet: voorwerp at the top, with blok {block}, werktuig {tool}, lichaam {body}, bak {box}, lepel {spoon}, tas {bag} below it.]
Linguistic versus Artificial Ontologies
 Artificial ontology:
     • better control or performance, or a more compact and
     coherent structure.
     • introduce artificial levels for concepts which are not
     lexicalized in a language (e.g. instrumentality, hand tool),
     • neglect levels which are lexicalized but not relevant for the
     purpose of the ontology (e.g. tableware, silverware,

 What properties can we infer for spoons?
 spoon -> container; artifact; hand tool; object; made of metal or
 plastic; for eating, pouring or cooking
Linguistic versus Artificial Ontologies

Linguistic ontology:
   • Exactly reflects the relations between all the lexicalized words and
     expressions in a language.
   • Captures valuable information about the lexical capacity of
     languages: what is the available fund of words and expressions in a

What words can be used to name spoons?
spoon -> object, tableware, silverware, merchandise, cutlery,
       Wordnets versus ontologies
• Wordnets:
  • autonomous language-specific lexicalization
    patterns in a relational network.
  • Usage: to predict substitution in text for
    information retrieval,
  • text generation, machine translation, word-
• Ontologies:
  • data structure with formally defined concepts.
  • Usage: making semantic inferences.
          The Multilingual Design
• Inter-Lingual-Index: unstructured fund of concepts to
  provide an efficient mapping across the languages;

• Index-records are mainly based on WordNet synsets and
  consist of synonyms, glosses and source references;

• Various types of complex equivalence relations are

• Equivalence relations from synsets to index records: not on a
  word-to-word basis;

• Indirect matching of synsets linked to the same index items;
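
Indirect matching through the index can be sketched as follows. The links reuse the car/train examples from the architecture figure; `ILI_LINKS` and `translate` are hypothetical names, and giving each word a single index record is a simplification, since the design allows several equivalence links per synset.

```python
# (language, word) -> Inter-Lingual-Index record.
ILI_LINKS = {
    ("en", "car"): "Car",
    ("en", "train"): "Train",
    ("nl", "auto"): "Car",
    ("nl", "trein"): "Train",
    ("es", "auto"): "Car",
    ("es", "tren"): "Train",
}


def translate(lang, word, target_lang):
    """Find target-language words linked to the same index record."""
    record = ILI_LINKS.get((lang, word))
    if record is None:
        return []
    return sorted(
        w for (l, w), r in ILI_LINKS.items() if l == target_lang and r == record
    )
```

No Dutch-to-Spanish link is stored anywhere: the match falls out of both wordnets pointing at the same record.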
        Equivalent Near Synonym
1. Multiple Targets (1:many)
    Dutch wordnet: schoonmaken (to clean) matches with 4
    senses of clean in WordNet1.5:
   • make clean by removing dirt, filth, or unwanted substances from
   • remove unwanted substances from, such as feathers or pits, as of chickens or fruit
   • remove in making clean; "Clean the spots off the rug"
   • remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
       Dutch wordnet: versiersel near_synonym versiering
       ILI-Record: decoration.
3. Multiple Targets and Sources (many:many)
       Dutch wordnet: toestel near_synonym apparaat
       ILI-records: machine; device; apparatus; tool
       Equivalent Hyperonymy
Typically used for gaps in English WordNet:
• genuine, cultural gaps for things not known in
  English culture:
   – Dutch: klunen, to walk on skates over land from one
     frozen water to the other

• pragmatic, in the sense that the concept is known but
  is not expressed by a single lexicalized form in
   – Dutch: kunstproduct = artifact substance <=> artifact
From EuroWordNet to Global WordNet

• Currently, wordnets exist for more than 40
  languages, including:
• Arabic, Bantu, Basque, Chinese, Bulgarian,
  Estonian, Hebrew, Icelandic, Japanese, Kannada,
  Korean, Latvian, Nepali, Persian, Romanian,
  Sanskrit, Tamil, Thai, Turkish, Zulu...

• Many languages are genetically and typologically
            Some downsides
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one another
• Proprietary rights restrict free access and usage
• A lot of semantics is duplicated
• Complex and obscure equivalence relations due to
  linguistic differences between English and other languages
       Next step: Global WordNet Grid
[Figure: Global Wordnet Grid. A central Inter-Lingual Ontology (Object, TransportDevice) is linked to the wordnet of each language: English Words (car, train), Dutch Words (voertuig: auto, trein), German Words (Auto, Zug), Estonian Words (liiklusvahend: auto, killavoor), French Words (véhicule: voiture, train), Spanish Words (vehículo: auto, tren), Italian Words (veicolo: auto, treno), Czech Words (dopravní prostředník: auto, vlak).]
      GWNG: Main Features
• Construct separate wordnets for each Grid language
• Contributors from each language encode the
  same core set of concepts plus
  culture/language-specific ones
• Synsets (concepts) can be mapped
  crosslinguistically via an ontology
• No license constraints, freely available
   The Ontology: Main Features
• Formal, artificial ontology serves as
  universal index of concepts
• List of concepts is not just based on the
  lexicon of a particular language (unlike in
  EuroWordNet) but uses ontological
• Concepts are related in a type hierarchy
• Concepts are defined with axioms
     The Ontology: Main Features

• In addition to high-level (“primitive”) concepts, the
  ontology needs to express low-level concepts
  lexicalized in the Grid languages

• Additional concepts can be defined with
  expressions in Knowledge Interchange Format
  (KIF) based on first order predicate calculus and
  atomic element
   The Ontology: Main Features
• Minimal set of concepts (Reductionist view):

   – to express equivalence across languages
   – to support inferencing

• Ontology must be powerful enough to encode all
  concepts that are lexically expressed in any of the
  Grid languages
   The Ontology: Main Features
• Ontology need not and cannot provide a linguistic
  encoding for all concepts found in the Grid
   – Lexicalization in a language is not sufficient to warrant
     inclusion in the ontology
   – Lexicalization in all or many languages may be
• Ontological observations will be used to define the
  concepts in the ontology
        Ontological observations
• Identity criteria as used in OntoClean (Guarino &
  Welty 2002):
  – rigidity: to what extent are properties true for entities
    in all worlds? You are always a human, but you can be
    a student for a short while.
  – essence: what properties are essential for an entity?
    Shape is essential for a statue but not for the clay it is
    made of.
  – unicity: what represents a whole and what entities are
    parts of these wholes? An ocean is a whole but the
    water it contains is not.
           Type-role distinction
• Current WordNet treatment:
   (1) a husky is a kind of dog (type)
   (2) a husky is a kind of working dog (role)
• What’s wrong?
   (2) is defeasible, (1) is not:
   *This husky is not a dog
   This husky is not a working dog

Other roles: watchdog, sheepdog, herding dog,
  lapdog, etc….
          Ontology and lexicon
•Hierarchy of disjunct types:
      Canine → PoodleDog; NewfoundlandDog;
        GermanShepherdDog; Husky
   – NAMES for TYPES:
      {poodle}EN, {poedel}NL, {pudoru}JP
      (instance x Poodle)
   – LABELS for ROLES:
      {watchdog}EN, {waakhond}NL, {banken}JP
      ((instance x Canine) and (role x GuardingProcess))
          Ontology and lexicon
•Hierarchy of disjunct types:
      River; Clay; etc…
   – NAMES for TYPES:
      {river}EN, {rivier, stroom}NL
      (instance x River)
   – LABELS for dependent concepts:
      {rivierwater}NL (water from a river => water is not Unit)
      ((instance x water) and (instance y River) and (portion x y))
      {kleibrok}NL (irregularly shaped piece of clay=>Non-essential)
      ((instance x Object) and (instance y Clay) and (portion x y)
        and (shape x Irregular))
• The “primitive” concepts represented in the
  ontology are rigid types
• Entities with non-rigid properties will be
  represented with KIF statements

• But: ontology may include some universal,
  core concepts referring to roles like father,
    Properties of the Ontology
• Minimal: terms are distinguished by
  essential properties only
• Comprehensive: includes all distinct
  concept types of all Grid languages
• Allows definitions via KIF of all lexemes
  that express non-rigid, non-essential
  properties of types
• Logically valid, allows inferencing
   Mapping Grid Languages onto
          the Ontology
• Explicit and precise equivalence relations among synsets in
  different languages, which is somewhat easier:
   – type hierarchy is minimal
   – subtle differences can be encoded in KIF expressions
• Grid database contains wordnets with synsets that label
   – either “primitive” types in the hierarchies,
   – or words relating to these types in ways made explicit in KIF
• If two languages create the same KIF expression, this is a statement
  of equivalence!
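
Equivalence through identical expressions can be sketched as below. Each expression is modelled as an order-independent set of atoms, using the watchdog and straathond examples from these slides. The names `kif`, `LEXICON` and `equivalents` are hypothetical, and the comparison assumes all languages use the same variable names; a real implementation would have to normalize variables first.

```python
def kif(*atoms):
    """Treat a conjunction of atoms as an order-independent set."""
    return frozenset(atoms)


LEXICON = {
    ("en", "watchdog"): kif(("instance", "x", "Canine"),
                            ("role", "x", "GuardingProcess")),
    # Same atoms in a different order: still the same expression.
    ("nl", "waakhond"): kif(("role", "x", "GuardingProcess"),
                            ("instance", "x", "Canine")),
    ("nl", "straathond"): kif(("instance", "x", "Canine"),
                              ("habitat", "x", "Street")),
}


def equivalents(lang, word):
    """Return all other entries defined by exactly the same expression."""
    target = LEXICON[(lang, word)]
    return sorted(
        key for key, expr in LEXICON.items()
        if expr == target and key != (lang, word)
    )
```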
   How to construct the GWNG
• Take an existing ontology as starting point;
• Use English WordNet to maximize the
  number of disjunct types in the ontology;
• Link English WordNet synsets as names to
  the disjunct types;
• Provide KIF expressions for all other
  English words and synsets
   How to construct the GWNG
• Copy the relations from the English WordNet to the
  ontology over to other languages, including the KIF
  statements built for English
• Revise KIF statements to make the mapping more
• Map all words and synsets that are not and cannot be
  mapped to English WordNet to the ontology:
   – propose extensions to the type hierarchy
   – create KIF expressions for all non-rigid concepts
      Initial Ontology: SUMO
                (Niles and Pease)

SUMO = Suggested Upper Merged Ontology
--consistent with good ontological practice
--fully mapped to WordNet(s): 1000 equivalence
   mappings, the rest through subsumption
--freely and publicly available
--allows data interoperability
--allows NLP
--allows reasoning/inferencing
Mapping Grid languages onto the Ontology
• Check existing SUMO mappings to
  Princeton WordNet -> extend the ontology
  with rigid types for specific concepts
• Extend it to many other WordNet synsets
• Observe OntoClean principles! (Synsets
  referring to non-rigid, non-essential,
  non-unicitous concepts must be expressed in
  KIF)
Lexicalizations not mapped to WordNet
 • Not added to the type hierarchy:
    {straathond}NL (a dog that lives in the streets)
    ((instance x Canine) and (habitat x Street))

 • Added to the type hierarchy:
    {klunen}NL (to walk on skates from one frozen body of water to
      the next over land)
    KluunProcess => WalkProcess
    (and (instance x Human) (instance y Walk) (instance z
      Skates) (wear x z) (instance s1 Skate) (instance s2
      Skate) (before s1 y) (before y s2) etc…
 • National dishes, customs, games,....
 Most mismatching concepts are not
           new types
• Refer to sets of types in specific circumstances or
  to concepts that are dependent on these types; next
  to {rivierwater}NL there are many others:
      {theewater}NL (water used for making tea)
      {koffiewater}NL (water used for making coffee)
      {bluswater}NL (water used for extinguishing fire)
• Relate to linguistic phenomena:
   – gender, perspective, aspect, diminutives, politeness,
     pejoratives, part-of-speech constraints
KIF expression for gender marking

• {teacher}EN
((instance x Human) and (agent x
  TeachingProcess))

• {Lehrer}DE ((instance x Man) and (agent
  x TeachingProcess))
• {Lehrerin}DE ((instance x Woman) and
  (agent x TeachingProcess))
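
How gender-marked words relate to a gender-neutral one can be sketched with a small type hierarchy. The sketch assumes that {teacher}EN is defined as a Human agent of a TeachingProcess and that Man and Woman are subtypes of Human; `PARENT`, `WORDS`, `subsumes` and `covers` are hypothetical names.

```python
# Assumed fragment of the type hierarchy: child type -> parent type.
PARENT = {"Man": "Human", "Woman": "Human"}

# Each word is reduced to (type of the agent, process it takes part in).
WORDS = {
    "teacher": ("Human", "TeachingProcess"),
    "Lehrer": ("Man", "TeachingProcess"),
    "Lehrerin": ("Woman", "TeachingProcess"),
}


def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


def covers(general_word, specific_word):
    """True if every referent of `specific_word` is one of `general_word`."""
    g_type, g_process = WORDS[general_word]
    s_type, s_process = WORDS[specific_word]
    return g_process == s_process and subsumes(g_type, s_type)
```

No new type such as MaleTeacher is needed: the difference between the English and German words is carried entirely by the expressions.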
  KIF expression for perspective
sell: subj(x), direct obj(z),indirect obj(y)
buy: subj(y), direct obj(z),indirect obj(x)
(and (instance x Human)(instance y Human)
  (instance z Entity) (instance e FinancialTransaction)
  (source x e) (destination y e) (patient z e))

The same process but a different perspective by subject
  and object realization: marry corresponds to two verbs in Russian,
  apprendre in French can mean teach and learn
Parallel Noun and Verb hierarchy
Encoded once as a Process in the ontology!
• event                         • to happen
   – act                           – to act
       • deed                          • to do
           – sale                            – to sell
           – promise                         – to promise
   – change                        – to change
       • movement                      • to move
           – change of position              – to move position
    Part-of-speech mismatches
• {bankdrukken-V}NL vs. {bench press-N}EN
• {gehuil-N}NL vs. {cry-V}EN
• {afsluiting-N}NL vs. {close-V}EN

• Process in the ontology is neutral with respect
  to POS!
             Aspectual variants
• Slavic languages: two members of a verb pair for an
  ongoing event and a completed event.
• English: can mark perfectivity with particles, as in the
  phrasal verbs eat up and read through.
• Romance languages: mark aspect by verb conjugations on
  the same verb.
• Dutch: verbs with marked aspect can be created by
  prefixing a verb with door: doorademen, dooreten,
  doorfietsen, doorlezen, doorpraten (continue to
• These verbs are restrictions on phases of the same
• Which does NOT warrant the extension of the ontology
  with separate processes for each aspectual variant
       Aspectual lexicalization
• Regular compositional verb structures:

    doorademen:    (lit. through+breathe, continue to breathe)
    doorbetalen:   (lit. through+pay, continue to pay)
    doorlopen:     (lit. through+walk, continue to walk)
    doorfietsen:   (lit. through+cycle, continue to cycle)
    doorrijden:    (lit. through+drive, continue to drive)

(and (instance x BreathProcess)(instance y Time)
  (instance z Time) (end x z) (expected (end x y))
  (after z y))
   Lexicalization of Resultatives
  openmaken:         (lit. open+make, to cause to be open);
  dichtmaken:        (lit. close+make, to cause to be closed);

  openknijpen        (lit. open+squeeze, to open by squeezing)
     has_hyperonym   knijpen (squeeze) & openmaken (to open)

  opendraaien        (lit. open+turn, to open by turning)
     has_hyperonym   draaien (to turn) & openmaken (to open)

  dichtknijpen:      (lit. closed+squeeze, to close by squeezing)
     has_hyperonym   knijpen (squeeze) & dichtmaken (to close)

  dichtdraaien:      (lit. closed +turn, to close by turning)
     has_hyperonym   draaien (to turn) & dichtmaken (to close)
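
The double has_hyperonym links above can be sketched as a small graph in which every resultative verb has two parents, one for the manner and one for the result. `HYPERONYMS` and `hyponyms_of` are hypothetical names.

```python
# Each resultative verb points to a manner verb and a result verb.
HYPERONYMS = {
    "openknijpen": {"knijpen", "openmaken"},
    "opendraaien": {"draaien", "openmaken"},
    "dichtknijpen": {"knijpen", "dichtmaken"},
    "dichtdraaien": {"draaien", "dichtmaken"},
}


def hyponyms_of(verb):
    """List the verbs that have `verb` among their hyperonyms."""
    return sorted(v for v, parents in HYPERONYMS.items() if verb in parents)
```

Querying by result (openmaken) or by manner (knijpen) retrieves different slices of the same four verbs.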
    Kinship relations in Arabic
•   عَمّ (Eam~)        father's brother,
    paternal uncle.
•   خَال (xaAl)       mother's brother,
    maternal uncle.
•   عَمَّة (Eam~ap) father's sister, paternal
    aunt.
•   خَالَة (xaAlap) mother's sister, maternal
    aunt.
     Kinship relations in Arabic
•   .........
•   شَقِيقَة ($aqiyqap) full sister, sister on the paternal and
    maternal side (as distinct from أُخْت (>uxot): 'sister'
    which may refer to a 'sister' from paternal or maternal
    side, or both sides).
•   ثَكْلان (vakolAna)       father bereaved of a child (as
    opposed to يَتِيم (yatiym) or يَتِيمَة (yatiymap) for
    feminine: 'orphan' a person whose father or mother died
    or both father and mother died).
•   ثَكْلَى (vakolaYa)         mother bereaved of a child (as
    opposed to يَتِيم or يَتِيمَة for feminine: 'orphan' a person
    whose father or mother died or both father and mother
    died).
       Complex Kinship concepts
father's brother, paternal uncle

paternal uncle    => uncle
                  => brother of ....????

 (<=>
   (paternalUncle ?P ?UNC)
   (exists (?F)
     (and
       (father ?P ?F)
       (brother ?F ?UNC))))
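
The same definition can be sketched over a small fact base. The argument order follows the expression above: (father P F) is read as "F is the father of P", and (brother F U) links F to his brother U. The person names and the helper `paternal_uncles` are made up for the example.

```python
# Facts as (relation, first argument, second argument); names are invented.
FACTS = {
    ("father", "Omar", "Hassan"),
    ("brother", "Hassan", "Yusuf"),
    ("brother", "Hassan", "Karim"),
    ("father", "Layla", "Karim"),
}


def paternal_uncles(person):
    """Find every U such that some F is the father of `person` and U is F's brother."""
    fathers = {f for rel, p, f in FACTS if rel == "father" and p == person}
    return sorted(u for rel, f, u in FACTS if rel == "brother" and f in fathers)
```

The complex kinship concept needs no type of its own; it is derived from the two primitive relations.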
 Advantages of the Global Wordnet
• Shared and uniform world knowledge:
  – universal inferencing
  – uniform text analysis and interpretation
• More compact and less redundant databases
• Clearer notion of how languages map to
  the knowledge
  – better criteria for expressing knowledge
  – better criteria for understanding variation
Expansion with pure hyponymy
 hunting dog                                   puppy

                          poodle                   bitch
           street dog

                          short hair   long hair
                          dachshund    dachshund

            Expansion from a type to roles
Expansion with pure hyponymy
 hunting dog                                   puppy

                          poodle                   bitch
           street dog

                         short hair    long hair
                         dachshund     dachshund

Expansion from a role to types and other roles
Automotive ontology:
Who uses ontologies?
Human dialogues with Alice-bot
     Full understanding is
fundamentally impossible BUT?
• How can people communicate?
• How can people communicate with
• As long as language is effective:
  – meaning= to have the desired effect!
  – Link language to useful content!
        in reality


                                 (keitaidenwa )

         Knowledge &                         Expression

                          Useful and effective behavior:
                          -reason over knowledge
                          -collect information and data
                          -deliver services and be helpful
       Concrete goals for GWG
• Global Wordnet Association website:
• 5000 Base Concepts or more:
  –   English
  –   Spanish
  –   Catalan
  –   Czech, Polish, Dutch, other wordnets
• 7th Framework project Kyoto
                KYOTO Project
• 7th Framework project (under negotiation)
• Knowledge Yielding Ontologies for Transition-based
• Goal:
   – Global Wordnet Grid = ontology + wordnets
   – AutoCons = Automatic concept extractors
   – Kybots = Knowledge yielding robots
   – Wiki environment for encoding domain knowledge in expert
   – Index and retrieval software for deep semantic search
• Languages: Dutch, English, Spanish, Basque, Italian,
  Chinese and Japanese
• Domain of application: environmental organisations
• Period: March/April 2008 - 2011
             KYOTO Consortium
• Vrije Universiteit Amsterdam, Amsterdam, Netherlands
• Consiglio Nazionale delle Ricerche, Pisa, Italy
• Berlin-Brandenburg Academy of Sciences and Humanities, Berlin, Germany
• Euskal Herriko Unibertsitatea, San Sebastian, Spain
• Academia Sinica, Taipei, Taiwan
• National Institute of Information and Communications Technology,
   Kyoto, Japan
• Masaryk University, Brno, Czech Republic
• Irion Technologies, Delft, Netherlands
• Synthema, Pisa, Italy
• European Centre for Nature Conservation, Tilburg, Netherlands
• World Wide Fund for Nature, Zeist, Netherlands

[Figure: KYOTO overview. Environmental organizations supply Docs, URLs, Experts and Images. Concept Mining connects them to Wordnets and a Universal Ontology (Abstract, Physical, Process, Substance; water, CO2), extended with a domain wordnet and a domain ontology (Domain: water pollution, CO2 emission), and supports Dialogue.]
[Figure: KYOTO system architecture. Labelled components: User source data; Capture; Text & Meta data in XML Format; Concept Miners and relations; Manual Revision through a Wiki with DEB Client and DEB Server; Kybots; Data & Facts in XML Format; Indexing and Index; Access for end-users with User scenarios; Benchmarking with Benchmark data. Supporting resources: Ontology (Abstract, Physical, Process, Substance, Chemical; water, CO2; CO2 emission, water pollution), Logical Expressions, Wordnets, Linguistic Miners or Kybots.]
