MILE

Infrastructural Language Resources
&
Standards for Multilingual Computational Lexicons

             Nicoletta Calzolari
               … with many others

Istituto di Linguistica Computazionale - CNR - Pisa
              glottolo@ilc.cnr.it


                    Pisa, September 2004
                               The ENABLER Mission
  Language Resources (LRs) & Evaluation: central component of
  the “linguistic infrastructure”
  LRs supported by national funding in National Projects
  Availability of LRs is also a “sensitive” issue, touching the sphere
  of linguistic and cultural identity, but also with economic and
  political implications
The ENABLER Network of National initiatives aims at
 “enabling” the realisation of a cooperative framework

     • formulate a common agenda of medium- & long-term research
       priorities
     • contribute to the definition of an overall framework for the provision of
       LRs
                                                          towards ….

Only by
       • Combining the strengths of different initiatives & communities
       • Making the best use of the ‘modus operandi’ of the national funding authorities in
         different national situations
       • Responding to/anticipating needs and priorities of R&D & industrial
         communities
       • Promoting the adoption of [de facto] standards, best practices
       • Clearly distinguishing tasks & roles for different actors

can we produce the synergies, economy of scale, convergence & critical mass
necessary to provide the infrastructural LRs needed to realise the
full potential of a multilingual global information society
                      Lexicon and Corpus:
                 a multi-faceted interaction
   L→C      tagging
   C→L      frequencies (of different linguistic “objects”)
   C→L      proper nouns, acronyms, …
   L→C      parsing, chunking, …
   C→L      training of parsers
   C→L      lexicon updating
   C→L      “collocational” data (MWE, idioms, gram. patterns ...)
   C→L      “nuances” of meanings & semantic clustering
   C→L      acquisition of lexical (syntactic/semantic) knowledge
   L→C      semantic tagging/word-sense disambiguation (e.g. in Senseval)
   C→L      more semantic information on LE
   C→L      corpus based computational lexicography
   C→L      validation of lexical models
   C→L      …
   L→C      ...
             ...Language as a “Continuum”
Lexicon & Corpus
as two viewpoints on the same ling. object
                  …. even more in a multilingual context

Interesting - and intriguing - aspects of corpus use:
   • impossibility of descriptions based on a clear-cut boundary between what is
     admitted and what is not

   • in actual usage, language displays a large number of properties behaving as a
     continuum, and not as properties of “yes/no” type

   • the same is true for the so-called “rules”, where we find more a “tendency”
     towards rules than precise rules in corpus evidence

   • difficult to constrain word meaning within a rigorously defined
     organisation: by its very nature it tends to evade any strict boundary
                                                              BUT
                         Extraction from texts vs.
                 formal representation in lexicons

   • It is difficult to constrain word meaning within a rigorously defined
     organisation: by its very nature it tends to evade any strict boundary

   • The rigour and lack of flexibility of formal representation languages
     cause difficulties when mapping natural-language word meaning, which is
     ambiguous and flexible by nature, into them

   • No clear-cut boundary when analysing many phenomena: it’s more a
     continuum

   • The same impression arises if one looks at examples of types of alternations:
               no clear-cut classes across languages
               or within one language
                              Correlation between
        different levels of linguistic description
                  in the design of a lexical entry
To understand word-meaning:
   • Focus on the correlation between syntactic and semantic aspects

   • But other linguistic levels - such as morphology, morphosyntax, lexical
     cooccurrence, collocational data, etc. - are closely interrelated/involved

   • These relations must be captured when accounting for meaning
     discrimination

   • The complexity of these interrelationships is what makes semantic
     disambiguation such a hard task in NLP
            • Textual corpora as a device to discover and reveal the intricacy of
              these relationships
            • Frame/SIMPLE semantics as a device to unravel and disentangle the
              complex situation into elementary and computationally manageable
              pieces
                    towards Corpus-based Semantic Lexicons
                                                         … at least in principle

   • both in the design of the model, &
   • in the building of the lexicon (at least partially)
       with (semi-)automatic means

        Design of the lexical entry with a combined approach:
             theoretical: e.g. Fillmore Frame Semantics/
                                Pustejovsky Generative Lexicon, …
             empirical: Corpus evidence

                     even if there are not always sound and explicit criteria for
                     classification according to “frame elements”/qualia relations/...
Infrastructure of Language Resources...
       ...static
Semantic networks: Euro-/ItalWordNet
Lexicons: PAROLE/SIMPLE/CLIPS
TreeBanks
International Standards
                       But   … they will never be “complete”

      …dynamic
Lexical acquisition systems (syntactic & semantic) from corpora
Infrastructure of tools
   •Robust morphosyntactic & syntactic analysers
   •Word-sense disambiguation systems
   •Sense classifiers
   •...
                                                     ItalWordNet
                                                  Semantic Network
                                          [Italian module of EuroWordNet]


~ 50,000 lemmas organized in synonym groups (synsets), structured
in hierarchies & linked by ~ 130,000 semantic relations

   • ~ 50,000 hyperonymy/hyponymy relations
   • ~ 16,000 relations among different POS (role, cause, derivation, etc.)
   • ~ 2,000 part-whole relations
   • ~ 1,500 antonymy relations, etc.

•Synsets linked to the InterLingual Index (ILI=Princeton WordNet)

•Through the ILI, link to all the European WordNets (de-facto standard)
                           & to the common Top Ontology

•Possibility of plug-in with domain terminological lexicons
                                 (legal, maritime)
•Usable in IR, CLIR, IE, QA, ...
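  EuroWordNet Multilingual Data Structure

[Figure: the language wordnets (Italian, Dutch, Spanish, English, French, German, Estonian, Czech) are linked through the ILI, which is connected to the Top Ontology (LIVING, ANIMAL, HUMAN). Example: "cane" (Italian WN), "hond" (Dutch WN), "perro" (Spanish WN) and "dog" (English WN) all point to the ILI record "dog".]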
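
The role of the ILI as a hub between wordnets can be illustrated with a minimal sketch. It only assumes what the figure shows: words of each language point to a shared ILI record. The record identifier `ili:dog`, the dictionary layout and the `translate` helper are hypothetical and are not part of EuroWordNet itself.

```python
# Hypothetical identifier for the ILI record shared by the words in the figure.
ILI_DOG = "ili:dog"

# Each language wordnet maps its own words to ILI records.
wordnets = {
    "it": {"cane": ILI_DOG},
    "nl": {"hond": ILI_DOG},
    "es": {"perro": ILI_DOG},
    "en": {"dog": ILI_DOG},
}


def translate(word, source, target):
    """Return the target-language words that share an ILI record with `word`."""
    ili = wordnets[source].get(word)
    if ili is None:
        return []
    return sorted(w for w, record in wordnets[target].items() if record == ili)


print(translate("cane", "it", "nl"))
print(translate("perro", "es", "en"))
```

No wordnet needs a direct link to any other wordnet: adding a language only requires linking its own synsets to the ILI.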
 Synsets linked by Semantic Relations in ItalWordNet

 Synset: {casa, abitazione, dimora}
    English:                home, domicile, ..; house
    TOP Concepts:           Object, Artifact, Building
    Hyperonym:              {edificio, ..}
    Hyponym:                {villetta}, {catapecchia, bicocca, ..}, {cottage}, {bungalow}
    Role_location:          {stare, abitare, ...}
    Role_target_direction:  {rincasare}
    Role_patient:           {affitto, locazione}
    Mero_part:              {vestibolo}, {stanza}
    Holo_part:              {casale}, {frazione}, {caseggiato}
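
As a rough illustration of the synset example, the sketch below stores a synset as a set of lemmas plus typed links to other synsets. The `Synset` class and its method names are assumptions made for this example, not the ItalWordNet data format; only the lemmas and relation names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A group of synonymous lemmas with typed links to other synsets."""
    lemmas: tuple
    relations: dict = field(default_factory=dict)  # relation name -> list of Synset

    def link(self, relation, target):
        self.relations.setdefault(relation, []).append(target)

    def related(self, relation):
        return [s.lemmas for s in self.relations.get(relation, [])]


casa = Synset(("casa", "abitazione", "dimora"))
casa.link("hyperonym", Synset(("edificio",)))
for lemmas in [("villetta",), ("catapecchia", "bicocca"), ("cottage",), ("bungalow",)]:
    casa.link("hyponym", Synset(lemmas))
casa.link("mero_part", Synset(("vestibolo",)))
casa.link("mero_part", Synset(("stanza",)))
casa.link("role_location", Synset(("stare", "abitare")))

print(casa.related("hyponym"))
```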
                          Jur-WordNet
With ITTG-CNR (Istituto di Teoria e Tecniche dell’informazione Giuridica)

   • Jur-WordNet: extension of ItalWordNet for the juridical domain
           • Knowledge base for multilingual access to sources of legal
             information

           • Source of metadata for semantic mark-up of legal texts

           • To be used, together with the generic ItalWordNet, in applications of
             Information Extraction, Question Answering, Automatic Tagging,
             Knowledge Sharing, Norm Comparison, etc.
Terminological Lexicon of Navigation &
                    Sea Transportation
       Nolo

                                      Synsets          1,614
                                      Lemmas           2,116
                                      Senses           2,232
                                      Nouns            1,621
                                      Verbs              205
                                      Adjectives          35
                                      Proper Nouns       236
PAROLE/SIMPLE: 12 harmonised computational lexicons
http://www.ilc.cnr.it/clips/

   Resource                   Period      Format   Content
   PAROLE Corpus
   PAROLE Ital. Synt. Lex.    ’96-’98     SGML     morphology: 20,000 entries; syntax: 20,000 words
   SIMPLE Ital. Sem. Lex.     ’98-2000    SGML     semantics: 10,000 senses
   CLIPS                      2000-2004   XML      phonology, morphology, syntax: 55,000 words; semantics: 55,000 senses
machine language learning

[Figure: machine language learning and the areas shown around it: linguistic learning, development of conceptual networks, linguistic change models, language usage models, adaptive classification systems, information extraction, bootstrapping of lexical information, bootstrapping of grammars]
Architecture for linguistic knowledge acquisition ...

[Figure: components shown: terminology, unstructured text data, annotation tools, annotated data, lexica, lexicon model, machine learning for linguistic knowledge acquisition, LKG, structured knowledge, user needs; applications shown: cross-lingual information retrieval, multi-lingual information extraction, multi-lingual text mining]

…. towards “dynamic” lexicons, able to auto-enrich
                                                      Harmonisation:
                    More & more Need of a Global View
                              for Global Interoperability

Integration/sharing of data & software/tools
 Need of compatibility among various components
   An “exemplary cycle”:

[Figure: cycle connecting Formalisms and Grammars; Software: Taggers, Chunkers, Parsers, …; Annotation and Corpora; Software: Acquisition Systems, I/O Interfaces; Representation, Lexicon and Terminology]
 A short guide to ISLE/EAGLES
http://www.ilc.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm



      Multilingual Computational Lexicon
                  Working Group


        Target: … the Multilingual ISLE Lexical Entry (MILE)
   General methodological principles (from EAGLES):

       • high granularity: factor out the (maximal) set of primitive units of
         lexical info (basic notions) with the highest degree of
         inter-theoretical agreement

       • modular and layered: various degrees of specification possible
       • explicit representation of info
       • allow for underspecification (& hierarchical structure)

       • leading principle: edited union of existing lexicons/models
         (redundancy is not a problem)
       • open to different paradigms of multilinguality
       • oriented to the creation of large-scale & distributed lexicons
                         Paths to Discover the
                         Basic Notions of MILE

   • clues in dictionaries to decide on target equivalent
   • guidelines for lexicographers
   • clues (to disambiguate/translate) in corpus concordances
   • lexical requirements from various types of transfer conditions
     & actions in MT systems
   • lexical requirements from interlingua-based systems
   • …

         a list of critical information types that will
                compose each module of the MILE
                        Designing MILE
                           Steps towards MILE:

   • Creating entries (Bertagna, Reeves, Bouillon)
   • Identifying the MILE Basic Notions (Bertagna, Monachini, Atkins, Bouillon)
   • Defining the MILE Lexical Model (Lenci, Calzolari, etc.)
   • Formalising MILE (Ide)
   • Development of the ISLE Lexical Tool (Bel)
   • ISLE & spoken language & multimodality (Gibbon)
   • Metadata for the lexicon (Peters, Wittenburg)
   • A case-study: MWEs in MILE (Quochi, Lenci, Calzolari)

                 the MILE Basic Notions
                 the MILE Lexical Model
                  The MILE Basic Notions
                                                  (the EAGLES/ISLE CLWG)

   • Basic lexical dimensions & info-types relevant to establish
     multilingual links
   • Typology of lexical multilingual correspondences (relevant
     conditions & actions)

Identified by:

       • creating sample multilingual lexical entries (Bertagna,
         Reeves)

       • investigating the use of sense indicators in traditional
         bilingual dictionaries (Atkins, Bouillon)
       • ….
 The MILE Lexical Classes
             –
Data Categories for Content
      Interoperability
Francesca Bertagna*, Alessandro Lenci°,
Monica Monachini*, Nicoletta Calzolari*

          *ILC–CNR – Pisa
           °Pisa University

                 Overview
1. MILE Lexical Model with Lexical Objects and
   Data Categories
2. Mapping of existing lexicons onto MILE
3. RDF schema and DC Registry for some pre-
   instantiated lexical objects together with a
   sample entry from the PAROLE-SIMPLE
   lexicons in MILE
4. Future …

       The MILE Lexical Model

[Figure: the GENELEX Model, the PAROLE-SIMPLE Lexicons, Multilingual Lexicons (EuroWordNet, etc.) and the Guidelines (syntactic, semantic lexicons) feed into the Computational Lexicon Working Group, which produces the MILE Lexical Model]
               The MILE Main Features
   • A general architecture devised as a common representational
     layer for multilingual Computational Lexicons
       • both for hand-coded and corpus-driven lexical data

Key features:
   • Modularity
   • Granularity
   • Extensibility and “openness” - User-adaptability
   • Resource Sharing
   • Content Interoperability
   • Reusability

               Semantic Web technologies & standards
                   applied to Lexicon modelling
The MILE Lexical Model (MLM)
   • The MLM core is the Multilingual ISLE Lexical
     Entry (MILE)
       • a general schema for multilingual lexical resources
       • a lexical meta-entry as a common representational layer for
         multilingual lexicons
   • Computational lexicons can be viewed as different
     instances of the MILE schema

[Figure: the MILE Lexical Model with lexicon#1, lexicon#2 and lexicon#3 as its instances]
                             MILE
              the building-block model
   • The MILE architecture is designed according to
     the building-block model:
     • Lexical entries are obtained by combining various
       types of lexical objects (atomic and complex)
     • Users design their lexicon by:
         • selecting and/or specifying the relevant lexical objects
         • combining the lexical objects into lexical entries

     • Lexical objects may be shared:
         • within the same lexicon (intra-lexicon reusability)
         • among different lexicons (inter-lexicon reusability)
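
A minimal sketch of the building-block idea follows, assuming that a lexical object is defined once and that entries only hold references to it. The class names, the `kind` labels and the example lemmas (mangiare, leggere, eat, teacher) are hypothetical illustrations, not part of the MILE specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalObject:
    kind: str   # e.g. "syntactic_frame", "sem_feature"
    name: str


@dataclass
class LexicalEntry:
    lemma: str
    objects: list


# A lexical object is defined once ...
transitive = LexicalObject("syntactic_frame", "transitive")
human = LexicalObject("sem_feature", "HUMAN")

# ... and reused by several entries of one lexicon (intra-lexicon reusability).
lexicon_it = [LexicalEntry("mangiare", [transitive]), LexicalEntry("leggere", [transitive])]

# The same object can also be referenced from another lexicon (inter-lexicon reusability).
lexicon_en = [LexicalEntry("eat", [transitive]), LexicalEntry("teacher", [human])]

shared = lexicon_it[0].objects[0] is lexicon_en[0].objects[0]
print(shared)
```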
                  MILE
       the building-block model

[Figure: Lexical entry 1, Lexical entry 2 and Lexical entry 3 are built from a common pool of Lexical Objects: syntactic frame, slot, phrase, Syn feature, Sem feature]
               Modularity in MILE
Horizontal organization, where independent but interlinked
modules allow different dimensions of lexical entries to be expressed

[Figure: multiple levels of modularity. mono-MILE comprises the morphological layer, the syntactic layer and the semantic layer, connected by linking conditions; multi-MILE adds multilingual correspondence conditions]
          The Mono-MILE

   • Each monolingual layer within Mono-MILE identifies a
     basic unit of lexical description

   Layer                 Unit   Description
   semantic layer        SemU   basic unit to describe the semantic properties of the MU
   syntactic layer       SynU   basic unit to describe the syntactic behaviour of the MU
   morphological layer   MU     basic unit to describe the inflectional and derivational
                                morphological properties of the word
                  The Mono-MILE

[Figure: one MU linked to several SynUs, which are linked to several SemUs]

Within each layer, a basic linguistic information unit is identified
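
The three basic units can be pictured as nested records. The sketch below is only an assumption about how MU, SynU and SemU might be held in memory; the identifiers are taken from the migliorare example used elsewhere in these slides, and the `senses` helper is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SemU:
    """Semantic unit: one sense."""
    id: str


@dataclass
class SynU:
    """Syntactic unit: one syntactic behaviour, linked to the senses it can express."""
    id: str
    sem_units: list = field(default_factory=list)


@dataclass
class MU:
    """Morphological unit: the word and its morphological properties."""
    lemma: str
    syn_units: list = field(default_factory=list)

    def senses(self):
        """Collect the distinct SemU ids reachable from this MU."""
        seen = []
        for syn in self.syn_units:
            for sem in syn.sem_units:
                if sem.id not in seen:
                    seen.append(sem.id)
        return seen


mu = MU("migliorare", [
    SynU("SynU_migliorare", [SemU("SemU1_migliorare"), SemU("SemU2_migliorare")]),
])
print(mu.senses())
```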
            Granularity in MILE
   • Concerns the vertical dimension. Within a given lexical
     layer, varying degrees of depth of lexical description
     are allowed, from shallow to deep lexical
     representations
             Defining the MLM

   • The MLM is designed as an E-R model (MILE Entry
     Schema)
       • defines the lexical objects and the ways they can be combined
         into a lexical entry
   • The MLM includes 3 types of lexical objects:
       • MILE Lexical Classes (MLC)
       • MILE Lexical Data Categories (MDC)
       • MILE Lexical Operations (MLO)
        The MILE Lexical Objects
   • Within each layer, basic lexical notions are
     represented by lexical objects:
       • MILE Lexical Classes (MLC)
       • MILE Data Categories (MDC)
       • Lexical operations

   • They form an ontology of lexical objects, as an
     abstraction over different lexical models and
     architectures
       The MILE E/R diagrams

   • The lexical objects are described with E-R
     diagrams which define them and the ways they
     can be combined into a lexical entry
MILE Lexical Objects: Syntactic Layer

   Source      Relation            Cardinality   Target
   MLC:SynU    hasSyntacticFrame   1..*          MLC:SyntacticFrame
   MLC:SynU    hasFrameSet         *             MLC:FrameSet
   MLC:SynU    composedby          *             MLC:Composition
   MLC:SynU    correspondTo        *             MLC:SemU

   The correspondTo link is described by MLC:CorrespSynUSemU
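
The cardinalities of the syntactic layer can be read as constraints to check on an entry. The sketch below is one possible reading, assuming that `1..*` means "at least one" and `*` means "any number"; the `SynU` fields and the `validate` function are illustrative names, not an existing MILE tool.

```python
from dataclasses import dataclass, field


@dataclass
class SynU:
    id: str
    syntactic_frames: list = field(default_factory=list)  # hasSyntacticFrame, 1..*
    frame_sets: list = field(default_factory=list)        # hasFrameSet, *
    compositions: list = field(default_factory=list)      # composedby, *
    sem_units: list = field(default_factory=list)         # correspondTo, *


def validate(synu):
    """Return a list of cardinality violations (empty if the unit is well formed)."""
    errors = []
    if len(synu.syntactic_frames) < 1:
        errors.append(f"{synu.id}: hasSyntacticFrame requires at least one SyntacticFrame")
    return errors


print(validate(SynU("SynU_1")))
print(validate(SynU("SynU_2", syntactic_frames=["transitive"])))
```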
           … expanding one node.

[Figure: expansion of the SynU node, showing SynU, SyntacticFrame, Construction, Self, Slot (twice), Function and Phrase]
MILE Lexical Objects: Semantic Layer

   Source      Relation           Cardinality   Target
   MLC:SemU    belongsToSynset    *             MLC:Synset
   MLC:SemU    hasSemFrame        0..1          MLC:SemanticFrame
   MLC:SemU    hasSemFeature      *             MLC:SemanticFeature
   MLC:SemU    hasCollocation     *             MLC:Collocation
   MLC:SemU    semanticRelation   *             MLC:SemU

   The semanticRelation link is described by MLC:SemanticRelation
MILE Lexical Objects: Synt-Sem Linking

   Source                   Relation                 Cardinality   Target
   MLC:CorrespSynUSemU      hasSourceSynu            1             MLC:SynU
   MLC:CorrespSynUSemU      hasTargetSemu            1             MLC:SemU
   MLC:CorrespSynUSemU      hasPredicativeCorresp    1             MLC:PredicativeCorresp
   MLC:PredicativeCorresp   IncludesSlotArgCorresp   0..*          MLC:SlotArgCorresp
    Syntax-Semantics Linking

   Syntactic side   Correspondence     Semantic side
   SynU             CorrespSynUSemU    SemU
   Frame            PredCorresp        Predicate
   Slot0            Slot0:Arg1         Arg_0
   Slot1            Slot1:Arg0         Arg_1

   The correspondences are subject to filters & conditions
    Syntax-Semantics Linking

                   John gave the book to Mary
                   John gave Mary the book

   SynU#1:   subj_NP   obj_NP   obl_PP_to
   SynU#2:   subj_NP   obj_NP   obj_NP

   SemU#1:   Semantic_Frame:GIVE
             Arg1: Agent   Arg2: Theme   Arg3: Goal
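
The two syntactic units for "give" can be linked to the same semantic frame through slot-to-argument correspondences. The sketch below is a hedged illustration: the slide does not spell out which slot maps to which argument, so the mappings are assumptions, and the labels `obj1_NP` and `obj2_NP` are hypothetical names introduced only to tell the two objects of SynU#2 apart.

```python
# Semantic frame GIVE with its arguments and roles, as on the slide.
GIVE = {"Arg1": "Agent", "Arg2": "Theme", "Arg3": "Goal"}

# Assumed slot -> argument correspondences for the two syntactic units.
SYNU_1 = {"subj_NP": "Arg1", "obj_NP": "Arg2", "obl_PP_to": "Arg3"}  # John gave the book to Mary
SYNU_2 = {"subj_NP": "Arg1", "obj1_NP": "Arg3", "obj2_NP": "Arg2"}   # John gave Mary the book


def roles(correspondence, fillers):
    """Map each semantic role to the phrase that fills the linked slot."""
    return {GIVE[correspondence[slot]]: phrase for slot, phrase in fillers.items()}


a = roles(SYNU_1, {"subj_NP": "John", "obj_NP": "the book", "obl_PP_to": "Mary"})
b = roles(SYNU_2, {"subj_NP": "John", "obj1_NP": "Mary", "obj2_NP": "the book"})
print(a == b)
```

Both sentences end up with the same role assignment, which is the point of keeping syntactic units separate from a shared semantic unit.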
  Syntax-Semantic Linking in SIMPLE

   SynU_migliorare
      Frameset
         Transitive structure:     Slot0   Slot1
         Intransitive structure:   Slot0   Ø

   PRED_migliorare
      ARG0:Agent      ARG1:Patient

   SemU1_migliorare   CAUSE_CHANGE_OF_STATE   CorrespSynUSemU isomorphic (SlotArgCorresp)
   SemU2_migliorare   CHANGE_OF_STATE         CorrespSynUSemU non-isomorphic (SlotArgCorresp)
               The Multilingual layer

   Source         Relation            Cardinality   Target
   MultiCorresp   hasMUMUCorr         0..1          MUMUCorresp
   MultiCorresp   hasSynUSynuCorr     0..1          SynUSynUCorresp
   MultiCorresp   hasSemUSemUCorr     0..1          SemUSemUCorresp
   MultiCorresp   hasSynsetMultCorr   0..1          SynsetMultCorresp
   MultiCorresp   hasSemFrameCorr     0..1          SemanticFrameMultCorresp
    MILE approach to multilinguality

   • Open to various approaches
       • transfer-based
           • monolingual descriptions are used to state correspondences
             (tests and actions) between source and target entries
       • interlingua-based
           • monolingual entries linked to language-independent lexical
             objects (e.g. semantic frames, “primitive predicates”, etc.)
               The Multi-MILE

   • Multi-MILE specifies a formal environment to
     express multilingual correspondences between
     lexical items
   • Source and target lexical entries can be linked
     by exploiting (possibly combined) aspects of
     their monolingual descriptions
       • monolingual lexicons act as pivot lexical repositories,
         on top of which language-to-language multilingual
         modules can be defined
               The Multi-MILE

   • Multi-MILE may include:
     • Multilingual operations to establish transfer links
       between source and target mono-MILE
     • Multilingual lexical objects
          • enrich the source and target lexical descriptions, but
            do not belong to the monolingual lexicons

     • Language-independent lexical objects:
          • Primitive semantic frames, “interlingual synsets”, etc.
          • Relevant for interlingua approaches to multilinguality
                     Multi-MILE

   Italian mono-MILE:   MU_1;  SynU_1, SynU_2;  SemU_1, SemU_2
   English mono-MILE:   MU_1;  SynU_1;  SemU_1

   IT-to-EN multi-MILE:
      IT_SemU_2 → En_SemU_1
      IT_SynU_2 → En_SynU_1
      IT_Slot_0 → EN_Slot_1
      IT_Slot_1 → EN_Slot_0
      AddFeature to source SemU:   +HUMAN
      AddSlot to target SynU:      MODIF [PP_with]
             Multi-MILE

   IT Lexicon             multilingual conditions   EN Lexicon
   dito                   modif(mano)               finger
   dito                   modif(piede)              toe
   entrare (“to enter”)   +PP_di_corsa              run + PP_into
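
The dito example suggests that a multilingual correspondence is a source item, a target item and a condition tested on the source context. The sketch below encodes that reading under stated assumptions: the tuple layout, the `modif` context key and the `translate` function are hypothetical and do not reproduce the actual MILE operations.

```python
# Each correspondence pairs a source lemma with a target lemma and a condition
# on the source context (here: the lemma that modifies the noun).
CORRESPONDENCES = [
    ("dito", "finger", {"modif": "mano"}),
    ("dito", "toe", {"modif": "piede"}),
]


def translate(lemma, context):
    """Return the target lemmas whose conditions are all satisfied by `context`."""
    return [
        target
        for source, target, condition in CORRESPONDENCES
        if source == lemma and all(context.get(k) == v for k, v in condition.items())
    ]


print(translate("dito", {"modif": "mano"}))
print(translate("dito", {"modif": "piede"}))
```

With an empty context neither condition holds and the function returns an empty list; a real system would need a fallback for that case.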
             MILE Lexical Classes
   • Represent the main building blocks of lexical entries
   • Formalize the MILE Basic Notions
   • Define an ontology of lexical objects
       • represent lexical notions such as semantic unit, syntactic
         feature, syntactic frame, semantic predicate, semantic
         relation, synset, etc.
   • Similar to class definitions in OO languages
       • specify the relevant attributes
       • define the relations with other classes
       • hierarchically structured
      MILE Lexical Classes
    an ontology of lexical objects

   MLM:SemU attributes:   id: xs:anyURI;   comment: xs:string;   example: xs:string

   Source      Relation              Cardinality   Target
   MLM:SemU    correspondsToSynset   *             MLM:Synset
   MLM:SemU    hasSemanticFrame      0..1          MLM:SemanticFrame
   MLM:SemU    semFeature            *             MLM:semValues
   MLM:SemU    hasCollocation        *             MLM:Collocation
   MLM:SemU    semURelation          *             MLM:SemU

   The semURelation link is described by MLM:SemURelation
    MILE Lexical Data Categories
   MDC are instances of the MILE lexical Classes
       Can be used “off the shelf” or as a departure point for the definition of new or
        modified categories
       Enable modular specification of lexical entities using all or parts of the lexical
        information in the repository
   Each MDC represents a resource
       uniquely identified by a URI
   Two types of MDC:
       Core MDC
            belong to shared repositories (Lexical Data Category
             Registry)
            lexical objects and linguistic notions with wide consensus
       User Defined MDC
            user-specific or language specific lexical objects
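
The distinction between Core and User Defined MDC can be sketched as a small registry keyed by URI. This is an illustrative Python sketch only: the `Registry` class, its method names and the `http://example.org/mdc#` namespace are assumptions, not part of the MILE specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataCategory:
    uri: str            # unique identifier of the category
    lexical_class: str  # the MILE Lexical Class it instantiates
    core: bool          # True for Core MDC, False for user-defined ones


class Registry:
    """Toy registry: shared core categories plus user-defined additions."""

    def __init__(self) -> None:
        self._items: Dict[str, DataCategory] = {}

    def add(self, uri: str, lexical_class: str, core: bool = False) -> DataCategory:
        if uri in self._items:
            raise ValueError(f"URI already registered: {uri}")
        self._items[uri] = DataCategory(uri, lexical_class, core)
        return self._items[uri]

    def get(self, uri: str) -> DataCategory:
        return self._items[uri]

    def core_uris(self) -> List[str]:
        return sorted(u for u, c in self._items.items() if c.core)


BASE = "http://example.org/mdc#"  # hypothetical namespace
registry = Registry()
registry.add(BASE + "HUMAN", "SemanticFeature", core=True)
registry.add(BASE + "EVENT", "SemanticFeature", core=True)
registry.add(BASE + "MAMMAL", "SemanticFeature")  # user-defined extension
print(registry.core_uris())
```

Because every category is addressed by its URI, a user-defined category can sit next to the shared ones without changing them.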
            The MILE Data Categories
   Instances of the MILE Lexical Classes are Data Categories
   MDC can belong to a shared repository or be user-defined


                            MLC


                            Core
                            MDC

                     User-defined MDC

  The MILE Data Categories
User-adaptability and extensibility

         MLC:SemanticFeature

             instance_of

          HUMAN            Core
          ARTIFACT
          EVENT
          ANIMAL
          GROUP

          AGE UserDefined
          MAMMAL



MILE Lexical Data Categories

   MLM:Feature has three sub-classes; each MDC below is an instance_of its class.

   Class                     Core MDC                                     User-defined MDC
   MLM:SemFeature            HUMAN, ARTIFACTUAL, EVENT, DURATION, GROUP   AGE, ANIMATE
   MLM:SynFeature            GENDER, CASE, PERSON, TENSE, CONTROL         ASPECT
   MLM:GrammaticalFunction   SUBJ, OBJ, IOBJ, PRED                        X_COMP, C_COMP

     MILE Lexical Operations

   They are used to state conditions and perform
    operations over lexical entries
     Link syntactic slots and semantic arguments
     Constrain the syntax-semantic link
     Express tests and actions in the transfer conditions in
      the multi-MILE
     …

   They provide the “glue” to link various
    independent intra-lexical and inter-lexical
    components
        Multilingual Operations

   Source-to-target language transfer conditions can be
    expressed by combining multilingual operations
   Three types of multilingual operations:
       Multilingual correspondences
           Link a source lexical object (MU, SemU, SynU, semantic argument,
            syntactic slot) and a target lexical object (MU, SemU, SynU, semantic
            argument, syntactic slot)
       Add-operations
           Add lexical information relevant for the cross-lingual link, but not
            present in the source or target mono-MILE
       Constrain-operations
           Constrain the transfer link to some portions of source and target
            mono-MILE
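
How a correspondence and a constrain-operation combine can be sketched with the `dito` example from the Multi-MILE slide. This is a hedged Python illustration: the `Correspondence` class, the `transfer` function and the dictionary encoding of the constraint are assumptions, not the MILE formalism.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Correspondence:
    source: str  # source lexical object (here an Italian lemma)
    target: str  # target lexical object (here an English lemma)
    constraint: Optional[Dict[str, str]] = None  # constrain-operation, if any

    def applies(self, context: Dict[str, str]) -> bool:
        """True when every constrained property has the required value."""
        if self.constraint is None:
            return True
        return all(context.get(k) == v for k, v in self.constraint.items())


# "dito" corresponds to "finger" or "toe" depending on its modifier.
links: List[Correspondence] = [
    Correspondence("dito", "finger", {"modif": "mano"}),
    Correspondence("dito", "toe", {"modif": "piede"}),
]


def transfer(source: str, context: Dict[str, str]) -> List[str]:
    """Return the targets whose constraints are satisfied by the context."""
    return [c.target for c in links if c.source == source and c.applies(context)]


print(transfer("dito", {"modif": "piede"}))
```

Without a matching modifier in the context, neither constrained link fires, which is the behaviour a transfer condition is meant to express.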
               Defining the MLM

                   MILE Lexical                      MILE
                     Classes                     Entry Schema



  RDF/S
Descriptions
                       MDC
                      Registry

                    User Defined
                       MDC
                                            Monolingual/Multilingual Lexicon
RDF Instantiation of the MLM

                 Lexicon#2
  Lexicon#1                     Lexicon#3     Resources




                                              Metadata
                  Lexical
                  Objects

                                              Resources
       Lexical                  Lexical
       Classes              Data Categories
            MILE Lexical Model

   Ideal structure for rendering in RDF:
       hierarchy of lexical objects built up by combining
        atomic data categories via clearly defined relations
   Proof of concept:
       Create an RDF schema for the MILE Lexical
        Model
           version 1.2
       Instantiate MILE Lexical Data Categories
User-Adaptability and Resource
      Sharing in MILE
   Compatible with different models of lexical analysis:
       Relational semantic models (e.g. WordNet)
       Syntactic and semantic frames
       Ontology-based lexicons
   Compatible with different degrees of specification:
       Deep lexical representations (e.g. PAROLE-SIMPLE)
       Terminological lexicons
   Compatible with different paradigms of multilinguality
       Lexicons for Transfer Based MT
       Interlingua-based lexicons
       …
The MILE Lexical Model

               MILE
            Lexical Model



  DTD_1           DTD_2                  DTD_n
                                   …



lexicon_1     lexicon_2                lexicon_3
               RDF Instantiation of the MLM
   Enable universal access to sophisticated linguistic info
   Provide means for inferencing over lexical info
   Incorporate lexical information into the Semantic Web

   W3C standards:
       Resource Description Framework (RDF)
       Web Ontology Language (OWL)
   Built on the XML web infrastructure to enable the creation of
    a Semantic Web
       web objects are classified according to their properties
       semantics of relations (links) to other web objects precisely defined

           The RDF Schema

   Defines classes of objects (MLC) and their
    relations to other objects
   Like a class definition in Java, etc.
   Classes and properties in the schema correspond
    to the E-R model
   Can specify sub-classes/sub-properties and
    inheritance
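
The effect of sub-class declarations can be sketched in a few lines of Python. This is an illustration of transitive `rdfs:subClassOf` reasoning under stated assumptions: the class names echo the MILE feature classes, but the link from `Feature` to `LexObject` and the dictionary encoding are assumptions made for the example.

```python
from typing import Dict, Set

# child -> parent, mirroring rdfs:subClassOf statements (illustrative names)
SUBCLASS_OF: Dict[str, str] = {
    "SemFeature": "Feature",
    "SynFeature": "Feature",
    "Feature": "LexObject",  # assumed link, for illustration only
}


def superclasses(cls: str) -> Set[str]:
    """All classes that `cls` inherits from, following subClassOf transitively."""
    found: Set[str] = set()
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def is_a(cls: str, ancestor: str) -> bool:
    """True if `cls` is `ancestor` or one of its direct or indirect sub-classes."""
    return cls == ancestor or ancestor in superclasses(cls)


print(sorted(superclasses("SemFeature")))
```

Anything stated about `Feature` therefore also holds for `SemFeature` and `SynFeature`, which is what inheritance in the schema provides.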
                     Goals
 Lexical information will form a central
  component of semantic information
 Need a standardized, machine processable
  format so that information can be used,
  merged with others
 Main task: get the data model right
 See: Semantic Web




                Advantages of RDF
   Modularity
       Can create “instances” of bits of lexical information for re-
        use in a single lexicon or across lexicons
       Instances can be stored in a central repository for use by
        others
       Can use partial information or all of it
       Building block approach to lexicon creation
   Web-compatible
       RDF instantiation will integrate into Semantic Web
       Inferencing capabilities

                     Example
   Three parts:
     RDF Schema for lexical entries
        Defines classes and properties, sub-classes, etc.
     Sample repository of RDF-instantiated lexical objects
        Three levels of granularity
     Sample lexicon entries
        Use repository information at different levels

              Sample Repositories
1       repository of enumerated classes for lexical
        objects at the lowest level of granularity
    •     definition of sets of possible values for various
          lexical objects
2       repository of phrases for common phrase types,
        e.g., NP, VP, etc.
3       repository of constructions for common syntactic
        constructions

              Enumerated classes
<rdfs:Class rdf:about="http://www.cs.vassar.edu/~ide/rdf/isle-enumerated-classes#FunctionType">
<owl:oneOf>
  <rdf:Seq>
    <rdf:li>Subj</rdf:li>
    <rdf:li>Obj</rdf:li>
    <rdf:li>Comp</rdf:li>
    <rdf:li>Arg</rdf:li>
    <rdf:li>Iobj</rdf:li>
  </rdf:Seq>
</owl:oneOf>
</rdfs:Class>

<rdfs:Class rdf:about="http://www.cs.vassar.edu/~ide/rdf/isle-enumerated-classes#SynFeatureName">
<owl:oneOf>
  <rdf:Seq>
    <rdf:li>tense</rdf:li>
    <rdf:li>gender</rdf:li>
    <rdf:li>control</rdf:li>
    <rdf:li>person</rdf:li>
    <rdf:li>aux</rdf:li>
  </rdf:Seq>
</owl:oneOf>
</rdfs:Class>

<rdfs:Class rdf:about="http://www.cs.vassar.edu/~ide/rdf/isle-enumerated-classes#SynFeatureValue">
<owl:oneOf>
  <rdf:Seq>
    <rdf:li>have</rdf:li>
    <rdf:li>be</rdf:li>
    <rdf:li>subject_control</rdf:li>
    <rdf:li>object_control</rdf:li>
    <rdf:li>masculine</rdf:li>
    <rdf:li>feminine</rdf:li>
  </rdf:Seq>
</owl:oneOf>
</rdfs:Class>
             Sample LDCR for a Phrase
                     Object
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:mlc="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#">

<Phrase rdf:ID="NP" rdfs:label="NP"/>

<Phrase rdf:ID="Vauxhave">
  <hasSynFeature>
   <SynFeature>
     <hasSynFeatureName rdf:value="aux"/>
     <hasSynFeatureValue rdf:value="have"/>
    </SynFeature>
  </hasSynFeature>
</Phrase>

</rdf:RDF>



     Sample LDCR entry for a Construction object

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#">

<Construction rdf:ID="TransIntrans">
   <slot>
     <SlotRealization rdf:ID="NPsubj">
       <hasFunction rdf:value="Subj"/>
       <filledBy rdf:resource=
       "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
     </SlotRealization>
   </slot>
   <slot>
     <SlotRealization rdf:ID="NPobj">
       <hasFunction rdf:value="Obj"/>
       <filledBy rdf:resource=
       "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
     </SlotRealization>
   </slot>
</Construction>
</rdf:RDF>
                                Full entry
<Entry rdf:ID="eat1">
 <hasSynu rdf:parseType="Resource">
   <SynU rdf:ID="eat1-SynU">
     <example>John ate the cake</example>
     <hasSyntacticFrame>
       <SyntacticFrame rdf:ID="eat1SynFrame">
         <hasSelf>
           <Self rdf:ID="eat1Self">
             <headedBy>
               <Phrase rdf:ID="Vauxhave">
                 <hasSynFeature>
                   <SynFeature>
                     <hasSynFeatureName rdf:value="aux"/>
                     <hasSynFeatureValue rdf:value="have"/>
                   </SynFeature>
                 </hasSynFeature>
               </Phrase>
             </headedBy>
           </Self>
         </hasSelf>
         <hasConstruction>
          <Construction rdf:ID="eat1Const">
            <slot>
              <SlotRealization rdf:ID="NPsubj">
                <hasFunction rdf:value="Subj"/>
                <filledBy rdf:value="NP"/>
              </SlotRealization>
            </slot>
            <slot>
              <SlotRealization rdf:ID="NPobj">
                 <hasFunction rdf:value="Obj"/>
                 <filledBy rdf:value="NP"/>
              </SlotRealization>
            </slot>
           </Construction>
         </hasConstruction>
         <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
       </SyntacticFrame>
     </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
</rdf:RDF>
                 Entry Using Phrase
<Entry rdf:ID="eat1">
  <hasSynu rdf:parseType="Resource">
   <SynU rdf:ID="eat1-SynU">
     <example>John ate the cake</example>
     <hasSyntacticFrame>
       <SyntacticFrame rdf:ID="eat1SynFrame">
         <hasSelf>
           <Self rdf:ID="eat1Self">
             <headedBy rdf:resource=
              "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#Vauxhave"/>
           </Self>
         </hasSelf>
         <hasConstruction>
           <Construction rdf:ID="eat1Const">
             <slot>
              <SlotRealization rdf:ID="NPsubj">
               <hasFunction rdf:value="Subj"/>
               <filledBy rdf:resource=
                "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
              </SlotRealization>
             </slot>
             <slot>
              <SlotRealization rdf:ID="NPobj">
               <hasFunction rdf:value="Obj"/>
               <filledBy rdf:resource=
                "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
              </SlotRealization>
             </slot>
           </Construction>
         </hasConstruction>
         <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
       </SyntacticFrame>
     </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
                 Entry Using Construction
<Entry rdf:ID="eat1">
<hasSynu rdf:parseType="Resource">
   <SynU rdf:ID="eat1-SynU">
     <example>John ate the cake</example>
     <hasSyntacticFrame>
       <SyntacticFrame rdf:ID="eat1SynFrame">
         <hasSelf>
           <Self rdf:ID="eat1Self">
             <headedBy
             rdf:resource=
             "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#Vauxhave"/>
           </Self>
         </hasSelf>
         <hasConstruction rdf:resource=
         "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Constructions#TransIntrans"/>
         <hasFrequency rdf:value="8788" mlc:corpus="PAROLE"/>
       </SyntacticFrame>
     </hasSyntacticFrame>
    </SynU>
  </hasSynu>
</Entry>
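
The three versions of the entry differ only in how much is written out and how much is referenced. A minimal Python sketch of that idea follows: it stores the pre-instantiated objects in a dictionary keyed by the URIs used in the slides and expands references on demand. The dictionary encoding, the `{"ref": ...}` convention and the `expand` function are assumptions made for illustration, not how an RDF processor works internally.

```python
from typing import Any, Dict

PHRASES = "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#"
CONSTRUCTIONS = "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Constructions#"

# Repository of pre-instantiated objects, keyed by URI.
REPOSITORY: Dict[str, Dict[str, Any]] = {
    PHRASES + "NP": {"type": "Phrase", "label": "NP"},
    PHRASES + "Vauxhave": {"type": "Phrase", "features": {"aux": "have"}},
    CONSTRUCTIONS + "TransIntrans": {
        "type": "Construction",
        "slots": [
            {"function": "Subj", "filledBy": {"ref": PHRASES + "NP"}},
            {"function": "Obj", "filledBy": {"ref": PHRASES + "NP"}},
        ],
    },
}


def expand(node: Any) -> Any:
    """Recursively replace {"ref": uri} with the referenced repository object."""
    if isinstance(node, dict):
        if set(node) == {"ref"}:
            return expand(REPOSITORY[node["ref"]])
        return {key: expand(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item) for item in node]
    return node


# The abbreviated entry: two references instead of nested descriptions.
entry = {
    "id": "eat1",
    "headedBy": {"ref": PHRASES + "Vauxhave"},
    "hasConstruction": {"ref": CONSTRUCTIONS + "TransIntrans"},
}

full = expand(entry)
print(full["hasConstruction"]["slots"][0]["filledBy"]["label"])
```

Expanding the abbreviated entry yields the same information as the full entry, while the repository objects remain shared across entries.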




                 Semantic Representation
   The data model underlying RDF/UML, etc. is universal,
    abstract enough to capture all types of info
   Semantic representations:
       Registry of basic data categories
            “meta”-categories: addressee, utterance, etc.
             Information categories: eyebrow movement, gestures, pitch, …
             Supporting ONTOLOGY of information categories
       Interpretative procedures yield another level of meaning representation
            Registry of categories….


                   UNINTERPRETED REPRESENTATION → INTERPRETATION PROCESS → INTERPRETED REPRESENTATION

    MILE Lexical Data Category
        Registry (MDC)

   Instantiation of pre-defined lexical objects
   Extension of the shared class schema with lexicon-
    specific sub-classes and sub-properties
   Can be used “off the shelf” or as a departure point for
    the definition of new or modified categories
   Enables modular specification of lexical entities
       eliminate redundancy
       identify lexical entries or sub-entries with shared properties

        MLC in RDF/S
                     features

features are properties of lexical objects

   mlm:feature: domain mlm:LexObject, range mlm:Values
   mlm:semFeature: rdfs:subPropertyOf mlm:feature, range mlm:SemValues
   mlm:synFeature: rdfs:subPropertyOf mlm:feature, range mlm:SynValues
   mlm:SemValues and mlm:SynValues: rdfs:subClassOf mlm:Values
               MLC in RDF/S
                 syntactic features

<rdf:Property rdf:ID="synCat">
         <rdfs:subPropertyOf
                  rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#synFeature"/>
         <rdfs:range
                  rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynCatValues"/>
</rdf:Property>
<!-- feature values -->
<rdfs:Class rdf:ID="SynCatValues">
         <rdfs:subClassOf
                  rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynValues"/>
            <owl:oneOf rdf:parseType="Collection">
                  <owl:Thing rdf:about="#Noun"/>
                  <owl:Thing rdf:about="#Verb"/>
                  <owl:Thing rdf:about="#Adjective"/>
                  ...
            </owl:oneOf>
</rdfs:Class>
</rdf:RDF>

                 MLC in RDF/S
                   semantic features

<rdf:Property rdf:ID="domain">
         <rdfs:subPropertyOf
                  rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#semFeature"/>
         <rdfs:range
                  rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#DomainValues"/>
</rdf:Property>
<!-- "domain ontology" -->
<rdfs:Class rdf:ID="DomainValues">
         <rdfs:subClassOf
                  rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SemValues"/>
            <owl:oneOf rdf:parseType="Collection">
                  <owl:Thing rdf:about="#Finance"/>
                  <owl:Thing rdf:about="#Medicine"/>
                  <owl:Thing rdf:about="#Sport"/>
                  ...
            </owl:oneOf>
</rdfs:Class>
</rdf:RDF>

          Synsets in RDF/S

   mlm:Synset
       mlm:word → rdfs:Literal
       mlm:gloss → rdfs:Literal
       mlm:feature → mlm:Values
       mlm:synsetRelation → mlm:Synset

cf. also http://www.semanticweb.org/library/wordnet/wordnet-20000620.rdfs
              Synsets in RDF/S

<rdfs:Class rdf:ID="Synset">
         <rdfs:label>Synset</rdfs:label>
         <rdfs:comment>This class formalizes the notion of synset as
         defined in WordNet (Fellbaum 1998).</rdfs:comment>
         <rdfs:subClassOf rdf:resource="#LexObject"/>
</rdfs:Class>
<!-- relation between synsets -->
<rdf:Property rdf:ID="synsetRelation">
         <rdfs:domain rdf:resource="#Synset"/>
         <rdfs:range rdf:resource="#Synset"/>
</rdf:Property>
<!-- different types of synset relations -->
<rdf:Property rdf:ID="hypernym" mlm:source="WordNet1.7">
         <rdfs:comment>The WordNet hypernym relation</rdfs:comment>
         <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
<rdf:Property rdf:ID="meronym" mlm:source="WordNet1.7">
         <rdfs:comment>The WordNet meronym relation</rdfs:comment>
         <rdfs:subPropertyOf rdf:resource="#synsetRelation"/>
</rdf:Property>
           WordNet 1.7 Synsets

<mlm:Synset
     rdf:about="http://www.cogsci.princeton.edu/~wn1.7/concept#01752990"
     mlm:source="WordNet1.7">
         <mlm:gloss>A member of the genus Canis</mlm:gloss>
         <mlm:word>dog</mlm:word>
         <mlm:word>domestic dog</mlm:word>
         <mlm:word>Canis familiaris</mlm:word>
         <mdc:synCat rdf:resource="#Noun"/>
         <mdc:domain rdf:resource="#Zoology"/>
         <mdc:hypernym
         rdf:resource="http://www.cogsci.princeton.edu/~wn1.7/concept#01752283"/>
</mlm:Synset>
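
To show that such an instance is directly machine-processable, here is a small Python sketch that reads a copy of the synset with the standard-library XML parser. It treats the data as plain XML rather than as an RDF graph, and the namespace URIs bound to `mlm` and `mdc` are placeholders, since the slide does not show the namespace declarations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MLM = "http://example.org/mlm#"  # hypothetical namespace URI
MDC = "http://example.org/mdc#"  # hypothetical namespace URI

DOC = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:mlm="{MLM}" xmlns:mdc="{MDC}">
  <mlm:Synset rdf:about="http://www.cogsci.princeton.edu/~wn1.7/concept#01752990"
              mlm:source="WordNet1.7">
    <mlm:gloss>A member of the genus Canis</mlm:gloss>
    <mlm:word>dog</mlm:word>
    <mlm:word>domestic dog</mlm:word>
    <mlm:word>Canis familiaris</mlm:word>
    <mdc:hypernym
      rdf:resource="http://www.cogsci.princeton.edu/~wn1.7/concept#01752283"/>
  </mlm:Synset>
</rdf:RDF>
"""

root = ET.fromstring(DOC)
synset = root.find(f"{{{MLM}}}Synset")

# Collect the word forms and the target of the hypernym link.
words = [w.text for w in synset.findall(f"{{{MLM}}}word")]
hypernym = synset.find(f"{{{MDC}}}hypernym").get(f"{{{RDF}}}resource")

print(words)
print(hypernym.rsplit("#", 1)[1])
```

A full RDF toolkit would additionally resolve the hypernym URI to another synset node; the sketch stops at extracting the reference.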




         hypernym                                        features

 Foundations of the
Mapping Experiment




1. The MILE building-block model

   The MILE Lexical Classes and the MILE Lexical
    Data Categories are the main building blocks of the
    MILE lexical architecture

   Building blocks allow two kinds of reusability:
     intra-lexicon reusability (within the same lexicon)
     inter-lexicon reusability (among different lexicons)




  How do building blocks work?


 Lexical entry 1    Lexical entry 2                Lexical entry 3




Lexical Objects
                                                                         Sem
                       syntactic
                                                                       feature
                        frame



           slot                                                        Syn
                                          phrase
                                                                     feature
               2. MILE: a meta-entry
   MILE is
       a general schema for multilingual lexical resources
       a lexical meta-entry, a common representational layer for
        multilingual lexicons
   Computational lexicons can be viewed as different instances
    of the MILE schema


                                MILE



             lexicon#1         lexicon#2              lexicon#3

        MILE and Content Interoperability
   This common shared compatible representation of lexical
    objects is particularly suited to
       manipulate objects available in different lexical resources
       understand their deep semantics
       apply the same operations to lexical objects of the same type




   key elements of Content Interoperability

The Mapping Experiment: Why?
   It is a concrete experiment aimed at testing the expressive
    potential and capabilities of the MILE
   The idea is that if the MILE atomic notions combined
    together in different ways suit the different “visions”
    underlying two lexicons such as FrameNet and
    NOMLEX,
     the MILE will emerge strengthened
     its adoption as an interface between differently conceived
      lexical architectures can be promoted further
     key issues for content interoperability between resources can
      be addressed

           The mapping scenarios
1. High level mapping of the objects of a lexicon into
   the objects of the abstract model
    the native structure is maintained and no format
     conversion is performed


2. Translate instances of lexical entries directly in
   MILE
    acts as a true interchange format


FrameNet to MILE




    FrameNet-MILE: Observations
The mapping is promising
 Frame ↔ Predicate (primitive)
 Frame Elements ↔ Argument (enlarge the set of possible values)
 Lexical_Unit ↔ SemU
 Link SemU-Predicate (obligatory) should become underspecified


But …
   The lack of an inheritance mechanism in the Predicate makes it impossible to
    represent the hierarchical organization of Frames and Sub-frames,
    temporal ordering among Frames, and subsumption relations among
    Frames
   We could add a new object PredicateRelation to allow for the
    description of relations occurring between predicates and sub-
    predicates
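
The correspondences and the proposed PredicateRelation object can be sketched as follows. This is a hypothetical Python illustration: the relation labels (`inherits_from`, `precedes`), the frame names and the single-parent simplification are assumptions, not FrameNet data or MILE definitions.

```python
from dataclasses import dataclass
from typing import Dict, List

# FrameNet notion -> MILE notion, following the correspondences listed above.
FRAMENET_TO_MILE: Dict[str, str] = {
    "Frame": "Predicate",
    "FrameElement": "Argument",
    "LexicalUnit": "SemU",
}


@dataclass(frozen=True)
class PredicateRelation:
    """The proposed new object: a typed relation between two predicates."""
    source: str
    relation: str  # hypothetical labels, e.g. "inherits_from", "precedes"
    target: str


def inherited_from(predicate: str, relations: List[PredicateRelation]) -> List[str]:
    """Follow "inherits_from" links upwards, nearest ancestor first.

    Simplification: each predicate is assumed to have at most one parent.
    """
    parents = {r.source: r.target for r in relations if r.relation == "inherits_from"}
    chain: List[str] = []
    while predicate in parents and parents[predicate] not in chain:
        predicate = parents[predicate]
        chain.append(predicate)
    return chain


# Frame names below are placeholders, not actual FrameNet frames.
relations = [
    PredicateRelation("SubFrame", "inherits_from", "Frame_A"),
    PredicateRelation("Frame_A", "inherits_from", "TopFrame"),
    PredicateRelation("Frame_A", "precedes", "Frame_B"),
]

print(FRAMENET_TO_MILE["LexicalUnit"])
print(inherited_from("SubFrame", relations))
```

Keeping the relation type as data lets hierarchy, temporal ordering and subsumption share one object instead of requiring changes to the Predicate class itself.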
MLC:SynU      MLC:SemU                  MLC:SemanticFrame
                                        TypeOfLinkAgentnom
                                        IncludedArg 0

     MLC:Corresp
     SynUSemU
                                          MLC:Predicate




                               MLC:Argument       MLC:Argument




                                :nom-type ((subject))
    NOMLEX-MILE: Observations
The mapping is promising
 Notions represented in NOMLEX have a correspondent in MILE



But …
    the notions are expressed with two opposite lexical structures
   In NOMLEX,
       lexical information is expressed in a very compact way
       no clear cut boundaries between the levels of linguistic description
   In MILE
       compressed info should be decompressed and spread over different MILE
        lexical layers and objects: SynU, SemU, SemanticFrame with its Predicate and
        relevant Arguments to account for the incorporation of the Agent.


    Lessons Learned from the mapping
   The results of the experiments are promising
   FrameNet offers the possibility to compare
    two similar lexical models whose lexical objects do not
    overlap perfectly → test the adequacy of the
    linguistic objects
   NOMLEX gives the opportunity to work with two
    lexicons where linguistic notions correspond but are
    expressed with an opposite lexicon structure → test
    the adequacy of the architectural model
   The high granularity and modularity of MILE
       allow compatibility with differently packaged linguistic
        objects
       allow the addition of new objects and relations without
        disrupting the general architecture
             RDF and MILE: Why?
Some reasons (from Nancy Ide et al. 2003)
 MILE as a hierarchy of lexical objects built up by combining
  data categories via clearly defined relations is an ideal structure
  for rendering in RDF
 RDF mechanism, with the capacity of expressing named
  relations between objects, offers a web-based means to
  represent the MILE architecture
 RDF representation of linguistic information is an invaluable
  resource for language processing applications in the Semantic
  Web
 RDF description and instantiation is in line with the goal of
  ISO TC37 SC4

        RDF Representation of MILE
   MILE was already supplied with
     an RDF schema for the MILE Syntactic Layer
     an instantiation of pre-defined syntactic objects

   We increased the repository of shared lexical objects
    with the RDF description and (partial!) instantiations
    of the objects of the semantic and linking layers
   This has been carried out with the intent to
     submit the result to ISO TC37/SC4
     foster the adoption of MILE by offering a library of
      ready-to-use RDF objects
An RDF Schema for the synt-sem linking

 <!--
        An RDF Schema for ISLE lexical entries
        v 0.1 2004/05/05
        Author: Monachini
 -->
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:owl ="http://www.w3.org/2002/07/owl#"
      xmlns:mlc ="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#"
      xmlns:mlc ="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#">
    <!-- ISLE/MILE lexical objects (classes for the synt-sem linking) -->

 <rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU">
 <rdfs:label>CorrespSynUSemU</rdfs:label>
 <rdfs:comment>This class links a SynU to a SemU</rdfs:comment>
 </rdfs:Class>

 <rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp">
 <rdfs:label>PredicativeCorresp</rdfs:label>
 <rdfs:comment>This class contains the associations between the syntactic slots and semantic
 arguments</rdfs:comment>
 </rdfs:Class>

 <rdfs:Class rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#SlotArgCorresp">
 <rdfs:label>SlotArgCorresp</rdfs:label>
 <rdfs:comment>This class links a syntactic slot to a semantic argument</rdfs:comment>
 </rdfs:Class>
An RDF Schema for the synt-sem linking


<!-- Properties (relations) between objects and between objects and atomic values -->

<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasSourceSynU">
<rdfs:label>hasSourceSynU</rdfs:label>
<rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
<rdfs:range rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#SynU"/>
</rdf:Property>
<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasTargetSemU">
<rdfs:label>hasTargetSemU</rdfs:label>
<rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
<rdfs:range rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#SemU"/>
</rdf:Property>

<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#hasPredicativeCorresp">
<rdfs:label>hasPredicativeCorresp</rdfs:label>
<rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#CorrespSynUSemU"/>
<rdfs:range rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp"/>
</rdf:Property>

<rdf:Property rdf:about="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#includesSlotArgCorresp">
<rdfs:label>includesSlotArgCorresp</rdfs:label>
<rdfs:domain rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#PredicativeCorresp"/>
<rdfs:range rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#SlotArgCorresp"/>
</rdf:Property>
The library of Pre-instantiated objects
     Enable modular specification of lexical entities
       eliminate redundancy
       identify lexical entries or sub-entries with shared
        properties
       create ready-to-use packages that can be combined
        in different ways
     Can be used “off the shelf” or as a departure
      point for the definition of new or modified
      categories

         MDCR for some objects
<!-- Sample LDCR entry for a PredicativeCorresp and SlotArgCorresp objects
    DataCats for ISLE lexical entries
    v 0.1 2004/05/17
    Author: Monachini -->

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      ……
<!-- Pre-instantiated PredicativeCorresp -->
<PredicativeCorresp rdf:ID="isobivalent">
  <includesSlotArgCorresp
  rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/SlotArgCorresp#Arg0Slot0"/>
  <includesSlotArgCorresp
  rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/SlotArgCorresp#Arg1Slot1"/>
</PredicativeCorresp>

<!-- Pre-instantiated SlotArgCorresp -->
<SlotArgCorresp rdf:ID="Arg0Slot0"
      SlotNumber="0"
      ArgNumber="0">
</SlotArgCorresp>

<SlotArgCorresp rdf:ID="Arg1Slot1"
      SlotNumber="1"
      ArgNumber="1">
</SlotArgCorresp>

</rdf:RDF>
         A Sample Entry in MILE
    The entry is shown in two alternative forms:
    1.   with the full specification of a lexical object PredicativeCorresp
    2.   with a reference to an already instantiated object PredicativeCorresp
    The advantage of the second form is that
        the object does not need to be specified in the entry
        and can be used and reused in other entries
    Aim: explore the potential of MILE for the representation of
     lexical data




         Sample full entry for amareV
<!-- The SynU SemU link -->
    <correspondsTo>
      <CorrespSynUSemU>
          <hasSourceSynU mlcp:ID="SYNUamareV">
          </hasSourceSynU>
          <hasTargetSemU mlcp:ID="SEMUamareEXPEVE">
          </hasTargetSemU>
          <!-- The "full" object PredicativeCorresp -->
          <hasPredicativeCorresp>
                   <PredicativeCorresp mlcp:ID="amare-PredCorresp">
                        <includesSlotArgCorresp>
                             <SlotArgCorresp SlotNumber="0" ArgNumber="0">
                             </SlotArgCorresp>
                             <SlotArgCorresp SlotNumber="1" ArgNumber="1">
                             </SlotArgCorresp>
                        </includesSlotArgCorresp>
                   </PredicativeCorresp>
          </hasPredicativeCorresp>
      </CorrespSynUSemU>
    </correspondsTo>
 </SynU>
</hasSynu>
              … the abbreviated entry

<!-- The SynU SemU link -->
      <correspondsTo>
         <CorrespSynUSemU>
                 <hasSourceSynU mlcp:ID="SYNUamareV">
                 </hasSourceSynU>
                 <hasTargetSemU mlcp:ID="SEMUamareEXPEVE">
                 </hasTargetSemU>
                 <!-- Instantiated object PredicativeCorresp -->
                 <hasPredicativeCorresp
                 rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/PredicativeCorresp#isobivalent"/>
          </CorrespSynUSemU>
      </correspondsTo>
     </SynU>
  </hasSynu>
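
How an abbreviated entry relates to a full one can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the `library` wrapper element, the in-memory lookup by fragment identifier and the function name `expand` are hypothetical, and the entry is cut down to the one link that matters.

```python
import copy
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical local copy of the library of pre-instantiated objects.
LIBRARY_XML = f"""
<library xmlns:rdf="{RDF_NS}">
  <PredicativeCorresp rdf:ID="isobivalent">
    <includesSlotArgCorresp>
      <SlotArgCorresp SlotNumber="0" ArgNumber="0"/>
      <SlotArgCorresp SlotNumber="1" ArgNumber="1"/>
    </includesSlotArgCorresp>
  </PredicativeCorresp>
</library>
"""

# Abbreviated entry, reduced to the reference to the pre-instantiated object.
ABBREVIATED_XML = f"""
<CorrespSynUSemU xmlns:rdf="{RDF_NS}">
  <hasPredicativeCorresp rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/PredicativeCorresp#isobivalent"/>
</CorrespSynUSemU>
"""

def expand(entry_xml, library_xml):
    """Replace each rdf:resource reference with a copy of the referenced object."""
    library = {
        obj.get(f"{{{RDF_NS}}}ID"): obj
        for obj in ET.fromstring(library_xml)
    }
    entry = ET.fromstring(entry_xml)
    for link in list(entry.iter("hasPredicativeCorresp")):
        ref = link.attrib.pop(f"{{{RDF_NS}}}resource", None)
        if ref is None:
            continue  # already a full specification
        fragment = ref.rsplit("#", 1)[-1]
        link.append(copy.deepcopy(library[fragment]))
    return entry

full = expand(ABBREVIATED_XML, LIBRARY_XML)
pairs = [(s.get("SlotNumber"), s.get("ArgNumber")) for s in full.iter("SlotArgCorresp")]
```

After expansion, `pairs` holds the same slot-argument correspondences that the full entry spells out inline.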




The RDF Schema, the DCR for
MILE objects and the entries
are available at
www.ilc.cnr.it/clips/rdf/



             and INTERA? …
   INTERA Multilingual Terminological Lexica
    will follow and merge two frameworks:

     MILE and
     ISO TMF (Terminological Markup Framework)




        Beyond MILE: future work
   MILE Lexical Model oriented towards an
    Open Distributed Lexical Infrastructure:

       Lexical Information Servers for multiple access to lexical
        information repositories
   Enhance
       user-adaptivity
       resource sharing
       cooperative creation
   Develop integration and interchange tools

                        Broadening MILE:
                                  ... other languages
   Ongoing enlargement to Asian languages (Chinese, Japanese, Korean, Thai,
    Hindi ...)
      promote common initiatives between Asia & Europe (e.g. within the EU 6th FP)


   The creation of an Open Distributed Lexical Infrastructure, also
    supported by Asian Institutions:
        AFNLP
        University of Tokyo (Dept. of Computer Science)
        Korean KAIST and KORTERM
        Academia Sinica (Taiwan)
        …


       To valorise results & increase visibility of LR & standardisation
                      initiatives in a world-wide context,
    while concretely promoting the launching of a new common platform
                 for multilingual LR creation & management
                       Using semantically tagged corpora to …
                   acquire semantic info and enhance Lexicons


   evaluate the disambiguating power of the semantic types of the lexicon
   assess the need to integrate lexicons with attested senses and/or phraseology
   identify the inadequacy of sense distinctions in lexicons
   check actual frequency of known senses in different text types
   have a more precise and complete view on the semantics of a lemma
              identify the most general senses
              capture the most specific shifts of meaning
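
Two of the uses listed above (checking sense frequency per text type, and spotting attested senses the lexicon lacks) can be sketched in Python. The token format, the sense labels and the function names are hypothetical; real corpora would carry richer annotation.

```python
from collections import Counter, defaultdict

# Hypothetical sense-tagged tokens: (lemma, sense label, text type).
tagged = [
    ("bank", "institution", "news"),
    ("bank", "institution", "news"),
    ("bank", "river_side", "fiction"),
    ("bank", "institution", "fiction"),
    ("bank", "unlisted", "fiction"),
]

def sense_profile(tokens, lemma):
    """Count how often each sense of a lemma occurs in each text type."""
    profile = defaultdict(Counter)
    for tok_lemma, sense, text_type in tokens:
        if tok_lemma == lemma:
            profile[text_type][sense] += 1
    return profile

def missing_senses(tokens, lemma, lexicon_senses):
    """Return senses attested in the corpus but absent from the lexicon."""
    attested = {sense for tok_lemma, sense, _ in tokens if tok_lemma == lemma}
    return attested - set(lexicon_senses)

profile = sense_profile(tagged, "bank")
gaps = missing_senses(tagged, "bank", {"institution", "river_side"})
```

Here `profile` gives the per-text-type distribution and `gaps` lists the senses that would need to be added to, or clustered under, existing lexicon senses.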


           Capture just the core, basic distinctions in a core lexicon
Corpus analysis must not lead to excessive granularity of sense
  distinctions, but should distinguish between
        sense discrimination, to be kept “under control” by clustering (manually or
         automatically)
        additional, more granular information (often of a collocational nature) which
         can/must be acquired/encoded within the broader senses, e.g. to help
         translation
                                           … Dynamic lexicon
   Current computational lexicons (even WordNets) are static objects, still
    shaped on traditional dictionaries
        suffering from the limitations induced by paper support

Thinking about the complex relationships between lexicon and corpus
   towards a flexible model of dynamic lexicon
        extending the expressiveness of a core static lexicon adapting to the
         requirements of language in use as attested in corpora
        with semantic clustering techniques, etc.

Convert the extreme flexibility & multidimensionality of meaning into
                   large-scale and exploitable (VIRTUAL?) resources


                    a Lexicon and Corpus together
                                      What to annotate?

Mix of:
 Word-sense annotation (implicit semantic markup)
 Semantic/conceptual markup
 …


Syntagmatic relations
 Dependency relations
 Semantic roles
 …


         Need for a common Encoding Policy ?
Agree on common policy issues?
        is it feasible?
        desirable?
        to what extent?

This would imply, among other things:
   analysis of needs – also applicative/industrial - before any large development initiative
   base semantic tagging on commonly accepted standards/guidelines ??
        up to which level?

                                 Common semantic tagset: Gold Standard??
   build a core set of semantically tagged corpora, encoded in a harmonised way, for a
    number of languages??
   make annotated corpora available to the community at large
   involve the community, collect and analyse existing semantically tagged corpora
   devise common set of parameters for analysis
A few Issues for discussion:
       MILE & lexicon standards
       More standardisation initiatives?

 MILE - a general schema for encoding multilingual lexical info, as a
   meta-entry, as a common representational layer
    Short- & medium-term requirements with respect to standards for
     multilingual lexicons and content encoding, including industrial
     requirements
    Relation with Spoken language community (see ELRA)
    Semantic Web standards & the needs of content processing
     technologies: importance of reaching consensus on (linguistic &
     non-linguistic) “content”, in addition to agreement on formats &
     encoding issues (…words convey content & knowledge)
    Define further steps necessary to converge on common priorities
               Broadening MILE: ... other communities
NLP, lexicons, terminologies, ontologies, Semantic Web: a continuum?
Knowledge management is critical.
For “content” interoperability, need to converge around agreed
  standards also for the semantic/conceptual level
       is the field ‘mature’ enough to converge around agreed standards also for
        the semantic/conceptual level (e.g. to automatically establish links among
        different languages)?
       Is the field of multilingual lexical resources ready to tackle the challenges
        set by the Semantic Web development?
Foster better integration with
       corpus-driven data
       terminology/ontology/semantic web communities
       multimodal & multimedial aspects

           Oriented towards open, distributed lexical resources:
           Lexical Information Servers for multiple access to lexical
                          information repositories
A few Issues for discussion:
       NLP, lexicons, content, ontologies,
            Semantic Web: … a continuum?



      Need for robust systems, able to acquire/tune
       multilingual lexical/linguistic/conceptual knowledge, to
       auto-enrich static basic resources
      Relation between lexical standards & acquisition & text
       annotation protocols




Target…..
        Multilingual Knowledge Management
                                             Technical Feasibility:

  Prerequisite: is a commonly agreed text/lexicon annotation
    protocol, also for the semantic/conceptual level, an
    achievable goal (to be able to automatically
    establish links among different languages)?

        Yes, at the lexical level (EAGLES/ISLE)

        More complex, for corpus annotation?

                      To make the Semantic Web
                                     a reality ...

…need to tackle the twofold challenge of
   content availability &
   multilinguality


Natural convergence with HLT:
     •multilingual semantic processing
     •ontologies
     •semantic-syntactic computational lexicons
                … enables a new role for Multilingual Lexicons:
                          to become an essential component of the
                                                               Semantic Web

   Language, and lexicons, are the gateway to knowledge
   Semantic Web developers need repositories of words & terms, &
    knowledge of their relations in language use & ontological
    classification
   The cost of adding this structured and machine-understandable
    lexical information can be one of the factors that delay the full
    deployment of the Semantic Web
   The effort of making available millions of ‘words’ for dozens of
    languages is something that no small group is able to afford

                      A radical shift in the lexical paradigm
      (whereby many participants add linguistic content descriptions in an open
                           distributed lexical framework)
                         is required to make the Web usable
                  Beyond MILE: next steps...
                         …. towards an
       Open Distributed Lexical Infrastructure
                         Language
                         Knowledge

•Enhance user-adaptivity, resource sharing, cooperative creation & management
•Lexical Information Servers for multiple access to lexical information repositories



    Create a first repository of shared lexical entries “extracted” from
     different lexical resources & mapped to MILE (choosing e.g. lexical entries in
     areas related to the Olympic Games)
       to test mapping different lexicon models to MILE
       provide a grid with all the ISLE Basic Notions, short descriptions, attributes
         and sub-elements, to be filled in with the corresponding "notions"
    Create a list (Open Lexicon Interest Group)
    ...
                                A new paradigm for
                          a “new generation” of LR?
       Focus on cooperation,
                      also between different communities


         New Strategic Vision
towards a Distributed Open Lexical Infrastructure
  • for distributed & cooperative creation, management, etc. of
  Lexical Resources
  • MILE as a common platform

     • technical & organisational requirements
                                    Beyond MILE:
                          towards open & distributed lexicons


   [Diagram: components of an open, distributed lexicon, each identified by a URI]
         Ontology (URI = http://www.zzz…)
         Semantic Lexicon (URI = http://www.xxx…)
         Syntactic Constructions (URI = http://www.yyy…)
         Lex_object: semFeature (URI = http://www.xxx…#HUMAN)
         Lex_object: syntagmaNT (URI = http://www.zzz…#NP)
         corpora
         Monolingual/Multilingual Lexicon
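
Referencing lexical objects by URI across distributed resources can be sketched as follows in Python. Since the slides elide the real addresses, the example.org URIs, the in-memory `REPOSITORIES` table and the function name `resolve` are all hypothetical stand-ins for remote Lexical Information Servers.

```python
from urllib.parse import urldefrag

# Hypothetical stand-ins for remote repositories, keyed by base URI.
REPOSITORIES = {
    "http://example.org/semantic-lexicon": {"HUMAN": {"type": "semFeature"}},
    "http://example.org/syntactic-constructions": {"NP": {"type": "syntagmaNT"}},
}

def resolve(uri):
    """Split a URI into repository and fragment, then look the object up."""
    base, fragment = urldefrag(uri)
    return REPOSITORIES[base][fragment]

# A lexicon entry only stores references; the objects live elsewhere.
entry_refs = [
    "http://example.org/semantic-lexicon#HUMAN",
    "http://example.org/syntactic-constructions#NP",
]
resolved = [resolve(ref) for ref in entry_refs]
```

The design point is that the entry stays small and the shared objects can be maintained by whoever hosts the corresponding repository.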
    A few issues for the future...

   Integration between WLR/SLR/MMR (see e.g. LREC)
   Integration between LRs & SemWeb

   Integration of Lexicons/Terminologies/Ontologies: towards
    Knowledge Resources
   Multilingual Resources: an open infrastructure
   Integration of Lexicon/Corpus (see e.g. FrameNet)

   Parallel evolution of LRs & Language Technology


from Computational Lexicons to
        Knowledge Resources

Unified framework for lexicons, ontologies,
terminologies, etc.

Towards an open, distributed infrastructure for lexical
resources
   Lexical Information Servers
   flexible and extensible
   integrated with multimodal and multimedial data
   integrated with Web technology
   related initiatives: INTERA, ICWLRE
for Language Resources &
                   Semantic Web….

….. pushing to launch an
      Open & Distributed Lexical Infrastructure

           for   content description and content interoperability,

            to   make lexical resources usable within the emerging
                                           Semantic Web scenario


        …with a world-wide participation
        looking for an appropriate call
     How to move towards a framework allowing
       incremental creation/merging/…

How to:
   "organise" creation/acquisition of multilingual LRs: evaluate
    different models
   cope with/affect maintenance

   organise technology transfer among languages
   support BLARK (a commonly agreed list of minimal
    requirements for “national” LRs)
   launch an international initiative linking Semantic Web & LRs
   bootstrap this by "opening" a few LRs (role of standards)
                 Lexical WEB &
             Content Interoperability
   As a critical step for semantic mark-up in the SemWeb

    [Diagram: MILE shown together with the resources NomLex, ComLex, SIMPLE,
     FrameNet, WordNets, Lex_x and Lex_y ... with intelligent agents??]

    A new paradigm for
a “new generation” of LRs?
      [Diagram: Ontology (http://www.zzz…), Semantic Lexicon (http://www.xxx…),
       Syntactic Lexicon (http://www.yyy…) and corpora]


				