									                                     Introduction
                            Initial classification
 Experiments A and B: Testing the classification
          Experiment C: Integrating polysemy
                                      Conclusion




Automatic acquisition of semantic classes for
                 adjectives

                              Gemma Boleda Torrent

                                    GLiCom
              Universitat Pompeu Fabra / Fundació Barcelona Media


                                       April 18, 2007





Overview



     automatic acquisition of semantic classes for Catalan
     adjectives
     two main hypotheses:
            adjective meanings can be assigned to a set of classes
            semantic distinctions mirrored at different linguistic levels
     Lexical Acquisition
            infer properties of words from their linguistic behaviour in
            corpora






Approach (I)



     no general, well established semantic classification
     → propose and test classification
     iterative methodology
             deductive phase: define a classification and apply it to a set
             of adjectives
             → manual annotation and machine learning experiments
             inductive phase: use the evidence gathered to refine the
             classification proposal






Approach (II)



      three iterations

    Experiment   Technique      Main goal
    A            Unsupervised   refine classification
    B            Unsupervised   validate refined classification
    C            Supervised     integrate polysemy






Contents


  1   Introduction

  2   Initial classification

  3   Experiments A and B: Testing the classification

  4   Experiment C: Integrating polysemy

  5   Conclusion





Initial classification

      insights from descriptive grammar and formal semantics
  Qualitative adjectives denote attributes or properties of objects.
               Examples: ample ‘wide’, autònom ‘autonomous’
  Intensional adjectives denote second order properties.
               Examples: presumpte ‘alleged’, antic ‘former’
  Relational adjectives denote a relationship to an object.
               Examples: pulmonar ‘pulmonary’, botànic ‘botanical’

      semantic classification supported by distinctions at other
      levels of description


Criteria (I): position with respect to the head noun


   qualitative (1)                              intensional (2)   relational (3)
   pre- and post-nominal                        pre-nominal       post-nominal
    1   les avingudes amples / les amples avingudes
        ‘wide avenues’
    2   #l’assassí presumpte / el presumpte assassí
        ‘the alleged murderer’
    3   una malaltia pulmonar / #una pulmonar malaltia
        ‘a pulmonary disease’






Criteria (II): predicativity


    qualitative (1)             intensional (2)            relational (3)
    predicative                 non-predicative            marginally predicative
    1   les avingudes són amples
        ‘avenues are wide’
    2   #l’assassí és presumpte
        ‘the murderer is alleged’
    3   ?la malaltia és pulmonar
        ‘the disease is pulmonary’






Polysemy


   1   edifici antic (qualitative) / antic president (intensional)
       ‘ancient building / former president’

   2   reunió familiar (relational) / cara familiar (qualitative)
       ‘family meeting / familiar face’

       in each sense, the adjective’s behaviour corresponds to
       that of the relevant class






Motivation for unsupervised experiments


     classification based primarily on literature review
     does it account for the semantics of a broad range of
     adjectives?
     empirical test: use information extracted from corpus in
     machine learning experiments
     exploratory experiments → clustering (unsupervised)
             no bias by previous annotation
             insight into the actual structure of the data
     two sets of experiments (Exp. A, Exp. B)





Experiment A: Material and method (I) – resources



     resources also used in Experiments B and C
     CTILC corpus (Institut d’Estudis Catalans):
             14.5 million words, written, formal texts
             manually lemmatised and POS-tagged
             automatically shallow-parsed (noise)
     adjective database [Sanromà, 2003]:
             almost 2,300 lemmata from CTILC corpus
             morphological information manually coded






Experiment A: Material and method (II)


     Gold Standard: 101 lemmata, random choice
             classes: qualitative, relational, intensional, int-qual, qual-rel
     technique: clustering, k-means (CLUTO)
     features: semantic, distributional
        semantic (6 features): pre-nominal position, predicativity,
                   ...
        distributional (36 features): POS unigrams (two words left,
                   two words right of target)
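
The distributional features can be illustrated with a minimal sketch. It assumes that each feature is the proportion of an adjective's occurrences that have a given POS tag at a given position in the two-word window on either side. The function name, the tag labels and the toy corpus are invented for illustration; the actual encoding and tagset behind the 36 features are not given on the slide.

```python
from collections import Counter

def pos_window_features(tagged_sentences, target_lemma, window=2):
    """Relative frequency of each POS tag at each position around the target.

    tagged_sentences: list of sentences, each a list of (lemma, pos) pairs.
    Returns a dict mapping (offset, pos) to the proportion of target
    occurrences with tag `pos` at relative position `offset` (-2, -1, +1, +2).
    """
    counts = Counter()
    occurrences = 0
    for sentence in tagged_sentences:
        for i, (lemma, _pos) in enumerate(sentence):
            if lemma != target_lemma:
                continue
            occurrences += 1
            for offset in range(-window, window + 1):
                j = i + offset
                # Skip the target itself and positions outside the sentence.
                if offset == 0 or j < 0 or j >= len(sentence):
                    continue
                counts[(offset, sentence[j][1])] += 1
    if occurrences == 0:
        return {}
    return {key: n / occurrences for key, n in counts.items()}

# Toy corpus (invented; tag names are hypothetical, not the CTILC tagset).
corpus = [
    [("el", "DET"), ("presumpte", "ADJ"), ("assassí", "NOUN")],
    [("un", "DET"), ("presumpte", "ADJ"), ("lladre", "NOUN"), ("fugir", "VERB")],
]
features = pos_window_features(corpus, "presumpte")
```

Each adjective lemma then becomes one feature vector, which is the kind of input a clustering tool such as k-means expects.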





Experiment A: Feature example

     [Figure: feature "Predicative" (y-axis, 0 to 5) by Gold Standard class
     (x-axis): I = intensional, IQ = int-qual, Q = qualitative,
     QR = qual-rel, R = relational]



Experiment A: Summary of results


     strong support for classes qualitative and relational
     intensional class not separated with this methodology
     group of problematic adjectives identified in error analysis:
       indicador,    parlant,      protector,   salvador
       ‘indicating’, ‘speaking’, ‘protecting’, ‘saviour’
     (they do not fit into the classification)
     classification is modified according to these results
     approach to polysemy is clearly wrong → Exp. C
     semantic and distributional features yield similar results





Modified classification

  Basic adjectives (formerly qualitative)
  Object-related adjectives (formerly relational)
  Event-related adjectives denote a relationship to an event.
               Examples: protector ‘protecting’, variable ‘variable’
               Syntactic/distributional properties?

      relationship with morphology
       class:        basic         event      object
       morphology:   non-derived   deverbal   denominal
      supported by Ontological Semantics
      [Raskin and Nirenburg, 1998]



Experiment B

     characteristics
             acquire predominant class of each adjective
                      ignore polysemy
             focus on distributional features (empirical approach)
     results
             one-to-one mapping between clusters and manually
             assigned classes. Accuracy:

         Feature set                    Accuracy
         baseline                       49%
         distributional (clustering)    73%
         morphological                  65%

     distributional information superior to morphological
     information
     difficulties with the event class arise
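
Scoring a clustering through a one-to-one mapping can be sketched as follows. The sketch assumes that the mapping is the one that maximises agreement with the manual classes; the slide does not say how the mapping was chosen. The function name and the toy data are invented for illustration.

```python
from itertools import permutations

def one_to_one_accuracy(clusters, gold):
    """Best accuracy over all one-to-one mappings from cluster ids to classes.

    clusters: cluster id of each adjective.
    gold: manually assigned class of each adjective (same order).
    """
    cluster_ids = sorted(set(clusters))
    classes = sorted(set(gold))
    if len(cluster_ids) > len(classes):
        raise ValueError("more clusters than classes: no one-to-one mapping")
    best = 0.0
    # Try every assignment of distinct classes to clusters (fine for 3 classes).
    for assigned in permutations(classes, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, assigned))
        correct = sum(mapping[c] == g for c, g in zip(clusters, gold))
        best = max(best, correct / len(gold))
    return best

# Toy example (invented data): 8 adjectives, 3 clusters, 3 classes.
clusters = [0, 0, 1, 1, 2, 2, 2, 0]
gold = ["basic", "basic", "event", "object",
        "object", "object", "object", "event"]
accuracy = one_to_one_accuracy(clusters, gold)  # 6 of 8 correct
```

Exhaustive search over permutations is only practical for a handful of classes, which is the situation here.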



Polysemy, revisited


   1   explicació embolicada (basic) / regal embolicat (event)
       ‘unclear explanation / wrapped present’
   2   reunió familiar (object) / cara familiar (basic)
       ‘family meeting / familiar face’
   3   tasca docent (event) / planificació/equip docent (object)
       ‘teaching task / teaching planning/team’

       each adjective can be assigned to more than one class
       → polysemy acquisition as multi-label classification





Goals of Experiment C




   1   include polysemy in the acquisition experiment
   2   assess the role of different levels of linguistic description
       for semantic classification
   3   test ways to combine linguistic information






Material


     210 lemmata
     stratified sampling approach
             frequency, morphology (derivational type, suffix)
     large-scale manual annotation experiment
             administered via Web
             322 subjects
             does not yield a reliable classification (agreement: K 0.31-0.45)
     Gold Standard classification: committee of 3 experts
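
For readers unfamiliar with chance-corrected agreement, here is a minimal sketch. It assumes that K is a kappa-style coefficient and shows Cohen's kappa for two annotators only; the slide does not say which variant was computed over the 322 subjects. The labels are invented toy data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: both annotators pick the same class independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Toy annotations (invented for illustration).
annotator_1 = ["basic", "basic", "event", "object", "object", "basic"]
annotator_2 = ["basic", "event", "event", "object", "basic", "basic"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Values in the 0.31-0.45 range indicate agreement well short of perfect (kappa = 1), which is consistent with replacing the Web annotation by an expert committee.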






Method

    algorithm: Decision Trees (C4.5 as implemented in Weka)
    features:

   Level    Explanation                                   # features
   morph    morphological (derivational) properties                2
   func     syntactic function                                     4
   uni      unigram distribution                                  24
   bi       bigram distribution                                   50
   sem      distributional cues of semantic properties            18
   all      combination of the 5 linguistic levels              10.3
                      Table: Linguistic levels as feature sets.




Procedure

     binary decision
            basic/non-basic, event/non-event, object/non-object
     for each adjective, merge classifications
                  basic   event   object   merged
       familiar:  yes     no      yes      basic-object (BO)
     obtain 100 accuracy estimates for each class and level
            10 runs of 10-fold cross-validation
     test differences between levels with a statistical test
            corrected resampled t-test [Nadeau and Bengio, 2003]
     baseline: most frequent class (basic)
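
The merging step and the significance test can be sketched as follows. The label format and the handling of an adjective with no positive decision are assumptions, as are the function names. The t statistic follows the corrected resampled t-test of Nadeau and Bengio, which inflates the variance term by the test/train size ratio.

```python
import math
from statistics import mean, variance

def merge_decisions(basic, event, obj):
    """Combine three yes/no decisions into one (possibly polysemous) label."""
    decisions = (("basic", basic), ("event", event), ("object", obj))
    parts = [name for name, yes in decisions if yes]
    # "none" for an adjective rejected by all three classifiers is an assumption.
    return "-".join(parts) if parts else "none"

def corrected_resampled_t(diffs, test_train_ratio):
    """t statistic of the corrected resampled t-test.

    diffs: accuracy differences between two feature sets, one per fold
           (100 values for 10 runs of 10-fold cross-validation).
    test_train_ratio: n_test / n_train, i.e. 1/9 for 10-fold cross-validation.
    Compare the result against a t distribution with len(diffs) - 1 degrees
    of freedom. Undefined when all differences are identical (zero variance).
    """
    k = len(diffs)
    return mean(diffs) / math.sqrt((1 / k + test_train_ratio) * variance(diffs))

label = merge_decisions(basic=True, event=False, obj=True)  # familiar
t_stat = corrected_resampled_t([0.1, 0.3], test_train_ratio=1 / 9)  # toy values
```

The correction matters because accuracy estimates from repeated cross-validation share training data and are therefore not independent; the uncorrected t-test would overstate significance.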




Results

                  accuracy results for merged classification

     [Figure: accuracy (y-axis) per run (x-axis), in two panels.
      A: Full accuracy; B: Partial accuracy.
      Legend: bl, morph, func, uni, bi, sem, all.]






Results


     baseline (most frequent class – basic): 51%
     level morph (60.6%) is the best single level of information
             is morphology the most useful level for our task?
             BUT: morphology was included in the sampling scheme...
     level all (combination of information) improves upon
     morph: 62.3%
     error analysis shows that all and morph make very different
     mistakes
     → is there a better combination than all?





A better combination




     ensemble classifier
     each level proposes one class, the majority class is chosen
     intuition: expert committee
             morphologist, syntactician, engineer...
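
A minimal sketch of the voting step, assuming each level's classifier has already proposed one class for a given adjective. The function name and the tie-breaking rule (the tied class that was proposed first wins) are assumptions; the slides do not say how ties are resolved.

```python
from collections import Counter


def majority_vote(proposals):
    """Return the class proposed by most levels.

    proposals: one class label per level, e.g. in the order
    morph, func, uni, bi, sem.
    """
    if not proposals:
        raise ValueError("at least one proposal is required")
    counts = Counter(proposals)
    top = max(counts.values())
    # Assumption: on a tie, keep the tied class that was proposed first.
    for label in proposals:
        if counts[label] == top:
            return label


votes = ["basic", "object", "basic", "event", "basic"]
print(majority_vote(votes))  # basic
```

With an even number of voters, ties are possible, so the order in which the levels are listed matters under this tie-breaking rule.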






Results

      Levels                         Classifs.   Accuracy (%)
      morph+func+uni+bi+sem+all      6           84.0±0.06
      morph+func+uni+bi+sem          5           82.3±0.04
      func+uni+bi+sem                4           81.5±0.04
      morph+func+sem+all             4           72.4±0.03
      morph+func+sem                 3           76.2±0.03
      bl                             -           51.0±0.0
      all                            -           62.3±2.3
                        Table: Results for ensemble classifier.

     average improvement over level all: 15.8


     combining levels matters more than the type of linguistic
     information used




To sum up


     use of computational techniques for linguistic research
            [Merlo and Stevenson, 2001, Schulte im Walde, 2006]
            one step further: re-shaping the target classification
            based on experimental results
     combine insights from linguistic theory with evidence
     gathered from machine learning and human annotation
     experiments
     revise hypotheses about adjective classification according to
     the analysis of results





Adjective classification


      broad and consistent classification proposal
      obtained through theoretical and empirical exploration
      characterisation of the classes
             morphological, syntactic, semantic properties
      exploration of polysemy
      some difficulties in the classification
             event-related adjectives
             non-prototypical basic adjectives
             nationality-denoting adjectives





Human annotation experiments



     3 manual annotation experiments
     large-scale Web experiments can be fruitful for gathering
     linguistic data
             multiple analysis possibilities
     but they are very difficult to design (non-expert subjects)
     bottleneck for our task: obtaining reliable linguistic data






Automatic acquisition of semantic classes for
adjectives

     3 sets of machine learning experiments: existing
     techniques, new uses
             use of unsupervised techniques to provide feedback on
             the classification proposal
             multi-label classification architecture → polysemy
             systematic comparison of different linguistic levels of
             description
             combination of different types of linguistic evidence
     semantic classification of Catalan adjectives using
     morphological and distributional information → feasible
             no need for intensive resources
             methodology can be extended to other languages



Future research

     adjective classification: theoretical implications of results
             definition of a semantic classification for adjectives
             characterisation of each class
             polysemy within our task (e.g., polysemy judgements)
     manual annotation experiments:
             design of adequate experiments to build reliable datasets
     machine learning experiments:
             type of information (selectional restrictions)
             datasets (other corpora)
             machine learning techniques (MBL, kernel methods, other
             types of ensemble classifiers)
             external evaluation (POS-tagging, Paraphrase Detection)


References

Merlo, P. and Stevenson, S. (2001).
Automatic verb classification based on statistical distributions of argument structure.
Computational Linguistics, 27(3):373–408.

Nadeau, C. and Bengio, Y. (2003).
Inference for the generalization error.
Machine Learning, 52(3):239–281.

Raskin, V. and Nirenburg, S. (1998).
An applied ontological semantic microtheory of adjective meaning for natural language processing.
Machine Translation, 13(2-3):135–227.

Sanromà, R. (2003).
Aspectes morfològics i sintàctics dels adjectius en català.
Master’s thesis, Universitat Pompeu Fabra.

Schulte im Walde, S. (2006).
Experiments on the automatic induction of German semantic verb classes.
Computational Linguistics, 32(2):159–194.



