The Impact of Grammar Enhancement
on Semantic Resources Induction
Luca Dini (dini@celi.it)

Giampaolo Mazzini (mazzini@celi.it)
Objectives


> Bridging from dependency parsing to knowledge representation;
 > Need for an intermediate level
 > Semantic Role Labelling
     – Easily configurable;
     – Rule based;
     – Moderately learning based (MLN)

> Production of a reasonably large repository of lexical units with
assigned frames and mappings to syntax.
> Objective of this presentation: to measure the impact of grammar
enhancement on the derivation of semantic resources.
Plan of This Talk


> Architecture and Methodology;
> First Evaluation;
> The Effect of Grammar Improvement;
Architecture and Methodology
  Architecture

> Pipeline (English source -> Italian target):
 > Parsing of the source Annotation -> parsed Annotation;
 > Machine Translation of the annotated example -> Target Example;
 > Parsing of the Target Example -> parsed Example;
 > Target LU Identification for each <LU, FRAME> pair;
 > FE alignment -> annotated Example;
 > Dep. Extraction -> <tLU, FRAME, VALENCE>
Example

                      <dispute.n,Quarreling>


…foreign policy dispute                         …disputa di politica straniera

6     foreign 8      MOD                       17    disputa 14       ARG
7     policy 8       MOD                       18    di       17      MOD
8     dispute 3      MOD                       19    politica 18      ARG



                      <disputa.n,Quarreling, <Issue,Prep[di]>>
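The dependency-extraction step above can be sketched as a toy Python function. The dependency rows are taken from the slide; the extraction logic and the function name are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the Dep. Extraction step for the example above.
# Target-side dependencies: (index, form, head_index, relation)
deps = [
    (17, "disputa",  14, "ARG"),
    (18, "di",       17, "MOD"),
    (19, "politica", 18, "ARG"),
]

def valence_pattern(deps, lu_form, fe_form):
    """How does the FE head attach to the LU: directly, or via a preposition?"""
    by_index = {i: (form, head, rel) for i, form, head, rel in deps}
    lu_idx = next(i for i, f, h, r in deps if f == lu_form)
    fe_idx = next(i for i, f, h, r in deps if f == fe_form)
    form, head, rel = by_index[fe_idx]
    if head == lu_idx:                 # direct dependent of the LU
        return rel
    mediator = by_index[head]          # intermediate node, e.g. a preposition
    if mediator[1] == lu_idx:
        return "Prep[%s]" % mediator[0]
    return None

print(valence_pattern(deps, "disputa", "politica"))  # -> Prep[di]
```

Running this on the "disputa di politica" chain recovers the Prep[di] realization recorded in the induced valence triple.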
Ingredients


> Bilingual MT system (Systran)
> Comparable parsers for Italian and English (XIP, Xerox Incremental
Parser)
> Lexicon look-up module (350,000 it <-> en)
> Word sense disambiguation and clustering
 > Semantic vectors for source and target
Challenges


> Ambiguity of translation:
 > Write.v -> {scrivere, fare lo scrittore, scolpire, vergare, documentare,
   comporre, scrivere una lettera, cantare, trascrivere}.
> Lack of translation.
> Identification of the semantic head of the Frame Element.
> Grammatical transformations.
> Grammar Errors.
Results (1)



                              EN        IT
       Instantiated frames    721       628
       Available units        10,195    5,960
       Available examples     139,382   42,923
       Available FE instances 70,075    8,426
Results (2)


       PT           GF       Occ.
       ADJ          NMOD     6707
       NOUN[di]     NMOD     5639
       NOUN         SUBJ     4318
       NOUN         OBJ      2872
       VERB[+sg]    TOP      2289
       VERB[+inf]   TOP      1945
       DET          DETD     962
       ADV          ADJMOD   883
       NOUN[in]     VMOD     862
       NOUN[da]     VMOD     825
       NOUN[a]      VMOD     733
       NOUN[di]     VMOD     679
Results (3)


      PT         GF       FE            Occ.
      ADJ        NMOD     Descriptor    940
      ADV        ADJMOD   Degree        806
      ADJ        NMOD     Manner        533
      NOUN       SUBJ     Agent         514
      NOUN       SUBJ     Speaker       495
      ADJ        NMOD     Degree        470
      NOUN[di]   NMOD     Individuals   423
      ADJ        NMOD     Possessor     415
      NOUN[di]   NMOD     Material      369
      NOUN       SUBJ     Self_mover    318
Evaluation
Evaluation (1): SRL (1)


> Manual annotation of TUT corpus (Lesmo et al., 2002):
 > 1000 sentences;
 > Corpus annotated only with frame-bearing induced LUs;
 > Selection of the correct frame (if any);
 > FE annotation of all dependants;
 > Export in CoNLL format.

   #   form       POS    Frame                  Dep   depName
   1   Mio        ADJ    ADJ                    2     Ego
   2   fratello   NOUN   Kinship                5     Victim
   3   è          VERB   VERB                   4     AUX
   4   stato      VERB   VERB                   5     AUX
   5   ucciso     VERB   Killing                11    OBJ
   6   alle       PREP   PREP                   5     Manner
   7   alle       ART    ART                    8     DETD
   8   spalle     NOUN   Observable_bodyparts   7     ARG
Evaluation (1): SRL (2)


> Second step: “parse” the corpus for SRL:
 > No real parser;
 > Very simple assignment algorithm;
 > Random choice in case of ambiguity;
> Results, using the F-measure metric of Toutanova et al. (2008):
     – precision of 0.53, recall of 0.33, and a resulting F-measure of 0.41.

> Poor comparison with state of the art SRL.
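As a sanity check, the reported F-measure is just the harmonic mean of the reported precision and recall:

```python
# F-measure as the harmonic mean of precision and recall,
# using the figures reported on this slide.
p, r = 0.53, 0.33
f = 2 * p * r / (p + r)
print(round(f, 2))  # -> 0.41
```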
Evaluation (2)


> “Standard” corpus annotation:
 > 20 sentences X 20 lexical units (no ambiguity).
> Creation of a DB of <Lunit, frame, Valence> triples.
> Comparison with induced resources based on standard precision
and recall metrics.
 > A hit counts as positive if part of speech, grammatical function and
   frame element all match.
 > A “boost” was assigned on the basis of the importance of the valence
   population (based on both the number and variety of realizations).
 > Global precision and recall are the arithmetic means of all weights:
     – Precision: 0.65
     – Recall: 0.41
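The hit criterion can be sketched as a small predicate; the field names are hypothetical placeholders, not the authors' data model:

```python
# Sketch of the Evaluation (2) hit criterion: an induced valence counts
# as a true positive only if part of speech, grammatical function and
# frame element all match the gold triple. Field names are illustrative.
def is_hit(induced, gold):
    return (induced["pos"] == gold["pos"]
            and induced["gf"] == gold["gf"]
            and induced["fe"] == gold["fe"])

gold    = {"pos": "NOUN[di]", "gf": "NMOD", "fe": "Issue"}
induced = {"pos": "NOUN[di]", "gf": "NMOD", "fe": "Issue"}
print(is_hit(induced, gold))  # -> True; any single mismatch gives False
```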
The Effects of Grammar Improvement
Errors


> No translation for a lexical unit (7,815);
> Absence of examples in the source FrameNet (4,922);
> No translated example contains the candidate translation(s) of the
lexical unit (1,736);
> No head could be identified for the English frame element realization
(parse error or difficult structure, e.g. coordination) (6,191);
> The translation of the semantic head of the frame element, or of the
frame-bearing head, could not be matched in the Italian example
(99,808);
> The semantic heads of both the lexical unit and the frame element
are found in the Italian example, but the parser could not find any
dependency between them (94,004).
The Enhancement Phase


> Improvements concerned only one side of the parsing mechanism,
i.e. the Italian dependency grammar;
> Development:
 > Using the XIP IDE (Mokhtar et al., 2001).
 > The development period lasted about six months (Testa et al., 2009).
 > It was based on iterative verification on different corpora
   (TUT/ISST).
> Improvement in LAS: 40% -> 70%
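LAS (Labeled Attachment Score) counts a token as correct only when both its predicted head and its dependency label match the gold parse. A minimal sketch with toy data (not the TUT/ISST evaluation itself):

```python
# Labeled Attachment Score: fraction of tokens whose predicted
# (head, label) pair matches the gold parse exactly.
def las(gold, pred):
    """gold, pred: parallel lists of (head_index, label) per token."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

gold = [(2, "MOD"), (3, "SUBJ"), (0, "TOP"), (3, "OBJ")]
pred = [(2, "MOD"), (3, "OBJ"),  (0, "TOP"), (3, "OBJ")]
print(las(gold, pred))  # -> 0.75 (one token has the right head, wrong label)
```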
Consequences


> The architecture was kept exactly the same and the source code was
“frozen” during the six-month period.
> Results
                       Old P     New P      Old R     New R

            Eval 1     0.53      0.59       0.33      0.34

            Eval 2     0.65      0.71       0.41      0.51
Comments


> Both evaluation types show an increase in precision of about 6 points;
> Strangely, recall stays almost constant in Eval 1, while it increases
considerably in Eval 2.
> Explanation (?):
 > Unmapped phenomena;
 > “Random” effect due to small evaluation set.
Issues & Conclusions


> Was it worth six months' labour?
 > Probably not, if grammar enhancement is aimed only at the
   acquisition of the resources.
 > Probably yes, if it is independently motivated.
> In general, evaluating the impact of lower modules on high-level
applications is crucial for strategic choices, and a rather
“neglected” aspect.
> We need to understand the correct trade-off.
> Convergence: IFRAME project
(http://sag.art.uniroma2.it/iframe/doku.php)
Thank You!
