Combining pattern-based and machine learning methods to detect definitions for eLearning purposes

Eline Westerhout & Paola Monachesi
Overview
• Extraction of definitions within eLearning
• Types of definitory contexts
• Grammar approach
• Machine learning approach
• Conclusions
• Future work
• Discussion
Extraction of definitions within eLearning
• Definition extraction:
  – question answering
  – building dictionaries from text
  – ontology learning

• Challenges within eLearning:
  – corpus
  – size of the learning objects (LOs)
Types - I
• is_def:
Gnuplot is een programma om grafieken te maken
‘Gnuplot is a program for drawing graphs’
• verb_def:
E-learning omvat hulpmiddelen en toepassingen die via het
   internet beschikbaar zijn en creatieve mogelijkheden
   bieden om de leerervaring te verbeteren.
‘eLearning comprises resources and applications that are
   available via the internet and provide creative
   possibilities to improve the learning experience’
Types - II
• punct_def:
Passen: plastic kaarten voorzien van een magnetische strip,
  [...] toegang krijgt tot bepaalde faciliteiten.
'Passes: plastic cards equipped with a magnetic strip, that
  [...] gets access to certain facilities.'
• pron_def:
Dedicated readers. Dit zijn speciale apparaten, ontwikkeld
  met het exclusieve doel e-boeken te kunnen lezen.
'Dedicated readers. These are special devices, developed
  with the exclusive goal of making it possible to read e-books.'
Grammar approach
• General
• Example
• Results
Identification of definitory contexts
• Make use of the linguistic annotation of LOs
  (part-of-speech tags)
• Domain: computer science for non-experts
• Use of language specific grammars
• Workflow
  – Searching and marking definitory contexts in LOs
    (manually)
  – Drafting local grammars on the basis of these examples
  – Applying the grammars to new LOs (a rough illustration
    of the pattern idea follows below)
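
As a rough illustration of the underlying idea (hypothetical Python and tag conventions, not the project's grammar engine), the "NP + copula + NP" definitory pattern can be approximated with a regular expression over POS-tagged tokens:

import re

# Hypothetical sketch: each token is encoded as word/TAG and the
# is_def pattern "NP + copula + NP" is matched over the tag sequence.
TAGGED = "Een/Art vette/Adj letter/N is/V een/Art letter/N"

NP = r"(?:\S+/Art\s)?(?:\S+/Adj\s)*(?:\S+/N\s?)+"   # art? adj* noun+
IS_DEF = re.compile(rf"({NP})(\S+/V\s)({NP})")      # NP, copula, NP

m = IS_DEF.search(TAGGED + " ")
if m:
    print("marked term:", m.group(1).strip())  # Een/Art vette/Adj letter/N
    print("definition :", m.group(3).strip())  # een/Art letter/N

The real grammars, shown next, work the same way but over the full morphosyntactic annotation instead of bare tags.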
Grammar example

Een vette letter is een letter die zwarter
  wordt afgedrukt dan de andere letters.
'A bold letter is a letter that is printed darker than the other letters.'

The first rule matches the marked (defined) term, "Een vette letter":
<rule name="simple_NP">
  <seq>
    <and>                         <!-- one token that is both... -->
      <ref name="art"/>           <!-- ...an article... -->
      <ref name="cap"/>           <!-- ...and capitalised: "Een" -->
    </and>
    <ref name="adj" mult="*"/>    <!-- zero or more adjectives: "vette" -->
    <ref name="noun" mult="+"/>   <!-- one or more nouns: "letter" -->
  </seq>
</rule>

The copula is located with a query on the POS annotation (base form "zijn" tagged as auxiliary or copula, "hulpofkopp"):
<query match="tok[@ctag='V' and @base='zijn'
and @msd[starts-with(.,'hulpofkopp')]]"/>

The definition part is matched by a more general noun phrase rule, in which the article is optional:
           <rule name="noun_phrase">
            <seq>
              <ref name="art" mult="?"/>
              <ref name="adj" mult="*" />
              <ref name="noun" mult="+" />
            </seq>
           </rule>

These building blocks combine into the complete rule for is/are definitions:
<rule name="is_are_def">
 <seq>
   <ref name="simple_NP"/>
   <query match="tok[@ctag='V' and @base='zijn' and
@msd[starts-with(.,'hulpofkopp')]]"/>
   <ref name="noun_phrase" />
   <ref name="tok_or_chunk" mult="*"/>
 </seq>
</rule>

Applying the rule to the example sentence yields the annotated defining text:
<definingText>
 <markedTerm>
   <tok sp="n" msd="onbep,zijdofonzijd,neut" ctag="Art" base="een" id="t214.2">Een</tok>
   <tok sp="n" msd="attr,stell,vervneut" ctag="Adj" base="vet"
      id="t214.3">vette</tok>
   <tok sp="n" msd="soort,ev,neut" ctag="N" base="letter" id="t214.4">letter</tok>
 </markedTerm>
 <tok sp="n" msd="hulpofkopp,ott,3,ev" ctag="V" base="zijn" id="t214.5">is</tok>
 <tok sp="n" msd="onbep,zijdofonzijd,neut" ctag="Art" base="een"
 id="t214.6">een</tok>
 <tok sp="n" msd="soort,ev,neut" ctag="N" base="letter" id="t214.7">letter</tok>
 ...
 <tok sp="n" msd="onbep,neut,attr" ctag="Pron" base="andere"
 id="t214.14">andere</tok>
 <tok sp="n" msd="soort,mv,neut" ctag="N" base="letter" id="t214.15">letters</tok>
 <tok sp="n" msd="punt" ctag="Punc" base="." id="t214.16">.</tok>
</definingText>
Results (grammar)
(P = precision, R = recall, F = f-score)

              P        R        F
is_def      0.2810   0.8652   0.4242
verb_def    0.4464   0.7576   0.5618
punct_def   0.0991   0.6818   0.1731
pron_def    0.0918   0.4130   0.1502
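
F is the harmonic mean of precision and recall; a quick check of the first row above (hypothetical helper, not project code):

# F-score as the harmonic mean of precision and recall.
def f_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

print(round(f_score(0.2810, 0.8652), 4))  # 0.4242, the is_def row above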
Machine learning
• Features
• Configurations
• Results
Features
• Text properties: bag-of-words, bigrams,
  and bigram preceding the definition
• Syntactic properties: type of determiner
  within the defined term (definite,
  indefinite, no determiner)
• Proper nouns: presence of a proper noun
  in the defined term (see the sketch below)
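
A minimal sketch of these three feature groups (function name, tag conventions, and the 'eigen' proper-noun test are assumptions, not the paper's code):

# Hypothetical feature extraction for one candidate sentence.
def extract_features(tokens, tags, msds, term_span):
    """POS-tagged candidate sentence plus the (start, end) span of the defined term."""
    feats = {}
    # text properties: bag-of-words and bigrams
    for w in tokens:
        feats["bow=" + w.lower()] = 1
    for a, b in zip(tokens, tokens[1:]):
        feats["bigram=" + a.lower() + "_" + b.lower()] = 1
    # (the bigram immediately preceding the candidate would be added similarly)
    # syntactic property: type of determiner in the defined term
    s, e = term_span
    arts = {w.lower() for w, t in zip(tokens[s:e], tags[s:e]) if t == "Art"}
    feats["determiner"] = ("definite" if arts & {"de", "het"}
                           else "indefinite" if "een" in arts else "none")
    # proper noun in the defined term ('eigen' in the msd is assumed to
    # mark proper nouns, by analogy with 'soort' for common nouns)
    feats["proper_noun"] = any(t == "N" and "eigen" in m
                               for t, m in zip(tags[s:e], msds[s:e]))
    return feats

# e.g. on the running example "Een vette letter is een letter ...":
print(extract_features(
    ["Een", "vette", "letter", "is", "een", "letter"],
    ["Art", "Adj", "N", "V", "Art", "N"],
    ["onbep", "attr", "soort", "hulpofkopp", "onbep", "soort"],
    (0, 3),
)["determiner"])  # -> 'indefinite'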
Configurations

Setting   Attributes
1         using bag-of-words
2         using bigrams
3         combining bag-of-words and bigrams
4         adding bigram preceding definition to setting 3
5         adding definiteness of article in marked term to setting 3
6         adding presence of proper noun to setting 3
7         adding bigram preceding definition & definiteness of article in marked term to setting 3
8         adding bigram preceding definition & presence of proper noun to setting 3
9         adding definiteness of article in marked term & presence of proper noun to setting 3
10        using all attributes
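
Read as feature-set combinations, the ten settings are (an encoding of the table above, not code from the paper):

# Settings 1-10 as combinations of the attribute groups.
BASE = {"bow", "bigrams"}                 # setting 3
PRE, DET, PN = "pre_bigram", "det_type", "proper_noun"
SETTINGS = {
    1: {"bow"},
    2: {"bigrams"},
    3: BASE,
    4: BASE | {PRE},
    5: BASE | {DET},
    6: BASE | {PN},
    7: BASE | {PRE, DET},
    8: BASE | {PRE, PN},
    9: BASE | {DET, PN},
    10: BASE | {PRE, DET, PN},
}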
Results – is_def (ML)
      P       R        F
 1   0.6944   0.6494   0.6711
 2   0.6625   0.6883   0.6752
 3   0.7662   0.7662   0.7662
 4   0.7662   0.7662   0.7662
 5   0.7763   0.7662   0.7712
 6   0.7662   0.7662   0.7662
 7   0.7867   0.7662   0.7763
 8   0.7632   0.7532   0.7582
 9   0.7895   0.7792   0.7843
10   0.8000   0.7792   0.7895
Results – is_def (final)
(final = grammar and ML applied in sequence, scored against all annotated definitions, so recall also counts definitions the grammar missed)

     P        R         F
1    0.6944   0.5618   0.6211
2    0.6625   0.5955   0.6272
3    0.7662   0.6629   0.7108
4    0.7662   0.6629   0.7108
5    0.7763   0.6629   0.7152
6    0.7662   0.6629   0.7108
7    0.7867   0.6629   0.7195
8    0.7632   0.6517   0.7030
9    0.7895   0.6742   0.7273
10   0.8000   0.6742   0.7317
Results – punct_def (ML)
        P        R        F
   1   0.4324   0.3556   0.3902
   2   0.3171   0.2889   0.3023
   3   0.4510   0.5111   0.4792
   4   0.4681   0.4889   0.4783
   5   0.4528   0.5333   0.4898
   6   0.5000   0.5333   0.5161
   7   0.5106   0.5333   0.5217
   8   0.5000   0.5333   0.5161
   9   0.5000   0.5778   0.5361
  10   0.5000   0.5333   0.5161
Results – punct_def (final)
         P       R         F
    1   0.4324   0.2424   0.3107
    2   0.3171   0.1970   0.2430
    3   0.4510   0.3485   0.3932
    4   0.4681   0.3333   0.3894
    5   0.4528   0.3636   0.4034
    6   0.5000   0.3636   0.4211
    7   0.5106   0.3636   0.4248
    8   0.5000   0.3636   0.4211
    9   0.5000   0.3939   0.4407
   10   0.5000   0.3636   0.4211
Final results
('after' = grammar + ML with the best-scoring setting, 10 and 9 respectively)

                          P        R        F
is_def      before       0.2810   0.8652   0.4242
            after (10)   0.8000   0.6742   0.7317
punct_def   before       0.0991   0.6818   0.1731
            after (9)    0.5000   0.3939   0.4407


• precision increases (by about 50 and 40 percentage points)
• recall decreases (by about 20 and 30 percentage points)
• f-score increases (by about 30 and 25 percentage points)
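
The quoted changes are absolute differences between the 'before' and 'after' rows (a quick check, hypothetical snippet):

# Differences between the grammar-only and grammar+ML rows above.
before = {"is_def": (0.2810, 0.8652, 0.4242), "punct_def": (0.0991, 0.6818, 0.1731)}
after  = {"is_def": (0.8000, 0.6742, 0.7317), "punct_def": (0.5000, 0.3939, 0.4407)}
for t in before:
    for name, b, a in zip("PRF", before[t], after[t]):
        print(t, name, f"{a - b:+.2f}")
# is_def: P +0.52, R -0.19, F +0.31; punct_def: P +0.40, R -0.29, F +0.27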
              Related work
• Question answering:
  – Fahmi & Bouma (2006)
  – Miliaraki & Androutsopoulos (2004)
• Glossary creation:
  – Muresan & Klavans (2002)
• Ontology learning:
  – Storrer & Wellinghoff (2006)
  – Walter & Pinkal (2006)
             Future work
• try different features
• evaluate other classifiers
• extend to all types of definitions
• scenario-based evaluation of the GCD
                Discussion
• Good features?

• Apply filtering: yes or no?

• How to evaluate the performance?
  – scenario-based?
  – compare with manual annotation?
  – ...
