session4-slides.pptx by KevenMealamu


									CMSC 723: Computational Linguistics I ― Session #4

 Part-of-Speech Tagging

                          Jimmy Lin
                          The iSchool
                          University of Maryland

                          Wednesday, September 23, 2009

                                                          Source: Calvin and Hobbes

Today’s Agenda                                              Parts of Speech
   What are parts of speech (POS)?                                   “Equivalence class” of linguistic entities
   What is POS tagging?                                                     “Categories” or “types” of words
                                                                     Study dates back to the ancient Greeks
   Methods for automatic POS tagging
                                                                            Dionysius Thrax of Alexandria (c. 100 BC)
       Rule-based POS tagging
                                                                            8 parts of speech: noun, verb, pronoun, preposition, adverb,
       Transformation-based learning for POS tagging                        conjunction, participle, article
   Along the way…                                                           Remarkably enduring list!
       Supervised machine learning


How do we define POS?                                       Parts of Speech
   By meaning                                                        Open class
       Verbs are actions                                                    Impossible to completely enumerate
       Adjectives are properties                                            New words continuously being invented, borrowed, etc.
       Nouns are things                                              Closed class
   By the syntactic environment                                             Closed, fixed membership
       What occurs nearby?                                                  Reasonably easy to enumerate
       What does it act as?                                                 Generally, short function words that “structure” sentences
   By what morphological processes affect it
       What affixes does it take?
   Combination of the above

Open Class POS                                                        Nouns
  Four major open classes in English                                    Open class
     Nouns                                                                 New inventions all the time: muggle, webinar, ...
     Verbs                                                              Semantics:
                                                                           Generally, words for people, places, things
                                                                           But not always (bandwidth, energy, ...)
     Adjectives
     Adverbs
  All languages have nouns and verbs... but may not have
  the other two
                                                                        Syntactic environment:
                                                                           Occurring with determiners
                                                                           Pluralizable, possessivizable
                                                                        Other characteristics:
                                                                           Mass vs. count nouns

Verbs                                                                 Adjectives and Adverbs
  Open class                                                            Adjectives
     New inventions all the time: google, tweet, ...                       Generally modify nouns, e.g., tall girl
  Semantics:                                                            Adverbs
     Generally, denote actions, processes, etc.                            A semantic and formal potpourri…
  Syntactic environment:                                                   Sometimes modify verbs, e.g., sang beautifully
                                                                           Sometimes modify adjectives, e.g., extremely hot
     Intransitive, transitive, ditransitive
  Other characteristics:
     Main vs. auxiliary verbs
     Gerunds (verbs behaving like nouns)
     Participles (verbs behaving like adjectives)

Closed Class POS                                                      Particle vs. Prepositions
  Prepositions
     In English, occurring before noun phrases
     Specifying some type of relation (spatial, temporal, …)          He came by the office in a hurry               (by = preposition)
                                                                      He came by his fortune honestly                (by = particle)
     Examples: on the shelf, before noon
  Particles                                                           We ran up the phone bill                       (up = particle)
     Resembles a preposition, but used with a verb (“phrasal verbs”)
                                                                      We ran up the small hill                       (up = preposition)
     Examples: find out, turn over, go on
                                                                      He lived down the block                        (down = preposition)
                                                                      He never lived down the nicknames              (down = particle)

More Closed Class POS                                            Closed Class POS: Conjunctions
  Determiners                                                       Coordinating conjunctions
    Establish reference for a noun                                      Join two elements of “equal status”
    Examples: a, an, the (articles), that, this, many, such, …          Examples: cats and dogs, salad or soup
  Pronouns                                                          Subordinating conjunctions
    Refer to person or entities: he, she, it                            Join two elements of “unequal status”
    Possessive pronouns: his, her, its
                                                                        Examples: We’ll leave after you finish eating. While I was waiting
    Wh-pronouns: what, who                                              in line, I saw my friend.
                                                                        Complementizers are a special case: I think that you should finish
                                                                        your assignment

                                                                                 The (Linguistic) Twilight Zone
Lest you think it’s an Anglo-centric world,
           It’s time to visit ......                             Perhaps, not so strange…
                              The (Linguistic)                    uygarlaştıramadıklarımızdanmışsınızcasına →
                               Twilight Zone                      uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
                                                                  behaving as if you are among those whom we could not cause to become civilized

                                                                  No verb/adjective distinction!
                                                                  漂亮: beautiful/to be beautiful

                       Digression                                                          Digression
             The (Linguistic) Twilight Zone                                      The (Linguistic) Twilight Zone
     Tzeltal (Mayan language spoken in Chiapas)                                         Riau Indonesian/Malay

    Only 3000 root forms in the vocabulary
                                                                  No Articles
    The verb ‘EAT’ has eight variations:
       General : TUN
                                                                  No Tense Marking
       Bananas and soft stuff : LO’
       Beans and crunchy stuff : K’UX                             3rd person pronouns neutral to both gender and number
       Tortillas and bread : WE’
       Meat and Chilies : TI’                                     No features distinguishing verbs from nouns
       Sugarcane : TZ’U
       Liquids : UCH’

             The (Linguistic) Twilight Zone
                    Riau Indonesian/Malay

                  Ayam (chicken) Makan (eat)
                   The chicken is eating                                Back to regularly scheduled
                      The chicken ate
                    The chicken will eat
                 The chicken is being eaten
                Where the chicken is eating
                 How the chicken is eating
               Somebody is eating the chicken
                 The chicken that is eating

POS Tagging: What’s the task?                                       Penn Treebank Tagset: 45 Tags
  Process of assigning part-of-speech tags to words
  But what tags are we going to assign?
     Coarse grained: noun, verb, adjective, adverb, …
     Fine grained: {proper, common} noun
     Even finer-grained: {proper, common} noun ± animate
  Important issues to remember
     Choice of tags encodes certain distinctions/non-distinctions
     Tagsets will differ across languages!
  For English, Penn Treebank is the most common tagset

Penn Treebank Tagset: Choices                                       Why do POS tagging?
  Example:                                                            One of the most basic NLP tasks
     The/DT grand/JJ jury/NN commented/VBD on/IN a/DT                    Nicely illustrates principles of statistical NLP
     number/NN of/IN other/JJ topics/NNS ./.
                                                                      Useful for higher-level analysis
  Distinctions and non-distinctions                                      Needed for syntactic analysis
     Prepositions and subordinating conjunctions are tagged “IN”         Needed for semantic analysis
     (“Although/IN I/PRP..”)
     Except the preposition/complementizer “to” is tagged “TO”
                                                                      Sample applications that require POS tagging
                                                                         Machine translation
                                                                         Information extraction
                                                                         Lots more…
    Don’t think this is correct? Doesn’t make sense?
    Often, must suspend linguistic intuition
    and defer to the annotation guidelines!

Why is it hard?                                        Try your hand at tagging…
  Not only a lexical problem                             The back door
     Remember ambiguity?                                 On my back
  Better modeled as sequence labeling problem
                                                         Win the voters back
     Need to take into account context!
                                                         Promised to back the bill

Try your hand at tagging…                              Why is it hard?*
  I thought that you...
  That day was nice
  You can go that far

Part-of-Speech Tagging
  How do you do it automatically?
  How well does it work?                  This first

                                                            It’s all about the benjamins

Evolution of the Evaluation                               Evaluation Metric
  Evaluation by argument                                    Binary condition (correct/incorrect):
  Evaluation by inspection of examples                         Accuracy
                                                            Set-based metrics (illustrated with document retrieval):
  Evaluation by demonstration
                                                                               Relevant         Not relevant
                                                                                                               Collection size = A+B+C+D
  Evaluation by improvised demonstration                         Retrieved        A                   B        Relevant = A+C
                                                                                                               Retrieved = A+B
  Evaluation on data using a figure of merit
                                                               Not retrieved      C                   D

  Evaluation on test data                                      Precision = A / (A+B)
  Evaluation on common test data                               Recall = A / (A+C)
                                                               Miss = C / (A+C)
  Evaluation on common, unseen test data                       False alarm (fallout) = B / (B+D)

                                                               F-measure:  $F = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$
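
The set-based metrics above can be computed directly from the four cells of the contingency table. The following is a minimal sketch; the function name `retrieval_metrics` and the example counts are made up for illustration, and it assumes none of the denominators is zero.

```python
def retrieval_metrics(a, b, c, d, beta=1.0):
    """Set-based metrics from the 2x2 contingency table on the slide.

    a: retrieved and relevant        b: retrieved, not relevant
    c: not retrieved, relevant       d: not retrieved, not relevant
    Assumes every denominator below is non-zero.
    """
    precision = a / (a + b)
    recall = a / (a + c)
    miss = c / (a + c)
    fallout = b / (b + d)  # false alarm rate
    f = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "miss": miss,
        "fallout": fallout,
        "f": f,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = retrieval_metrics(a=30, b=10, c=20, d=940)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With beta = 1 the F-measure weights precision and recall equally. Under this formula, beta > 1 shifts the weight toward recall and beta < 1 toward precision.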

Components of a Proper Evaluation                         Part-of-Speech Tagging
  Figure(s) of merit                                        How do you do it automatically?                             Now this
  Baseline                                                  How well does it work?
  Upper bound
  Tests of statistical significance

Automatic POS Tagging                                     Rule-Based POS Tagging
  Rule-based POS tagging (now)                              Dates back to the 1960’s
  Transformation-based learning for POS tagging (later)     Combination of lexicon + hand crafted rules
  Hidden Markov Models (next week)                             Example: EngCG (English Constraint Grammar)

  Maximum Entropy Models (CMSC 773)
  Conditional Random Fields (CMSC 773)

EngCG Architecture                                                                 EngCG: Sample Lexical Entries

  [Figure: sentence (w1 ... wn) → Stage 1: Lexicon Lookup (56,000 entries) → overgenerated tags
   → Stage 2: Constraints (3,744 rules) → final tags (t1 ... tn)]

EngCG: Constraint Rule Application                                                 EngCG: Evaluation
Example Sentence: Newman had originally practiced that ...                           Accuracy ~96%*
                                                                                     A lot of effort to write the rules and create the lexicon
                                                    ADVERBIAL‐THAT Rule                 Try debugging interaction between thousands of rules!
                                                    Given input: that
had        HAVE <SVO> V PAST VFIN                                                       Recall discussion from the first lecture?
           HAVE <SVO> PCP2                          if
originally ORIGINAL ADV                                 (+1 A/ADV/QUANT);
practiced PRACTICE <SVO> <SV> V PAST VFIN               (+2 SENT‐LIM);               Assume we had a corpus annotated with POS tags
           PRACTICE <SVO> <SV> PCP2                     (NOT ‐1 SVOC/A);
that       ADV                                      then eliminate non‐ADV tags
                                                                                        Can we learn POS tagging automatically?
           PRON DEM SG                              else eliminate ADV tag
                                                       disambiguation constraint
              overgenerated tags

            I thought that you...      (subordinating conjunction)
            That day was nice.         (determiner)
            You can go that far.       (adverb)
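
To make the two-stage idea concrete, here is a rough sketch of how the ADVERBIAL-THAT constraint could be applied to overgenerated tags. It is not EngCG itself: the category sets, the tag names, and the function `adverbial_that` are simplified assumptions for illustration, and the SVOC/A verb list in particular is a stand-in.

```python
# Hypothetical, heavily simplified stand-ins for the EngCG categories.
A_ADV_QUANT = {"A", "ADV", "QUANT"}   # adjective, adverb, or quantifier readings
SENT_LIM = {".", "!", "?", "<END>"}   # sentence-limit tokens
SVOC_A_VERBS = {"consider", "find"}   # illustrative SVOC/A verbs (assumption)


def adverbial_that(words, readings, i):
    """Apply the ADVERBIAL-THAT constraint to the token at position i.

    words:    list of lowercase tokens
    readings: list of sets of candidate tags, one set per token
    Returns the reduced set of readings for words[i].
    """
    next_readings = readings[i + 1] if i + 1 < len(words) else set()
    word_after = words[i + 2] if i + 2 < len(words) else "<END>"
    previous = words[i - 1] if i > 0 else ""
    condition = (
        bool(next_readings & A_ADV_QUANT)   # (+1 A/ADV/QUANT)
        and word_after in SENT_LIM          # (+2 SENT-LIM)
        and previous not in SVOC_A_VERBS    # (NOT -1 SVOC/A)
    )
    if condition:
        return readings[i] & {"ADV"}        # then eliminate non-ADV tags
    return readings[i] - {"ADV"}            # else eliminate ADV tag


THAT = {"ADV", "PRON", "DET", "CS"}  # overgenerated readings for "that"

words1 = ["you", "can", "go", "that", "far", "."]
readings1 = [{"PRON"}, {"V"}, {"V"}, set(THAT), {"ADV", "A"}, {"PUNCT"}]
print(adverbial_that(words1, readings1, 3))

words2 = ["i", "thought", "that", "you"]
readings2 = [{"PRON"}, {"V"}, set(THAT), {"PRON"}]
print(adverbial_that(words2, readings2, 2))
```

In the first sentence only the adverb reading survives. In the second the adverb reading is removed, and the remaining ambiguity would be left to other constraints.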

Supervised Machine Learning                                                        Three Laws of Machine Learning
    Start with annotated corpus                                                      Thou shalt not mingle training data with test data
           Desired input/output behavior                                             Thou shalt not mingle training data with test data
    Training phase:
                                                                                     Thou shalt not mingle training data with test data
           Represent the training data in some manner
           Apply learning algorithm to produce a system (tagger)
    Testing phase:
           Apply system to unseen test data
           Evaluate output
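
The training and testing phases can be sketched with the simplest possible learner, a most-frequent-tag tagger. The corpus below is a toy invented for illustration (it reuses the "back" examples, with tags that are my assumption), and the function names are hypothetical. The point is only that the tagger is built from the training data and scored on separate, unseen test data.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Training phase: record each word's most frequent tag and a default tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    lexicon = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def accuracy(lexicon, default_tag, tagged_sentences):
    """Testing phase: fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        for word, gold in sentence:
            correct += lexicon.get(word, default_tag) == gold
            total += 1
    return correct / total


# Toy data (assumption); training and test sentences are kept strictly apart.
train = [
    [("the", "DT"), ("back", "JJ"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("the", "DT"), ("back", "NN"), ("hurts", "VBZ")],
]
test = [
    [("win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB")],
]

lexicon, default_tag = train_baseline(train)
print("accuracy on unseen test data:", accuracy(lexicon, default_tag, test))
```

On such a tiny corpus the score is poor, since most test words are unseen or ambiguous. Scoring on the training sentences instead would give a misleadingly high number.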

Three Pillars of Statistical NLP                                Automatic POS Tagging
  Corpora (training data)                                         Rule-based POS tagging (before)
  Representations (features)                                      Transformation-based learning for POS tagging (now)
  Learning approach (models and algorithms)                       Hidden Markov Models (next week)
                                                                  Maximum Entropy Models (CMSC 773)
                                                                  Conditional Random Fields (CMSC 773)

  Learn to automatically paint the                                              TBL: Training
     next Cubist masterpiece

TBL: Training                                                   TBL: Training

                                                  Error: 100%                                                    Error: 44%

                                   Most common: BLUE                                             change B to G if touching 

         Initial Step: Apply Broadest Transformation              Step 2: Find transformation that decreases error most

TBL: Training                                             TBL: Training

                                             Error: 44%                                                        Error: 11%

                            change B to G if touching                                        change B to R if shape is 

         Step 3: Apply this transformation                         Repeat Steps 2 and 3 until “no improvement”

TBL: Training                                             TBL: Training
                                                            What was the point? We already had the right answer!
                                             Error: 0%      Training gave us ordered list of transformation rules
                                                            Now apply to any empty canvas!

                  Finished !

                                                          TBL: Testing

            TBL: Testing                                                                      Ordered transformations:

                                                                                             Initial: Make all B

                                                                                             change B to G if touching 

                                                                                             change B to R if shape is 

TBL: Testing                                                     TBL: Testing

                                    Ordered transformations:                                         Ordered transformations:

                                    Initial: Make all B                                              Initial: Make all B

                                    change B to G if touching                                        change B to G if touching 

                                    change B to R if shape is                                        change B to R if shape is 

TBL: Testing                                                     TBL: Testing

                                    Ordered transformations:

                                    Initial: Make all B                                              Accuracy: 93%
                                    change B to G if touching 

                                    change B to R if shape is 

TBL Painting Algorithm                                           TBL Painting Algorithm

function TBL‐Paint                                               function TBL‐Paint
(given: empty canvas with goal painting)                         (given: empty canvas with goal painting)
begin                                                            begin
  apply initial transformation to canvas                           apply initial transformation to canvas
  repeat                                                           repeat
      try all color transformation rules                               try all color transformation rules
      find transformation rule yielding most improvements              find transformation rule yielding most improvements
      apply color transformation rule to canvas                        apply color transformation rule to canvas
  until improvement below some threshold                           until improvement below some threshold
end                                                              end
                                                                 Now, substitute:
                                                                   ‘tag’ for ‘color’
                                                                   ‘corpus’ for ‘canvas’
                                                                   ‘untagged’ for ‘empty’
                                                                   ‘tagging’ for ‘painting’
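
With tags substituted for colors, the greedy loop above might look roughly like the sketch below. It is a sketch under stated assumptions, not Brill's implementation: it uses a single template ("change t1 to t2 when the previous word is tagged t3"), it evaluates each rule's conditions against the tags as they stood before the rule fired, and all function names are hypothetical.

```python
def initial_tags(words, lexicon, default="NN"):
    """Initial transformation: give each word its most likely tag."""
    return [lexicon.get(word, default) for word in words]


def apply_rule(rule, tags):
    """rule = (from_tag, to_tag, prev_tag): retag when the previous tag matches."""
    from_tag, to_tag, prev_tag = rule
    new_tags = list(tags)
    for i in range(1, len(tags)):
        # Conditions are checked against the tags before this rule fired.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new_tags[i] = to_tag
    return new_tags


def errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))


def tbl_train(words, gold, lexicon, threshold=1):
    """Greedily learn an ordered list of transformation rules."""
    tags = initial_tags(words, lexicon)
    tagset = sorted(set(gold) | set(tags))
    learned = []
    while True:
        current = errors(tags, gold)
        best_rule, best_gain = None, 0
        # Try all instantiations of the single template.
        for from_tag in tagset:
            for to_tag in tagset:
                if from_tag == to_tag:
                    continue
                for prev_tag in tagset:
                    rule = (from_tag, to_tag, prev_tag)
                    rule_gain = current - errors(apply_rule(rule, tags), gold)
                    if rule_gain > best_gain:
                        best_rule, best_gain = rule, rule_gain
        # Stop when the improvement falls below the threshold.
        if best_rule is None or best_gain < threshold:
            break
        tags = apply_rule(best_rule, tags)
        learned.append(best_rule)
    return learned, tags


words = "he is expected to race today".split()
gold = ["PRP", "VBZ", "VBN", "TO", "VB", "NN"]
# Hypothetical lexicon of most likely tags; "race" defaults to NN.
lexicon = {"he": "PRP", "is": "VBZ", "expected": "VBN", "to": "TO",
           "race": "NN", "today": "NN"}

rules, final_tags = tbl_train(words, gold, lexicon)
print(rules)
print(final_tags)
```

The exhaustive search over tag triples is only workable for a toy tagset. A practical trainer would restrict candidates to rules that fix at least one current error.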

TBL Painting Algorithm                                         TBL Templates

 function TBL‐Paint                                               Change tag t1 to tag t2 when:
 (given: empty canvas with goal painting)                            w‐1 (w+1) is tagged t3
                                                                     w‐2 (w+2) is tagged t3                                Non-Lexicalized
                                                                     w‐1 is tagged t3 and w+1 is tagged t4
   apply initial transformation to canvas                            w‐1 is tagged t3 and w+2 is tagged t4
                                                                  Change tag t1 to tag t2 when:
       try all color transformation rules                            w‐1 (w+1) is foo
       find transformation rule yielding most improvements           w‐2 (w+2) is bar                                         Lexicalized
                                                                     w is foo and w‐1 is bar
       apply color transformation rule to canvas                     w is foo, w‐2 is bar and w+1 is baz
   until improvement below some threshold

                                                                Only try instances of these (and their combinations)

TBL Example Rules                                              TBL POS Tagging
                                                                          Rule-based, but data-driven
                                                                                 No manual knowledge engineering!
          He/PRP is/VBZ as/IN tall/JJ as/IN her/PRP$
                                                                          Training on 600k words, testing on known words only
               Change from IN to RB if w+2 is as
                                                                                 Lexicalized rules: learned 447 rules, 97.2% accuracy
          He/PRP is/VBZ as/RB tall/JJ as/IN her/PRP$                             Early rules do most of the work: 100 → 96.8%, 200 → 97.0%
                                                                                 Non-lexicalized rules: learned 378 rules, 97.0% accuracy
                                                                                 Little difference… why?

  He/PRP is/VBZ expected/VBN to/TO race/NN today/NN                       How good is it?
                                                                                 Baseline: 93-94%
          Change from NN to VB if w‐1 is tagged as TO
                                                                                 Upper bound: 96-97%
  He/PRP is/VBZ expected/VBN to/TO race/VB today/NN

                                                             Source: Brill (Computational Linguistics, 1995)
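
The two example rules can be replayed with a small rule applier. This is an illustrative sketch: the representation of a rule as a (from tag, to tag, condition) triple and the helper names are assumptions, not Brill's data structures.

```python
def apply_transformations(words, tags, rules):
    """Apply an ordered list of rules; each rule rewrites the whole sequence in turn."""
    for from_tag, to_tag, condition in rules:
        # The comprehension reads the tags as they stood before this rule fired.
        tags = [
            to_tag if tag == from_tag and condition(words, tags, i) else tag
            for i, tag in enumerate(tags)
        ]
    return tags


def word_at_plus2_is(target):
    """Lexicalized condition: w+2 is the given word."""
    return lambda words, tags, i: i + 2 < len(words) and words[i + 2] == target


def prev_tag_is(target):
    """Non-lexicalized condition: w-1 is tagged with the given tag."""
    return lambda words, tags, i: i > 0 and tags[i - 1] == target


rules = [
    ("IN", "RB", word_at_plus2_is("as")),  # Change from IN to RB if w+2 is as
    ("NN", "VB", prev_tag_is("TO")),       # Change from NN to VB if w-1 is tagged as TO
]

sentence1 = "He is as tall as her".split()
tags1 = ["PRP", "VBZ", "IN", "JJ", "IN", "PRP$"]
print(apply_transformations(sentence1, tags1, rules))

sentence2 = "He is expected to race today".split()
tags2 = ["PRP", "VBZ", "VBN", "TO", "NN", "NN"]
print(apply_transformations(sentence2, tags2, rules))
```

Only the first "as" is retagged, because the second has no word two positions to its right. Likewise only "race" follows a TO tag, so "today" stays NN.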

Three Pillars of Statistical NLP                               In case you missed it…
                                                                              Uh… what about this assumption?
   Corpora (training data)                                                Assume we had a corpus annotated with POS tags
   Representations (features)                                                    Can we learn POS tagging automatically?
                                                                                               Yes, as we’ve just shown…
   Learning approach (models and algorithms)

                                                                        knowledge engineering vs. manual annotation

Penn Treebank Tagset                                                    Turkish Morphology
  Why does everyone use it?                                                      Remember agglutinative languages?
  What’s the problem?                                                                  uygarlaştıramadıklarımızdanmışsınızcasına →
  How do we get around it?                                                             behaving as if you are among those whom we could not cause to
                                                                                       become civilized
                                                                                 How bad does it get?
                                                                                       uyu – sleep
                                                                                       uyut – make X sleep
                                                                                       uyuttur – have Y make X sleep
                                                                                       uyutturt – have Z have Y make X sleep
                                                                                       uyutturttur – have W have Z have Y make X sleep
                                                                                       uyutturtturt – have Q have W have Z …

                                                                      Source: Yuret and Türe (HLT/NAACL 2006)

Turkish Morphological Analyzer                                          Morphology Annotation Scheme
  Example: masalı                                                                masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
     masal+Noun+A3sg+Pnon+Acc (= the story)
     masal+Noun+A3sg+P3sg+Nom (= his story)                                        stem
     masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (= with tables)
  Disambiguation in context:                                                                                    inflectional group (IG)         derivational     IG
     Uzun masalı anlat         (Tell the long story)                                                                                             boundary

     Uzun masalı bitti         (His long story ended)
     Uzun masalı oda           (Room with long table)
                                                                                 How rich is Turkish morphology?
                                                                                       126 unique features
                                                                                       9129 unique IGs
                                                                                       infinite unique tags
                                                                                       11084 distinct tags observed in 1M word training corpus

How to tackle the problem…                                              Learning Decision Lists
  Key idea: build separate decision lists for each feature                       Start with tagged collection
  Sample rules for +Det:                                                               1 million words in the news genre
                                                                                 Apply greedy-prepend algorithm
   R1    If    (W = çok) and (R1 = +DA)     “pek çok alanda”   (R1)
                                                                                       Rule templates based on words, suffixes, character classes within
        Then   W has +Det                   “pek çok insan”    (R2)                    a five word window
   R2    If    (L1 = pek)                   “insan çok daha”   (R4)
        Then   W has +Det                                                                 GPA(data)
   R3    If    (W = +AzI)                                                                   dlist = NIL
        Then   W does not have +Det                                                         default-class = Most-Common-Class(data)
   R4    If    (W = çok)                                                                    rule = [If TRUE Then default-class]
        Then   W does not have +Det                                                         while Gain(rule, dlist, data) > 0
   R5    If    TRUE                                                                             do dlist = prepend(rule, dlist)
        Then   W has +Det                                                                          rule = Max-Gain-Rule(dlist, data)
                                                                                            return dlist
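
A decision list is read top to bottom and the first matching rule decides. The sketch below applies the +Det rules R1 to R5 to hand-built feature dictionaries and then runs a toy version of greedy prepend. Several pieces are assumptions: the feature names `R1_suffix` and `W_suffix`, the definition of gain as the increase in correctly classified training instances, the candidate space of single attribute=value tests, and the toy training data.

```python
from collections import Counter


def matches(conditions, x):
    """A rule fires when every attribute=value condition holds (empty means TRUE)."""
    return all(x.get(key) == value for key, value in conditions.items())


def classify(dlist, x):
    """First matching rule wins; None if nothing matches."""
    for conditions, label in dlist:
        if matches(conditions, x):
            return label
    return None


# R1..R5 from the slide, as (conditions, has +Det) pairs.
det_rules = [
    ({"W": "çok", "R1_suffix": "+DA"}, True),  # R1
    ({"L1": "pek"}, True),                     # R2
    ({"W_suffix": "+AzI"}, False),             # R3
    ({"W": "çok"}, False),                     # R4
    ({}, True),                                # R5: If TRUE
]

# Hand-built features for the word "çok" in the three example contexts.
pek_cok_alanda = {"W": "çok", "L1": "pek", "R1_suffix": "+DA"}
pek_cok_insan = {"W": "çok", "L1": "pek"}
insan_cok_daha = {"W": "çok", "L1": "insan"}


def num_correct(dlist, data):
    return sum(classify(dlist, x) == y for x, y in data)


def gain(rule, dlist, data):
    # Assumed definition: extra training instances classified correctly.
    return num_correct([rule] + dlist, data) - num_correct(dlist, data)


def max_gain_rule(dlist, data):
    # Assumed candidate space: one attribute=value test per rule.
    labels = sorted({y for _, y in data})
    tests = sorted({(k, v) for x, _ in data for k, v in x.items()})
    candidates = [({k: v}, y) for k, v in tests for y in labels]
    return max(candidates, key=lambda rule: gain(rule, dlist, data))


def gpa(data):
    """Greedy prepend: start from the default rule, prepend while gain is positive."""
    dlist = []
    default_class = Counter(y for _, y in data).most_common(1)[0][0]
    rule = ({}, default_class)
    while gain(rule, dlist, data) > 0:
        dlist = [rule] + dlist
        rule = max_gain_rule(dlist, data)
    return dlist


toy_data = [  # invented for illustration
    ({"W": "çok", "L1": "pek"}, True),
    ({"W": "çok", "L1": "pek"}, True),
    ({"W": "çok", "L1": "insan"}, False),
    ({"W": "bu", "L1": "ev"}, True),
]

print(classify(det_rules, pek_cok_alanda), classify(det_rules, insan_cok_daha))
print(gpa(toy_data))
```

Because rules are prepended, later and more specific rules sit above the earlier general ones. The list therefore ends with the default rule, as in R5.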

Results                                                       What we covered today…
                                                                What are parts of speech (POS)?
  [Figure: per-feature results chart; left axis 0 to 7000,        What is POS tagging?
   right axis 84 to 100; feature labels not recoverable]
                                                                Methods for automatic POS tagging
                                                                  Rule-based POS tagging
                                                                  Transformation-based learning for POS tagging
                                                                Along the way…
                                                                  Supervised machine learning


                   Overall accuracy: ~96%!

