Lecture05 Morph

					  Morphology

CIS 530/430 Lecture 5
Levels of Representation (highest to lowest)
  Full Semantics
  Explicit Semantics
  Syntax
  Words
  Morphology
• Also, higher representations require lower ones
• Language understanding is the most complex,
  and possibly the most interesting, task

• But it depends on (or is performed in
  conjunction with) other tasks that may at
  first seem easier
What are the words in a language?
• We have already discussed issues such as
  capitalization and punctuation

• What about morphology?
  – cat, cats
  – show, showed, showing
  – catch, caught, catching
          English Morphology
• Morphology is the study of the ways that words
  are built up from smaller meaningful units called
  morphemes

• We can usefully divide morphemes into two
  classes
  – Stems: The core meaning-bearing units
  – Affixes: Bits and pieces that adhere to stems to
    change their meanings and grammatical functions
       Morphological parsing
• Surface/input form
  – going

• Morphological parse
  – VERB-go + GERUND-ing

  To know which suffixes can attach to which
    stems, we need to know which words are
    nouns, verbs, adjectives, and adverbs
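As a rough illustration of why part of speech matters, here is a minimal Python sketch that produces parses in the slide's VERB-go + GERUND-ing format. The lexicon, the suffix table, and the extra tag names (PAST, PLURAL) are hypothetical toy data, and spelling changes are ignored.

```python
# Hypothetical toy lexicon: stem -> part of speech
LEXICON = {"go": "VERB", "show": "VERB", "cat": "NOUN"}

# Hypothetical suffix table: suffix -> (POS of the stem it attaches to, tag for the suffix)
SUFFIXES = {
    "ing": ("VERB", "GERUND"),
    "ed": ("VERB", "PAST"),
    "s": ("NOUN", "PLURAL"),
}


def parse(word):
    """Return every stem + suffix parse of word that the lexicon licenses."""
    parses = []
    # A bare stem is a parse on its own
    if word in LEXICON:
        parses.append(f"{LEXICON[word]}-{word}")
    for suffix, (pos, tag) in SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # The suffix attaches only if the stem is listed with the required POS
            if LEXICON.get(stem) == pos:
                parses.append(f"{pos}-{stem} + {tag}-{suffix}")
    return parses
```

With this lexicon, parse("going") gives ['VERB-go + GERUND-ing'], while parse("cating") gives an empty list: -ing only attaches to verb stems, and cat is listed as a noun.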
         English Morphology
• Inflectional
  – combining a word stem with a grammatical
    morpheme, usually resulting in a word of the
    same class as the original stem and usually
    filling some syntactic function
• Derivational
  – Combining a word stem with a grammatical
    morpheme, usually resulting in a word of a
    different class, often with a meaning that is
    hard to predict
              English nouns
• Simple inflectional morphology
  – an affix that marks the plural (cat, cats)
  – an affix that marks the possessive (cat's)
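A minimal sketch of these two noun inflections, assuming the caller already knows whether the noun is plural. The -es rule after sibilant endings is a simplification of English spelling, and irregular nouns are not handled.

```python
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def plural(noun):
    """Regular plural: -es after a sibilant ending, otherwise -s (simplified)."""
    if noun.endswith(SIBILANT_ENDINGS):
        return noun + "es"
    return noun + "s"


def possessive(noun, is_plural=False):
    """Possessive: a bare apostrophe after a plural -s, otherwise 's."""
    if is_plural and noun.endswith("s"):
        return noun + "'"
    return noun + "'s"


print(plural("cat"), plural("fox"), possessive("cat"), possessive("cats", is_plural=True))
# cats foxes cat's cats'
```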
            English verbs
• Inflectional morphology marks tense
      Regulars and Irregulars

• Inflection is complicated a little by the fact
  that some words misbehave (refuse to follow
  the rules)
  – Mouse/mice, goose/geese, ox/oxen
  – Go/went, fly/flew
• The terms regular and irregular are used
  to refer to words that follow the rules and
  those that don’t
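One common way to handle words that misbehave is to consult a small exception list before falling back on the regular rule. A minimal sketch, using only the examples from these slides as the exception lists; the regular rules ignore most spelling changes.

```python
# Exception lists drawn from the slide examples (go/went, fly/flew, catch/caught, ...)
IRREGULAR_PAST = {"go": "went", "fly": "flew", "catch": "caught"}
IRREGULAR_PLURAL = {"mouse": "mice", "goose": "geese", "ox": "oxen"}


def past_tense(verb):
    """Irregular verbs are looked up; regular verbs take -ed (-d after a final e)."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    return verb + ("d" if verb.endswith("e") else "ed")


def plural_noun(noun):
    """Irregular nouns are looked up; regular nouns simply take -s."""
    return IRREGULAR_PLURAL.get(noun, noun + "s")
```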
English morphology is very simple
compared to that of most other languages
Morphological parsing
     Derivational Morphology
• Derivational morphology is the messy stuff
  that no one ever taught you
  – Quasi-systematicity
  – Irregular meaning change
  – Changes of word class
         Derivational Examples
• Verbs and Adjectives to Nouns

-ation          computerize       computerization

-ee             appoint           appointee
-er             kill              killer
-ness           fuzzy             fuzziness
          Derivational Examples
• Nouns and Verbs to Adjectives

  -al               computation   computational

  -able             embrace       embraceable

  -less             clue          clueless
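The two tables can be read as rules saying which word class a suffix attaches to and which class it produces. The minimal sketch below encodes them that way. Its two spelling adjustments (dropping a silent e, and consonant + y becoming i) are simplified rules chosen so that every example in the tables comes out right, not a complete account of English spelling.

```python
# Suffix -> (class the stem must have, class of the result), read off the two tables.
# V = verb, A = adjective, N = noun.
DERIVATIONS = {
    "ation": ("V", "N"),
    "ee": ("V", "N"),
    "er": ("V", "N"),
    "ness": ("A", "N"),
    "al": ("N", "A"),
    "able": ("V", "A"),
    "less": ("N", "A"),
}
VOWELS = "aeiou"


def attach(stem, suffix):
    """Join stem and suffix, applying two simplified spelling rules."""
    if stem.endswith("e") and len(stem) > 1 and suffix[0] in VOWELS and not stem.endswith("ee"):
        # A silent e drops before a vowel (computerization), but is kept after
        # c or g before a or o, where it keeps the consonant soft (embraceable)
        if not (stem[-2] in "cg" and suffix[0] in "ao"):
            stem = stem[:-1]
    elif stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS and suffix[0] != "i":
        # Consonant + y becomes i before most suffixes (fuzziness)
        stem = stem[:-1] + "i"
    return stem + suffix


def derive(stem, word_class, suffix):
    """Return (derived word, new class), or None if the suffix cannot attach."""
    required, result = DERIVATIONS[suffix]
    if word_class != required:
        return None
    return attach(stem, suffix), result
```

Because the class check comes first, derive("clue", "N", "able") returns None: -able wants a verb, and clue is treated as a noun here.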
            Example: Compute
• Many paths are possible…
• Start with compute
   – compute -> computer -> computerize -> computerization
   – compute -> computer -> computerize -> computerizable
• But not all paths/operations are equally good
  (allowable?)
   – clue -> *clueable
   – happy -> unhappy
   – sad -> *unsad
• Can we represent this knowledge about
  words in a more compact way than listing
  all possible word forms?
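The sketch below shows what a compact representation buys and where it falls short. It is a minimal illustration with hypothetical rule and feature tables. Word-class checks alone rebuild the compute paths and rule out *clueable, but happy and sad are both adjectives, so the contrast between unhappy and *unsad has to be recorded as a per-word feature.

```python
# Suffix -> (class the input must have, class of the output); V/N/A = verb/noun/adjective
SUFFIX_RULES = {"er": ("V", "N"), "ize": ("N", "V"), "ation": ("V", "N"), "able": ("V", "A")}

# Hypothetical lexical feature: the adjectives that accept the prefix un-
TAKES_UN = {"happy"}


def add_suffix(word, suffix):
    """Simplified spelling: a final e drops before a vowel-initial suffix."""
    if word.endswith("e") and suffix[0] in "aeiou":
        word = word[:-1]
    return word + suffix


def derive_path(stem, word_class, suffixes):
    """Apply suffixes in order; return None as soon as one cannot attach."""
    word = stem
    for suffix in suffixes:
        required, result = SUFFIX_RULES[suffix]
        if word_class != required:
            return None
        word, word_class = add_suffix(word, suffix), result
    return word


def add_un(adjective):
    """un- depends on a per-word feature, since happy and sad share a class."""
    return "un" + adjective if adjective in TAKES_UN else None
```

For example, derive_path("compute", "V", ["er", "ize", "able"]) yields computerizable, while derive_path("clue", "N", ["able"]) yields None.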
To build a morphological parser, we need
• Lexicon
  – The list of stems and affixes, together with basic
    information about them (e.g., whether a stem is a
    noun or a verb)
• Morphotactics
  – The model of morpheme ordering inside a word
     • the English plural morpheme follows the noun
• Orthographic rules
  – Model the spelling changes that occur in a word
    when morphemes combine
     • city + -s -> cities
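A minimal sketch of how the three components divide the work when generating a plural noun. The lexicon entries are hypothetical, the morphotactics here allow only the plural -s after a noun stem, and the single orthographic rule (consonant + y becomes ie before -s) is just enough for city + -s.

```python
# Lexicon: hypothetical toy entries, stem -> word class
LEXICON = {"city": "N", "cat": "N", "toy": "N", "walk": "V"}


def morphotactics_ok(morphemes):
    """Morphotactics (this sketch only): a noun stem followed by the plural -s."""
    return (
        len(morphemes) == 2
        and LEXICON.get(morphemes[0]) == "N"
        and morphemes[1] == "s"
    )


def apply_orthography(stem, suffix):
    """Orthographic rule: consonant + y becomes ie before -s (city -> cities)."""
    if suffix == "s" and stem.endswith("y") and stem[-2] not in "aeiou":
        return stem[:-1] + "ies"
    return stem + suffix


def generate(morphemes):
    """Return the surface form, or None if the morpheme order is not allowed."""
    if not morphotactics_ok(morphemes):
        return None
    return apply_orthography(*morphemes)
```

Here toy + -s stays toys because its y follows a vowel. walk + -s is rejected only because this toy morphotactics covers nouns alone.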
Morphology and finite state automata
• We’d like to use the machinery provided
  by FSAs to capture these facts about
  morphology
  – Accept strings that are in the language
  – Reject strings that are not
  – And do so in a way that doesn’t require us to
    in effect list all the words in the language
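A minimal sketch of the idea, assuming an FSA whose arcs are labelled with small sublexicons rather than single letters. The state names and word lists are hypothetical. Adding a regular noun means adding one word to a list, not adding new states, and the machine rejects forms such as *mices. Spelling is not modelled, so foxs would be accepted; orthographic rules handle that separately.

```python
# Sublexicons that label the arcs (hypothetical toy word lists)
SUBLEXICONS = {
    "reg-noun": {"cat", "dog", "fox"},
    "irreg-sg-noun": {"goose", "mouse"},
    "irreg-pl-noun": {"geese", "mice"},
    "plural": {"s"},
}
# (source state, arc label, target state)
ARCS = [
    ("q0", "reg-noun", "q1"),
    ("q0", "irreg-sg-noun", "q2"),
    ("q0", "irreg-pl-noun", "q2"),
    ("q1", "plural", "q2"),
]
ACCEPT = {"q1", "q2"}


def accepts(word, state="q0"):
    """Nondeterministic recognition: try every arc whose sublexicon matches a prefix."""
    if not word:
        return state in ACCEPT
    for source, label, target in ARCS:
        if source != state:
            continue
        for item in SUBLEXICONS[label]:
            if word.startswith(item) and accepts(word[len(item):], target):
                return True
    return False
```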
     Simple Rules: a compact representation
[Figures: FSAs for verbs and adjectives; the same FSAs with the words plugged in; derivational rules]
                  Ambiguity

• Recall that in non-deterministic recognition
  multiple paths through a machine may
  lead to an accept state.
  • It didn't matter which path was actually
    traversed
• In FSTs the path to an accept state does
  matter since different paths represent
  different parses and different outputs will
  result
                Ambiguity
• What's the right parse (segmentation) for
  unionizable?
  • union-ize-able
  • un-ion-ize-able
• Each represents a valid path through the
  derivational morphology machine.
                 Ambiguity
• There are a number of ways to deal with
  this problem
  • Simply take the first output found
  • Find all the possible outputs (all paths) and
    return them all (without choosing)
  • Bias the search so that only one or a few
    likely paths are explored
• http://www.economist.com/blogs/johnson/2010/06/ambiguous_headlines
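The three strategies can be compared on unionizable with a small segmenter that enumerates paths lazily. This is a minimal sketch: the morpheme inventory is a hypothetical toy list, the surface string iz is mapped to the morpheme ize by hand instead of by an orthographic rule, and longest-match-first is just one possible bias.

```python
# Surface string -> morpheme (hypothetical inventory; "iz" stands in for ize before -able)
MORPHEMES = {"un": "un", "union": "union", "ion": "ion", "iz": "ize", "ize": "ize", "able": "able"}


def segmentations(word, longest_first=False):
    """Lazily yield every way of splitting word into known morphemes."""
    if not word:
        yield []
        return
    candidates = sorted(MORPHEMES, key=len, reverse=True) if longest_first else list(MORPHEMES)
    for surface in candidates:
        if word.startswith(surface):
            for rest in segmentations(word[len(surface):], longest_first):
                yield [MORPHEMES[surface]] + rest


word = "unionizable"
first = next(segmentations(word), None)                        # take the first output found
every = list(segmentations(word))                              # return all outputs
biased = next(segmentations(word, longest_first=True), None)   # bias the search order
```

Under the first strategy, which parse wins depends only on the order in which arcs happen to be tried (here un-ion-ize-able). The longest-first bias explores union first and returns union-ize-able.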
    Finite state transducers
producing a morphological parse
                     Transitions
          c:c        a:a        t:t       +N:ε    +PL:s
• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on the
  other
• +PL:s means read +PL and write an s
             Typical Uses
• Typically, we’ll read from one tape using
  the first symbol on the machine transitions
  (just as in a simple FSA).
• And we’ll write to the second tape using
  the other symbols on the transitions.
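The cat transducer can be simulated directly. A minimal sketch, assuming states numbered 0 to 5 and the empty string as ε: run reads the first symbol of each arc and writes the second, and invert swaps the two. As a result, the same arcs that generate cats from c a t +N +PL also parse cats back into that lexical form.

```python
# Arcs from the slide: (source, input symbol, output symbol, target).
# State numbers are an assumption; "" stands for ε.
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", "", 4),
    (4, "+PL", "s", 5),
]
FINAL = {5}


def run(arcs, tape, state=0):
    """Return the output of every accepting path that consumes the whole input tape."""
    results = []
    if not tape and state in FINAL:
        results.append([])
    for source, inp, out, target in arcs:
        if source != state:
            continue
        if inp == "":
            rest = tape                    # ε input: move without reading
        elif tape and tape[0] == inp:
            rest = tape[1:]                # read one symbol
        else:
            continue
        for tail in run(arcs, rest, target):
            results.append(([out] if out else []) + tail)  # ε output writes nothing
    return results


def invert(arcs):
    """Swap the input and output sides of every arc."""
    return [(source, out, inp, target) for source, inp, out, target in arcs]


print("".join(run(ARCS, ["c", "a", "t", "+N", "+PL"])[0]))  # cats
print(run(invert(ARCS), list("cats")))  # [['c', 'a', 't', '+N', '+PL']]
```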
    Multi-Level Tape Machines




• We use one machine to transduce between the
  lexical and the intermediate level, and another to
  handle the spelling changes to the surface tape
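A minimal two-stage sketch of this cascade for plural nouns. The intermediate notation (^ for a morpheme boundary, # for a word boundary), the feature names, and the single spelling rule (insert e between x, s, z, ch, or sh and a word-final -s) are assumptions for illustration. Each stage is written as an ordinary function rather than as a compiled transducer.

```python
import re


def lexical_to_intermediate(lexical):
    """Stage 1: map features to morphemes plus boundary markers (assumed notation)."""
    stem, *features = lexical.split("+")
    if features == ["N", "PL"]:
        return stem + "^s#"
    if features == ["N", "SG"]:
        return stem + "#"
    raise ValueError(f"unsupported features: {features}")


def intermediate_to_surface(intermediate):
    """Stage 2: apply the spelling rule, then delete the boundary markers."""
    # E-insertion: x, s, z, ch, or sh followed by a word-final -s morpheme gets an e
    text = re.sub(r"(x|s|z|ch|sh)\^s#", r"\1^es#", intermediate)
    return text.replace("^", "").replace("#", "")


def generate(lexical):
    """Run the two stages in sequence: lexical -> intermediate -> surface."""
    return intermediate_to_surface(lexical_to_intermediate(lexical))
```

For example, fox+N+PL becomes fox^s# after the first stage and foxes after the second, while cat+N+PL passes through the spelling stage unchanged as cats.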
Lexical to Intermediate Level
 What if you don’t have a lexicon?
• Porter stemmer
  – ATIONAL -> ATE [relational -> relate]
  – ING -> ε [motoring -> motor]
  – SSES -> SS [grasses -> grass]
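The three rules above can be tried out with the sketch below. It is a deliberately tiny subset: the real Porter stemmer applies many more rules in ordered steps and guards most of them with conditions on the remaining stem. Here those conditions are approximated only by requiring a vowel before -ing. For real use, NLTK provides a complete implementation as nltk.stem.PorterStemmer.

```python
VOWELS = set("aeiou")


def has_vowel(s):
    return any(ch in VOWELS for ch in s)


def stem(word):
    """Apply three Porter-style suffix rules (a simplified subset, no measure checks)."""
    if word.endswith("sses"):
        return word[:-2]                  # SSES -> SS
    if word.endswith("ational"):
        return word[:-7] + "ate"          # ATIONAL -> ATE
    if word.endswith("ing") and has_vowel(word[:-3]):
        return word[:-3]                  # ING -> ε, only if the stem keeps a vowel
    return word
```

The vowel check is what keeps sing from being reduced to s.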
 Human morphological processing
• Full listing hypothesis
  – All words of a language are listed in the
    mental lexicon without any internal
    morphological structure
• Minimum redundancy hypothesis
  – Only the constituent morphemes are represented
    in the lexicon; to recognize a word form such as
    walked, we access both of its morphemes (walk
    and -ed)
         Slips of the tongue
• Slips of the tongue show that affixes can be
  produced separately from their stems

• it’s not only us who have screw looses (for
  “screws loose”)
• words of rule formation (for “rules of word
  formation”)
• easy enoughly (for “easily enough”)
       Priming tests for verifying
              hypotheses
• A word is recognized faster if it has been seen
  recently

• Stanners et al. (1979)
  – Found that some derived forms (happiness, happily)
    seem to be stored separately from their stem (happy),
    but that regularly inflected forms (pouring) are not
    distinct in the lexicon from their stems (pour)
     • Lifting primed lift
     • Burned primed burning
     • Selective didn’t prime select
   Marslen-Wilson et al. (1994)
• Found that spoken derived words primed
  their stems, but only if the meaning of the
  derived form is closely related to the stem
  – government primes govern
  – department does not prime depart

				
DOCUMENT INFO
posted:9/14/2012
pages:38