Morphology


Finite State Transducers
• The machine model we will study for
  morphological parsing is called the finite
  state transducer (FST)
• An FST has two tapes
  – input tape (with an input alphabet)
  – output tape (with an output alphabet)
Formal definition of FST (from the textbook)
• M = (Q, q0, F, d), where
  – Q is a finite set of states
  –  is a finite alphabet of complex symbols (i.e.
    pairs of input-output symbols).  = { i:o | i is
    an input tape symbol and o is an output tape
    symbol}
  – q0 Q is the initial state
  – F Q is a set of final (accepting) states
  – dQ 
Example
• We want to be able to parse words (recover
  their structure), including words such as
  goose, which is ambiguous:
  – goose → [goose +N +SG] or [goose +V]
  – geese → [goose +N +PL]
  – gooses → [goose +V +3SG]
Components of a morphological parser
• lexicon: morphemes (stems and affixes)
  together with category information
• morphotactics: rules of morpheme order
• orthographic (spelling) rules: rules for the
  spelling changes that occur when morphemes
  combine
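To make the split between components concrete, here is a hedged Python sketch of the first two only: a toy lexicon of stems with their categories, and morphotactic rules written as regular expressions over underlying forms. The names `LEXICON`, `MORPHOTACTICS` and `well_formed` are hypothetical, regular expressions stand in for the finite-state automaton a real parser would use, and orthographic rules are not modelled here.

```python
import re

# Hypothetical toy lexicon: stem -> categories it can take
LEXICON = {
    "fox": {"N"}, "cat": {"N"}, "sheep": {"N"}, "mouse": {"N"},
    "goose": {"N", "V"},
}

# Morphotactics: after the stem comes its category, then (for nouns) exactly
# one number feature, or (for verbs) an optional +3SG
MORPHOTACTICS = {
    "N": re.compile(r"\+N \+(SG|PL)"),
    "V": re.compile(r"\+V( \+3SG)?"),
}


def well_formed(underlying: str) -> bool:
    """Check a form like "[goose +N +PL]" against lexicon and morpheme order."""
    m = re.fullmatch(r"\[(\w+) (.+)\]", underlying)
    if not m:
        return False
    stem, features = m.groups()
    # Accept if any category listed for the stem licenses this feature order
    return any(
        MORPHOTACTICS[cat].fullmatch(features)
        for cat in LEXICON.get(stem, set())
    )


print(well_formed("[goose +V +3SG]"), well_formed("[fox +PL +N]"))  # True False
```

The second call fails on morpheme order rather than on the stem, which is exactly the kind of error the morphotactics component exists to catch.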
Lexicon for FST

• The lexicon can be modelled using two
  levels:
  – Surface form (e.g. geese)
  – Underlying form (e.g. [goose +N +PL])
• This allows the lexicon to handle irregular
  forms (e.g. geese → [goose +N +PL])
• Example lexicon on next slide
Example lexicon
Lexical:surface pairs     Underlying form(s)
f:f o:o x:x               [fox +N +SG]
c:c a:a t:t               [cat +N +SG]
g:g o:o o:o s:s e:e       [goose +N +SG] or [goose +V]
g:g o:e o:e s:s e:e       [goose +N +PL]
g:g o:o o:o s:s e:e       [goose +V +3SG]
e:d
s:s h:h e:e e:e p:p       [sheep +N +SG] or [sheep +N +PL]
m:m o:o u:u s:s e:e       [mouse +N +SG]
m:m o:i u:ε s:c e:e       [mouse +N +PL]
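The two-level lexicon can be sketched as a list of lexical:surface pair sequences. Reading off one side of every pair recovers that tape, and looking a surface word up returns every analysis stored with it, which is how the ambiguity of goose falls out. This is a minimal Python sketch, not a transducer: it covers a subset of the rows above (the gooses row is omitted), writes the empty symbol in the plural mouse row as ε, and the names `LEXICON`, `project` and `parse` are hypothetical.

```python
EPS = "ε"  # the empty symbol: contributes nothing to its tape

# Subset of the example lexicon: (lexical:surface pairs, underlying analyses)
LEXICON = [
    ("f:f o:o x:x", ["[fox +N +SG]"]),
    ("c:c a:a t:t", ["[cat +N +SG]"]),
    ("g:g o:o o:o s:s e:e", ["[goose +N +SG]", "[goose +V]"]),
    ("g:g o:e o:e s:s e:e", ["[goose +N +PL]"]),
    ("s:s h:h e:e e:e p:p", ["[sheep +N +SG]", "[sheep +N +PL]"]),
    ("m:m o:o u:u s:s e:e", ["[mouse +N +SG]"]),
    ("m:m o:i u:ε s:c e:e", ["[mouse +N +PL]"]),
]


def project(pairs: str, side: int) -> str:
    """Read off one tape: side 0 = lexical, side 1 = surface; ε is dropped."""
    syms = (p.split(":")[side] for p in pairs.split())
    return "".join(s for s in syms if s != EPS)


def parse(surface: str) -> list[str]:
    """Return every analysis whose surface projection equals `surface`."""
    return [a for pairs, analyses in LEXICON
            if project(pairs, 1) == surface for a in analyses]


print(parse("goose"))  # ['[goose +N +SG]', '[goose +V]']
print(parse("mice"))   # ['[mouse +N +PL]']
```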
Generation example: foxes

 Underlying:     f   o   x   +N +PL
 Intermediate:   f   o   x   ^   s   #
 Surface:        f   o   x   e   s
FST for [fox +N +PL] → fox^s#

 q0 --f:f--> q1 --o:o--> q2 --x:x--> q5 --+N:ε--> q6 --+PL:^s#--> q7
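This transducer is deterministic, so it can be run with one dictionary lookup per input symbol. The Python sketch below encodes only the arcs on this slide (a full lexicon FST would have many more), uses an empty output string for ε, and keeps the slide's state names; `FOX_FST` and `run` are hypothetical names.

```python
# (state, input symbol) -> (output string, next state); "" plays the role of ε
FOX_FST = {
    ("q0", "f"): ("f", "q1"),
    ("q1", "o"): ("o", "q2"),
    ("q2", "x"): ("x", "q5"),
    ("q5", "+N"): ("", "q6"),        # +N:ε
    ("q6", "+PL"): ("^s#", "q7"),    # +PL:^s#, a multi-symbol output
}
FINAL = {"q7"}


def run(symbols, delta=FOX_FST, start="q0", finals=FINAL):
    """Run a deterministic transducer; return (output, states) or None."""
    state, out, path = start, [], [start]
    for sym in symbols:
        if (state, sym) not in delta:
            return None  # no arc for this symbol: reject
        emitted, state = delta[(state, sym)]
        out.append(emitted)
        path.append(state)
    return ("".join(out), path) if state in finals else None


print(run(["f", "o", "x", "+N", "+PL"]))
# ('fox^s#', ['q0', 'q1', 'q2', 'q5', 'q6', 'q7'])
```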
FST for E-insertion rule

[State diagram: states q0 to q5, with arcs labelled #, other, z,x,s, ^:ε, ε:e and s]

 “other” means any symbol except “s”, “x”, “z”, “^”, “e”, “#”
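The transducer implements the usual rewrite-rule statement of E-insertion, ε → e / {x, s, z}^ __ s#: insert e after a morpheme boundary that follows x, s or z when the next morpheme is a word-final s. For comparison, here is a Python sketch that applies the same rule by string rewriting with `re.sub` rather than by walking the state diagram; the function name `e_insertion` is hypothetical, and it assumes the boundary symbols ^ and # are simply erased on the surface.

```python
import re


def e_insertion(intermediate: str) -> str:
    """Map an intermediate form such as "fox^s#" to its surface form."""
    # ε → e / {x, s, z}^ __ s# : insert e right after the boundary ^
    # when it follows x, s or z and is followed by a word-final s
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)
    # Assumption: the boundary symbols ^ and # have no surface realisation
    return with_e.replace("^", "").replace("#", "")


for form in ["fox^s#", "cat^s#", "buzz^s#"]:
    print(form, "->", e_insertion(form))
# fox^s# -> foxes
# cat^s# -> cats
# buzz^s# -> buzzes
```

The lookahead `(?=s#)` checks the right context without consuming it, mirroring how the transducer must see s and then # before committing to the inserted e.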
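Generation example: foxes (state sequences)

 Underlying tape:             f o x +N +PL
 States (fox FST):            0 1 2 5 6 7

 Intermediate tape:           f o x ^ s #
 States (E-insertion FST):    0 0 0 1 2 3 4 0   (the ε:e arc from 2 to 3 inserts the e)

 Surface tape:                f o x e s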
								