					       Why NL is not CF

     As proven in Stuart Shieber's 1985 paper,
      "Evidence against the context-freeness of natural language",
and explained informally for people whose eyes glaze
                  over formal proofs




                  Motivation
• Important to theoretical linguists and
  philosophers of language. Central to Chomsky's
  innateness hypothesis as well as to critics of
  transformational grammars and their derivatives
• Should be of at least theoretical interest to
  Computational Linguists because computational
  processing difficulty of languages is directly
  linked to their formal complexity
• Personal motivation: If I label myself a Linguist I
  should have more than just a vague idea of what
  this question is about
   Tiny subset of the question
• A. Why are the Swiss German constructions a
  proof that at least some NLs are not CFLs?
• B. How are the Swiss German examples
  crucially different from similar Dutch examples
  that are not a proof of the non-CFL-ness of NLs?
• C. Are Polish and other free word order
  languages, which contain sentences essentially
  identical to the Swiss German ones, additional
  examples of non-CFLs?

          The Chomsky Hierarchy
                                 adapted from www.wikipedia.com
          α, β, γ – strings of terminals and nonterminals; A, B – nonterminals; x – string of terminals

Grammar   Languages               Automaton                                          Production rules
Type-0    Recursively Enumerable  Turing Machine                                     α -> β (no restrictions)
Type-1    Context Sensitive       Linear-Bounded Non-Deterministic Turing Machine    αAβ -> αγβ, where γ is not empty
Type-2    Context Free            Non-deterministic Pushdown Automaton               A -> γ
Type-3    Regular                 Finite State Automaton                             A -> x, A -> xB

Preliminaries – Regular Languages
• Productions (A->xB, A->x, where A and B are nonterminals and x is
  any string of terminals)
• Here is an example of a regular grammar:
• Vocabulary of terminals={a, cat, dog, mouse, chased, scared,
  squeaked}
• Vocabulary of non-terminals={S,VP,NP,N}
• S is the only initial symbol.
• S->a mouse VP
• VP->squeaked
• VP->chased NP
• VP->scared NP
• NP->a N
• N->cat
• N->dog
• N->mouse
Preliminaries – Regular Languages
•   S->a mouse VP
•   VP->squeaked
•   VP->chased NP
•   VP->scared NP
•   NP->a N
•   N->cat
•   N->dog
•   N->mouse
•   What sentences can this grammar recognize/generate?
•   A mouse squeaked.
•   A mouse chased a cat.
•   A mouse chased a dog.
•   A mouse chased a mouse.
•   A mouse scared a cat.
•   A mouse scared a dog.
•   A mouse scared a mouse.
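
The seven sentences above can be checked mechanically. The following is a minimal sketch, assuming the productions are stored in a dictionary (the names GRAMMAR and generate are made up for this illustration). It expands every production of the regular grammar and collects the resulting strings:

```python
# Right-linear productions from the slide: each right-hand side is a string of
# terminals, optionally followed by exactly one nonterminal.
GRAMMAR = {
    "S": [(["a", "mouse"], "VP")],
    "VP": [(["squeaked"], None), (["chased"], "NP"), (["scared"], "NP")],
    "NP": [(["a"], "N")],
    "N": [(["cat"], None), (["dog"], None), (["mouse"], None)],
}


def generate(symbol="S"):
    """Yield every terminal string (as a list of words) derivable from symbol."""
    for terminals, next_symbol in GRAMMAR[symbol]:
        if next_symbol is None:
            # Production of the form A -> x: the derivation ends here.
            yield list(terminals)
        else:
            # Production of the form A -> xB: continue with nonterminal B.
            for rest in generate(next_symbol):
                yield terminals + rest


sentences = sorted(" ".join(words) for words in generate())
for sentence in sentences:
    print(sentence)
```

The enumeration terminates only because this particular grammar has no recursive production; a grammar with a rule like B->bB would need a length limit.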

Preliminaries – Regular Languages
• If we want this grammar to generate sentences
  with a subject other than “a mouse” we have to
  add the following productions:
• S->a cat VP
• S->a dog VP
• This is inefficient
• What's worse, it doesn't capture the NP
  generalization (i.e. doesn't give us the "right"
  structure)
• We need S->NP VP but this is not a legitimate
  production

Preliminaries – Regular Languages
• Consider a CFG that accepts the same sentences (slightly different
  nonterminal vocabulary):
• S->NP VP
• NP->DT N
• DT->a
• N->cat
• N->dog
• N->mouse
• VP->VI
• VP->VT NP
• VI->squeaked
• VT->chased
• VT->scared



Preliminaries – Regular Languages
 Crucial point: the regular grammar shown
 here recognizes the strings we want it to
 recognize but it doesn't assign to them the
 structure we want. That is, this grammar
 weakly generates the language in question
 but doesn't strongly generate it. Why is
 this important?


Preliminaries – Regular Languages
• An example of why it is important:
• I saw the man with a telescope.
• Recognizing the string is not sufficient for expressing its
  syntactic ambiguity
• We need some way of expressing the ambiguity to get at
  the two meanings.
• The reason I am drawing attention to the weak/strong
  distinction is that there seems to be some conceptual
  confusion. When people say "language x is CF" they
  don't always say whether they mean that it is strongly CF
  or just weakly CF. In the context of formal languages and
  automata, people talk about weak generative capacity,
  but to us linguists strong generative capacity is of greater
  interest.
Preliminaries – Regular Languages
• In any case, for some sets of English sentences one cannot
  write a regular grammar at all. That is, these sets cannot
  even be recognized by a regular grammar, let alone be
  assigned the correct structure: they are not even weakly
  regular.
• A mouse a cat chased squeaked.
• A mouse a cat a dog scared chased squeaked.
• Example of Center Embedding:
• NP1 NP2 NP3 … V3 V2 V1
• There is no way to write a regular grammar for an
  arbitrary number of such embeddings.
• How do we know this for sure?
Preliminaries – Regular Languages
• Pumping Theorem for finite state
  languages
• If L is an infinite regular language over some
  alphabet E, then there are strings x, y, z
  made out of the characters of E, such that
  y is not the empty string, and x y^n z is in
  L for all n>=0.
• What does this mean?
Preliminaries – Regular Languages
•   Example: L={a b^n | n>=0}
•   Some strings in this language are:
•   a
•   ab
•   abb
•   abbb
•   There are strings x, y, z such that y is not empty and x y^n z is in the
    language for all n>=0. For example, the string abb can be split this way:
    x=a, y=b, and z=b. The following are all in the language:
•   n=0: x y^0 z = ab
•   n=1: x y^1 z = abb
•   n=2: x y^2 z = abbb
•   Why does this have to be true?
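
The pumping example can also be tried out directly. This is a small sketch, assuming we test membership in L={a b^n | n>=0} with the regular expression ab* (the helper name pump is made up here). It builds x y^n z for the split x=a, y=b, z=b and checks each result:

```python
import re

# Membership test for L = {a b^n | n >= 0}, written as a regular expression.
IN_L = re.compile(r"ab*")


def pump(x, y, z, n):
    """Return the string x y^n z."""
    return x + y * n + z


x, y, z = "a", "b", "b"
pumped = [pump(x, y, z, n) for n in range(6)]
all_in_language = all(IN_L.fullmatch(s) is not None for s in pumped)

print(pumped)
print(all_in_language)
```

Checking six values of n is of course not a proof for all n; it only illustrates what the theorem claims.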



Preliminaries – Regular Languages
• If a language is regular then by definition there is
  some regular grammar that generates it.
• By definition a grammar has a finite set of
  productions. In our example, one grammar for
  this language could be: S->aB, B->bB, B->ε (the empty string)
• But if the language is to consist of an infinite
  number of strings then there are strings in this
  language that have more symbols in them than
  there are productions, so some production must
  be applied more than once to generate the
  string.
Preliminaries – Regular Languages
• Let's call the substring read by the grammar up to the point at which
  the production that ends up being used more than once is used for
  the first time, x (so in our example, let's say that we have a
  production such as S->aB; so x=a).
• Now let's call the substring that is read when the production
  eventually used more than once is used for the first time, y (so in our
  example, let's say that we have a production such as B->bB; so
  y=b).
• Let's call the substring that is read from the point where we used
  that B->bB production for the first time to the end of the string, z (in
  this case z can result from B->b or even be empty and correspond to
  no production).
• But since the middle substring, y, is the result of applying a recursive
  production (in this example, B->bB), we know that we can apply this
  production arbitrarily many times and thus make n in y^n arbitrarily
  large.


Preliminaries – Regular Languages
•   So how would we use the Pumping theorem to prove that the language
    containing the center embedding sentences shown above
    cannot be regular?
•   Similar language: L={a^n b^n | n>=0}
•   ab
•   aabb
•   aaabbb
•   aaaabbbb
•   ….
•   This language does not contain any strings in which the number of b's does
    not equal the number of preceding a's, or which include any a's after b's:
•   *a
•   *aab
•   *abb
•   *abab



Preliminaries – Regular Languages
• Imagine that L={a^n b^n | n>=0} were a regular
  language.
• Then there would be some string xyz, such that
  y is not the empty string, and x y^n z is in the
  language for all n>=0.
• The substring y is the substring created by the
  recursive rule and it therefore cannot contain
  both a's and b's because we'd end up with a's
  following b's when we pump the string
• So the substring y must consist entirely of a's or
  entirely of b's.

Preliminaries – Regular Languages
• If y consists entirely of a's then all the b's are in z.
• But every time we apply the recursive rule that created y
  we get at least one more a and since z is fixed we cannot
  increase it by the same number of b's, so it will always
  be possible to get a greater number of a's than b's.
• If y consists entirely of b's then all the a's are in x.
  But every time we apply the recursive rule that created y
  we get at least one more b and since x is fixed we cannot
  increase it by the same number of a's, so it will always
  be possible to get a greater number of b's than a's.
• So there is no string x y^n z that satisfies the conditions for
  a regular language. So L={a^n b^n | n>=0} is not a regular
  language.
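
The case analysis above can be illustrated by brute force. The sketch below is only an illustration, not a proof: it tries every split of a few sample strings a^k b^k into x, y, z with y not empty, and tests only the exponents n=0 and n=2 (the function names are made up for this example). No split survives:

```python
def in_anbn(s):
    """True if s is a^n b^n for some n >= 0."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def pumpable_splits(w, exponents=(0, 2)):
    """Return every split w = x y z, with y nonempty, for which x y^n z stays
    in the language for each tested exponent n."""
    found = []
    for i in range(len(w) + 1):
        for j in range(i + 1, len(w) + 1):
            x, y, z = w[:i], w[i:j], w[j:]
            if all(in_anbn(x + y * n + z) for n in exponents):
                found.append((x, y, z))
    return found


for k in range(1, 7):
    print(k, pumpable_splits("a" * k + "b" * k))
```

Testing n=0 rules out a y made only of a's or only of b's; testing n=2 rules out a y that mixes both, since pumping it puts a's after b's.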

Preliminaries – Regular Languages
• Now how does this relate to the center
  embedding examples?
• The set of sentences that exhibit center
  embedding as described above may be viewed
  as a special case of the L={a^n b^n | n>=0} language,
  with nouns being a's and verbs being b's (ignoring determiners), or
  L={(cat|dog|mouse)^n (chased|scared|squeaked)^n |
  n>=0}.
• There is no way of writing a regular grammar
  that accepts strings with an arbitrarily large
  number of nouns followed by exactly the same
  number of verbs.
Preliminaries – Regular Languages
• So we have shown that the subset of English
  which consists of these types of sentences is not
  regular.
• But this doesn't in itself prove that English as a whole
  is not regular.
• In order to show that English is not regular we
  need to use a few more steps in our proof.
• Regular languages are closed under
  intersection. This means that intersecting a
  regular language with a regular language
  produces a regular language.

Preliminaries – Regular Languages
• If English were a regular language then intersecting it
  with some other regular language would result in a
  regular language. We will try to find some regular
  language and show that intersecting it with English
  results in
  L={(cat|dog|mouse)^n (chased|scared|squeaked)^n | n>=0}
  which we have already shown is not a regular language.
• What language when intersected with English would
  produce
  L={(cat|dog|mouse)^n (chased|scared|squeaked)^n | n>=0}?
• L={(cat|dog|mouse)* (chased|scared|squeaked)*} is
  clearly regular and intersecting it with English results in
  L={(cat|dog|mouse)^n (chased|scared|squeaked)^n | n>=0}.
• So English is not regular.
                      Brief History
• Chomsky (1963) – NLs are not regular or CF. Proposed the
  transformational grammar model as an alternative
• The notion that all NL phenomena are regular was put to rest. But
  the inadequacy of context free grammars for handling NL proved
  more controversial. Chomsky's proofs of the non-CF-ness of NL
  are not generally accepted.
• Peters & Ritchie (1973) showed that Chomsky's transformational
  grammar framework was powerful enough to describe any
  recursively enumerable set - perhaps too powerful.
• Until 1985, all the arguments for the claim that NLs are not CF were
  shown to be flawed (see alleged counterexamples debunked in
  Pullum & Gazdar (1982))
• Shieber (1985) provides the first syntactic counterexample to the
  claim that CFGs are powerful enough to generate NL. Shieber's
  argument has survived to this day.


                                Dutch
• Dutch was initially introduced as a counterexample
  but later dismissed. The example and the explanation of
  why it is not a counterexample are presented in Bresnan,
  Kaplan, Peters & Zaenen (1982).
• Dutch has the following structures:
•   …dat Jan Marie Piet de kinderen zag        helpen laten      zwemmen
•    that Jan Marie Piet the children see-past help-inf make-inf swim-inf
• '…that Jan saw Marie help Piet make the children swim'
• The structure is:
• …that NP1 NP2 NP3 NP4 V1 V2 V3 V4



                        Dutch
• Arbitrarily many of these NP V pairs may be inserted to
  form longer sentences.
• The number of verbs and NPs must be the same.
• The first verb has to be tensed and it must agree with the
  first NP.
• All the other verbs have to be infinitives.
• The subcategorization constraints between the final NP
  and final verb must be satisfied.
• This is an example of cross-serial dependencies.
• A language whose strings must be checked for arbitrarily long
  cross-serial dependencies (e.g. a^m b^n c^m d^n) is not a CFL.

   Center Embedding (nested dependencies):
                            1 2 3 … 3 2 1

   Cross Serial Dependencies (crossed dependencies):
                            1 2 3 … 1 2 3
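
One way to see the difference between the two patterns is to ask which data structure pairs each noun with its verb. The following sketch is an illustration under my own assumptions (the function names and the labels N1, V1 and so on are made up). Nested dependencies come out of a stack, which is the memory a pushdown automaton has; crossed dependencies need a queue:

```python
from collections import deque


def pair_center_embedded(nouns, verbs):
    """Nested pattern N1 N2 N3 V3 V2 V1: the most recent noun takes the next verb."""
    stack = list(nouns)
    return [(stack.pop(), verb) for verb in verbs]


def pair_cross_serial(nouns, verbs):
    """Crossed pattern N1 N2 N3 V1 V2 V3: the earliest noun takes the next verb."""
    queue = deque(nouns)
    return [(queue.popleft(), verb) for verb in verbs]


nested = pair_center_embedded(["N1", "N2", "N3"], ["V3", "V2", "V1"])
crossed = pair_cross_serial(["N1", "N2", "N3"], ["V1", "V2", "V3"])
print(nested)
print(crossed)
```

Both calls pair each noun with the verb carrying the same index; only the order in which the nouns are retrieved differs.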




     Context Free Languages
• L={a^m b^n c^m d^n | m,n>=0} is not a CFL.
• We can use the pumping theorem for CFLs to
  show this.
• If L is an infinite CFL, then there is some
  constant K such that any string w in L longer
  than K can be factored into substrings w=uvxyz
  such that v and y are not both empty, vxy is no
  longer than K, and u v^n x y^n z is in L for all n>=0.
• What does this mean?

       Context Free Languages
•   Let's look first at L={a^n b^n | n>=1}
•   One CFG for this L would be:
•   S->aSb
•   S->ab
•   Take u=empty, v=a, x=ab, y=b, z=empty. Then u v^n x y^n z gives:
•   n=0: ab
•   n=1: aabb
•   n=2: aaabbb
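
The same pumping can be written out as a few lines of code. This is a minimal sketch with made-up helper names: it builds u v^n x y^n z for the factoring u=empty, v=a, x=ab, y=b, z=empty and checks that each result is in L={a^n b^n | n>=1}:

```python
def in_anbn(s):
    """True if s is a^n b^n for some n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def pump(u, v, x, y, z, n):
    """Return the string u v^n x y^n z."""
    return u + v * n + x + y * n + z


u, v, x, y, z = "", "a", "ab", "b", ""
pumped = [pump(u, v, x, y, z, n) for n in range(5)]
all_in_language = all(in_anbn(s) for s in pumped)

print(pumped)
print(all_in_language)
```

Unlike the regular case, two substrings are pumped in step, which is exactly what the recursive production S->aSb does.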

     Context Free Languages
• We can show that a^m b^n c^m d^n is not CF by
  using the pumping theorem.
• If a^m b^n c^m d^n were a CFL then there would
  be some constant K such that any string in
  L longer than K, say a^K b^K c^K d^K, for
  example, could be written as w=uvxyz
  such that v and y are not both empty and v
  and y are pumpable.

      Context Free Languages
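• v can't consist of both a's and b's because when
  pumped it would produce strings with a's after
  b's. Similarly, it cannot consist of both b's and
  c's or both c's and d's. The same goes for the
  other pumpable term, y.
• So v must consist entirely of a's or entirely of b's
  or entirely of c's or entirely of d's, and likewise
  for y. Pumping a's and c's together, or b's and d's
  together, would keep the string in L, but the full
  pumping theorem also requires that the stretch vxy
  be no longer than K. In a^K b^K c^K d^K the a's and
  c's (and the b's and d's) are too far apart for that,
  so v and y can only touch one block or two
  adjacent blocks. Pumping then changes the number of
  a's without changing the c's, or the b's without the
  d's (or vice versa), and the result is not in L.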
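
The argument can be illustrated by exhaustive search on one small string. The sketch below makes several assumptions of its own: K=3 is an arbitrary illustrative constant (the real K depends on the grammar), only the exponents n=0 and n=2 are tested, and all names are made up. It relies on the standard form of the pumping theorem for CFLs, in which the stretch vxy is no longer than K:

```python
import re

BLOCKS = re.compile(r"(a*)(b*)(c*)(d*)")


def in_language(s):
    """True if s is a^m b^n c^m d^n for some m, n >= 0."""
    match = BLOCKS.fullmatch(s)
    if match is None:
        return False
    a, b, c, d = (len(group) for group in match.groups())
    return a == c and b == d


def pumpable_factorings(w, max_vxy=None, exponents=(0, 2)):
    """Return every factoring w = u v x y z, with v and y not both empty and
    (if max_vxy is given) len(vxy) <= max_vxy, for which u v^n x y^n z stays
    in the language for each tested exponent n."""
    found = []
    size = len(w)
    for v_start in range(size + 1):
        for v_end in range(v_start, size + 1):
            for y_start in range(v_end, size + 1):
                for y_end in range(y_start, size + 1):
                    u, v = w[:v_start], w[v_start:v_end]
                    x, y, z = w[v_end:y_start], w[y_start:y_end], w[y_end:]
                    if not v and not y:
                        continue
                    if max_vxy is not None and y_end - v_start > max_vxy:
                        continue
                    if all(in_language(u + v * n + x + y * n + z) for n in exponents):
                        found.append((u, v, x, y, z))
    return found


K = 3  # illustrative constant, chosen only to keep the search small
w = "a" * K + "b" * K + "c" * K + "d" * K
bounded = pumpable_factorings(w, max_vxy=K)
unbounded = pumpable_factorings(w)
print(len(bounded))
print(len(unbounded) > 0)
```

With the length bound no factoring survives. Without it, factorings such as v=a, y=c do survive, which shows that the bound on vxy is what makes the proof go through.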
                  Dutch
• The Dutch example seems to exhibit the
  same cross serial dependencies we just
  showed could not be handled by a CFG.
• However, it is possible to write a CFG that
  would accept these Dutch strings.




                                         Dutch
•   We can divide the verbs as follows:
•   1. V-index: Form infinitive, Subcats for a subject (swim)
•   2. V-tensed: Form tensed, Subcats for a subject it agrees with and an S' or S'' complement
    without complementizer (saw)
•   3. V-infinitive: Form infinitive, Subcats for a subject and an S' or S'' complement without
    complementizer (help, make)
•   The productions are:
•   1. S->NP-agr S'-agr-index V-index
•   2. S'-agr-index->NP S'-agr-index V-infinitive
•   3. S'-agr-index->NP S''-agr-index V-infinitive
•   4. S''-agr-index->NP-index V-tensed
•   5. NP-index -> Jan|Piet|Marie|the children
•   6. V-tensed->saw
•   7. V-index->swim
•   8. V-infinitive-> help|make
•   These productions would accept the example sentence as well as the following sentences:
•   …that Jan Marie Piet the children see-past make-inf help-inf swim-inf
•   …that Jan Marie the children Piet see-past help-inf make-inf swim-inf
•   …that Jan Marie the children Piet see-past make-inf help-inf swim-inf
•   These are perfectly grammatical.
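
The productions above can be turned into a small recognizer. This is a simplified sketch under stated assumptions: the -agr and -index features are dropped, "the children" is treated as a single token, the English glosses from productions 5-8 stand in for the Dutch words, and the function names are made up. Each function mirrors one nonterminal:

```python
NP = {"Jan", "Piet", "Marie", "the children"}
V_TENSED = {"saw"}
V_INDEX = {"swim"}
V_INFINITIVE = {"help", "make"}


def is_s2(tokens):
    """S'' -> NP V-tensed"""
    return len(tokens) == 2 and tokens[0] in NP and tokens[1] in V_TENSED


def is_s1(tokens):
    """S' -> NP S' V-infinitive | NP S'' V-infinitive"""
    if len(tokens) < 4 or tokens[0] not in NP or tokens[-1] not in V_INFINITIVE:
        return False
    inner = tokens[1:-1]
    return is_s1(inner) or is_s2(inner)


def is_s(tokens):
    """S -> NP S' V-index"""
    return (
        len(tokens) >= 6
        and tokens[0] in NP
        and tokens[-1] in V_INDEX
        and is_s1(tokens[1:-1])
    )


example = ["Jan", "Marie", "Piet", "the children", "saw", "help", "make", "swim"]
too_many_verbs = ["Jan", "Marie", "Piet", "saw", "help", "make", "swim"]
print(is_s(example))
print(is_s(too_many_verbs))
```

Note how the recognizer peels off the first NP together with the last verb at every step. It accepts the right strings, but the structure it builds is nested rather than cross-serial.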




                                Dutch
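• How come this works? Note that the only items
  we care about are the ones shown in bold on the
  original slide: the first NP with the tensed verb,
  and the final NP with the final verb.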

•   …that Jan Marie Piet the children see-past help-inf make-inf swim-inf
•   '…that Jan saw Marie help Piet make the children swim'
•   …that Jan Marie Piet the children see-past make-inf help-inf swim-inf
•   '…that Jan saw Marie make Piet help the children swim'
•   …that Jan Marie the children Piet see-past help-inf make-inf swim-inf
•   '…that Jan saw Marie help the children make Piet swim'
•   …that Jan Marie the children Piet see-past make-inf help-inf swim-inf
•   '…that Jan saw Marie make the children help Piet swim'


                              Dutch
• We can recognize and generate all the grammatical strings with this
  grammar because the number of items we need to cross-reference
  is finite and all the other items are interchangeable syntactically.
• Note that the sentences above are all grammatical but each has a
  different interpretation in Dutch.
• The final tree structure of each sentence will only reflect the order of
  the words in the sentence and not all the cross-serial dependencies.
• So we can write a grammar to recognize and generate all these
  strings but not to assign a structure to them that will preserve the
  cross-serial dependencies.
• So the grammar above weakly generates the cross-serial examples
  but doesn't strongly generate them.
• This is sufficient if we are interested in classifying sentences as
  grammatical or ungrammatical but is it sufficient for semantic
  interpretation? Probably not.


                Swiss German
• How is the Swiss German example set
  presented by Shieber (1985) crucially different
  from the Dutch example set?
• …mer em Hans    es huus         hälfed aastriiche
• …we  Hans-DAT   the house-ACC   helped paint
• '…we helped Hans paint the house.'
• …mer d'chind          em Hans    es huus         lönd hälfe aastriiche
• …we  the-children-ACC Hans-DAT   the house-ACC   let  help  paint
• '…we let the children help Hans paint the house.'

                Swiss German
• …we the-children-ACC Hans-DAT the house-ACC let   help paint


• In Swiss German verbs subcategorize for NPs with
  specific cases.
• Some verbs subcategorize for accusative NPs and some
  verbs subcategorize for dative NPs
• The number of verbs subcategorizing for accusative
  case NPs must be the same as the number of
  accusative NPs in the sentence and the number of verbs
  subcategorizing for dative case NPs must be the same
  as the number of dative NPs in the sentence.


            Swiss German
• Why can't we produce arbitrarily long
  sentences of this type with a CFG?
• Simplest case: all the accusatives precede
  all the datives:
• NP_ACC^m NP_DAT^n V_ACC^m V_DAT^n
• This is the same as a^m b^n c^m d^n which is
  non-CF, as shown earlier.

                Swiss German
• The Swiss German strings with all the accusatives
  preceding all the datives may be presented as
• …NP_ACC^m NP_DAT^n … V_ACC^m V_DAT^n …
• which has the same form as:
• a^m b^n c^m d^n which has been shown not to be CF.
• CFLs are closed under intersection with regular
  languages.
• If Swiss German were CF then intersecting it with the
  regular language w a* b* x c* d* y (where w, x and y are
  the fixed strings around the NPs and verbs) would yield a CFL.
• But the intersection, w a^m b^n x c^m d^n y, is not CF, so Swiss
  German cannot be CF.
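
The intersection step can be mimicked on abstract symbol strings. The sketch below is a toy stand-in, not a grammar of Swiss German: a and b stand for accusative and dative NPs, c and d for verbs taking accusative and dative objects, w, x and y for the fixed material, and the function names are invented. The regular filter fixes the order of the blocks, and the case requirement then forces the counts to match:

```python
import re

# Regular filter: fixed strings w, x, y around freely repeated a, b, c, d blocks.
REGULAR_FILTER = re.compile(r"wa*b*xc*d*y")


def passes_filter(s):
    """True if s is in the regular language w a* b* x c* d* y."""
    return REGULAR_FILTER.fullmatch(s) is not None


def case_counts_match(s):
    """Toy stand-in for the case requirement: as many accusative NPs (a) as
    accusative-taking verbs (c), and as many dative NPs (b) as dative-taking
    verbs (d)."""
    return s.count("a") == s.count("c") and s.count("b") == s.count("d")


def in_intersection(s):
    """True if s passes the regular filter and satisfies the case requirement,
    i.e. s has the form w a^m b^n x c^m d^n y."""
    return passes_filter(s) and case_counts_match(s)


for candidate in ["waabxccdy", "waabxcddy", "wabaxccdy", "wxy"]:
    print(candidate, in_intersection(candidate))
```

Every string that gets through both checks has the form w a^m b^n x c^m d^n y, which is the non-CF pattern from the earlier slides.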

                Swiss German
• Shieber's argument rests on the impossibility of writing a
  CFG that would ensure that the total numbers of
  accusatives and datives match, and not on the order
  they appear in.
• This argument is SUFFICIENT for Swiss German's
  status as a non-CFL.
• The argument does NOT DEPEND on the ORDER of the
  inner NPs and inner Verbs.
• Surprising.
• Orders other than NP1 NP2 NP3..NPn V1 V2 V3 ..Vn are
  acceptable.
• So, Polish and other similar examples provide exactly
  the same counterexamples that Swiss German does.

              Conclusions
• If I am right then one important point to
  take away from this: the often cited Swiss
  German example is by no means unique.
  It just happens to be the first published
  counterexample to the claim that NL syntax is
  CF. Once the inadequacy of CFGs has been
  shown for one language, there is no need
  to show it for other languages.

                Conclusions
• Another important point is that the proof shows
  that Swiss German, and thus NL, is not even
  weakly CF.
• In CL we are not interested in merely
  recognizing strings, so weak generative capacity
  without strong generative capacity is of limited
  use to us. Plenty of examples had been
  presented before Shieber's paper to support the
  claim that NLs are not strongly CF, so its
  importance is perhaps more theoretical than
  practical from a CL point of view.
                                    References
•   Bresnan, Kaplan, Peters & Zaenen (1982) - Joan Bresnan, Ron Kaplan, Stanley
    Peters, and Annie Zaenen. 1982. Cross-serial dependencies in Dutch. Linguistic
    Inquiry, 13(4):613-635.
•   Chomsky (1963) - Noam Chomsky. 1963. Formal properties of grammar. In R.D. Luce,
    R.R. Bush and E. Galanter (eds), Handbook of Mathematical Psychology, vol. II.
    New York: Wiley, pp. 323-418.
•   Partee (1993) - Barbara H. Partee, Alice ter Meulen, and Robert Wall. 1993.
    Mathematical Methods in Linguistics. Corrected second printing of the
    first edition (Studies in Linguistics and Philosophy).
•   Peters & Ritchie (1973) - Stanley Peters and R. Ritchie. 1973. On the generative
    power of transformational grammars. Information Sciences, 6:49-83.
•   Pullum & Gazdar (1982) - Geoffrey K. Pullum and Gerald Gazdar. 1982. Natural
    languages and context-free languages. Linguistics and Philosophy, 4:471-504.
•   Savitch (1987) - Walter J. Savitch, Emmon Bach, William Marsh, and Gila
    Safran-Naveh. 1987. The Formal Complexity of Natural Language. D. Reidel.
•   Shieber (1985) - Stuart M. Shieber. 1985. Evidence against the context-freeness of natural
    language. Linguistics and Philosophy, 8:333-343.
    http://www.eecs.harvard.edu/~shieber/Biblio/Papers/shieber85.pdf