Push-Down Automata
COMP2600: Formal Methods for Software Engineering
Clem Baker-Finch
Australian National University
Semester 2, 2008

Languages and Automata

Recall that to define a language we can either:

1. Give a set of rules (i.e. a grammar) to produce all the legal strings (sentences) of the language.
2. Provide a machine (i.e. an algorithm) to recognise all the sentences of the language.

There is a close relationship between the two approaches. Commonly we define a language by giving a grammar and then base parsers (or compilers) on the corresponding machine.

The machines are automata like Turing machines, but constrained in the same sense as the Chomsky hierarchy.

General Structure of Automata

[Figure: an input tape a0 a1 a2 … an scanned by a read head, connected to a finite state control and an auxiliary memory]

The input tape is a sequence of tokens. Each time a symbol is processed, the read head advances.
The auxiliary memory is usually a linear organisation (e.g. a stack).
The memory alphabet is usually Vt ∪ Vn (the terminal and non-terminal symbols).
The finite state control (FSC) can be in any one of a finite number of states.

General Automata ctd

Each action of the machine may:
• change the FSC state,
• change the auxiliary memory,
• advance to the next input symbol.

The action of the machine depends on:
• the current FSC state,
• the current input symbol,
• the current memory symbol(s).

The machine starts in some particular start state (q0), with the read head at the first input symbol (a0) and with the memory empty.

A machine accepts an input string as a sentence of the language if it reaches a goal state with the input exhausted.

Automata and Grammars

The kind of auxiliary memory in a machine determines the class of languages that the machine can recognise:

Language class       Memory
regular              none
context-free         stack
context-sensitive    tape (bounded by input length)
unrestricted         unbounded tape

We have already looked at Finite State Automata (i.e. automata without memory) and their relation to regular languages. We now consider Push-Down Automata (i.e. automata with stack memory) and their relation to context-free grammars and languages.

Push-down Automata (PDA)

[Figure: an input tape a0 a1 a2 … an scanned by a read head, connected to a finite state control and a stack memory holding z1 z2 … zk, with zk on top]

PDAs ctd

Each action of the machine may involve:
• a change to the FSC state,
• pushing or popping the stack,
• advancing to the next input symbol.

The action of the machine may depend on:
• the current FSC state,
• the current input symbol,
• the current top-of-stack symbol.

The machine accepts an input string if it reaches a specified goal state with the input exhausted and the stack empty.

Example {aⁿbⁿ | n ∈ N}

Recall that this language cannot be recognised by a FSA: an FSA has only a finite number of states, so it cannot remember an unbounded count of a's. But it can be recognised by a PDA.

Ad hoc design:
• phase 1 (state q1): push the a's onto the stack;
• phase 2 (state q2): pop one a for each b on the input;
• finalise: if the stack is empty and the input is exhausted in the goal state (q3), accept the string.

Example ctd

PDA transitions modify the stack as well as change the FSC state, so we write transitions as a function δ of type:

δ : (state, input token, tos) → (state, string)

where tos is the top-of-stack symbol. The string in the result is the sequence of symbols that replaces the top-of-stack symbol, leftmost symbol on top. (This notational device makes it simple to specify pushes and pops in a uniform way: replacing the top symbol X by ε pops it, and replacing it by aX pushes a.) An input token of ε means the transition reads no input.

To simplify (the notation for) testing for an empty stack, assume a marker symbol Z is initially on the stack.

Example ctd

PDA to recognise aⁿbⁿ:

δ(q0, a, Z) = q1/aZ    ··· push first a
δ(q1, a, a) = q1/aa    ··· push a's
δ(q1, b, a) = q2/ε     ··· start popping a's
δ(q2, b, a) = q2/ε     ··· pop a's
δ(q2, ε, Z) = q3/ε     ··· accept

Example ctd: PDA Trace

PDA configurations can be written as a triple (state, remaining input, stack), with the top of stack to the left.

(q0, aaabbb, Z) ⇒ (q1, aabbb, aZ)
                ⇒ (q1, abbb, aaZ)
                ⇒ (q1, bbb, aaaZ)
                ⇒ (q2, bb, aaZ)
                ⇒ (q2, b, aZ)
                ⇒ (q2, ε, Z)
                ⇒ (q3, ε, ε)

The machine halts in the goal state with the input exhausted and the stack empty, so the string is accepted.

Example ctd: Rejection

The string aaba should be rejected by the PDA:

(q0, aaba, Z) ⇒ (q1, aba, aZ)
              ⇒ (q1, ba, aaZ)
              ⇒ (q2, a, aZ)
              ⇒ ???

No transition applies, and the PDA is "stuck" without reaching a goal state.

Grammars and PDAs

Theorem: The class of languages recognised by PDAs is exactly the class of context-free languages.

We will only justify this result in one direction: for any CFG, there is a corresponding PDA. This is the most interesting direction, since it is the basis of automatically deriving parsers from grammars.

From CFG to PDA

The translation uses three states: q0 (initial), q1 (processing), q2 (goal).

1. For all terminal symbols t, pop the stack if it matches the input:
   δ(q1, t, t) = q1/ε
2. If a non-terminal is on top of the stack, expand it to one of its right-hand sides. For all productions A → α:
   δ(q1, ε, A) = q1/α

From CFG to PDA, ctd

3. Initialise the process by pushing S onto the stack. For start symbol S:
   δ(q0, ε, Z) = q1/SZ
4. For termination, add the transition:
   δ(q1, ε, Z) = q2/ε

In general we get a non-deterministic PDA, since there may be several productions for each non-terminal. Unfortunately, there is no general algorithm for obtaining an equivalent deterministic PDA from a non-deterministic one: unlike finite automata, deterministic PDAs recognise a strictly smaller class of languages than non-deterministic PDAs.

Example: Derive a PDA for a CFG

E → T | E + T
T → F | T ∗ F
F → id | (E)

1. Match and pop terminals:
   δ(q1, +, +) = q1/ε
   δ(q1, ∗, ∗) = q1/ε
   δ(q1, id, id) = q1/ε
   δ(q1, (, () = q1/ε
   δ(q1, ), )) = q1/ε

CFG to PDA ctd

2. Expand non-terminals:
   δ(q1, ε, E) = q1/T
   δ(q1, ε, E) = q1/E + T
   δ(q1, ε, T) = q1/F
   δ(q1, ε, T) = q1/T ∗ F
   δ(q1, ε, F) = q1/id
   δ(q1, ε, F) = q1/(E)

3, 4. Initiate and terminate:
   δ(q0, ε, Z) = q1/EZ
   δ(q1, ε, Z) = q2/ε

Example Parse

(q0, id ∗ id, Z) ⇒ (q1, id ∗ id, EZ)
                 ⇒ (q1, id ∗ id, TZ)
                 ⇒ (q1, id ∗ id, T ∗ FZ)
                 ⇒ (q1, id ∗ id, F ∗ FZ)
                 ⇒ (q1, id ∗ id, id ∗ FZ)
                 ⇒ (q1, ∗ id, ∗ FZ)
                 ⇒ (q1, id, FZ)
                 ⇒ (q1, id, idZ)
                 ⇒ (q1, ε, Z)
                 ⇒ (q2, ε, ε)
                 ⇒ accept

Example Parse, ctd

Notes:
• The parse was guided through the non-determinism (by me, the Oracle) to always make the correct choice towards a successful parse.
• In practical terms, states q0 and q2 and the initialisation and termination transitions are unnecessary.
• The stack always contains the unmatched part of the sentential form: the input consumed so far, followed by the stack contents, is a sentential form of a leftmost derivation.

A Context-Sensitive Language

Just for completeness, a brief look at context-sensitive languages. The following language is not context-free:

{aⁿbⁿcⁿ | n ∈ N}

Intuitively, we can imagine a CFG generating either the ab pairs or the bc pairs, but this language requires us to keep the generation process in step at two different points in the sentential forms.

A context-sensitive grammar is on the next slide. Each production is of the form

αAβ → αγβ

That is, A can go to γ provided it is in the context α_β.

S → aRc
R → aRT | b
bTc → bbcc
bTT → bbUT
UT → UU
UUc → VUc
VUc → Vcc
UV → VV
bVc → bbcc
bVV → bbWV
WV → WW
WWc → TWc
TWc → Tcc
WT → TT

The trick is to use non-terminals as markers and to shift and convert them to tokens. For example:

S ⇒ aRc ⇒ aaRTc ⇒ aaaRTTc ⇒ aaabTTc ⇒ aaabbUTc ⇒ aaabbUUc ⇒ aaabbVUc ⇒ aaabbVcc ⇒ aaabbbccc

CSGs and Automata

The automata that recognise context-sensitive languages (linear bounded automata) have a tape memory of length bounded by a linear function of the length of the input …