CT206 Languages and their Implementation
Selected Revision Exercises
Sample Solutions

Lecture 3. Exercises:

Q3.
  varlexeme id comma id semicolon
  beginlexeme
  id assign const semicolon
  id assign id semicolon
  id assign const addlexeme const semicolon
  endlexeme dot

4. a) The NFA reflects exactly the construction of the regular expression, i.e. more complex NFAs are built from simpler NFAs using the language operators. The steps involved are:
  i) Separate the expression into its two alternatives: aa* and bb*
  ii) The NFA to recognise 'a' is trivial:
        i --a--> f
  iii) The NFA to recognise a* is:
        [diagram: states i, 1, 2, f; one transition labelled a]
  iv) The NFA to recognise aa* is:
        [diagram: node labels i, 0, 1, 2, 3; two transitions labelled a]
      (NFA for bb* to be added here as revision exercise)
  v) Part of the NFA to recognise the whole expression is shown below; the rest is to be added as a revision exercise.
        [diagram: states 0, 1, 2, 3, 4, 5, 10, 11; two transitions labelled a]
      (Rest of above NFA to be added here as revision exercise)

b) The equivalent DFA is built using the subset construction algorithm described in the notes. Consider the following argument:

A transition table for an NFA would, in general, have entries which represent sets of states, whereas for a DFA each entry is a single state. To convert an NFA which recognises a given RE into a DFA which recognises the same RE, we see that each DFA state corresponds to a set of NFA states. For example, consider a DFA in state d1 (corresponding to the set of NFA states n1, n2, n5, n8). After reading an alphabet symbol, e.g. 'a', the DFA moves into the state d2. This will correspond to the set of NFA states that can be reached on 'a' from n1, n2, n5 and n8.
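The correspondence between DFA states and sets of NFA states can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm as given in the notes. The edge list is an assumption, chosen to be consistent with the closure sets quoted in this solution, and it covers only the 'a' branch of the NFA (the 'b' branch is a revision exercise and is deliberately left out). The names `NFA`, `EPS`, `eps_closure`, `move` and `subset_construction` are hypothetical.

```python
from collections import deque

EPS = ""  # label used in this sketch for epsilon-transitions (an assumption)

# Partial NFA for aa*|bb*: 'a' branch only. The edges are assumed; they are
# chosen so that the closure sets match those quoted in the solution.
NFA = {
    (0, EPS): {1, 6},
    (1, "a"): {2},
    (2, EPS): {3, 5},
    (3, "a"): {4},
    (4, EPS): {3, 5},
    (5, EPS): {11},
}


def eps_closure(states):
    """All NFA states reachable from `states` by epsilon-transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` on a single `symbol` transition."""
    result = set()
    for s in states:
        result |= NFA.get((s, symbol), set())
    return frozenset(result)


def subset_construction(start, alphabet):
    """Return (start_state, transitions); each DFA state is a frozenset."""
    start_set = eps_closure({start})
    transitions = {}
    seen = {start_set}
    pending = deque([start_set])
    while pending:
        current = pending.popleft()
        for symbol in alphabet:
            target = eps_closure(move(current, symbol))
            if not target:
                continue  # empty set: no transition on this symbol
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return start_set, transitions


A, table = subset_construction(0, "ab")
B = table[(A, "a")]
D = table[(B, "a")]
```

With these assumed edges, `A`, `B` and `D` are the sets {0, 1, 6}, {2, 3, 5, 11} and {3, 4, 5, 11}. Using frozensets lets a set of NFA states serve directly as a dictionary key, which is what makes "a DFA state is a set of NFA states" cheap to express.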
The subset construction algorithm requires the following functions:
  ε-closure(s)
  ε-closure(T)
  move(T, a)
(see notes)

Compute the initial state of the DFA, which corresponds to the set of NFA states that can be reached from the initial NFA state by ε-transitions:
  ε-closure(state 0) = {0, 1, 6}            state A
This will be DFA state A.

Compute the sets of states that can be reached from the states in A on transitions on 'a' and 'b':
  ε-closure(move(A, a))
move(A, a) is the set of states reachable from the states in A on 'a' transitions alone, thus:
  move(A, a) = {2}
  ε-closure(move(A, a)) = {2, 3, 5, 11}     state B
This will be state B.
(State C to be added here as revision exercise)

Note that because DFA states B and C contain NFA state 11 (the accepting or final state), B and C are accepting states of the DFA.

The transition table thus far is:
         a    b
  A      B    C

Compute the transitions that occur when the DFA is in state B and reads 'a' and 'b', and the same for state C:
  move(B, a) = {4}    ε-closure(move(B, a)) = {4, 3, 5, 11}     state D
  move(B, b) = {}     ε-closure(move(B, b)) = {}
  (State E to be added here as revision exercise)
  move(D, a) = {4}    ε-closure({4}) = {4, 3, 5, 11}            state D
  move(D, b) = {}     ε-closure(move(D, b)) = {}
  move(E, a) = {}     ε-closure(move(E, a)) = {}
  move(E, b) = {9}    ε-closure(move(E, b)) = {9, 10, 11, 8}    state E

No new states have been added, so the computation is finished.
(Full transition table to be added here as revision exercise)

The DFA is:
  [diagram: states A, B, C, D, E; transitions A -a-> B, B -a-> D, D -a-> D, A -b-> C, C -b-> E, E -b-> E]

The conversion algorithm does not necessarily give the optimum DFA, i.e. the DFA with the minimum number of states.
(Reduced DFA to be added here as revision exercise)

Lecture 4. Exercises:

1. a) An s-grammar is a type 2 grammar whose production right-hand sides all start with a terminal symbol. The alternative productions for a non-terminal all start with different terminals, and no alternative is the empty string.
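The three conditions in this definition can be checked mechanically. The following Python sketch is illustrative only; the function name `is_s_grammar` and the dictionary representation of a grammar are assumptions. `S_GRAMMAR` is the modified grammar given in this solution, and `NOT_SIMPLE` is a hypothetical counter-example.

```python
def is_s_grammar(grammar, nonterminals):
    """grammar maps each non-terminal to a list of alternatives,
    each alternative being a list of grammar symbols."""
    for alternatives in grammar.values():
        leading = []
        for alt in alternatives:
            if not alt:
                return False              # empty alternative
            if alt[0] in nonterminals:
                return False              # right-hand side starts with a non-terminal
            leading.append(alt[0])
        if len(set(leading)) != len(leading):
            return False                  # two alternatives share a leading terminal
    return True


NONTERMINALS = {"S", "A", "B"}

# The modified grammar from this solution.
S_GRAMMAR = {
    "S": [["p", "A", "b"]],
    "A": [["c", "d", "e", "B"], ["d", "e", "f"]],
    "B": [["e", "A"]],
}

# Hypothetical grammar that breaks two of the conditions.
NOT_SIMPLE = {
    "S": [["A", "b"]],
    "A": [["a"], []],
}
```

Here `is_s_grammar(S_GRAMMAR, NONTERMINALS)` holds, while `NOT_SIMPLE` fails because S starts with a non-terminal (and A has an empty alternative).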
An LL(1) grammar is also a type 2 grammar, but it can contain production rules whose right-hand sides start with a non-terminal symbol. The restrictions on LL(1) grammars can be described by the following general production:
  A -> α | β
The restrictions are:
  - For no terminal 'a' do both α and β derive strings starting with 'a'
  - At most one of α and β can derive the empty string
s-grammars are a proper subset of LL(1) grammars.

The grammar in the question is LL(1) but not an s-grammar: the right-hand side of a production for 'A' begins with a non-terminal, so the grammar is not simple. To modify the grammar to be an s-grammar:
  S -> "p" A "b"
  A -> "c" "d" "e" B | "d" "e" "f"
  B -> "e" A

The LL(1) parsing table for this s-grammar can then easily be constructed, i.e. it has a row for each non-terminal and a column for each terminal. The production rule for each non-terminal is placed in the row for that non-terminal, in the column of the terminal which starts the string on the right-hand side of the production rule. Thus, production rule 1 is placed in the row for S in the column for p.

         b       c        d        e        f       p
  S                                                 prod 1
  A              prod 2   prod 3
  B                                prod 4

Using the LL(1) parsing algorithm to parse the string pcdeedfeb gives:

  input buffer     stack      rule
  ^pcdeedfeb$      $S         -
  ^pcdeedfeb$      $bAp       1
  p^cdeedfeb$      $bA        -
  p^cdeedfeb$      $bBedc     2
  pc^deedfeb$      $bBed      -
  pcd^eedfeb$      $bBe       -
  pcde^edfeb$      $bB        -
  pcde^edfeb$      $bAe       4
  pcdee^dfeb$      $bA        -
  pcdee^dfeb$      $befd      3
  pcdeed^feb$      $bef       -
  pcdeedf^eb$      $be        -
  pcdeedfe^b$      $b         -
  pcdeedfeb^$      $          -
  pcdeedfeb$^      empty      -

b) See notes on FIRST and FOLLOW sets.
The FIRST and FOLLOW sets are used in the construction of an LL(1) parser as follows:
Given that A -> α is a production with 'a' in FIRST(α), the parser will expand A by α when the current input lexeme is 'a'. A complication occurs when α -> ε or α ->* ε, i.e.
we should again expand A by α if the current symbol is in FOLLOW(A), or if the $ symbol has been reached and $ is in FOLLOW(A).

c) By the rules given in the associated lecture:
  FIRST(A): by part of rule 2 and the third production, FIRST(A) = {w}
  FIRST(S): by the first part of rule 2 and using an alternative of the first production, 'a' will be in FIRST(S). Using rule 3 and FIRST(A), 'w' is also in FIRST(S), thus FIRST(S) = {a, w}
  FOLLOW(S): by rule 1 for FOLLOW, = {$}
  FOLLOW(A): by rule 2 and using the first and second productions, = {z, y}
  FOLLOW(X): by rule 2 and the third production, 'b' is in FOLLOW(X); by rule 3 and using the first production and FOLLOW(S), '$' is in FOLLOW(X); thus FOLLOW(X) = {b, $}

Using the rules for parse table construction given in the lecture, the parse table is:

         a        b       w         x    y    z    $
  S      S->aX            S->Ay
  X               X->b    X->Az
  A                       A->wXb

2. Top-down parsing begins with the start symbol and attempts to derive the input string by the expansion of non-terminals, i.e. by substituting the right-hand side of a production rule for its non-terminal. At each step, the choice of production to apply is made solely on the current symbol.

To parse abbcde:

  input       derivation    rule
  ^abbcde     S             P1
  a^bbcde     a^AbcBe       P2 first alternative
  ab^bcde     ab^bcBe       P2 second alternative
  abb^cde     abb^cBe       none
  abbc^de     abbc^Be       none
  abbcd^e     abbcd^e       P3
  abbcde^     abbcde^       none

Bottom-up parsing starts with the input string and attempts to recognise substrings within the sentence which are the right-hand sides of production rules (handles). When recognised, substrings are replaced by the non-terminal on the left-hand side of the production rule. Common practice is to add a '$' to the end of the string.

To parse abbcde$:

  input        rule
  ^abbcde$
  a^bbcde$
  ab^bcde$
  a^Abcde$     P2 second alternative
  aAb^cde$
  aAbc^de$
  a^Ade$       P2 first alternative
  aA^de$
  aAd^e$
  aA^Be$       P3
  aAB^e$
  aABe^$
  ^S$          P1
  S^$          finished

3.
Eliminate the left-recursion in the production for 'decl':
  decl -> vname P
  P -> "," vname P | ε

The transformed grammar is then:
  production 1           prog -> "var" decl ";" stmt "."
  production 2           decl -> vname P
  production 3           P -> "," vname P
  production 4           P -> ε
  production 5           stmt -> vname ":=" digit
  productions 6, 7, 8    vname -> "x" | "y" | "z"
  productions 9, 10      digit -> "0" | "1"

  FIRST("var" decl ";" stmt ".") = {"var"}
  FIRST(vname P) = {"x", "y", "z"}
  (Rest of FIRST sets to be added here as revision exercise)

  FOLLOW(prog) = {$}
  FOLLOW(decl) = {";"}
  FOLLOW(stmt) = {"."}
  (Rest of FOLLOW sets to be added here as revision exercise)

          "var"  ","  ";"  "."  ":="  "x"  "y"  "z"  "0"  "1"  $
  prog    1
  decl                                2    2    2
  stmt                                5    5    5
  vname                               6    7    8
  digit                                              9    10
  P              3    4

Parse of the string: var x, y; x:= 0.

  input buffer            stack                   rule
  ^var x, y; x:= 0.$      $prog
  ^var x, y; x:= 0.$      $. stmt ; decl var      1
  var^ x, y; x:= 0.$      $. stmt ; P vname       2
  var^ x, y; x:= 0.$      $. stmt ; P x           6
  var x^, y; x:= 0.$      $. stmt ; P
  var x^, y; x:= 0.$      $. stmt ; P vname ,     3
  (Rest of parse to be added here as revision exercise)

4. FIRST(α) is the set of terminals which can start strings derivable from α.
FOLLOW(A) is the set of terminals which can immediately follow A in a sentential form.

  FIRST(Aa) = {b, a, d}
  FIRST(bB) = {b}
  FIRST(Cd) = {a, d}
  FIRST(ef) = {e}
  FIRST(aa) = {a}
  FIRST(ε) = {ε}

  FOLLOW(A) = {a}
  FOLLOW(S) = empty
  FOLLOW(B) = {a}
  FOLLOW(C) = {d}

  PROGRAM parse;
  TYPE ...;
  VAR ...
      lexeme: lex_kind;

  PROCEDURE get_lexeme(...); BEGIN ... END;
  PROCEDURE check(...); BEGIN ... END;

  PROCEDURE rec_A; FORWARD;   { rec_S calls rec_A before its body appears }

  PROCEDURE rec_S;            { S -> A a }
  BEGIN
    rec_A; check(a)
  END;

  PROCEDURE rec_A;            { A -> b B | C d }
    PROCEDURE rec_B;          { B -> e f }
    BEGIN
      check(e); get_lexeme; check(f); get_lexeme
    END;
    PROCEDURE rec_C;          { C -> a a; the empty alternative is handled in rec_A }
    BEGIN
      check(a); get_lexeme; check(a); get_lexeme
    END;
  BEGIN
    CASE lexeme OF
      a: BEGIN rec_C; check(d); get_lexeme END;
      b: BEGIN check(b); get_lexeme; rec_B END;
      d: BEGIN check(d); get_lexeme END   { C -> empty, since d is in FOLLOW(C) }
    END
  END;

  BEGIN
    get_lexeme; rec_S
  END.
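To experiment with the call sequence of such a parser, it can be mirrored in Python with a log of each call. This is a sketch only. The grammar S -> A a, A -> b B | C d, B -> e f, C -> a a | ε is an assumption inferred from the FIRST sets listed above, and the `Parser` class, its `log` list and the `parse` helper are hypothetical names introduced for illustration.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = -1
        self.lexeme = None
        self.log = []  # record of calls, for comparison with a hand trace

    def get_lexeme(self):
        # Single characters stand in for lexemes; "$" marks end of input.
        self.pos += 1
        self.lexeme = self.text[self.pos] if self.pos < len(self.text) else "$"
        self.log.append(f"get_lexeme (lexeme = {self.lexeme})")

    def check(self, expected):
        if self.lexeme != expected:
            raise ParseError(f"{expected} expected, found {self.lexeme}")
        self.log.append(f"check({expected})")

    def rec_S(self):  # S -> A a (assumed)
        self.log.append("rec_S")
        self.rec_A()
        self.check("a")

    def rec_A(self):  # A -> b B | C d (assumed)
        self.log.append("rec_A")
        if self.lexeme == "a":
            self.rec_C()
            self.check("d")
            self.get_lexeme()
        elif self.lexeme == "b":
            self.check("b")
            self.get_lexeme()
            self.rec_B()
        elif self.lexeme == "d":
            # C derives the empty string here because d is in FOLLOW(C)
            self.check("d")
            self.get_lexeme()
        else:
            raise ParseError(f"a, b or d expected, found {self.lexeme}")

    def rec_B(self):  # B -> e f (assumed)
        self.log.append("rec_B")
        self.check("e")
        self.get_lexeme()
        self.check("f")
        self.get_lexeme()

    def rec_C(self):  # C -> a a (assumed)
        self.log.append("rec_C")
        self.check("a")
        self.get_lexeme()
        self.check("a")
        self.get_lexeme()


def parse(text):
    parser = Parser(text)
    parser.get_lexeme()
    parser.rec_S()
    return parser.log
```

Calling `parse("da")` returns the list of calls in order, which can be compared line by line with a hand trace. Like the Pascal skeleton, this sketch does not check that the whole input has been consumed.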
Procedure calls when parsing the string da:
  get_lexeme          (lexeme = d)
  call rec_S
  call rec_A
  check(d)            (ok)
  get_lexeme          (lexeme = a)
  return to rec_S
  check(a)            (ok)
  return to program body

In general, for a production A -> α | β to be parseable by LL(1) methods, e.g. recursive descent, the following conditions must hold:
  1. FIRST(α) and FIRST(β) must be disjoint
  2. Only one of α and β may derive ε
  3. If β ->* ε then FIRST(α) and FOLLOW(A) must be disjoint

For the left-recursive production (of the form X -> Xα | β), FIRST(β) is a subset of FIRST(X), and thus the FIRST sets for the alternative productions of the same non-terminal are not disjoint (condition 1 is violated).
If β => ε then FIRST(X) contains FIRST(α), and we know FOLLOW(X) contains FIRST(α); thus FIRST(X) and FOLLOW(X) are not disjoint, since they both contain FIRST(α) (condition 3 is violated).

5.
  Procedure calls made by compiler    Comments
  get_lexeme                          lexeme = varlexeme
  prog
  check(varlexeme)                    ok
  get_lexeme                          lexeme = id (x)
  idlist
  check(id)                           ok
  get_lexeme                          lexeme = comma
  new_idlist
  check(comma)                        ok
  get_lexeme                          lexeme = id (y)
  check(id)                           ok
  get_lexeme                          lexeme = semicolon
  new_idlist
  return from new_idlist              semicolon is in the FOLLOW set of new_idlist;
                                      control returns from new_idlist with no procedure calls
  return from new_idlist
  return from idlist
  check(semicolon)                    ok
  get_lexeme                          lexeme = beginlexeme
  block
  check(beginlexeme)                  ok
  . . . .
  newstatementlist
  check(semicolon)                    ok
  get_lexeme                          lexeme = endlexeme
  statement
  SYNTAX ERROR: identifier, IF or WHILE expected

Note how the precise nature of the error can be stated in the 'error message'.

The parse tree for this program is far too big to draw. The significance of the parse tree is that its structure is exactly equivalent to the call and return structure of the parser, i.e. a call to a parsing procedure is equivalent to the expansion of a non-terminal and the creation of a sub-tree of the parse tree. Calls to check correspond to the construction of the leaf nodes which denote terminals.

6.
a) and b) see notes

c) The details of parse table construction are given in the lecture notes. A reasonable answer should explain how the algorithm ensures that the correct parsing 'decisions' are made on the basis of a single token of look-ahead even when the parse involves complex cases, i.e. when the RHSs of alternative productions begin with non-terminal symbols (hence the need to consider the FIRST sets of the RHSs) and also where productions have empty alternatives (hence the need to consider the FOLLOW sets).

With reference to the position of productions in the table, consider SLIST -> S X. The FIRST set of the RHS is {"const", "ident"}; thus when SLIST is expanded at this point in the parse, the FIRST set of the RHS indicates that the next token should be a "const" or an "ident". Indexing into the table with [SLIST, "const"] or [SLIST, "ident"] gives the correct production to apply.

Consider also X -> ε, i.e. on expansion the next token is "]" and the parse table indicates that X should be rewritten by X -> ε, since in the cases where the empty alternative is used the next input token must, according to the grammar, be a member of the FOLLOW set of X. Similar arguments can be used to develop each entry in the table.

  FIRST(S) = {"const", "ident"}
  FIRST("^" "ident") = {"^"}
  FIRST("ARRAY" "[" SLIST "]" "OF" T) = {"ARRAY"}
  FIRST("ident") = {"ident"}
  FIRST("const") = {"const"}
  FIRST(S X) = {"const", "ident"}
  FIRST("," S X) = {","}
  FIRST(ε) = {ε}

  FOLLOW(T) = {$}
  FOLLOW(SLIST) = {"]"}
  FOLLOW(X) = {"]"}   by rule 6, i.e. anything following SLIST must also follow X
  FOLLOW(S) = {$, ",", "]"}
FOLLOW(S) must include FIRST(X).
If X -> ε then FOLLOW(S) must include FOLLOW(SLIST).

d)
  Input                                 Stack                                  Rule
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" SLIST "[" "ARRAY"          3
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" SLIST "["
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" SLIST
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" X S                        6
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" X "ident"                  4
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" X
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" X S ","                    7
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" X S
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" X "ident"                  4
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]" X
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF" "]"                            8
  ARRAY[IDENT1, IDENT2] OF ^IDENT3$     $T "OF"
  (Rest of parse to be added here as revision exercise)

Lecture 5. (New exercise)

In the Pascal implementation ord is assumed to return the ASCII value of a char. In the 'C' implementation it is assumed that the numerical value of a variable of type char is the ASCII code.

DECLARATIONS
  CONST max_range = 64;
        max_name = 10;
  TYPE hash_range = 1..max_range;
       name_range = 1..max_name;
       name = ARRAY[name_range] OF char;
       reference = ^entry;
       entry = RECORD
                 value: name;
                 next : reference
               END;
       table = ARRAY[hash_range] OF reference;

  FUNCTION hash(id: name; length: name_range): hash_range;
  VAR i : name_range;
      sum: integer;
  BEGIN
    sum := 0;
    FOR i := 1 TO length DO
      sum := sum + ord(id[i]);
    hash := sum MOD max_range + 1   { + 1 keeps the result within 1..max_range }
  END;

Lecture 6:

  Procedure calls made    Comments
  getlexeme               lexeme = varlexeme
  prog
  check(varlexeme)        ok
  getlexeme               lexeme = id (i.e. 'x'); 'x' added to symbol table
  idlist
  check(id)               ok; semantic check for redeclaration (checkdec)
  getlexeme               lexeme = comma
  newidlist
  check(comma)            ok
  etc, etc, etc.

A semantic error should be identified (much further on!), i.e.
  statement
  check(id)               ok; semantic check for declaration (checkiduse)
  SEMANTIC ERROR 'z' NOT DECLARED

Lecture 8:

  [diagram: run-time stack with frames for sort, quicksort, partition and exchange; display entries d[0], d[1], d[2]; static chain and dynamic chain marked]

Note: the stack frame of 'exchange' will contain the 'old' value of d[1], which points to the stack frame of 'quicksort'. This will be restored when exchange ceases to execute.
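The save-and-restore behaviour described in the note can be sketched as a small Python simulation. It is illustrative only. The nesting levels (sort at level 0, quicksort and exchange at level 1, partition at level 2) are assumptions consistent with the note, and `Frame`, `call` and `ret` are hypothetical names.

```python
class Frame:
    """A stack frame that remembers the display entry it displaced."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.saved_display = None  # old display[level], restored on return


def call(stack, display, name, level):
    """Enter a procedure declared at nesting depth `level`."""
    frame = Frame(name, level)
    frame.saved_display = display[level]  # save the 'old' d[level] in the new frame
    display[level] = frame                # d[level] now points at the new frame
    stack.append(frame)
    return frame


def ret(stack, display):
    """Leave the procedure on top of the stack, restoring the display."""
    frame = stack.pop()
    display[frame.level] = frame.saved_display
    return frame


# Assumed call sequence: sort -> quicksort -> partition -> exchange.
stack = []
display = [None, None, None]
call(stack, display, "sort", 0)
call(stack, display, "quicksort", 1)
call(stack, display, "partition", 2)
exchange = call(stack, display, "exchange", 1)
during_call = display[1].name            # d[1] while exchange is active
saved = exchange.saved_display.name      # the 'old' d[1] held in exchange's frame
ret(stack, display)
after_return = display[1].name           # d[1] once exchange has returned
```

While exchange runs, d[1] refers to its frame and d[2] still refers to partition even though exchange cannot use it. Only the entry at the callee's own level is touched on call and return.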