# ans by keralaguest


CT206 Languages and their Implementation

```

Selected Revision Exercises

Sample Solutions
Lecture 3.

Exercises:

Q3.
varlexeme
id
comma
id
semicolon
beginlexeme
id
assign
const
semicolon
id
assign
id
semicolon
id
assign
const
const
semicolon
endlexeme
dot

4. a) The NFA reflects exactly the construction of the regular expression, i.e. more complex NFAs
are built from simpler NFAs using the language operators. The steps involved are:-

i) Separate the expression into its two alternatives: aa* and bb*
ii) The NFA to recognise 'a' is trivial:-

   i --a--> f

iii) The NFA to recognise a* is

   i --ε--> 1 --a--> 2 --ε--> f
   (plus the ε-edges 2 --> 1 for repetition and i --> f for zero occurrences)


iv) The NFA to recognise aa* is

   i --a--> 0 --ε--> 1 --a--> 2 --ε--> 3
   (plus the ε-edges 2 --> 1 and 0 --> 3 of the a* part)



(NFA for bb* to be added here as revision exercise)

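v) Part of the NFA to recognise the whole expression is shown below; the rest is to be added as a
revision exercise.

   0 --ε--> 1 --a--> 2 --ε--> 3 --a--> 4 --ε--> 5 --ε--> 11
   (the bb* branch also leaves state 0 by an ε-edge and reaches the final state 11 from state 10)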


                                                                       

(Rest of above NFA to be added here as revision exercise)

b) The equivalent DFA is built using the subset construction algorithm described in the notes.
Consider the following argument:-

A transition table for an NFA would, in general, have an entry which represents a set of states,
whereas for a DFA an entry has a single state. To convert an NFA which recognises a given RE
into a DFA which recognises the same RE we see that each DFA state corresponds to a set of
NFA states.
For example, consider a DFA in state d1 (corresponding to the set of NFA states n1, n2, n5, n8).
After reading an alphabet symbol, e.g. 'a', the DFA moves into state d2. This will correspond to
the set of NFA states that can be reached from n1, n2, n5 and n8 by an 'a' transition followed by
any number of ε-transitions.

The subset construction algorithm requires the following functions:-

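ε-closure(s)   the set of NFA states reachable from state s by ε-transitions alone (including s)
ε-closure(T)   the union of ε-closure(s) for every state s in the set T
move(T, a)     the set of NFA states reachable by one 'a' transition from some state in T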

(see notes)

Compute the initial state of the DFA, which corresponds to the set of NFA states that can be
reached from the initial NFA state by ε-transitions:

ε-closure(state 0) = {0, 1, 6}                  state A

This will be DFA state A

Compute the set of states that can be reached from the states in A on transitions on 'a' and 'b':

ε-closure(move(A, a))

move(A, a) is the set of states reachable from the states in A on 'a' transitions alone, thus:

move(A, a) = {2}

ε-closure(move(A, a)) = {2, 3, 5, 11}           state B

This will be state B.

(State C to be added here as revision exercise)

Because DFA states B and C contain NFA state 11 (the accepting, or final, state), B and C are
accepting states of the DFA. The transition table thus far is:

        a    b
   A    B    C

Compute the transitions that occur when the DFA is in state B and reads 'a' and 'b', and the same
for state C:

move(B, a) = {4}
ε-closure(move(B, a)) = {4, 3, 5, 11}           state D

move(B, b) = {}
ε-closure(move(B, b)) = {}

(State E to be added here as revision exercise)

move(D, a) = {4}
ε-closure({4}) = {4, 3, 5, 11}                  state D

move(D, b) = {}
ε-closure(move(D, b)) = {}

move(E, a) = {}
ε-closure(move(E, a)) = {}

move(E, b) = {9}
ε-closure(move(E, b)) = {9, 10, 11, 8}          state E

No new states have been added, so the computation is finished.

(Full transition table to be added here as revision exercise)

The DFA is

   A --a--> B      B --a--> D      D --a--> D
   A --b--> C      C --b--> E      E --b--> E
   (B, C, D and E are accepting states)

The conversion algorithm does not necessarily give the optimum DFA, i.e. the DFA with the
minimum number of states.

(Reduced DFA to be added here as revision exercise)
Lecture 4.

Exercises:

1. a) An s-grammar is a type 2 grammar in which the right-hand side of every production starts
with a terminal symbol. The alternative productions for a non-terminal all start with different
terminals, and no alternative is the empty string.

An LL(1) grammar is also a type 2 grammar, but it can contain production rules whose right-hand
sides start with a non-terminal symbol. The restrictions on LL(1) grammars can be described using
the following general production:

A -> α | β

The restrictions are:-

•        For no terminal 'a' do both α and β derive strings starting with 'a'
•        At most one of α and β can derive the empty string
•        If β derives the empty string, α derives no string starting with a terminal in FOLLOW(A)

s-grammars are a proper subset of LL(1) grammars.

The grammar in the question is LL(1) but not an s-grammar, because the right-hand side of the
production for 'A' begins with a non-terminal. To modify the grammar to be an s-grammar:-

S -> “p” A “b”

A -> “c” “d” “e” B
   | “d” “f” “e”

B -> “e” A

The LL(1) parsing table for this s-grammar can then easily be constructed: it has a row for each
non-terminal and a column for each terminal. Each production rule for a non-terminal is placed in
the row for that non-terminal, in the column of the terminal which starts the right-hand side of
the production rule. Thus, production rule 1 is placed in the row for S in the column for p.

        b        c        d        e        f        p
   S                                                 prod 1
   A             prod 2   prod 3
   B                               prod 4

Using the LL(1) parsing algorithm to parse the string

pcdeedfeb

gives:
input buffer     stack    rule
^pcdeedfeb$      $S       -
^pcdeedfeb$      $bAp     1
p^cdeedfeb$      $bA      -
p^cdeedfeb$      $bBedc   2
pc^deedfeb$      $bBed    -
pcd^eedfeb$      $bBe     -
pcde^edfeb$      $bB      -
pcde^edfeb$      $bAe     4
pcdee^dfeb$      $bA      -
pcdee^dfeb$      $befd    3
pcdeed^feb$      $bef     -
pcdeedf^eb$      $be      -
pcdeedfe^b$      $b       -
pcdeedfeb^$      $        -
pcdeedfeb$^      empty    -

b) See the notes on FIRST and FOLLOW sets.

The FIRST and FOLLOW sets are used in the construction of an LL(1) parser as follows:-

Given that A -> α is a production with 'a' in FIRST(α), the parser will expand A by α when the
current input lexeme is 'a'. A complication occurs when α = ε or α =>* ε: we should then also
expand A by α if the current symbol is in FOLLOW(A), or if the end marker $ has been reached and
$ is in FOLLOW(A).

c) Using the rules given in the associated lecture:

FIRST(A): by rule 2 and the third production, FIRST(A) = {w}.

FIRST(S): by the first part of rule 2 and the first alternative of the first production, 'a' is
in FIRST(S). Using rule 3 and FIRST(A), 'w' is also in FIRST(S); thus FIRST(S) = {a, w}.

FOLLOW(S): by rule 1 for FOLLOW, FOLLOW(S) = {$}.

FOLLOW(A): by rule 2 applied to the first and second productions, FOLLOW(A) = {y, z}.

FOLLOW(X): by rule 2 and the third production, 'b' is in FOLLOW(X); by rule 3, using the first
production and FOLLOW(S), '$' is in FOLLOW(X); thus FOLLOW(X) = {b, $}.

Using the rules for parse table construction given in the lecture, the parse table is

         a        b        w         x    y    z    $
   S     S->aX             S->Ay
   X              X->b     X->Aza
   A                       A->wXb

2. Top-down parsing begins with the start symbol and attempts to derive the input string by
expanding non-terminals, i.e. by substituting the right-hand side of a production rule for its
non-terminal. At each step, the choice of production to apply is made solely on the basis of the
current input symbol.
To parse

abbcde

input          derivation     rule
^abbcde        S              P1
a^bbcde        a^AbcBe        P2 first alternative
ab^bcde        ab^bcBe        P2 second alternative
abb^cde        abb^cBe        none
abbc^de        abbc^Be        none
abbcd^e        abbcd^e        P3
abbcde^        abbcde^        none

Bottom-up parsing starts with the input string and attempts to recognise substrings within the
sentence which are the right-hand sides of production rules (handles, i.e. right-hand sides whose
reduction is one step of a rightmost derivation in reverse). When a handle is recognised, it is
replaced by the non-terminal on the left-hand side of the production rule. Common practice is to
add a '$' to the end of the string.

To parse:

abbcde$

input          rule
^abbcde$
a^bbcde$
ab^bcde$
a^Abcde$       P2 second alternative
aAb^cde$
a^Acde$        P2 first alternative
aAc^de$
aAcd^e$
aAc^Be$        P3
aAcBe^$
^S$            P1
S^$            finished

3. Eliminate the left-recursion in the production for 'decl':

decl -> vname P
P -> “,” vname P | ε

Transformed grammar is then:-

production 1           prog -> “var” decl “;” stmt “.”
production 2           decl -> vname P
production 3           P -> “,” vname P
production 4           P -> ε
production 5           stmt -> vname “:=” digit
productions 6, 7, 8    vname -> “x” | “y” | “z”
productions 9, 10      digit -> “0” | “1”
FIRST(“var” decl “;” stmt “.”) = {“var”}
FIRST(vname P) = {“x”, “y”, “z”}

(Rest of FIRST sets to be added here as revision exercise)

FOLLOW(prog) = {$}
FOLLOW(decl) = {“;”}
FOLLOW(P) = {“;”}
FOLLOW(stmt) = {“.”}

         “var”  “,”  “;”  “.”  “:=”  “x”  “y”  “z”  “0”  “1”  $
prog       1
decl                                   2    2    2
stmt                                   5    5    5
vname                                  6    7    8
P                 3    4
digit                                                 9    10

parse of string:-

var x, y; x:= 0.

input buffer              stack                           rule
^var x, y; x := 0.$       $prog                        1
^var x, y; x := 0.$       $ . stmt ; decl var
var^ x, y; x := 0.$       $ . stmt ; P vname           2
var^ x, y; x := 0.$       $ . stmt ; P x               6
var x^, y; x := 0.$       $ . stmt ; P
var x^, y; x := 0.$       $ . stmt ; P vname ,         3

(Rest of parse to be added here as revision exercise)

4. FIRST() is the set of terminals which can start strings derivable from 
FOLLOW(A) is the set of terminals which can immediately follow A in a sentential form

FIRST(Aa) = {b, a, d}
FIRST(bB) = {b}
FIRST(Cd) = {a, d}
FISRT(ef) = {e}
FIRST(aa) = {a}
FIRST() = {}

FOLLOW(A) = {a}
FOLLOW(S) = empty
FOLLOW(B) = {a}
FOLLOW(C) = {d}

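PROGRAM parse;

TYPE ...;

VAR ...lexeme: lex_kind;

PROCEDURE get_lexeme(...);
BEGIN
  ...
END;

PROCEDURE check(...);
BEGIN
  ...
END;

PROCEDURE rec_A; FORWARD;   { rec_S calls rec_A before its body appears }

PROCEDURE rec_S;            { S -> A a }
BEGIN
  rec_A;
  check(a)
END;

PROCEDURE rec_A;            { A -> b B | C d }

  PROCEDURE rec_B;          { B -> e f }
  BEGIN
    check(e);
    get_lexeme;
    check(f);
    get_lexeme              { move past f so that rec_S can check for a }
  END;

  PROCEDURE rec_C;          { C -> a a; the case C -> ε is covered by the d branch below }
  BEGIN
    check(a);
    get_lexeme;
    check(a);
    get_lexeme
  END;

BEGIN
  CASE lexeme OF
    a: BEGIN                { A -> C d with C -> a a }
         rec_C;
         check(d);
         get_lexeme
       END;
    b: BEGIN                { A -> b B }
         check(b);
         get_lexeme;
         rec_B
       END;
    d: BEGIN                { A -> C d with C -> ε }
         check(d);
         get_lexeme
       END
  END
END;

BEGIN
  get_lexeme;
  rec_S
END.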

Procedure calls when parsing string:-

da

get_lexeme      (lexeme = d)
call rec_S
call rec_A
check(d)        (ok)
get_lexeme      (lexeme = a)
check(a)        (ok)

In general, for a production A -> α | β to be parseable by LL(1) methods, e.g. recursive descent,
the following conditions must hold:-

1. FIRST(α) and FIRST(β) must be disjoint
2. Only one of α and β may derive ε
3. IF β =>* ε THEN FIRST(α) and FOLLOW(A) must be disjoint

For a left-recursive production X -> X α | β, FIRST(β) is a subset of FIRST(X α), so the FIRST
sets of the alternative productions for the same non-terminal are not disjoint (condition 1 is
violated).

IF β =>* ε THEN X =>* ε, so FIRST(X α) contains FIRST(α); FOLLOW(X) also contains FIRST(α)
because α follows X in X α. Thus FIRST(X α) and FOLLOW(X) are not disjoint, since they both
contain FIRST(α) (condition 3 is violated).

5.

get_lexeme                                                      lexeme = varlexeme
prog
check(varlexeme)                                              ok
get_lexeme                                                    lexeme = id (x)
idlist
check(id)                                                   ok
get_lexeme                                                          lexeme = comma
new_idlist
check(comma)                                              ok
get_lexeme                                                lexeme = id (y)
check(id)                                                 ok
get_lexeme                                                     lexeme = semicolon
new_idlist
return from new_idlist                      semicolon is in the FOLLOW set of new_idlist,
                                            so control returns from new_idlist with no
                                            procedure calls
return from new_idlist
return from idlist
check(semicolon)                                           ok
get_lexeme                                                 lexeme = beginlexeme
block
check(beginlexeme)                                       ok
.
.
.
.
newstatementlist
check(semicolon)                                 ok
get_lexeme                               lexeme = endlexeme
statement
SYNTAX ERROR: identifier, IF or WHILE expected

Note how the precise nature of the error can be stated in the 'error message'.

The parse tree for this program is far too big to draw. The significance of the parse tree is that
its structure is exactly equivalent to the call and return structure of the parser, i.e. a call to
a parsing procedure is equivalent to the expansion of a non-terminal and the creation of a subtree
of the parse tree. Calls to check correspond to the construction of the leaf nodes which denote
terminals.

6. a) and b) See notes.

c) The details of parse table construction are given in the lecture notes. A reasonable answer
should explain how the algorithm ensures that the correct parsing 'decisions' are made on the
basis of a single token of look-ahead, even when the parse involves complex cases, i.e. when the
right-hand sides of alternative productions begin with non-terminal symbols (hence the need to
consider the FIRST sets of the right-hand sides) and when productions have empty alternatives
(hence the need to consider the FOLLOW sets).

With reference to the position of productions in the table, consider SLIST -> S X. The FIRST set
of the RHS is {“const”, “ident”}, so when SLIST is to be expanded at this point in the parse, the
FIRST set of the RHS indicates that the next token should be a “const” or an “ident”. Indexing
into the table with [SLIST, “const”] or [SLIST, “ident”] gives the correct production to apply.

Consider also X: if, when X is to be expanded, the next token is “]”, the parse table indicates
that X should be rewritten by X -> ε, since whenever the empty alternative is used, the next input
token must, according to the grammar, be a member of FOLLOW(X).

Similar arguments can be used to develop each entry in the table.

FIRST(S) = {“const”, “ident”}
FIRST(“^” “ident”) = {“^”}
FIRST(“ARRAY” “[” SLIST “]” “OF” T) = {“ARRAY”}
FIRST(“ident”) = {“ident”}
FIRST(“const”) = {“const”}
FIRST(S X) = {“const”, “ident”}
FIRST(“,” S X) = {“,”}
FIRST() = }

FOLLOW(T) = {\$}
FOLLOW(SLIST) = {“]”}
FOLLOW(X) = {“]”}         by rule 6 i.e. anything following X must also follow
SLIST
FOLLOW(S) = {\$, “,”, “]”} FOLLOW(S) must include FIRST(X). If X->  THEN
FOLLOW(S) must include FOLLOW(SLIST)

d)

Input                                         Stack                              Rule
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                    \$T
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                    \$T “OF” “]” SLIST “[“ “ARRAY”          3
ARRAY [IDENT1, IDENT2] OF ^IDENT3\$                   \$T “OF” “]” SLIST “[“
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T “OF” “]” SLIST
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T “OF” “]” X S                       6
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T “OF” “]” X “ident”
4
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T   “OF”   “]”   X
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T   “OF”   “]”   X S “,”             7
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T   “OF”   “]”   X S
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T   “OF”   “]”   X “ident”
4
ARRAY[IDENT1, IDENT2] OF ^IDENT3\$                  \$T “OF” “]” X
ARRAY[IDENT1, IDENT2]OF ^IDENT3\$                  \$T “OF” “]”                           8
ARRAY[IDENT1, IDENT2]OF ^IDENT3\$                  \$T “OF”

(Rest of parse to be added here as revision exercise)
Lecture 5. (New exercise)

In the Pascal implementation ord is assumed to return the ASCII value of a char. In the 'C'
implementation it is assumed that the numerical value of a variable of type char is the ASCII
code.

DECLARATIONS

CONST max_range = 64;
      max_name  = 10;

TYPE  hash_range = 1..max_range;
      name_range = 1..max_name;
      name       = ARRAY[name_range] OF char;
      reference  = ^entry;
      entry      = RECORD
                     value: name;
                     next : reference
                   END;

      table      = ARRAY[hash_range] OF reference;

FUNCTION hash(id: name; length: name_range): hash_range;
VAR i  : name_range;
    sum: integer;                 { standard Pascal has no type 'natural' }
BEGIN
  sum := 0;
  FOR i := 1 TO length DO sum := sum + ord(id[i]);
  hash := sum MOD max_range + 1   { MOD gives 0..max_range-1; shift into 1..max_range }
END;
Lecture 6:

getlexeme                                             lexeme = varlexeme
prog
check(varlexeme)                                    ok
getlexeme                                           lexeme = id (i.e. 'x')
idlist
check(id)                                      ok
semantic check for
redeclaration (checkdec)
getlexeme                                       lexeme = comma
newidlist
check(comma)                                   ok

etc,
etc,
etc.

A semantic error is identified much further on, i.e.:

statement
check(id)                                  ok
semantic check for
declaration (checkiduse)
SEMANTIC ERROR ‘z’
NOT DECLARED

Lecture 8:

(Figure: run-time stack for sort, quicksort, partition and exchange, showing the static chain,
the dynamic chain and the display entries d[0], d[1] and d[2].)

Note: exchange's stack frame will contain the old value of d[1], which points to quicksort's
stack frame. This value will be restored when exchange ceases to execute.

```
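For Lecture 3, Q3 above, the following Python sketch shows how a scanner can turn source text into a stream of lexeme kinds such as varlexeme, id and comma. The source program for Q3 is not reproduced in these solutions, so the input below, the token patterns and the function name `scan` are hypothetical; only the lexeme names come from the answer.

```python
import re

# Hypothetical token patterns; only the lexeme names are taken from the answer.
KEYWORDS = {"var": "varlexeme", "begin": "beginlexeme", "end": "endlexeme"}
SYMBOLS = {",": "comma", ";": "semicolon", ".": "dot"}
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z][A-Za-z0-9]*)"   # identifier or keyword
    r"|(?P<const>[0-9]+)"                   # unsigned integer constant
    r"|(?P<assign>:=)"                      # assignment operator
    r"|(?P<sym>[,;.]))"                     # punctuation
)


def scan(source):
    """Return the list of lexeme kinds for `source`."""
    source = source.rstrip()
    lexemes, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"illegal character near position {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "id":
            # Keywords are reserved words; every other name is an identifier.
            lexemes.append(KEYWORDS.get(text.lower(), "id"))
        elif kind == "sym":
            lexemes.append(SYMBOLS[text])
        else:                                # "const" or "assign"
            lexemes.append(kind)
    return lexemes


print(scan("var x, y; begin x := 1; y := x; end."))
```

Keywords are matched as ordinary identifiers and then looked up in a table, which keeps the regular expression small.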
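For Lecture 3, Q4 b) above, the subset construction can be checked mechanically. This is a minimal Python sketch, assuming an NFA for aa* | bb* whose edges were reconstructed from the ε-closures and moves worked in the answer (the lecture diagram is incomplete, so the edge lists `EPS` and `DELTA` are an assumption). DFA states are named A, B, C, ... in the order they are discovered, which matches the naming used in the answer, so it can also be used to check the revision exercises.

```python
from collections import deque

# NFA for aa*|bb*; edges reconstructed from the closures in the answer (assumption).
EPS = {0: {1, 6}, 2: {3, 5}, 4: {3, 5}, 5: {11},
       7: {8, 10}, 9: {8, 10}, 10: {11}}
DELTA = {(1, "a"): {2}, (3, "a"): {4}, (6, "b"): {7}, (8, "b"): {9}}
START, ACCEPT, ALPHABET = 0, 11, ("a", "b")


def eps_closure(states):
    """All NFA states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` by one `symbol` transition."""
    return {t for s in states for t in DELTA.get((s, symbol), ())}


def subset_construction():
    """Return (names, transitions, accepting) for the equivalent DFA."""
    start = eps_closure({START})
    names = {start: "A"}
    transitions = {}
    queue = deque([start])
    while queue:
        d = queue.popleft()
        for sym in ALPHABET:
            target = eps_closure(move(d, sym))
            if not target:               # empty set: no transition (dead state)
                continue
            if target not in names:      # a new DFA state
                names[target] = chr(ord("A") + len(names))
                queue.append(target)
            transitions[(names[d], sym)] = names[target]
    accepting = {n for s, n in names.items() if ACCEPT in s}
    return names, transitions, accepting


def accepts(word):
    """Run the constructed DFA on `word`."""
    _, trans, acc = subset_construction()
    state = "A"
    for ch in word:
        state = trans.get((state, ch))
        if state is None:
            return False
    return state in acc
```

Processing states in first-in, first-out order is what makes the discovery order (and hence the letters) agree with the hand calculation.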
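For Lecture 4, Q1 a) above, the table-driven LL(1) algorithm used in the trace can be sketched in Python as below. Production 3 is written as A -> d f e, the form that the trace pushes onto the stack; treat the grammar encoding and the function name `ll1_parse` as assumptions of this sketch. Each terminal is a single character, and the returned list gives the productions in the order they are applied.

```python
# s-grammar from Q1 a); production 3 as used in the trace (A -> d f e).
PRODUCTIONS = {
    1: ("S", ["p", "A", "b"]),
    2: ("A", ["c", "d", "e", "B"]),
    3: ("A", ["d", "f", "e"]),
    4: ("B", ["e", "A"]),
}
# LL(1) table: (non-terminal, lookahead terminal) -> production number.
TABLE = {("S", "p"): 1, ("A", "c"): 2, ("A", "d"): 3, ("B", "e"): 4}
NONTERMINALS = {"S", "A", "B"}


def ll1_parse(text, start="S"):
    """Parse `text` (one character per terminal); return the productions used."""
    tokens = list(text) + ["$"]
    stack = ["$", start]
    pos, used = 0, []
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top in NONTERMINALS:
            rule = TABLE.get((top, look))
            if rule is None:
                raise SyntaxError(f"no table entry for [{top}, {look!r}]")
            used.append(rule)
            # Push the right-hand side in reverse so its first symbol is on top.
            stack.extend(reversed(PRODUCTIONS[rule][1]))
        elif top == look:
            pos += 1                      # match a terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return used
```

`ll1_parse("pcdeedfeb")` returns `[1, 2, 4, 3]`, the same rule sequence as the trace above.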
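For Lecture 4, Q1 c) above, the FIRST/FOLLOW rules and the table construction can be expressed as fixed-point iterations. The question's grammar is not reproduced in these solutions, so this Python sketch assumes the grammar S -> a X | A y, X -> b | A z a, A -> w X b, which is consistent with the sets and table given; an ε-production would be written as an empty right-hand side.

```python
EPS, END = "ε", "$"

# Grammar assumed from the sets and table in the answer.
GRAMMAR = {
    "S": [["a", "X"], ["A", "y"]],
    "X": [["b"], ["A", "z", "a"]],
    "A": [["w", "X", "b"]],
}
START = "S"


def first_of_seq(seq, first):
    """FIRST of a symbol string; contains EPS if the whole string can vanish."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def first_sets():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                         # iterate until nothing changes
        changed = False
        for nt, alts in GRAMMAR.items():
            for rhs in alts:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)                 # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for rhs in alts:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_seq(rhs[i + 1:], first)
                    new = rest - {EPS}     # FIRST of whatever follows sym
                    if EPS in rest:        # tail can vanish: add FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def ll1_table():
    first = first_sets()
    follow = follow_sets(first)
    table = {}
    for nt, alts in GRAMMAR.items():
        for rhs in alts:
            f = first_of_seq(rhs, first)
            lookaheads = (f - {EPS}) | (follow[nt] if EPS in f else set())
            for t in lookaheads:
                if (nt, t) in table:
                    raise ValueError(f"LL(1) conflict at ({nt}, {t})")
                table[(nt, t)] = rhs
    return first, follow, table
```

Raising an error on a duplicate table entry is a direct check of the LL(1) conditions listed under Q4.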
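For Lecture 4, Q3 above, eliminating immediate left recursion is a purely mechanical rewrite. This Python sketch assumes the original production was decl -> decl “,” vname | vname (the question itself is not reproduced here); the function name is hypothetical, and only immediate left recursion is handled, since indirect left recursion needs a more general algorithm.

```python
def eliminate_left_recursion(nt, alternatives, new_nt):
    """Remove immediate left recursion from the alternatives of `nt`.

    A -> A a1 | ... | A am | b1 | ... | bn  becomes
    A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | ε,
    where each alternative is a list of symbols and ε is the empty list.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}          # nothing to do
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


print(eliminate_left_recursion("decl", [["decl", ",", "vname"], ["vname"]], "P"))
```

Applied to decl with the new non-terminal P, it yields decl -> vname P and P -> “,” vname P | ε, the transformed productions used above.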
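For Lecture 4, Q4 above, here is a Python transcription of the recursive-descent recogniser, assuming the grammar S -> A a, A -> b B | C d, B -> e f, C -> a a | ε implied by the FIRST sets listed. Unlike the Pascal version, `check` here also advances to the next lexeme, and `rec_C` handles C -> ε itself, so the d case of `rec_A` goes through `rec_C` as well. The class and the `calls` log are illustrative additions.

```python
class Parser:
    """Recursive-descent recogniser: one method per non-terminal."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # single-character lexemes
        self.pos = 0
        self.calls = []                    # log of calls, like the trace above

    @property
    def lexeme(self):
        return self.tokens[self.pos]

    def check(self, expected):
        """Verify the current lexeme and move past it."""
        if self.lexeme != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lexeme!r}")
        self.calls.append(f"check({expected})")
        self.pos += 1

    def rec_S(self):                       # S -> A a
        self.calls.append("rec_S")
        self.rec_A()
        self.check("a")

    def rec_A(self):                       # A -> b B | C d
        self.calls.append("rec_A")
        if self.lexeme == "b":
            self.check("b")
            self.rec_B()
        elif self.lexeme in ("a", "d"):    # FIRST(C d) = {a, d}
            self.rec_C()
            self.check("d")
        else:
            raise SyntaxError(f"b, a or d expected, found {self.lexeme!r}")

    def rec_B(self):                       # B -> e f
        self.calls.append("rec_B")
        self.check("e")
        self.check("f")

    def rec_C(self):                       # C -> a a | ε
        self.calls.append("rec_C")
        if self.lexeme == "a":
            self.check("a")
            self.check("a")
        # Otherwise use C -> ε; the caller then checks for d, the only member of FOLLOW(C).


def parse(text):
    p = Parser(text)
    p.rec_S()
    if p.lexeme != "$":
        raise SyntaxError(f"unexpected {p.lexeme!r} after the sentence")
    return p.calls
```

For the string da this logs rec_S, rec_A, rec_C, check(d) and check(a).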
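For Lecture 4, Q6 d) above, the same table-driven algorithm can finish the parse. This Python sketch assumes productions 1 and 2 are T -> S and T -> “^” “ident” (only productions 3, 4 and 6 to 8 can be identified from the trace), and treats IDENT1, IDENT2 and IDENT3 as “ident” tokens. The ε-production X -> ε is selected on “]”, the member of FOLLOW(X).

```python
# Type grammar; numbers 3 to 8 follow the trace, 1 and 2 are assumed.
PRODUCTIONS = {
    1: ("T", ["S"]),
    2: ("T", ["^", "ident"]),
    3: ("T", ["ARRAY", "[", "SLIST", "]", "OF", "T"]),
    4: ("S", ["ident"]),
    5: ("S", ["const"]),
    6: ("SLIST", ["S", "X"]),
    7: ("X", [",", "S", "X"]),
    8: ("X", []),                                  # X -> ε
}
NONTERMINALS = {"T", "S", "SLIST", "X"}
# Entries come from FIRST of each right-hand side; X -> ε uses FOLLOW(X) = {"]"}.
TABLE = {
    ("T", "ident"): 1, ("T", "const"): 1, ("T", "^"): 2, ("T", "ARRAY"): 3,
    ("S", "ident"): 4, ("S", "const"): 5,
    ("SLIST", "ident"): 6, ("SLIST", "const"): 6,
    ("X", ","): 7, ("X", "]"): 8,
}


def ll1_parse(tokens, start="T"):
    """Table-driven LL(1) parse of a token list; returns the productions applied."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos, used = 0, []
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top in NONTERMINALS:
            rule = TABLE.get((top, look))
            if rule is None:
                raise SyntaxError(f"no table entry for [{top}, {look!r}]")
            used.append(rule)
            stack.extend(reversed(PRODUCTIONS[rule][1]))   # leftmost symbol on top
        elif top == look:
            pos += 1                                       # match terminal or $
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return used


print(ll1_parse(["ARRAY", "[", "ident", ",", "ident", "]", "OF", "^", "ident"]))
```

For the input of part d) the parser applies productions 3, 6, 4, 7, 4, 8 and then 2.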
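For Lectures 5 and 6 above, this Python sketch combines the hash function with a chained symbol table and the two semantic checks named in the trace (checkdec and checkiduse). Python's `ord` returns the ASCII code for ASCII characters, and buckets are numbered 1..max_range as in the Pascal declarations. The class name, the redeclaration message and the use of Python lists for the chains are assumptions.

```python
MAX_RANGE = 64  # max_range in the Pascal declarations


def hash_name(name):
    """Sum of character codes mapped into the bucket range 1..MAX_RANGE."""
    return sum(ord(ch) for ch in name) % MAX_RANGE + 1


class SymbolTable:
    """Hash table with separate chaining (one list per bucket)."""

    def __init__(self):
        self.buckets = {i: [] for i in range(1, MAX_RANGE + 1)}

    def lookup(self, name):
        return name in self.buckets[hash_name(name)]

    def checkdec(self, name):
        """Declaration check: reject a redeclaration, otherwise insert the name."""
        if self.lookup(name):
            # Hypothetical message; the trace above only shows the NOT DECLARED case.
            raise NameError(f"SEMANTIC ERROR '{name}' ALREADY DECLARED")
        self.buckets[hash_name(name)].append(name)

    def checkiduse(self, name):
        """Use check: the identifier must already be in the table."""
        if not self.lookup(name):
            raise NameError(f"SEMANTIC ERROR '{name}' NOT DECLARED")
```

With x and y declared, `checkiduse("z")` raises `NameError` with the message `SEMANTIC ERROR 'z' NOT DECLARED`, mirroring the Lecture 6 trace. Names such as ab and ba hash to the same bucket, which is why each bucket holds a chain.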
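For Lecture 8 above, the display behaviour described in the note can be simulated in a few lines. This Python sketch assumes nesting levels sort = 0, quicksort = 1, partition = 2 and exchange = 1 (exchange overwrites d[1], as the note says), and records procedure names instead of real frame addresses; the class and method names are hypothetical.

```python
class Display:
    """Display d[level] -> active frame; each frame saves the entry it overwrites."""

    def __init__(self):
        self.d = {}        # nesting level -> procedure whose frame d[level] points to
        self.stack = []    # run-time stack of frames: (procedure, level, saved d[level])

    def call(self, proc, level):
        """Push a frame for `proc`, saving the old d[level], then point d[level] at it."""
        self.stack.append((proc, level, self.d.get(level)))
        self.d[level] = proc

    def ret(self):
        """Pop the top frame and restore the display entry it saved."""
        proc, level, saved = self.stack.pop()
        if saved is None:
            del self.d[level]
        else:
            self.d[level] = saved
        return proc


disp = Display()
for proc, level in [("sort", 0), ("quicksort", 1), ("partition", 2), ("exchange", 1)]:
    disp.call(proc, level)
```

While exchange runs, a non-local reference to a variable of sort goes through d[0], which still names sort's frame; when exchange returns, d[1] points at quicksort's frame again.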