									CS 4303 Programming Language Concepts

Project 3 – Recursive Descent Parsing

The purpose of this programming assignment is to write a syntactic analyzer for a small language
called SL. The grammar of SL, which defines variable declarations, arithmetic expressions, and
assignments in C syntax, is given by the following BNF definitions.

<unit>            ::=    <declaration-list> <assign-sequence>
<declaration-list> ::=   <declaration-list> <declaration> | <empty>
<declaration>     ::=    <type> <var> ';' | <type> <var> '=' <number> ';'
<type>            ::=    int | float
<var>             ::=    A|B|C|D|E
<assign-sequence> ::=    <assign-sequence> <assignment> | <empty>
<assignment>      ::=    <var> '=' <expr> ';'
<expr>            ::=    <expr> + <term> | <expr> - <term> | <term>
<term>            ::=    <term> * <atom> | <term> / <atom> | <atom>
<atom>            ::=    '(' <expr> ')' | <var> | <number>
<number>          ::=    <integer> | <float>
<integer>         ::=    <integer> <digit> | <digit>
<digit>           ::=    0|1|2|3|4|5|6|7|8|9
<float>           ::=    <integer> '.' <integer>

Your program will read a program written in SL and print the value of each variable. Note that SL
is case sensitive. To receive full credit, your program must be logically sound, well indented, and
well documented. Turn in a copy of your program and a copy of the data files used to test it.
Sample inputs are given below.

Requirements:
    Write a C/C++ program and use Visual C++ to test your program.
    Your program must read its input from a file, one input program per file. Your program
       should prompt the user to enter the file name and path, and should keep reading the
       next file until the user indicates termination.
    All output must be printed on screen.
    If a variable is not assigned any value, either in the declaration or in a previous assignment,
       its value is undefined and your program should print “Undefined” for its value.
    This parser does not need to recover from syntax errors; we assume the input does not
       have syntax errors. If a syntax error is encountered, your program should simply terminate
       with an error message.
    All variables used in the input program must be declared. If an undeclared variable is
       encountered in an assignment statement, an error message should be printed and the
       program terminates.
    A variable cannot be re-declared. If so, the program prints an error message and terminates.
    All four operators, viz., +, -, *, and /, are left associative.
    If a division-by-zero occurs, print an error message and terminate.
    Automatic type conversion must be performed properly. The following rules should apply:
          o During the evaluation of an expression, if one operand is an int and the other a
            float, the int is converted to float.
          o If the type of the result of evaluating the expression in an assignment
            disagrees with the type of the variable on the left-hand side, the result is
            converted to the type of the variable.
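
The two conversion rules, together with the division-by-zero check, can be sketched as follows. This is only an illustration under stated assumptions: the names Type, Value, Apply, and ConvertTo are hypothetical, values are stored in a double, and int / int is assumed to truncate as in C (which is what Sample input 4 suggests).

```cpp
// Hypothetical value type: SL has only int and float, so a tag plus a double suffices.
enum class Type { Int, Float };

struct Value {
    Type type;
    double v;  // an int is stored as a whole number
};

// Applies a binary operator using the promotion rule above:
// int op int stays int; if either operand is float, the result is float.
// Returns false on division by zero or an unknown operator.
bool Apply(char op, const Value& a, const Value& b, Value& out) {
    bool bothInt = (a.type == Type::Int && b.type == Type::Int);
    out.type = bothInt ? Type::Int : Type::Float;
    switch (op) {
        case '+': out.v = a.v + b.v; break;
        case '-': out.v = a.v - b.v; break;
        case '*': out.v = a.v * b.v; break;
        case '/':
            if (b.v == 0) return false;  // caller prints the error and terminates
            if (bothInt)
                // assumed: integer division truncates, as in C
                out.v = static_cast<double>(static_cast<long>(a.v) / static_cast<long>(b.v));
            else
                out.v = a.v / b.v;
            break;
        default: return false;
    }
    return true;
}

// Converts the result of an expression to the declared type of the target variable.
Value ConvertTo(Type target, const Value& x) {
    Value r{target, x.v};
    if (target == Type::Int)
        r.v = static_cast<double>(static_cast<long>(x.v));  // truncates toward zero
    return r;
}
```

With this sketch, Apply('/', ...) on the ints 1 and 2 yields the int 0, so a later division by that result is reported as a division-by-zero error.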

Sample input 1:

       int A = 0; int B;
       B = 1;
       A = A + B;

       Output:

       A = 1
       B = 1

Sample input 2:

       int B; float A = 2.5; int C = 2;
       B = 1;
       A = (A - B) * C + A;

       Output:

       A = 5.5
       B = 1
       C = 2

Sample input 3:

       int B; float A = 2.5; int C = 2;
       B = 1 / (C - 1);
       A = (A - B) * C / B + A;

       Output:

       A = 5.5
       B = 1
       C = 2

Sample input 4:

       int B; float A = 2.5; int C = 2;
       B = 1 / C;
       A = (A - B) * C / B + A;

       Output:
       Division-by-zero error

Sample input 5:

       int B; float A; int C = 2;
       B = 1;
       A = (A - B) * C + A;

       Output:

       A is undefined

Sample input 6:

       int B; float A = 2.5;
       B = 1;
       A = (A - B) * C + A;

       Output:

       C is undeclared

Sample input 7:

       int B; float A = 2.5; int B = 2;
       B = 1;
       A = (A - B) * B + A;

       Output:

       B cannot be re-declared
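
Sample inputs 5 to 7 exercise three distinct error cases: reading a variable that has no value yet, using a variable that was never declared, and declaring a variable twice. One possible way to track this is a small symbol table. The sketch below is an assumption, not a required design: Symbol, SymbolTable, Declare, Assign, and Read are hypothetical names, and each method returns the error message to print (an empty string means success).

```cpp
#include <map>
#include <string>

// Hypothetical symbol-table entry for one SL variable.
struct Symbol {
    bool isFloat;   // declared type: float if true, int otherwise
    bool defined;   // true once the variable has been given a value
    double value;
};

// Each method returns an empty string on success or the error message to print.
class SymbolTable {
public:
    std::string Declare(const std::string& name, bool isFloat) {
        if (table_.count(name) != 0) return name + " cannot be re-declared";
        table_[name] = Symbol{isFloat, false, 0.0};
        return "";
    }

    std::string Assign(const std::string& name, double value) {
        std::map<std::string, Symbol>::iterator it = table_.find(name);
        if (it == table_.end()) return name + " is undeclared";
        it->second.value = value;
        it->second.defined = true;
        return "";
    }

    std::string Read(const std::string& name, double& out) const {
        std::map<std::string, Symbol>::const_iterator it = table_.find(name);
        if (it == table_.end()) return name + " is undeclared";
        if (!it->second.defined) return name + " is undefined";
        out = it->second.value;
        return "";
    }

private:
    std::map<std::string, Symbol> table_;
};
```

The parser would call Declare while processing <declaration>, Read when an <atom> is a <var>, and Assign at the end of each <assignment>, terminating as soon as any call returns a non-empty message.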


Hints: The easiest way to approach this assignment is as follows:

Step 1: Remove left recursion from the grammar. The grammar above contains left-recursive
productions for the non-terminals <declaration-list>, <assign-sequence>, <expr>, <term>, and
<integer>.

Rewrite these productions in EBNF (look up what EBNF is if it is unfamiliar), and then write the
recursive descent recognizers from the EBNF.
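
To illustrate the idea without rewriting SL itself, the sketch below uses a hypothetical one-level grammar, <seq> ::= <seq> '-' <digit> | <digit>. Coded directly, the left-recursive alternative would call itself forever; in EBNF it becomes <seq> ::= <digit> { '-' <digit> }, and the repetition turns into a loop. ParseSeq and its parameters are illustrative names only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical one-level grammar, used only to show the transformation:
//   left-recursive BNF:  <seq> ::= <seq> '-' <digit> | <digit>
//   EBNF:                <seq> ::= <digit> { '-' <digit> }
// Parses and evaluates s starting at pos; returns false on a syntax error.
bool ParseSeq(const std::string& s, std::size_t& pos, int& result) {
    if (pos >= s.size() || s[pos] < '0' || s[pos] > '9') return false;
    result = s[pos++] - '0';
    // The EBNF repetition { '-' <digit> } becomes a loop.
    while (pos < s.size() && s[pos] == '-') {
        ++pos;
        if (pos >= s.size() || s[pos] < '0' || s[pos] > '9') return false;
        result = result - (s[pos++] - '0');  // accumulate on the left
    }
    return true;
}
```

Because the running result is always the left operand, the loop keeps the operator left associative: "8-3-2" evaluates as (8 - 3) - 2 = 3, not 8 - (3 - 2) = 7.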

Step 2: Write a recursive descent parser for each of the productions.

Before doing so, decide how to separate lexical analysis from syntactic analysis. Without a clear
boundary between what the lexical analyzer (scanner) does and what the syntactic analyzer
(parser) does, the design of the parser easily becomes tangled. For example, the recursive descent
recognizer for <declaration> may end up with a very complicated control structure just to read
and recognize the keywords “int” and “float”.

To keep the design of the recursive descent recognizers clear, the function “GetToken” should
recognize all tokens that are used by the language. With a well-designed “GetToken”, the
recursive descent recognizers need only analyze syntactic structure.

For the language SL, the tokens are the following (separated by commas):

;, =, int, float, A, B, C, D, E, (, ), *, /, +, -, <number>

where a <number> is treated as a single token even though it is defined by the grammar. This
simplifies the design of the parser.
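
Once tokens arrive whole, a recognizer such as the one for <declaration> reduces to a short sequence of token comparisons. The following is a minimal sketch under assumptions, not the required design: TokenStream, Peek, Advance, IsVar, and IsNumber are hypothetical stand-ins for “GetToken” and its global token buffer, and the token list is supplied already split.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the scanner: a pre-split token list and a cursor.
struct TokenStream {
    std::vector<std::string> tokens;
    std::size_t next = 0;
    // Returns the current token, or an empty string at the end of input.
    std::string Peek() const { return next < tokens.size() ? tokens[next] : ""; }
    void Advance() { if (next < tokens.size()) ++next; }
};

bool IsVar(const std::string& t) {
    return t.size() == 1 && t[0] >= 'A' && t[0] <= 'E';
}

// The scanner only produces number tokens that start with a digit.
bool IsNumber(const std::string& t) {
    return !t.empty() && t[0] >= '0' && t[0] <= '9';
}

// <declaration> ::= <type> <var> ';' | <type> <var> '=' <number> ';'
// Returns true if a declaration was recognized and consumed.
bool Declaration(TokenStream& ts) {
    if (ts.Peek() != "int" && ts.Peek() != "float") return false;
    ts.Advance();
    if (!IsVar(ts.Peek())) return false;
    ts.Advance();
    if (ts.Peek() == "=") {          // optional initializer
        ts.Advance();
        if (!IsNumber(ts.Peek())) return false;
        ts.Advance();
    }
    if (ts.Peek() != ";") return false;
    ts.Advance();
    return true;
}
```

The recognizer never inspects individual characters; recognizing the keywords is entirely the scanner's job.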

An example implementation of “GetToken” is given at the end of this document. “GetToken” assumes
that the C string “Token” is defined globally and that “in” is the global input stream opened on the
input file. “GetToken” reads the input program line by line (the maximum length of a line is defined
by the global constant “MAX_LINE_LEN”), stores each line in a global buffer called “line”, and
extracts tokens from that buffer. When it reaches the end of a line and the file has not been read
to its end, “GetToken” reads in the next line; otherwise it does nothing.

// GetToken assumes the following globals, defined elsewhere:
//   MAX_LINE_LEN  constant, the maximum length of an input line
//   line          char buffer of size MAX_LINE_LEN, initially an empty string
//   linep         int, the read pointer into "line", initially 0
//   Token         char buffer that receives the token
//   in            input stream open for the input file
void GetToken() {
    int tokenp = 0;
    bool gotit = false;

    while (!gotit) {
        // strip off leading spaces
        // linep is the pointer to "line" buffer,
        // and it is defined as a global variable
        while (linep < MAX_LINE_LEN &&
               (line[linep] == ' ' || line[linep] == '\t')) {
            linep++;
        }

        if (linep >= MAX_LINE_LEN || line[linep] == '\0' || line[linep] == '\n') {
            // End of the current line: read in a new line,
            // or do nothing if the end of the file has been reached
            if (!in.getline(line, MAX_LINE_LEN))
                return;
            linep = 0;
        } else {
            Token[tokenp] = line[linep]; linep++;
            switch (Token[tokenp]) {
            case ';':
            case 'A':
            case 'B':
            case 'C':
            case 'D':
            case 'E':
            case '(':
            case ')':
            case '*':
            case '/':
            case '+':
            case '-': Token[++tokenp] = '\0'; break;
            case '=':
                if (linep >= MAX_LINE_LEN)
                    // return "="
                    Token[++tokenp] = '\0';
                else if (line[linep] == '=') {
                    // return "=="
                    tokenp++;
                    Token[tokenp] = line[linep];
                    linep++;
                    Token[++tokenp] = '\0';
                } else
                    // return "="
                    Token[++tokenp] = '\0';
                break;
            case 'i':
                if (linep >= MAX_LINE_LEN)
                    // error, return whatever we have
                    Token[++tokenp] = '\0';
                else if (line[linep] == 'f') {
                    // return "if"
                    tokenp++;
                    Token[tokenp] = line[linep];
                    linep++;
                    Token[++tokenp] = '\0';
                } else if (line[linep] == 'n') {
                    // expect an "int"
                    tokenp++;
                    Token[tokenp] = line[linep];
                    linep++;
                    if (linep >= MAX_LINE_LEN)
                        // error, return whatever we have
                        Token[++tokenp] = '\0';
                    else if (line[linep] == 't') {
                        // return "int"
                        tokenp++;
                        Token[tokenp] = line[linep];
                        linep++;
                        Token[++tokenp] = '\0';
                    } else
                        // error, return whatever we have
                        Token[++tokenp] = '\0';
                } else
                    // error, return whatever we have
                    Token[++tokenp] = '\0';
                break;
            case 'f':
                // expect a "float"
                // code is omitted here
                break;
            case 'e':
                // expect an "else"
                // code is omitted here
                break;
            case 'w':
                // expect a "while"
                // code is omitted here
                break;
            case '!':
                // expect a "!="
                // code is omitted here
                break;
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                // read in a number
                while (linep < MAX_LINE_LEN &&
                       line[linep] >= '0' && line[linep] <= '9') {
                    tokenp++;
                    Token[tokenp] = line[linep];
                    linep++;
                }
                if (linep < MAX_LINE_LEN && line[linep] == '.') {
                    // a floating point number
                    tokenp++;
                    Token[tokenp] = line[linep];
                    linep++;
                    while (linep < MAX_LINE_LEN &&
                           line[linep] >= '0' &&
                           line[linep] <= '9') {
                        tokenp++;
                        Token[tokenp] = line[linep];
                        linep++;
                    }
                }
                Token[++tokenp] = '\0';
                break;
            default:
                // unrecognized character: error, return whatever we have
                Token[++tokenp] = '\0';
                break;
            }
            gotit = true;
        }
    }
}

								