top down parsers

					CIS324: Language Design and Implementation



                     A Simple Language Compiler III
1. Basic Approaches to Parsing
Parsing is the process of determining if a given string of tokens can be
generated by a grammar.

Parsers are constructed from grammars. Given a grammar for a
programming language, we aim to build a parser that runs fast.
The most widely used parsers can be classified into two groups:
 - top-down parsers: build the parse tree from the root down to the leaves
 - bottom-up parsers: build the parse tree from the leaves up to the root
1.1 Top-Down Parsing
A top-down parser works according to the following algorithm:
    - install the root node
    - while scanning the input do
         - at a node labeled with a nonterminal, select one of the
           productions for this nonterminal and construct children
           nodes for all the symbols on the right-hand side of the production
         - select the next node among the children to expand, and continue
Example:
    T → S | array [ S ] of T
    S → int | num..num
The steps of the top-down parser when processing the string:
 array [ num..num ] of int
(while scanning the input from left to right) are:
Step 1. Install the root node T.

    Input string   array [ num .. num ] of int

    Parse tree     T

Step 2. Expand T with the production T → array [ S ] of T.

                              T
         _____________________|_____________________
         |       |        |        |       |       |
       array     [        S        ]      of       T

Step 3. Match the tokens array and [ against the input, then expand S
        with the production S → num..num.

                              T
         _____________________|_____________________
         |       |        |        |       |       |
       array     [        S        ]      of       T
                       ___|___
                      |   |   |
                     num ..  num

Step 4. Match the tokens num, .., num, ] and of against the input.

Step 5. Expand the remaining T with T → S, expand that S with S → int,
        and match int. The input is exhausted and the parse tree is
        complete:

                              T
         _____________________|_____________________
         |       |        |        |       |       |
       array     [        S        ]      of       T
                       ___|___                     |
                      |   |   |                    S
                     num ..  num                   |
                                                  int
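The steps above can be sketched in code. The following is a minimal illustration, not part of the original notes: the names Node, Parser, parse_T and parse_S are hypothetical, and the input is assumed to be already split into tokens. Each nonterminal gets one method that picks a production by inspecting the next token and then builds children for every symbol on the right-hand side.

```python
# Sketch of a top-down parser for the example grammar:
#   T -> S | array [ S ] of T
#   S -> int | num .. num
# Node, Parser, parse_T and parse_S are illustrative names (assumptions).

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def leaves(self):
        # Terminals at the frontier of the tree, read left to right.
        if not self.children:
            return [self.label]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        # Consume one token and return a leaf node for it.
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1
        return Node(expected)

    def parse_T(self):
        if self.peek() == "array":      # T -> array [ S ] of T
            children = [self.match("array"), self.match("["), self.parse_S(),
                        self.match("]"), self.match("of"), self.parse_T()]
        else:                           # T -> S
            children = [self.parse_S()]
        return Node("T", children)

    def parse_S(self):
        if self.peek() == "int":        # S -> int
            children = [self.match("int")]
        else:                           # S -> num .. num
            children = [self.match("num"), self.match(".."), self.match("num")]
        return Node("S", children)


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.parse_T()             # install the root node T and expand it
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


tree = parse("array [ num .. num ] of int".split())
print(tree.leaves())
```

Reading the leaves of the resulting tree from left to right gives back the input string, which is a quick check that the tree really derives it.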
1.2 Predictive Parsing
A predictive parser is a program with one procedure for every
nonterminal in the grammar. Each procedure does the following:
- it decides which production to use by looking at the lookahead symbol
   (if two right sides of the same nonterminal can be selected by the
   same lookahead symbol, there is a conflict, and this parsing method
   cannot be used on the grammar);
- it then uses the chosen production by mimicking its right side,
  symbol by symbol.
       A nonterminal results in a call to the procedure for that
       nonterminal, and a token matching the lookahead symbol
       results in the next input token being read.
       If at some point the token in the production does not match the
       lookahead symbol, an error is declared.
Example: Consider the following translation scheme that converts
         arithmetic expressions into postfix form (here including only
         digits separated by plus and minus signs):


E → E + T    {print( ‘+’ )}
E → E - T    {print( ‘-’ )}
E → T
T → 0        {print( ‘0’ )}
T → 1        {print( ‘1’ )}
...
T → 9        {print( ‘9’ )}
Since the above grammar is left-recursive, and a predictive parser cannot handle
such grammars, we eliminate the left-recursion and rewrite it as follows:

E → T R
R → + T {print( ‘+’ )} R | - T {print( ‘-’ )} R | ε
T → 0   {print( ‘0’ )}
T → 1   {print( ‘1’ )}
...
T → 9   {print( ‘9’ )}
2. Building a Symbol Table

The information about source language constructs, collected during the
lexical and syntax analysis phases of compilation, is stored in a special
data structure called a symbol table.

This information includes the characters forming each identifier, the
type of the identifier, its usage, etc.

This information is later used in semantic analysis, to check the validity
of the language constructs with respect to the rules of the language, and
in code generation, to produce proper code for the source program.
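As a rough illustration, a symbol table can be sketched as a dictionary keyed by the characters of each identifier. Everything in this sketch is an assumption made for the example: the class name SymbolTable, the methods insert, lookup and record_use, and the choice of fields (token class, type, and a use counter standing in for "usage"). Reserved keywords are preloaded so that the lexical analyser can tell them apart from ordinary identifiers.

```python
# Sketch of a symbol table; all names and fields are illustrative assumptions.

class SymbolTable:
    def __init__(self, keywords=()):
        self._entries = {}
        # Preload reserved keywords so lookups classify them correctly.
        for word in keywords:
            self.insert(word, token="keyword")

    def lookup(self, lexeme):
        # Return the entry for this lexeme, or None if it is unknown.
        return self._entries.get(lexeme)

    def insert(self, lexeme, token="id", type_=None):
        # Add a new entry unless one already exists; return the entry.
        entry = self._entries.get(lexeme)
        if entry is None:
            entry = {"lexeme": lexeme, "token": token,
                     "type": type_, "uses": 0}
            self._entries[lexeme] = entry
        return entry

    def record_use(self, lexeme):
        # Count one more occurrence of a known lexeme in the source program.
        entry = self._entries[lexeme]
        entry["uses"] += 1
        return entry


table = SymbolTable(keywords=("div", "mod"))
table.insert("count", type_="int")
table.record_use("count")
print(table.lookup("count"))
print(table.lookup("div")["token"])
```

A later phase can then ask the table for the type of an identifier when checking an expression, or for its entry when emitting code.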
3. Interpretation with Stack Machines
A compiler can generate intermediate code for an abstract stack machine.
An abstract stack machine has separate instruction memory and
data memory, and all arithmetic operations are executed on the
values available on the stack.
The stack instructions typically include:
 - integer arithmetic instructions: simulate evaluation of postfix
   expressions using the stack;
 - stack manipulation instructions: move values between the stack and
   the data memory;
 - control flow instructions: perform conditional and unconditional jumps.
Consider the expression ( 1 + 3 ) * 5. The configurations of the
abstract stack machine that executes it are as follows (the top of the
stack is on the right):

   Instructions        Stack        Data
   push 1              1
   push 3              1 3
   add                 4
   push 5              4 5
   mul                 20
   mov a                            a = 20
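The execution shown in the table can be simulated with a few lines of code. This is a minimal sketch, assuming that instructions are given as text lines and that mov pops the top of the stack into the named data location (the notes do not define mov precisely). The function name run and the trace it returns are illustrative.

```python
# Sketch of an abstract stack machine for the instructions in the table.
# Assumption: "mov x" pops the top of the stack into data location x.

def run(program):
    stack = []   # evaluation stack, top at the end of the list
    data = {}    # data memory, addressed by name
    trace = []   # (instruction, stack contents after executing it)
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            stack.append(int(args[0]))
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "mov":
            data[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {instr!r}")
        trace.append((instr, list(stack)))
    return stack, data, trace


stack, data, trace = run(["push 1", "push 3", "add", "push 5", "mul", "mov a"])
for instr, contents in trace:
    print(f"{instr:10} {contents}")
print(data)  # prints {'a': 20}
```

The sequence push 1, push 3, add, push 5, mul adds first and multiplies second, so it leaves 20 on the stack before the value is moved to a.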
Consider the assignment:
d = ( 13 * y ) div 3 + ( 2 * m + 1 ) * 7
The stack machine program for it is:

    Instructions
    lvalue          d
    push            13
    rvalue          y
    mul
    push            3
    div
    push            2
    rvalue          m
    mul
    push            1
    add
    push            7
    mul
    add
    sto
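To see how l-values and r-values interact, the program can be run on a small simulator. The sketch below rests on several assumptions that go beyond the notes: lvalue pushes the address of a variable (represented here by its name), rvalue pushes its current contents, sto pops a value and an address and stores the value at that address, and div is integer division on non-negative operands. The program passed to it follows the listing, with an add just before sto so that the two terms of the sum are combined before the store. The function name execute and the sample values of y and m are illustrative.

```python
# Sketch of a stack machine with lvalue / rvalue / sto.
# Assumptions: a variable's name stands for its address, and sto pops
# the value first and the target address second.
import operator

BINARY = {
    "add": operator.add,
    "mul": operator.mul,
    "div": operator.floordiv,   # integer division
}


def execute(program, data):
    data = dict(data)           # work on a copy of the data memory
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            stack.append(int(args[0]))
        elif op == "rvalue":
            stack.append(data[args[0]])   # contents of the variable
        elif op == "lvalue":
            stack.append(args[0])         # address of the variable
        elif op in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[op](left, right))
        elif op == "sto":
            value = stack.pop()
            address = stack.pop()
            data[address] = value
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return data


program = [
    "lvalue d",
    "push 13", "rvalue y", "mul", "push 3", "div",
    "push 2", "rvalue m", "mul", "push 1", "add", "push 7", "mul",
    "add",
    "sto",
]
result = execute(program, {"y": 2, "m": 3})
print(result["d"])  # prints 57
```

With y = 2 and m = 3 the first term is (13 * 2) div 3 = 8 and the second is (2 * 3 + 1) * 7 = 49, so d receives 57.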
4. Grouping the Phases of a Compiler
Lexical Analysis
1. Removal of White Space and Comments
2. Constants
3. Recognizing Identifiers and Keywords
4. Interface to the Lexical Analyser
5. A Lexical Analyser

Building a Symbol Table
1. Symbol Table Interface
2. Handling Reserved Keywords
3. A Symbol Table Implementation

Stack-Based Representation of Intermediate Code
1. Stack Machine Architecture: instruction memory, data memory and a stack
2. Instructions: integer arithmetic, stack manipulation and control flow
3. L-values and R-values
4. Stack Manipulation
5. Translation of Expressions
6. Control Flow
7. Translation of Statements

				