C# Document by worldofstaksb


COMPILER CONSTRUCTION
Tutorial 2

LABS
Lab 2 Symbol table
Lab 3 LR parsing and abstract syntax tree construction using "bison"
Lab 4 Semantic analysis (type checking)

TDDB44 Compiler Construction
Tutorial 2




PHASES OF A COMPILER
Source Program
   -> Lexical Analysis
   -> Syntax Analyser               (Lab 3 Parser: manages syntactic analysis, builds internal form)
   -> Semantic Analyzer             (Lab 4 Semantics: checks static semantics)
   -> Intermediate Code Generator
   -> Code Optimizer
   -> Code Generator
   -> Target Program
Supporting components:
   • Symbol Table Manager           (Lab 2 Symtab: administrates the symbol table)
   • Error Handler

LAB 2
THE SYMBOL TABLE

SYMBOL TABLES
A symbol table contains all the information that must be passed between different phases of a compiler/interpreter.

A symbol (or token) has at least the following attributes:
   • Symbol Name
   • Symbol Type (int, real, char, ...)
   • Symbol Class (static, automatic, cons...)

SYMBOL TABLES
In a compiler we also need:
   • Address (where is the info stored?)
   • Other info due to the data structures used

Symbol tables are typically implemented using hashing schemes because efficient lookup is needed.
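
The attributes and the hashing scheme can be sketched in C as follows. This is only an illustration, not the lab skeleton: the names symbol, sym_enter and sym_lookup, the table sizes and the hash function are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 64    /* assumed number of hash buckets */
#define MAX_SYMS   128   /* assumed capacity of the symbol array */

typedef enum { TYPE_INT, TYPE_REAL, TYPE_CHAR } sym_type;
typedef enum { CLASS_STATIC, CLASS_AUTOMATIC } sym_class;

typedef struct {
    const char *name;     /* symbol name */
    sym_type    type;     /* symbol type */
    sym_class   cls;      /* symbol class */
    int         address;  /* where the symbol is stored */
    int         next;     /* hash link: next entry in the same bucket, or -1 */
} symbol;

static symbol syms[MAX_SYMS];
static int    sym_count = 0;
static int    buckets[TABLE_SIZE];
static int    initialized = 0;

/* Hypothetical string hash; any reasonable hash function would do. */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

static void init_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++) buckets[i] = -1;
    sym_count = 0;
    initialized = 1;
}

/* Returns the index of the entry for name, or -1 if it is not in the table. */
int sym_lookup(const char *name) {
    assert(name != NULL);
    if (!initialized) init_table();
    for (int i = buckets[hash(name)]; i != -1; i = syms[i].next)
        if (strcmp(syms[i].name, name) == 0) return i;
    return -1;
}

/* Enters a symbol at the head of its bucket; returns its index, or -1 if full. */
int sym_enter(const char *name, sym_type type, sym_class cls, int address) {
    assert(name != NULL);
    if (!initialized) init_table();
    if (sym_count >= MAX_SYMS) return -1;
    unsigned h = hash(name);
    syms[sym_count] = (symbol){ name, type, cls, address, buckets[h] };
    buckets[h] = sym_count;
    return sym_count++;
}
```

A lookup only walks the entries chained in one bucket, which is why hashing gives the fast lookup asked for above.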







SIMPLE SYMBOL TABLES
We classify symbol tables as:
  • Simple
  • Scoped

Simple symbol tables have...
  • only one scope
  • only "global" variables

Simple symbol tables may be found in BASIC and FORTRAN compilers.

SCOPED SYMBOL TABLES
The complication with simple tables arises in languages that permit multiple scopes.
C permits, at the simplest level, two scopes: global and local (it is also possible to have nested scopes in C).


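WHY SCOPES?
The importance of considering scopes is shown in these two C programs.

Program 1: changeA() assigns to its own local a, so the global a is unchanged and main prints 10.

     #include <stdio.h>
     int a = 10;               /* global variable */
     void changeA(void) {
        int a;                 /* local variable, hides the global a */
        a = 5;
     }
     int main(void) {
        changeA();
        printf("Value of a=%d\n", a);
        return 0;
     }

Program 2: changeA() has no local a, so it assigns to the global a and main prints 5.

     #include <stdio.h>
     int a = 10;               /* global variable */
     void changeA(void) {
        a = 5;
     }
     int main(void) {
        changeA();
        printf("Value of a=%d\n", a);
        return 0;
     }

SCOPED SYMBOL TABLES
Operations that must be supported by the symbol table in order to handle scoping:
  • Lookup in any scope: search the most recently created scope first
  • Enter a new symbol in the symbol table
  • Modify information about a symbol in a "visible" scope
  • Create a new scope
  • Delete the most recently created scope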
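
The five operations can be sketched in C as below. This is a simplified illustration under stated assumptions: the names open_scope, close_scope, enter and lookup are hypothetical, symbols carry a single int as a stand-in for their real attributes, and lookup is a linear search from the newest symbol backwards rather than a hashed one.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 256   /* assumed capacity */
#define MAX_SCOPES  32    /* assumed maximum nesting depth */

typedef struct {
    const char *name;
    int         value;    /* stand-in for type, class, address, ... */
} scoped_symbol;

static scoped_symbol table[MAX_SYMBOLS];
static int sym_pos = 0;               /* next free slot in table */
static int block_table[MAX_SCOPES];   /* block_table[i]: first slot of scope i */
static int current_scope = 0;         /* scope 0 is the global scope, starting at slot 0 */

/* Create a new scope: remember where its symbols start. */
void open_scope(void) {
    assert(current_scope + 1 < MAX_SCOPES);
    block_table[++current_scope] = sym_pos;
}

/* Delete the most recently created scope together with its symbols. */
void close_scope(void) {
    assert(current_scope > 0);
    sym_pos = block_table[current_scope--];
}

/* Lookup in any visible scope, most recently created scope first. */
scoped_symbol *lookup(const char *name) {
    for (int i = sym_pos - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0) return &table[i];
    return NULL;
}

/* Enter a symbol in the current scope.
   Returns NULL if it is already declared in this scope or the table is full. */
scoped_symbol *enter(const char *name, int value) {
    for (int i = block_table[current_scope]; i < sym_pos; i++)
        if (strcmp(table[i].name, name) == 0) return NULL;
    if (sym_pos >= MAX_SYMBOLS) return NULL;
    table[sym_pos].name = name;
    table[sym_pos].value = value;
    return &table[sym_pos++];
}
```

Modifying a symbol is then just a write through the pointer that lookup returns, for example lookup("a")->value = 6.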






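HOW IT WORKS
[Figure: symbol table layout. A hash table (with example buckets "READ, REAL", "A, WRITE", "P1" and "INTEGER") refers to symbol table entries whose fields are: index to string table, other info, hash link. A block table holds one sym_pos per block. The string table stores the names one after another (INTEGER REAL READ WRITE), with poolpos as an index into it.]

LAB 3
PARSING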



SYNTAX ANALYSIS
The parser accepts tokens from the scanner and verifies the syntactic correctness of the program specification.

Along the way, it also derives information about the program and builds a fundamental data structure known as the parse tree.

The parse tree is an internal representation of the program, and it also augments the symbol table.

PURPOSE
1. Verify the syntactic correctness of the input token stream, reporting any errors
2. Produce a parse tree and certain tables for use by later phases

•Syntactic correctness is judged by verification against a formal grammar which specifies the language to be recognized
•Error messages are important and should be as meaningful as possible
•Parse tree and tables will vary depending on compiler implementation technique and source language






METHOD
Match the token stream using a manually or automatically generated parser

PARSING STRATEGIES
Two categories of parsers:
   •Top-down parsers
   •Bottom-up parsers
Within each of these broad categories are a number of sub-strategies, depending on whether leftmost or rightmost derivations are used

TOP-DOWN PARSING
Start with a goal symbol and recognize it in terms of its constituent symbols
Example: recognize a procedure in terms of its sub-components (header, declarations, and body)
The parse tree is then built from the top (root) and down (leaves), hence the name

BOTTOM-UP PARSING
Recognize the components of a program and then combine them to form more complex constructs until a whole program is recognized
Example: recognize a procedure from its sub-components (header, declarations, and body)
The parse tree is then built from the bottom and up, hence the name




PARSING TECHNIQUES
A number of different parsing techniques are commonly used for syntax analysis, including:
  •Recursive-descent parsing
  •LR parsing
  •Operator precedence parsing
  •Many more...

LR PARSING
A specific bottom-up technique
  •LR stands for Left-to-right scan, Rightmost derivation (constructed in reverse)
  •Probably the most common & popular parsing technique
  •YACC, BISON, and many other parser generation tools utilize LR parsing
  •Great for machines, not so cool for humans...

+ AND - OF LR
Advantages of LR:
  •Accepts a wide range of grammars/languages
  •Well suited for automatic parser generation
  •Very fast
  •Generally easy to maintain
Disadvantages of LR:
  •Error handling can be tricky
  •Difficult to use manually

BISON AND YACC USAGE
Bison is a general-purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar





BISON AND YACC USAGE
One of many parser generator packages

Yet Another Compiler Compiler
  •Really a poor name; it is more of a parser compiler
  •Can specify actions to be performed when each construct is recognized and thereby make a full-fledged compiler, but it is the user of Bison who specifies the rest of the compilation process...
  •Designed to work with FLEX or other automatically or hand-generated "lexers"

BISON USAGE
  Bison source program (parser.y)  ->  Bison Compiler  ->  y.tab.c
  y.tab.c                          ->  C Compiler      ->  a.out
  Token stream                     ->  a.out           ->  Parse tree



BISON SPECIFICATION
A Bison specification is composed of 4 parts
      %{
      C declarations
      %}
      Bison declarations
      %%
      Grammar rules
      %%
      Additional C code
      Comments enclosed in `/* ... */' may appear in any of the sections
Looks like a Flex specification, doesn't it? Similar function, tools, look and feel

C DECLARATIONS
•Contains macro definitions and declarations of functions and variables that are used in the actions in the grammar rules
•Copied to the beginning of the parser file so that they precede the definition of yyparse
•Use #include to get the declarations from a header file. If no C declarations are needed, the %{ and %} delimiters that bracket this section may be omitted




BISON DECLARATIONS
Contains declarations that define terminal and nonterminal symbols, and specify precedence

GRAMMAR RULES
•Contains one or more Bison grammar rules, and nothing else
•There must always be at least one grammar rule, and the first `%%' (which precedes the grammar rules) may never be omitted even if it is the first thing in the file


ADDITIONAL C CODE
•Copied verbatim to the end of the parser file, just as the C declarations section is copied to the beginning
•This is the most convenient place to put anything that should be in the parser file but isn't needed before the definition of yyparse
•The definitions of yylex and yyerror often go here

BISON EXAMPLE
%{
#include <ctype.h>   /* standard C declarations here */
#include <stdio.h>   /* needed for printf in the actions */
int yylex(void);     /* defined in the additional C code section */
%}
%token DIGIT         /* BISON declarations */
%%
/* Grammar rules */
line   : expr '\n'          { printf("%d\n", $1); }
       ;
expr   : expr '+' term      { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor    { $$ = $1 * $3; }
       | factor
       ;





BISON EXAMPLE (continued)
factor : '(' expr ')'       { $$ = $2; }
       | DIGIT
       ;
%%
/* Additional C code */
int yylex(void) {   /* A really simple lexical analyzer */
    int c;
    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';   /* semantic value of the token */
        return DIGIT;
    }
    return c;               /* any other character is its own token */
}

Note: Bison uses yylex, yylval, etc. It is designed to be used with FLEX

USING BISON WITH FLEX
Bison and Flex are designed to work together
  •Flex produces the scanner function yylex(), which the Bison-generated parser yyparse() calls to fetch each token (the lex library -ll supplies default helpers such as yywrap())
     #include "lex.yy.c" in the third part of the Bison specification
     this gives yylex access to Bison's token names

USING BISON WITH FLEX
•Thus do the following:
   % flex scanner.l
   % bison -y parser.y
   % cc y.tab.c -ly -ll
   (the -y option makes Bison name its output y.tab.c, as Yacc does; without it the output is parser.tab.c)
•This will produce an a.out which is a parser with an integrated scanner

ERROR HANDLING IN BISON
Error handling in Bison is provided by error productions
An error production has the general form
   non-terminal: error synchronizing-set
      •non-terminal: where did it occur
      •error: a keyword
      •synchronizing-set: a possibly empty subset of tokens
When an error occurs, Bison pops symbols off the stack until it finds a state for which there exists an error production which may be applied






LAB 4
SEMANTICS

PURPOSE
To verify the semantic correctness of the program represented by the parse tree, reporting any errors, and possibly to produce an intermediate form and certain tables for use by later compiler phases
   -Semantic correctness: the program adheres to the rules of the type system defined for the language (plus some other rules)
   -Error messages should be as meaningful as possible
   -In this phase, there is sufficient information to be able to generate a number of tables of semantic information: identifier, type and literal tables




METHOD
Ad hoc confirmation of semantic rules

IMPLEMENTATION
Semantic analyzer implementations are typically syntax directed
More formally, such techniques are based on attribute grammars
In practice, the evaluation of the attributes is done manually






MATHEMATICAL CHECKS
Divide by zero
  Zero must be compile-time determinable: a constant zero, or an expression which symbolically evaluates to zero at runtime
Overflow
  Constant which exceeds representation of target machine language
  Arithmetic which obviously leads to overflow
Underflow
  Same as for overflow

UNIQUENESS CHECKS
In certain situations it is important that particular constructs occur only once
Declarations
  within any given scope, each identifier must be declared only once
Case statements
  each case constant must occur only once in the "switch"
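
A uniqueness check for case constants could look like the C sketch below. It assumes the constants of one switch have already been collected into an array; the function name first_duplicate_case is hypothetical, and the quadratic search is chosen only for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first case constant that repeats an earlier one,
   or -1 if all n constants are unique. */
int first_duplicate_case(const int *constants, size_t n) {
    assert(constants != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (constants[i] == constants[j]) return (int)i;
    return -1;
}
```

Returning the position of the offending constant lets the compiler point its error message at the duplicate rather than at the switch as a whole.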


CONSISTENCY CHECKS
Sometimes it is also necessary to ensure that a symbol that occurs in one place occurs in others as well.
Such consistency checks are required whenever matching is required and what must be matched is not specified explicitly (i.e. as a terminal string) in the grammar
     This means that the check cannot be done by the parser

TYPE CHECKS
These checks form the bulk of semantic checking and certainly account for the majority of the overhead of this phase of compilation
In general the types across any given operator must be compatible
    The meaning of compatible may be:
       •the same
       •two different sizes of the same basic type
       •some other pre-defined compatibility








TYPE CHECKS
Must execute the same steps as for expression evaluation
         Effectively we are "executing" the expression at compile time for type information only

This is a bottom-up procedure in the parse tree
         We know the type of "things" at the leaves of a parse tree corresponding to an expression
         (associated types stored in literal table for literals and symbol table for identifiers)

When we encounter a parse tree node corresponding to some operator, if the operand sub-trees are leaves we know their type and can check that the types are valid for the given operator.

Type Checking
[Figure: parse tree for X + Y * Z annotated with types. Symbol table: X | INT, Y | INT, Z | REAL. Leaves: X int, Y int, Z real. The "*" node has type real, and the "+" node has type real.]
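
The bottom-up procedure can be sketched in C as follows. The node layout, the names check and combine, and the compatibility rule (int with int gives int, any mix with real gives real) are assumptions for illustration; the actual rules depend on the language being compiled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_INT, T_REAL, T_ERROR } type_t;

typedef struct node {
    char         op;     /* '+' or '*' for operators, 0 for a leaf */
    type_t       type;   /* leaf: type from the symbol table; operator: set by check() */
    struct node *left, *right;
} node;

/* Assumed compatibility rule: int op int gives int, mixing int and real gives real. */
static type_t combine(type_t a, type_t b) {
    if (a == T_ERROR || b == T_ERROR) return T_ERROR;
    return (a == T_REAL || b == T_REAL) ? T_REAL : T_INT;
}

/* Computes the type of every node bottom-up and returns the type of the root. */
type_t check(node *n) {
    assert(n != NULL);
    if (n->op == 0) return n->type;   /* leaf: type is already known */
    n->type = combine(check(n->left), check(n->right));
    return n->type;
}
```

For the tree X + Y * Z with X and Y of type int and Z of type real, check first gives the "*" node the type real and then the "+" node the type real, which matches the annotated tree in the figure.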


