Compiler Construction

					    Chapter 6: Semantic Analysis




    (Static) Semantic Analyzer

    ==> Semantic Structure
    - What is the program supposed to do?
    - Semantic analysis can be done during the syntax analysis
      phase or the final code generation phase.
    - Typical static semantic features include declarations and
      type checking.
    - The information (attributes) gathered can be either added to
      the tree as annotations or entered into the symbol table.



       Output of the semantic analyzer – annotated AST




    [Figure: annotated AST; the example involves an array with subscripts from a range]

    Two Categories of Semantic Analysis


    1.   The analysis of a program to meet the
         definition of the programming language.

    2.   The analysis of a program to enhance the
         efficiency of execution of the translated
         program.



    Semantic Analysis Process

    includes formally:
    - description of the analyses to perform
    - implementation of the analysis (translation of
      the description) that may use appropriate
      algorithms.




    Description of Semantic Analysis

    1.    Identify attributes (properties) of language
          (syntactic) entities.

    2.    Write attribute equations (or semantic rules) that
          express how the computation of such attributes
          is related to the grammar rules of the language.

         Such a set of attributes and equations is called an
         attribute grammar.

    Syntax-directed semantics


    -   The semantic content of a program is closely
        related to its syntax.

    -   All modern languages have this property.




        Attributes

    - An attribute is any property of a programming
      language construct.
    - Typical examples of attributes are:
       the data type of a variable, the value of an
      expression, the location of a variable in memory,
      the object code of a procedure, the number of
      significant digits in a number.
    - An attribute corresponds to the name of a field of a
      structure.

         Attribute Grammars

        In syntax-directed semantics, attributes are
         associated with grammar symbols of the
         language. That is, if X is a grammar symbol and
         a is an attribute associated to X, then we write
         X.a for the value of a associated to X.
        For each grammar rule X0-> X1 X2 …Xn the
         values of the attributes Xi.aj of each grammar
         symbol Xi are related to the values of the
         attributes of other grammar symbols in the rule.
         That is, each relationship is specified by an
         attribute equation or semantic rule of the form:

     Xi.aj = fij(X0.a1, ..., X0.ak, X1.a1, ..., X1.ak, ..., Xn.a1, ..., Xn.ak)

        An attribute grammar for the attributes a1,…,ak is
         the collection of all such attribute equations
         (semantic rules), for all the grammar rules of the
         language.


     Note: number.val must be computed prior to factor.val.




     Attribute grammars may involve several
     interdependent attributes.




     e.g., 345o (octal) and 128d (decimal) are valid based numbers;
     128o is an error (x), since 8 is not an octal digit.

     Attribute grammars may be defined for
     different purposes.




    term1.tree = mkOpNode(*, term2.tree, factor.tree)
    factor.tree = mkNumNode(number.lexval)

    Syntax tree for (34 - 3) * 42:

              *
            /   \
           -     42
          / \
        34   3

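The two semantic rules above (mkOpNode and mkNumNode) can be sketched in Python. This is only an illustrative sketch: the class names NumNode and OpNode and the helper show() are assumptions made for this example, not part of the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class NumNode:
    value: int          # number.lexval


@dataclass
class OpNode:
    op: str
    left: "Tree"
    right: "Tree"


Tree = Union[NumNode, OpNode]


def mk_num_node(lexval: int) -> NumNode:
    """factor.tree = mkNumNode(number.lexval)"""
    return NumNode(lexval)


def mk_op_node(op: str, left: Tree, right: Tree) -> OpNode:
    """term1.tree = mkOpNode(op, term2.tree, factor.tree)"""
    return OpNode(op, left, right)


def show(t: Tree) -> str:
    """Hypothetical helper: print a tree fully parenthesized."""
    if isinstance(t, NumNode):
        return str(t.value)
    return f"({show(t.left)} {t.op} {show(t.right)})"


# (34 - 3) * 42, built bottom-up in the order the rules would fire
tree = mk_op_node("*",
                  mk_op_node("-", mk_num_node(34), mk_num_node(3)),
                  mk_num_node(42))
```

Because every tree attribute is computed only from the attributes of the children, the tree can be built during a single bottom-up pass.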
     Algorithms for attribute computation

        Dependency graph and evaluation order




 Attribute grammar for simple C-like variable declarations

 Grammar Rules                  Semantic Rules

 decl -> type var-list          var-list.dtype = type.dtype
 type -> int                    type.dtype = integer
 type -> float                  type.dtype = real
 var-list1 -> id , var-list2    id.dtype = var-list1.dtype
                                var-list2.dtype = var-list1.dtype
 var-list -> id                 id.dtype = var-list.dtype

Parse tree for the string  float x , y  (with dtype attributes):

                      decl
                    /      \
               type          var-list
          (dtype = real)   (dtype = real)
                |          /     |      \
              float      id      ,     var-list
                        (x)          (dtype = real)
                   (dtype = real)         |
                                          id
                                         (y)
                                    (dtype = real)

Dependencies among the attributes in the parse tree for  float x , y:
  type.dtype -> var-list.dtype -> id(x).dtype
  var-list.dtype -> (nested) var-list.dtype -> id(y).dtype
(The dependency of type.dtype on the token float is trivial.)
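The way dtype flows down the parse tree can be sketched as a small recursive function. This is a minimal sketch under stated assumptions: the function name declare and the dictionary standing in for the symbol table are hypothetical and used only for illustration.

```python
def declare(type_token, ids):
    """Evaluate the dtype attribute for a declaration 'type id , id , ...'."""
    # type -> int | float        type.dtype = integer | real
    dtype = {"int": "integer", "float": "real"}[type_token]
    table = {}

    def var_list(names, inherited_dtype):
        # id.dtype = var-list.dtype
        table[names[0]] = inherited_dtype
        if len(names) > 1:
            # var-list2.dtype = var-list1.dtype (passed down unchanged)
            var_list(names[1:], inherited_dtype)

    # decl -> type var-list      var-list.dtype = type.dtype
    var_list(ids, dtype)
    return table


result = declare("float", ["x", "y"])
```

The attribute is handed from parent to child (and from sibling to sibling), never upward, which is what makes dtype an inherited attribute.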
Note: base is computed in preorder and val in postorder.
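A hedged sketch of this evaluation order for based numbers such as 345o and 128d follows. It assumes the usual based-number grammar (a digit string followed by o for octal or d for decimal); the grammar itself is not reproduced in these slides, and the function names are hypothetical.

```python
def num_val(digits, base):
    """num -> num digit | digit.

    base is inherited: it is known before the children are visited (preorder).
    val is synthesized: it is computed after the children return (postorder).
    Returns None if a digit is not legal in the given base.
    """
    if len(digits) == 1:
        d = int(digits)
        return d if d < base else None
    left = num_val(digits[:-1], base)      # visit the left child first
    d = int(digits[-1])
    if left is None or d >= base:
        return None                        # e.g. the digit 8 in an octal number
    return left * base + d                 # num1.val = num2.val * base + digit.val


def based_num_val(text):
    """based-num -> num basechar, where basechar is 'o' or 'd'."""
    base = {"o": 8, "d": 10}[text[-1]]     # basechar.base
    return num_val(text[:-1], base)
```

With this sketch, based_num_val("345o") gives 229 and based_num_val("128o") reports an error by returning None.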
     Synthesized Attributes

     An attribute a is synthesized if, given a grammar rule
     A -> X1 X2 …Xn, the only associated attribute equation
     with an a on the left-hand side is of the form:
        A.a = f (X1.a1 ,.., X1.ak ,.., Xn.a1 ,.., Xn.ak)
      e.g., E1 -> E2 + E3      {E1.val = E2.val + E3.val; }
      where E.val represents the attribute (numerical value
      obtained) for E
      An attribute grammar in which all the attributes are
      synthesized is called an S-attributed grammar.

     S-attributed
     grammar




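    Example of S-attributed rules (syntax-tree construction):
    term1.tree = mkOpNode(*, term2.tree, factor.tree)
    factor.tree = mkNumNode(number.lexval)
    For (34 - 3) * 42 the resulting tree has * at the root, with the
    subtree - (children 34 and 3) on the left and 42 on the right.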
     Inherited Attributes

        An attribute that is not synthesized is called
         an inherited attribute.




     (Computed by a preorder traversal, or by a combined preorder and
     inorder traversal.)
     Computation of Attributes During
     Parsing

        L-attributed grammars




     Computing Synthesized Attributes During
     LR Parsing


       -- LALR(1) parsers are primarily suited to
          handling synthesized attributes.
       -- Two stacks are required:
           a value stack and a parsing stack.

       E : E '+' E  { $$ = $1 + $3; }     /* Yacc specification */

       [Figure: the parsing stack and the value stack, side by side]

     Translation (Attribute Computation)


        A translation scheme is merely a context-free
         grammar in which a program fragment called
         semantic action is associated with each
         production.




     e.g. A -> XYZ { action }

     In a bottom-up parser, the semantic action is taken when XYZ
     is reduced to A. In a top-down parser, the action is taken
     when A, X, Y, or Z is expanded, whichever is appropriate.




     Semantic Action


     In addition to those stated before, the semantic
       action may also involve:
     1. the computation of values for variables
        belonging to the compiler.
     2. the generation of intermediate code.
     3. the printing of an error diagnostic.
     4. the placement of some values in the symbol
       table.
    Bottom-up Translation of S-attributed
    Grammars
- A bottom-up parser uses a stack to hold information about
   subtrees that have been parsed. We can use extra fields in
   the parser stack to hold the values of synthesized attributes.
  e.g. A -> XYZ {A.a = f (X.x, Y.y, Z.z)}

- Before reduction: the value of the attribute Z.z is in val [top],
   Y.y is in val [top-1], and X.x is in val [top-2].

   After reduction: top is decremented by 2, A.a is put in val
    [top]
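The value-stack handling during a reduction can be sketched as follows. This is a minimal sketch, assuming a Python list as the value stack with its top at the end; the name reduce_rule is hypothetical.

```python
def reduce_rule(val, n, f):
    """Reduce by a rule whose right-hand side has n symbols.

    The n attribute values on top of the value stack are popped and the
    value computed by the semantic rule f is pushed in their place, so the
    stack shrinks by n - 1 entries.
    """
    args = val[-n:]
    del val[-n:]
    val.append(f(*args))


# E : E '+' E { $$ = $1 + $3; }
val = [3, "+", 4]                      # $1, $2, $3 with $3 at val[top]
reduce_rule(val, 3, lambda e1, _plus, e3: e1 + e3)
```

For a right-hand side of three symbols the top index drops by 2, and the new top holds the attribute of the left-hand side.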
      For Special Conditions: Hooks
     stmt -> IF cond THEN stmt ELSE stmt
     ==>
      stmt -> IF cond THEN { action to emit the appropriate conditional
        jump } stmt ELSE { action to emit the appropriate unconditional
        jump } stmt
     Or
     hook1 -> /* empty */ { action to emit the appropriate conditional jump }
     hook2 -> /* empty */ { action to emit the appropriate unconditional jump }

     stmt -> IF cond THEN hook1 stmt ELSE hook2 stmt

     Symbol Table

        A symbol table consists of records that associate
         attributes with the various programmer-declared
         objects. The main attribute is the object's name (a
         string of characters, e.g. an identifier).
        Semantic actions put information into the symbol
         table or retrieve attributes from it.

     What kind of objects?

     1. variables
     2. components of a composition structure (i.e.,
       field names of structure)
     3. labels
     4. procedure and function name
     5. parameters for procedure and function
     6. files

     What attributes (attributes of objects)?

     1. name
     2. type (e.g. int, float, array, struct, a pointer to
       struct, etc.)
     3. location for variables and entry point
     4. value for named constant
     5. initial value for variable
     6. flag showing if it has been accessed.

     How is an attribute represented?

     1. Name strategy:
     (a) use a field of n char in the symbol table
       record to store the first (up to) n characters of
       the identifier.
     (b) use an auxiliary string table and store a
       pointer (e.g. 5) to the 1st char. of the identifier
       and the length ( e.g. 4) of the identifier in the
       "name" field of symbol table record.


     Scheme (a) is simpler and faster, and requires
     less programming.


     Scheme (b) allows arbitrarily long names and saves
     space, but requires more programming. Often a string
     table is already kept for string literals, so it can be
     reused with little extra programming.




     2. Type: Since types can be arbitrarily complex,
     they are best represented by a pointer to a
     linked data structure that reflects the structure
     of the type. (The type is mainly used to check
     semantic correctness and to compute offsets.)


     Def. The static scope of an occurrence of an
      identifier is the portion of the (source) program
      in which every other occurrence of the same
      identifier represents the same object.

       e.g. in Pascal, it is the procedure or function in
       which the identifier was declared, minus all sub-procedures
       and sub-functions in which it is redeclared.

     program P;
        procedure Q;          -----------------------------
          var x: real;                                     |
          procedure R;              -----                  |
            var x: integer;              |                 |
          begin                          | minus this      | scope of x
            x := ...                     |                 |
          end;                      -----                  |
        begin                                              |
          x := x + 1                                       |
        end;                  -----------------------------
     begin
     end.

      Symbol table mechanism ?


     1. What should be done when translating a
       declaration?
     2. What should be done when an identifier is
       referenced?
     3. What should be done at scope entry?
     4. What should be done at scope exit?

     Multiscope symbol table

     - A descriptor is a record that describes an object.
     - Its fields are called attributes.
     - Its key contains an identifier together with a
       context (a context is a block or a declaration,
       represented by a lexical number).

     e.g.   float x;                  -------
            struct y{ -----                  |
                   int x; |                  |
                   int z; | inner context | another context
                    .        |               |
                      } -----         --------



   Each context is associated with a number (#); an identifier x is
   paired with that # when it is looked up in the symbol table.
 - When we enter a context (at compile time, not run time) we give
   it a new # and push it onto the context stack (it becomes
   the current context).
 - When we exit a context we pop that # from the stack.
 - When resolving a reference to a simple identifier, say x,
   we pair it with the current context and look it up in
   the symbol table; if it is not there we try the next context
   on the stack, and so on, until it is found or we run out of contexts.
 - When declaring an object we allocate a descriptor for
   it, put the current context into its context field and the
   identifier into its identifier field, and then fill in the other
   attributes as appropriate. In the case of a record
   declaration:
       e.g.   float x;                  ------
              struct y { ----                  |
                     int x; |                   |
                     int z; | context #3        | context #2
                      .        |                 |
                        } ----         -------

      y has #2 in its context field and #3 in its internal-context
        field; its field x has #3 in its context field. Look up y
        using context #2; look up its field x using context #3.
     - When resolving a reference to a qualified identifier (e.g.
        student.grad) we look up the struct as before; upon
        finding it, we read its internal-context attribute (a #)
        and look up the field with that context to
        find the descriptor for that object.
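The multiscope mechanism described above can be sketched as a table keyed by (identifier, context #) plus a context stack. This is a minimal sketch, not a definitive design: the class SymbolTable, its method names, and the use of dictionaries as descriptors are all assumptions made for illustration.

```python
class SymbolTable:
    def __init__(self):
        self.table = {}         # (identifier, context #) -> descriptor
        self.contexts = []      # context stack; the current context is on top
        self.next_context = 1

    def enter_context(self):
        """Give the new context a fresh # and make it the current context."""
        n = self.next_context
        self.next_context += 1
        self.contexts.append(n)
        return n

    def exit_context(self):
        return self.contexts.pop()

    def declare(self, name, **attrs):
        """Allocate a descriptor keyed by the identifier and current context."""
        desc = {"name": name, "context": self.contexts[-1], **attrs}
        self.table[(name, self.contexts[-1])] = desc
        return desc

    def lookup(self, name):
        """Try the current context first, then the enclosing ones."""
        for ctx in reversed(self.contexts):
            if (name, ctx) in self.table:
                return self.table[(name, ctx)]
        return None

    def lookup_field(self, struct_name, field):
        """Resolve a qualified identifier such as y.x."""
        struct = self.lookup(struct_name)
        if struct is None:
            return None
        return self.table.get((field, struct["internal_context"]))


st = SymbolTable()
st.enter_context()                            # context #1 (assumed outermost)
st.enter_context()                            # context #2
st.declare("x", type="float")
y = st.declare("y", type="struct")
y["internal_context"] = st.enter_context()    # context #3 holds the fields
st.declare("x", type="int")
st.declare("z", type="int")
st.exit_context()                             # back in context #2
```

After the struct's context is popped, a plain reference to x finds the float variable, while y.x is resolved through the internal context stored in y's descriptor.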
      Consider the following basic programming-language
       constructs for generating intermediate codes:

     1. Declarations (✓)
     2. Arithmetic assignment operations (✓)
     3. Boolean expressions (✓)
     4. Flow-of-control statements: if-statement (✓), while (✓)
     5. Array references (✓)
     6. Procedure calls (✓)
     7. Switch statements (✓)
     8. Structure-type references (✓)

     Semantic Actions for different
     language constructs

     1. Declarations
     e.g. int x,y,z;
         float w, z, s;




     Suggested grammar:
     (Note: This is a very simple grammar, mainly
     used for explanation.)

     P -> M D ;
     M -> /* empty string */
     D -> D , id
        | int id
        | float id

     Parse tree for  int x , y ;  (numbers give the order of reductions):

                    P (4)
                /     |     \
            M (1)   D (3)    ;
                   /  |  \
               D (2)  ,   id
               /  \
             int   id

                            (Syntax-directed) Translation
     4   P -> M D ;  { /* do nothing */ }
     1   M -> /* empty */  { if offset was not initialized then offset = 0; }

     2   D -> int id { enter (id.name, int, offset);
                        /* a function that enters the type "int" and the
                           given offset into the symbol-table entry for
                           id.name */
                         D.type = int;
                         offset = offset + 4; /* bytes, width of int */
                         D.offset = offset; }




2   D -> float id { enter (id.name, float, offset);
                     D.type = float;
                     offset = offset + 8; /* bytes, width of float */
                     D.offset = offset; }
3   D -> D(1) , id { enter (id.name, D(1).type, D(1).offset);
                     D.type = D(1).type;
                     if D(1).type == int then D.offset = D(1).offset + 4;
                     else if D(1).type == float then D.offset = D(1).offset + 8;
                     offset = D.offset; }

    Note: We can construct a data structure to store the information
    (attributes) of D (i.e., D.type and D.offset).
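The effect of these actions on the symbol table can be sketched in a few lines. This is a simplified sketch: it processes a whole declaration in one loop instead of one reduction at a time, and the function name translate_declaration and the dictionary symbol table are hypothetical.

```python
WIDTH = {"int": 4, "float": 8}     # byte widths used in the actions above


def translate_declaration(type_name, ids, symtab, offset=0):
    """Enter each identifier with its type and offset; return the next free offset."""
    for name in ids:
        symtab[name] = (type_name, offset)    # enter(id.name, type, offset)
        offset += WIDTH[type_name]            # offset = offset + width
    return offset


symtab = {}
next_offset = translate_declaration("int", ["x", "y"], symtab)          # int x, y;
next_offset = translate_declaration("float", ["w"], symtab, next_offset)  # float w;
```

Each identifier receives the offset that was current when it was entered, and the running offset then advances by the width of its type.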
                        Avoided grammar:
     D -> int namelist ; | float namelist ;
     namelist -> id , namelist | id

     Parse tree for  int x ;  (numbers give the order of reductions):

                  D (2)
              /     |      \
           int  namelist (1)  ;
                    |
                    id

     Why?
      When 'id' is reduced to namelist, we cannot yet
      know the type of 'id' (int or float?).
      Therefore, it is troublesome to enter the type
      information into the corresponding field of the 'id' in
      the symbol table. We must use a special coding
      technique (e.g. a linked list of the id names, kept as
      pointers into the symbol table) to achieve this.
      (In other words, we need backpatching to chain the
      data type.)
                         Acceptable grammar:
              D         -> int intlist ; | float floatlist ;
              intlist -> id, intlist | id
              floatlist -> id, floatlist | id


     Advantage: The above-mentioned problem does not
     occur. That is, when 'id' is reduced, we can
     identify its type. (If id is reduced to intlist, then
     id is of type "int".)

     Drawback: too many productions => too many
     states => poorer performance

     How to handle the following declaration?
     x,y,z : float


     Two approaches:
     (I) decl -> id_list ':' type           3
         id_list -> id_list ',' id | id     1

         type -> int | float                2


     (II) decl -> id ':' type | id , decl   2
           type -> int | float              1


     Which one is better for LR parsing? Why?
     Suggested Grammar for the following Declaration:
     var x,y,z : real; u,v,t : integer; …

     declarations : VAR decl_list
                   | /* empty (no declaration is permitted) */
                   ;
     decl_list    : declaration ';'
                   | declaration ';' decl_list
                   ;
     declaration : ID ':' type
                   | ID ',' declaration
                   ;
     type          : REAL
                   | INTEGER
                   ;
Try to construct a parse tree for the following declaration
and see how to parse it:            var x: real; y: integer;

                 declarations
                /            \
             VAR              decl_list
            (var)          /      |      \
                   declaration    ;     decl_list
                    /  |  \              /      \
                  ID   :  type    declaration    ;
                  |        |        /  |  \
                  x      REAL     ID   :  type
                                  |        |
                                  y     INTEGER

     The following grammar for declaration is
     difficult for attribute gathering.


     declaration : id_list ':' type ;
     id_list     : ID
                  | id_list ',' ID
     type         : REAL
                  | INTEGER




e.g., with three identifiers (numbers give the order of reductions):

                      declaration (5)
                     /       |       \
              id_list (3)    :      type (4)
              /    |    \             |
       id_list (2) ,    ID          REAL
        /   |   \
 id_list (1) ,  ID
     |
     ID

      Intermediate Code Generation
   Three Address Code <-> (Two Address code => Triples)
   Quadruples (a collective data structure; each unit has 4 fields)

         Operator    Arg1      Arg2       Result        <- a unit

   Operators include:  =+  =-  =*  =/  =%  []=  =[]  ...

    Note: The entries of the Operator column are integers that represent
    individual operators. The entries of Arg1 (operand 1), Arg2
    (operand 2) and Result are indexes (pointers) into the symbol table.
   Kinds of three-address codes:
1. A = B op(1) C (op is a binary arithmetic or logical operation)
2. A = op(2) B    (op is a unary operation, e.g. minus, negation, shift
                   operators, conversion operator, identity operator)
3. goto L           (unconditional jump, execute the Lth three-
                     address code)
4. if A relop B goto L (relop denotes relational operators, e.g., <,
                            ==, >, >=, !=, etc.)
5. param A and call P,n (used to implement a procedure call)
6. A = B [i]
7. A[i] = B
8.     A = &B
9.     A = *B
10. *A = B
     In Quadruples:

             Operator    Arg1   Arg2   Result

      1. ==> op(1)        B     C      A
      2. ==> op(2)        B            A
      3. ==> goto                      L
      4. ==> relopgoto    A     B      L
      5. ==> param        A
         ==> call         P     n
      6. ==> =[]          B     i      A
      7. ==> []=          B     i      A
      8. ==> =&           B            A
      9. ==> =*           B            A
     10. ==> *=           B            A
     Example:   D = A*B+C             D = A+B*C

    The generated three-address code is:
         T1 = A * B              T1 = B * C
         T2 = T1 + C             T2 = A + T1
         D = T2                  D = T2

    Quadruples for the first example, D = A*B+C:

                   Operator    Arg1      Arg2     Result
                    =*         A         B        T1
                    =+         T1        C        T2
                    =          T2                 D

* T1 and T2 are compiler-generated temporary variables; they
  are also saved in the symbol table.
     Actually, in an implementation the quadruples look like this:

            Operator          Arg1      Arg2        Result
               8                6         7            9
              15                9         8           11
               3               11                     10

     in symbol table: index identifier     attributes
                         0      twa
                         1       K
                         ..      ..
                         ..      ..
                         6       A
                         7       B
                         8       C
                         9       T1 /* compiler-generated temporary variable */
                        10       D
                        11       T2 /* compiler-generated temporary variable */
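A small sketch of quadruples whose operand fields are symbol-table indexes, for D = A*B+C. It is only illustrative: the helper names index_of, new_temp and gen are assumptions, the operator is kept as a string instead of an integer code, and the indexes start at 0 here rather than matching the table above.

```python
symtab = []      # position in this list = symbol-table index
quads = []       # each entry: (operator, arg1, arg2, result)
temp_count = 0


def index_of(name):
    """Return the symbol-table index of name, entering it if necessary."""
    if name not in symtab:
        symtab.append(name)
    return symtab.index(name)


def new_temp():
    """Create a compiler-generated temporary and return its index."""
    global temp_count
    temp_count += 1
    return index_of(f"T{temp_count}")


def gen(op, arg1, arg2, result):
    quads.append((op, arg1, arg2, result))


# D = A * B + C
a, b, c, d = (index_of(n) for n in "ABCD")
t1 = new_temp()
gen("=*", a, b, t1)        # T1 = A * B
t2 = new_temp()
gen("=+", t1, c, t2)       # T2 = T1 + C
gen("=", t2, None, d)      # D = T2
```

Storing indexes rather than names keeps every quadruple the same fixed size, whatever the length of the identifiers.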
     2. Arithmetic Statements

     A -> id = E
     E -> E(1) + E(2)
     E -> E(1) - E(2)
     E -> E(1) * E(2)
     E -> E(1) / E(2)
     E -> - E(1)
     E -> (E(1))
     E -> id

     Parse tree for  x = a + b  (numbers give the order of reductions):

                  A (4)
               /    |    \
             id     =     E (3)
                       /    |    \
                   E (1)    +    E (2)
                     |             |
                     id            id

     Generated code:   T1 = a + b
                       x = T1
3   A -> id = E        { GEN (id.place = E.place); }
    /* GEN (argument) - a function that stores its argument as a
       quadruple. E is implemented as a data structure with one field,
       E.place, which holds the symbol-table index of the name that
       holds the value of E. */

2   E -> E(1) + E(2)    { T = NEWTEMP();
                           /* NEWTEMP() - a function that generates a
                          temporary variable T, saves T in the symbol
                          table, and returns its symbol-table
                          index. */
                          E.place = T;
                           /* T's symbol-table index is assigned to
                             E.place */
                         GEN(E.place = E(1).place + E(2).place); }

2   E -> E(1) * E(2) { T = NEWTEMP();
                       E.place = T;
                       GEN(E.place = E(1).place * E(2).place); }

2   E -> - E(1)     { T = NEWTEMP();
                      E.place = T;
                      GEN(E.place = -E(1).place); }

2   E -> (E(1))     { E.place = E(1).place; }

1   E -> id         { E.place = id.place; }
                     /* Pass id's symbol-table index to E's field 'place'. In
                      the implementation, id.place refers to the index
                      of id in the symbol table. */
Enhanced version for E -> E(1) op E(2)

** Note: in this version, pay attention to the design of the data
   structure for E (the attributes of each E should be stored in an
   array of structs, and the corresponding array index should be stored
   in E's entry on the value stack).

{ T = NEWTEMP();
  E.place = T;
  if E(1).type == int and E(2).type == int then
    { GEN (T = E(1).place intop E(2).place);
      E.type = int;
    }
  else if E(1).type == float and E(2).type == float then
    { GEN (T = E(1).place floatop E(2).place);
      E.type = float;
    }
  else if E(1).type == int and E(2).type == float then
    { U = NEWTEMP();
      GEN (U = inttofloat E(1).place);
      GEN (T = U floatop E(2).place);
      E.type = float;
    }
  else /* E(1).type == float and E(2).type == int */
    { U = NEWTEMP();
      GEN (U = inttofloat E(2).place);
      GEN (T = E(1).place floatop U);
      E.type = float;
    }
}
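The type-directed code selection above can be sketched as follows. This is a minimal sketch under assumptions: each E is represented as a (place, type) pair, temporaries are plain names such as "T1", and the quadruple layout and the function name gen_binary are hypothetical.

```python
quads = []
temp_count = 0


def new_temp():
    global temp_count
    temp_count += 1
    return f"T{temp_count}"


def gen_binary(op, e1, e2):
    """E -> E(1) op E(2); e1 and e2 are (place, type) pairs.

    Returns the (place, type) pair of the result and appends the needed
    quadruples, inserting an inttofloat conversion for mixed operands.
    """
    (p1, t1), (p2, t2) = e1, e2
    t = new_temp()                                 # T = NEWTEMP()
    if t1 == "int" and t2 == "float":
        u = new_temp()
        quads.append(("inttofloat", p1, None, u))  # U = inttofloat E(1).place
        p1, t1 = u, "float"
    elif t1 == "float" and t2 == "int":
        u = new_temp()
        quads.append(("inttofloat", p2, None, u))  # U = inttofloat E(2).place
        p2, t2 = u, "float"
    quads.append((t1 + op, p1, p2, t))             # intop or floatop
    return t, t1


result = gen_binary("+", ("a", "int"), ("b", "float"))
```

The integer operand is widened first, so the arithmetic quadruple always sees two operands of the same type.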
     3. Boolean Expression


      M -> /* empty */
      E -> E or M E
         | E and M E
         | not E
         | ( E )
         | id
         | id relop id

     An example


      if p < q || r < s && t < u
          x = y + z;
       k = m – n;

     For the above boolean expression the
     corresponding contents in the quadruples
     are:

   Quadruples for:
       if p < q || r < s && t < u
           x = y + z;
       k = m - n;

   (the quadruple counter starts at 100)

                Location   Three-Address Code
                   ...       ...
                  100      if p < q goto 106
                  101      goto 102
                  102      if r < s goto 104
                  103      goto 108
                  104      if t < u goto 106
                  105      goto 108   /* S.next = 105 */
                  106      t1 = y + z
                  107      x = t1
                  108      t2 = m - n
                  109      k = t2
                   ...     ...

   Parse tree of the condition (numbers give the order of reductions):

                       E (7)
              /     |     |      \
          E (1)     or   M (2)    E (6)
            |                 /    |    |    \
         id < id          E (3)   and  M (4)  E (5)
                            |                   |
                         id < id             id < id

NEXTQUAD - an integer variable that holds the index (location)
of the next available entry in the quadruples.

E.true - an attribute of E that holds a set of quadruple indexes
(locations); each indexed quadruple is a jump that is taken when the
boolean expression is true.

E.false - an attribute of E that holds a set of quadruple indexes;
each indexed quadruple is a jump that is taken when the boolean
expression is false.

GEN(x) - a function that translates x (a kind of three-address code) into
its quadruple representation.

So we need a data structure for E with two fields, each of which
can hold an unlimited number of integers. We also need an array of
these structures to store the attributes of the several Es that are
in use at the same time.
2   1.     M -> /* empty */  { M.quad = NEXTQUAD; }
          /* M.quad is a field of the data structure associated with M */
4   2.     E -> E(1) or M E(2)
          {
             BACKPATCH (E(1).false, M.quad);
             E.true = MERGE (E(1).true, E(2).true);
             E.false = E(2).false;
          }
       /* BACKPATCH (p, i) – a function that makes each of the
           quadruple index values on the list pointed to by p take
           quadruple i as a target (i.e., goto i).*/

         /* MERGE (a, b) – a function that takes the lists pointed to
            by a and b, concatenates them into one list, and
           returns a pointer to the concatenated list. */
4   3. E -> E(1) and M E(2)
      {
         BACKPATCH (E(1).true, M.quad);
         E.true = E(2).true;
         E.false = MERGE (E(1).false, E(2).false);
       }

3   4. E -> not E(1)
      { E.true = E(1).false; E.false = E(1).true;}

3   5. E -> ( E(1) )
      { E.true = E(1).true; E.false = E(1).false;}
1   6. E -> id
     {
       E.true = MAKELIST (NEXTQUAD);
       E.false = MAKELIST(NEXTQUAD + 1);
       GEN (if id.place goto _ );
       GEN (goto _);
     }

    /* MAKELIST ( i ) – a function that creates a list containing i, an
     index into the array of quadruples, and returns a pointer to the
     list it has made. */

    /* GEN(x) – a function that translates x (a kind of three-address-
      code) into quadruple representation. */
     1   7. E -> id(1) relop id(2)
            {
              E.true = MAKELIST (NEXTQUAD);
              E.false = MAKELIST (NEXTQUAD + 1);
              GEN (if id(1).place relop id(2).place goto _ );
              GEN (goto _);
            }

         Example with NEXTQUAD = 20 before the reduction:
              20   if id(1).place relop id(2).place goto _
              21   goto _
         E.true = {20}, E.false = {21}; afterwards NEXTQUAD = 22.
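The backpatching rules for or, and, and relop can be sketched in Python and run on the condition p < q || r < s && t < u. This is a sketch under assumptions: quadruples are stored as [text, target] pairs in a dictionary, each E is a (true list, false list) pair, and the function names relop, or_ and and_ are hypothetical.

```python
quads = {}          # location -> [text, jump target or None]
nextquad = 100      # NEXTQUAD


def gen(text):
    global nextquad
    quads[nextquad] = [text, None]
    nextquad += 1


def makelist(i):
    return [i]


def merge(*lists):
    return [i for lst in lists for i in lst]


def backpatch(lst, target):
    """Make every quadruple on the list jump to target."""
    for i in lst:
        quads[i][1] = target


def relop(a, op, b):
    """E -> id relop id; returns (E.true, E.false)."""
    e = (makelist(nextquad), makelist(nextquad + 1))
    gen(f"if {a} {op} {b} goto")
    gen("goto")
    return e


def or_(e1, m_quad, e2):
    backpatch(e1[1], m_quad)                    # E(1).false -> M.quad
    return (merge(e1[0], e2[0]), e2[1])


def and_(e1, m_quad, e2):
    backpatch(e1[0], m_quad)                    # E(1).true -> M.quad
    return (e2[0], merge(e1[1], e2[1]))


e1 = relop("p", "<", "q")       # quadruples 100 and 101
m1 = nextquad                   # M.quad = 102
e2 = relop("r", "<", "s")       # quadruples 102 and 103
m2 = nextquad                   # M.quad = 104
e3 = relop("t", "<", "u")       # quadruples 104 and 105
e = or_(e1, m1, and_(e2, m2, e3))
```

The jumps left on the true and false lists of e are the ones that the enclosing if-statement fills in later.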



     4. Flow-of-Control statements
     A. Conditional Statements

          S -> if E then S else S
             | if E then S
             | A
             | begin L end
          L -> S
             |L;S

          /* A – denotes a general assignment statement
             L – denotes statement list
             S – denotes statement             */

     7   1. S -> if E then M(1) S(1) N else M(2) S(2)
            {
               BACKPATCH (E.true, M(1).quad);
               BACKPATCH (E.false, M(2).quad);
               S.next = MERGE (S(1).next, N.next, S(2).next);
             }

            /* S.next is a pointer to a list of all conditional and
         unconditional jump (goto) to the quadruple following the
         statement S in execution order. */




     7   2. S -> if E then M S(1)
            {
               BACKPATCH (E.true, M.quad);
               S.next = MERGE (E.false, S(1).next)
             }

     1   3. M -> /* empty */  { M.quad = NEXTQUAD; }

     2   4. N -> /* empty */
            {
              N.next = MAKELIST (NEXTQUAD);
              GEN (goto _);
            }

         Example with NEXTQUAD = 20: quadruple 20 becomes "goto _"
         and N.next = {20}.

 3       5. S -> A
              { S.next = MAKELIST ( ); }
               /* initialize S.next to an empty list */

 4       6.    L -> S { L.next = S.next; }

 5       7.     L -> L(1) ; M S
                {
                   BACKPATCH (L(1).next, M.quad); /* resolves all
                      quadruples in L(1).next that hold an unresolved
                      conditional or unconditional 'goto _' */
                   L.next = S.next;
                 }

     6   8.    S -> begin L end { S.next = L.next; }

     B. Iterative Statement
        S -> while E do S

       9. S -> while M(1) E do M(2) S(1)
          {
            BACKPATCH (E.true, M(2).quad);
            BACKPATCH (S(1).next, M(1).quad);
            S.next = E.false;
            GEN (goto M(1).quad);
          }




An example:
while (A<B) do if (C<D) then X = Y + Z;

  Index   Three-Address Code
    ...    ...
   100     if (A<B) goto 102
   101     goto __   // will be resolved later (filled in with 107)
   102     if (C<D) goto 104
   103     goto 100
   104     T = Y + Z
   105     X = T
   106     goto 100
   107     ...
5. Array References
Addressing Array Elements

one-dimension: A[low..high]
two-dimension: A[low1..high1, low2..high2]
  n-dimension: A[low1..high1, low2..high2, ... , lown..highn]

Let: base = address of the beginning of A,
     w = width of an array element, and
     ni = the number of array elements in the i-th dimension
     (row-major order), e.g. n1 = high1 - low1 + 1; n2 = high2 - low2 + 1;
     n3 = high3 - low3 + 1; ...

A[i] has address:
    base + (i - low) * w
  = i * w + (base - low * w),
where base - low * w is a compile-time invariant.

A[i1, i2] has address (row-major):
    base + ((i1 - low1) * n2 + (i2 - low2)) * w
  = (i1 * n2 + i2) * w + base - (low1 * n2 + low2) * w,
where base - (low1 * n2 + low2) * w is a compile-time invariant.

A[i1, i2, i3] has address:
    base + ((i1 - low1) * n2 * n3 + (i2 - low2) * n3 + (i3 - low3)) * w
  = base + (((i1 - low1) * n2 + (i2 - low2)) * n3 + (i3 - low3)) * w
  = ((i1 * n2 + i2) * n3 + i3) * w + base - ((low1 * n2 + low2) * n3 + low3) * w,
where base - ((low1 * n2 + low2) * n3 + low3) * w is a compile-time invariant.

In general, A[i1, i2, ..., ik] has address:
    ((...((i1 * n2 + i2) * n3 + i3) ...) * nk + ik) * w
    + base - ((...((low1 * n2 + low2) * n3 + low3) ...) * nk + lowk) * w,
where base - ((...((low1 * n2 + low2) * n3 + low3) ...) * nk + lowk) * w
is a compile-time invariant.

Therefore, we can compute as follows:

    e1 = i1
    e2 = e1 * n2 + i2
    e3 = e2 * n3 + i3
         ...
    em = e(m-1) * nm + im
         ...
    ek = e(k-1) * nk + ik

     The address of A[i1, i2, ... , ik] is: ek * w + compile-time
     invariant.
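
As a check on the recurrence, the following Python sketch computes an element address two ways: through e1 ... ek plus the compile-time invariant, and through the direct row-major formula. The function names and the sample array are assumptions chosen for illustration.

```python
def element_address(base, w, bounds, idx):
    """Address via e_k * w + invariant.

    bounds is a list of (low, high) pairs, idx the list of subscripts.
    """
    n = [high - low + 1 for low, high in bounds]
    e = idx[0]                  # e1 = i1
    c = bounds[0][0]            # same recurrence applied to the lower bounds
    for m in range(1, len(idx)):
        e = e * n[m] + idx[m]           # em = e(m-1) * nm + im
        c = c * n[m] + bounds[m][0]
    invariant = base - c * w            # computable at compile time
    return e * w + invariant


def direct_address(base, w, bounds, idx):
    """Address via base + (row-major element offset) * w."""
    offset = 0
    for (low, high), i in zip(bounds, idx):
        offset = offset * (high - low + 1) + (i - low)
    return base + offset * w


# A[1..3, 1..4] with base 1000 and 4-byte elements; element A[2, 3].
bounds = [(1, 3), (1, 4)]
print(element_address(1000, 4, bounds, [2, 3]))   # 1024
print(direct_address(1000, 4, bounds, [2, 3]))    # 1024
```

Here e2 = 2 * 4 + 3 = 11 and the invariant is 1000 - (1 * 4 + 1) * 4 = 980, so the address is 11 * 4 + 980 = 1024, matching the direct computation 1000 + 6 * 4.
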




Translation Scheme for Addressing Array Elements

Assume: (1) for each id there exists id.place, which holds
             its name,
        (2) there is a function limit( ) where
             limit(array_name, m) = nm, i.e., the number of
             elements of array 'array_name' in the m-th
             dimension,
        (3) the width of an array element can be found
             from the name of the array (i.e., from the symbol
             table).

// Please read Section 8.3.2 (Array References) of the
   textbook.
(1) S -> L = E
        { if L.offset = null then
              GEN (L.place = E.place)              // simple variable
          else
              GEN (L.place[L.offset] = E.place);   // array element
        }

(2) E -> E1 + E2
        { E.place = newtemp();  // generate a temporary variable and
                                // save its symbol table index
          GEN (E.place = E1.place + E2.place);
        }

(3) E -> ( E(1) )   { E.place = E(1).place }

(4) E -> L
        { if L.offset = null then
              E.place = L.place
          else
            { E.place = newtemp();
              GEN (E.place = L.place[L.offset]); }
        }

(5) L -> Elist ]
        { L.place = Elist.array_name;
          L.offset = newtemp();
          GEN (L.offset = w * Elist.place); }
        /* w is known from the declaration of the array */

(6) L -> id   { L.place = id.place; L.offset = null }

(7) Elist -> Elist(1) , E
        { T = newtemp(); m = Elist(1).ndimen + 1;
          GEN ( T = Elist(1).place * limit(Elist(1).array_name, m) );
          GEN ( T = T + E.place );
          Elist.array_name = Elist(1).array_name;
          Elist.place = T;
          Elist.ndimen = m; }

   /* note em = e(m-1) * nm + im, where Elist.place = em,
      Elist(1).place = e(m-1), limit(Elist(1).array_name, m) = nm,
      and E.place = im */

(8) Elist -> id [ E
        { Elist.place = E.place; Elist.ndimen = 1;
          Elist.array_name = id.place; }

    // Note: information such as the compile-time invariant is assumed
    // to be already stored in the symbol table entry that id.place
    // points to.
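
To see the rules fire in order, here is a minimal Python sketch that translates X = A[i, j]. The symbol table contents (a 10 x 20 array of 4-byte elements), the dictionary representation of attributes, and the helper names are assumptions for illustration; each helper mirrors the rule named in its comment.

```python
code = []            # generated three-address statements
temp_count = 0

# Hypothetical symbol table: element counts per dimension and element width.
SYMTAB = {"A": {"limits": [10, 20], "width": 4}}


def newtemp():
    global temp_count
    temp_count += 1
    return f"T{temp_count}"


def limit(array_name, m):
    return SYMTAB[array_name]["limits"][m - 1]


def elist_first(array_name, e_place):            # rule (8): Elist -> id [ E
    return {"array_name": array_name, "place": e_place, "ndimen": 1}


def elist_next(elist1, e_place):                 # rule (7): Elist -> Elist(1) , E
    t = newtemp()
    m = elist1["ndimen"] + 1
    code.append(f"{t} = {elist1['place']} * {limit(elist1['array_name'], m)}")
    code.append(f"{t} = {t} + {e_place}")
    return {"array_name": elist1["array_name"], "place": t, "ndimen": m}


def close_bracket(elist):                        # rule (5): L -> Elist ]
    offset = newtemp()
    w = SYMTAB[elist["array_name"]]["width"]
    code.append(f"{offset} = {w} * {elist['place']}")
    return {"place": elist["array_name"], "offset": offset}


def rvalue(l):                                   # rule (4): E -> L
    if l["offset"] is None:
        return l["place"]
    t = newtemp()
    code.append(f"{t} = {l['place']}[{l['offset']}]")
    return t


def assign(l, e_place):                          # rule (1): S -> L = E
    if l["offset"] is None:
        code.append(f"{l['place']} = {e_place}")
    else:
        code.append(f"{l['place']}[{l['offset']}] = {e_place}")


# X = A[i, j]
elist = elist_next(elist_first("A", "i"), "j")
value = rvalue(close_bracket(elist))
assign({"place": "X", "offset": None}, value)    # rule (6) for the left side
print("\n".join(code))
```

The sketch emits T1 = i * 20, T1 = T1 + j, T2 = 4 * T1, T3 = A[T2], X = T3. As in the note on rule (8), the name A in A[T2] is taken to stand for the base address already adjusted by the compile-time invariant.
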
     6. Procedure calls
  1. call -> id (args)
  2. args -> args , E
  3. args -> E

  1. call -> id ( args )
      { for each item p on QUEUE do
          GEN (param p);
        GEN (call id.place, length of QUEUE); }

  /* QUEUE is a data structure that saves the symbol table indexes
     of the arguments' names. The length of QUEUE is the number of
     elements in QUEUE. */

  2. args -> args , E
        { append E.place to the end of QUEUE; }

  3. args -> E
        { initialize QUEUE to contain only E.place; }

    /* Originally, QUEUE is empty. After the reduction of E to args,
       QUEUE contains a single pointer to the symbol table location
       for the name that denotes the value of E. */
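
A short Python sketch of the three actions follows. It assumes the argument expressions have already been evaluated into places (here the names x and T1), and the function names are made up for illustration.

```python
from collections import deque

code = []          # generated three-address statements
queue = deque()    # places of the arguments seen so far


def args_first(e_place):        # rule 3: args -> E
    queue.clear()
    queue.append(e_place)


def args_more(e_place):         # rule 2: args -> args , E
    queue.append(e_place)


def call(id_place):             # rule 1: call -> id ( args )
    for p in queue:
        code.append(f"param {p}")
    code.append(f"call {id_place}, {len(queue)}")


# f(x, y + z), where y + z was already computed into temporary T1
args_first("x")
args_more("T1")
call("f")
print("\n".join(code))
```

Because the param statements are generated only at the reduction to call, they end up grouped together immediately before the call statement instead of being interleaved with the code that evaluates the arguments.
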
    7. Structure Declarations (Read Sec. 8.3.3)

type -> struct { fieldlist }   /* Note: symbols in bold face are
                                  terminals */
      | ptr
      | char
      | int
      | float
      | double

fieldlist -> fieldlist field ;
           | field ;

field     -> type id             /* e.g. int x */
           | field [ integer ]   /* integer is a token denoting any
                                    string of digits; e.g. int x [10]
                                    or int x [10] [20] [30] */

Example:
      struct { int x;        // offset 0
               float y;      // offset 2
               char k[10];   // offset 6
             } m;
      m.width = 16 bytes

field -> type id
             { field.width = type.width;
               field.name = id.name;
               W_enter(id.name, type.width); }

       | field(1) [ integer ]
             { field.width = field(1).width * integer.val;
               field.name = field(1).name;
               D_enter(field(1).name, integer.val); }

fieldlist -> field ;
             { O_enter(field.name, 0);
               fieldlist.width = field.width; }

           | fieldlist(1) field ;
             { fieldlist.width = fieldlist(1).width + field.width;
               O_enter(field.name, fieldlist(1).width); }

type -> struct '{' fieldlist '}'  { type.width = fieldlist.width; }

type -> char  { type.width = 1; }  /* Assume characters take one byte. */

type -> ptr   { type.width = 4; }  /* Assume pointers take four bytes. */

type -> int   { type.width = 2; }  /* Assume integers take two bytes. */

    .......
Definitions of functions used

D_enter(name, size) increases the number of dimensions for
'name' by one and enters the last dimension as 'size' in the
symbol table entry for 'name'.

W_enter(name, width) enters 'width' as the width of each
element of 'name'. If 'name' is not an array, then its width is
the number of locations taken by data of name's type.

O_enter(name, offset) makes 'offset' the number (the field's
offset within the structure) for which field name 'name' stands.
This information is also recorded in the symbol table entry for
'name'.
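
The width and offset bookkeeping can be sketched in Python as follows. The dictionary-based symbol table and the helper `field` / `struct_width` signatures are assumptions; the 4-byte float width is not stated in the rules but is implied by the offsets in the struct example (y at 2, k at 6).

```python
# Widths from the slides; float = 4 is inferred from the example offsets.
TYPE_WIDTH = {"char": 1, "int": 2, "float": 4, "ptr": 4}

symtab = {}


def w_enter(name, width):
    symtab.setdefault(name, {"dims": []})["width"] = width


def d_enter(name, size):
    symtab[name]["dims"].append(size)


def o_enter(name, offset):
    symtab[name]["offset"] = offset


def field(type_name, name, dims=()):
    """field -> type id, followed by zero or more [integer] suffixes."""
    width = TYPE_WIDTH[type_name]
    w_enter(name, width)
    for size in dims:
        width *= size               # field.width = field(1).width * integer.val
        d_enter(name, size)
    return name, width


def struct_width(fields):
    """fieldlist rules: each field's offset is the width accumulated so far."""
    total = 0
    for type_name, name, dims in fields:
        fname, fwidth = field(type_name, name, dims)
        o_enter(fname, total)
        total += fwidth
    return total


# struct { int x; float y; char k[10]; } m;
width = struct_width([("int", "x", ()), ("float", "y", ()), ("char", "k", (10,))])
print(width, symtab)
```

For the example struct this records offsets 0, 2, and 6 for x, y, and k, and a total width of 16 bytes.
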
     8. Switch Statement


     Syntax:
               switch E
                   {
                     case V1: S1;
                     case V2: S2;
                     .............
                     .............
                     case Vn-1: Sn-1;
                     default: Sn;
                   }

When translated into three-address code:
100   Code to evaluate E into T    // T is a temporary variable
101   If T ≠ V1 goto 104
102   Code for S1
103   Goto 113
104   If T ≠ V2 goto 107
105   Code for S2
106   Goto 113
107   ...
108   ...
109   If T ≠ Vn-1 goto 112
110   Code for Sn-1
111   Goto 113
112   Code for Sn
113   …
     From this translation example, you can infer how
     to generate three-address code for a switch
     statement in general.
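
One way to generalize the listing is sketched below in Python. It is a sketch under simplifying assumptions, not the textbook's scheme: the selector is evaluated by a single placeholder statement "T = E", each case body is given as a list of already generated statements, and the function name is made up.

```python
def translate_switch(cases, default_body, start=100):
    """Lay out a switch the way the listing does.

    cases is a list of (value, body) pairs, where body is a list of
    statement strings; default_body is the list for the default branch.
    """
    quads = []                       # each entry is [text, jump target or None]

    def nextquad():
        return start + len(quads)

    quads.append(["T = E", None])    # placeholder for the code evaluating E
    exits = []                       # positions of the jumps to the end
    for value, body in cases:
        test = len(quads)
        quads.append([f"if T != {value} goto", None])
        for line in body:
            quads.append([line, None])
        exits.append(len(quads))
        quads.append(["goto", None])
        quads[test][1] = nextquad()  # a failed test skips to the next case
    for line in default_body:
        quads.append([line, None])
    for i in exits:                  # every case leaves through the same exit
        quads[i][1] = nextquad()
    return [
        f"{start + n} {text}" + ("" if target is None else f" {target}")
        for n, (text, target) in enumerate(quads)
    ]


listing = translate_switch([("V1", ["S1"]), ("V2", ["S2"])], ["S3"])
print("\n".join(listing))
```

With two cases and a default, the failed tests jump to 104 and 107 and both exit jumps are filled with 108, the same pattern as quadruples 101-113 in the listing.
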



