Three address code by liaoqinmei


Language Processors (E2.15)
Lecture 13: Code Generation I
Intermediate Code Generation

Objectives

    To introduce the concepts involved in intermediate code generation
        Types of intermediate code
        Three address code
        Implementation examples

Intermediate Code Generation (1)

    We could translate the source program directly into the target language
    However, there are benefits to having an intermediate, machine-independent representation
        A clear distinction between the machine-independent and machine-dependent parts of the compiler
        Retargeting is facilitated; the implementation of language processors for new machines will require replacing only the back-end
        We could apply machine-independent code optimisation techniques

                         E2.15 - Language Processors                                                    E2.15 - Language Processors
                                   (Lect 13)                              1                                       (Lect 13)                       2

Intermediate Code Generation (2)
Which?

    There are several options for intermediate code
        Specific to the language being implemented
            P-code for Pascal
            Bytecode for Java
        Language independent:
            3-address code (we will examine this in more detail, following Aho et al)
    In all cases, the intermediate code is a linearisation of the syntax tree produced during syntax and semantic analysis
        Formed by breaking down the tree structure into sequential instructions, each of which is equivalent to a single machine instruction or a small number of them
        Machine code can then be generated (access might be required to symbol tables etc)

Three-address code (1)
Introduction

    Three address code contains statements of the form x := y op z [three addresses]
        x, y, z are names, constants, or compiler-generated temporary variables
        op stands for any operator
            Arithmetic operator
            Logical operator
            Only one operator is permitted
    Code can contain symbolic labels, statements for flow of control
Three-address code (2)
Types of statements

    Assignment statements of the form x := y op z
    Assignment statements of the form x := op z where op is a unary operation (e.g. unary minus, logical negation, shift and convert operators)
    Copy statements of the form x := y
    Unconditional jumps of the form goto L
    Conditional jumps of the form if x relop y goto L
    param x and call p, n for procedure calls, and return y
        A procedure call p(x1, x2, …, xn) will be translated into
            param x1
            param x2
            …
            param xn
            call p, n

Three-address code (2)
Types of statements (2)

    Indexed assignments of the form x := y[i] and x[i] := y
    Address and pointer assignments x := &y, x := *y and *x := y
    …
    The choice of allowable operators is a design decision:
        A small operator set is easier to port to a new machine
        but will force the compiler front end to generate long sequences of statements
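The param/call convention for procedure calls can be sketched in a few lines of Python. This is only an illustration: the function name translate_call and the choice to represent statements as plain strings are assumptions made here, not part of the lecture.

```python
def translate_call(proc, args):
    """Return the three-address statements for the call proc(args...).

    Hypothetical helper: statements are modelled as plain strings.
    """
    # One param statement per actual parameter, in order.
    code = [f"param {a}" for a in args]
    # The call statement carries n, the number of actual parameters.
    code.append(f"call {proc}, {len(args)}")
    return code


for line in translate_call("p", ["x1", "x2", "x3"]):
    print(line)
```

Keeping the argument count on the call statement means a later pass can tell which of the preceding param statements belong to this call.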

Three-address code (3)
Implementation of three address statements

    A three-address statement is an abstract form of intermediate code
    In a compiler these statements can be implemented as records with fields for the operator and the operands
    E.g. for the statement a := b*-c + b*-c

    Quadruples

               Op        Arg1    Arg2    Result
        (0)    uminus    c               t1
        (1)    *         b       t1      t2
        (2)    uminus    c               t3
        (3)    *         b       t3      t4
        (4)    +         t2      t4      t5
        (5)    :=        t5              a

Three-address code (3)
Implementation of three address statements (2)

    Alternatively, in order to avoid using temporaries in the symbol table, we can refer to a temporary value by the position of the statement that computes it

    Triples

               Op        Arg1    Arg2
        (0)    uminus    c
        (1)    *         b       (0)
        (2)    uminus    c
        (3)    *         b       (2)
        (4)    +         (1)     (3)
        (5)    :=        a       (4)
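The relationship between the two record layouts can be made concrete with a small Python sketch. It stores the quadruples for a := b*-c + b*-c as records and derives the triples by replacing each temporary with the position of the statement that computes it. The names Quad and to_triples, and the use of strings such as "(0)" for positional references, are assumptions made for this illustration.

```python
from collections import namedtuple

# A quadruple: operator, up to two arguments, and an explicit result name.
Quad = namedtuple("Quad", "op arg1 arg2 result")

# Quadruples for a := b*-c + b*-c, as in the quadruples table.
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]


def to_triples(quads):
    """Drop the result field; refer to temporaries by statement position."""
    position = {}  # temporary name -> index of the triple that computes it
    triples = []

    def ref(arg):
        # A temporary becomes "(i)"; names and missing arguments pass through.
        return f"({position[arg]})" if arg in position else arg

    for q in quads:
        if q.op == ":=":
            # In triple form the assignment holds the target in arg1
            # and the value in arg2.
            triples.append((q.op, q.result, ref(q.arg1)))
        else:
            triples.append((q.op, ref(q.arg1), ref(q.arg2)))
            position[q.result] = len(triples) - 1
    return triples


for i, triple in enumerate(to_triples(quads)):
    print(f"({i})", *("" if x is None else x for x in triple))
```

Because triples refer to each other by position, moving a statement means rewriting every reference to it. Quadruples avoid that at the cost of naming every intermediate value.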
Three-address code (4)
Example – Assignments

Example translation scheme to produce three address code for assignments [Aho et al, p479]

S -> id := E      { ptr := lookup(id.name);    % is there an entry for this name in the symbol table?
                    if ptr <> nil then emit(ptr ':=' E.place)
                    else error }
E -> E1 + E2      { E.place := newtemp;
                    emit(E.place ':=' E1.place '+' E2.place) }
E -> E1 * E2      (similar to above)
E -> - E1         { E.place := newtemp;
                    emit(E.place ':=' 'uminus' E1.place) }
E -> ( E1 )       { E.place := E1.place }
E -> id           { ptr := lookup(id.name);
                    if ptr <> nil then E.place := ptr
                    else error }

Three-address code (5)
Example (2) – Reusing temporaries (1)

    The newtemp function returns a new temporary name every time it is called
    Temporaries used to hold intermediate values in expression calculations tend to clutter up the symbol table [space is also needed for them]
    newtemp can be adapted to reuse temporaries
    We can determine the lifetime of a temporary from the rules of the grammar, e.g.:
        E -> E1 + E2
    will be translated into
        evaluate E1 into t1
        evaluate E2 into t2
        t := t1 + t2
    t1 and t2 are not used after the assignment
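The translation scheme for assignments can be approximated in Python by a recursive walk over an expression tree, where the value returned for each node plays the role of its place attribute. This is a sketch under assumptions: the tuple-based tree encoding, the Translator class, and the use of a set as a stand-in for the symbol table are all invented here for illustration. Parentheses leave no node in the tree, which corresponds to the rule that simply copies the place of the inner expression.

```python
import itertools


class Translator:
    """Hypothetical translator mirroring the scheme's lookup/newtemp/emit."""

    def __init__(self, symbols):
        self.symbols = set(symbols)  # stand-in for the symbol table
        self.code = []               # emitted three-address statements
        self._counter = itertools.count(1)

    def newtemp(self):
        # Returns a fresh temporary name on every call: t1, t2, ...
        return f"t{next(self._counter)}"

    def emit(self, statement):
        self.code.append(statement)

    def lookup(self, name):
        # Returns the name if it is declared, otherwise None (nil).
        return name if name in self.symbols else None

    def expr(self, node):
        """Emit code for an expression node and return its place."""
        kind = node[0]
        if kind == "id":
            place = self.lookup(node[1])
            if place is None:
                raise NameError(f"undeclared identifier {node[1]}")
            return place
        if kind == "uminus":
            operand = self.expr(node[1])
            place = self.newtemp()
            self.emit(f"{place} := uminus {operand}")
            return place
        if kind in ("+", "*"):
            left = self.expr(node[1])
            right = self.expr(node[2])
            place = self.newtemp()
            self.emit(f"{place} := {left} {kind} {right}")
            return place
        raise ValueError(f"unknown node kind {kind}")

    def assign(self, name, node):
        """Emit code for the assignment name := expression."""
        place = self.expr(node)
        target = self.lookup(name)
        if target is None:
            raise NameError(f"undeclared identifier {name}")
        self.emit(f"{target} := {place}")


# a := b * -c + b * -c
product = ("*", ("id", "b"), ("uminus", ("id", "c")))
translator = Translator(["a", "b", "c"])
translator.assign("a", ("+", product, product))
print("\n".join(translator.code))
```

With this input the walk emits six statements, one per quadruple in the earlier a := b*-c + b*-c example, and it uses five distinct temporaries because newtemp never reuses a name.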

Three-address code (5)
Example (2) – Reusing temporaries (2)

    Note that temporaries are frequently used only once, so their names can be reused
    A simple algorithm:
        Say we have a counter c, initialised to zero
        Whenever a temporary name is used as an operand, decrement c by 1
        Whenever a new temporary name is created, use $c and increment c by 1
    E.g.:
        x := a*b + c*d - e*f

        $0 := a*b         ; c incremented by 1
        $1 := c*d         ; c incremented by 1
        $0 := $0 + $1     ; c decremented twice, incremented once
        $1 := e*f         ; c incremented by 1
        $0 := $0 - $1     ; c decremented twice, incremented once
        x := $0           ; c decremented once
    What if a temporary is used more than once?

Summary

    Intermediate code generation is concerned with the production of a simple machine independent representation of the source program.
    We saw three-address code as an example of such intermediate code and how structures can be translated into it.

Next lecture:
    Code Generation II

Recommended Reading:
    Chapter 8 of Aho et al [Highly recommended]
    Chapter 4 of Grune et al
    Chapter 8 of Hunter's "The essence of compilers"
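Returning to the counter-based scheme for reusing temporaries, the following Python sketch applies it to x := a*b + c*d - e*f. The names TempPool, gen and gen_assign, and the tuple encoding of the expression tree, are assumptions made for this illustration. The sketch also assumes that every temporary is used exactly once and that source identifiers never start with "$".

```python
class TempPool:
    """Counter-based temporary allocator: names are $0, $1, ..."""

    def __init__(self):
        self.c = 0  # the counter c, initialised to zero

    def new(self):
        # Creating a temporary: use $c, then increment c by 1.
        name = f"${self.c}"
        self.c += 1
        return name

    def use(self, place):
        # Using a temporary as an operand: decrement c by 1.
        # Ordinary identifiers do not affect the counter.
        if place.startswith("$"):
            self.c -= 1


def gen(node, pool, code):
    """Emit code for a binary expression tree and return its place."""
    if isinstance(node, str):
        return node  # an identifier is its own place
    op, left, right = node
    lhs = gen(left, pool, code)
    rhs = gen(right, pool, code)
    pool.use(rhs)
    pool.use(lhs)
    place = pool.new()
    code.append(f"{place} := {lhs} {op} {rhs}")
    return place


def gen_assign(target, node):
    """Return the statements for target := expression."""
    pool, code = TempPool(), []
    place = gen(node, pool, code)
    pool.use(place)
    code.append(f"{target} := {place}")
    return code


# x := a*b + c*d - e*f
tree = ("-", ("+", ("*", "a", "b"), ("*", "c", "d")), ("*", "e", "f"))
print("\n".join(gen_assign("x", tree)))
```

The counter behaves like a stack pointer, which is why only two names are needed here. A temporary that is read more than once would have its slot handed out again after the first use, so that case needs a different lifetime rule.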

