Advanced CPT _Java_ by ewghwehws


									Compiler Structures
        241-437, Semester 1, 2011-2012



                           10. Intermediate
                           Code Generation
    •    Objective
          – describe intermediate code generation
          – explain a stack-based intermediate code
            for the expression language

241-437 Compilers: IC/10                              1
Overview

  1.          Intermediate Code (IC) Generation
  2.          IC Examples
  3.          Expression Translation in SPIM
  4.          The Expressions Language




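Compiler phases:

       Source Program
    -> Lexical Analyzer          (front end)
    -> Syntax Analyzer           (front end)
    -> Semantic Analyzer         (front end)
    -> Int. Code Generator       (front end; in this lecture)
    -> Intermediate Code         (in this lecture)
    -> Code Optimizer            (back end)
    -> Target Code Generator     (back end)
    -> Target Lang. Prog.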
1. Intermediate Code (IC) Generation

•   Helps with retargeting
     – e.g. can easily attach a back end for a new machine to
       an existing front end

     Front end --> Intermediate code --> Back end --> Target machine code


•   Enables machine-independent code optimization.

Graphical IC Representations

•   Abstract Syntax Trees (AST)
     – retains basic parse tree structure, but with
       unneeded nodes removed
•   Directed Acyclic Graphs (DAG)
     – compacted AST to avoid duplication
     – smaller memory needs
•   Control Flow Graphs (CFG)
     – used to model control flow

Linear (text-based) ICs
•   Stack-based (postfix)
     – e.g. the JVM

•   Three-address code
         x := y op z

•   Two-address code:
         x := op y
         (the same as x := x op y)

2. IC Examples

•   ASTs and DAGs
•   Stack-based (postfix)
•   Three-address Code
•   SPIM




  2.1. ASTs and DAGs
                                 a := b * -c + b * -c

AST:                                   DAG:
         assign                                assign
        /      \                              /      \
       a        +                            a        +
              /   \                                   ||
             *     *                                  *
            / \   / \                                / \
           b   - b   -                              b   -
               |     |                                  |
               c     c                                  c

In the DAG, both operands of + are the same shared * node.

Pros: easy restructuring of code and/or expressions for
      intermediate code optimization
Cons: memory intensive
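
The difference between the two forms can be sketched in C. This is only an illustration: the Node layout and the names astNode(), dagNode() and buildExample() are hypothetical and do not come from the course code. astNode() always allocates, while dagNode() first searches for an identical node and shares it. Only the expression b * -c + b * -c is built, without the assign node.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 32

/* One IC node: a leaf (op == 0, name set) or an operator node. */
typedef struct {
    char op;           /* '+', '*', '-' (unary minus), or 0 for a leaf */
    const char *name;  /* identifier for leaves, NULL otherwise */
    int left, right;   /* child indices, -1 if absent */
} Node;

Node nodes[MAX_NODES];
int nodeCount = 0;

/* Always allocate a new node: this gives a plain AST. */
int astNode(char op, const char *name, int left, int right)
{
    assert(nodeCount < MAX_NODES);
    nodes[nodeCount] = (Node){ op, name, left, right };
    return nodeCount++;
}

/* Reuse an identical existing node if there is one: this gives a DAG. */
int dagNode(char op, const char *name, int left, int right)
{
    for (int i = 0; i < nodeCount; i++) {
        Node *n = &nodes[i];
        int sameName = (n->name == NULL && name == NULL) ||
                       (n->name != NULL && name != NULL &&
                        strcmp(n->name, name) == 0);
        if (n->op == op && sameName && n->left == left && n->right == right)
            return i;              /* share the existing node */
    }
    return astNode(op, name, left, right);
}

typedef int (*MakeNode)(char, const char *, int, int);

/* Build the subexpression b * -c with the given node constructor. */
static int mulBNegC(MakeNode mk)
{
    int b = mk(0, "b", -1, -1);
    int c = mk(0, "c", -1, -1);
    int neg = mk('-', NULL, c, -1);
    return mk('*', NULL, b, neg);
}

/* Build b * -c + b * -c and return how many nodes were allocated. */
int buildExample(MakeNode mk)
{
    nodeCount = 0;
    int l = mulBNegC(mk);
    int r = mulBNegC(mk);
    mk('+', NULL, l, r);
    return nodeCount;
}
```

Calling buildExample(astNode) allocates 9 nodes, while buildExample(dagNode) allocates 5, because the second b * -c is found in the node table and reused. The linear search keeps the sketch short; a hash table would be the usual choice for larger inputs.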
     2.2. Stack-based (postfix)
                                a := b * -c + b * -c

b c uminus * b c uminus * + a assign

     Postfix notation represents operations on a stack.

     e.g. JVM stack instrs:
          iload 2        //   push b
          iload 3        //   push c
          ineg           //   uminus
          imul           //   *
          iload 2        //   push b
          iload 3        //   push c
          ineg           //   uminus
          imul           //   *
          iadd           //   +
          istore 1       //   store a

Pro: easy to generate
Cons: stack operations are more difficult to optimize
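
Postfix code is easy to generate because it is just a post-order walk of the expression tree. The sketch below assumes a hypothetical Expr node type and the helper names emitPostfix() and postfixExample(); none of these come from the course code. It covers only the right-hand side b * -c + b * -c; the assignment target would follow the expression.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Expr {
    const char *label;          /* operator name or identifier */
    struct Expr *left, *right;  /* NULL for missing children */
} Expr;

/* Post-order walk: emit both children first, then the node itself. */
void emitPostfix(const Expr *e, char *out, size_t outSize)
{
    if (e == NULL)
        return;
    emitPostfix(e->left, out, outSize);
    emitPostfix(e->right, out, outSize);
    size_t used = strlen(out);
    /* room for a separating space, the label and the terminator */
    assert(used + strlen(e->label) + 2 <= outSize);
    snprintf(out + used, outSize - used, "%s%s", used ? " " : "", e->label);
}

/* Build b * -c + b * -c and write its postfix form into out. */
void postfixExample(char *out, size_t outSize)
{
    Expr b = {"b", NULL, NULL}, c = {"c", NULL, NULL};
    Expr neg1 = {"uminus", &c, NULL}, neg2 = {"uminus", &c, NULL};
    Expr mul1 = {"*", &b, &neg1}, mul2 = {"*", &b, &neg2};
    Expr sum = {"+", &mul1, &mul2};

    out[0] = '\0';
    emitPostfix(&sum, out, outSize);
}
```

With a 64-byte buffer, postfixExample() leaves the string "b c uminus * b c uminus * +" in out, which is the expression part of the postfix form above.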

2.3. Three-Address Code
                           a := b * -c + b * -c


        t1    :=    - c                      t1   :=   - c
        t2    :=    b * t1                   t2   :=   b * t1
        t3    :=    - c                      t5   :=   t2 + t2
        t4    :=    b * t3                   a    :=   t5
        t5    :=    t2 + t4
        a     :=    t5
          Translated                           Translated
        from the AST                         from the DAG

    2.4. SPIM
•   SPIM is a simulator that runs MIPS32 assembly language
    programs; MIPS arithmetic instructions are a form of
    three-address code
    – http://www.cs.wisc.edu/~larus/spim.html



•   Loading/Storing
    – lw register,var - loads value into register
    – sw register,var - stores value from register
    – many, many others



•   8 temporary registers are used here: $t0 - $t7


•   Binary math ops (reg1 = reg2 op reg3):
     –   add     reg1,reg2,reg3
     –   sub     reg1,reg2,reg3
     –   mul     reg1,reg2,reg3
     –   div     reg1,reg2,reg3


•   Unary minus (reg1 = - reg2)
     – neg reg1, reg2



      "a := b * -c + b * -c" in SPIM
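
Code generated from the AST (the register holding each
subtree's result is shown on the right):

      lw $t0,c              # c       -> t0
      neg $t1,$t0           # -c      -> t1
      lw $t0,b              # b       -> t0
      mul $t2,$t1,$t0       # b * -c  -> t2   (left operand of +)
      lw $t0,c              # c       -> t0
      neg $t1,$t0           # -c      -> t1
      lw $t0,b              # b       -> t0
      mul $t1,$t1,$t0       # b * -c  -> t1   (right operand of +)
      add $t1,$t2,$t1       # +       -> t1
      sw $t1,a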

  a := b * -c + b * -c

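Code generated from the DAG (the register holding each
node's result is shown on the right):

      lw $t0,c              # c       -> t0
      neg $t1,$t0           # -c      -> t1
      lw $t0,b              # b       -> t0
      mul $t1,$t1,$t0       # b * -c  -> t1   (shared * node)
      add $t2,$t1,$t1       # +       -> t2
      sw $t2,a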



 3. Expression Translation in SPIM
Grammar:
 S => id := E
 E => E + E
 E => id

parse tree --> code using bottom-up evaluation, for

                       a := b + c + d + e

As we parse, use attributes to pass information about the
temporary variables up the tree. Each number on an E node
corresponds to a temporary variable.

                        S
                        |
                a  :=   E7
                      /    \
                    E5   +   E6 (e)
                  /    \
                E3   +   E4 (d)
              /    \
         E1 (b)  +   E2 (c)

Generate (one instruction per E node, in the order the nodes
are reduced, then the store for S):

     E1:   lw $t1,b
     E2:   lw $t2,c
     E3:   add $t3,$t1,$t2
     E4:   lw $t4,d
     E5:   add $t5,$t3,$t4
     E6:   lw $t6,e
     E7:   add $t7,$t5,$t6
     S:    sw $t7,a

    Pro: easy to rearrange code for global optimization
    Cons: lots of temporaries
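
The bottom-up scheme can be sketched as a recursive function that returns the number of the temporary holding each subexpression's value (a synthesized attribute). The Expr type and the names genExpr(), genAssign() and spimExample() are hypothetical, chosen for this illustration; the sketch handles only the grammar's + operator and never reuses a register.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Expr {
    char op;                    /* '+' for addition, 0 for an identifier */
    const char *id;             /* identifier name when op == 0 */
    struct Expr *left, *right;
} Expr;

int nextTemp = 1;               /* next unused temporary number */

/* Append one line of generated code to out. */
static void emit(char *out, size_t size, const char *line)
{
    size_t used = strlen(out);
    assert(used + strlen(line) + 2 <= size);   /* line, '\n', '\0' */
    snprintf(out + used, size - used, "%s\n", line);
}

/* Generate code for e bottom-up; return the temporary holding its value. */
int genExpr(const Expr *e, char *out, size_t size)
{
    char line[64];
    if (e->op == 0) {                          /* E => id */
        int t = nextTemp++;
        snprintf(line, sizeof line, "lw $t%d,%s", t, e->id);
        emit(out, size, line);
        return t;
    }
    int l = genExpr(e->left, out, size);       /* E => E + E */
    int r = genExpr(e->right, out, size);
    int t = nextTemp++;
    snprintf(line, sizeof line, "add $t%d,$t%d,$t%d", t, l, r);
    emit(out, size, line);
    return t;
}

/* Generate code for  target := e  (S => id := E). */
void genAssign(const char *target, const Expr *e, char *out, size_t size)
{
    char line[64];
    out[0] = '\0';
    nextTemp = 1;
    int t = genExpr(e, out, size);
    snprintf(line, sizeof line, "sw $t%d,%s", t, target);
    emit(out, size, line);
}

/* a := b + c + d + e, with + grouping to the left as in the parse tree. */
void spimExample(char *out, size_t size)
{
    Expr b = {0, "b", NULL, NULL}, c = {0, "c", NULL, NULL};
    Expr d = {0, "d", NULL, NULL}, e = {0, "e", NULL, NULL};
    Expr bc = {'+', NULL, &b, &c};
    Expr bcd = {'+', NULL, &bc, &d};
    Expr all = {'+', NULL, &bcd, &e};
    genAssign("a", &all, out, size);
}
```

spimExample() produces the same eight instructions as the walkthrough, from lw $t1,b to sw $t7,a. Because every node takes a fresh temporary, a longer expression would run past $t7, which is the "lots of temporaries" cost noted above.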
Issues when Processing Expressions

•   Type checking/conversion.

•   Address calculation for more complex types
    (arrays, records, etc.).

•   Expressions in control structures, such as
    loops and if tests.

    4. The Expressions Language
•     exprParse3.c builds a parse tree for the input
      file (reuses code from exprParse2.c).

•     An intermediate code is generated from the
      parse tree, and saved to an output file.

•     The generated code is not executed by exprParse3.c
       – that is done by a separate emulator.

Usage

test1.txt:
let x = 2
let y = 3 + x

> gcc -Wall -o exprParse3 exprParse3.c
> ./exprParse3 < test1.txt
> cat codeGen.txt
PUSH 2
STORE x
WRITE
PUSH 3
LOAD x
ADD
STORE y
WRITE
STOP

exprParse3 stores the intermediate code in codeGen.txt:

     test1.txt --> exprParse3 --> codeGen.txt

Emulator Usage
> ./emulator codeGen.txt
Reading code from codeGen.txt
== 2
== 5
Stop


     codeGen.txt --> emulator   (it runs the intermediate code)



4.1. The Instruction Set

•   The instructions in codeGen.txt are
    executed by an emulator.
     – it emulates (simulates) real hardware


•   The instructions refer to two data structures
    used in the emulator.



The Emulator's Data Structures

•   The emulator's data structures:
     – a symbol table of IDs and their integer values
     – a stack of integers for evaluating the
       expressions



     e.g.   symbol table:   x = 4
            stack:          2
    The Instructions
•    WRITE              // pop top element off stack and print
•    STOP               // exit code emulation

•    LOAD ID                   // get ID value from symbol table,
                                  and push onto stack

•    STORE ID                  // copy stack top into symbol
                                  table for ID


•   PUSH integer           // push integer onto stack

•   STORE0 ID              // push 0 onto stack, and save to
                              table as value for ID
                              ( same as push 0; store ID)

•   MULT   // pop two stack values, multiply them,
              push result back
•   ADD, MINUS, DIV // same for those ops
Intermediate Code Type

•   Because the intermediate code uses a stack to
    store values rather than registers, it is a
    stack-based (postfix) representation.




4.2. exprParse3.c Coding

•   All the parsing code in exprParse3.c is the
    same as exprParse2.c.

•   The difference is that the parse tree is
    passed to a generateCode() function to
    convert it to intermediate code
     – see main()


main()
#define CODE_FNM "codeGen.txt"
          // where to store generated code

int main(void)
/* parse, print the tree, then generate code
    which is stored in CODE_FNM */
{ Tree *t;
   nextToken();
   t = statements();
   match(SCANEOF);

    printTree(t, 0);
    generateCode(CODE_FNM, t);
    return 0;
}
Generating the Code
void generateCode(char *fnm, Tree *t)
/* Open the intermediate code file, fnm, and
     write to it. */
{ FILE *fp;
   if ((fp = fopen(fnm, "w")) == NULL) {
      printf("Could not write to %s\n", fnm);
      exit(1);
   }
   else {
      printf("Writing code to %s\n", fnm);
      cgTree(fp, t);
      fprintf(fp, "STOP\n");
                     // last instruction in file
      fclose(fp);
   }
} // end of generateCode()
void cgTree(FILE *fp, Tree *t)
/* Recurse over the parse tree looking for
   non-NEWLINE subtrees to convert into code
   Each block of code generated for a non-NEWLINE
   subtree ends with a WRITE instruction, to print
   out the value of the line. */
{ if (t == NULL)
    return;
  Token tok = TreeOper(t);
  if (tok == NEWLINE) {
    cgTree(fp, TreeLeft(t));
    cgTree(fp, TreeRight(t));
  }
  else {
    codeGen(fp, t);
    fprintf(fp, "WRITE\n"); // print value at EOL
  }
} // end of cgTree()
void codeGen(FILE *fp, Tree *t)
/* Convert the tree nodes for ID, INT, ASSIGNOP,
    PLUSOP, MINUSOP, MULTOP, DIVOP into instructions.

       The load/store instructions:
          LOAD ID, STORE ID, STORE0 ID, PUSH integer
       The math instructions:
          MULT, ADD, MINUS, DIV
*/
{
     if (t == NULL)
       return;
   Token tok = TreeOper(t);

   if (tok == ID)
     codeGenID(fp, TreeID(t));
   else if (tok == INT)
     fprintf(fp, "PUSH %d\n", TreeValue(t));
   else if (tok == ASSIGNOP) {     // id = expr
     char *id = TreeID(TreeLeft(t));
     getIDEntry(id);    // don't use Symbol info
     codeGen(fp, TreeRight(t));
     fprintf(fp, "STORE %s\n", id);
   }
    else if (tok == PLUSOP) {
      codeGen(fp, TreeLeft(t));
      codeGen(fp, TreeRight(t));
      fprintf(fp, "ADD\n");
    }
    else if (tok == MINUSOP) {
      codeGen(fp, TreeLeft(t));
      codeGen(fp, TreeRight(t));
      fprintf(fp, "MINUS\n");
    }
 else if (tok == MULTOP) {
      codeGen(fp, TreeLeft(t));
      codeGen(fp, TreeRight(t));
      fprintf(fp, "MULT\n");
  }
  else if (tok == DIVOP) {
      codeGen(fp, TreeLeft(t));
      codeGen(fp, TreeRight(t));
      fprintf(fp, "DIV\n");
  }
} // end of codeGen()



void codeGenID(FILE *fp, char *id)
/* An ID may already be in the symbol table, or be new,
   which is converted into a LOAD or a STORE0 code
   operation. */
{
  SymbolInfo *si = NULL;

  if ((si = lookupID(id)) != NULL) // already declared
    fprintf(fp, "LOAD %s\n", id);
  else {   // new, so add to table
    addID(id, 0);    // 0 is default value
    fprintf(fp, "STORE0 %s\n", id);
  }
} // end of codeGenID()

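  From Tree to Code

test1.txt:
     let x = 2
     let y = 3 + x

Parse tree (children are indented under their parent,
left child first):

     \n
        \n
           NULL
           =
              x
              2
        =
           y
           +
              3
              x

Generated code:
     PUSH 2
     STORE x
     WRITE
     PUSH 3
     LOAD x
     ADD
     STORE y
     WRITE
     STOP

Symbol table in exprParse3.c:   x = 0,  y = 0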
4.3. The Emulator
> gcc -Wall -o emulator emulator.c

> ./emulator codeGen.txt
Reading code from codeGen.txt
== 2
== 5
Stop




 Emulator Data Structures
#define MAX_SYMS 15 // max no of vars
#define STACK_SIZE 10

// stack data structure
int stack[STACK_SIZE];
int stackTop = -1;

// symbol table data structures
typedef struct SymInfo {
    char *id;
    int value;
} SymbolInfo;

int symNum = 0;   // number of symbols stored
SymbolInfo syms[MAX_SYMS];
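
The emulator code that follows calls push(), pop() and topOf(), which are not shown in these slides. A minimal sketch of what they might look like is given below, reusing the stack declarations above. The bodies are an assumption, and assert() stands in for whatever error reporting the real emulator.c uses.

```c
#include <assert.h>

#define STACK_SIZE 10

int stack[STACK_SIZE];
int stackTop = -1;        /* index of the top element; -1 when empty */

/* Push v onto the stack. */
void push(int v)
{
    assert(stackTop < STACK_SIZE - 1);   /* overflow check */
    stack[++stackTop] = v;
}

/* Remove and return the top element. */
int pop(void)
{
    assert(stackTop >= 0);               /* underflow check */
    return stack[stackTop--];
}

/* Return the top element without removing it (used by STORE). */
int topOf(void)
{
    assert(stackTop >= 0);
    return stack[stackTop];
}
```

For example, after push(3) and push(2), popping v2 and then v1 and pushing v1+v2 leaves 5 on top, which is the ADD step of the test1.txt program.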
 Evaluating Input Lines
void eval(FILE *fp)
/* Read in the code file a line at a time and
     process the lines.
    An instruction on a line may be a single
     command (e.g. WRITE) or an instruction name
     and an argument (e.g. LOAD x). */
{
   char buf[BUFSIZ];
   char cmd[MAX_LEN], arg[MAX_LEN];
   int no;
 while (fgets(buf, sizeof(buf), fp) != NULL) {
    no = sscanf(buf, "%s %s\n", cmd, arg);
    if ((no < 1) || (no > 2))
       printf("Unknown format: %s\n", buf);
    else
       processCmd(cmd, arg);
          // process commands as they are read in
 }
} // end of eval()
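
The value returned by sscanf() is what separates a bare command from a command with an argument. The sketch below isolates that step in a hypothetical helper, parseLine(). The value of MAX_LEN is an assumption (the slides do not give it), and the field widths are added here so that long words cannot overflow the buffers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 32   /* assumed buffer size; not given in the slides */

/* Split one line of code into a command and an optional argument.
   Returns 1 for a bare command, 2 for a command plus an argument,
   and a value below 1 if the line holds no command at all. */
int parseLine(const char *buf, char cmd[MAX_LEN], char arg[MAX_LEN])
{
    cmd[0] = arg[0] = '\0';
    /* %31s stores at most 31 characters plus the terminating '\0' */
    return sscanf(buf, "%31s %31s", cmd, arg);
}
```

parseLine("LOAD x\n", cmd, arg) returns 2, parseLine("WRITE\n", cmd, arg) returns 1 and leaves arg empty, and a blank line gives a result below 1. With only two conversions in the format, the result can never exceed 2, so extra words on a line are silently ignored.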




 Processing an Instruction
void processCmd(char *cmd, char *arg)
{ SymbolInfo *si;
  if (strcmp(cmd, "LOAD") == 0) {
    if ((si = lookupID(arg)) == NULL) {
      printf("Error: load cannot find %s\n", arg);
      exit(1);
    }
    push(si->value);
  }
  else if (strcmp(cmd, "STORE") == 0)
    addID(arg, topOf());
  else if (strcmp(cmd, "STORE0") == 0) {
    push(0);
    addID(arg, 0);
  }
   else if (strcmp(cmd,    "PUSH") == 0)
     push( atoi(arg) );
   else if (strcmp(cmd,    "MULT") == 0) {
     int v2 = pop();
     int v1 = pop();
     push( v1*v2 );
   }
   else if (strcmp(cmd,    "ADD") == 0) {
     int v2 = pop();
     int v1 = pop();
     push( v1+v2 );
   }
   else if (strcmp(cmd,    "MINUS") == 0) {
     int v2 = pop();
     int v1 = pop();
     push( v1-v2 );
   }
   else if (strcmp(cmd, "DIV") == 0) {
     int v2 = pop();
     if (v2 == 0) {
       printf("Error: div by 0; using 1\n");
       v2 = 1;
     }
     int v1 = pop();
     push( v1/v2 );
   }
   else if (strcmp(cmd, "WRITE") == 0)
     printf("== %d\n", pop());
   else if (strcmp(cmd, "STOP") == 0) {
     printf("Stop\n");
     exit(0);    // normal termination
   }

 else
   printf("Unknown instruction: %s\n", cmd);
} // end of processCmd()
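
processCmd() relies on lookupID() and addID(), which are not shown in these slides. The sketch below is one plausible version for the emulator's symbol table, under two assumptions: addID() overwrites the value when the ID already exists (needed when a variable is stored more than once), and it copies the name, because the arg buffer is reused for every input line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 15

typedef struct SymInfo {
    char *id;
    int value;
} SymbolInfo;

int symNum = 0;              /* number of symbols stored */
SymbolInfo syms[MAX_SYMS];

/* Return the entry for id, or NULL if it is not in the table. */
SymbolInfo *lookupID(const char *id)
{
    for (int i = 0; i < symNum; i++)
        if (strcmp(syms[i].id, id) == 0)
            return &syms[i];
    return NULL;
}

/* Set id's value, adding a new entry if id is not yet in the table. */
void addID(const char *id, int value)
{
    SymbolInfo *si = lookupID(id);
    if (si == NULL) {
        assert(symNum < MAX_SYMS);       /* table full */
        size_t len = strlen(id) + 1;
        char *copy = malloc(len);        /* keep a private copy of the name */
        assert(copy != NULL);
        memcpy(copy, id, len);
        si = &syms[symNum++];
        si->id = copy;
    }
    si->value = value;
}
```

After addID("x", 2) and addID("y", 5), a further addID("x", 7) updates the existing entry rather than adding a third one, so symNum stays at 2.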




Evaluating the Code for test1.txt

          test1.txt        codeGen.txt
       let x = 2           PUSH 2
       let y = 3 + x       STORE x
                           WRITE
                           PUSH 3
                           LOAD x
                           ADD
                           STORE y
                           WRITE
                           STOP

State after each instruction:

                   stack (top first)   symbol table     output

•   PUSH 2         2                   (empty)
•   STORE x        2                   x = 2
•   WRITE          (empty)             x = 2            == 2
•   PUSH 3         3                   x = 2
•   LOAD x         2, 3                x = 2
•   ADD            5                   x = 2
•   STORE y        5                   x = 2, y = 5
•   WRITE          (empty)             x = 2, y = 5     == 5
•   STOP                                                Stop