Multi-Level Intermediate Representations


TDDC86 Compiler Optimizations and Code Generation

  Multi-Level Intermediate Representations

         Local CSE, DAGs, Lowering
         Call Sequences
         Survey of some Compiler Frameworks

         Christoph Kessler, IDA, Linköpings universitet, 2009.


  Compiler Flow

         (a) Optimizations on low-level IR only        (b) Mixed model

             source code                                   source code
               | text stream                                 | text stream
             Lexical Analyzer                              Lexical Analyzer
               | token stream                                | token stream
             Parser                                        Parser
               | parse tree                                  | parse tree
             Semantic Analyzer                             Semantic Analyzer
               | parse tree                                  | parse tree
             Translator                                    IR Generator
               | low-level IR                                | medium-level IR
             Optimizer                                     Optimizer
               | low-level IR                                | medium-level IR
             Assembler Emitter                             Code Generator
               | text stream                                 | low-level IR
             asm code                                      Postpass Optimizer
                                                             | text stream
                                                           asm code




  Multi-Level IR

         Multi-level IR, e.g.
                AST   abstract syntax tree; implicit control and data flow
                HIR   high-level IR
                MIR   medium-level IR
                LIR   low-level IR, symbolic registers
                VLIR  very low-level IR, target specific, target registers
         Standard form and possibly also SSA (static single assignment) form
         Open form (tree, graph) and/or closed (linearized, flattened) form
                For expressions: Trees vs DAGs (directed acyclic graphs)
         Translation by lowering
         ☺ Analysis / Optimization engines can work on the most appropriate level of abstraction
         ☺ Clean separation of compiler phases, somewhat easier to extend and debug
         ☹ Framework gets larger and slower
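The difference between keeping an expression as a tree and as a DAG can be shown with a small sketch. It assumes expressions encoded as nested tuples `(op, left, right)`, which is a hypothetical encoding chosen for illustration and not taken from the slides. Structurally identical subexpressions receive the same node, so a common subexpression is stored once.

```python
def build_dag(expr, nodes=None, index=None):
    """Return (root_id, nodes); nodes[i] is a leaf or (op, left_id, right_id).

    `expr` is a leaf (str or int) or a tuple (op, left, right). The tuple
    encoding is an assumption made for this sketch.
    """
    if nodes is None:
        nodes, index = [], {}
    if isinstance(expr, tuple):
        op, left, right = expr
        left_id, _ = build_dag(left, nodes, index)
        right_id, _ = build_dag(right, nodes, index)
        key = (op, left_id, right_id)
    else:
        key = expr
    if key not in index:          # first occurrence: create a new DAG node
        index[key] = len(nodes)
        nodes.append(key)
    return index[key], nodes


def tree_size(expr):
    """Number of nodes when the expression is kept as a tree."""
    if isinstance(expr, tuple):
        return 1 + tree_size(expr[1]) + tree_size(expr[2])
    return 1


# (a + b) * (a + b): the common subexpression a + b is shared in the DAG.
expr = ("*", ("+", "a", "b"), ("+", "a", "b"))
root, dag_nodes = build_dag(expr)
print(tree_size(expr), len(dag_nodes))
```

For this expression the tree has 7 nodes and the DAG has 4, because `a`, `b` and `a + b` each appear only once in the DAG.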




  Example: WHIRL (Open64 Compiler)

         Front-ends (GCC) for C, C++ and F95 generate Very High WHIRL.

         IR level                Components                                   Lowering to the next level
         Very High WHIRL (AST)   VHO, standalone inliner                      Lower aggregates, un-nest calls, …
         High WHIRL              IPA (interprocedural analysis), PREOPT,      Lower arrays, lower complex numbers,
                                 LNO (loop nest optimizer)                    lower HL control flow, lower bit-fields, …
         Mid WHIRL               WOPT (global optimizer, uses internally      Lower intrinsic ops to calls; all data mapped to
                                 an SSA IR), RVI1 (register variable          segments; lower loads/stores to final form;
                                 identification)                              expose code sequences for constants, addresses
         Low WHIRL               RVI2                                         Expose #(gp) addr. for globals, …
         Very Low WHIRL          CG                                           Code generation, including scheduling, profiling
         CGIR                    CG                                           support, predication, SW speculation


  Multi-Level IR Overview

         AST
         HIR                  SSA-HIR
         MIR                  SSA-MIR
         LIR                  SSA-LIR
         VLIR (target code)




  AST, Symbol table
         Hierarchical symbol table follows nesting of scopes
         [Figure: AST with a hierarchical symbol table; globals (level 0), locals (level 1)]

  AST Example: Open64 VH-WHIRL
         [Figure]




  Symbol table
         Some typical fields in a symbol table entry

         Field Name   Field Type            Meaning
         name         char *                the symbol’s identifier
         sclass       enum { STATIC, ...}   storage class
         size         int                   size in bytes
         type         struct type *         source language data type
         basetype     struct type *         source-lang. type of elements of a constructed type
         machtype     enum { ... }          machine type corresponding to source type
                                            (or element type if constructed type)
         basereg      char *                base register to compute address
         disp         int                   displacement to address on stack
         reg          char *                name of register containing the symbol’s value
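A hierarchical symbol table with entries of this kind might be sketched as follows. The classes `Symbol` and `Scope`, the selection of fields, and the storage class name "AUTO" are assumptions made for illustration; the slides only list typical fields and state that the table follows the nesting of scopes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Symbol:
    """A subset of the typical entry fields; the values below are illustrative."""
    name: str
    sclass: str          # storage class, e.g. "STATIC" ("AUTO" is an assumed name)
    size: int            # size in bytes
    type: str            # source language data type
    disp: int = 0        # displacement to address on stack


@dataclass
class Scope:
    """One level of the hierarchical symbol table."""
    level: int
    parent: Optional["Scope"] = None
    symbols: Dict[str, Symbol] = field(default_factory=dict)

    def declare(self, sym: Symbol) -> None:
        self.symbols[sym.name] = sym

    def lookup(self, name: str) -> Optional[Symbol]:
        """Search this scope first, then the enclosing scopes."""
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None

    def open_scope(self) -> "Scope":
        return Scope(level=self.level + 1, parent=self)


globals_ = Scope(level=0)
globals_.declare(Symbol("n", "STATIC", 4, "int"))
globals_.declare(Symbol("x", "STATIC", 8, "double"))

locals_ = globals_.open_scope()                           # level 1
locals_.declare(Symbol("x", "AUTO", 4, "int", disp=-4))   # shadows the global x
```

A lookup of `x` from the local scope finds the local entry, while a lookup of `n` falls through to the globals at level 0.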




  HIR - high-level intermediate representation
         A (linearized) control flow graph,
         but level of abstraction close to AST
                loop structures and bounds explicit
                array subscripts explicit
                 suitable for data dependence analysis
                   and loop transformation / parallelization
                artificial entry node for the procedure
                assignments var = expr
                unassigned expressions, e.g. conditionals
                function calls

         Example:
                for v = v1 by v2 to v3 do
                   a[i] = 2
                endfor


  Flattening 0: From AST to HIR (or other CFG repr.)




  Generating a CFG from AST
         Straightforward for structured programming languages
                Traverse AST and compose control flow graph recursively
                As in syntax-directed translation, but separate pass
                Stitching points: single entry, single exit point of control;
                symbolic labels for linearization

         CFG ( stmt1; stmt2 ) =
                entry -> CFG(stmt1) -> CFG(stmt2) -> exit

         CFG ( while (expr) stmt ) =
                CFG(expr) branches to CFG(stmt) or leaves the loop;
                CFG(stmt) leads back to CFG(expr)

  Creating a CFG from AST (2)
         Traverse AST recursively, compose CFG
         Example:
                {
                  b = a + 1;
                  while (b>0)
                    b = b / 3;
                  print(b);
                }
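The recursive composition can be sketched as below. This is a minimal illustration under assumptions: the AST is encoded as strings and tuples, the class `CFG` and the function `build` are hypothetical names, and an extra "(loop exit)" node serves as the single exit point of a while loop. Each call returns the entry and exit node of a fragment, which are the stitching points used by the caller.

```python
class CFG:
    def __init__(self):
        self.labels = []      # labels[i] is the text of node i
        self.edges = []       # list of (source, target) pairs

    def node(self, label):
        self.labels.append(label)
        return len(self.labels) - 1

    def edge(self, src, dst):
        self.edges.append((src, dst))


def build(ast, cfg):
    """Return (entry, exit) node ids of the CFG fragment for `ast`.

    `ast` is a statement string, ("seq", s1, s2, ...) or ("while", cond, body).
    """
    if isinstance(ast, str):                 # simple statement: one node
        n = cfg.node(ast)
        return n, n
    kind = ast[0]
    if kind == "seq":                        # chain the fragments exit-to-entry
        entry, exit_ = build(ast[1], cfg)
        for stmt in ast[2:]:
            next_entry, next_exit = build(stmt, cfg)
            cfg.edge(exit_, next_entry)
            exit_ = next_exit
        return entry, exit_
    if kind == "while":
        cond = cfg.node(ast[1])              # CFG(expr): the loop test
        join = cfg.node("(loop exit)")       # single exit point of the loop
        body_entry, body_exit = build(ast[2], cfg)
        cfg.edge(cond, body_entry)           # condition true: enter the body
        cfg.edge(body_exit, cond)            # back edge to the test
        cfg.edge(cond, join)                 # condition false: leave the loop
        return cond, join
    raise ValueError(f"unknown AST node kind: {kind!r}")


# The example block: { b = a + 1; while (b>0) b = b / 3; print(b); }
program = ("seq", "b = a + 1", ("while", "b > 0", "b = b / 3"), "print(b)")
cfg = CFG()
entry, exit_ = build(program, cfg)
```

Because every fragment has exactly one entry and one exit, the sequence case only needs to connect the exit of one fragment to the entry of the next.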




  Multi-Level IR Overview: Standard vs. SSA Form


  HIR/MIR/LIR Example (adapted from Muchnick’97)

         HIR:
                for v = v1 by v2 to v3 do
                   a[i] = 2
                endfor

         MIR (assuming that v2 is positive):        LIR (symbolic registers allocated: v in s2, v1 in s1, i in s9 ...):
             v = v1                                     s2 = s1
             t2 = v2                                    s4 = s3
             t3 = v3                                    s6 = s5
         L1: if v > t3 goto L2                      L1: if s2 > s6 goto L2
             t4 = addr a                                s7 = addr a
             t5 = 4 * i                                 s8 = 4 * s9
             t6 = t4 + t5                               s10 = s7 + s8
             *t6 = 2                                    [s10] = 2
             v = v + t2                                 s2 = s2 + s4
             goto L1                                    goto L1
         L2:                                        L2:
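The step from the HIR loop to the MIR sequence can be sketched as a pair of small lowering functions. The function names `lower_for` and `lower_store`, the text-line representation of MIR, and the generators for temporaries and labels are assumptions made for illustration. As in the example, the step is assumed to be positive, so the loop is left when `v > t3`.

```python
import itertools


def lower_for(var, start, step, bound, lower_body, temps, labels):
    """Lower `for var = start by step to bound do ... endfor` to MIR text lines.

    Assumes a positive step. `lower_body` is called after the loop header
    temporaries are allocated and returns the MIR lines of the loop body.
    """
    t_step, t_bound = next(temps), next(temps)
    head, done = next(labels), next(labels)
    lines = [
        f"    {var} = {start}",
        f"    {t_step} = {step}",
        f"    {t_bound} = {bound}",
        f"{head}: if {var} > {t_bound} goto {done}",
    ]
    lines += [f"    {line}" for line in lower_body()]
    lines += [
        f"    {var} = {var} + {t_step}",
        f"    goto {head}",
        f"{done}:",
    ]
    return lines


def lower_store(array, index, value, elem_size, temps):
    """Lower `array[index] = value` for a one-dimensional array."""
    t_base, t_off, t_addr = next(temps), next(temps), next(temps)
    return [
        f"{t_base} = addr {array}",
        f"{t_off} = {elem_size} * {index}",
        f"{t_addr} = {t_base} + {t_off}",
        f"*{t_addr} = {value}",
    ]


temps = (f"t{n}" for n in itertools.count(2))    # t2, t3, ... as in the example
labels = (f"L{n}" for n in itertools.count(1))   # L1, L2, ...

mir = lower_for("v", "v1", "v2", "v3",
                lambda: lower_store("a", "i", 2, 4, temps), temps, labels)
print("\n".join(mir))
```

With these inputs the printed lines match the MIR column of the example. Passing the body as a callable is a design choice of the sketch: it lets the header temporaries be numbered before those of the body.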




  Example with SSA-LIR (adapted from Muchnick’97)

         s2 is assigned (written, defined) multiple times in the program text
         (i.e., multiple static assignments)

         LIR:
             s2 = s1
             s4 = s3
             s6 = s5
         L1: if s2 > s6 goto L2
             s7 = addr a
             s8 = 4 * s9
             s10 = s7 + s8
             [s10] = 2
             s2 = s2 + s4
             goto L1
         L2:

         After introducing one version of s2 for each static definition and explicit
         merger ops for different reaching versions (phi nodes, φ):
         Static single assignment (SSA) form

         B1:  s2₁ = s1
              s4 = s3
              s6 = s5
         B2:  s2₂ = φ ( s2₁ , s2₃ )
              s2₂ > s6 ?        (Y: leave the loop, N: continue with B3)
         B3:  s7 = addr a
              s8 = 4 * s9
              s10 = s7 + s8
              [s10] = 2
              s2₃ = s2₂ + s4
              (back to B2)

  SSA-Form vs. Standard Form of IR
         SSA form makes data flow (esp., def-use chains) explicit
         Certain program analyses and transformations are easier to
         implement or more efficient on SSA-representation
         (Up to now) SSA is not suitable for code generation
                Requires transformation back to standard form
         Comes later…
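Why SSA form makes def-use chains explicit can be seen in a short sketch. It assumes the SSA-LIR example written as text lines, with the three versions of s2 spelled `s2_1`, `s2_2`, `s2_3`; the function `def_use_chains` and the regular expression are hypothetical helpers, not part of the course material. Since every register has exactly one definition, a single pass over the code is enough to connect each definition to all of its uses.

```python
import re
from collections import defaultdict

# The SSA-LIR example; block names B1..B3 as in the figure.
ssa_code = [
    ("B1", "s2_1 = s1"),
    ("B1", "s4 = s3"),
    ("B1", "s6 = s5"),
    ("B2", "s2_2 = phi(s2_1, s2_3)"),
    ("B2", "if s2_2 > s6"),
    ("B3", "s7 = addr a"),
    ("B3", "s8 = 4 * s9"),
    ("B3", "s10 = s7 + s8"),
    ("B3", "[s10] = 2"),
    ("B3", "s2_3 = s2_2 + s4"),
]

REGISTER = re.compile(r"\bs\d+(?:_\d+)?\b")


def def_use_chains(code):
    """Map each register to its defining instruction and to its using instructions."""
    definition = {}                 # register -> index of its single definition
    uses = defaultdict(list)        # register -> indices of using instructions
    for idx, (_, text) in enumerate(code):
        lhs, sep, rhs = text.partition(" = ")
        if sep and REGISTER.fullmatch(lhs):
            if lhs in definition:
                raise ValueError(f"{lhs} is defined twice: not in SSA form")
            definition[lhs] = idx
            used = REGISTER.findall(rhs)
        else:                       # branch or store: every register is a use
            used = REGISTER.findall(text)
        for reg in used:
            uses[reg].append(idx)
    return definition, dict(uses)


definition, uses = def_use_chains(ssa_code)
```

In the standard-form LIR a use of s2 may be reached by two different assignments; here each use names exactly one version, for instance `s2_2` is defined by the phi node and used by the loop test and the increment.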




  MIR - medium-level intermediate representation
         “language independent”
         control flow reduced to simple branches, call, return
         variable accesses still in terms of symbol table names
         explicit code for procedure / block entry / exit
         suitable for most optimizations
         basis for code generation


  Flattening 1: From HIR to MIR




  HIR → MIR (1): Flattening the expressions
   By a postorder traversal of each expression tree in the CFG:
         Decompose the nodes of the expression trees (operators, ...)
         into simple operations (ADD, SUB, MUL, ...)
         Infer the types of operands and results (language semantics)
                annotate each operation by its (result) type
                insert explicit conversion operations where necessary
         Flatten each expression tree (= partial order of evaluation)
         to a sequence of operations (= total order of evaluation)
         using temporary variables t1, t2, ... to keep track of data flow
                This is static scheduling!
                May have an impact on space / time requirements

  HIR → MIR (2): Lowering Array References (1)
   HIR:
         t1 = a [ i, j+2 ]
   the Lvalue of a [ i, j+2 ] is (on a 32-bit architecture)
         (addr a) + 4 * ( i * 20 + j + 2 )
   MIR:
         t1 = j + 2
         t2 = i * 20
         t3 = t1 + t2
         t4 = 4 * t3
         t5 = addr a
         t6 = t5 + t4
         t7 = *t6
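
A minimal sketch of the flattening step, assuming expression trees are given as nested tuples (op, left, right) with variable names or integer constants as leaves. The function name flatten and the tuple encoding are illustrative choices, not part of the course material. It walks the tree in postorder and emits one operation per inner node, using fresh temporaries.

```python
import itertools


def flatten(tree):
    """Flatten a nested-tuple expression tree into three-address operations."""
    code = []
    fresh = (f"t{n}" for n in itertools.count(1))  # t1, t2, ...

    def visit(node):
        # Leaves (variable names, constants) need no operation of their own.
        if not isinstance(node, tuple):
            return str(node)
        op, left, right = node
        left_name = visit(left)    # postorder: operands first ...
        right_name = visit(right)
        temp = next(fresh)         # ... then the operation itself
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    return code, visit(tree)


# Index part of the Lvalue above: 4 * ( i * 20 + (j + 2) )
index_expr = ("*", 4, ("+", ("*", "i", 20), ("+", "j", 2)))
code, result = flatten(index_expr)
for line in code:
    print(line)
```

This sketch visits the left operand first, so it emits i * 20 before j + 2, whereas the MIR on the slide computes j + 2 first. Both are valid total orders of the same partial order, which is why the choice counts as static scheduling.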





  HIR → MIR (2): Flattening the control flow graph
         Depth-first search of the control flow graph
         Topological ordering of the operations, starting with entry node
                at conditional branches:
                one exit fall-through, other exit branch to a label
         Basic blocks = maximum-length subsequences of
         statements containing no branch nor join of control flow
         Basic block graph obtained from CFG by merging
         statements in a basic block to a single node

  Control flow graph
         Nodes: primitive operations (e.g., quadruples)
         Edges: control flow transitions
         Example:
                1:      ( JEQZ,   5,    0,   0)
                2:      ( ASGN,   2,    0,   A)
                3:      ( ADD,    A,    3,   B)
                4:      ( JUMP,   7,    0,   0)
                5:      ( ASGN,   23,   0,   A)
                6:      ( SUB,    A,    1,   B)
                7:      ( MUL,    A,    B,   C)
                8:      ( ADD,    C,    1,   A)
                9:      ( JNEZ,   B,    2,   0)
         [Figure: the same quadruples 1 to 9 drawn as nodes of the control flow graph]




  Basic block
         A basic block is a sequence of textually consecutive
         operations (e.g. MIR operations, LIR operations, quadruples)
         that contains no branches (except perhaps its last operation)
         and no branch targets (except perhaps its first operation).
                Always executed in same order from entry to exit
                A.k.a. straight-line code

                1:    ( JEQZ,   5,    0,   0)        B1
                2:    ( ASGN,   2,    0,   A)        B2
                3:    ( ADD,    A,    3,   B)
                4:    ( JUMP,   7,    0,   0)
                5:    ( ASGN,   23,   0,   A)        B3
                6:    ( SUB,    A,    1,   B)
                7:    ( MUL,    A,    B,   C)        B4
                8:    ( ADD,    C,    1,   A)
                9:    ( JNEZ,   B,    2,   0)

  Basic block graph
         Nodes: basic blocks
         Edges: control flow transitions

                B1    1:    ( JEQZ,   5,    0,   0)

                B2    2:    ( ASGN,   2,    0,   A)
                      3:    ( ADD,    A,    3,   B)
                      4:    ( JUMP,   7,    0,   0)

                B3    5:    ( ASGN,   23,   0,   A)
                      6:    ( SUB,    A,    1,   B)

                B4    7:    ( MUL,    A,    B,   C)
                      8:    ( ADD,    C,    1,   A)
                      9:    ( JNEZ,   B,    2,   0)
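
The partition into B1 to B4 can be reproduced with the usual "leader" rule: the first operation, every branch target, and every operation that follows a branch each start a new block. The sketch below is illustrative only. It assumes the operations are numbered 1..n and that the branch targets have already been extracted into a dictionary (here 1 → 5, 4 → 7, 9 → 2, read off the example quadruples); the function name basic_blocks is hypothetical.

```python
def basic_blocks(n, branch_targets):
    """Split operations 1..n into basic blocks.

    branch_targets maps the number of each branch operation to the
    number of the operation it may jump to.
    """
    leaders = {1}                      # the entry operation starts a block
    for branch, target in branch_targets.items():
        leaders.add(target)            # a branch target starts a block
        if branch + 1 <= n:
            leaders.add(branch + 1)    # so does the operation after a branch

    blocks = []
    for op in range(1, n + 1):
        if op in leaders:
            blocks.append([])          # open a new block at each leader
        blocks[-1].append(op)
    return blocks


# Branches in the example: JEQZ at 1 (to 5), JUMP at 4 (to 7), JNEZ at 9 (to 2)
blocks = basic_blocks(9, {1: 5, 4: 7, 9: 2})
print(blocks)
```

For the example this yields the four groups [1], [2, 3, 4], [5, 6] and [7, 8, 9], matching B1 to B4. Edges between blocks are not computed here.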




  LIR – low-level intermediate representation
         in GCC: Register-transfer language (RTL)
         usually architecture dependent
                e.g. equivalents of target instructions + addressing modes
                for IR operations
                variable accesses in terms of target memory addresses

                AST
                HIR                           SSA-HIR
                MIR                           SSA-MIR
                LIR                           SSA-LIR
                VLIR (target code)

  MIR → LIR: Lowering Variable Accesses
   Seen earlier:
         HIR:
                t1 = a [ i, j+2 ]
         the Lvalue of a [ i, j+2 ] is (on a 32-bit architecture)
                (addr a) + 4 * ( i * 20 + j + 2 )
         MIR:
                t1 = j + 2
                t2 = i * 20
                t3 = t1 + t2
                t4 = 4 * t3
                t5 = addr a
                t6 = t5 + t4
                t7 = *t6
   Memory layout:
         Local variables relative to procedure frame pointer fp
                j at fp – 4
                i at fp – 8
                a at fp – 216
   LIR:
         r1 = [fp – 4]
         r2 = r1 + 2
         r3 = [fp – 8]
         r4 = r3 * 20
         r5 = r4 + r2
         r6 = 4 * r5
         r7 = fp – 216
         f1 = [r7 + r6]
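
A rough sketch of this lowering step, under simplifying assumptions: MIR operations are tuples (dest, op, operands...), every variable lives at a negative offset from fp, and a variable that was already loaded stays valid in its register (no intervening stores). The names lower_to_lir and FRAME_OFFSET are made up for the illustration.

```python
import itertools

# Frame layout from the slide: byte offsets relative to fp
FRAME_OFFSET = {"j": -4, "i": -8, "a": -216}


def lower_to_lir(mir, frame_offset):
    """Replace variable operands by fp-relative loads and temps by registers."""
    lir = []
    reg_of = {}                                    # MIR name -> LIR register
    fresh = (f"r{n}" for n in itertools.count(1))  # r1, r2, ...

    def operand(x):
        if isinstance(x, int):
            return str(x)
        if x not in reg_of:                        # first use of a variable: load it
            reg_of[x] = next(fresh)
            lir.append(f"{reg_of[x]} = [fp - {-frame_offset[x]}]")
        return reg_of[x]

    for dest, op, *args in mir:
        if op == "addr":                           # address of a local variable
            rhs = f"fp - {-frame_offset[args[0]]}"
        elif op == "load":                         # dereference an address
            rhs = f"[{operand(args[0])}]"
        else:                                      # binary arithmetic
            left, right = operand(args[0]), operand(args[1])
            rhs = f"{left} {op} {right}"
        reg_of[dest] = next(fresh)
        lir.append(f"{reg_of[dest]} = {rhs}")
    return lir


mir = [
    ("t1", "+", "j", 2),
    ("t2", "*", "i", 20),
    ("t3", "+", "t1", "t2"),
    ("t4", "*", 4, "t3"),
    ("t5", "addr", "a"),
    ("t6", "+", "t5", "t4"),
    ("t7", "load", "t6"),
]
lir = lower_to_lir(mir, FRAME_OFFSET)
for line in lir:
    print(line)
```

The result is close to the LIR on the slide but not identical: the sketch keeps the final addition and the load as two separate operations (r8 = r7 + r6, r9 = [r8]), while the slide folds them into one load with the addressing mode [r7 + r6].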




  Example: The LCC-IR
         LIR – DAGs (Fraser, Hanson ’95)

       entry
     ADDRGP _a, VR1
     ADDRLP (fp+4), VR2
     INDIRI VR2, VR3
     CNSTI 1, VR4
     ADDI VR3, VR4, VR5
     ASGNI VR5, VR1

     LABELV L1
     ADDRLP (fp+4), VR6
     INDIRI VR6, VR7
     CNSTI 0, VR8
     LEI VR7, VR8, L2

     ADDRLP (fp+4), VR9
     INDIRI VR9, VR10
     CNSTI 3, VR11
     DIVI VR10, VR11, VR12
     ASGNI VR12, VR9
     JUMPV L1

     LABELV L2
     ADDRLP (fp+4), VR13
     INDIRI VR13, VR14
     ARGI VR14
     CALL _print
       exit

  TDDC86 Compiler Optimizations and Code Generation
  Flattening 2:
  From MIR to LIR

                AST
                HIR             SSA-HIR
                MIR             SSA-MIR
                LIR             SSA-LIR
                VLIR (target code)
                                                  Christoph Kessler, IDA,
                                                  Linköpings universitet, 2009.




  MIR → LIR: Storage Binding
         mapping variables (symbol table items) to addresses
         (virtual) register allocation
         procedure frame layout implies addressing of formal
         parameters and local variables relative to frame pointer fp,
         and parameter passing (call sequences)
         for accesses, generate Load and Store operations
         further lowering of the program representation

  MIR → LIR translation example
   MIR:              LIR, bound to                LIR, bound to
                     storage locations:           symbolic registers:

   a=a*2             r1 = [gp+8] // Load          s1 = s1 * 2
                     r2 = r1 * 2
                     [gp+8] = r2 // store
   b=a+c[1]          r3 = [gp+8]                  s2 = [fp – 56]
                     r4 = [fp – 56]               s3 = s1 + s2
                     r5 = r3 + r4
                     [fp – 20] = r5

   Storage layout:
         Global variable a addressed relative to global pointer gp
         local variables b, c relative to fp





  MIR → LIR: Procedure call sequence (1)
  [Muchnick 5.6]
         call instruction assembles arguments
         and transfers control to callee
         evaluate each argument (reference vs. value param.) and
                push it on the stack, or
                write it to a parameter register
         determine code address of the callee
         (mostly, compile-time or link-time constant)
         store caller-save registers (usually, push on the stack)
         save return address (usually in a register)
         and branch to code entry of callee.

  MIR → LIR: Procedure call sequence (2)
   Procedure prologue
   executed on entry to the procedure
         save old frame pointer fp
         old stack pointer sp becomes new frame pointer fp
         determine new sp (creating space for local variables)
         save callee-save registers
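
To make the bookkeeping of fp and sp concrete, here is a toy simulation of the prologue steps together with the mirror-image restore performed when the procedure returns. Everything in it is an assumption made for illustration: a downward-growing stack of 4-byte words, a start address of 1000, saving the old fp by pushing it (so the new fp is the value of sp right after that push), and the names Machine, prologue and epilogue. Real calling conventions differ in these details.

```python
class Machine:
    """Toy machine state: a stack memory plus the sp and fp registers."""

    def __init__(self, stack_top=1000):
        self.memory = {}
        self.sp = stack_top
        self.fp = stack_top

    def push(self, value):
        self.sp -= 4                  # stack grows towards lower addresses
        self.memory[self.sp] = value

    def pop(self):
        value = self.memory.pop(self.sp)
        self.sp += 4
        return value


def prologue(m, local_bytes, callee_saved_values):
    m.push(m.fp)                      # save old frame pointer
    m.fp = m.sp                       # stack pointer becomes new frame pointer
    m.sp -= local_bytes               # new sp: space for local variables
    for value in callee_saved_values:
        m.push(value)                 # save callee-save registers


def epilogue(m, saved_count):
    restored = [m.pop() for _ in range(saved_count)]
    restored.reverse()                # popped in reverse order of saving
    m.sp = m.fp                       # discard local variables
    m.fp = m.pop()                    # restore old frame pointer
    return restored


m = Machine()
prologue(m, local_bytes=16, callee_saved_values=[7, 8])
frame_pointer_inside = m.fp
restored = epilogue(m, saved_count=2)
print(frame_pointer_inside, m.sp, m.fp, restored)
```

After the epilogue, sp and fp hold the values they had before the call, and the saved register contents come back in their original order.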







  MIR → LIR: Procedure call sequence (3)
   Procedure epilogue
   executed at return from procedure
         restore callee-save registers
         put return value (if existing) in appropriate place (reg/stack)
         restore old values for sp and fp
         branch to return address

   Caller cleans up upon return:
         restore caller-save registers
         use the return value (if applicable)

  TDDC86 Compiler Optimizations and Code Generation
  From Trees to DAGs:
  Common Subexpression Elimination (CSE)
         E.g., at MIR → LIR Lowering
                                                  Christoph Kessler, IDA,
                                                  Linköpings universitet, 2009.




  From Trees to DAGs:
  Local CSE

  Local CSE on MIR produces a MIR DAG
         1. c = a
         2. b = a + 1
         3. c = 2 * a
         4. d = – c
         5. c = a + 1
         6. c = b + a
         7. d = 2 * a
         8. b = c
         [Figure: MIR DAG with leaves 2, a (labelled c) and 1, and inner nodes mul, add (labelled b), neg (labelled d) and a second add]
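
One common way to obtain such a DAG within a basic block is to hash each (operator, operand nodes) combination and reuse an existing node whenever the same combination reappears. The sketch below applies this to the eight statements. It is an illustration under assumptions: the statement encoding (dest, op, operands...), the operator names and the function local_cse are made up, and commutativity (a + 1 versus 1 + a) is not exploited.

```python
def local_cse(statements):
    """Build a DAG for one basic block; return its nodes and final labels."""
    nodes = []        # node id -> key describing the node
    table = {}        # key -> node id, used to detect repeated expressions
    current = {}      # variable -> id of the node holding its current value

    def node(key):
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    def operand(x):
        if isinstance(x, int):
            return node(("const", x))
        if x not in current:                  # value on entry to the block
            current[x] = node(("var", x))
        return current[x]

    for dest, op, *args in statements:
        ids = tuple(operand(a) for a in args)
        # A copy just relabels an existing node; anything else is looked up.
        current[dest] = ids[0] if op == "copy" else node((op,) + ids)
    return nodes, current


statements = [
    ("c", "copy", "a"),
    ("b", "add", "a", 1),
    ("c", "mul", 2, "a"),
    ("d", "neg", "c"),
    ("c", "add", "a", 1),     # same node as statement 2
    ("c", "add", "b", "a"),
    ("d", "mul", 2, "a"),     # same node as statement 3
    ("b", "copy", "c"),
]
nodes, current = local_cse(statements)
print(len(nodes), current)
```

The eight statements produce only seven nodes: three leaves (a, 1, 2) and four operations (add, mul, neg, add). Statements 5 and 7 create no new node because their expressions were already computed and their operands have not changed in between.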








  TDDC86 Compiler Optimizations and Code Generation
  Flattening 3:
  From LIR to VLIR

                AST
                HIR               SSA-HIR
                MIR               SSA-MIR
                LIR               SSA-LIR
                VLIR (target code)
                                                  Christoph Kessler, IDA,
                                                  Linköpings universitet, 2009.

  LIR → VLIR: Instruction selection
         LIR often has a lower level of abstraction than most target
         machine instructions (esp., CISC, or DSP-MAC).
         One-to-one translation of each LIR operation to equivalent target
         instruction(s) (“macro expansion”) cannot make use of more
         sophisticated instructions
         Pattern matching necessary!
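
The difference between macro expansion and pattern matching can be shown on a tiny example. The sketch below assumes LIR expressions are nested tuples (op, left, right) and a hypothetical target with a multiply-accumulate instruction written mac x, y, z, dest; the mnemonics and the function select are invented for the illustration and do not belong to any real instruction set.

```python
import itertools


def select(tree, use_mac=True):
    """Emit target code for an expression tree, optionally matching mul+add."""
    code = []
    fresh = (f"r{n}" for n in itertools.count(1))

    def visit(node):
        if isinstance(node, str):              # leaf: value already available
            return node
        op, left, right = node
        # Pattern: add(mul(x, y), z) is covered by a single mac instruction.
        if use_mac and op == "add" and isinstance(left, tuple) and left[0] == "mul":
            x, y, z = visit(left[1]), visit(left[2]), visit(right)
            dest = next(fresh)
            code.append(f"mac {x}, {y}, {z}, {dest}")
            return dest
        # Fallback: one instruction per LIR operation (macro expansion).
        l, r = visit(left), visit(right)
        dest = next(fresh)
        code.append(f"{op} {l}, {r}, {dest}")
        return dest

    visit(tree)
    return code


tree = ("add", ("mul", "a", "b"), "c")         # a * b + c
matched = select(tree)
expanded = select(tree, use_mac=False)
print(matched)
print(expanded)
```

With pattern matching the whole tree is covered by one instruction. With pure macro expansion it takes two and needs an extra register for the intermediate product.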




  LIR / VLIR: Register Allocation
         Example for a SPARC-specific VLIR

        int a, b, c, d;                     ldw a, r1                  add r1, r2, r3
                                            ldw b, r2                  add r3, 1, r4
        c = a + b;                          add r1, r2, r3
                                            stw r3, addr c
        d = c + 1;                          ldw addr c, r3
                                            add r3, 1, r4
                                            stw r4, addr d

         There is a lot to be gained by good register allocation!

  On LIR/VLIR: Global register allocation
         Register allocation
                determine what values to keep in a register
                “symbolic registers”, “virtual registers”
         Register assignment
                assign virtual to physical registers
                Two values cannot be mapped to the same register if they
                are alive simultaneously, i.e. their live ranges overlap
                (depends on schedule).
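
The overlap rule for register assignment can be illustrated with a small straight-line example. The sketch assumes each virtual register is defined exactly once, represents each instruction as (defined register or None, list of used registers), and lets a register be reused by an instruction that is the last reader of its old value. The instruction list is a reconstruction of c = a + b; d = c + 1 with virtual registers v1 to v4. All names are hypothetical, and spilling is not handled.

```python
def live_ranges(code):
    """Map each virtual register to [definition index, last use index]."""
    ranges = {}
    for index, (defined, used) in enumerate(code):
        for v in used:
            ranges[v][1] = index              # extend the range to this use
        if defined is not None:
            ranges[defined] = [index, index]
    return ranges


def assign_registers(ranges, physical):
    """Greedy assignment: values with overlapping ranges get different registers."""
    assignment, free_at = {}, {}
    for v, (start, end) in sorted(ranges.items(), key=lambda item: item[1][0]):
        for p in physical:
            if free_at.get(p, -1) <= start:   # previous value in p is dead by now
                assignment[v], free_at[p] = p, end
                break
        else:
            raise RuntimeError(f"no free register for {v}; spilling is not modelled")
    return assignment


code = [
    ("v1", []),              # ldw a, v1
    ("v2", []),              # ldw b, v2
    ("v3", ["v1", "v2"]),    # add v1, v2, v3
    ("v4", ["v3"]),          # add v3, 1, v4
    (None, ["v3"]),          # stw v3, addr c
    (None, ["v4"]),          # stw v4, addr d
]
ranges = live_ranges(code)
assignment = assign_registers(ranges, ["r1", "r2", "r3"])
print(ranges)
print(assignment)
```

Here v1 and v2 overlap and so do v3 and v4, but v3 can take over the register of v1 because v1 dies at the instruction that defines v3. Two physical registers are enough for this schedule; a different instruction order could change the live ranges and therefore the result.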







  On LIR/VLIR: Instruction scheduling
         reorders the instructions (LIR/VLIR)
         (subject to precedence constraints given by dependences)
         to minimize
                space requirements (# registers)
                time requirements (# CPU cycles)
                power consumption
                ...

  Remarks on IR design (1) [Cooper’02]
   AST? DAGs? Call graph? Control flow graph? Program dep. graph? SSA? ...
         Level of abstraction is critical for implementation cost and opportunities:
                representation chosen affects the entire compiler
   Example 1: Addressing for arrays and aggregates (structs)
                source level AST: hides entire address computation A[i+1][j]
                pointer formulation: may hide critical knowledge (bounds)
                low-level code: may make it hard to see the reference
         “best” representation depends on how it is used
                for dependence-based transformations: source-level IR (AST, HIR)
                for fast execution: pointer formulation (MIR, LIR)
                for optimizing address computation: low-level repr. (LIR, VLIR, target)





  Remarks on IR Design (2)                                                                                              Summary
   Example 2: Representation for comparison&branch                                                                             Multi-level IR
         fundamentally, 3 different operations:                                                                                       Translation by lowering
                Compare, convert result to boolean, branch                                                                       ☺ Program analyses and transformations can work on
           combined in different ways by processor architects                                                                        the most appropriate level of abstraction
         “best” representation may depend on target machine                                                                      ☺ Clean separation of compiler phases
                                                                                                                                 ☹ Compiler framework gets larger and slower
         r7 = (x < y)                      cmp x y (sets CC)               r7 = (x < y)
                                                                                                                          Lowering:                              AST
         br r7, L12                        brLT L12                   [r7] br L12
                                                                                                                          Gradual loss of                        HIR             SSA-HIR
                                                                                                                          source-level
               design problem for a retargetable compiler                                                                 information                            MIR             SSA-MIR
                                                                                                                          Increasingly target                    LIR             SSA-LIR
                                                                                                                          dependent
                                                                                                                                                                 VLIR (target code)
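The three compare-and-branch variants of Example 2 can be produced from one higher-level "branch if x < y" operation. The sketch below assumes a hypothetical lowering function with three made-up style names; the emitted instruction strings follow the slide, including the register name r7.

```python
def lower_branch_lt(x, y, label, style):
    """Lower 'if x < y goto label' for one of three hypothetical target styles."""
    if style == "bool_reg":
        # Compare into a general register, then branch on that register.
        return [f"r7 = ({x} < {y})", f"br r7, {label}"]
    if style == "cond_code":
        # Compare sets the condition codes (CC); the branch tests them.
        return [f"cmp {x} {y}", f"brLT {label}"]
    if style == "predicated":
        # Compare into a predicate register that guards an unconditional branch.
        return [f"r7 = ({x} < {y})", f"[r7] br {label}"]
    raise ValueError(f"unknown target style: {style}")
```

Keeping the combined operation in the higher IR levels and choosing the style only during lowering is one possible answer to the design problem for a retargetable compiler.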




      TDDC86 Compiler Optimizations and Code Generation
                                                                                                                        LCC (Little C Compiler)
                                                                                                                               Dragon-book style C compiler implementation in C
                                                                                                                               Very small (20K LOC), well documented, well tested, widely used
                                                                                                                               Open source: http://www.cs.princeton.edu/software/lcc
                   APPENDIX – For Self-Study                                                                                   Textbook A retargetable C compiler [Fraser, Hanson 1995]
                                                                                                                               contains complete source code
                                                                                                                               One-pass compiler, fast
                                                                                                                               C frontend (hand-crafted scanner and recursive descent parser)
                              Compiler Frameworks                                                                              with own C preprocessor
                                                                                                                               Low-level IR
                                    A (non-exhaustive) survey                                                                         Basic-block graph containing DAGs of quadruples
                                                                                                                                      No AST
                               with a focus on open-source frameworks                                                          Interface to IBURG code generator generator
                                                                                                                                      Example code generators for MIPS, SPARC, Alpha, x86 processors
                                                                                                                                      Tree pattern matching + dynamic programming
                                                                                                                               Few optimizations only
                                                                               Christoph Kessler, IDA,
                                                                                                                                      local common subexpr. elimination, constant folding
                                                                               Linköpings universitet, 2009.
                                                                                                                               Good choice for source-to-target compiling if a prototype is needed soon
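The two optimizations listed for LCC fall out almost for free when a basic block of quadruples is turned into a DAG. The following is a minimal sketch of that general idea and not LCC's actual implementation; the quadruple layout `(dest, op, arg1, arg2)`, the node encoding, and the function name are assumptions.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def build_dag(quads):
    """Build a DAG for one basic block.

    quads: list of (dest, op, arg1, arg2); arguments are variable names or ints.
    Returns (nodes, current): the node list and a map from variable to node index.
    """
    nodes = []      # ("const", v) | ("leaf", name) | (op, left_index, right_index)
    index = {}      # node key -> index; equal keys share one node (local CSE)
    current = {}    # variable name -> node that currently holds its value

    def intern(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(arg):
        if isinstance(arg, int):
            return intern(("const", arg))
        if arg in current:
            return current[arg]
        return intern(("leaf", arg))   # value on entry to the block

    for dest, op, a, b in quads:
        left, right = operand(a), operand(b)
        if nodes[left][0] == "const" and nodes[right][0] == "const":
            # Constant folding: evaluate at compile time.
            value = FOLD[op](nodes[left][1], nodes[right][1])
            current[dest] = intern(("const", value))
        else:
            # Reuses an existing node if the same expression was already built.
            current[dest] = intern((op, left, right))
    return nodes, current
```

For the block `t1 = a + b; t2 = a + b; t3 = 2 * 3; t4 = t1 * t3`, both `t1` and `t2` map to the same node and `t3` maps to the constant 6. The sketch does not exploit commutativity, so `a + b` and `b + a` would get separate nodes.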




  GCC 4.x                                                                                                               Open64 / ORC Open Research Compiler
         GNU Compiler Collection (earlier: GNU C Compiler)                                                                     Based on SGI Pro-64 Compiler for MIPS processor, written in C++,
         Compilers for C, C++, Fortran, Java, Objective-C, Ada …                                                               went open source in 2000
             sometimes with own extensions, e.g. GNU C                                                                         Several tracks of development (Open64, ORC, …)
         Open-source, developed since 1985
                                                                                                                               For Intel Itanium (IA-64) and x86 (IA-32) processors.
         Very large                                                                                                            Also retargeted to x86-64, Ceva DSP, Tensilica, XScale, ARM …
         3 IR formats (all language independent)                                                                               ”simple to retarget” (?)
             GENERIC: tree representation for whole function (also statements)
                                                                                                                               Languages: C, C++, Fortran95 (uses GCC as frontend),
             GIMPLE (simple version of GENERIC for optimizations)                                                                             OpenMP and UPC (for parallel programming)
             based on trees but expressions in quadruple form.
             High-level, low-level and SSA-low-level form.                                                                     Industrial strength, with contributions from Intel, Pathscale, …
             RTL (Register Transfer Language, low-level, Lisp-like) (the traditional GCC-IR)                                   Open source: www.open64.net, ipf-orc.sourceforge.net
             only word-sized data types; stack explicit; statement scope
         Many optimizations                                                                                                    6-layer IR:
         Many target architectures                                                                                                 WHIRL (VH, H, M, L, VL) – 5 levels of abstraction
         Version 4.x (since ~2004) has strong support for retargetable code generation                                                All levels semantically equivalent
             Machine description in .md file
                                                                                                                                      Each level a lower level subset of the higher form
             Reservation tables for instruction scheduler generation
         Good choice if one has the time to get into the framework
                                                                                                                                   and target-specific very low-level CGIR
                                                                                                                               Many optimizations, many third-party contributed components
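The phrase "based on trees but expressions in quadruple form" in the GCC slide can be illustrated by flattening a nested expression tree into three-address statements. The sketch below only shows the principle; it is not GCC's gimplifier, and the tuple encoding and the temporary names `t1`, `t2`, ... are assumptions.

```python
import itertools


def flatten(expr, code, temps):
    """Flatten a nested (op, left, right) tuple; return the name holding its value."""
    if not isinstance(expr, tuple):
        return str(expr)                 # variable or constant: nothing to do
    op, left, right = expr
    l = flatten(left, code, temps)
    r = flatten(right, code, temps)
    t = f"t{next(temps)}"                # fresh temporary for this subexpression
    code.append((t, op, l, r))
    return t


def to_quadruples(target, expr):
    """Translate 'target = expr' into a list of quadruples (dest, op, arg1, arg2)."""
    code = []
    temps = itertools.count(1)
    result = flatten(expr, code, temps)
    code.append((target, "=", result, None))
    return code
```

For example, `to_quadruples("x", ("+", "a", ("*", "b", "c")))` yields `t1 = b * c`, `t2 = a + t1`, `x = t2`, so every statement has at most one operator.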




  Open64 WHIRL
     Front-ends (GCC): C, C++, F95

     IR level                   Components working on this level
     Very High WHIRL (AST)      VHO, standalone inliner
     High WHIRL                 IPA (interprocedural analysis), PREOPT, LNO (Loop nest optimizer)
     Mid WHIRL                  WOPT (global optimizer, uses internally an SSA IR),
                                RVI1 (register variable identification)
     Low WHIRL                  RVI2
     Very Low WHIRL             CG
     CGIR                       CG

     Lowering steps, from top to bottom:
       Very High WHIRL → High WHIRL:   Lower aggregates, un-nest calls, …
       High WHIRL → Mid WHIRL:         Lower arrays, complex numbers, HL control flow, bit-fields, …
       Mid WHIRL → Low / Very Low WHIRL:
                                       Lower intrinsic ops to calls
                                       All data mapped to segments
                                       Lower loads/stores to final form
                                       Expose code sequences for constants, addresses
                                       Expose #(gp) addr. for globals
                                       …
       Very Low WHIRL → CGIR:          code generation, including scheduling, profiling support,
                                       predication, SW speculation

  LLVM (llvm.org)
         LLVM (Univ. of Illinois at Urbana-Champaign)
                ”Low-level virtual machine”
                Front-ends (GCC) for C, C++, Objective-C, Fortran, …
                One IR level: a LIR + SSA-LIR,
                      linearized form, printable, shippable, but target-dependent,
                      ”LLVM instruction set”
                compiles to many target platforms
                      x86, Itanium, ARM, Alpha, SPARC, PowerPC, Cell SPE, …
                      And to low-level C
                Link-time interprocedural analysis and optimization framework
                for whole-program analysis
                JIT support available for x86, PowerPC
                Open source




  VEX Compiler
         VEX: ”VLIW EXample”
                Generic clustered VLIW Architecture and Instruction Set
         From the book by Fisher, Faraboschi, Young:
         Embedded Computing, Morgan Kaufmann 2005
                www.vliw.org/book
         Developed at HP Research
                Based on the compiler for HP/ST Lx (ST200 DSP)
         Compiler, Libraries, Simulator and Tools
         available in binary form from HP for non-commercial use
                IR not accessible, but CFGs and DAGs can be dumped or visualized
         Transformations controllable by options and/or #pragmas
                Scalar optimizations, loop unrolling, prefetching, function inlining, …
                Global scheduling (esp., trace scheduling),
                but no software pipelining


      TDDC86 Compiler Optimizations and Code Generation

                               CoSy

                               A commercial compiler framework
                               www.ace.nl

                               Christoph Kessler, IDA,
                               Linköpings universitet, 2009.




 Traditional Compiler Structure
        Traditional compiler model: sequential process
           text → Lexer → tokens → Parser → tree → Semant. Analysis → IR → Optimizer → IR → Code generator → code
        Improvement: Pipelining
                     (by files/modules, classes, functions)
        More modern compiler model with shared symbol table and IR:
           text → Lexer → Parser → Semant. Analysis → Optimizer → Code generator → code
           [Figure: the phases are connected to a shared symbol table and a shared
            intermediate representation (IR); legend: coordination, data flow, data fetch/store]

 A CoSy Compiler with Repository-Architecture
        [Figure: “Engines” (compiler tasks, phases): Lexer, Parser, Semantic analysis,
         Transformation, Optimizer, Codegen, arranged around a common intermediate
         representation repository; “Blackboard architecture”]




 Engine
        Modular compiler building block
        Performs a well-defined task
        Focus on algorithms, not compiler configuration
        Parameters are handles on the underlying common IR repository
        Execution may be in a separate process or as subroutine call -
          the engine writer does not know!
        View of an engine class:
          the part of the common IR repository that it can access
          (scope set by access rights: read, write, create)
        Examples: Analyzers, Lowerers, Optimizers, Translators, Support

 Composite Engines in CoSy
        Built from simple engines or from other composite engines
        by combining engines in interaction schemes
        (Loop, Pipeline, Fork, Parallel, Speculative, ...)
        Described in EDL (Engine Description Language)
        View defined by the joint effect of constituent engines
        A compiler is nothing more than a large composite engine

        ENGINE CLASS compiler (IN u: mirUNIT) {
          PIPELINE
            frontend (u)
            optimizer (u)
            backend (u)
        }
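The engine idea (a task that only sees its view of a shared IR repository, and composites built from engines) can be mimicked in a few lines. This is a rough sketch and not CoSy's API: the class names, the key-based access rights, and the toy "IR" made of strings are all assumptions chosen for illustration.

```python
class Repository:
    """Shared IR store; engines never touch it directly."""
    def __init__(self, **initial):
        self.data = dict(initial)


class View:
    """Handle that restricts an engine to the parts it may read or write."""
    def __init__(self, repo, read, write):
        self._repo = repo
        self._read = set(read)
        self._write = set(write)

    def get(self, key):
        if key not in self._read:
            raise PermissionError(f"no read access to {key!r}")
        return self._repo.data[key]

    def put(self, key, value):
        if key not in self._write:
            raise PermissionError(f"no write access to {key!r}")
        self._repo.data[key] = value


class Engine:
    """Simple engine: one task plus the access rights that define its view."""
    def __init__(self, task, read=(), write=()):
        self.task, self.read, self.write = task, read, write

    def run(self, repo):
        self.task(View(repo, self.read, self.write))


class Pipeline:
    """Composite engine: runs its constituents in order; used like a simple engine."""
    def __init__(self, *engines):
        self.engines = engines

    def run(self, repo):
        for engine in self.engines:
            engine.run(repo)


def frontend_task(view):
    view.put("ir", view.get("source").split())


def optimizer_task(view):
    view.put("ir", [op for op in view.get("ir") if op != "nop"])


def backend_task(view):
    view.put("code", " ".join(view.get("ir")))


def demo():
    compiler = Pipeline(
        Engine(frontend_task, read=["source"], write=["ir"]),
        Engine(optimizer_task, read=["ir"], write=["ir"]),
        Engine(backend_task, read=["ir"], write=["code"]),
    )
    repo = Repository(source="load nop add store")
    compiler.run(repo)
    return repo.data["code"]
```

Because `Pipeline` offers the same `run` interface as `Engine`, a pipeline can itself be nested inside another pipeline, which mirrors the statement that a compiler is just a large composite engine.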




   A CoSy Compiler
        [Figure: Parser, Optimizer I and Optimizer II, each with its own logical view,
         connected through a generated factory and a generated access layer]

 Example for CoSy EDL
 (Engine Description Language)
        Component classes (engine class)
        Component instances (engines)
        Basic components
        are implemented in C
        Interaction schemes (cf. skeletons)
        form complex connectors
               SEQUENTIAL
               PIPELINE
               DATAPARALLEL
               SPECULATIVE
        EDL can embed automatically
           Single-call-components into
           pipes
               p<> means a stream of p-items
               EDL can map their protocols to
               each other (p vs p<>)

        ENGINE CLASS optimizer ( procedure p )
        {
           ControlFlowAnalyser cfa;
           CommonSubExprEliminator cse;
           LoopVariableSimplifier lvs;
           PIPELINE cfa(p); cse(p); lvs(p);
        }

        ENGINE CLASS compiler ( file f )
        { ….
          Token token;
          Module m;
          PIPELINE // lexer takes file, delivers token stream:
             lexer( IN f, OUT token<> );
             // Parser delivers a module
             parser( IN token<>, OUT m );
             sema( m );
             decompose( m, p<> );
             // here comes a stream of procedures
             // from the module
             optimizer( p<> );
             backend( p<> );
        }
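The difference between a single item p and a stream p<> in the EDL example resembles the difference between a value and a generator. The sketch below uses Python generators to imitate how a single-call component might be embedded into a pipe; it does not reproduce EDL semantics, and the toy input syntax (a token ending in ":" starts a procedure) and all function names are assumptions.

```python
def lexer(text):
    """Deliver a token stream (the token<> of the EDL example)."""
    for token in text.split():
        yield token


def parser(tokens):
    """Consume the token stream and deliver one module: procedure name -> body."""
    module, name = {}, None
    for token in tokens:
        if token.endswith(":"):
            name = token[:-1]
            module[name] = []
        elif name is not None:
            module[name].append(token)
    return module


def decompose(module):
    """Turn one module into a stream of procedures (p<>)."""
    for name, body in module.items():
        yield name, body


def lift(single_call_engine):
    """Embed an engine written for one item p into a pipe over a stream p<>."""
    def streamed(items):
        for item in items:
            yield single_call_engine(item)
    return streamed


def optimize(proc):
    """Single-call component: handles exactly one procedure."""
    name, body = proc
    return name, [op for op in body if op != "nop"]


def backend(proc):
    """Single-call component: emits 'code' for exactly one procedure."""
    name, body = proc
    return f"{name}: {' '.join(body)}"


def compile_text(text):
    procedures = decompose(parser(lexer(text)))
    return list(lift(backend)(lift(optimize)(procedures)))
```

Here `optimize` and `backend` are written for one procedure, and `lift` plays the role of the protocol mapping between p and p<>: each procedure flows through both stages as soon as it is produced.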




 Evaluation of CoSy
        The outer call layers of the compiler are generated from view description
        specifications
               Adapter, coordination, communication, encapsulation
               Sequential and parallel implementation can be exchanged
               There is also a non-commercial prototype
               [Martin Alt: On Parallel Compilation. PhD thesis, 1997, Univ.
               Saarbrücken]
        Access layer to the repository must be efficient
        (solved by generation of macros)
        Because of views, a CoSy compiler is very easily extensible
               That's why it is expensive
               Reconfiguration of a compiler within an hour

 Source-to-Source compiler frameworks
        Cetus
               http://cobweb.ecn.purdue.edu/ParaMount/Cetus/
               C/C++ source-to-source compiler written in Java.
               Open source
        Tools and generators
               TXL source-to-source transformation system
               ANTLR frontend generator
               ...




  More frameworks…
         Some influential frameworks of the 1990s
         ...some of them still active today
             SUIF (Stanford University Intermediate Format),
             suif.stanford.edu
             Trimaran (for instruction-level parallel processors)
             www.trimaran.org
             Polaris (Fortran) UIUC
             Jikes RVM (Java) IBM
             Soot (Java)
             GMD Toolbox / Cocolab Cocktail™ compiler generation
             tool suite
             and many others …
         And many more for the embedded domain …



