					The IR to VMx86 Translation Module

Chris Lattner

The IR to VMx86 Translation Module Specification
Chris Lattner
December 8th, 1999
CS490: Advanced Compiler Design
University of Portland

The IR to VMx86 Translation Module Specification
    Target Audience
    Organization
The VMx86 Package
    Tiger VM Goals
    Design Considerations
    Fundamental Classes
    Pseudo-Instruction Classes
    Instruction Classes
The IR Package
    Design Considerations
    Fundamental Classes
    IR.Instruction Subclasses
    Unary Operator Classes
    Binary Operator Classes
    Indirect Operator Classes
Summary

CS490 – Adv. Compilers


Complete Table of Contents
The IR to VMx86 Translation Module Specification
    Target Audience
    Organization
The VMx86 Package
    Tiger VM Goals
    Design Considerations
    Fundamental Classes
        The VMx86.Instruction Class
        The VMx86.Program Class
        The VMx86.Operand Class
        The VMx86.Register Class
        The VMx86.Memory Class
        The VMx86.Constant Class
    Pseudo-Instruction Classes
        The VMx86.Comment Class
        The VMx86.Label Class
        The VMx86.FileHeader Class
        The VMx86.ObjectSize Class
        The VMx86.SegmentSelect Class
        The VMx86.EmitLong Class
        The VMx86.EmitString Class
    Instruction Classes
        The VMx86.add Class
        The VMx86.and Class
        The VMx86.call Class
        The VMx86.calli Class
        The VMx86.cmp Class
        The VMx86.div Class
        The VMx86.idiv Class
        The VMx86.jcc Class
        The VMx86.jmp Class
        The VMx86.lea Class
        The VMx86.mov Class
        The VMx86.mul Class
        The VMx86.or Class
        The VMx86.pop Class
        The VMx86.push Class
        The VMx86.ret Class
        The VMx86.sar Class
        The VMx86.setcc Class
        The VMx86.shl Class
        The VMx86.shr Class
        The VMx86.sub Class
        The VMx86.test Class
        The VMx86.xor Class
The IR Package
    Design Considerations
    Fundamental Classes
        The IR.Instruction Class
            Linked List Operations
            IR Optimization Methods
            VMx86 Code Generation Methods
        The IR.Program Class
            VMx86 Code Generation
        The IR.Subroutine Class
            VMx86 Code Generation
        The IR.SubroutineImport Class
        The IR.GlobalObject Class
            VMx86 Code Generation
    IR.Instruction Subclasses
        The IR.Call Class
            VMx86 Code Generation
        The IR.CallIndirect Class
            VMx86 Code Generation
        The IR.Goto Class
            VMx86 Code Generation
        The IR.Label Class
        The IR.NullInstruction Class
        The IR.Return Class
            VMx86 Code Generation
    Unary Operator Classes
        The IR.UnaryOperator Class
        The IR.Alloc Class
            VMx86 Code Generation
        The IR.IsType Class
            VMx86 Code Generation
        The IR.LocalAddr Class
            VMx86 Code Generation
        The IR.Move Class
            VMx86 Code Generation
        The IR.Size Class
            VMx86 Code Generation
    Binary Operator Classes
        The IR.BinaryOperator Class
        The IR.Add Class
            VMx86 Code Generation
        The IR.BitOperators Class
            VMx86 Code Generation
        The IR.Compare Class
            VMx86 Code Generation
        The IR.Divide Class
            VMx86 Code Generation
        The IR.Multiply Class
            VMx86 Code Generation
        The IR.Shift Class
            VMx86 Code Generation
        The IR.Subtract Class
            VMx86 Code Generation
    Indirect Operator Classes
        The IR.IndirectLoadOperator Class
            The IR.IndirectLoad Class
                VMx86 Code Generation
            The IR.IndirectLoadByte Class
                VMx86 Code Generation
            The IR.IndirectLoadStack Class
                VMx86 Code Generation
        The IR.IndirectStoreOperator Class
            The IR.IndirectStore Class
                VMx86 Code Generation
            The IR.IndirectStoreByte Class
                VMx86 Code Generation
            The IR.IndirectStoreStack Class
                VMx86 Code Generation
Summary


The IR to VMx86 Translation Module Specification
The Tiger language compiler 'tc' employs a back end designed to allow highly optimized code to be generated, independent of the target platform. To assist in this goal, an Intermediate Language (IRWIN) was designed and implemented, and a VM layer was specified. The program undergoing compilation goes through five stages to create a binary executable suitable for the target architecture:
1. Parsing into an Abstract Syntax Tree (AST).
2. Transformation from the AST into the Intermediate Representation (IR).
3. Transformation from the IR into the target Virtual Machine (VM).
4. Emission of assembly language directly from the Virtual Machine.
5. Creation of a binary executable program by the system assembler.

This document describes step #3, transformation from the IR to the 80x86 virtual machine, named VMx86.

Target Audience
Because we will be describing the translator module in depth, a solid understanding of the Intel 80386 architecture is crucial, as is the IRWIN architecture. Familiarity with the “Tiger Virtual Machine and Runtime Specification” and “Intermediate Representation with an Interesting Name” documents is expected.

Organization
This document is split into two main sections. The first, “The VMx86 Package”, describes the 80x86 virtual machine itself: its organization and the classes involved. The second, “The IR Package”, describes the classes that implement the IR and explains the transformation stage that each IR node must go through to emit valid VMx86 code.

The VMx86 Package
The Tiger VMx86 is an 80386 (and compatible) specific representation of the Tiger IRWIN, with the appropriate runtime support library. This representation is chosen to allow 80x86 specific optimizations to take place, such as instruction selection, register allocation, and peephole optimization in an efficient and elegant way. Once the VM data structures have been built, and the optimizations performed, assembly language code is emitted directly by the VM.

Tiger VM Goals
The main goal of the Tiger VM is to provide a platform that supports experimentation, optimization, and future enhancements. As such, it must be general enough to adapt to future needs, yet powerful enough to yield an efficient system that may be highly optimized. In addition, several constraints have been imposed on the implementation of the Tiger VM:
1. Tiger is a garbage-collected language. As such, the virtual machine must provide an execution environment that is compatible with the runtime garbage collector.
2. Interfacing with C code. The Tiger language will be enhanced to allow support functions to be written in C code, seamlessly integrated with functions written in Tiger. To support this, a Tiger language extension will be defined to allow the definition of “external” C functions, and we must be compatible with C calling conventions, as well as provide a mapping between Tiger and C data types. Additionally, C programmers must be careful to follow the constraints needed to work in a garbage-collected environment.
3. The x86 architecture. Unfortunately, the target architecture is not a very orthogonal architecture, which makes several tasks difficult. Optimization becomes a complex combination of instruction scheduling, pipelining, and register allocation. Register allocation, however, is further constrained by the fact that certain primitive operations (e.g. mul/div) only work with certain registers (e.g. [e]ax/[e]dx).

All of these constraints must be satisfied for the Tiger compiler to work. The IR to VMx86 translation module is not concerned with optimization; instead, it focuses on producing correct code that allows for later optimization.

Design Considerations
Because the 80x86 Virtual Machine is a self-contained entity, it is localized within the VMx86 Java package. Packages provide an excellent facility to group related code, and work extremely well for this circumstance. Many of the classes in the VMx86 package are direct implementations of specific 80x86 instructions (examples: 'div' and 'test'). Other classes are used to represent features of the 80x86 architecture such as the register set and memory addressing modes. By convention, classes that map to x86 instructions have lowercase names, while pseudo-instruction and fundamental classes have names that begin with an uppercase letter.

Fundamental Classes
There is a small set of fundamental classes that are used to implement everything else. The classes that fall into this category are VMx86.Instruction, VMx86.Program, VMx86.Operand, VMx86.Register, VMx86.Memory and VMx86.Constant (and its subclasses: VMx86.LConst, VMx86.LabelAddress, and VMx86.StrConst). These classes are used by the Instruction classes to represent the specific instructions available on the 80x86 architecture.

The VMx86.Instruction Class
public abstract class Instruction extends TreeDisplay.TreeDisplayableAdapter {
    // Linked list operations.
    public void setPrev(Instruction P);
    public void setNext(Instruction N);
    public Instruction getPrev();
    public Instruction getNext();

    public abstract String getInstAsString();
    public String getTreeDesc(Object obj);       // Overrides TreeDisplayableAdapter
}

The VMx86.Instruction class is the base class of all of the Instruction and Pseudo-Instruction classes, and inherits from the TreeDisplay.TreeDisplayableAdapter class. It is designed to act as a node in a doubly linked list. As such, it provides the expected getPrev(), setPrev(), getNext(), and setNext() methods. The VMx86.Instruction class defines one abstract method named getInstAsString(), which must return the current Instruction node as a legal 80x86 assembly language string, for emission to the assembler. The assembler syntax used is selected by the Program.IntelStyleOutput variable, which is described below.

The VMx86.Program Class
public class Program extends TreeDisplay.TreeDisplayableAdapter {
    static boolean IntelStyleOutput = false;

    public Program(Instruction VMList, String Filename, TreeDisplayable MetaInfo);

    public String getFilename();
    public void addInst(Instruction I);
    public void setIntelOutput(boolean Set);
    public void outputCode(PrintWriter Output);

    public String getTreeDesc(Object obj);          // Overrides TreeDisplayableAdapter
    public TreeDisplayable[] getDrawTreeSubobj();   // Overrides TreeDisplayableAdapter
}
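As a rough illustration of how a container with this interface might append and emit instructions, the following sketch links nodes into a list and writes them out in order. The names SketchInstruction, RawInstruction, SketchProgram, and render() are hypothetical stand-ins, not the actual VMx86 classes, and the tree-display methods are left out.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for VMx86.Instruction: a node in a doubly linked list.
abstract class SketchInstruction {
    private SketchInstruction prev, next;
    void setPrev(SketchInstruction p) { prev = p; }
    void setNext(SketchInstruction n) { next = n; }
    SketchInstruction getPrev() { return prev; }
    SketchInstruction getNext() { return next; }
    abstract String getInstAsString();
}

// Hypothetical instruction that simply carries its assembly text.
class RawInstruction extends SketchInstruction {
    private final String text;
    RawInstruction(String text) { this.text = text; }
    String getInstAsString() { return text; }
}

// Hypothetical stand-in for VMx86.Program: remembers the head and tail of the list.
class SketchProgram {
    private SketchInstruction head, tail;

    // Append an instruction to the end of the instruction list.
    void addInst(SketchInstruction inst) {
        if (tail == null) {
            head = inst;
        } else {
            tail.setNext(inst);
            inst.setPrev(tail);
        }
        tail = inst;
    }

    // Emit every instruction in list order, one per line.
    void outputCode(PrintWriter out) {
        for (SketchInstruction i = head; i != null; i = i.getNext()) {
            out.print(i.getInstAsString());
            out.print('\n');
        }
        out.flush();
    }

    // Helper for this example only: render the whole program as a string.
    String render() {
        StringWriter sw = new StringWriter();
        outputCode(new PrintWriter(sw));
        return sw.toString();
    }
}
```

Keeping a tail reference makes each append a constant-time operation, which matters when a translator adds instructions one at a time.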

The VMx86.Program class is the “container” for the list of Instructions that make up one module of the program. The IR.Program.codeGenVMx86(boolean) method returns an instance of the VMx86.Program class that contains a VMx86 representation of the IR. The VMx86.Program class contains one package-visible static variable named “VMx86.Program.IntelStyleOutput”, which is used to control the syntax of the code emitted by the compiler. When this boolean is set to true, Intel style code is emitted. When false, AT&T style 80x86 assembly is emitted. Examples:
    IntelStyleOutput = true        IntelStyleOutput = false
    mov EAX, EBX                   movl %ebx, %eax
    mov EAX, [EBX*4]               movl (,%ebx,4), %eax
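The two styles differ in more than register spelling: AT&T syntax reverses the operand order and adds an operand-size suffix to the mnemonic. A minimal sketch of a formatter driven by a single flag is shown here; StyleSketch, regName(), and mov() are hypothetical names chosen for illustration, not part of the VMx86 package.

```java
import java.util.Locale;

// Hypothetical sketch of style-dependent formatting for a 32-bit register move.
final class StyleSketch {
    // Plays the role of Program.IntelStyleOutput.
    static boolean intelStyleOutput = false;

    // Register name in the current style: "EAX" (Intel) or "%eax" (AT&T).
    static String regName(String reg) {
        return intelStyleOutput
            ? reg.toUpperCase(Locale.ROOT)
            : "%" + reg.toLowerCase(Locale.ROOT);
    }

    // Intel puts the destination first; AT&T puts the source first
    // and appends the size suffix "l" for a 32-bit operation.
    static String mov(String dst, String src) {
        return intelStyleOutput
            ? "mov " + regName(dst) + ", " + regName(src)
            : "movl " + regName(src) + ", " + regName(dst);
    }
}
```

With the flag set to true, mov("eax", "ebx") yields the Intel form from the first row of the examples; with it set to false, it yields the AT&T form.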

Code outside the package may set this variable with the VMx86.Program.setIntelOutput(boolean) method. The VMx86.Program.addInst(VMx86.Instruction) method appends an Instruction to the end of the stream of 80x86 instructions. This method is used extensively in the IR to VMx86 translation process. The VMx86.Program.outputCode(PrintWriter) method causes the Program object to emit code for all of the Instructions that have been added with addInst(). This allows the complete program to be written to the specified PrintWriter as an assembly file for the appropriate target platform.

The VMx86.Operand Class
public abstract class Operand {
    public abstract String getName();
    public abstract int getSize();
}

The VMx86.Operand class is a very simple base class that the “parameters” to instructions extend. Subclasses of this class must implement two methods, getName() and getSize(). The getName() method returns the current Operand in String form, formatted according to the current value of Program.IntelStyleOutput.

The getSize() method returns the intrinsic size of the Operand in bits, or -1 if no intrinsic size is defined.

The VMx86.Register Class
public final class Register extends Operand {
    public static Register AL, AH, BL, BH;        // 8 bit registers
    public static Register CL, CH, DL, DH;
    public static Register AX, CX, DX, BX;        // 16 bit registers
    public static Register SP, BP, SI, DI;
    public static Register EAX, ECX, EDX, EBX;    // 32 bit registers
    public static Register ESP, EBP, ESI, EDI;

    public int getSize();                         // Defines Operand
    public String getName();                      // Defines Operand

    public int getRegisterNumber();
    public static Register getRegisterByNum(int idx);
}
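A class with a fixed set of instances like this one can be sketched with a private constructor and a lookup table. The sketch below covers only the 32-bit registers; RegSketch is a hypothetical name, and the numbering 0 through 7 is an assumption (it follows the declaration order above, which matches the standard x86 register encoding), since the numbering is not spelled out here.

```java
import java.util.Locale;

// Hypothetical stand-in for VMx86.Register, limited to the 32-bit registers.
final class RegSketch {
    // Plays the role of Program.IntelStyleOutput.
    static boolean intelStyleOutput = false;

    // Lookup table; declared before the instances so it exists when they register.
    private static final RegSketch[] BY_NUM = new RegSketch[8];

    // Numbers 0..7 are an assumption based on the declaration order.
    static final RegSketch EAX = new RegSketch("eax", 0);
    static final RegSketch ECX = new RegSketch("ecx", 1);
    static final RegSketch EDX = new RegSketch("edx", 2);
    static final RegSketch EBX = new RegSketch("ebx", 3);
    static final RegSketch ESP = new RegSketch("esp", 4);
    static final RegSketch EBP = new RegSketch("ebp", 5);
    static final RegSketch ESI = new RegSketch("esi", 6);
    static final RegSketch EDI = new RegSketch("edi", 7);

    private final String name;
    private final int number;

    // Private constructor: no registers beyond the fixed set can be created.
    private RegSketch(String name, int number) {
        this.name = name;
        this.number = number;
        BY_NUM[number] = this;
    }

    int getSize() { return 32; }   // size in bits
    int getRegisterNumber() { return number; }
    static RegSketch getRegisterByNum(int idx) { return BY_NUM[idx]; }

    // "EAX" in Intel style, "%eax" in AT&T style.
    String getName() {
        return intelStyleOutput ? name.toUpperCase(Locale.ROOT) : "%" + name;
    }
}
```

Because every register is a unique shared instance, two references can be compared with == rather than equals().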

The VMx86.Register class is defined to represent the registers of the 80x86 architecture. Because there are a fixed number of registers in the x86 architecture, the constructor for this class is private, allowing no new Registers to be created. To refer to a register, use one of the public static members of the class (for example, “VMx86.Register.EAX” refers to the EAX register). The getSize() method returns the size of the register in bits (either 8, 16, or 32). The getName() method returns the name of the register formatted according to the current output style (i.e. EAX or %eax). Each register of the x86 architecture is numbered. To retrieve this unique number, call the getRegisterNumber() method. To retrieve a 32-bit register knowing its number, call the getRegisterByNum(int) method.

The VMx86.Memory Class
public class Memory extends Operand {
    public Memory(Register base, int Disp, int Scale, Register Index);
    public Memory(Register base, int Disp);
    public Memory(Register base);
    public Memory(int Disp);

    public int getSize();                         // Defines Operand
    public String getName();                      // Defines Operand

    public final Register getBaseRegister();
    public final int getDisplacement();
}
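To show what formatting such an operand might involve, here is a sketch that renders a base register, displacement, scale, and index in both syntaxes. AT&T writes the operand as disp(base,index,scale), while Intel writes it as [base+index*scale+disp]. MemSketch is a hypothetical name, registers are plain strings for brevity, and the handling of omitted parts is an assumption rather than the behavior of VMx86.Memory.

```java
import java.util.Locale;

// Hypothetical stand-in for VMx86.Memory; registers are plain lowercase strings here.
final class MemSketch {
    // Plays the role of Program.IntelStyleOutput.
    static boolean intelStyleOutput = false;

    private final String base;    // may be null
    private final int disp;
    private final int scale;      // 1, 2, 4 or 8
    private final String index;   // may be null

    MemSketch(String base, int disp, int scale, String index) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4 or 8");
        }
        this.base = base;
        this.disp = disp;
        this.scale = scale;
        this.index = index;
    }

    int getSize() { return -1; }   // memory operands have no intrinsic size

    String getName() { return intelStyleOutput ? intel() : att(); }

    // AT&T form: disp(base,index,scale), with empty parts omitted.
    private String att() {
        StringBuilder sb = new StringBuilder();
        boolean noRegs = base == null && index == null;
        if (disp != 0 || noRegs) sb.append(disp);
        if (!noRegs) {
            sb.append('(');
            if (base != null) sb.append('%').append(base);
            if (index != null) sb.append(",%").append(index).append(',').append(scale);
            sb.append(')');
        }
        return sb.toString();
    }

    // Intel form: [base+index*scale+disp], with empty parts omitted.
    private String intel() {
        StringBuilder sb = new StringBuilder("[");
        if (base != null) sb.append(base.toUpperCase(Locale.ROOT));
        if (index != null) {
            if (base != null) sb.append('+');
            sb.append(index.toUpperCase(Locale.ROOT)).append('*').append(scale);
        }
        if (disp != 0 || (base == null && index == null)) {
            if (sb.length() > 1 && disp >= 0) sb.append('+');
            sb.append(disp);
        }
        return sb.append(']').toString();
    }
}
```

For example, a frame-relative slot built as new MemSketch("ebp", -8, 1, null) is rendered as -8(%ebp) in AT&T style and [EBP-8] in Intel style.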

The VMx86.Memory class is designed to encapsulate the x86's complex memory addressing schemes. It handles any combination of the x86 Scaled Index Byte addressing modes. In these addressing modes, a memory address may be specified with a base register, an integer offset, a scaling factor (of 1, 2, 4, or 8) and an index register that is multiplied by the scaling factor. Because VMx86.Memory operands have no intrinsic size, the getSize() method always returns -1.

The VMx86.Constant Class
public abstract class Constant extends Operand {
    public abstract String getDataDeclForm();
    public abstract String getName();             // Defines Operand
    public abstract int getSize();                // Defines Operand
}

The abstract Constant class is the base class that all constants in VMx86 must extend. It redeclares the required getName() and getSize() methods, and adds a new requirement, getDataDeclForm(). Under some systems (specifically Unix/gas), constants are written differently in expressions than they are in data declarations. For example, with a literal numeric constant, one would expect to see something like this:
    movl $1, %eax     / is valid
    movl 1, %eax      / is not valid

Here the number is prefixed with a dollar sign. When used with the .long directive, however, the dollar sign is not prefixed:

    .long 1           / is valid
    .long $1          / is not valid

Similarly, other forms of constants differ between the two contexts. The getDataDeclForm() method returns a text string that represents the constant in a form suitable for use in data declarations. There are three concrete subclasses of Constant: one for literal constants, one for symbolic label addresses, and one for string constants. The class for literal integer constants is named LConst:
public class LConst extends Constant {
    public LConst(int val);
    public int getValue();

    public String getDataDeclForm();              // Defines Constant
    public int getSize();                         // Defines Operand
    public String getName();                      // Defines Operand
}
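The distinction between the operand form and the data declaration form can be sketched as follows. LConstSketch is a hypothetical stand-in; the choice to omit the dollar sign in Intel style and to report -1 as the size are assumptions made for this example.

```java
// Hypothetical stand-in for VMx86.LConst.
final class LConstSketch {
    // Plays the role of Program.IntelStyleOutput.
    static boolean intelStyleOutput = false;

    private final int value;

    LConstSketch(int val) { value = val; }

    int getValue() { return value; }

    // Assumption: a literal has no intrinsic size.
    int getSize() { return -1; }

    // Operand form: gas (AT&T style) requires a "$" prefix on immediates.
    String getName() {
        return intelStyleOutput ? Integer.toString(value) : "$" + value;
    }

    // Data declaration form (for example after ".long"): never prefixed.
    String getDataDeclForm() {
        return Integer.toString(value);
    }
}
```

A caller emitting an instruction would use getName(), as in "movl " + c.getName() + ", %eax", while a caller emitting data would use ".long " + c.getDataDeclForm().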

The class that represents the addresses of labels is named VMx86.LabelAddress:
public class LabelAddress extends Constant {
    public LabelAddress(Label L);

    public String getDataDeclForm();              // Defines Constant
    public int getSize();                         // Defines Operand
    public String getName();                      // Defines Operand
}

In addition to simple constants, it is possible to create composite constants built from several constants related by an expression (for example, the difference of two label addresses). To represent these, the ConstAdd and ConstSubtract classes exist.
public class ConstAdd extends Constant {
    public ConstAdd(Constant C1, Constant C2);
    public int getSize();
    public String getDataDeclForm();
    public String getName();
}

public class ConstSubtract extends Constant {
    public ConstSubtract(Constant C1, Constant C2);
    public int getSize();
    public String getDataDeclForm();
    public String getName();
}

These classes take two Constants as parameters, and express the relation between them.
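One way such a composite constant could build its text is to combine the data declaration forms of its parts and add the operand prefix only once, at the outermost level. The sketch below is an illustration under that assumption; ConstSketch, LabelAddrSketch, and ConstSubtractSketch are hypothetical names, labels are plain strings, and only AT&T style output is shown.

```java
// Hypothetical base class mirroring the two textual forms of a constant.
abstract class ConstSketch {
    abstract String getName();           // form used as an instruction operand
    abstract String getDataDeclForm();   // form used in data declarations
}

// Hypothetical stand-in for a label address; the label is a plain string here.
class LabelAddrSketch extends ConstSketch {
    private final String label;
    LabelAddrSketch(String label) { this.label = label; }
    String getName() { return "$" + label; }
    String getDataDeclForm() { return label; }
}

// Hypothetical stand-in for ConstSubtract: the difference of two constants.
class ConstSubtractSketch extends ConstSketch {
    private final ConstSketch c1, c2;
    ConstSubtractSketch(ConstSketch c1, ConstSketch c2) {
        this.c1 = c1;
        this.c2 = c2;
    }

    // Parentheses keep nested differences unambiguous, e.g. (a-(b-c)).
    String getDataDeclForm() {
        return "(" + c1.getDataDeclForm() + "-" + c2.getDataDeclForm() + ")";
    }

    // The "$" prefix is applied once to the whole expression.
    String getName() {
        return "$" + getDataDeclForm();
    }
}
```

The difference of two label addresses, such as the end and start of a table, then prints as (table_end-table) in a data declaration and as $(table_end-table) in an instruction operand.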

Pseudo-Instruction Classes
The first kind of Instruction subclass that we will look at is the Pseudo-Instruction. There is no semantic difference between Instructions and Pseudo-Instructions; the distinction is based strictly on the fact that Pseudo-Instructions do not cause the assembler to generate code. Instead, they emit pseudo-ops and comments. The classes that fall into this category include Comment, Label, FileHeader, ObjectSize, SegmentSelect, EmitLong, and EmitString.

The VMx86.Comment Class
public class Comment extends Instruction {
    public Comment(String Value);
    public Comment(String Value, boolean Seperator);

    public String getInstAsString();              // Defines Instruction
    public String getTreeDesc(Object obj);        // Overrides Instruction
}

The Comment class is used to insert a comment into the output source. If the program is optimized (i.e., not compiled with -g), however, all comments are stripped. The optional boolean 'Seperator' parameter to the constructor causes, if true, an extra blank line to precede the comment, so that the comment acts as a separator of sorts.

The VMx86.Label Class
public class Label extends Instruction { // Global/Local - Visibility constants... public static final int GLOBAL, LOCAL; // Object, Function, Label... type arguments... public static final int OBJECT, DATA, SUB, LABEL; public Label(String Name, int Visibility, int Type); public String getName(); public String getInstAsString(); } // Defines Instruction // Overrides Instruction

The VMx86.Label class is a very important class that serves many purposes. It obviously causes labels to be emitted to the code stream, but it also emits assembly pseudo-ops as required to make sure the label has the appropriate scope and declaration. Additionally, this class takes care of the name mangling required to separate different IRWIN namespaces into a shared namespace. Subroutine symbol names are not mangled, as they may be exported and called directly by C functions. Labels, which are local to the current subroutine, are all prefixed with “.L”. Because “.” is an illegal character in an identifier, subroutine names cannot conflict with them. Global objects have an “obj$” prefix, and raw data objects have a “data$” prefix added to their symbol names. These names cannot conflict with subroutines or labels, because neither may contain a “$” character in its name.

The VMx86.FileHeader Class
public class FileHeader extends Instruction {
    public FileHeader(String filename);
    public final String getFilename();
    public String getInstAsString();    // Defines Instruction
}

The VMx86.FileHeader class is used to emit a “.file” directive to the output assembly language file to indicate the input source file that caused this program to be generated. This is useful for debugging.

The VMx86.ObjectSize Class
public class ObjectSize extends Instruction {
    public ObjectSize(Label obj, Label objEnd);
    public ObjectSize(Label obj, int objSize);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.ObjectSize class is used to emit a “.size” pseudo-op to the output stream. There are two forms of this instruction: the size is given either as an integer value or as the difference between two labels.

The VMx86.SegmentSelect Class
public class SegmentSelect extends Instruction {
    public static int DATA, TEXT;       // Valid values for Type...
    public SegmentSelect(int Type);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.SegmentSelect class is used to choose which output segment the following instructions should be emitted to. There are two choices, the '.data' segment or the '.text' segment. This choice is specified with the constructor.

The VMx86.EmitLong Class
public class EmitLong extends Instruction {
    public EmitLong(Constant value);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.EmitLong class is used to emit a long value to the output code stream. This long value goes to the currently active segment (chosen with VMx86.SegmentSelect). This is typically used to emit values to the data segment.

The VMx86.EmitString Class
public class EmitString extends Instruction {
    public final static int STRING,     // Normal string
                            ASCII,      // Normal string
                            ASCIZ;      // Null terminated string...

    public EmitString(String val, int type);
    public EmitString(String value);    // Default to ASCII
    public String getInstAsString();    // Defines Instruction
}

The VMx86.EmitString class is used to emit a string value to the output code stream. This string value goes to the currently active segment (chosen with VMx86.SegmentSelect). This is typically used to emit values to the data segment. Two forms of the string may be output: a string with a terminating null (VMx86.EmitString.ASCIZ), or a string without any termination characters (VMx86.EmitString.ASCII).
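The two string forms map naturally onto the GNU assembler's `.ascii` and `.asciz` directives. The following is only a sketch of what getInstAsString() could produce: the class name `EmitStringSketch`, the constant values, the tab-indented layout, and the escaping rules are all assumptions for illustration, not the behavior of the real VMx86.EmitString class.

```java
// Hypothetical sketch of rendering a string pseudo-instruction as assembler text.
class EmitStringSketch {
    static final int ASCII = 0;   // assumed value; string without a terminator
    static final int ASCIZ = 1;   // assumed value; string with a terminating null

    // Escape the characters that would otherwise end or corrupt the quoted literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c == '\n') {
                sb.append("\\n");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String instAsString(String value, int type) {
        String directive = (type == ASCIZ) ? ".asciz" : ".ascii";
        return "\t" + directive + " \"" + escape(value) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(instAsString("hello world", ASCIZ));
        System.out.println(instAsString("say \"hi\"", ASCII));
    }
}
```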

Instruction Classes
There are many classes that implement thin wrappers around the x86 instruction set. These instructions need wrapper classes so that the optimizer can separate and recombine the instructions. Because they are so simple, most classes have very terse descriptions.

The VMx86.add Class
public class add extends Instruction {
    public add(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.add class corresponds to the x86 'add' instruction.

The VMx86.and Class
public class and extends Instruction {
    public and(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.and class corresponds to the x86 'and' instruction.

The VMx86.call Class
public class call extends Instruction {
    public call(Label target);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.call class encapsulates the x86 'call' instruction.

The VMx86.calli Class
public class calli extends Instruction {
    public calli(Operand addr);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.calli class encapsulates the x86 'call' instruction with a computed target address.

The VMx86.cmp Class
public class cmp extends Instruction {
    public cmp(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.cmp class corresponds to the x86 'cmp' instruction.

The VMx86.div Class
public class div extends Instruction {
    public div(Operand param);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.div class handles the x86 'div' instruction, which performs an unsigned division operation.

The VMx86.idiv Class
public class idiv extends Instruction {
    public idiv(Operand param);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.idiv class handles the x86 'idiv' instruction, which performs a signed division operation.

The VMx86.jcc Class
public class jcc extends Instruction {
    public static final int jnz;    // Jump not zero
    public static final int jz;     // Jump zero
    public static final int ja;     // Jump above
    public static final int jb;     // Jump below
    public static final int jae;    // Jump above or equal

    public jcc(Label L, int JumpType);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.jcc class is used to encapsulate the various x86 conditional jump instructions. The condition type is specified by the constant passed in as the JumpType argument to the constructor.

The VMx86.jmp Class
public class jmp extends Instruction {
    public jmp(Operand Addr);
    public jmp(Label L);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.jmp class handles unconditional jumps in the x86 architecture, which may either be computed (destination is an Operand), or absolute (destination is a Label).

The VMx86.lea Class
public class lea extends Instruction {
    public lea(Register dest, Memory source);
    public String getInstAsString();    // Defines Instruction
}

This class is a simple encapsulation of the x86 'lea' instruction.

The VMx86.mov Class
public class mov extends Instruction {
    public mov(Operand dest, Operand source, int Size);
    public mov(Operand dest, Operand source);
    public String getInstAsString();
}

The VMx86.mov class is used to copy data from one location to another. The operation works with multiple sizes of data; the Size argument to the constructor is there for the case where both operands are of unspecified size.

The VMx86.mul Class
public class mul extends Instruction {
    public mul(Operand Source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.mul class corresponds to the x86 'mul' instruction.

The VMx86.or Class
public class or extends Instruction {
    public or(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.or class corresponds to the x86 'or' instruction.

The VMx86.pop Class
public class pop extends Instruction {
    public pop(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.pop class corresponds to the x86 'pop' instruction.

The VMx86.push Class
public class push extends Instruction {
    public push(Operand param);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.push class is a very straightforward class to push an operand onto the stack.

The VMx86.ret Class
public class ret extends Instruction {
    public ret();
    public ret(int numWords);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.ret class encapsulates the x86 instruction of the same name. It optionally takes an argument that specifies the number of words to pop before the ret instruction is executed.

The VMx86.sar Class
public class sar extends Instruction {
    public sar(Operand param, int amount);
    public sar(Operand param);          // Shift right by amount in CL register...
    public String getInstAsString();    // Defines Instruction
}

The VMx86.sar class corresponds to the x86 'sar' instruction.

The VMx86.setcc Class
public class setcc extends Instruction {
    public static final int LT;     // Less Than
    public static final int GT;     // Greater than
    public static final int LTE;    // Less than or equal
    public static final int GTE;    // Greater than or equal
    public static final int NE;     // Not equal to
    public static final int EQ;     // Equal to

    public setcc(Operand dest, int type);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.setcc class is used to set an x86 register to true if a condition is true. This is a useful way to copy a value from the flags register into a general register.

The VMx86.shl Class
public class shl extends Instruction {
    public shl(Operand param, int amount);
    public shl(Operand param);          // Shift left by amount in CL register...
    public String getInstAsString();    // Defines Instruction
}

The VMx86.shl class corresponds to the x86 'shl' instruction.

The VMx86.shr Class
public class shr extends Instruction {
    public shr(Operand param, int amount);
    public shr(Operand param);          // Shift right by amount in CL register...
    public String getInstAsString();    // Defines Instruction
}

The VMx86.shr class corresponds to the x86 'shr' instruction.

The VMx86.sub Class
public class sub extends Instruction {
    public sub(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.sub class corresponds to the x86 'sub' instruction.

The VMx86.test Class
public class test extends Instruction {
    public test(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.test class corresponds to the x86 'test' instruction.

The VMx86.xor Class
public class xor extends Instruction {
    public xor(Operand dest, Operand source);
    public String getInstAsString();    // Defines Instruction
}

The VMx86.xor class corresponds to the x86 'xor' instruction.

The IR Package
The IR package consists of classes that represent the IRWIN language in an internal list representation. The IR package is the intermediate representation used by the Tiger compiler. Multiple languages could be supported by this intermediate representation, allowing multiple front ends to be created for the compiler. Currently two front ends exist for the intermediate language, the IRWIN parser and the Tiger parser. To understand the aim of the IRWIN intermediate representation, please read “Intermediate Representation With an Interesting Name”. This section of this document describes both the IR package and the IR to VMx86 transformation that occurs for each node. This document only describes IR classes that are actually used in the transformation process. Other support classes, such as IR.TreeSymbolTableDisplay, are documented in their source code.

Design Considerations
The classes of the IR package are designed with their usage in mind. Objects from this package will be created in three situations:
1. The IRWIN parser is parsing directly into the intermediate representation.
2. The Tiger compiler is transforming the Tiger Abstract Syntax Tree into the intermediate representation.
3. The Tiger compiler is performing one of the many IR to IR transformations that it provides.
In addition to creation, the classes of the IR package must support output of the IR structure to an IRWIN file.

Fundamental Classes
Just as with the VMx86 package, the IR package contains an Instruction class that is the base class of all simple instructions. In addition to the IR.Instruction class, the IR package contains a Program class that provides very similar functionality to the VMx86.Program class. Unlike the VMx86 package, the IR package has a concept of Subroutines. Global data is represented by subclasses of the GlobalObject class, and terms for expressions are represented with the SimpleValue class.

The IR.Instruction Class
public abstract class Instruction extends TreeDisplay.TreeDisplayableAdapter {
    public abstract String getInstAsString();

    // Simple linked list operations...
    public void setPrev(Instruction P);
    public void setNext(Instruction N);
    public Instruction getPrev();
    public Instruction getNext();

    public void append(Instruction I);
    public void unlinkFromList();
    public void removeInstruction();
    public void replaceThisInstruction(Instruction With);

    // IR Optimization code below...
    public boolean doPeepholeOptimization();
    public abstract SimpleValue[] getReadSimpleValues();
    public abstract SimpleValue getWriteSimpleValue();
    public final SimpleValue[] getSimpleValueOperands();
    protected final void addSimpleValueReferences();    // Called by ctors typically

    public final void replaceSimpleValue(SimpleValue Old, SimpleValue New);
    protected abstract void ReplaceSimpleValue(SimpleValue O, SimpleValue N);
    public final void eliminateWriteInst();
    protected void doEliminateWriteInst();

    // VMx86 Specific Code Below...
    public abstract void codeGenVMx86(VMx86.Program P, boolean Debug);

    // Valid values for the Type argument to VMx86DebugRegisterFlags(..)
    public static final int VMx86RegTypeNoCare;     // Ignore this register
    public static final int VMx86RegTypeInt;        // Integer type
    public static final int VMx86RegTypePtr;        // Generic Pointer
    public static final int VMx86RegTypeRawPtr;     // Raw Data Pointer
    public static final int VMx86RegTypeObjPtr;     // Object Pointer
    public static final int VMx86RegTypeEAXidx;     // Index into EAX object

    protected int VMx86DebugRegisterFlags(VMx86.Register R, int Type);
    protected void VMx86EmitDebugCode(VMx86.Program P, int Flags);

    // The stuff below is just for the tree view...
    public String getTreeDesc(Object obj);
}

The IR.Instruction class is designed to be used as an element in a list of primitive operations, each of which may write to at most one temporary value. Additionally, it must support optimization, tree display, and code generation for the VMx86 platform for all subclasses. Because of this, it is the most complex class in the IR package. To be a proper subclass of the Instruction class, a subclass must implement all of the abstract methods: getInstAsString(), getReadSimpleValues(), getWriteSimpleValue(), ReplaceSimpleValue(), and codeGenVMx86().

The getInstAsString() method is the simplest method that subclasses must implement. It is designed to return a representation of the node as a string. In fact, this string should be something that the IRWIN parser would parse correctly, because this method is how output is generated when compiling with an IRWIN destination.

Linked List Operations
Because the program in the IR representation may undergo many optimization phases, the IR list can be modified in many ways. To support robust operation, the Instruction class itself provides operations for adjusting itself in a list. The getNext(), setNext(), getPrev(), and setPrev() operations all provide simple list operations on a node. The two “set” operations also set the adjoining node to point back to the current node. For example, if you call “A.setNext(B);”, A's next pointer is set to B and B's previous pointer is set to A. This is so that A.getNext().getPrev() == A is an invariant (barring end of list exceptions). The four meta-list operations, append(), unlinkFromList(), removeInstruction(), and replaceThisInstruction(), are provided to do common operations used by optimization phases of the compiler. Note that these operations do not function correctly when dealing with an end of list situation. Because of this, the lists in the IR are all terminated with instances of IR.NullInstruction (see 'The IR.NullInstruction Class').
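The back-pointer behavior of the “set” operations, and the reason sentinel nodes make the meta-list operations safe, can be sketched as follows. `ListNodeSketch` is a hypothetical stand-in for IR.Instruction that models only the list plumbing, and the meaning given to append() here (insert directly after the current node) is an assumption, since the specification does not define it.

```java
// Simplified stand-in for IR.Instruction; only the list plumbing is modeled.
class ListNodeSketch {
    final String text;
    private ListNodeSketch prev, next;

    ListNodeSketch(String text) { this.text = text; }

    ListNodeSketch getNext() { return next; }
    ListNodeSketch getPrev() { return prev; }

    // Each "set" also points the adjoining node back at this one.
    void setNext(ListNodeSketch n) { next = n; if (n != null) n.prev = this; }
    void setPrev(ListNodeSketch p) { prev = p; if (p != null) p.next = this; }

    // Assumed meaning of append(): insert i directly after this node.
    void append(ListNodeSketch i) {
        ListNodeSketch oldNext = next;
        setNext(i);
        i.setNext(oldNext);
    }

    // Splice this node out. This relies on both neighbours existing, which the
    // terminating sentinel nodes guarantee for every real instruction.
    void unlinkFromList() {
        prev.setNext(next);
        prev = null;
        next = null;
    }
}

class LinkedListSketchDemo {
    public static void main(String[] args) {
        ListNodeSketch head = new ListNodeSketch("<null>");   // sentinel
        ListNodeSketch tail = new ListNodeSketch("<null>");   // sentinel
        head.setNext(tail);

        ListNodeSketch a = new ListNodeSketch("a = b + c");
        head.append(a);
        System.out.println(a.getNext().getPrev() == a);       // invariant holds

        a.unlinkFromList();
        System.out.println(head.getNext() == tail);           // list is intact again
    }
}
```

Without the sentinels, unlinkFromList() would need null checks on both neighbours, which is the kind of end of list special case the IR.NullInstruction terminators are there to avoid.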

IR Optimization Methods
There are two main goals of the IR optimization phase. The first is to allow simple and easy operations to occur, such as peephole optimization, with little complication. The second is to allow characterization of IR nodes to allow sophisticated optimization techniques to be implemented without undue obfuscation. The first optimization method is the doPeepholeOptimization() method. This method should be overridden by subclasses that have some peephole optimizations that can be done to them. For example, the IR.Multiply node uses this method to replace itself with a shift operation if an operand is a constant power of two. If a change is made to the underlying structure of the program, this method should return true. If no changes have been made, a value of false is expected.
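The multiply-to-shift rewrite mentioned above comes down to a power-of-two test plus a base-two logarithm. The sketch below shows that core check in isolation; `PeepholeSketch`, its method names, and the textual instruction format are hypothetical and stand in for whatever IR.Multiply actually does with its operands.

```java
// Hypothetical sketch of the decision behind a multiply-to-shift peephole rewrite.
class PeepholeSketch {
    // Returns the shift amount if v is a positive power of two, otherwise -1.
    static int shiftAmountFor(int v) {
        if (v <= 0 || (v & (v - 1)) != 0) {
            return -1;                                  // zero, negative, or more than one bit set
        }
        return Integer.numberOfTrailingZeros(v);        // log2 of a power of two
    }

    // Renders "dest = src * constant", using a shift when the constant allows it.
    // The textual form is made up for this example and is not IRWIN syntax.
    static String rewriteMultiply(String dest, String src, int constant) {
        int shift = shiftAmountFor(constant);
        if (shift < 0) {
            return dest + " = " + src + " * " + constant;   // no change possible
        }
        return dest + " = " + src + " << " + shift;
    }

    public static void main(String[] args) {
        System.out.println(rewriteMultiply("t1", "x", 8));
        System.out.println(rewriteMultiply("t2", "x", 6));
    }
}
```

In the real method the return value would report whether the rewrite happened, so that the optimizer knows to run another pass over the changed list.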

For characterization purposes, several methods are used to determine what kinds of operands are being used by the instructions, and to globally modify operands. For example, if a variable is determined to always contain the value 5, an optimization would be to replace all uses of that variable with the constant 5 (potentially allowing further optimization).

The getReadSimpleValues() and getWriteSimpleValue() methods are used to characterize which temporaries are read from and written to by this operation. An operand may be returned by both methods in the case of a read-modify-write operation. To get the union of the two sets, call the getSimpleValueOperands() method. Note that instructions are only allowed to modify one value at a time, which is reflected by the getWriteSimpleValue() method only returning one operand.

The addSimpleValueReferences() method is typically used by the constructor of a subclass to addRef(..) all of the operands referenced by the object (as defined by the above two methods). It is imperative that instructions addRef(..) their operands when using them so that optimization does not remove an operand that is still in use.

In the constant replacement case above, if an operand has to be replaced, the replaceSimpleValue(..) method may be used. This method provides a wrapper around the abstract ReplaceSimpleValue() method that makes sure that reference counts are correct after the replacement. Subclasses must implement ReplaceSimpleValue() to actually replace the internal handles with alternative targets.

The eliminateWriteInst() method is used by optimization phases that determine that the operand being written to (as defined by the getWriteSimpleValue() method) is never read. If this is the case, either the instruction may be eliminated (the default case for side-effect-free instructions), or the write portion of the instruction may be eliminated (for a side-effect-producing instruction, such as a call). For example, the Multiply subclass does not override the default behavior, so it is simply removed from the instruction list. The Call subclass does override it, converting an “X = call Y(a, b, c)” into “call Y(a, b, c)”.
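The split between a final public wrapper and an abstract subclass hook, as described for replaceSimpleValue(..), might look like the sketch below. Everything here is a hypothetical simplification: `ValueSketch` stands in for SimpleValue, the hook is named `doReplace` and the helper `countUses` is invented for the example, and the real classes may track references quite differently.

```java
// Hypothetical stand-in for SimpleValue with a bare reference count.
class ValueSketch {
    final String name;
    int refCount;
    ValueSketch(String name) { this.name = name; }
    void addRef() { refCount++; }
    void unRef()  { refCount--; }
}

abstract class InstSketch {
    // Public wrapper: delegates to the subclass hook, then fixes up reference counts.
    final void replaceSimpleValue(ValueSketch oldV, ValueSketch newV) {
        int uses = countUses(oldV);         // how many operand slots refer to oldV
        doReplace(oldV, newV);              // subclass swaps its internal handles
        for (int i = 0; i < uses; i++) {
            oldV.unRef();
            newV.addRef();
        }
    }

    abstract int countUses(ValueSketch v);
    abstract void doReplace(ValueSketch oldV, ValueSketch newV);
}

// A three-operand instruction such as "dest = left + right".
class AddSketch extends InstSketch {
    ValueSketch dest, left, right;

    AddSketch(ValueSketch dest, ValueSketch left, ValueSketch right) {
        this.dest = dest; this.left = left; this.right = right;
        dest.addRef(); left.addRef(); right.addRef();   // plays the role of addSimpleValueReferences()
    }

    int countUses(ValueSketch v) {
        return (dest == v ? 1 : 0) + (left == v ? 1 : 0) + (right == v ? 1 : 0);
    }

    void doReplace(ValueSketch oldV, ValueSketch newV) {
        if (dest == oldV)  dest = newV;
        if (left == oldV)  left = newV;
        if (right == oldV) right = newV;
    }
}

class ReplaceValueDemo {
    public static void main(String[] args) {
        ValueSketch t = new ValueSketch("t"), x = new ValueSketch("x");
        ValueSketch five = new ValueSketch("5");
        AddSketch add = new AddSketch(t, x, x);          // t = x + x
        add.replaceSimpleValue(x, five);                 // t = 5 + 5
        System.out.println(x.refCount + " " + five.refCount);
    }
}
```

Keeping the bookkeeping in the final wrapper means a subclass cannot forget it, which is presumably why the specification separates the two methods.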

VMx86 Code Generation Methods
The simplest and most important method required by the VMx86 code generation phase of the IR compiler is the codeGenVMx86() method. Calling this method causes instructions to be added (with addInst()) to the VMx86.Program object passed as a parameter. If the Debug parameter is true, extra debugging code may also be added.

To simplify the addition of this debugging code, the IR.Instruction class provides an assertion service. This allows the subclasses of Instruction to assert that an x86 register holds a particular type of value. These values are defined by constants in the Instruction class: VMx86RegTypeNoCare, VMx86RegTypeInt, VMx86RegTypePtr, VMx86RegTypeRawPtr, VMx86RegTypeObjPtr, and VMx86RegTypeEAXidx. These values correspond, respectively, to: unknown type, integral type, unknown pointer type, pointer to raw data block, pointer to regular object block, and a valid index into a block whose pointer is stored in the EAX register.

These constants are used in conjunction with the VMx86DebugRegisterFlags(..) method, which returns these codes and the specified register in an encoded (compact) form. To make multiple assertions, simply '|' the results together into a single value. When a value has been computed, code may be emitted to the program stream with the VMx86EmitDebugCode(..) method. An example of the expected usage, asserting that EAX is a pointer to a raw data block and that EDX is a valid index into this block, would be:
// Compute flags...
int Flags = VMx86DebugRegisterFlags(VMx86.Register.EAX, VMx86RegTypeRawPtr) |
            VMx86DebugRegisterFlags(VMx86.Register.EDX, VMx86RegTypeEAXidx);
VMx86EmitDebugCode(P, Flags);    // Emit assertions to program stream.
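The specification does not describe the compact encoding itself. One plausible scheme, shown here purely as an assumption, gives each register its own three-bit field inside a single int, so that independent assertions can be combined with '|' and later decoded per register. The class name, the register numbering, and the constant values below are all invented for the sketch.

```java
// Hypothetical bit-field encoding for per-register type assertions.
class DebugFlagsSketch {
    // Assumed type codes; six values fit in three bits.
    static final int NO_CARE = 0, INT = 1, PTR = 2, RAW_PTR = 3, OBJ_PTR = 4, EAX_IDX = 5;
    // Assumed register numbering for the example.
    static final int EAX = 0, ECX = 1, EDX = 2;

    static final int BITS_PER_REG = 3;
    static final int FIELD_MASK = (1 << BITS_PER_REG) - 1;

    // Encode one assertion: place the type code in the register's field.
    static int registerFlags(int reg, int type) {
        return type << (reg * BITS_PER_REG);
    }

    // Decode the type asserted for a register; NO_CARE (0) means no assertion.
    static int typeOf(int flags, int reg) {
        return (flags >> (reg * BITS_PER_REG)) & FIELD_MASK;
    }

    public static void main(String[] args) {
        int flags = registerFlags(EAX, RAW_PTR) | registerFlags(EDX, EAX_IDX);
        System.out.println(typeOf(flags, EAX) == RAW_PTR);
        System.out.println(typeOf(flags, EDX) == EAX_IDX);
        System.out.println(typeOf(flags, ECX) == NO_CARE);
    }
}
```

With eight general registers this layout needs 24 bits, which leaves room in an int for additional type codes or flags if the facility is expanded.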

This debug facility is designed to allow for flexible assertions and expansion in the future.

The IR.Program Class
public class Program extends TreeDisplay.TreeDisplayableAdapter {
    public Program(SymbolTable objectSymTab, SymbolTable dataSymTab, SymbolTable subsSymTab,
                   TreeDisplayable metaInfo, String filename);
    public Program Optimize();
    public void outputCode(java.io.PrintWriter Output);

    // VMx86 Specific Code...
    public VMx86.Program codeGenVMx86(boolean Debug);
}

The IR.Program class is very similar in spirit to the VMx86.Program class: it provides a container for the instructions that make up the program, and is a centralized dispatcher for program-wide events like optimization passes. Unlike the VMx86 layer of the compiler, the IR.Program class organizes the portions of the program into subroutines instead of directly into instructions. This class provides two generally useful methods, Optimize() and outputCode(). The Optimize() method invokes the available IR to IR optimizations. It returns a Program instance that corresponds to the optimized program. The other generally useful routine is the outputCode() method, which causes the program to emit IRWIN code to the specified PrintWriter object.

VMx86 Code Generation
After all of the desired IR level optimizations have been performed, code may be generated for the VMx86 platform with the codeGenVMx86(..) method. This method causes all of the global objects to be emitted to the output program, and all of the subroutines to emit themselves. Because the global scope instructions are stuffed into a subroutine, they are automatically compiled. The final step of code generation is a call to VMx86.Program.GroupSegments() to group all of the segments of the executable together (text with text and data with data). This IR to IR transformation does not affect the quality of the generated code; it simply cleans things up. It is especially important when compiling in debug mode, because global objects are created for most debug records, and this avoids having emissions to the GC segment spread throughout the generated code.

[Figure 1 - IR.Program Code Generation: CodeGen Data Objects, CodeGen Global Objects, CodePregen Subroutines, CodeGen Subroutines, Group Segments Together]

The IR.Subroutine Class
public class Subroutine extends TreeDisplay.TreeDisplayableAdapter {
    public Subroutine(String Name, SymbolTable Locals, SymbolTable Labels, int NumParams);
    public final SymbolTable getSymbolTable();
    public final SymbolTable getLabelSymbolTable();
    public final String getName();
    public final int getNumParams();
    public void setNumParams(int np);
    public void setParams(Vector P);
    public final void setBody(Instruction body);

    public void addRef(SubUser SU);
    public void unRef(SubUser SU);
    public String getInstAsString();

    // IR Optimization code below...
    public boolean Optimize();
    public boolean inlineSelfIntoCallers();

    // IRWIN Specific code Below...
    public void outputCode(java.io.PrintWriter Output);

    // VMx86 Specific Code Below...
    public VMx86.Label getVMx86StartLabel();
    public void codePregenVMx86(VMx86.Program P, boolean Debug);
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
    public void emitReturnVMx86(VMx86.Program P, boolean Debug);
}

The IR.Subroutine class is the next level of encapsulation of the IRWIN program. It contains the actual Instruction objects that make up the program. In addition to the Instructions that make up the program, a symbol table of local variables and a symbol table of labels are maintained (getSymbolTable(), getLabelSymbolTable()), information about the subroutine parameters is tracked, and (of course) the name of the subroutine is retained. The Subroutine class also provides a dispatch point used by the Program class to run intra-subroutine optimizations. The addRef()/unRef() routines are used to keep an accurate count of the number of uses there are of the subroutine. Additionally, these are used to allow the users (which implement the SubUser interface) to update themselves if an aspect of the subroutine changes. Two examples of when notification is required are when a call gets inlined, and when a subroutine determines it no longer needs a parameter.
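The dual role of addRef()/unRef(), counting uses and keeping a list of users to notify, can be sketched as a small observer arrangement. The interface method `subroutineChanged`, the `getUseCount()` accessor, and the `*Sketch` class names are hypothetical; the specification names the SubUser interface but does not list its methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; the real SubUser methods are not given in the text.
interface SubUserSketch {
    void subroutineChanged(SubroutineSketch sub);
}

class SubroutineSketch {
    private final List<SubUserSketch> users = new ArrayList<>();
    private int numParams;

    SubroutineSketch(int numParams) { this.numParams = numParams; }

    // Registering a user both counts the use and records whom to notify.
    void addRef(SubUserSketch user) { users.add(user); }
    void unRef(SubUserSketch user)  { users.remove(user); }
    int getUseCount()               { return users.size(); }

    int getNumParams() { return numParams; }

    // Changing an aspect of the subroutine notifies every registered user.
    void setNumParams(int np) {
        numParams = np;
        // Iterate over a copy so that a user may unRef itself during the callback.
        for (SubUserSketch user : new ArrayList<>(users)) {
            user.subroutineChanged(this);
        }
    }
}

class SubUserDemo {
    public static void main(String[] args) {
        SubroutineSketch foo = new SubroutineSketch(3);
        SubUserSketch call = sub ->
                System.out.println("call site now passes " + sub.getNumParams() + " parameters");
        foo.addRef(call);
        foo.setNumParams(2);    // e.g. an optimization found a parameter to be unused
        foo.unRef(call);
        System.out.println(foo.getUseCount());
    }
}
```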

VMx86 Code Generation
Code generation for subroutines consists of two steps. The first step corresponds to the codePregenVMx86(..) method. This is called to allocate resources that may potentially be used by other subroutines before this subroutine is processed by codeGenVMx86(..). Currently this routine creates a new VMx86.Label that corresponds to the start of the subroutine (needed for the sub[foo] syntax, and for calls), assigns stack offsets to local variables and parameters (needed for the foo@var syntax), and pre-declares variables so that they are available during code generation.

The actual code generation step for the Subroutine class emits a number of debugging aids. For example, comments are emitted to indicate the stack locations of all local variables. Aside from that, the subroutine code generation phase establishes the standard subroutine frame with the following assembly code:
pushl %ebp
movl %esp, %ebp
subl [4*NumLocals], %esp

[Figure 2 - IR.Subroutine Code Generation: VMx86.Label StartLabel, Emit Debugging Comments, VMx86.push(EBP), VMx86.mov(EBP, ESP), VMx86.sub(ESP, 4*NumLocals), Emit Subroutine Body, VMx86.Label EndLabel, VMx86.ObjectSize]

Note that no cleanup code is created. This is because the IR.Return class emits it whenever an IRWIN 'return' operation is encountered. The last two nodes of Figure 2 establish the size of the subroutine, so that the debugger knows when to stop disassembling the subroutine.

The IR.SubroutineImport Class
public class SubroutineImport extends Subroutine {
    public static final int LANG_IRWIN, LANG_C;
    public SubroutineImport(String Name, int NumParams, int CallConv);
    public String getInstAsString();

    // IRWIN Specific Code Below...
    public void outputCode(java.io.PrintWriter Output);    // Overrides IR.Subroutine
    public boolean Optimize();                             // Overrides IR.Subroutine

    // VMx86 Specific Code Below... only emits a comment.
    public void codePregenVMx86(VMx86.Program P, boolean Debug);
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The SubroutineImport class is a simple subclass of IR.Subroutine that is used to represent imported subroutines. Because these subroutines are imported, no VMx86 code is generated for this node; only a comment is emitted to indicate that the subroutine was indeed imported. It is then up to the linker to resolve this reference.

The IR.GlobalObject Class
public abstract class GlobalObject extends TreeDisplay.TreeDisplayableAdapter {
    public GlobalObject(boolean raw, String name);
    public GlobalObject(boolean raw);    // Creates a unique name...
    public final String getName();
    public abstract int getSize();
    public final boolean isRaw();

    // IRWIN Specific Code Below...
    public void outputCode(java.io.PrintWriter Output);

    // VMx86 Specific Code Below...
    public final VMx86.Label getVMx86ObjectLabel();
    public abstract void codeGenVMx86GlobalObj(VMx86.Program P);
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.GlobalObject class is the base class of all global objects (either raw or normal). As such it provides a framework of utilities for the subclasses to inherit. Currently, there are three different types of global objects provided by IRWIN:
1. Global Object Lists, e.g. object X = 12, sub[foo], object[x]
2. Data Object Lists, e.g. data X = 12, 24, 56
3. Data Object Strings, e.g. data X = “hello world\0a”
These three different cases are implemented with the GlobalObjectList and GlobalObjectString subclasses of GlobalObject. They generate code by specializing the codeGenVMx86GlobalObj(..) virtual method.

[Figure 3 - IR.GlobalObject Code Generation: VMx86.Label, VMx86.EmitLong (header), Specialized CodeGen, VMx86.Label, VMx86.ObjectSize]

VMx86 Code Generation
Although much of the code generated by GlobalObject is specific to its subclasses, there is shared code that is used to emit the header for the object and the trailer that computes the size of the object (strictly for debugger support). The resulting code generation sequence is shown in Figure 3. Code generation for GlobalObjectList is simply a series of VMx86.EmitLong instructions, one per word of the list. Code generation for the GlobalObjectString subclass consists of one instance of VMx86.EmitString, with the escaped value of the string as a parameter.
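Put together, the shared header and trailer plus the list-specific body might produce output along the lines of the sketch below. This is an illustration only: the contents of the header word (here assumed to be the word count), the name of the end label, and the exact directive layout are assumptions, and the class `GlobalObjectListSketch` is not part of the IR package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the assembler lines emitted for a global object list.
class GlobalObjectListSketch {
    static List<String> codeGen(String name, int[] words) {
        List<String> out = new ArrayList<>();
        String start = "obj$" + name;           // "obj$" prefix as described for VMx86.Label
        String end = start + "$end";            // end label name is invented for this sketch

        out.add(start + ":");                   // shared: VMx86.Label
        out.add("\t.long " + words.length);     // shared: header word (assumed to be the length)
        for (int w : words) {
            out.add("\t.long " + w);            // specialized: one EmitLong per word
        }
        out.add(end + ":");                     // shared: VMx86.Label
        out.add("\t.size " + start + ", " + end + "-" + start);   // shared: VMx86.ObjectSize
        return out;
    }

    public static void main(String[] args) {
        for (String line : codeGen("X", new int[] {12, 24, 56})) {
            System.out.println(line);
        }
    }
}
```

The `.size` line uses the label-difference form of ObjectSize, so the size stays correct no matter how many words the specialized step emits.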

IR.Instruction Subclasses
Although there are many descendants of IR.Instruction, there are only a few direct subclasses. Most instructions inherit from one of the helper classes for Unary Operators, Binary Operators, or Indirect Operators. As of this writing, there are only the following direct subclasses: Call, CallIndirect, Goto, Label, NullInstruction, and Return. The diagrams below illustrate the sequence of VMx86 instructions generated. Many of the diagrams contain optional instructions (denoted by two flow of control branches) that assert that parameters are correct. Compiling with the '-g' command line option enables these instructions.

The IR.Call Class
public class Call extends Instruction implements SubUser {
    public Call(SimpleValue dest, Subroutine sub, Vector params, Subroutine subContext);
    public SimpleValue getDest();
    public String getInstAsString();
    public SimpleValue[] getParamList();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Call class corresponds to the IRWIN “call Sub(a, b, c)” instruction. It is a very simple class that is only complicated by optimization phases (not discussed here). It only needs to keep track of the values of its parameters, and the subroutine that it needs to call.

VMx86 Code Generation
Code generation for the x86 platform is very simple. The parameters are pushed onto the stack, the subroutine is called, and then the stack is readjusted on return from the subroutine. If the result of the call is used, then the result is copied into the destination location.

[Figure 4 - IR.Call Code Generation: VMx86.Push Parameter, VMx86.Call (subroutine), VMx86.add(ESP, 4*numparams), VMx86.mov(result, EAX)]

The IR.CallIndirect Class
public class CallIndirect extends Instruction {
    public CallIndirect(SimpleValue Dest, SimpleValue SubAddr, Vector Params, Subroutine subContext);
    public SimpleValue getDest();
    public String getInstAsString();
    public SimpleValue[] getParamList();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

[Figure 5 - IR.CallIndirect Code Generation: VMx86.Push Parameter, VMx86.mov(EAX, SubAddr), assert EAX is an integer (optional), VMx86.calli(SubAddr), VMx86.add(ESP, 4*numparams), VMx86.mov(result, EAX)]

The IR.CallIndirect class corresponds to the IRWIN “call [X](a, b, c)” instruction. It too is a simple class, and has even less information to keep track of.

VMx86 Code Generation
Code generation for the x86 platform is very simple. The parameters are pushed onto the stack, the subroutine address is calli‟ed, and then the stack is readjusted on return from the subroutine. If the result of the call is used, then the result is copied into the destination location. The optional loop in the diagram is contingent on whether the Debug parameter is true. If so, code is emitted (with the previously mentioned IR.Instruction.VMx86EmitDebugCode(..) routine), to ensure that the subroutine pointer is an integer. This value must appear to be an integer, because the subroutine is aligned on a four-byte boundary. This is convenient, because then subroutine addresses are treated as integers… this means that they are not garbage collected. The IR.Goto Class
public class Goto extends Instruction implements LabelUser {
    public Goto(Label Dest);
    public Goto(Label Dest, SimpleValue Predicate);
    public Goto(SimpleValue Address, SimpleValue Predicate);
    public boolean isUnconditional();
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Goto class corresponds to the IRWIN “if P goto L” instruction and the “if P goto [addr]” instruction.

VMx86 Code Generation
Code generation for the x86 platform is very simple, but has to deal with a few different cases:

1. If this is an unconditional direct jump, simply emit a VMx86.jmp instruction.

2. If this is an unconditional indirect jump, emit:
    movl DestAddr, %eax
    jmp (%eax)

3. If this is a conditional jump, emit:
    movl Predicate, %eax
    testl -1, %eax

   If this is a direct jump, finish by emitting a VMx86.jcc.jnz instruction. Otherwise, finish by emitting:
    jz temp.label
    movl DestAddr, %edx
    jmp (%edx)
temp.label:
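
The case analysis above can be sketched as a small Java routine. This is only an illustration under stated assumptions: the class GotoSketch and its emit method are hypothetical, and the routine emits assembly text rather than building the VMx86 instruction objects that the real codeGenVMx86(..) method uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IR.Goto code generation (not part of the IR package).
final class GotoSketch {
    /**
     * @param dest      label name (direct) or operand holding the address (indirect)
     * @param indirect  true for the "goto [addr]" forms
     * @param predicate operand tested for non-zero, or null for an unconditional jump
     */
    static List<String> emit(String dest, boolean indirect, String predicate) {
        List<String> out = new ArrayList<>();
        if (predicate == null) {
            if (!indirect) {
                out.add("jmp " + dest);                // case 1: unconditional direct
            } else {
                out.add("movl " + dest + ", %eax");    // case 2: unconditional indirect
                out.add("jmp (%eax)");
            }
            return out;
        }
        out.add("movl " + predicate + ", %eax");       // case 3: conditional
        out.add("testl -1, %eax");
        if (!indirect) {
            out.add("jnz " + dest);                    // direct: jump if predicate is non-zero
        } else {
            out.add("jz temp.label");                  // indirect: skip the jump if zero
            out.add("movl " + dest + ", %edx");
            out.add("jmp (%edx)");
            out.add("temp.label:");
        }
        return out;
    }

    public static void main(String[] args) {
        emit("DestAddr", true, "Predicate").forEach(System.out::println);
    }
}
```

Running main prints the conditional indirect sequence from case 3, ending with the temp.label: line.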

The IR.Label Class
public class Label extends Instruction {
    public Label(String name);
    public String getName();
    public String getInstAsString();
    public void addRef(LabelUser LU);
    public void unRef(LabelUser LU);

    // VMx86 Specific Code Below...
    public VMx86.Label getVMx86Label();
    public void codePregenVMx86(VMx86.Program P, boolean Debug);
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Label class predictably refers to a label in the IRWIN language. When code is generated, it corresponds directly to a VMx86.Label instance. So that other instructions can make forward references to the label generated by this class, every IR.Label must first be pregenerated with codePregenVMx86(..). That method creates the label (accessible with the getVMx86Label() method), which is later inserted into the instruction stream by the codeGenVMx86(..) method.

The IR.NullInstruction Class
public class NullInstruction extends Instruction {
    public NullInstruction(Subroutine S);
    public Subroutine getSubroutine();
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.NullInstruction class is an unusual class designed to simplify linked list operations on instructions. It generates no code (except a blank line), and has no corresponding IRWIN instruction. Instances of the IR.NullInstruction class are implicitly inserted at the start and end of each subroutine. This ensures that whenever IR.Instruction.removeInstruction(..) is called on a node, there is a node both before and after the node being removed, so end-of-list checking never needs to be done. The IR.NullInstruction instances are never removed or reordered in the chain, and therefore always serve as the head and tail nodes.

The IR.Return Class
public class Return extends Instruction {
    public Return(Subroutine Sub, SimpleValue retval);
    public Return(Subroutine Sub);
    public SimpleValue getResult();
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Return class implements the “return”, “return ()”, and “return (x)” IRWIN instructions. Because it is the only way to exit a subroutine, it also implements the subroutine epilog code.

VMx86 Code Generation
The IR.Return class does not emit much VMx86 code on its own. Instead, it calls the IR.Subroutine.emitReturnVMx86(..) method to do so. The effect is the same as if it emitted the code itself; the code simply lives in a different file. The first step copies the return value of the subroutine into the EAX register in preparation for the return. It then copies the EBP register into ESP to restore the stack pointer, and pops the saved EBP. Finally, it executes a ‘ret’ instruction to return to the caller.

VMx86.mov(EAX, RetValue)
VMx86.mov(ESP, EBP)
VMx86.pop(EBP)
VMx86.ret()
Figure 6 - IR.Return Code Generation
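
As a rough illustration of the epilog in Figure 6, the following Java sketch produces the same four steps as text. The class ReturnSketch and the method emitReturn are hypothetical names, and skipping the first mov for a bare “return” is an assumption; the source does not say what IR.Subroutine.emitReturnVMx86(..) does when there is no return value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the epilog in Figure 6; emits text, not VMx86 objects.
final class ReturnSketch {
    /**
     * @param retValue operand holding the result, or null for a bare "return"
     *                 (assumed here to skip the copy into EAX)
     */
    static List<String> emitReturn(String retValue) {
        List<String> out = new ArrayList<>();
        if (retValue != null) {
            out.add("VMx86.mov(EAX, " + retValue + ")"); // result travels back in EAX
        }
        out.add("VMx86.mov(ESP, EBP)");                  // discard the locals
        out.add("VMx86.pop(EBP)");                       // restore the caller's frame pointer
        out.add("VMx86.ret()");                          // return to the caller
        return out;
    }

    public static void main(String[] args) {
        emitReturn("RetValue").forEach(System.out::println);
    }
}
```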

Unary Operator Classes
Many IRWIN operators share some common traits: one destination, one source, and one operation. These all fall under the umbrella of the UnaryOperator class, which is designed as a superclass for just this type of instruction.

The IR.UnaryOperator Class
public abstract class UnaryOperator extends Instruction {
    public UnaryOperator(SimpleValue dest, SimpleValue source);
    protected SimpleValue Dest, Source;
    public final SimpleValue getSource();
    public final SimpleValue getDest();
    // Many irrelevant methods removed
}

All that the IR.UnaryOperator class itself does is express the fact that subclasses have a single source and a single destination operand. Although this is not groundbreaking, it is very useful because much of the optimization code is shared between subclasses. This is the real reason for the aggregation, although optimization is not discussed here. The constructor takes a destination and a source SimpleValue, which subclasses may manipulate through the Dest and Source member variables. External classes may refer to these with the getSource() and getDest() methods. Because code generation is specific to each subclass, this class does none.

The IR.Alloc Class
public class Alloc extends UnaryOperator {
    public Alloc(SimpleValue dest, SimpleValue numBytes, boolean IsRaw);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Alloc class is one subclass of the UnaryOperator class, and shows just how simple subclasses may be. This class exposes three methods: a constructor, getInstAsString() (to generate IRWIN code), and codeGenVMx86(..) (to generate x86 code). It has no need to fuss with any optimization issues at all. This class corresponds to the “alloc” and “ralloc” IRWIN instructions.

VMx86 Code Generation
The IR.Alloc class does not generate much useful x86 assembly code, instead relying on the runtime library to find the memory to allocate. As such, the most important instruction emitted is the VMx86.call instruction to the runtime __Allocate function. Later, much of the code may be inlined into the IR.Alloc VMx86 instructions, because the common case for garbage collection is very few instructions.

VMx86.mov(EAX, size to alloc)
assert EAX is an integer
if !raw then VMx86.shl(EAX, 2)
VMx86.push(isRaw ? 2 : 0)
VMx86.call(__Allocate)
VMx86.add(ESP, 8)
VMx86.mov(Dest, EAX)
Figure 7 - IR.Alloc Code Generation

The IR.IsType Class
public class IsType extends UnaryOperator {
    public static final int INT = 0;
    public static final int PTR = 1;

    public IsType(SimpleValue dest, SimpleValue source, int type);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.IsType class is the IR implementation of the IRWIN “isint” and “isptr” operators. These take an undetermined value and return a 1 if it is the specified type or a 0 if not.

VMx86 Code Generation
The IR.IsType class takes advantage of the fact that pointers and integers are stored in distinct formats: pointers always have the low-order bit set, integers always have it cleared. As such, this class simply tests that bit and shifts it by one (because “isint” and “isptr” return integer values of 0 or 1).

VMx86.mov(EAX, param)
VMx86.and(EAX, 1)
if isint then VMx86.xor(EAX, 1)
VMx86.shl(EAX, 1)
VMx86.mov(Dest, EAX)
Figure 8 - IR.IsType Code Generation

The IR.LocalAddr Class
public class LocalAddr extends UnaryOperator {
    public LocalAddr(SimpleValue dest, SimpleValue source);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.LocalAddr class corresponds directly to the “x = local[x]” IRWIN statement. It generates a number that is related to the current stack frame pointer, and may be different across different invocations of a subroutine.

VMx86 Code Generation
The IR.LocalAddr class uses the VMx86.lea instruction to load the address of the local into a register. It then moves the register into the destination.

VMx86.lea(EAX, local)
VMx86.mov(Dest, EAX)
Figure 9 - IR.LocalAddr Code Generation

The IR.Move Class
public class Move extends UnaryOperator {
    public Move(SimpleValue dest, SimpleValue source);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Move class is the simplest of all UnaryOperator subclasses, because it does no modification to its operand. Instead it simply copies it to the destination operand. This corresponds to the IRWIN “x = y” operation.

VMx86 Code Generation
Code generation for the IR.Move class is a simple matter of copying the source into a register, and then copying the register into the destination.

VMx86.mov(EAX, Source)
VMx86.mov(Dest, EAX)
Figure 10 - IR.Move Code Generation

The IR.Size Class
public class Size extends UnaryOperator {
    public Size(SimpleValue dest, SimpleValue source);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Size class is the IR representation of the IRWIN “sizeof” operator. It takes a pointer argument and returns the size of the memory block that it points to. It dereferences the pointer and reads the header of the memory block, returning the block size.

VMx86 Code Generation
Code generation for the “sizeof” operator consists of checking to make sure that the argument really is a pointer, and then loading the header word of the memory block. Because integers are stored with the low bit clear, the header must be shifted to the left one bit to adjust it correctly. This also has the effect of shifting out the “is raw” bit from the header word. Once computed, this word is stored into the destination.

VMx86.mov(EAX, Pointer)
assert EAX is a pointer
VMx86.mov(EAX, [EAX-1])
VMx86.shl(EAX, 1)
VMx86.mov(Dest, EAX)
Figure 11 - IR.Size Code Generation
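
The tagging convention that the unary operators rely on (low-order bit set for pointers, clear for integers) can be modelled in a few lines of Java. This is a sketch only: the class TaggedWord and its methods are hypothetical names, and isType simply mirrors the and/xor/shl sequence of Figure 8 on a plain int.

```java
// Hypothetical model of the word format described for IR.IsType (not part of the IR package).
final class TaggedWord {
    // Integers are stored shifted left one bit, so their low-order bit is clear.
    static int tagInt(int value) { return value << 1; }

    // Recover the plain integer with an arithmetic shift, preserving the sign.
    static int untagInt(int word) { return word >> 1; }

    // Mirrors Figure 8: and(EAX, 1); xor(EAX, 1) only for "isint"; shl(EAX, 1).
    static int isType(int word, boolean wantInt) {
        int bit = word & 1;      // 1 for pointers, 0 for integers
        if (wantInt) {
            bit ^= 1;            // flip so that integers yield 1
        }
        return bit << 1;         // the answer is itself a tagged integer, 0 or 1
    }

    public static void main(String[] args) {
        int word = tagInt(21);
        System.out.println(untagInt(isType(word, true)));   // isint on an integer
        System.out.println(untagInt(isType(word, false)));  // isptr on an integer
    }
}
```

The final shift in isType is what makes the result usable by later IRWIN code: a raw 1 would have its low bit set and would therefore look like a pointer.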

Binary Operator Classes
Binary operators are very similar to unary operators, except they operate on two parameters. Because so much is shared, a common superclass is used for all aggregate functionality.

The IR.BinaryOperator Class
public abstract class BinaryOperator extends Instruction {
    protected SimpleValue Dest, Source1, Source2;
    public BinaryOperator(SimpleValue dest, SimpleValue s1, SimpleValue s2);
    public abstract String getInstAsString();
    // Many irrelevant methods removed
}

This class is very similar to the IR.UnaryOperator class; it simply provides access to a destination operand and two source operands. This class provides no code generation methods.

The IR.Add Class
public class Add extends BinaryOperator {
    public Add(SimpleValue Dest, SimpleValue Source1, SimpleValue Source2);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Add class is a very straightforward class that implements the IRWIN “+” operator.

VMx86 Code Generation
Code generation for the addition operator consists of loading both operands into registers, asserting that they are integers, adding them, and then storing the result into the destination.

VMx86.mov(EAX, Source1)
VMx86.mov(EDX, Source2)
assert EAX is an integer
assert EDX is an integer
VMx86.add(EAX, EDX)
VMx86.mov(Dest, EAX)
Figure 12 - IR.Add Code Generation

The IR.BitOperators Class
public class BitOperators extends BinaryOperator {
    public static final int AND = 0;
    public static final int OR = 1;
    public static final int XOR = 2;

    public BitOperators(SimpleValue Dest, SimpleValue Source1, SimpleValue Source2, int Type);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.BitOperators class implements the IRWIN “&”, “|”, and “^” operators. It maintains an internal type field, set by the constructor, to determine which operation a particular instance corresponds to. These operators all share a class because their operations are so similar.

VMx86 Code Generation
Code generation for these three operators consists of loading both operands into registers, asserting that they are integers, performing the operation, and then storing the result into the destination. Note that the operation instruction may be one of VMx86.and, VMx86.or, or VMx86.xor.

VMx86.mov(EAX, Source1)
VMx86.mov(EDX, Source2)
assert EAX is an integer
assert EDX is an integer
VMx86.<op>(EAX, EDX)
VMx86.mov(Dest, EAX)
Figure 13 - IR.BitOperators Code Generation

The IR.Compare Class

public class Compare extends BinaryOperator {
    public static final int LT = 0;  // Less than
    public static final int GT = 1;  // Greater than
    public static final int LTE = 2; // Less than or equal
    public static final int GTE = 3; // Greater than or equal
    public static final int NE = 4;  // Not equal to
    public static final int EQ = 5;  // Equal to

    public Compare(SimpleValue Dest, SimpleValue S1, SimpleValue S2, int Type);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Compare class corresponds to the six integer comparison operators available in IRWIN. Each instance of IR.Compare maintains an internal ‘Type’ field to distinguish which type of comparison the instance refers to. The constructor sets this type field.

VMx86 Code Generation
Code generation for the IR.Compare class is simplified by VMx86 support for the ‘set<cc>’ assembly instruction, which sets a byte register to one if the corresponding condition holds and to zero otherwise. The assertions that EAX and EDX are integers are emitted only when the Debug flag is set and the condition being generated requires them: the ‘EQ’ and ‘NE’ types of Compare instances may be allowed to compare pointers, so the assertions do not apply to them.

VMx86.mov(EAX, Source1)
VMx86.mov(EDX, Source2)
assert EAX is an integer
assert EDX is an integer
VMx86.mov(ECX, 0)
VMx86.cmp(EAX, EDX)
VMx86.set<cc>(CL)
VMx86.shl(ECX, 1)
VMx86.mov(Dest, ECX)
Figure 14 - IR.Compare Code Generation

The IR.Divide Class
public class Divide extends BinaryOperator {
    public static final int SDIV = 0; // Signed division
    public static final int UDIV = 1; // Unsigned division
    public static final int SREM = 2; // Signed remainder

    public Divide(SimpleValue Dest, SimpleValue Source1, SimpleValue Source2, int Type);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Divide class models the family of instructions that centers on the division operator.

VMx86 Code Generation
Code generation for the IR.Divide class consists mainly of code to handle the special form that integers have and must maintain. Because of this, a lot of shifting has to be done to get the operands into the shifted/unshifted forms of the instructions as needed.

Figures 15 through 17 all begin with the same sequence:

VMx86.mov(EAX, Source1)
VMx86.mov(EDX, Source2)
assert EAX is an integer
assert EDX is an integer
VMx86.mov(EDX, 0)
VMx86.mov(ECX, Source2)

The three cases then diverge. The diagrams also contain VMx86.sar(EAX, 1) and VMx86.sar(ECX, 1) nodes, which shift operands into unshifted form.

Figure 15 - IR.Divide Code Generation SDIV Case: VMx86.idiv(ECX), VMx86.shl(EAX, 1), VMx86.mov(Dest, ECX)
Figure 16 - IR.Divide Code Generation DIV Case: VMx86.div(ECX), VMx86.shl(EAX, 1), VMx86.mov(Dest, ECX)
Figure 17 - IR.Divide Code Generation SREM Case: VMx86.idiv(ECX), VMx86.shl(EDX, 1), VMx86.mov(Dest, EDX)

The IR.Multiply Class
public class Multiply extends BinaryOperator {
    public Multiply(SimpleValue dest, SimpleValue s1, SimpleValue s2);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Multiply class provides a simple encapsulation layer around the IRWIN “*” operator.

VMx86 Code Generation
Code generation for the IR.Multiply class is a simple wrapper around the x86 ‘mul’ instruction. The only notable feature is that one operand must be shifted right before multiplication, so that the product is stored as result*2 rather than result*4.

VMx86.mov(EAX, Source1)
VMx86.mov(EDX, Source2)
assert EAX is an integer
assert EDX is an integer
VMx86.shr(EAX, 1)
VMx86.mul(EDX)
VMx86.mov(Dest, EAX)
Figure 18 - IR.Multiply Code Generation

The IR.Shift Class

public class Shift extends BinaryOperator {
    public static final int LSHIFT = 0;     // Left Shift
    public static final int RSHIFT = 1;     // Right Shift
    public static final int RSIGNSHIFT = 2; // Right Shift, Sign Extended

    public Shift(SimpleValue dest, SimpleValue s1, SimpleValue s2, int type);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Shift class represents the IRWIN “<<”, “>>”, and “>>>” operators.

VMx86 Code Generation
The IR.Shift class generates code to make use of the 80x86 ‘shl’, ‘shr’, or ‘sar’ instructions. To do so, it makes use of the code template diagrammed in Figure 19. The “<operation>” node is one of the following:

LSHIFT:     VMx86.shl(EAX);
RSHIFT:     VMx86.shr(EAX); VMx86.and(EAX, ~1);
RSIGNSHIFT: VMx86.sar(EAX); VMx86.and(EAX, ~1);

The ‘VMx86.and’ instructions are used to clear out the low bit of the new word, to make sure that it is a valid integer, instead of a pointer.

VMx86.mov(EAX, Source1)
VMx86.mov(EDX, Source2)
assert EAX is an integer
assert EDX is an integer
VMx86.mov(ECX, Source2)
VMx86.shr(ECX, 1)
<operation>
VMx86.mov(Dest, EAX)
Figure 19 - IR.Shift Code Generation

The IR.Subtract Class
public class Subtract extends BinaryOperator {
    public Subtract(SimpleValue Dest, SimpleValue Source1, SimpleValue Source2);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.Subtract class is a simple wrapper for the IRWIN “-” operator.

VMx86 Code Generation
Code generation for the subtraction operator simply loads operands into registers, subtracts them and then stores the result into the destination.

VMx86.mov(EAX, Source1)
VMx86.mov(EDX, Source2)
assert EAX is an integer
assert EDX is an integer
VMx86.sub(EAX, EDX)
VMx86.mov(Dest, EAX)
Figure 20 - IR.Subtract Code Generation
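
The arithmetic behind the binary operator diagrams can be checked with a short Java model that treats every integer as stored shifted left one bit. This is a sketch under that assumption; the class TaggedArith and its method names are hypothetical, and the methods operate on plain int values rather than emitting VMx86 instructions.

```java
// Hypothetical model of binary operators on integers stored as value << 1.
final class TaggedArith {
    static int tag(int value) { return value << 1; }

    // Addition and subtraction work directly on tagged words: 2a + 2b = 2(a + b).
    static int add(int a, int b) { return a + b; }
    static int sub(int a, int b) { return a - b; }

    // One operand is shifted right first (shr), so the product is 2ab rather than 4ab.
    static int mul(int a, int b) { return (a >>> 1) * b; }

    // The shift count is untagged (shr ECX, 1) before the shift is applied.
    static int shl(int a, int n) { return a << (n >>> 1); }

    // Right shifts can move a set bit into bit 0, so the low bit is cleared afterwards.
    static int shr(int a, int n) { return (a >>> (n >>> 1)) & ~1; }
    static int sar(int a, int n) { return (a >> (n >>> 1)) & ~1; }

    public static void main(String[] args) {
        System.out.println(mul(tag(6), tag(7)) == tag(42));
        System.out.println(sar(tag(-5), tag(1)) == tag(-3));
    }
}
```

Both lines print true. The multiply case also holds for negative operands, because the bit lost by the logical right shift falls outside the low 32 bits of the product.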

Indirect Operator Classes
public abstract class IndirectOperator extends Instruction {
    protected SimpleValue ArrayBase, Index, Value;
    public IndirectOperator(SimpleValue val, SimpleValue base, SimpleValue Idx);
    public abstract String getInstAsString();
}

The IR.IndirectOperator class is subclassed by all memory referencing instructions. This class is further specialized by the IR.IndirectLoadOperator and IR.IndirectStoreOperator classes for loads and stores.

The IR.IndirectLoadOperator Class
public abstract class IndirectLoadOperator extends IndirectOperator {
    public IndirectLoadOperator(SimpleValue val, SimpleValue base, SimpleValue Idx);
}

The IR.IndirectLoadOperator class is used to factor out optimization code that is common to the various forms of reading from memory. Subclasses are expected to do memory loads.

The IR.IndirectLoad Class
public class IndirectLoad extends IndirectOperator {
    public IndirectLoad(SimpleValue Dest, SimpleValue Arr, SimpleValue Index);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.IndirectLoad class is used to load a word out of an object reference. This corresponds to the IRWIN “x = a[b]” syntax.

VMx86 Code Generation
The 80x86 architecture conveniently supports a form of the ‘mov’ instruction that does exactly what we need: index into an array of words. Code generation simply makes use of this instruction.

VMx86.mov(EAX, ArrayBase)
VMx86.mov(EDX, Index)
assert EAX is an object pointer
assert EDX is an index into EAX
VMx86.mov(EAX, EAX+3+EDX*2)
VMx86.mov(Dest, EAX)
Figure 21 - IR.IndirectLoad Code Generation
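
The EAX+3+EDX*2 operand in Figure 21 follows from the tagged word format. The Java sketch below works through the arithmetic under three assumptions that the source suggests but does not state outright: a pointer is the block's start address plus one (low bit set, with the header read at [EAX-1] as in Figure 11), the header occupies exactly one four-byte word, and the index is an integer stored shifted left one bit. The class and method names are hypothetical.

```java
// Hypothetical worked example of the effective address in Figure 21.
final class IndirectAddress {
    static final int WORD = 4; // bytes per word on the 80x86

    // Address as the figure computes it from the tagged registers.
    static int figureAddress(int taggedPtr, int taggedIndex) {
        return taggedPtr + 3 + taggedIndex * 2;
    }

    // Same address from untagged values: skip one header word (assumed),
    // then step over 'index' words.
    static int plainAddress(int blockStart, int index) {
        return blockStart + WORD + index * WORD;
    }

    public static void main(String[] args) {
        int blockStart = 0x1000;
        int index = 5;
        int taggedPtr = blockStart + 1; // low bit set marks a pointer
        int taggedIndex = index << 1;   // low bit clear marks an integer
        System.out.println(figureAddress(taggedPtr, taggedIndex) == plainAddress(blockStart, index));
    }
}
```

Under these assumptions the +3 is the header size minus the pointer tag, and the *2 scale turns an index that is already doubled into a byte offset of four per element.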

The IR.IndirectLoadByte Class

public class IndirectLoadByte extends IndirectOperator {
    public IndirectLoadByte(SimpleValue dest, SimpleValue Ar, SimpleValue index);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.IndirectLoadByte class is used to load a byte out of a raw object reference. This corresponds to the IRWIN “x = a{b}” syntax.

VMx86 Code Generation
Code generation for IR.IndirectLoadByte is very similar to code generation for IR.IndirectLoad. A few extra instructions are introduced to pad the byte with zeros in the upper portion of the word, and then shift the byte left one bit to make it look like an integer.

VMx86.mov(EAX, ArrayBase)
VMx86.mov(EDX, Index)
assert EAX is a raw pointer
assert EDX is an index into EAX
VMx86.mov(ECX, 0)
VMx86.mov(CL, EAX+3+EDX*2)
VMx86.shl(ECX, 1)
VMx86.mov(Dest, ECX)
Figure 22 - IR.IndirectLoadByte Code Generation

The IR.IndirectLoadStack Class

public class IndirectLoadStack extends IndirectLoadOperator {
    public IndirectLoadStack(Temporary dest, SimpleValue address);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.IndirectLoadStack class is used to load a word out of the stack. This corresponds to the IRWIN “x = **b**” syntax.

VMx86 Code Generation
Code generation for IR.IndirectLoadStack is significantly different from that of the other IndirectLoad classes. Instead of indirectly loading out of a heap object, it loads from the stack. The address argument is a pointer into the stack, but because it is not a pointer to a heap object, it appears to be an integer. The word at this address is then loaded.

VMx86.mov(EAX, ArrayBase)
assert EAX is an integer
VMx86.mov(EAX, [EAX])
VMx86.mov(Dest, EAX)
Figure 23 - IR.IndirectLoadStack Code Generation

The IR.IndirectStoreOperator Class
public abstract class IndirectStoreOperator extends IndirectOperator {
    public IndirectStoreOperator(SimpleValue val, SimpleValue base, SimpleValue Idx);
}

The IR.IndirectStoreOperator class is used to factor out optimization code that is common to the various forms of writing to memory. Subclasses are expected to do memory stores.

The IR.IndirectStore Class

public class IndirectStore extends IndirectOperator {
    public IndirectStore(SimpleValue Dst, SimpleValue Arr, SimpleValue Index);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.IndirectStore class is used to store a word into an object reference. This corresponds to the IRWIN “a[b] = x” syntax.

VMx86 Code Generation
The 80x86 architecture conveniently supports a form of the ‘mov’ instruction that does exactly what we need: index into an array of words. Code generation simply makes use of this instruction.

VMx86.mov(EAX, ArrayBase)
VMx86.mov(EDX, Index)
assert EAX is an object pointer
assert EDX is an index into EAX
VMx86.mov(ECX, Value)
VMx86.mov(EAX+3+EDX*2, ECX)
Figure 24 - IR.IndirectStore Code Generation

The IR.IndirectStoreByte Class
public class IndirectStoreByte extends IndirectOperator {
    public IndirectStoreByte(SimpleValue Dst, SimpleValue Ar, SimpleValue index);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.IndirectStoreByte class is used to store a byte into a raw object reference. This corresponds to the IRWIN “a{b} = x” syntax.

VMx86 Code Generation
Code generation for IR.IndirectStoreByte is very similar to code generation for IR.IndirectStore. A few extra instructions are introduced to shift the byte right one bit to adjust it out of the integer form into the raw form.

VMx86.mov(EAX, ArrayBase)
VMx86.mov(EDX, Index)
assert EAX is a raw pointer
assert EDX is an index into EAX
VMx86.mov(ECX, 0)
VMx86.mov(ECX, Value)
VMx86.shr(ECX, 1)
VMx86.mov(EAX+3+EDX*2, CL)
Figure 25 - IR.IndirectStoreByte Code Generation
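
The conversions performed by the byte load and store sequences (Figures 22 and 25) can be modelled as a pair of Java helpers. This is a sketch only; the class TaggedByte and its method names are hypothetical, and the helpers operate on plain values rather than registers.

```java
// Hypothetical model of the byte conversions in Figures 22 and 25.
final class TaggedByte {
    // Load path: zero-extend the raw byte into a word, then shift left one bit
    // so that the result has its low bit clear and reads as an integer.
    static int fromRaw(byte raw) {
        return (raw & 0xFF) << 1;
    }

    // Store path: shift the integer right one bit, then keep only the low byte (CL).
    static byte toRaw(int tagged) {
        return (byte) (tagged >>> 1);
    }

    public static void main(String[] args) {
        byte raw = (byte) 200;
        int tagged = fromRaw(raw);
        System.out.println(tagged);                // the stored form of 200
        System.out.println(toRaw(tagged) == raw);  // the round trip preserves the byte
    }
}
```

The mask in fromRaw plays the role of the VMx86.mov(ECX, 0) node: without it, Java would sign-extend bytes above 127, just as loading only CL would leave stale upper bits in ECX.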


The IR.IndirectStoreStack Class

public class IndirectStoreStack extends IndirectLoadOperator {
    public IndirectStoreStack(Temporary dest, SimpleValue address);
    public String getInstAsString();

    // VMx86 Specific Code Below...
    public void codeGenVMx86(VMx86.Program P, boolean Debug);
}

The IR.IndirectStoreStack class is used to store a word into the stack. This corresponds to the IRWIN “**b** = x” syntax.

VMx86 Code Generation
Code generation for IR.IndirectStoreStack is significantly different from that of the other IndirectStore classes. Instead of indirectly storing into a heap object, it stores into the stack. The address argument is a pointer into the stack, but because it is not a pointer to a heap object, it appears to be an integer. The value is then stored at this address.

VMx86.mov(EAX, ArrayBase)
assert EAX is an integer
VMx86.mov(EDX, Value)
VMx86.mov([EAX], EDX)
Figure 26 - IR.IndirectStoreStack Code Generation

Summary
Code generation for the 80x86 platform is a combination of many small and simple steps. Each class of the IR package is designed to closely match a statement in the IRWIN language, which is low level enough to allow direct code generation into 80x86 assembly code.
