Precise and Efficient Garbage Collection in VMKit with MMTk


       LLVM Developer’s Meeting
           Nicolas Geoffray

        nicolas.geoffray@lip6.fr
                   Background

•   VMKit: Java and .Net on top of LLVM
    •   Uses LLVM’s JIT for executing code

    •   Uses Boehm for GC

•   Performance bottlenecks
    •   No dynamic optimization

    •   Conservative GC




                 CPU-intensive Benchmarks (JGF)

[Figure: bar chart of execution time normalized to VMKit (y-axis from 0 to 5.00) for the JGF benchmarks series, lufact, heapsort, crypt, fft, sor and matmult, comparing VMKit, Jikes and Sun. Platform: MacOSX 10.5, 8 x X86_32, 2.66GHz, 12 GB.]
                 VM-intensive Benchmarks (Dacapo)

[Figure: bar chart of execution time normalized to VMKit (y-axis from 0 to 1.50) for the Dacapo benchmarks antlr, fop, luindex, bloat, jython and pmd, comparing VMKit, Jikes and Sun. Platform: MacOSX 10.5, 8 x X86_32, 2.66GHz, 12 GB.]
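                 Execution Overheads

[Figure: one pie chart per Dacapo benchmark, breaking execution time into Application, Allocations, Collections, System.arraycopy, Interface calls and Others. Slice values per benchmark, in decreasing order:]

antlr:    65%, 11%, 10%, 7%, 7%
fop:      75%, 9%, 8%, 6%, 2%
luindex:  73%, 12%, 8%, 5%, 2%
bloat:    53%, 18%, 17%, 7%, 5%
jython:   49%, 24%, 12%, 11%, 4%
pmd:      38%, 20%, 20%, 17%, 4%, 1%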
                 Execution Overheads

[Figure: the same pie charts (Application, Allocations, Collections, System.arraycopy, Interface calls, Others), with one additional highlighted share per benchmark that equals the sum of two of its slices: antlr 21%, fop 14%, luindex 20%, bloat 35%, jython 23%, pmd 37%.]
          Goal: replace Boehm with MMTk

•   MMTk is JikesRVM’s GC
    •   Framework for writing GCs

    •   Multiple GC implementations (Copying, Mark and sweep, Immix)

•   Copying collectors require precise stack scanning

    •   Locate pointers on the stack
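
A minimal sketch of why a copying collector needs precise roots: every root slot must be rewritten to point at the moved copy, so the collector has to know exactly which slots hold pointers. The `Obj` and `CopyingSketch` names are hypothetical, objects carry no fields here, and this is not MMTk's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical object model: a payload plus a forwarding pointer, no fields.
final class Obj {
    final String payload;
    Obj forwarded; // set once the object has been copied
    Obj(String payload) { this.payload = payload; }
}

final class CopyingSketch {
    // Copies every object referenced by a root into toSpace and rewrites
    // the root slot so that it points at the copy.
    static List<Obj> collect(Obj[] roots) {
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) {
            if (roots[i] != null) {
                roots[i] = forward(roots[i], toSpace);
            }
        }
        return toSpace;
    }

    // Copies an object at most once; later references reuse the same copy.
    private static Obj forward(Obj o, List<Obj> toSpace) {
        if (o.forwarded == null) {
            o.forwarded = new Obj(o.payload);
            toSpace.add(o.forwarded);
        }
        return o.forwarded;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj[] roots = { a, a, null };
        List<Obj> toSpace = collect(roots);
        boolean updated = roots[0] != a && roots[0] == roots[1];
        System.out.println(toSpace.size() + " copied, roots updated: " + updated);
    }
}
```

If a root slot were only a guess (as with a conservative collector), rewriting it could corrupt an integer that merely looks like a pointer, which is why the object cannot be moved in that case.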




              But... it’s in Java?

•   Yes, but nothing to be afraid of:
    •   Use of Magic tricks

    •   No use of runtime features (exceptions, inheritance)

    •   No use of standard library

•   Use VMKit’s AOT compiler

    •   Transform MMTk into a .bc file



                    Outline

•   Introduction

•   Precise garbage collection

•   Compiling MMTk with VMJC

•   Putting it all together

•   What’s left




Precise Garbage Collection

•   Write code that locates pointers in the stack
    •   llvm.gcroot in JIT-generated code

    •   llvm.gcroot in VMKit’s runtime written in C++

•   Use LLVM’s GC framework to generate stack
    maps
    •   Caml stack maps for llvm-g++ generated code

    •   JIT stack maps for JIT-generated code




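Precise Garbage Collection

[Diagram: llvm-g++ compiles VMKit.cpp into VMKit.exe, which carries a Caml stack map. VMKit.exe turns App.java into the JITted App, which carries a JIT stack map. Both stack maps are used for precise stack scanning.]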
                 Stack Scanning

•   Problem: interweaving of different kinds of functions
    •   Application’s managed (Java or C#) functions: trusted

    •   VMKit’s C++ functions: trusted

    •   Application’s JNI functions: untrusted

•   Solution: create a side-stack for frame addresses
    •   Updated on each transition into a different kind of method

    •   VMKit knows the kind of each frame on the thread stack
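
The side-stack idea can be sketched as a small structure that records, at every transition, the kind of code being entered and a frame address. All names below (`FrameKind`, `Transition`, `SideStack`) are hypothetical and chosen for illustration; VMKit's actual data structures are not shown in the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical kinds of code a thread can be running.
enum FrameKind { JAVA, NATIVE, JNI }

// One side-stack entry: what was entered, and the frame address at that point.
final class Transition {
    final FrameKind kind;
    final long frameAddress;
    Transition(FrameKind kind, long frameAddress) {
        this.kind = kind;
        this.frameAddress = frameAddress;
    }
}

final class SideStack {
    private final Deque<Transition> entries = new ArrayDeque<>();

    // Called when control enters a different kind of method.
    void enter(FrameKind kind, long frameAddress) {
        entries.push(new Transition(kind, frameAddress));
    }

    // Called when that method returns to its caller.
    void leave() {
        entries.pop();
    }

    // Assumption: a thread starts in the VM's own native (C++) code.
    FrameKind currentKind() {
        return entries.isEmpty() ? FrameKind.NATIVE : entries.peek().kind;
    }

    int depth() {
        return entries.size();
    }

    public static void main(String[] args) {
        SideStack s = new SideStack();
        s.enter(FrameKind.JAVA, 0x7000L);   // VM calls the application's main
        s.enter(FrameKind.NATIVE, 0x6f00L); // application calls into the runtime
        System.out.println(s.currentKind() + " at depth " + s.depth());
        s.leave();
        System.out.println(s.currentKind() + " at depth " + s.depth());
    }
}
```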



             Types of methods

•   Trusted
    •   Has a stack map, so can manipulate objects (llvm.gcroot)

    •   Saves frame pointer (llvm::NoFramePointerElim)

•   Untrusted
    •   Has no stack map, so should not manipulate objects

    •   May not save the frame pointer




Stack Scanning Example (1)

Frames are listed from oldest (top) to most recent (bottom); each transition pushes an entry on the side-stack:

                     VMKit.main (C++)
push(Enter Java)
                     App.main (Java)
                     App.function (Java)
push(Enter native)
                     VMKit.runtime (C++)
push(Enter Java)
                     App.function2 (Java)

All frames are trusted, so the scan follows the frame pointers (fp) from App.function2 up through VMKit.runtime, App.function and App.main to VMKit.main.

Stack Scanning Example (2)

Frames are listed from oldest (top) to most recent (bottom); each transition pushes an entry on the side-stack:

                     VMKit.main (C++)
push(Enter Java)
                     App.main (Java)
push(Enter JNI)
                     App.function (JNI)
                     App.function2 (JNI)
push(Enter native)
                     VMKit.jniRuntime (Java)

The JNI frames are untrusted and are not scanned. The scan starts at VMKit.jniRuntime, stops at the fp saved at push(Enter native), skips App.function2 and App.function, resumes at App.main, and follows the frame pointer (fp) up to VMKit.main.

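
The walk in the second example can be sketched as follows. This is a simplified model under stated assumptions, not VMKit's code: frames live in a map keyed by a made-up address, each side-stack entry records a boundary frame address, and a trusted frame entered from untrusted code is assumed to store that boundary as its saved fp. `Frame`, `Entry` and `StackWalker` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A trusted frame: its name and the caller's frame address it saved.
final class Frame {
    final String name;
    final long savedFp;
    Frame(String name, long savedFp) { this.name = name; this.savedFp = savedFp; }
}

// A side-stack entry: is the entered code trusted, and where does it begin.
final class Entry {
    final boolean trusted;
    final long boundaryFp; // frame address of the caller at the transition
    Entry(boolean trusted, long boundaryFp) { this.trusted = trusted; this.boundaryFp = boundaryFp; }
}

final class StackWalker {
    // Returns the names of the frames that get scanned, most recent first.
    static List<String> scan(Map<Long, Frame> frames, Deque<Entry> sideStack, long currentFp) {
        List<String> scanned = new ArrayList<>();
        long fp = currentFp;
        for (Entry e : sideStack) { // iterates from the most recent push
            if (e.trusted) {
                while (fp != e.boundaryFp) { // follow saved frame pointers
                    Frame f = frames.get(fp);
                    scanned.add(f.name);
                    fp = f.savedFp;
                }
            }
            fp = e.boundaryFp; // untrusted region: jump over it
        }
        while (fp != 0) { // oldest region: the VM's own frames
            Frame f = frames.get(fp);
            scanned.add(f.name);
            fp = f.savedFp;
        }
        return scanned;
    }

    // Builds the stack of example (2) with made-up addresses and scans it.
    static List<String> exampleTwo() {
        Map<Long, Frame> frames = new HashMap<>();
        frames.put(500L, new Frame("VMKit.main", 0L));
        frames.put(400L, new Frame("App.main", 500L));
        // App.function (300) and App.function2 (200) are JNI: no usable frames.
        frames.put(100L, new Frame("VMKit.jniRuntime", 200L));
        Deque<Entry> sideStack = new ArrayDeque<>();
        sideStack.push(new Entry(true, 500L));  // Enter Java
        sideStack.push(new Entry(false, 400L)); // Enter JNI
        sideStack.push(new Entry(true, 200L));  // Enter native
        return scan(frames, sideStack, 100L);
    }

    public static void main(String[] args) {
        System.out.println(exampleTwo());
    }
}
```

The untrusted frames never appear in the map, which mirrors the point of the side-stack: the walker never has to dereference a frame pointer that untrusted code may not have saved.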
       Running the GC


 A precise GC scans the stacks at safe points: points during execution
 where the GC knows the type of each value on the stack




Single-threaded Application


•   GC always triggered at safe points
     •   gcmalloc instructions

     •   Collector::collect()




Multi-threaded Application

•   When entering a GC, the collector must wait for all threads to
    join the collection at a safe point
    •   Don’t use signals: an interrupted thread may not be at a safe point

    •   Use a thread-local variable to poll on method entry and
        backward branches

    •   Scan stacks of threads blocked in JNI or system calls




Application changes for GC


 public static void runLoop(int a) {
      while (a-- > 0) System.out.println("Hello World");
 }




Application changes for GC
 public static void runLoop(int a) {
      if (getThreadID().doGC) GC();       // poll on method entry
      while (a-- > 0) {
            System.out.println("Hello World");
            if (getThreadID().doGC) GC(); // poll on the backward branch
       }
 }
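
A runnable sketch of this polling scheme, assuming a single shared flag instead of the per-thread `doGC` flag of the slides, and a monitor for parking threads. `SafepointSketch`, `poll`, `collect` and `runDemo` are hypothetical names; the actual VMKit mechanism is not shown in the source.

```java
final class SafepointSketch {
    private static final Object lock = new Object();
    private static volatile boolean doGC = false; // simplification: one shared flag
    private static volatile boolean stop = false;
    private static int parked = 0;                // guarded by lock

    // Called by application threads on method entry and backward branches.
    static void poll() {
        if (!doGC) return; // fast path: no collection requested
        synchronized (lock) {
            parked++;
            lock.notifyAll(); // tell the collector one more thread has stopped
            while (doGC) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            }
            parked--;
        }
    }

    // Requests a collection and waits until every mutator has parked.
    static int collect(int mutators) {
        synchronized (lock) {
            doGC = true;
            while (parked < mutators) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            }
            int stopped = parked; // all stacks could be scanned precisely here
            doGC = false;
            lock.notifyAll();     // resume the mutators
            return stopped;
        }
    }

    // Starts the mutators, runs one collection, and reports how many parked.
    static int runDemo(int mutators) {
        stop = false;
        Thread[] threads = new Thread[mutators];
        for (int i = 0; i < mutators; i++) {
            threads[i] = new Thread(() -> { while (!stop) poll(); });
            threads[i].start();
        }
        int stopped = collect(mutators);
        stop = true;
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return stopped;
    }

    public static void main(String[] args) {
        System.out.println("threads stopped at a safe point: " + runDemo(3));
    }
}
```

Because each thread stops itself at a poll, it is always at a point where its stack map is valid, which is the property that stopping threads with signals would not give.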


                What is VMJC?

•   An Ahead of Time compiler (AOT)
    •   Generates .bc files from .class files

•   Use of LLVM tools to generate platform-dependent
    files
    •   shared library: llc -relocation-model=pic + gcc

    •   executable: llc + ld vmkit + gcc




          Goal: compile MMTk with VMJC

•   Generate a .bc file that can be linked with VMKit
    •   Interface MMTk → VMKit (e.g. thread synchronization, stack
        scanning)

    •   Interface VMKit → MMTk (e.g. gcmalloc)




Why doesn’t MMTk need
    a Java runtime?

•   No use of runtime features
    •   synchronizations, exceptions, inheritance

•   No use of standard library
    •   HashMap, LinkedList, ArrayList




How does MMTk manipulate
      pointers?

•   Definition of Magic classes and methods
    •   Address, Word, Offset

    •   Word Address.loadWord(Offset)

•   Magic classes and methods translated by the
    compiler [VEE’09]
    •   A mechanism similar to inline ASM in C
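
To make the idea concrete, here is a sketch of what Magic types look like from the Java side. These are simplified stand-ins that share the names `Address`, `Word`, `Offset` and `loadWord` with the slide; the backing array, `fromLong`, `fromBytes` and `storeWord` are assumptions for illustration. In MMTk the compiler translates such calls directly into machine operations rather than executing Java bodies like these.

```java
// Stand-in for a machine word.
final class Word {
    final long value;
    Word(long value) { this.value = value; }
}

// Stand-in for a byte offset from an address.
final class Offset {
    final long bytes;
    private Offset(long bytes) { this.bytes = bytes; }
    static Offset fromBytes(long bytes) { return new Offset(bytes); }
}

// Stand-in for a raw address, backed by a simulated memory of 8-byte words.
final class Address {
    private static final long[] MEMORY = new long[16];
    private final long addr;

    private Address(long addr) { this.addr = addr; }
    static Address fromLong(long addr) { return new Address(addr); }

    // Index of the simulated word that holds the given address plus offset.
    private int slot(Offset offset) {
        return (int) ((addr + offset.bytes) / 8);
    }

    // A Magic-aware compiler would emit a single load instruction here.
    Word loadWord(Offset offset) {
        return new Word(MEMORY[slot(offset)]);
    }

    // A Magic-aware compiler would emit a single store instruction here.
    void storeWord(Word value, Offset offset) {
        MEMORY[slot(offset)] = value.value;
    }

    public static void main(String[] args) {
        Address base = Address.fromLong(16);
        base.storeWord(new Word(42), Offset.fromBytes(8));
        System.out.println(base.loadWord(Offset.fromBytes(8)).value);
    }
}
```

The collector code stays ordinary, type-checked Java, while the compiler decides what each call means on the target machine.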




   Example (Frampton [VEE’09])

Inline ASM in C:

    void prefetchObjects(OOP *buffer, int size) {
      for (int i = 0; i < size; i++) {
        OOP o = buffer[i];
        asm volatile("prefetchnta (%0)" :: "r" (o));
      }
    }

Magic in Java:

    @NoBoundsCheck
    void prefetchObjects(ObjectReference[] buffer) {
      for (int i = 0; i < buffer.length; i++) {
        ObjectReference current = buffer[i];
        current.prefetch();
      }
    }

        Option 1: Object File


•   Create a .o file of MMTk
    •   gcc mmtk.o vmkit.o -o vmkit

•   But...
    •   No inlining in application code




Option 2: LLVM Bitcode File

 •   Create a .bc file of MMTk
     •   vmkit (-load mmtk.bc) -java HelloWorld

 •   Late binding of allocations in VMKit code
     •   gcmalloc calls in C++ are linked at runtime

 •   Inlining in Java code
     •   new in application code is inlined with MMTk’s malloc
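
Inlining matters because an allocation fast path is typically only a few instructions. The sketch below assumes a bump-pointer allocator, a common design for copying collectors; the slides do not say which allocation path VMKit inlines, and `BumpAllocator` is a hypothetical name.

```java
final class BumpAllocator {
    private long cursor;      // next free address
    private final long limit; // end of the allocation region

    BumpAllocator(long start, long size) {
        this.cursor = start;
        this.limit = start + size;
    }

    // Fast path: round the size up to 8 bytes and bump the cursor.
    // Returns -1 when the region is full, where a real allocator would take
    // a slow path (for example, triggering a collection).
    long alloc(int bytes) {
        long aligned = (bytes + 7) & ~7L;
        long result = cursor;
        if (result + aligned > limit) {
            return -1;
        }
        cursor = result + aligned;
        return result;
    }

    public static void main(String[] args) {
        BumpAllocator allocator = new BumpAllocator(1000, 32);
        System.out.println(allocator.alloc(12)); // first object
        System.out.println(allocator.alloc(8));  // placed after 16 aligned bytes
        System.out.println(allocator.alloc(16)); // does not fit any more
    }
}
```

When this body is inlined at each `new`, the common case costs an add, a compare and a store, with no call into the runtime.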




Option 3: Everything is Bitcode


  •   Create a .bc file of MMTk

  •   Create a .bc file of VMKit

  •   Link, optimize and run




                    What’s left

•   Implementing the MMTk → VMKit interface
    •   Interactions between the GC and the VM

•   Finish implementation with read/write barriers
    •   In VMKit code, in managed code

•   Run benchmarks!
    •   Benchmark with different GCs from MMTk
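
As an illustration of what a write barrier does, here is a sketch assuming a generational collector that keeps a remembered set of old objects pointing to young ones. Which barriers each MMTk collector needs is not stated in the slides; `Node` and `WriteBarrierSketch` are hypothetical names.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical heap object with a generation flag and one reference field.
final class Node {
    final boolean old;
    Node field;
    Node(boolean old) { this.old = old; }
}

final class WriteBarrierSketch {
    // Old objects that may point into the young generation.
    static final Set<Node> rememberedSet =
            Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());

    // Every reference store goes through the barrier instead of a raw write.
    static void writeField(Node holder, Node value) {
        if (holder.old && value != null && !value.old) {
            rememberedSet.add(holder); // record the old-to-young pointer
        }
        holder.field = value;
    }

    public static void main(String[] args) {
        Node oldNode = new Node(true);
        Node youngNode = new Node(false);
        writeField(oldNode, youngNode);
        writeField(youngNode, oldNode);
        System.out.println("remembered objects: " + rememberedSet.size());
    }
}
```

Such a barrier has to be emitted both by the JIT for managed code and by hand (or by the compiler) in the VM's own C++ code, which is why the slide lists both.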




http://vmkit.llvm.org


