      Performance Issues With Compiling Java™
                Weiming Gu
                Bill Alexander
                      IBM


CS 375
26 April 2002



Java is a trademark of Sun Microsystems
Outline

  Definitions: Static vs. dynamic compilation

  Why compiling high-performance code for a dynamic
  language like Java is hard for any compiler

  Performance issues with dynamic (just-in-time)
  compilers
Static Compilation: Compile all methods once, before
execution. The resulting binaries can be executed many
times.

Advantage: Because compilation is not in the execution
path, you can spend as much time as you choose doing
optimization.

Disadvantage: It is difficult to implement correctly the semantics
of Java's dynamic features, such as dynamic class loading,
static initializers, RMI, and introspection.
Dynamic Compilation: Also called just-in-time or JIT.
Begin interpreting the program, and compile a method only
after it is invoked (perhaps after many invocations, or never).
Binaries are discarded when the JVM terminates.

Advantage: Can use runtime information to produce better
code.

Disadvantage: Some optimizations are just too expensive
to do at runtime.
Dynamic class loading makes some typical compiler
optimizations hard:
  Inlining virtual methods
  Interprocedural analysis

Java's precise semantics make some optimizations hard:
  Exception semantics

Let's look at two examples ...
Example 1: Inlining virtual methods
class Bill { ... method foo() ... } // foo not final
class Weiming extends Bill { ... method foo() ... }

.
.
    Class wCl = Class.forName("Weiming");
    Bill w = (Bill) wCl.newInstance();
    bar(w);
.
.
bar(Bill w){
   .
   .
   w.foo(); // Which foo to call?
   .
   .
}
The compiler knows how to generate code that will invoke the correct foo,
but suppose we want to inline the code for foo into bar when we
compile bar. Which foo should be inlined?
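
A runnable sketch of the example above. The wrapper class DynamicLoadSketch, the helper load, and the string return values are assumptions made for illustration, since the slide elides the method bodies. getDeclaredConstructor().newInstance() stands in for the slide's Class.newInstance(), which is deprecated in later Java releases.

```java
class DynamicLoadSketch {
    static class Bill {
        String foo() { return "Bill.foo"; }          // foo is not final
    }

    static class Weiming extends Bill {
        @Override
        String foo() { return "Weiming.foo"; }
    }

    // When bar is compiled, the compiler cannot tell which foo w will bind to.
    static String bar(Bill w) {
        return w.foo();
    }

    // Load a subclass of Bill by name at run time, as in the slide.
    static Bill load(String className) {
        try {
            Class<?> cl = Class.forName(className);
            return (Bill) cl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not load " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bar(new Bill()));                     // prints Bill.foo
        System.out.println(bar(load(Weiming.class.getName())));  // prints Weiming.foo
    }
}
```

The class name reaches load only as a string, so nothing in bar's source reveals that Weiming exists.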
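Virtual Method Resolution
[Figure: an instance of Bill refers to Bill's method table (entries foo and bar), whose entries refer to method blocks and their compiled code. An instance of Weiming refers to Weiming's method table, whose foo entry refers to its own method block and compiled code.]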
Inlining foo (cont.)
Least aggressive:
  if (w is really of type Bill) {
      /* inlined code for Bill.foo */
   } else {
      w.foo() // normal virtual method invocation
   }

Slightly more aggressive: Move the virtual invocation to the end of
bar and generate a few nops instead of the if, letting control flow
into the inlined code. If Bill.foo is ever overridden at runtime,
patch the code for bar to branch to the virtual invocation. This saves
the runtime type check and may have slightly better i-cache behavior.

Most aggressive: "Devirtualization." Inline the code for Bill.foo
unconditionally. If class Bill is ever extended at runtime, recompile bar.
This gives us the most freedom to do code optimization by removing
the basic block boundaries.
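
The least aggressive scheme can be written out at the source level. This is only a sketch: a real JIT emits the guard in machine code, and the class GuardedInlineSketch, the integer return values, and the "+ 10" in bar are assumptions made for illustration.

```java
class GuardedInlineSketch {
    static class Bill {
        int foo() { return 1; }
    }

    static class Weiming extends Bill {
        @Override
        int foo() { return 2; }
    }

    // bar as the programmer wrote it.
    static int bar(Bill w) {
        return w.foo() + 10;
    }

    // bar as a compiler might emit it with a type guard around the inlined body.
    static int barGuarded(Bill w) {
        int r;
        if (w.getClass() == Bill.class) {
            r = 1;          // inlined code for Bill.foo
        } else {
            r = w.foo();    // normal virtual method invocation
        }
        return r + 10;
    }

    public static void main(String[] args) {
        System.out.println(barGuarded(new Bill()));     // prints 11
        System.out.println(barGuarded(new Weiming()));  // prints 12
    }
}
```

The guard compares the exact class rather than using instanceof, because a Weiming is also an instance of Bill and must still take the virtual path.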
Example 2: Array out of bounds exception

Checking array indexes at every array reference is expensive. Often
one check outside a loop is sufficient. Given this Java code:

  for (i=0; i<n; i++) {
     x = a[i];
     ...
  }


we can generate code equivalent to this:

  if( n > a.length) { throw exception }
  else {
      for (i=0; i<n; i++) {
         x = a[i]; // no bounds check inside loop
         ...
      }
  }
Array out of bounds exception (cont.)

But watch out for Java semantics!
Given this Java code:

  for (i=0; i<n; i++) {
     a[i] = <expression>;
  }


it is WRONG to generate code equivalent to this:

  if( n > a.length) { throw exception }
  else {
      for (i=0; i<n; i++) {
         a[i] = <expression>; // no bounds check inside loop
      }
  }


because Java semantics require that a[0] ... a[a.length-1] receive
<expression> before the exception is thrown.
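
The difference is observable from Java code that catches the exception. The sketch below contrasts the original loop with the wrong transformation. The class and method names, the array size, and the stored value are assumptions made for illustration.

```java
import java.util.Arrays;

class HoistedCheckSketch {
    // The original loop: the JVM checks every index.
    static void original(int[] a, int n, int value) {
        for (int i = 0; i < n; i++) {
            a[i] = value;
        }
    }

    // The WRONG transformation: one check up front, so no store happens on failure.
    static void wrongHoist(int[] a, int n, int value) {
        if (n > a.length) {
            throw new ArrayIndexOutOfBoundsException(a.length);
        }
        for (int i = 0; i < n; i++) {
            a[i] = value;
        }
    }

    // Run one variant with n larger than the array and return what the caller sees.
    static int[] runAndCapture(boolean hoisted) {
        int[] a = new int[3];
        try {
            if (hoisted) {
                wrongHoist(a, 5, 7);
            } else {
                original(a, 5, 7);
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // The array is still reachable here, so earlier stores are visible.
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runAndCapture(false))); // prints [7, 7, 7]
        System.out.println(Arrays.toString(runAndCapture(true)));  // prints [0, 0, 0]
    }
}
```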
Array bounds checking (cont.)

The compiler must instead generate code equivalent to this:

  if(n <= a.length) {
      // fast loop
      for(i=0; i<n; i++){
          a[i] = <expression> // no bounds check inside loop
      }
  }
  else {
      // slow loop
      for(i=0; i<n; i++) {
          // bounds check before each statement
          a[i] = <expression>
      }
  }
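
A source-level sketch of the two-loop version. Java source cannot switch bounds checks off, so the comments mark where compiled code could omit them, and the slow loop spells its check out explicitly. The class LoopVersioningSketch and the method fillVersioned are assumptions made for illustration.

```java
import java.util.Arrays;

class LoopVersioningSketch {
    static void fillVersioned(int[] a, int n, int value) {
        if (n <= a.length) {
            // Fast loop: every index is known to be in range,
            // so compiled code could omit the per-iteration check.
            for (int i = 0; i < n; i++) {
                a[i] = value;
            }
        } else {
            // Slow loop: bounds check before each store, so the stores
            // that precede the failing index still take effect.
            for (int i = 0; i < n; i++) {
                if (i >= a.length) {
                    throw new ArrayIndexOutOfBoundsException(i);
                }
                a[i] = value;
            }
        }
    }

    public static void main(String[] args) {
        int[] ok = new int[3];
        fillVersioned(ok, 2, 7);
        System.out.println(Arrays.toString(ok));      // prints [7, 7, 0]

        int[] tooShort = new int[3];
        try {
            fillVersioned(tooShort, 5, 7);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(Arrays.toString(tooShort)); // prints [7, 7, 7]
        }
    }
}
```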
       Performance Issues With JIT Compilers
                              ( the bad news )

Because compilation occurs during run time, it is very important to
minimize the time spent compiling. Techniques include:

  Don't compile all methods. Interpret methods the first few times they
  are invoked, until you are sure they are "hot" (a "mixed-mode
  interpreter"). But watch out for the long-running loop problem: a
  method invoked only once can still spend a long time in a loop. The
  time saved by executing compiled code should exceed the cost of
  compiling it.

  Settle for less than full optimization. Use heuristics for certain problems
  such as data dependency analysis and register allocation, because
  finding the optimal solution takes too long.

  Use different levels of optimization. Compile with little optimization on
  first invocation, and recompile later with more optimization only for
  "hot" methods.

  Use different "server" and "client" options.
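
The first technique can be sketched as an invocation counter plus a break-even test. Everything here is hypothetical: the class MixedModeSketch, the threshold of 3, and the cost figures are chosen only for illustration and do not describe any particular JVM.

```java
class MixedModeSketch {
    static final int COMPILE_THRESHOLD = 3;   // hypothetical hotness threshold

    private int invocations = 0;
    private boolean compiled = false;

    // Interpret the method until it has been invoked COMPILE_THRESHOLD times,
    // then "compile" it so that later invocations run compiled code.
    String invoke() {
        if (!compiled && invocations >= COMPILE_THRESHOLD) {
            compiled = true;
        }
        invocations++;
        return compiled ? "compiled" : "interpreted";
    }

    // Compiling pays off only if the time saved over the remaining calls
    // exceeds the one-time cost of compiling. All costs share one time unit.
    static boolean worthCompiling(long compileCost, long interpretedCostPerCall,
                                  long compiledCostPerCall, long expectedCalls) {
        long saved = (interpretedCostPerCall - compiledCostPerCall) * expectedCalls;
        return saved > compileCost;
    }

    public static void main(String[] args) {
        MixedModeSketch method = new MixedModeSketch();
        for (int i = 1; i <= 5; i++) {
            System.out.println("call " + i + ": " + method.invoke());
        }
        System.out.println(worthCompiling(1000, 10, 2, 200)); // prints true
        System.out.println(worthCompiling(1000, 10, 2, 100)); // prints false
    }
}
```

A counter that only counts invocations misses the long-running loop problem, which is why a real system would also need some way to count loop iterations.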
    Performance Issues With JIT Compilers
                           ( the good news )

Because compilation occurs at run time, there are opportunities to
improve performance:

  Take advantage of run time information about program behavior
  concerning which call sites are hot, which loops are long-running,
  which way branches are taken, etc. This can come from
    - a mixed-mode interpreter
    - run time program counter sampling

  Generate machine-specific code. Examples include:
    - Different generations of x86 processors have different instruction
    sets, different numbers of functional units, different latencies, etc.
    It pays to generate different code sequences and different
    instruction schedules for each IA32 processor family.
    - If you are running on a uniprocessor, you can sometimes avoid
    generating certain expensive HW synchronization instructions that
    would be needed on an SMP.
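
The uniprocessor case can be sketched as a code-generation decision taken once the processor count is known. The enum and method names are hypothetical. Runtime.availableProcessors() is the standard Java call for reading the count, although a JIT inside the JVM would query the operating system directly.

```java
class MachineSpecificSketch {
    // Two hypothetical code sequences for an atomic increment.
    enum Increment {
        PLAIN,    // ordinary read-modify-write, sufficient on a uniprocessor
        LOCKED    // hardware-synchronized form, needed on an SMP
    }

    // Pick the cheaper sequence when only one processor can run the code.
    static Increment chooseIncrement(int processors) {
        return processors <= 1 ? Increment.PLAIN : Increment.LOCKED;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println(processors + " processor(s): " + chooseIncrement(processors));
    }
}
```

A static compiler would have to emit the LOCKED form unconditionally, because it cannot know what machine the binary will run on.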
Backups
              IBM Solution Technologies
              System Performance Group

30+ professional performance analysts & consultants

  performance group has existed for 10 years

  expertise in operating systems and middleware performance

  12 PhDs, 10 MS

  more than 100 recent patents & publications

  guest editor and contributors to Feb. 2000
  IBM Systems Journal issue on Java performance
          System Performance Group (cont.)

What do we do?

 Develop and measure workloads which attempt to reflect customer usage

 Analyze performance characteristics of these workloads and enhance performance
  of IBM platforms

 Drive enhancements into platform independent codebase and specific platform
  dependent code for Intel architecture

 Share results with other IBM platforms

 Analyze and enhance performance of operating systems and middleware, including
  JDBC, JSP, servlets, WebSphere

 Develop internal tools for monitoring & analyzing performance

				