
Understanding and Tuning the Java Virtual Machine (JVM) for WebSphere

© 2006 IBM Corporation

What you should know after this talk
You should understand at a high level how the Java Virtual Machine (JVM) operates underneath your application server and how the JVMs differ across platforms.

You should understand what the IBM J9 JVM is, what new features it brings to the runtime, and when to use those features.

You will understand how different garbage collection schemes work and how to use them to affect your application's response times.

You will know some critical JVM tuning parameters that affect the runtime performance of the JVM.

Lastly, you will learn which debugging tools are available for the JVM and when and where to use them most effectively.

JVM Basics

Overview of IBM’s J9 JVM

Memory Management / Garbage Collection

Runtime Performance Tuning

Debugging Tools


JVM Basics
Highest Level Overview
Java is a Write Once Run Anywhere (WORA), third-generation, object-oriented programming language that is executed on a virtual machine.
The Java Virtual Machine (JVM) runs applications written in Java after the Java code has been compiled to bytecode by the javac compiler.
The JVM, in conjunction with other components, optimizes your compiled Java code in an attempt to make it as fast as native code.
The JVM performs automatic memory management (garbage collection): it reclaims objects that are no longer reachable, which prevents the leaks caused by forgetting to free memory and spares developers from managing memory explicitly.
There are multiple implementations of the JVM, all of which "should" execute any application written for the Java specification level that the JVM was developed for.

JVM Basics
Which JVM do I have? In some cases, the different platforms that WebSphere Application Server runs on have different JVM implementations.

The IBM J9 JVM is the runtime environment on the following operating systems or platforms: AIX, Windows, Linux (x86), Linux (PPC), iSeries, zSeries.

The Sun JVM is the runtime environment on all platforms running the Solaris operating system.

The HP JVM (a close port of the Sun JVM) is the runtime environment on all platforms running the HP-UX operating system.
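The standard vendor, VM name and version system properties tell you which of these implementations a server is actually running. A minimal sketch follows; the class name JvmInfo is hypothetical, and the exact strings reported differ between IBM, Sun and HP builds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reads standard system properties that identify the running JVM implementation.
class JvmInfo {
    static final String[] KEYS = {
        "java.vm.vendor", "java.vm.name", "java.vm.version",
        "java.version", "os.name", "os.arch"
    };

    // Returns the properties in a stable order; "<unset>" marks a property the JVM does not define.
    static Map<String, String> describe() {
        Map<String, String> info = new LinkedHashMap<>();
        for (String key : KEYS) {
            info.put(key, System.getProperty(key, "<unset>"));
        }
        return info;
    }

    public static void main(String[] args) {
        describe().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running java -version on the command line reports similar information without writing any code.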


JVM Basics
The Overall Java Application Stack
• The JVM is built using OO design: building-block components provide higher-level function for simplified end-user development and runtime.
• The JVM's core runtimes are developed in C or C++ and execute a large majority of their function in native code: garbage collector, MMU, JIT, etc.; IO subroutines; OS calls.
• The J2SE/J2EE APIs all exist at the Java code layer. They make data structures available, give users access to needed function, and allow black-box interactions with the system.

[Figure: the Java application stack, top to bottom. Java application code (uses 1 of many possible configurations); Java calls; the virtual machine with pluggable components that dynamically load into it (JVM profiler, debugger, realtime profiler, JIT, class loader, interpreter, exception handler, garbage collector); calls to C libraries (JNI, JCL natives, native applications); OS-specific calls (thread model; port library for file IO, sockets, memory allocation, etc.); operating system.]

JVM Basics
Class loader basics
• Uses a parent-delegation model, which adds security and allows users to implement their own class loaders.
• Loads classes hierarchically by recursively delegating to the current class loader's parent.
What does the class loader do?
• Loading – obtains the bytecode from the Java class file.
• Verification – validates that the code in the class file does not attempt any illegal operations in the JVM and that the class file is well formed and not corrupt.
• Preparation – allocates and initializes storage for static class fields; also creates method tables and object templates.
• Initialization – executes the class's initialization method, which initializes static fields if your Java class has them.
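Both delegation and the initialization step can be observed from plain Java. The sketch below, with hypothetical class names, walks getParent() from an application class up to the bootstrap loader (which the API represents as null), and shows that a class's static initializer runs on first active use rather than when the class is merely referenced. The loader class names printed vary by JVM and Java version.

```java
import java.util.ArrayList;
import java.util.List;

// Shows parent delegation (the loader chain) and lazy class initialization.
class ClassLoaderDemo {
    static boolean lazyInitialized = false;

    // Its static initializer records the moment the class is initialized.
    static class Lazy {
        static final long CREATED = System.nanoTime(); // not a compile-time constant
        static {
            lazyInitialized = true;
        }
    }

    // Loader names from the class's own loader up to, but not including, the bootstrap loader (null).
    static List<String> delegationChain(Class<?> c) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = c.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        return chain;
    }

    // Returns {initialized before first use, initialized after first use}.
    static boolean[] lazyInitStates() {
        Class<?> literal = Lazy.class;   // a class literal does not run the static initializer
        boolean before = lazyInitialized;
        long created = Lazy.CREATED;     // first access to a static field triggers initialization
        boolean after = lazyInitialized;
        return new boolean[] {before, after};
    }

    public static void main(String[] args) {
        System.out.println("String loaded by bootstrap loader: " + (String.class.getClassLoader() == null));
        System.out.println("Loader chain for this class: " + delegationChain(ClassLoaderDemo.class));
        boolean[] states = lazyInitStates();
        System.out.println("Initialized before use: " + states[0] + ", after use: " + states[1]);
    }
}
```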


JVM Basics
JIT Basics
The just-in-time (JIT) compiler is not strictly part of the JVM, but it is essential for a high-performing Java application. Because Java is Write Once Run Anywhere, its bytecode is interpreted by nature, and without the JIT it could not compete with native-code applications.

The JIT compiles bytecode loaded by the class loader when the application accesses it. Because different platforms have different JITs, there is no standard rule for when a method is compiled. As your code calls methods, the JIT tracks how frequently each method is used and quickly compiles the frequently used ones to optimize performance.
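The effect of compiling hot methods can be glimpsed by timing the same method over several rounds. In this sketch (hypothetical class JitWarmup) later rounds are usually faster once the JIT has compiled sumOfSquares, but timings depend on the JVM and machine. On HotSpot, -XX:+PrintCompilation logs methods as they are compiled; on IBM J9, -Xjit:verbose reports compilation activity.

```java
// A deliberately hot method: after enough invocations a JIT will typically compile it to native code.
class JitWarmup {
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            long start = System.nanoTime();
            long checksum = 0;
            for (int call = 0; call < 20_000; call++) {
                checksum += sumOfSquares(1_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            // Early rounds usually run interpreted; later rounds are usually faster once compiled.
            System.out.println("round " + round + ": " + micros + " us (checksum " + checksum + ")");
        }
    }
}
```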




Overview of IBM’s J9 JVM
IBM Toronto Laboratory has a long history (30+ years) of expertise in programming language compilation and optimization technologies:
• C, C++, Fortran, XML Parser, HPJ (statically compiled Java)
• Language-independent, interprocedural optimizers
• Parallelization technology
• Low-level compiler backends: optimizers, linkers, and code generators
• Dynamic compilation: Java just-in-time (JIT) compilers
Close relationships with:
• Research: productizing innovative ideas and experimental technologies
• Hardware: understanding how to achieve the best possible performance with the underlying system and processor
• IBM Middleware: performance analysis
Deep IP portfolio: the Java JIT group alone filed 14 U.S. patents in 2004 and 6 in 2005.


Overview of IBM’s J9 JVM
What is the J9 JVM?
• Free of Sun IP, but compliant with Java 2 (1.3) for J2ME and with J2SE (1.4.2, 5.0)
• Highly configurable class library implementation
• Multi-platform: PowerPC, IA32, x86-64, and 390 (Linux or z/OS)
• Used by more applications than those above, outside the middleware space
• Flexible and sophisticated technology oriented to performance (throughput and application startup), scalability, and reliability, availability and serviceability (RAS)


Overview of IBM’s J9 JVM
• Garbage collector enhancements: incorporates generational garbage collection for the first time
• Fine-grained locking of VM data structures
• Asynchronous compilation: compilation of Java methods proceeds on a background thread, so other application threads do not have to wait to execute the method. This improves the startup time of heavily multithreaded applications on SMPs.
• Compile-time optimizations to remove contention (escape analysis, lock coarsening, …) and architectural support to limit its effect
• Superior JIT (just-in-time) compiler: multiple optimization methods, from application profiling to more intelligent and better code optimization algorithms


Overview of IBM’s J9 JVM
Key Highlights for WAS

Superior Java application execution performance through Just-In-Time (JIT) compiler technology
• Much improved over JDK 1.4.2 and Sun's JIT
• Maximized performance with minimized runtime overhead:
  – multiple optimization levels, multiple recompilations of the same method, many new optimizations
  – dynamic observation of code execution via profiling to aggressively improve hot code
  – interpreter profiling data used when compiling methods, for block reordering, loop unrolling, etc.




Memory Management / Garbage Collection
Overview
Garbage collection (GC) is the main cause of memory-related performance bottlenecks in Java.

Two things to look at in GC: frequency and duration.
• Frequency depends on the heap size and the allocation rate.
• Duration depends on the heap size and the Java heap footprint.

GC algorithm: it is critical to understand how it works so that tuning is done more intelligently.

How do you eliminate GC bottlenecks?
• Minimize the use of objects by following good programming practices.
• Set your heap size properly and memory-tune your JVM.
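Both measures can be read at runtime through the standard GarbageCollectorMXBean. This sketch (hypothetical class GcStats) derives an average time per collection and an average interval between collections over the JVM's uptime. Note that getCollectionTime() is approximate accumulated elapsed time, so for concurrent collectors it is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reports, per collector, how often it has run (frequency) and how long it has taken (duration).
class GcStats {
    // Average elapsed time per collection; NaN when the JVM reports no data (-1) or no collections.
    static double averageMs(long count, long totalMs) {
        return (count > 0 && totalMs >= 0) ? (double) totalMs / count : Double.NaN;
    }

    // Average time between collections over the JVM's uptime; NaN if nothing has been collected yet.
    static double averageIntervalSeconds(double uptimeSeconds, long count) {
        return count > 0 ? uptimeSeconds / count : Double.NaN;
    }

    public static void main(String[] args) {
        double uptime = ManagementFactory.getRuntimeMXBean().getUptime() / 1000.0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if undefined for this collector
            long totalMs = gc.getCollectionTime();  // approximate accumulated elapsed time, -1 if undefined
            System.out.printf("%s: %d collections, avg %.1f ms each, one every %.1f s%n",
                    gc.getName(), count, averageMs(count, totalMs),
                    averageIntervalSeconds(uptime, count));
        }
    }
}
```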


Memory Management / Garbage Collection
JVM Memory Layout
[Figure: Typical JVM architecture. The Java heap holds application classes and objects, WAS classes and objects, and JVM runtime classes, extension classes and objects. The JVM native heap holds the JVM stack, the JVM data segment, and the JVM native code (interpreter, verifier, JIT compiler, memory manager, etc.). Below them, the JVM OS interface sits on the operating system.]
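The split between the Java heap and memory outside it is visible through the standard MemoryMXBean. The sketch below (hypothetical class MemoryLayout) prints both; what counts as "non-heap" varies by JVM (typically areas such as class metadata or JIT code), and it does not cover all native memory, such as thread stacks or native libraries.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Contrasts the Java heap (where objects live) with the non-heap memory the JVM reports.
class MemoryLayout {
    static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }

    static String describe(String label, MemoryUsage u) {
        // getMax() returns -1 when no maximum is defined.
        String max = u.getMax() < 0 ? "undefined" : toMb(u.getMax()) + " MB";
        return label + ": used=" + toMb(u.getUsed()) + " MB, committed=" + toMb(u.getCommitted())
                + " MB, max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("Java heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("Non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```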

Memory Management / Garbage Collection
Total Application Memory Footprint
• Application footprint: application classes and objects, application resources, etc.
• WAS footprint:
  – Fixed cost: WebSphere runtime, XML parser, ORB, JCE security, JMX, class loaders, etc.
  – User-controlled: thread pool, connection pool, monitoring/logging, EJB cache (growable?), other caches (prepared statement, security, etc.)
  – Application dependent: number of classes, dynacache, security, resources, HTTP session size, WLM
• JVM footprint: native implementation (e.g. C++): data structures, code, runtime artifacts, OS interface

Memory Management / Garbage Collection
What factors affect memory performance the most?
Memory management – how efficiently does the system manage memory?

Total available memory – is there enough memory to satisfy every request for memory?

Allocation rate – how often does the application request memory?

Object size – how big are these objects?

Object lifetime – how long do these objects stay reserved by the application?
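Allocation rate and nursery size combine into a simple back-of-the-envelope model: the time between nursery collections is roughly the nursery size divided by the allocation rate. The sketch below (hypothetical class AllocationModel) plugs in the J9 verbose:gc sample shown later in this deck, treating the whole nursery as allocated during the reported interval, which slightly overstates the rate.

```java
// Back-of-the-envelope model: how allocation rate and nursery size set GC frequency.
class AllocationModel {
    // Bytes allocated per second, given the bytes consumed between two collections.
    static double bytesPerSecond(long bytesAllocated, double intervalMs) {
        return bytesAllocated / (intervalMs / 1000.0);
    }

    // Expected seconds between nursery collections for a given nursery size and allocation rate.
    static double secondsBetweenCollections(long nurseryBytes, double rateBytesPerSecond) {
        return nurseryBytes / rateBytesPerSecond;
    }

    public static void main(String[] args) {
        long nursery = 1_226_833_920L;                     // nursery size in the J9 verbose:gc sample
        double rate = bytesPerSecond(nursery, 10_730.361); // that sample's interval since the last GC
        System.out.printf("allocation rate ~ %.0f MB/s%n", rate / (1024 * 1024));
        // At the same allocation rate, doubling the nursery doubles the interval between collections.
        System.out.printf("interval with 2x nursery ~ %.1f s%n", secondsBetweenCollections(2 * nursery, rate));
    }
}
```

For the sample this works out to roughly 109 MB/s, so doubling the nursery would stretch the interval from about 10.7 s to about 21.5 s.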


Memory Management / Garbage Collection
Parallel VS Concurrent Collectors

Parallel collectors – two or more threads run at the same time to perform garbage collection. They still use the "stop-the-world" model, but instead of a single GC thread there are helper threads as well.

Concurrent collectors – collector threads run while the application is running. They avoid stopping the world for most of the collection, but application threads can be asked to perform some garbage collection work once in a while.


Memory Management / Garbage Collection
What garbage collection algorithms are available on my JDK?
IBM J9 JDK platforms
Memory management is configurable using four different policies with varying characteristics:
1. Optimize for Throughput – flat heap collector focused on maximum throughput.
2. Optimize for Pause Time – flat heap collector with concurrent mark and sweep to minimize GC pause time.
3. Subpool – a flat heap technique to help increase performance on multiprocessor systems, commonly those with more than 8 processors. Available on IBM pSeries™ and zSeries™.
4. Generational Concurrent – divides the heap into "nursery" and "tenured" segments, providing fast collection for short-lived objects. Can provide maximum throughput with minimal pause times.

Sun/HP JDK 5.0 platforms
The garbage collector is always generational, but the implementation is chosen out of the box based on the class of system:
1. Serial – uses a single thread to collect both the new and old generations.
2. Throughput – uses a parallel model for collecting objects in the new generation.
3. Concurrent – uses parallel collection in the new generation and concurrent collection in the old generation.
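To confirm which policy a running server actually got, you can inspect the options the JVM was started with and the collectors it reports. The filter below is a simple heuristic and an assumption of this sketch (hypothetical class GcPolicyCheck): it keeps IBM -Xgc* options and Sun/HP -XX: options whose names contain "GC", so settings such as -XX:NewRatio are not listed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Shows which GC-related options this JVM was started with and which collectors it reports.
class GcPolicyCheck {
    // Heuristic filter: IBM -Xgc* options and Sun/HP -XX: options whose names mention GC.
    static List<String> gcArguments(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xgc") || (arg.startsWith("-XX:") && arg.contains("GC"))) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC options: " + gcArguments(jvmArgs));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }
    }
}
```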


Memory Management / Garbage Collection
How the IBM Mark and Sweep Garbage Collector Works

[Figure: How the IBM mark-and-sweep collector works (JDK 1.4.2). Threads A and B each allocate from their own thread-local heap inside the global heap. The figure also shows the stack, the heap lock, used heap, the wilderness, the garbage collector and the system heap.]
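The mark and sweep phases themselves can be illustrated with a toy collector over an explicit object graph. This is a sketch only (hypothetical class MarkSweep): a real JVM works on raw heap memory, finds roots on thread stacks and in static fields, and also compacts, none of which is modelled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy mark-and-sweep over an explicit object graph; not how a real JVM stores objects.
class MarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots is live.
    static void mark(List<Obj> roots) {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    // Sweep phase: unmarked objects are garbage; survivors have their mark reset for the next cycle.
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> live = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) {
                o.marked = false;
                live.add(o);
            }
        }
        return live;
    }

    static Set<String> collect(List<Obj> heap, List<Obj> roots) {
        mark(roots);
        Set<String> names = new HashSet<>();
        for (Obj o : sweep(heap)) names.add(o.name);
        return names;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b); // a -> b: reachable through the root a
        c.refs.add(d); // c <-> d form a cycle that nothing else references,
        d.refs.add(c); // so a tracing collector still treats both as garbage
        System.out.println("live after GC: " + collect(List.of(a, b, c, d), List.of(a)));
    }
}
```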

Memory Management / Garbage Collection
How the IBM J9 Generational and Sun/HP Garbage Collectors Work

[Figure: the JVM heap divided into the nursery/young generation, the old generation, and the permanent space.]

Nursery/young generation size – IBM J9: -Xmn (-Xmns/-Xmnx); Sun: -XX:NewSize=nn -XX:MaxNewSize=nn, -Xmn<size>

Old generation size – IBM J9: -Xmo (-Xmos/-Xmox); Sun: -XX:NewRatio=n

Permanent space (Sun JVM only): -XX:MaxPermSize=nn

• Minor collection – takes place only in the young generation, normally by copying live objects, which is very efficient.
• Major collection – takes place in the old generation and uses the normal mark-and-sweep algorithm.
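The minor-collection side can be sketched as a toy model: each scavenge copies surviving nursery objects, increments their age, and promotes those that reach a tenure age to the old space. The class GenerationalModel and its explicit "live" flag are hypothetical simplifications (real collectors decide liveness by reachability, and both J9 and HotSpot adjust the tenure age dynamically; the J9 sample later in this deck reports tenureage="14").

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a generational scavenge: survivors are copied and aged; old enough ones are promoted.
class GenerationalModel {
    static final class Obj {
        final String name;
        int age;              // number of scavenges survived
        boolean live = true;  // stand-in for "still reachable"
        Obj(String name) { this.name = name; }
    }

    final List<Obj> nursery = new ArrayList<>();
    final List<Obj> tenured = new ArrayList<>();
    final int tenureAge;

    GenerationalModel(int tenureAge) { this.tenureAge = tenureAge; }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        nursery.add(o);
        return o;
    }

    // Minor collection: only the nursery is examined; dead objects are simply not copied.
    void scavenge() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : nursery) {
            if (!o.live) continue;
            o.age++;
            if (o.age >= tenureAge) tenured.add(o); // promotion to the tenured/old space
            else survivors.add(o);                  // copied to the survivor area
        }
        nursery.clear();
        nursery.addAll(survivors);
    }

    public static void main(String[] args) {
        GenerationalModel heap = new GenerationalModel(2);
        heap.allocate("cached");                     // stays live
        for (int i = 1; i <= 3; i++) {
            heap.allocate("temp" + i).live = false;  // dies before the next scavenge
            heap.scavenge();
            System.out.println("after scavenge " + i + ": nursery=" + heap.nursery.size()
                    + ", tenured=" + heap.tenured.size());
        }
    }
}
```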

Memory Management / Garbage Collection
Fragmentation
Java objects in the heap are, in most cases, movable; in other words, they are not tied to a single location in memory.
Some objects in the heap, however, cannot be moved, either permanently or temporarily. These are known as "pinned objects."
What does J9 do to prevent fragmentation?
With new garbage collection strategies and a new runtime memory management unit in JDK 5.0, pinned objects can be moved during compaction and are accounted for much better, which for the most part nearly eliminates the fragmentation problem seen in JDK 1.4.2.




Runtime Performance Tuning
Overview
Tuning the JVM properly is a process that takes time and must be tailored to your application. HOWEVER, you can typically get 80% of the maximum performance with 20% of the work by making good choices on a few key settings.
To truly extract maximum performance from your application, you must know your application's memory allocation and runtime needs.
The JVM must be tuned in two iterative steps over a testing cycle:
• Step 1: Heap size tuning
• Step 2: Applying runtime optimizations
Applying these two steps repeatedly will lead you to a JVM tuned for your application.


Runtime Performance Tuning
Key Parameters
The key setting for the IBM JVM, which affects performance most for all Java applications and should get you near 80% of your maximum performance if set correctly, is:
Heap size (-Xms / -Xmx)
• Set your minimum and maximum to values that are under your physical memory limit but give you a substantially long interval between GCs.
• Typical lower bound on the interval between GCs: 10 sec
• Typical upper bound on the duration of a GC: 1-2 sec
For the Sun/HP JVM, a lot more work is required to get optimal performance than just tuning the heap size, because you also need to tune the garbage collector and the runtime.
• A JVM setting introduced in JDK 1.4.1 has shown promise on Sun for automatically tuning the rest of the heap settings for your machine: -XX:+AggressiveHeap, issued on the command line, makes decisions on GC algorithms, young/old generation spaces, and other resources to use.
• You must also pass the -server parameter to the Sun/HP JVMs to get them to run in their highest-performing mode.
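A quick way to confirm that -Xms/-Xmx took effect is to ask the Runtime, and the GC guideline above can be turned into a simple check. In this sketch (hypothetical class HeapCheck) the thresholds are the deck's rules of thumb (at least 10 s between GCs, at most 2 s per GC), and the sample values come from the J9 verbose:gc output shown later in this deck.

```java
// Confirms the heap limits the JVM actually received and checks GC samples against the rule of thumb.
class HeapCheck {
    static final double MIN_INTERVAL_S = 10.0; // typical lower bound on the time between GCs
    static final double MAX_PAUSE_S = 2.0;     // typical upper bound on the duration of one GC

    static boolean withinGuideline(double intervalSeconds, double pauseSeconds) {
        return intervalSeconds >= MIN_INTERVAL_S && pauseSeconds <= MAX_PAUSE_S;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory() approximately reflects -Xmx (Long.MAX_VALUE if there is no limit);
        // totalMemory() is the heap currently committed, which starts near -Xms.
        System.out.println("max heap:       " + rt.maxMemory() / mb + " MB");
        System.out.println("committed heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("free committed: " + rt.freeMemory() / mb + " MB");
        // Interval and scavenge time from the J9 verbose:gc sample: about 10.7 s and 0.37 s.
        System.out.println("sample within guideline: " + withinGuideline(10.730, 0.368));
    }
}
```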

Runtime Performance Tuning
What GC Policy should I choose for the J9 JVM?

I want my application to run to completion as quickly as possible: -Xgcpolicy:optthruput
My application requires good response time to unpredictable events: -Xgcpolicy:optavgpause
My application has a high allocation and death rate: -Xgcpolicy:gencon
My application is running on big metal and has high allocation rates on many threads: -Xgcpolicy:subpool


Runtime Performance Tuning
Real world examples
WebSphere 6.1 - Trade 6

Some WebSphere applications perform better with Generational – however some applications degrade in performance.

[Chart: relative throughput (0 to 120) of optthruput vs. gencon]

Customers may still be interested in generational if it delivers lower GC pause times.

WebSphere 6.1 - SPECjAppServer
[Chart: relative throughput (0 to 120) of optthruput vs. gencon]

Numbers are approximate and only intended to show the general behaviour seen when running Trade 6 compared to SPECjAppServer.

Runtime Performance Tuning
Other IBM JVM Tuning Parameters
-Xgcthreads<n> - number of threads used for parallel garbage collection (default is n-1 for n processors)
-Xnoclassgc - turns off class garbage collection (default is false)
-Xnocompactgc - turns off compaction, which can lead to fragmentation (default is false)
-Xoss<size> - sets the maximum Java stack size of any thread
-Xss<size> - sets the maximum native stack size of any thread
-Xlp - enables large page support on supported operating systems

-Xdisableexplicitgc - turns System.gc() calls into no-ops
-Xifa:<on|off|force> - enables Java code to run on z/OS zAAP processors
-Xmaxe / -Xmine - sets the maximum or minimum expansion unit during allocation


Runtime Performance Tuning
What GC policy should I choose for the Sun JVM?
I want my application to run concurrently with a lot of other JVMs (hoteling): use the default serial collector, as its GC algorithm is single-threaded.
I need my application to perform well on a large number of processors: -XX:+UseParallelGC
I need my application to return near-constant response times on machines that have a large number of processors: -XX:+UseConcMarkSweepGC
I need my application to return near-constant response times on machines that have a small number of processors: -XX:+UseTrainGC


Runtime Performance Tuning
Other Sun/HP JVM Tuning Parameters
-Xincgc - incremental GC, uses the Train algorithm (default is disabled)
-Xnoclassgc - disables class garbage collection
-Xss - sets the stack size of each thread (512K)

-XX:+DisableExplicitGC - no System.gc() will be executed
-XX:TargetSurvivorRatio - sets the threshold in the survivor space at which promotion kicks in
-XX:+UseAdaptiveSizePolicy - the JVM determines good sizes for the Eden and survivor spaces
-XX:+UseISM - allows for bigger pages (4MB)

-XX:+UseMPSS - used only for Solaris 9
-XX:+AggressiveHeap - maximizes heap size and algorithms for speed
-Xoptgc - optimizes GC in the young generation (HP only)


Runtime Performance Tuning
How to tune a generation GC setup – General

We need to consider the respective sizes of the nursery and the tenured space. Two approaches:
Dynamic
• Specify the minimum and maximum heap size (e.g. -Xms512m -Xmx1024m) and, in the Sun JDK case, -XX:+AggressiveHeap.
• The JVM will dynamically size the nursery and tenured space.
• May not give optimal performance.
• Could be good for low response times.
Fixed
• Be more specific about the nursery and/or tenured space sizes.
• Recommended approach for performance-sensitive, server-side applications.


Runtime Performance Tuning
How to tune a generation GC setup – Setting the tenured/old space

The tenured space must be large enough to hold all persistent data of the application. If it is too small, you will see excessive GC or even out-of-memory conditions. For a typical WebSphere Application Server application this is ~100-400 MB.
One way to determine the tenured space size is to look at how much heap is free after each GC in the default mode: free heap = %free heap x total heap size, and the rest of the heap (total minus free) is the persistent data the tenured space must hold.
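The arithmetic is small enough to sketch. The class TenuredSizing is hypothetical, and the example numbers are the tenured figures after the scavenge in the J9 verbose:gc sample later in this deck (209715200 bytes total, 32% free). Because verbose:gc rounds the percentage, the result is an estimate.

```java
// Estimates the live (persistent) data the tenured space must hold, from the free percentage after a GC.
class TenuredSizing {
    // Free heap = %free x total heap size; what is left over is live data.
    static long liveBytes(long totalHeapBytes, int percentFreeAfterGc) {
        long freeBytes = totalHeapBytes * percentFreeAfterGc / 100;
        return totalHeapBytes - freeBytes;
    }

    public static void main(String[] args) {
        long live = liveBytes(209_715_200L, 32);
        System.out.println("live data ~ " + live / (1024 * 1024) + " MB");
    }
}
```

For the sample this gives about 136 MB of live data, inside the typical range quoted above.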

Analyse verbosegc to understand how frequently the tenured space gets collected. An optimal generational application will never have a collection in the tenured space. In the lab, some WAS applications collect it every ~15 min.

Runtime Performance Tuning
How to tune a generation GC setup – Setting the nursery/new generation space

Large nursery: "good for throughput"
Small nursery: "good for low pause times"

Good WebSphere performance (throughput) requires a reasonably large nursery.
• A good starting point would be 512 MB.
• Move up or down to determine the optimal value.
  – Measure throughput and/or response times.
Analyse verbosegc to understand the frequency and length of scavenges.


Heap size options
Fix both nursery and tenured space
-XmnAm -XmoBm

Allow them to expand/contract
-XmnsAm -XmnxBm -XmosBm -XmoxCm

(A, B and C are sizes in megabytes.)
Runtime Performance Tuning
Process for tuning heap settings
1. Start: set your performance requirements.
2. Give your best estimate for mx and ms.
3. Stress test your application.
4. Analyze GC behavior. Is the GC profile good?
   – Yes: done.
   – No: profile objects if needed, adjust mx and/or ms and possibly switch GC policy, then return to step 3.

Tips
• Run your application, analyze heap usage and determine the steady state. Set your ms to the steady state.
• Make sure your heap never pages. Monitor your paging activities.
• A rule of thumb is to keep 30% of your heap free most of the time.
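The 30% rule of thumb can be checked against a snapshot of heap usage. This sketch (hypothetical class HeapHeadroom) measures free space relative to the committed heap, which is an assumption; a single snapshot is only indicative, so in practice you would look at occupancy after each GC over a steady-state run.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Checks the "keep about 30% of the heap free most of the time" rule of thumb.
class HeapHeadroom {
    static final double TARGET_FREE = 0.30;

    // Fraction of the committed heap that is not in use.
    static double freeFraction(long used, long committed) {
        return committed <= 0 ? 0.0 : 1.0 - (double) used / committed;
    }

    static boolean hasHeadroom(long used, long committed) {
        return freeFraction(used, committed) >= TARGET_FREE;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap free: %.0f%%, meets 30%% rule: %b%n",
                100 * freeFraction(heap.getUsed(), heap.getCommitted()),
                hasHeadroom(heap.getUsed(), heap.getCommitted()));
    }
}
```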

Runtime Performance Tuning
Process for other runtime tuning settings
1. Start: set your performance requirements.
2. Set your baseline tuning parameters.
3. Apply a new tuning parameter to the runtime.
4. Stress test your application.
5. Analyze runtime behavior. Is throughput as expected?
   – Yes: done.
   – No: remove the tuning parameter if it had a negative effect, then return to step 3.

Tips
• Measure throughput during the steady state of the benchmark to ensure consistent results.
• Some tuning parameters will affect performance negatively, as they might be targeted at an application with different runtime characteristics than yours.
• Add only one tuning parameter at a time to measure its impact alone.
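Measuring only the steady state means discarding a warm-up phase before timing. The harness below is a minimal sketch (hypothetical class SteadyStateBenchmark and workload); you would run it once per JVM configuration, each in a fresh JVM, changing one tuning parameter at a time. A dedicated harness such as OpenJDK's JMH handles warm-up and dead-code elimination far more rigorously.

```java
import java.util.function.IntUnaryOperator;

// Measures throughput only after a warm-up phase, so JIT compilation and heap sizing can settle first.
class SteadyStateBenchmark {
    static volatile int blackhole; // publishing results keeps the JIT from discarding the work

    // Runs warmupOps untimed operations, then returns operations per second over measuredOps operations.
    static double opsPerSecond(IntUnaryOperator workload, int warmupOps, int measuredOps) {
        int sink = 0;
        for (int i = 0; i < warmupOps; i++) {
            sink += workload.applyAsInt(i);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredOps; i++) {
            sink += workload.applyAsInt(i);
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        blackhole = sink;
        return measuredOps / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        IntUnaryOperator workload = i -> Integer.toString(i).hashCode(); // allocates a little per call
        // Repeat the measurement: consistent numbers suggest a steady state has been reached.
        for (int run = 1; run <= 3; run++) {
            System.out.printf("run %d: %.0f ops/s%n", run, opsPerSecond(workload, 200_000, 1_000_000));
        }
    }
}
```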



Debugging Tools
Garbage Collection Debugging/Analysis Tools (verbose:gc)
Your most indispensable tool, directly from the JVM runtime.

Enabled by passing -verbose:gc on the java command line.

Pros
• Can give a lot of detailed low-level information for serious debugging
• Enough for an initial investigation
• Readily available, and it is free
Cons
• You have to restart your server
• Not suitable for production environments
• Does not give object-level information for further analysis


Runtime Performance Tuning
Verbose:GC from J9
<af type="nursery" id="35" timestamp="Thu Aug 11 21:47:11 2005" intervalms="10730.361">
  <minimum requested_bytes="144" />
  <time exclusiveaccessms="1.193" />
  <nursery freebytes="0" totalbytes="1226833920" percent="0" />
  <tenured freebytes="68687704" totalbytes="209715200" percent="32" >
    <soa freebytes="58201944" totalbytes="199229440" percent="29" />
    <loa freebytes="10485760" totalbytes="10485760" percent="100" />
  </tenured>
  <gc type="scavenger" id="35" totalid="35" intervalms="10731.054">
    <flipped objectcount="1059594" bytes="56898904" />
    <tenured objectcount="12580" bytes="677620" />
    <refs_cleared soft="0" weak="691" phantom="39" />
    <finalization objectsqueued="1216" />
    <scavenger tiltratio="90" />
    <nursery freebytes="1167543760" totalbytes="1226833920" percent="95" tenureage="14" />
    <tenured freebytes="67508056" totalbytes="209715200" percent="32" >
      <soa freebytes="57022296" totalbytes="199229440" percent="28" />
      <loa freebytes="10485760" totalbytes="10485760" percent="100" />
    </tenured>
    <time totalms="368.309" />
  </gc>
  <nursery freebytes="1167541712" totalbytes="1226833920" percent="95" />
  <tenured freebytes="67508056" totalbytes="209715200" percent="32" >
    <soa freebytes="57022296" totalbytes="199229440" percent="28" />
    <loa freebytes="10485760" totalbytes="10485760" percent="100" />
  </tenured>
  <time totalms="377.634" />
</af>

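• <minimum> and <time exclusiveaccessms>: allocation request details and the time it took to stop all mutator threads. The <nursery> and <tenured> elements that follow: heap occupancy details before GC.

• <gc>: details about the scavenge.

• The trailing <nursery>, <tenured> and <time> elements: heap occupancy details after GC.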
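Because J9 writes verbose:gc as XML, individual records can be processed with the JDK's standard DOM parser. This sketch (hypothetical class J9VerboseGc) reads one <af> record and returns the interval since the last GC and the scavenge time; a full log contains many records and may need to be split or wrapped in a root element first, which is not handled here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Extracts the allocation-failure interval and the scavenge time from a single J9 <af> record.
class J9VerboseGc {
    // Returns {intervalms of the <af>, totalms of the <gc> inside it}.
    static double[] intervalAndGcMs(String afRecord) {
        try {
            Element af = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(afRecord)))
                    .getDocumentElement();
            Element gc = (Element) af.getElementsByTagName("gc").item(0);
            Element gcTime = (Element) gc.getElementsByTagName("time").item(0);
            return new double[] {
                Double.parseDouble(af.getAttribute("intervalms")),
                Double.parseDouble(gcTime.getAttribute("totalms"))
            };
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed <af> record with a <gc> child", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed copy of the record above; only the attributes this sketch reads are kept.
        String record = "<af type=\"nursery\" id=\"35\" intervalms=\"10730.361\">"
                + "<gc type=\"scavenger\" id=\"35\"><time totalms=\"368.309\" /></gc>"
                + "<time totalms=\"377.634\" /></af>";
        double[] r = intervalAndGcMs(record);
        System.out.printf("%.0f ms since last GC, scavenge took %.1f ms (%.1f%% of elapsed time)%n",
                r[0], r[1], 100 * r[1] / r[0]);
    }
}
```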


Debugging Tools
Garbage Collection Debugging/Analysis Tools – Sun/HP JVM verbose:gc output

-verbose:gc -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Example:
0.0000013: [Full GC 0.0005366: [Tenured: 0K->4185K(1380352K), 0.3102502 secs] 62984K->4185K(2057344K), 0.3103787 secs]

236.661: [GC 236.661: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 16817808 bytes, 16817808 total
- age 2: 20124840 bytes, 36942648 total
: 630283K->36076K(657088K), 0.7287377 secs] 666617K->72411K(2037440K), 0.7289491 secs]

262.697: [GC 262.697: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 15971824 bytes, 15971824 total
- age 2: 3806192 bytes, 19778016 total
- age 3: 18963992 bytes, 38742008 total
: 633452K->37833K(657088K), 0.6451270 secs] 669787K->74168K(2037440K), 0.6453326 secs]

286.232: [GC 286.233: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 17242304 bytes, 17242304 total
- age 2: 5131296 bytes, 22373600 total
- age 3: 2684464 bytes, 25058064 total
- age 4: 18728192 bytes, 43786256 total
: 635209K->42760K(657088K), 0.7164103 secs] 671544K->79094K(2037440K), 0.7166029 secs]
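In these records, the last "before->after(capacity), time" group covers the whole heap, while the earlier group covers only the young generation. A regular-expression sketch (hypothetical class SunGcRecord) can pull out the whole-heap figures, assuming each record has first been joined onto one line, since the tenuring distribution spreads it over several.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls whole-heap occupancy and pause time out of one Sun/HP GC record joined onto a single line.
class SunGcRecord {
    // Matches "<before>K-><after>K(<capacity>K), <seconds> secs]".
    private static final Pattern TRANSITION =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    // Returns {beforeK, afterK, capacityK, seconds} for the last transition, which covers the whole heap.
    static double[] wholeHeap(String record) {
        Matcher m = TRANSITION.matcher(record);
        double[] last = null;
        while (m.find()) {
            last = new double[] {
                Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4))
            };
        }
        if (last == null) {
            throw new IllegalArgumentException("no heap transition found in: " + record);
        }
        return last;
    }

    public static void main(String[] args) {
        String record = "236.661: [GC 236.661: [DefNew Desired survivor size 61145088 bytes, "
                + "new threshold 31 (max 31) - age 1: 16817808 bytes, 16817808 total "
                + "- age 2: 20124840 bytes, 36942648 total : 630283K->36076K(657088K), 0.7287377 secs] "
                + "666617K->72411K(2037440K), 0.7289491 secs]";
        double[] h = wholeHeap(record);
        System.out.printf("reclaimed %.0f KB in %.3f s%n", h[0] - h[1], h[3]);
    }
}
```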


Debugging Tools
IBM JDK Debugging/Analysis Tools
Thread dumps
• Available on all JVMs by issuing kill -3 <pid> on the command line, where <pid> is your server's process ID.
• In essence, a snapshot in time of what your system is executing.
• Used to debug and find where threads are spending time in your system, or where they are hung.
Heap dump
• Can be enabled to occur along with a thread dump by setting the following JVM properties:
  – Click on Application Server -> server1 -> Process definition -> custom properties ->
  – Enter Name = IBM_HEAPDUMP, Value = true
  – Enter Name = IBM_JAVA_HEAPDUMP_TEXT, Value = true (this generates the heap dump in text format, which can be analyzed using HeapRoots)
• Can be analyzed using HeapRoots at
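A rough in-process equivalent of a thread dump is available through the standard ThreadMXBean, which can help when sending a signal to the process is not convenient. This sketch (hypothetical class ThreadDump) lists every live thread with its state and stack; it does not replace the JVM's own dump file, which carries more detail (monitors, native information), and it produces no heap dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// In-process analogue of a kill -3 thread dump: every live thread with its state and stack.
class ThreadDump {
    static String dump() {
        StringBuilder out = new StringBuilder();
        // false, false: do not collect locked monitors or ownable synchronizers.
        for (ThreadInfo t : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            out.append('"').append(t.getThreadName()).append("\" ").append(t.getThreadState()).append('\n');
            for (StackTraceElement frame : t.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```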

Debugging Tools
IBM JDK Debugging/Analysis Tools
Class loader runtime diagnostics
• -verbose:class – gives you information about which classes are loaded.
• <name> – gives you specific information about how a class name you specify is attempted to be loaded.

Runtime Performance Analysis
• A variety of third-party tools will hook up to the IBM JVM to provide runtime-level profiling: JProbe, JProfiler, etc.
• HPROF is built into the JDK as a profiler; it is limited in function, but still good for debugging simple unit test case performance issues.


A few VERY useful URLs
• Contains all the diagnostic guides for our JVMs
• PDF on GC and memory usage
• Contains a large amount of documentation and tuning for the Sun JVM
• Reference to all Sun JVM flags, as well as an explanation of them
• Wealth of information on tuning and configuring the HP-UX JVM


Thank you
Any questions ?


Backup and Extras


JVM Basics
The high level JVM Building Blocks – Part 1


Core Interface – Encapsulates all interactions with users, external programs and the operating environment.

[Figure: building blocks Core Interface (CI), Execution Management (XM), Execution Engine (XE)]

Execution Management – Provides process control and management; the threading engine resides here.

Execution Engine – Provides all methods of executing Java bytecode, both compiled and interpreted.

JVM Basics
The high level JVM Building Blocks – Part 2


Diagnostics – Encapsulates all debug and diagnostic services in the JVM: tracing, FFDC, RAS, debug APIs.

[Figure: building blocks Core Interface (CI), Execution Management (XM), Execution Engine (XE), Diagnostics (DG), Class Loader (CL), Data Conversion]

Class Loader – Provides support for loading and unloading of Java binaries. Performs loading, validation and initialization, and implements methods for the reflection APIs.

Data Conversion – Supports

JVM Basics
The high level JVM Building Blocks – Part 3


Lock – Provides locking and synchronization services.

Storage – Encompasses all support for the storage services the JVM needs: heap management and allocation strategies.

[Figure: building blocks Core Interface (CI), Execution Management (XM), Execution Engine (XE), Diagnostics (DG), Data Conversion, Class Loader (CL), Lock (LK), Storage (ST)]

HPI – A set of well-defined functions that provide low-level facilities and services in a platform-neutral way. This interface is defined by


Memory Management / Garbage Collection
How the Sun/HP Garbage Collector Works – Part 2
-XX:SurvivorRatio=nn -XX:MaxTenuringThreshold=nn


[Figure: the heap divided into the young generation (containing two survivor spaces), the old generation, and the permanent space.]
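On HotSpot, -XX:SurvivorRatio is the ratio of eden to one survivor space, and the young generation consists of eden plus two survivor spaces, so each survivor space is young / (ratio + 2). -XX:MaxTenuringThreshold caps how many scavenges an object can survive before it is promoted. The sketch below (hypothetical class SurvivorSizing) only does the arithmetic; a real JVM also rounds sizes to its own alignment.

```java
// Splits a young generation into eden and two equal survivor spaces according to -XX:SurvivorRatio.
class SurvivorSizing {
    // SurvivorRatio is eden : one survivor space, and young = eden + 2 survivor spaces,
    // so each survivor space is young / (ratio + 2).
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 640L * 1024 * 1024; // hypothetical 640 MB young generation
        for (int ratio : new int[] {4, 8, 16}) {
            System.out.printf("SurvivorRatio=%d: eden=%d MB, each survivor=%d MB%n", ratio,
                    edenBytes(young, ratio) >> 20, survivorBytes(young, ratio) >> 20);
        }
    }
}
```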

