Understanding and Tuning the Java Virtual Machine (JVM) for WebSphere
© 2006 IBM Corporation

What you should know after this talk
• You should understand at a high level how the Java Virtual Machine (JVM) operates underneath your application server and how the JVMs differ across platforms.
• You should understand what the IBM J9 JVM is, what new features it provides to the runtime, and when to use those features.
• You will understand how different garbage collection schemes work and how to use them to affect your application’s response times.
• You will know and understand some critical JVM tuning parameters that affect the runtime performance of the JVM.
• Lastly, you will get to know which debugging tools are available for the JVM and understand when and where to use them most effectively.

Overview
• JVM Basics
• Overview of IBM’s J9 JVM
• Memory Management / Garbage Collection
• Runtime Performance Tuning
• Debugging Tools

JVM Basics: Highest Level Overview
• Java is a Write Once Run Anywhere (WORA), third-generation, object-oriented programming language that is executed on a virtual machine.
• The Java Virtual Machine (JVM) runs applications written in Java after the Java code has been compiled to bytecode by javac.
• The JVM, in conjunction with other components, optimizes your compiled Java code to try to make it as fast as native code.
• The JVM performs automatic memory management (garbage collection) to ensure that system-wide memory leaks do not occur and to make development easier, because developers do not have to manage memory explicitly.
• There are multiple implementations of the JVM, all of which “should” execute any application written for the Java specification level that the JVM was developed for.

JVM Basics: Which JVM do I have?
• The different platforms that WebSphere Application Server runs on have, in some cases, different JVM implementations.
• The IBM J9 JVM is the runtime environment on the following operating systems and platforms:
  – AIX, Windows, Linux (x86), Linux (PPC), iSeries, zSeries
• The Sun JVM is the runtime environment on all platforms running the Solaris operating system.
• The HP JVM (which is a very simple port of the Sun JVM) is the runtime environment on all platforms running the HP-UX operating system.

JVM Basics: The Overall Java Application Stack
• The JVM is built using OO design.
  – Building-block components provide higher-level function for simplified end-user development and runtime.
• The JVM’s core runtimes are developed in C or C++ and execute a large majority of function in native code.
  – Garbage collector, MMU, JIT, etc.
  – IO subroutines, OS calls
• The J2SE/J2EE APIs all exist at the Java code layer.
  – Make data structures available
  – Give users access to needed function
  – Allow black-box interactions with the system

[Figure: JVM component diagram. Labels: Java application code (uses 1 of many possible configurations); Java calls; J2SE; JNI; JCL natives; calls to C libraries; native applications; virtual machine (class loader, interpreter, exception handler, garbage collector, thread model); pluggable components that dynamically load into the virtual machine (JVM profiler, debugger, realtime profiler, JIT); port library (file IO, sockets, memory allocation, etc.); OS-specific calls; operating system.]

JVM Basics: Class loader basics
• Uses a parent-delegation model to allow additional security and to let users implement their own class loaders.
• Loads classes in a hierarchical manner by delegating to the current class loader’s parent recursively.
What does the class loader do?
• Loading – Obtains the byte code from the Java class file.
• Verification – Validates that the code inside the class file does not attempt to perform any illegal operations in the JVM and that the class file is well formed and not corrupt.
• Preparation – Allocates and initializes storage space for static class fields. Also creates method tables and object templates.
• Initialization – Executes the class’s initialization method and initializes static fields if your Java class has them.

JVM Basics: JIT Basics
• The just-in-time (JIT) compiler is not really part of the JVM but is essential for a high-performing Java application.
• Java is Write Once Run Anywhere, so it is interpreted by nature and without the JIT could not compete with native code applications.
• The JIT works by compiling byte code loaded by the class loader when it is accessed by an application.
  – Because different platforms have different JITs, there is no standard rule for when a method is compiled.
  – As your code accesses methods, the JIT determines how frequently specific methods are accessed and quickly compiles those that are touched often to optimize performance.

Overview of IBM’s J9 JVM: Background
• IBM Toronto Laboratory has a long history (30+ years) of expertise in programming language compilation and optimization technologies.
  – C, C++, Fortran, XML Parser, HPJ (statically compiled Java)
  – Language-independent, interprocedural optimizers
  – Parallelization technology
  – Low-level compiler backends: optimizers, linkers, and code generators
  – Dynamic compilation: Java just-in-time (JIT) compilers
• Close relationships with:
  – Research: productizing innovative ideas and experimental technologies.
  – Hardware: understanding how to achieve the best possible performance with the underlying system and processor.
  – IBM Middleware: performance analysis
• Deep IP portfolio
  – The Java JIT group alone filed 14 U.S. patents in 2004 and 6 in 2005.

Overview of IBM’s J9 JVM: What is the J9 JVM?
• Sun IP-free, but Java 2 (1.3) compliant (J2ME) and J2SE (1.4.2, 5.0) compliant
• Highly configurable class library implementation
• Multi-platform
  – PowerPC, IA32, x86-64, and 390 (Linux or z/OS)
  – More applications than the above outside of the middleware space
• Flexible and sophisticated technology oriented to:
  – Performance (throughput and application startup)
  – Scalability
  – Reliability and Serviceability (RAS)

Overview of IBM’s J9 JVM: Scalability
• Garbage collector enhancements
  – Incorporates generational garbage collection for the first time
• Fine-grained locking of VM data structures
• Asynchronous compilation
  – Compilation of Java methods proceeds on a background thread
  – Other application threads do not have to wait to execute the method
  – Improves startup time of heavily multithreaded applications on SMPs
• Compile-time optimizations to remove contention
  – Escape analysis, lock coarsening, …
  – Architectural support to limit its effect
• Superior JIT (just-in-time) compiler
  – Multiple optimization methods, from application profiling to more intelligent and better code optimization algorithms

Overview of IBM’s J9 JVM: Key Highlights for WAS
• Superior Java application execution performance
• Just-In-Time (JIT) compiler technology
  – Far improved over JDK 1.4.2 and Sun’s JIT
  – Maximized performance with minimized runtime overhead
    – Multiple optimization levels, multiple recompilations of the same method, many new optimizations
    – Dynamic observation of the execution of code via profiling to aggressively improve hot code
    – Interpreter profiling to adapt the compilation of methods (block reordering, loop unrolling, etc.)
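
The JIT behaviour described above can be observed from inside a running JVM. The following is a minimal sketch, not part of the original talk: it uses the standard `java.lang.management.CompilationMXBean` (available since Java 5) to report the JIT compiler's name and, where the JVM supports it, the accumulated compilation time. The class name `JitInfo` and the message wording are illustrative assumptions.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (name chosen for illustration only).
final class JitInfo {

    /** Returns a one-line description of the JIT compiler of the current JVM. */
    static String describeJit() {
        // The platform returns null when the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported (interpreter only)";
        }
        String text = "JIT compiler: " + jit.getName();
        // Compilation time monitoring is optional, so check before asking.
        if (jit.isCompilationTimeMonitoringSupported()) {
            text += ", total compile time so far: "
                    + jit.getTotalCompilationTime() + " ms";
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describeJit());
    }
}
```

Printing this value at startup and again after a load test gives a rough idea of how much time the JIT spent compiling hot methods; the exact numbers depend on the JVM vendor and version.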
Memory Management / Garbage Collection: Overview
• Garbage collection (GC) is the main cause of memory-related performance bottlenecks in Java.
• Two things to look at in GC: frequency and duration
  – Frequency depends on the heap size and allocation rate
  – Duration depends on the heap size and Java heap footprint
• GC algorithm – it is critical to understand how it works so that tuning is done more intelligently.
• How do you eliminate GC bottlenecks?
  – Minimize the use of objects by following good programming practices
  – Set your heap size properly; memory-tune your JVM

Memory Management / Garbage Collection: JVM Memory Layout
[Figure: Typical JVM Architecture. Labels: Java heap (application classes and objects; WAS classes and objects; JVM runtime classes, extension classes and objects); JVM native heap; JVM stack segment; JVM data segment; JVM native code (interpreter, verifier, JIT compiler, memory manager, etc.); JVM OS interface; operating system.]

Memory Management / Garbage Collection: Total Application Memory Footprint
[Figure: stacked footprint diagram. Layers: Application Footprint (application classes and objects, application resources, etc.); WAS Footprint; JVM Footprint (native implementation, e.g. C++: data structures, code, runtime artifacts, OS interface).]
• Fixed cost
  – WebSphere runtime
  – XML parser
  – ORB, JCE security
  – JMX
  – Classloaders, etc.
• User-controlled
  – Thread pool
  – Connection pool
  – Monitoring/logging
  – EJB cache (growable?)
  – Other caches (prepared statement, security, etc.)
• Application dependent
  – Number of classes
  – Dynacache
  – Security
  – Resources
  – HTTP session size
  – WLM

Memory Management / Garbage Collection: What factors affect memory performance the most
• Memory management – how efficiently does the system manage memory?
• Total available memory – is there enough memory to satisfy every request for memory?
• Allocation rate – how often does the application request memory?
• Object size – how big are these objects?
• Object lifetime – how long do these objects stay reserved by the application?

Memory Management / Garbage Collection: Parallel vs. Concurrent Collectors
• Parallel collectors – two or more threads run at the same time to perform garbage collection
  – Still use the “stop-the-world” model, but instead of only one GC thread there are helper threads as well.
• Concurrent collectors – collector threads are triggered to run while applications are running
  – Avoid long “stop-the-world” pauses, but application threads can be asked to perform garbage collection work once in a while

Memory Management / Garbage Collection: What garbage collection algorithms are available on my JDK?
IBM J9 JDK platforms
• Memory management is configurable using four different policies with varying characteristics
  1. Optimize for Throughput – flat heap collector focused on maximum throughput
  2. Optimize for Pause Time – flat heap collector with concurrent mark and sweep to minimize GC pause time
  3. Subpool – a flat heap technique to help increase performance on multiprocessor systems, commonly with more than 8 processors. Available on IBM pSeries™ and zSeries™
  4. Generational Concurrent – divides the heap into “nursery” and “tenured” segments, providing fast collection for short-lived objects. Can provide maximum throughput with minimal pause times
Sun/HP JDK 5.0 platforms
• The garbage collector is always generational, but the implementation is chosen out of the box based on the class of system
  1. Serial – Collects objects one at a time in both new and old generations
  2. Throughput – Uses a parallel model for collecting objects in the new generation
  3. Concurrent – Uses parallel collection in the new generation and concurrent collection in the old.
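
To see which collectors a given JVM actually selected, and to get a first look at the frequency and duration figures discussed earlier, the standard `java.lang.management.GarbageCollectorMXBean` API can be queried. This is a minimal sketch under stated assumptions: the class name `GcReport` is hypothetical, collector names differ by vendor and policy, and the reported time is the accumulated collection time, which is only an approximation of pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (name chosen for illustration only).
final class GcReport {

    /** One line per collector: name, collection count, total and average time. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if undefined for this collector
            long totalMs = gc.getCollectionTime();  // -1 if undefined for this collector
            String avg = (count > 0 && totalMs >= 0)
                    ? String.format("%.1f", (double) totalMs / count)
                    : "n/a";
            lines.add(gc.getName() + ": collections=" + count
                    + ", totalMs=" + totalMs + ", avgMs=" + avg);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Calling `describeCollectors()` before and after a stress test and comparing the counts gives the GC frequency over that interval; the average time gives a rough duration per collection.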
Memory Management / Garbage Collection: How the IBM Mark and Sweep Garbage Collector Works
[Figure: heap diagram. Labels: Thread A and Thread B, each with a stack and a thread local heap; global heap with used heap areas and wilderness; heap lock; garbage collector; system heap (JDK 1.4.2).]

Memory Management / Garbage Collection: How the IBM J9 Generational and Sun/HP Garbage Collectors Work
JVM heap regions and the options that size them:
• Nursery/Young Generation
  – IBM J9: -Xmn (-Xmns/-Xmnx)
  – Sun: -XX:NewSize=nn -XX:MaxNewSize=nn -Xmn<size>
• Old Generation
  – IBM J9: -Xmo (-Xmos/-Xmox)
  – Sun: -XX:NewRatio=n
• Permanent Space
  – Sun JVM only: -XX:MaxPermSize=nn
• Minor collection – takes place only in the young generation; normally done through direct copying, which is very efficient
• Major collection – takes place in the old generation and uses the normal mark and sweep algorithm

Memory Management / Garbage Collection: Fragmentation
• Java objects in the heap are in most cases movable
  – In other words, they are not tied to a single location in memory
• Some objects in the heap, however, cannot be moved, either permanently or temporarily
  – Known as “pinned objects”
• What does J9 do to prevent fragmentation?
  – With the addition of new garbage collection strategies and a new runtime memory management unit, pinned objects can be moved during compaction and are accounted for much better in JDK 5.0, nearly eliminating the fragmentation problem seen in JDK 1.4.2.

Runtime Performance Tuning: Overview
• Tuning the JVM properly is a process that takes time and must be tailored to your application.
• HOWEVER, you can typically get 80% of the maximum performance with 20% of the work by making good choices on a few key settings.
• To truly extract maximum performance from your application, you must know your application’s memory allocation and runtime needs.
• The JVM must be tuned in two iterative steps over a testing cycle
  – Step 1: Heap size tuning
  – Step 2: Applying runtime optimization
• Applying these two steps repeatedly will lead you to a JVM tuned for your application

Runtime Performance Tuning: Key Parameters
• The key setting for the IBM JVM that affects performance most on all Java applications, and that should get you near 80% of your maximum performance if set correctly, is:
  – Heap size (-Xms / -Xmx)
  – Ensure that you set your minimum and maximum to values that are under your physical memory limit but allow a substantially large interval between GCs
    – Typical low-end bound on GC frequency is one GC every 10 sec
    – Typical high-end bound on GC duration is 1-2 sec
• For the Sun/HP JVM, a lot more work is required to get optimal performance than just tuning the heap size, as you need to tune the garbage collector and runtime as well
  – A JVM setting introduced in JDK 1.4.1 has shown promise on Sun for automatically tuning the rest of the heap settings for your machine
    – -XX:+AggressiveHeap is issued at the command line, and it makes decisions on GC algorithms, young/old generation spaces, and other resources to use.
  – One must also pass the -server parameter to the Sun/HP JVMs to get them to run in their highest performing mode.

Runtime Performance Tuning: What GC Policy should I choose for the J9 JVM?
• I want my application to run to completion as quickly as possible.
  – -Xgcpolicy:optthruput
• My application requires good response time to unpredictable events.
  – -Xgcpolicy:optavgpause
• My application has a high allocation and death rate.
  – -Xgcpolicy:gencon
• My application is running on big metal and has high allocation rates on many threads.
  – -Xgcpolicy:subpool

Runtime Performance Tuning: Real world examples
• Some WebSphere applications perform better with generational; however, some applications degrade in performance.
• Customers may still be interested in generational if it delivers lower GC pause times.
[Figure: two bar charts comparing optthruput and gencon, one titled “WebSphere 6.1 - Trade 6” and one titled “WebSphere 6.1 - SPECjAppServer”; vertical axis from 0 to 120.]
• Numbers are approximate and only intended to show the general behaviour seen when running Trade6 compared to SPECjAppServer

Runtime Performance Tuning: Other IBM JVM Tuning Parameters
-Xgcthreads<n>        - number of GC threads (default is n-1 for n processors)
-Xnoclassgc           - turns off class garbage collection (default is false)
-Xnocompactgc         - turns off compaction, which can lead to fragmentation (default is false)
-Xoss<size>           - set the max Java stack size of any thread
-Xss<size>            - set the max native stack size of any thread
-Xlp                  - enables large page support on supported operating systems
-Xdisableexplicitgc   - turns System.gc() calls into no-ops
-Xifa:<on|off|force>  - enables the Java code to run on z/OS zAAP processors
-Xmaxe / -Xmine       - sets the maximum or minimum expansion unit during allocation

Runtime Performance Tuning: What GC Policy should I choose for the Sun JVM?
• I want my application to run concurrently with a lot of other JVMs (hoteling).
  – Use the default serial collector, as the GC algorithm is single threaded
• I need my application to perform well on a large number of processors.
  – -XX:+UseParallelGC
• I need my application to return near-constant response times on machines that have a large number of processors.
  – -XX:+UseConcMarkSweepGC
• I need my application to return near-constant response times on machines that have a small number of processors.
  – -XX:+UseTrainGC

Runtime Performance Tuning: Other Sun/HP JVM Tuning Parameters
-Xincgc                      - incremental GC, uses the Train algorithm (default is disabled)
-Xnoclassgc                  - disable class garbage collection
-Xss                         - set the stack size of each thread (512K)
-XX:+DisableExplicitGC       - no System.gc() will be executed
-XX:TargetSurvivorRatio      - sets threshold in survivor space for promotion to kick in
-XX:+UseAdaptiveSizePolicy   - JVM determines good size for Eden, Survivor Spaces
-XX:+UseISM                  - allows for bigger pages (4MB)
-XX:+UseMPSS                 - (used only for Solaris 9)
-XX:+AggressiveHeap          - maximizes heap size and algorithms for speed
-Xoptgc                      - optimizes GC in Young Generation (HP only)

Runtime Performance Tuning: How to tune a generational GC setup – General
• We need to consider the respective sizes of the nursery and the tenured space.
• Two approaches
  – Dynamic
    – Specify the minimum and maximum heap size (e.g. -Xms512m -Xmx1024m) and, in the Sun JDK case, -XX:+AggressiveHeap
    – The JVM will dynamically size the nursery and tenured space.
    – May not give optimal performance
    – Could be good for low response times.
  – Fixed
    – Be more specific about the nursery and/or tenured space sizes.
    – Recommended approach for performance-sensitive, server-side applications.

Runtime Performance Tuning: How to tune a generational GC setup – Setting the tenured/old space
• The tenured space must be large enough to hold all persistent data of the application.
  – Too small a space will cause excessive GC or even out-of-memory conditions.
  – For a typical WebSphere Application Server application this is ~100-400 MB.
• One way to determine the tenured space size is to look at the amount of free heap that exists after each GC in default mode
  – %free heap x total heap size
• Analyse verbosegc to understand how frequently the tenured space gets collected.
  – An optimal generational application will never have a collection in the tenured space.
  – In the lab, some WAS applications collect every ~15 min.
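
The “%free heap x total heap size” arithmetic can be worked through with a few lines of Java. This is only a sketch under stated assumptions: the class and method names are hypothetical, and it assumes that the persistent data to be held in tenured space is approximately the heap that is still occupied after a collection (total minus free), which the slide does not state explicitly. The figures in `main` are borrowed from the tenured line of the J9 verbose:gc sample elsewhere in this deck purely to have concrete numbers.

```java
// Hypothetical helper (names chosen for illustration only).
final class TenuredEstimate {

    /** Free bytes implied by a "percent free" figure from verbose GC output. */
    static long freeBytes(long totalBytes, double percentFree) {
        return Math.round(totalBytes * percentFree / 100.0);
    }

    /** Bytes still occupied after a collection: a rough proxy for persistent data. */
    static long occupiedBytes(long totalBytes, long freeBytesAfterGc) {
        return totalBytes - freeBytesAfterGc;
    }

    public static void main(String[] args) {
        long total = 209715200L;  // totalbytes reported for the heap area
        long free = 67508056L;    // freebytes reported after the collection
        long occupied = occupiedBytes(total, free);
        System.out.println("free after GC: " + free + " bytes ("
                + (100 * free / total) + "% of the heap area)");
        System.out.println("occupied after GC: " + occupied + " bytes");
    }
}
```

Repeating the calculation over many GC cycles at steady state, and taking the largest occupied value plus some headroom, gives a starting point for -Xmo (IBM J9) or the old generation size (Sun/HP); the result should then be validated with a stress test.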
Runtime Performance Tuning: How to tune a generational GC setup – Setting the nursery/new generation space
• Large nursery: “good for throughput”
• Small nursery: “good for low pause times”
• Good WebSphere performance (throughput) requires a reasonably large nursery.
  – A good starting point would be 512 megabytes.
  – Move up or down to determine the optimal value.
    – Measure throughput and/or response times
• Analyse verbosegc to understand the frequency and length of scavenges.

Tuning: Heap size options
• Fix both nursery and tenured space
  – -XmnAm -XmoBm
  [Figure: nursery spanning 0 to A, tenured spanning 0 to B]
• Allow them to expand/contract
  – -XmnsAm -XmnxBm -XmosBm -XmoxCm
  [Figure: size markers A, B, C]

Runtime Performance Tuning: Process for tuning heap settings
[Flowchart: Start -> Set your performance requirements -> Give your best estimate for mx and ms -> Stress test your application -> Analyze GC behavior -> GC profile is good? Yes: Done. No: adjust mx and/or ms and possibly switch GC policy (profile objects if needed), then stress test again.]
Tips
• Run your application, analyze heap usage, and determine the steady state. Set your ms to the steady state.
• Make sure your heap never pages. Monitor your paging activity.
• A rule of thumb is to keep 30% of your heap free most of the time.

Runtime Performance Tuning: Process for other runtime tuning settings
[Flowchart: Start -> Set your performance requirements -> Set your baseline tuning parameters -> Apply new tuning parameter to runtime -> Stress test your application -> Analyze runtime behavior -> Is throughput as expected? Yes: Done. No: remove the tuning parameter if it had a negative effect, then apply another.]
Tips
• Measure throughput during the steady state of the benchmark to ensure consistent results.
• Some tuning parameters will affect performance negatively, as they might be targeted at an application with different runtime characteristics than your application.
• Add only one tuning parameter at a time to measure its impact alone.

Debugging Tools: Garbage Collection Debugging/Analysis Tools (Verbose:GC)
• Your most indispensable tool, directly from the JVM runtime
• Enabled by issuing -verbose:gc on the java command line
• Pros
  – Can give a lot of detailed low-level information for serious debugging; enough for initial investigation
  – Readily available, and it is free
• Cons
  – Have to restart your server
  – Not suitable for production environments
  – Does not give object-level information for further analysis

Runtime Performance Tuning: Verbose:GC from J9
<af type="nursery" id="35" timestamp="Thu Aug 11 21:47:11 2005" intervalms="10730.361">
  <minimum requested_bytes="144" />
  <time exclusiveaccessms="1.193" />
  <nursery freebytes="0" totalbytes="1226833920" percent="0" />
  <tenured freebytes="68687704" totalbytes="209715200" percent="32" >
    <soa freebytes="58201944" totalbytes="199229440" percent="29" />
    <loa freebytes="10485760" totalbytes="10485760" percent="100" />
  </tenured>
  <gc type="scavenger" id="35" totalid="35" intervalms="10731.054">
    <flipped objectcount="1059594" bytes="56898904" />
    <tenured objectcount="12580" bytes="677620" />
    <refs_cleared soft="0" weak="691" phantom="39" />
    <finalization objectsqueued="1216" />
    <scavenger tiltratio="90" />
    <nursery freebytes="1167543760" totalbytes="1226833920" percent="95" tenureage="14" />
    <tenured freebytes="67508056" totalbytes="209715200" percent="32" >
      <soa freebytes="57022296"
        totalbytes="199229440" percent="28" />
      <loa freebytes="10485760" totalbytes="10485760" percent="100" />
    </tenured>
    <time totalms="368.309" />
  </gc>
  <nursery freebytes="1167541712" totalbytes="1226833920" percent="95" />
  <tenured freebytes="67508056" totalbytes="209715200" percent="32" >
    <soa freebytes="57022296" totalbytes="199229440" percent="28" />
    <loa freebytes="10485760" totalbytes="10485760" percent="100" />
  </tenured>
  <time totalms="377.634" />
</af>
Callouts on the output, from top to bottom:
• Allocation request details, time it took to stop all mutator threads.
• Heap occupancy details before GC.
• Details about the scavenge.
• Heap occupancy details after GC.

Debugging Tools: Garbage Collection Debugging/Analysis Tools – Sun/HP JVM verbose:gc output
-verbose:gc -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Example:
0.0000013: [Full GC 0.0005366: [Tenured: 0K->4185K(1380352K), 0.3102502 secs] 62984K->4185K(2057344K), 0.3103787 secs]
236.661: [GC 236.661: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 16817808 bytes, 16817808 total
- age 2: 20124840 bytes, 36942648 total
: 630283K->36076K(657088K), 0.7287377 secs] 666617K->72411K(2037440K), 0.7289491 secs]
262.697: [GC 262.697: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 15971824 bytes, 15971824 total
- age 2: 3806192 bytes, 19778016 total
- age 3: 18963992 bytes, 38742008 total
: 633452K->37833K(657088K), 0.6451270 secs] 669787K->74168K(2037440K), 0.6453326 secs]
286.232: [GC 286.233: [DefNew
Desired survivor size 61145088 bytes, new threshold 31 (max 31)
- age 1: 17242304 bytes, 17242304 total
- age 2: 5131296 bytes, 22373600 total
- age 3: 2684464 bytes, 25058064 total
- age 4: 18728192 bytes, 43786256 total
: 635209K->42760K(657088K), 0.7164103 secs] 671544K->79094K(2037440K), 0.7166029 secs]

Debugging Tools: IBM JDK Debugging/Analysis Tools
• Thread dumps
  – Available on all JVMs by issuing kill -3 <pid> on the command line, where <pid> is your server’s
    process ID
  – In essence, a snapshot in time of what your system is executing.
  – Used to debug and find where threads are spending time, or are hung, in your system
• Heap dump
  – Can be enabled to occur with a thread dump by setting the following JVM properties
    – Click on Application Server -> server1 -> Process definition -> custom properties
    – Enter Name = IBM_HEAPDUMP, Value = true
    – Enter Name = IBM_JAVA_HEAPDUMP_TEXT, Value = true (this enables generating the heap dump in text format, which can be analyzed using HeapRoots)
  – Can be analyzed using HeapRoots at http://www.alphaworks.ibm.com/tech/heaproots

Debugging Tools: IBM JDK Debugging/Analysis Tools
• Class loader runtime diagnostics
  – -verbose:class – gives you information about which classes are loaded
  – -Dibm.cl.verbose=<name> – gives you specific information about how the JVM attempts to load the class name you specify.
• Runtime performance analysis
  – A variety of third-party tools will hook up to the IBM JVM to provide runtime-level profiling
    – JProbe, JProfiler, etc.
  – HPROF is built into the JDK as a profiler. It is limited in function but still good for debugging simple unit-test-case performance issues.

A few VERY useful URLs
• http://www-106.ibm.com/developerworks/java/jdk/diagnosis/
  – Contains all the diagnostic guides for our JVMs
  – PDF on GC and memory usage
• http://java.sun.com/docs/performance
  – Contains a large amount of documentation and tuning for the Sun JVM
  – Reference to all Sun JVM flags as well as an explanation of them
• http://www.hp.com/products1/unix/java/infolibrary/index.html
  – Wealth of information on tuning and configuring the HP-UX JVM

Thank you
Any questions?
Backup and Extras

JVM Basics: The high level JVM Building Blocks – Part 1
• Core Interface (CI) – Encapsulates all interactions with user, external programs, and operating environment
• Execution Management (XM) – Provides process control and management
  – Threading engine resides here
• Execution Engine (XE) – Provides all methods of executing Java byte code, both compiled and interpreted.
[Figure: IBM JVM block diagram showing Core Interface (CI), Execution Management (XM), Execution Engine (XE)]

JVM Basics: The high level JVM Building Blocks – Part 2
• Diagnostics (DG) – Encapsulates all debug and diagnostic services in the JVM
  – Tracing, FFDC, RAS, Debug APIs
• Class Loader (CL) – Provides support for loading and unloading of Java binaries
  – Performs loading, validation, initialization, and implements methods for reflection APIs
• Data Conversion – Supports …
[Figure: IBM JVM block diagram showing Core Interface (CI), Execution Management (XM), Execution Engine (XE), Diagnostics (DG), Class Loader (CL), Data Conversion]

JVM Basics: The high level JVM Building Blocks – Part 3
• Lock (LK) – Provides locking and synchronization services
• Storage (ST) – Encompasses all support for storage services the JVM needs
  – Heap management and allocation strategies
• HPI – A set of well-defined functions that provide low-level facilities and services in a platform-neutral way.
  – This interface is defined by HPI
[Figure: IBM JVM block diagram showing Core Interface (CI), Execution Management (XM), Execution Engine (XE), Diagnostics (DG), Data Conversion, Class Loader (CL), Lock (LK), Storage (ST), HPI]

Memory Management / Garbage Collection: How the Sun/HP Garbage Collector Works – Part 2
[Figure: heap layout. Young Generation (Eden, Survivor Space, Survivor Space), Old Generation, Permanent Space. Options shown: -XX:SurvivorRatio=nn, -XX:MaxTenuringThreshold=nn]
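
As a worked example of how -XX:SurvivorRatio divides the young generation in the layout above, the sketch below assumes the usual Sun JVM meaning of the option: the ratio of Eden to one survivor space, with two survivor spaces in the young generation. The class and method names are hypothetical, and a real JVM rounds these sizes to its own alignment, so treat the results as approximate.

```java
// Hypothetical helper (names chosen for illustration only).
final class YoungGenLayout {

    /** Size of ONE survivor space: young = eden + 2 * survivor, eden = ratio * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Size of Eden: whatever is left after the two survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 640L * 1024 * 1024;  // example young generation of 640 MB
        int ratio = 8;                    // as in -XX:SurvivorRatio=8
        System.out.println("survivor (each): " + survivorBytes(young, ratio) / (1024 * 1024) + " MB");
        System.out.println("eden: " + edenBytes(young, ratio) / (1024 * 1024) + " MB");
    }
}
```

With a 640 MB young generation and a ratio of 8, this gives two 64 MB survivor spaces and a 512 MB Eden. A smaller ratio enlarges the survivor spaces, which gives objects more room to age in the young generation before -XX:MaxTenuringThreshold forces promotion.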
"JVM Internals - PDF"