Plugging memory leaks with weak references

Document Sample
Plugging memory leaks with weak references Powered By Docstoc
					Plugging memory leaks with weak references
For garbage collection (GC) to reclaim objects no longer in use by the program, the logical lifetime of an object (the time that the application will use it) and the actual lifetime of outstanding references to that object must be the same. Most of the time, good software engineering techniques ensure that this happens automatically, without us having to expend a lot of attention on the issue of object lifetime. But every once in a while, we create a reference that holds an object in memory much longer than we expected it to, a situation referred to as unintentional object retention. Memory leaks with global Maps The most common source of unintentional object retention is the use of a Map to associate metadata with transient objects. Let's say you have an object with an intermediate lifetime -- longer than that of the method call that allocated it, but shorter than that of the application -- such as a socket connection from a client. You want to associate some metadata with that socket, such as the identity of the user that has made the connection. You don't know this information at the time the Socket is created, and you cannot add data to the Socket object because you do not control the Socket class or its instantiation. In this case, the typical approach is to store such information in a global Map, as shown in the SocketManager class in Listing 1:

Listing 1. Using a global Map to associate metadata with an object
public class SocketManager { private Map<Socket,User> m = new HashMap<Socket,User>(); public void setUser(Socket s, User u) { m.put(s, u); } public User getUser(Socket s) { return m.get(s); } public void removeUser(Socket s) { m.remove(s); } } SocketManager socketManager; ... socketManager.setUser(socket, user);

The problem with this approach is that the lifetime of the metadata needs to be tied to the lifetime of the socket, but unless you know exactly when the socket is no longer needed by the program and remember to remove the corresponding mapping from the Map, the Socket and User objects will stay in the Map forever, long after the request has been serviced and the socket has been closed. This prevents the Socket and User objects from being garbage collected, even though the application is never going to use either of them again. Left unchecked, this could easily cause a program to run out of memory if it runs for long enough. In all but the most trivial cases, the techniques for identifying when the Socket becomes no longer used by the program resemble the annoying and error-prone techniques required for manual memory management. Identifying memory leaks The first sign that your program has a memory leak is usually that it throws an OutOfMemoryError or starts to exhibit poor performance because of frequent garbage collection. Fortunately, the garbage collector is willing to share a lot of information that can be used to diagnose a memory leak. If you invoke the JVM with the -verbose:gc or the -Xloggc option, a diagnostic message is printed on the console or to a log file every time the GC runs, including how long it took, the current heap usage, and how much memory was recovered. Logging GC usage is not intrusive, and so it is reasonable to enable GC logging in production by default in the event you ever need to analyze memory problems or tune the garbage collector. Tools can take the GC log output and display it graphically; one such tool is the free JTune (see Resources). By looking at the graph of heap size after GC, you can see the trend of your program's memory usage. For most programs, you can divide memory usage into two components: the baseline usage and the current load usage. For a server application, the baseline usage is what the application uses when it is not subjected to any load but is ready to accept requests; the current load usage is what is used in the process of handling requests but released when the request processing is complete. So long as load is roughly constant, applications typically reach a steady state level of memory usage fairly quickly. If memory usage continues to trend upwards even though the application has completed its initialization and its load is not increasing, the program is probably retaining objects generated in the course of processing prior requests. Listing 2 shows a program that has a memory leak. MapLeaker processes tasks in a thread pool and records the status of each task in a Map. Unfortunately, it never removes the entry when the task is finished, so the status entries and the task objects (along with their internal state) accumulate forever.

Listing 2. Program with a Map-based memory leak
public class MapLeaker { public ExecutorService exec = Executors.newFixedThreadPool(5);

public Map<Task, TaskStatus> taskStatus = Collections.synchronizedMap(new HashMap<Task, TaskStatus>()); private Random random = new Random(); private enum TaskStatus { NOT_STARTED, STARTED, FINISHED }; private class Task implements Runnable { private int[] numbers = new int[random.nextInt(200)]; public void run() { int[] temp = new int[random.nextInt(10000)]; taskStatus.put(this, TaskStatus.STARTED); doSomeWork(); taskStatus.put(this, TaskStatus.FINISHED); } } public Task newTask() { Task t = new Task(); taskStatus.put(t, TaskStatus.NOT_STARTED); exec.execute(t); return t; } }

Figure 1 shows a graph of the application heap size after GC for MapLeaker over time. The upward-sloping trend is a telltale sign of a memory leak. (In real applications, the slope is never this dramatic, but it usually becomes apparent if you gather GC data for long enough.)

Figure 1. Persistent upward memory usage trend

Once you are convinced you have a memory leak, the next step is to find out what type of objects are causing the problem. Any memory profiler can produce snapshots of the heap broken down by object class. There are some excellent commercial heap profiling tools available, but you don't have to spend any money to find memory leaks -- the built-in hprof tool can also do the trick. To use hprof and instruct it to track memory usage, invoke the JVM with the -Xrunhprof:heap=sites option. Listing 3 shows the relevant portion of the hprof output breaking down the application's memory usage. (The hprof tool produces a usage breakdown after the application exits, or when you signal your application with a kill -3 or by pressing Ctrl+Break on Windows.) Notice that there has been significant growth in Map.Entry, Task, and int[] objects between the two snapshots. See Listing 3. Listing 4 shows yet another portion of the hprof output, giving call stack information for allocation sites for Map.Entry objects. This output tells us which call chains are generating the Map.Entry objects; with some program analysis, it is usually fairly easy to pinpoint the source of the memory leak.

Listing 4. HPROF output showing allocation site for Map.Entry objects
TRACE 300446: java.util.HashMap$Entry.<init>(<Unknown Source>:Unknown line) java.util.HashMap.addEntry(<Unknown Source>:Unknown line) java.util.HashMap.put(<Unknown Source>:Unknown line) java.util.Collections$SynchronizedMap.put(<Unknown Source>:Unknown line) com.quiotix.dummy.MapLeaker.newTask(MapLeaker.java:48) com.quiotix.dummy.MapLeaker.main(MapLeaker.java:64)

Weak references to the rescue The problem with SocketManager is that the lifetime of the Socket-User mapping should be matched to the lifetime of the Socket, but the language does not provide any easy means to enforce this rule. This forces the program to fall back on techniques that resemble manual memory management. Fortunately, as of JDK 1.2, the garbage collector provides a means to declare such object lifecycle dependencies so that the garbage collector can help us prevent this sort of memory leak -- with weak references. A weak reference is a holder for a reference to an object, called the referent. With weak references, you can maintain a reference to the referent without preventing it from being garbage collected. When the garbage collector traces the heap, if the only outstanding references to an object are weak references, then the referent becomes a candidate for GC as if there were no outstanding references, and any outstanding weak references are

cleared. (An object that is only referenced by weak references is called weakly reachable.) The referent of a WeakReference is set at construction time, and its value, if it has not yet been cleared, can be retrieved with get(). If the weak reference has been cleared (either because the referent has already been garbage collected or because someone called WeakReference.clear()), get() returns null. Accordingly, you should always check that get() returns a non-null value before using its result, as it is expected that the referent will eventually be garbage collected. When you copy an object reference using an ordinary (strong) reference, you constrain the lifetime of the referent to be at least as long as that of the copied reference. If you are not careful, this can be the lifetime of your program -- such as when you place an object in a global collection. On the other hand, when you create a weak reference to an object, you do not extend the lifetime of the referent at all; you simply maintain an alternate way to reach it while it is still alive. Weak references are most useful for building weak collections, such as those that store metadata about objects only for as long as the rest of the application uses those objects -which is exactly what the SocketManager class is supposed to do. Because this is such a common use for weak references, WeakHashMap, which uses weak references for keys (but not for values), was also added to the class library in JDK 1.2. If you use an object as a key in an ordinary HashMap, that object cannot be collected until the mapping is removed from the Map; WeakHashMap allows you to use an object as a Map key without preventing that object from being garbage collected. Listing 5 shows a possible implementation of the get() method from WeakHashMap, showing the use of weak references:

Listing 5. Possible implementation of WeakReference.get()
public class WeakHashMap<K,V> implements Map<K,V> { private static class Entry<K,V> extends WeakReference<K> implements Map.Entry<K,V> { private V value; private final int hash; private Entry<K,V> next; ... } public V get(Object key) { int hash = getHash(key); Entry<K,V> e = getChain(hash); while (e != null) { K eKey= e.get(); if (e.hash == hash && (key == eKey || key.equals(eKey))) return e.value;

e = e.next; } return null; }

When WeakReference.get() is called, it returns a strong reference to the referent (if it is still alive), so there is no need to worry about the mapping disappearing in the body of the while loop because the strong reference keeps it from being garbage collected. The implementation of WeakHashMap illustrates a common idiom with weak references -- that some internal object extends WeakReference. The reason for this becomes clear in the next section when we discuss reference queues. When you add a mapping to a WeakHashMap, remember that it is possible that the mapping could "fall out" later because the key is garbage collected. In that case, get() returns null, making it even more important than usual to test the return value of get() for null. Plugging the leak with WeakHashMap Fixing the leak in SocketManager is easy; simply replace the HashMap with a WeakHashMap, as shown in Listing 6. (If SocketManager needs to be thread-safe, you can wrap the WeakHashMap with Collections.synchronizedMap()). You can use this approach whenever the lifetime of the mapping must be tied to the lifetime of the key. However, you should be careful not to overuse this technique; most of the time an ordinary HashMap is the right Map implementation to use.

Listing 6. Fixing SocketManager with WeakHashMap
public class SocketManager { private Map<Socket,User> m = new WeakHashMap<Socket,User>(); public void setUser(Socket s, User u) { m.put(s, u); } public User getUser(Socket s) { return m.get(s); } }

Reference queues
WeakHashMap

uses weak references for holding map keys, which allows the key objects to be garbage collected when they are no longer used by the application, and the get() implementation can tell a live mapping from a dead one by whether

WeakReference.get() returns null. But this is only half of what is needed to keep a Map's memory consumption from increasing throughout the lifetime of the application; something must also be done to prune the dead entries from the Map after the key object has been collected. Otherwise, the Map would simply fill up with entries corresponding to

dead keys. And while this would be invisible to the application, it could still cause the application to run out of memory because the Map.Entry and value objects would not be collected, even if the key is. Dead mappings could be eliminated by periodically scanning the Map, calling get() on each weak reference, and removing the mapping if get() returns null. But this would be inefficient if the Map has many live entries. It would be nice if there was a way to be notified when the referent of a weak reference is garbage collected, and that is what reference queues are for. Reference queues are the garbage collector's primary means of feeding back information to the application about object lifecycle. Weak references have two constructors: one takes only the referent as an argument and the other also takes a reference queue. When a weak reference has been created with an associated reference queue and the referent becomes a candidate for GC, the reference object (not the referent) is enqueued on the reference queue after the reference is cleared. The application can then retrieve the reference from the reference queue and learn that the referent has been collected so it can perform associated cleanup activities, such as expunging the entries for objects that have fallen out of a weak collection. (Reference queues offer the same dequeuing modes as BlockingQueue -- polled, timed blocking, and untimed blocking.)
WeakHashMap has a private method called expungeStaleEntries() that is called during most Map operations, which polls the reference queue for any expired references and

removes the associated mappings. A possible implementation of expungeStaleEntries() is shown in Listing 7. The Entry type, which is used to store the key-value mapping, extends WeakReference, so when expungeStaleEntries() asks for the next expired weak reference, it gets back an Entry. Using reference queues to clean up the Map instead of periodically trawling through its contents is more efficient because live entries are never touched in the cleanup process; it only does work if there are actually enqueued references.

Listing 7. Possible implementation of WeakHashMap.expungeStaleEntries()
private void expungeStaleEntries() { Entry<K,V> e; while ( (e = (Entry<K,V>) queue.poll()) != null) { int hash = e.hash; Entry<K,V> prev = getChain(hash); Entry<K,V> cur = prev; while (cur != null) {

Entry<K,V> next = cur.next; if (cur == e) { if (prev == e) setChain(hash, next); else prev.next = next; break; } prev = cur; cur = next; } } }

Conclusion Weak references and weak collections are powerful tools for heap management, allowing the application to use a more sophisticated notion of reachability, rather than the "all or nothing" reachability offered by ordinary (strong) references. Next month, we'll look at soft references, which are related to weak references, and we'll look at the behavior of the garbage collector in the presence of weak and soft references.

Plugging memory leaks with soft references
Garbage collection might make Java programs immune to memory leaks, at least for a sufficiently narrow definition of "memory leak," but that doesn't mean we can totally ignore the issue of object lifetime in Java programs. Memory leaks in Java programs usually arise when we pay insufficient attention to object lifecycle or subvert the standard mechanisms for managing object lifecycle. For example, last time we saw how failing to demarcate an object's lifecycle could cause unintentional object retention when trying to associate metadata with transient objects. There are other idioms that can similarly ignore or subvert object lifecycle management and can also lead to memory leaks. Object loitering A form of memory leak, sometimes called object loitering, is illustrated by the LeakyChecksum class in Listing 1, which provides a getFileChecksum() method to calculate the checksum of a file's contents. The getFileChecksum() method reads the contents of the file into a buffer to compute the checksum. A more straightforward implementation would simply allocate the buffer as a local variable within getFileChecksum(), but this version is more "clever" than that, instead caching the buffer in an instance field to reduce memory churn. This "optimization" often does not

deliver the expected savings; object allocation is cheaper than many give it credit for. (Also note that promoting the buffer from a local variable to an instance variable renders the class no longer thread-safe without additional synchronization; the straightforward implementation would not require getFileChecksum() to be declared synchronized and would offer better scalability when called concurrently.)

Listing 1. Class that exhibits "object loitering"
// BAD CODE - DO NOT EMULATE public class LeakyChecksum { private byte[] byteArray; public synchronized int getFileChecksum(String fileName) { int len = getFileSize(fileName); if (byteArray == null || byteArray.length < len) byteArray = new byte[len]; readFileContents(fileName, byteArray); // calculate checksum and return it } }

This class has plenty of problems, but let's focus on the memory leak. The decision to cache the buffer most likely followed from the assumption that it would be called many times within a program and that it would therefore be more efficient to reuse the buffer rather than reallocate it. But as a result, the buffer is never released because it is always reachable by the program (unless the LeakyChecksum object is garbage collected). Worse, while it can grow, it cannot shrink, so LeakyChecksum permanently retains a buffer as large as the largest file processed. At the very least, this puts pressure on the garbage collector and requires more frequent collections; keeping a large buffer around for the purpose of computing future checksums may not be the most efficient use of available memory. The cause of the problem in LeakyChecksum is that the buffer is logically local to the getFileChecksum() operation, but its lifecycle has been artificially prolonged by promoting it to an instance field. As a result, the class has to manage the lifecycle of the buffer itself rather than letting the JVM do it. Soft references In the previous installment, we saw how weak references can provide an application with an alternate means of reaching an object while it is used by the program, but without prolonging its lifetime. Another subclass of Reference, soft references, fulfills a different but related purpose. Where weak references allow the application to create references that do not interfere with garbage collection, soft references allow the application to enlist the aid of the garbage collector by designating some objects as

"expendable." While the garbage collector does a good job at figuring out what memory the application is using and not using, it is up to the application to decide the most appropriate use for the available memory. If the application makes poor decisions as to what objects to retain, performance can suffer as the garbage collector has to work harder to prevent the application from running out of memory. Caching is a common performance optimization, allowing the application to reuse the results of a previous calculation instead of recalculating it. Caching is a trade-off between CPU utilization and memory usage, and the ideal balance for this trade-off depends on how much memory is available. With too little caching, the desired performance benefit will not be achieved; with too much, performance may suffer because too much memory is being expended on caching and therefore not enough is available for other purposes. Because the garbage collector is generally in a better position than the application to determine the demand for memory, it makes sense to enlist the garbage collector's help in making these decisions, which is what soft references do for us. If the only remaining references to an object are weak or soft references, then that object is said to be softly reachable. The garbage collector does not aggressively collect softly reachable objects the way it does with weakly reachable ones -- instead it only collects softly reachable objects if it really "needs" the memory. Soft references are a way of saying to the garbage collector, "As long as memory isn't too tight, I'd like to keep this object around. But if memory gets really tight, go ahead and collect it and I'll deal with that." The garbage collector is required to clear all soft references before it can throw OutOfMemoryError. We can fix the problems in LeakyChecksum by using a soft reference to manage the cached buffer, as shown in Listing 2. Now the buffer is retained as long as the memory is not badly needed, but it can be reclaimed by the garbage collector if necessary:

Listing 2. Fixing LeakyChecksum with a soft reference
public class CachingChecksum { private SoftReference<byte[]> bufferRef; public synchronized int getFileChecksum(String fileName) { int len = getFileSize(fileName); byte[] byteArray = bufferRef.get(); if (byteArray == null || byteArray.length < len) { byteArray = new byte[len]; bufferRef.set(byteArray); } readFileContents(fileName, byteArray); // calculate checksum and return it } }

A poor man's cache
CachingChecksum

used a soft reference to cache a single object and let the JVM handle the details of when to evict the object from the cache. Similarly, soft references are also often used in GUI applications for caching bitmap graphics. The key to whether a soft reference can be used is whether an application can recover from the loss of the data being cached. If you need to cache more than a single object, you might use a Map, but you have a choice as to how to employ soft references. You could manage the cache as a Map<K, SoftReference<V>> or as a SoftReference<Map<K,V>>. The latter option is usually preferable, as it makes less work for the collector and allows the entire cache to be reclaimed with less effort when memory is in high demand. Weak references are sometimes mistakenly used instead of soft references for building caches, but this will result in poor caching performance. In practice, weak references will get cleared fairly quickly after the object becomes weakly reachable -- usually before the cached object is needed again -- as minor garbage collections run frequently. For applications that rely heavily on caching for performance, soft references may be too blunt an instrument -- it certainly is not meant as a replacement for sophisticated caching frameworks that provide flexible expiration, replication, and transactional caching. But as a "cheap and dirty" caching mechanism, it is an attractive price-performer. Just as with weak references, a soft reference can be created with an associated reference queue, and the reference is enqueued when it is cleared by the garbage collector. Reference queues are not as useful with soft references as with weak references, but they could be used to raise a management alert that the application is starting to run low on memory. How the garbage collector handles References Weak and soft references both extend the abstract Reference class (as do phantom references, which will be discussed in a future article). Reference objects are treated specially by the garbage collector. When the garbage collector encounters a Reference in the course of tracing the heap, it does not mark or trace the referent object, but instead places the Reference on a queue of known live Reference objects. After the trace, the collector identifies the softly reachable objects -- these objects for which no strong references exist but soft references do. The garbage collector then assesses whether soft references need to be cleared at this time, based on the amount of memory reclaimed by the current collection and other policy considerations. Soft references that are to be cleared are enqueued if they have a corresponding reference queue; the remaining softly reachable objects (the ones that are not cleared) are then treated as a root set and the heap trace is continued using these new roots so that the objects reachable through live soft references can be marked.

After processing the soft references, the set of weakly reachable objects is identified -objects for which no strong or soft references exist. These are cleared and enqueued. All Reference types are cleared before they are enqueued, so the thread handling the postmortem cleanup never has access to the referent object, only the Reference object. For this reason, when References are used in conjunction with a reference queue, it is common to subclass the appropriate reference type and either use it directly in your design (as with WeakHashMap, whose Map.Entry extends WeakReference) or to store references to entities that require cleanup. Performance costs of reference processing Reference objects introduce some additional costs into the garbage collection process. At each garbage collection, a list of live Reference objects must be constructed, and each reference must be processed appropriately, which adds some per-Reference overhead to each collection regardless of whether the referent is being collected at that time. Reference objects themselves are subject to garbage collection and can be collected before the referent, in which case they are not enqueued. Array-based collections Another form of object loitering arises when arrays are used to implement data structures such as stacks or circular buffers. The LeakyStack class in Listing 3 shows an implementation of a stack backed by an array. In the pop() method, after the top pointer is decremented, elements still maintains a reference to the object being popped off the stack. This means that a reference to that object is still reachable by the program even though the program never actually uses that reference again, which prevents that object from being garbage collected until that location is reused by a future push().

Listing 3. Object loitering in array-based collection
public class LeakyStack { private Object[] elements = new Object[MAX_ELEMENTS]; private int size = 0; public void push(Object o) { elements[size++] = o; } public Object pop() { if (size == 0) throw new EmptyStackException(); else { Object result = elements[--size]; // elements[size+1] = null; return result; } } }

The cure for object loitering in this case is to null out the reference after popping it from the stack, as shown in the commented-out line of code in Listing 3. However, this case -where a class manages its own memory -- is one of the very few situations where explicitly nulling objects that are no longer needed is a good idea. Most of the time, aggressively nulling references believed to be unused results in no performance or memory utilization gain at all; it usually results in either worse performance or NullPointerExceptions. A linked implementation of this algorithm would not have this problem; in a linked implementation, the lifetime of the link node (and therefore the reference to the object being stored) would be automatically tied to the duration that the object is stored in the collection. Weak references could be used to fix this problem -maintaining an array of weak references instead of strong references -- but in reality, LeakyStack manages its own memory and is therefore responsible for ensuring that references to objects no longer needed are cleared. Using arrays to implement a stack or a buffer is an optimization that reduces allocation but imposes a greater burden for the implementer to carefully manage the lifetime of references stored in the array. Summary Soft references, like weak references, can help applications prevent object loitering by enlisting the aid of the garbage collector in making cache eviction decisions. Soft references are suitable for use only if the application can tolerate the loss of the softly referenced object.


				
DOCUMENT INFO
Shared By:
Stats:
views:147
posted:9/19/2009
language:English
pages:13
Description: For garbage collection (GC) to reclaim objects no longer in use by the program, the logical lifetime of an object (the time that the application will use it) and the actual lifetime of outstanding references to that object must be the same. Most of the time, good software engineering techniques ensure that this happens automatically, without us having to expend a lot of attention on the issue of object lifetime. But every once in a while, we create a reference that holds an object in memory much longer than we expected it to, a situation referred to as unintentional object retention.
Vinothkumar Vinothkumar Engineer
About