JPerfAnal: A Java Performance Analysis Tool

Nathan Meyers
nmeyers@javalinux.net

JPerfAnal is a GUI-based tool for analyzing the performance of applications on the Java 2 Platform. You can use JPerfAnal to identify performance problems and locate hot spots in your code.

JPerfAnal is distributed through SourceForge, under the GPL, at http://sourceforge.net/projects/jperfanal/. An earlier version of the tool was distributed, also under the GPL, with the book Java Programming on Linux.

Using Sun's hprof Profiler

The capability that enables performance profiling is Sun's JVM Profiling Interface (JVMPI), which allows you to attach a native-code library to the JVM to capture performance-related information. The SDK distribution includes such a profiler library, hprof, that illustrates JVMPI's capabilities. While hprof is not itself a powerful analysis tool, the data it captures can form the basis of powerful performance analysis. JPerfAnal's role is to analyze CPU usage results captured by hprof.

Documentation on hprof is provided with the SDK, and you can get a help message by running:

    java -Xrunhprof:help

To capture data usable for analysis by JPerfAnal, you use hprof with these options:

    java -Xrunhprof:cpu=samples,depth=XXX,thread=y ...

where you choose the value XXX based on how much detail you want to capture. A large value collects more detail in stack tracebacks captured by hprof, but also increases JPerfAnal's workload - I usually choose a value of 10, then adjust it if needed.

A Sample Problem: TestHProf

To illustrate the use of JPerfAnal, I'll use a small hprof test program originally published by Stuart Halloway at Sun's Java Developer Connection site. Here is a listing of the source, annotated with line numbers:

 1 package com.develop.demos;
 2       
 3 import java.io.IOException;
 4      
 5   public class TestHprof {
 6   public static String cat = null;
 7   public final static int loop=5000;
 8          
 9   public static void makeString() {
10      cat = new String();
11        for (int n=0; n<loop; n++) {
12            addToCat("more");
13          }    
14      }
15          
16   public static void addToCat(String more) {
17       cat = cat + more;
18    }
19          
20   public static void makeStringInline() {
21      cat = new String();
22        for (int n=0; n<loop; n++) {
23            cat = cat + "more";
24        }
25    }
26      
27    public static void makeStringWithLocal() {
28        String tmp = new String();
29        for (int n=0; n<loop; n++) {
30           tmp = tmp + "more";
31         }
32        cat = tmp;
33    }
34          
35    public static void makeStringWithBuffer() {
36        StringBuffer sb = new StringBuffer();
37        for (int n=0; n<loop; n++) {
38            sb.append("more");
39        }
40        cat = sb.toString();
41    }
42    public static void main(String[] args) {
43      long begin = System.currentTimeMillis();
44              
45      if (null != 
46          System.getProperty("WaitForProfiler")) {
47              System.out.println(
48                  "Start your profiler, then press any key to begin...");
49              try {
50                      System.in.read();
51              }
52              catch (IOException ioe) {
53              }
54      }
55                  
56        makeString();
57        makeStringInline();
58        makeStringWithLocal();
59        makeStringWithBuffer();
60              
61        long end = 
62        System.currentTimeMillis();
63        System.out.println("Total run time of " 
64         + (end - begin) + " milliseconds");
65      }
66    } 

The program tests four different ways to build a string through concatenation.

Here is a sample invocation of the program, with profiling (I'll talk more about the -classic flag below):

    java -classic -Xrunhprof:cpu=samples,depth=10,thread=y com.develop.demos.TestHprof

Interpreting Raw hprof output

Briefly examining the hprof output, java.hprof.txt, shows that understanding the performance of this application is a difficult problem. We see that most of the time is being spent in StringBuffer.expandCapacity() - but what does that tell us about the relative efficiency of makeString(), makeStringInline(), makeStringWithLocal(), and makeStringWithBuffer()? To figure out which makeString methods are most efficient requires some detective work.

We see in trace #25, which accumulated 233 profiling ticks, that StringBuffer.expandCapacity() was performed on behalf of the makeStringWithLocal() method. But that's not the total cost of makeStringWithLocal(); that method appears in 7 other traces. Tracking down its total cost from the hprof output is tedious and difficult work.

Interpreting hprof Output With JPerfAnal

The JPerfAnal tool simplifies this analysis in two ways:

To begin, run JPerfAnal on the profiling data:

    java -jar JPerfAnal.jar java.hprof.txt

Here's a view of the main GUI:

The four quadrants contain four different analyses of the data, presented with the familiar Swing JTree interface. Let's take a closer look at each quadrant.

Analyzing CPU Usage by Caller

The upper-left quadrant ranks and analyzes CPU usage by caller. It ranks the time spent in various methods, inclusive of subroutine calls. Not surprisingly, TestHprof.main() takes the most time. We can open up branches of the tree to explore where main is spending its time:

We see that TestHprof.main() spent 34.34% of its time in TestHprof.makeString(), 32.45% of its time in TestHprof.makeStringWithLocal(), and so on. Examining the first method in more detail show us that TestHprof.makeString() spent most of its time in StringBuffer.append().

Analyzing CPU Usage by Callee

The lower left quadrant shows the same ranking we saw in the upper left quadrant, but this tree organizes the analysis by callee. It answers the question: on whose behalf is a method spending time? Here is the result of exploring the call graph for StringBuffer.append():

This answers the question: on whose behalf are we spending 97.85% of application time in StringBuffer.append()? All but one of the makeString methods is a heavy user of StringBuffer.append().

Analyzing CPU Usage by Line Number

The upper right quadrant shows, yet again, the same ranking of CPU usage by method, inclusive of calls. Here, however, the usage is broken down by line number:

The analysis shows, line by line, where main (or any other method) is spending its time, inclusive of called methods.

Analyzing CPU Usage, Exclusive of Subroutines

Finally, the lower right quadrant answers the question: where is the CPU really spending its time? This tree ranks method usage exclusive of subroutine calls:

We see that, when we do not include subroutine calls, line 202 of StringBuffer.addCapacity() is where our program spends fully 91.16% of its time.

Pulling it all Together

Each of these analyses, individually, provides important information about where execution time is being spent. JPerfAnal provides an additional feature that allows you to correlate the information shown in the various windows.

For example, the last figure showed us spending 5.93% of execution time in System.arraycopy. What is taking that time? By right-clicking on that entry, we pop up a menu that allows us to correlate the information:

By choosing Goto this Method, we position all four quadrants to examine this method. Returning to the lower-left quadrant, we can explore in detail who is calling System.arraycopy:

Other Capabilities

The right-click popup menu offers two additional features.

If you choose Select a Method to Analyze, you are presented with a dialog that allows you to select a method by name:

After you select a method, all four quadrants will be positioned to analyze that method.

And if you choose Select Thread(s) to Analyze from the popup, you can choose one or more threads from a dialog:

After you choose, the analysis will be recomputed to report only on the selected threads.

Saving Output

Finally, JPerfAnal allows you to save the analysis data. The File menu (in the menu bar) gives you a Save... item that saves all the processed information into a textfile.

JVM Dependencies

In the example discussed above, we captured the profiling data using a JVM running with green threads (the -classic option). The choice of threading model and JIT can have an important effect on your results. The reason for this choice is unfortunate: JVMPI doesn't work everywhere. When running the Sun JVM on Linux, the HotSpot performance engine fails to capture most performance data. I have no idea when or if this will be fixed.

It can also be useful to turn of JIT compilation if you care about the per-line-number analysis: line number information is sometimes lost in JIT-compiled code.

hprof Option Dependencies

Certain hprof options have important effects on the JPerfAnal analysis: