JPerfAnal: A Java Performance Analysis Tool

JPerfAnal is a GUI-based tool for analyzing the performance of applications on the Java 2 Platform. You can use JPerfAnal to identify performance problems and locate hot spots in your code.

JPerfAnal is distributed through SourceForge, under the GPL, at http://sourceforge.net/projects/jperfanal/. An earlier version of the tool was distributed, also under the GPL, with the book Java Programming on Linux.
The hprof Profiler

The capability that enables performance profiling is Sun's JVM Profiling Interface (JVMPI), which allows you to attach a native-code library to the JVM to capture performance-related information. The SDK distribution includes such a profiler library, hprof, that illustrates JVMPI's capabilities. While hprof is not itself a powerful analysis tool, the data it captures can form the basis of powerful performance analysis. JPerfAnal's role is to analyze CPU usage results captured by hprof.

Documentation on hprof is provided with the SDK, and you can get a help message by running:
java -Xrunhprof:help
To capture data usable for analysis by JPerfAnal, you use hprof with these options:
java -Xrunhprof:cpu=samples,depth=XXX,thread=y ...
where you choose the value XXX based on how much detail you want to capture. A large value collects more detail in the stack tracebacks captured by hprof, but also increases JPerfAnal's workload. I usually choose a value of 10, then adjust it if needed.
TestHprof

To illustrate the use of JPerfAnal, I'll use a small hprof test program originally published by Stuart Halloway at Sun's Java Developer Connection site. Here is a listing of the source, annotated with line numbers:
 1 package com.develop.demos;
 2
 3 import java.io.IOException;
 4
 5 public class TestHprof {
 6     public static String cat = null;
 7     public final static int loop=5000;
 8
 9     public static void makeString() {
10         cat = new String();
11         for (int n=0; n<loop; n++) {
12             addToCat("more");
13         }
14     }
15
16     public static void addToCat(String more) {
17         cat = cat + more;
18     }
19
20     public static void makeStringInline() {
21         cat = new String();
22         for (int n=0; n<loop; n++) {
23             cat = cat + "more";
24         }
25     }
26
27     public static void makeStringWithLocal() {
28         String tmp = new String();
29         for (int n=0; n<loop; n++) {
30             tmp = tmp + "more";
31         }
32         cat = tmp;
33     }
34
35     public static void makeStringWithBuffer() {
36         StringBuffer sb = new StringBuffer();
37         for (int n=0; n<loop; n++) {
38             sb.append("more");
39         }
40         cat = sb.toString();
41     }
42     public static void main(String[] args) {
43         long begin = System.currentTimeMillis();
44
45         if (null !=
46             System.getProperty("WaitForProfiler")) {
47             System.out.println(
48                 "Start your profiler, then press any key to begin...");
49             try {
50                 System.in.read();
51             }
52             catch (IOException ioe) {
53             }
54         }
55
56         makeString();
57         makeStringInline();
58         makeStringWithLocal();
59         makeStringWithBuffer();
60
61         long end =
62             System.currentTimeMillis();
63         System.out.println("Total run time of "
64             + (end - begin) + " milliseconds");
65     }
66 }
The program tests four different ways to build a string through concatenation.
Here is a sample invocation of the program, with profiling (I'll talk more about the -classic flag below):
java -classic -Xrunhprof:cpu=samples,depth=10,thread=y com.develop.demos.TestHprof
hprof Output

Briefly examining the hprof output, java.hprof.txt, shows that understanding the performance of this application is a difficult problem. We see that most of the time is being spent in StringBuffer.expandCapacity()--but what does that tell us about the relative efficiency of makeString(), makeStringInline(), makeStringWithLocal(), and makeStringWithBuffer()? Figuring out which makeString methods are most efficient requires some detective work.
We see in trace #25, which accumulated 233 profiling ticks, that StringBuffer.expandCapacity() was performed on behalf of the makeStringWithLocal() method. But that's not the total cost of makeStringWithLocal(); that method appears in 7 other traces. Tracking down its total cost from the hprof output is tedious and difficult work.
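The bookkeeping involved can be partly scripted. The sketch below is my own illustration, not part of JPerfAnal: the class name is invented, and it assumes the usual column layout of the cpu=samples section (rank, self%, accum%, count, trace, method). It sums the sample counts that each leaf method accumulates across all of its traces:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

public class HprofSampleSummer {

    // Read the CPU SAMPLES section of an hprof text file and
    // total the tick counts per leaf method.
    public static Map sum(Reader src) throws IOException {
        Map totals = new HashMap();   // method name -> total ticks
        BufferedReader in = new BufferedReader(src);
        boolean inSamples = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("CPU SAMPLES BEGIN")) { inSamples = true; continue; }
            if (line.startsWith("CPU SAMPLES END")) break;
            if (!inSamples || line.trim().startsWith("rank")) continue;
            String[] f = line.trim().split("\\s+");
            if (f.length < 6) continue;           // skip malformed lines
            int count = Integer.parseInt(f[3]);   // sample (tick) count
            String method = f[5];                 // leaf method of this trace
            Integer old = (Integer) totals.get(method);
            totals.put(method,
                new Integer(count + (old == null ? 0 : old.intValue())));
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sum(new FileReader(args[0])));
    }
}
```

Note that this only aggregates exclusive (leaf) time; attributing time to callers further up the stack, as JPerfAnal does, would also require parsing the TRACE sections.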
hprof Output With JPerfAnal

The JPerfAnal tool simplifies this analysis in two ways:

It interprets the profile data to report the total cost of methods.
It presents its results in a graphical format to assist in your analysis.

To begin, run JPerfAnal on the profiling data:
java -jar JPerfAnal.jar java.hprof.txt
Here's a view of the main GUI:
The four quadrants contain four different analyses of the data, presented with the familiar Swing JTree interface. Let's take a closer look at each quadrant.
The upper-left quadrant analyzes CPU usage by caller, ranking the time spent in various methods, inclusive of subroutine calls. Not surprisingly, TestHprof.main() takes the most time. We can open up branches of the tree to explore where main is spending its time:
We see that TestHprof.main() spent 34.34% of its time in TestHprof.makeString(), 32.45% of its time in TestHprof.makeStringWithLocal(), and so on. Examining the first method in more detail shows us that TestHprof.makeString() spent most of its time in StringBuffer.append().
The lower-left quadrant shows the same ranking we saw in the upper-left quadrant, but this tree organizes the analysis by callee. It answers the question: on whose behalf is a method spending time? Here is the result of exploring the call graph for StringBuffer.append():

This answers the question: on whose behalf are we spending 97.85% of application time in StringBuffer.append()? All but one of the makeString methods is a heavy user of StringBuffer.append().
The upper-right quadrant shows, yet again, the same ranking of CPU usage by method, inclusive of calls. Here, however, the usage is broken down by line number:

The analysis shows, line by line, where main (or any other method) is spending its time, inclusive of called methods.
Finally, the lower-right quadrant answers the question: where is the CPU really spending its time? This tree ranks method usage exclusive of subroutine calls:

We see that, when we do not include subroutine calls, line 202 of StringBuffer.expandCapacity() is where our program spends fully 91.16% of its time.
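To see why expandCapacity() dominates, consider what each strategy does with characters. The following standalone sketch (class and method names are my own, not from the article) contrasts the two approaches: repeated String concatenation copies the entire accumulated string on every iteration, roughly O(n^2) character copies in total, while a single StringBuffer copies only when its backing array must grow:

```java
public class ConcatCost {

    // Repeated String concatenation: every iteration copies the whole
    // accumulated string into a fresh buffer, so total work is O(n^2).
    public static String buildByConcat(int loops) {
        String cat = "";
        for (int n = 0; n < loops; n++) cat = cat + "more";
        return cat;
    }

    // One StringBuffer: appends in place, copying characters only when
    // the backing array is grown (the expandCapacity() calls in the
    // profile), for amortized O(n) total work.
    public static String buildByBuffer(int loops) {
        StringBuffer sb = new StringBuffer();
        for (int n = 0; n < loops; n++) sb.append("more");
        return sb.toString();
    }

    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        buildByConcat(5000);
        long t1 = System.currentTimeMillis();
        buildByBuffer(5000);
        long t2 = System.currentTimeMillis();
        System.out.println("concat: " + (t1 - t0)
            + " ms, buffer: " + (t2 - t1) + " ms");
    }
}
```

Both methods build the same string; timing them at increasing loop counts makes the quadratic cost of the concatenation version plain.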
Each of these analyses, individually, provides important information about where execution time is being spent. JPerfAnal provides an additional feature that allows you to correlate the information shown in the various windows.

For example, the last figure showed us spending 5.93% of execution time in System.arraycopy. What is taking that time? By right-clicking on that entry, we pop up a menu that allows us to correlate the information:
By choosing Goto this Method, we position all four quadrants to examine this method. Returning to the lower-left quadrant, we can explore in detail who is calling System.arraycopy:
The right-click popup menu offers two additional features. If you choose Select a Method to Analyze, you are presented with a dialog that allows you to select a method by name:

After you select a method, all four quadrants will be positioned to analyze that method.
And if you choose Select Thread(s) to Analyze from the popup, you can choose one or more threads from a dialog:

After you choose, the analysis will be recomputed to report only on the selected threads.
Finally, JPerfAnal allows you to save the analysis data. The File menu (in the menu bar) gives you a Save... item that saves all the processed information into a text file.
In the example discussed above, we captured the profiling data using a JVM running with green threads (the -classic option). The reason for this choice is unfortunate: JVMPI doesn't work everywhere. When running the Sun JVM on Linux, the HotSpot performance engine fails to capture most performance data, and I have no idea when or if this will be fixed. Keep in mind that the choice of threading model and JIT can have an important effect on your results.
It can also be useful to turn off JIT compilation if you care about the per-line-number analysis: line-number information is sometimes lost in JIT-compiled code.
hprof Option Dependencies

Certain hprof options have important effects on the JPerfAnal analysis:

cpu=samples: This option is necessary to capture the data JPerfAnal needs.

depth=: This option controls the depth of the stack traces saved during profiling. You may need to experiment with this value: large values produce more accurate information, but increase the memory and CPU requirements of JPerfAnal. If the value you choose for stack depth is too small, JPerfAnal will not be able to fully ascertain CPU usage by subroutines. Notice, for example, how the times reported for the various callers in the lower-left quadrant add up to the time reported for the callee. If they do not add up, you have not captured enough stack depth. There is no right amount of stack depth--it just depends on how far down the call stack you need to analyze. In my own profiling, I usually start with a depth of 10 and refine it as needed.

thread=y: You must specify this option if you want to be able to analyze the data by thread.