
Wednesday, January 26, 2011

JVM Performance, Part II - JVM Bazaar

As promised, I have results from running some of the tests in other JVMs. The most interesting results were from the Map test and the Method execution test, so I have re-run those on various JVMs and environments.

The first result comes from running with Oracle (Sun) JDK 1.6.23 on the same Windows XP machine. Next I ran on Oracle (Sun) JDK 1.5.5, and then on Oracle JRockit 1.6. I then ran the tests on another machine, an older 4-CPU, 8-core AMD machine running Linux. The results are very surprising: each machine and environment seemed to have very different behavior.

Note: These tests are not in any way attempting to compare Windows vs Linux performance, or the Oracle Sun JDK vs Oracle JRockit; completely different hardware, JVMs and environments are used. The goal is to see how different operations compare on the same JVM, and how one JVM and environment can differ from another. Some of the Windows results are faster most likely because the machine is newer and has a faster CPU; some of the Linux results are faster possibly because the machine has 8 CPUs instead of 2, so it can process garbage collection on other threads.

The test uses a Map of size 100, testing instantiation of the Map, 100 puts and 100 gets. The average result is included. The %DIF is between the Map and the Hashtable result for that same JVM.

Map Operation Performance Comparison

Map | WXP-JDK1.6.7 | %DIF | WXP-JDK1.6.23 | %DIF | WXP-JDK1.5.5 | %DIF
Hashtable | 357229 | 0% | 379673 | 0% | 219961 | 0%
HashMap | 274587 | -30% | 403501 | +6.2% | 238916 | +8.6%
LinkedHashMap | 269840 | -32% | 356207 | -6.5% | 228787 | +4.0%
IdentityHashMap | 110801 | -222% | 121691 | -211% | 60670 | -262%
ConcurrentHashMap | 119068 | -200% | 183159 | -107% | 109912 | -100%
HashSet | 281960 | -26% | 407709 | +7.3% | 228936 | +4.0%

Map | LX-JDK1.6.20 | %DIF | LX-JDK1.5.5 | %DIF | LX-JRK1.6.5 | %DIF
Hashtable | 343250 | 0% | 307227 | 0% | 474378 | 0%
HashMap | 496258 | +44% | 375805 | +22% | 642267 | +35%
LinkedHashMap | 476535 | +38% | 417273 | +35% | 595331 | +25%
IdentityHashMap | 552437 | +60% | 478350 | +55% | 716368 | +51%
ConcurrentHashMap | 293529 | -16% | 299675 | -2.5% | 331730 | -43%
HashSet | 451615 | +31% | 381843 | +24% | 633824 | +33%

The first good thing to notice is that, for the same machine, the results get better with newer JVMs. That is very nice, and something that most Java developers are familiar with: each new JVM version has better performance than the previous one.

The second thing to notice is that the Map types have huge variations based on the JVM. HashMap was faster than Hashtable in 1.6.23 and even in 1.5.5, but somehow slower in 1.6.7. I found this very perplexing, but no matter how many times, or in which order, I ran the tests, the results were the same. I also compared the code between the releases, and did not find any code changes that could account for the change in performance. I assume it has to do with how the JVM manages memory and synchronized methods. One would think that the code for Hashtable, HashMap, and IdentityHashMap was all the same, since the functionality is essentially identical, but in fact they share none of the same code, and use quite different data structures. I assume this accounts for the difference in performance under the different JVMs, so none of them is inferior; they are just different.

Note that these tests are still single-threaded tests. So although they compare the raw performance difference between Hashtable and HashMap, the far bigger issue is the effect Hashtable's synchronized methods have on concurrency. This is also true for ConcurrentHashMap: although its raw throughput seems to be worse, its concurrency is far better, and in a multi-threaded environment it will fare much better. I will hopefully explore this in another blog post.

Method Execution Performance Comparison

Execution type | WXP-JDK1.6.7 | %DIF | WXP-JDK1.6.23 | %DIF | WXP-JDK1.5.5 | %DIF
Normal | 25533130 | 0% | 29608720 | 0% | 18557941 | 0%
synchronized | 13383707 | -93% | 14373780 | -105% | 1254183 | -1379%
Block synchronized | 8244087 | -203% | 8812159 | -235% | 1253624 | -1380%
final | 26873873 | +6.1% | 30812911 | +4.0% | 18853714 | +1.5%
In-lined | 26816109 | +5.2% | 30815433 | +4.0% | 19734119 | +6.3%
volatile | 1503727 | -1539% | 3290550 | -799% | 1448561 | -1181%
Reflection | 159069 | -15646% | 145585 | -20237% | 82713 | -22336%

Execution type | LX-JDK1.5.5 | %DIF | LX-JDK1.6.20 | %DIF | LX-JRK1.6.5 | %DIF
Normal | 4666753 | 0% | 4840361 | 0% | 4257995 | 0%
synchronized | 2250871 | -107% | 4413458 | -9.6% | 3465398 | -22%
Block synchronized | 2252474 | -107% | 4410721 | -9.7% | 3672440 | -15%
final | 4666384 | 0.0% | 4911130 | +1.4% | 4392700 | +3.1%
In-lined | 4668538 | 0.0% | 4874539 | +0.7% | 4042412 | -5.3%
volatile | 3236491 | -44% | 3231105 | -49% | 3032121 | -40%
Reflection | 2645911 | -76% | 2568412 | -88% | 132232 | -3120%

The first good thing to note again is that the newer JVMs have, for the most part, better performance. This is one of the great aspects of developing on the Java platform.

The second thing to notice is that, once again, the behavior of the various JVMs is very different. They all seem to be consistent in that synchronized methods, volatile and reflection are slower, but the degree of the difference varies greatly. Synchronized methods seem to be much better in JDK 1.6 than in 1.5, and in the latest Linux JVM they have almost no overhead at all. Reflection also improved a lot from 1.5 to 1.6, but again the Linux overhead is less than a tenth of the Windows JVM results (except for JRockit). Volatile has similar differences. This could have more to do with the hardware than the OS or JVM, as the Linux machine has 8 CPUs, so it may be doing some memory management using its other CPUs (but this does not explain the JRockit result...).

Summary

So what have we learned? I think the most important thing is that different JVMs and environments can have very different behavior, so it is important to test in the environment that you will run in production. If you do not have that luxury, then it is important to test in a variety of different environments to ensure you are not making trade-offs in one environment that could hurt you in another.

I would not get too obsessed with worrying about how your application's performance may differ in different environments. By and large, the main performance optimizations of reducing CPU usage, reducing message sends, reducing memory and garbage, and improving concurrency will improve your application's performance no matter what the environment is.

Friday, December 17, 2010

What is faster? JVM Performance

Java performance optimization is part analysis, and part superstition and witchcraft.

In deciding how to optimize their applications, a lot of developers search the web for what people say is faster, or go by what they have heard from their co-worker's sister's boyfriend's cousin, or by what seems to be the "public opinion". Sometimes the public opinion is correct, and their application benefits; and sometimes it is not, and they waste their time and effort for little performance improvement, or make things worse.

True performance optimization involves measuring the current performance, profiling the application to determine the bottlenecks, then investigating and implementing optimizations to avoid those bottlenecks.

But how should one code on a day-to-day basis for optimal performance? Are synchronized methods slow? Are final methods fast? What is faster: Vector, ArrayList or LinkedList? What is the optimal way to iterate a List? Is reflection slow? I will attempt to answer these questions in this post.

In the EclipseLink project we are generally very concerned about performance, as we are a persistence library used by other applications, so like the JDK we are part of the plumbing and need to be as optimized as possible.

We have a number of performance test suites in EclipseLink, mainly to measure our persistence performance, but we have one test suite that has nothing to do with persistence. Our Java performance test suite just measures the performance of different operations in Java. These tests help us determine how to best optimize our code.

The tests function by running a certain operation for a set amount of time and measuring the number of operations completed in that time period. Then the next operation is run and measured the same way. This is repeated 5 times, the max/min values are rejected, the average of the middle 3 is computed, the standard deviation is computed, and the averages of the two operations are compared. The tests are single threaded.
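
To make the methodology concrete, here is a minimal sketch of that timing loop. This is only an illustration of the description above, not the actual EclipseLink test code, and the names are made up:

// Run the operation repeatedly for a fixed time window, repeat 5 times,
// reject the fastest and slowest runs, and average the middle 3.
public static double measure(Runnable operation) {
    long testTime = 10000; // milliseconds per run
    int runs = 5;
    long[] counts = new long[runs];
    for (int run = 0; run < runs; run++) {
        long count = 0;
        long start = System.currentTimeMillis();
        while ((System.currentTimeMillis() - start) < testTime) {
            operation.run();
            count++;
        }
        counts[run] = count;
    }
    java.util.Arrays.sort(counts);
    long sum = 0;
    for (int index = 1; index < runs - 1; index++) {
        sum += counts[index];
    }
    // The real suite also computes the standard deviation of the runs.
    return sum / (double)(runs - 2);
}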

The tests were run on Oracle Sun JDK 1.6.0_07 on 32-bit Windows.

Maps

There are several different Map implementations in Java. This test compares the performance for various sizes of Maps. The test instantiates the Map, does n (size) puts, then n gets and n removes.
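
As a rough sketch (assumed structure, not the exact test code), one map "operation" looks something like this, with HashMap swapped for the other Map classes being compared:

// Instantiate the Map, then do n puts, n gets and n removes.
// The keys are plain Objects with no custom hashCode(), as in the tests.
public static void mapOperation(int size) {
    Map<Object, Object> map = new HashMap<Object, Object>();
    Object[] keys = new Object[size];
    for (int index = 0; index < size; index++) {
        keys[index] = new Object();
        map.put(keys[index], keys[index]);
    }
    for (int index = 0; index < size; index++) {
        map.get(keys[index]);
    }
    for (int index = 0; index < size; index++) {
        map.remove(keys[index]);
    }
}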

Map Operation Performance Comparison

Map | Size | Average (operations/10 seconds) | %STD | %DIF (with Hashtable)
Hashtable | 10 | 3605235 | 0.03% | 0%
HashMap | 10 | 2908854 | 0.02% | -23%
LinkedHashMap | 10 | 2594492 | 0.03% | -38%
IdentityHashMap | 10 | 1346278 | 0.01% | -167%
ConcurrentHashMap | 10 | 1009259 | 0.0% | -257%
HashSet | 10 | 2927254 | 0.02% | -23%
Hashtable | 100 | 357229 | 0.01% | 0%
HashMap | 100 | 274587 | 0.03% | -30%
LinkedHashMap | 100 | 269840 | 0.03% | -32%
IdentityHashMap | 100 | 110801 | 0.02% | -222%
ConcurrentHashMap | 100 | 119068 | 0.01% | -200%
HashSet | 100 | 281960 | 0.04% | -26%
Hashtable | 1000 | 34034 | 6.2% | 0%
HashMap | 1000 | 27818 | 0.06% | -22%
LinkedHashMap | 1000 | 25514 | 0.03% | -33%
IdentityHashMap | 1000 | 11650 | 0.04% | -192%
ConcurrentHashMap | 1000 | 12420 | 2.9% | -174%
HashSet | 1000 | 26888 | 0.04% | -26%

These results are quite interesting; I never would have expected such a big difference in performance between classes that do basically the same well-defined thing and have been around for quite some time. It is surprising that Hashtable performs better than HashMap, when HashMap is supposed to be the replacement for Hashtable, and does not suffer from its limitation of using synchronized methods, which are supposed to be slow. Note: After exploring other JVMs in my next post, it seems that this anomaly only exists in JDK 1.6.7; in the latest JVM, and most other JVMs, HashMap seems to be faster than Hashtable.

I would expect LinkedHashMap and ConcurrentHashMap to be somewhat slower, as they have additional overhead and perform better in specific use cases, but it is odd that IdentityHashMap is so much slower, given my test object did not define a hashCode(), so it was using its identity, and the map was doing the identical thing as HashMap and Hashtable. It is also interesting that HashSet has the same performance as HashMap, given that in theory it could be a simpler data structure (in reality it is backed by a HashMap, so it performs the same).

Note that these are single-threaded results. Although Hashtable outperformed HashMap in this test, in a multi-threaded (and multi-CPU) environment, Hashtable would perform much worse because of the method synchronization. Assuming read-only access, of course, as concurrent access to a HashMap would blow up. ConcurrentHashMap does perform the best in a concurrent read-write environment.

Lists

For Maps, Hashtable turned out to have the best performance. Let's now investigate Lists: will the old Vector implementation have better performance than ArrayList? The list test instantiates a list, then does n (size) adds followed by n gets by index.
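
Again as a rough sketch (assumed structure, not the exact test code), one list "operation" looks something like this, with ArrayList swapped for Vector or LinkedList:

// Instantiate the List, do n adds, then n gets by index.
public static void listOperation(int size) {
    List<Object> list = new ArrayList<Object>();
    for (int index = 0; index < size; index++) {
        list.add(new Object());
    }
    for (int index = 0; index < size; index++) {
        list.get(index);
    }
}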

List Operation Performance Comparison

List | Size | Average (operations/10 seconds) | %STD | %DIF (with Vector)
Vector | 10 | 11625453 | 0.006% | 0%
ArrayList | 10 | 18448453 | 0.08% | +56%
LinkedList | 10 | 12362290 | 0.03% | +6.0%
Vector | 100 | 1241420 | 0.2% | 0%
ArrayList | 100 | 1893581 | 0.1% | +52%
LinkedList | 100 | 1012740 | 0.01% | -22%
Vector | 1000 | 132182 | 0.2% | 0%
ArrayList | 1000 | 223969 | 0.1% | +69%
LinkedList | 1000 | 12689 | 0.02% | -941%

So, it seems that ArrayList is much faster than Vector. Given both Vector and Hashtable are synchronized, I assume it is not the synchronized methods, just the implementation. LinkedList surprisingly performs as well as Vector for small sizes, but then performs much worse on larger sizes because of the indexed get. This is expected, as LinkedList performs well in specific use cases, such as removing from the head or middle, but this was not tested.

Iteration

There are many ways to iterate a List in Java. For a Vector an Enumeration can be used, or an Iterator for any List. A simple for loop with an index can also be used for any List. Java 5 also defines a simplified for syntax for iterating any Collection. This test compares the performance of iteration; the List has already been pre-populated.
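
For reference, the iteration styles being compared look like this (illustrative code only, assuming list is a java.util.List and vector is a java.util.Vector):

// for loop with an index (any List)
int size = list.size();
for (int index = 0; index < size; index++) {
    Object object = list.get(index);
}

// Enumeration (Vector only)
for (Enumeration enumeration = vector.elements(); enumeration.hasMoreElements();) {
    Object object = enumeration.nextElement();
}

// Iterator (any List)
for (Iterator iterator = list.iterator(); iterator.hasNext();) {
    Object object = iterator.next();
}

// Java 5 for syntax (any Collection; compiles down to an Iterator)
for (Object object : list) {
}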

Iteration Performance Comparison

List | Average (operations/10 seconds) | %STD | %DIF (with Vector)
for index (Vector) | 6079272 | 0.06% | 0%
for index (ArrayList) | 6048324 | 0.03% | -0.5%
Enumeration (Vector) | 4736049 | 0.02% | -28%
Iterator (Vector) | 2840094 | 0.05% | -114%
Iterator (ArrayList) | 1935122 | 0.008% | -214%
for (Vector) | 2841567 | 0.05% | -113%
for (ArrayList) | 1933576 | 0.04% | -214%

So using a for loop with an index is faster than an Enumeration or Iterator. This is expected, as it avoids the cost of the iteration object instance. It is interesting that an Enumeration is faster than an Iterator, and a Vector Iterator is faster than an ArrayList Iterator. Unfortunately there is no magic in the Java 5 for syntax; it just calls iterator().

So the optimal way to iterate a List is:

int size = list.size();
for (int index = 0; index < size; index++) {
    Object object = list.get(index);
    ...
}

However, the Java 5 syntax is simpler, so I must admit I would use it in any non-performance-critical code, and maybe some day the JVM will do some magic and the for syntax will have better performance.

Method Execution

A method in Java can be defined in several different ways. It can be synchronized, final, called reflectively, or not called at all (in-lined). This next test tries to determine the performance overhead of a method call. The test executes a simple method that just increments an index. The test executes the method 100 times to avoid the test call overhead. For the reflective usage, the Method object and arguments array are cached.
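
The variations being compared look roughly like this (a sketch based on the description above, not the actual test classes; all names are made up):

public class MethodExecutionSketch {
    int index;
    volatile int volatileIndex;

    public void increment() { index++; }                          // Normal
    public synchronized void incrementSynchronized() { index++; } // synchronized method
    public void incrementBlockSynchronized() {                    // Block synchronized
        synchronized (this) { index++; }
    }
    public final void incrementFinal() { index++; }               // final
    public void incrementVolatile() { volatileIndex++; }          // volatile field

    public void test() throws Exception {
        // Reflection: the Method object and the (empty) arguments array are cached.
        java.lang.reflect.Method method = getClass().getMethod("increment");
        Object[] arguments = new Object[0];
        for (int i = 0; i < 100; i++) {
            method.invoke(this, arguments);
        }
        // In-lined: the field is incremented directly, with no method call at all.
        for (int i = 0; i < 100; i++) {
            index++;
        }
    }
}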

Method Execution Performance Comparison

Method | Average (100 operations/10 seconds) | %STD | %DIF (with normal)
Normal | 25533130 | 0.7% | 0%
synchronized | 13383707 | 4.3% | -93%
Block synchronized | 8244087 | 4.7% | -203%
final | 26873873 | 1.2% | +6.1%
In-lined | 26816109 | 1.5% | +5.2%
volatile | 1503727 | 0.1% | -1539%
Reflection | 159069 | 3.2% | -15646%

Interesting results. Synchronized methods are slower, and using a synchronized block inside a method seems to be slower than just making the method synchronized. Final methods seem to be slightly faster, but only very slightly, and seem to give the same improvement as in-lining the method, so I assume that is what the JVM is doing.

Calling a method through reflection is significantly slower. Reflection has improved a lot since 1.5 added byte-code generation, but if you profile a reflective call you will see several isAssignable() checks before the method is invoked, to ensure the object and arguments are of the correct type; it is odd that these cannot be handled through casts in the generated code, with typing errors simply trapped.

CORRECTION
Originally volatile was showing an improvement in performance because of a bug in the volatile test. After correcting the bug, the test now shows a huge overhead. The usage of volatile on the int field that is being incremented in the test method is causing a 15x performance overhead. This is surprising, and much larger than the synchronized overhead, but still smaller than the reflection overhead. This is especially odd as the field is an int, which one would think is atomic anyway. I imagine the effect the volatile has on the JVM's memory usage is somehow causing the overhead. In spite of the overhead, I would still use volatile when required, in concurrent pieces of code where the same variable is concurrently modified. I have done multi-threaded tests comparing its concurrency with synchronizing the method, and using volatile is better than using synchronization for concurrency.

Note that this difference in cost of execution is on a method that does basically nothing. Generally, performance bottlenecks in applications are in places that do something, not nothing, and it is the something that is the bottleneck, not how the method is called. Even in the reflective call, the cost was less than 1 millionth of a second, so on a method that took 1 millisecond the overhead would be irrelevant. Also, I assume these results are very JVM specific; other JVMs, older JVMs, and the JVMs of the future may behave much differently.

Reflection

There are two types of reflection: field reflection and method reflection. In JPA this normally corresponds to using FIELD access or PROPERTY access (annotating the fields or the get methods). Which is faster? This test sets a single variable, either directly, by calling a set method, or through field or set method reflection.
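
The four ways of setting the variable look roughly like this (illustrative names, not the actual test classes):

public class ReflectionSketch {
    public String value;
    public void setValue(String value) { this.value = value; }

    public void test() throws Exception {
        this.value = "in-lined";       // direct variable access
        setValue("set method");        // compiled set method call

        // Field reflection: no argument array is needed.
        java.lang.reflect.Field field = getClass().getField("value");
        field.set(this, "field reflection");

        // Method reflection: an Object[] must be created for the arguments.
        java.lang.reflect.Method method = getClass().getMethod("setValue", String.class);
        method.invoke(this, new Object[] { "method reflection" });
    }
}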

Reflection Performance Comparison

Type | Average (operations/10 seconds) | %STD | %DIF (with in-lined)
In-lined | 44105730 | 0.7% | 0%
Set method | 46189638 | 1.2% | +4.7%
Field Reflection | 23488987 | 1.4% | -87%
Method Reflection | 10862684 | 0.8% | -306%

So it appears that in JDK 1.6 field reflection is faster than set method reflection. Part of this difference is that for method reflection an Object array must be created to call the set method, whereas field reflection does not require this. Method reflection improved a lot in JDK 1.5, but it still seems slower than field reflection, at least for set methods. Both have an overhead over a compiled method execution or direct variable access.

If you look at the results, the difference between reflection and normal execution is much less than in the previous test. This is because in the previous test the method was executed 100 times inside the test, so the overhead of calling the test method was reduced, whereas this test only called it once, so it had the test method call overhead in it. This highlights an important fact: how you call the method that sets the field has a big impact on the performance. If you use a complex layering of interfaces and generated code to call a set method directly, versus calling it reflectively, the overhead of the layering may be worse than the reflective call.

In EclipseLink field reflection is used by default, but it depends on whether you annotate your fields or your get methods. If you annotate your get methods, then method reflection is used. If you use weaving (enabled through either the EclipseLink agent, static weaving, JEE, or Spring) and use FIELD access, then EclipseLink will weave a generic get and set method into your classes to avoid reflection. This amounts to a minor optimization in the context of database persistence, and can be disabled using the persistence unit property "eclipselink.weaving.internal"="false".
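
For example, one way to pass that property programmatically when creating the EntityManagerFactory (the persistence unit name "my-unit" is just a placeholder; the property can equally be set in persistence.xml):

// Disable EclipseLink's internal weaving of generic get/set methods,
// so that reflection is used instead.
Map<String, Object> properties = new HashMap<String, Object>();
properties.put("eclipselink.weaving.internal", "false");
EntityManagerFactory factory =
    Persistence.createEntityManagerFactory("my-unit", properties);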

Summary

So this post has presented a lot of data on how different things perform in Oracle Sun JDK 1.6 on Windows. It would be interesting to see how the same tests perform in different JVMs in different environments. Perhaps I will investigate that next.

The source code to the above tests is in the EclipseLink SVN repository, here.