Why is OpenWebBeans so fast?
2012/03/20
The Speed-King
Apache OpenWebBeans [1] is considered the Speed-King of dependency injection containers. A blog post by Gerhard Petracek showed this pretty nicely [2][3]. But why is that? During performance tests for the big EE application my colleagues and I have been developing since late 2009 (40k users, 5 million pages/day) we came across a few critical points.
1. Static vs Dynamic
The CDI specification requires scanning all classes in locations that carry a META-INF/beans.xml marker file and storing them in a big list of Bean instances. This information is gathered only at startup, checked for inconsistencies and further optimized. At runtime all the important information is immediately available without any further processing.
This is, as far as I know, different in Spring, which allows some kind of ‘dynamic’ reconfiguration: you can switch parts of the bean configuration programmatically at runtime. This might be neat for some use cases (e.g. scripting integration) but definitely requires the container to do much more work at runtime. It also prevents some kinds of very aggressive caching.
2. Reduce String Operations
In dependency injection containers you often store evaluation results in Maps. We pretty often have situations similar to the following example:
java.lang.reflect.Method met = getInjectionMethod();
Class<?> clazz = getInjectionClass();
String key = clazz.getName() + "/" + met.getName() + "/" + parametersString(met.getParameterTypes());
Object o = map.get(key);
This is obviously nothing I would ever write in a piece of performance-intensive code! The first step is to use a StringBuilder instead:
Using StringBuilder
StringBuilder sb = new StringBuilder();
sb.append(clazz.getName()).append('/').append(met.getName()).append('/');
appendParameterTypes(sb, met.getParameterTypes());
String key = sb.toString();
This is now twice as fast. But still way too slow for us 😉
Increase StringBuilder capacity
If I remember the sources correctly, StringBuilder by default has an internal buffer capacity of 16 characters, which is roughly doubled every time the buffer overflows. In our example above this typically means expanding the internal capacity 3 or 4 times (depending on the length of the parameters, etc.). Every time the capacity gets extended, the internal buffer handling must perform an alloc + memcpy(newbuffer, oldbuffer) + free(oldbuffer). We can reduce this by allocating more space right from the start:
StringBuilder sb = new StringBuilder(200);
sb.append(clazz.getName()).append('/').append(met.getName()).append('/');
...
This already helps a lot; the code now runs 3x as fast as with the default StringBuilder(). The downside is that you always need at least 200 characters for each key. And we are still far from what’s possible performance-wise.
Hash based solution
Another solution looks like the following:
long key = clazz.getName().hashCode() + 29L * met.getName().hashCode();
for (Class<?> t : met.getParameterTypes()) {
    key += 29L * t.hashCode();
}
Object o = map.get(key);
This performs much faster than any String-based solution: up to 50 times faster for 3 String operands, and even more when additional Strings need to be concatenated. Not only is the key creation much faster because it needs zero memory allocations, it also makes the map access faster. The downside is that the hash keys might clash.
Own Key Object
The solution we now use in OWB is a dedicated key object which implements hashCode() and equals() in an optimized way.
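A minimal sketch of such a key object might look like the following. The class and field names are illustrative only, not the actual OWB implementation; the point is that the hash is computed once in the constructor, so repeated map lookups stay cheap while equals() still rules out clashes:

```java
import java.util.Arrays;

final class MethodCacheKey {
    private final Class<?> clazz;
    private final String methodName;
    private final Class<?>[] parameterTypes;
    private final int hash; // computed once, so map lookups stay cheap

    MethodCacheKey(Class<?> clazz, String methodName, Class<?>[] parameterTypes) {
        this.clazz = clazz;
        this.methodName = methodName;
        this.parameterTypes = parameterTypes;
        int h = 31 * clazz.hashCode() + methodName.hashCode();
        this.hash = 31 * h + Arrays.hashCode(parameterTypes);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MethodCacheKey)) return false;
        MethodCacheKey other = (MethodCacheKey) o;
        // Class objects can be compared by identity; equals() catches hash clashes
        return clazz == other.clazz
                && methodName.equals(other.methodName)
                && Arrays.equals(parameterTypes, other.parameterTypes);
    }
}
```

This keeps the zero-allocation lookup speed of the hash-based solution (apart from the key object itself) without its clash problem.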
3. ELResolver tuning
Expression Language based programming is used extensively in JSP and JSF pages and also in other frameworks. It is fairly easy to use, pluggable and integrates very well into existing frameworks. But many people are not aware that the EL integration is pretty expensive: it works by going down a chain of registered ELResolver implementations until one of them finds the requested object.
Nested EL calls
We will explain the impact of the ELResolver by going through a single EL invocation. Imagine the following line in a JSF page:
<h:inputText value="#{shoppingCart.user.address.city}"/>
How many invocations of ELResolver#getValue() do you think get executed?
The answer is: many! Let’s say we have 10 ELResolvers in the chain. This is not a fantasy value but comes pretty close to reality. While evaluating the whole expression, the EL integration splits the given EL expression on the dots (‘.’) and starts resolving with the leftmost term (“shoppingCart”). This EL part is passed down the ELResolver chain until one ELResolver knows the given name. Since the cheap ELResolvers with a high hit ratio are usually put first, our CDI ELResolver will sit somewhere in the middle. This means we already get approximately 5 other ELResolver invocations before we find the bean…
The answer: our single EL expression will roughly perform 30 ELResolver invocations already!
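The counting above can be made concrete with a toy simulation of a resolver chain. The chain length (10) and the position of the resolver that knows the name (7th) are illustrative assumptions matching the estimate above, not measured values:

```java
class ResolverChainDemo {
    static int invocations = 0;

    interface Resolver {
        // returns null if this resolver does not handle the property
        Object getValue(Object base, String property);
    }

    static Object evaluate(Resolver[] chain, String expression, Object root) {
        Object base = root;
        for (String part : expression.split("\\.")) {
            Object resolved = null;
            for (Resolver r : chain) {
                invocations++;                // every chain member is asked in turn
                resolved = r.getValue(base, part);
                if (resolved != null) break;  // first resolver that knows the name wins
            }
            base = resolved;
        }
        return base;
    }
}
```

With 4 expression parts and the handling resolver in 7th position, evaluating “shoppingCart.user.address.city” yields 4 × 7 = 28 invocations, roughly the 30 mentioned above.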
But what can we do to improve the performance of our very own WebBeansELResolver at least?
EL caching
The OpenWebBeans WebBeansELResolver (or any CDI containers ELResolver) will typically execute the following code to get a bean:
Set<Bean<?>> beans = beanManager.getBeans(name);
Bean<?> bean = beanManager.resolve(beans);
CreationalContext<?> creationalContext = beanManager.createCreationalContext(bean);
Object contextualReference = beanManager.getReference(bean, Object.class, creationalContext);
This gives us quite a few things we can cache.
a.) we can cache the found Bean.
b.) for NormalScoped beans (most scopes, except @Dependent) we can even cache the contextualReference, because in this case we always get a proxy anyway (see the spec: ‘Contextual Reference’).
Negative caching
Searching for a bean is expensive. Searching and NOT finding anything is even more expensive!
To prevent our ELResolver from doing this over and over again, we also cache the misses. This sped up the OpenWebBeans EL integration considerably.
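The combination of both caches can be sketched as follows. This is a simplified stand-in for the real WebBeansELResolver, using a plain Map and a NOT_FOUND sentinel object so that misses are remembered just like hits:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class CachingNameResolver {
    private static final Object NOT_FOUND = new Object(); // sentinel marking a cached miss
    private final Map<String, Object> cache = new HashMap<>();
    private final Function<String, Object> expensiveLookup;
    int lookups = 0; // counts how often the expensive lookup really runs

    CachingNameResolver(Function<String, Object> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    Object resolve(String name) {
        Object cached = cache.get(name);
        if (cached == null) {
            lookups++;
            Object found = expensiveLookup.apply(name);
            // cache the miss as well, so we never search for this name again
            cached = (found != null) ? found : NOT_FOUND;
            cache.put(name, cached);
        }
        return cached == NOT_FOUND ? null : cached;
    }
}
```

After the first miss for a given name, every further lookup for that name is answered straight from the cache without touching the BeanManager at all.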
4. Proxy tuning
One of the most elaborate parts of OWB is the ability to configure custom proxies for any scope.
The job of a Contextual Reference
Proxies in CDI are used for implementing interceptors, for serialization, and mainly for resolving the underlying ‘Contextual Instance’ of a ‘Contextual Reference’ on the fly. You can read more about this in the ‘CDI Introduction’ article in the latest JavaTechJournal [4]. For this last task the proxy will look up the correct Contextual Instance for each method invocation and then redirect the method invocation to this resolved instance. In OpenWebBeans this is done in the NormalScopedBeanInterceptorHandler.
But is it really necessary to always perform this expensive lookup?
Caching resolved Contextual Instances
Let’s take a look at a @RequestScoped bean. Once the servlet request has started, the resolved contextual instance doesn’t change anymore until the end of the request. So it should be possible to ‘cache’ this contextual instance and clear the cache when the servlet request ends. OpenWebBeans does this by providing its own proxy MethodHandler for @RequestScoped beans, the RequestScopedBeanInterceptorHandler, which stores this info in a ThreadLocal:
private static ThreadLocal<HashMap<OwbBean, CacheEntry>> cachedInstances
= new ThreadLocal<HashMap<OwbBean, CacheEntry>>();
An @ApplicationScoped bean in a web application doesn’t change its instance at all once it has been resolved. Thus we use an ApplicationScopedBeanInterceptorHandler [5] to cache even more aggressively.
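The per-request caching idea can be sketched in isolation like this. The class is a simplified placeholder in the spirit of the RequestScopedBeanInterceptorHandler above, not the real OWB code; the bean and instance types are plain Objects for brevity:

```java
import java.util.HashMap;
import java.util.function.Supplier;

class RequestScopedCache {
    private static final ThreadLocal<HashMap<Object, Object>> cachedInstances =
            ThreadLocal.withInitial(HashMap::new);

    // Resolve at most once per request thread, then serve the cached instance.
    static Object getInstance(Object bean, Supplier<Object> expensiveResolve) {
        return cachedInstances.get().computeIfAbsent(bean, b -> expensiveResolve.get());
    }

    // Must be called when the servlet request ends, e.g. from a request listener,
    // so the next request resolves fresh instances (and the thread leaks nothing).
    static void requestEnded() {
        cachedInstances.remove();
    }
}
```

Within one request every proxy method invocation hits the ThreadLocal map instead of walking the context lookup; clearing the cache at request end keeps the scoping semantics intact.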
Cache your own Scopes
As Gerhard’s blog post [2] shows, we can also use an OWB feature to configure the built-in and also custom proxy MethodHandlers for any given scope. The configuration is done via OpenWebBeans’ own configuration mechanism, explained in a previous blog post.
Simply create a file META-INF/openwebbeans/openwebbeans.properties containing an entry of the following form:
org.apache.webbeans.proxy.mapping.[fully qualified scope name]=[proxy method-handler classname]
For example:
org.apache.webbeans.proxy.mapping.javax.enterprise.context.ApplicationScoped=org.apache.webbeans.intercept.ApplicationScopedBeanInterceptorHandler
Summary
Why the hell did we put so much time and tricks into Apache OpenWebBeans?
The answer is easy: our application shows the study codes of curricula on a single page, up to 1600 rows in a <h:dataTable>, resulting in more than 450,000 EL invocations! We tried this with other containers, which used to take 6 seconds to render this page.
With Apache Tomcat + Apache OpenWebBeans + Apache MyFaces 2.1.x [6] + JUEL [7] + Apache OpenJPA [8] we are now down to 350 ms for this very page…
have fun and LieGrue,
strub
PS: we already explained a few of our tricks to our friends from the Weld community. This resulted in their @RequestScoped beans getting 3 times faster and now being almost as fast as OWB’s 😉
[1] Apache OpenWebBeans
[2] os890 blog 1
[3] os890 blog 2
[4] Java-Tech-Journal CDI special
[5] ApplicationScopedBeanInterceptorHandler.java
[6] Apache MyFaces
[7] JUEL EL-2.2 Implementation
[8] Apache OpenJPA