The 5 groups of people involved in IT hypes

Lately AI is all over the place. It’s THE IT hype of today. As was bigdata, cloud, microservices, NoSQL, etc. Every one of those hypes has really good use cases and brings something new to our toolbox. On the other hand it’s a complete overkill for 80% of usages they are currently trying to force it upon.

That brought me to think about who drives what. At the end I came to the conclusion that those folks can be divided into roughly 5 different groups. Of course I slightly exaggerate some arguments to make them clearer.

1: The ‘Analysts’

No, I’m not talking about IT analysts but venture capital analysts! Let’s face it: in the western world we do a lot less own production nowadays (those mostly got moved to cheap labour countries) but instead have a lot of jobs in the service sector and financial market, etc. Part of those could also be called ‘unproductive jobs’. IT – partly – falls into this category. Not because what we programmers do isn’t worth anything, but because we are often a cue ball in the game of big $$$ investment plans. Every new IT hype is fuelling the blue chip stock market. Naturally they are interested in pushing it even further to increase their share prices. And if one hype is ‘milked’ they move on pushing another hype to boost their stock shares.

And the big consulting companies (Gartner, etc), IT magazines, et al want to make good cheap money themselves. So they are helping them by recommending technology (which they not even closely understand themselves) down to the next layer, which is:

2: The CTOs and ‘Bosses’

When you have dozens of Articles about a new IT hype all over the place and other CTOs raving about them in the golf club there is barely a CTO which will freely admit “Nah, I just wait until I learn more about it, because I really don’t understand it yet”. What they most likely will do is to go to their own management and say “Mr CEO or technical manager, look at that AI stuff, we need it too, because our competitor uses it as well and will overtake us!”. Do they have a clue what they really want and why the want it? Hell, surely (in almost all cases) not!

Of course most of the time there are often good use cases. And if you have a very good technical management team (or down the line someone who they listen to) they will try to implement those use cases first. But it’s not guaranteed. Actually most times it’s just bonkers what they try to achieve with this new hype. How many companies really did have a need for big data yet still many did have an Apache Hadoop installation. Don’t get me wrong: Apache Hadoop is STILL extremely good IF you have the right use case for it – but for most companies it was really bonkers and overkill!

3: The Copy-Coders

So most times these totally out-of-nowhere requirements are pushed down to the development team. And as we all know there is a big spread in quality of developers. Of course there are developers who should imo not touch productive code, but most are really ok-ish. Most of the time it’s the environment they work in which makes them copy-coders. Or as I used to call them: Stackoverflow-Programmers. Btw, those are right now moving from being Stackoverflow-Copy-Coders to “Vibe-Programmers”. That means: use ChatGPT, Claude, Copilot, etc to let it suggest code which you then take or adopt based on your gut feeling. In both cases the resulting quality is nÀÀÀh not so good. With Stackoverflow you at least get a really good explanation along the solution most times. Of course someone must have the time and will to understand this explanation, but this is now missing with AI-assisted coding altogether.

Usually this leads to a working solution (after a ton of try-and-error iterations). The downside is that the source code now does not only contain the solution but tons of additional code blocks and logic which got just copied over as well but is actually useless. In the best case it only makes maintenance much harder. But more often than that this superfluous code introduces problematic paths and side effects. Yes, it’s a quick way to come to decently working solution, but be aware that it also introduces technical debt as explained by Ward Cunningham (one of the big IT superheroes, read up on his inventions) who coined that term: https://www.youtube.com/watch?v=pqeJFYwnkjE

4: The Informed Programmer

There are always a few people in a company who try to not only get their tasks done but to truly understand what they are doing. They want to understand the ‘tricks’ behind the APIs they are using. Those are the seniors in the team (regardless of their biological age, I’ve seen excellent ‘seniors’ (and ‘seniorinas’!) which are in their 20s!). What sets them apart is that they really want to understand what they are doing.

Now I have to step back a bit and try to explain what I try to bring across whenever I do some education, be it a conference talk, a lecture or just explaining to some fellow co-workers:
Imagine seeing a trick shown by a magician. If you see it first it really IS magic – often seemingly defeating physics. The same happens on the the 2nd and further time you see. It IS magic! But once the trick gets explained to you then you can not unsee it any more! All the magic is gone, you now KNOW how it is done! You went from “known unknown” to “known known” (see also: Rumsfeld Matrix)

What most programmers do when using some random API is like they are watching a magician do his tricks. They do roughly know how to use it, what to put into and what they get out – but they do not understand what it really does. It’s magic to them! Or as physicicist Arthur Clarke put it:

“Any sufficiently advanced technology is indistinguishable from magic”

Once you understand the trick you can most times judge what side effects and limitations some new technology has. And when to NOT use it as well πŸ™‚

5: The Implementors

I mention this just for completeness. There are people who really invent and implement those algorithms, libraries and frameworks. Of course they do know how this stuff works – and they are also well aware what they do NOT yet cover or not yet understand themselves. And that’s a big part. To bring another quote, this time from one of the first computer scientists David Wheeler:

“All problems in computer science can be solved by another level of indirection.”

In fact our high level computer languages are just a pile of hundreds of layers of abstractions built upon another. From APIs, the actual implementation down to machine instruction, transistors or FETs down to semiconductor physics and atoms, electrons, quarks and gluons. And if you implement fundamentally new frameworks you have to go down these abstractions at least to some degree. And this is obviously complex. Or to put it in my own words (as a person who wrote quite a few specs and libraries in the JavaEE area):

“Behind every problem detail you have to solve
there are at least 3 more problems hiding
(which you also have to solve).”

Which of course a stock market analyst doesn’t care much about…

To beans or not to beans.xml?

When to use a beans.xml file in your CDI application?

Today there was a discussion on Twitter whether beans.xml can simply be left away these days. This feature actually exists since the CDI-1.1 specification but results in a slightly different handling of your application.

META-INF/beans.xml present

In the old CDI-1.0 days one always had to add a META-INF/beans.xml file as a marker.

If a classpath entry has a META-INF/beans.xml file present then this is called an Explicit Bean Archive.

If we have an Explicit Bean Archive and no further bean-discovery-mode is defined, every class which is a candidate to be a bean will get picked up during class scanning and a ProcessAnnotatedType CDI Lifecycle event (PAT event) is being fired for it. There are not many classes which will not get handled that way. E.g. no PAT event will get fired for a non-static inner class. But you will even get a PAT for interfaces or abstract classes and enums for example!

And this feature (PAT for interfaces, etc) is often used via CDI Extensions to collect information or automatically register Custom Beans. E.g. the Apache DeltaSpike @MessageBundle mechanism is based on it.

Oh, there is another implication of this mode. Every class which gets scanned and is a potential bean candidate will automatically get registered as @Dependent scoped Bean. This is a part which I really don’t like because it fills up your BeanManager with mostly useless garbage. This is especially bad for jars where you have 100s of JavaSE classes but only a few CDI features on top of it. I will later show how to solve this problem.

JAR without any beans.xml

If not beans.xml is present (or the bean-discovery-mode is set to annotated) then only classes with a Bean Defining Annotation are picked up. In this case the ProcessAnnotatedType CDI Lifecycle event will also only get fired for those classes. And of course features like @MessageBundle will simply not work in those cases.

Trimmed Bean Archives

In CDI-2.0 we added a new feature. This is somewhat similar to the explicit bean archive as we know it from CDI-1.0 but adds a feature to not automatically pick up all classes as @Dependent scoped beans.

This mode is the trim mode and can be switched on with the following inside your beans.xml file:

<beans>
  <trim/>
</beans>

The effect is the following: Every class will get picked up by the class scanning and a ProcessAnnotatedType CDI lifecycle event is fired. After the event the AnnotatedType will get inspected. If the AnnotatedType has a Bean Defining Annotation it will get registered as Bean<T>. Otherwise it will simply get ignored in the further processing.

This gives us a neat way to still make all the fancy CDI Extension tricks and use PAT as a kind of business class scanning. While at the same time we do not overly pollute our BeanManager with information he will never need again.

toString(), equals() and hashCode() in JPA entities

Many users have generated toString, equals() and hashCode() methods in their JPA entities.
But most times they underestimate what impact that can have.

This blog post post is inspired by a chat I had with Gavin King and Vlad Mihalcea.

Preface: I like to emphase that I take a big focus on keeping the customer code portable across different JPA vendors. Some ‘Uber trick’ might work in one JPA vendor and totally mess up the others. Each JPA provider is broken in it’s own very special way. Trust me, I know what I am talking about from both a user and a vendor perspective… The stuff I show here is the least common denominator for JBoss Hibernate, EclipseLink and Apache OpenJPA. Please shout out if you think some of the shown code does not work on one of those JPA containers.

toString()

What’s wrong with most toString() methods in entities?
Well, most of the times developers just use the ‘generated toString’ shortcut to create this method. And that means that the generated toString() method usually just reads all the attributes of your entity and prints it.

What happens if you touch an attribute really depends in a high degree which ‘mode’ your JPA provider runs in. In Hibernate you often have the pure class. In that case not much will happen if you only read the attributes which are not Collections etc. By ‘using attributes’ I mean this.fieldname and not using getters like this.getFieldname(). Simply because Hibernate does not support lazy loading for any other fields in that mode. However, if you touch a @OneToMany or an @ElementCollection field then you will force lazy loading on the first time toString() gets invoked. It might also behave different if you use the getters instead of reading the attributes.

And if you use EclipseLink, Apache OpenJPA or even Hibernate in byte-code weaving mode or if you get a javassist proxy from Hibernate(e.g from em.getReference()) then you are in even deeper troubles. Because in that case touching the attributes might trigger lazy loading for any other field as well.

I tried to explain how the enhancement or ‘weaving’ works in JPA in a blog post many years ago https://struberg.wordpress.com/2012/01/08/jpa-enhancement-done-right/ Parts of it might nowadays work a tad different but the most basic approach should still be the same.

Note that OpenJPA will generate a toString() method for you if the entity class doesn’t have one. In that case we will print the name of the entity and the primary key. And since we know the state of the _loaded fields we will also not force generating a new PK if the entity didn’t already load one from the sequence.
According to Gavin and Vlad Hibernate doesn’t generate any toString(). I have no clue whether EclipseLink does.

For other JPA implementations than Apache OpenJPA I suggest you provide a toString which looks like the following

public String toString() {
    return this.getClass().getSimpleName() + "-" + getId();
}

And not a single attribute more.

equals() and hashCode()

This is where Vlad, Gavin and I really disagree.
My personal opinion is that you shall not write own equals() nor hashCode() methods for entities.

Vlad did write a blog post about equals() and hashCode() in the past https://vladmihalcea.com/2016/06/06/how-to-implement-equals-and-hashcode-using-the-entity-identifier/

As you can see it’s not exactly easy to write a proper equals() and hashCode() method for JPA entities. Even Vlad’s advanced version does have holes. E.g. if you use em.getReference() or em.merge().
In any case, there is a point where Gavin, Vlad and I agree upon: generating equals() and hashCode() with IDEs is totally bollocks for JPA entities. It’s always broken to compare *all* fields. You would simply not be able to update your database rows πŸ˜‰

IF you like to write a equals() method then compare the ids with a fallback on instance equality. And have the hashCode() always return zero as shown in Vlad’s blog.

Another way is to generated a UUID in the constructor or the getId() method. But this is pretty performance intense and also not very nice to handle on the DB side (large Strings as PK consume a lot more storage in the indexes on disk and in memory)

Using ‘natural IDs’ for equals()

That sounds promising. And IF you have a really good natural ID then it’s also a good thing. But most times you don’t.

So what makes a good naturalId? It must adhere to the following criteria:

  • it must be unique
  • it must not change

Sadly most natural IDs you think of are not unique. The social security number (SSN) in most countries? Hah, not unique! Really, there are duplicates in most countries…
Also often used in examples: the ISBN of a book. Toooo bad that those are not unique neither… Sometimes the same ISBN references different books, and sometimes the same book has multiple ISBNs assigned.

What about immutability? Sometimes a customer does not have a SSN yet. Or you simply don’t know it YET. Or you only know it further down the application process. So the SSN is null and only later get’s filled. Or you detect a collision with another person and you have to assign one of them a new SSN (that really happens more often than you think!). There is also the case where the same physical person got multiple SSN (happens more frequent as well).

Many tables also simply don’t have a good natural ID. Romain Manni-Bucau came up with the example of a Blog entry. What natural ID does a blog entry have? The date? -> Not unique. The title? -> can get changed later…

Why do you need equals() and hashCode() at all?

This is a good question. And my answer is: “you don’t !”

The argument why people think it’s needed for JPA entities is because e.g. having a field like:

@OneToMany 
private Set others;

A HashSet internally of course uses equals() and hashCode() but why do you need to provide a custom one? In my opinion the one you implicitly derive from Object.class is perfectly fine. It gives you instance-equality. And since per the JPA specification the EntityManager guarantees that you only get exactly one single entity instance for a row in the database you don’t need more. Doubt it? Then read the JPA specification yourself:

"An EntityManager instance is associated with a persistence context. A persistence context is a set of entity instances in which for any persistent entity identity there is a unique entity instance."

https://docs.oracle.com/javaee/7/api/javax/persistence/EntityManager.html

An exception where instance equality does not work is if you mix managed with detached entity instances. But that is something you should avoid at any cost as my following examples show.

Why you shouldn’t store managed and detached entities in the same Collection

Why would you do that? Instead of storing entities in a Set you can always use a Map. In that case you again don’t need any equals() nor hashCode() for the whole entity. And even then you might get into troubles.

One example is to have a ‘cache’.
Say you have a university management software which has a Course table. Courses get updated only a few times per year and only by some administrative people. But almost every page in the application reads the information. So what could be more reasonable as to simply store the Course in a shared @ApplicationScoped cache as Map for say an hour? Why don’t I use the cache management provided with some JPA containers? Many reasons. First and foremost they are not portable. They are also really tricky to configure (I’m talking about real production, not a sample app!). And you like to have FULL control over the cache!

So, having a cache is really a great idea, but *please* do not store JPA entities in the cache. At least not as long as they are managed. All is fine as long as you only run it locally and click around on your app and only do unit tests. But under heavy load in production (our app had 5 Mio page hits/day average) you will hit the following problem:

The JPA specification does not allow an EntityManager to be used from multiple threads at the same time. As a managed entity is bound to an EntityManager, this limitation also affects the entities themselves.
So while you do the em.find() and later a coursesCache.put(courseId, course) the entity is still in ‘managed’ mode! And under heavy load it *will* happen that another user gets the still managed entity from the cache before it got detached (which happens at the tx commit or request end, depending on your setup). Boooommm it goes…

How can you avoid that? Simply use a view object. Normally the full database entities with all their gory attribute details and sub-tables are not needed on an overview course list anyway. So you better use a ‘new’ query:

CourseListVO couseViewItem 
  = em.createQuery("SELECT NEW org.myproject.Course(c.id, c.name, c.,...) " +
      " FROM Course AS c WHERE...");
cache.put(courseId, courseViewItem);

By using a ‘new Query’ you will get instances which are not managed by the container. And it’s also much faster and consumes less memory btw.

Oh I’m sure there are things which are still not cosidered yet…

PS: this is not an easy topic as you might be able to judge from looking at the involved people. Gavin is the inventor of Hibernate and JPA, Vlad is the current Hibernate maintainer. And I was involved in the DODS DB layer of Lutris Enhydra in the 90s and am a long time Apache OpenJPA committer (and even the current PMC chair).

Applying interceptors to producer methods

Interceptors are really cool if you have a common problem and like to apply it to without making every single colleague copy the same code over again and again to apply a solution over the whole code base.

In my case it was the urge to log out SOAP and REST invocations to other systems. I also like to add a logCorrelationId via HTTP header to each outgoing SOAP call. You can read more about the background over in my other logCorrelation blog post.

I’ll focus on integrating SOAP clients, but you can easily do the same for REST clients as well.

Integrating a SOAP client in an EE project

Usually I create a CDI producer for my SOAP ports. That way I can easily mock them out with a local dummy implementation by just using CDI’s @Specializes or @Alternative. If you combine this with with Apache DeltaSpike @Exclude and the DeltaSpike Configuration system then you can even even enable those Mock via ProjectStage or a configuration setting.

Consider you have a WSDL and you create a SOAP client with the interface CustomerService.

What we like to get from a ‘consumer’ perspective is the following usage:

public class SomeFancyClass {
  private @Inject CustomerService customerService;
  ...
}

Which means you need a CDI producer method, e.g. something like:

@ApplicationScoped
public class CusomerServiceSoapClientProducer {
  @ConfigProperty(name = "myproject.customerService.endpointUrl")
  private String customerServiceEndpointUrl;

  @Produces
  @RequestScoped
  @LogTiming
  public CustomerService createSoapPort() {
    // generated from the WSDL, e.g. via CXF
    CustomerServiceService svc = new CustomerServiceService();
    CustomerServiceServicePort port = svc.getCustomerServiceServicePort();

    // this sets the endpoint URL during producing.
    ((BindingProvider) port).getRequestContext().
           put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, customerServiceEndpointUrl);

    return port;
  }
}

Side note: the whole class could also be @RequestScoped to get the endpoint URL evaluated on every request. We could of course also use the DeltaSpike ConfigResolver programmatically to gain the same. But the whole point of setting the endpoint URL manually is that we don’t need to change the WSDL and have to recompile the project on every server change. We can also use different endpoints for various boxes (test vs production environments, or different customers) that way.

What is this @LogTiming stuff?

Now it becomes interesting! We now have a SOAP client which looks like a regular CDI bean from a ‘user’ point of view. But we like to get more information about that outgoing call. After all it’s an external system and we have no clue how it behaves in terms of performance. That means we like to protocol each and every SOAP call and log out it’s duration. Of course since we not only have 1 SOAP service client but multiple dozen ones we like to do this via an Interceptor!

@Inherited
@InterceptorBinding
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface LogTiming {
}

Applying an Interceptor on a producer method?

Of course the code written above DOES work. But it behaves totally different as many of you will guess.
If you apply an interceptor annotation to a producer method, then it will not intercept the calls to the produced bean!
Instead it will just intercept the invocation of the producer method. A producer method gets invoked when the Contextual Instance gets created. For a @Produces @RequestScoped annotated producer method this will happen the first time a method on the produced CDI bean gets called in the very request (or thread for non-servlet request based threads). And exactly this call gets intercepted.

If we would just apply a stopwatch to this interceptor then we would get the info about how long it took to create the soap client. That’s not what we want! We like to get the times from each and every usage of that CustomerService invocation! So what does our LogTiming interceptor do?

Proxying the Proxy

The trick we apply is to to use our LogTiming Interceptor to wrap the produced SOAP port in yet another proxy. And this proxy logs out the request times, etc. As explained before we cannot use CDI interceptors, but we can use java.lang.reflect.Proxy!:

@LogTiming
@Interceptor
public class WebserviceLoggingInterceptor {

    @AroundInvoke
    private Object wrapProxy(InvocationContext ic) throws Exception {
        Object producedInstance = ic.proceed();
        Class[] interfaces = producedInstance.getClass().getInterfaces();
        Class<?> returnType = ic.getMethod().getReturnType();
        return Proxy.newProxyInstance(ClassUtils.getClassLoader(null), interfaces, new LoggingInvocationHandler(producedInstance, returnType));
    }
}

This code will register our reflect Proxy in the CDI context and each time someone calls a method on the injected CustomerService it will hit the LogInvocationHandler. This handler btw can also do other neat stuff. It can pass over the logCorrelationId (explanation see my other blog post linked above) as HTTP header to the outgoing SOAP call.

The final LoggingInvocationHandler looks like the following:

public class LoggingInvocationHandler implements InvocationHandler {
    private static final long SLOW_CALL_THRESHOLD = 100; // ms
 
    private final Logger logger;
    private final T delegate;

    public LoggingInvocationHandler(T delegate, Class loggerClass) {
        this.delegate = delegate;
        this.logger = LoggerFactory.getLogger(loggerClass);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (EXCLUDED_METHODS.contains(method.getName())) {
            // don't log toString(), hashCode() etc...
            return method.invoke(delegate, args);
        }

        long start = System.currentTimeMillis();

        try {
            // setting log correlation header if any logCorrelationId is set on the thread.
            String logCorrelationId = LogCorrelationUtil.getCorrelationId();
            if (StringUtils.isNotEmpty(logCorrelationId) && delegate instanceof BindingProvider) {
                BindingProvider port = (BindingProvider) delegate;
                Map<String, List> headers = (Map<String, List>) port.getRequestContext().get(MessageContext.HTTP_REQUEST_HEADERS);
                if (headers == null) {
                    headers = new HashMap<>();
                }
                headers.put(LogCorrelationUtil.REQUEST_HEADER_CORRELATION_ID, Collections.singletonList(logCorrelationId));
                port.getRequestContext().put(MessageContext.HTTP_REQUEST_HEADERS, headers);
            }

            // continue with the real call
            return method.invoke(delegate, args);
        }
        finally {
            long duration = System.currentTimeMillis() - start;
            if (duration <= SLOW_CALL_THRESHOLD) {
                logger.info("soapRemoteCall took={} ms service={} method={}", duration, delegate.getClass().getName, method.getName());
            }
            else {
                // log a more detailed msg, e.g. with params
            }
        }
    }

Limitations

Of course this trick only works if the producer method returns an interface! That’s caused by the reflect Proxies are only available for pure interfaces.

I’m trying to remove this limitations by bringing intercepetors for produced instances to CDI-2.0 as well on working on a interceptors spec change to introduce ways to create subclassing proxies as easy as interface proxies. Stay tuned!

What is LogCorrelation?

While working on an article about CDI interceptors on producer methods I mention logCorrelation. I will not go into detail on this topic over at the other blog post as it would be simply too much over there. And this gives a great topic for a separate post anyway. And here we go…

So what is LogCorrelation?

Consider you have a more or less distributed application topology. You might have a server which does maintain customer data. There might be another box handling all the document archive, another one which holds the calculation kernel, etc etc.

Nowadays all people would say that are microservices. 8 years ago all people called it SOA. To be honest I totally don’t care how some sales people name it as all this is around since much longer than I’m working in the industry (which is a whoopie 25 years already). It’s just modular applications talking with each other somehow. Sometimes via SOAP or REST, but maybe even via MessageQueue, shared database tables or file based with batches handling the passing over – to me it doesn’t matter much.

But for all those the problem is the same: Consider a user clicks on some button in his browser or fat client. This triggers an ‘application action’. And this single action might hit the first server, then this server pings another one, etc. Synchronous or asynchronous also doesn’t matter. This might go all over the place in your company and even externally.Β  At the end something happens and the user most times gets some response. And it is really, REALLY hard to tell what’s wrong and where it went wrong if something doesn’t work as expected or returns wrong results. Often you don’t even have a clue which servers were involved. And if your whole application starts to behave ‘laggy’ then you will have a hard time judging which system you need to tinker with.

Now how cool would it be if you could follow this single action over all the involved servers?

And this is exactly what logCorrelation does!

What is the trick?

The trick is really plain simple. Each ‘action’ gets an own unique logCorrelationId. That might be a UUID for example. The only requirement is that it’s ‘decently’ unique.

If a server gets a request then he checks if there was a logCorrelationId passed to him. If so, then he takes this id and stores it in a ThreadLocal. If there was no id passed, then this is a ‘new’ action and we generate a fresh logCorrelationId. Of course this logCorrelationId will also get set as e.g. HTTP header for all subsequent outgoing HTTP calls on this very thread.

Where do I show the logCorrealationId?

Our applications now all handle this logCorrelationId properly, but where can I look at it? What is the benefit of all this?

At my customers I mainly use Apache log4j as logging backend, (often with slf4j as API). The point is that only log4j (and logback, but with way worse performance) support a nice little feature called MDC which stands for Mapped Diagnostic Context.Β  It is basically a ThreadLocal<Map<String, String>> which will get logged out in each and every line you log out on this very thread.

This log4j feature can also be accessed via the slf4j API. E.g. in a javax.servlet.Filter

MDC.set("correlationId", logCorrelationId);
MDC.set("sessionId", httpSessionId);
MDC.set("userId", loggedInUser.getId());

For enabling it in the log output you need to configure a ConversionPattern:

<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
    <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <param name="Target" value="System.out"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{ISO8601} [%t] %X{sessionId} %X{userId} %X{correlationId} %-5p %c{2} %m%n"/>
        </layout>
    </appender>

If you logging is configured properly in your company and you funnel all back into log aggregation systems like ELK (OpenSource with commercial support offering) or Splunk (Commercial with limited free offering) then you can now simply follow a single action over all the various systems.

What about non-user requests?

Non user requests can sometimes even be filled with more information. At a few customers we use Camunda BPMN Suite (OpenSource with commercial support). The core has a Thread which basically polls the DB and fetches new tasks to execute from it. Those will then get ‘executed’ in a parallel thread. For those threads we intercept the Executor and fill the logCorrelationId with the camunda jobId which basically is a uuid starting with ‘cam-‘. So once a process task blows up we can exactly figure what went wrong – even on a different server.

Of course this trick is not limited to the process engine…

PS: how does my stuff look like?

Probably it’s also worth sharing my LogCorrelationUtil:

/**
 * Helper for log correlation.
 *
 * @author Mark Struberg
 */
public class LogCorrelationUtil {

    public static final String REQUEST_HEADER_CORRELATION_ID = "X_LOG_CORRELATION_ID";
    public static final String MDC_CORRELATION_ID = "correlationId";

    private LogCorrelationUtil() {
    }

    /**
     * Creates a new log correlation Id ONLY IF there is no existing one!.
     * Takes about 4 uS, because I use a much faster UUID algo
     *
     * @param logEnvironment prefix for the logCorrelationId if a new one has to be created. Determines the environment the uuid got created in.
     * @param existingLogCorrelationId oder {@code null} if there is none yet
     */
    public static String enforceLogCorrelationId(LogEnvironment logEnvironment, String existingLogCorrelationId) {
        if (existingLogCorrelationId != null && existingLogCorrelationId.length() > 0) {
            return existingLogCorrelationId;
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        String uuid = new UUID(random.nextLong(), random.nextLong()).toString();

        if (logEnvironment != null) {
            StringBuilder sb = new StringBuilder(60);
            sb.append(logEnvironment);
            sb.append(uuid);
            uuid = sb.toString();
        }
        return uuid;
    }

    /**
     * @return the logCorrelationId for this thread or {@code null}
     */
    public static final String getCorrelationId() {
        return MDC.get(MDC_CORRELATION_ID);
    }

    /**
     * Set the given logCorrelationId for the current Thread.
     */
    public static final void setCorrelationId(String logCorrelationId) {
        MDC.put(MDC_CORRELATION_ID, logCorrelationId);
    }

    /**
     * Clears the logCorrelationId from the current Thread.
     * This method MUST be called at the end of each request 
     * to prevent mem leaks!
     */
    public static final void clearCorrelationId() {
        MDC.remove(MDC_CORRELATION_ID);
    }
}

The (mostly) unknown story behind javax.ejb.EJBException

Yesterday I blogged about what impact Exceptions do have on JavaEE transactions in EJB.
But there is another funny EJB Exception mechanism waiting for you to get discovered – the javax.ejb.EJBException.

This handling dates back to the times when EJB was mainly intended to be a competition to NeXT Distributed Objects and Microsoft DCOM. Back in those days it was all about ‘Client-Server Architecture’ and people tried to spread the load to different servers on the network. A single EJB back then needed 4 classes and was inherently remote by default.

Only much later EJBs got a ‘local’ behaviour as well. And only in EJB-3.1 the No-Interface View (NIV) got introduced which made interfaces obsolete and are local-only.
But for a very long time remoting was THE default operation mode of EJBs. So all the behaviour was tailored around this – regardless whether you are really using remoting or are running in the very same JVM.

The impact of remoting

The tricky situation with remote calls is that you cannot be sure that every class is available on the client.

Imagine a server which uses JPA. This might throw a javax.persistence.EntityNotFoundException. But what if the caller – a Swing EJB client app – doesn’t have any JPA classes on it’s classpath?
This will end up in a ClassNotFoundException or NoClassDefFoundException because de-serialisation of the EntityNotFoundException will blow up on the client side.

To avoid this from happening the server will serialize a javax.ejb.EJBException instead of the originally thrown Exception in certain cases. The EJBException will contain the original Exceptions stack trace as pure Strings. So you at least have the information about what was going wrong in a human readable format.

If you like to read up the very details then check out 9.4 Client’s View of Exceptions in the EJB specification.

Switching on the ‘Auto Pilot”

Some containers like e.g. OpenEJB/TomEE contain a dual-strategy. We have a ‘container’ (ThrowableArtifact) which wraps the orignal Throwable plus the String interpretation and sends both informations as fallback over the line.

On the client side the de-serialization logic of ThrowableArtifact first tries to de-serialize the original Exception. Whenever this is possible you will get the originally thrown Exception on the client side. If this didn’t work then we will use the passed information and instead of the original Exception we throw an EJBException with the meta information as Strings.

The impact on your program?

The main impact for you as programmer is that you need to know that you probably not only need to catch the original Exception but also an EJBException. So this really changes the way your code needs to be written.
And of course if you only got the EJBException then you do not exactly know what was really going on. If you need to react on different Exceptions in different ways then you might try to look it up in the exception message but you have no type-safe way anymore. In that case it might be better to catch it on the server side and send an explicit @ApplicationException over the line.

When do I get the EJBException and when do I get the original one?

I’d be happy to have a good answer myself πŸ˜‰

My experience so far is that it is not well enough specified when each of them gets thrown. But there are some certain course grained groups of container behaviour:

  • Container with Auto-Pilot mode; Those containers will always try to give you the original Exception. And only if it is really not technically possible will give you an EJBException. E.g. TomEE works that way.
  • Container who use the original Exception for ‘local’ calls and EJBException for remote calls.
  • Container who will always give you an EJBException – even for local invocations. I have not seen those for quite some time though. Not sure if this is still state of the art?

Any feedback about which container behaves what way is welcome. And obviously also if you think there is another category!

Transaction handling in EJBs and JavaEE7 @Transactional

Handling transactions in EJBs is easy, right? Well, in theory it should be. But how does the theory translate into reality once you leave the ivory tower?

I’ll show you a small example. Let’s assume we have 2 infrastructure service EJBs:

@Stateless
public class StorageServiceImpl implements StorageService {
  private @EJB CustomerService customerService;
  private @PersistenceContext EntityManager em;

  public void chargeStorage(int forYear) throws CustomerNotFoundException {
    storeNiceLetterInDb(em);
    Customer c = customerService.getCurrentCustomer(); 
    doSomethingElseInDB(); 
  }
} 

And now for the CustomerService which is an EJB as well:

@Stateless
public class CustomerServiceImpl implements CustomerService {
  public Customer getCurrentCustomer() throws CustomerNotFoundException {
    // do something if there is a current customer
    // otherwise throw a CustomerNotFoundException
  }
}

The Sunshine Case

Let’s first look at what happens if no problems occur at runtime.

In the normal operation mode some e.g. JSF backing bean will call storageService.chargeService(2015);. The implicit transaction interceptor will use a TransactionManager (all done in the interceptor which you do not see in your code) to check whether a Transaction is already open. If not it will open a new transaction and remember this fact. The same check will happen in the implicit transaction interceptor for the CustomerService.

When leaving CustomerService#getCurrentCustomer the interceptor will recognize that it didn’t open the transaction and thus will simply return. Otoh when leaving StorageService#chargeStorage it’s interceptor will commit the transaction and close the EntityManager.

Broken?: Handling checked Exceptions

Once we leave the sunny side of the street and hit some problems the whole handling start to become messy. Let’s look what happens if there is a checked CustomerNotFoundException thrown in CustomerService#getCurrentCustomer. Most people will now find their first surprise: The database changes done in storeNiceLetterInDb() will get committed into the database.

So we got an Exception but the transaction still got committed? WT*piep*!
Too bad that this is not a bug but the behaviour is exactly as specified in “9.2.1 Application Exceptions” of the EJB specification:

An application exception does not automatically result in marking the transaction for rollback unless the ApplicationException annotation is applied to the exception class and is specified with the rollback element value true…

So this means we could annotate the CustomerNotFoundException with @javax.ejb.ApplicationException(rollback=true) to force a rollback.
And of course we need to do this for ALL checked exceptions if we like to get a rollback.

Broken?: Handling unchecked Exceptions

The good news upfront: unchecked Exceptions (RuntimeExceptions) will usually cause a rollack of your transaction (unless annotated as @AppliationException(rollback=false) of course).

Let’s assume there is some other entity lookup in the code and we get a javax.persistence.EntityNotFoundException if the address of the customer couldn’t be found. This will rollback your transaction.

But what can we do if this is kind of expected and you just like to use a default address in that case? The natural solution would be to simply catch this Exception in the calling method. In our case that would be a try/catch block in StorageServiceImpl#chargeStorage.

That’s a great idea – but it doesn’t work in many containers!

Some containers interpret the spec pretty strictly and do the Exception check on _every_ layer (EJB spec 9.3.6) . And if the interceptor in the CustomerService detects an Exception then the implicit EJB interceptor will simply roll back the whole transaction and mark it as “rollbackOnly”. Catching this Exception in an outer level doesn’t help a bit. You will not get your changes into the database. And if you try to do even more on the database then you will blow up again with something like “The connection was already marked for rollback”.

And how is that with @javax.transaction.Transactional?

Basically the same like with EJBs. In my opinion this was a missed chance to clean up this behaviour.
You can read this up in chapter 3.6 of the JTA-1.2 specification.

The main difference is how to demarcate rollback vs commit exceptions. You can use the rollbackOn and dontRollbackOn attributes of @Transactional:

@Transactional(rollbackOn={SQLException.class}, dontRollbackOn={SQLWarning.class})

Now what about DeltaSpike @Transactional?

In Apache DeltaSpike @Transactional and it’s predecessor Apache MyFaces CODI @Transactional we have a much cleaner handling:

Exceptions only get handled on the layer where the transaction got opened. If you catch an Exception along the way than we do not care about it.

Any Exception on the outermost layer will cause a rollback of your transaction. It doesn’t matter if it is a RuntimeException or a checked Exception.

If there was no Exception in the outermost interceptor then we will commit the transaction.

PS: please note that I explicitly used interfaces in my samples. Otherwise you will get NIV (No Interface View) objects which again might behave slightly different as they use a totally different proxying technique and default behaviour. But that might be enough material for yet another own blog post.
PPS: I also spared you EJBs with TransactionManagementType.BEAN. That one is also pretty much non-portable by design as you effectively cannot nest them as it forces you to either commit or rollback the tx on every layer. Some containers work fine while others really force this.

The right Scope for JBatch Artifacts

In my recent JavaLand conference talk about JSR-352 JBatch and Apache BatchEE I briefly mentioned that JBatch Artifacts should have a scope of @Dependent (or Prototype scope if you are using Spring). Too bad there was not enough time to dig into the problem in depth so here comes the detailed explanation.

What is a JBatch Artifact

A JBatch batch needs a Job Specification Language XML file in META-INF/batch-jobs/*.xml files. These files describes how your batch job is built up.

Here is a small example of how such a batch JSL file could look like

<job id="mysamplebatch" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
  <step id="mysample-step">
    <listeners>
      <listener ref="batchUserListener" >
      <properties>
        <property name="batchUser" value="#{batchUser}"/>
      </properties>
      </listener>
    </listeners>
    <batchlet ref="myWorkerBatchlet">
      <properties>
        <property name="inputFile" value="#{inputFile}"/>
      </properties>
    </batchlet>
  </step>

In JSR-352 an Artifact are all pieces which are defined in your JBatch JSL file and get requested by the container. In the sample above this would be batchUserListener and myWorkerBatchlet.

The following types can be referenced as Batch Artifacts from within your JSL:

  • Batchlets
  • ItemReader
  • ItemProcessor
  • ItemWriter
  • JobListener
  • StepListener
  • CheckpointAlgorithm
  • Decider
  • PartitionMapper
  • PartitionReducer
  • PartitionAnalyzer
  • PartitionCollector

The Batch Artifact Lifecycle

The JBatch spec is actually pretty clear what lifecycle needs to get applied on Batch Artifacts:

11.1 Batch Artifact Lifecycle
All batch artifacts are instantiated prior to their use in the scope in which they are declared in the Job XML and are valid for the life of their containing scope. There are three scopes that pertain to artifact lifecycle: job, step, and step-partition.
One artifact per Job XML reference is instantiated. In the case of a partitioned step, one artifact per Job XML reference per partition is instantiated. This means job level artifacts are valid for the life of the job. Step level artifacts are valid for the life of the step. Step level artifacts in a partition are valid for the life of the partition.
No artifact instance may be shared across concurrent scopes. The same instance must be used in the applicable scope for a specific Job XML reference.

The problem is that whenever you use a JavaEE artifact then you might get only a proxy. Of course the reference to this proxy gets thrown away correctly but the instance behind the proxy might survive. Let’s look at how this works internally.

How Batch Artifacts get resolved

A JBatch implementation can provide it’s own mechanism to load the artifacts. This is needed as it is obviously different whether you use BatchEE with CDI or if you use Spring Batch (which also implements JSR-352).
In general there are 3 different ways you can reference a Batch Artifact in your JSL:

  1. Via a declaration in an optional META-INF/batch.xml file. See the section 10.7.1 of the specification for further information.
  2. Via it’s fully qualified class name.
    In BatchEE we first try to get the class via BeanManager#getBeans(Type) and BeanManager#getReference. If that doesn’t give us the desired Contextual Reference (CDI proxy) then we simply call ClassLoader#loadClass create the Batch Artifact with newInstance() and apply injection into this instance
  3. Via it’s EL name. More precisely we use BeanManager#getBeans(String elName) plus a call to BeanManager#getReference() as shown above.

We now know what a Batch Artifact is. Whenever you are on a JavaEE Server you will most likely end up with a CDI or EJB Bean which got resolved via the BeanManager. If you are using Spring-Batch then you will most times get a nicely filled Spring bean.

The right Scope for a Batch Artifact

I’ve seen the usage of @javax.ejb.Stateless on Batch Artifacts in many samples. I guess the people writing such samples never used JBatch in real production yet πŸ˜‰ Why so? Well, let’s look at what would happen if we implement our StepListener as stateless EJB:

@javax.ejb.Stateless
@javax.inject.Named // this makes it available for EL lookup
public class BatchUserListener implements StepListener {
  @Inject 
  @BatchProperty
  private String batchUser;

  @Override
  public void beforeStep() throws Exception {
     setUserForThread(   
  }

  @Override
  public void afterStep() throws Exception {
    clearUserForThread();
  }
}

Now let’s assume that the BatchUserListener gets not only used in my sample batch but in 30 other batches of my company (this ‘sample’ is actually taken from a HUGE real world project where we use Apache BatchEE since over a year now).

What will happen if e.g. a ‘DocumentImport’ batch runs before my sample batch? The first batch who uses this StepListener will create the instance. At the time when the instance gets created by the container (and ONLY at that time) it will also perform all the injection. That means it will look up the ‘batchUser’ parameter and injects it into my @BatchProperty String. Let’s assume this DocumentImport batch uses a ‘documentImportUser’. So this is what we will get injected into the ‘batchUser’ variable;

Once the batch step is done the @Stateless instance might be put back into some pool cache. And if I’m rather unlucky then exactly this very instance will later get re-used for mysample-step. But since the listener already exists there will no injection be performed on that instance. What means that the steplistener STILL contains the ‘documentImportUser’ and not the ‘mySampleUser’ which I explicitly did set as parameter of my batch.

The very same issue also will happen for all injected Variables which do not use proxies, e.g.:

  • @Inject StepContext
  • @Inject JobContext

TL;DR: The Solution

Use @Dependent scoped beans for your Batch Artifacts and only use another scope if you really know what you are doing.

If you like to share code across different items of a Step or a Job then you can also use BatchEE’s @StepScoped and @JobScoped which is available through a portable BatchEE CDI module

CDI in EARs

Foreword

I was banging my head against the wall for the last few days when trying to solve a few tricky issues we saw with EAR support over at Apache DeltaSpike. I’m writing this up to clean my mind and to share my knowledge with other EAR-pig-wrestlers…

The EAR ClassLoader dilemma

EARs are a constant pain when it comes to portability amongst servers. This has to do with the fact that JavaEE still doesn’t define any standards for visibility. There is no clear rule about how the ClassLoaders, isolation and visibility has to be set up. There is just as single paragraph (JavaEE7 spec 8.3.1) about which classes you might probably see.

There are 2 standard ClassLoader setups we see frequently in EARs.
For the sake of further discussion we assume an EAR with the following structure:

sample.ear
β”œβ”€β”€ some-ejb.jar
β”œβ”€β”€ lib
β”‚   β”œβ”€β”€ some-shared.jar
β”‚   └── another-shared.jar
β”œβ”€β”€ war1.war
β”‚Β Β  └── WEB-INF
β”‚       β”œβ”€β”€ classes 
β”‚       └── lib
β”‚           └── war1-lib.jar
└── war2.war
    └── WEB-INF
        β”œβ”€β”€ classes 
        └── lib
            └── war2-lib.jar

Flat ClassLoader

The whole EAR is served by just a single flat ClassLoader. If you have 2 WARs inside your application then the classes in each of it can see each other. And also the classes in the shared EAR lib folder can see. This is e.g. used in JBoss-4 (though not from the beginning). You can also configure most of the other containers to use this setup for your EAR. But nowadays it’s hardly a default anymore (and boy is that good!)

Hierarchic ClassLoader

This is the setup used by most containers these days – but it’s still not a mandated by the spec! The Container will provide a shared EAR Application ClassLoader which itself only contains the shared ear libs. This is the parent ClassLoader of all the ClassLoaders in the EAR. Each WAR and ejb-jar inside your EAR will get an own child WebAppClassLoader.

This means war2 doesn’t see any classes or resources from war1 and the other way around. It further means that the shared libs do not see the classes of war1 nor war2, etc!

If you need some caches in your shared libs, then you need to rely on the ThreadContextClassLoader (TCCL) as outlined in JavaEE 7 paragraph 8.2.5. While this section is about “Dynamic Class Loading” it also is valid for caches and storing other dynamic information in static variables. Otherwise you end up mixing values from war1 and probably re-use them in war2 (where you will get a ClassNotFound exception). Even if your 2 WARs contain the same jar (e.g. commons-lang.jar) the Class instances are different as they come from a different ClassLoader. If you store those in a shared-lib jar then you most probably end up with the (in)famous “Cannot cast class Xxx to class Xxx”.

One common solution to this problem will look something like:

public class MyInfoStore {
  private Map<ClassLoader, Set<Info>> infoMap;

  public void storeInfo(Info info) {
    ClassLoader tccl = Thread.currentThread().getContextClassLoader(); // probably guarded for SecurityManager
    infoMap.put(tccl, info);
  }
...
}

Of course you must really be careful to clean up this Map during shutdown. In CDI Applications you can use the @Observes BeforeShutdown event to trigger the cleanup.

The impact on us programmers

These various scenarios make it really hard to write any form of portable framework who runs fine inside of EARs. This is not only true for client frameworks but also for container frameworks itself like CDI and Spring.

Integrating CDI containers in EARs

It is pretty obvious that – mostly due to the lack of a guaranteed default isolation scenario in JavaEE – CDI containers have a hard time in finding a nice and portable handling of CDI beans and Extensions in EARs. I got involved in CDI in late 2008 when the name of the spec still was WebBeans. And that name was taken literally – it originally was only targetting web applications and not EARs. The EAR support only got roughly added (according to some interpretations) shortly before the EE-6 specification got published. So there are multiple reasons why CDI in EARs is not really a first class citizen yet.

To my knowledge there are 3 sane ways how a container can integrate CDI in ears. And of those 3 sane ways, 5 are used in various containers πŸ˜‰

A.) 1 BeanManager per EAR

All the EAR is handled via 1 BeanManager. This is the way JBoss WildFly seems to handle things. First the BeanManager gets created and all it’s CDI Extensions get loaded (TODO research: all or only the ones from the shared libs?).

In reality it’s a bit more complicated. Weld uses a single BeanManager for each and every JAR in your EAR. Don’t ask me why, but that’s what I’ve seen. Still you only get one set of Extensions for your whole EAR. Keep this in mind.

The TCCL always seems to stay the shared EAR ApplicationClassLoader during boot time. Even while processing the classes of the WAR files. Thus the TCCL during @Observes ProcessAnnotatedType will be the shared EAR ApplicationClassLoader. But if you access your Extensions (or the information collected by them) later at runtime you might end up with a different TCCL. This is especially true for all servlet requests in your WARs. This makes it really hard to store anything with the usual Map<ClassLoader, Set<Info>> trick.

B.) 1 BeanManager per WAR + 1 ‘shared’ BeanManager

Each WAR fully boots up his own BeanManager. The shared EAR libs also get an own BeanManager to serve non-servlet requests like JMS and remote EJB invocations. Classes in shared EAR-libs simply get discovered over again for each WAR. Each WAR also has it’s own set of Extensions (they are usually 1:1 to the BeanManager). This is what I’ve seen in Apache Geronimo, IBM WebSphere and early Apache TomEE containers.

This is a bit tricky when it comes to handling of the @ApplicationScoped context (see CDI-129 for more information) or if you handle EJBs (which are by nature shared across the EAR). WebSphere e.g. seems to solve this by having an own Bean<SomeSharedEarLibClass> instance for each BeanManager (and thus for each WAR) but they share a single ApplicationContext storage and those beans are equals(). Probably they just compare the PassivationCapable#getPassivationId()?

It usually works fairly nice and it allows the usage of most common programming patterns. This also allows to modify classes from shared libs via Extensions you register in a single WAR. The downside is that you have to scan and process all the shared classes over and over again for each WAR. This obviously slows down the application boot time.

Oh this mode strictly seen also conflicts with the requirements for modularity of section 5 of the CDI specification. Overall I’m not very happy with section 5 but it should be mentioned.

C.) Hierarchic BeanManagers

In this case we have an 1:1 relation between a ClassLoader and the BeanManager. The shared EAR libs will get scanned by an EAR-BeanManager. Then each of the WARs get scanned via their own WebAppBeanManager. In contrast to scenario B. these WebAppBaenManagers will only scan the classes of the local WARs WEB-INF/lib and WEB-INF/classes and not the shared ear lib over again. If you need information from shared EAR lib classes then the WebAppBeanManger simply delegates to it’s ‘parent’ EAR-BeanManager. Apache TomEE uses this mode since 1.7.x

There are a few tricks the container vendor need to do in this case. E.g. propagating events and Bean lookups to the parent BeanManager, etc. Or to suppress sending CDI Extension Events to the parent (we recently learned this the hard way – is fixed in TomEE-1.7.2 which is to be released soon).

It also has an important impact on CDI Extension programmers: As your Extensions are 1:1 to the BeanManager we now also have the ClassPath split up into different Extension instances. This works fine for ProcessAnnotatedType but can be tricky in some edge cases. E.g. the DeltaSpike MessageBundleException did collect info about a certain ProducerBean and stored in in the Extension for later usage in @Observes AfterBeanDiscovery. Too bad that in my case this certain ProducerMethod was in a shared ear lib and thus gets picked up by the Extension-instance of the EAR-BeanManager but the ‘consumer’ (the interface annotated with @MessageBundle is in some WARs. And the WebAppBeanManager of this WAR obviously is not the one scanning the ProducerMethod of the class in the shared ear lib. Thus the Extension created a NullPointerException. This will be fixed in the upcoming Apache DeltaSpike-1.2.2 release.

The Impact on poor CDI Extension programmers

TODO: this is a living document. I’ll add more info and put it up review.

The tricky CDI ServiceProvider pattern

CDI is really cool, and works great. But it’s not the hammer for all your problems!

I’ve seen the following a lot of times and I confess that I also used it in the first place: The CDI ServiceProvider pattern. But first let’s clarify what it is:

The ServiceProvider pattern

Consider you have a bunch of plugable modules and all of them might add an implementation for a certain Interface:

// this is just a marker interface for all my plugins
public interface MyPlugin{} 

The idea is that extensions can later easily implement this marker interface and the system will automatically pick up all the implementations accordingly.

public class PluginA implements MyPlugin {
  ...
}

public class PluginB implements MyPlugin {
  ...
}

The known way

I’ve seen lots of blogs recommending the following solution:

public static  List getContextualReferences(Class type, Annotation... qualifiers)
{
    BeanManager beanManager = getBeanManager();

    Set<Bean> beans = beanManager.getBeans(type, qualifiers);

    List result = new ArrayList(beans.size());

    for (Bean bean : beans)
    {
        Bean<?> bean = beanManager.resolve(beans);
        CreationalContext<?> creationalContext = beanManager.createCreationalContext(bean);
        result.add(beanManager.getReference(bean, type, creationalContext););
    }
    return result;
}

The original implementation can be found in the Apache DeltaSpike BeanProvider (and in Apache MyFaces CODI before that).

This worked really well for some time. But suddenly a few surprises happened.

What’s the problem with getBeans()?

One day we got a bug report in Apache OpenWebBeans that OWB is broken because the pattern shown above doesn’t work: JIRA issue OWB-658

What we found out is that the whole pattern doesn’t work anymore once one single plugin is defined as @Alternative

I digged into the issue and became aware that this is not an OWB issue but BeanManager#getBeans() is just not made for this usage! So let’s have a look about what the CDI spec says:

11.3.4:
The method BeanManager.getBeans() returns the set of beans which have the given required type and qualifiers and are available for injection

The important part here is the difference between “give me all beans of type X” and “give me all beans which can be used for an InjectionPoint of type X”. Those 2 are fundamentally different, because in the later case all other beans are no candidates for injection anymore if a single @Alternative annotated bean gets spotted. OWB did it just right!

Possible Solution 1

Don’t use @Alternative on any of your plugin classes πŸ˜‰

That sounds a bit ridiculous, but at least it works.

Possible Solution 2

You could create a Message which collects all your plugins in a CDI-Extension-like way.
First we need a data object to store all the collected plugins.

public class PluginDetection {
  private List plugins = new ArrayList();

  public void addPlugin(MyPlugin plugin) {
    plugins.add(plugin);
  }

  public List getPlugins() {
    return plugins;
  }
}

Then we gonna fire this around:

private @Inject Event pdEvent;

public List detectPlugins() {
  PluginDetection pd = new PluginDetection();
  pdEvent.fire(pd);
  return pd.getPlugins();
}

The plugins just need to Observe this event and register themself:

public void register(@Observes PluginDetection pd) {
  pd.add(this);
}

Since the default for @Observes is Reception.ALWAYS, the contextual instance will automatically get created if it doesn’t yet exist.

But there is also a small issue with this approach: There is no way to disable/override a plugin with @Alternative anymore, since the messaging doesn’t do any bean resolving at all.

Using JPA in real projects (part 1)

Is JPA only good for samples?

This question is of course a bit provocative. But if you look at all the JPA samples out in the wild, then none of them can be applied to real world projecs without fundamental changes.

This post tries to cover a few JPA aspects as well as showing off some maven-foo from a big real world project. I am personally using Apache OpenJPA because it works well and I’m a committer on the project (which means I can immediately fix bugs if I hit one). I will try to motivate my friends from JBoss to provide a parallel guide for Hibernate and maybe we even find some Glassfish/EclipseLink geek.

One of the most fundamental differences between the different JPA providers is where they store the state information for the loaded entities. OpenJPA stores this info directly in the entities (EclipseLink as well afaik) and Hibernate stores it in the EntityManager and sometimes in the 1:n proxies used for lazy loading (if no weaving is used). All this is not defined in the spec but product specific behaviour. Please always keep this in mind when applying JPA techiques to another JPA provider.

I’ll have to split this article in 2 parts, otherwise it would be too much for a good read. Todays part will focus on the general project setup, the 2nd one will cover some coding practices usable for JPA based projects.

The Project Infrastructure and Setup

A general note on my project structure: my project is not a sample but fairly big (40k users, 5mio page hits, 600++ JSF pages) and consists of 10++ WebApps with each of them having their own backend (JPA + db + businesslogic), frontend (JSF and backing beans) and api (remote APIs) JARs. Thus I have all my shared configuration in myprj/parent/fe myprf/parent/be and myprj/parent/api maven modules, containing the pom.xml pointed to as <parent> by all backends, frontends resp apis.

β”œβ”€β”€ parent
β”‚Β Β  β”œβ”€β”€ api
β”‚Β Β  β”œβ”€β”€ be (<- here I keep all my shared backend configuration) 
β”‚Β Β  β”œβ”€β”€ fe
β”œβ”€β”€ webapp1
β”‚Β Β  β”œβ”€β”€ api
β”‚Β Β  β”œβ”€β”€ be (referencing ../../parent/be/pom.xml)
β”‚Β Β  └── fe
β”œβ”€β”€ webapp2
β”‚Β Β  β”œβ”€β”€ api
β”‚Β Β  β”œβ”€β”€ be (referencing ../../parent/be/pom.xml)
β”‚Β Β  └── fe
...

Backend Unit Test Setup

1. All my backend unit tests use testng and really do hit the database! A business process test which doesn’t touch the database is worth nothing imo…
We are using a local MySQL installation for the tests and use an Apache Maven Profile for switching to other databases like Oracle and PostgreSQL (which we both use in production).

2. We have a special testng test-group called createData which we can @Test(dependsOnGroups="createData"). Or we just use the @Test(dependsOnMethods="myTestMethodCreatingTheData").
That way we have all tests which create some pretty complex set of test-data running first. All tests which need this data as base for their own work will run afterwards.

3. Each test must be re-runnable and cleanup his own mess in @BeforeClass. We use BeforeClass because this also works if you kill your test in the debugger. Nice goodie: you also can check the produced data in the database later on. Too bad that there is no easy way to automatically proove this. The best bet is to make all your colleagues aware of it and tell them that they have to throw the next party if they introduce a broken or un-repeatable test πŸ˜‰

The Enhancement Question

I’ve outlined the details and pitfalls of JPA enhancement in a previous post.
I’m a big fan of build-time-enhancement because it a.) works nicely with OpenJPA and b.) my testng unit tests run much faster (because I only enhance those entities once). I also like the fact that I know exactly what will run on the server and my unit tests will hit side effects early on. In a big project you’ll hit enhancement and state side effects which let your app act differently in unit test and on the EE server more often than you’ll guess.
Of course, this might differ if you use another JPA provider.

For enabling build-time-enhancement with OpenJPA I have the following in my parent-be.pom.

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.openjpa</groupId>
                <artifactId>openjpa-maven-plugin</artifactId>
                <version>${openjpa.version}</version>
                <configuration>
                    <includes>
                        ${jpa-includes}
                    </includes>
                    <excludes>
                        ${jpa-excludes}
                    </excludes>
                    <addDefaultConstructor>true</addDefaultConstructor>
                    <enforcePropertyRestrictions>true</enforcePropertyRestrictions>
                    <sqlAction>${openjpa.sql.action}</sqlAction>
                    <sqlFile>${project.build.directory}/database.sql</sqlFile>
                    <connectionDriverName>com.mchange.v2.c3p0.ComboPooledDataSource</connectionDriverName>
                    <connectionProperties>
                        driverClass=${database.driver.name},
                        jdbcUrl=${database.connection.url},
                        user=${database.user},
                        password=${database.password},
                        minPoolSize=5,
                        acquireRetryAttempts=3,
                        maxPoolSize=20
                    </connectionProperties>
                </configuration>
                <executions>
                    <execution>
                        <id>mappingtool</id>
                        <phase>process-classes</phase>
                        <goals>
                            <goal>enhance</goal>
                        </goals>
                    </execution>
                </executions>
                <dependencies>
                    <dependency>
                        <groupId>log4j</groupId>
                        <artifactId>log4j</artifactId>
                        <version>1.2.12</version>
                    </dependency>
                    <dependency>
                        <!-- 
                          otherwise you get ClassNotFoundExceptions during 
                          the code coverage report run
                        -->
                        <groupId>net.sourceforge.cobertura</groupId>
                        <artifactId>cobertura</artifactId>
                        <version>1.9.2</version>
                    </dependency>
                    <dependency>
                        <groupId>c3p0</groupId>
                        <artifactId>c3p0</artifactId>
                        <version>${c3p0.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>mysql</groupId>
                        <artifactId>mysql-connector-java</artifactId>
                        <version>${mysql-connector.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>com.oracle</groupId>
                        <artifactId>ojdbc14</artifactId>
                        <version>${ojdbc.version}</version>
                    </dependency>
                    <dependency>
                        <groupId>postgresql</groupId>
                        <artifactId>postgresql</artifactId>
                        <version>${postrgresql-jdbc.version}</version>
                    </dependency>
                </dependencies>
            </plugin>

You might have spotted a few maven properties which I later define in each projects pom. That way I can keep my common configuration generic and still have a way to tweak the behaviour for each sub-project. Again a nice benefit: You can easily use mvn -Dsomeproperty=anothervalue to tweak those settings on the commandline.

  • ${jpa-includes} for defining the comma separated list of classes which should get enhanced, e.g. "mycomp/project/modulea/backend/*.class,mycomp/project/modulea/backend/otherstuff/*.class
  • ${jpa-exludes} the opposite to jpa-includes
  • openjpa.sql.action to define what should be done during DB schema creation. This can be build for always create the whole DB schema (CREATE TABLES), or refresh for generating only ALTER TABLE statements for the changes. I’ll come back to this later.
  • ${database.driver.name} and credentials properties are used to be able to run the schema creation against Oracle, MySQL and PostgreSQL (switched via maven profiles).

Creating the Database

For doing tests with a real database we of course need to create the schema first. We do NOT let JPA do any automatic database schema changes on JPA-startup. Doing so might unrecoverably trash your production database, so it’s always turned off!

Instead we trigger the SQL schema creation process via the Apache OpenJPA openjpa-maven-plugin manually (for the configuration see below):

$> mvn openjpa:sql

Then we check the generated SQL in target/database.sql and copy it to the structure we have in each of our backend projects:

webapp1/be/src/main/sql/
β”œβ”€β”€ mysql
β”‚Β Β  β”œβ”€β”€ createdb.sql
β”‚Β Β  β”œβ”€β”€ createindex.sql
β”‚Β Β  β”œβ”€β”€ database.sql
β”‚Β Β  └── schema_delta.sql
β”œβ”€β”€ oracle
β”‚Β Β  β”œβ”€β”€ createdb.sql
β”‚Β Β  β”œβ”€β”€ createindex.sql
β”‚Β Β  β”œβ”€β”€ database.sql
β”‚Β Β  └── schema_delta.sql
└── postgres
    β”œβ”€β”€ createdb.sql
    β”œβ”€β”€ createindex.sql
    β”œβ”€β”€ database.sql
    └── schema_delta.sql

The following files are involved in the db setup:

createdb.sql

This file creates the database itself. It is optional as not every database supports to create a whole database. In MySQL we just do the following

DROP DATABASE if exists ProjextXDatabase
CREATE DATABASE ProjextXDatabase CHARACTER SET utf8;
USE ProjextXDatabase;

In Oracle this is not that easy. It’s a major pain to drop and then setup a whole data store. A major problem is that you cannot easily access a datastore which doesnt exist anymore via Oracles JDBC driver. Instead, we just drop all the tables.:

DROP TABLE MyTable CASCADE constraints PURGE;
DROP TABLE AnotherTable CASCADE constraints PURGE;
...

If you have a better idea, then please speak up πŸ˜‰

database.sql

This is the exact 1:1 DDL/Schema file we generated via the JPA (in my case via the openjpa-maven-plugins mvn openjpa:sql mentioned above). It is simply copied over from target/database.sql but the content remains unchanged. It runs after the createdb.sql file.

createindex.sql

This file contains the initial index tweaks which were not generated in the DDL. In Oracle and PostgreSQL this file e.g. contains all the indices on foreign keys, because OpenJPA doesn’t generate them (I remember that Hibernate does, correct?). In MySQL we don’t need those because MySQL automatically adds indices for foreign keys itself.

But this is of course a good place to add all the performance tuning stuff you ever wanted πŸ˜‰

schema_delta.sql

This one is really a goldie! Once a project goes into production we do not generate full databae schemas anymore! Instead we switch the openjpa-maven-plugin to the refresh mode. In this mode OpenJPA will compare the entities with the state of the configured database and only generate ALTER TABLE and similar statements for the changes in target/database.sql. This works surprisingly good!

We then review the generated schema changes and append the content to src/main/sql/[dbvendor]/schema_delta.sql. Of course we also add clean comments about the product revision in which the change got made. That way an administrator just picks the n last entries from this file and is easily able to bring the production database to the last revision.

Doing this step manually is very important! From time to time there are changes (renaming a column for example) which cannot be handled by the generated DDL. Such changes or small migration updates need to be maintained manually.

How to create the DB for my tests?

This one is pretty easy if you know the trick: We just make use of the sql-maven-plugin.

Here is the configuration I use in my project:

    <profiles>
        <!-- Default profile for surefire with MySQL: creates database, imports testdata and runs all unit tests -->
        <profile>
            <id>default</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>sql-maven-plugin</artifactId>
                        <configuration>
                            <driver>com.mysql.jdbc.Driver</driver>
                            <url>jdbc:mysql://localhost/</url>
                            <username>root</username>
                            <password/>
                            <escapeProcessing>false</escapeProcessing>

                            <srcFiles>
                                <srcFile>src/main/sql/mysql/createdb.sql</srcFile>
                                <srcFile>src/main/sql/mysql/database.sql</srcFile>
                                <srcFile>src/main/sql/mysql/schema_delta.sql</srcFile>
                                <srcFile>src/main/sql/mysql/createindex.sql</srcFile>
                                <srcFile>src/test/sql/mysql/testdata.sql</srcFile>
                            </srcFiles>
                        </configuration>

                        <executions>
                            <execution>
                                <id>setup-test-database</id>
                                <phase>process-test-resources</phase>
                                <goals>
                                    <goal>execute</goal>
                                </goals>
                            </execution>
                        </executions>

                        <dependencies>
                            <dependency>
                                <groupId>mysql</groupId>
                                <artifactId>mysql-connector-java</artifactId>
                                <version>${mysql-connector.version}</version>
                                <scope>runtime</scope>
                            </dependency>
                        </dependencies>
                    </plugin>
                </plugins>
            </build>
        </profile>

        <profile>
            <!-- that skips sql plugin and test!!! -->
            <id>skipSql</id>
        </profile>
        ...
        add profile for oracle and postgresql accordingly

Whenever you run your build, the database will be freshly set up in the process-test-resources phase. The database will then be exactly as in production!

Guess we are now basically ready to start hacking on our project!

The 2nd part will focus on how to handle JPA stuff in the application code. Stay tuned!

LieGrue,
strub

Is there a way to fix the JPA EntityManager?

Using JPA is easy for small projects but has well hidden problems which are caused by some very basic design decisions. Quite a few of them are caused because the EntityManager cannot be made Serializable. Although there are some JPA providers which claim serializability (Hibernate) they aren’t!

Is the EntityManager Serializable?

The LazyInitializationException is a pretty bad beast if you ever worked with EJB managed EntityManagers. That problem caused lots of people to discover alternative ways. Two of the most prominent are JBoss Seam2 if you are working with the JBoss stack and Apache MyFaces Orchestra for Spring applications.

The basic problems are summed up very well in the at large still correct Apache MyFaces Orchestra documentation:
Apache MyFaces Orchestra Persistence explanation

If you read through the whole page, you will see the TODOs at the very bottom of the page:

TODO: is the persistence-context serializable? Are all persistent objects in the context always serializable?

The simple answer is: NO not at all! Neither the EntityManager nor the state in the entities are Serializable as per the current JPA specification!

Why is the EntityManager not Serializable

There are a few reasons:

1. Pessimistic Locking

The biggest blocker first: JPA doesn’t only support Optimistic Locking but also Pessimistic Locking. You can either declare this in your persistence.xml and also programmatically via the LockModeType in many functions.

EntityManager#find(java.lang.Class entityClass, java.lang.Object primaryKey, LockModeType lockMode) 
EntityManager#lock(java.lang.Object entity, LockModeType lockMode) 
...

But if you ever use pessimistic locking (a real hard lock on the database) the connection is bound to the database and cannot be ‘transferred’ to another EntityManager without losing the lock.

2. Id and Version fields are optional

To use the optimistic locking approach, a primary key plus some ‘version’ field must be used in the entity:

 UPDATE tableX SET([somevalues], version=:oldversion+1) WHERE id=:myId AND version==:oldversion

Obviously this update can only succeed once. Trying to update the row a second time will not find any database entry because the version==:oldversion will not be true anymore.

When you use optimistic locking in JPA, you will always have such a ‘version’ column already. But there is no need to specify it yet! Thus this information will not be transported if you serialize the entity!

To fully support optimistic locking, those entities will need mandatory @Id and @Version columns.

3. Losing the entity state information

As outlined in a previous blog post every JPA entity will get ‘enhanced’ with some magic code which tracks _loaded and _dirty state information. Those BitFlags will track the parts of the entity which got changed or fetched lazily.

The problem in this area is mostly caused by the JPA spec which by default prevents the JPA providers from serializing the ‘enhanced entities’ but requires serializing the ‘native’ information. At least that seems to be the common understanding of the following paragraph in the JPA spec:

β€žSerializing entities and merging those entities back into a persistence context may not be interoperable across vendors when lazy properties or fields and/or relationships are used.
A vendor is required to support the serialization and subsequent deserialization and merging of detached entity instances (which may contain lazy properties or fields and/or relationships that have not been fetched) back into a separate JVM instance of that vendor’s runtime, where both runtime instances have access to the entity classes and any required vendor persistence implementation classes.

Of course, most JPA providers know a way to enable the serialization of the state fields. In OpenJPA just provide the following magic properties to your persistence.xml:

<property name="openjpa.DetachState" value="loaded(DetachedStateField=true)"/>
<property name="openjpa.Compatibility" value="IgnoreDetachedStateFieldForProxySerialization=true"/>

This will also serialize _loaded and _state BitFlags along with your Entity.

The problem with having the EntityManager not Serializable

Well, this one is easy:

  • You cannot store the EnityManager in a Conversation
  • You cannot store the EntityManager in a Session
  • You cannot store the EntityManager in a JSF View
  • No clustering, because Clustering means that you need to Serialize the state

What can you do today?

Today the only working sulution is the entitymanager-per-request pattern. Basically creating a @RequestScoped EntityManager e.g. via a CDI @Produces for each and every request. That also means that you need to manually merge those entities on the callback. If you use JSF that is in your action.

 

How to fix the JPA EntityManager in the future?

Here are my thoughts about how we can do better in the future. Please note that there is a project called Avaje eBean which is not JPA compliant but has already successfully implemented those ideas.

Provide an OptimisticEntityManager

public interface OptimisticEntityManager extends EntityManager, Serializable

The most important change here is that it implements the java.io.Serializable interface.
This OptimisticEntityManager should throw an NonOptimisticModeException whenever one tries to execute an operation on the EntityManager which requires a non-optimistic LockModeType or another operation which creates some lock or non-serializable behaviour.

There should be a way to explicitly request an OptimisticEntityManager, e.g. via

OptimisticEntityManager EntityManagerFactoy#createOptimisticEntityManager(); 

Make @Id and @Version mandatory for those Entities

This will solve the problem with losing the optimistic lock information when serializing.

Define _loaded and _dirty Serialization

The future JPA spec could either clarify that serialization is more important than JPA-vendor inter-compatibility (who uses 2 different JPA providers in the same environment anyway?).
Or just specify that 2 BitFlags can be passed in the Serialized entity and how they should behave.

Please tell me what you think? Do we miss something? It’s not an easy move, but up to now I think it is doable!

PS: Thanks to Shane Bryzak and Jason Porter for helping me get rid of the worst English grammar and wording issues at least. Hope you folks got the gist regardless of my bad english πŸ˜‰

Unit Testing Strategies for CDI based projects

Even after 2 years of CDI going public there are still frequent questions about how to unit test CDI based applications. Well, here it goes.

1. Arquillian, the heavyweight champion

Arquillian is the core part of the EE testing effort of JBoss. The idea is to write a unit test exactly once and run it on the target platform via some container integration connectors.

How does Arquillian tests look like?

I will not give a detailed introduction here, but I’d like to mention that an Arquillian test will usually contain a @Deployment section which packages all the classes under test into an own temporary Java packaging, like a JAR, EAR, WAR, etc and uses this temporary artifact to run the tests.

There are basically 2 ways to run Arquillian tests. I’ll not go into details but only give a very rough (and probably not too exact) overview:

Remote/Managed style Arquillian tests

In this case the Arquillian tests will be started in an extenal Java VM, e.g. a started JBossAS7 or GlassFish-3.2. If you start your unit test locally, it will get ‘deployed’ to the other VM and executed over there. You can debug through as if you were working over there which is established via the java debugging and profiling API. Some container connectors also allow to locally run ‘as-client’.

The Good: this is running exactly on the target platform you do your real stuff on.
The Bad: Slower than native unit tests. No full control over the lifecycle, etc. You e.g. cannot simulate multiple servlet requests in one unit test.

Embedded style Arquillian tests

Instead of deploying your Arquillian test to another VM you are starting a standalone CDI or embedded EE container in the current Java VM.

The Good: Much faster as going remote. And you don’t need any server running.
The Bad: Sometimes this leads to ClassLoader isolation problems between your @Deployment and your unit test. This leads to problems when you e.g. like to unit test classes which starts an EntityManager. The reason is that e.g. any persistence.xml will get picked up twice. Once from your regular classpath and the second time from the @Deployment.

You can read more about it in the JBoss Arquillian documentation (thanks to Aslak Knutsen for the links):

2. DeltaSpike CdiControl, the featherweight champion

The method to unit test CDI based programs I like to introduce next uses the Apache DeltaSpike CdiControl API introduced in my previous post. It is based on the ideas and experience I had with the Apache OpenWebBeans CdiTest module I created and heavily use since 3 years.

First of all, you will need a META-INF/beans.xml in your test class path. All your unit tests will at the end become CDI enabled classes! If you use maven, then just create an empty file.

$> touch src/test/resources/META-INF/beans.xml

Instead of creating a @Deployment like Arquillian does, we now just take the whole test-dependencies and scan them. Of course this means that starting the CDI container for a unit test will always have to scan all the classes which are enabled in JARs which have a META-INF/beans.xml marker file. But on the other hand we spare creating the @Deployment and always test the ‘real thing’.

Introducing the test base class

The first step is to create a base-class for all your unit tests. I will break down the full class into distinct blocks and explain what happens. I’m using testng in this case, but jUnit tests will look pretty similar.

The main thing we need is the DeltaSpike CdiContainer. CDI containers don’t like it to run concurrently within the same ClassLoader. Thus we have our CdiContainer as static member variable and share it across parallel tests.

public abstract class CdiContainerTest {
    protected static CdiContainer cdiContainer;

Now we also need to initialize and use it for each method under test. This is also a good place to set the ProjectStage the test should execute in.
After boot()-ing the CdiContainer we also start all the contexts.
If the container is already started, we just clean the contextual instances for the current thread.

    @BeforeMethod
    public final void setUp() throws Exception {
        containerRefCount++;

        if (cdiContainer == null) {
            ProjectStageProducer.setProjectStage(ProjectStage.UnitTest);

            cdiContainer = CdiContainerLoader.getCdiContainer();
            cdiContainer.boot();
            cdiContainer.getContextControl().startContexts();
        }
        else {
            // clean the Instances by restarting the contexts
            cdiContainer.getContextControl().stopContexts();
            cdiContainer.getContextControl().startContexts();
        }
    }

We also do proper cleanup after each method finishes. We especially cleanup all RequestScoped beans to ensure that e.g. disposal methods for @RequestScoped EntityManagers gets invoked.

    
    @AfterMethod
    public final void tearDown() throws Exception {
        if (cdiContainer != null) {
            cdiContainer.getContextControl().stopContext(RequestScoped.class);
            cdiContainer.getContextControl().startContext(RequestScoped.class);
            containerRefCount--;
        }
    }

Another little trick: in the @BeforeClass we ensure that the container is booted and then do some CDI magic. We get the InjectionTarget and fill all InjectionPoints of our very unit test subclass via the inject() method.
This allows us to use @Inject in our unit test classes which extend our CdiContainerTest.

    @BeforeClass
    public final void beforeClass() throws Exception {
        setUp();
        cdiContainer.getContextControl().stopContext(RequestScoped.class);
        cdiContainer.getContextControl().startContext(RequestScoped.class);

        // perform injection into the very own test class
        BeanManager beanManager = cdiContainer.getBeanManager();

        CreationalContext creationalContext = beanManager.createCreationalContext(null);

        AnnotatedType annotatedType = beanManager.createAnnotatedType(this.getClass());
        InjectionTarget injectionTarget = beanManager.createInjectionTarget(annotatedType);
        injectionTarget.inject(this, creationalContext);
    }

After the suite is finished we need to properly shutdown() the container.

    @AfterSuite
    public synchronized void shutdownContainer() throws Exception {
        if (cdiContainer != null) {
            cdiContainer.shutdown();
            cdiContainer = null;
        }
    }

}

You can see the full class in my lightweightEE sample (work in progress):
CdiContainerTest.java

The usage

Just write your unit test class and extend the CdiContainerTest class.

public class CustomerServiceTest extends CdiContainerTest
{

You can even use injection in your unit tests, because they get resolved in the @BeforeClass method of the base class.

    private @Inject CustomerService custSvc;
    private @Inject UserService usrSvc;
...

And just use the injected resources in your test

    @Test(groups = "createData")
    public void testCustomerCreation() throws Exception {
        Customer cust1 = new Customer();
        ...
        custSvc.createCustomer(cust1);
    }

Btw, to use the latest Apache DeltaSpike snapshots you might need to configure the following maven repository in your project or ~/.m2/settings.xml :

  <repositories>
    <repository>
      <id>people.apache.snapshots</id>
      <url>https://repository.apache.org/content/repositories/snapshots/</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>

Limits

The CdiContainer based unit tests work really fine if your project under test does not rely on EE container resources, like JNDI or JMS. It might work for e.g. EJB if you use Apache OpenWebBeans plus Apache OpenEJB or JBoss Weld plus JBoss microAS.
I would need to test this out to give more detailed infos.
As a general rule of thumbs: once you need heavyweight EE resources in your backend, you should go the Arquillian route.

If you only use JNDI for configuring a JPA DataSource then please look at the documentation of Apache MyFaces CODI ConfigurableDataSource. This will give you much more flexibility and removes the need of the almost-ever-broken JNDI configuration in your persistence.xml. Instead you can use plain CDI mechanics to configure your database settings.

Summary

If you like to do more end-to-end or integration testing, then Arquillian is worth looking at. If you just need a quick unit test for your CDI based project, then take a look at my CdiContainerTest base class and grab the ideas you like. And don’t forget to provide feedback πŸ˜‰

Design a site like this with WordPress.com
Get started