0
\$\begingroup\$

Between the two implementations, which one is better?

    import org.apache.commons.lang3.stream.Streams;

    public static <T> T firstNonNull(final T... values) {
        return Streams.of(values).filter(Objects::nonNull).findFirst().orElse(null);
    }

or

    public static <T> T firstNonNull(final T... values) {
        if (values != null) {
            for (final T val : values) {
                if (val != null) {
                    return val;
                }
            }
        }
        return null;
    }

The context is that of a general-purpose utility class intended for use throughout the application. Indeed, performance is one concern.

\$\endgroup\$
1
  • \$\begingroup\$ Behaviour for parameter null is not documented. Is it different?! \$\endgroup\$ Commented Apr 9 at 8:57

4 Answers 4

10
\$\begingroup\$

The use of streams is a good way to avoid boilerplate.

It's idiomatic to format this as multiple lines, which makes it even better.

    public static <T> T firstNonNull(final T... values) {
        return Streams.of(values)
            .filter(Objects::nonNull)
            .findFirst()
            .orElse(null);
    }
\$\endgroup\$
6
\$\begingroup\$

In the for loop code, it is explicitly clear how the first non-null value is chosen. We can clearly see that the loop terminates immediately as soon as a non-null is found. The loop does not continue. The code is simple and well-structured.

The Streams version makes the assumption that it is doing the same efficient looping under the hood as your for loop does. If you use it frequently, then you already know its algorithm; otherwise, you need to read the documentation. Although it is one long(-ish) line of code, it is quite readable.

The reason we use code libraries is to avoid re-inventing the wheel. This assumes the library is efficient, well-tested and well-documented.

Documentation

It couldn't hurt to add a short comment to both functions describing the efficeincy.

Performance

You need to benchmark the 2 versions of code to see which performs better using realistic data sets, especially data that you expect to encounter in your application.

\$\endgroup\$
3
  • \$\begingroup\$ From where I copied the code, both methods are well-documented with examples. I stripped it because I am interested in the implementation. \$\endgroup\$ Commented Apr 1 at 6:52
  • 7
    \$\begingroup\$ @FlorianF: This is not your code? \$\endgroup\$ Commented Apr 1 at 10:51
  • \$\begingroup\$ I would have written version 2. I actually did except I didn't test for the null array. Apache has implemented it as version 1. I wonder why. I wanted to know what is the general opinion about using Streams vs low level but efficient code. \$\endgroup\$ Commented Apr 9 at 22:39
6
\$\begingroup\$

The examples do not implement the same functionality. The first one throws a NPE when called as firstNonNull((Object[]) null). Allowing null arrays and lists is a bad practice anyway so I would remove the null check from the latter and replace it with non-null requirement

for (final T val : Objects.requireNonNull(values)) {

and likewise change the stream version to

Stream.of(Objects.requireNonNull(values))

Regarding which one is better: neither. The code is inconsequential. Both versions fulfill their purpose.

You did not explicitly ask for performance review, but since it is in the tags I should post this link about performance optimizations that I have in my profile: https://ericlippert.com/2012/12/17/performance-rant/ . There are other links there too which you should read.

\$\endgroup\$
5
  • \$\begingroup\$ Actually I don't think you can pass null to either methods. Anyway, I used Strams.of which is the apache implementation, and it returns an empty Stream if given a null value. Indeed, performance is a concern. \$\endgroup\$ Commented Apr 1 at 6:48
  • \$\begingroup\$ Yes, you can pass a null array to a varargs method with the way I showed in the answer. And if you use a specific library, you should include the import statement in the code. \$\endgroup\$ Commented Apr 1 at 6:54
  • \$\begingroup\$ You are right. I assumed that with this syntax values would be assigned an array of Object[] with null being the first and only element. \$\endgroup\$ Commented Apr 1 at 8:41
  • \$\begingroup\$ Stream.of(Objects.requireNonNull(xxx)) is pointless here, as Stream.of(xxx) does same check + throw internally with improved message (on later JDKs) "Cannot read the array length because "array" is null". A little nicer than unspecified NPE. (Not sure if this applies to Streams as used in question). \$\endgroup\$ Commented Apr 1 at 10:24
  • \$\begingroup\$ @DuncG The requireNonNull is documentation for the next person who reads the code. It explicitly states the intent that non-nullability was a conscious choice by the developer and not just a random side effect of the library. \$\endgroup\$ Commented Apr 2 at 3:39
0
\$\begingroup\$

Here is why I posted this question.

The first version is the actual implementation of findFirstNonNull in org.apache.commons.lang3.ObjectUtils.

When I saw it I was horrified. I posted it here to see if others see the problem. It seems it doesn't raise eyebrows.

The problem is that the task is simple, it could be executed in a simple loop and an if. Yet it uses the heavy artillery of a Stream and lambdas. It is what I would call using a cannon to kill a fly.

To see just how much overhead there is, I added a breakpoint on Object::notNull. Here is the stack trace.

nonNull:296, Objects (java.util)
test:-1, ObjectUtils$$Lambda/0x0000000801003448 (org.apache.commons.lang3)
accept:178, ReferencePipeline$2$1 (java.util.stream)
tryAdvance:1034, Spliterators$ArraySpliterator (java.util)
forEachWithCancel:129, ReferencePipeline (java.util.stream)
copyIntoWithCancel:527, AbstractPipeline (java.util.stream)
copyInto:513, AbstractPipeline (java.util.stream)
wrapAndCopyInto:499, AbstractPipeline (java.util.stream)
evaluateSequential:150, FindOps$FindOp (java.util.stream)
evaluate:234, AbstractPipeline (java.util.stream)
findFirst:647, ReferencePipeline (java.util.stream)
firstNonNull:593, ObjectUtils (org.apache.commons.lang3)
testC:75, BenchmarkFindFirst (benchmark.findfirst)
testAll:87, BenchmarkFindFirst (benchmark.findfirst)
main:103, BenchmarkFindFirst (benchmark.findfirst)

Sometimes I hear that the compiler will optimize it out, so I ran a benchmark. I ran each version 3 billion times with various inputs and timed it. The benchmark runs a pre-run before starting the timer. It runs the tests 5 times in a row.

Here is the result of my simple benchmark.

attempt:   1         2         3         4         5
TestA:  46194 ms  45646 ms  45648 ms  45575 ms  46066 ms
TestB:   1096 ms   1352 ms    802 ms    798 ms    798 ms
TestC:  76082 ms  75842 ms  75370 ms  75266 ms  74130 ms
  • TestA is the Stream version above.
  • TestB is the java loop version.
  • TestC uses the actual findFirstNonNull from the apache library.

You can see that the Stream version is 30-40 times slower than the loop version.

[edit] After Abion47 commented that the test could be flawed, I rewrote the tests to make the inputs less predictable. And indeed, the difference is less dramatic. But the loop version is still 10x faster than the Streams version. [/edit]

What blows my mind is that the loop version is the implementation of that method in apache commons until 2022. Then someone decided to convert it to the Streams version.

You might say the overhead is not important, but you never know whether people will use that function in a tight loop. In my opinion a code library is supposed to be efficient, well-tested and well-documented. Readability comes only after, and for more complicated code.

Is this post relevant? If you want, this is my code review of findFirstNonNull from the apache commons library.

\$\endgroup\$
6
  • 3
    \$\begingroup\$ “The first version is the actual implementation of findFirstNonNull in org.apache.commons.lang3.ObjectUtils.” – Note that only code that you own or maintain is reviewed on this site. \$\endgroup\$ Commented Apr 1 at 17:59
  • 2
    \$\begingroup\$ There are points here that feel like they should have been raised when you asked rather than later on. \$\endgroup\$ Commented Apr 1 at 18:05
  • 1
    \$\begingroup\$ Sorry, I wanted to see neutral opinions on the difference between these two implementations. The second version is pretty much what I wrote in replacement for the apache version. It is only later that I discovered that apache implemented it like that in the past. (Check the edit history.) So if you prefer, it is about how I would implement that vs. how apache does. \$\endgroup\$ Commented Apr 1 at 18:30
  • 1
    \$\begingroup\$ That doesn't look like output from a decent performance testing harness. Please write a proper test using JMH and if after that there is a real difference between the solutions, submit a fix to the Apache project and give the test results as a justification in the pull request. Our opinionated comments that do not have any connection to a real usage environment are pretty much worthless. My opinionated comment is that trivial sh*t like this is no reason to use Streams. \$\endgroup\$ Commented Apr 2 at 7:42
  • 1
    \$\begingroup\$ With such a drastic difference in performance, my first reaction is to analyze the benchmark itself to make sure there isn't anything going on to make it look like TestB is more performant than it actually is. That aside, the commit that converted firstNonNull from loops to streams converted other methods as well which implies there was a more compelling reason for the switch than simplicity or raw performance (consistency, perhaps). Unfortunately, unless a maintainer of Apache Commons from back then shows up, the only thing any of us can do is speculate on what that reason was. \$\endgroup\$ Commented Apr 2 at 17:25

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.