Hey guys! How’s it going?
Another six months are over and you might wonder what the old shaggy prepared this time? No wonder, it certainly is another OpenJDK contribution! This time, it speeds up reading from CharSequence, and it will allow a faster Writer::append.
Never heard of CharSequence? Well, it’s the common interface of String, StringBuilder, CharBuffer, and quite a few custom classes out there. But wait! There are other text classes than String? Maybe you never thought about that… And why would you want to speed that up? Because a program is never fast enough, but more urgently, because it reduces power consumption! So, once more, we gain fun from faster apps, plus saving the climate. Ain’t that great? So read on!
So for a long time, you did not think about what happens “under the hood” when you concatenated Strings, like "abc" + "def". But then someone came and told you not to use “+” but StringBuilder::append, as that would be way faster (which it was). And then someone else came and told you that this is an urban legend, as javac meanwhile does exactly that for you (which it does). But in fact, what still happens is that (directly or indirectly) memory is allocated which is the size of "abc" plus the size of "def" (even worse, it is not even stack memory but heap memory, but let’s put that aside for today). Actually, there is even more work done: As Strings are compressed internally, a compression algorithm chimes in. And yes, that needs time and memory, and energy, too. Indeed there is even more going on internally, but more or less we could say: Concatenating Strings is effectively making a compressed copy, then throwing away both original values, even in the Java 25 age. And “throwing away” means leaving behind holes in the linear memory space. So besides pure Garbage Collection (“vacuuming”), we need memory defragmentation (“waste grinding”), which is another nice word for: moving even more bytes around in memory. And that costs even more time and power. And guess what: Your app concatenates Strings a lot, right? And guess what: The Java Runtime (JRE) itself internally concatenates even more Strings! So copy, reallocate, compress, deallocate, GC, defrag all the time. But for what? For nothing! Sigh.
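To make the two styles from the paragraph above concrete, here is a minimal sketch. The class and method names (`ConcatDemo`, `withPlus`, `withBuilder`) are made up for illustration. Both variants end up with a fresh String on the heap that holds a copy of all characters.

```java
// Hypothetical demo class, only meant to contrast the two concatenation styles.
class ConcatDemo {

    // Plain "+": the result is a brand-new String holding a copy of both operands.
    static String withPlus(String a, String b) {
        return a + b;
    }

    // Explicit StringBuilder: characters are appended into one buffer,
    // but the final toString() still creates yet another copy on the heap.
    static String withBuilder(String a, String b) {
        StringBuilder sb = new StringBuilder(a.length() + b.length());
        sb.append(a).append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("abc", "def"));
        System.out.println(withBuilder("abc", "def"));
    }
}
```

Either way the caller receives an equal String; the difference lies only in how many intermediate copies were made along the way.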
For nothing? Yes, for nothing. Because you could spare a lot of that – by sticking with StringBuilder instead of String. Ok, you have known that for a long time, so you do that, and so does javac (it replaces String+String by a StringBuilder “under good conditions”), and so does the JRE itself. But here comes the bad news: In the end, you, just like javac, just like the JRE itself, are calling toString() eventually. Don’t you? You do! And that means… right: Pointless power consumption, as toString() produces another temporary copy on the heap!
So why not omit toString()? Just directly pass around your StringBuilder everywhere, instead of toString()’ing it! This spares lots of toString() calls. (cheer) So all is fine now? Nope. (cheer stops). Once you want to output your StringBuilder, or once you want to input text into a StringBuilder, you’re facing a problem: Your surrounding frameworks do not accept StringBuilder! Typically these all work with String, or, like in the case we’re talking about today (to finally come to the topic of today’s posting), they do accept CharSequence – but they internally toString() it.
For example: Java’s Writer classes (you know, like good old BufferedWriter, PrintWriter, and all those) just pretend to accept not only Strings (like in write(String)), but also any other kind of CharSequence (like in append(myStringBuilder)). That looks just like what we want in order to spare that toString() heap clutter. But wait! Take a look at the implementation first… it does… tada… toString()! So that nice trick that javac internally uses a StringBuilder to implement “+” is good for nothing, as finally you end up with another time-squandering copy as soon as you output the result.
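If you want to see this for yourself, a small experiment helps. The following is just a sketch: `CountingSequence` is a made-up wrapper (not a JDK class) that counts how often its toString() gets called while a StringWriter appends it. What number you get depends on the JDK build you run it on, so the sketch prints the counter instead of promising a value.

```java
import java.io.StringWriter;

// Hypothetical experiment, assuming nothing beyond the public java.io API.
class AppendDemo {

    // Made-up CharSequence wrapper that counts toString() invocations.
    static final class CountingSequence implements CharSequence {
        private final CharSequence delegate;
        int toStringCalls;

        CountingSequence(CharSequence delegate) { this.delegate = delegate; }

        @Override public int length() { return delegate.length(); }
        @Override public char charAt(int index) { return delegate.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return delegate.subSequence(start, end);
        }
        @Override public String toString() {
            toStringCalls++;                 // remember that somebody asked for a String copy
            return delegate.toString();
        }
    }

    // Appends any CharSequence to a fresh StringWriter and returns what was written.
    static String write(CharSequence text) {
        StringWriter out = new StringWriter();
        out.append(text);
        return out.toString();
    }

    public static void main(String[] args) {
        CountingSequence seq = new CountingSequence(new StringBuilder("abc").append("def"));
        System.out.println(write(seq));
        System.out.println("toString() calls during append: " + seq.toStringCalls);
    }
}
```

A non-zero counter means the Writer took the detour via a temporary String before writing anything.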
But stop crying! I am here to help! The other day Oracle kindly adopted my latest OpenJDK contribution, and this finally paves the way to fix these troubles. To understand my solution, let’s dive deeper into Writer, and why it does toString() on any CharSequence (even on Strings themselves). The cause is: Performance. Shocking! If you copied text “char-by-char”, this would be totally slow, as computers can pass around much larger clusters of information with a single command. So what Writer::write internally does is, it asks the String to put all its characters into a char array with a single command (which on top is lightning-fast machine code). That command is String::getChars(int, int, char[], int). So why do that only with Strings, but not with other CharSequences? Because CharSequence does not have that command. It’s as embarrassing as that! CharSequence can only be asked for one character at a time, so you need a loop in Java – which means, in 99% of your cases, an interpreted loop (unless you do it 10,000 times to get it hot spotted eventually). And that is not just slow, it is even super-slow!
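Here is how the two ways of copying described above look side by side. This is only an illustrative sketch with made-up names (`CopyDemo`, `copyCharByChar`, `copyInBulk`), not the code inside Writer. The only real JDK calls are CharSequence::charAt and String::getChars(int, int, char[], int).

```java
// Hypothetical demo class contrasting per-character and bulk copying.
class CopyDemo {

    // Works for every CharSequence, but needs one charAt() call per character.
    static char[] copyCharByChar(CharSequence src) {
        char[] dst = new char[src.length()];
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src.charAt(i);
        }
        return dst;
    }

    // Only works for String: one single call copies the whole range at once.
    static char[] copyInBulk(String src) {
        char[] dst = new char[src.length()];
        src.getChars(0, src.length(), dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(new String(copyCharByChar(new StringBuilder("hello"))));
        System.out.println(new String(copyInBulk("hello")));
    }
}
```

Both methods fill the array with the same characters; the bulk variant simply leaves the looping to the JDK.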
So what I did is that I added that exact getChars(int, int, char[], int) method signature that String always had to the CharSequence interface. Sounds easy, but took six months of discussions and a lot of convincing (for example, I had to prove that “not much” code exists on earth that already has such a method but where that method does something else – as that code would silently do the wrong thing once executed on Java 25). This foundational change is now found in Java 25, and if you download a pre-release build, you can play with it right now – or already prepare your application if it currently does a “char-by-char” loop or a temporary toString().
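If you are still on an older JDK and want to prepare your code, a helper along these lines might do as a stopgap. It is a sketch only: `BulkChars.getChars` is a made-up helper, not part of the JDK. It uses the bulk methods that String, StringBuilder and StringBuffer have offered for ages, and falls back to the slow loop for everything else. On Java 25, going by the change described above, the call site should be able to ask the CharSequence directly instead.

```java
import java.nio.CharBuffer;

// Hypothetical stopgap helper for JDKs older than 25.
class BulkChars {

    // Copies src[srcBegin, srcEnd) into dst, starting at dstBegin.
    // Same parameter order as String::getChars(int, int, char[], int).
    static void getChars(CharSequence src, int srcBegin, int srcEnd, char[] dst, int dstBegin) {
        if (src instanceof String) {
            ((String) src).getChars(srcBegin, srcEnd, dst, dstBegin);
        } else if (src instanceof StringBuilder) {
            ((StringBuilder) src).getChars(srcBegin, srcEnd, dst, dstBegin);
        } else if (src instanceof StringBuffer) {
            ((StringBuffer) src).getChars(srcBegin, srcEnd, dst, dstBegin);
        } else {
            // Any other CharSequence: the slow char-by-char loop.
            for (int i = srcBegin; i < srcEnd; i++) {
                dst[dstBegin++] = src.charAt(i);
            }
        }
    }

    public static void main(String[] args) {
        char[] dst = new char[5];
        getChars(new StringBuilder("hello world"), 6, 11, dst, 0);
        System.out.println(new String(dst));

        char[] dst2 = new char[3];
        getChars(CharBuffer.wrap("abc"), 0, 3, dst2, 0);
        System.out.println(new String(dst2));
    }
}
```

Keeping the same parameter order as String::getChars means the helper can later be swapped for the real method with a simple search and replace.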
So what about Writer? I’m working on it. Just today I filed the first of a set of pull requests towards getting this new method used in all Writers in OpenJDK. So while nothing will get faster “magically” in JDK 25, the foundation is laid, and over time your code will eventually run more efficiently, without recompiling it. So… stay tuned…!