Benchmark large JSON parsing
A few crude benchmarks for Elixir, Golang, and Ruby.
Processor: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
OS: Ubuntu 16.04
10mb.json
Golang [jq]
$ time jq '.' data/10mb.json > /dev/null
real 0m0.567s
user 0m0.551s
sys 0m0.016s
Elixir
Time taken [Poison]: 1218.831ms
Time taken [Jason]: 508.461ms
Ruby
time ruby app.rb
real 0m0.220s
user 0m0.203s
sys 0m0.017s
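app.rb itself isn't shown in the thread, but with Ruby's stock parser it's essentially a one-liner. Here's a minimal sketch (with an inline payload standing in for the benchmark file so it runs standalone; the real script would read data/10mb.json):

```ruby
require "json"

# Inline stand-in for File.read("data/10mb.json") so the sketch is
# self-contained; the shape loosely mimics the citylots GeoJSON data.
payload = '{"type": "FeatureCollection", "features": [{"id": 1}, {"id": 2}]}'

# Parse and hold the result in memory, printing nothing, which matches
# the "discard the result" approach described for the Elixir versions.
data = JSON.parse(payload)
```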
citylots.json
Golang [jq]
$ time jq '.' data/citylots.json > /dev/null
real 0m14.436s
user 0m13.992s
sys 0m0.420s
Elixir
Time taken [Poison]: 32_640.87ms
Time taken [Jason]: 11_602.128ms
Ruby
$ time ruby app.rb
real 0m4.738s
user 0m4.498s
sys 0m0.240s
3 Likes
Does this mean Ruby-san is the fastest?
(By the way, shouldn't this be done with Benchee or something similar?)
1 Like
Seems about right. It would be nice to have Jiffy in there as well, just for comparison's sake.
One important thing to consider when looking at those numbers: the Go and Elixir solutions are written in pure Go and pure Elixir, respectively.
I don't know what they're using on the Ruby side, but the most efficient code in Ruby/Python/PHP/Perl is usually just calling C directly.
4 Likes
The Ruby parser is implemented in C, so it's more like comparing C to Elixir. Obviously C is going to be faster.
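On CRuby you can confirm this at runtime: the json gem exposes which parser backend it loaded (a quick check, not part of the benchmark; on a standard CRuby install this is the C extension):

```ruby
require "json"

# JSON.parser returns the parser class in use. On stock CRuby the json
# gem loads its C extension, so this prints JSON::Ext::Parser rather
# than a pure-Ruby implementation.
puts JSON.parser
```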
We’re not doing that bad actually. It would be interesting to compare with Jason compiled with HiPE. In my benchmarks this makes it at least twice as fast.
The data is also quite different from what you'd face in a regular HTTP app: the JSON here is pretty-printed, while most JSON flying over the wire is not. This can make a significant difference in actual performance. Depending on what you want to learn from this, that can matter.
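To illustrate the point (a Ruby sketch, not from the thread's repo): the same document round-trips identically in both forms, but the pretty-printed one carries a lot of extra whitespace for the parser to skip.

```ruby
require "json"

doc = { "features" => Array.new(100) { |i| { "id" => i, "name" => "lot #{i}" } } }

compact = JSON.generate(doc)         # wire format: no insignificant whitespace
pretty  = JSON.pretty_generate(doc)  # file format: indented and newline-heavy

puts "compact: #{compact.bytesize} bytes"
puts "pretty:  #{pretty.bytesize} bytes"

# Identical parse results; the pretty version just costs more to scan.
raise unless JSON.parse(compact) == JSON.parse(pretty)
```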
7 Likes
There’s exactly 0 chance that Ruby is not just calling some C lib for this 
Still, for native Elixir this is reassuringly fast.
3 Likes
I don't suppose you'd be up for tweaking the TechEmpower benchmarks for Elixir to use Jason with HiPE? They're the basis for so many of those results that I think it would go a long way with the report.
Let’s have some fun, here is how they compare to C++/rapidjson:
# Golang
$ time jq '.' data/10mb.json > /dev/null
real 0m0.607s
user 0m0.603s
sys 0m0.004s
$ time jq '.' data/citylots.json > /dev/null
real 0m14.348s
user 0m13.839s
sys 0m0.504s
# Elixir
## 10mb.json
Time taken [Jason]: 521.655ms
Time taken [Poison]: 1358.531ms
## citylots.json
Time taken [Jason]: 12224.44ms
Time taken [Poison]: 33350.239ms
# Ruby
## 10mb.json
$ time ruby app.rb
real 0m0.350s
user 0m0.250s
sys 0m0.020s
## citylots.json
$ time ruby app.rb
real 0m5.632s
user 0m5.393s
sys 0m0.236s
# C++
$ time ./rapidjson-testing < ../../benchmark-large-json-parsing/data/10mb.json
real 0m0.035s
user 0m0.034s
sys 0m0.000s
$ time ./rapidjson-testing < ../../benchmark-large-json-parsing/data/citylots.json
real 0m0.531s
user 0m0.503s
sys 0m0.028s
Admittedly this benchmark is flawed: jq writes its output to stdout while the Elixir/Ruby/C++ versions just discard the data after it is parsed, so jq/Go is artificially handicapped here. In addition, the Elixir version is actually instantiating a tree to hold the whole structure, which is wasted work as well (unsure about Ruby). The C++ version is fully parsing and performing callbacks for every parse event (standard SAX parsing).
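A more apples-to-apples setup would time only the parse in every language and discard both the result and any output. In Ruby that could look like this (a sketch; the inline payload stands in for the benchmark files):

```ruby
require "json"
require "benchmark"

# Stand-in payload; a real run would use File.read("data/10mb.json").
payload = JSON.generate("features" => Array.new(10_000) { |i| { "id" => i } })

# Time the parse alone; the parsed structure is discarded and nothing
# is written to stdout, so no language pays an I/O penalty.
elapsed = Benchmark.realtime { JSON.parse(payload) }
warn format("parse took %.1fms", elapsed * 1000)
```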
I can PR the C++ one in if you want; it only needs the normal C++ compiler and CMake installed, nothing else (not even rapidjson, it fetches that itself).
5 Likes
Just to make sure, here is the C++ version as both a SAX parser and as an Elixir-style structure-building document parser (yay, eating memory):
$ time ./rapidjson-sax < ../../benchmark-large-json-parsing/data/10mb.json
real 0m0.038s
user 0m0.034s
sys 0m0.004s
$ time ./rapidjson-sax < ../../benchmark-large-json-parsing/data/citylots.json
real 0m0.529s
user 0m0.485s
sys 0m0.044s
$ time ./rapidjson-structure < ../../benchmark-large-json-parsing/data/10mb.json
real 0m0.037s
user 0m0.036s
sys 0m0.000s
$ time ./rapidjson-structure < ../../benchmark-large-json-parsing/data/citylots.json
real 0m0.533s
user 0m0.504s
sys 0m0.028s
Not much of a difference; honestly the C++ compiler is so good that the work is probably being optimized out, hmm…
EDIT: I added some code to print out details about the structure, to make sure it is parsed in full, and it somehow got a few milliseconds faster… so yeah, those numbers are accurate. C++ is just fast, as always…
1 Like
You are right about jq doing more work. I'll use the standard JSON parser and make it match the Ruby and Elixir versions.
I'm not sure what you mean by this:
In addition the elixir version is actually instancing a tree to hold the whole structure, which is wasted work as well (unsure about ruby).
Elixir would parse the data and have it in memory.
The C++ version seems insanely fast. I'd love to add a Rust benchmark too; I know this is all crude benchmarking, but it gives you a sense of comparison.
I didn't use Benchee because the run times are fairly long. I'll see if I can add it with smaller JSON loads.
I could foresee a Rust one outperforming the C++ one eventually, to be honest, but I doubt the current pure-Rust libraries would at this time (though they're still plenty fast).
Sure, I’ll clean it up and PR it into a cpp/rapidjson directory or something. 
Do you want a README.md or INSTALL file in that directory, or do you want me to edit the root README to add instructions on how to compile and run it?
I was thinking of adding a statistical benchmarker to the C++ version instead of just using time, do you want me to do that pre-PR?
1 Like
You can add it to the bottom of the README.
I was thinking of adding a statistical benchmarker to the C++ version instead of just using time, do you want me to do that pre-PR?
I don’t know anything about C++, so whatever you feel is the best.
You’re all making me want to implement Jason.Native again! 
8 Likes
It will take longer to benchmark (it performs many runs to reach statistical accuracy), but the results would be more detailed, if you don't mind it potentially taking many minutes or more to run. I'll leave it up to you: it would make the run substantially longer but also substantially more accurate. Then again, nothing else here uses a statistical benchmarker, so it seems kind of pointless right now. ^.^;
We should probably leave it out for now, at least until a real parsing benchmark harness is set up across all the languages.
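For what it's worth, even Ruby's standard library gets partway there without an external tool: Benchmark.bmbm does a rehearsal pass before the measured pass, which smooths out warm-up and GC noise (a sketch, not part of the thread's repo):

```ruby
require "json"
require "benchmark"

payload = JSON.generate("features" => Array.new(5_000) { |i| { "id" => i } })

# bmbm runs each block twice: a rehearsal pass to warm caches and
# trigger GC, then the pass whose numbers are actually reported.
Benchmark.bmbm do |x|
  x.report("JSON.parse x50") { 50.times { JSON.parse(payload) } }
end
```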
1 Like
Make it with rapidjson (though adding error handling and such would slow it down a bit) and you’d blow everything else away. 
1 Like
Does rapidjson validate UTF-8 by default when decoding? I know it has an option, but I don't think it does it by default. That should have significant performance implications; if it doesn't validate, it's comparing apples to oranges.
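The cost in question is real: UTF-8 validation is a full pass over every string byte. Ruby's String#valid_encoding? performs exactly that check, which makes for a quick illustration (not rapidjson itself):

```ruby
# Valid UTF-8: every byte sequence is well-formed.
good = "city lots \u00e9"
# 0xFF can never appear in well-formed UTF-8, so this string is invalid.
bad = "city lots \xFF"

puts good.valid_encoding?  # true
puts bad.valid_encoding?   # false
```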
1 Like
I'm using the UTF<> argument, so I'd hope so? Let me check the docs… Hmm, I'm unsure whether it's the default for parsing, but I found where to set the flag to force it on regardless. Results now:
$ time _builds/rapidjson-sax < ../../data/citylots.json
real 0m0.549s
user 0m0.533s
sys 0m0.016s
$ time _builds/rapidjson-structure < ../../data/citylots.json
real 0m0.542s
user 0m0.509s
sys 0m0.032s
Not seeing much of a difference, so it probably is on by default?
/me has never used rapidjson before, so feel free to check the code. PR incoming in a minute…
1 Like
You know, it would help if I didn't compile the SAX version under both names… >.>
Fixed; the results make more sense now:
$ time _builds/rapidjson-sax < ../../data/10mb.json
real 0m0.039s
user 0m0.035s
sys 0m0.004s
$ time _builds/rapidjson-structure < ../../data/10mb.json
real 0m0.048s
user 0m0.040s
sys 0m0.008s
$ time _builds/rapidjson-sax < ../../data/citylots.json
real 0m0.545s
user 0m0.516s
sys 0m0.028s
$ time _builds/rapidjson-structure < ../../data/citylots.json
real 0m0.742s
user 0m0.657s
sys 0m0.084s
There it goes: the structure form should be slower than the SAX form. That makes much more sense!
Ooh, I have to say it produces nice small binaries too:
34K May 24 16:12 _builds/rapidjson-sax
43K May 24 16:12 _builds/rapidjson-structure
I'm not even compiling for a minimum-size release, just a standard release.