Some optimizations #393
|
Enabling PGO could give another >5% speedup. I used this tool for the optimization: https://github.com/Kobzol/cargo-pgo.
$ cargo pgo run -- sample_files/slow_before.rs sample_files/slow_after.rs
$ cargo pgo optimize run -- sample_files/slow_before.rs sample_files/slow_after.rs
Before (this time I directly invoke the binary rather than through
After: |
|
Substitute
Original memory usage:
Current memory usage: |
|
Switching from mimalloc to snmalloc brings a negligible speedup and slightly lower memory usage. Since the time difference is very small, I used hyperfine again to measure it.
Before this change:
After this change:
If you think such an improvement is worthwhile, I will commit and push it. |
|
Change a
Benchmark results (without PGO and snmalloc):
I don't know why the number of instructions rose a little bit. |
|
cool! nice work |
|
Remove |
|
I have to focus on other work, so the optimization ends here. In conclusion (without PGO and snmalloc):
|
|
Wow, really great changes! It's incredible to see a ~25% speedup in code I've already tried to make fast :) Thanks for mentioning snmalloc, I will take a look at it too. I've had a few problems with mimalloc (see #297) so I'm interested in looking at other malloc implementations. |
First, enable thin-LTO. This brings a ~5% speedup.
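For readers unfamiliar with the setting, here is a minimal sketch of how thin-LTO is typically enabled in a Cargo project. It assumes the option goes in the release profile of `Cargo.toml`; the exact change made in this PR is not shown here, so treat the placement as an assumption.

```toml
# Assumed placement: the release profile (not copied from this PR's diff).
[profile.release]
# "thin" selects ThinLTO, which does cross-crate optimization
# at a lower build-time cost than full ("fat") LTO.
lto = "thin"
```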
Before:
After:
The instruction counts are relatively stable.
I also measured them using hyperfine.
Before:
After: