Speed up NLL with HybridIdxSetBuf.#53383
Conversation
|
(rust_highfive has picked a reviewer for you, use r? to override) |
|
Instruction count results for NLL-check builds: I don't know why Some notable max-rss results: I'd love to hear suggestions on how to improve the code under the |
|
@bors try |
|
⌛ Trying commit 00e7f624f47a2b5e196a24d6fb32779c856535a7 with merge 496278c16018428a506261e2ed5396168b5c17b9... |
|
The job Click to expand the log.I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
|
☀️ Test successful - status-travis |
There was a problem hiding this comment.
I'm wondering if it's a good idea to have a trait instead of different methods?
There was a problem hiding this comment.
What would the trait look like? I'm having trouble imagining it.
There was a problem hiding this comment.
A trait would read nicer, for sure, but we could leave it to follow-up. I imagine it would be something like:
trait SubtractFrom<T> {
fn subtract_from(&self, target: &mut IdxSetBuf<T>) -> bool;
}and then you would have
impl<T> IdxSetBuf<T> {
fn subtract(&mut self, other: &impl SubtractFrom<T>) -> bool {
other.subtract_from(self)
}
}and then
impl<T> SubtractFrom<T> for IdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for HybridIdxSetBuf<T> { .. }
impl<T> SubtractFrom<T> for SparseIdxSetBuf<T> { .. }There was a problem hiding this comment.
I think we can leave as-is for now and maybe clean it up a little later. I'm not actually sure that the trait would be a win for readability anyway.
There was a problem hiding this comment.
We should probably instead use impl Trait and pass through the various iterator traits; DoubleEnded, TrustedLen, etc. Otherwise we're losing performance here...
We should at least be able to implement size_hint, count, and nth.
There was a problem hiding this comment.
Can you point at some existing code that does similar things? I'm not sure what this would look like. Thanks.
There was a problem hiding this comment.
Oh, FWIW, I just copied the code in the existing Iter for IdxSetBuf. So if there is a genuine win to be had here -- and profiling doesn't indicate so, at least for the moment -- it should be changed as well. Perhaps that could also be a follow-up.
There was a problem hiding this comment.
This should be an unreachable! I think.
There was a problem hiding this comment.
Ok, but ideally someone would point out a way to rewrite the code so this isn't necessary :)
There was a problem hiding this comment.
if let HybridIdxSetBuf::Sparse(sparse, universe_size) = mem::replace(self, dummy)? It doesn't require an else branch.
There was a problem hiding this comment.
True, but I find that misleading -- it's strange to have an if let that cannot fail. The match makes the cannot-fail-ness more obvious.
There was a problem hiding this comment.
No reason to take by reference here since I'd is Copy
There was a problem hiding this comment.
I just copied the existing method in IdxSetBuf, so I will leave this alone. Converting all such methods in this file could be done as a follow-up.
|
@rust-timer build 496278c16018428a506261e2ed5396168b5c17b9 |
|
Success: Queued 496278c16018428a506261e2ed5396168b5c17b9 with parent 81cfaad, comparison URL. |
|
@rust-timer build 496278c16018428a506261e2ed5396168b5c17b9 |
|
Success: Queued 496278c16018428a506261e2ed5396168b5c17b9 with parent 81cfaad, comparison URL. |
1 similar comment
|
Success: Queued 496278c16018428a506261e2ed5396168b5c17b9 with parent 81cfaad, comparison URL. |
Here is the full Cachegrind diff showing the difference between the two versions: That is an amazingly small diff. The slowdown is entirely due to more hash table operations underneath @RalfJung, do you have any idea what might be going on? |
|
Perf results are in, and they're even better than what I saw locally. Notable instruction count results (look at the Things to note:
Notable max-rss results: These are similar to my local measurements. There are also these max-rss results: These are not |
|
The specific ICE is this: That doesn't mean much to me, alas. |
|
https://github.com/rust-lang/rust/blob/master/src/libcore/iter/mod.rs#L513 is a sample of the forwarding needed. I wish there was a better way, but I'm unaware of it. |
|
@nnethercote I can take a look at the ICE, though prob not for a day or two |
|
@nnethercote also, those perf stats look great! |
|
I worked out the ICE by incrementally reverting parts of the patch. The problem is in |
`HybridIdxSetBuf` is a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large `universe_size` values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces the execution time of the five slowest NLL benchmarks by 55%, 21%, 16%, 10% and 9%. It also reduces the max-rss of three benchmarks by 53%, 33%, and 9%.
00e7f62 to
5745597
Compare
|
New version fixes the test failures. I also did another perf run locally, and |
|
@bors try |
|
⌛ Trying commit 5745597 with merge 10d5f0b359098fbbe68196b82c89cb4d02c93ce0... |
|
☀️ Test successful - status-travis |
|
⌛ Testing commit 5745597 with merge 8c91bddf4a977f33f8f6490fca480f9df68e9a33... |
|
💔 Test failed - status-travis |
|
The job Click to expand the log.I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
|
@bors retry |
|
⌛ Testing commit 5745597 with merge d81755a0327f37d33dd7d01251956a818b12767a... |
|
💔 Test failed - status-travis |
|
The job Click to expand the log.I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
|
@bors retry |
Speed up NLL with HybridIdxSetBuf. It's a sparse-when-small but dense-when-large index set that is very efficient for sets that (a) have few elements, (b) have large universe_size values, and (c) are cleared frequently. Which makes it perfect for the `gen_set` and `kill_set` sets used by the new borrow checker. This patch reduces `tuple-stress`'s NLL-check time by 40%, and up to 12% for several other benchmarks. And it halves the max-rss for `keccak`, and has smaller wins for `inflate` and `clap-rs`.
|
☀️ Test successful - status-appveyor, status-travis |
|
The top 5 on the NLL dashboard now looks like this: And |
|
@Mark-Simulacrum: hmm, any idea why ripgrep is showing up above sentry-cli even though its ratio is lower? A few of the others lower down the list are out of order too. |
|
No, that shouldn't be happening. Could you file an issue? |
…Simulacrum Remove `HybridBitSet` `HybridBitSet` was introduced under the name `HybridIdxSetBuf` way back in rust-lang#53383 where it was a big win for NLL borrow checker performance. In rust-lang#93984 the more flexible `ChunkedBitSet` was added. Uses of `HybridBitSet` have gradually disappeared (e.g. rust-lang#116152) and there are now few enough that they can be replaced with `BitSet` or `ChunkedBitSet`, and `HybridBitSet` can be removed, cutting more than 700 lines of code. r? `@Mark-Simulacrum`
It's a sparse-when-small but dense-when-large index set that is very
efficient for sets that (a) have few elements, (b) have large
universe_size values, and (c) are cleared frequently. Which makes it
perfect for the
gen_setandkill_setsets used by the new borrowchecker.
This patch reduces
tuple-stress's NLL-check time by 40%, and up to 12%for several other benchmarks. And it halves the max-rss for
keccak,and has smaller wins for
inflateandclap-rs.