Add optimization to avoid load of address#76683
Conversation
|
r? @oli-obk (rust_highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
62de698 to
71219e7
Compare
|
Hi @jonas-schievink, thanks for the review.
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author |
oli-obk
left a comment
There was a problem hiding this comment.
This optimization is very scary, it has lots of edge cases that we need to think about
There was a problem hiding this comment.
if local_being_derefed is used on the rhs of an assignment to another local, your optimization may misfire I think.
let x = 42;
let a = 99;
let mut y = &x;
let z = &mut y;
*z = &a;
println!("{}", *y);will not print 99 after your optimization but 42
There was a problem hiding this comment.
I added a test for this. It does not seem like it will misfire. I also added more clarifying comments on the matches. I have persuaded myself that since we are only applying this optimization on immutable references, we can't have a mutable reference at the same time, so nothing (besides asm) can break this optimization. My reasoning may be wrong though..
There was a problem hiding this comment.
You are correct. Setting the lookback higher does indeed cause a miscompilation. I'll look into fixing it.
There was a problem hiding this comment.
The general way to look for mutation is to define a visitor and visit the construct you want to analyze (Statement in your case) and implement https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/visit/trait.Visitor.html#method.visit_local to and check the PlaceContext if the local matches the local you want to check.
There was a problem hiding this comment.
Ah, I saw your comment a tad too late. I already pushed a fix, but i'll see if it is cleaner/less adhoc to use the visitor. Thanks for the hint.
There was a problem hiding this comment.
The latest commits now check that local_being_derefed is not mutated using a visitor. I have verified that setting the lookback to 10000 causes a misoptimization without the fix, and fixes it with the fix. I have set it back to to avoid bad runtime complexity. I'm not sure how best to represent that the fix is actually working. For now I separated it into two commits.
There was a problem hiding this comment.
Thanks! Splitting it into individual commits made this great to review.
I also don't know how to test it. A bogus idea (Don't do this 😆): instead of 6 use 6 + mir_opt_level, then we can do -Zmir-opt-level=99999 on that test.
An alternative to having this magic number would be to implement the optimization as a feed forward (am I using words correctly here?) optimization. What I mean is that we don't go through every deref and look back, but we walk forward through each block and keep a set of "deref-optimizable" locals that we update as we go.
There was a problem hiding this comment.
That said, I'm fine with merging this optimization as long as there's a tracking issue for exploring such a change to the optimization. I'm not sure whether that change is feasible in practice, but we should explore it
There was a problem hiding this comment.
Cool. Yeah it should be possible to do this optimization without the lookback, such that a single statement is only visited once. I'll open a new pr for that.
Can we do a perf run on the current pr? I'm curious if this has any impact
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion |
|
⌛ Trying commit 116283b910a7f8809260ba91cb9b9b647a8900a3 with merge 7d835e12834e0230779883b269523b01a940c2df... |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued 7d835e12834e0230779883b269523b01a940c2df with parent 10b3595, future comparison URL. |
|
Finished benchmarking try commit (7d835e12834e0230779883b269523b01a940c2df): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
perf looks ok. Please squash so we get all the back and forth out of the timeline |
116283b to
4dedb76
Compare
|
Squashed and rebased on master |
|
@bors r+ |
|
📌 Commit 4dedb76 has been approved by |
|
⌛ Testing commit 4dedb76 with merge 40fe3db7186781f895de3f6d860dbd5d570ec21c... |
|
💔 Test failed - checks-actions |
|
Looks like you need to rebase and rebless again? |
…heckedAdd This makes the test run deterministic regardless of noopt testruns
4dedb76 to
dfc469d
Compare
|
Rebased on master and added a new commit such that Add is always generated instead of CheckedAdd, which broke the test diff on noopt testruns. |
|
@bors r+ |
|
📌 Commit dfc469d has been approved by |
|
☀️ Test successful - checks-actions, checks-azure |
|
@rust-lang/wg-mir-opt I think this would be better suited to constant-folding with a notion of "symbolic" values. But also I'm a bit worried this kind of optimization may break stacked borrows type assumptions if the indirection was "weakening" the semantics of the access, and making it direct "strengthens" it too much - though this may only be a problem with raw pointers. |
Look for the sequence
in which we can replace the last statement with
_5 = _1to avoid the load of _2