const-eval: always do mem-to-mem copies if there might be padding involved #148967
|
rustbot has assigned @JonathanBrouwer. Use |
|
@bors try |
|
Finished benchmarking commit (78c81ee): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below.
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never.
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary -3.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -2.7%, secondary -9.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary -1.1%, secondary 0.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 472.272s -> 472.014s (-0.05%) |
|
Uh okay I guess this is actually good for perf.^^ At least for the benchmarks we have. The copy apparently gets a little cheaper, but we force more things to use the less efficient in-memory representation. The latter just does not seem to matter in our benchmarks.
Just to be safe:
@craterbot check
|
|
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
Most of the performance regressions are from the coercions benchmark. All it does is create an array of a large number of string literals in const. Why did this benchmark's performance regress? There is no padding involved in any of the types. |
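To make the premise of the question concrete, here is a minimal sketch of the kind of constant being described. The array and its contents are hypothetical stand-ins, not the actual benchmark source. It only checks that `&str` is exactly a pointer plus a length, so it has no padding bytes; it does not explain the regression.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the benchmark's much larger array of literals.
const LITERALS: [&str; 4] = ["alpha", "beta", "gamma", "delta"];

// `&str` is a fat pointer: a data pointer followed by a `usize` length.
// If its size equals the sum of those two parts, no byte is padding.
const STR_REF_HAS_NO_PADDING: bool =
    size_of::<&str>() == size_of::<*const u8>() + size_of::<usize>();

// The array itself is tightly packed as well: N elements, nothing in between.
const ARRAY_IS_PACKED: bool =
    size_of::<[&str; 4]>() == LITERALS.len() * size_of::<&str>();
```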
|
That's fair -- I should have gotten up-to-date perf numbers. We did try enabling full validation in const-eval before and it was never close to acceptable, but there's no harm in getting new numbers. I made a separate PR for that: #149901. |
|
@traviscross here are the numbers for the "naive" implementation: perf. It's pretty red. ;) Now, there may be ways to optimize this if we only have to reset padding and provenance, but not do full validation... and there could be a fast-path for scalar types... like, one can pour basically an unbounded amount of time into making this faster. It's unclear how far this would actually get us. Also if we do that guided by our benchmark suite, we'd be overfitting for that very quickly I think. I'm afraid I don't think I have the time, patience, and motivation for that. I'd be happy to mentor/review patches for someone who does want to work on it, though. But meanwhile, this PR doesn't really make things any worse than they are already, and it avoids accidentally changing behavior when the layout algorithm changes. |
|
Thanks for running those numbers. Does caching/memoizing come into play for optimizing this (and are we doing any now)? (Is that hard here?) As described ("doing a full traversal"), it seems the sort of thing that would be at risk of traversing the same path repeatedly, e.g. as a final value is built up from parts. Without memoizing, it's easy to see how this would go |
|
There's currently no caching of any sort. Also, adding caching is not trivial... the naive option of "here's some pairs of types and raw byte lists for which we can skip validation in the future" would quickly consume huge amounts of RAM caching values we'll never need again. Representing raw byte lists (including uninit and provenance) is itself non-trivial, and even just checking the cache would be expensive. Or alternatively we build a system where we remember "the bytes at offset O in allocation A have been validated for type T" -- but we'd have to be really sure we catch all codepaths where memory gets changed so that we can invalidate the cache. Also, Rust moves values around a lot, so this alone wouldn't suffice; we'd also want this information to be preserved when copying things from one place to another. This could be an interesting student project to speed up Miri, but it's not going to happen short-term. |
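The second option can be sketched as a toy data structure. This is only an illustration of the idea under simplifying assumptions: `AllocId`, `TypeKey`, and `ValidationCache` are hypothetical names, not rustc or Miri types, and real memory also has uninit and provenance state that this ignores. The sketch shows why every write path has to call the invalidation hook, and it does nothing to carry information along when a value is copied elsewhere.

```rust
use std::collections::HashSet;

// Hypothetical identifiers; the interpreter's real types are richer.
type AllocId = u32;
type TypeKey = u32;

/// Remembers "bytes [offset, offset + len) of allocation A were validated at type T".
#[derive(Default)]
struct ValidationCache {
    validated: HashSet<(AllocId, usize, usize, TypeKey)>,
}

impl ValidationCache {
    /// Record a successful validation of a byte range at a given type.
    fn mark_validated(&mut self, alloc: AllocId, offset: usize, len: usize, ty: TypeKey) {
        self.validated.insert((alloc, offset, len, ty));
    }

    /// Check whether exactly this range was already validated at this type.
    fn is_validated(&self, alloc: AllocId, offset: usize, len: usize, ty: TypeKey) -> bool {
        self.validated.contains(&(alloc, offset, len, ty))
    }

    /// Must be called on every write. Drops each entry whose byte range
    /// overlaps the written range; missing a single call site would leave
    /// stale "validated" facts behind.
    fn invalidate_write(&mut self, alloc: AllocId, offset: usize, len: usize) {
        self.validated.retain(|&(a, o, l, _)| {
            a != alloc || o + l <= offset || offset + len <= o
        });
    }
}
```

Even in this toy form, `invalidate_write` scans every entry, which hints at why just maintaining the cache could be expensive.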
For concreteness, the proposal I'd put before us, then, would be to accept this PR and to document in the Reference:
We would be explicit that this remains part of our RFC 1122 underspecified language semantics and that we remain within our rights to change the behavior of typed copies within consteval. @rfcbot fcp merge lang |
|
Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. |
|
I find these perf results rather encouraging actually, given that this was a naive implementation. Noticeable regressions are on a limited set of tests, all of them stress-tests. I don't mind the currently-proposed lang decision so that we can get this merged, but if anyone with enough const-eval knowledge could try a non-naive version I'd be keen to revisit this. |
I think this mostly shows that we have very few interesting consts in our benchmarks. |
|
@rfcbot reviewed Ok, the proposal on the table sounds reasonable. Options I can see for having a more satisfying story in the future:
The second brings to mind a more general typed allocator API, which also would have to be optional. Then we could have typed/untyped allocations just as we have typed/untyped copies. That could be useful e.g. for heap profiling applications, among other things. I'm certain I'm not the first to think of this though. I guess the reasons not to go down this path are some combination of the following; are there others?
|
|
Turns out that there is already some code in the wild that depends on how exactly we do typed copies in consteval, according to crater on the PR implementing an alternative to this PR. Edit: It is indeed just previously-uncaught consteval UB. |
|
I think that's just code with const-UB that we currently do not detect. We are at liberty to break such code. |
|
@rfcbot reviewed |
|
🔔 This is now entering its final comment period, as per the review above. 🔔 |
|
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. |
This is the final piece of the puzzle for #148470: when copying data of a type that has padding, always do a mem-to-mem copy, so that we always preserve the source padding exactly. That prevents rustc implementation choices from leaking into user-visible behavior.
This is technically a breaking change: the example at the top of #148470 no longer compiles with this. However, it seems very unlikely that anyone would have depended on this. My main concern is not backwards compatibility but performance.
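For readers unfamiliar with the setup, here is a minimal sketch of a type with padding and a typed copy evaluated in a const. The `Padded` type and the constants are illustrative and not taken from this PR or from #148470. The sizes assume a target where `u16` has alignment 2. The sketch never reads the padding byte; it only shows where the padding is and where the copy happens.

```rust
use std::mem::size_of;

/// Illustrative type: with `repr(C)`, `a` is at offset 0 and `b` at offset 2,
/// so the byte at offset 1 belongs to no field (padding).
#[repr(C)]
#[derive(Clone, Copy)]
struct Padded {
    a: u8,
    b: u16,
}

/// Number of bytes in `Padded` that are not covered by any field.
const PADDING_BYTES: usize = size_of::<Padded>() - size_of::<u8>() - size_of::<u16>();

const SRC: Padded = Padded { a: 1, b: 2 };

/// A typed copy at compile time. The field values always carry over; what
/// ends up in the padding byte of the copy is the behavior at issue here.
const DST: Padded = SRC;
```

The field values of `DST` are the same whichever way the copy is implemented. Only code that inspects the padding bytes of a const-evaluated value could observe a difference.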
Fixes #148470
Originally posted by @RalfJung in #148470
Related: