Explicitly forget the zero remaining elements in vec::IntoIter::fold(). #148486
rust-bors[bot] merged 1 commit into rust-lang:main
Conversation
|
rustbot has assigned @Mark-Simulacrum. Use |
|
@bors try @rust-timer queue |
Explicitly forget the zero remaining elements in `vec::IntoIter::fold()`.
|
Finished benchmarking commit (ae97583): comparison URL.
Overall result: ❌✅ regressions and improvements - no action needed
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
@bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary 3.3%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (secondary -2.7%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.0%, secondary 0.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 473.413s -> 473.632s (0.05%) |
|
@bors try @rust-timer queue |
|
Hi, I posted the URLO topic that led to this. I found another case of unnecessary drop code:
#[derive(Default)]
pub struct A {
    _a: Option<Box<Option<A>>>,
}
pub fn test(slot: &mut Option<A>) {
    slot.get_or_insert_default();
}
Here: https://godbolt.org/z/dPTb3r89T. Could you fix this as well while you're at it? |
|
@cyb0124 That looks quite doable, but it is a completely different part of the code and so should not go in this PR. Question: What is your particular goal in this area? Are you looking for code size reduction, execution time reduction, lack of spurious panic paths, or something else? Do you have a specific program you are trying to optimize? |
|
Finished benchmarking commit (48bf163): comparison URL.
Overall result: ❌✅ regressions and improvements - no action needed
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
@bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (secondary -3.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -4.0%, secondary -1.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.1%, secondary 0.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 473.384s -> 475.775s (0.51%) |
It's "lack of spurious panic paths". The description of this post and this post are pretty much exactly what I need. From what I can find, panicking in the destructor seems to be the best option for now, and it'd be nice to know statically that such panics are never reached. I know this is impossible to prove statically in general without new syntactic restrictions in the language, but cases like " |
Well, that looks … good-ish? Seems unfortunate that this change increases binary size. I am not familiar with how or whether one might investigate that in the context of the rustc test suite.
The thing I would caution you about is that you — and people working on the standard library trying to help — may still find this a Sisyphean task, where it's never really complete enough to actually write the program you want and have it stay free of panic paths under maintenance. |
In `Option::get_or_insert_with()`, forget the `None` instead of dropping it.
Per #148486 (comment): In `Option::get_or_insert_with()`, after replacing the `None` with `Some`, forget the `None` instead of dropping it. This allows eliminating the `T: [const] Destruct` bounds, making the functions more flexible in (unstable) const contexts, and avoids generating an implicit `drop_in_place::<Option<T>>()` that will never do anything (and which might even persist after optimization).
Rollup merge of #148562 - kpreid:get-init-drop, r=oli-obk
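The idea in that commit message can be illustrated with a small free-standing sketch. This is not the standard library's code: `get_or_insert_with_forget` is a hypothetical name, and the sketch uses a safe `match` with `unreachable!()` for brevity, so it still contains a panic path of its own. It only shows the "replace, then forget the old `None`" step.

```rust
use std::mem;

/// Hypothetical helper (not a std API): fill `slot` if it is empty and
/// return a mutable reference to the contained value.
fn get_or_insert_with_forget<T, F>(slot: &mut Option<T>, f: F) -> &mut T
where
    F: FnOnce() -> T,
{
    if slot.is_none() {
        // `mem::replace` hands back the previous contents, which we have
        // just checked to be `None`.
        let old = mem::replace(slot, Some(f()));
        // Forgetting the old value means no drop code for `Option<T>` is
        // requested here. Dropping a `None` would do nothing anyway, but
        // the compiler may not always prove that on its own.
        mem::forget(old);
    }
    match slot {
        Some(value) => value,
        // Assumption of this sketch: a safe fallback instead of an
        // unchecked unwrap, so this arm is never expected to run.
        None => unreachable!("slot was filled above"),
    }
}
```

Called as `get_or_insert_with_forget(&mut slot, A::default)`, this behaves like `get_or_insert_with` for the caller; whether the generated code actually improves would have to be checked in a tool such as the Godbolt link above.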
|
Rerolling reviewer due to lack of response. r? libs |
|
While I don't doubt that this is an improvement in some situations, I'm personally not comfortable approving it without further input. cc @nnethercote as the resident performance expert: what are your thoughts? Particularly given the change in artifact size. |
|
The artifact size change for the Nov 6 perf run was 396.68 KiB. However, libLLVM.so's size varies unpredictably and should be ignored. The size change for librustc_driver.so is a much smaller 46.20 KiB. But it's been long enough that another perf run is a good idea. @bors try @rust-timer queue |
|
Such small rustc artifact size changes are purely PGO/BOLT noise (sadly), so I wouldn't take that into account. |
|
Finished benchmarking commit (32fc5f5): comparison URL.
Overall result: ❌ regressions - no action needed
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
@bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary 1.5%, secondary 1.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -2.3%, secondary -4.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.1%, secondary 0.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 487.434s -> 486.071s (-0.28%) |
|
No negative perf impact. |
|
Works for me then! There's a demonstrable improvement and no apparent downside. @bors r+ |
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing e5efd33 (parent) -> 30d0309 (this PR)
Test differences: 10 test diffs across Stage 1 and Stage 2.
Additionally, 8 doctest diffs were found. These are ignored, as they are noisy.
Test dashboard
Run
cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 30d0309fa821f7a0984a9629e0d227ca3c0d2eda --output-dir test-dashboard
Then open the result in the test-dashboard output directory.
Job duration changes
How to interpret the job duration changes? Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (30d0309): comparison URL.
Overall result: ❌✅ regressions and improvements - no action needed
@rustbot label: -perf-regression
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary -0.5%, secondary 1.9%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (secondary 2.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary -0.0%, secondary -0.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 491.183s -> 489.264s (-0.39%) |
[Original description:]
This seems to help LLVM notice that dropping the elements in the destructor of `IntoIter` is not necessary. In cases where it doesn't help, it should be cheap, since it is just one assignment.
This PR adds a function to `vec::IntoIter` which is used by `fold()` and `spec_extend()`, when those operations complete, to forget the zero remaining elements and only deallocate the allocation, ensuring that there will never be a useless loop to drop zero remaining elements when the iterator is dropped.
This is my first ever attempt at this kind of codegen micro-optimization in the standard library, so please let me know what should go into the PR or what sort of additional systematic testing might indicate this is a good or bad idea.
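To make the description concrete, here is a rough model of the technique, not the actual `vec::IntoIter` source. `OwningIter` and `forget_remaining_elements` are names chosen for this sketch, and the layout (a buffer pointer, a capacity, and a `ptr`/`end` pair) is an assumption. The sketch also does not handle zero-sized types. The point is the single assignment at the end of `fold()`, which makes it syntactically obvious that the destructor has nothing left to drop.

```rust
use std::mem::{self, ManuallyDrop};
use std::ptr;

/// Hypothetical, simplified owning iterator over a `Vec<T>` allocation.
struct OwningIter<T> {
    buf: *mut T, // start of the allocation
    cap: usize,  // capacity of the allocation, in elements
    ptr: *mut T, // next element to hand out
    end: *mut T, // one past the last remaining element
}

impl<T> OwningIter<T> {
    fn new(v: Vec<T>) -> Self {
        // Assumption of this sketch: zero-sized types are out of scope.
        assert!(mem::size_of::<T>() != 0);
        let mut v = ManuallyDrop::new(v);
        let buf = v.as_mut_ptr();
        // SAFETY: `buf + len` stays within (or one past) the allocation.
        let end = unsafe { buf.add(v.len()) };
        OwningIter { buf, cap: v.capacity(), ptr: buf, end }
    }

    /// Mark the remaining range as empty without dropping anything.
    fn forget_remaining_elements(&mut self) {
        self.end = self.ptr; // the "just one assignment"
    }

    fn fold<B>(mut self, init: B, mut f: impl FnMut(B, T) -> B) -> B {
        let mut acc = init;
        while self.ptr != self.end {
            // SAFETY: `ptr` points at an initialized, not yet yielded element.
            let item = unsafe { ptr::read(self.ptr) };
            // Advance first so a panic in `f` cannot cause a double drop.
            self.ptr = unsafe { self.ptr.add(1) };
            acc = f(acc, item);
        }
        // Zero elements remain here; say so explicitly before `self` drops.
        self.forget_remaining_elements();
        acc
    }
}

impl<T> Drop for OwningIter<T> {
    fn drop(&mut self) {
        unsafe {
            // Drop whatever was not yielded, then free the buffer by
            // rebuilding an empty Vec with the original capacity.
            let remaining = self.end.offset_from(self.ptr) as usize;
            ptr::drop_in_place(ptr::slice_from_raw_parts_mut(self.ptr, remaining));
            drop(Vec::from_raw_parts(self.buf, 0, self.cap));
        }
    }
}
```

For example, `OwningIter::new(vec![1, 2, 3]).fold(0, |a, x| a + x)` sums the elements and then frees the buffer. Whether the extra assignment actually lets the optimizer remove the drop loop depends on the compiler version and the element type, which is what the perf runs in this thread were checking.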