Inline most of the code paths for conversions with boxed slices#49555
Inline most of the code paths for conversions with boxed slices#49555bors merged 3 commits intorust-lang:masterfrom
Conversation
This helps with the specific problem described in rust-lang#49541, obviously without making any large change to how inlining works in the general case. Everything involved in the conversions is made `#[inline]`, except for the `<Vec<T>>::into_boxed_slice` entry point which is made `#[inline(always)]` after checking that duplicating the function mentioned in the issue prevented its inlining if I only annotate it with `#[inline]`. For the record, that function was: ```rust pub fn foo() -> Box<[u8]> { vec![0].into_boxed_slice() } ``` To help the inliner's job, we also hoist a `self.capacity() != self.len` check in `<Vec<T>>::shrink_to_fit` and mark it as `#[inline]` too.
| self.buf.shrink_to_fit(self.len); | ||
| if self.capacity() != self.len { | ||
| self.buf.shrink_to_fit(self.len); | ||
| } |
There was a problem hiding this comment.
I think that these don't need the #[inline] attribute because they're already generic, right?
There was a problem hiding this comment.
I'll check and report back.
There was a problem hiding this comment.
Indeed, this wasn't necessary. I removed it and compiled a module with foo duplicated, here is the result of compiling that to assembly:
.section __TEXT,__text,regular,pure_instructions
.p2align 4, 0x90
__ZN55_$LT$alloc..heap..Heap$u20$as$u20$core..heap..Alloc$GT$3oom17h2c0e1b8273a8e186E:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
callq ___rust_oom
ud2
.cfi_endproc
.p2align 4, 0x90
__ZN5alloc4heap15exchange_malloc28_$u7b$$u7b$closure$u7d$$u7d$17h1d897acb095d410dE:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $32, %rsp
movq 16(%rdi), %rax
movq %rax, -8(%rbp)
movq (%rdi), %rax
movq 8(%rdi), %rcx
movq %rcx, -16(%rbp)
movq %rax, -24(%rbp)
leaq -24(%rbp), %rdi
callq __ZN55_$LT$alloc..heap..Heap$u20$as$u20$core..heap..Alloc$GT$3oom17h2c0e1b8273a8e186E
ud2
.cfi_endproc
.globl _foo
.p2align 4, 0x90
_foo:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $48, %rsp
leaq -40(%rbp), %rdx
movl $1, %edi
movl $1, %esi
callq ___rust_alloc
testq %rax, %rax
je LBB2_1
movb $0, (%rax)
movl $1, %edx
addq $48, %rsp
popq %rbp
retq
LBB2_1:
movq -32(%rbp), %rax
movq -24(%rbp), %rcx
movq %rcx, -8(%rbp)
movq %rax, -16(%rbp)
movq -16(%rbp), %rax
movq -8(%rbp), %rcx
movq %rcx, -24(%rbp)
movq %rax, -32(%rbp)
leaq -40(%rbp), %rdi
callq __ZN5alloc4heap15exchange_malloc28_$u7b$$u7b$closure$u7d$$u7d$17h1d897acb095d410dE
ud2
.cfi_endproc
.globl _bar
.p2align 4, 0x90
_bar:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $48, %rsp
leaq -40(%rbp), %rdx
movl $1, %edi
movl $1, %esi
callq ___rust_alloc
testq %rax, %rax
je LBB3_1
movb $0, (%rax)
movl $1, %edx
addq $48, %rsp
popq %rbp
retq
LBB3_1:
movq -32(%rbp), %rax
movq -24(%rbp), %rcx
movq %rcx, -8(%rbp)
movq %rax, -16(%rbp)
movq -16(%rbp), %rax
movq -8(%rbp), %rcx
movq %rcx, -24(%rbp)
movq %rax, -32(%rbp)
leaq -40(%rbp), %rdi
callq __ZN5alloc4heap15exchange_malloc28_$u7b$$u7b$closure$u7d$$u7d$17h1d897acb095d410dE
ud2
.cfi_endproc
.subsections_via_symbolsThere was a problem hiding this comment.
Was the code change here also still necessary to get the same size reductions?
|
I'm personally ok on adding |
src/liballoc/vec.rs
Outdated
| /// assert_eq!(slice.into_vec().capacity(), 3); | ||
| /// ``` | ||
| #[stable(feature = "rust1", since = "1.0.0")] | ||
| #[inline(always)] |
There was a problem hiding this comment.
OTOH, removing this one makes the into_boxed_slice calls not be optimised anymore when the function is duplicated. Even just #[inline] makes rustc not inline the calls anymore.
There was a problem hiding this comment.
Hm so in general #[inline(always)] is an antipattern as it can cause compile time to baloon very quickly. Mind if I poke around a bit at the examples you were looking at to see what's going on?
There was a problem hiding this comment.
Sure!
#![crate_type = "lib"]
#[no_mangle]
pub fn foo() -> Box<[u8]> {
vec![0].into_boxed_slice()
}
#[no_mangle]
pub fn bar() -> Box<[u8]> {
vec![0].into_boxed_slice()
}There was a problem hiding this comment.
Hm ok I'm tempted though to leave this as-is and let LLVM decide these sorts of things, we've had bad experiences in the past optimizing too much for microbenchmarks unfortunately
There was a problem hiding this comment.
Did you mean this specific #[inline(always)] attribute, or the whole PR? Either is fine with me, to be heard.
There was a problem hiding this comment.
Oh just this one particular instance of the attribute. The function is generic so it's already candidate to be inlined everywhere, and it sounds like #[inline] doesn't otherwise work here so #[inline(always)] is going directly against LLVM's heuristics, which has caused a lot of problems for us historically
|
@bors: r+ rollup Thanks! |
|
📌 Commit b59fa0d has been approved by |
Inline most of the code paths for conversions with boxed slices This helps with the specific problem described in rust-lang#49541, obviously without making any large change to how inlining works in the general case. Everything involved in the conversions is made `#[inline]`, except for the `<Vec<T>>::into_boxed_slice` entry point which is made `#[inline(always)]` after checking that duplicating the function mentioned in the issue prevented its inlining if I only annotate it with `#[inline]`. For the record, that function was: ```rust pub fn foo() -> Box<[u8]> { vec![0].into_boxed_slice() } ``` To help the inliner's job, we also hoist a `self.capacity() != self.len` check in `<Vec<T>>::shrink_to_fit` and mark it as `#[inline]` too.
Rollup of 8 pull requests Successful merges: - #49555 (Inline most of the code paths for conversions with boxed slices) - #49606 (Prevent broken pipes causing ICEs) - #49646 (Use box syntax instead of Box::new in Mutex::remutex on Windows) - #49647 (Remove `underscore_lifetimes` and `match_default_bindings` from active feature list) - #49931 (Fix incorrect span in `&mut` suggestion) - #49959 (rustbuild: allow building tools with debuginfo) - #49965 (Remove warning about f64->f32 cast being potential UB) - #49994 (Remove unnecessary indentation in rustdoc book codeblock.) Failed merges:
This helps with the specific problem described in #49541, obviously without making any large change to how inlining works in the general case.
Everything involved in the conversions is made
#[inline], except for the<Vec<T>>::into_boxed_sliceentry point which is made#[inline(always)]after checking that duplicating the function mentioned in the issue prevented its inlining if I only annotate it with#[inline].For the record, that function was:
To help the inliner's job, we also hoist a
self.capacity() != self.lencheck in<Vec<T>>::shrink_to_fitand mark it as#[inline]too.