-
-
Notifications
You must be signed in to change notification settings - Fork 14.7k
Manipulating slice through &mut parameter not optimized well #27130
Copy link
Copy link
Closed
Labels
A-codegenArea: Code generationArea: Code generationC-enhancementCategory: An issue proposing an enhancement or a PR with one.Category: An issue proposing an enhancement or a PR with one.E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.Call for participation: An issue has been fixed and does not reproduce, but no test has been added.I-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.P-lowLow priorityLow priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.
Metadata
Metadata
Assignees
Labels
A-codegenArea: Code generationArea: Code generationC-enhancementCategory: An issue proposing an enhancement or a PR with one.Category: An issue proposing an enhancement or a PR with one.E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.Call for participation: An issue has been fixed and does not reproduce, but no test has been added.I-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.P-lowLow priorityLow priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.
Type
Fields
Give feedbackNo fields configured for issues without a type.
When passing a
&mut &[u8]to a function that trims elements from the start/end of the slice, the code contains pointless length checks and stack traffic. Writing the function slightly differently, or inlining it, does produce the code I'd expect.This is a minimal example (playground of all following snippets):
I expect this to compile to a tight loop that simply decrements the slice length and increments the data pointer in registers, until the length reaches 0 or a byte value of 42 is encountered.
Instead, the loop writes to the stack on every iteration and checks twice whether the length is zero, trying to panic for the second check (presumably comes from the
a[1..]slicing).The
.LBB0_9branch cannot ever be taken, and is indeed optimized out when writing the function in differently. The stack traffic also disappears, though that probably has an entirely different cause. This variant compiles to better code:It does all the loop calculations in registers. Also, marking the first function as
inline(always)and defining the second function in terms of it ({ trim_in_place(&mut a); a }) optimizes to the same code.Both variants check if the slice is empty before entering the loop and return immediately in that case.
Background
I encountered this problem in the context of my dec2flt work, in a larger and more complicated function that trims two slices on both ends, and the slices are stored in a struct. Applying
inline(always)to that function gives a small but measurable performance improvement across the board,but increases code size by 10 KiB(Edit: I just didn't measure properly. Size is about the same.). I haven't checked the assembly code (far too much code) but I suspect this means this problem, and solution, carries over to real code.Meta
Reproduces on nightly playground. Stable is worse for all variants, presumably because it doesn't have #26411 (yay, progress!). My local
rustchas the same problem. The exact commit is meaningless because it only exists in myrust.gitfork. It's basedf9f580936d81ed527115ae86375f69bb77723b1c.