for loop in closure is not unrolled and not vectorized correctly #120189
Closed
Labels
- A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
- A-autovectorization: Area: Autovectorization, which can impact perf or code size
- C-bug: Category: This is a bug.
- E-needs-test: Call for participation: An issue has been fixed and does not reproduce, but no test has been added.
- I-slow: Issue: Problems and improvements with respect to performance of generated code.
I tried this code on godbolt.org:
The generated assembly code is as follows. The additions in the for loop are compiled into four `inc` instructions and one `psubb` instruction. Is there any particular reason why these additions cannot be compiled into one SSE addition?

Instead, if you move the for loop outside the closure, the for loop will be unrolled into five `psubb` instructions.

The complete test code is available here: https://godbolt.org/z/YoMaWWzW7
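
The original snippet is not reproduced in this text, so the following is only a hypothetical sketch of the pattern being compared, not the code from the godbolt link. The names `add_in_closure` and `add_inline`, the array length `N`, and the use of `wrapping_add` are all assumptions chosen for illustration. It places the same byte-wise addition loop once inside a closure and once directly in the function body, so the two can be compiled side by side and their assembly compared.

```rust
// Hypothetical reproduction sketch; NOT the code from the issue.
// N, the function names, and the element type are assumptions.
const N: usize = 20;

/// Variant A: the addition loop lives inside a closure.
pub fn add_in_closure(data: &mut [u8; N], delta: &[u8; N]) {
    let add = |dst: &mut [u8; N], src: &[u8; N]| {
        for i in 0..N {
            dst[i] = dst[i].wrapping_add(src[i]);
        }
    };
    add(data, delta);
}

/// Variant B: the same loop written directly in the function body.
pub fn add_inline(data: &mut [u8; N], delta: &[u8; N]) {
    for i in 0..N {
        data[i] = data[i].wrapping_add(delta[i]);
    }
}

fn main() {
    let mut a = [250u8; N];
    let mut b = [250u8; N];
    let delta = [10u8; N];

    add_in_closure(&mut a, &delta);
    add_inline(&mut b, &delta);

    // Both variants must compute the same values; only codegen may differ.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

Building with optimizations enabled (for example `rustc -C opt-level=3 --emit asm`) and comparing the output for the two functions is one way to check whether a given compiler version still treats them differently. The sketch makes no claim about which instructions either variant produces.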