[RyuJIT] Emit shlx, sarx, shrx on x64#67182
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsFixes #41881. Current codegen: New codegen: It needs further work to remove mov when memory address is used instead of all registers.
|
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
Might be worth a note that we don't currently support the contained form due to more complex changes being needed in the emitter.
Ideally we'd fix up the emitter and use inst_RV_RV_TT instead so that we can emit shlx r32a, r/m32, r32b. Someone would need to walk through the relevant IF_RWR_RRD_*RD formats and ensure that it's all handled correctly (noting that technically the format is IF_RWR_*RD_RRD but that should be the same as IF_RWR_RRD_*RD with swapping op1/op2, like we do for a couple other BMI2 instructions, namely bextr and bzhi).
There was a problem hiding this comment.
As @tannergooding , lets have a comment about containment and also specify that here, the operands are swapped because the way operands are encoded in these 3 instructions.
There was a problem hiding this comment.
Added commnets
As @tannergooding , lets have a comment about containment and also specify that here, the operands are swapped because the way operands are encoded in these 3 instructions.
|
Change generally LGTM. Left a suggestion around the ordering of the opportunistic check and potentially logging an issue or a comment around adding support for containment in the future. |
a1a37bd to
fb558b3
Compare
Opened #67314. |
|
@kunalspathak PTAL. |
|
Can you check why there is a regression in coreclr_tests windows/x64? |
kunalspathak
left a comment
There was a problem hiding this comment.
Added some comments. I think we need to understand the regression in coreclr/libraries test.
src/coreclr/jit/lowerxarch.cpp
Outdated
There was a problem hiding this comment.
you could just revert this change...
src/coreclr/jit/lsraxarch.cpp
Outdated
There was a problem hiding this comment.
Is this instruction only applicable for x64? I thought it is also valid for x86? @tannergooding ?
There was a problem hiding this comment.
I wonder why it worked for x86 without changing this part of the code?
Is this instruction only applicable for x64? I thought it is also valid for x86? @tannergooding ?
There was a problem hiding this comment.
Isn't InstructionSet_BMI2 the 32 bit version and for 64 bit you'd want InstructionSet_BMI2_X64 ? If so and you checked 64 bit correctly elsewhere and then 32 here it would explain it working for 32.
There was a problem hiding this comment.
Double checked that my change handles only x64 case. Will open an issue to handle x86.
There was a problem hiding this comment.
Double checked that my change handles only x64 case. Will open an issue to handle x86.
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
delete the extra line.
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
same here...why is it only for x64?
There was a problem hiding this comment.
Handling for x64 only for now.
There was a problem hiding this comment.
Handling for x64 only for now.
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
As @tannergooding , lets have a comment about containment and also specify that here, the operands are swapped because the way operands are encoded in these 3 instructions.
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
As per the changes at other places, we have the code to have it supported only for TARGET_64BIT, but here, it is under TARGET_ARM64. What is the difference and should it be consistent?
There was a problem hiding this comment.
It is under #ifdef TARGET_AMD64, not ARM64. So, I guess it is correct.
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
this code path is also for x86?
There was a problem hiding this comment.
Added #ifdef TARGET_64BIT.
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
| // BMI bextr,bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM, | |
| // BMI bextr, bzhi, shrx, shlx and sarx encode the reg2 in VEX.vvvv and reg3 in modRM, |
We need to have similar comment where we swap the operands I pointed out.
There was a problem hiding this comment.
nit: You have already added a comment in codegenxarch.cpp. No need here. The above comment covers it.
src/coreclr/jit/emitxarch.cpp
Outdated
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
Qhere are you getting that number from? I think that matches the newer hardware and not Skylake, which is what we have used for the other numbers
There was a problem hiding this comment.
Updated to result.insLatency += PERFSCORE_LATENCY_1C;.
There was a problem hiding this comment.
You should consider letting all the tests run, and not exiting on the first failure. Just make sure to return 101 if any test fails.
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
Delete these lines, and let the normal fall-through make this call and return
|
Reran asmdiffs and it is an improvement. [14:06:59] Summary of Code Size diffs: Enabled this only for x64, not x86. Will open a new issue to address it for x86. |
fb558b3 to
53575ca
Compare
|
All tests passed. @kunalspathak PTAL. |
kunalspathak
left a comment
There was a problem hiding this comment.
Overall looks good. Added some minor suggestions.
src/tests/issues.targets
Outdated
There was a problem hiding this comment.
| <Issue>There is a known undefined behavior with shifts and 0x0FFFFFFFF overflows, so skip the test for mono.</Issue> | |
| <Issue>There is a known undefined behavior with shifts and 0xFFFFFFFF overflows, so skip the test for mono.</Issue> |
src/coreclr/jit/lowerxarch.cpp
Outdated
There was a problem hiding this comment.
This has been alrady removed.
There was a problem hiding this comment.
I mean put it back (revert the deletion of the comment // !TARGET_X86).
src/coreclr/jit/codegenxarch.cpp
Outdated
There was a problem hiding this comment.
This comment makes more sense next to your lsraxarch.cpp change.
There was a problem hiding this comment.
Moved it to lsraxarch.cpp.
src/coreclr/jit/emitxarch.cpp
Outdated
There was a problem hiding this comment.
nit: You have already added a comment in codegenxarch.cpp. No need here. The above comment covers it.
There was a problem hiding this comment.
Why not create a Validate() function that will do most of these things at one place?
public int Validate(..., actual, ...) {
expected =
if (expected != actual) {
Console.WriteLine("Fail");
return 101;
}
return 100;
}There was a problem hiding this comment.
You can then just have an input array and iterate over it or something like that.
|
Also, there are some significant regressions. Did you figure out why? Eg. here is from asp.net collection. |
They are due to register assignment changes. Bytes regressed in some files but perfscores are better. Some adds NOP of 4 bytes. benchmarks.run.windows.x64.checked.mch:Top method regressions (bytes):
Base: Diff:
Base: Diff: aspnet.run.windows.x64.checked.mch:Top method regressions (bytes):
Diff |
It was not a simple fall out, so I discussed with Kunal to skip x86 for now. |
Can you describe what problems were encountered, and what is required to implement it for x86? The tracking issue #67314 doesn't have any details. |
It has been a while so I cannot remember exactly, but it was not generating the right code. lsraxarch and emitxarch need to be looked into. Updated #67314. |
|
/azp list |
|
@kunalspathak and @BruceForstall it is ready to review. |
| long expectedLong = 0; | ||
| int MOD64 = 64; | ||
|
|
||
| /* TODO: Enable 32bit test when x86 shift is enabled. |
There was a problem hiding this comment.
I would suggest deleting the commented code.
|
|
||
| try | ||
| { | ||
| ulong valULong = 0; |
There was a problem hiding this comment.
I would define these variables closer to their usage.
| ulong valULong = 0; | ||
| long valLong = 0; | ||
| int shiftBy = 0; | ||
| ulong resULong = 0; |
There was a problem hiding this comment.
Is it worth adding short and ints as test cases?
kunalspathak
left a comment
There was a problem hiding this comment.
LGTM...added some suggestions.
Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
| switch (x) | ||
| { | ||
| case ulong a: | ||
| ulong resUlong = ((ulong)a) << y; |
There was a problem hiding this comment.
Why doesn't this generic work?
T res = ((T)x) << y;
return (R)Convert.ChangeType(res,typeof(R));There was a problem hiding this comment.
shift operators do not work on generics.
https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/bitwise-and-shift-operators
Because the shift operators are defined only for the int, uint, long, and ulong types, the result of an operation always contains at least 32 bits. If the left-hand operand is of another integral type (sbyte, byte, short, ushort, or char), its value is converted to the int type
There was a problem hiding this comment.
@kunalspathak, this is the error message.
error CS0019: Operator '<<' cannot be applied to operands of type 'T' and 'int'
All tests passed, so merging it now. Thanks for all the code reviews.
|
Some improvements in windows x64: System.Collections.Tests.Perf_BitArray dotnet/perf-autofiling-issues#5495 |
|
Improvements in linux/x64 #67182 |
|
Improvements windows/x64: dotnet/perf-autofiling-issues#5460 |

Fixes #41881.
Generates shlx, sarx, shrx for 64 bit shifts if BMI2 platform.
Current codegen:
New codegen:
It needs further work to remove mov when memory address is used instead of all registers (handle it when enabling contained form in #67314).
x86 support needs to be enabled (added it in #67314).