Reduce branches in padding adjustment in FromBase64_ComputeResultLength#115022
Reduce branches in padding adjustment in FromBase64_ComputeResultLength#115022rameel wants to merge 4 commits intodotnet:mainfrom
Conversation
| ReadOnlySpan<byte> map = [0, 2, 1]; | ||
| padding = map[padding]; |
There was a problem hiding this comment.
This trades branches vs. memory access -- I don't know which approach will be faster and in benchmarks it will be hard to measure, because branch predictor will do a good job, and for the memory access the data will be in the cpu caches, so fast retrieval.
Or put differently: it trades possible branch mispredicts vs. potential cache eviction.
Maybe it's better to write the code as idiomatic as possible and let the compilers optimize it out. Actually with a switch it looks quite good and produces code similar to what native compilers produce.
There was a problem hiding this comment.
You're right, writing the code in a more idiomatic style is preferable. However, at the moment, the JIT doesn't seem to generate a value-to-result mapping (like C/C++ does) for simple cases like this. Even for a switch, it just falls back to a jump table, which still involves a memory access.
I agree - it's better to either leave it as-is or rewrite it using a switch, and help the JIT recognize and optimize such patterns. I think I even saw an issue about this a while ago, but couldn't find it right away.
Just for reference, here's what C/C++ produces:
mov edi, edi
mov eax, DWORD PTR CSWTCH.1[0+rdi*4]
CSWTCH.1:
.long 0
.long 2
.long 1As we can see the C/C++ compiler prefers a memory access here, rather than relying on the branch predictor.
And here's what the JIT produces in my case - basically the same thing as the C++ version:
mov rdi, 0x72BEDC82CB80 ; static handle
movzx rax, byte ptr [rax+rdi]And here's what a typical switch ends up looking like:
lea rdi, [reloc @RWD00]
mov edi, dword ptr [rdi+4*rax]
lea rcx, G_M26941_IG02
add rdi, rcx
jmp rdi
mov eax, 1
jmp SHORT G_M26941_IG07
mov eax, 2
jmp SHORT G_M26941_IG07
xor eax, eax
G_M26941_IG07: ;; offset=0x0035There was a problem hiding this comment.
I’ve found the issue related to this: #114041
There was a problem hiding this comment.
So I think it would be still good to change to the switch-approach here.
Once the JIT got improved, then the benefit here comes for free.
|
Tagging subscribers to this area: @dotnet/area-system-runtime |
| 1 => 2, | ||
| 2 => 1, | ||
| _ => throw new FormatException(SR.Format_BadBase64Char) | ||
| }; |
There was a problem hiding this comment.
The title of the PR is "reduce branches", but this doesn't actually reduce branches today, does it?
There was a problem hiding this comment.
How about this: "Improve readability of Base64 padding adjustment logic"?
There was a problem hiding this comment.
However, the change seems too minor to have a meaningful impact. It might be more worthwhile to focus on removing "unsafe" in FromBase64_ComputeResultLength and related methods. Here's the commit for that: rameel@fe9f28b
Unfortunately due to the current JIT's inability to eliminate unnecessary bounds checks in cases like this (see #115091):
while (chars.Length != 0)
{
if (chars[^1] is not (' ' or '\n' or '\r' or '\t'))
break;
chars = chars.Slice(0, chars.Length - 1); // Unnecessary bounds checks
}and a performance regression compared to NET9 (#115090) I decided to postpone creating PR for this until later.
No description provided.