-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Additionally Vectorize string.IndexOfAny for value lengths 2,3,4,5 #19790
Additionally Vectorize string.IndexOfAny for value lengths 2,3,4,5 #19790
Conversation
| char* pEndCh = pCh + length; | ||
|
|
||
| #if !netstandard11 | ||
| if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope for this PR, but it might be nice to try this with Hardware Intrinsics as well.
| #if !netstandard11 | ||
| if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2) | ||
| { | ||
| // Figure out how many characters to read sequentially until we are vector aligned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why start with a sequential read?
You should be able to do a single unaligned read, mask out bits that would be covered by the first aligned read, and then do the vectorized check (and I'm not even sure you need to mask for this algorithm).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just copied the IndexOf<char>(this Span<char> span, char value0) method and used the same multi-comparison with vectorized BitwiseOr as byte to widen it to 2 and 3 chars.
So an improved approach to vectorization, whether an unaligned start or using instrinsics directly should also apply to those.
Might be worth a tracking issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth a tracking issue?
Probably. I'll log one sometime Tuesday, if I don't get to it before then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it would help. The vectorization is fast at discounting characters; but not fast at then finding a match where the sequential by itself would probably be faster - rather than doing an overlapping vector at start and overlapping vector at the end.
| { | ||
| // Get the highest multiple of Vector<ushort>.Count that is within the search space. | ||
| // That will be how many times we iterate in the loop below. | ||
| // This is equivalent to: length = Vector<ushort>.Count * ((int)(pEndCh - pCh) / Vector<ushort>.Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wish the the JIT did this optimization.
tannergooding
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@danmosemsft while testing this I encounted a lot of use via private static readonly char[] s_wildcardChars =
{
'\"', '<', '>', '*', '?'
};Wondering if this should be extended for just 2-3 to go up to 5, rather than moving to a loop per item at 4+? @tannergooding would there be enough xmm registers? |
|
x64 likely has enough registers, but you'd want to look at the codegen in any case, since the effectiveness of the various registers likely depends on surrounding code and the platform abi (which would define, for example, if the registers have to be saved by the caller, before transferring to the callee) |
eerhardt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks @benaadams.
I noticed that, in some cases, a naiive loop searching through 4 elements (white space in json) was faster than using the vectorized IndexOfAny, so I think extended to 4/5 might be a good idea. Especially if some elements appear a lot more than others (and we can order them as such within the loop). |
|
We are good to merge this now, correct? |
|
@dotnet-bot test |
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
|
Unrelated Issue filed https://github.com/dotnet/corefx/issues/32520 |
|
|
@tannergooding think so |
|
Added tests for 4,5 values dotnet/corefx#32734 so they can be also vectorized |
656f621 to
36d7f6d
Compare
|
Added vectorization for 4 and 5 values |
| ref var valueRef = ref Unsafe.As<T, char>(ref MemoryMarshal.GetReference(values)); | ||
| if (values.Length == 5) | ||
| { | ||
| // Length 5 is a common length for FileSystemName expression (", <, >, *, ?) and in preference to 2 as it has an explicit overload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JeremyKuhne helpful to have this first? FileSystemName switches between two different length arrays 5 and 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed this. I hope it becomes less common for 5 in the future, but for now it is by far and away the most common case. The new default when using EnumerationOptions is the simple type (i.e. 2).
|
Should wait for the tests dotnet/corefx#32734 |
|
OSX failing on files |
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
|
Ah, corefx tests are failing due to https://github.com/dotnet/coreclr/issues/20405 |
|
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
|
Windows_NT x64 Checked corefx_baseline issue is unrelated and previously reported https://github.com/dotnet/coreclr/issues/20373 |
@tannergooding yes, tests are now in corefx dotnet/corefx#32734 and pass with these changes. Feedback has been addressed. |
|
Thanks for the work @benaadams |
…otnet#19790) * Vectorize string.IndexOfAny * Vectorize string.IndexOfAny [4,5] * Feedback * Call order preference
…otnet/coreclr#19790) * Vectorize string.IndexOfAny * Vectorize string.IndexOfAny [4,5] * Feedback * Call order preference Commit migrated from dotnet/coreclr@018df68
Extends Vectorization for 1 value (
.IndexOf) to single pass vectorization for 2, 3, 4 and 5 value searches in.IndexOfAnyhttps://gist.github.com/benaadams/097ec032f7be451c22eedf877967403c