Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@benaadams
Copy link
Member

@benaadams benaadams commented Aug 31, 2018

Extends Vectorization for 1 value (.IndexOf) to single pass vectorization for 2, 3, 4 and 5 value searches in .IndexOfAny

string.IndexOfAny(char[] anyOf) // anyOf.lengths 2,3,4,5
string.IndexOfAny(char[] anyOf, int startIndex) // anyOf.lengths 2,3,4,5
string.IndexOfAny(char[] anyOf, int startIndex, int count) // anyOf.lengths 2,3,4,5

// where T : char
IndexOfAny<T>(this Span<T> span, T value0, T value1)
IndexOfAny<T>(this Span<T> span, T value0, T value1, T value2)
IndexOfAny<T>(this Span<T> span, ReadOnlySpan<T> values)
IndexOfAny<T>(this ReadOnlySpan<T> span, T value0, T value1)
IndexOfAny<T>(this ReadOnlySpan<T> span, T value0, T value1, T value2)
IndexOfAny<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> values)

https://gist.github.com/benaadams/097ec032f7be451c22eedf877967403c

               Method |       Runtime |  Position |       Mean |
 -------------------- | ------------- | --------- |-----------:|
- String_IndexOfAny_3 | .NET Core 2.1 |         1 |   6.440 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |         1 |   7.479 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |         7 |  11.690 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |         7 |  10.799 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |         8 |  12.955 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |         8 |  11.081 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |        15 |  18.339 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |        15 |  16.896 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |        16 |  19.499 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |        16 |  16.046 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |        31 |  33.041 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |        31 |  16.844 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |        32 |  33.141 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |        32 |  16.926 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |        63 |  61.875 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |        63 |  19.649 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |        64 |  61.532 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |        64 |  19.700 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |       127 | 102.849 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |       127 |  29.007 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |       128 | 104.652 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |       128 |  25.332 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |       255 | 187.974 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |       255 |  36.632 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |       256 | 187.738 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |       256 |  36.718 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |      1023 | 692.424 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |      1023 | 103.747 ns |
- String_IndexOfAny_3 | .NET Core 2.1 |      1024 | 694.512 ns |
+ String_IndexOfAny_3 | .NET Core 3.0 |      1024 | 103.974 ns |

char* pEndCh = pCh + length;

#if !netstandard11
if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR, but it might be nice to try this with Hardware Intrinsics as well.

#if !netstandard11
if (Vector.IsHardwareAccelerated && length >= Vector<ushort>.Count * 2)
{
// Figure out how many characters to read sequentially until we are vector aligned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why start with a sequential read?

You should be able to do a single unaligned read, mask out bits that would be covered by the first aligned read, and then do the vectorized check (and I'm not even sure you need to mask for this algorithm).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just copied the IndexOf<char>(this Span<char> span, char value0) method and used the same multi-comparison with vectorized BitwiseOr as byte to widen it to 2 and 3 chars.

So an improved approach to vectorization, whether an unaligned start or using instrinsics directly should also apply to those.

Might be worth a tracking issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a tracking issue?

Probably. I'll log one sometime Tuesday, if I don't get to it before then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it would help. The vectorization is fast at discounting characters; but not fast at then finding a match where the sequential by itself would probably be faster - rather than doing an overlapping vector at start and overlapping vector at the end.

{
// Get the highest multiple of Vector<ushort>.Count that is within the search space.
// That will be how many times we iterate in the loop below.
// This is equivalent to: length = Vector<ushort>.Count * ((int)(pEndCh - pCh) / Vector<ushort>.Count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wish the the JIT did this optimization.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benaadams
Copy link
Member Author

@danmosemsft while testing this I encounted a lot of use via FileSystemName and 5 chars

private static readonly char[] s_wildcardChars =
{
    '\"', '<', '>', '*', '?'
};

Wondering if this should be extended for just 2-3 to go up to 5, rather than moving to a loop per item at 4+?

@tannergooding would there be enough xmm registers?

@tannergooding
Copy link
Member

x64 likely has enough registers, but you'd want to look at the codegen in any case, since the effectiveness of the various registers likely depends on surrounding code and the platform abi (which would define, for example, if the registers have to be saved by the caller, before transferring to the callee)

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks @benaadams.

@eerhardt eerhardt requested a review from ahsonkhan September 10, 2018 20:23
@ahsonkhan
Copy link

ahsonkhan commented Sep 10, 2018

Wondering if this should be extended for just 2-3 to go up to 5

I noticed that, in some cases, a naiive loop searching through 4 elements (white space in json) was faster than using the vectorized IndexOfAny, so I think extended to 4/5 might be a good idea. Especially if some elements appear a lot more than others (and we can order them as such within the loop).

@tannergooding
Copy link
Member

We are good to merge this now, correct?

@benaadams
Copy link
Member Author

@dotnet-bot test

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams
Copy link
Member Author

benaadams commented Sep 27, 2018

Unrelated

MESSAGE:
Assert.Throws() Failure
Expected: typeof(System.OverflowException)
Actual: (No exception was thrown)
+++++++++++++++++++
STACK TRACE:
at System.Runtime.InteropServices.Tests.SafeBufferTests.
  Initialize_NumBytesTimesSizeOfEachElement_ThrowsOverflowException() in /mnt/j/workspace/dotnet_coreclr/master/jitstress/x64_checked_ubuntu_corefx_baseline_prtest/_/fx/src/System.Runtime.InteropServices/
  tests/System/Runtime/InteropServices/SafeBufferTests.cs:line 32

Issue filed https://github.com/dotnet/corefx/issues/32520

@benaadams
Copy link
Member Author

benaadams commented Sep 28, 2018

Assert failure(PID 5308 [0x000014bc], Thread: 8072 [0x1f88]): 
CONTRACT VIOLATION by GetValueTypeForGUID 
at "d:\j\workspace\x64_checked_w---d7295605\src\vm\interoputil.cpp" @ 2862

@benaadams
Copy link
Member Author

@tannergooding think so

@benaadams
Copy link
Member Author

Added tests for 4,5 values dotnet/corefx#32734 so they can be also vectorized

@benaadams benaadams force-pushed the Vectorize-string.IndexOfAny branch from 656f621 to 36d7f6d Compare October 10, 2018 04:52
@benaadams benaadams changed the title Vectorize string.IndexOfAny [2,3] Vectorize string.IndexOfAny [2,3,4,5] Oct 10, 2018
@benaadams
Copy link
Member Author

Added vectorization for 4 and 5 values

ref var valueRef = ref Unsafe.As<T, char>(ref MemoryMarshal.GetReference(values));
if (values.Length == 5)
{
// Length 5 is a common length for FileSystemName expression (", <, >, *, ?) and in preference to 2 as it has an explicit overload
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeremyKuhne helpful to have this first? FileSystemName switches between two different length arrays 5 and 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this. I hope it becomes less common for 5 in the future, but for now it is by far and away the most common case. The new default when using EnumerationOptions is the simple type (i.e. 2).

@benaadams
Copy link
Member Author

Should wait for the tests dotnet/corefx#32734

@benaadams benaadams changed the title Vectorize string.IndexOfAny [2,3,4,5] [WIP] Vectorize string.IndexOfAny [2,3,4,5] Oct 10, 2018
@benaadams benaadams changed the title [WIP] Vectorize string.IndexOfAny [2,3,4,5] [WIP] Additionally Vectorize string.IndexOfAny for value lengths 2,3,4,5 Oct 10, 2018
@benaadams
Copy link
Member Author

OSX failing on files

Test Result (5 failures / +5)
file_io.GetFileSize.test1
file_io.GetFileSizeEx.test1
file_io.SetFilePointer.test5
file_io.SetFilePointer.test6
file_io.SetFilePointer.test7

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams benaadams changed the title [WIP] Additionally Vectorize string.IndexOfAny for value lengths 2,3,4,5 Additionally Vectorize string.IndexOfAny for value lengths 2,3,4,5 Oct 13, 2018
@benaadams
Copy link
Member Author

Ah, corefx tests are failing due to https://github.com/dotnet/coreclr/issues/20405

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@benaadams
Copy link
Member Author

Windows_NT x64 Checked corefx_baseline issue is unrelated and previously reported https://github.com/dotnet/coreclr/issues/20373

Thread: 5960 [0x1748]): CONTRACT VIOLATION by GetValueTypeForGUID 
at "d:\j\workspace\x64_checked_w---d7295605\src\vm\interoputil.cpp" @ 2862

MODE_COOPERATIVE encountered while thread is in preemptive state.

@benaadams
Copy link
Member Author

We are good to merge this now, correct?

@tannergooding yes, tests are now in corefx dotnet/corefx#32734 and pass with these changes. Feedback has been addressed.

@tannergooding
Copy link
Member

Thanks for the work @benaadams

@tannergooding tannergooding merged commit 018df68 into dotnet:master Oct 22, 2018
@benaadams benaadams deleted the Vectorize-string.IndexOfAny branch October 22, 2018 20:42
A-And pushed a commit to A-And/coreclr that referenced this pull request Nov 20, 2018
…otnet#19790)

* Vectorize string.IndexOfAny

* Vectorize string.IndexOfAny [4,5]

* Feedback

* Call order preference
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…otnet/coreclr#19790)

* Vectorize string.IndexOfAny

* Vectorize string.IndexOfAny [4,5]

* Feedback

* Call order preference


Commit migrated from dotnet/coreclr@018df68
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants