Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@benaadams
Copy link
Member

@benaadams benaadams commented Nov 1, 2018

up to x2.3 times faster for 2 element values
up to x3.2 times faster for 3 element values

Benchmarks #20738 (comment)

Span<byte>.IndexOfAny(..., ReadOnlySpan<byte> values) is vectorized for each element in values however it iterates over each one individually.

This adds the faster paths used by IndexOf , IndexOfAny(.., byte, byte) and IndexOfAny(.., byte, byte, byte) for the values lengths 2,3 for

IndexOfAny<T>(this Span<T> span, ReadOnlySpan<T> values)
IndexOfAny<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> values)

/cc @ahsonkhan @jkotas @stephentoub @JeremyKuhne

@benaadams
Copy link
Member Author

benaadams commented Nov 1, 2018

Could extend this for ReadOnlySpan<byte> 4, 5 lengths as per #19790 in a new PR (as it would need extra tests)

@ahsonkhan does the Utf8 Json use 4, 5 lengths (i.e. is it worth doing?)

@stephentoub
Copy link
Member

We already have good tests covering all of these lengths?

@benaadams benaadams changed the title Single pass Vectorize Span<byte>.IndexOfAny(..., ReadOnlySpan<byte>) for 1,2,3 lengths Single pass Vectorize Span<byte>.IndexOfAny(..., ReadOnlySpan<byte>) for 2,3 lengths Nov 1, 2018
@benaadams
Copy link
Member Author

We already have good tests covering all of these lengths?

Yes, for the specialized overloads that are called; some for these lengths with the ROS values, can extend the multi-value tests to include the ROS

@benaadams
Copy link
Member Author

Added tests dotnet/corefx#33185

@benaadams
Copy link
Member Author

@dotnet-bot test Windows_NT x64 Checked corefx_baseline
@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@stephentoub
Copy link
Member

Can you share perf results?

@ahsonkhan
Copy link

does the Utf8 Json use 4, 5 lengths (i.e. is it worth doing?)

I don't currently have a concrete scenario where I need it, especially since I wrote my own custom IndexOfAny with "less than" support (at least for the reader), but I am sure one will come up. I will let you know if I see it.

@benaadams
Copy link
Member Author

@dotnet-bot test

@ahsonkhan
Copy link

Can you share perf results?

I would be interested in seeing the impact on small spans. Wouldn't these length checks regress performance there? Do we care?

@benaadams
Copy link
Member Author

benaadams commented Nov 8, 2018

Can you share perf results?

I would be interested in seeing the impact on small spans. Wouldn't these length checks regress performance there? Do we care?

You end up pushing it upstream without it, like dotnet/corefx#33288 (review)

image

@stephentoub
Copy link
Member

It would still be good to see the perf results.

@benaadams
Copy link
Member Author

It would still be good to see the perf results.

Agreed :)

@benaadams
Copy link
Member Author

Not sure I'm running the benchmarks correctly since it should be processing 32 bytes at a time vs 1 byte at a time

@benaadams
Copy link
Member Author

Was using .IndexOf(ROS) rather than .IndexOfAny(ROS) d'oh

@benaadams
Copy link
Member Author

benaadams commented Nov 17, 2018

Manual inlined if can be faster though varies

https://gist.github.com/benaadams/cb059cda0c745fff079ba99d2db7f1be

             Method | Toolchain | Length | Items |          Mean |
 ------------------ |---------- |------- |------ |--------------:|
-        IndexOfAny |   Current |      2 |     2 |      35.46 ns |
+        IndexOfAny |        PR |      2 |     2 |      20.44 ns |
  IndexOfAny_Manual |        PR |      2 |     2 |      20.27 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |      7 |     2 |     117.00 ns |
+        IndexOfAny |        PR |      7 |     2 |      66.73 ns |
  IndexOfAny_Manual |        PR |      7 |     2 |      56.13 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |      8 |     2 |     124.61 ns |
+        IndexOfAny |        PR |      8 |     2 |      64.09 ns |
  IndexOfAny_Manual |        PR |      8 |     2 |      60.59 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     15 |     2 |     272.92 ns |
+        IndexOfAny |        PR |     15 |     2 |     156.86 ns |
  IndexOfAny_Manual |        PR |     15 |     2 |     125.27 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     16 |     2 |     286.71 ns |
+        IndexOfAny |        PR |     16 |     2 |     154.91 ns |
  IndexOfAny_Manual |        PR |     16 |     2 |     133.83 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     31 |     2 |     737.20 ns |
+        IndexOfAny |        PR |     31 |     2 |     418.17 ns |
  IndexOfAny_Manual |        PR |     31 |     2 |     323.98 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     32 |     2 |     745.88 ns |
+        IndexOfAny |        PR |     32 |     2 |     419.49 ns |
  IndexOfAny_Manual |        PR |     32 |     2 |     338.77 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     63 |     2 |   2,223.70 ns |
+        IndexOfAny |        PR |     63 |     2 |   1,328.81 ns |
  IndexOfAny_Manual |        PR |     63 |     2 |     978.05 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     64 |     2 |   1,204.10 ns |
+        IndexOfAny |        PR |     64 |     2 |     588.10 ns |
  IndexOfAny_Manual |        PR |     64 |     2 |     697.03 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |    127 |     2 |   3,836.02 ns |
+        IndexOfAny |        PR |    127 |     2 |   2,333.05 ns |
  IndexOfAny_Manual |        PR |    127 |     2 |   1,579.54 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |    128 |     2 |   2,658.81 ns |
+        IndexOfAny |        PR |    128 |     2 |   1,347.84 ns |
  IndexOfAny_Manual |        PR |    128 |     2 |   1,513.46 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |    255 |     2 |   8,651.33 ns |
+        IndexOfAny |        PR |    255 |     2 |   5,521.69 ns |
  IndexOfAny_Manual |        PR |    255 |     2 |   3,500.87 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |    256 |     2 |   6,292.78 ns |
+        IndexOfAny |        PR |    256 |     2 |   3,550.50 ns |
  IndexOfAny_Manual |        PR |    256 |     2 |   3,458.94 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |   1023 |     2 |  67,384.01 ns |
+        IndexOfAny |        PR |   1023 |     2 |  42,926.52 ns |
  IndexOfAny_Manual |        PR |   1023 |     2 |  26,403.50 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |   1024 |     2 |  66,452.48 ns |
+        IndexOfAny |        PR |   1024 |     2 |  42,186.38 ns |
  IndexOfAny_Manual |        PR |   1024 |     2 |  27,256.97 ns |
             Method | Toolchain | Length | Items |          Mean |
 ------------------ |---------- |------- |------ |--------------:|
-        IndexOfAny |   Current |      2 |     3 |      42.34 ns |
+        IndexOfAny |        PR |      2 |     3 |      19.67 ns |
  IndexOfAny_Manual |        PR |      2 |     3 |      22.60 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |      7 |     3 |     155.78 ns |
+        IndexOfAny |        PR |      7 |     3 |      66.80 ns |
  IndexOfAny_Manual |        PR |      7 |     3 |      65.85 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |      8 |     3 |     162.62 ns |
+        IndexOfAny |        PR |      8 |     3 |      64.12 ns |
  IndexOfAny_Manual |        PR |      8 |     3 |      72.50 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     15 |     3 |     369.95 ns |
+        IndexOfAny |        PR |     15 |     3 |     156.88 ns |
  IndexOfAny_Manual |        PR |     15 |     3 |     150.53 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     16 |     3 |     395.62 ns |
+        IndexOfAny |        PR |     16 |     3 |     154.56 ns |
  IndexOfAny_Manual |        PR |     16 |     3 |     164.60 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     31 |     3 |   1,061.43 ns |
+        IndexOfAny |        PR |     31 |     3 |     417.60 ns |
  IndexOfAny_Manual |        PR |     31 |     3 |     414.74 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     32 |     3 |   1,097.19 ns |
+        IndexOfAny |        PR |     32 |     3 |     419.65 ns |
  IndexOfAny_Manual |        PR |     32 |     3 |     436.95 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     63 |     3 |   3,426.07 ns |
+        IndexOfAny |        PR |     63 |     3 |   1,327.05 ns |
  IndexOfAny_Manual |        PR |     63 |     3 |   1,292.54 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |     64 |     3 |   1,536.31 ns |
+        IndexOfAny |        PR |     64 |     3 |     590.88 ns |
  IndexOfAny_Manual |        PR |     64 |     3 |     806.08 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |    128 |     3 |   3,502.00 ns |
+        IndexOfAny |        PR |    128 |     3 |   1,348.12 ns |
  IndexOfAny_Manual |        PR |    128 |     3 |   1,731.29 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |    255 |     3 |  13,177.27 ns |
+        IndexOfAny |        PR |    255 |     3 |   5,524.36 ns |
  IndexOfAny_Manual |        PR |    255 |     3 |   4,174.94 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |    256 |     3 |   8,624.20 ns |
+        IndexOfAny |        PR |    256 |     3 |   3,551.49 ns |
  IndexOfAny_Manual |        PR |    256 |     3 |   4,060.20 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |   1023 |     3 | 103,632.91 ns |
+        IndexOfAny |        PR |   1023 |     3 |  42,987.17 ns |
  IndexOfAny_Manual |        PR |   1023 |     3 |  34,636.19 ns |
                    |           |        |       |               |
-        IndexOfAny |   Current |   1024 |     3 |  99,208.55 ns |
+        IndexOfAny |        PR |   1024 |     3 |  42,090.74 ns |
  IndexOfAny_Manual |        PR |   1024 |     3 |  34,160.30 ns |

@stephentoub
Copy link
Member

stephentoub commented Nov 17, 2018

Thanks, @benaadams. It's good to see it improves the cases that were meant to be improved... but what about the cases of > 3 search items that the change theoretically harms? Is the impact there negligible, even for small data lengths?

@benaadams
Copy link
Member Author

what about the cases of > 3 search items that the change theoretically harms?

Mostly noise except for longer lengths where the change to use Bmi1.TrailingZeroCount(match) more than makes up for it

      Method | Length | Items |          Mean |
 ----------- |------- |------ |--------------:|
- IndexOfAny |      2 |     4 |      59.78 ns |
+ IndexOfAny |      2 |     4 |      54.72 ns | 
- IndexOfAny |      7 |     4 |     227.85 ns |
+ IndexOfAny |      7 |     4 |     198.02 ns |
- IndexOfAny |      8 |     4 |     212.65 ns |
+ IndexOfAny |      8 |     4 |     210.93 ns |
- IndexOfAny |     15 |     4 |     549.28 ns |
+ IndexOfAny |     15 |     4 |     505.15 ns |
- IndexOfAny |     16 |     4 |     514.03 ns |
+ IndexOfAny |     16 |     4 |     528.20 ns |
- IndexOfAny |     31 |     4 |   1,453.29 ns |
+ IndexOfAny |     31 |     4 |   1,487.49 ns |
- IndexOfAny |     32 |     4 |   1,427.88 ns |
+ IndexOfAny |     32 |     4 |   1,472.47 ns |
- IndexOfAny |     63 |     4 |   4,507.29 ns |
+ IndexOfAny |     63 |     4 |   4,679.04 ns |
- IndexOfAny |     64 |     4 |   3,644.58 ns |
+ IndexOfAny |     64 |     4 |   3,901.28 ns |
- IndexOfAny |    127 |     4 |   8,546.02 ns |
+ IndexOfAny |    127 |     4 |   8,306.65 ns |
- IndexOfAny |    128 |     4 |   5,041.45 ns |
+ IndexOfAny |    128 |     4 |   4,462.34 ns |
- IndexOfAny |    255 |     4 |  20,853.73 ns |
+ IndexOfAny |    255 |     4 |  19,632.51 ns |
- IndexOfAny |    256 |     4 |  13,268.79 ns |
+ IndexOfAny |    256 |     4 |  11,142.29 ns |
- IndexOfAny |   1023 |     4 | 166,646.23 ns |
+ IndexOfAny |   1023 |     4 | 142,920.65 ns |
- IndexOfAny |   1024 |     4 | 171,990.28 ns |
+ IndexOfAny |   1024 |     4 | 142,627.71 ns |

@stephentoub
Copy link
Member

Can you make the Bmi1 change separately and measure this one in its own?

@benaadams
Copy link
Member Author

What about without it?

Margin of error

      Method | Length | Items |          Mean |
 ----------- |------- |------ |--------------:|
- IndexOfAny |      2 |     4 |      59.78 ns | 
+ IndexOfAny |      2 |     4 |      58.20 ns |
- IndexOfAny |      7 |     4 |     227.85 ns |
+ IndexOfAny |      7 |     4 |     216.25 ns |
- IndexOfAny |      8 |     4 |     212.65 ns |
+ IndexOfAny |      8 |     4 |     209.11 ns |
- IndexOfAny |     15 |     4 |     549.28 ns |
+ IndexOfAny |     15 |     4 |     524.19 ns |
- IndexOfAny |     16 |     4 |     514.03 ns |
+ IndexOfAny |     16 |     4 |     522.20 ns |
- IndexOfAny |     31 |     4 |   1,453.29 ns |
+ IndexOfAny |     31 |     4 |   1,438.64 ns |
- IndexOfAny |     32 |     4 |   1,427.88 ns |
+ IndexOfAny |     32 |     4 |   1,486.44 ns |
- IndexOfAny |     63 |     4 |   4,507.29 ns |
+ IndexOfAny |     63 |     4 |   4,694.68 ns |
- IndexOfAny |     64 |     4 |   3,644.58 ns |
+ IndexOfAny |     64 |     4 |   3,753.87 ns |
- IndexOfAny |    127 |     4 |   8,546.02 ns |
+ IndexOfAny |    127 |     4 |   8,526.17 ns |
- IndexOfAny |    128 |     4 |   5,041.45 ns |
+ IndexOfAny |    128 |     4 |   4,940.50 ns |
- IndexOfAny |    255 |     4 |  20,853.73 ns |
+ IndexOfAny |    255 |     4 |  20,703.20 ns |
- IndexOfAny |    256 |     4 |  13,268.79 ns |
+ IndexOfAny |    256 |     4 |  13,011.07 ns |
- IndexOfAny |   1023 |     4 | 166,646.23 ns |
+ IndexOfAny |   1023 |     4 | 166,523.84 ns |
- IndexOfAny |   1024 |     4 | 171,990.28 ns |
+ IndexOfAny |   1024 |     4 | 166,031.17 ns |

@benaadams
Copy link
Member Author

Can you make the Bmi1 change separately

#21073

@benaadams
Copy link
Member Author

Still improved

             Method | Length | Items |          Mean |
 ------------------ |------- |------ |--------------:|
-        IndexOfAny |      2 |     2 |      39.51 ns |
+        IndexOfAny |      2 |     2 |      21.58 ns |
  IndexOfAny_Manual |      2 |     2 |      20.24 ns |
-        IndexOfAny |      7 |     2 |     136.04 ns |
+        IndexOfAny |      7 |     2 |      62.31 ns |
  IndexOfAny_Manual |      7 |     2 |      55.72 ns |
-        IndexOfAny |      8 |     2 |     128.29 ns |
+        IndexOfAny |      8 |     2 |      67.50 ns |
  IndexOfAny_Manual |      8 |     2 |      60.37 ns |
-        IndexOfAny |     15 |     2 |     307.02 ns |
+        IndexOfAny |     15 |     2 |     137.70 ns |
  IndexOfAny_Manual |     15 |     2 |     125.55 ns |
-        IndexOfAny |     16 |     2 |     287.40 ns |
+        IndexOfAny |     16 |     2 |     146.18 ns |
  IndexOfAny_Manual |     16 |     2 |     133.20 ns |
-        IndexOfAny |     31 |     2 |     763.51 ns |
+        IndexOfAny |     31 |     2 |     351.90 ns |
  IndexOfAny_Manual |     31 |     2 |     326.99 ns |
-        IndexOfAny |     32 |     2 |     747.68 ns |
+        IndexOfAny |     32 |     2 |     365.57 ns |
  IndexOfAny_Manual |     32 |     2 |     338.81 ns |
-        IndexOfAny |     63 |     2 |   2,183.88 ns |
+        IndexOfAny |     63 |     2 |   1,025.29 ns |
  IndexOfAny_Manual |     63 |     2 |     972.57 ns |
-        IndexOfAny |     64 |     2 |   1,345.44 ns |
+        IndexOfAny |     64 |     2 |     774.74 ns |
  IndexOfAny_Manual |     64 |     2 |     712.65 ns |
-        IndexOfAny |    127 |     2 |   4,211.72 ns |
+        IndexOfAny |    127 |     2 |   1,719.49 ns |
  IndexOfAny_Manual |    127 |     2 |   1,576.10 ns |
-        IndexOfAny |    128 |     2 |   2,964.58 ns |
+        IndexOfAny |    128 |     2 |   1,629.05 ns |
  IndexOfAny_Manual |    128 |     2 |   1,515.89 ns |
-        IndexOfAny |    255 |     2 |   9,374.84 ns |
+        IndexOfAny |    255 |     2 |   3,771.19 ns |
  IndexOfAny_Manual |    255 |     2 |   3,456.91 ns |
-        IndexOfAny |    256 |     2 |   7,395.26 ns |
+        IndexOfAny |    256 |     2 |   3,655.45 ns |
  IndexOfAny_Manual |    256 |     2 |   3,411.30 ns |
-        IndexOfAny |   1023 |     2 |  78,092.99 ns |
+        IndexOfAny |   1023 |     2 |  32,838.62 ns |
  IndexOfAny_Manual |   1023 |     2 |  26,750.79 ns |
-        IndexOfAny |   1024 |     2 |  76,674.96 ns |
+        IndexOfAny |   1024 |     2 |  32,486.86 ns |
  IndexOfAny_Manual |   1024 |     2 |  26,418.94 ns |
             Method | Length | Items |          Mean |
 ------------------ |------- |------ |--------------:|
-        IndexOfAny |      2 |     3 |      50.09 ns |
+        IndexOfAny |      2 |     3 |      23.88 ns |
  IndexOfAny_Manual |      2 |     3 |      21.74 ns |
-        IndexOfAny |      7 |     3 |     180.93 ns |
+        IndexOfAny |      7 |     3 |      69.78 ns |
  IndexOfAny_Manual |      7 |     3 |      62.39 ns |
-        IndexOfAny |      8 |     3 |     170.01 ns |
+        IndexOfAny |      8 |     3 |      78.10 ns |
  IndexOfAny_Manual |      8 |     3 |      70.88 ns |
-        IndexOfAny |     15 |     3 |     425.32 ns |
+        IndexOfAny |     15 |     3 |     167.11 ns |
  IndexOfAny_Manual |     15 |     3 |     150.56 ns |
-        IndexOfAny |     16 |     3 |     403.85 ns |
+        IndexOfAny |     16 |     3 |     174.06 ns |
  IndexOfAny_Manual |     16 |     3 |     160.84 ns |
-        IndexOfAny |     31 |     3 |   1,109.71 ns |
+        IndexOfAny |     31 |     3 |     441.54 ns |
  IndexOfAny_Manual |     31 |     3 |     411.79 ns |
-        IndexOfAny |     32 |     3 |   1,086.02 ns |
+        IndexOfAny |     32 |     3 |     463.34 ns |
  IndexOfAny_Manual |     32 |     3 |     429.43 ns |
-        IndexOfAny |     63 |     3 |   3,374.42 ns |
+        IndexOfAny |     63 |     3 |   1,344.50 ns |
  IndexOfAny_Manual |     63 |     3 |   1,287.77 ns |
-        IndexOfAny |     64 |     3 |   1,771.31 ns |
+        IndexOfAny |     64 |     3 |     846.24 ns |
  IndexOfAny_Manual |     64 |     3 |     796.44 ns |
-        IndexOfAny |    127 |     3 |   6,118.98 ns |
+        IndexOfAny |    127 |     3 |   1,945.70 ns |
  IndexOfAny_Manual |    127 |     3 |   1,870.04 ns |
-        IndexOfAny |    128 |     3 |   4,058.16 ns |
+        IndexOfAny |    128 |     3 |   1,835.06 ns |
  IndexOfAny_Manual |    128 |     3 |   1,698.14 ns |
-        IndexOfAny |    255 |     3 |  14,491.22 ns |
+        IndexOfAny |    255 |     3 |   4,417.77 ns |
  IndexOfAny_Manual |    255 |     3 |   4,220.07 ns |
-        IndexOfAny |    256 |     3 |  10,220.72 ns |
+        IndexOfAny |    256 |     3 |   4,275.11 ns |
  IndexOfAny_Manual |    256 |     3 |   3,998.18 ns |
-        IndexOfAny |   1023 |     3 | 121,068.50 ns |
+        IndexOfAny |   1023 |     3 |  41,598.90 ns |
  IndexOfAny_Manual |   1023 |     3 |  34,565.33 ns |
-        IndexOfAny |   1024 |     3 | 114,939.50 ns |
+        IndexOfAny |   1024 |     3 |  41,817.49 ns |
  IndexOfAny_Manual |   1024 |     3 |  34,435.45 ns |

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@jkotas jkotas merged commit d76d97f into dotnet:master Nov 19, 2018
@benaadams benaadams deleted the IndexOfAny-byte branch November 21, 2018 02:49
ref Unsafe.As<T, byte>(ref MemoryMarshal.GetReference(values)),
values.Length);
ref byte valueRef = ref Unsafe.As<T, byte>(ref MemoryMarshal.GetReference(values));
if (values.Length == 2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a values.Length == 1 check and call IndexOf or is it not worth penalizing the common cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had it at one point #20738 (comment)

Its still goes a Vectorized route and is still single pass (as there is only one item); it has a little bit extra overhead from an unnecessary loop of one iteration; so its only about fixed costs rather than a different technique being used.

The caller has already added overhead by creating a ReadOnlySpan or array of a single element to call this overhead; rather than just call .IndexOf, am not sure in what circumstances you'd do this?

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants