Skip to content

Conversation

@martincostello
Copy link
Member

Use ROS.IndexOfAny(char, char)

Use IndexOfAny(char, char) with ReadOnlySpan<char>.

Description

Use AsSpan().IndexOfAny(char, char) rather than pre-allocate a static array for use with IndexOfAny(params char[]) on a string.

I spotted the array in DefaultFileVersionProvider while working on #39738 so thought I'd see if there were any that could be removed in lieu of overloads that just took the characters without an array. This PR updates all the ones I found that were in projects with TFMs that supported it, except for the usages in RoutePatternFactory that use an array of 5 characters in multiple places.

I wrote a quick microbenchmark comparing the two approaches with .NET 6, and using the span version does appear to give a performance improvement.

Benchmarks

Results

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1466 (21H1/May2021Update)
Intel Core i9-9980HK CPU 2.40GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-alpha.1.22068.17
  [Host]     : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
  DefaultJob : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT

Method Mean Error StdDev Ratio RatioSD Allocated
IndexOfAny_String 8.037 ns 0.1816 ns 0.2018 ns 1.00 0.00 -
IndexOfAny_Span_Array 6.101 ns 0.1178 ns 0.1044 ns 0.76 0.02 -
IndexOfAny_Span_Two_Chars 5.013 ns 0.0341 ns 0.0302 ns 0.62 0.01 -

Code

Details
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<IndexOfAnyBenchmarks>();

[MemoryDiagnoser]
public class IndexOfAnyBenchmarks
{
    private const string AUrlWithAPathAndQueryString = "http://www.example.com/path/to/file.html?query=string";
    private static readonly char[] QueryStringAndFragmentTokens = new[] { '?', '#' };

    [Benchmark(Baseline = true)]
    public int IndexOfAny_String()
    {
        return AUrlWithAPathAndQueryString.IndexOfAny(QueryStringAndFragmentTokens);
    }

    [Benchmark]
    public int IndexOfAny_Span_Array()
    {
        return AUrlWithAPathAndQueryString.AsSpan().IndexOfAny(QueryStringAndFragmentTokens);
    }

    [Benchmark]
    public int IndexOfAny_Span_Two_Chars()
    {
        return AUrlWithAPathAndQueryString.AsSpan().IndexOfAny('?', '#');
    }
}
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>disable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.13.1" />
  </ItemGroup>
</Project>

Use AsSpan().IndexOfAny(char, char) rather than pre-allocate an array for string.IndexOfAny(params char[]).
@martincostello martincostello requested review from a team, javiercn and pranavkm as code owners January 25, 2022 09:30
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 25, 2022
Copy link
Member

@javiercn javiercn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@martincostello
Copy link
Member Author

I just realised that based on the benchmark result for IndexOfAny_Span_Array, RoutePatternFactory could still benefit from the addition of AsSpan() even if the static array isn't removed. I can push up an extra commit that does that if wanted.

@SteveSandersonMS
Copy link
Member

SteveSandersonMS commented Jan 25, 2022

Thanks for contributing this, @martincostello!

I can sort of speculate about why the IndexOfAny(char, char) variant might be faster, if the implementation is special-cased to looking for two chars. It would be like pre-unrolling the loop versus the char[] version, avoiding the bounds checks on each iteration. Or there might be something fancier in the implementation involving bitwise operations over the two input chars to discount most target chars in a single test.

However I don't have any intuition for why IndexOfAny(char[]) would be faster over a ROS<char> than a string. It seems like if that's true, the runtime could always do this for you and provide the same speed gain. Does anyone have a sense of why this would be faster? Is it some historical baggage in the string-based interpretation?

@martincostello
Copy link
Member Author

However I don't have any intuition for why IndexOfAny(char[]) would be faster over a ROS<char> than a string. It seems like if that's true, the runtime could always do this for you and provide the same speed gain. Does anyone have a sense of why this would be faster? Is it some historical baggage in the string-based interpretation?

A fair question. Looking at the source, my guess would be that maybe it's due to the aggressive inlining? Otherwise the implementations seem to ultimately go to the same place to me.

https://github.com/dotnet/runtime/blob/8d0c3d1ab1f16c3a1d15b8d9eb25cc920d936694/src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs#L92-L99

https://github.com/dotnet/runtime/blob/8d0c3d1ab1f16c3a1d15b8d9eb25cc920d936694/src/libraries/System.Private.CoreLib/src/System/String.Searching.cs#L90-L98

@SteveSandersonMS
Copy link
Member

SteveSandersonMS commented Jan 25, 2022

Looking at the source, my guess would be that maybe it's due to the aggressive inlining? Otherwise the implementations seem to ultimately go to the same place to me.

Agreed, that looks like the only difference (assuming that GetRawStringData also ends up being inlined). Another slight difference is that with mystring.IndexOf, the runtime has to do a nullcheck on mystring and possibly throw. Whereas with AsSpan(this mystring), the nullcheck is done in .NET code and it definitely never throws.

So if I'm reading this correctly, the optimization in this PR does have a behavioral difference, in that the old version would throw for a null input, and the new one won't. However I don't think there can be a null input so that's OK.

@Pilchie Pilchie added area-blazor Includes: Blazor, Razor Components area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates feature-routing labels Jan 25, 2022
@SteveSandersonMS
Copy link
Member

The change in Components/..../Router.cs looks fine to me.

The changes in RouteCollection.cs and DefaultFileVersionProvider.cs also look very safe but @pranavkm @halter73 do either of you want to approve or reject those?

@javiercn
Copy link
Member

The changes in RouteCollection.cs

This is old routing, so it doesn't really impact anything :)

@SteveSandersonMS SteveSandersonMS merged commit 570bf07 into dotnet:main Jan 26, 2022
@ghost ghost added this to the 7.0-preview1 milestone Jan 26, 2022
@SteveSandersonMS
Copy link
Member

Thanks @martincostello!

@stephentoub
Copy link
Member

I wrote a quick microbenchmark comparing the two approaches with .NET 6, and using the span version does appear to give a performance improvement.

Did you try this on .NET 7? I would expect the gap to be smaller (in particular between IndexOfAny_String and IndexOfAny_Span_Array), and indeed this is what I see when I run your benchmark locally on my machine with .NET 6:

Method Mean Error StdDev Ratio
IndexOfAny_String 7.324 ns 0.0285 ns 0.0238 ns 1.00
IndexOfAny_Span_Array 5.484 ns 0.0292 ns 0.0243 ns 0.75
IndexOfAny_Span_Two_Chars 4.625 ns 0.0280 ns 0.0248 ns 0.63

and .NET 7:

Method Mean Error StdDev Ratio
IndexOfAny_String 5.519 ns 0.0202 ns 0.0179 ns 1.00
IndexOfAny_Span_Array 5.474 ns 0.0070 ns 0.0054 ns 0.99
IndexOfAny_Span_Two_Chars 4.228 ns 0.0170 ns 0.0151 ns 0.77

If you have a small set (e.g. one, two, or three) of const chars, there are benefits to just using the span-based overloads that take those chars directly, as there's no need for the additional static field, additional array, or extracting the chars from the array . But in cases where you already have an array or string of the chars to search for, there shouldn't be a meaningful difference between calling string.IndexOfAny and string.AsSpan().IndexOfAny.

@ghost
Copy link

ghost commented Jan 26, 2022

Hi @stephentoub. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

@martincostello
Copy link
Member Author

Did you try this on .NET 7?

No I didn't try it out with .NET 7 - mainly because I just threw them into an existing console app I had set up and didn't think to point it at one of the alpha builds.

Thanks for taking a look and explaining the difference in the array case.

@ghost
Copy link

ghost commented Jan 26, 2022

Hi @martincostello. It looks like you just commented on a closed PR. The team will most probably miss it. If you'd like to bring something important up to their attention, consider filing a new issue and add enough details to build context.

@wtgodbe wtgodbe modified the milestones: 7.0-preview1, 7.0-preview2 Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-blazor Includes: Blazor, Razor Components area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates community-contribution Indicates that the PR has been added by a community member feature-routing Perf

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants