@halter73
Member

This is a follow up to #41465 and specifically #41465 (comment).

I reran the StringLookup and Uf8Lookup (sic) scenarios from the original PR's StringLookupBenchmark after updating the Payload to match the expected casing. This yields a perf improvement over repeatedly allocating the string and looking it up in a dictionary instead of a perf regression like before:

Before

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| StringLookup | 92.02 ns | 1.824 ns | 1.523 ns | 0.0030 | 32 B |
| Uf8Lookup | 104.24 ns | 0.286 ns | 0.239 ns | - | - |

After

| Method | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|
| StringLookup | 90.15 ns | 0.505 ns | 0.473 ns | 0.0030 | 32 B |
| Uf8Lookup | 77.74 ns | 0.457 ns | 0.428 ns | - | - |

It also avoids any allocations in the common case where the casing matches. I expect mismatched casing between the server and client is really uncommon now that we don't generate automatic client proxies with different casing. The targets are just strings on the client and the server, so it doesn't make sense to mix up the casing. And even if the casing doesn't match, everything still works as before.

@halter73 halter73 requested a review from davidfowl May 11, 2022 21:59
@halter73 halter73 requested a review from BrennanConroy as a code owner May 11, 2022 21:59
@ghost ghost added the area-signalr Includes: SignalR clients and servers label May 11, 2022
@halter73 halter73 removed the area-signalr Includes: SignalR clients and servers label May 11, 2022
@davidfowl
Member

davidfowl commented May 11, 2022

As discussed offline, this should support case insensitive lookup even if it's the fallback situation.

EDIT: Actually, I think it would be reasonable to pre-add the exact-case, lowercase (common in JS), and maybe camel-case variants and call it a day.
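To illustrate the pre-add idea, here is a minimal sketch that enumerates the casing variants of a target name so each could be registered as an exact-match key. `TargetCasingVariants` and `For` are hypothetical names, not part of this PR, and the camel-case rule (lowercase only the first character) is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the PR: lists the casing variants that
// could be pre-added so that common casings hit the exact-match path.
public static class TargetCasingVariants
{
    public static IReadOnlyList<string> For(string target)
    {
        // The exact casing always comes first.
        var variants = new List<string> { target };

        // All-lowercase variant (common in JavaScript clients).
        AddIfNew(variants, target.ToLowerInvariant());

        // Camel-case variant: assumed here to mean lowercasing the first character only.
        if (target.Length > 0)
        {
            AddIfNew(variants, char.ToLowerInvariant(target[0]) + target.Substring(1));
        }

        return variants;
    }

    private static void AddIfNew(List<string> variants, string candidate)
    {
        // List<string>.Contains compares ordinally, so only true duplicates are skipped.
        if (!variants.Contains(candidate))
        {
            variants.Add(candidate);
        }
    }
}
```

For a single-word target such as `Echo`, the lowercase and camel-case variants coincide, so only two keys would be registered.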

@halter73
Member Author

I updated this to fall back to the utf16-based case-insensitive comparison when there's not an exact ordinal match.

On my machine, the performance is still just under 80 ns per operation when there's an ordinal match, versus 105 ns before this change. When there's only a case-insensitive match, perf goes from about 105 ns before to about 110 ns after because of the extra SequenceEqual comparison, but there are now no allocations in either case.
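The two-step comparison described in this comment can be sketched for a single stored target as follows. This is a simplified illustration, not the PR's code: `TwoStepMatch` is a hypothetical name, there is no hashing, and inputs longer than the stack threshold fall back to a plain array allocation instead of `ArrayPool<char>`.

```csharp
using System;
using System.Text;

// Hypothetical single-entry illustration of "ordinal UTF-8 match first,
// case-insensitive UTF-16 match as the fallback".
public static class TwoStepMatch
{
    public static bool Matches(ReadOnlySpan<byte> utf8Candidate, string stored, ReadOnlySpan<byte> storedUtf8)
    {
        // Fast path: compare the raw UTF-8 bytes. No transcoding is needed
        // when the casing already matches.
        if (utf8Candidate.SequenceEqual(storedUtf8))
        {
            return true;
        }

        // Slow path: transcode to UTF-16 and compare with OrdinalIgnoreCase.
        const int StackAllocThreshold = 128;
        int charCount = Encoding.UTF8.GetCharCount(utf8Candidate);
        Span<char> chars = charCount <= StackAllocThreshold
            ? stackalloc char[StackAllocThreshold]
            : new char[charCount]; // simplification: allocates for long inputs

        int written = Encoding.UTF8.GetChars(utf8Candidate, chars);
        ReadOnlySpan<char> key = chars.Slice(0, written);
        return key.Equals(stored, StringComparison.OrdinalIgnoreCase);
    }
}
```

A payload of `echo` checked against a stored `Echo` fails the byte comparison and succeeds on the fallback, which is where the extra `SequenceEqual` cost mentioned above comes from.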

Member

@davidfowl davidfowl left a comment

Very nice improvement!

@davidfowl
Member

PS your benchmarks don't have a case insensitive comparison.

@Pilchie Pilchie added the area-signalr Includes: SignalR clients and servers label May 12, 2022
internal int caseSensitiveHashCode;

internal string value;
internal Memory<byte> encodedValue;
Member

why not make this byte[]?

Member Author

@halter73 halter73 May 13, 2022

I figured Memory<byte>.Span was probably faster than AsSpan<T>(this T[] array), but thinking about it more, I was probably mistaken considering Memory<byte> also has to deal with slicing and possible MemoryManager<byte> objects. I doubt it's a huge difference, but we might as well remove the two extra ints from Slot.

I'll change it to byte[].
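The "two extra ints" point can be checked with a small sketch that compares the size of a `Memory<byte>` field with that of a `byte[]` reference. `SlotFieldSizes` is a hypothetical name; the sketch assumes `Unsafe.SizeOf<T>()` reports the size of a reference when `T` is a reference type.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper for comparing how much space each field type
// would occupy inside a struct such as Slot.
public static class SlotFieldSizes
{
    // Memory<byte> carries an object reference plus an index and a length.
    public static int MemoryOfByte => Unsafe.SizeOf<Memory<byte>>();

    // A byte[] field is only a reference to the array.
    public static int ByteArrayReference => Unsafe.SizeOf<byte[]>();

    // Bytes saved per slot by storing byte[] instead of Memory<byte>.
    public static int SavedPerSlot => MemoryOfByte - ByteArrayReference;
}
```

The difference should correspond to two `int` fields per slot, independent of pointer size.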

@halter73
Member Author

| Method | SameCasePayload | Mean | Error | StdDev | Gen 0 | Allocated |
|---|---|---|---|---|---|---|
| StringLookup | False | 95.68 ns | 0.371 ns | 0.347 ns | 0.0076 | 80 B |
| Utf8LookupBefore | False | 105.16 ns | 0.454 ns | 0.424 ns | 0.0045 | 48 B |
| Utf8Lookup | False | 118.70 ns | 0.587 ns | 0.520 ns | 0.0045 | 48 B |
| StringLookup | True | 96.27 ns | 0.317 ns | 0.588 ns | 0.0076 | 80 B |
| Utf8LookupBefore | True | 104.99 ns | 0.198 ns | 0.185 ns | 0.0045 | 48 B |
| Utf8Lookup | True | 83.26 ns | 0.136 ns | 0.120 ns | 0.0045 | 48 B |

Benchmark code:

using System.Buffers;
using System.Diagnostics.CodeAnalysis;
using System.Text;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

//BenchmarkRunner.Run<StringLookupBenchmark>(new DebugInProcessConfig());
BenchmarkRunner.Run<StringLookupBenchmark>();

[MemoryDiagnoser]
public class StringLookupBenchmark
{
    private readonly Dictionary<string, string> _cached = new(StringComparer.OrdinalIgnoreCase)
    {
        ["Echo"] = "Echo"
    };

    private readonly Utf8HashLookupBefore _utf8LookupBefore = new Utf8HashLookupBefore();
    private readonly Utf8HashLookup _utf8Lookup = new Utf8HashLookup();

    private ReadOnlySpan<byte> Payload =>
        SameCasePayload ? "{\"target\": \"Echo\"}"u8 : "{\"target\": \"echo\"}"u8;

    [Params(true, false)]
    public bool SameCasePayload;

    public StringLookupBenchmark()
    {
        _utf8LookupBefore.Add("Echo");
        _utf8Lookup.Add("Echo");
    }

    [Benchmark]
    public string StringLookup()
    {
        var reader = new Utf8JsonReader(Payload);
        reader.Read(); // Start object
        reader.Read(); // property name
        reader.Read(); // property value
        var target = reader.GetString()!;
        reader.Read(); // end object
        _cached.TryGetValue(target, out var value);
        return value!;
        //return target;
    }

    [Benchmark]
    public string Utf8LookupBefore()
    {
        var reader = new Utf8JsonReader(Payload);
        reader.Read(); // Start object
        reader.Read(); // property name
        reader.Read(); // property value
        var target = reader.ValueSpan;
        reader.Read(); // end object
        _utf8LookupBefore.TryGetValue(target, out var value);
        return value!;
    }

    [Benchmark]
    public string Utf8Lookup()
    {
        var reader = new Utf8JsonReader(Payload);
        reader.Read(); // Start object
        reader.Read(); // property name
        reader.Read(); // property value
        var target = reader.ValueSpan;
        reader.Read(); // end object
        _utf8Lookup.TryGetValue(target, out var value);
        return value!;
    }
}

/// <summary>
/// A small dictionary optimized for utf8 string lookup via spans. Adapted from https://github.com/dotnet/runtime/blob/4ed596ef63e60ce54cfb41d55928f0fe45f65cf3/src/libraries/System.Linq.Parallel/src/System/Linq/Parallel/Utils/HashLookup.cs.
/// </summary>
internal sealed class Utf8HashLookup
{
    private int[] _buckets;
    private int[] _caseSensitiveBuckets;
    private Slot[] _slots;
    private int _count;

    private const int HashCodeMask = 0x7fffffff;

    internal Utf8HashLookup()
    {
        _buckets = new int[7];
        _caseSensitiveBuckets = new int[7];
        _slots = new Slot[7];
    }

    internal void Add(string value)
    {
        if (_count == _slots.Length)
        {
            Resize();
        }

        int slotIndex = _count;
        _count++;

        var encodedValue = Encoding.UTF8.GetBytes(value);
        var hashCode = GetHashCode(value.AsSpan());
        var caseSensitiveHashCode = GetCaseSensitiveHashCode(encodedValue);
        int bucketIndex = hashCode % _buckets.Length;
        int caseSensitiveBucketIndex = caseSensitiveHashCode % _caseSensitiveBuckets.Length;

        _slots[slotIndex].hashCode = hashCode;
        _slots[slotIndex].caseSensitiveHashCode = caseSensitiveHashCode;

        _slots[slotIndex].value = value;
        _slots[slotIndex].encodedValue = encodedValue;

        _slots[slotIndex].next = _buckets[bucketIndex] - 1;
        _slots[slotIndex].caseSensitiveNext = _caseSensitiveBuckets[caseSensitiveBucketIndex] - 1;

        _buckets[bucketIndex] = slotIndex + 1;
        _caseSensitiveBuckets[caseSensitiveBucketIndex] = slotIndex + 1;
    }

    internal bool TryGetValue(ReadOnlySpan<byte> encodedValue, [MaybeNullWhen(false), AllowNull] out string value)
    {
        var caseSensitiveHashCode = GetCaseSensitiveHashCode(encodedValue);

        for (var i = _caseSensitiveBuckets[caseSensitiveHashCode % _caseSensitiveBuckets.Length] - 1; i >= 0; i = _slots[i].caseSensitiveNext)
        {
            if (_slots[i].caseSensitiveHashCode == caseSensitiveHashCode && encodedValue.SequenceEqual(_slots[i].encodedValue.AsSpan()))
            {
                value = _slots[i].value;
                return true;
            }
        }

        // If we cannot find a case-sensitive match, we transcode the encodedValue to a stackalloced UTF16 string
        // and do an OrdinalIgnoreCase comparison.
        return TryGetValueSlow(encodedValue, out value);
    }

    private bool TryGetValueSlow(ReadOnlySpan<byte> encodedValue, [MaybeNullWhen(false), AllowNull] out string value)
    {
        const int StackAllocThreshold = 128;

        char[]? pooled = null;
        var count = Encoding.UTF8.GetCharCount(encodedValue);
        var chars = count <= StackAllocThreshold ?
            stackalloc char[StackAllocThreshold] :
            (pooled = ArrayPool<char>.Shared.Rent(count));
        var encoded = Encoding.UTF8.GetChars(encodedValue, chars);
        var hasValue = TryGetValueFromChars(chars[..encoded], out value);
        if (pooled is not null)
        {
            ArrayPool<char>.Shared.Return(pooled);
        }

        return hasValue;
    }

    private bool TryGetValueFromChars(ReadOnlySpan<char> key, [MaybeNullWhen(false), AllowNull] out string value)
    {
        var hashCode = GetHashCode(key);

        for (var i = _buckets[hashCode % _buckets.Length] - 1; i >= 0; i = _slots[i].next)
        {
            if (_slots[i].hashCode == hashCode && key.Equals(_slots[i].value, StringComparison.OrdinalIgnoreCase))
            {
                value = _slots[i].value;
                return true;
            }
        }

        value = null;
        return false;
    }

    private static int GetHashCode(ReadOnlySpan<char> value) =>
        HashCodeMask & string.GetHashCode(value, StringComparison.OrdinalIgnoreCase);

    private static int GetCaseSensitiveHashCode(ReadOnlySpan<byte> encodedValue)
    {
        var hashCode = new HashCode();
        hashCode.AddBytes(encodedValue);
        return HashCodeMask & hashCode.ToHashCode();
    }

    private void Resize()
    {
        var newSize = checked(_count * 2 + 1);
        var newSlots = new Slot[newSize];

        var newBuckets = new int[newSize];
        var newCaseSensitiveBuckets = new int[newSize];

        Array.Copy(_slots, newSlots, _count);

        for (int i = 0; i < _count; i++)
        {
            int bucket = newSlots[i].hashCode % newSize;
            newSlots[i].next = newBuckets[bucket] - 1;
            newBuckets[bucket] = i + 1;

            int caseSensitiveBucket = newSlots[i].caseSensitiveHashCode % newSize;
            newSlots[i].caseSensitiveNext = newCaseSensitiveBuckets[caseSensitiveBucket] - 1;
            newCaseSensitiveBuckets[caseSensitiveBucket] = i + 1;
        }

        _slots = newSlots;

        _buckets = newBuckets;
        _caseSensitiveBuckets = newCaseSensitiveBuckets;
    }

    private struct Slot
    {
        internal int hashCode;
        internal int caseSensitiveHashCode;

        internal string value;
        internal byte[] encodedValue;

        internal int next;
        internal int caseSensitiveNext;
    }
}

internal sealed class Utf8HashLookupBefore
{
    private int[] buckets;
    private Slot[] slots;
    private int count;

    private const int HashCodeMask = 0x7fffffff;

    internal Utf8HashLookupBefore()
    {
        buckets = new int[7];
        slots = new Slot[7];
    }

    internal void Add(string value)
    {
        var hashCode = GetKeyHashCode(value.AsSpan());

        if (count == slots.Length)
        {
            Resize();
        }

        int index = count;
        count++;

        int bucket = hashCode % buckets.Length;
        slots[index].hashCode = hashCode;
        slots[index].key = value;
        slots[index].value = value;
        slots[index].next = buckets[bucket] - 1;
        buckets[bucket] = index + 1;
    }

    internal bool TryGetValue(ReadOnlySpan<byte> utf8, [MaybeNullWhen(false), AllowNull] out string value)
    {
        const int StackAllocThreshold = 128;

        // Transcode to utf16 for comparison
        char[]? pooled = null;
        var count = Encoding.UTF8.GetCharCount(utf8);
        var chars = count <= StackAllocThreshold ?
            stackalloc char[StackAllocThreshold] :
            (pooled = ArrayPool<char>.Shared.Rent(count));
        var encoded = Encoding.UTF8.GetChars(utf8, chars);
        var hasValue = TryGetValue(chars[..encoded], out value);
        if (pooled is not null)
        {
            ArrayPool<char>.Shared.Return(pooled);
        }

        return hasValue;
    }

    private bool TryGetValue(ReadOnlySpan<char> key, [MaybeNullWhen(false), AllowNull] out string value)
    {
        var hashCode = GetKeyHashCode(key);

        for (var i = buckets[hashCode % buckets.Length] - 1; i >= 0; i = slots[i].next)
        {
            if (slots[i].hashCode == hashCode && key.Equals(slots[i].key, StringComparison.OrdinalIgnoreCase))
            {
                value = slots[i].value;
                return true;
            }
        }

        value = null;
        return false;
    }

    private static int GetKeyHashCode(ReadOnlySpan<char> key)
    {
        return HashCodeMask & string.GetHashCode(key, StringComparison.OrdinalIgnoreCase);
    }

    private void Resize()
    {
        var newSize = checked(count * 2 + 1);
        var newBuckets = new int[newSize];
        var newSlots = new Slot[newSize];
        Array.Copy(slots, newSlots, count);
        for (int i = 0; i < count; i++)
        {
            int bucket = newSlots[i].hashCode % newSize;
            newSlots[i].next = newBuckets[bucket] - 1;
            newBuckets[bucket] = i + 1;
        }
        buckets = newBuckets;
        slots = newSlots;
    }

    internal struct Slot
    {
        internal int hashCode;
        internal int next;
        internal string key;
        internal string value;
    }
}

@halter73 halter73 merged commit 3888fda into main May 14, 2022
@halter73 halter73 deleted the halter73/pre-encode branch May 14, 2022 02:54
@ghost ghost added this to the 7.0-preview5 milestone May 14, 2022
@davidfowl
Member

Why are there any allocations?

@davidfowl davidfowl added the Perf label Aug 26, 2022
@BrennanConroy
Member

> Why are there any allocations?

I think it's because his benchmark allocates the Utf8JsonWriter class every iteration. In the original benchmark, StringLookup allocated 32 B, which is exactly the difference between the 80 B and 48 B reported for StringLookup and Utf8Lookup, respectively, in the new benchmark.
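One way to test a guess like this outside BenchmarkDotNet is to measure the bytes allocated on the current thread around a single call. The following is a minimal sketch; `AllocationProbe` is a hypothetical helper, not something used in this PR, and the numbers it reports may differ slightly from the `Allocated` column of `[MemoryDiagnoser]`.

```csharp
using System;

// Hypothetical helper: reports how many bytes one invocation of an action
// allocates on the calling thread.
public static class AllocationProbe
{
    public static long Measure(Action action)
    {
        // Warm-up call so that one-time costs (JIT, static initialization)
        // are not attributed to the measured invocation.
        action();

        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Wrapping each step of a benchmark body separately (creating the payload, constructing the reader, calling `GetString`, performing the lookup) would show which step accounts for the remaining bytes.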

@davidfowl
Member

You must be working on your blog post 😄
