Reduce allocation in StringBuilder marshaling #17928

stephentoub · 2018-05-09T01:29:30Z

When marshaling back results for a StringBuilder argument in a P/Invoke:

for UTF8 it's allocating a new char[], pinning it, and then handing that off to StringBuilder.ReplaceBufferInternal, which itself then allocates a new char[], and all of that happens even if the StringBuilder's char[] is already big enough to handle the results, which is commonly the case due to the StringBuilder having been sized appropriately from the get-go.
for both Unicode and Ansi, it's similarly allocating a new char[] invariably, even if the existing buffer is already sufficient.

This commit cleans that up.

While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling, we still do in some places, e.g. Dns.GetHostName. While we should separately fix that to avoid using a StringBuilder at all, it's useful to demonstrate the impact of the change here:

[Benchmark]
public static string GetHostName() => Dns.GetHostName();

on my machine results in:

      Method |  Gen 0 | Allocated |
------------ |-------:|----------:|
 Before      | 0.2668 |   1.09 KB |
 After       | 0.1392 |     584 B |

cc: @jkotas, @JeremyKuhne, @vancem

stephentoub · 2018-05-09T01:30:34Z

src/mscorlib/src/System/Text/StringBuilder.CoreCLR.cs

            if (newLength > m_MaxCapacity)
                throw new ArgumentOutOfRangeException("capacity", SR.ArgumentOutOfRange_Capacity);

-            // Leave space for a NUL character at the end, the previous implementation did this


It wasn't clear to me why this space for a null char was necessary, since as far as I can tell the only time it's added is when doing this marshaling out, which means it can't be a general requirement of StringBuilder. So I removed it. But let me know if I missed something.

I believe this is called coming back here:

coreclr/src/vm/ilmarshalers.cpp

Line 864 in acb110d

pslILEmit->EmitCALL(METHOD__STRING_BUILDER__REPLACE_BUFFER_INTERNAL, 3, 0);

Yes. Why is that relevant? What am I missing?

I assumed it mattered if you were to pass the builder back out again after copying in the returned chars (i.e. guaranteed to be null terminated). There is a good chance I'm confusing what is going on here though.

Back when I converted these methods from native to managed I could not find any use for the NUL character, it's most likely a leftover from early .NET days when StringBuilder had a very different implementation. But removing it changes the resulting builder's capacity and that's a bit risky because the capacity is checked by builder's Equals method. The test failure you see is AFAIR the result of that.

But removing it changes the resulting builder's capacity and that's a bit risky because the capacity is checked by builder's Equals method.

Ah, interesting, thanks for pointing that out, @mikedn.

I'd be interested in others' opinions, but I still think this is a reasonable change to take, even if it means StringBuilder.Capacity ends up returning a different result than it previously would have, and thus Equals (which is documented to factor in Capacity) could end up returning a different result than it would have. I'd argue that, even though it's visible, the capacity a collection chooses to use internally is an implementation detail that's subject to change.

My concern is that, for better or worse, there's a decent amount of code that does:

var sb = new StringBuilder(expectedSize); PInvoke(sb, sb.Capacity); return sb.ToString();

and if we then force the result to actually require expectedSize+1 chars, we end up needing to allocate another char[], whereas we wouldn't with just expectedSize.

@jkotas, @vancem, any compat opinions here?

PInvoke(sb, sb.Capacity);

Hmm, the interesting thing about the typical use case is that the capacity must already include the NUL terminator. Because of this we won't end up with a larger capacity by including the NUL character in the new capacity calculation.

We'll get differences only when StringBuilder is used as a return type or if space for NUL is manually added - PInvoke(sb, sb.Capacity + 1).

Also, the later variant can produce a builder where Capacity > MaxCapacity:

var sb = new StringBuilder(83, 83); Console.WriteLine(GetCurrentDirectoryW(sb.Capacity + 1, sb)); Console.WriteLine(sb.Length); Console.WriteLine(sb.Capacity); Console.WriteLine(sb.MaxCapacity);

On my machine this displays

83 83 84 83

the interesting thing about the typical use case is that the capacity must already include the NUL terminator

Yeah, I guess that's true. The API let's say is specified to accept a buffer of size N large enough to hold N-1 chars plus a null terminator, you specify the size of the buffer as N, upon return we calculate the resulting string length as N-1, and then we add back a char for the null terminator to be back at N.

JeremyKuhne

Nice fix. I'm a little nervous about not leaving the +1 char, but that is primarily because the code flow through the marshaller is so hard for me to follow. :)

mattwarren · 2018-05-09T12:02:37Z

While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling

@stephentoub Just for my own interest, what's the issue with using StringBuilder for marshalling? Is it too heavy-weight, does it have perf concerns, is it wrong to tie-in the marshalling implementation to it or something else?

stephentoub · 2018-05-09T13:38:05Z

@mattwar: Just for my own interest, what's the issue with using StringBuilder for marshalling? Is it too heavy-weight, does it have perf concerns, is it wrong to tie-in the marshalling implementation to it or something else?

There are inefficiencies like the ones this PR is fixing. But even once those are addressed, using StringBuilder in marshaling often leads to inefficiencies in app-level code, just because of the patterns it encourages. If you already happen to have your data in a StringBuilder and you're just looking to [In] marshal into a P/Invoke, that's fine. But often, for whatever reason, StringBuilder is used to [Out] marshal the results of some P/Invoke, and the caller often then needs to allocate a StringBuilder just for that privilege, and that in turn incurs other allocations like the char[] internally. You can see a bunch of examples of this, e.g.

etc. You often then see devs adding more code to workaround that, like using a cache of StringBuilders, which itself incurs expense, e.g.

Using a StringBuilder in the signature also prevents optimizations like first trying with stackalloc'd memory and only then going to the heap when more is needed, whereas you can do that if you just used a simple byte* or char* in the signature, and often you don't even need to go to the heap, as in the fix here, where rather than allocating a StringBuilder and allocating its internal char[] (and prior to this PR allocating another char[] when marshaling out) and allocating the resulting string, we instead just stackalloc the needed 256 bytes and then the only heap allocation we incur is the final string.

Remove StringBuilder usage from Dns.GetHostName corefx#29594

Plus, just using byte* instead of StringBuilder means all inputs/outputs are blittable and we no longer need any custom P/Invoke marshaling frames.

When marshaling back results for a StringBuilder argument in a P/Invoke: - for UTF8 it's allocating a new char[], pinning it, and then handing that off to StringBuilder.ReplaceBufferInternal, which itself then allocates a new char[], and all of that happens even if the StringBuilder's char[] is already big enough to handle the results, which is commonly the case due to the StringBuilder having been sized appropriately from the get-go. - for both Unicode and Ansi, it's similarly allocating a new char[] invariably, even if the existing buffer is already sufficient. This commit cleans that up. While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling, we still do in some places, e.g. Dns.GetHostName. While we should separately fix that to avoid using a StringBuilder at all, it's useful to demonstrate the impact of the change here: ```C# [Benchmark] public static string GetHostName() => Dns.GetHostName(); ``` on my machine results in: ``` Method | Gen 0 | Allocated | ------------ |-------:|----------:| Before | 0.2668 | 1.09 KB | After | 0.1392 | 584 B | ```

mattwarren · 2018-05-09T14:35:27Z

@stephentoub thanks for the detailed answer!

jeffschwMSFT

I spent some time thinking about the removal of +1 for the null terminator and I think I convinced myself this looks right. The reason is that on the transition back from native we do not copy the null terminator (if it existed) so using the raw byte count should be fine.

stephentoub · 2018-05-09T15:04:54Z

I spent some time thinking about the removal of +1 for the null terminator and I think I convinced myself this looks right. The reason is that on the transition back from native we do not copy the null terminator (if it existed) so using the raw byte count should be fine.

Ok, thanks, @jeffschwMSFT. That also addresses the rather strange issue @mikedn highlighted, where today because of the +1 you can end up with Capacity > MaxCapacity (the current code does the addition after validating the new length against the MaxCapacity rather than before. So I'll leave the code as is.

luqunl · 2018-05-09T17:16:12Z

Plus, just using byte* instead of StringBuilder means all inputs/outputs are blittable and we no longer need any custom P/Invoke marshaling frames.

this only works for the string itself doesn't need marshalling support. For ansi string, You cann't just use byte* instead of stringbuilder.

stephentoub · 2018-05-09T17:17:22Z

For ansi string, You cann't just use byte* instead of stringbuilder.

Sure you can; you then just do the marshaling yourself, e.g. new string((sbyte*)ptr).

luqunl · 2018-05-09T17:40:53Z

Sure you can; you then just do the marshaling yourself, e.g. new string((sbyte*)ptr).

Thanks for this. Do you know the perf difference between using stringbuilder and new string((sbyte*)ptr) after your change?

luqunl · 2018-05-09T17:44:15Z

Plus, just using byte* instead of StringBuilder

I think you mean sbyte*

stephentoub · 2018-05-09T17:45:10Z

I think you mean sbyte*

I actually meant byte* as I wrote, but either would work.

luqunl · 2018-05-09T17:47:47Z

I actually meant byte* as I wrote, but either would work.

There isn't a string constructor which takes byte* as argument.

jkotas · 2018-05-09T17:47:51Z

Do you know the perf difference between using stringbuilder

@luqunl Real world example: dotnet/corefx#29594 . Reduced GC allocation from 600 bytes to 40 bytes.

The overall trend in CoreFX has been to avoid builtin marshallers to get better performance, and just use blitable types. Some of the guidelines are written in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/interop-guidelines.md#definition and https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/interop-pinvokes.md#strings

stephentoub · 2018-05-09T17:48:47Z

There isn't a string constructor which takes byte* as argument.

I know, that's why my example does new string((sbyte*)ptr).

stephentoub · 2018-05-09T17:52:06Z

Another example in corefx: dotnet/corefx#29605

luqunl · 2018-05-09T17:53:12Z

@jkotas/ @stephentoub , I guess these Reduced GC allocation numbers doesn't use this PR/fix. is it ? if this PR is checked in, I guess these GC allocation numbers should be similar.
Before this PR, We recommend people to use stringbuilder as [Out] case, at least for ANSI one.
After this PR, what should we recommend to customer?

luqunl · 2018-05-09T17:58:51Z

just using byte* instead of StringBuilder means all inputs/outputs are blittable

if you want to use sbyte*, you need to make sure the managed side allocated enough memory. otherwise, you may have buffer-overrun.

luqunl · 2018-05-09T18:01:40Z

Same as Jeff, I am fine for "removal of +1 for the null terminator", since it is convert from native to managed.

jkotas · 2018-05-09T18:06:52Z

new string((sbyte*)ptr)

The sbyte string constructor is outlier. It is more natural to use byte, and I believe that majority of other APIs do actually use byte in similar contexts.

After this PR, what should we recommend to customer?

There is no one size fits all recommendation, and I do not think that this PR changes recommendations for writing interop code. StringBuilder is perfectly fine for code that is not performance critical, and the simplicity is preferred.

It is not different from other situations where we have slower, but more convenient APIs. E.g. When to use Linq vs. hand written for-loops?

you need to make sure the managed side allocated enough memory. otherwise, you may have buffer-overrun.

This problem does not go away with StringBuilder. You need to ensure that when using StringBuilder as well by setting Capacity to the right amount.

stephentoub · 2018-05-09T18:21:44Z

I guess these Reduced GC allocation numbers doesn't include the PR/fix. is it ?

That's correct. But even after this PR, there's measurable allocation overhead, if nothing else than the overhead incurred by the developer's code needing to create the StringBuilder, e.g. here's a benchmark:

using System;
using System.Runtime.InteropServices;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Attributes.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes.Jobs;

[MemoryDiagnoser]
[InProcess]
public class Benchmarks
{
    private static void Main() => BenchmarkRunner.Run<Benchmarks>();

    [DllImport("kernel32", CharSet = CharSet.Ansi)]
    private static extern int GetTempPathA(int bufferLen, [Out] StringBuilder buffer);

    [DllImport("kernel32", CharSet = CharSet.Ansi)]
    private static extern unsafe int GetTempPathA(int bufferLen, byte* buffer);

    [Benchmark]
    public static string MarshalSB()
    {
        int c = GetTempPathA(0, (StringBuilder)null);
        var sb = new StringBuilder(c);
        GetTempPathA(sb.Capacity, sb);
        return sb.ToString();
    }

    [Benchmark]
    public static unsafe string MarshalArray()
    {
        int c = GetTempPathA(0, (byte*)null);
        var bytes = new byte[c];
        fixed (byte* ptr = bytes)
        {
            GetTempPathA(c, ptr);
            return new string((sbyte*)ptr);
        }
    }

    [Benchmark]
    public static unsafe string MarshalStackOrArray()
    {
        int c = GetTempPathA(0, (byte*)null);
        Span<byte> bytes = c <= 128 ? stackalloc byte[128] : new byte[c];
        fixed (byte* ptr = &MemoryMarshal.GetReference(bytes))
        {
            GetTempPathA(c, ptr);
            return new string((sbyte*)ptr);
        }
    }
}

And the results on my machine (the "B_MarshalSB" is before this PR and the "A_MarshalSB" is after this PR)... the relevant part is the Allocated column:

              Method |     Mean |     Error |    StdDev |  Gen 0 | Allocated |
-------------------- |---------:|----------:|----------:|-------:|----------:|
         B_MarshalSB | 6.887 us | 0.0653 us | 0.0611 us | 0.0763 |     336 B |
         A_MarshalSB | 6.833 us | 0.0730 us | 0.0683 us | 0.0534 |     240 B |
        MarshalArray | 6.826 us | 0.1309 us | 0.1558 us | 0.0305 |     160 B |
 MarshalStackOrArray | 6.890 us | 0.0840 us | 0.0786 us | 0.0153 |      96 B |

stephentoub · 2018-05-09T18:33:35Z

Thank you all for the reviews!

jkotas · 2018-05-09T18:43:15Z

I have looked at the history where the extra null came from.

The extra null was introduced by interop refactoring in 2007. It was noticed and fixed in 2008:

//depot/devdiv/pu/CLR/ndp/clr/src/VM/object.cpp
change 3240200 edit on 2008/10/04 05:51:01
Title: Fix for Dev10 #520387, #519429, #519314
...
Modify ReplaceBuffer and ReplaceBufferAnsi to not allocate for the terminating null character
...

However, the fix was (accidentally?) reverted in:

//depot/devdiv/pu/CLR/ndp/clr/src/VM/object.cpp
change 3431533 edit on 2010/04/02 19:38:45

	Submitting Main --> Pu\clr FI

The detailed history for this was obliterated, so I am not able to tell what happened there. My bet is that it was just incorrectly resolved merge conflict.

stephentoub · 2018-05-09T19:17:35Z

Thanks for tracking it down, Jan.

stephentoub commented May 9, 2018

View reviewed changes

jkotas requested review from jeffschwMSFT and luqunl May 9, 2018 01:36

JeremyKuhne approved these changes May 9, 2018

View reviewed changes

stephentoub force-pushed the sbmarshal branch from 9cfffbb to 436d2f4 Compare May 9, 2018 14:08

jeffschwMSFT approved these changes May 9, 2018

View reviewed changes

luqunl approved these changes May 9, 2018

View reviewed changes

stephentoub merged commit 6169761 into dotnet:master May 9, 2018

stephentoub deleted the sbmarshal branch May 9, 2018 18:35

mattwarren mentioned this pull request May 10, 2018

Results table should be a GitHub Flavored Markdown table dotnet/BenchmarkDotNet#320

Closed

Reduce allocation in StringBuilder marshaling #17928

Reduce allocation in StringBuilder marshaling #17928

Uh oh!

Conversation

stephentoub commented May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephentoub May 9, 2018

Choose a reason for hiding this comment

Uh oh!

JeremyKuhne May 9, 2018

Choose a reason for hiding this comment

Uh oh!

stephentoub May 9, 2018

Choose a reason for hiding this comment

Uh oh!

JeremyKuhne May 9, 2018

Choose a reason for hiding this comment

Uh oh!

mikedn May 9, 2018

Choose a reason for hiding this comment

Uh oh!

stephentoub May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mikedn May 9, 2018

Choose a reason for hiding this comment

Uh oh!

stephentoub May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JeremyKuhne left a comment

Choose a reason for hiding this comment

Uh oh!

mattwarren commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018

Uh oh!

mattwarren commented May 9, 2018

Uh oh!

jeffschwMSFT left a comment

Choose a reason for hiding this comment

Uh oh!

stephentoub commented May 9, 2018

Uh oh!

luqunl commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

luqunl commented May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

luqunl commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018

Uh oh!

luqunl commented May 9, 2018

Uh oh!

jkotas commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018

Uh oh!

luqunl commented May 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

luqunl commented May 9, 2018

Uh oh!

luqunl commented May 9, 2018

Uh oh!

jkotas commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018

Uh oh!

stephentoub commented May 9, 2018 •

edited

Loading

stephentoub May 9, 2018 •

edited

Loading

stephentoub May 9, 2018 •

edited

Loading

stephentoub commented May 9, 2018 •

edited

Loading

luqunl commented May 9, 2018 •

edited

Loading

luqunl commented May 9, 2018 •

edited

Loading