Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@stephentoub
Copy link
Member

@stephentoub stephentoub commented May 9, 2018

When marshaling back results for a StringBuilder argument in a P/Invoke:

  • for UTF8 it's allocating a new char[], pinning it, and then handing that off to StringBuilder.ReplaceBufferInternal, which itself then allocates a new char[], and all of that happens even if the StringBuilder's char[] is already big enough to handle the results, which is commonly the case due to the StringBuilder having been sized appropriately from the get-go.
  • for both Unicode and Ansi, it's similarly allocating a new char[] invariably, even if the existing buffer is already sufficient.

This commit cleans that up.

While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling, we still do in some places, e.g. Dns.GetHostName. While we should separately fix that to avoid using a StringBuilder at all, it's useful to demonstrate the impact of the change here:

[Benchmark]
public static string GetHostName() => Dns.GetHostName();

on my machine results in:

      Method |  Gen 0 | Allocated |
------------ |-------:|----------:|
 Before      | 0.2668 |   1.09 KB |
 After       | 0.1392 |     584 B |

cc: @jkotas, @JeremyKuhne, @vancem

if (newLength > m_MaxCapacity)
throw new ArgumentOutOfRangeException("capacity", SR.ArgumentOutOfRange_Capacity);

// Leave space for a NUL character at the end, the previous implementation did this
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't clear to me why this space for a null char was necessary, since as far as I can tell the only time it's added is when doing this marshaling out, which means it can't be a general requirement of StringBuilder. So I removed it. But let me know if I missed something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is called coming back here:

pslILEmit->EmitCALL(METHOD__STRING_BUILDER__REPLACE_BUFFER_INTERNAL, 3, 0);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Why is that relevant? What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed it mattered if you were to pass the builder back out again after copying in the returned chars (i.e. guaranteed to be null terminated). There is a good chance I'm confusing what is going on here though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back when I converted these methods from native to managed I could not find any use for the NUL character, it's most likely a leftover from early .NET days when StringBuilder had a very different implementation. But removing it changes the resulting builder's capacity and that's a bit risky because the capacity is checked by builder's Equals method. The test failure you see is AFAIR the result of that.

Copy link
Member Author

@stephentoub stephentoub May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But removing it changes the resulting builder's capacity and that's a bit risky because the capacity is checked by builder's Equals method.

Ah, interesting, thanks for pointing that out, @mikedn.

I'd be interested in others' opinions, but I still think this is a reasonable change to take, even if it means StringBuilder.Capacity ends up returning a different result than it previously would have, and thus Equals (which is documented to factor in Capacity) could end up returning a different result than it would have. I'd argue that, even though it's visible, the capacity a collection chooses to use internally is an implementation detail that's subject to change.

My concern is that, for better or worse, there's a decent amount of code that does:

var sb = new StringBuilder(expectedSize);
PInvoke(sb, sb.Capacity);
return sb.ToString();

and if we then force the result to actually require expectedSize+1 chars, we end up needing to allocate another char[], whereas we wouldn't with just expectedSize.

@jkotas, @vancem, any compat opinions here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PInvoke(sb, sb.Capacity);

Hmm, the interesting thing about the typical use case is that the capacity must already include the NUL terminator. Because of this we won't end up with a larger capacity by including the NUL character in the new capacity calculation.

We'll get differences only when StringBuilder is used as a return type or if space for NUL is manually added - PInvoke(sb, sb.Capacity + 1).

Also, the later variant can produce a builder where Capacity > MaxCapacity:

var sb = new StringBuilder(83, 83);
Console.WriteLine(GetCurrentDirectoryW(sb.Capacity + 1, sb));
Console.WriteLine(sb.Length);
Console.WriteLine(sb.Capacity);
Console.WriteLine(sb.MaxCapacity);

On my machine this displays

83
83
84
83

Copy link
Member Author

@stephentoub stephentoub May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interesting thing about the typical use case is that the capacity must already include the NUL terminator

Yeah, I guess that's true. The API let's say is specified to accept a buffer of size N large enough to hold N-1 chars plus a null terminator, you specify the size of the buffer as N, upon return we calculate the resulting string length as N-1, and then we add back a char for the null terminator to be back at N.

@jkotas jkotas requested review from jeffschwMSFT and luqunl May 9, 2018 01:36
Copy link
Member

@JeremyKuhne JeremyKuhne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix. I'm a little nervous about not leaving the +1 char, but that is primarily because the code flow through the marshaller is so hard for me to follow. :)

@mattwarren
Copy link

While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling

@stephentoub Just for my own interest, what's the issue with using StringBuilder for marshalling? Is it too heavy-weight, does it have perf concerns, is it wrong to tie-in the marshalling implementation to it or something else?

@stephentoub
Copy link
Member Author

@mattwar: Just for my own interest, what's the issue with using StringBuilder for marshalling? Is it too heavy-weight, does it have perf concerns, is it wrong to tie-in the marshalling implementation to it or something else?

There are inefficiencies like the ones this PR is fixing. But even once those are addressed, using StringBuilder in marshaling often leads to inefficiencies in app-level code, just because of the patterns it encourages. If you already happen to have your data in a StringBuilder and you're just looking to [In] marshal into a P/Invoke, that's fine. But often, for whatever reason, StringBuilder is used to [Out] marshal the results of some P/Invoke, and the caller often then needs to allocate a StringBuilder just for that privilege, and that in turn incurs other allocations like the char[] internally. You can see a bunch of examples of this, e.g.

etc. You often then see devs adding more code to workaround that, like using a cache of StringBuilders, which itself incurs expense, e.g.

Using a StringBuilder in the signature also prevents optimizations like first trying with stackalloc'd memory and only then going to the heap when more is needed, whereas you can do that if you just used a simple byte* or char* in the signature, and often you don't even need to go to the heap, as in the fix here, where rather than allocating a StringBuilder and allocating its internal char[] (and prior to this PR allocating another char[] when marshaling out) and allocating the resulting string, we instead just stackalloc the needed 256 bytes and then the only heap allocation we incur is the final string.

Plus, just using byte* instead of StringBuilder means all inputs/outputs are blittable and we no longer need any custom P/Invoke marshaling frames.

When marshaling back results for a StringBuilder argument in a P/Invoke:
- for UTF8 it's allocating a new char[], pinning it, and then handing that off to StringBuilder.ReplaceBufferInternal, which itself then allocates a new char[], and all of that happens even if the StringBuilder's char[] is already big enough to handle the results, which is commonly the case due to the StringBuilder having been sized appropriately from the get-go.
- for both Unicode and Ansi, it's similarly allocating a new char[] invariably, even if the existing buffer is already sufficient.

This commit cleans that up.

While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling, we still do in some places, e.g. Dns.GetHostName.  While we should separately fix that to avoid using a StringBuilder at all, it's useful to demonstrate the impact of the change here:
```C#
[Benchmark]
public static string GetHostName() => Dns.GetHostName();
```
on my machine results in:
```
      Method |  Gen 0 | Allocated |
------------ |-------:|----------:|
 Before      | 0.2668 |   1.09 KB |
 After       | 0.1392 |     584 B |
```
@mattwarren
Copy link

@stephentoub thanks for the detailed answer!

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time thinking about the removal of +1 for the null terminator and I think I convinced myself this looks right. The reason is that on the transition back from native we do not copy the null terminator (if it existed) so using the raw byte count should be fine.

@stephentoub
Copy link
Member Author

I spent some time thinking about the removal of +1 for the null terminator and I think I convinced myself this looks right. The reason is that on the transition back from native we do not copy the null terminator (if it existed) so using the raw byte count should be fine.

Ok, thanks, @jeffschwMSFT. That also addresses the rather strange issue @mikedn highlighted, where today because of the +1 you can end up with Capacity > MaxCapacity (the current code does the addition after validating the new length against the MaxCapacity rather than before. So I'll leave the code as is.

@luqunl
Copy link

luqunl commented May 9, 2018

Plus, just using byte* instead of StringBuilder means all inputs/outputs are blittable and we no longer need any custom P/Invoke marshaling frames.

this only works for the string itself doesn't need marshalling support. For ansi string, You cann't just use byte* instead of stringbuilder.

@stephentoub
Copy link
Member Author

stephentoub commented May 9, 2018

For ansi string, You cann't just use byte* instead of stringbuilder.

Sure you can; you then just do the marshaling yourself, e.g. new string((sbyte*)ptr).

@luqunl
Copy link

luqunl commented May 9, 2018

Sure you can; you then just do the marshaling yourself, e.g. new string((sbyte*)ptr).

Thanks for this. Do you know the perf difference between using stringbuilder and new string((sbyte*)ptr) after your change?

@luqunl
Copy link

luqunl commented May 9, 2018

Plus, just using byte* instead of StringBuilder

I think you mean sbyte*

@stephentoub
Copy link
Member Author

I think you mean sbyte*

I actually meant byte* as I wrote, but either would work.

@luqunl
Copy link

luqunl commented May 9, 2018

I actually meant byte* as I wrote, but either would work.

There isn't a string constructor which takes byte* as argument.

@jkotas
Copy link
Member

jkotas commented May 9, 2018

Do you know the perf difference between using stringbuilder

@luqunl Real world example: dotnet/corefx#29594 . Reduced GC allocation from 600 bytes to 40 bytes.

The overall trend in CoreFX has been to avoid builtin marshallers to get better performance, and just use blitable types. Some of the guidelines are written in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/interop-guidelines.md#definition and https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/interop-pinvokes.md#strings

@stephentoub
Copy link
Member Author

There isn't a string constructor which takes byte* as argument.

I know, that's why my example does new string((sbyte*)ptr).

@stephentoub
Copy link
Member Author

Another example in corefx: dotnet/corefx#29605

@luqunl
Copy link

luqunl commented May 9, 2018

@jkotas/ @stephentoub , I guess these Reduced GC allocation numbers doesn't use this PR/fix. is it ? if this PR is checked in, I guess these GC allocation numbers should be similar.
Before this PR, We recommend people to use stringbuilder as [Out] case, at least for ANSI one.
After this PR, what should we recommend to customer?

@luqunl
Copy link

luqunl commented May 9, 2018

just using byte* instead of StringBuilder means all inputs/outputs are blittable

if you want to use sbyte*, you need to make sure the managed side allocated enough memory. otherwise, you may have buffer-overrun.

@luqunl
Copy link

luqunl commented May 9, 2018

Same as Jeff, I am fine for "removal of +1 for the null terminator", since it is convert from native to managed.

@jkotas
Copy link
Member

jkotas commented May 9, 2018

new string((sbyte*)ptr)

The sbyte string constructor is outlier. It is more natural to use byte, and I believe that majority of other APIs do actually use byte in similar contexts.

After this PR, what should we recommend to customer?

There is no one size fits all recommendation, and I do not think that this PR changes recommendations for writing interop code. StringBuilder is perfectly fine for code that is not performance critical, and the simplicity is preferred.

It is not different from other situations where we have slower, but more convenient APIs. E.g. When to use Linq vs. hand written for-loops?

you need to make sure the managed side allocated enough memory. otherwise, you may have buffer-overrun.

This problem does not go away with StringBuilder. You need to ensure that when using StringBuilder as well by setting Capacity to the right amount.

@stephentoub
Copy link
Member Author

I guess these Reduced GC allocation numbers doesn't include the PR/fix. is it ?

That's correct. But even after this PR, there's measurable allocation overhead, if nothing else than the overhead incurred by the developer's code needing to create the StringBuilder, e.g. here's a benchmark:

using System;
using System.Runtime.InteropServices;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Attributes.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes.Jobs;

[MemoryDiagnoser]
[InProcess]
public class Benchmarks
{
    private static void Main() => BenchmarkRunner.Run<Benchmarks>();

    [DllImport("kernel32", CharSet = CharSet.Ansi)]
    private static extern int GetTempPathA(int bufferLen, [Out] StringBuilder buffer);

    [DllImport("kernel32", CharSet = CharSet.Ansi)]
    private static extern unsafe int GetTempPathA(int bufferLen, byte* buffer);

    [Benchmark]
    public static string MarshalSB()
    {
        int c = GetTempPathA(0, (StringBuilder)null);
        var sb = new StringBuilder(c);
        GetTempPathA(sb.Capacity, sb);
        return sb.ToString();
    }

    [Benchmark]
    public static unsafe string MarshalArray()
    {
        int c = GetTempPathA(0, (byte*)null);
        var bytes = new byte[c];
        fixed (byte* ptr = bytes)
        {
            GetTempPathA(c, ptr);
            return new string((sbyte*)ptr);
        }
    }

    [Benchmark]
    public static unsafe string MarshalStackOrArray()
    {
        int c = GetTempPathA(0, (byte*)null);
        Span<byte> bytes = c <= 128 ? stackalloc byte[128] : new byte[c];
        fixed (byte* ptr = &MemoryMarshal.GetReference(bytes))
        {
            GetTempPathA(c, ptr);
            return new string((sbyte*)ptr);
        }
    }
}

And the results on my machine (the "B_MarshalSB" is before this PR and the "A_MarshalSB" is after this PR)... the relevant part is the Allocated column:

              Method |     Mean |     Error |    StdDev |  Gen 0 | Allocated |
-------------------- |---------:|----------:|----------:|-------:|----------:|
         B_MarshalSB | 6.887 us | 0.0653 us | 0.0611 us | 0.0763 |     336 B |
         A_MarshalSB | 6.833 us | 0.0730 us | 0.0683 us | 0.0534 |     240 B |
        MarshalArray | 6.826 us | 0.1309 us | 0.1558 us | 0.0305 |     160 B |
 MarshalStackOrArray | 6.890 us | 0.0840 us | 0.0786 us | 0.0153 |      96 B |

@stephentoub
Copy link
Member Author

Thank you all for the reviews!

@stephentoub stephentoub merged commit 6169761 into dotnet:master May 9, 2018
@stephentoub stephentoub deleted the sbmarshal branch May 9, 2018 18:35
@jkotas
Copy link
Member

jkotas commented May 9, 2018

I have looked at the history where the extra null came from.

The extra null was introduced by interop refactoring in 2007. It was noticed and fixed in 2008:

//depot/devdiv/pu/CLR/ndp/clr/src/VM/object.cpp
change 3240200 edit on 2008/10/04 05:51:01
Title: Fix for Dev10 #520387, #519429, #519314
...
Modify ReplaceBuffer and ReplaceBufferAnsi to not allocate for the terminating null character
...

However, the fix was (accidentally?) reverted in:

//depot/devdiv/pu/CLR/ndp/clr/src/VM/object.cpp
change 3431533 edit on 2010/04/02 19:38:45

	Submitting Main --> Pu\clr FI

The detailed history for this was obliterated, so I am not able to tell what happened there. My bet is that it was just incorrectly resolved merge conflict.

@stephentoub
Copy link
Member Author

Thanks for tracking it down, Jan.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants