-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Reduce allocation in StringBuilder marshaling #17928
Conversation
| if (newLength > m_MaxCapacity) | ||
| throw new ArgumentOutOfRangeException("capacity", SR.ArgumentOutOfRange_Capacity); | ||
|
|
||
| // Leave space for a NUL character at the end, the previous implementation did this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't clear to me why this space for a null char was necessary, since as far as I can tell the only time it's added is when doing this marshaling out, which means it can't be a general requirement of StringBuilder. So I removed it. But let me know if I missed something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is called coming back here:
coreclr/src/vm/ilmarshalers.cpp
Line 864 in acb110d
| pslILEmit->EmitCALL(METHOD__STRING_BUILDER__REPLACE_BUFFER_INTERNAL, 3, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Why is that relevant? What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed it mattered if you were to pass the builder back out again after copying in the returned chars (i.e. guaranteed to be null terminated). There is a good chance I'm confusing what is going on here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back when I converted these methods from native to managed I could not find any use for the NUL character, it's most likely a leftover from early .NET days when StringBuilder had a very different implementation. But removing it changes the resulting builder's capacity and that's a bit risky because the capacity is checked by builder's Equals method. The test failure you see is AFAIR the result of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But removing it changes the resulting builder's capacity and that's a bit risky because the capacity is checked by builder's Equals method.
Ah, interesting, thanks for pointing that out, @mikedn.
I'd be interested in others' opinions, but I still think this is a reasonable change to take, even if it means StringBuilder.Capacity ends up returning a different result than it previously would have, and thus Equals (which is documented to factor in Capacity) could end up returning a different result than it would have. I'd argue that, even though it's visible, the capacity a collection chooses to use internally is an implementation detail that's subject to change.
My concern is that, for better or worse, there's a decent amount of code that does:
var sb = new StringBuilder(expectedSize);
PInvoke(sb, sb.Capacity);
return sb.ToString();and if we then force the result to actually require expectedSize+1 chars, we end up needing to allocate another char[], whereas we wouldn't with just expectedSize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PInvoke(sb, sb.Capacity);
Hmm, the interesting thing about the typical use case is that the capacity must already include the NUL terminator. Because of this we won't end up with a larger capacity by including the NUL character in the new capacity calculation.
We'll get differences only when StringBuilder is used as a return type or if space for NUL is manually added - PInvoke(sb, sb.Capacity + 1).
Also, the later variant can produce a builder where Capacity > MaxCapacity:
var sb = new StringBuilder(83, 83);
Console.WriteLine(GetCurrentDirectoryW(sb.Capacity + 1, sb));
Console.WriteLine(sb.Length);
Console.WriteLine(sb.Capacity);
Console.WriteLine(sb.MaxCapacity);On my machine this displays
83
83
84
83
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the interesting thing about the typical use case is that the capacity must already include the NUL terminator
Yeah, I guess that's true. The API let's say is specified to accept a buffer of size N large enough to hold N-1 chars plus a null terminator, you specify the size of the buffer as N, upon return we calculate the resulting string length as N-1, and then we add back a char for the null terminator to be back at N.
JeremyKuhne
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix. I'm a little nervous about not leaving the +1 char, but that is primarily because the code flow through the marshaller is so hard for me to follow. :)
@stephentoub Just for my own interest, what's the issue with using |
There are inefficiencies like the ones this PR is fixing. But even once those are addressed, using StringBuilder in marshaling often leads to inefficiencies in app-level code, just because of the patterns it encourages. If you already happen to have your data in a StringBuilder and you're just looking to [In] marshal into a P/Invoke, that's fine. But often, for whatever reason, StringBuilder is used to [Out] marshal the results of some P/Invoke, and the caller often then needs to allocate a StringBuilder just for that privilege, and that in turn incurs other allocations like the char[] internally. You can see a bunch of examples of this, e.g.
etc. You often then see devs adding more code to workaround that, like using a cache of StringBuilders, which itself incurs expense, e.g.
Using a StringBuilder in the signature also prevents optimizations like first trying with stackalloc'd memory and only then going to the heap when more is needed, whereas you can do that if you just used a simple Plus, just using |
When marshaling back results for a StringBuilder argument in a P/Invoke:
- for UTF8 it's allocating a new char[], pinning it, and then handing that off to StringBuilder.ReplaceBufferInternal, which itself then allocates a new char[], and all of that happens even if the StringBuilder's char[] is already big enough to handle the results, which is commonly the case due to the StringBuilder having been sized appropriately from the get-go.
- for both Unicode and Ansi, it's similarly allocating a new char[] invariably, even if the existing buffer is already sufficient.
This commit cleans that up.
While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling, we still do in some places, e.g. Dns.GetHostName. While we should separately fix that to avoid using a StringBuilder at all, it's useful to demonstrate the impact of the change here:
```C#
[Benchmark]
public static string GetHostName() => Dns.GetHostName();
```
on my machine results in:
```
Method | Gen 0 | Allocated |
------------ |-------:|----------:|
Before | 0.2668 | 1.09 KB |
After | 0.1392 | 584 B |
```
|
@stephentoub thanks for the detailed answer! |
jeffschwMSFT
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time thinking about the removal of +1 for the null terminator and I think I convinced myself this looks right. The reason is that on the transition back from native we do not copy the null terminator (if it existed) so using the raw byte count should be fine.
Ok, thanks, @jeffschwMSFT. That also addresses the rather strange issue @mikedn highlighted, where today because of the +1 you can end up with Capacity > MaxCapacity (the current code does the addition after validating the new length against the MaxCapacity rather than before. So I'll leave the code as is. |
this only works for the string itself doesn't need marshalling support. For ansi string, You cann't just use byte* instead of stringbuilder. |
Sure you can; you then just do the marshaling yourself, e.g. |
Thanks for this. Do you know the perf difference between using stringbuilder and new string((sbyte*)ptr) after your change? |
I think you mean sbyte* |
I actually meant |
There isn't a string constructor which takes byte* as argument. |
@luqunl Real world example: dotnet/corefx#29594 . Reduced GC allocation from 600 bytes to 40 bytes. The overall trend in CoreFX has been to avoid builtin marshallers to get better performance, and just use blitable types. Some of the guidelines are written in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/interop-guidelines.md#definition and https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/interop-pinvokes.md#strings |
I know, that's why my example does |
|
Another example in corefx: dotnet/corefx#29605 |
|
@jkotas/ @stephentoub , I guess these Reduced GC allocation numbers doesn't use this PR/fix. is it ? if this PR is checked in, I guess these GC allocation numbers should be similar. |
if you want to use sbyte*, you need to make sure the managed side allocated enough memory. otherwise, you may have buffer-overrun. |
|
Same as Jeff, I am fine for "removal of +1 for the null terminator", since it is convert from native to managed. |
The
There is no one size fits all recommendation, and I do not think that this PR changes recommendations for writing interop code. StringBuilder is perfectly fine for code that is not performance critical, and the simplicity is preferred. It is not different from other situations where we have slower, but more convenient APIs. E.g. When to use Linq vs. hand written for-loops?
This problem does not go away with StringBuilder. You need to ensure that when using StringBuilder as well by setting Capacity to the right amount. |
That's correct. But even after this PR, there's measurable allocation overhead, if nothing else than the overhead incurred by the developer's code needing to create the StringBuilder, e.g. here's a benchmark: using System;
using System.Runtime.InteropServices;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Attributes.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Attributes.Jobs;
[MemoryDiagnoser]
[InProcess]
public class Benchmarks
{
private static void Main() => BenchmarkRunner.Run<Benchmarks>();
[DllImport("kernel32", CharSet = CharSet.Ansi)]
private static extern int GetTempPathA(int bufferLen, [Out] StringBuilder buffer);
[DllImport("kernel32", CharSet = CharSet.Ansi)]
private static extern unsafe int GetTempPathA(int bufferLen, byte* buffer);
[Benchmark]
public static string MarshalSB()
{
int c = GetTempPathA(0, (StringBuilder)null);
var sb = new StringBuilder(c);
GetTempPathA(sb.Capacity, sb);
return sb.ToString();
}
[Benchmark]
public static unsafe string MarshalArray()
{
int c = GetTempPathA(0, (byte*)null);
var bytes = new byte[c];
fixed (byte* ptr = bytes)
{
GetTempPathA(c, ptr);
return new string((sbyte*)ptr);
}
}
[Benchmark]
public static unsafe string MarshalStackOrArray()
{
int c = GetTempPathA(0, (byte*)null);
Span<byte> bytes = c <= 128 ? stackalloc byte[128] : new byte[c];
fixed (byte* ptr = &MemoryMarshal.GetReference(bytes))
{
GetTempPathA(c, ptr);
return new string((sbyte*)ptr);
}
}
}And the results on my machine (the "B_MarshalSB" is before this PR and the "A_MarshalSB" is after this PR)... the relevant part is the Allocated column: |
|
Thank you all for the reviews! |
|
I have looked at the history where the extra null came from. The extra null was introduced by interop refactoring in 2007. It was noticed and fixed in 2008: However, the fix was (accidentally?) reverted in: The detailed history for this was obliterated, so I am not able to tell what happened there. My bet is that it was just incorrectly resolved merge conflict. |
|
Thanks for tracking it down, Jan. |
When marshaling back results for a StringBuilder argument in a P/Invoke:
This commit cleans that up.
While ideally we wouldn't use StringBuilders at all in coreclr/corefx for marshaling, we still do in some places, e.g. Dns.GetHostName. While we should separately fix that to avoid using a StringBuilder at all, it's useful to demonstrate the impact of the change here:
on my machine results in:
cc: @jkotas, @JeremyKuhne, @vancem