Use pinned arrays in Sockets #34175
Conversation
It would be useful to get some performance numbers for this. Do we have any networking performance benchmarks that exercise these paths? Are these hot paths in any of the TechEmpower benchmarks?
Working on that.
These look like long(ish)-term allocations/pins, typically spanning an async operation or even longer. I ran some Kestrel scenarios in the debugger with these changes to see where the external buffers come from, and they typically come from either the slab allocator (<4K buffers) or from the shared pool (larger ones). I'm not 100% sure this is always the case, since I checked only a few scenarios, but it seems to be the general strategy, and it makes sense purely from an allocation-avoidance point of view. If these paths are touched by the benchmarks, they could be the only cases where we pin otherwise movable memory (ignoring fast pinning for fixed/p-invoke). Whether the effect of this change will be visible on short benchmarks, I'm not sure; fragmentation is typically a long-term problem.
I don't think so. The socket APIs being impacted here are either a) part of the legacy APM implementations, b) part of the UDP-related APIs, or c) part of vectored sends/receives. (c) is valuable, but I don't believe it shows up in any Kestrel benchmarks. (b) may eventually be useful, but we haven't done a lot yet around these UDP APIs and there's a lot of other low-hanging fruit. I wouldn't spend much time at all on (a).
@VSadov why does it make sense to use POH for SocketAddress.Buffer?
My question is to better understand when it is better to use SOH vs POH for objects that will be pinned.
In this change I basically looked for arrays pinned via GCHandleType.Pinned. With a few exceptions, that is an indication that the object could be pinned for a nontrivial length of time, perhaps for its entire lifetime. SocketAddress.Buffer is one such array. Mid/long-term pinning in random locations can have really bad effects on the Small Object Heap. On the other hand, if the object is expected to end up in Gen1/Gen2 anyway, there is not much downside to allocating it on the POH.
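To illustrate the distinction being discussed, here is a minimal sketch (not code from this PR; method names and buffer sizes are illustrative) of the two ways to keep an array pinned:

```csharp
using System;
using System.Runtime.InteropServices;

class PinningSketch
{
    // Before: a normal SOH array pinned via an explicit handle.
    // The handle must be tracked and freed, and while it is allocated the
    // array blocks compaction of whatever SOH segment it lives in.
    static byte[] PinWithHandle(out GCHandle handle)
    {
        byte[] buffer = new byte[256];
        handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        return buffer;
    }

    // After: allocate directly on the pinned object heap (POH, .NET 5+).
    // The array never moves, so no handle is needed and the SOH stays
    // fully compactable.
    static byte[] AllocatePinned()
        => GC.AllocateUninitializedArray<byte>(256, pinned: true);
}
```

The POH version trades a separate handle (and the obligation to free it) for an allocation that lands outside the compacted generations from the start, which is why it only pays off for long-lived pins.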
Nit: this can be simplified now to:

```csharp
_wsaMessageBufferPinned ??= GC.AllocateUninitializedArray<byte>(sizeof(Interop.Winsock.WSAMsg), pinned: true);
```

Ditto:

```csharp
_wsaRecvMsgWSABufferArrayPinned ??= GC.AllocateUninitializedArray<WSABuffer>(1, pinned: true);
```
Rebased and skipped all the changes to …
Trying to hit this in the TE benchmarks was a bit of a dud. As I see from traces, there are lots of pinning handles created and destroyed (both regular and async), but that is for single buffers passed to the socket layer from outside, and such pinning is needed for correctness. Also, with Kestrel, what I typically see at the end of a handle is a manually managed chunk of memory from a slab on the Large Object Heap. From the heap-fragmentation point of view, such pinning is not a concern. Performance-wise, I see no impact either.
Hmm. Which TE benchmark? I'm a little surprised by that. I thought they were using prepinned arrays, and then creating Memory instances from them marked as already pinned, such that Memory.Pin would be a no-op.
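The "prepinned arrays with a no-op Pin" pattern mentioned above can be sketched roughly as follows (this is illustrative, not Kestrel's actual memory pool; the type name is made up): a MemoryManager over a POH-allocated array whose Pin() hands out a pointer without ever creating a GCHandle.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

sealed unsafe class PrePinnedMemory : MemoryManager<byte>
{
    private readonly byte[] _array;

    public PrePinnedMemory(int length)
        => _array = GC.AllocateUninitializedArray<byte>(length, pinned: true);

    public override Span<byte> GetSpan() => _array;

    public override MemoryHandle Pin(int elementIndex = 0)
    {
        // The array lives on the pinned object heap, so taking its address
        // is safe without allocating a GCHandle; "pinning" is free.
        fixed (byte* p = &_array[elementIndex])
            return new MemoryHandle(p);
    }

    public override void Unpin() { } // nothing to release

    protected override void Dispose(bool disposing) { }
}
```

Code consuming `Memory<byte>` from such a manager can call Pin/Dispose as usual; no pinning handle traffic shows up in traces because none is created.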
One example: (regular pinned handle in DbFortunes with Postgres)
There is also async pinning, which, I think, could be hard to avoid as long as Overlapped is used. Here is an example of async pinning (in PlaintextNonPipelined):
In plaintext, are you seeing async pinning only in accepts or also in sends/recvs? It's the latter I'm more interested in. Accepts are rare in those benchmarks. Sends/recvs dominate. |
Right, there should be no pinning for send and receive in those benchmarks.
That was the dominant pinning stack. Other cases are less frequent.
Yes, for plaintext they are all Accepts.
The original …
@benaadams - Also, in theory, if #33542 is implemented, MemoryMarshal could just check if an array happens to be in the pinned object heap. |
So, what do we do with this PR? It feels like the changes would be helpful to code that hits allocating/pinning paths inside the Socket layer. At the very least, they reduce the ceremony around the handles used to pin those allocations. However, seeing that the benchmarks evade such code paths, making the effects hard to measure, I am not as excited about the change as when this started.
I suggest we keep the changes in SocketAsyncEventArgs and ditch the rest. I'm interested in those specifically because a) if you're creating a SocketAsyncEventArgs, you're likely keeping it around for a while and probably pooling it, and b) the changes reduce the size of that object by eliminating all the GCHandle fields. If we end up being interested in the others, I think it'd be good to do them as individual PRs with dedicated scenario-based benchmarks to back them up. Doesn't need to be TE; those aren't necessarily real-world :-)
Sounds good! There is some entanglement between the changes, though.
- Basically …
- Also, if we keep …, do we actually want to "remove changes that do not benefit …"?
I'm not convinced that the changes to SocketAddress are good. There are a variety of places where temporary instances are created, are short-lived, etc. Creating those as effectively gen2 objects is potentially concerning, and I don't know that the unknown wins from it outweigh the unknown losses. I think it'd be better to start with just the changes isolated to SAEA. That reasonable?
Rebased and skipped everything not related to …
stephentoub left a comment:
Thanks.
Pinning scenarios in Sockets can be roughly classified into two buckets:
- buffers passed in from the outside, which must be pinned for the duration of the operation for correctness;
- buffers allocated internally by the Socket layer.
This change addresses the second case: the buffers allocated internally. Since we are in control of these buffers, we can allocate them on the POH. That allows skipping some of the ceremony with pinning handles, and it also reduces the chance of random long-lived pinned objects interfering with compaction.
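A rough before/after of the shape of the change (field names here are illustrative, not the exact SocketAsyncEventArgs internals): POH allocation lets the GCHandle field and its cleanup logic disappear entirely.

```csharp
using System;
using System.Runtime.InteropServices;

// Before: an array/handle pair, with Free() required on dispose.
class Before : IDisposable
{
    private byte[]? _wsaMessageBuffer;
    private GCHandle _wsaMessageBufferGCHandle;

    void EnsureBuffer()
    {
        if (_wsaMessageBuffer == null)
        {
            _wsaMessageBuffer = new byte[64];
            _wsaMessageBufferGCHandle =
                GCHandle.Alloc(_wsaMessageBuffer, GCHandleType.Pinned);
        }
    }

    public void Dispose()
    {
        if (_wsaMessageBufferGCHandle.IsAllocated)
            _wsaMessageBufferGCHandle.Free();
    }
}

// After: one field, no handle, nothing to free.
class After
{
    private byte[]? _wsaMessageBufferPinned;

    void EnsureBuffer() =>
        _wsaMessageBufferPinned ??=
            GC.AllocateUninitializedArray<byte>(64, pinned: true);
}
```

Besides eliminating the handle field (and thus shrinking pooled SAEA instances), this removes a class of leak where a missed Free() would keep an SOH array pinned indefinitely.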