Conversation

@VSadov (Member) commented Mar 27, 2020

Pinning scenarios in Sockets could be roughly classified into two buckets:

  1. external buffers passed in through public APIs
  2. internal buffers created by the library itself

This change addresses the second case: the buffers allocated internally.
Since we control these buffers, we can allocate them on the POH (Pinned Object Heap), which lets us skip some of the ceremony with pinning handles.
It also reduces the chance of random long-lived pinned objects interfering with compaction.
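
As a rough illustration of the difference, here is a minimal sketch (the names below are made up for illustration; this is not the actual Sockets code):

    using System;
    using System.Runtime.InteropServices;

    static class PinnedBufferSketch
    {
        // Before: allocate on the normal heap, then pin with a GCHandle that
        // must be tracked, validated, and eventually freed.
        static IntPtr PinWithHandle(byte[] buffer, out GCHandle handle)
        {
            handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
            return handle.AddrOfPinnedObject(); // caller must later call handle.Free()
        }

        // After: allocate directly on the Pinned Object Heap. The array can
        // never move, so no handle (and no cleanup) is needed.
        static IntPtr AllocateOnPoh(out byte[] buffer)
        {
            buffer = GC.AllocateUninitializedArray<byte>(256, pinned: true);
            return Marshal.UnsafeAddrOfPinnedArrayElement(buffer, 0);
        }
    }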

@VSadov force-pushed the pinned branch 4 times, most recently from 3aa8726 to e147a4e on March 30, 2020 01:05
@VSadov changed the title from "[WIP] use pinned arrays in more places" to "Use pinned arrays in Sockets" on Mar 30, 2020
@VSadov marked this pull request as ready for review on March 30, 2020 19:26
@VSadov requested review from jkotas and stephentoub on March 30, 2020 19:26
@jkotas (Member) commented Mar 30, 2020

It would be useful to get some performance numbers for this.

Do we have any networking performance benchmarks that exercise these paths? Are these hot paths in any of the TechEmpower benchmarks?

@VSadov (Member, Author) commented Mar 30, 2020

It would be useful to get some performance numbers for this.

Working on that.
I expected this question :-)

Are these hot paths in any of the TechEmpower benchmarks?

These look like long(ish)-term allocations/pins, typically spanning an async operation or even longer.
I'd expect that these are not "hot" paths in a throughput sense.

I ran some Kestrel scenarios in the debugger with these changes to see where the external buffers come from. They typically come from either the slab allocator (<4K buffers) or the shared pool (larger ones). I'm not 100% sure this is always the case, since I checked only a few scenarios, but it seems to be the general strategy, and it makes sense purely from an allocation-avoidance standpoint.

If these paths are touched by the benchmarks, they could be the only cases where we pin otherwise movable memory (this ignores fast pinning for fixed and P/Invoke).

Whether the effect of this change will show up in short benchmarks, I'm not sure. Fragmentation is typically a long-term problem.

@stephentoub (Member) commented:

Are these hot paths in any of the TechEmpower benchmarks?

I don't think so. The socket APIs being impacted here are either a) part of the legacy APM implementations, b) part of the UDP-related APIs, or c) part of vectored sends/receives. (c) is valuable, but I don't believe it shows up in any kestrel benchmarks. (b) may eventually be useful, but we've not done a lot yet around these UDP APIs and there's a lot of other low-hanging fruit. I wouldn't spend much time at all on (a).

Review comment (Member):

@VSadov why does it make sense to use POH for SocketAddress.Buffer?

Review comment (Member):

My question is aimed at understanding better when it is preferable to use the SOH vs. the POH for objects that will be pinned.

@VSadov (Member, Author) replied:

In this change I basically looked for arrays pinned via GCHandleType.Pinned.
With few exceptions, that is an indication that the object could be pinned for a nontrivial length of time, perhaps for its entire lifetime. SocketAddress.Buffer is one such array.

Mid/long-term pinning at random locations can have really bad effects on the Small Object Heap.
On the other hand, if the object is expected to end up in Gen1/Gen2 anyway, there is not much downside to allocating it on the POH.
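
As a minimal sketch of that rule of thumb (illustrative names, not the library's code):

    using System;

    class PinningLifetimeSketch
    {
        // Expected to live long (e.g. cached on a pooled object) and to stay
        // pinned: allocate on the POH up front, where it cannot interfere
        // with Small Object Heap compaction.
        private readonly byte[] _longLivedPinned =
            GC.AllocateUninitializedArray<byte>(4096, pinned: true);

        // Pinned only briefly around a native call: fixed is cheap and the
        // pin is released immediately, so a normal SOH allocation is fine.
        private static unsafe void ShortLivedPin(byte[] temp)
        {
            fixed (byte* p = temp)
            {
                // ... pass p to native code here ...
            }
        }
    }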

Review comment (Member):

Nit: this can be simplified now to:

_wsaMessageBufferPinned ??= GC.AllocateUninitializedArray<byte>(sizeof(Interop.Winsock.WSAMsg), pinned: true);
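
For reference, the ??= operator assigns only when the left-hand side is null, so the line above is shorthand for roughly:

    if (_wsaMessageBufferPinned == null)
    {
        _wsaMessageBufferPinned = GC.AllocateUninitializedArray<byte>(sizeof(Interop.Winsock.WSAMsg), pinned: true);
    }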

Review comment (Member):

Ditto:

_wsaRecvMsgWSABufferArrayPinned ??= GC.AllocateUninitializedArray<WSABuffer>(1, pinned: true);

@VSadov (Member, Author) commented Apr 18, 2020

Rebased and skipped all the changes to HttpListener.

@VSadov (Member, Author) commented Apr 18, 2020

Trying to hit this in the TE benchmarks was a bit of a dud.
I can see how the changed lines could originally cause pinning problems, but I am not enough of an expert in Web things to be sure the affected scenarios are hit frequently (or hit at all).

As I see from traces, there are lots of pinning handles created and destroyed (both regular and async), but those are for single buffers passed to the socket layer from outside, and such pinning is needed for correctness. Also, with Kestrel, what I typically see at the end of a handle is a manually managed chunk of memory from a slab in the Large Object Heap. From a heap-fragmentation point of view, such pinning is not a concern.

Performance-wise, I see no impact either.

@stephentoub (Member) commented:

there are lots of pinning handles created and destroyed (both regular and async)

Hmm. Which TE benchmark? I'm a little surprised by that. I thought they were using prepinned arrays, and then creating Memory instances from them marked as already pinned, such that Memory.Pin would be a no-op.

@VSadov (Member, Author) commented Apr 19, 2020

One example (a regular pinned handle in DbFortunes with Postgres):

Name
 || + system.private.corelib.il!ThreadPoolWorkQueue.Dispatch
 ||  + system.private.corelib.il!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__217`1[System.__Canon]].MoveNext(class System.Threading.Thread)
 ||   + system.private.corelib.il!ExecutionContext.RunFromThreadPoolDispatchLoop
 ||    + microsoft.aspnetcore.server.kestrel.core.il!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__217`1[System.__Canon].MoveNext()
 ||     + microsoft.aspnetcore.server.kestrel.core.il!HttpProtocol.ProcessRequest
 ||      + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||       + microsoft.aspnetcore.server.kestrel.core.il!Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequest>d__218`1[System.__Canon].MoveNext()
 ||        + benchmarks!FortunesRawMiddleware.Invoke
 ||         + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||          + benchmarks!Benchmarks.Middleware.FortunesRawMiddleware+<Invoke>d__4.MoveNext()
 ||           + benchmarks!Benchmarks.Data.RawDb.LoadFortunesRows()
 ||            + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||             + benchmarks!Benchmarks.Data.RawDb+<LoadFortunesRows>d__10.MoveNext()
 ||              + npgsql!NpgsqlCommand.ExecuteDbDataReaderAsync
 ||               + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||                + npgsql!Npgsql.NpgsqlCommand+<ExecuteDbDataReaderAsync>d__97.MoveNext()
 ||                 + npgsql!NpgsqlCommand.ExecuteReaderAsync
 ||                  + npgsql!NpgsqlCommand.ExecuteReaderAsync
 ||                   + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||                    + npgsql!Npgsql.NpgsqlCommand+<ExecuteReaderAsync>d__102.MoveNext()
 ||                     + npgsql!NpgsqlDataReader.NextResultAsync
 ||                      + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||                       + npgsql!Npgsql.NpgsqlDataReader+<NextResult>d__44.MoveNext()
 ||                        + npgsql!Npgsql.NpgsqlConnector+<>c__DisplayClass160_0.<DoReadMessage>g__ReadMessageLong|0(value class Npgsql.DataRowLoadingMode,bool,bool)
 ||                         + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||                          + npgsql!Npgsql.NpgsqlConnector+<>c__DisplayClass160_0+<<DoReadMessage>g__ReadMessageLong|0>d.MoveNext()
 ||                           + npgsql!Npgsql.NpgsqlReadBuffer+<>c__DisplayClass34_0.<Ensure>g__EnsureLong|0()
 ||                            + system.private.corelib.il!AsyncMethodBuilderCore.Start
 ||                             + npgsql!Npgsql.NpgsqlReadBuffer+<>c__DisplayClass34_0+<<Ensure>g__EnsureLong|0>d.MoveNext()
 ||                              + npgsql!AwaitableSocket.ReceiveAsync
 ||                               + system.net.sockets.il!Socket.ReceiveAsync
 ||                                + system.net.sockets.il!SocketAsyncEventArgs.DoOperationReceiveSingleBuffer
 ||                                 + system.net.sockets.il!SocketAsyncEventArgs.ProcessIOCPResultWithSingleBufferHandle
 ||                                  + system.private.corelib.il!System.Memory`1[System.Byte].Pin()
 ||                                   + system.private.corelib.il!System.Runtime.InteropServices.GCHandle..ctor(class System.Object,value class System.Runtime.InteropServices.GCHandleType)

@VSadov (Member, Author) commented Apr 19, 2020

There is also async pinning, which, I think, could be hard to avoid as long as Overlapped is used.

Here is an example of async pinning (in PlaintextNonPipelined):

   + coreclr!ThreadpoolMgr::CompletionPortThreadStart
   |+ coreclr!BindIoCompletionCallbackStub
   | + coreclr!BindIoCompletionCallbackStubEx
   |  + coreclr!ManagedThreadBase_DispatchOuter
   |   + coreclr!ManagedThreadBase_DispatchMiddle
   |    + coreclr!BindIoCompletionCallBack_Worker
   |     + coreclr!DispatchCallSimple
   |      + coreclr!CallDescrWorkerInternal
   |       + system.private.corelib.il!_IOCompletionCallback.PerformIOCompletionCallback
   |       |+ system.net.sockets!Socket.CompleteAccept
   |       ||+ system.private.corelib.il!System.Threading.Tasks.Task`1[System.__Canon].TrySetResult(!0)
   |       |||+ system.private.corelib.il!Task.RunContinuations
   |       ||| + system.private.corelib.il!AwaitTaskContinuation.RunOrScheduleAction
   |       |||  + system.private.corelib.il!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.__Canon,Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketConnectionListener+<AcceptAsync>d__13].MoveNext(class System.Threading.Thread)
   |       |||  |+ system.private.corelib.il!ExecutionContext.RunInternal
   |       |||  | + microsoft.aspnetcore.server.kestrel.transport.sockets.il!Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketConnectionListener+<AcceptAsync>d__13.MoveNext()
   |       |||  | |+ microsoft.aspnetcore.server.kestrel.transport.sockets.il!Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection..ctor(class System.Net.Sockets.Socket,class System.Buffers.MemoryPool`1,class System.IO.Pipelines.PipeScheduler,class Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.ISocketsTrace,value class System.Nullable`1,value class System.Nullable`1,bool)
   |       |||  | ||+ system.net.sockets!SocketAsyncEventArgs.InitializeInternals
   |       |||  | || + system.private.corelib.il!System.Threading.PreAllocatedOverlapped..ctor(class System.Threading.IOCompletionCallback,class System.Object,class System.Object)
   |       |||  | ||  + coreclr!AllocateNativeOverlapped
   |       |||  | ||   + coreclr!CreateHandleCommon
   |       |||  | ||    + coreclr!HndCreateHandle
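
For context, a minimal sketch of the pattern those frames correspond to; PreAllocatedOverlapped pins its pinData when it is constructed (the buffer and callback below are illustrative, not the SocketAsyncEventArgs internals):

    using System;
    using System.Threading;

    static class AsyncPinningSketch
    {
        static unsafe void OnCompleted(uint errorCode, uint numBytes, NativeOverlapped* pOverlapped)
        {
            // completion logic would run here
        }

        static unsafe void Example()
        {
            byte[] buffer = new byte[4096];

            // Constructing the PreAllocatedOverlapped allocates the native
            // OVERLAPPED and pins pinData for the object's lifetime; this
            // corresponds to the HndCreateHandle frames in the trace above.
            using var overlapped = new PreAllocatedOverlapped(OnCompleted, state: null, pinData: buffer);

            // The overlapped would then be used with a ThreadPoolBoundHandle
            // for native I/O; disposing it releases the pin.
        }
    }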

@stephentoub (Member) commented:

There is also async pinning, which, I think, could be hard to avoid as long as Overlapped is used. Here is an example of async pinning (in PlaintextNonPipelined):

In plaintext, are you seeing async pinning only in accepts or also in sends/recvs? It's the latter I'm more interested in. Accepts are rare in those benchmarks. Sends/recvs dominate.

@davidfowl (Member) commented:

Right, there should be no pinning for send and receive in those benchmarks.

@VSadov (Member, Author) commented Apr 19, 2020

That was the dominant pinning stack; other cases are less frequent.
If send/receive were pinning, I'd expect to see more stacks like that.

@VSadov (Member, Author) commented Apr 19, 2020

Yes, for plaintext they are all Accepts.
There are multiple pinning paths and it is not always an async handle, but one way or another they all seem to be handling Accept.

@benaadams (Member) commented:

The original Memory<T> needs to be created with the MemoryMarshal.CreateFromPinnedArray API to skip the GCHandle creation in Memory.Pin().
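
For example, a minimal sketch, under the assumption that the buffer is one we allocate ourselves:

    using System;
    using System.Buffers;
    using System.Runtime.InteropServices;

    class PrepinnedMemorySketch
    {
        static void Main()
        {
            // An array allocated on the POH never moves.
            byte[] buffer = GC.AllocateUninitializedArray<byte>(4096, pinned: true);

            // CreateFromPinnedArray records the "already pinned" promise in
            // the resulting Memory<byte>.
            Memory<byte> memory = MemoryMarshal.CreateFromPinnedArray(buffer, 0, buffer.Length);

            // Pin() can now return the address directly; no GCHandle is allocated.
            using MemoryHandle handle = memory.Pin();
        }
    }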

@VSadov (Member, Author) commented Apr 19, 2020

@benaadams - Also, in theory, if #33542 is implemented, MemoryMarshal could just check if an array happens to be in the pinned object heap.

@VSadov (Member, Author) commented Apr 19, 2020

So, what do we do with this PR?

It feels like the changes would be helpful to code that hits the allocating/pinning paths inside the Socket layer. At the very least, they reduce the ceremony around the handles used to pin those allocations.

However, seeing that the benchmarks avoid such code paths, which makes the effects hard to measure, I am not as excited about the change as I was when this started.

@stephentoub (Member) commented:

So, what do we do with this PR?

I suggest we keep the changes in SocketAsyncEventArgs and ditch the rest. I'm interested in those specifically because a) if you're creating a SocketAsyncEventArgs, you're likely keeping it around for a while and probably pooling it, and b) the changes reduce the size of that object by eliminating all the GCHandle fields.

If we end up being interested in the others, I think it'd be good to do them as individual PRs with dedicated scenario-based benchmarks to back them up. It doesn't need to be TE; those aren't necessarily real-world. :-)

@VSadov (Member, Author) commented Apr 19, 2020

@stephentoub

I suggest we keep the changes in SocketAsyncEventArgs and ditch the rest.

Sounds good!

There is some entanglement between the changes, though.
For example: if SocketAddress.Buffer is no longer pinned, then SocketAsyncEventArgs._socketAddressGCHandle will need to come back, along with all the machinery related to it (calls to PinSocketAddressBuffer(), validation, cleanup).

Basically, SocketAddress.Buffer is not by itself a part of SocketAsyncEventArgs, but having it prepinned allows us to drop some code and data in SocketAsyncEventArgs.

Also, if we keep SocketAddress.Buffer prepinned, then there is no point in bringing back pinning in OverlappedAsyncResult.

Do we actually want to "remove the changes that do not benefit SocketAsyncEventArgs and keep the rest"?

@stephentoub (Member) commented:

I'm not convinced that the changes to SocketAddress are good. There are a variety of places where temporary instances are created, are short-lived, etc. Creating those as gen2 is potentially concerning, and I don't know that the unknown wins from it outweigh the unknown losses. I think it'd be better to start with just the changes isolated to SAEA. That reasonable?

@VSadov (Member, Author) commented Apr 20, 2020

Rebased and skipped everything not related to SocketAsyncEventArgs.

@stephentoub (Member) left a review:

Thanks.

@stephentoub merged commit faebda7 into dotnet:master on Apr 20, 2020
@karelz added this to the 5.0.0 milestone on Aug 18, 2020
@ghost locked as resolved and limited conversation to collaborators on Dec 10, 2020