-
Notifications
You must be signed in to change notification settings - Fork 4.9k
SSLStream - Make read side async, and remove a bunch of allocations #24497
Conversation
|
Context this is using your code @stephentoub from https://github.com/dotnet/corefx/issues/12037#issuecomment-333951933 OriginalAfter change |
|
Bug fix is #24492 ? Those allocation traces are wildly different /cc @stephentoub |
|
Correct this includes #24492 updated top comment to reflect. Am wondering how that ReadAsync struct is getting boxed? If I can find that the allocations should shrink again. |
|
It might be happening in the state machines? I will keep hunting |
|
Will box when it goes actually async; its how it sticks it on the heap and lifts it from the stack. Although is the |
|
"Display class"es aren't generated by async. And all runtime boxing that used to be done by async should be gone entirely of you use a more recent coreclr. |
|
Could be the async local functions if not all state is passed in params dotnet/roslyn#18946 |
|
If you push the lamba state through the TCS does that get rid of the display class (or at least name it)? tcs = new TaskCompletionSource<int>(new ArraySegment<byte>(buffer, offset, count), TaskCreationOptions.RunContinuationsAsynchronously)
// ...
return new ValueTask<int>(tcs.Task.ContinueWith(
t =>
{
var seg = (ArraySegment<byte>)t.AsyncState;
CheckOldKeyDecryptedData(seg.Array, seg.Offset, seg.Count);
}));If it goes away completely, the the capture happens to soon (bug for roslyn?) |
|
Good thinking by @benaadams . I actually took it one step further just put the arraysegment in the state and then added to the pattern matcher at the other end the case of if the state is != null which avoids the Lambda completely. Trace showed the class to completely disappear. Although the Task boxing showed back up so need to check my profile settings/runtime on that last run because there were gone just before :) |
|
Now that the allocation is gone, I launched a bug on Rosyln, because that allocation shouldn't have happened. |
|
Fixed the problem with the boxed adapter. Will re-run the tests after dinner. |
|
We should hold off on this PR until #24389 is fully merged. |
| } | ||
|
|
||
| _lockReadState = LockPendingRead; | ||
| TaskCompletionSource<int> tcs = new TaskCompletionSource<int>(new ArraySegment<byte>(buffer, offset, count), TaskCreationOptions.RunContinuationsAsynchronously); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this is rare and this doesn't really matter, but the ArraySegment is going to be boxes here. Instead you could derive a type from TCS that has the state on it as fields.
| queuedStateRequest = null; | ||
| switch (obj) | ||
| { | ||
| case null: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How common is null? If it's common, you might save a write through the ref by checking this first and then only writing null to queuedStateRequest after that if it's not null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Null is super common so I put in early exit on null
| } | ||
|
|
||
| public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the Memory-based overload? Are you doing that separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was going to but it makes sense to do it right now. So its done :)
| _sslState = sslState; | ||
| } | ||
|
|
||
| public ValueTask<int> ReadAsync(byte[] buffer, int offset, int count) => new ValueTask<int>(_sslState.InnerStream.ReadAsync(buffer, offset, count, _cancellationToken)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the Memory overload here? If it completes synchronously, that'll save a task allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This is still calling the array overload. Isn't this just wrapping a Task in a ValueTask?
| copyBytes = CopyDecryptedData(buffer, offset, count); | ||
| _nestedRead = 0; | ||
| return copyBytes; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not: could use a blank line below this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| return copyBytes; | ||
| } | ||
| copyBytes = await adapter.LockAsync(buffer, offset, count); | ||
| try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All awaits need ConfigureAwait(false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| if (status.ErrorCode != SecurityStatusPalErrorCode.OK) | ||
| { | ||
| byte[] extraBuffer = null; | ||
| if (_decryptedBytesCount != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this come from the pool? Or is it handed out to user code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be pooled but its going to add complexity and it only happens if there is handshake or alert happening at the same time (super rare). The whole "lock during handshake and read/write at the sametime" was something I wanted to look at in a single PR. I was actually avoiding the handshake code and SSLState due to the PR on the options bag etc.
I have done some very basic benching of the handshake but don't have 100% solid benchmarks for it yet. But its on the books for the future ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alert or handshake that starts while writing and you end up with part of the handshake frame during the read....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks.
| { | ||
| throw; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be "catch (Exception e) when (!(e is IOException))"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
Note: fixing git history from over squash ... Fixed history |
|
@dotnet-bot Test Outerloop Windows x64 Debug Build |
|
@stephentoub I am seeing a decent number of tests fail around cancelation (like the error message is changing) is this because now I actually support cancellation? And before cancellation did nothing on SslStream so you had to wait for a complete frame to finish? I seem to think this is the case, I can remove the actual support for cancellation but that kind of seems like a backwards step? |
Which tests? |
|
eg Although it might not be the case as well. The only real way for me to check it to run an outerloop on another PR (my old standby ;) ) or to disable the cancellation in my code and resubmit and re-run the outerloop. |
|
Actually looking at it, maybe it isn't my failure because it runs on osx? |
|
@dotnet-bot Test Outerloop Windows x64 Debug Build Lets have a second bite of the apple. |
|
cc: @geoffkizer |
|
@dotnet-bot test Outerloop Linux x64 Debug Build please |
|
The errors seem random and unrelated with the outerloop being dodgy at the time so I will give it one last spin through the grinder now that it's stable before digging into the code @dotnet-bot test Outerloop Linux x64 Debug Build please |
|
Might want to rebase; then you'll get that CROSS Check run to run again (whatever it is?) |
|
I will do that after this outerloop has finished otherwise it will have to start again... I just wanna know at this point. But it does seem that it's failing on System.Memory tests again..... |
|
@dotnet-bot test Outerloop Linux x64 Debug Build please |
|
@dotnet-bot Test Outerloop Windows x64 Debug Build please |
|
Windows.7.Amd64.Open:Debug-x64 At a guess... would be the ParserTests.2gbOverflow which needs 4 * 2BG > 8 GB to run |
|
Considering the last successful outerloop on windows was 20 days ago (does no one do an outerloop before merging?) and this test is failing on every PR that has been run since, I would say unrelated /cc @stephentoub @Priya91 looks like this is passing all tests that will actually ever pass or are related. This is a better PR to go in before the throttle due to the fact that we are only throttling one side, which is a bit more of a "hack" than I would normally say is a good idea. As for the issues around Cancellation, with this PR in there, and some more refactoring I think the handshake itself is not far from being able to be made Async, and then you get cancellation and a much tidier more manageable set of code to deal with, and debug/edit/change/update. Anyway, ball is in your court on these PR's |
|
System.Memory outerloop I think is a new thing; the > 8GB test killed my machine when it was in corefxlabs until it was moved out of line into outerloop (I only have 8GB - it makes Windows very unhappy; memory compression and swap start fighting each other) |
|
It probably should check available physical memory before allocating it |
|
add reference issue #25254 |
|
@dotnet-bot test Outerloop Windows x64 Debug Build please |
|
@dotnet-bot Test Outerloop Windows x64 Debug Build |
|
@Drawaes, this is ready to go from your perspective, right? |
|
Yeah, it's been a journey but it's ready from my perspective :) |
| } | ||
| else | ||
| { | ||
| ThreadPool.QueueUserWorkItem(new WaitCallback(CompleteRequestWaitCallback), obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are modifying this code to now do, AsyncResumeHandshake, instead of CompleteRequestWaitCallback, in the new HandleQueuedCallback. When I looked at it, this code will be hit during re-negotiation request on read/write. If you are confident this codepath will not be hit with your changes, remove the default case in the switch in the new function, so that if breaks we know we shouldn't change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Drawaes This difference in behavior also existed in the WriteAsync PR, I had pointed it out in one of my previous PRs and you mentioned that this codepath will not be hit. Can you also verify with a re-negotiation test..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this behaviour different to what was there previously?
As for test coverage well... In a word, no I brought up the lack of re-negotiation tests a while ago in the issue #24407 . Its not something you can do from .net and I don't know how to sort this out in your outerloop infra. So it will need to be done by a microsoftie, it will either need to be an "OpenSSL" server setup or IIS with certificate validation on a certain path to trigger the renegotiation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this behaviour different to what was there previously?
The function HandleQueuedCallback is doing ThreadPool.QueueUserWorkItem(new WaitCallback(AsyncResumeHandshake), obj); instead of ThreadPool.QueueUserWorkItem(new WaitCallback(CompleteRequestWaitCallback), obj); which this code did before replacing with HandleQueuedCallback.
As for test coverage well...
I was hinting towards using .netcore client sslstream, and testing with some thirdparty sslstream test server that supports re-negotiation. We need to do a sanity check to ensure this scenario is not broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I have done that locally, it works, but I can't "productionize" this as it would require doing things on the outerloop servers that I just don't know how (or think I even can) do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I have done that locally, it works
If that is the case, and the default: case in the switch statement is not hit, remove that code from the default case, since it's not reachable, and the logic doesn't make sense to be there in the HandleQueuedCallback function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't I will try to explain with a bit of code (I am happy to put it back as three methods if it is a sticking point but I find code makes it easier to understand)
Originally there were three bits of code doing this
// Finish Handshake Read
if (obj is LazyAsyncResult)
{
((LazyAsyncResult)obj).InvokeCallback();
}
else
{
ThreadPool.QueueUserWorkItem(new WaitCallback(CompleteRequestWaitCallback), obj);
}
// Finish Write
if (obj is LazyAsyncResult)
{
// Sync handshake is waiting on other thread.
((LazyAsyncResult)obj).InvokeCallback();
}
else
{
// Async handshake is pending, start it on other thread.
// Consider: we could start it in on this thread but that will delay THIS write completion
ThreadPool.QueueUserWorkItem(new WaitCallback(AsyncResumeHandshake), obj);
}
// FinishHandshake
if (obj is LazyAsyncResult)
{
// Sync write is waiting on other thread.
((LazyAsyncResult)obj).InvokeCallback();
}
else
{
// Async write is pending, start it on other thread.
// Consider: we could start it in on this thread but that will delay THIS handshake completion
ThreadPool.QueueUserWorkItem(new WaitCallback(CompleteRequestWaitCallback), obj);
}Now the two completerequestwaitcallback bits of code will now have been replaced with a TaskCompletionSource as they are Async not APM. There is also code at the top I have left out for brevity but its basically a null check and null out of the "obj" which is identical.
So we can see the "Is LazyAsyncResult" is identical so it transfers directly to the switch.
The next bit, as said is a TCS for the Finish handshake read and finish handshake (because it would be the read/write that queued while they happened). So that only leaves the last Threadpool APM bit of code, which is the "Default".
I would hope to eliminate the default case with a change to async/await for the handshake (as well as the TCS's as it should be able to just be a SemaphoreSlim for the read/write locks instead of all of the Interlocks, and extra baggage). This is something I am working on, and it massively cleans up this code (in my personal opinion). Its just a case of it had to start somewhere and this is the point that the async/await and the APM collide.
|
This has been open for feedback long enough. I'm going to go ahead and merge it to unblock further efforts in this area. That'll also provide more runway to address any fallout. Thanks, @Drawaes. |
|
Cool, I will rework the throttle then to include both sides in the Semaphore slim, and then redo benchmarks/stress tests for win and Linux to check where the perf lands. |
|
Thanks |
…otnet/corefx#24497) * Changed from APM to Task * Reacting to review * Fix EOF bug * Fix override issue * Added <XunitShowProgress>true</XunitShowProgress> * Fixing merge Commit migrated from dotnet/corefx@6bfca66



Includes bug fix from writeside. (#24492)
Benches/Traces to follow.