Ensure RunningEndpointInstance.DisposeAsync completes when host is disposed without StopAsync#7806
Conversation
|
@danielmarbach can u apply your async expertise on this? |
|
I don't think this classifies as a bug. We never supported cancelable dispose before so we will quite likely treat this as a feature/improvement request with lower priority. The hosted service lifecycle takes care of shutting things down properly. What special thing are you doing in your integration tests that causes this because we cannot observe this across any of our test suites? |
|
Ok I think I understand the scenario a bit better now. You are talking about an integration test scenario where you manage the host lifecycle by calling |
|
yeah the problem is the with a misbehaving transport it is effectively an infinite wait with no timeout. i only noticed it once on one of our build servers, and coincidentally happened to have hangdump wired into that build while i was looking into something else, so i got enough info make a guess at what was happening. we use localdb via https://github.com/SimonCropp/LocalDb for all our tests. so it is possible the "misbehaving transport" part comes from a bug in there but either way i dont think infinite wait is a valid scenario in NSB when an using/exception/dispose sequence occurs |
|
Thanks for investigating this and for putting together the PR. After digging into this further, I do not think we should make this change. The behavior you observed can indeed happen, but it appears to be a consequence of how the generic host lifecycle works. if (_options.ShutdownTimeout != Timeout.InfiniteTimeSpan)
{
cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
cts.CancelAfter(_options.ShutdownTimeout);
cancellationToken = cts.Token;
}Whereas await DisposeAsync(contentRootFileProvider).ConfigureAwait(false);
if (!ReferenceEquals(contentRootFileProvider, _defaultProvider))
{
await DisposeAsync(_defaultProvider).ConfigureAwait(false);
}
await DisposeAsync(Services).ConfigureAwait(false);As a result, when a host is started manually and later only disposed without first calling For NServiceBus transports, the cancellation token passed into the stop path is also not merely a timeout for resource cleanup. It is part of the graceful shutdown contract. We intentionally stop accepting new messages first and then allow in-flight message processing to complete successfully. Only when the supplied stop token is cancelled do we signal cooperative cancellation to the message processing operations. Because of that, I am hesitant to introduce a second shutdown path through The correct usage when manually controlling the host lifecycle remains: await host.StartAsync();
try
{
// ...
}
finally
{
await host.StopAsync();
await host.DisposeAsync();
}This does not mean the observed behavior is ideal. However, fixing it by changing endpoint disposal semantics would effectively make Given the above, I do not think we should merge this PR. @DavidBoike @andreasohlund thoughts? |
|
@danielmarbach I think u r correct on the specific objection, but perhaps that points at a cleaner fix than what I pushed. I agree But "disposed without a prior StopAsync" isn't undefined territory — the framework already says what it should do, and it isn't "wait forever." When a hosted service is disposed without a prior stop, the in-flight token is cancelled immediately — not awaited gracefully. And per the So perhaps the fix isn't to make
The only thing the current code gets wrong is passing But happy to hand over any further changes to u. close the PR if u dont need it |
|
@SimonCropp that's an interesting angle. Let me think about it |
…een dispose and shutdown. Instead cancel the token immediately on dispose.
e78714a to
255bc0f
Compare
ad1ca4b to
975c976
Compare
| { | ||
| await StopCore(cancellationToken).ConfigureAwait(false); | ||
| } | ||
| catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) |
There was a problem hiding this comment.
Simon added a log warn here but I'm not sure if log warn is really providing useful context
Another angle worth considering is whether StopCore should handle the cancellation gracefully. Currently it rethrows so the public behavior of the paths that call Stop is to rethrow if the transport shutdown throws. I was hesitant to change this too, but technically we could argue it shouldn't rethrow, but I think this is another vector of change. For disposal, it definitely should not.
| // Empty ReceiveComponent (no receivers) so Stop is a no-op and StopCore reaches | ||
| // transport.Shutdown without us having to stand up the full receive pipeline. | ||
| static ReceiveComponent CreateEmptyReceiveComponent() => | ||
| new(configuration: null!, activityFactory: null!, endpointLogSlot: null!); |
There was a problem hiding this comment.
I'm fine with this ctor hack
| { | ||
| [Test] | ||
| [CancelAfter(30000)] | ||
| public async Task Should_initiate_immediate_handler_cancellation(CancellationToken cancellationToken = default) => |
There was a problem hiding this comment.
I figured this doesn't require an assertion, but if you want, I can add an assert that it doesn't throw a cancellation exception back in case we strictly want to follow the assertion pattern. That being said the ATT framework would fail anyway.
|
@andreasohlund @DavidBoike #7806 (comment) I think what Simon mentioned makes sense, and I pushed an update that goes in that direction. Please take a look. |
|
Before we merge this, I would like to update the title and the description. |
|
@danielmarbach happy for you to take full control of this PR |
RunningEndpointInstance.DisposeAsync shutdown wait…sposed without StopAsync (#7806) (#7818) * DisposeAsync should pass a bounded cancellation token to transport Shutdown * . * Update RunningEndpointInstance.cs * . * , * Update StartableEndpoint.cs * Update RunningEndpointInstance.cs * Remove shutdown token timeout since it would create akward split between dispose and shutdown. Instead cancel the token immediately on dispose. * Core acceptance test * Ignore exceptions during DisposeAsync to prevent throwing from disposal process --------- Co-authored-by: Simon Cropp <simon.cropp@gmail.com> Co-authored-by: Daniel Marbach <danielmarbach@users.noreply.github.com>
Problem
RunningEndpointInstance.DisposeAsync()callsStopCore()without aCancellationToken.StopCoreforwardsCancellationToken.NonetoreceiveComponent.Stop,featureComponent.StopFeatures, andtransportInfrastructure.Shutdown. If any of those does not return on its own, disposal hangs forever.This manifests when an
IHostis disposed without a priorStopAsynccall, for example viausing var host = ...followed by scope exit. The generic host'sDisposeAsynconly tears down the service provider and does not callStopAsyncon hosted services. The normal lifecycle path (EndpointHostedService.StopAsyncthroughBaseEndpointLifecycle.Stop) is unaffected because it properly threads the host's shutdown-timeout token.The pattern that hits this is most common in integration tests, where a test spins up an
IHostper test, exercises the endpoint, and lets the using scope tear it down without callingStopAsyncfirst.Design consideration
An earlier iteration of this PR attempted to apply a shutdown timeout within
DisposeAsync, effectively emulatingStopAsyncsemantics during disposal. This was reconsidered because it would introduce a second shutdown path that behaves differently depending on how the endpoint is hosted. The cancellation token in the stop path is part of the graceful shutdown contract: transports stop accepting new messages first and then allow in-flight processing to complete. Only when the token is cancelled does cooperative cancellation kick in. MakingDisposeAsyncmimic that process with its own timeout would blur the distinction between stopping and disposing.The correct usage when manually controlling the host lifecycle remains:
Fix
Given the above,
DisposeAsyncnow passes an already-canceled token toStopCore. This does not attempt graceful shutdown. Instead, it signals that the graceful shutdown period has elapsed and any ongoing operations should abort immediately if they participate in cooperative cancellation. This preventsDisposeAsyncfrom hanging while avoiding a competing shutdown path with its own timeout semantics.Exceptions from
StopCoreare caught and ignored since they are already logged andDisposeAsyncshould not throw. Resource cleanup (settings.Clear,stoppingTokenSource.Dispose,serviceProviderLease.DisposeAsync) runs in afinallyblock to ensure it always executes.ReceiveComponent's constructor was promoted from private tointernalso the new unit tests can build minimal instances directly. No public API surface is affected;RunningEndpointInstance,StartableEndpoint, andReceiveComponentare all internal types.Tests
DisposeAsyncpasses a cancelable token totransport.ShutdownDisposeAsynccompletes even whentransport.Shutdownhangs