Skip to content

Ensure RunningEndpointInstance.DisposeAsync completes when host is disposed without StopAsync#7806

Merged
danielmarbach merged 10 commits into
Particular:masterfrom
SimonCropp:DisposeAsync-should-pass-a-bounded-cancellation-token-to-transport-Shutdown
Jun 11, 2026
Merged

Ensure RunningEndpointInstance.DisposeAsync completes when host is disposed without StopAsync#7806
danielmarbach merged 10 commits into
Particular:masterfrom
SimonCropp:DisposeAsync-should-pass-a-bounded-cancellation-token-to-transport-Shutdown

Conversation

@SimonCropp

@SimonCropp SimonCropp commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Problem

RunningEndpointInstance.DisposeAsync() calls StopCore() without a CancellationToken. StopCore forwards CancellationToken.None to receiveComponent.Stop, featureComponent.StopFeatures, and transportInfrastructure.Shutdown. If any of those does not return on its own, disposal hangs forever.

This manifests when an IHost is disposed without a prior StopAsync call, for example via using var host = ... followed by scope exit. The generic host's DisposeAsync only tears down the service provider and does not call StopAsync on hosted services. The normal lifecycle path (EndpointHostedService.StopAsync through BaseEndpointLifecycle.Stop) is unaffected because it properly threads the host's shutdown-timeout token.

The pattern that hits this is most common in integration tests, where a test spins up an IHost per test, exercises the endpoint, and lets the using scope tear it down without calling StopAsync first.

Design consideration

An earlier iteration of this PR attempted to apply a shutdown timeout within DisposeAsync, effectively emulating StopAsync semantics during disposal. This was reconsidered because it would introduce a second shutdown path that behaves differently depending on how the endpoint is hosted. The cancellation token in the stop path is part of the graceful shutdown contract: transports stop accepting new messages first and then allow in-flight processing to complete. Only when the token is cancelled does cooperative cancellation kick in. Making DisposeAsync mimic that process with its own timeout would blur the distinction between stopping and disposing.

The correct usage when manually controlling the host lifecycle remains:

await host.StartAsync();
try
{
    // ...
}
finally
{
    await host.StopAsync();
    await host.DisposeAsync();
}

Fix

Given the above, DisposeAsync now passes an already-canceled token to StopCore. This does not attempt graceful shutdown. Instead, it signals that the graceful shutdown period has elapsed and any ongoing operations should abort immediately if they participate in cooperative cancellation. This prevents DisposeAsync from hanging while avoiding a competing shutdown path with its own timeout semantics.

Exceptions from StopCore are caught and ignored since they are already logged and DisposeAsync should not throw. Resource cleanup (settings.Clear, stoppingTokenSource.Dispose, serviceProviderLease.DisposeAsync) runs in a finally block to ensure it always executes.

ReceiveComponent's constructor was promoted from private to internal so the new unit tests can build minimal instances directly. No public API surface is affected; RunningEndpointInstance, StartableEndpoint, and ReceiveComponent are all internal types.

Tests

  • Unit tests verify DisposeAsync passes a cancelable token to transport.Shutdown
  • Unit tests verify DisposeAsync completes even when transport.Shutdown hangs
  • Acceptance test verifies that disposing without stopping initiates immediate handler cancellation

@SimonCropp

Copy link
Copy Markdown
Contributor Author

@danielmarbach can u apply your async expertise on this?

@danielmarbach

danielmarbach commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

I don't think this classifies as a bug. We never supported cancelable dispose before so we will quite likely treat this as a feature/improvement request with lower priority. The hosted service lifecycle takes care of shutting things down properly.

What special thing are you doing in your integration tests that causes this because we cannot observe this across any of our test suites?

@danielmarbach

Copy link
Copy Markdown
Contributor

Ok I think I understand the scenario a bit better now. You are talking about an integration test scenario where you manage the host lifecycle by calling StartAsync and after a while you exit due to cancellation or some other "done condition" and only let the host be disposed of by the surrounding using pattern.

@SimonCropp

Copy link
Copy Markdown
Contributor Author

yeah the problem is the with a misbehaving transport it is effectively an infinite wait with no timeout.

i only noticed it once on one of our build servers, and coincidentally happened to have hangdump wired into that build while i was looking into something else, so i got enough info make a guess at what was happening.

we use localdb via https://github.com/SimonCropp/LocalDb for all our tests. so it is possible the "misbehaving transport" part comes from a bug in there

but either way i dont think infinite wait is a valid scenario in NSB when an using/exception/dispose sequence occurs

@danielmarbach

Copy link
Copy Markdown
Contributor

Thanks for investigating this and for putting together the PR.

After digging into this further, I do not think we should make this change.

The behavior you observed can indeed happen, but it appears to be a consequence of how the generic host lifecycle works. HostOptions.ShutdownTimeout is applied by IHost.StopAsync(...), not by IHost.DisposeAsync(). Looking at the hosting implementation, the shutdown timeout is wired up as part of the stop path:

if (_options.ShutdownTimeout != Timeout.InfiniteTimeSpan)
{
    cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
    cts.CancelAfter(_options.ShutdownTimeout);
    cancellationToken = cts.Token;
}

Whereas IHost.DisposeAsync() does not invoke StopAsync() and instead proceeds directly to disposing resources:

await DisposeAsync(contentRootFileProvider).ConfigureAwait(false);

if (!ReferenceEquals(contentRootFileProvider, _defaultProvider))
{
    await DisposeAsync(_defaultProvider).ConfigureAwait(false);
}

await DisposeAsync(Services).ConfigureAwait(false);

As a result, when a host is started manually and later only disposed without first calling StopAsync, the normal hosted-service shutdown sequence is skipped.

For NServiceBus transports, the cancellation token passed into the stop path is also not merely a timeout for resource cleanup. It is part of the graceful shutdown contract. We intentionally stop accepting new messages first and then allow in-flight message processing to complete successfully. Only when the supplied stop token is cancelled do we signal cooperative cancellation to the message processing operations.

Because of that, I am hesitant to introduce a second shutdown path through RunningEndpointInstance.DisposeAsync that attempts to emulate StopAsync semantics and applies a shutdown timeout there. This would make disposal behave differently depending on how the endpoint was hosted and would blur the distinction between stopping and disposing.

The correct usage when manually controlling the host lifecycle remains:

await host.StartAsync();

try
{
    // ...
}
finally
{
    await host.StopAsync();
    await host.DisposeAsync();
}

This does not mean the observed behavior is ideal. However, fixing it by changing endpoint disposal semantics would effectively make DisposeAsync behave like StopAsync in some hosting scenarios, introducing a second shutdown path with subtly different guarantees.

Given the above, I do not think we should merge this PR.

@DavidBoike @andreasohlund thoughts?

@SimonCropp

Copy link
Copy Markdown
Contributor Author

@danielmarbach I think u r correct on the specific objection, but perhaps that points at a cleaner fix than what I pushed.

I agree DisposeAsync shouldn't take a ShutdownTimeout budget and do a graceful drain. That does emulate StopAsync and forks shutdown into two paths with subtly different guarantees.

But "disposed without a prior StopAsync" isn't undefined territory — the framework already says what it should do, and it isn't "wait forever." BackgroundService.Dispose() is just _stoppingCts?.Cancel():
https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Hosting.Abstractions/src/BackgroundService.cs

When a hosted service is disposed without a prior stop, the in-flight token is cancelled immediately — not awaited gracefully. And per the StopAsync docs the token "indicates that the shutdown process should no longer be graceful," which is exactly the NSB stop contract you described. Kestrel draws the same line in its own source: StopAsync is commented "Graceful shutdown if possible" and the dispose path "Ungraceful shutdown":
https://github.com/dotnet/aspnetcore/blob/main/src/Servers/Kestrel/Core/src/KestrelServer.cs

So perhaps the fix isn't to make DisposeAsync graceful-with-timeout — it's the opposite. Pass an already-cancelled (or near-zero) token so disposal is explicitly the ungraceful path. That doesn't blur stop vs dispose, it sharpens the distinction the framework already draws:

  • StopAsync = graceful, bounded by HostOptions.ShutdownTimeout
  • DisposeAsync = non-graceful, signal cooperative cancellation and clean up

The only thing the current code gets wrong is passing CancellationToken.None into a path that — unlike Host.StopAsync, which re-bounds None with ShutdownTimeout — never re-bounds it. So a misbehaving transport hangs forever instead of being told to stop. Note the host extensions pass None on purpose precisely because the receiver re-bounds it; StopCore doesn't, which is why the same None is safe there and unsafe here:
https://github.com/dotnet/runtime/blob/main/src/libraries/Microsoft.Extensions.Hosting.Abstractions/src/HostingAbstractionsHostExtensions.cs

But happy to hand over any further changes to u. close the PR if u dont need it

@danielmarbach

Copy link
Copy Markdown
Contributor

@SimonCropp that's an interesting angle. Let me think about it

…een dispose and shutdown. Instead cancel the token immediately on dispose.
@danielmarbach danielmarbach force-pushed the DisposeAsync-should-pass-a-bounded-cancellation-token-to-transport-Shutdown branch from e78714a to 255bc0f Compare June 9, 2026 15:24
@danielmarbach danielmarbach force-pushed the DisposeAsync-should-pass-a-bounded-cancellation-token-to-transport-Shutdown branch from ad1ca4b to 975c976 Compare June 9, 2026 15:42
{
await StopCore(cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simon added a log warn here but I'm not sure if log warn is really providing useful context

Another angle worth considering is whether StopCore should handle the cancellation gracefully. Currently it rethrows so the public behavior of the paths that call Stop is to rethrow if the transport shutdown throws. I was hesitant to change this too, but technically we could argue it shouldn't rethrow, but I think this is another vector of change. For disposal, it definitely should not.

// Empty ReceiveComponent (no receivers) so Stop is a no-op and StopCore reaches
// transport.Shutdown without us having to stand up the full receive pipeline.
static ReceiveComponent CreateEmptyReceiveComponent() =>
new(configuration: null!, activityFactory: null!, endpointLogSlot: null!);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this ctor hack

{
[Test]
[CancelAfter(30000)]
public async Task Should_initiate_immediate_handler_cancellation(CancellationToken cancellationToken = default) =>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured this doesn't require an assertion, but if you want, I can add an assert that it doesn't throw a cancellation exception back in case we strictly want to follow the assertion pattern. That being said the ATT framework would fail anyway.

@danielmarbach

Copy link
Copy Markdown
Contributor

@andreasohlund @DavidBoike #7806 (comment) I think what Simon mentioned makes sense, and I pushed an update that goes in that direction. Please take a look.

@danielmarbach

Copy link
Copy Markdown
Contributor

Before we merge this, I would like to update the title and the description.

@SimonCropp

Copy link
Copy Markdown
Contributor Author

@danielmarbach happy for you to take full control of this PR

@danielmarbach danielmarbach changed the title Bound RunningEndpointInstance.DisposeAsync shutdown wait Prevent RunningEndpointInstance.DisposeAsync from hanging when transport shutdown stalls Jun 11, 2026
@danielmarbach danielmarbach changed the title Prevent RunningEndpointInstance.DisposeAsync from hanging when transport shutdown stalls Ensure RunningEndpointInstance.DisposeAsync completes when host is disposed without StopAsync Jun 11, 2026
@danielmarbach danielmarbach merged commit 42c08d4 into Particular:master Jun 11, 2026
4 checks passed
danielmarbach added a commit that referenced this pull request Jun 11, 2026
…sposed without StopAsync (#7806) (#7818)

* DisposeAsync should pass a bounded cancellation token to transport Shutdown

* .

* Update RunningEndpointInstance.cs

* .

* ,

* Update StartableEndpoint.cs

* Update RunningEndpointInstance.cs

* Remove shutdown token timeout since it would create akward split between dispose and shutdown. Instead cancel the token immediately on dispose.

* Core acceptance test

* Ignore exceptions during DisposeAsync to prevent throwing from disposal process

---------

Co-authored-by: Simon Cropp <simon.cropp@gmail.com>
Co-authored-by: Daniel Marbach <danielmarbach@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants