feat(mcp_catalog): block enable_remote_mcp_server until the server is connected#2973

trungutt · 2026-06-02T16:34:45Z

Issues on main this PR fixes

1. The user has to re-ask their original question after enabling a server

enable_remote_mcp_server's handler defers the MCP connect (and any OAuth handshake) to the next Tools() enumeration. The tool's result text and the catalog Instructions block both tell the model "tools appear on your next turn". Every model reading that ends its turn after enable with a "please try again" message; the user then has to re-send their original question for the activated server's tools to actually be invoked.

User : List all Notion posts by David
Model: <calls search → calls enable("notion")>
       Notion is being set up. Please try again on your next message.
User : List all Notion posts by David          ← forced re-ask
Model: <now actually uses notion_* tools>

This was never a runtime constraint — getTools rebuilds the tool list at the top of every iteration of the run loop, and reprobe (with its regression test TestReprobe_NewToolsAvailableAfterToolCall) explicitly supports new tools appearing mid-stream. The deferred-Start design was an artifact: registering the toolset was cheap, but the prompt then had to admit the runtime couldn't guarantee tools would actually be available, since OAuth might be declined or the server might refuse the connection.

After this PR handleEnable drives Start synchronously inside the tool call. OAuth elicitation surfaces a dialog and blocks until the user responds; the success / failure result reaches the model at the point of the call, in the same RunStream. The Instructions block and the tool description are rewritten to match (no more "appears on your next turn" wording on the happy path).

2. OAuth is still required on the next question even if the previous OAuth failed

If the user stops a turn while the OAuth dialog is parked (the browser-side flow failed externally, the user gave up, etc.), Start returns context.Canceled. On main this falls into a generic warning-log branch and the entry stays in t.enabled in an unstarted state. The next user message — typically unrelated to the abandoned server — calls Tools(), which silently re-Starts the wrapper, hits 401, and re-pops the same dialog the user just abandoned.

The root cause is a lifetime mismatch: the catalog Toolset is owned by the RuntimeSession, not the RunStream, so Toolset.Stop does NOT run between a stopped turn and the user's next message.

                  RuntimeSession  (lives across many user messages)
┌─────────────────────────────────────────────────────────────┐
│  RunStream #1   │   RunStream #2   │   RunStream #3         │
│   Stop here─┘                                               │
│                                                             │
│  mcpcatalog.Tools()        runs at the start of every       │
│                            RunStream — iterates t.enabled   │
│                            and Start()s any unstarted entry │
│                                                             │
│  mcpcatalog.Toolset.Stop() runs ONCE, at session teardown   │
└─────────────────────────────────────────────────────────────┘

context.Canceled is now routed through disableAfterDecline (the same path that already handles an explicit OAuthDeclined) in both code paths that can Start a catalog server — the synchronous handleEnable and the per-turn Tools() iteration. The model-facing result tells the model the user stopped the turn while authorization was pending and to only retry if the user asks.

Public docs at docs/tools/mcp-catalog/index.md are updated for the new contract.

… connected The deferred-Start semantics of enable_remote_mcp_server pushed the OAuth handshake (and any connect error) out of the tool handler and into the next Tools() enumeration. The model therefore had to guess what would happen on its next turn, and the tool's result text + Instructions block told it to wait. In practice, models reading those instructions ended their turn after enable, asking the user to retry — which forced the user to re-ask their original question even though the runtime can already pick up new tools mid-RunStream via the top-of-loop getTools(). Drive Start synchronously from the handler so enable returns a deterministic success / OAuth-declined / auth-required / transport-error result in the same turn. Tool calls run with an interactive context, so OAuth elicitation surfaces a dialog and blocks until the user responds; on success the model has the activated tools available on its very next response within the same RunStream and continues with the user's original request — no second user message needed. Missing api_key env vars / unresolved ${VAR} headers now block enable with an error result instead of being attached as a warning to a success result; the runtime already knows the server will reject the connection, so registering it would only lead the model to hallucinate access.

The idempotent guard at the top of handleEnable short-circuited every re-enable on a registered entry, regardless of whether the previous Start had actually completed. This created two dead-ends the model could not escape on its own: - The AuthorizationRequired branch tells the model to call enable again to surface a fresh OAuth dialog, but the second call was intercepted by the guard with an "already enabled but not yet connected" message. The dialog never re-appeared. - A context.Canceled mid-handshake (Ctrl+C, soft per-turn cancel) left an unstarted entry behind. A subsequent enable hit the same guard. Recovery relied on Toolset.Stop firing between turns to clean the entry up — an invariant the catalog can't enforce. Both paths converge on the same fix: detect the existing-but-unstarted case in the guard and re-attempt Start on the wrapper. The fresh-enable path keeps its existing handler-attachment and tools_changed notify; the retry path reuses the wrapper as-is, avoiding a spurious tool-surface-change signal. StartableToolSet.Start is idempotent and single-flight, so the retry re-invokes the inner connect without re-running the bookkeeping when the previous attempt left the wrapper in the not-started state.

trungutt · 2026-06-03T06:16:07Z

+		return tools.ResultError(fmt.Sprintf(
+			"user declined the authorization dialog for %q (%s). No tools were activated — do NOT claim the server is connected and do NOT call any %q tools. Tell the user the request needs them to authorize the connection. If the user then says \"yes\", \"retry\", or re-asks for the same thing, call %s for %q again to surface a fresh authorization dialog.",
+			id, server.Title, id+"_", ToolNameEnable, id))
+	case mcp.IsAuthorizationRequired(err):


Good catch — fixed in 33d73f5.

The root cause was the same for both your comments: the top-of-handleEnable guard short-circuited any entry already in t.enabled regardless of IsStarted(). Now the guard detects existing-but-unstarted entries and falls through to a fresh Start attempt on the same wrapper, so the AuthorizationRequired branch's "call enable again to surface a fresh authorization dialog" instruction actually does something.

StartableToolSet.Start is already idempotent and single-flight: when the previous attempt left the wrapper not-started, the retry re-invokes the inner Start without re-creating the toolset or re-firing tools_changed. The "already enabled but not yet connected" dead-end message is gone.

Regression test: TestEnableRetriesStartOnExistingUnstartedEntry.

trungutt · 2026-06-03T06:16:09Z

+		return tools.ResultSuccess(fmt.Sprintf(
+			"enable requested for %q (%s); authorization is required and the host will surface the dialog. On your next turn, if tools whose names start with %q appear in your available tools, proceed with the user's original request using them. If NO such tools appear, the user dismissed the dialog — tell them the request needs them to authorize, and call %s for %q again if they want to retry.",
+			id, server.Title, id+"_", ToolNameEnable, id))
+	case errors.Is(err, context.Canceled):


Fixed in 33d73f5 — same root cause as the AuthorizationRequired comment above.

The context.Canceled branch still intentionally leaves the entry in t.enabled (rolling back would race a Ctrl+C-driven Toolset.Stop on the same map), but the dead-end you flagged is gone: the next enable for the same id now falls through the guard and re-attempts Start on the existing wrapper. The "call enable_remote_mcp_server again to retry" hint in the cancel result text has somewhere to land.

I also added a sentence to the branch comment so the invariant the design relies on (guard retries unstarted entries) is documented at the failure site, not just at the guard.

Regression test: TestEnableRetriesStartAfterCancellation.

…ndleEnable The previous comments on the context.Canceled branch invoked a "softer per-turn cancellation / timeout" path to justify the no-rollback choice, but no such path exists today — in a tool handler, ctx only cancels when the whole RunStream is shutting down. Reword the comments to state that fact directly and lean on the existing safety properties (Toolset.Stop cleans up the entry, top-of-function guard would re-Start an unstarted wrapper anyway) without speculating about hypothetical callers.

trungutt · 2026-06-03T07:16:20Z

+	// the guard.
 	t.mu.Lock()
-	if _, exists := t.enabled[id]; exists {
+	wrapped, alreadyEnabled := t.enabled[id]


I don't think this race exists — the lookup and the insert both happen inside the same t.mu critical section, so two goroutines for the same id can't both observe alreadyEnabled=false:

t.mu.Lock() // ← serializes wrapped, alreadyEnabled := t.enabled[id] // read ... if !alreadyEnabled { mcpToolset := mcp.NewRemoteToolset(...) ... wrapped = tools.NewStartable(mcpToolset) t.enabled[id] = wrapped // write — still under lock notify = t.toolsChangedHandler } t.mu.Unlock()

Whichever goroutine acquires the lock first does the read, creates the toolset, inserts it, and releases. The second goroutine then acquires, finds the just-inserted entry, observes alreadyEnabled=true, and either returns the "already enabled and connected" fast path or falls through to the Start retry — which delegates to StartableToolSet.Start, itself documented as single-flight (it holds an internal mutex and short-circuits if s.started is already true). So we get at most one NewRemoteToolset, one notify(), and one in-flight Start per id.

I don't want to plumb a "pending" sentinel through this path just to defend against a race the lock already prevents — happy to revisit if I'm missing a scenario where the lock is released between the read and the write.

trungutt · 2026-06-03T07:16:23Z

+		return tools.ResultError(fmt.Sprintf(
+			"user declined the authorization dialog for %q (%s). No tools were activated — do NOT claim the server is connected and do NOT call any %q tools. Tell the user the request needs them to authorize the connection. If the user then says \"yes\", \"retry\", or re-asks for the same thing, call %s for %q again to surface a fresh authorization dialog.",
+			id, server.Title, id+"_", ToolNameEnable, id))
+	case mcp.IsAuthorizationRequired(err):


The runtime already has a generic safeguard for this: the loop detector at pkg/runtime/loop.go (NewLoopDetector(loopThreshold, ...), default 5) terminates the agent with ErrorCodeLoopDetected after N consecutive identical tool calls. A model that keeps re-issuing enable_remote_mcp_server with the same id would trip it long before this turned into a real infinite loop, and the failure surface is already user-visible.

I don't think adding a per-id retry counter inside the catalog earns its keep when the runtime already handles "model is stuck calling the same tool" generically — it'd be a second, narrower copy of the same policy.

For context, this branch is only reachable when Start returns AuthorizationRequired, which happens when the request context disallows interactive prompts (mcp.WithoutInteractivePrompts). Tool handlers run with the interactive context, so in normal operation Start blocks on the OAuth elicitation instead and we never enter this branch. It is, as the comment says, a defensive fallback for hosts that haven't wired up the elicitation bridge.

docker-agent

Assessment: 🟢 APPROVE

The synchronous-connect refactor is well-structured. The mutex discipline is correct, the injectable startToolset seam keeps tests deterministic, and all four Start-failure branches (OAuthDeclined, AuthorizationRequired, Canceled, transport error) are individually tested with clear rollback semantics. The pre-flight env-var guard is a welcome improvement — blocking instead of warning prevents the model from claiming a connection that will immediately fail.

Three minor observations (none block merge):

stubStartOK/stubStartErr bypass wrapped.Start so wrapped.IsStarted() is never true in stub-based tests. The "already enabled and connected" fast-path guard (alreadyEnabled && IsStarted()) is consequently never exercised. Production behaviour is correct because the real startToolset calls wrapped.Start(ctx) which sets the flag. A future test could use a real StartableToolSet with a no-op inner toolset to close this gap — not urgent.
disableAfterDecline called with a potentially-cancelled ctx in the default error branch. A wrapped error containing context.Canceled falls through to default (only unwrapped context.Canceled matches the earlier branch), so wrapped.Stop(ctx) receives a cancelled context. Stop silently swallows its error, so no observable failure — low risk.
Concurrent re-enable of an unstarted entry: two goroutines can both pass alreadyEnabled && !IsStarted() and both call startToolset. Because StartableToolSet.Start is single-flight, this is benign — the second call returns immediately. Mentioned for completeness.

All high-risk paths (rollback correctness, error-branch coverage, idempotency) look solid.

docker-agent

Assessment: 🟢 APPROVE

The PR correctly implements synchronous blocking in enable_remote_mcp_server with proper error classification (OAuthDeclined, AuthRequired, Canceled, transport error) and appropriate state rollback on each failure path. The disableAfterDecline guard (current == wrapped) correctly handles concurrent failure races. Test coverage for the new failure branches is thorough.

…-OAuth The previous handleEnableStartError context.Canceled branch deliberately kept the entry in t.enabled on the assumption that Toolset.Stop would tear it down 'with the rest of the session'. That assumption was wrong: the mcp-catalog Toolset is owned by the RuntimeSession, which spans many RunStreams. Toolset.Stop only runs at session / process teardown, not when a single RunStream is cancelled. In practice this surfaced as: the user clicks Authenticate on the OAuth dialog, the browser-side flow fails (or the user gives up), the session shows 'streaming' because the elicitation goroutine is parked, the user clicks Stop. oauth.go's userCancelCh case returns context.Canceled, which bubbled through to handleEnableStartError. The entry stayed in t.enabled in an unstarted state. On the user's NEXT message — typically unrelated to the cancelled server — the runtime's per-turn Tools() call iterated t.enabled, re-Started the wrapper, hit 401 again, and re-popped the same OAuth dialog the user had just abandoned. Route context.Canceled through disableAfterDecline the same way OAuthDeclined already is, and update the model-facing result to say 'the user stopped the turn, only retry if they ask'.

…lled Mirrors the handleEnable fix in the previous commit, but for the Tools()-iteration path. When the synchronous handleEnable path falls back to the AuthorizationRequired branch (elicitation bridge not ready at enable time), Start is driven by the next Tools() call. If the user clicks Stop while that Start is parked on the OAuth dialog, Start returns context.Canceled. Before this commit context.Canceled fell into the generic warning-log catch-all and the entry stayed in t.enabled in an unstarted state. The next user message's Tools() iteration silently re-Started the wrapper and re-popped the same dialog on an unrelated question, exactly as in the user-reported scenario. Classify context.Canceled the same way as IsOAuthDeclined: call disableAfterDecline and continue (the outer ctx.Err() check at the top of the next iteration then propagates the cancellation up).

docker-agent

Assessment: 🟢 APPROVE

The PR correctly implements synchronous blocking Start() in handleEnable and properly routes context.Canceled through disableAfterDecline. The logic is sound and the approach resolves both stated issues (forced re-ask and OAuth re-pop on the next turn).

What was checked:

Concurrency safety of handleEnable for concurrent calls on the same unstarted entry → StartableToolSet.Start is documented and implemented as single-flight/idempotent; no race
Handler propagation to existing unstarted entries on retry → Set*Handler methods on Toolset already iterate over t.enabled (including unstarted entries) and push updates through, so the retry path has current handlers
context.Canceled handling in both handleEnable and the Tools() loop → correctly routes to disableAfterDecline in all affected code paths
disableAfterDecline concurrent-access guard (current == wrapped pointer check) → correct
Test coverage for the new synchronous Start path → comprehensive (success, OAuthDeclined, AuthorizationRequired, context.Canceled, retry branches all covered)
Documentation update in docs/tools/mcp-catalog/index.md → consistent with new contract

trungutt requested a review from docker-agent June 2, 2026 16:36

This comment was marked as resolved.

Sign in to view

trungutt commented Jun 3, 2026

View reviewed changes

trungutt requested a review from docker-agent June 3, 2026 06:16

This comment was marked as resolved.

Sign in to view

trungutt commented Jun 3, 2026

View reviewed changes

Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go

trungutt commented Jun 3, 2026

View reviewed changes

trungutt requested a review from docker-agent June 3, 2026 07:17

docker-agent reviewed Jun 3, 2026

View reviewed changes

trungutt marked this pull request as ready for review June 3, 2026 07:31

trungutt requested a review from a team as a code owner June 3, 2026 07:31

docker-agent reviewed Jun 3, 2026

View reviewed changes

trungutt added 2 commits June 3, 2026 10:15

docker deleted a comment from docker-agent Jun 3, 2026

trungutt requested a review from docker-agent June 3, 2026 08:24

docker-agent reviewed Jun 3, 2026

View reviewed changes

dgageot approved these changes Jun 3, 2026

View reviewed changes

dgageot merged commit 4684279 into docker:main Jun 3, 2026
7 checks passed

BrewTestBot mentioned this pull request Jun 3, 2026

docker-agent 1.73.0 Homebrew/homebrew-core#286068

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp_catalog): block enable_remote_mcp_server until the server is connected#2973