Skip to content

feat(mcp_catalog): block enable_remote_mcp_server until the server is connected#2973

Merged
dgageot merged 5 commits into
docker:mainfrom
trungutt:feat/mcp-catalog-blocking-enable
Jun 3, 2026
Merged

feat(mcp_catalog): block enable_remote_mcp_server until the server is connected#2973
dgageot merged 5 commits into
docker:mainfrom
trungutt:feat/mcp-catalog-blocking-enable

Conversation

@trungutt
Copy link
Copy Markdown
Contributor

@trungutt trungutt commented Jun 2, 2026

Issues on main this PR fixes

1. The user has to re-ask their original question after enabling a server

enable_remote_mcp_server's handler defers the MCP connect (and any OAuth handshake) to the next Tools() enumeration. The tool's result text and the catalog Instructions block both tell the model "tools appear on your next turn". Every model reading that ends its turn after enable with a "please try again" message; the user then has to re-send their original question for the activated server's tools to actually be invoked.

User : List all Notion posts by David
Model: <calls search → calls enable("notion")>
       Notion is being set up. Please try again on your next message.
User : List all Notion posts by David          ← forced re-ask
Model: <now actually uses notion_* tools>

This was never a runtime constraint — getTools rebuilds the tool list at the top of every iteration of the run loop, and reprobe (with its regression test TestReprobe_NewToolsAvailableAfterToolCall) explicitly supports new tools appearing mid-stream. The deferred-Start design was an artifact: registering the toolset was cheap, but the prompt then had to admit the runtime couldn't guarantee tools would actually be available, since OAuth might be declined or the server might refuse the connection.

After this PR handleEnable drives Start synchronously inside the tool call. OAuth elicitation surfaces a dialog and blocks until the user responds; the success / failure result reaches the model at the point of the call, in the same RunStream. The Instructions block and the tool description are rewritten to match (no more "appears on your next turn" wording on the happy path).

2. OAuth is still required on the next question even if the previous OAuth failed

If the user stops a turn while the OAuth dialog is parked (the browser-side flow failed externally, the user gave up, etc.), Start returns context.Canceled. On main this falls into a generic warning-log branch and the entry stays in t.enabled in an unstarted state. The next user message — typically unrelated to the abandoned server — calls Tools(), which silently re-Starts the wrapper, hits 401, and re-pops the same dialog the user just abandoned.

The root cause is a lifetime mismatch: the catalog Toolset is owned by the RuntimeSession, not the RunStream, so Toolset.Stop does NOT run between a stopped turn and the user's next message.

                  RuntimeSession  (lives across many user messages)
┌─────────────────────────────────────────────────────────────┐
│  RunStream #1   │   RunStream #2   │   RunStream #3         │
│   Stop here─┘                                               │
│                                                             │
│  mcpcatalog.Tools()        runs at the start of every       │
│                            RunStream — iterates t.enabled   │
│                            and Start()s any unstarted entry │
│                                                             │
│  mcpcatalog.Toolset.Stop() runs ONCE, at session teardown   │
└─────────────────────────────────────────────────────────────┘

context.Canceled is now routed through disableAfterDecline (the same path that already handles an explicit OAuthDeclined) in both code paths that can Start a catalog server — the synchronous handleEnable and the per-turn Tools() iteration. The model-facing result tells the model the user stopped the turn while authorization was pending and to only retry if the user asks.

Public docs at docs/tools/mcp-catalog/index.md are updated for the new contract.

… connected

The deferred-Start semantics of enable_remote_mcp_server pushed the OAuth
handshake (and any connect error) out of the tool handler and into the
next Tools() enumeration. The model therefore had to guess what would
happen on its next turn, and the tool's result text + Instructions block
told it to wait. In practice, models reading those instructions ended
their turn after enable, asking the user to retry — which forced the
user to re-ask their original question even though the runtime can
already pick up new tools mid-RunStream via the top-of-loop getTools().

Drive Start synchronously from the handler so enable returns a
deterministic success / OAuth-declined / auth-required / transport-error
result in the same turn. Tool calls run with an interactive context, so
OAuth elicitation surfaces a dialog and blocks until the user responds;
on success the model has the activated tools available on its very next
response within the same RunStream and continues with the user's
original request — no second user message needed.

Missing api_key env vars / unresolved ${VAR} headers now block enable
with an error result instead of being attached as a warning to a
success result; the runtime already knows the server will reject the
connection, so registering it would only lead the model to hallucinate
access.
@trungutt trungutt requested a review from docker-agent June 2, 2026 16:36
docker-agent

This comment was marked as resolved.

The idempotent guard at the top of handleEnable short-circuited every
re-enable on a registered entry, regardless of whether the previous
Start had actually completed. This created two dead-ends the model
could not escape on its own:

  - The AuthorizationRequired branch tells the model to call enable
    again to surface a fresh OAuth dialog, but the second call was
    intercepted by the guard with an "already enabled but not yet
    connected" message. The dialog never re-appeared.

  - A context.Canceled mid-handshake (Ctrl+C, soft per-turn cancel)
    left an unstarted entry behind. A subsequent enable hit the same
    guard. Recovery relied on Toolset.Stop firing between turns to
    clean the entry up — an invariant the catalog can't enforce.

Both paths converge on the same fix: detect the existing-but-unstarted
case in the guard and re-attempt Start on the wrapper. The fresh-enable
path keeps its existing handler-attachment and tools_changed notify;
the retry path reuses the wrapper as-is, avoiding a spurious
tool-surface-change signal.

StartableToolSet.Start is idempotent and single-flight, so the retry
re-invokes the inner connect without re-running the bookkeeping when
the previous attempt left the wrapper in the not-started state.
return tools.ResultError(fmt.Sprintf(
"user declined the authorization dialog for %q (%s). No tools were activated — do NOT claim the server is connected and do NOT call any %q tools. Tell the user the request needs them to authorize the connection. If the user then says \"yes\", \"retry\", or re-asks for the same thing, call %s for %q again to surface a fresh authorization dialog.",
id, server.Title, id+"_", ToolNameEnable, id))
case mcp.IsAuthorizationRequired(err):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 33d73f5.

The root cause was the same for both your comments: the top-of-handleEnable guard short-circuited any entry already in t.enabled regardless of IsStarted(). Now the guard detects existing-but-unstarted entries and falls through to a fresh Start attempt on the same wrapper, so the AuthorizationRequired branch's "call enable again to surface a fresh authorization dialog" instruction actually does something.

StartableToolSet.Start is already idempotent and single-flight: when the previous attempt left the wrapper not-started, the retry re-invokes the inner Start without re-creating the toolset or re-firing tools_changed. The "already enabled but not yet connected" dead-end message is gone.

Regression test: TestEnableRetriesStartOnExistingUnstartedEntry.

return tools.ResultSuccess(fmt.Sprintf(
"enable requested for %q (%s); authorization is required and the host will surface the dialog. On your next turn, if tools whose names start with %q appear in your available tools, proceed with the user's original request using them. If NO such tools appear, the user dismissed the dialog — tell them the request needs them to authorize, and call %s for %q again if they want to retry.",
id, server.Title, id+"_", ToolNameEnable, id))
case errors.Is(err, context.Canceled):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 33d73f5 — same root cause as the AuthorizationRequired comment above.

The context.Canceled branch still intentionally leaves the entry in t.enabled (rolling back would race a Ctrl+C-driven Toolset.Stop on the same map), but the dead-end you flagged is gone: the next enable for the same id now falls through the guard and re-attempts Start on the existing wrapper. The "call enable_remote_mcp_server again to retry" hint in the cancel result text has somewhere to land.

I also added a sentence to the branch comment so the invariant the design relies on (guard retries unstarted entries) is documented at the failure site, not just at the guard.

Regression test: TestEnableRetriesStartAfterCancellation.

@trungutt trungutt requested a review from docker-agent June 3, 2026 06:16
docker-agent

This comment was marked as resolved.

…ndleEnable

The previous comments on the context.Canceled branch invoked a
"softer per-turn cancellation / timeout" path to justify the no-rollback
choice, but no such path exists today — in a tool handler, ctx only
cancels when the whole RunStream is shutting down. Reword the comments
to state that fact directly and lean on the existing safety properties
(Toolset.Stop cleans up the entry, top-of-function guard would re-Start
an unstarted wrapper anyway) without speculating about hypothetical
callers.
// the guard.
t.mu.Lock()
if _, exists := t.enabled[id]; exists {
wrapped, alreadyEnabled := t.enabled[id]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this race exists — the lookup and the insert both happen inside the same t.mu critical section, so two goroutines for the same id can't both observe alreadyEnabled=false:

t.mu.Lock()                                  // ← serializes
wrapped, alreadyEnabled := t.enabled[id]     // read
...
if !alreadyEnabled {
    mcpToolset := mcp.NewRemoteToolset(...)
    ...
    wrapped = tools.NewStartable(mcpToolset)
    t.enabled[id] = wrapped                   // write — still under lock
    notify = t.toolsChangedHandler
}
t.mu.Unlock()

Whichever goroutine acquires the lock first does the read, creates the toolset, inserts it, and releases. The second goroutine then acquires, finds the just-inserted entry, observes alreadyEnabled=true, and either returns the "already enabled and connected" fast path or falls through to the Start retry — which delegates to StartableToolSet.Start, itself documented as single-flight (it holds an internal mutex and short-circuits if s.started is already true). So we get at most one NewRemoteToolset, one notify(), and one in-flight Start per id.

I don't want to plumb a "pending" sentinel through this path just to defend against a race the lock already prevents — happy to revisit if I'm missing a scenario where the lock is released between the read and the write.

Comment thread pkg/tools/builtin/mcpcatalog/mcpcatalog.go
return tools.ResultError(fmt.Sprintf(
"user declined the authorization dialog for %q (%s). No tools were activated — do NOT claim the server is connected and do NOT call any %q tools. Tell the user the request needs them to authorize the connection. If the user then says \"yes\", \"retry\", or re-asks for the same thing, call %s for %q again to surface a fresh authorization dialog.",
id, server.Title, id+"_", ToolNameEnable, id))
case mcp.IsAuthorizationRequired(err):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime already has a generic safeguard for this: the loop detector at pkg/runtime/loop.go (NewLoopDetector(loopThreshold, ...), default 5) terminates the agent with ErrorCodeLoopDetected after N consecutive identical tool calls. A model that keeps re-issuing enable_remote_mcp_server with the same id would trip it long before this turned into a real infinite loop, and the failure surface is already user-visible.

I don't think adding a per-id retry counter inside the catalog earns its keep when the runtime already handles "model is stuck calling the same tool" generically — it'd be a second, narrower copy of the same policy.

For context, this branch is only reachable when Start returns AuthorizationRequired, which happens when the request context disallows interactive prompts (mcp.WithoutInteractivePrompts). Tool handlers run with the interactive context, so in normal operation Start blocks on the OAuth elicitation instead and we never enter this branch. It is, as the comment says, a defensive fallback for hosts that haven't wired up the elicitation bridge.

@trungutt trungutt requested a review from docker-agent June 3, 2026 07:17
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The synchronous-connect refactor is well-structured. The mutex discipline is correct, the injectable startToolset seam keeps tests deterministic, and all four Start-failure branches (OAuthDeclined, AuthorizationRequired, Canceled, transport error) are individually tested with clear rollback semantics. The pre-flight env-var guard is a welcome improvement — blocking instead of warning prevents the model from claiming a connection that will immediately fail.

Three minor observations (none block merge):

  1. stubStartOK/stubStartErr bypass wrapped.Start so wrapped.IsStarted() is never true in stub-based tests. The "already enabled and connected" fast-path guard (alreadyEnabled && IsStarted()) is consequently never exercised. Production behaviour is correct because the real startToolset calls wrapped.Start(ctx) which sets the flag. A future test could use a real StartableToolSet with a no-op inner toolset to close this gap — not urgent.

  2. disableAfterDecline called with a potentially-cancelled ctx in the default error branch. A wrapped error containing context.Canceled falls through to default (only unwrapped context.Canceled matches the earlier branch), so wrapped.Stop(ctx) receives a cancelled context. Stop silently swallows its error, so no observable failure — low risk.

  3. Concurrent re-enable of an unstarted entry: two goroutines can both pass alreadyEnabled && !IsStarted() and both call startToolset. Because StartableToolSet.Start is single-flight, this is benign — the second call returns immediately. Mentioned for completeness.

All high-risk paths (rollback correctness, error-branch coverage, idempotency) look solid.

@trungutt trungutt marked this pull request as ready for review June 3, 2026 07:31
@trungutt trungutt requested a review from a team as a code owner June 3, 2026 07:31
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The PR correctly implements synchronous blocking in enable_remote_mcp_server with proper error classification (OAuthDeclined, AuthRequired, Canceled, transport error) and appropriate state rollback on each failure path. The disableAfterDecline guard (current == wrapped) correctly handles concurrent failure races. Test coverage for the new failure branches is thorough.

trungutt added 2 commits June 3, 2026 10:15
…-OAuth

The previous handleEnableStartError context.Canceled branch deliberately
kept the entry in t.enabled on the assumption that Toolset.Stop would
tear it down 'with the rest of the session'. That assumption was wrong:
the mcp-catalog Toolset is owned by the RuntimeSession, which spans many
RunStreams. Toolset.Stop only runs at session / process teardown, not
when a single RunStream is cancelled.

In practice this surfaced as: the user clicks Authenticate on the OAuth
dialog, the browser-side flow fails (or the user gives up), the session
shows 'streaming' because the elicitation goroutine is parked, the user
clicks Stop. oauth.go's userCancelCh case returns context.Canceled,
which bubbled through to handleEnableStartError. The entry stayed in
t.enabled in an unstarted state. On the user's NEXT message — typically
unrelated to the cancelled server — the runtime's per-turn Tools() call
iterated t.enabled, re-Started the wrapper, hit 401 again, and re-popped
the same OAuth dialog the user had just abandoned.

Route context.Canceled through disableAfterDecline the same way
OAuthDeclined already is, and update the model-facing result to say
'the user stopped the turn, only retry if they ask'.
…lled

Mirrors the handleEnable fix in the previous commit, but for the
Tools()-iteration path. When the synchronous handleEnable path falls
back to the AuthorizationRequired branch (elicitation bridge not ready
at enable time), Start is driven by the next Tools() call. If the user
clicks Stop while that Start is parked on the OAuth dialog, Start
returns context.Canceled.

Before this commit context.Canceled fell into the generic
warning-log catch-all and the entry stayed in t.enabled in an
unstarted state. The next user message's Tools() iteration silently
re-Started the wrapper and re-popped the same dialog on an unrelated
question, exactly as in the user-reported scenario.

Classify context.Canceled the same way as IsOAuthDeclined: call
disableAfterDecline and continue (the outer ctx.Err() check at the
top of the next iteration then propagates the cancellation up).
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@docker docker deleted a comment from docker-agent Jun 3, 2026
@trungutt trungutt requested a review from docker-agent June 3, 2026 08:24
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

The PR correctly implements synchronous blocking Start() in handleEnable and properly routes context.Canceled through disableAfterDecline. The logic is sound and the approach resolves both stated issues (forced re-ask and OAuth re-pop on the next turn).

What was checked:

  • Concurrency safety of handleEnable for concurrent calls on the same unstarted entry → StartableToolSet.Start is documented and implemented as single-flight/idempotent; no race
  • Handler propagation to existing unstarted entries on retry → Set*Handler methods on Toolset already iterate over t.enabled (including unstarted entries) and push updates through, so the retry path has current handlers
  • context.Canceled handling in both handleEnable and the Tools() loop → correctly routes to disableAfterDecline in all affected code paths
  • disableAfterDecline concurrent-access guard (current == wrapped pointer check) → correct
  • Test coverage for the new synchronous Start path → comprehensive (success, OAuthDeclined, AuthorizationRequired, context.Canceled, retry branches all covered)
  • Documentation update in docs/tools/mcp-catalog/index.md → consistent with new contract

@dgageot dgageot merged commit 4684279 into docker:main Jun 3, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants