feat(mcp_catalog): block enable_remote_mcp_server until the server is connected#2973
Conversation
… connected
The deferred-Start semantics of enable_remote_mcp_server pushed the OAuth
handshake (and any connect error) out of the tool handler and into the
next Tools() enumeration. The model therefore had to guess what would
happen on its next turn, and the tool's result text + Instructions block
told it to wait. In practice, models reading those instructions ended
their turn after enable, asking the user to retry — which forced the
user to re-ask their original question even though the runtime can
already pick up new tools mid-RunStream via the top-of-loop getTools().
Drive Start synchronously from the handler so enable returns a
deterministic success / OAuth-declined / auth-required / transport-error
result in the same turn. Tool calls run with an interactive context, so
OAuth elicitation surfaces a dialog and blocks until the user responds;
on success the model has the activated tools available on its very next
response within the same RunStream and continues with the user's
original request — no second user message needed.
Missing api_key env vars / unresolved ${VAR} headers now block enable
with an error result instead of being attached as a warning to a
success result; the runtime already knows the server will reject the
connection, so registering it would only lead the model to hallucinate
access.
The idempotent guard at the top of handleEnable short-circuited every
re-enable on a registered entry, regardless of whether the previous
Start had actually completed. This created two dead-ends the model
could not escape on its own:
- The AuthorizationRequired branch tells the model to call enable
again to surface a fresh OAuth dialog, but the second call was
intercepted by the guard with an "already enabled but not yet
connected" message. The dialog never re-appeared.
- A context.Canceled mid-handshake (Ctrl+C, soft per-turn cancel)
left an unstarted entry behind. A subsequent enable hit the same
guard. Recovery relied on Toolset.Stop firing between turns to
clean the entry up — an invariant the catalog can't enforce.
Both paths converge on the same fix: detect the existing-but-unstarted
case in the guard and re-attempt Start on the wrapper. The fresh-enable
path keeps its existing handler-attachment and tools_changed notify;
the retry path reuses the wrapper as-is, avoiding a spurious
tool-surface-change signal.
StartableToolSet.Start is idempotent and single-flight, so the retry
re-invokes the inner connect without re-running the bookkeeping when
the previous attempt left the wrapper in the not-started state.
| return tools.ResultError(fmt.Sprintf( | ||
| "user declined the authorization dialog for %q (%s). No tools were activated — do NOT claim the server is connected and do NOT call any %q tools. Tell the user the request needs them to authorize the connection. If the user then says \"yes\", \"retry\", or re-asks for the same thing, call %s for %q again to surface a fresh authorization dialog.", | ||
| id, server.Title, id+"_", ToolNameEnable, id)) | ||
| case mcp.IsAuthorizationRequired(err): |
There was a problem hiding this comment.
Good catch — fixed in 33d73f5.
The root cause was the same for both your comments: the top-of-handleEnable guard short-circuited any entry already in t.enabled regardless of IsStarted(). Now the guard detects existing-but-unstarted entries and falls through to a fresh Start attempt on the same wrapper, so the AuthorizationRequired branch's "call enable again to surface a fresh authorization dialog" instruction actually does something.
StartableToolSet.Start is already idempotent and single-flight: when the previous attempt left the wrapper not-started, the retry re-invokes the inner Start without re-creating the toolset or re-firing tools_changed. The "already enabled but not yet connected" dead-end message is gone.
Regression test: TestEnableRetriesStartOnExistingUnstartedEntry.
| return tools.ResultSuccess(fmt.Sprintf( | ||
| "enable requested for %q (%s); authorization is required and the host will surface the dialog. On your next turn, if tools whose names start with %q appear in your available tools, proceed with the user's original request using them. If NO such tools appear, the user dismissed the dialog — tell them the request needs them to authorize, and call %s for %q again if they want to retry.", | ||
| id, server.Title, id+"_", ToolNameEnable, id)) | ||
| case errors.Is(err, context.Canceled): |
There was a problem hiding this comment.
Fixed in 33d73f5 — same root cause as the AuthorizationRequired comment above.
The context.Canceled branch still intentionally leaves the entry in t.enabled (rolling back would race a Ctrl+C-driven Toolset.Stop on the same map), but the dead-end you flagged is gone: the next enable for the same id now falls through the guard and re-attempts Start on the existing wrapper. The "call enable_remote_mcp_server again to retry" hint in the cancel result text has somewhere to land.
I also added a sentence to the branch comment so the invariant the design relies on (guard retries unstarted entries) is documented at the failure site, not just at the guard.
Regression test: TestEnableRetriesStartAfterCancellation.
…ndleEnable The previous comments on the context.Canceled branch invoked a "softer per-turn cancellation / timeout" path to justify the no-rollback choice, but no such path exists today — in a tool handler, ctx only cancels when the whole RunStream is shutting down. Reword the comments to state that fact directly and lean on the existing safety properties (Toolset.Stop cleans up the entry, top-of-function guard would re-Start an unstarted wrapper anyway) without speculating about hypothetical callers.
| // the guard. | ||
| t.mu.Lock() | ||
| if _, exists := t.enabled[id]; exists { | ||
| wrapped, alreadyEnabled := t.enabled[id] |
There was a problem hiding this comment.
I don't think this race exists — the lookup and the insert both happen inside the same t.mu critical section, so two goroutines for the same id can't both observe alreadyEnabled=false:
t.mu.Lock() // ← serializes
wrapped, alreadyEnabled := t.enabled[id] // read
...
if !alreadyEnabled {
mcpToolset := mcp.NewRemoteToolset(...)
...
wrapped = tools.NewStartable(mcpToolset)
t.enabled[id] = wrapped // write — still under lock
notify = t.toolsChangedHandler
}
t.mu.Unlock()Whichever goroutine acquires the lock first does the read, creates the toolset, inserts it, and releases. The second goroutine then acquires, finds the just-inserted entry, observes alreadyEnabled=true, and either returns the "already enabled and connected" fast path or falls through to the Start retry — which delegates to StartableToolSet.Start, itself documented as single-flight (it holds an internal mutex and short-circuits if s.started is already true). So we get at most one NewRemoteToolset, one notify(), and one in-flight Start per id.
I don't want to plumb a "pending" sentinel through this path just to defend against a race the lock already prevents — happy to revisit if I'm missing a scenario where the lock is released between the read and the write.
| return tools.ResultError(fmt.Sprintf( | ||
| "user declined the authorization dialog for %q (%s). No tools were activated — do NOT claim the server is connected and do NOT call any %q tools. Tell the user the request needs them to authorize the connection. If the user then says \"yes\", \"retry\", or re-asks for the same thing, call %s for %q again to surface a fresh authorization dialog.", | ||
| id, server.Title, id+"_", ToolNameEnable, id)) | ||
| case mcp.IsAuthorizationRequired(err): |
There was a problem hiding this comment.
The runtime already has a generic safeguard for this: the loop detector at pkg/runtime/loop.go (NewLoopDetector(loopThreshold, ...), default 5) terminates the agent with ErrorCodeLoopDetected after N consecutive identical tool calls. A model that keeps re-issuing enable_remote_mcp_server with the same id would trip it long before this turned into a real infinite loop, and the failure surface is already user-visible.
I don't think adding a per-id retry counter inside the catalog earns its keep when the runtime already handles "model is stuck calling the same tool" generically — it'd be a second, narrower copy of the same policy.
For context, this branch is only reachable when Start returns AuthorizationRequired, which happens when the request context disallows interactive prompts (mcp.WithoutInteractivePrompts). Tool handlers run with the interactive context, so in normal operation Start blocks on the OAuth elicitation instead and we never enter this branch. It is, as the comment says, a defensive fallback for hosts that haven't wired up the elicitation bridge.
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The synchronous-connect refactor is well-structured. The mutex discipline is correct, the injectable startToolset seam keeps tests deterministic, and all four Start-failure branches (OAuthDeclined, AuthorizationRequired, Canceled, transport error) are individually tested with clear rollback semantics. The pre-flight env-var guard is a welcome improvement — blocking instead of warning prevents the model from claiming a connection that will immediately fail.
Three minor observations (none block merge):
-
stubStartOK/stubStartErrbypasswrapped.Startsowrapped.IsStarted()is nevertruein stub-based tests. The "already enabled and connected" fast-path guard (alreadyEnabled && IsStarted()) is consequently never exercised. Production behaviour is correct because the realstartToolsetcallswrapped.Start(ctx)which sets the flag. A future test could use a realStartableToolSetwith a no-op inner toolset to close this gap — not urgent. -
disableAfterDeclinecalled with a potentially-cancelled ctx in thedefaulterror branch. A wrapped error containingcontext.Canceledfalls through todefault(only unwrappedcontext.Canceledmatches the earlier branch), sowrapped.Stop(ctx)receives a cancelled context.Stopsilently swallows its error, so no observable failure — low risk. -
Concurrent re-enable of an unstarted entry: two goroutines can both pass
alreadyEnabled && !IsStarted()and both callstartToolset. BecauseStartableToolSet.Startis single-flight, this is benign — the second call returns immediately. Mentioned for completeness.
All high-risk paths (rollback correctness, error-branch coverage, idempotency) look solid.
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The PR correctly implements synchronous blocking in enable_remote_mcp_server with proper error classification (OAuthDeclined, AuthRequired, Canceled, transport error) and appropriate state rollback on each failure path. The disableAfterDecline guard (current == wrapped) correctly handles concurrent failure races. Test coverage for the new failure branches is thorough.
…-OAuth The previous handleEnableStartError context.Canceled branch deliberately kept the entry in t.enabled on the assumption that Toolset.Stop would tear it down 'with the rest of the session'. That assumption was wrong: the mcp-catalog Toolset is owned by the RuntimeSession, which spans many RunStreams. Toolset.Stop only runs at session / process teardown, not when a single RunStream is cancelled. In practice this surfaced as: the user clicks Authenticate on the OAuth dialog, the browser-side flow fails (or the user gives up), the session shows 'streaming' because the elicitation goroutine is parked, the user clicks Stop. oauth.go's userCancelCh case returns context.Canceled, which bubbled through to handleEnableStartError. The entry stayed in t.enabled in an unstarted state. On the user's NEXT message — typically unrelated to the cancelled server — the runtime's per-turn Tools() call iterated t.enabled, re-Started the wrapper, hit 401 again, and re-popped the same OAuth dialog the user had just abandoned. Route context.Canceled through disableAfterDecline the same way OAuthDeclined already is, and update the model-facing result to say 'the user stopped the turn, only retry if they ask'.
…lled Mirrors the handleEnable fix in the previous commit, but for the Tools()-iteration path. When the synchronous handleEnable path falls back to the AuthorizationRequired branch (elicitation bridge not ready at enable time), Start is driven by the next Tools() call. If the user clicks Stop while that Start is parked on the OAuth dialog, Start returns context.Canceled. Before this commit context.Canceled fell into the generic warning-log catch-all and the entry stayed in t.enabled in an unstarted state. The next user message's Tools() iteration silently re-Started the wrapper and re-popped the same dialog on an unrelated question, exactly as in the user-reported scenario. Classify context.Canceled the same way as IsOAuthDeclined: call disableAfterDecline and continue (the outer ctx.Err() check at the top of the next iteration then propagates the cancellation up).
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The PR correctly implements synchronous blocking Start() in handleEnable and properly routes context.Canceled through disableAfterDecline. The logic is sound and the approach resolves both stated issues (forced re-ask and OAuth re-pop on the next turn).
What was checked:
- Concurrency safety of
handleEnablefor concurrent calls on the same unstarted entry →StartableToolSet.Startis documented and implemented as single-flight/idempotent; no race - Handler propagation to existing unstarted entries on retry →
Set*Handlermethods onToolsetalready iterate overt.enabled(including unstarted entries) and push updates through, so the retry path has current handlers context.Canceledhandling in bothhandleEnableand theTools()loop → correctly routes todisableAfterDeclinein all affected code pathsdisableAfterDeclineconcurrent-access guard (current == wrappedpointer check) → correct- Test coverage for the new synchronous Start path → comprehensive (success, OAuthDeclined, AuthorizationRequired, context.Canceled, retry branches all covered)
- Documentation update in
docs/tools/mcp-catalog/index.md→ consistent with new contract
Issues on main this PR fixes
1. The user has to re-ask their original question after enabling a server
enable_remote_mcp_server's handler defers the MCP connect (and any OAuth handshake) to the nextTools()enumeration. The tool's result text and the catalog Instructions block both tell the model "tools appear on your next turn". Every model reading that ends its turn after enable with a "please try again" message; the user then has to re-send their original question for the activated server's tools to actually be invoked.This was never a runtime constraint —
getToolsrebuilds the tool list at the top of every iteration of the run loop, andreprobe(with its regression testTestReprobe_NewToolsAvailableAfterToolCall) explicitly supports new tools appearing mid-stream. The deferred-Start design was an artifact: registering the toolset was cheap, but the prompt then had to admit the runtime couldn't guarantee tools would actually be available, since OAuth might be declined or the server might refuse the connection.After this PR
handleEnabledrives Start synchronously inside the tool call. OAuth elicitation surfaces a dialog and blocks until the user responds; the success / failure result reaches the model at the point of the call, in the same RunStream. The Instructions block and the tool description are rewritten to match (no more "appears on your next turn" wording on the happy path).2. OAuth is still required on the next question even if the previous OAuth failed
If the user stops a turn while the OAuth dialog is parked (the browser-side flow failed externally, the user gave up, etc.), Start returns
context.Canceled. On main this falls into a generic warning-log branch and the entry stays int.enabledin an unstarted state. The next user message — typically unrelated to the abandoned server — callsTools(), which silently re-Starts the wrapper, hits 401, and re-pops the same dialog the user just abandoned.The root cause is a lifetime mismatch: the catalog
Toolsetis owned by theRuntimeSession, not theRunStream, soToolset.Stopdoes NOT run between a stopped turn and the user's next message.context.Canceledis now routed throughdisableAfterDecline(the same path that already handles an explicitOAuthDeclined) in both code paths that can Start a catalog server — the synchronous handleEnable and the per-turn Tools() iteration. The model-facing result tells the model the user stopped the turn while authorization was pending and to only retry if the user asks.Public docs at
docs/tools/mcp-catalog/index.mdare updated for the new contract.