Skip to content

refactor(runtime): extract tool execution into pkg/runtime/toolexec#2545

Merged
dgageot merged 5 commits intodocker:mainfrom
dgageot:board/refactoring-tool-execution-from-runtime-88169cdd
Apr 28, 2026
Merged

refactor(runtime): extract tool execution into pkg/runtime/toolexec#2545
dgageot merged 5 commits intodocker:mainfrom
dgageot:board/refactoring-tool-execution-from-runtime-88169cdd

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented Apr 27, 2026

Goal

Make tool execution easier to read, test, and build on top of, by extracting it from pkg/runtime into a self-contained pkg/runtime/toolexec sub-package — following the existing pkg/runtime/compactor and pkg/runtime/delegator precedents.

pkg/runtime/tool_dispatch.go shrinks from 442 LOC of orchestration to 123 LOC of runtime adapters, and the new package is fully unit-testable without spinning up a runtime.

Public API

Zero changes. runtime.Runtime, runtime.ResumeRequest, runtime.ResumeType, the ResumeType* constants, runtime.LocalRuntime.Resume(...), the Run/RunStream/Steer/FollowUp flow — all byte-for-byte identical. Resume types are now type aliases over toolexec.

Commits (each one builds, lints, and tests cleanly)

# Commit What moved
1 extract pure tool-execution utilities into pkg/runtime/toolexec Loop detector + per-tool model override (already pure helpers)
2 move hooks-input helpers into toolexec parseToolInput, newHooksInput (+ tests)
3 extract approval-pipeline decision into toolexec.Decide The yolo → checkers → read-only → default ladder, as a pure Decide function with full unit-test matrix
4 extract Dispatcher into pkg/runtime/toolexec toolexec.Dispatcher + Emitter / HookDispatcher interfaces, with 10 dispatcher unit tests covering routing, approval outcomes, and batch-cancellation synthesis
5 simplify dispatcher with per-call value type Internal cleanup: per-call *call value type to cut 7-arg method signatures, plus 4 review fixes (see below)

What lives in pkg/runtime/toolexec now

pkg/runtime/toolexec/
├── doc.go                      # explains the package's purpose
├── loop_detector.go            # consecutive-duplicate detector
├── model_override.go           # per-toolset model-override resolution
├── hooks_input.go              # ParseToolInput, NewHooksInput
├── permissions.go              # Decide(yolo, checkers, name, args, readOnly)
├── dispatcher.go               # Dispatcher + Process(ctx, sess, calls, tools, em)
└── *_test.go                   # 30+ unit tests, no runtime needed

Design highlights

  • Dispatcher carries its dependencies as struct fields: Tracer, Hooks, Resume, AgentFor, Permissions, Handlers. No method-on-LocalRuntime hidden state.
  • Emitter is a 5-method interface covering only the dispatch-shaped events (ToolCall, ToolCallResponse, ToolCallConfirmation, HookBlocked, MessageAdded). Anything else stays runtime-private.
  • HookDispatcher is a 2-method interface so the dispatcher doesn't depend on the runtime's hook registry directly.
  • Approval policy is pure: toolexec.Decide(...) → PermissionDecision{Outcome, Reason, Source}. Every log message the original code emitted is preserved via the Reason field, with no behavior change.
  • Per-call state lives in a *call value type so helper methods take (ctx) or (ctx, runTool) instead of 7 arguments.
  • ctx is never stored in a struct — passed explicitly to every method that needs it (idiomatic Go, satisfies containedctx).

Validation

  • go build ./...
  • go vet ./...
  • mise lint (golangci-lint + custom 800-file Go cop + go mod tidy --diff) → 0 issues
  • mise testall packages pass, no failures ✅
  • go test -race -count=1 ./pkg/runtime/...

Reviewing tip

Each commit is independently reviewable. For the cleanest reading order:

  1. Start with commit 1 to see the package's seed (pure utilities).
  2. Skim commits 2–3 for the small, mechanical extractions.
  3. Spend most time on commit 4 (the Dispatcher extraction) — that's where the architectural shape lives.
  4. Commit 5 is internal cleanup of commit 4; the diff is mostly mechanical (per-call struct rather than wide signatures).

Suggested follow-ups (not in this PR)

  • Move ToolHandlerFunc and r.toolMap registration into toolexec so the runtime stops needing to bind handlers to a chan Event per stream.
  • Move pkg/runtime/event.go constructors to pkg/runtime/events, then drop the chanEmitter adapter in favour of a direct chan Event implementation of Emitter.

@dgageot dgageot requested a review from a team as a code owner April 27, 2026 15:57
@dgageot dgageot force-pushed the board/refactoring-tool-execution-from-runtime-88169cdd branch from 10934f2 to f191a16 Compare April 27, 2026 17:22
rumpl
rumpl previously approved these changes Apr 28, 2026
dgageot added 5 commits April 28, 2026 09:11
…time/toolexec

Mirrors the compactor and delegator extractions: a new sub-package

hosts the tool-execution primitives that don't need runtime-private

state, leaving runtime to shrink toward pure orchestration.

Moved verbatim (renamed to exported APIs, no behaviour change):

  - tool_loop_detector.go -> toolexec/loop_detector.go

    toolLoopDetector       -> LoopDetector

    newToolLoopDetector(\u2026) -> NewLoopDetector(\u2026)

    record/reset/consecutive -> Record/Reset/Consecutive()

  - tool_model_override.go -> toolexec/model_override.go

    resolveToolCallModelOverride -> ResolveModelOverride

Tests follow each file. loop.go is updated to call the new APIs.

Assisted-By: docker-agent
Lift the two pure helpers used to build hooks.Input from a tool call out

of pkg/runtime and into pkg/runtime/toolexec, where they belong with the

rest of the tool-execution primitives:

  newHooksInput  -> toolexec.NewHooksInput

  parseToolInput -> toolexec.ParseToolInput

Both functions are pure and depend only on pkg/hooks, pkg/session and

pkg/tools — packages already imported by toolexec callers — so the move

doesn't introduce a new dependency.

Tests added for ParseToolInput (empty/invalid/valid/empty-object) and

NewHooksInput (field population, invalid-arguments path).

tool_dispatch.go is updated to call the toolexec.* APIs; no behaviour

change.

Assisted-By: docker-agent
…ecide

The yolo-then-checkers-then-readonly-then-default ladder used to live as

a tangled switch inside (*LocalRuntime).executeWithApproval, mixing the

policy decision with three different runtime side effects (run the tool,

emit a deny error response, ask the user).

Lift the policy out as a pure function:

  toolexec.Decide(yoloApproved, []NamedChecker, name, args, readOnly)

      -> PermissionDecision{Outcome, Reason, Source}

Outcome   = Allow | Deny | Ask          (what to do)

Reason    = Yolo | Checker | ReadOnlyHint | Default

                                          (why; lets the caller log

                                          the same messages as before)

Source    = checker label, when Reason == Checker

executeWithApproval becomes a flat 3-arm switch that just translates the

decision into runtime side effects. permissionChecker (the unexported

pair type) is replaced by toolexec.NamedChecker so the runtime adapter

permissionCheckers() returns the type Decide consumes directly.

ForceAsk now correctly overrides the read-only fast path (it always did,

but the new Decide makes the invariant explicit and unit-testable).

12 table tests cover the full matrix in pkg/runtime/toolexec without

needing a runtime; runtime tests still pass unchanged.

Assisted-By: docker-agent
The 442-line tool dispatch in pkg/runtime/tool_dispatch.go used to be a

set of methods on *LocalRuntime that pulled in tracer, hooks, resume

channel, agent resolver, permissions, runtime-managed handlers, and the

events channel implicitly via the receiver. Extract that orchestration

into a self-contained toolexec.Dispatcher with explicit dependencies, so

any embedder can drive tool execution without dragging the runtime in.

API:

  type Dispatcher struct {

    Tracer      trace.Tracer

    Hooks       HookDispatcher       // 2-method interface

    Resume      <-chan ResumeRequest

    AgentFor    func(*session.Session) *agent.Agent

    Permissions func(*session.Session) []NamedChecker

    Handlers    map[string]ToolHandler

  }

  func (d *Dispatcher) Process(ctx, sess, calls, agentTools, em Emitter)

Emitter is a 5-method interface covering the dispatch-shaped events

(ToolCall, ToolCallResponse, ToolCallConfirmation, HookBlocked,

MessageAdded). Anything else stays a runtime-private concern.

Runtime side becomes a thin adapter:

  pkg/runtime/tool_dispatch.go (~140 LOC, was 442) wires:

    chanEmitter      — chan Event sender for the 5 typed events

    hookDispatcher   — wraps r.dispatchHook + r.executeOnUserInputHooks

    bindRuntimeHandlers — closes chan Event into ToolHandler shape

  pkg/runtime/runtime.go aliases the shared types so the public

  Runtime API is byte-for-byte identical:

    runtime.ResumeRequest = toolexec.ResumeRequest

    runtime.ResumeType    = toolexec.ResumeType

    + the four ResumeType* constants

10 dispatcher unit tests cover routing (toolset vs runtime handler),

approval outcomes (allow/deny/ask/cancel/reject/approve-tool),

read-only auto-approval, and batch-cancellation synthesis. They run

without a runtime, with race detection, in <0.1s each.

Existing pkg/runtime tests (44 of them, including processToolCalls

permission tests, transfer_task validation, yolo-mode bypass, steer

race tests) all pass unchanged because processToolCalls remains a

method on *LocalRuntime that just constructs a Dispatcher and calls

Process.

Assisted-By: docker-agent
The previous dispatcher.go threaded 7-8 arguments (ctx, sess, toolCall,

tool, em, a, runTool, ...) through every helper, making signatures hard

to scan and adding a paper cut every time a new piece of per-call state

needed to be plumbed. Bundle that state in a value type and turn the

helpers into methods on it.

Changes:

  * Introduce an internal *call value (one per tool invocation) carrying

    {d, sess, em, a, tc, tool, available}. Most helper signatures shrink

    to (ctx, runTool) or just ctx — the rest comes from the receiver.

  * Track tool availability with an explicit 'available bool' instead

    of the tool.Name == "" sentinel. The synthetic 'unavailable' tool

    still carries the requested name so error events stay informative.

  * Rename the public method that wraps tracing/telemetry/event-emission

    around the actual handler call from execute() to invoke(), so it no

    longer collides conceptually with the runTool closure parameter that

    represents "deferred work to perform once approval clears".

  * Extract two tiny helpers from a 70-line method:

      - translateError(ctx, span, err) — context-cancel vs error mapping

      - buildMultiContent(text, images) — image MultiContent assembly

  * Drop the 3-line permissionsFor() nil-guard helper (inlined at the

    one caller) and the free logAllow() function (now a method that

    reads tool/session info from the receiver).

  * tool_dispatch.go: inline the 7-line bindRuntimeHandlers helper

    (single caller) and drop newHookDispatcher (struct literal at the

    one callsite).

  * runtime.go: remove a dead 'const _ ResumeType' block and an

    orphaned ResumeRequest doc comment left over from aliasing.

Bug fix surfaced during review:

  Storing context.Context in *call would have been a documented Go

  anti-pattern (containedctx lint rule) — ctx is passed explicitly to

  every method that needs it instead.

Validation: go build, go vet, go test -race ./pkg/runtime/...,

golangci-lint run ./pkg/runtime/... — all clean. The full repository

test suite passes unchanged. dispatcher.go grows by 60 lines (mostly

comments and the small helpers); the *call refactor cuts roughly 80

argument-passing lines from the rest of the file.

Assisted-By: docker-agent
@dgageot dgageot force-pushed the board/refactoring-tool-execution-from-runtime-88169cdd branch from f191a16 to 66f4022 Compare April 28, 2026 07:25
@dgageot dgageot merged commit 531701e into docker:main Apr 28, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants