Skip to content

feat(hooks): add 6 builtin hooks + widen post_tool_use / before_llm_call contract#2538

Merged
dgageot merged 5 commits intodocker:mainfrom
dgageot:board/extracting-runtime-components-as-builtin-b822c9ff
Apr 27, 2026
Merged

feat(hooks): add 6 builtin hooks + widen post_tool_use / before_llm_call contract#2538
dgageot merged 5 commits intodocker:mainfrom
dgageot:board/extracting-runtime-components-as-builtin-b822c9ff

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented Apr 27, 2026

Summary

Adds 6 new in-process hook builtins and a small hook-contract widening (post_tool_use / before_llm_call may now signal run termination via decision: "block"). The existing inline tool-loop detector is unchanged.

What's new

Five context-injecting builtins (commit 1)

Each follows the same shape as the existing add_date / add_environment_info / add_prompt_files. Opt-in via YAML — none auto-injected, since they shell out / read the filesystem and shouldn't run silently.

Name Event Behavior
add_git_status turn_start git status --short --branch
add_git_diff turn_start git diff --stat (or full unified with args: [full])
add_directory_listing session_start Sorted top-level entries of Cwd, hidden skipped, capped at 100
add_user_info session_start os/user.Current() + os.Hostname()
add_recent_commits session_start git log --oneline -n N (default 10, override via args: ["20"])

All git-backed builtins cap output at 4 KiB and treat git-not-installed / non-repo failures as graceful no-ops, so they can never abort a session or turn.

max_iterations builtin (commit 2)

A new stateful before_llm_call builtin — hard stop after N model calls, configurable per-agent in YAML:

hooks:
  before_llm_call:
    - {type: builtin, command: max_iterations, args: ["50"]}

Additive, not a replacement. The existing agent.MaxIterations flag with its MaxIterationsReachedEvent and interactive resume dialog is unchanged — TUI / CLI / ACP all still use it. The new builtin is a hard-stop alternative for headless / cost-control scenarios.

Per-session counter cleared from session_end via builtins.State.ClearSession.

Hook contract widening (commit 2)

Documents and enforces that decision: "block" (or continue: false / exit code 2) from a post_tool_use or before_llm_call hook signals run termination. Previously these events were observational only. The runtime now translates a deny verdict into the standard Error + notification(level=error) + on_error fanout via a new emitHookDrivenShutdown helper.

User-authored hooks can now stop the run; the inline tool-loop detector is unchanged.

Refactors and fixes (commits 3 + 4)

  • processToolCalls / executeWithApproval / askUserForConfirmation: closure plumbing simplified — LocalRuntime.runTool now returns a toolApprovalOutcome so callers can return invoke() everywhere instead of unpacking and repacking a (stop, msg) tuple in 7 places.
  • loop_detector per-call vs per-batch behavior change: reverted in commit 5 after review concerns. The inline tool_loop_detector is byte-identical to origin/main.
  • Bug fix: before_llm_call was dispatched twice per loop iteration after a rebase; once-per-iteration is now pinned by a regression test (pkg/runtime/before_llm_call_test.go).
  • sortKeys (private, used by builtins) made non-mutating to avoid a foot-gun for future callers.
  • gitOutput's empty-dir error message trimmed to Go style (no function name prefix).
  • Stale comment in hooks_wiring_test.go updated post eager-build refactor.

Commits

  1. feat(hooks/builtins): add five context-injecting builtins
  2. feat(hooks): widen contract for post_tool_use / before_llm_call + extract loop_detector & max_iterations builtins
  3. refactor(hooks/runtime): simplify approval plumbing and trim docstrings
  4. fix(runtime,hooks): review-driven fixes (duplicate before_llm_call dispatch + 4 quality)
  5. revert(hooks): drop loop_detector builtin, keep inline tool-loop detector

(Each commit message has full details — left as separate commits for review legibility; squash-or-merge as you prefer.)

Validation

  • go build ./...
  • go test ./... (97 packages) → all pass
  • go test -race ./pkg/hooks/... ./pkg/runtime/... → clean
  • go test ./e2e/... → pass
  • go test ./pkg/tui/... ./pkg/cli/... ./pkg/acp/... → pass
  • golangci-lint run ./... → 0 issues
  • Full binary builds (go build .)

Behavior changes for existing users

  • None for existing in-repo configs. Loop detection, max_iterations UX, and all default agent flags behave identically to origin/main.
  • The post_tool_use / before_llm_call deny-verdict contract is now honored. If a user happened to have a hook on those events returning decision: "block" it would have been silently ignored before; it now actually stops the run. No in-repo configs do this.

Honest caveats

  • All in-repo tests pass (unit + integration + e2e + TUI + CLI + ACP), but I haven't exercised the runtime against a real LLM. The contract widening tests pin executor-level semantics; integration tests cover the runtime fanout.
  • The new git-shelling builtins are bounded by the hooks executor's 60 s timeout — pathological repos will time out cleanly rather than hang.

🤖 Generated with Claude Code via docker-agent

dgageot added 5 commits April 27, 2026 16:20
Adds five new in-process hook builtins that follow the same pattern as
add_date / add_environment_info / add_prompt_files. They are opt-in via
`{type: builtin, command: ...}` in YAML and not auto-injected from any
agent flag, since they shell out / read the filesystem and shouldn't run
silently:

  - add_git_status        (turn_start)    `git status --short --branch`
  - add_git_diff          (turn_start)    `git diff --stat`, "full" arg for unified
  - add_directory_listing (session_start) sorted top-level entries, hidden skipped
  - add_user_info         (session_start) os/user.Current + os.Hostname
  - add_recent_commits    (session_start) `git log --oneline -n N` (default 10)

All git-shelling builtins cap output at 4 KiB and treat git-not-installed
or non-repo failures as graceful no-ops so they can never abort a session
or turn. add_directory_listing caps the visible entries at 100 with a
"... and N more" summary.

Adds a private `gitOutput` helper, a `sessionStartContext` mirror of the
existing `turnStartContext`, and an `initGitRepo` test helper that skips
the suite when git isn't installed.

Assisted-By: docker-agent
…ract loop_detector & max_iterations builtins

Documents that decision="block" / continue=false / exit code 2 from a
post_tool_use or before_llm_call hook signals run termination — not just
the previously-undocumented behaviour of pre_tool_use deny. The executor's
aggregate logic already produced Allowed=false uniformly for these
verdicts; the runtime now actually acts on them.

Adds `hooks.DecisionBlockValue` so in-process builtins don't carry the
"block" string literal.

  * `executePostToolHook` and `executeBeforeLLMCallHooks` now return
    (stop, message). `processToolCalls` aggregates and propagates the
    post-tool stop up to the run loop, synthesising error responses
    for any unprocessed tool calls in the batch (same shape as user
    cancellation) so the API doesn't reject orphan function calls.

  * New `emitHookDrivenShutdown` factors the standard
    Error / notification(level=error) / on_error fanout used by both
    the post_tool_use and before_llm_call shutdown paths.

  * `executeWithApproval` now returns a `toolApprovalOutcome` struct
    instead of a single canceled bool to thread the new stop signal
    through every approval path (yolo, allow, force-ask, default,
    user prompt).

Replaces the inline `tool_loop_detector` previously in pkg/runtime.
Stateful per-session via `builtins.State`, registered through
`builtins.Register` and cleared on session_end. Auto-injected from
`agent.MaxConsecutiveToolCalls()` (defaulting to 5 to preserve the
historical always-on contract).

**Behaviour change:** signatures are now per-call, not per-batch. Single-
tool repetition (the dominant stuck-agent pattern) and parallel-identical
batches still trip; alternating multi-tool batches like `[A,B] [A,B] [A,B]`
no longer trip — that case should now be caught by max_iterations or
manual threshold tuning. Polling tools (view_background_agent,
view_background_job) remain invisible to the counter.

**Additive**, not a replacement: the existing `agent.MaxIterations` flow
(MaxIterationsReachedEvent + interactive resume dialog in TUI/CLI/ACP)
is unchanged because those UIs depend on the special event. The new
builtin gives users a hard-stop alternative they can configure purely
in YAML, with no resume protocol and no special event:

    hooks:
      before_llm_call:
        - {type: builtin, command: max_iterations, args: ["50"]}

  * `pkg/hooks/contract_widening_test.go` pins decision=block and
    continue=false producing Allowed=false on post_tool_use and
    before_llm_call.

  * `pkg/hooks/builtins/loop_detector_test.go` covers threshold
    tripping, JSON-key-order normalisation, exempt-tool invisibility,
    per-session isolation, lenient arg parsing, and concurrent safety.

  * `pkg/hooks/builtins/max_iterations_test.go` covers limit
    tripping, per-session isolation, lenient arg parsing, and
    concurrent safety.

  * `hooks_wiring_test.go` updated to reflect that loop_detector is
    auto-injected on every agent.

  * `pkg/runtime/tool_loop_detector.go` and its test (logic moved to
    `pkg/hooks/builtins/loop_detector.go` with per-call semantics).

Lint clean, all tests pass with -race.

Assisted-By: docker-agent
Pure cleanup pass over the two preceding feature commits. No features
removed, no public API changes.

  * `LocalRuntime.runTool` now returns `toolApprovalOutcome` directly,
    so callers don't repeat `stop, msg := runTool(); return outcome{...}`
    in seven places.
  * Local `runTool` closure in `processToolCalls` is renamed to `invoke`
    (no more shadowing the method) and extracted into a new
    `toolInvoker` helper for clarity.
  * `executeWithApproval` and `askUserForConfirmation` now `return invoke()`
    in every approval path \u2014 every one of those paths used to be a
    two-line tuple-unpack-and-pack.
  * `processToolCalls` synthesised-error loops are extracted into a
    `synthesizeRemaining` closure and wrapped in a `switch` with
    `canceled`/`stopRun`/`default` arms so the control flow is visible
    at a glance.

  * `loop_detector`: inline `parseLoopDetectorArgs`; replace the
    per-call `map[string]struct{}` exempt set with `slices.Contains`
    over a slice (no allocations on the hot path); shorter state field
    names (`sig`, `count`).
  * `max_iterations`: inline `parseMaxIterationsArgs`; collapse the
    `(int, bool)` parse-result into a single `limit\u003c=0\u200a\u2192\u200ano-op` check.
  * `json.go`: previously held `canonicalJSON` and `joinNonEmpty`
    helpers that were already deleted as unused; the file is now just
    `sortKeys` + `canonicalToolInput`.

Trimmed the verbose comments added in the previous two commits to
match the surrounding house style. The contract is the same; the
prose just earned its keep:

  * `emitHookDrivenShutdown`, `executeBeforeLLMCallHooks`,
    `executeSessionEndHooks`, `loopDetectorExemptTools`, the
    `maxConsecutive` normalisation block, the `builtinsState` field
    comment.
  * Package doc and `State` / `Register` / `AgentDefaults` doc in
    pkg/hooks/builtins.
  * `hooks.EventPostToolUse`, `hooks.EventBeforeLLMCall`, and
    `hooks.DecisionBlockValue` docs.

    9 files changed, 238 insertions(+), 395 deletions(-)

Lint clean. `go test ./...` and `go test -race ./pkg/hooks/...
./pkg/runtime/...` all pass.

Assisted-By: docker-agent
…spatch + 4 quality)

Review of the three feature commits surfaced one critical bug introduced
during the rebase plus a handful of low-impact issues. Fixes them, with a
regression test for the critical one.

## Critical bug

**before_llm_call hook fired twice per loop iteration.** The rebase
left two identical dispatch blocks in [LocalRuntime.RunStream], so any
stateful before_llm_call hook \u2014 prominently the new max_iterations
builtin \u2014 advanced its counter twice per LLM call and tripped at half
the configured limit. Other affected handlers: any user-authored
before_llm_call hook with side effects (audit logging, cost meters).

Adds pkg/runtime/before_llm_call_test.go to pin "fires exactly once
per iteration". I verified the test fails on the buggy code and passes
on the fix before checking either in.

## Quality fixes

  * pkg/hooks/builtins/json.go: sortKeys was mutating []any slices in
    place. Currently safe because each builtin gets a freshly-decoded
    Input, but it's a foot-gun for any future caller that re-uses
    inputs. sortKeys now returns a deep copy.
  * pkg/runtime/tool_dispatch.go: processToolCalls switch had three
    arms each ending in nearly-identical span End() / SetStatus
    (codes.Ok, "...") boilerplate. Pulled span finalisation up before
    the switch so each arm only carries the logic that's actually
    distinct.
  * pkg/hooks/builtins/git.go: gitOutput's empty-dir error included
    the function name ("gitOutput: empty working directory") against
    Go style. Trimmed.
  * pkg/runtime/hooks_wiring_test.go: stale comment referred to
    "caching by agent name"; the post-rebase eager-build doesn't
    cache, it just looks up. Updated.

## Validation

    go test ./...                                  -> all packages pass
    go test -race ./pkg/hooks/... ./pkg/runtime/... -> clean
    golangci-lint run ./...                         -> 0 issues

Assisted-By: docker-agent
…ctor

Reverts the loop-detector portion of the builtin extraction. The
behavior change from per-batch to per-call detection turned out to be
the wrong trade-off: alternating multi-tool batches like `[A,B] [A,B]`
are a real failure mode that the inline detector caught and the per-call
builtin missed.

Surgical revert \u2014 keeps the rest of the work intact:

  * post_tool_use / before_llm_call deny contract widening stays
    (general feature, no behavior change without an explicit hook).
  * max_iterations builtin stays (separate from loop detection, no
    change to the existing agent.MaxIterations UX).
  * The five context-injecting builtins stay.
  * The duplicate-dispatch fix and simplifications stay.

## Removed

  * pkg/hooks/builtins/loop_detector.go and its test.
  * pkg/hooks/builtins/json.go (canonicalToolInput / sortKeys were
    only consumed by the loop_detector builtin).
  * `builtins.LoopDetector` constant.
  * `AgentDefaults.MaxConsecutiveToolCalls` and `ExemptLoopTools`
    fields, plus the auto-injection branch in ApplyAgentDefaults.
  * `State.loopDetector` field.

## Restored from origin/main

  * pkg/runtime/tool_loop_detector.go (per-batch signature detector).
  * pkg/runtime/tool_loop_detector_test.go.
  * Inline `loopDetector := newToolLoopDetector(...)` setup at the top
    of RunStream's goroutine.
  * `if loopDetector.record(res.Calls)` block emitting Error +
    notifyError after a tool batch in loop.go.
  * `MaxConsecutiveToolCalls` is once again read straight from the
    session, with the historic threshold-of-5 default.

## Tests

  * hooks_wiring_test.go: "no flags: no implicit hooks, no executor"
    case is back; loop_detector-specific assertions removed.
  * builtins_test.go: removed LoopDetector from the registration roster.

## Validation

    go build ./...                                  -> ok
    go test ./...                                   -> all packages pass
    go test -race ./pkg/hooks/... ./pkg/runtime/... -> clean
    golangci-lint run ./...                         -> 0 issues

Assisted-By: docker-agent
@dgageot dgageot requested a review from a team as a code owner April 27, 2026 15:04
Copy link
Copy Markdown
Member

@rumpl rumpl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I read it

@dgageot dgageot merged commit 64b2bd1 into docker:main Apr 27, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants