Add redact_secrets builtin hook + before_llm_call transform#2577

dgageot · 2026-04-28T14:43:40Z

Adds a single agent-level switch — redact_secrets: true — that brackets every place where credential material can leak from a docker-agent run to a third party. It scrubs known secret patterns (GitHub PATs, AWS keys, Stripe / Slack / GitLab tokens, JWTs, private keys, Docker Hub PATs, …) at two enforcement points:

Before the LLM call — a runtime-shipped MessageTransform walks every text-bearing field of the outgoing chat history (Content, ReasoningContent, MultiContent text parts, the legacy singular FunctionCall.Arguments, and each ToolCalls[i].Function.Arguments) and replaces detected secrets with [REDACTED].
Before a tool call — a pre_tool_use builtin hook (redact_secrets) walks ToolInput recursively (into nested maps and slices) and emits an UpdatedInput with only the keys whose value actually changed.

Detection mirrors the MIT-licensed docker/mcp-gateway/pkg/secretsscan ruleset (re-licensed under Apache-2.0 in attribution) and is kept fast by a keyword pre-filter in front of every regex.

Files

pkg/secretsscan/ — new package: ruleset, ContainsSecrets, Redact, tests, benchmarks.
pkg/hooks/builtins/redact_secrets.go — the pre_tool_use builtin and its registration.
pkg/runtime/redact_secrets.go — the before_llm_call MessageTransform (gated on Agent.RedactSecrets()).
pkg/agent/{agent.go,opts.go}, pkg/teamloader/teamloader.go, pkg/config/latest/types.go — the redact_secrets: true agent flag plumbing.
agent-schema.json + examples/redact_secrets.yaml — schema entry and a runnable example.

Try it

```yaml
agents:
root:
model: openai/gpt-5-mini
instruction: You are a helpful assistant.
redact_secrets: true
toolsets:
- type: shell
```

Now any secret pasted by the user — or generated by a previous turn's tool — is replaced with `[REDACTED]` before it reaches the model OR a tool process. The surrounding text is preserved (`--token=[REDACTED]` rather than the whole arg disappearing).

Notes

False positive floor: every rule pairs a regex with a discriminating keyword, so plain English never trips detection. The new `TestRedactionMarkerIsNotASecret` locks in that the marker itself can't loop through the ruleset.
False negatives are possible: only patterns the ruleset recognises are scrubbed. This is defense-in-depth, not a substitute for keeping secrets out of the conversation in the first place.
No mutation of caller history: the transform clones `MultiContent` / `ToolCalls` / `FunctionCall` before rewriting, so persisted session messages stay byte-identical to what the user sent.
Concurrent-hook safety: pre_tool_use hooks aggregate via shallow `maps.Copy` in config order — the builtin emits only the keys it actually rewrote so it doesn't clobber a sibling hook's modifications.
Performance: `Redact` is `O(rules)` for clean inputs (just a keyword scan against a single lower-cased copy); 2 allocations on a 9 KB clean message in the benchmark.

Tests

`mise lint` and `mise test` are green. The new tests cover detection parity with upstream, redaction span correctness, idempotence, multi-secret handling, marker safety, the pre_tool_use builtin's recursion + "only changed keys" output, agent-flag wiring, and the runtime transform's per-surface scrubbing + non-mutation contract.

Assisted-By: docker-agent

Adds a single agent-level switch (redact_secrets: true) that brackets every leak vector for known credentials shapes (GitHub PATs, AWS keys, Stripe / Slack / GitLab tokens, JWTs, private keys, ...): - pkg/secretsscan: detection + redaction, ruleset derived from the MIT-licensed docker/mcp-gateway/pkg/secretsscan. - pkg/hooks/builtins/redact_secrets: pre_tool_use builtin that scrubs secret material from ToolInput before tools see it. - pkg/runtime/redact_secrets: before_llm_call message transform that scrubs the same patterns from outgoing chat content (Content, MultiContent, and prior-turn ToolCall.Arguments). - AgentConfig.RedactSecrets / agent.WithRedactSecrets / teamloader wiring so a single YAML knob enables both halves. Assisted-By: docker-agent

Behavior is unchanged. The redact_secrets feature still scrubs the same patterns from tool arguments (pre_tool_use builtin) and outgoing chat messages (before_llm_call transform), gated on the same agent flag. Concretely: - pkg/secretsscan: drop the hasKeyword method (just a free function inlined at the two call sites), drop the dead defensive branch in redactWithRule, use re.SubexpIndex instead of looping SubexpNames. - pkg/hooks/builtins/redact_secrets: drop the unused RedactSecretsName type alias, fold redactMap and redactValue into one redactAny function, drop the slice-identity preservation that wasn't observable at the executor boundary. - pkg/runtime/redact_secrets: fold redactMessageSecrets and redactSingleMessage into a single redactMessage, replace the hand-rolled copy-on-write loop with slices.Clone (Redact returns the same string on no-match, so allocations only happen when there's actually something to scrub anyway). - Trim essay-length docstrings throughout (fields, options, tests). Net: -203 lines. mise lint + mise test stay green. Assisted-By: docker-agent

…nctionCall scrub Five fixes from a self-review pass: 1. Bug: pre_tool_use builtin returned the entire ToolInput map in UpdatedInput. Hooks for one event run concurrently and aggregate via shallow maps.Copy in config order, so emitting unchanged keys would clobber a sibling hook's modifications. Fix: only emit keys whose value actually changed (new redactToolInput helper). This also removes the unsafe top-level type assertion on redactAny's any-typed return value. 2. Security gap: the runtime transform missed two text-bearing surfaces that round-trip to providers: - chat.Message.ReasoningContent (sent back to Anthropic, Bedrock, and DeepSeek as a thinking block) - the legacy singular chat.Message.FunctionCall.Arguments (still emitted by the OpenAI provider when set) A secret in either of those fields would round-trip to the next LLM call. Now both are scrubbed; the FunctionCall pointer is deep-copied so the caller's history stays untouched. 3. Performance: secretsscan.Redact called strings.ToLower(out) and compiledRules() once per rule (~85 iterations), which on long inputs dominated the work. Hoisted both out of the loop. Sound because RedactionMarker contains no rule keyword (locked in by the new TestRedactionMarkerIsNotASecret) so a stale lower-case only ever produces extra-cautious regex runs, not false negatives. Allocations on a 9KB clean message: ~85 -> 2. 4. Test gap: added TestRedactionMarkerIsNotASecret to lock in the safety property the idempotence claim relies on (the marker must not match any rule, even when surrounded by arbitrary text). 5. Added BenchmarkRedactCleanInput / BenchmarkRedactWithSecret as a regression guard for docker#3. Tests + lint: green. Behavior on clean inputs and previously-tested secrets is unchanged. Assisted-By: docker-agent

aheritier

LGTM but missing docs updates

dgageot added 3 commits April 28, 2026 15:45

dgageot requested a review from a team as a code owner April 28, 2026 14:43

aheritier approved these changes Apr 28, 2026

View reviewed changes

dgageot merged commit 5015403 into docker:main Apr 28, 2026
9 checks passed

dgageot mentioned this pull request Apr 28, 2026

docs: document redact_secrets agent flag #2581

Merged

BrewTestBot mentioned this pull request Apr 29, 2026

docker-agent 1.54.0 Homebrew/homebrew-core#280008

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add redact_secrets builtin hook + before_llm_call transform#2577