Skip to content

Add redact_secrets builtin hook + before_llm_call transform#2577

Merged
dgageot merged 3 commits intodocker:mainfrom
dgageot:board/add-builtin-hook-to-redact-secrets-for-l-b46617ef
Apr 28, 2026
Merged

Add redact_secrets builtin hook + before_llm_call transform#2577
dgageot merged 3 commits intodocker:mainfrom
dgageot:board/add-builtin-hook-to-redact-secrets-for-l-b46617ef

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented Apr 28, 2026

Adds a single agent-level switch — redact_secrets: true — that brackets every place where credential material can leak from a docker-agent run to a third party. It scrubs known secret patterns (GitHub PATs, AWS keys, Stripe / Slack / GitLab tokens, JWTs, private keys, Docker Hub PATs, …) at two enforcement points:

  1. Before the LLM call — a runtime-shipped MessageTransform walks every text-bearing field of the outgoing chat history (Content, ReasoningContent, MultiContent text parts, the legacy singular FunctionCall.Arguments, and each ToolCalls[i].Function.Arguments) and replaces detected secrets with [REDACTED].
  2. Before a tool call — a pre_tool_use builtin hook (redact_secrets) walks ToolInput recursively (into nested maps and slices) and emits an UpdatedInput with only the keys whose value actually changed.

Detection mirrors the MIT-licensed docker/mcp-gateway/pkg/secretsscan ruleset (re-licensed under Apache-2.0 in attribution) and is kept fast by a keyword pre-filter in front of every regex.

Files

  • pkg/secretsscan/ — new package: ruleset, ContainsSecrets, Redact, tests, benchmarks.
  • pkg/hooks/builtins/redact_secrets.go — the pre_tool_use builtin and its registration.
  • pkg/runtime/redact_secrets.go — the before_llm_call MessageTransform (gated on Agent.RedactSecrets()).
  • pkg/agent/{agent.go,opts.go}, pkg/teamloader/teamloader.go, pkg/config/latest/types.go — the redact_secrets: true agent flag plumbing.
  • agent-schema.json + examples/redact_secrets.yaml — schema entry and a runnable example.

Try it

```yaml
agents:
root:
model: openai/gpt-5-mini
instruction: You are a helpful assistant.
redact_secrets: true
toolsets:
- type: shell
```

Now any secret pasted by the user — or generated by a previous turn's tool — is replaced with `[REDACTED]` before it reaches the model OR a tool process. The surrounding text is preserved (`--token=[REDACTED]` rather than the whole arg disappearing).

Notes

  • False positive floor: every rule pairs a regex with a discriminating keyword, so plain English never trips detection. The new `TestRedactionMarkerIsNotASecret` locks in that the marker itself can't loop through the ruleset.
  • False negatives are possible: only patterns the ruleset recognises are scrubbed. This is defense-in-depth, not a substitute for keeping secrets out of the conversation in the first place.
  • No mutation of caller history: the transform clones `MultiContent` / `ToolCalls` / `FunctionCall` before rewriting, so persisted session messages stay byte-identical to what the user sent.
  • Concurrent-hook safety: pre_tool_use hooks aggregate via shallow `maps.Copy` in config order — the builtin emits only the keys it actually rewrote so it doesn't clobber a sibling hook's modifications.
  • Performance: `Redact` is `O(rules)` for clean inputs (just a keyword scan against a single lower-cased copy); 2 allocations on a 9 KB clean message in the benchmark.

Tests

`mise lint` and `mise test` are green. The new tests cover detection parity with upstream, redaction span correctness, idempotence, multi-secret handling, marker safety, the pre_tool_use builtin's recursion + "only changed keys" output, agent-flag wiring, and the runtime transform's per-surface scrubbing + non-mutation contract.

Assisted-By: docker-agent

dgageot added 3 commits April 28, 2026 15:45
Adds a single agent-level switch (redact_secrets: true) that brackets

every leak vector for known credentials shapes (GitHub PATs, AWS keys,

Stripe / Slack / GitLab tokens, JWTs, private keys, ...):

- pkg/secretsscan: detection + redaction, ruleset derived from the

  MIT-licensed docker/mcp-gateway/pkg/secretsscan.

- pkg/hooks/builtins/redact_secrets: pre_tool_use builtin that scrubs

  secret material from ToolInput before tools see it.

- pkg/runtime/redact_secrets: before_llm_call message transform that

  scrubs the same patterns from outgoing chat content (Content,

  MultiContent, and prior-turn ToolCall.Arguments).

- AgentConfig.RedactSecrets / agent.WithRedactSecrets / teamloader

  wiring so a single YAML knob enables both halves.

Assisted-By: docker-agent
Behavior is unchanged. The redact_secrets feature still scrubs the

same patterns from tool arguments (pre_tool_use builtin) and outgoing

chat messages (before_llm_call transform), gated on the same agent

flag.

Concretely:

- pkg/secretsscan: drop the hasKeyword method (just a free function

  inlined at the two call sites), drop the dead defensive branch in

  redactWithRule, use re.SubexpIndex instead of looping SubexpNames.

- pkg/hooks/builtins/redact_secrets: drop the unused RedactSecretsName

  type alias, fold redactMap and redactValue into one redactAny

  function, drop the slice-identity preservation that wasn't observable

  at the executor boundary.

- pkg/runtime/redact_secrets: fold redactMessageSecrets and

  redactSingleMessage into a single redactMessage, replace the

  hand-rolled copy-on-write loop with slices.Clone (Redact returns the

  same string on no-match, so allocations only happen when there's

  actually something to scrub anyway).

- Trim essay-length docstrings throughout (fields, options, tests).

Net: -203 lines. mise lint + mise test stay green.

Assisted-By: docker-agent
…nctionCall scrub

Five fixes from a self-review pass:

1. Bug: pre_tool_use builtin returned the entire ToolInput map in

   UpdatedInput. Hooks for one event run concurrently and aggregate

   via shallow maps.Copy in config order, so emitting unchanged keys

   would clobber a sibling hook's modifications. Fix: only emit keys

   whose value actually changed (new redactToolInput helper). This

   also removes the unsafe top-level type assertion on redactAny's

   any-typed return value.

2. Security gap: the runtime transform missed two text-bearing

   surfaces that round-trip to providers:

     - chat.Message.ReasoningContent (sent back to Anthropic, Bedrock,

       and DeepSeek as a thinking block)

     - the legacy singular chat.Message.FunctionCall.Arguments

       (still emitted by the OpenAI provider when set)

   A secret in either of those fields would round-trip to the next

   LLM call. Now both are scrubbed; the FunctionCall pointer is

   deep-copied so the caller's history stays untouched.

3. Performance: secretsscan.Redact called strings.ToLower(out) and

   compiledRules() once per rule (~85 iterations), which on long

   inputs dominated the work. Hoisted both out of the loop. Sound

   because RedactionMarker contains no rule keyword (locked in by

   the new TestRedactionMarkerIsNotASecret) so a stale lower-case

   only ever produces extra-cautious regex runs, not false negatives.

   Allocations on a 9KB clean message: ~85 -> 2.

4. Test gap: added TestRedactionMarkerIsNotASecret to lock in the

   safety property the idempotence claim relies on (the marker must

   not match any rule, even when surrounded by arbitrary text).

5. Added BenchmarkRedactCleanInput / BenchmarkRedactWithSecret as a

   regression guard for docker#3.

Tests + lint: green. Behavior on clean inputs and previously-tested

secrets is unchanged.

Assisted-By: docker-agent
@dgageot dgageot requested a review from a team as a code owner April 28, 2026 14:43
Copy link
Copy Markdown
Contributor

@aheritier aheritier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but missing docs updates

@dgageot dgageot merged commit 5015403 into docker:main Apr 28, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants