Skip to content

fix(httpclient): fall back when http.DefaultTransport is not *http.Transport#2970

Merged
dgageot merged 1 commit into
docker:mainfrom
Daniel-Kolev:fix/ssrf-safe-transport-wrapped-default
Jun 2, 2026
Merged

fix(httpclient): fall back when http.DefaultTransport is not *http.Transport#2970
dgageot merged 1 commit into
docker:mainfrom
Daniel-Kolev:fix/ssrf-safe-transport-wrapped-default

Conversation

@Daniel-Kolev
Copy link
Copy Markdown
Contributor

@Daniel-Kolev Daniel-Kolev commented Jun 2, 2026

Problem

NewSSRFSafeTransport in pkg/httpclient/ssrf.go panicked when http.DefaultTransport had been replaced by a wrapper. The offending line was a hard type assertion:

t := http.DefaultTransport.(*http.Transport).Clone()

This is fine when DefaultTransport is the stock *http.Transport. It panics the moment any caller installs a wrapper around it — and wrapping DefaultTransport is the standard process-wide observability pattern (http.DefaultTransport = otelhttp.NewTransport(http.DefaultTransport), as recommended by the otelhttp docs).

What we considered

  1. Walk Unwrap() to find the inner *http.Transport. Rejected. otelhttp.Transport (the exact wrapper that triggered this) doesn't implement Unwrap(), so a walk would be dead code for the case we actually need to fix. More importantly, generic unwrapping would silently strip any wrapper — including security-relevant ones (auth, retry) in future call sites. Wrong default.

  2. Rewrite the SSRF transport with explicit literal field values (no clone, no dependency on DefaultTransport). Cleanest in isolation, but introduces drift: when Go's stdlib changes a default (e.g. bumps MaxIdleConns) or adds a new tuning field, our hand-written copy doesn't pick it up. In practice Go is very conservative with these values, but the concern is real and worth acknowledging.

  3. Snapshot DefaultTransport at package init, before any global wrapping runs, and re-clone the snapshot on every call. This solves both the panic and the drift concern. We're not doing it in this PR (see "Out of scope" below) — leaving it on the table as the right next step if the current fix isn't enough.

  4. Dependency injection — stop reading DefaultTransport globally; pass transports/clients into callers explicitly. This is the architecturally ideal answer (testable, no init-order coupling, no global mutation), but it requires an upstream change to cagent's fetch.New(...) to accept an injected transport. Out of scope for a stop-the-bleeding fix.

This PR's approach

Replace the hard assertion with a guarded type-assertion (if base, ok := ...; ok):

  • If DefaultTransport is a plain *http.Transport (the normal case): Clone() it, inheriting proxy / TLS / idle-pool / HTTP/2 settings. Behaviour is identical to before.
  • If it's a wrapper: fall back to &http.Transport{Proxy: http.ProxyFromEnvironment} and emit a slog.Warn (including the actual wrapper type to aid debugging).

The SSRF dial check is installed in both paths — the security property is preserved unconditionally.

Known limitations of the fallback branch

  • HTTP/2 is not auto-configured in the fallback transport (the DefaultTransport enables H2 via internal stdlib hooks that a bare &http.Transport{} doesn't trigger). Outbound fetches in the wrapped case will be HTTP/1.1. Acceptable for the call sites we care about (OpenAPI spec fetches, attacker-influenced URLs); flagged here for reviewers.
  • Connection pool tuning is lostMaxIdleConns, IdleConnTimeout, TLSHandshakeTimeout, etc. fall back to Go's zero-value defaults rather than the DefaultTransport defaults. Again, acceptable for the call volume in question.

Callers that need full DefaultTransport parity should construct the SSRF transport at startup, before any global wrapping installs.

Test

TestNewSSRFSafeTransport_WrappedDefaultTransport in pkg/httpclient/ssrf_test.go is a regression test:

  1. Replaces http.DefaultTransport with a wrappedRoundTripper that does not implement Unwrap() (mirroring otelhttp.Transport's actual shape).
  2. Calls NewSSRFSafeTransport() and asserts it does not panic.
  3. Drives a request to http://127.0.0.1/ through the returned transport and asserts the SSRF dial check still fires ("non-public address").

The test is intentionally non-parallel because it mutates the http.DefaultTransport global; t.Cleanup restores the original before any parallel test reads it.

Out of scope (the ideal end state)

The right long-term shape for httpclient in this repo is:

  1. Stop sampling http.DefaultTransport at all — construct the SSRF transport from known-good explicit settings, or from a snapshot taken before any global wrapping.
  2. Add fetch.WithTransport(...) to cagent so downstream code can inject a known-good transport instead of relying on global state.
  3. Stop replacing http.DefaultTransport process-wide. Move telemetry onto the injected *http.Client instead.

Each of those is a bigger change than I have confidence to land in a single PR, and the current fix is conservative and addresses the immediate panic. Filing separately is the better next step.

@Daniel-Kolev Daniel-Kolev requested a review from a team as a code owner June 2, 2026 14:47
@Daniel-Kolev Daniel-Kolev requested a review from rumpl June 2, 2026 14:48
Copy link
Copy Markdown

@docker-agent docker-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟡 NEEDS ATTENTION

Comment thread pkg/httpclient/ssrf.go
"NewSSRFSafeTransport is using a minimal fallback transport — "+
"proxy env vars are honoured but other DefaultTransport settings are not inherited",
"type", fmt.Sprintf("%T", http.DefaultTransport))
t = &http.Transport{Proxy: http.ProxyFromEnvironment}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Fallback transport missing TLS handshake and response header timeouts

The newly-added fallback path creates &http.Transport{Proxy: http.ProxyFromEnvironment} with no TLSHandshakeTimeout or ResponseHeaderTimeout. The net.Dialer.Timeout of 30 s only covers TCP connection establishment. Once the TCP handshake completes, an unresponsive HTTPS server could stall indefinitely in the TLS handshake phase, or — after a successful TLS handshake — during the wait for the first response byte.

Go's http.DefaultTransport sets both timeouts by default (TLSHandshakeTimeout: 10s, ResponseHeaderTimeout absent but IdleConnTimeout: 90s); the clone path inherits them, but the fallback does not.

Consider adding explicit timeouts to the fallback transport:

t = &http.Transport{
    Proxy:                 http.ProxyFromEnvironment,
    TLSHandshakeTimeout:   10 * time.Second,
    ResponseHeaderTimeout: 30 * time.Second,
    IdleConnTimeout:       90 * time.Second,
}

This is acknowledged as a known limitation in the PR description, but given that this transport is used for outbound fetches to attacker-influenced URLs, hanging indefinitely would be a denial-of-service vector worth closing.

@dgageot dgageot merged commit 29017ca into docker:main Jun 2, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants