fix(httpclient): fall back when http.DefaultTransport is not *http.Transport#2970
Conversation
| "NewSSRFSafeTransport is using a minimal fallback transport — "+ | ||
| "proxy env vars are honoured but other DefaultTransport settings are not inherited", | ||
| "type", fmt.Sprintf("%T", http.DefaultTransport)) | ||
| t = &http.Transport{Proxy: http.ProxyFromEnvironment} |
There was a problem hiding this comment.
[MEDIUM] Fallback transport missing TLS handshake and response header timeouts
The newly-added fallback path creates &http.Transport{Proxy: http.ProxyFromEnvironment} with no TLSHandshakeTimeout or ResponseHeaderTimeout. The net.Dialer.Timeout of 30 s only covers TCP connection establishment. Once the TCP handshake completes, an unresponsive HTTPS server could stall indefinitely in the TLS handshake phase, or — after a successful TLS handshake — during the wait for the first response byte.
Go's http.DefaultTransport sets both timeouts by default (TLSHandshakeTimeout: 10s, ResponseHeaderTimeout absent but IdleConnTimeout: 90s); the clone path inherits them, but the fallback does not.
Consider adding explicit timeouts to the fallback transport:
t = &http.Transport{
Proxy: http.ProxyFromEnvironment,
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 30 * time.Second,
IdleConnTimeout: 90 * time.Second,
}This is acknowledged as a known limitation in the PR description, but given that this transport is used for outbound fetches to attacker-influenced URLs, hanging indefinitely would be a denial-of-service vector worth closing.
Problem
NewSSRFSafeTransportinpkg/httpclient/ssrf.gopanicked whenhttp.DefaultTransporthad been replaced by a wrapper. The offending line was a hard type assertion:This is fine when
DefaultTransportis the stock*http.Transport. It panics the moment any caller installs a wrapper around it — and wrappingDefaultTransportis the standard process-wide observability pattern (http.DefaultTransport = otelhttp.NewTransport(http.DefaultTransport), as recommended by theotelhttpdocs).What we considered
Walk
Unwrap()to find the inner*http.Transport. Rejected.otelhttp.Transport(the exact wrapper that triggered this) doesn't implementUnwrap(), so a walk would be dead code for the case we actually need to fix. More importantly, generic unwrapping would silently strip any wrapper — including security-relevant ones (auth, retry) in future call sites. Wrong default.Rewrite the SSRF transport with explicit literal field values (no clone, no dependency on
DefaultTransport). Cleanest in isolation, but introduces drift: when Go's stdlib changes a default (e.g. bumpsMaxIdleConns) or adds a new tuning field, our hand-written copy doesn't pick it up. In practice Go is very conservative with these values, but the concern is real and worth acknowledging.Snapshot
DefaultTransportat package init, before any global wrapping runs, and re-clone the snapshot on every call. This solves both the panic and the drift concern. We're not doing it in this PR (see "Out of scope" below) — leaving it on the table as the right next step if the current fix isn't enough.Dependency injection — stop reading
DefaultTransportglobally; pass transports/clients into callers explicitly. This is the architecturally ideal answer (testable, no init-order coupling, no global mutation), but it requires an upstream change to cagent'sfetch.New(...)to accept an injected transport. Out of scope for a stop-the-bleeding fix.This PR's approach
Replace the hard assertion with a guarded type-assertion (
if base, ok := ...; ok):DefaultTransportis a plain*http.Transport(the normal case):Clone()it, inheriting proxy / TLS / idle-pool / HTTP/2 settings. Behaviour is identical to before.&http.Transport{Proxy: http.ProxyFromEnvironment}and emit aslog.Warn(including the actual wrapper type to aid debugging).The SSRF dial check is installed in both paths — the security property is preserved unconditionally.
Known limitations of the fallback branch
DefaultTransportenables H2 via internal stdlib hooks that a bare&http.Transport{}doesn't trigger). Outbound fetches in the wrapped case will be HTTP/1.1. Acceptable for the call sites we care about (OpenAPI spec fetches, attacker-influenced URLs); flagged here for reviewers.MaxIdleConns,IdleConnTimeout,TLSHandshakeTimeout, etc. fall back to Go's zero-value defaults rather than theDefaultTransportdefaults. Again, acceptable for the call volume in question.Callers that need full
DefaultTransportparity should construct the SSRF transport at startup, before any global wrapping installs.Test
TestNewSSRFSafeTransport_WrappedDefaultTransportinpkg/httpclient/ssrf_test.gois a regression test:http.DefaultTransportwith awrappedRoundTripperthat does not implementUnwrap()(mirroringotelhttp.Transport's actual shape).NewSSRFSafeTransport()and asserts it does not panic.http://127.0.0.1/through the returned transport and asserts the SSRF dial check still fires ("non-public address").The test is intentionally non-parallel because it mutates the
http.DefaultTransportglobal;t.Cleanuprestores the original before any parallel test reads it.Out of scope (the ideal end state)
The right long-term shape for
httpclientin this repo is:http.DefaultTransportat all — construct the SSRF transport from known-good explicit settings, or from a snapshot taken before any global wrapping.fetch.WithTransport(...)to cagent so downstream code can inject a known-good transport instead of relying on global state.http.DefaultTransportprocess-wide. Move telemetry onto the injected*http.Clientinstead.Each of those is a bigger change than I have confidence to land in a single PR, and the current fix is conservative and addresses the immediate panic. Filing separately is the better next step.