[BUG] Python OTLP exporter hardcodes gRPC, ignores OTEL_EXPORTER_OTLP_PROTOCOL #1681

Merged
EItanya merged 14 commits into kagent-dev:main from shmuelarditi:fix/otlp-protocol-selection on Apr 30, 2026

Conversation

@shmuelarditi

Contributor

Bug Description

The Python agent tracing code in python/packages/kagent-core/src/kagent/core/tracing/_utils.py hardcodes the gRPC OTLP exporter via top-level imports:

from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

The standard OTEL_EXPORTER_OTLP_TRACES_PROTOCOL and OTEL_EXPORTER_OTLP_PROTOCOL environment variables are completely ignored. This means agents cannot export traces over HTTP/protobuf, which is required by backends like Langfuse that only support OTLP over HTTP (not gRPC).

Steps to Reproduce

  1. Deploy kagent with otel.tracing.enabled: true and otel.tracing.exporter.otlp.protocol: http/protobuf
  2. Point the endpoint to an HTTP-only OTLP backend (e.g., Langfuse https://cloud.langfuse.com/api/public/otel)
  3. Observe agent pods have OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf set correctly
  4. Agent pods still use the gRPC exporter and fail with:
opentelemetry.exporter.otlp.proto.grpc.exporter - ERROR - Failed to export traces, error code: StatusCode.UNKNOWN
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    debug_error_string = "UNKNOWN:Error received from peer {grpc_status:2, grpc_message:"Received http2 header with status: 464"}"

Expected Behavior

The configure() function should respect OTEL_EXPORTER_OTLP_TRACES_PROTOCOL (or the general OTEL_EXPORTER_OTLP_PROTOCOL) and dynamically select the appropriate exporter, per the OpenTelemetry specification:

  • grpc (default) → opentelemetry.exporter.otlp.proto.grpc
  • http/protobuf → opentelemetry.exporter.otlp.proto.http
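
The mapping in this list can be sketched as a small lookup table. This is an illustration only, not the code from this PR: `exporter_module_for` and both dictionaries are hypothetical names, and the helper returns the dotted module path as a string instead of importing it, so the sketch runs without the OpenTelemetry packages installed. It also assumes that an unrecognized protocol value should raise rather than fall back to gRPC, which is a design choice and not something the specification text quoted here requires.

```python
# Hypothetical lookup table: OTLP protocol value -> exporter package.
_EXPORTER_PACKAGES = {
    "grpc": "opentelemetry.exporter.otlp.proto.grpc",
    "http/protobuf": "opentelemetry.exporter.otlp.proto.http",
}

# Submodule that holds the exporter class for each signal.
_SIGNAL_MODULES = {
    "TRACES": "trace_exporter",
    "LOGS": "_log_exporter",
}


def exporter_module_for(protocol: str, signal: str) -> str:
    """Return the dotted module path for the given protocol and signal."""
    try:
        package = _EXPORTER_PACKAGES[protocol]
    except KeyError:
        # Assumption: fail fast on values such as "http/json".
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}") from None
    return f"{package}.{_SIGNAL_MODULES[signal]}"


print(exporter_module_for("http/protobuf", "TRACES"))
# opentelemetry.exporter.otlp.proto.http.trace_exporter
```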

Fix

This PR replaces the hardcoded gRPC imports with factory functions (_create_span_exporter, _create_log_exporter) that resolve the protocol from env vars following the OTel spec precedence: signal-specific > general > default (grpc).

The fix applies to both the trace and log exporters. Default behavior (gRPC) is unchanged — this is fully backwards compatible.
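
The precedence rule (signal-specific > general > default) can be checked in isolation with a minimal sketch. The function below is a hypothetical stand-in that takes the environment as a plain mapping instead of reading `os.environ`, so the three cases are easy to exercise side by side.

```python
from typing import Mapping


def resolve_protocol(signal: str, env: Mapping[str, str]) -> str:
    """Signal-specific variable wins, then the general one, then "grpc"."""
    return (
        env.get(f"OTEL_EXPORTER_OTLP_{signal}_PROTOCOL")
        or env.get("OTEL_EXPORTER_OTLP_PROTOCOL")
        or "grpc"
    )


# Only the general variable is set: every signal uses it.
general_only = {"OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf"}
print(resolve_protocol("TRACES", general_only))  # http/protobuf

# A signal-specific value overrides the general one for that signal only.
mixed = {
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
    "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": "http/protobuf",
}
print(resolve_protocol("TRACES", mixed))  # http/protobuf
print(resolve_protocol("LOGS", mixed))    # grpc

# Nothing set: the default applies.
print(resolve_protocol("TRACES", {}))     # grpc
```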

Environment

  • kagent version: v0.8.6 (also confirmed on main branch)
  • Backend: Langfuse Cloud (EU), which only supports OTLP over HTTP/JSON and HTTP/protobuf (docs)

…upport

The OTLP span and log exporters were hardcoded to use gRPC, ignoring
the standard OTEL_EXPORTER_OTLP_TRACES_PROTOCOL and
OTEL_EXPORTER_OTLP_PROTOCOL environment variables. This prevented
exporting traces to backends that only support OTLP over HTTP, such
as Langfuse.

Replace the top-level gRPC imports with factory functions that
dynamically select the exporter based on the protocol env vars,
following the OpenTelemetry specification precedence (signal-specific
> general > default "grpc").

Signed-off-by: shmuelarditi <shmuelrdt@gmail.com>

Copilot AI left a comment

Contributor

Pull request overview

Fixes the Python agent’s OTLP exporter selection so it can honor OTEL_EXPORTER_OTLP_TRACES_PROTOCOL / OTEL_EXPORTER_OTLP_PROTOCOL (instead of always using the gRPC exporter), enabling OTLP over HTTP/protobuf for backends that don’t support gRPC.

Changes:

  • Replaces hardcoded gRPC OTLP exporter imports with protocol-resolving factory helpers (_create_span_exporter, _create_log_exporter).
  • Adds _resolve_otlp_protocol() to implement env-var precedence (signal-specific > general > default grpc).
  • Updates configure() to build span/log exporters via the new factories.

Comment on lines +35 to +50
    if protocol == "http/protobuf":
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    else:
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    logging.info("Using %s protocol for trace exporter", protocol)
    return OTLPSpanExporter(**kwargs)


def _create_log_exporter(**kwargs):
    """Create an OTLPLogExporter using the protocol from env vars."""
    protocol = _resolve_otlp_protocol("LOGS")
    if protocol == "http/protobuf":
        from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
    else:
        from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
    logging.info("Using %s protocol for log exporter", protocol)
Comment on lines +19 to +40
def _resolve_otlp_protocol(signal: str) -> str:
    """Resolve the OTLP protocol from signal-specific or general env vars.

    Follows the OpenTelemetry specification precedence:
    signal-specific (e.g. OTEL_EXPORTER_OTLP_TRACES_PROTOCOL) > general > default (grpc).
    """
    return (
        os.getenv(f"OTEL_EXPORTER_OTLP_{signal}_PROTOCOL")
        or os.getenv("OTEL_EXPORTER_OTLP_PROTOCOL")
        or "grpc"
    )


def _create_span_exporter(**kwargs):
    """Create an OTLPSpanExporter using the protocol from env vars."""
    protocol = _resolve_otlp_protocol("TRACES")
    if protocol == "http/protobuf":
        from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    else:
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    logging.info("Using %s protocol for trace exporter", protocol)
    return OTLPSpanExporter(**kwargs)
Comment on lines 132 to +143
  # Check standard OTEL env vars: signal-specific endpoint first, then general endpoint
  trace_endpoint = (
      os.getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
      or os.getenv("OTEL_TRACING_EXPORTER_OTLP_ENDPOINT")  # Backward compatibility
      or os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
  )
  trace_timeout_seconds = _resolve_otlp_timeout_seconds("TRACES")
  logging.info("Trace endpoint: %s", trace_endpoint or "<default>")
  if trace_endpoint:
-     processor = BatchSpanProcessor(OTLPSpanExporter(endpoint=trace_endpoint, timeout=trace_timeout_seconds))
+     processor = BatchSpanProcessor(_create_span_exporter(endpoint=trace_endpoint, timeout=trace_timeout_seconds))
  else:
-     processor = BatchSpanProcessor(OTLPSpanExporter(timeout=trace_timeout_seconds))
+     processor = BatchSpanProcessor(_create_span_exporter(timeout=trace_timeout_seconds))
Comment on lines 4 to 8
from fastapi import FastAPI
from opentelemetry import _logs, trace
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.openai import OpenAIInstrumentor
- Add opentelemetry-exporter-otlp-proto-http to pyproject.toml so
  the http/protobuf import doesn't fail at runtime
- Normalize protocol value with strip().lower() for robustness
- Update test to monkeypatch _create_log_exporter instead of the
  removed module-level OTLPLogExporter

Signed-off-by: shmuelarditi <shmuelrdt@gmail.com>
@supreme-gg-gg

Contributor

Thanks for the PR! Would you be able to do this for the go runtime as well? See https://github.com/kagent-dev/kagent/blob/main/go/adk/pkg/telemetry/tracing.go#L90

Apply the same protocol-aware exporter selection to the Go ADK
telemetry package. The newTracerProvider and newLoggerProvider
functions now check OTEL_EXPORTER_OTLP_{TRACES,LOGS}_PROTOCOL
(falling back to OTEL_EXPORTER_OTLP_PROTOCOL, then "grpc") and
create the corresponding HTTP or gRPC exporter.

This fixes the controller timeout when exporting to HTTP-only
backends like Langfuse.

Signed-off-by: shmuelarditi <shmuelrdt@gmail.com>
@shmuelarditi shmuelarditi requested a review from ilackarms as a code owner April 17, 2026 10:19
@shmuelarditi

shmuelarditi commented Apr 17, 2026

Contributor Author

Thanks for the PR! Would you be able to do this for the go runtime as well? See https://github.com/kagent-dev/kagent/blob/main/go/adk/pkg/telemetry/tracing.go#L90
@supreme-gg-gg

Added the same protocol-aware selection to the Go runtime in go/adk/pkg/telemetry/tracing.go, see the latest commit.
newTracerProvider and newLoggerProvider now check OTEL_EXPORTER_OTLP_{TRACES,LOGS}_PROTOCOL and switch between gRPC and HTTP exporters accordingly.

@supreme-gg-gg

Contributor

Great, you will need to sign your commits as well for DCO to pass in the CI

supreme-gg-gg and others added 4 commits April 17, 2026 17:39
Sets the setuid bit on `/usr/bin/bwrap` in both runtime `Dockerfiles` so
the non-root agent process (uid 1001) can create the user + network
namespaces that bubblewrap relies on to sandbox skills and executed
code. Without this, hosts with
`kernel.apparmor_restrict_unprivileged_userns=1` deny bwrap's
`RTM_NEWADDR` call when it brings up loopback, making every sandboxed
command fail and blocking two CI e2e tests.

The binary already runs inside a `privileged: true` Kubernetes pod, so
the container already has full host capabilities; setuid only changes
which process inside that pod holds them, and bubblewrap is a small,
audited tool specifically designed to be setuid-safe. Privilege mode is
dropped before running the user's command.

---------

Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io>
)

## Motivation

This will allow most users to switch directly from `runtime: python` to
`runtime: go` without worrying about existing LLM provider configs,
since everything will be supported on the Go side. This should ease
adoption of the new Go runtime.

## Summary

Closes most of the gap between Python and Go identified in kagent-dev#1643

- TLS and api key passthrough for LLM provider
- Support Ollama and Bedrock using client SDK as we've done earlier in
kagent-dev#1540
- Use Bedrock client instead of messages API for Anthropic on Bedrock to
support all bedrock runtime models
- Tightens tool config conversion for Anthropic + Bedrock and fixes
issues like kagent-dev#1645, kagent-dev#1683
- Sanitize ToolName for bedrock LLMs kagent-dev#1473, see [bedrock API
docs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ToolSpecification.html)
- Refactor embedding models in python to be separate from memory service
- Strip approval confirmation synthetic tool calls from LLM requests.
These messages are persisted in the task / events store and used by ADK
internally, but sending them to the model wastes tokens and can confuse
the model. In a long session with many HITL events, these internal tool
messages consume many unnecessary tokens.

## Testing Plan

- [x] All new unit tests in the Go ADK pass, all old unit tests in Python
pass
- [x] Test for no regression with OpenAI and Gemini models in go runtime
- [x] Validate with a wide range of use cases such as: builtin tools
(ask user, save memory), ADK built-in tools (load memory) MCP tools,
Remote A2A (subagent) tools, HITL tools (approvals)
- [x] Bedrock LLM and embedding model in Go runtime
- [x] OpenAI API key passthrough with A2A `--token` option
- [x] Ollama LLM and embedding in Go runtime (local models, Gemma 4 +
embedding Gemma)
- [x] Ollama with TLS (local https server with self-signed certs)

---------

Signed-off-by: Jet Chiang <pokyuen.jetchiang-ext@solo.io>
)

CVE Scan for postgres:18.3-alpine
```
| CVE ID         | SEVERITY | PACKAGE      | FIXED IN                     | SCANNERS                    |
|----------------|----------|--------------|------------------------------|-----------------------------|
| CVE-2025-68121 | CRITICAL | stdlib       | 1.24.13, 1.25.7, 1.26.0-rc.3 | grype, trivy                |
| CVE-2025-58183 | HIGH     | stdlib       | 1.24.8, 1.25.2               | grype(MEDIUM), trivy(HIGH)  |
| CVE-2025-58187 | HIGH     | stdlib       | 1.24.9, 1.25.3               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-58188 | HIGH     | stdlib       | 1.24.8, 1.25.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-61723 | HIGH     | stdlib       | 1.24.8, 1.25.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-61725 | HIGH     | stdlib       | 1.24.8, 1.25.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-61726 | HIGH     | stdlib       | 1.24.12, 1.25.6              | grype, trivy                |
| CVE-2025-61728 | HIGH     | stdlib       | 1.24.12, 1.25.6              | grype(MEDIUM), trivy(HIGH)  |
| CVE-2025-61729 | HIGH     | stdlib       | 1.24.11, 1.25.5              | grype, trivy                |
| CVE-2025-61731 | HIGH     | stdlib       | 1.24.12, 1.25.6              | grype                       |
| CVE-2025-61732 | HIGH     | stdlib       | 1.24.13, 1.25.7              | grype                       |
| CVE-2026-25679 | HIGH     | stdlib       | 1.25.8, 1.26.1               | grype, trivy                |
| CVE-2026-27135 | HIGH     | nghttp2-libs | n/a                          | grype                       |
| CVE-2026-27140 | HIGH     | stdlib       | 1.25.9, 1.26.2               | grype                       |
| CVE-2026-32280 | HIGH     | stdlib       | 1.25.9, 1.26.2               | grype, trivy                |
| CVE-2026-32281 | HIGH     | stdlib       | 1.25.9, 1.26.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-32282 | HIGH     | stdlib       | 1.25.9, 1.26.2               | grype(MEDIUM), trivy(HIGH)  |
| CVE-2026-32283 | HIGH     | stdlib       | 1.25.9, 1.26.2               | grype(HIGH), trivy(UNKNOWN) |
```

CVE Scan for postgres:18
```
| CVE ID         | SEVERITY | PACKAGE                 | FIXED IN                     | SCANNERS                    |
|----------------|----------|-------------------------|------------------------------|-----------------------------|
| CVE-2025-68121 | CRITICAL | stdlib                  | 1.24.13, 1.25.7, 1.26.0-rc.3 | grype, trivy                |
| CVE-2025-13151 | HIGH     | libtasn1-6              | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-58183 | HIGH     | stdlib                  | 1.24.8, 1.25.2               | grype(MEDIUM), trivy(HIGH)  |
| CVE-2025-58187 | HIGH     | stdlib                  | 1.24.9, 1.25.3               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-58188 | HIGH     | stdlib                  | 1.24.8, 1.25.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-61723 | HIGH     | stdlib                  | 1.24.8, 1.25.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-61725 | HIGH     | stdlib                  | 1.24.8, 1.25.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2025-61726 | HIGH     | stdlib                  | 1.24.12, 1.25.6              | grype, trivy                |
| CVE-2025-61728 | HIGH     | stdlib                  | 1.24.12, 1.25.6              | grype(MEDIUM), trivy(HIGH)  |
| CVE-2025-61729 | HIGH     | stdlib                  | 1.24.11, 1.25.5              | grype, trivy                |
| CVE-2025-61731 | HIGH     | stdlib                  | 1.24.12, 1.25.6              | grype                       |
| CVE-2025-61732 | HIGH     | stdlib                  | 1.24.13, 1.25.7              | grype                       |
| CVE-2025-69720 | HIGH     | libncursesw6            | n/a                          | grype, trivy                |
| CVE-2025-69720 | HIGH     | libtinfo6               | n/a                          | grype, trivy                |
| CVE-2025-69720 | HIGH     | ncurses-base            | n/a                          | grype, trivy                |
| CVE-2025-69720 | HIGH     | ncurses-bin             | n/a                          | grype, trivy                |
| CVE-2026-24882 | HIGH     | dirmngr                 | n/a                          | grype, trivy                |
| CVE-2026-24882 | HIGH     | gnupg                   | n/a                          | grype, trivy                |
| CVE-2026-24882 | HIGH     | gnupg-l10n              | n/a                          | grype, trivy                |
| CVE-2026-24882 | HIGH     | gpg                     | n/a                          | grype, trivy                |
| CVE-2026-24882 | HIGH     | gpg-agent               | n/a                          | grype, trivy                |
| CVE-2026-24882 | HIGH     | gpgconf                 | n/a                          | grype, trivy                |
| CVE-2026-24882 | HIGH     | gpgsm                   | n/a                          | grype, trivy                |
| CVE-2026-25679 | HIGH     | stdlib                  | 1.25.8, 1.26.1               | grype, trivy                |
| CVE-2026-2673  | HIGH     | libssl3t64              | 3.5.5-1~deb13u2              | grype(HIGH), trivy(LOW)     |
| CVE-2026-2673  | HIGH     | openssl                 | 3.5.5-1~deb13u2              | grype(HIGH), trivy(LOW)     |
| CVE-2026-2673  | HIGH     | openssl-provider-legacy | 3.5.5-1~deb13u2              | grype(HIGH), trivy(LOW)     |
| CVE-2026-27140 | HIGH     | stdlib                  | 1.25.9, 1.26.2               | grype                       |
| CVE-2026-28388 | HIGH     | libssl3t64              | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-28388 | HIGH     | openssl                 | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-28388 | HIGH     | openssl-provider-legacy | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-28389 | HIGH     | libssl3t64              | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-28389 | HIGH     | openssl                 | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-28389 | HIGH     | openssl-provider-legacy | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-28390 | HIGH     | libssl3t64              | 3.5.5-1~deb13u2              | grype, trivy                |
| CVE-2026-28390 | HIGH     | openssl                 | 3.5.5-1~deb13u2              | grype, trivy                |
| CVE-2026-28390 | HIGH     | openssl-provider-legacy | 3.5.5-1~deb13u2              | grype, trivy                |
| CVE-2026-29111 | HIGH     | libsystemd0             | n/a                          | grype(MEDIUM), trivy(HIGH)  |
| CVE-2026-29111 | HIGH     | libudev1                | n/a                          | grype(MEDIUM), trivy(HIGH)  |
| CVE-2026-31790 | HIGH     | libssl3t64              | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-31790 | HIGH     | openssl                 | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-31790 | HIGH     | openssl-provider-legacy | 3.5.5-1~deb13u2              | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-32280 | HIGH     | stdlib                  | 1.25.9, 1.26.2               | grype, trivy                |
| CVE-2026-32281 | HIGH     | stdlib                  | 1.25.9, 1.26.2               | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-32282 | HIGH     | stdlib                  | 1.25.9, 1.26.2               | grype(MEDIUM), trivy(HIGH)  |
| CVE-2026-32283 | HIGH     | stdlib                  | 1.25.9, 1.26.2               | grype(HIGH), trivy(UNKNOWN) |
| CVE-2026-4046  | HIGH     | libc-bin                | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-4046  | HIGH     | libc-l10n               | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-4046  | HIGH     | libc6                   | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-4046  | HIGH     | locales                 | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-4437  | HIGH     | libc-bin                | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-4437  | HIGH     | libc-l10n               | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-4437  | HIGH     | libc6                   | n/a                          | grype(HIGH), trivy(MEDIUM)  |
| CVE-2026-4437  | HIGH     | locales                 | n/a                          | grype(HIGH), trivy(MEDIUM)  |
```

---------

Signed-off-by: Jonathan Jamroga <jjamroga@gmail.com>
Co-authored-by: Eitan Yarmush <eitan.yarmush@solo.io>
@shmuelarditi shmuelarditi force-pushed the fix/otlp-protocol-selection branch from b1a2796 to 80d84d1 Compare April 19, 2026 09:10
@shmuelarditi

Contributor Author

Great, you will need to sign your commits as well for DCO to pass in the CI

@supreme-gg-gg oh yea, done.

@supreme-gg-gg

Contributor

Seems like you need to update uv lock and reformat python

@shmuelarditi

shmuelarditi commented Apr 23, 2026

Contributor Author

@supreme-gg-gg Done! Updated the uv.lock and reformatted the Python code with ruff.
let me know if anything else is needed.

@supreme-gg-gg supreme-gg-gg self-assigned this Apr 23, 2026

@supreme-gg-gg supreme-gg-gg left a comment

Contributor

lgtm, thanks @shmuelarditi ! I've tested both runtimes with Jaeger

@EItanya EItanya merged commit 8a29628 into kagent-dev:main Apr 30, 2026
41 of 42 checks passed
5 participants