chore/otel-collector: upgrade to v0.103.0, remove jaegerexporter#63171
chore/otel-collector: upgrade to v0.103.0, remove jaegerexporter#63171Chickensoupwithrice merged 31 commits intomainfrom
Conversation
|
Caution License checking failed, please read: how to deal with third parties licensing. |
1 similar comment
|
Caution License checking failed, please read: how to deal with third parties licensing. |
jhchabran
left a comment
There was a problem hiding this comment.
This PR updates GRPC from 1.61 to 1.64 which deprecates the .Dial* functions. This is caught by the CI, and we most likely want to update it.
Looking at jaegertracing/jaeger#5330 it seems that there are a few little semantics differences to take in account, making this potentially not trivial.
I'll ping the source team as @ggilmore has been leading this for a while.
|
I think I am fine with keeping the deprecated method for now, given the complexity laid out in the thread JH linked. The release notes say:
So LGTM. |
| - MPL-2.0 | ||
| - :who: | ||
| :why: | ||
| :versions: [] |
There was a problem hiding this comment.
this seems to be true for all things in here, and is unrelated to the PR, but I am confused why versions is empty :D
There was a problem hiding this comment.
I think it's optional fields you can record when tracking a dependency change, e.g. "this code was MPL for vX to vY"
go.mod
Outdated
| github.com/gomodule/redigo => github.com/gomodule/redigo v1.8.9 | ||
| // Pending: Renamed to github.com/google/gnostic. Transitive deps still use the old name (kubernetes/kubernetes). | ||
| github.com/googleapis/gnostic => github.com/googleapis/gnostic v0.5.5 | ||
| // Pending a release cut of https://github.com/prometheus/alertmanager/pull/3010 |
go.mod
Outdated
| cloud.google.com/go/bigquery v1.61.0 | ||
| cloud.google.com/go/kms v1.17.1 | ||
| cloud.google.com/go/monitoring v1.19.0 | ||
| cloud.google.com/go/profiler v0.4.0 | ||
| cloud.google.com/go/pubsub v1.37.0 | ||
| cloud.google.com/go/secretmanager v1.11.5 | ||
| cloud.google.com/go/storage v1.38.0 | ||
| github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.45.0 | ||
| github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/trace v1.21.0 | ||
| cloud.google.com/go/pubsub v1.38.0 | ||
| cloud.google.com/go/secretmanager v1.13.1 | ||
| cloud.google.com/go/storage v1.40.0 | ||
| github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.48.0 |
There was a problem hiding this comment.
Sad that these all had to upgrade for this otel bump, makes it harder to track down regressions :)
There was a problem hiding this comment.
yeah the dependency tree is pretty wild. I did upgrade aggressively but we've ran into more issues with not upgrading in lockstep, than we have with upgrading everything OTEL depends on in one go
32336fe to
75327cc
Compare
| COLLECTOR_OTLP_ENABLED: 'true' | ||
| JAEGER_OTLP_GRPC_PORT: 4320 | ||
| JAEGER_OTLP_HTTP_PORT: 4321 |
There was a problem hiding this comment.
We need to enable this on deploy-*
| (patternInfo . TextPatternInfo{"content",case,nopath,filematchlimit:1000000,lang:cpp,F:"(?i)(?:\\.cpp$)|(?:\\.c\\+\\+$)|(?:\\.cc$)|(?:\\.cp$)|(?:\\.cppm$)|(?:\\.cxx$)|(?:\\.h$)|(?:\\.h\\+\\+$)|(?:\\.hh$)|(?:\\.hpp$)|(?:\\.hxx$)|(?:\\.inc$)|(?:\\.inl$)|(?:\\.ino$)|(?:\\.ipp$)|(?:\\.ixx$)|(?:\\.re$)|(?:\\.tcc$)|(?:\\.tpp$)|(?:\\.txx$)"}) | ||
| (numRepos . 0) | ||
| (pathRegexps . [(?i)(?:\.cpp$)|(?:\.c\+\+$)|(?:\.cc$)|(?:\.cp$)|(?:\.cppm$)|(?:\.cxx$)|(?:\.h$)|(?:\.h\+\+$)|(?:\.hh$)|(?:\.hpp$)|(?:\.hxx$)|(?:\.inc$)|(?:\.inl$)|(?:\.ino$)|(?:\.ipp$)|(?:\.ixx$)|(?:\.re$)|(?:\.tcc$)|(?:\.tpp$)|(?:\.txx$)]) | ||
| (pathRegexps . ["(?i)(?:\\.cpp$)|(?:\\.c\\+\\+$)|(?:\\.cc$)|(?:\\.cp$)|(?:\\.cppm$)|(?:\\.cxx$)|(?:\\.h$)|(?:\\.h\\+\\+$)|(?:\\.hh$)|(?:\\.hpp$)|(?:\\.hxx$)|(?:\\.inc$)|(?:\\.inl$)|(?:\\.ino$)|(?:\\.ipp$)|(?:\\.ixx$)|(?:\\.re$)|(?:\\.tcc$)|(?:\\.tpp$)|(?:\\.txx$)"]) |
There was a problem hiding this comment.
Do we know where this is coming from? cc @varungandhi-src perhaps?
There was a problem hiding this comment.
@jhchabran sorry, I don't know about this. I've flagged this in the search platform team channel.
There was a problem hiding this comment.
Apparently we use go.opentelemetry.io/otel to encode values in the s-exp printer for the tests (see here).
Without looking at the source of go.opentelemetry.io/otel, My best bet is they changed the way they process arrays.
There was a problem hiding this comment.
There was a problem hiding this comment.
Thanks @stefanhengl. Do you think this is creating a problem here? (I don't have context on this, was flagged as a reviewer and spotted this, which prompted me to reach out as this could break something)
There was a problem hiding this comment.
I don't think so. As far as I can see it is only used in tests.
<!-- PR description tips: https://www.notion.so/sourcegraph/Write-a-good-pull-request-description-610a7fd3e613496eb76f450db5a49b6e --> We need to update the wolfi lock image for https://github.com/sourcegraph/sourcegraph/pull/63171 in order for `sg run` to work We've made all the changes to the deployment repos for this to be pushed out in the release today. ## Test plan <!-- REQUIRED; info at https://docs-legacy.sourcegraph.com/dev/background-information/testing_principles --> Manually tested ## Changelog <!-- OPTIONAL; info at https://www.notion.so/sourcegraph/Writing-a-changelog-entry-dd997f411d524caabf0d8d38a24a878c --> - fix(build): update wolfi lock for otel-collector
…#63790) The OTEL upgrade https://github.com/sourcegraph/sourcegraph/pull/63171 bumps the `prometheus/common` package too far via transitive deps, causing us to generate configuration for alertmanager that altertmanager doesn't accept, at least until the alertmanager project cuts a new release with a newer version of `promethues/common`. For now we forcibly downgrade with a replace. Everything still builds, so we should be good to go. ## Test plan `sg start` and `sg run prometheus`. On `main`, editing `observability.alerts` will cause Alertmanager to refuse to accept the generated configuration. With this patch, all is well it seems - config changes go through as expected. This is a similar test plan for https://github.com/sourcegraph/sourcegraph/pull/63329 ## Changelog - Fix Prometheus Alertmanager configuration failing to apply `observability.alerts` from site config
…#63790) The OTEL upgrade https://github.com/sourcegraph/sourcegraph/pull/63171 bumps the `prometheus/common` package too far via transitive deps, causing us to generate configuration for alertmanager that altertmanager doesn't accept, at least until the alertmanager project cuts a new release with a newer version of `promethues/common`. For now we forcibly downgrade with a replace. Everything still builds, so we should be good to go. ## Test plan `sg start` and `sg run prometheus`. On `main`, editing `observability.alerts` will cause Alertmanager to refuse to accept the generated configuration. With this patch, all is well it seems - config changes go through as expected. This is a similar test plan for https://github.com/sourcegraph/sourcegraph/pull/63329 ## Changelog - Fix Prometheus Alertmanager configuration failing to apply `observability.alerts` from site config (cherry picked from commit ffa873f)
… generated config (#63793) The OTEL upgrade https://github.com/sourcegraph/sourcegraph/pull/63171 bumps the `prometheus/common` package too far via transitive deps, causing us to generate configuration for alertmanager that altertmanager doesn't accept, at least until the alertmanager project cuts a new release with a newer version of `promethues/common`. For now we forcibly downgrade with a replace. Everything still builds, so we should be good to go. ## Test plan `sg start` and `sg run prometheus`. On `main`, editing `observability.alerts` will cause Alertmanager to refuse to accept the generated configuration. With this patch, all is well it seems - config changes go through as expected. This is a similar test plan for https://github.com/sourcegraph/sourcegraph/pull/63329 ## Changelog - Fix Prometheus Alertmanager configuration failing to apply `observability.alerts` from site config <br> Backport ffa873f from #63790 Co-authored-by: Robert Lin <robert@bobheadxi.dev>
Thread: https://sourcegraph.slack.com/archives/C1JH2BEHZ/p1717797870638299
One problem caused by this upgrade is that the deprecated
jaegerexporterno longer builds at all with the last published version, so for the upgrade to go through it must be removed. I've updated localsg startto work with this change, but some Release team support is needed for deployment configuration + customer-facing docs changes: https://sourcegraph.slack.com/archives/C1JH2BEHZ/p1718143249191349?thread_ts=1717797870.638299&cid=C1JH2BEHZ, since current guidance asks customers to configurejaegerexporter.Part of https://linear.app/sourcegraph/issue/SEC-1680
Closes https://linear.app/sourcegraph/issue/CORE-177
Test plan
Followed steps shared in https://sourcegraph.slack.com/archives/C04MYFW01NV/p1718136211292469 to run locally, since
sg run jaeger otel-collectoralone is insufficient to get updated images:plus
sg wolfi lock opentelemetry-collectorwill makesg run otel-collectoruse the correct image.The above diffs SHOULD NOT be committed. The lock should happen post-merge.
Spot-checked some traces locally with:
Changelog
jaegerexporterhas been removed. Users ofexporter: { jaeger: ... }in the OpenTelemetry Collector should useexporter: { otlp/jaeger: ... }to send traces directly to Jaeger via its OTLP receiver.JAEGER_OTLP_GRPC_PORTas well as the existingJAEGER_HOSTconfiguration.