
default to eager runeDocSection #503

Merged: keegancsmith merged 1 commit into main from k/lazy-default on Jan 9, 2023

Conversation

@keegancsmith
Member

We have been running with eager runDocSection decoding on sourcegraph.com this week and have seen improvements to both tail latency and average latency. We believe the lazy decoding was unnecessary: global symbol searches are so common that we were constantly decoding doc section data anyway. See #312 for more context and below for details on perf.

This PR switches the feature flag so that lazy section decoding is opt-in. Additionally we remove the misguided warning, since it would trigger whenever symbols were disabled in a shard. I confirmed that an empty []byte is fine by reading old code (which predates lazy decoding) and by experimenting.
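To make the eager-vs-lazy trade-off concrete, here is a minimal sketch of the two decoding strategies the flag switches between. The types and encoding here (docSections, little-endian uint32 offsets) are hypothetical illustrations, not zoekt's actual data layout.

```go
// Sketch of eager vs. lazy doc section decoding. All names are hypothetical.
package main

import (
	"encoding/binary"
	"fmt"
	"sync"
)

type docSections struct {
	raw     []byte
	once    sync.Once
	decoded []uint32
}

// decode is safe to call from many goroutines; the work happens once.
func (d *docSections) decode() []uint32 {
	d.once.Do(func() {
		d.decoded = make([]uint32, 0, len(d.raw)/4)
		for i := 0; i+4 <= len(d.raw); i += 4 {
			d.decoded = append(d.decoded, binary.LittleEndian.Uint32(d.raw[i:]))
		}
		d.raw = nil // the raw bytes can be collected once decoded
	})
	return d.decoded
}

func newDocSections(raw []byte, lazy bool) *docSections {
	d := &docSections{raw: raw}
	if !lazy {
		d.decode() // eager (the new default): pay the cost at shard load time
	}
	return d
}

func main() {
	d := newDocSections([]byte{1, 0, 0, 0, 2, 0, 0, 0}, false)
	fmt.Println(d.decode()) // [1 2]
}
```

With lazy decoding the first query against a shard pays the decode cost; with eager decoding it is paid once at load time, which is the better trade when nearly every query touches doc sections.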

There is a chance that on a quiet instance we suddenly have a lot more RAM sitting in the heap that never gets reclaimed, since the decoded data stays alive. For such instances lazy decoding may in fact be better. As such, when this PR lands in Sourcegraph the changelog should document this, in case we see an increase in OOMs on low-request-volume instances.

Perf monitoring

First up: time to first result from our continuous perf monitoring. This shows that nearly all of our queries got faster compared with a week ago.

histogram_quantile(0.5, sum by (query_name, le)(
    rate(search_blitz_first_result_seconds_bucket{query_name=~"^(literal|mono|regex)_.*",query_name!="literal_repo_excluded_scope",query_name!~".*(structural|_rev_|diff|symbol|commit).*"}[1h])))
-
histogram_quantile(0.5, sum by (query_name, le)(
    rate(search_blitz_first_result_seconds_bucket{query_name=~"^(literal|mono|regex)_.*",query_name!="literal_repo_excluded_scope",query_name!~".*(structural|_rev_|diff|symbol|commit).*"}[1h] offset 7d)))

[image: time-to-first-result delta vs. 7 days ago]

This image shows how much less we are allocating as time goes on.

[image: allocation rate over time]

The heap profiler, though, does say we have increased memory use by 3.4 GB averaged over the cluster. This is the main risk of the change. But what was happening before is that a global symbol query would allocate that 3.4 GB just for the request and then make the GC work very hard.

Below are some numbers from profiles averaged over the last 12 hours, compared to a week ago at the same time. The improvements are more dramatic at higher percentiles than at averages, i.e. our tail latencies have likely improved substantially, largely because the GC runs far less rather than just because of reduced IO.

runtime.gcBgMarkWorker (/usr/local/go/src/runtime/mgc.go)
  total: 439.44 ms vs. 1.06 s (-623.96 ms), 3.34% vs. 7.52% (-4.17%)
  self:  0 vs. 80 µs (-80 µs), 0% vs. 0.001% (-0.001%)

github.com/sourcegraph/zoekt.(*indexData).Search (/go/src/github.com/sourcegraph/zoekt/eval.go)
  total: 10.67 s vs. 11.57 s (-894.44 ms), 81% vs. 82% (-0.608%)
  self:  54.24 ms vs. 46.4 ms (+7.84 ms), 0.412% vs. 0.328% (+0.084%)

@keegancsmith keegancsmith requested review from a team and camdencheek December 15, 2022 13:26
@keegancsmith keegancsmith merged commit 6d5ed59 into main Jan 9, 2023
@keegancsmith keegancsmith deleted the k/lazy-default branch January 9, 2023 13:11
keegancsmith added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jan 9, 2023
We have been running with eager runDocSection decoding on
sourcegraph.com for a month and have seen improvements to tail latency
and average latency. We believe the lazy decoding was unnecessary since
we so often do global symbol searches that we would constantly be
decoding doc section data all the time.

There is a chance that on a quiet instance we suddenly have a lot more
RAM sitting in the heap that doesn't get claimed back since it stays
alive. In that case the lazy decoding may in fact be better for them. As
such we document how to disable it in the CHANGELOG.

For more details see the PR in zoekt
sourcegraph/zoekt#503

Test Plan: tested already on sourcegraph.com.
keegancsmith added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jan 9, 2023
peterguy pushed a commit that referenced this pull request Jan 10, 2023
