Merged
stefanhengl approved these changes on Dec 16, 2022
camdencheek approved these changes on Jan 3, 2023
keegancsmith added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request on Jan 9, 2023
We have been running with eager runDocSection decoding on sourcegraph.com for a month and have seen improvements to both tail latency and average latency. We believe the lazy decoding was unnecessary: we run global symbol searches so often that we were constantly decoding doc section data anyway. There is a chance that on a quiet instance we suddenly have a lot more RAM sitting in the heap that doesn't get reclaimed since it stays alive; in that case lazy decoding may in fact be better. As such we document how to disable it in the CHANGELOG. For more details see the PR in zoekt sourcegraph/zoekt#503. Test Plan: tested already on sourcegraph.com.
peterguy pushed a commit that referenced this pull request on Jan 10, 2023
We have been running with eager runDocSection decoding on sourcegraph.com this week and have seen improvements to both tail latency and average latency. We believe the lazy decoding was unnecessary: we run global symbol searches so often that we were constantly decoding doc section data anyway.
We have been running with eager runDocSection decoding on sourcegraph.com this week and have seen improvements to both tail latency and average latency. We believe the lazy decoding was unnecessary: we run global symbol searches so often that we were constantly decoding doc section data anyway. See #312 for more context and below for details on perf.
This PR switches the feature flag so that lazy section decoding is opt-in. Additionally we remove the misguided warning, since it would trigger whenever symbols were disabled in a shard. I confirmed it is fine to have an empty []byte by reading old code (predating lazy decoding) and by experimenting.
There is a chance that on a quiet instance we suddenly have a lot more RAM sitting in the heap that doesn't get reclaimed since it stays alive. In that case lazy decoding may in fact be better. As such, when this PR lands in Sourcegraph the changelog should document this, in case we see an increase in OOMs on low-request-volume instances.
Perf monitoring
First up: time to first result from our continuous perf monitoring. This shows that nearly all our queries improved in performance versus what we observed a week ago.
This image shows how much less we are allocating as time goes on.
The heap profiler, though, does say we have increased memory use by 3.4GB averaged over the cluster. This is the main risk of this change. But what was happening before is that with a global symbol query we would allocate that 3.4GB just for the request and then make the GC work super hard.
Below are some numbers from average profiles over the last 12 hours compared to a week ago at the same time. The improvements are more dramatic at higher percentiles (rather than averages), i.e. our tail latencies have likely improved a bunch, much of which is likely due to the GC running far less rather than just less IO.