grpc: symbols: spread LocalCodeIntel response symbols across multiple messages#55242
Conversation
Codenotify: Notifying subscribers in CODENOTIFY files for diff af19739...6aafcca.
```go
type localCodeIntelSender struct {
	sendFunc func(symbols []*proto.LocalCodeIntelResponse_Symbol) error
	buf      []*proto.LocalCodeIntelResponse_Symbol
}
```
nit: seems like this will be the pattern used for pretty much every implementor of `Sender`. Should we just define something like the following?

```go
type BufferedSender[T any] struct {
	sendFunc func([]T) error
	buf      []T
}
```
or maybe even just merge the buffer into the Chunker?
I can see the merit of the BufferedSender. Do you think we should get rid of the Sender interface entirely and replace it with BufferedSender, or just provide BufferedSender as a "default" implementation?
I can't actually think of a reason we'd need the Sender interface. The slice buffer + callback func seems general enough for any situation I can come up with
@camdencheek How does 334eba7 (#55242) look?
Co-authored-by: Camden Cheek <camden@ccheek.com>
…across multiple messages (#55292)

This PR changes the LocalCodeIntel symbols gRPC method to return a _stream_ of chunks of symbols, rather than the entire result set in a single message.

---

gRPC has default limits on how large individual messages can be. This is because gRPC is a message-based framework, which means that messages have to be loaded entirely into memory for **both** sending and receiving. As such, large allocations can negatively impact server performance and stability. The go-grpc default limit is 4MB.

The original LocalCodeIntel RPC implementation (both gRPC and REST) returned the entire set of symbols in one message. For certain repositories / files, this message can contain _thousands_ of symbols, which can add up to **[hundreds of MB](https://console.cloud.google.com/logs/query;cursorTimestamp=2023-07-24T15:43:57.509461420Z;duration=P7D;query=resource.type%3D%22k8s_container%22%0Aresource.labels.project_id%3D%22sourcegraph-dev%22%0Aresource.labels.location%3D%22us-central1-f%22%0Aresource.labels.cluster_name%3D%22cloud%22%0Aresource.labels.namespace_name%3D%22prod%22%0AjsonPayload.InstrumentationScope%3D~%22gRPC.internal.error.reporter%22%0A-jsonPayload.message%3D%22grpc:%20error%20while%20marshaling:%20string%20field%20contains%20invalid%20UTF-8%22%0A-jsonPayload.Body%3D%22grpc:%20error%20while%20marshaling:%20string%20field%20contains%20invalid%20UTF-8%22%0Atimestamp%3D%222023-07-24T15:43:57.509461420Z%22%0AinsertId%3D%22f964g54awjrevxzi%22?project=sourcegraph-dev) or even [gigabytes of memory](https://sourcegraph.slack.com/archives/C04HCK4K3DL/p1683228335350079).**

This PR adjusts the server-side implementation of LocalCodeIntel to chunk its full result set into smaller pieces (so that each individual gRPC message is smaller). It does this in a few steps:

- Adopting a `chunk` utility I found in the [Gitaly](https://gitlab.com/gitlab-org/gitaly/-/blob/v16.2.0/internal/helper/chunk/chunker.go) project that can intelligently send a group of protobuf messages in smaller chunks (all ~1MB): sourcegraph/sourcegraph@59ba6d0. I made some minor tweaks to the original code:
  - using generics for type safety
  - supporting variadic arguments for convenience / ergonomics
  - removing some unneeded Gitaly packages from the test setup
- Using the above `chunk` utility in the Symbols service to divide the list of returned symbols into ~1MB batches and send them across the result stream: sourcegraph/sourcegraph@77fc77e

---

Note that this PR doesn't "fix" the real issues with the underlying application itself. Notably:

- The server still calculates the entire result set in one shot (requiring it to hold all the symbols in a contiguous chunk of memory), as opposed to sending out incremental progress.
- Is it necessary for the server to return thousands upon thousands of symbols, or should it support some sort of "limit" parameter?
- The custom symbols client API still returns all of these symbols in one giant slice (by joining all the received messages), so it still allocates as much memory as before, as opposed to some sort of API where the result can be consumed incrementally.

However, we still aren't any worse off than we were before (the REST implementation has the same memory allocation problems and sends the results in one giant JSON message). This PR does "work around" the gRPC-specific message size complaints while providing a building block to improve this in the future.

cc @sourcegraph/code-intel ^

## Test plan

1. CI
2. Manual testing:
   - Using the [repository and file mentioned in this log message](https://console.cloud.google.com/logs/query;cursorTimestamp=2023-07-24T15:43:57.509461420Z;duration=P7D;query=resource.type%3D%22k8s_container%22%0Aresource.labels.project_id%3D%22sourcegraph-dev%22%0Aresource.labels.location%3D%22us-central1-f%22%0Aresource.labels.cluster_name%3D%22cloud%22%0Aresource.labels.namespace_name%3D%22prod%22%0AjsonPayload.InstrumentationScope%3D~%22gRPC.internal.error.reporter%22%0A-jsonPayload.message%3D%22grpc:%20error%20while%20marshaling:%20string%20field%20contains%20invalid%20UTF-8%22%0A-jsonPayload.Body%3D%22grpc:%20error%20while%20marshaling:%20string%20field%20contains%20invalid%20UTF-8%22%0Atimestamp%3D%222023-07-24T15:43:57.509461420Z%22%0AinsertId%3D%22f964g54awjrevxzi%22?project=sourcegraph-dev), run a local Sourcegraph instance with the following diff applied (to simulate the absence of the chunker by raising the per-message size limit to 1 TB):

     ```diff
     diff --git a/internal/grpc/chunk/chunker.go b/internal/grpc/chunk/chunker.go
     index 947b721..c954854fc6 100644
     --- a/internal/grpc/chunk/chunker.go
     +++ b/internal/grpc/chunk/chunker.go
     @@ -42,7 +42,7 @@ type Chunker[T Message] struct {
      }

      // maxMessageSize is the maximum size per protobuf message
     -const maxMessageSize = 1 * 1024 * 1024
     +const maxMessageSize = 1 * 1024 * 1024 * 1024 * 1024
     ```

   - Navigate to the above-mentioned repository and file, hover over any token, and see the expected log message:

     ```
     [ frontend] ERROR symbolsConnectionCache.gRPC.internal.error.reporter.streamingMethod.postMessageReceive internalerrs/logging.go:239 grpc: received message larger than max (708588568 vs. 94371840) {"grpcService": "symbols.v1.SymbolsService", "grpcMethod": "LocalCodeIntel", "grpcCode": "ResourceExhausted", "initialRequestJSON": "[omitted]"}
     ```

   - Revert the above diff (which should restore the 1MB chunking behavior), hard-refresh the blob page, and hover over the same token. Eventually (after ~30s on my machine), the request completes and a hover appears - no error occurs.

Backport 2a6c569 from #55242
… messages (#55242) Co-authored-by: Camden Cheek <camden@ccheek.com>
…sages sent by servers/clients (#55495)

Follow-up to https://github.com/sourcegraph/sourcegraph/pull/55209 and https://github.com/sourcegraph/sourcegraph/pull/55242.

This PR adds interceptors that record Prometheus metrics observing:

- the individual size of each **sent** protobuf message by a server or client
- the total amount of data sent over the course of a single RPC by a server (responses) or client (requests)

This allows us to track the total amount of data returned by any of our RPCs. In some cases, this can reveal opportunities for future performance / stability improvements (example: symbols' [LocalCodeIntel method returning ~gigabyte-sized responses that have to be held in memory all at once](https://github.com/sourcegraph/sourcegraph/pull/55242)).

This PR also provides new Grafana dashboards that track this metric for every gRPC service. See below for a screenshot of what this looks like when I run the symbols service locally.

Co-authored-by: Geoffrey Gilmore <geoffrey@sourcegraph.com>