fix: update BlobReadSession ScatteringByteChannel projection to use less CPU #3324
Merged
BenWhitehead merged 2 commits into main on Oct 6, 2025
…ess CPU

Switch to a graceful poll of the queue rather than a spin-loop poll.

Elements are added to the queue asynchronously from a background gRPC thread. Rather than returning immediately when the queue is empty, allow waiting up to 10 microseconds. This duration is possibly not optimal, but testing shows good results: before, ~51% of CPU time is spent in `StreamingRead.read`; after, ~13% of CPU time is spent in `StreamingRead.read`, for roughly the same wall time (1.59% before, 1.52% after). This also has the added benefit of reduced perceived latency for an application.

##### profiled workload results

Download a 128MiB object from 0-EOF 1000 times while async profiler is profiling `event=cpu`, using `StorageDataClient.fastOpenReadSession` to begin the read while creating the session.

All values below are latency in milliseconds.

|        |   p50 |   p90 |   p95 |
|--------|------:|------:|------:|
| before | ~ 266 | ~ 408 | ~ 468 |
| after  | ~ 253 | ~ 384 | ~ 431 |

* Remove no longer necessary `Buffers.totalRemaining` call
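For readers unfamiliar with the pattern, the difference between an immediate poll and a short timed poll can be sketched roughly as below. This is an illustration only, not the code changed in this PR: it assumes a `java.util.concurrent.BlockingQueue` as a stand-in for the internal queue, and the class and method names (`QueuePollSketch`, `pollImmediately`, `pollGracefully`, `run`) are hypothetical. The background thread plays the role of the gRPC thread that enqueues elements.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the library.
final class QueuePollSketch {

  /** Spin-loop style: returns null right away if the queue is empty. */
  static <T> T pollImmediately(BlockingQueue<T> queue) {
    return queue.poll();
  }

  /** Graceful style: wait up to timeoutMicros for an element before returning null. */
  static <T> T pollGracefully(BlockingQueue<T> queue, long timeoutMicros)
      throws InterruptedException {
    return queue.poll(timeoutMicros, TimeUnit.MICROSECONDS);
  }

  /**
   * Receives 50 elements from a background producer and returns how many
   * poll calls the consumer needed. Fewer calls means less busy work.
   */
  static long run(boolean graceful) throws InterruptedException {
    final int count = 50;
    BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    Thread producer = new Thread(() -> {
      try {
        for (int i = 0; i < count; i++) {
          Thread.sleep(1); // simulate data arriving asynchronously
          queue.put(i);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();

    long calls = 0;
    int received = 0;
    while (received < count) {
      Integer value = graceful ? pollGracefully(queue, 10) : pollImmediately(queue);
      calls++;
      if (value != null) {
        received++;
      }
    }
    producer.join();
    return calls;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("immediate poll calls: " + run(false));
    System.out.println("timed poll calls:     " + run(true));
  }
}
```

The exact call counts vary by machine and scheduler, so none are shown here. The point of the sketch is only that a timed poll lets the consumer thread park briefly instead of re-checking an empty queue in a tight loop.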
bajajneha27 previously approved these changes on Oct 3, 2025
…ment from the queue
bajajneha27 approved these changes on Oct 6, 2025
async-profilers flame graphs