-
Notifications
You must be signed in to change notification settings - Fork 10.5k
Use PoolingAsyncValueTaskMethodBuilder on various ReadAsync methods. #35011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @stephentoub |
stephentoub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM iff resulting throughput numbers look good. I'm very interested.
|
Here's what I've seen so far using a new websocket benchmark (aspnet/Benchmarks#1686) Config: https://raw.githubusercontent.com/kondratyev-nv/Benchmarks/main/scenarios/websocket.benchmarks.yml Unmodified build: With this pooling change: [EDIT: Removed text here that said there wasn't a noticeable improvement... I had ignored the Max Allocation Rate delta at first because I wasn't sure it was stable, but the improvement does seem to hold.] |
Is there a number here that represents throughput?
Please measure (not just managed allocation reduction). There are tradeoffs here. |
There's no throughput regression but there's no improvement from this benchmark either. Just a reduction in allocation rate so far. One thing we've excited to look at is running in more memory constrained environments to see if we run faster because we're spending less time doing GCs. Also want test using the new conserve GC mode to see if there's any difference. |
|
What are the negatives of using ValueTask pooling? I briefly looked at the implementation and it appears to use thread-local storage. Are you worried about overall memory going up? Most of the HTTP/3 methods where state machines are being allocated will have exactly one caller per object. For methods like those, have you considered a builder that stashes the state machine on the object in a field instead of TLS? (if that is even possible. I'm not familar with this area 😬) |
Potential for:
It's not possible without language help, and we cut it from C# 10 due to not being able to find a good way to express it with a builder, which would likely incur other costs like delegate invocations for every rent and return. We might revisit for C# 11. |
What should I be looking at in Aditya's numbers to see that? I was expecting some kind of websocket messages/sec number. |
|
@stephentoub I didn't include this part in the original post, but here are the client metrics from those runs (including the throughput figures): Before: After: The benchmark PR is still in progress (see comments there about what these numbers mean), but we can see that the throughput difference between the runs is tiny (~0.1%). |
|
An update: I tested this change with other benchmarks in the lab (plaintext, json, fortunes) and, as in the websocket case, observed no throughput or latency impact when running them. To summarize where we are, we've seen some allocation reduction with this change (based on the data above), but no signs of any other impact--positive or negative--on the benchmarks that have been run. We'll get more coverage once this is merged and gets picked up by all the automated benchmarking runs. If we see any regressions at that point, we could back out the change. How does that sound? Raw DataPlaintextNote: P90 latency of this one is highly variable from run to run (I've seen 22 ms in a "before" run...), so the delta here shouldn't be taken seriously. Before: After: JSONBefore: After: FortunesBefore: After: |
|
Merging per plan in comment above. |


Use
[AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))]on Kestrel ReadAsync methods to remove per read allocations in the mainline body reading logic.Will collect allocation metrics before/after.
Fixes #34645