Use PoolingAsyncValueTaskMethodBuilder on various ReadAsync methods. #35011

adityamandaleeka · 2021-08-03T20:05:48Z

Use [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder<>))] on Kestrel ReadAsync methods to remove per read allocations in the mainline body reading logic.

Will collect allocation metrics before/after.

Fixes #34645

davidfowl · 2021-08-03T20:15:33Z

cc @stephentoub

stephentoub

LGTM iff resulting throughput numbers look good. I'm very interested.

adityamandaleeka · 2021-08-04T04:56:59Z

Here's what I've seen so far using a new websocket benchmark (aspnet/Benchmarks#1686)

Config: https://raw.githubusercontent.com/kondratyev-nv/Benchmarks/main/scenarios/websocket.benchmarks.yml

Unmodified build:

| application                             |                             |
| --------------------------------------- | --------------------------- |
| CPU Usage (%)                           | 90                          |
| Cores usage (%)                         | 2,518                       |
| Working Set (MB)                        | 406                         |
| Private Memory (MB)                     | 1,556                       |
| Build Time (ms)                         | 2,600                       |
| Start Time (ms)                         | 115                         |
| Published Size (KB)                     | 88,242                      |
| .NET Core SDK Version                   | 6.0.100-preview.6.21276.12  |
| ASP.NET Core Version                    | 6.0.0-rc.1.21403.21+e7d9161 |
| .NET Runtime Version                    | 6.0.0-rc.1.21403.6+b0cea40  |
| Max CPU Usage (%)                       | 101                         |
| Max Working Set (MB)                    | 425                         |
| Max GC Heap Size (MB)                   | 294                         |
| Size of committed memory by the GC (MB) | 358                         |
| Max Number of Gen 0 GCs / sec           | 2.00                        |
| Max Number of Gen 1 GCs / sec           | 1.00                        |
| Max Number of Gen 2 GCs / sec           | 1.00                        |
| Max Time in GC (%)                      | 0.00                        |
| Max Gen 0 Size (B)                      | 5,633,664                   |
| Max Gen 1 Size (B)                      | 3,544,152                   |
| Max Gen 2 Size (B)                      | 3,374,544                   |
| Max LOH Size (B)                        | 672                         |
| Max Allocation Rate (B/sec)             | 294,794,136                 |
| Max GC Heap Fragmentation               | 73                          |
| # of Assemblies Loaded                  | 73                          |
| Max Exceptions (#/s)                    | 600                         |
| Max Lock Contention (#/s)               | 30                          |
| Max ThreadPool Threads Count            | 48                          |
| Max ThreadPool Queue Length             | 25                          |
| Max ThreadPool Items (#/s)              | 787,024                     |
| Max Active Timers                       | 100                         |
| IL Jitted (B)                           | 190,097                     |
| Methods Jitted                          | 2,075                       |

With this pooling change:

| application                             |                             |
| --------------------------------------- | --------------------------- |
| CPU Usage (%)                           | 88                          |
| Cores usage (%)                         | 2,473                       |
| Working Set (MB)                        | 405                         |
| Private Memory (MB)                     | 1,554                       |
| Build Time (ms)                         | 6,082                       |
| Start Time (ms)                         | 129                         |
| Published Size (KB)                     | 88,242                      |
| .NET Core SDK Version                   | 6.0.100-preview.6.21276.12  |
| ASP.NET Core Version                    | 6.0.0-rc.1.21403.24+8eeeafc |
| .NET Runtime Version                    | 6.0.0-rc.1.21403.6+b0cea40  |
| Max CPU Usage (%)                       | 102                         |
| Max Working Set (MB)                    | 424                         |
| Max GC Heap Size (MB)                   | 310                         |
| Size of committed memory by the GC (MB) | 356                         |
| Max Number of Gen 0 GCs / sec           | 2.00                        |
| Max Number of Gen 1 GCs / sec           | 1.00                        |
| Max Number of Gen 2 GCs / sec           | 1.00                        |
| Max Time in GC (%)                      | 0.00                        |
| Max Gen 0 Size (B)                      | 7,059,768                   |
| Max Gen 1 Size (B)                      | 2,735,312                   |
| Max Gen 2 Size (B)                      | 3,142,848                   |
| Max LOH Size (B)                        | 672                         |
| Max Allocation Rate (B/sec)             | 203,759,352                 |
| Max GC Heap Fragmentation               | 65                          |
| # of Assemblies Loaded                  | 73                          |
| Max Exceptions (#/s)                    | 598                         |
| Max Lock Contention (#/s)               | 34                          |
| Max ThreadPool Threads Count            | 48                          |
| Max ThreadPool Queue Length             | 64                          |
| Max ThreadPool Items (#/s)              | 772,336                     |
| Max Active Timers                       | 100                         |
| IL Jitted (B)                           | 230,296                     |
| Methods Jitted                          | 2,793                       |

[EDIT: Removed text here that said there wasn't a noticeable improvement... I had ignored the Max Allocation Rate delta at first because I wasn't sure it was stable, but the improvement does seem to hold.]

JamesNK · 2021-08-07T23:06:07Z

Cool! I didn't know this had been added. I've got a whole bunch of methods in HTTP/3 where this will be useful:

Can look at HTTP/3 in a different PR.

stephentoub · 2021-08-07T23:16:56Z

Here's what I've seen so far using a new websocket benchmark

Is there a number here that represents throughput?

I've got a whole bunch of methods in HTTP/3 where this will be useful

Please measure (not just managed allocation reduction). There are tradeoffs here.

davidfowl · 2021-08-07T23:29:40Z

Please measure (not just managed allocation reduction). There are tradeoffs here.

There's no throughput regression but there's no improvement from this benchmark either. Just a reduction in allocation rate so far. One thing we've excited to look at is running in more memory constrained environments to see if we run faster because we're spending less time doing GCs.

Also want test using the new conserve GC mode to see if there's any difference.

JamesNK · 2021-08-07T23:41:08Z

What are the negatives of using ValueTask pooling? I briefly looked at the implementation and it appears to use thread-local storage. Are you worried about overall memory going up?

Most of the HTTP/3 methods where state machines are being allocated will have exactly one caller per object. For methods like those, have you considered a builder that stashes the state machine on the object in a field instead of TLS? (if that is even possible. I'm not familar with this area 😬)

stephentoub · 2021-08-07T23:45:34Z

What are the negatives of using ValueTask pooling?

Potential for:

Increased working set
Increased costs of GCs that do occur due to likely gen2 -> gen0 references from the pooled objects
Every await being a bit more expensive due to more interface invocations

have you considered a builder that stashes the state machine on the object in a field instead of TLS?

It's not possible without language help, and we cut it from C# 10 due to not being able to find a good way to express it with a builder, which would likely incur other costs like delegate invocations for every rent and return. We might revisit for C# 11.

stephentoub · 2021-08-07T23:46:29Z

There's no throughput regression but there's no improvement from this benchmark either.

What should I be looking at in Aditya's numbers to see that? I was expecting some kind of websocket messages/sec number.

adityamandaleeka · 2021-08-09T19:25:41Z

@stephentoub I didn't include this part in the original post, but here are the client metrics from those runs (including the throughput figures):

Before:

| load                  |                             |
| --------------------- | --------------------------- |
| CPU Usage (%)         | 37                          |
| Cores usage (%)       | 1,045                       |
| Working Set (MB)      | 58                          |
| Private Memory (MB)   | 535                         |
| Build Time (ms)       | 2,427                       |
| Start Time (ms)       | 78                          |
| Published Size (KB)   | 69,422                      |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12  |
| ASP.NET Core Version  | 6.0.0-rc.1.21403.21+e7d9161 |
| .NET Runtime Version  | 6.0.0-rc.1.21403.6+b0cea40  |
| Max RPS               | 326,169                     |
| Requests              | 4,896,120                   |
| Mean latency (ms)     | 0                           |

After:


| load                  |                             |
| --------------------- | --------------------------- |
| CPU Usage (%)         | 39                          |
| Cores usage (%)       | 1,086                       |
| Working Set (MB)      | 59                          |
| Private Memory (MB)   | 527                         |
| Build Time (ms)       | 2,691                       |
| Start Time (ms)       | 78                          |
| Published Size (KB)   | 69,422                      |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12  |
| ASP.NET Core Version  | 6.0.0-rc.1.21403.24+8eeeafc |
| .NET Runtime Version  | 6.0.0-rc.1.21403.6+b0cea40  |
| Max RPS               | 325,692                     |
| Requests              | 4,888,316                   |
| Mean latency (ms)     | 0                           |

The benchmark PR is still in progress (see comments there about what these numbers mean), but we can see that the throughput difference between the runs is tiny (~0.1%).

adityamandaleeka · 2021-08-16T17:51:18Z

An update: I tested this change with other benchmarks in the lab (plaintext, json, fortunes) and, as in the websocket case, observed no throughput or latency impact when running them.

To summarize where we are, we've seen some allocation reduction with this change (based on the data above), but no signs of any other impact--positive or negative--on the benchmarks that have been run. We'll get more coverage once this is merged and gets picked up by all the automated benchmarking runs. If we see any regressions at that point, we could back out the change. How does that sound?

Raw Data

Plaintext

Note: P90 latency of this one is highly variable from run to run (I've seen 22 ms in a "before" run...), so the delta here shouldn't be taken seriously.

Before:

| application           |                            |
| --------------------- | -------------------------- |
| CPU Usage (%)         | 94                         |
| Cores usage (%)       | 2,642                      |
| Working Set (MB)      | 391                        |
| Private Memory (MB)   | 808                        |
| Build Time (ms)       | 3,000                      |
| Start Time (ms)       | 337                        |
| Published Size (KB)   | 109,320                    |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12 |
| ASP.NET Core Version  | 6.0.0-rc.1.21413.5+fad9658 |
| .NET Runtime Version  | 6.0.0-rc.1.21413.9+60d9b98 |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 68         |
| Cores usage (%)        | 1,893      |
| Working Set (MB)       | 48         |
| Private Memory (MB)    | 376        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 233        |
| Requests/sec           | 2,663,433  |
| Requests               | 40,215,905 |
| Mean latency (ms)      | 1.68       |
| Max latency (ms)       | 166.39     |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 335.29     |
| Latency 50th (ms)      | 0.88       |
| Latency 75th (ms)      | 1.28       |
| Latency 90th (ms)      | 1.76       |
| Latency 99th (ms)      | 0.00       |

After:

| application           |                            |
| --------------------- | -------------------------- |
| CPU Usage (%)         | 94                         |
| Cores usage (%)       | 2,641                      |
| Working Set (MB)      | 387                        |
| Private Memory (MB)   | 804                        |
| Build Time (ms)       | 3,022                      |
| Start Time (ms)       | 353                        |
| Published Size (KB)   | 109,320                    |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12 |
| ASP.NET Core Version  | 6.0.0-rc.1.21413.5+fad9658 |
| .NET Runtime Version  | 6.0.0-rc.1.21413.9+60d9b98 |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 67         |
| Cores usage (%)        | 1,882      |
| Working Set (MB)       | 49         |
| Private Memory (MB)    | 376        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 266        |
| Requests/sec           | 2,709,348  |
| Requests               | 40,904,982 |
| Mean latency (ms)      | 1.73       |
| Max latency (ms)       | 214.63     |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 341.07     |
| Latency 50th (ms)      | 0.89       |
| Latency 75th (ms)      | 1.29       |
| Latency 90th (ms)      | 2.64       |
| Latency 99th (ms)      | 0.00       |

JSON

Before:

| application           |                            |
| --------------------- | -------------------------- |
| CPU Usage (%)         | 82                         |
| Cores usage (%)       | 2,296                      |
| Working Set (MB)      | 361                        |
| Private Memory (MB)   | 443                        |
| Build Time (ms)       | 5,835                      |
| Start Time (ms)       | 296                        |
| Published Size (KB)   | 109,320                    |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12 |
| ASP.NET Core Version  | 6.0.0-rc.1.21413.5+fad9658 |
| .NET Runtime Version  | 6.0.0-rc.1.21413.9+60d9b98 |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 52         |
| Cores usage (%)        | 1,458      |
| Working Set (MB)       | 37         |
| Private Memory (MB)    | 358        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 113        |
| Requests/sec           | 748,260    |
| Requests               | 11,298,306 |
| Mean latency (ms)      | 0.67       |
| Max latency (ms)       | 192.01     |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 108.47     |
| Latency 50th (ms)      | 0.23       |
| Latency 75th (ms)      | 0.48       |
| Latency 90th (ms)      | 1.15       |
| Latency 99th (ms)      | 6.61       |

After:

| application           |                            |
| --------------------- | -------------------------- |
| CPU Usage (%)         | 79                         |
| Cores usage (%)       | 2,225                      |
| Working Set (MB)      | 362                        |
| Private Memory (MB)   | 782                        |
| Build Time (ms)       | 9,493                      |
| Start Time (ms)       | 308                        |
| Published Size (KB)   | 109,321                    |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12 |
| ASP.NET Core Version  | 6.0.0-rc.1.21413.9+4076bad |
| .NET Runtime Version  | 6.0.0-rc.1.21413.9+60d9b98 |


| load                   |            |
| ---------------------- | ---------- |
| CPU Usage (%)          | 52         |
| Cores usage (%)        | 1,454      |
| Working Set (MB)       | 37         |
| Private Memory (MB)    | 358        |
| Start Time (ms)        | 0          |
| First Request (ms)     | 141        |
| Requests/sec           | 740,802    |
| Requests               | 11,185,649 |
| Mean latency (ms)      | 0.66       |
| Max latency (ms)       | 112.99     |
| Bad responses          | 0          |
| Socket errors          | 0          |
| Read throughput (MB/s) | 107.39     |
| Latency 50th (ms)      | 0.23       |
| Latency 75th (ms)      | 0.48       |
| Latency 90th (ms)      | 1.00       |
| Latency 99th (ms)      | 7.14       |

Fortunes

Before:

| db                  |         |
| ------------------- | ------- |
| CPU Usage (%)       | 25      |
| Cores usage (%)     | 711     |
| Working Set (MB)    | 47      |
| Build Time (ms)     | 1,943   |
| Start Time (ms)     | 474     |
| Published Size (KB) | 914,909 |


| application           |                            |
| --------------------- | -------------------------- |
| CPU Usage (%)         | 75                         |
| Cores usage (%)       | 2,114                      |
| Working Set (MB)      | 409                        |
| Private Memory (MB)   | 824                        |
| Build Time (ms)       | 2,876                      |
| Start Time (ms)       | 1,850                      |
| Published Size (KB)   | 93,209                     |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12 |
| ASP.NET Core Version  | 6.0.0-rc.1.21416.2+98ee247 |
| .NET Runtime Version  | 6.0.0-rc.1.21415.6+fde6b37 |


| load                   |           |
| ---------------------- | --------- |
| CPU Usage (%)          | 31        |
| Cores usage (%)        | 861       |
| Working Set (MB)       | 37        |
| Private Memory (MB)    | 363       |
| Start Time (ms)        | 0         |
| First Request (ms)     | 100       |
| Requests/sec           | 327,682   |
| Requests               | 4,943,368 |
| Mean latency (ms)      | 2.04      |
| Max latency (ms)       | 102.00    |
| Bad responses          | 0         |
| Socket errors          | 0         |
| Read throughput (MB/s) | 425.31    |
| Latency 50th (ms)      | 1.37      |
| Latency 75th (ms)      | 1.92      |
| Latency 90th (ms)      | 3.50      |
| Latency 99th (ms)      | 13.52     |

After:

| db                  |         |
| ------------------- | ------- |
| CPU Usage (%)       | 25      |
| Cores usage (%)     | 713     |
| Working Set (MB)    | 47      |
| Build Time (ms)     | 2,020   |
| Start Time (ms)     | 475     |
| Published Size (KB) | 914,909 |


| application           |                            |
| --------------------- | -------------------------- |
| CPU Usage (%)         | 74                         |
| Cores usage (%)       | 2,074                      |
| Working Set (MB)      | 409                        |
| Private Memory (MB)   | 824                        |
| Build Time (ms)       | 2,676                      |
| Start Time (ms)       | 1,903                      |
| Published Size (KB)   | 93,209                     |
| .NET Core SDK Version | 6.0.100-preview.6.21276.12 |
| ASP.NET Core Version  | 6.0.0-rc.1.21416.2+98ee247 |
| .NET Runtime Version  | 6.0.0-rc.1.21415.6+fde6b37 |


| load                   |           |
| ---------------------- | --------- |
| CPU Usage (%)          | 31        |
| Cores usage (%)        | 863       |
| Working Set (MB)       | 38        |
| Private Memory (MB)    | 363       |
| Start Time (ms)        | 0         |
| First Request (ms)     | 99        |
| Requests/sec           | 328,971   |
| Requests               | 4,967,324 |
| Mean latency (ms)      | 2.07      |
| Max latency (ms)       | 95.46     |
| Bad responses          | 0         |
| Socket errors          | 0         |
| Read throughput (MB/s) | 426.99    |
| Latency 50th (ms)      | 1.36      |
| Latency 75th (ms)      | 1.90      |
| Latency 90th (ms)      | 3.50      |
| Latency 99th (ms)      | 13.70     |

adityamandaleeka · 2021-08-17T20:12:04Z

Merging per plan in comment above.

davidfowl · 2022-04-24T03:07:35Z

Looking at a new websocket profile with SignalR we missed one of these.

Use PoolingAsyncValueTaskMethodBuilder on various ReadAsync methods.

d2f7576

adityamandaleeka requested a review from davidfowl August 3, 2021 20:05

stephentoub reviewed Aug 3, 2021

View reviewed changes

Pilchie added the area-runtime label Aug 4, 2021

adityamandaleeka marked this pull request as ready for review August 16, 2021 18:48

adityamandaleeka requested review from BrennanConroy, Pilchie, Tratcher and halter73 as code owners August 16, 2021 18:48

davidfowl approved these changes Aug 17, 2021

View reviewed changes

adityamandaleeka merged commit 8556362 into dotnet:main Aug 17, 2021

ghost added this to the 6.0-rc1 milestone Aug 17, 2021

davidfowl mentioned this pull request Apr 25, 2022

Pool the missing ReadAsync state machine #41345

Merged

amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 6, 2023

github-actions bot locked and limited conversation to collaborators Dec 8, 2023

Use PoolingAsyncValueTaskMethodBuilder on various ReadAsync methods. #35011

Use PoolingAsyncValueTaskMethodBuilder on various ReadAsync methods. #35011

Uh oh!

Conversation

adityamandaleeka commented Aug 3, 2021

Uh oh!

davidfowl commented Aug 3, 2021

Uh oh!

stephentoub left a comment

Choose a reason for hiding this comment

Uh oh!

adityamandaleeka commented Aug 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JamesNK commented Aug 7, 2021

Uh oh!

stephentoub commented Aug 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davidfowl commented Aug 7, 2021

Uh oh!

JamesNK commented Aug 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephentoub commented Aug 7, 2021

Uh oh!

stephentoub commented Aug 7, 2021

Uh oh!

adityamandaleeka commented Aug 9, 2021

Uh oh!

adityamandaleeka commented Aug 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Raw Data

Plaintext

JSON

Fortunes

Uh oh!

adityamandaleeka commented Aug 17, 2021

Uh oh!

davidfowl commented Apr 24, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

adityamandaleeka commented Aug 4, 2021 •

edited

Loading

stephentoub commented Aug 7, 2021 •

edited

Loading

JamesNK commented Aug 7, 2021 •

edited

Loading

adityamandaleeka commented Aug 16, 2021 •

edited

Loading