
Conversation

@davidfowl davidfowl commented Oct 7, 2022

  • This is a similar change to the one made in Kestrel's transport layer. We avoid double-dispatching from one thread pool thread to another when we're not running inline. Since the Windows thread pool dispatches by default, we can use that same thread to start executing the request. As a result, we need to dispatch the next accept call before executing the request.
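The shape of the change described above can be sketched roughly as follows. This is an illustrative sketch only: `AcceptNextRequestAsync`, `ProcessAsync`, and the overall method are hypothetical stand-ins, not the PR's actual code; only the `_preferInlineScheduling` flag appears in the diff context quoted later in this thread.

```csharp
// Illustrative sketch of avoiding the double dispatch (hypothetical names).
private async Task ExecuteAsync()
{
    while (true)
    {
        RequestContext context = await AcceptNextRequestAsync();

        if (!_preferInlineScheduling)
        {
            // The Windows thread pool has already dispatched this continuation
            // onto a thread pool thread, so queue the next accept first...
            ThreadPool.UnsafeQueueUserWorkItem(
                static state => { _ = state.ExecuteAsync(); }, this, preferLocal: false);

            // ...then run the request inline on the current thread instead of
            // dispatching it to a second thread pool thread.
            await context.ProcessAsync();
            return; // the loop continues in the worker queued above
        }

        // Unsafe inline scheduling: keep accepting and processing on this thread.
        await context.ProcessAsync();
    }
}
```

Note how in the default (dispatching) path the `while` loop body runs exactly once per worker, with each request handing the accept loop off to a fresh work item.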

This is the JsonHttpSys benchmark. Here is the crank command.

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario https --profile intel-win-app --profile intel-load2-load --variable server=HttpSys --application.options.requiredOperatingSystem windows --application.framework net7.0 --application.collectDependencies true --application.options.collectCounters true --application.sdkVersion 8.0.100-alpha.1.22504.22

Results

| application | before | Current | |
| --- | --- | --- | --- |
| CPU Usage (%) | 66 | 67 | +1.52% |
| Cores usage (%) | 1,834 | 1,877 | +2.34% |
| Working Set (MB) | 481 | 488 | +1.46% |
| Private Memory (MB) | 525 | 533 | +1.52% |
| Build Time (ms) | 1,920 | 1,859 | -3.18% |
| Start Time (ms) | 327 | 392 | +19.88% |
| Published Size (KB) | 118,666 | 118,669 | +0.00% |
| .NET Core SDK Version | 8.0.100-alpha.1.22504.22 | 8.0.100-alpha.1.22504.22 | |
| Max CPU Usage (%) | 66 | 66 | -1.34% |
| Max Working Set (MB) | 503 | 511 | +1.44% |
| Max GC Heap Size (MB) | 241 | 277 | +14.72% |
| Size of committed memory by the GC (MB) | 450 | 456 | +1.39% |
| Max Number of Gen 0 GCs / sec | 7.00 | 8.00 | +14.29% |
| Max Number of Gen 1 GCs / sec | 2.00 | 2.00 | 0.00% |
| Max Number of Gen 2 GCs / sec | 1.00 | 1.00 | 0.00% |
| Max Time in GC (%) | 1.00 | 1.00 | 0.00% |
| Max Gen 0 Size (B) | 34,923,992 | 40,624,112 | +16.32% |
| Max Gen 1 Size (B) | 7,417,808 | 6,384,088 | -13.94% |
| Max Gen 2 Size (B) | 1,510,984 | 1,458,928 | -3.45% |
| Max LOH Size (B) | 113,184 | 113,184 | 0.00% |
| Max POH Size (B) | 1,777,656 | 830,056 | -53.31% |
| Max Allocation Rate (B/sec) | 1,705,638,936 | 1,837,851,808 | +7.75% |
| Max GC Heap Fragmentation | 90 | 93 | +4.07% |
| # of Assemblies Loaded | 88 | 88 | 0.00% |
| Max Exceptions (#/s) | 0 | 0 | |
| Max Lock Contention (#/s) | 1,260 | 1,294 | +2.70% |
| Max ThreadPool Threads Count | 48 | 48 | 0.00% |
| Max ThreadPool Queue Length | 166 | 136 | -18.07% |
| Max ThreadPool Items (#/s) | 1,335,672 | 1,195,343 | -10.51% |
| Max Active Timers | 0 | 0 | |
| IL Jitted (B) | 117,992 | 159,413 | +35.10% |
| Methods Jitted | 1,318 | 2,100 | +59.33% |
| load | before | Current | |
| --- | --- | --- | --- |
| CPU Usage (%) | 40 | 41 | +2.50% |
| Cores usage (%) | 1,113 | 1,149 | +3.23% |
| Working Set (MB) | 49 | 49 | 0.00% |
| Private Memory (MB) | 376 | 376 | 0.00% |
| Start Time (ms) | 0 | 0 | |
| First Request (ms) | 218 | 240 | +10.09% |
| Requests/sec | 405,756 | 427,578 | +5.38% |
| Requests | 6,126,807 | 6,456,235 | +5.38% |
| Mean latency (ms) | 0.70 | 0.63 | -9.89% |
| Max latency (ms) | 23.48 | 21.29 | -9.33% |
| Bad responses | 0 | 0 | |
| Socket errors | 0 | 0 | |
| Read throughput (MB/s) | 64.24 | 67.69 | +5.37% |
| Latency 50th (ms) | 0.57 | 0.55 | -2.64% |
| Latency 75th (ms) | 0.76 | 0.74 | -3.01% |
| Latency 90th (ms) | 1.02 | 0.97 | -4.90% |
| Latency 99th (ms) | 2.64 | 1.88 | -28.79% |

@ghost ghost added the area-runtime label Oct 7, 2022
@davidfowl davidfowl marked this pull request as ready for review October 8, 2022 02:10
@davidfowl davidfowl added the Perf label Oct 8, 2022
_asyncAcceptContext = asyncAcceptContext;
_messagePump = messagePump;
_workerIndex = workerIndex;
_preferInlineScheduling = _messagePump._options.UnsafePreferInlineScheduling;
Member:
nit: Is there a reason for a field for this flag? Everything else is pulled off _messagePump

Member Author:
Avoid multiple layers of indirection per request.

Member:
Make it a local in ExecuteAsync above the while?

Member Author:
That while loop lasts for a single iteration when things are being dispatched (the default scenario).
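The indirection trade-off being discussed is easy to see side by side. This is an illustrative comparison only, built from the field assignment quoted in the diff context above:

```csharp
// Reading the option through the message pump on every request costs two
// dereferences (_messagePump, then _options)...
bool viaIndirection = _messagePump._options.UnsafePreferInlineScheduling;

// ...while caching it in a field once in the constructor makes each later
// read a single field load:
bool viaField = _preferInlineScheduling;
```

And since the loop runs for only a single iteration in the default dispatched scenario, hoisting the flag into a local above the `while` would not amortize the reads the way it would in a long-lived loop.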

_asyncAcceptContext = asyncAcceptContext;
_messagePump = messagePump;
_workerIndex = workerIndex;
_preferInlineScheduling = _messagePump._options.UnsafePreferInlineScheduling;
Member:
Make it a local in ExecuteAsync above the while?

@davidfowl davidfowl merged commit 7d17cb5 into main Oct 11, 2022
@davidfowl davidfowl deleted the davidfowl/avoid-double-dispatch branch October 11, 2022 04:54
@ghost ghost added this to the 8.0-preview1 milestone Oct 11, 2022
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 6, 2023
