
Conversation

@davidfowl davidfowl commented Oct 7, 2022

  • This is a similar change to the one made in Kestrel's transport layer. We avoid double-dispatching from one thread pool thread to another when we're not running inline. Since the Windows thread pool dispatches by default, we can use that same thread to start executing the request. As a result, we need to dispatch the next accept call before executing the request.
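The shape of the change described above can be sketched roughly as follows. This is an illustrative sketch only: `AcceptNextRequestAsync`, `ProcessAsync`, and the overall method are hypothetical stand-ins, not the PR's actual code; only the `_preferInlineScheduling` flag appears in the diff context quoted later in this thread.

```csharp
// Illustrative sketch of avoiding the double dispatch (hypothetical names).
private async Task ExecuteAsync()
{
    while (true)
    {
        RequestContext context = await AcceptNextRequestAsync();

        if (!_preferInlineScheduling)
        {
            // The Windows thread pool has already dispatched this continuation
            // onto a thread pool thread, so queue the next accept first...
            ThreadPool.UnsafeQueueUserWorkItem(
                static state => { _ = state.ExecuteAsync(); }, this, preferLocal: false);

            // ...then run the request inline on the current thread instead of
            // dispatching it to a second thread pool thread.
            await context.ProcessAsync();
            return; // the loop continues in the worker queued above
        }

        // Unsafe inline scheduling: keep accepting and processing on this thread.
        await context.ProcessAsync();
    }
}
```

Note how in the default (dispatching) path the `while` loop body runs exactly once per worker, with each request handing the accept loop off to a fresh work item.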

This is the JsonHttpSys benchmark. Here is the crank command.

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --scenario https --profile intel-win-app --profile intel-load2-load --variable server=HttpSys --application.options.requiredOperatingSystem windows --application.framework net7.0 --application.collectDependencies true --application.options.collectCounters true --application.sdkVersion 8.0.100-alpha.1.22504.22

Results

| application | before | Current | |
| --- | --- | --- | --- |
| CPU Usage (%) | 66 | 67 | +1.52% |
| Cores usage (%) | 1,834 | 1,877 | +2.34% |
| Working Set (MB) | 481 | 488 | +1.46% |
| Private Memory (MB) | 525 | 533 | +1.52% |
| Build Time (ms) | 1,920 | 1,859 | -3.18% |
| Start Time (ms) | 327 | 392 | +19.88% |
| Published Size (KB) | 118,666 | 118,669 | +0.00% |
| .NET Core SDK Version | 8.0.100-alpha.1.22504.22 | 8.0.100-alpha.1.22504.22 | |
| Max CPU Usage (%) | 66 | 66 | -1.34% |
| Max Working Set (MB) | 503 | 511 | +1.44% |
| Max GC Heap Size (MB) | 241 | 277 | +14.72% |
| Size of committed memory by the GC (MB) | 450 | 456 | +1.39% |
| Max Number of Gen 0 GCs / sec | 7.00 | 8.00 | +14.29% |
| Max Number of Gen 1 GCs / sec | 2.00 | 2.00 | 0.00% |
| Max Number of Gen 2 GCs / sec | 1.00 | 1.00 | 0.00% |
| Max Time in GC (%) | 1.00 | 1.00 | 0.00% |
| Max Gen 0 Size (B) | 34,923,992 | 40,624,112 | +16.32% |
| Max Gen 1 Size (B) | 7,417,808 | 6,384,088 | -13.94% |
| Max Gen 2 Size (B) | 1,510,984 | 1,458,928 | -3.45% |
| Max LOH Size (B) | 113,184 | 113,184 | 0.00% |
| Max POH Size (B) | 1,777,656 | 830,056 | -53.31% |
| Max Allocation Rate (B/sec) | 1,705,638,936 | 1,837,851,808 | +7.75% |
| Max GC Heap Fragmentation | 90 | 93 | +4.07% |
| # of Assemblies Loaded | 88 | 88 | 0.00% |
| Max Exceptions (#/s) | 0 | 0 | |
| Max Lock Contention (#/s) | 1,260 | 1,294 | +2.70% |
| Max ThreadPool Threads Count | 48 | 48 | 0.00% |
| Max ThreadPool Queue Length | 166 | 136 | -18.07% |
| Max ThreadPool Items (#/s) | 1,335,672 | 1,195,343 | -10.51% |
| Max Active Timers | 0 | 0 | |
| IL Jitted (B) | 117,992 | 159,413 | +35.10% |
| Methods Jitted | 1,318 | 2,100 | +59.33% |
| load | before | Current | |
| --- | --- | --- | --- |
| CPU Usage (%) | 40 | 41 | +2.50% |
| Cores usage (%) | 1,113 | 1,149 | +3.23% |
| Working Set (MB) | 49 | 49 | 0.00% |
| Private Memory (MB) | 376 | 376 | 0.00% |
| Start Time (ms) | 0 | 0 | |
| First Request (ms) | 218 | 240 | +10.09% |
| Requests/sec | 405,756 | 427,578 | +5.38% |
| Requests | 6,126,807 | 6,456,235 | +5.38% |
| Mean latency (ms) | 0.70 | 0.63 | -9.89% |
| Max latency (ms) | 23.48 | 21.29 | -9.33% |
| Bad responses | 0 | 0 | |
| Socket errors | 0 | 0 | |
| Read throughput (MB/s) | 64.24 | 67.69 | +5.37% |
| Latency 50th (ms) | 0.57 | 0.55 | -2.64% |
| Latency 75th (ms) | 0.76 | 0.74 | -3.01% |
| Latency 90th (ms) | 1.02 | 0.97 | -4.90% |
| Latency 99th (ms) | 2.64 | 1.88 | -28.79% |

@ghost ghost added the area-runtime label Oct 7, 2022
@davidfowl davidfowl marked this pull request as ready for review October 8, 2022 02:10
@davidfowl davidfowl added the Perf label Oct 8, 2022
_asyncAcceptContext = asyncAcceptContext;
_messagePump = messagePump;
_workerIndex = workerIndex;
_preferInlineScheduling = _messagePump._options.UnsafePreferInlineScheduling;
Member:
nit: Is there a reason for a field for this flag? Everything else is pulled off _messagePump

Member Author:
Avoid multiple layers of indirection per request.

Member:
Make it a local in ExecuteAsync above the while?

Member Author:
That while loop lasts for a single iteration when things are being dispatched (the default scenario).
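The indirection trade-off being discussed is easy to see side by side. This is an illustrative comparison only, built from the field assignment quoted in the diff context above:

```csharp
// Reading the option through the message pump on every request costs two
// dereferences (_messagePump, then _options)...
bool viaIndirection = _messagePump._options.UnsafePreferInlineScheduling;

// ...while caching it in a field once in the constructor makes each later
// read a single field load:
bool viaField = _preferInlineScheduling;
```

And since the loop runs for only a single iteration in the default dispatched scenario, hoisting the flag into a local above the `while` would not amortize the reads the way it would in a long-lived loop.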

_asyncAcceptContext = asyncAcceptContext;
_messagePump = messagePump;
_workerIndex = workerIndex;
_preferInlineScheduling = _messagePump._options.UnsafePreferInlineScheduling;
Member:
Make it a local in ExecuteAsync above the while?

@davidfowl davidfowl merged commit 7d17cb5 into main Oct 11, 2022
@davidfowl davidfowl deleted the davidfowl/avoid-double-dispatch branch October 11, 2022 04:54
@ghost ghost added this to the 8.0-preview1 milestone Oct 11, 2022
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 6, 2023
