perf: use GSO #2593

Merged
larseggert merged 17 commits into mozilla:main from mxinden:gso-v3
Jul 1, 2025

Conversation

@mxinden

@mxinden mxinden commented Apr 18, 2025

Member

Use Generic Segmentation Offload (GSO) on Linux and UDP Segmentation Offload (USO) on Windows.

GSO and USO allow us to batch multiple datagrams into one large payload (up to 64 KB) and pass it to the kernel in a single system call. The kernel then either segments the payload itself or has the NIC segment it before it is sent out on the network.
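
As a rough illustration of the batching described above (not neqo's or the kernel's actual code), the following sketch models a batch as one contiguous payload plus a segment size. `GsoBatch`, its methods, and `MAX_BATCH_BYTES` are hypothetical names; the constant is an assumption standing in for the "up to 64 KB" limit.

```rust
/// Hypothetical model of a GSO/USO batch: one contiguous payload that is
/// cut into datagrams of `segment_size` bytes (the last one may be shorter).
struct GsoBatch {
    payload: Vec<u8>,
    segment_size: usize,
}

/// Assumed upper bound for one batch, standing in for the "up to 64 KB" limit.
const MAX_BATCH_BYTES: usize = 64 * 1024;

impl GsoBatch {
    fn new(segment_size: usize) -> Self {
        assert!(segment_size > 0 && segment_size <= MAX_BATCH_BYTES);
        Self { payload: Vec::new(), segment_size }
    }

    /// Appends one datagram. Returns `false` if it does not fit: every
    /// datagram except the last must be exactly `segment_size` bytes long.
    fn push(&mut self, datagram: &[u8]) -> bool {
        let last_was_short = self.payload.len() % self.segment_size != 0;
        if last_was_short
            || datagram.is_empty()
            || datagram.len() > self.segment_size
            || self.payload.len() + datagram.len() > MAX_BATCH_BYTES
        {
            return false;
        }
        self.payload.extend_from_slice(datagram);
        true
    }

    /// Conceptually what the kernel or NIC does with the single large payload.
    fn segments(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.payload.chunks(self.segment_size)
    }
}

fn main() {
    let mut batch = GsoBatch::new(1200);
    assert!(batch.push(&[1; 1200]));
    assert!(batch.push(&[2; 1200]));
    assert!(batch.push(&[3; 500])); // short final datagram
    assert!(!batch.push(&[4; 1200])); // nothing may follow a short datagram
    let sizes: Vec<usize> = batch.segments().map(|s| s.len()).collect();
    println!("{sizes:?}"); // prints [1200, 1200, 500]
}
```

The whole `payload` would be handed over in one send call, with the segment size passed alongside it so that the receiver still sees individual datagrams.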

Early measurements show up to a 2x throughput improvement on an artificial, CPU-bound localhost transfer benchmark.


Attempt 1: f25b0b7
Attempt 2: #2532

Compared to attempt 2:

  • implements the datagram batching in neqo-transport instead of neqo-bin
  • does not copy each datagram into the larger GSO buffer, but instead writes each datagram directly into the GSO buffer.
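
To make the second point concrete, here is a minimal sketch contrasting the two approaches. It is not the code from either attempt: `encode_packet`, `batch_with_copy`, and `batch_in_place` are hypothetical names, and the "encoder" only writes filler bytes where a real stack would build and encrypt a QUIC packet.

```rust
/// Hypothetical packet encoder: appends one datagram's bytes to `out`.
/// A real QUIC stack would build and encrypt a packet here.
fn encode_packet(out: &mut Vec<u8>, packet_number: u8, len: usize) {
    out.extend(std::iter::repeat(packet_number).take(len));
}

/// Copying approach: each datagram gets its own allocation and is then
/// copied into the batch buffer.
fn batch_with_copy(count: u8, len: usize) -> Vec<u8> {
    let mut batch = Vec::new();
    for pn in 0..count {
        let mut datagram = Vec::new(); // per-datagram allocation
        encode_packet(&mut datagram, pn, len);
        batch.extend_from_slice(&datagram); // extra memory copy
    }
    batch
}

/// In-place approach: the encoder writes straight into the batch buffer.
fn batch_in_place(count: u8, len: usize) -> Vec<u8> {
    let mut batch = Vec::with_capacity(usize::from(count) * len);
    for pn in 0..count {
        encode_packet(&mut batch, pn, len);
    }
    batch
}

fn main() {
    let copied = batch_with_copy(3, 1200);
    let in_place = batch_in_place(3, 1200);
    // Both approaches produce the same bytes; only the work done differs.
    assert_eq!(copied, in_place);
    assert_eq!(in_place.len(), 3600);
}
```

Both functions yield identical output, so the difference is purely in allocations and copies per datagram.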

Once this is merged, we can switch to a long-lived send buffer (see the discussion in #2670). #2677 and this pull request lay the groundwork for it.

@github-actions

github-actions Bot commented Apr 18, 2025

Contributor

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 66be2e6.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

@github-actions

github-actions Bot commented Apr 18, 2025

Contributor

Benchmark results

Performance differences relative to 95f9bed.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved.
       time:   [202.13 ms 202.48 ms 202.85 ms]
       thrpt:  [492.97 MiB/s 493.86 MiB/s 494.74 MiB/s]
change:
       time:   [−69.170% −69.106% −69.043%] (p = 0.00 < 0.05)
       thrpt:  [+223.02% +223.69% +224.36%]

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
       time:   [304.31 ms 305.83 ms 307.35 ms]
       thrpt:  [32.536 Kelem/s 32.698 Kelem/s 32.862 Kelem/s]
change:
       time:   [+0.6959% +1.3800% +2.0501%] (p = 0.00 < 0.05)
       thrpt:  [−2.0089% −1.3612% −0.6911%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed.
       time:   [27.525 ms 27.597 ms 27.673 ms]
       thrpt:  [36.136  elem/s 36.236  elem/s 36.331  elem/s]
change:
       time:   [+1.1068% +1.8128% +2.4725%] (p = 0.00 < 0.05)
       thrpt:  [−2.4128% −1.7805% −1.0947%]

Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.
       time:   [648.95 ms 654.10 ms 659.22 ms]
       thrpt:  [151.69 MiB/s 152.88 MiB/s 154.10 MiB/s]
change:
       time:   [−28.772% −27.945% −27.096%] (p = 0.00 < 0.05)
       thrpt:  [+37.166% +38.782% +40.394%]

Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low severe
4 (4.00%) low mild
2 (2.00%) high severe

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.792 µs 11.818 µs 11.851 µs]
       change: [−0.7711% −0.1718% +0.3634%] (p = 0.57 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
7 (7.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.0229 ms 3.0323 ms 3.0435 ms]
       change: [−0.2404% +0.1966% +0.6361%] (p = 0.39 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.968 µs 20.023 µs 20.082 µs]
       change: [−0.7959% −0.1788% +0.3899%] (p = 0.57 > 0.05)

Found 21 outliers among 100 measurements (21.00%)
1 (1.00%) low severe
4 (4.00%) low mild
16 (16.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.0371 ms 5.0487 ms 5.0618 ms]
       change: [−0.5165% −0.1114% +0.2906%] (p = 0.59 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
14 (14.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [8.2722 µs 8.3105 µs 8.3530 µs]
       change: [−0.2019% +0.2442% +0.7830%] (p = 0.32 > 0.05)

Found 19 outliers among 100 measurements (19.00%)
6 (6.00%) low mild
2 (2.00%) high mild
11 (11.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.5850 ms 1.5902 ms 1.5962 ms]
       change: [−0.6684% −0.1223% +0.4020%] (p = 0.66 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe

1000 streams of 1 bytes/multistream: No change in performance detected.
       time:   [33.203 ns 39.555 ns 51.878 ns]
       change: [+10.728% +32.674% +75.025%] (p = 0.06 > 0.05)

Found 3 outliers among 500 measurements (0.60%)
1 (0.20%) high mild
2 (0.40%) high severe

1000 streams of 1000 bytes/multistream: 💔 Performance has regressed.
       time:   [34.055 ns 34.490 ns 34.929 ns]
       change: [+12.649% +14.534% +16.408%] (p = 0.00 < 0.05)

Found 1 outliers among 500 measurements (0.20%)
1 (0.20%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [88.115 ns 88.448 ns 88.789 ns]
       change: [−0.4490% +0.5174% +1.7914%] (p = 0.49 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
7 (7.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [105.48 ns 105.73 ns 105.99 ns]
       change: [−0.8906% −0.3531% +0.1143%] (p = 0.18 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [105.05 ns 105.38 ns 105.80 ns]
       change: [−0.2958% +0.2637% +0.8589%] (p = 0.38 > 0.05)

Found 21 outliers among 100 measurements (21.00%)
4 (4.00%) low severe
6 (6.00%) low mild
3 (3.00%) high mild
8 (8.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [88.820 ns 88.971 ns 89.126 ns]
       change: [−0.7026% +0.2281% +1.1404%] (p = 0.65 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe

RxStreamOrderer::inbound_frame(): No change in performance detected.
       time:   [107.81 ms 107.97 ms 108.23 ms]
       change: [−0.4699% −0.1075% +0.2218%] (p = 0.59 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe

sent::Packets::take_ranges: No change in performance detected.
       time:   [8.0612 µs 8.2610 µs 8.4441 µs]
       change: [−0.7407% +5.8984% +17.161%] (p = 0.24 > 0.05)

Found 20 outliers among 100 measurements (20.00%)
4 (4.00%) low severe
11 (11.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds: 💔 Performance has regressed.
       time:   [37.072 ms 37.169 ms 37.279 ms]
       change: [+4.5101% +4.8981% +5.2577%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-true/varying-seeds: 💔 Performance has regressed.
       time:   [37.692 ms 37.808 ms 37.931 ms]
       change: [+5.0829% +5.4940% +5.9561%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/same-seed: 💔 Performance has regressed.
       time:   [36.999 ms 37.067 ms 37.140 ms]
       change: [+4.6033% +4.8770% +5.1647%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-true/same-seed: 💔 Performance has regressed.
       time:   [38.372 ms 38.472 ms 38.576 ms]
       change: [+4.1365% +4.4851% +4.8031%] (p = 0.00 < 0.05)

Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

Client/server transfer results

Performance differences relative to 95f9bed.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

| Client vs. server (params) | Mean ± σ | Min | Max | MiB/s ± σ | Δ main (ms) | Δ main (%) |
| --- | --- | --- | --- | --- | --- | --- |
| google vs. google | 451.8 ± 4.7 | 444.9 | 461.2 | 70.8 ± 6.8 | | |
| google vs. neqo (cubic, paced) | 268.6 ± 4.5 | 261.4 | 283.8 | 119.2 ± 7.1 | 💚 -49.9 | -15.7% |
| msquic vs. msquic | 133.0 ± 34.2 | 100.8 | 374.4 | 240.6 ± 0.9 | | |
| msquic vs. neqo (cubic, paced) | 145.8 ± 16.7 | 121.6 | 225.1 | 219.5 ± 1.9 | 💚 -125.9 | -46.3% |
| neqo vs. google (cubic, paced) | 751.4 ± 4.5 | 743.5 | 769.3 | 42.6 ± 7.1 | -0.5 | -0.1% |
| neqo vs. msquic (cubic, paced) | 155.6 ± 5.0 | 147.3 | 176.0 | 205.6 ± 6.4 | -0.6 | -0.4% |
| neqo vs. neqo (cubic) | 90.0 ± 4.7 | 78.9 | 105.0 | 355.7 ± 6.8 | 💚 -121.0 | -57.4% |
| neqo vs. neqo (cubic, paced) | 90.2 ± 4.0 | 82.7 | 99.1 | 354.7 ± 8.0 | 💚 -121.0 | -57.3% |
| neqo vs. neqo (reno) | 90.8 ± 5.2 | 80.3 | 108.5 | 352.5 ± 6.2 | 💚 -118.3 | -56.6% |
| neqo vs. neqo (reno, paced) | 93.2 ± 5.3 | 82.0 | 113.0 | 343.2 ± 6.0 | 💚 -116.8 | -55.6% |
| neqo vs. quiche (cubic, paced) | 191.7 ± 4.2 | 185.4 | 202.1 | 167.0 ± 7.6 | 💔 2.3 | 1.2% |
| neqo vs. s2n (cubic, paced) | 217.8 ± 4.6 | 210.3 | 225.9 | 146.9 ± 7.0 | 1.1 | 0.5% |
| quiche vs. neqo (cubic, paced) | 157.6 ± 5.8 | 146.1 | 183.5 | 203.1 ± 5.5 | 💚 -590.4 | -78.9% |
| quiche vs. quiche | 147.0 ± 4.9 | 137.7 | 164.8 | 217.6 ± 6.5 | | |
| s2n vs. neqo (cubic, paced) | 172.1 ± 5.0 | 161.3 | 183.3 | 186.0 ± 6.4 | 💚 -126.3 | -42.3% |
| s2n vs. s2n | 248.2 ± 27.7 | 230.3 | 345.1 | 128.9 ± 1.2 | | |

Download data for profiler.firefox.com or download performance comparison data.

@mxinden

mxinden commented Apr 18, 2025

Member Author

Optimized Upload only thus far.

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.

   time:   [1.2891 s 1.2983 s 1.3077 s]
   thrpt:  [76.469 MiB/s 77.023 MiB/s 77.571 MiB/s]

change:
time: [-32.828% -31.716% -30.597%] (p = 0.00 < 0.05)
thrpt: [+44.086% +46.447% +48.872%]

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

🎉 matches #2532 (comment).

@mxinden mxinden force-pushed the gso-v3 branch 8 times, most recently from f9ff613 to a21983d April 21, 2025 16:01
@mxinden

mxinden commented Apr 21, 2025

Member Author

Introduced the same optimizations to neqo-server. In addition, I removed the memory copy: each datagram of a GSO train is now written into a single contiguous Vec right away. The result looks promising.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved.

   time:   [245.19 ms 245.65 ms 246.12 ms]
   thrpt:  [406.30 MiB/s 407.09 MiB/s 407.85 MiB/s]

change:
time: [-66.225% -66.008% -65.788%] (p = 0.00 < 0.05)
thrpt: [+192.30% +194.19% +196.08%]

@larseggert

Collaborator

Why do we see a massive benefit in the client/server tests, but not in the transfer benches?

@mxinden

mxinden commented Jun 13, 2025

Member Author

@larseggert the neqo-transport/bench/transfer.rs benchmarks use the test-fixtures/src/sim Simulator. The Simulator only processes a single datagram at a time.

let mut dgram = None;

Let me see whether I can change that as part of this pull request. After all, our benchmarks and tests should mirror how we run Neqo in Firefox as closely as possible.

@larseggert

Collaborator

@mxinden tests::send_ignore_emsgsize still failing on Windows.

@github-actions

Contributor

🐰 Bencher Report

Branch: gso-v3
Testbed: t-linux64-ms-279

| Benchmark | Latency (ns) |
| --- | --- |
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 646,670,000.00 |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 201,340,000.00 |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 27,380,000.00 |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 307,020,000.00 |
| 1000 streams of 1 bytes/multistream | 34.99 |
| 1000 streams of 1000 bytes/multistream | 35.03 |
| RxStreamOrderer::inbound_frame() | 110,960,000.00 |
| coalesce_acked_from_zero 1+1 entries | 88.31 |
| coalesce_acked_from_zero 10+1 entries | 105.52 |
| coalesce_acked_from_zero 1000+1 entries | 90.91 |
| coalesce_acked_from_zero 3+1 entries | 105.85 |
| decode 1048576 bytes, mask 3f | 1,590,700.00 |
| decode 1048576 bytes, mask 7f | 5,047,400.00 |
| decode 1048576 bytes, mask ff | 3,031,800.00 |
| decode 4096 bytes, mask 3f | 8,308.50 |
| decode 4096 bytes, mask 7f | 20,011.00 |
| decode 4096 bytes, mask ff | 11,832.00 |
| sent::Packets::take_ranges | 5,182.40 |
| transfer/pacing-false/same-seed | 36,846,000.00 |
| transfer/pacing-false/varying-seeds | 37,089,000.00 |
| transfer/pacing-true/same-seed | 38,620,000.00 |
| transfer/pacing-true/varying-seeds | 38,194,000.00 |

🐰 View full continuous benchmarking report in Bencher

@github-actions

github-actions Bot commented Jun 27, 2025

Contributor

🐰 Bencher Report

Branch: gso-v3
Testbed: t-linux64-ms-279

| Benchmark | Latency (ms) |
| --- | --- |
| s2n vs. neqo (cubic, paced) | 210.06 |

🐰 View full continuous benchmarking report in Bencher

@github-actions

github-actions Bot commented Jun 30, 2025

Contributor

🐰 Bencher Report

Branch: gso-v3
Testbed: t-linux64-ms-278

| Benchmark | Latency (ns) |
| --- | --- |
| 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client | 654,100,000.00 |
| 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client | 202,480,000.00 |
| 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client | 27,597,000.00 |
| 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client | 305,830,000.00 |
| 1000 streams of 1 bytes/multistream | 39.55 |
| 1000 streams of 1000 bytes/multistream | 34.49 |
| RxStreamOrderer::inbound_frame() | 107,970,000.00 |
| coalesce_acked_from_zero 1+1 entries | 88.45 |
| coalesce_acked_from_zero 10+1 entries | 105.38 |
| coalesce_acked_from_zero 1000+1 entries | 88.97 |
| coalesce_acked_from_zero 3+1 entries | 105.73 |
| decode 1048576 bytes, mask 3f | 1,590,200.00 |
| decode 1048576 bytes, mask 7f | 5,048,700.00 |
| decode 1048576 bytes, mask ff | 3,032,300.00 |
| decode 4096 bytes, mask 3f | 8,310.50 |
| decode 4096 bytes, mask 7f | 20,023.00 |
| decode 4096 bytes, mask ff | 11,818.00 |
| sent::Packets::take_ranges | 8,261.00 |
| transfer/pacing-false/same-seed | 37,067,000.00 |
| transfer/pacing-false/varying-seeds | 37,169,000.00 |
| transfer/pacing-true/same-seed | 38,472,000.00 |
| transfer/pacing-true/varying-seeds | 37,808,000.00 |

🐰 View full continuous benchmarking report in Bencher

@github-actions

github-actions Bot commented Jun 30, 2025

Contributor

🐰 Bencher Report

Branch: gso-v3
Testbed: t-linux64-ms-278

| Benchmark | Latency (ms) |
| --- | --- |
| s2n vs. neqo (cubic, paced) | 172.07 |

🐰 View full continuous benchmarking report in Bencher

@larseggert

Collaborator

@mxinden is this ready to merge?

@mxinden

mxinden commented Jun 30, 2025

Member Author

Yes, ready to merge from my end. We have a couple of benchmark regressions. Explainer for each:

1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.

   time:   [650.03 ms 655.28 ms 660.94 ms]
   thrpt:  [151.30 MiB/s 152.61 MiB/s 153.84 MiB/s]

change:
time: [−27.566% −26.708% −25.736%] (p = 0.00 < 0.05)
thrpt: [+34.655% +36.441% +38.056%]

This will improve even further with #2734.

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed.

   time:   [27.404 ms 27.500 ms 27.616 ms]
   thrpt:  [36.211  elem/s 36.363  elem/s 36.491  elem/s]

change:
time: [+1.5705% +2.1502% +2.7519%] (p = 0.00 < 0.05)
thrpt: [−2.6782% −2.1049% −1.5463%]

This is expected. We pay a slight cost in latency when sending in batches.

1000 streams of 1000 bytes/multistream: 💔 Performance has regressed.

   time:   [36.454 ns 36.834 ns 37.215 ns]
   change: [+25.596% +27.533% +29.527%] (p = 0.00 < 0.05)

This should be due to neqo-http3/benches/streams.rs not using the batched IO paths. Instead of altering the IO handling in the benchmark, I suggest we do #2728. Given that the benchmark measures stream performance and not UDP IO performance, I suggest doing this in a follow-up.

transfer/pacing-false/varying-seeds: 💔 Performance has regressed.

   time:   [36.886 ms 36.956 ms 37.027 ms]
   change: [+4.0332% +4.3753% +4.6740%] (p = 0.00 < 0.05)

Again, a slight regression, as the Simulator is not using the batched IO paths. The non-batched IO path (i.e. process) no longer pre-allocates, as we don't know the datagram size ahead of time. Once #2747 is merged, this overhead should be reduced, as we would write datagrams into a long-lived buffer.
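
For readers unfamiliar with the long-lived buffer idea, a minimal sketch follows. It is not the design from #2747; `Sender`, `next_datagram`, and the `fill` callback are hypothetical names used only to show why reusing one buffer removes the per-datagram allocation even when the datagram size is not known ahead of time.

```rust
/// Hypothetical sender that owns one long-lived buffer.
struct Sender {
    buf: Vec<u8>,
}

impl Sender {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Writes one datagram of not-yet-known size into the reused buffer and
    /// returns it as a slice. `fill` stands in for packet encoding.
    fn next_datagram(&mut self, fill: impl FnOnce(&mut Vec<u8>)) -> &[u8] {
        self.buf.clear(); // drops the contents, keeps the allocation
        fill(&mut self.buf);
        &self.buf
    }
}

fn main() {
    let mut sender = Sender::new();
    let first_len = sender.next_datagram(|b| b.extend_from_slice(&[0; 1200])).len();
    let capacity = sender.buf.capacity();
    let second_len = sender.next_datagram(|b| b.extend_from_slice(&[1; 800])).len();
    assert_eq!((first_len, second_len), (1200, 800));
    // The second, smaller datagram reused the existing allocation.
    assert_eq!(sender.buf.capacity(), capacity);
}
```

Only the first datagram pays for an allocation; later ones grow the buffer only if they are larger than anything written before.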

@larseggert let me know whether you are fine proceeding here, or would prefer any of the above to be addressed first.

@larseggert

larseggert commented Jun 30, 2025

Collaborator

I'll merge now; please do issues for the missing bits?

Great we can land this!

@larseggert larseggert enabled auto-merge June 30, 2025 16:09
@larseggert larseggert added this pull request to the merge queue Jun 30, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 30, 2025
@larseggert larseggert added this pull request to the merge queue Jun 30, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Jun 30, 2025
@larseggert larseggert added this pull request to the merge queue Jul 1, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 1, 2025
@larseggert larseggert added this pull request to the merge queue Jul 1, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 1, 2025
@larseggert larseggert added this pull request to the merge queue Jul 1, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jul 1, 2025
@larseggert

Collaborator

This keeps getting kicked out of the merge queue while tests are still running and haven't failed yet. I think GitHub may have issues. Doing a force merge.

@larseggert larseggert merged commit a341259 into mozilla:main Jul 1, 2025
40 of 41 checks passed
@mxinden

mxinden commented Jul 6, 2025

Member Author

please do issues for the missing bits?

I assume you are fine with the following pull requests tracking the progress. Let me know if you want additional GitHub issues.

@mxinden mxinden mentioned this pull request Jul 14, 2025
@mxinden

mxinden commented Jul 22, 2025

Member Author

Early numbers on GSO in Firefox Nightly:

  • ~5% of sends on Linux and Windows use GSO with 2 or more segments
  • ~5% of sends on Linux and Windows send 2.4 k bytes or more
  • We currently limit the number of segments to 10, which is reflected in the metrics (apart from some crazy machine on Linux doing > 100)

Good signals. We should explore increasing the maximum number of segments (currently 10). Maybe just limit it by what our pacer allows us to send.
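
One way to picture the pacer-based limit is sketched below. This is only an illustration under assumptions, not neqo's pacer API: `max_segments`, `pacer_budget`, and `hard_cap` are hypothetical names, and the byte values in `main` are made up.

```rust
/// Hypothetical: how many equal-sized segments to put into the next batch.
/// `pacer_budget` is the number of bytes the pacer allows to be sent now;
/// `hard_cap` is an optional fixed limit, such as the current value of 10.
fn max_segments(pacer_budget: usize, segment_size: usize, hard_cap: Option<usize>) -> usize {
    if segment_size == 0 {
        return 0;
    }
    let by_pacer = pacer_budget / segment_size;
    hard_cap.map_or(by_pacer, |cap| by_pacer.min(cap))
}

fn main() {
    // With the fixed cap, a large pacer budget is cut off at 10 segments.
    assert_eq!(max_segments(30_000, 1200, Some(10)), 10);
    // Without the cap, the pacer budget alone decides.
    assert_eq!(max_segments(30_000, 1200, None), 25);
    // A small budget limits the batch regardless of the cap.
    assert_eq!(max_segments(3_000, 1200, Some(10)), 2);
}
```

A real implementation would additionally have to respect the maximum batch size that GSO and USO accept.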

Datagram (batch) size

Windows

image

https://glam.telemetry.mozilla.org/fog/probe/networking_http_3_udp_datagram_size_sent/explore?os=Windows&visiblePercentiles=%5B99%2C95%2C75%2C50%2C25%2C5%5D

Linux

image

https://glam.telemetry.mozilla.org/fog/probe/networking_http_3_udp_datagram_size_sent/explore?os=Linux&visiblePercentiles=%5B99.9%2C99%2C95%2C75%2C50%2C25%2C5%5D

Number of segments in a batch

Windows

image

https://glam.telemetry.mozilla.org/fog/probe/networking_http_3_udp_datagram_segments_sent/explore?os=Windows&visiblePercentiles=%5B99.9%2C99%2C95%2C75%2C50%2C25%2C5%5D

Linux

image

https://glam.telemetry.mozilla.org/fog/probe/networking_http_3_udp_datagram_segments_sent/explore?os=Linux&visiblePercentiles=%5B99.9%2C99%2C95%2C75%2C50%2C25%2C5%5D

@larseggert

Collaborator

Yes, let's increase.
