opt ll dispatch layered algo #500

Merged
wangfakang merged 2 commits into deepseek-ai:antgroup-opt from alpha-baby:fujh/pr/ll_dispatch_opt
Dec 4, 2025

Contributor

@alpha-baby alpha-baby commented Nov 21, 2025

Introduction

Algorithm optimization for dispatch in low-latency mode:

In the dispatch kernel of DeepEP's low-latency mode, the original algorithm sends data directly to each destination rank over the cross-rail RDMA network. The drawback is that the same data is transmitted over the RDMA network multiple times when several destination ranks sit on the same remote node. Drawing on the approach used in normal mode, the dispatch kernel in low-latency mode now first sends the data over RDMA to the same-rail rank on the destination node, and that rank then forwards it to the actual destination rank over the NVLink interconnect.
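The difference between the two schemes can be illustrated with a small host-side model. This is only a sketch and not the kernel code from this PR: the function names, the 8-GPUs-per-node layout, and the rule that the relay is the rank with the same local index as the sender are assumptions made for illustration, and only cross-node traffic is counted for the direct scheme.

```python
from collections import defaultdict

GPUS_PER_NODE = 8  # assumption: 8 GPUs per node


def node_of(rank, gpus_per_node=GPUS_PER_NODE):
    """Node index that owns a global rank."""
    return rank // gpus_per_node


def direct_rdma_sends(src_rank, dst_ranks, gpus_per_node=GPUS_PER_NODE):
    """Direct scheme: one RDMA send per destination rank on another node."""
    src_node = node_of(src_rank, gpus_per_node)
    return [(src_rank, d) for d in sorted(set(dst_ranks))
            if node_of(d, gpus_per_node) != src_node]


def layered_sends(src_rank, dst_ranks, gpus_per_node=GPUS_PER_NODE):
    """Layered scheme: one RDMA send per remote node, then NVLink forwarding.

    Returns (rdma_sends, nvlink_forwards) as lists of (from_rank, to_rank).
    """
    src_node = node_of(src_rank, gpus_per_node)
    local_idx = src_rank % gpus_per_node
    by_node = defaultdict(list)
    for d in sorted(set(dst_ranks)):
        by_node[node_of(d, gpus_per_node)].append(d)

    rdma, nvlink = [], []
    for node, dsts in sorted(by_node.items()):
        if node == src_node:
            # Destinations on the sender's own node need no RDMA hop.
            nvlink.extend((src_rank, d) for d in dsts if d != src_rank)
            continue
        # Hypothetical relay: the rank on the remote node with the same
        # local index (same rail) as the sender.
        relay = node * gpus_per_node + local_idx
        rdma.append((src_rank, relay))
        nvlink.extend((relay, d) for d in dsts if d != relay)
    return rdma, nvlink


if __name__ == "__main__":
    dsts = [8, 9, 10, 17]  # three ranks on node 1, one rank on node 2
    print(len(direct_rdma_sends(0, dsts)))  # 4 RDMA sends
    rdma, nvlink = layered_sends(0, dsts)
    print(len(rdma), len(nvlink))           # 2 RDMA sends, 3 NVLink forwards
```

In this model, a token routed from rank 0 to ranks 8, 9, 10, and 17 crosses the RDMA network four times under the direct scheme but only twice under the layered scheme, with the remaining hops moved onto NVLink.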

Note: This feature conflicts with the existing "Elasticity Support to DeepEP for Fault-Tolerant EP Inference" functionality; the two features cannot be enabled at the same time.

before:
image

after:

image

performance

benchmark:

image

Usage

This feature is enabled by default and requires no action from the user. To disable it, set the following environment variable: DEEPEP_DISABLE_LL_DISPATCH_OPT=1.
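A minimal sketch of disabling the feature from a Python launcher script follows. The helper name is hypothetical, and the PR does not state when the variable is read, so setting it before DeepEP is imported or any buffer is created is an assumption made here to be safe.

```python
import os


def disable_ll_dispatch_opt():
    """Hypothetical helper: opt out of the layered low-latency dispatch.

    Assumption: the variable is read in-process by DeepEP, so it is set
    before any DeepEP import or buffer creation.
    """
    os.environ["DEEPEP_DISABLE_LL_DISPATCH_OPT"] = "1"


disable_ll_dispatch_opt()
# Import and initialize DeepEP only after this point.
```

Exporting the variable in the shell that launches every rank has the same effect in this sketch, since child processes inherit the environment.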

@alpha-baby
Contributor Author

Benchmark tests pass on the following environments: one, two, and four nodes with 8×H200 each.

Collaborator

@wangfakang wangfakang left a comment

LGTM. Thanks.

@wangfakang wangfakang merged commit 38dfc2d into deepseek-ai:antgroup-opt Dec 4, 2025
@alpha-baby alpha-baby changed the title from "opt ll dispatch" to "opt ll dispatch layered algo" Dec 4, 2025
@ywj55555

ywj55555 commented Feb 2, 2026

Hello, I'd like to ask why similar optimizations weren't made to combine?
