MoE EP Solution
The Ascend MoE EP solution is tailored for large-scale MoE models. It achieves ultra-high-throughput, ultra-low-latency inference by deeply optimizing communication, memory access, expert deployment and scheduling, and parallelism strategies.








