[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework #11437
Conversation
📝 Walkthrough

This PR updates the MoE (Mixture of Experts) communication and backend infrastructure. Key changes include: removing platform guards from the DeepEP communication layer, adding hidden-size constraints to DeepEPLowLatency, and systematically renaming a public parameter across all MoE backends.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tests/unittest/_torch/modules/moe/quantize_utils.py (1)
313-389: ⚠️ Potential issue | 🟠 Major

Guard swiglu tensor creation against None defaults
When `swiglu_gptoss_style` is enabled, `torch.full` will throw if any of `swiglu_alpha`/`beta`/`limit` is `None`. Consider defaulting to the same semantics as `custom_swiglu` (alpha=1.0, beta=0.0, limit=inf) or raising a clear error before tensor creation.

🛠️ Suggested fix
```diff
-        if self._swiglu_gptoss_style:
-            self._swiglu_tensors = self._create_swiglu_tensors()
+        if self._swiglu_gptoss_style:
+            self.swiglu_alpha = 1.0 if self.swiglu_alpha is None else self.swiglu_alpha
+            self.swiglu_beta = 0.0 if self.swiglu_beta is None else self.swiglu_beta
+            self.swiglu_limit = (
+                float("inf") if self.swiglu_limit is None else self.swiglu_limit
+            )
+            self._swiglu_tensors = self._create_swiglu_tensors()
```

tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py (1)
1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file
This file was edited in 2026, so the header year should be updated to reflect the latest modification.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🔧 Suggested fix
```diff
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep.py (1)
1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file
This file was edited in 2026, so the header year should be updated to reflect the latest modification.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🔧 Suggested fix
```diff
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

tensorrt_llm/_torch/modules/fused_moe/interface.py (1)
1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA SPDX header to this Python source
This file is missing the required Apache-2.0 SPDX header.
As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format. This applies to .cpp, .h, .cu, .py, and other compiled or interpreted source files".

🔧 Suggested fix
```diff
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
```

tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py (1)
1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file
This file was edited in 2026, so the header year should be updated to reflect the latest modification.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🔧 Suggested fix
```diff
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
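The defaulting semantics proposed in the swiglu comment above can be sketched in isolation. `resolve_swiglu_defaults` is a hypothetical helper name; the fallback values (alpha=1.0, beta=0.0, limit=inf) are the `custom_swiglu` semantics cited by the reviewer, not verified library behavior:

```python
import math

def resolve_swiglu_defaults(alpha, beta, limit):
    # Fall back to custom_swiglu semantics when gpt-oss-style params are
    # unset, so later tensor creation (e.g. torch.full) never receives None.
    alpha = 1.0 if alpha is None else alpha
    beta = 0.0 if beta is None else beta
    limit = math.inf if limit is None else limit
    return alpha, beta, limit
```

Calling this before `_create_swiglu_tensors()` would make the None case explicit instead of failing inside tensor creation.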
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/modules/fused_moe/quantization.py`:
- Around line 1-5: Add the required NVIDIA Apache-2.0 copyright header (with the
latest year of meaningful modification) at the very top of the file before any
imports; ensure the header follows the standard Apache-2.0 format used across
the repo and appears above the existing imports (inspect, math, etc.) in the
quantization.py module so classes like the ABC/Enum definitions and functions in
this file are properly attributed.
- Around line 190-193: Replace the comment-style documentation for the class
attributes in FusedMoEMethodBase with inline attribute docstrings: add a concise
docstring for weight_alignment explaining what alignment value represents and
expected units/constraints, and replace the eplb_support_status comment with a
class-level docstring explaining that this is the default online EPLB support
level for the class (not per-instance), that subclasses should override it to
EplbSupportStatus.SUPPORTED or NOT_VERIFIED as appropriate, and that the default
EplbSupportStatus.NOT_SUPPORTED is chosen for safety; ensure the docstrings sit
immediately after the attribute definitions (weight_alignment and
eplb_support_status) and succinctly describe override behavior and the safety
rationale.
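The docstring placement requested above can be sketched as follows; the class name, alignment value, and enum members here are simplified assumptions for illustration, not the actual quantization.py definitions:

```python
import enum

class EplbSupportStatus(enum.Enum):
    NOT_SUPPORTED = 0
    NOT_VERIFIED = 1
    SUPPORTED = 2

class FusedMoEMethodBaseSketch:
    weight_alignment = 16
    """int: Alignment (in elements) that expert weight shapes must satisfy."""

    eplb_support_status = EplbSupportStatus.NOT_SUPPORTED
    """EplbSupportStatus: Default online-EPLB support level for the class (not
    per-instance); subclasses override to SUPPORTED or NOT_VERIFIED. The
    NOT_SUPPORTED default is chosen for safety."""
```

The string literal immediately after each attribute is the inline attribute docstring form that documentation tools such as Sphinx autodoc pick up, unlike a preceding `#` comment.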
In `@tests/unittest/_torch/modules/moe/moe_test_utils.py`:
- Around line 93-100: Rename the three helper functions should_skip_TRTLLM,
should_skip_CUTEDSL, and should_skip_DEEPGEMM to snake_case names
should_skip_trtllm, should_skip_cutedsl, and should_skip_deepgemm in their
definitions (previously in moe_test_utils.py) and update every use: change the
imports in test_moe_module.py to import the new names, replace all call sites in
test_moe_module.py and moe_test_utils.py (and any other callers) to call the new
snake_case names, and update any comment references (e.g., in moe_test_utils.py
and test_moe_backend.py) to the new names so tests and docs remain consistent.
- Line 48: Rename the module-level variable logger to G_LOGGER and update all
its uses (there are three occurrences) to follow the upper snake_case with G_
prefix guideline; specifically change the assignment logging.getLogger(__name__)
to G_LOGGER = logging.getLogger(__name__) and replace every reference to logger
in this module (all three locations) with G_LOGGER so imports and tests continue
to work.
- Around line 456-459: The function supports_autotuner_capture has an unused
parameter quant_algo causing a lint warning; rename the parameter to _quant_algo
in the supports_autotuner_capture signature so callers can still pass it but the
linter sees it as intentionally unused (update the function definition for
supports_autotuner_capture accordingly and ensure any internal references remain
unchanged).
- Around line 40-46: The file currently imports classes directly (e.g.,
AutoTuner, CutlassFusedMoE, DeepGemmFusedMoE, CuteDslFusedMoE,
TRTLLMGenFusedMoE, MoE, QuantAlgo); change each to module-level imports (e.g.,
import tensorrt_llm._torch.autotuner as autotuner, import
tensorrt_llm._torch.modules.fused_moe.fused_moe_cutlass as fused_moe_cutlass,
fused_moe_deepgemm as fused_moe_deepgemm, fused_moe_cute_dsl as
fused_moe_cute_dsl, fused_moe_trtllm_gen as fused_moe_trtllm_gen, import
tensorrt_llm._torch.modules.fused_moe.interface as moe_interface, and import
tensorrt_llm.models.modeling_utils as modeling_utils) and then update all call
sites to qualify names (AutoTuner.get() → autotuner.AutoTuner.get(),
CutlassFusedMoE → fused_moe_cutlass.CutlassFusedMoE, DeepGemmFusedMoE →
fused_moe_deepgemm.DeepGemmFusedMoE, CuteDslFusedMoE →
fused_moe_cute_dsl.CuteDslFusedMoE, TRTLLMGenFusedMoE →
fused_moe_trtllm_gen.TRTLLMGenFusedMoE, MoE type hints → moe_interface.MoE, and
QuantAlgo values → modeling_utils.QuantAlgo) so the namespace is preserved
across the file.
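The tensorrt_llm modules above are not importable in isolation, so here is the same namespace-preserving import style demonstrated with a stdlib module; the wrapper function is illustrative only:

```python
# Preferred: import the module and qualify names at call sites, instead of
# `from collections.abc import Mapping` (which drops the namespace).
from collections import abc as collections_abc

def is_mapping(obj):
    # Qualified access mirrors autotuner.AutoTuner.get()-style call sites.
    return isinstance(obj, collections_abc.Mapping)
```

Keeping the module prefix at every use site makes it obvious where a symbol comes from and avoids name collisions when several MoE backends export similarly named classes.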
In `@tests/unittest/_torch/modules/moe/test_moe_module.py`:
- Around line 280-283: The loop defines an unused loop variable "step" which
triggers lint warnings; change the loop to use a throwaway variable (e.g., "_"
or "__") so the behavior is unchanged: keep extra_steps, call run_forward_fn()
and ref_fused_moe.check_accuracy(output, ref_output) inside the loop, but
replace "for step in range(extra_steps):" with "for _ in range(extra_steps):"
(or another conventional unused-name) to silence the lint warning.
- Around line 860-871: The lambdas in the loop capture loop variables and
trigger ruff B023; replace the lambda list with direct sequential calls to the
check functions instead: call _get_comm_method_skip_reason(comm_method,
model_config), then should_skip_TRTLLM(backend_type, quant_algo, model_config,
comm_method=comm_method), then should_skip_CUTEDSL(backend_type, quant_algo,
model_config, comm_method), then should_skip_DEEPGEMM(backend_type,
comm_method), and finally should_skip_multi_gpu(parallel_mode, model_config,
world_size=4), setting skip_reason to the first non-falsy result (if not already
set) after each call; this removes the closures while preserving behavior.
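The closure-free control flow described in the last bullet can be sketched with stub check functions; all names, signatures, and reason strings below are hypothetical stand-ins for the real helpers:

```python
def _get_comm_method_skip_reason(comm_method):
    # Stub: return a reason string when the combo must be skipped, else None.
    return "comm method unavailable" if comm_method == "unavailable" else None

def _should_skip_backend(backend_type, comm_method):
    if (backend_type, comm_method) == ("TRTLLM", "deep_ep"):
        return "TRTLLM+DeepEP crashes in multi-GPU mode"
    return None

def resolve_skip_reason(backend_type, comm_method):
    # Sequential calls instead of a list of lambdas: there are no
    # loop-variable closures, so ruff B023 cannot fire, and the first
    # non-falsy reason wins, preserving the original short-circuit order.
    skip_reason = _get_comm_method_skip_reason(comm_method)
    if not skip_reason:
        skip_reason = _should_skip_backend(backend_type, comm_method)
    return skip_reason
```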
🧹 Nitpick comments (6)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py (1)
116-117: Add an inline attribute docstring for the new class constant.

This constant should be documented using an inline attribute docstring rather than a preceding comment.
🔧 Suggested update
```diff
-    # Quantization algorithms that support swiglu_gptoss_style
-    _GPTOSS_SUPPORTED_ALGOS = {QuantAlgo.W4A8_MXFP4_MXFP8}
+    _GPTOSS_SUPPORTED_ALGOS = {QuantAlgo.W4A8_MXFP4_MXFP8}
+    """set[QuantAlgo]: Quantization algorithms that support swiglu_gptoss_style."""
```

As per coding guidelines, document attributes and variables inline with `"""<type>: Description"""` syntax.

tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep_low_latency.py (1)
40-45: Use inline attribute docstrings for the new kernel-size constants.

Prefer inline attribute docstrings over block comments for class attributes.
🔧 Suggested update
```diff
-    # DeepEP low-latency kernel supported hidden sizes (from SWITCH_HIDDEN in launch.cuh)
-    SUPPORTED_HIDDEN_SIZES = {2048, 2560, 3584, 4096, 5120, 6144, 7168}
+    SUPPORTED_HIDDEN_SIZES = {2048, 2560, 3584, 4096, 5120, 6144, 7168}
+    """set[int]: Hidden sizes supported by the low-latency DeepEP kernel."""
@@
-    # Extension kernel supported hidden sizes (from SWITCH_HIDDEN_FOR_EXTENSION_KERNELS
-    # in extension_kernels.cu), used for nvfp4 post-quant dispatch and low-precision combine
-    SUPPORTED_HIDDEN_SIZES_EXTENSION = {4096, 6144, 7168}
+    SUPPORTED_HIDDEN_SIZES_EXTENSION = {4096, 6144, 7168}
+    """set[int]: Hidden sizes supported by extension kernels (nvfp4 post-quant/low-precision)."""
```

As per coding guidelines, document attributes and variables inline with `"""<type>: Description"""` syntax.

tests/unittest/_torch/modules/moe/test_moe_backend.py (1)
31-44: Prefer module-level imports to preserve namespace.

The direct symbol imports from `moe_test_utils` break the repo's namespace-preserving import rule. Consider importing the module and qualifying references.

🔧 Suggested update

```diff
-from _torch.modules.moe.moe_test_utils import (
-    MoeBackendType,
-    MoeModelConfig,
-    create_test_param,
-    get_backend_class,
-    iter_base_test_configs,
-    module_timer,  # noqa: F401 - imported for pytest fixture registration
-    replay_tactics_and_check,
-    supports_autotuner_capture,
-)
+from _torch.modules.moe import moe_test_utils
```

Then reference symbols as `moe_test_utils.MoeBackendType`, `moe_test_utils.iter_base_test_configs`, etc.

As per coding guidelines, always maintain the namespace when importing. Use `from package.subpackage import foo` instead of `from package.subpackage.foo import SomeClass` or `import package`.

tests/unittest/_torch/modules/moe/test_moe_module.py (3)
40-90: Prefer module-level imports to preserve namespace.

Several new imports pull symbols directly from modules. Please switch to module-level imports and qualify usage to align with repo conventions.

🔧 Suggested update

```diff
-from _torch.modules.moe.moe_test_utils import (
-    MoeBackendType,
-    MoeModelConfig,
-    create_test_param,
-    get_quick_skip_reason,
-    iter_base_test_configs,
-    module_timer,  # noqa: F401 - imported for pytest fixture registration
-    replay_tactics_and_check,
-    should_skip_CUTEDSL,
-    should_skip_DEEPGEMM,
-    should_skip_multi_gpu,
-    should_skip_TRTLLM,
-    supports_autotuner_capture,
-)
+from _torch.modules.moe import moe_test_utils
```

Then reference symbols as `moe_test_utils.MoeBackendType`, `moe_test_utils.iter_base_test_configs`, etc. Apply the same pattern to other direct symbol imports in this block.

As per coding guidelines, always maintain the namespace when importing. Use `from package.subpackage import foo` instead of `from package.subpackage.foo import SomeClass` or `import package`.
96-96: Prefix the module-level logger with `G_`.

Module-level globals should use the `G_` prefix and upper snake_case.

🔧 Suggested update

```diff
-logger = logging.getLogger(__name__)
+G_LOGGER = logging.getLogger(__name__)
```

Remember to update usages (e.g., `G_LOGGER.info(...)`).

As per coding guidelines, use upper snake_case with prefix 'G' for global variables.
962-972: Rename test functions to snake_case.

These test function names are not snake_case; please rename them to follow the repo's naming rules.

🔧 Suggested update

```diff
-def test_ConfigurableMoE_single_gpu(
+def test_configurable_moe_single_gpu(
@@
-def test_ConfigurableMoE_multi_gpu(
+def test_configurable_moe_multi_gpu(
@@
-def test_ConfigurableMoE_multi_gpu_eplb(
+def test_configurable_moe_multi_gpu_eplb(
```

As per coding guidelines, use snake_case for function and method names.

Also applies to: 1030-1042, 1278-1287
Force-pushed from 8c85f09 to 9aa00ae (Compare)

/bot run --disable-fail-fast

PR_Github #35590 [ run ] triggered by Bot. Commit:

Force-pushed from d5c2720 to ed1a0ea (Compare)

/bot run --disable-fail-fast

PR_Github #35590 [ run ] completed with state

PR_Github #35598 [ run ] triggered by Bot. Commit:

PR_Github #35598 [ run ] completed with state
- Extract shared utilities into moe_test_utils.py (skip logic, model configs, etc.)
- Add comprehensive single/multi-GPU tests with configurable backends, quant algos, routing methods, and SwiGLU parameters
- Add EPLB and autotune tactic capture/replay tests
- Support CI/local config split via TRTLLM_TEST_MOE_CI env var
- Skip TRTLLM+DeepEP combinations that crash with CUDA errors in multi-GPU mode
- Relax accuracy threshold for MXFP4+MXFP8 with large hidden_size

Signed-off-by: xxi <xxi@nvidia.com>
Force-pushed from ed1a0ea to a37b6c9 (Compare)

/bot run --disable-fail-fast

PR_Github #35633 [ run ] triggered by Bot. Commit:

PR_Github #35633 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35681 [ run ] triggered by Bot. Commit:

PR_Github #35681 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35763 [ run ] triggered by Bot. Commit:

PR_Github #35763 [ run ] completed with state
…MoE test framework (NVIDIA#11437)

Signed-off-by: xxi <xxi@nvidia.com>
Description
Test Framework Overhaul
New shared utility module (moe_test_utils.py)
Expanded test_moe_module.py (ConfigurableMoE tests)
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
Details
run

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill
kill

Kill all running builds associated with the pull request.
skip
skip --comment COMMENT

Skip testing for the latest commit on the pull request.
--comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes