[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework #11437
Conversation
📝 Walkthrough

This PR updates the MoE (Mixture of Experts) communication and backend infrastructure. Key changes include: removing platform guards from the DeepEP communication layer, adding hidden-size constraints to DeepEPLowLatency, and systematically renaming a public parameter across all MoE backends.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tests/unittest/_torch/modules/moe/quantize_utils.py (1)
313-389: ⚠️ Potential issue | 🟠 Major

Guard swiglu tensor creation against None defaults
When `swiglu_gptoss_style` is enabled, `torch.full` will throw if any of `swiglu_alpha`/`beta`/`limit` is `None`. Consider defaulting to the same semantics as `custom_swiglu` (alpha=1.0, beta=0.0, limit=inf) or raising a clear error before tensor creation.

🛠️ Suggested fix
```diff
-        if self._swiglu_gptoss_style:
-            self._swiglu_tensors = self._create_swiglu_tensors()
+        if self._swiglu_gptoss_style:
+            self.swiglu_alpha = 1.0 if self.swiglu_alpha is None else self.swiglu_alpha
+            self.swiglu_beta = 0.0 if self.swiglu_beta is None else self.swiglu_beta
+            self.swiglu_limit = (
+                float("inf") if self.swiglu_limit is None else self.swiglu_limit
+            )
+            self._swiglu_tensors = self._create_swiglu_tensors()
```

tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py (1)
1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file
This file was edited in 2026, so the header year should be updated to reflect the latest modification.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🔧 Suggested fix
```diff
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep.py (1)
1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file
This file was edited in 2026, so the header year should be updated to reflect the latest modification.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🔧 Suggested fix
```diff
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

tensorrt_llm/_torch/modules/fused_moe/interface.py (1)
1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA SPDX header to this Python source
This file is missing the required Apache-2.0 SPDX header.
As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format. This applies to .cpp, .h, .cu, .py, and other compiled or interpreted source files".

🔧 Suggested fix
```diff
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
```

tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py (1)
1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file
This file was edited in 2026, so the header year should be updated to reflect the latest modification.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🔧 Suggested fix
```diff
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
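The defaulting semantics proposed in the swiglu comment above can be sketched in isolation. `resolve_swiglu_defaults` is a hypothetical helper name; the fallback values (alpha=1.0, beta=0.0, limit=inf) are the `custom_swiglu` semantics cited by the reviewer, not verified library behavior:

```python
import math

def resolve_swiglu_defaults(alpha, beta, limit):
    # Fall back to custom_swiglu semantics when gpt-oss-style params are
    # unset, so later tensor creation (e.g. torch.full) never receives None.
    alpha = 1.0 if alpha is None else alpha
    beta = 0.0 if beta is None else beta
    limit = math.inf if limit is None else limit
    return alpha, beta, limit
```

Calling this before `_create_swiglu_tensors()` would make the None case explicit instead of failing inside tensor creation.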
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/modules/fused_moe/quantization.py`:
- Around line 1-5: Add the required NVIDIA Apache-2.0 copyright header (with the
latest year of meaningful modification) at the very top of the file before any
imports; ensure the header follows the standard Apache-2.0 format used across
the repo and appears above the existing imports (inspect, math, etc.) in the
quantization.py module so classes like the ABC/Enum definitions and functions in
this file are properly attributed.
- Around line 190-193: Replace the comment-style documentation for the class
attributes in FusedMoEMethodBase with inline attribute docstrings: add a concise
docstring for weight_alignment explaining what alignment value represents and
expected units/constraints, and replace the eplb_support_status comment with a
class-level docstring explaining that this is the default online EPLB support
level for the class (not per-instance), that subclasses should override it to
EplbSupportStatus.SUPPORTED or NOT_VERIFIED as appropriate, and that the default
EplbSupportStatus.NOT_SUPPORTED is chosen for safety; ensure the docstrings sit
immediately after the attribute definitions (weight_alignment and
eplb_support_status) and succinctly describe override behavior and the safety
rationale.
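The docstring placement requested above can be sketched as follows; the class name, alignment value, and enum members here are simplified assumptions for illustration, not the actual quantization.py definitions:

```python
import enum

class EplbSupportStatus(enum.Enum):
    NOT_SUPPORTED = 0
    NOT_VERIFIED = 1
    SUPPORTED = 2

class FusedMoEMethodBaseSketch:
    weight_alignment = 16
    """int: Alignment (in elements) that expert weight shapes must satisfy."""

    eplb_support_status = EplbSupportStatus.NOT_SUPPORTED
    """EplbSupportStatus: Default online-EPLB support level for the class (not
    per-instance); subclasses override to SUPPORTED or NOT_VERIFIED. The
    NOT_SUPPORTED default is chosen for safety."""
```

The string literal immediately after each attribute is the inline attribute docstring form that documentation tools such as Sphinx autodoc pick up, unlike a preceding `#` comment.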
In `@tests/unittest/_torch/modules/moe/moe_test_utils.py`:
- Around line 93-100: Rename the three helper functions should_skip_TRTLLM,
should_skip_CUTEDSL, and should_skip_DEEPGEMM to snake_case names
should_skip_trtllm, should_skip_cutedsl, and should_skip_deepgemm in their
definitions (previously in moe_test_utils.py) and update every use: change the
imports in test_moe_module.py to import the new names, replace all call sites in
test_moe_module.py and moe_test_utils.py (and any other callers) to call the new
snake_case names, and update any comment references (e.g., in moe_test_utils.py
and test_moe_backend.py) to the new names so tests and docs remain consistent.
- Line 48: Rename the module-level variable logger to G_LOGGER and update all
its uses (there are three occurrences) to follow the upper snake_case with G_
prefix guideline; specifically change the assignment logging.getLogger(__name__)
to G_LOGGER = logging.getLogger(__name__) and replace every reference to logger
in this module (all three locations) with G_LOGGER so imports and tests continue
to work.
- Around line 456-459: The function supports_autotuner_capture has an unused
parameter quant_algo causing a lint warning; rename the parameter to _quant_algo
in the supports_autotuner_capture signature so callers can still pass it but the
linter sees it as intentionally unused (update the function definition for
supports_autotuner_capture accordingly and ensure any internal references remain
unchanged).
- Around line 40-46: The file currently imports classes directly (e.g.,
AutoTuner, CutlassFusedMoE, DeepGemmFusedMoE, CuteDslFusedMoE,
TRTLLMGenFusedMoE, MoE, QuantAlgo); change each to module-level imports (e.g.,
import tensorrt_llm._torch.autotuner as autotuner, import
tensorrt_llm._torch.modules.fused_moe.fused_moe_cutlass as fused_moe_cutlass,
fused_moe_deepgemm as fused_moe_deepgemm, fused_moe_cute_dsl as
fused_moe_cute_dsl, fused_moe_trtllm_gen as fused_moe_trtllm_gen, import
tensorrt_llm._torch.modules.fused_moe.interface as moe_interface, and import
tensorrt_llm.models.modeling_utils as modeling_utils) and then update all call
sites to qualify names (AutoTuner.get() → autotuner.AutoTuner.get(),
CutlassFusedMoE → fused_moe_cutlass.CutlassFusedMoE, DeepGemmFusedMoE →
fused_moe_deepgemm.DeepGemmFusedMoE, CuteDslFusedMoE →
fused_moe_cute_dsl.CuteDslFusedMoE, TRTLLMGenFusedMoE →
fused_moe_trtllm_gen.TRTLLMGenFusedMoE, MoE type hints → moe_interface.MoE, and
QuantAlgo values → modeling_utils.QuantAlgo) so the namespace is preserved
across the file.
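The tensorrt_llm modules above are not importable in isolation, so here is the same namespace-preserving import style demonstrated with a stdlib module; the wrapper function is illustrative only:

```python
# Preferred: import the module and qualify names at call sites, instead of
# `from collections.abc import Mapping` (which drops the namespace).
from collections import abc as collections_abc

def is_mapping(obj):
    # Qualified access mirrors autotuner.AutoTuner.get()-style call sites.
    return isinstance(obj, collections_abc.Mapping)
```

Keeping the module prefix at every use site makes it obvious where a symbol comes from and avoids name collisions when several MoE backends export similarly named classes.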
In `@tests/unittest/_torch/modules/moe/test_moe_module.py`:
- Around line 280-283: The loop defines an unused loop variable "step" which
triggers lint warnings; change the loop to use a throwaway variable (e.g., "_"
or "__") so the behavior is unchanged: keep extra_steps, call run_forward_fn()
and ref_fused_moe.check_accuracy(output, ref_output) inside the loop, but
replace "for step in range(extra_steps):" with "for _ in range(extra_steps):"
(or another conventional unused-name) to silence the lint warning.
- Around line 860-871: The lambdas in the loop capture loop variables and
trigger ruff B023; replace the lambda list with direct sequential calls to the
check functions instead: call _get_comm_method_skip_reason(comm_method,
model_config), then should_skip_TRTLLM(backend_type, quant_algo, model_config,
comm_method=comm_method), then should_skip_CUTEDSL(backend_type, quant_algo,
model_config, comm_method), then should_skip_DEEPGEMM(backend_type,
comm_method), and finally should_skip_multi_gpu(parallel_mode, model_config,
world_size=4), setting skip_reason to the first non-falsy result (if not already
set) after each call; this removes the closures while preserving behavior.
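The closure-free control flow described in the last bullet can be sketched with stub check functions; all names, signatures, and reason strings below are hypothetical stand-ins for the real helpers:

```python
def _get_comm_method_skip_reason(comm_method):
    # Stub: return a reason string when the combo must be skipped, else None.
    return "comm method unavailable" if comm_method == "unavailable" else None

def _should_skip_backend(backend_type, comm_method):
    if (backend_type, comm_method) == ("TRTLLM", "deep_ep"):
        return "TRTLLM+DeepEP crashes in multi-GPU mode"
    return None

def resolve_skip_reason(backend_type, comm_method):
    # Sequential calls instead of a list of lambdas: there are no
    # loop-variable closures, so ruff B023 cannot fire, and the first
    # non-falsy reason wins, preserving the original short-circuit order.
    skip_reason = _get_comm_method_skip_reason(comm_method)
    if not skip_reason:
        skip_reason = _should_skip_backend(backend_type, comm_method)
    return skip_reason
```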
🧹 Nitpick comments (6)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py (1)
116-117: Add an inline attribute docstring for the new class constant.

This constant should be documented using an inline attribute docstring rather than a preceding comment.
🔧 Suggested update
```diff
-    # Quantization algorithms that support swiglu_gptoss_style
-    _GPTOSS_SUPPORTED_ALGOS = {QuantAlgo.W4A8_MXFP4_MXFP8}
+    _GPTOSS_SUPPORTED_ALGOS = {QuantAlgo.W4A8_MXFP4_MXFP8}
+    """set[QuantAlgo]: Quantization algorithms that support swiglu_gptoss_style."""
```

As per coding guidelines, document attributes and variables inline with `"""<type>: Description"""` syntax.

tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep_low_latency.py (1)
40-45: Use inline attribute docstrings for the new kernel-size constants.

Prefer inline attribute docstrings over block comments for class attributes.
🔧 Suggested update
```diff
-    # DeepEP low-latency kernel supported hidden sizes (from SWITCH_HIDDEN in launch.cuh)
-    SUPPORTED_HIDDEN_SIZES = {2048, 2560, 3584, 4096, 5120, 6144, 7168}
+    SUPPORTED_HIDDEN_SIZES = {2048, 2560, 3584, 4096, 5120, 6144, 7168}
+    """set[int]: Hidden sizes supported by the low-latency DeepEP kernel."""
@@
-    # Extension kernel supported hidden sizes (from SWITCH_HIDDEN_FOR_EXTENSION_KERNELS
-    # in extension_kernels.cu), used for nvfp4 post-quant dispatch and low-precision combine
-    SUPPORTED_HIDDEN_SIZES_EXTENSION = {4096, 6144, 7168}
+    SUPPORTED_HIDDEN_SIZES_EXTENSION = {4096, 6144, 7168}
+    """set[int]: Hidden sizes supported by extension kernels (nvfp4 post-quant/low-precision)."""
```

As per coding guidelines, document attributes and variables inline with `"""<type>: Description"""` syntax.

tests/unittest/_torch/modules/moe/test_moe_backend.py (1)
31-44: Prefer module-level imports to preserve namespace.

The direct symbol imports from `moe_test_utils` break the repo's namespace-preserving import rule. Consider importing the module and qualifying references.

🔧 Suggested update

```diff
-from _torch.modules.moe.moe_test_utils import (
-    MoeBackendType,
-    MoeModelConfig,
-    create_test_param,
-    get_backend_class,
-    iter_base_test_configs,
-    module_timer,  # noqa: F401 - imported for pytest fixture registration
-    replay_tactics_and_check,
-    supports_autotuner_capture,
-)
+from _torch.modules.moe import moe_test_utils
```

Then reference symbols as `moe_test_utils.MoeBackendType`, `moe_test_utils.iter_base_test_configs`, etc.

As per coding guidelines, always maintain the namespace when importing. Use `from package.subpackage import foo` instead of `from package.subpackage.foo import SomeClass` or `import package`.

tests/unittest/_torch/modules/moe/test_moe_module.py (3)
40-90: Prefer module-level imports to preserve namespace.

Several new imports pull symbols directly from modules. Please switch to module-level imports and qualify usage to align with repo conventions.

🔧 Suggested update

```diff
-from _torch.modules.moe.moe_test_utils import (
-    MoeBackendType,
-    MoeModelConfig,
-    create_test_param,
-    get_quick_skip_reason,
-    iter_base_test_configs,
-    module_timer,  # noqa: F401 - imported for pytest fixture registration
-    replay_tactics_and_check,
-    should_skip_CUTEDSL,
-    should_skip_DEEPGEMM,
-    should_skip_multi_gpu,
-    should_skip_TRTLLM,
-    supports_autotuner_capture,
-)
+from _torch.modules.moe import moe_test_utils
```

Then reference symbols as `moe_test_utils.MoeBackendType`, `moe_test_utils.iter_base_test_configs`, etc. Apply the same pattern to other direct symbol imports in this block.

As per coding guidelines, always maintain the namespace when importing. Use `from package.subpackage import foo` instead of `from package.subpackage.foo import SomeClass` or `import package`.
96-96: Prefix the module-level logger with `G_`.

Module-level globals should use the `G_` prefix and upper snake_case.

🔧 Suggested update

```diff
-logger = logging.getLogger(__name__)
+G_LOGGER = logging.getLogger(__name__)
```

Remember to update usages (e.g., `G_LOGGER.info(...)`).

As per coding guidelines, use upper snake_case with prefix 'G' for global variables.
962-972: Rename test functions to snake_case.

These test function names are not snake_case; please rename them to follow the repo's naming rules.

🔧 Suggested update

```diff
-def test_ConfigurableMoE_single_gpu(
+def test_configurable_moe_single_gpu(
@@
-def test_ConfigurableMoE_multi_gpu(
+def test_configurable_moe_multi_gpu(
@@
-def test_ConfigurableMoE_multi_gpu_eplb(
+def test_configurable_moe_multi_gpu_eplb(
```

As per coding guidelines, use snake_case for function and method names.

Also applies to: 1030-1042, 1278-1287
Force-pushed from 8c85f09 to 9aa00ae (Compare)

/bot run --disable-fail-fast

PR_Github #35590 [ run ] triggered by Bot. Commit:

Force-pushed from d5c2720 to ed1a0ea (Compare)

/bot run --disable-fail-fast

PR_Github #35590 [ run ] completed with state

PR_Github #35598 [ run ] triggered by Bot. Commit:

PR_Github #35598 [ run ] completed with state
- Extract shared utilities into moe_test_utils.py (skip logic, model configs, etc.)
- Add comprehensive single/multi-GPU tests with configurable backends, quant algos, routing methods, and SwiGLU parameters
- Add EPLB and autotune tactic capture/replay tests
- Support CI/local config split via TRTLLM_TEST_MOE_CI env var
- Skip TRTLLM+DeepEP combinations that crash with CUDA errors in multi-GPU mode
- Relax accuracy threshold for MXFP4+MXFP8 with large hidden_size

Signed-off-by: xxi <xxi@nvidia.com>
Force-pushed from ed1a0ea to a37b6c9 (Compare)

/bot run --disable-fail-fast

PR_Github #35633 [ run ] triggered by Bot. Commit:

PR_Github #35633 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35681 [ run ] triggered by Bot. Commit:

PR_Github #35681 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35763 [ run ] triggered by Bot. Commit:

PR_Github #35763 [ run ] completed with state
…MoE test framework (NVIDIA#11437)

Signed-off-by: xxi <xxi@nvidia.com>
Description
Test Framework Overhaul
New shared utility module (moe_test_utils.py)
Expanded test_moe_module.py (ConfigurableMoE tests)
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
Details
run

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill
kill

Kill all running builds associated with the pull request.
skip
skip --comment COMMENT

Skip testing for the latest commit on the pull request.
--comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes