
[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework#11437

Merged
xxi-nv merged 1 commit into NVIDIA:main from xxi-nv:testConfigurableMoE
Feb 13, 2026

Conversation

@xxi-nv
Collaborator

@xxi-nv xxi-nv commented Feb 11, 2026

Description

Test Framework Overhaul
New shared utility module (moe_test_utils.py)

  • Extracted common code from test_moe_backend.py and test_moe_module.py into a shared module
  • MoeBackendType enum, MoeModelConfig dataclass, get_backend_class() helper (an illustrative sketch of these utilities follows this list)
  • Centralized skip logic functions: should_skip_TRTLLM(), should_skip_CUTEDSL(), should_skip_DEEPGEMM(), should_skip_multi_gpu(), should_skip_routing_method()
  • Unified get_quick_skip_reason() supporting both backend-level and module-level tests
  • iter_base_test_configs() generator for systematic parameter matrix traversal with pre-computed skip reasons
  • replay_tactics_and_check() for autotuner tactic capture/replay validation
  • module_timer pytest fixture for per-test timing
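
A minimal, hypothetical sketch of what these shared utilities could look like, assuming only the names listed above; the dataclass fields, quant-algo strings, and skip rule shown here are illustrative placeholders, not the actual implementation:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, Optional, Tuple


class MoeBackendType(Enum):
    CUTLASS = "CUTLASS"
    TRTLLM = "TRTLLM"
    CUTEDSL = "CuteDSL"
    DEEPGEMM = "DeepGemm"


@dataclass(frozen=True)
class MoeModelConfig:
    # Field names/values are assumptions for illustration only.
    num_experts: int = 8
    top_k: int = 2
    hidden_size: int = 4096
    intermediate_size: int = 14336


def get_quick_skip_reason(backend: MoeBackendType, quant_algo: str,
                          config: MoeModelConfig) -> Optional[str]:
    """Return a human-readable skip reason, or None if the combination can run."""
    # Placeholder rule; the real helpers consult hardware and backend capabilities.
    if backend is MoeBackendType.DEEPGEMM and quant_algo != "FP8_BLOCK_SCALES":
        return "DeepGemm assumed to be exercised only with FP8 block scales"
    return None


def iter_base_test_configs(
    backends: Iterable[MoeBackendType],
    quant_algos: Iterable[str],
    configs: Iterable[MoeModelConfig],
) -> Iterator[Tuple[MoeBackendType, str, MoeModelConfig, Optional[str]]]:
    """Yield (backend, quant_algo, config, skip_reason) over the full matrix."""
    for backend in backends:
        for quant_algo in quant_algos:
            for config in configs:
                yield backend, quant_algo, config, get_quick_skip_reason(
                    backend, quant_algo, config)
```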

Expanded test_moe_module.py (ConfigurableMoE tests)

  • Renamed test functions: test_moe_single_gpu → test_ConfigurableMoE_single_gpu, etc.
  • Tests exercise the high-level create_moe() → load_weights() → forward() path
  • Full parameter matrix: 4 backends (CUTLASS, TRTLLM, CuteDSL, DeepGemm) × 9 quant algos × 6 routing methods × multiple model configs × SwiGLU variants
  • Multi-GPU tests: 4 parallel modes (DEP/TEP/DTP/TTP) × 4 comm methods (NVLink one-sided/two-sided, DeepEP, DeepEP low-latency)
  • EPLB (Expert Load Balancing) tests with expert ID update verification
  • CI/local config split via TRTLLM_TEST_MOE_CI env var (default: full local matrix; set =1 for CI subset), as sketched below
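
For the CI/local split, a hedged illustration of how the env var might gate the parameter matrix; the variable name and default come from the description above, while the sizes and selection logic are assumptions:

```python
import os

import pytest

# Assumed convention: unset or "0" runs the full local matrix, "1" selects the CI subset.
IS_CI = os.environ.get("TRTLLM_TEST_MOE_CI", "0") == "1"

FULL_HIDDEN_SIZES = [2048, 4096, 7168]  # illustrative values
CI_HIDDEN_SIZES = [4096]


@pytest.mark.parametrize("hidden_size",
                         CI_HIDDEN_SIZES if IS_CI else FULL_HIDDEN_SIZES)
def test_configurable_moe_single_gpu(hidden_size):
    ...  # the real test drives create_moe() -> load_weights() -> forward()
```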

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
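
For quick reference, a few illustrative invocations that combine only the flags documented above (the stage and GPU names are the examples given in the flag descriptions):

```
/bot run --disable-fail-fast
/bot run --stage-list "A10-PyTorch-1" --gpu-type "A30, H100_PCIe"
/bot skip --comment "Reason for skipping build/test"
/bot reuse-pipeline
```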

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support status tracking for online load balancing across quantization methods.
    • Introduced hidden size constraints for optimized kernel support to ensure correctness.
  • Improvements

    • Enabled DeepEP optimization by default for BF16 activations to improve performance.
    • Enhanced data type handling for improved numerical stability in dispatch operations.
  • Bug Fixes

    • Removed unnecessary platform guard to streamline optimization eligibility checks.

@xxi-nv xxi-nv requested a review from a team as a code owner February 11, 2026 02:56
@coderabbitai
Contributor

coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

Walkthrough

This PR updates the MoE (Mixture of Experts) communication and backend infrastructure. Key changes include: removing platform guards from DeepEP communication layer, adding hidden-size constraints to DeepEPLowLatency, systematically renaming a public parameter across all MoE backends from gptoss_style to swiglu_gptoss_style, introducing EPLB (Online Expert-Parallel Load Balancer) support status tracking, and establishing comprehensive test utilities and refactoring test modules for MoE backends.
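
A hedged before/after sketch of the can_implement() rename; only the flag name and its default come from this summary, while the surrounding class, signature, and body are placeholders:

```python
class SomeMoEBackend:  # hypothetical stand-in for the concrete backend classes
    @classmethod
    def can_implement(cls, quant_config=None,
                      swiglu_gptoss_style: bool = False) -> bool:
        """Report whether this backend supports the requested configuration."""
        # previously: gptoss_style: bool = False
        del quant_config, swiglu_gptoss_style  # real capability checks are backend-specific
        return True
```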

Changes

  • Communication Backend Enhancements
    Files: tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py, tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep.py
    Changed the DeepEP default activation from "0" to "1" for TRTLLM_CAN_USE_DEEP_EP; removed the platform guard in is_platform_supported() and added dtype normalization for the token_final_scales dispatch/restore cycle.
  • DeepEPLowLatency Constraints
    Files: tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep_low_latency.py
    Added supported hidden-size constants (SUPPORTED_HIDDEN_SIZES, SUPPORTED_HIDDEN_SIZES_EXTENSION); introduced hidden-size validation in __init__; updated supports_post_quant_dispatch() and supports_low_precision_combine() to enforce the size constraints; removed the explicit TRTLLM_CAN_USE_DEEP_EP env check from is_platform_supported().
  • MoE Backend Parameter Rename
    Files: tensorrt_llm/_torch/modules/fused_moe/interface.py, tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_cute_dsl.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_deepgemm.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py
    Renamed the public parameter in can_implement() method signatures and all supporting documentation from gptoss_style: bool = False to swiglu_gptoss_style: bool = False across all MoE backend implementations and the base interface.
  • EPLB Support Tracking (sketched after this list)
    Files: tensorrt_llm/_torch/modules/fused_moe/quantization.py
    Introduced the EplbSupportStatus enum (SUPPORTED, NOT_SUPPORTED, NOT_VERIFIED); added an eplb_support_status class attribute and a supports_online_eplb() classmethod to FusedMoEMethodBase; annotated quantization method classes with appropriate EPLB support statuses.
  • Test Utilities Infrastructure
    Files: tests/unittest/_torch/modules/moe/moe_test_utils.py
    New comprehensive test utilities module providing MoE testing abstractions: the MoeBackendType enum, the MoeModelConfig dataclass, backend-specific skip-check functions (TRTLLM, CUTEDSL, DEEPGEMM, multi-GPU, routing), autotuner support detection, the replay_tactics_and_check() function, test parameter helpers, the module_timer fixture, and the iter_base_test_configs() configuration generator.
  • Quantization Utilities Refactor
    Files: tests/unittest/_torch/modules/moe/quantize_utils.py
    Updated all quantization utility constructors and methods to use swiglu_gptoss_style instead of gptoss_style; changed create_ref_module() signatures to accept ref_cls=None with internal default resolution; added an optional device parameter to CUDA methods; enhanced internal handling for per-local-expert sizing (num_local_experts).
  • Backend Test Refactoring
    Files: tests/unittest/_torch/modules/moe/test_moe_backend.py
    Consolidated in-file test scaffolding by importing from the centralized moe_test_utils; removed the redundant backend enum, skip checks, and autotuner logic; refactored test parameter generation to use iter_base_test_configs() and create_test_param(); updated parameter naming to reflect the swiglu_gptoss_style terminology.
  • Module Test Expansion
    Files: tests/unittest/_torch/modules/moe/test_moe_module.py
    Significantly expanded the test infrastructure with new helpers: _create_mapping_for_parallel_mode(), _create_moe_load_balancer(), _setup_autotuner_for_test(), _create_model_config(), _run_autotune_test(), _run_eplb_test(), _create_routing_method(); refactored worker functions with comprehensive parameters for multi-GPU, EPLB, autotune, and routing support; added EPLB-specific test scaffolding and parameter generation; introduced test_ConfigurableMoE_multi_gpu_eplb().
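
A small sketch of the EPLB support tracking described in the "EPLB Support Tracking" entry above; the enum members and method name are taken from that summary, and the bodies are assumptions:

```python
from enum import Enum


class EplbSupportStatus(Enum):
    SUPPORTED = "supported"
    NOT_SUPPORTED = "not_supported"
    NOT_VERIFIED = "not_verified"


class FusedMoEMethodBase:
    # Default chosen conservatively; subclasses would override once verified.
    eplb_support_status: EplbSupportStatus = EplbSupportStatus.NOT_SUPPORTED

    @classmethod
    def supports_online_eplb(cls) -> bool:
        return cls.eplb_support_status is EplbSupportStatus.SUPPORTED
```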

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

Suggested reviewers

  • leslie-fang25
  • syuoni
  • rosenrodt
  • QiJune
  • Barry-Delaney
🚥 Pre-merge checks: 2 passed, 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 70.10%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title clearly summarizes the main change: refactoring MoE unit tests with a unified ConfigurableMoE test framework.
  • Description check (✅ Passed): The PR description is comprehensive and follows the template structure, explaining the overhaul, test framework changes, and test coverage with clear sections.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
tests/unittest/_torch/modules/moe/quantize_utils.py (1)

313-389: ⚠️ Potential issue | 🟠 Major

Guard swiglu tensor creation against None defaults

When swiglu_gptoss_style is enabled, torch.full will throw if any of swiglu_alpha/beta/limit is None. Consider defaulting to the same semantics as custom_swiglu (alpha=1.0, beta=0.0, limit=inf) or raising a clear error before tensor creation.

🛠️ Suggested fix
-        if self._swiglu_gptoss_style:
-            self._swiglu_tensors = self._create_swiglu_tensors()
+        if self._swiglu_gptoss_style:
+            self.swiglu_alpha = 1.0 if self.swiglu_alpha is None else self.swiglu_alpha
+            self.swiglu_beta = 0.0 if self.swiglu_beta is None else self.swiglu_beta
+            self.swiglu_limit = (
+                float("inf") if self.swiglu_limit is None else self.swiglu_limit
+            )
+            self._swiglu_tensors = self._create_swiglu_tensors()
tensorrt_llm/_torch/modules/fused_moe/communication/communication_factory.py (1)

1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file

This file was edited in 2026, so the header year should be updated to reflect the latest modification.

🔧 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep.py (1)

1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file

This file was edited in 2026, so the header year should be updated to reflect the latest modification.

🔧 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
tensorrt_llm/_torch/modules/fused_moe/interface.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA SPDX header to this Python source

This file is missing the required Apache-2.0 SPDX header.

🔧 Suggested fix
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format. This applies to .cpp, .h, .cu, .py, and other compiled or interpreted source files".
tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py (1)

1-2: ⚠️ Potential issue | 🟠 Major

Update SPDX year to 2026 for modified file

This file was edited in 2026, so the header year should be updated to reflect the latest modification.

🔧 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/modules/fused_moe/quantization.py`:
- Around line 1-5: Add the required NVIDIA Apache-2.0 copyright header (with the
latest year of meaningful modification) at the very top of the file before any
imports; ensure the header follows the standard Apache-2.0 format used across
the repo and appears above the existing imports (inspect, math, etc.) in the
quantization.py module so classes like the ABC/Enum definitions and functions in
this file are properly attributed.
- Around line 190-193: Replace the comment-style documentation for the class
attributes in FusedMoEMethodBase with inline attribute docstrings: add a concise
docstring for weight_alignment explaining what alignment value represents and
expected units/constraints, and replace the eplb_support_status comment with a
class-level docstring explaining that this is the default online EPLB support
level for the class (not per-instance), that subclasses should override it to
EplbSupportStatus.SUPPORTED or NOT_VERIFIED as appropriate, and that the default
EplbSupportStatus.NOT_SUPPORTED is chosen for safety; ensure the docstrings sit
immediately after the attribute definitions (weight_alignment and
eplb_support_status) and succinctly describe override behavior and the safety
rationale.

In `@tests/unittest/_torch/modules/moe/moe_test_utils.py`:
- Around line 93-100: Rename the three helper functions should_skip_TRTLLM,
should_skip_CUTEDSL, and should_skip_DEEPGEMM to snake_case names
should_skip_trtllm, should_skip_cutedsl, and should_skip_deepgemm in their
definitions (previously in moe_test_utils.py) and update every use: change the
imports in test_moe_module.py to import the new names, replace all call sites in
test_moe_module.py and moe_test_utils.py (and any other callers) to call the new
snake_case names, and update any comment references (e.g., in moe_test_utils.py
and test_moe_backend.py) to the new names so tests and docs remain consistent.
- Line 48: Rename the module-level variable logger to G_LOGGER and update all
its uses (there are three occurrences) to follow the upper snake_case with G_
prefix guideline; specifically change the assignment logging.getLogger(__name__)
to G_LOGGER = logging.getLogger(__name__) and replace every reference to logger
in this module (all three locations) with G_LOGGER so imports and tests continue
to work.
- Around line 456-459: The function supports_autotuner_capture has an unused
parameter quant_algo causing a lint warning; rename the parameter to _quant_algo
in the supports_autotuner_capture signature so callers can still pass it but the
linter sees it as intentionally unused (update the function definition for
supports_autotuner_capture accordingly and ensure any internal references remain
unchanged).
- Around line 40-46: The file currently imports classes directly (e.g.,
AutoTuner, CutlassFusedMoE, DeepGemmFusedMoE, CuteDslFusedMoE,
TRTLLMGenFusedMoE, MoE, QuantAlgo); change each to module-level imports (e.g.,
import tensorrt_llm._torch.autotuner as autotuner, import
tensorrt_llm._torch.modules.fused_moe.fused_moe_cutlass as fused_moe_cutlass,
fused_moe_deepgemm as fused_moe_deepgemm, fused_moe_cute_dsl as
fused_moe_cute_dsl, fused_moe_trtllm_gen as fused_moe_trtllm_gen, import
tensorrt_llm._torch.modules.fused_moe.interface as moe_interface, and import
tensorrt_llm.models.modeling_utils as modeling_utils) and then update all call
sites to qualify names (AutoTuner.get() → autotuner.AutoTuner.get(),
CutlassFusedMoE → fused_moe_cutlass.CutlassFusedMoE, DeepGemmFusedMoE →
fused_moe_deepgemm.DeepGemmFusedMoE, CuteDslFusedMoE →
fused_moe_cute_dsl.CuteDslFusedMoE, TRTLLMGenFusedMoE →
fused_moe_trtllm_gen.TRTLLMGenFusedMoE, MoE type hints → moe_interface.MoE, and
QuantAlgo values → modeling_utils.QuantAlgo) so the namespace is preserved
across the file.

In `@tests/unittest/_torch/modules/moe/test_moe_module.py`:
- Around line 280-283: The loop defines an unused loop variable "step" which
triggers lint warnings; change the loop to use a throwaway variable (e.g., "_"
or "__") so the behavior is unchanged: keep extra_steps, call run_forward_fn()
and ref_fused_moe.check_accuracy(output, ref_output) inside the loop, but
replace "for step in range(extra_steps):" with "for _ in range(extra_steps):"
(or another conventional unused-name) to silence the lint warning.
- Around line 860-871: The lambdas in the loop capture loop variables and
trigger ruff B023; replace the lambda list with direct sequential calls to the
check functions instead: call _get_comm_method_skip_reason(comm_method,
model_config), then should_skip_TRTLLM(backend_type, quant_algo, model_config,
comm_method=comm_method), then should_skip_CUTEDSL(backend_type, quant_algo,
model_config, comm_method), then should_skip_DEEPGEMM(backend_type,
comm_method), and finally should_skip_multi_gpu(parallel_mode, model_config,
world_size=4), setting skip_reason to the first non-falsy result (if not already
set) after each call; this removes the closures while preserving behavior.
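
A standalone illustration of the late-binding pitfall behind that suggestion (not code from this PR):

```python
# Lambdas created in a loop close over the loop variable itself, not its value at
# creation time, so every deferred call sees the final value (the ruff B023 hazard).
checks = [lambda: i for i in range(3)]
print([c() for c in checks])  # [2, 2, 2], not [0, 1, 2]

# Evaluating the checks directly inside the loop, as suggested, avoids deferral.
results = []
for i in range(3):
    results.append(i)  # computed immediately with the current i
print(results)  # [0, 1, 2]
```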
🧹 Nitpick comments (6)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py (1)

116-117: Add an inline attribute docstring for the new class constant.

This constant should be documented using an inline attribute docstring rather than a preceding comment.

🔧 Suggested update
-    # Quantization algorithms that support swiglu_gptoss_style
-    _GPTOSS_SUPPORTED_ALGOS = {QuantAlgo.W4A8_MXFP4_MXFP8}
+    _GPTOSS_SUPPORTED_ALGOS = {QuantAlgo.W4A8_MXFP4_MXFP8}
+    """set[QuantAlgo]: Quantization algorithms that support swiglu_gptoss_style."""

As per coding guidelines, document attributes and variables inline with """<type>: Description""" syntax.

tensorrt_llm/_torch/modules/fused_moe/communication/deep_ep_low_latency.py (1)

40-45: Use inline attribute docstrings for the new kernel-size constants.

Prefer inline attribute docstrings over block comments for class attributes.

🔧 Suggested update
-    # DeepEP low-latency kernel supported hidden sizes (from SWITCH_HIDDEN in launch.cuh)
-    SUPPORTED_HIDDEN_SIZES = {2048, 2560, 3584, 4096, 5120, 6144, 7168}
+    SUPPORTED_HIDDEN_SIZES = {2048, 2560, 3584, 4096, 5120, 6144, 7168}
+    """set[int]: Hidden sizes supported by the low-latency DeepEP kernel."""
@@
-    # Extension kernel supported hidden sizes (from SWITCH_HIDDEN_FOR_EXTENSION_KERNELS
-    # in extension_kernels.cu), used for nvfp4 post-quant dispatch and low-precision combine
-    SUPPORTED_HIDDEN_SIZES_EXTENSION = {4096, 6144, 7168}
+    SUPPORTED_HIDDEN_SIZES_EXTENSION = {4096, 6144, 7168}
+    """set[int]: Hidden sizes supported by extension kernels (nvfp4 post-quant/low-precision)."""

As per coding guidelines, document attributes and variables inline with """<type>: Description""" syntax.

tests/unittest/_torch/modules/moe/test_moe_backend.py (1)

31-44: Prefer module-level imports to preserve namespace.

The direct symbol imports from moe_test_utils break the repo’s namespace-preserving import rule. Consider importing the module and qualifying references.

🔧 Suggested update
-from _torch.modules.moe.moe_test_utils import (
-    MoeBackendType,
-    MoeModelConfig,
-    create_test_param,
-    get_backend_class,
-    iter_base_test_configs,
-    module_timer,  # noqa: F401 - imported for pytest fixture registration
-    replay_tactics_and_check,
-    supports_autotuner_capture,
-)
+from _torch.modules.moe import moe_test_utils

Then reference symbols as moe_test_utils.MoeBackendType, moe_test_utils.iter_base_test_configs, etc.

As per coding guidelines, always maintain the namespace when importing. Use from package.subpackage import foo instead of from package.subpackage.foo import SomeClass or import package.

tests/unittest/_torch/modules/moe/test_moe_module.py (3)

40-90: Prefer module-level imports to preserve namespace.

Several new imports pull symbols directly from modules. Please switch to module-level imports and qualify usage to align with repo conventions.

🔧 Suggested update
-from _torch.modules.moe.moe_test_utils import (
-    MoeBackendType,
-    MoeModelConfig,
-    create_test_param,
-    get_quick_skip_reason,
-    iter_base_test_configs,
-    module_timer,  # noqa: F401 - imported for pytest fixture registration
-    replay_tactics_and_check,
-    should_skip_CUTEDSL,
-    should_skip_DEEPGEMM,
-    should_skip_multi_gpu,
-    should_skip_TRTLLM,
-    supports_autotuner_capture,
-)
+from _torch.modules.moe import moe_test_utils

Then reference symbols as moe_test_utils.MoeBackendType, moe_test_utils.iter_base_test_configs, etc. Apply the same pattern to other direct symbol imports in this block.

As per coding guidelines, always maintain the namespace when importing. Use from package.subpackage import foo instead of from package.subpackage.foo import SomeClass or import package.


96-96: Prefix the module-level logger with G_.

Module-level globals should use the G_ prefix and upper snake_case.

🔧 Suggested update
-logger = logging.getLogger(__name__)
+G_LOGGER = logging.getLogger(__name__)

Remember to update usages (e.g., G_LOGGER.info(...)).

As per coding guidelines, use upper snake_case with prefix 'G' for global variables.


962-972: Rename test functions to snake_case.

These test function names are not snake_case; please rename them to follow the repo’s naming rules.

🔧 Suggested update
-def test_ConfigurableMoE_single_gpu(
+def test_configurable_moe_single_gpu(
@@
-def test_ConfigurableMoE_multi_gpu(
+def test_configurable_moe_multi_gpu(
@@
-def test_ConfigurableMoE_multi_gpu_eplb(
+def test_configurable_moe_multi_gpu_eplb(

As per coding guidelines, use snake_case for function and method names.

Also applies to: 1030-1042, 1278-1287

@xxi-nv xxi-nv force-pushed the testConfigurableMoE branch from 8c85f09 to 9aa00ae on February 11, 2026 05:39
@xxi-nv
Collaborator Author

xxi-nv commented Feb 11, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35590 [ run ] triggered by Bot. Commit: 9aa00ae

@xxi-nv xxi-nv force-pushed the testConfigurableMoE branch 2 times, most recently from d5c2720 to ed1a0ea on February 11, 2026 06:28
@xxi-nv
Collaborator Author

xxi-nv commented Feb 11, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35590 [ run ] completed with state SUCCESS. Commit: 9aa00ae
/LLM/main/L0_MergeRequest_PR pipeline #27491 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@tensorrt-cicd
Collaborator

PR_Github #35598 [ run ] triggered by Bot. Commit: ed1a0ea

@tensorrt-cicd
Collaborator

PR_Github #35598 [ run ] completed with state SUCCESS. Commit: ed1a0ea
/LLM/main/L0_MergeRequest_PR pipeline #27495 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

- Extract shared utilities into moe_test_utils.py (skip logic, model configs, etc.)
- Add comprehensive single/multi-GPU tests with configurable backends, quant algos,
  routing methods, and SwiGLU parameters
- Add EPLB and autotune tactic capture/replay tests
- Support CI/local config split via TRTLLM_TEST_MOE_CI env var
- Skip TRTLLM+DeepEP combinations that crash with CUDA errors in multi-GPU mode
- Relax accuracy threshold for MXFP4+MXFP8 with large hidden_size

Signed-off-by: xxi <xxi@nvidia.com>
@xxi-nv xxi-nv force-pushed the testConfigurableMoE branch from ed1a0ea to a37b6c9 on February 11, 2026 13:00
@xxi-nv
Collaborator Author

xxi-nv commented Feb 11, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35633 [ run ] triggered by Bot. Commit: a37b6c9

@tensorrt-cicd
Collaborator

PR_Github #35633 [ run ] completed with state DISABLED
CI server is currently disabled for unplanned maintenance. Estimated completion time: 8 AM PST on 2/11.

@xxi-nv
Collaborator Author

xxi-nv commented Feb 11, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35681 [ run ] triggered by Bot. Commit: a37b6c9

@tensorrt-cicd
Collaborator

PR_Github #35681 [ run ] completed with state SUCCESS. Commit: a37b6c9
/LLM/main/L0_MergeRequest_PR pipeline #27559 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@xxi-nv
Collaborator Author

xxi-nv commented Feb 12, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35763 [ run ] triggered by Bot. Commit: a37b6c9

@tensorrt-cicd
Collaborator

PR_Github #35763 [ run ] completed with state SUCCESS. Commit: a37b6c9
/LLM/main/L0_MergeRequest_PR pipeline #27621 completed with status: 'SUCCESS'

@xxi-nv xxi-nv merged commit 2565f0f into NVIDIA:main Feb 13, 2026
5 checks passed
ekou24 pushed a commit to ekou24/TensorRT-LLM that referenced this pull request Feb 16, 2026
…MoE test framework (NVIDIA#11437)

Signed-off-by: xxi <xxi@nvidia.com>