[TRTLLM-9111][feat] provide the uniform test framework to test all MoE backends#11128
xxi-nv merged 2 commits into NVIDIA:main from
Conversation
|
Will update the CI test DB in another PR.
📝 Walkthrough
A capability-checking framework is introduced across MoE backend implementations through an abstract interface method that each backend implements.
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_cute_dsl.py (1)
1-1: ⚠️ Potential issue | 🟠 Major — Add NVIDIA copyright header.
Please add the required NVIDIA copyright header (latest modification year) at the top of this source file.
As per coding guidelines, `**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}`: All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification.

tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py (1)
1-1: ⚠️ Potential issue | 🟠 Major — Add NVIDIA copyright header.
Please add the required NVIDIA copyright header (latest modification year) at the top of this source file.
As per coding guidelines, `**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}`: All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification.

tests/unittest/_torch/modules/moe/quantize_utils.py (1)
204-244: ⚠️ Potential issue | 🟡 Minor — Ensure the custom SwiGLU path activates when any swiglu param is set.
If only `swiglu_beta`/`swiglu_limit` are provided (without `swiglu_alpha`), the custom activation is currently skipped.
🐛 Suggested fix

```diff
-        self.experts = nn.ModuleList(
+        use_custom_swiglu = any(
+            v is not None for v in (swiglu_alpha, swiglu_beta, swiglu_limit)
+        )
+        self.experts = nn.ModuleList(
@@
-            activation=custom_swiglu if swiglu_alpha is not None else F.silu,
+            activation=custom_swiglu if use_custom_swiglu else F.silu,
```
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py`:
- Around line 98-124: The can_implement method in class ConfigurableMoE declares
parameters quant_algo, dtype_activation and gptoss_style but doesn't use them,
triggering ARG003; fix it by explicitly marking them unused inside
ConfigurableMoE.can_implement (e.g., assign them to dummy vars or prefix with
underscores) so linters accept it — update the method body of
ConfigurableMoE.can_implement to reference quant_algo, dtype_activation, and
gptoss_style in a no-op way (e.g., `_ = quant_algo; _ = dtype_activation; _ =
gptoss_style`) or rename the params to
_quant_algo/_dtype_activation/_gptoss_style to silence the lint while keeping
the return behavior unchanged.
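Assuming `can_implement` follows the `(bool, Optional[str])` return contract the other backend comments describe, the underscore-rename variant of this fix might look like the following sketch; the real `ConfigurableMoE` body may differ.

```python
# Hypothetical sketch of the ARG003 fix via underscore-prefixed parameter
# names; the real ConfigurableMoE.can_implement body may differ.
from typing import Optional, Tuple


class ConfigurableMoE:
    @classmethod
    def can_implement(
        cls,
        _quant_algo: Optional[str],        # unused here; kept for interface parity
        _dtype_activation: Optional[str],  # unused here; kept for interface parity
        _gptoss_style: bool = False,       # unused here; kept for interface parity
    ) -> Tuple[bool, Optional[str]]:
        # The configurable wrapper accepts everything at this level and
        # defers real capability checks to the selected backend.
        return True, None
```

Ruff's ARG003 rule ignores arguments whose names start with an underscore, so no `noqa` comment or no-op reference is needed.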
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_deepgemm.py`:
- Line 1: Add the required NVIDIA copyright header to the top of the module file
fused_moe_deepgemm.py: insert the standard NVIDIA copyright block (matching
other TensorRT-LLM source files) including the latest modification year and
license text as the very first lines before any imports or code so the file
complies with the repository rule for **/*.{py} sources.
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py`:
- Around line 1266-1331: Add the required NVIDIA copyright header at the top of
the file, move the local import "from tensorrt_llm.models.modeling_utils import
QuantAlgo" out of can_implement into module scope using namespace-preserving
import (e.g. "from tensorrt_llm.models import modeling_utils"), then update the
can_implement signature/type hints and all comparisons to reference
modeling_utils.QuantAlgo (change Optional["QuantAlgo"] to
Optional[modeling_utils.QuantAlgo] and replace QuantAlgo.* checks with
modeling_utils.QuantAlgo.*).
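As a generic illustration of the namespace-preserving import pattern this comment asks for — with a stand-in namespace, since the real `modeling_utils` module is not reproduced here — the qualified-comparison style looks like this:

```python
# Stand-in for `from tensorrt_llm.models import modeling_utils`; in the real
# file the import would pull in the actual module rather than this shim.
import enum
import types

modeling_utils = types.SimpleNamespace(
    QuantAlgo=enum.Enum("QuantAlgo", ["FP8", "NVFP4"])
)


def can_implement(quant_algo):
    # All comparisons stay qualified through the module namespace, matching
    # the `modeling_utils.QuantAlgo.*` style the review comment requests.
    if quant_algo in (modeling_utils.QuantAlgo.FP8, modeling_utils.QuantAlgo.NVFP4):
        return True, None
    return False, f"unsupported quant_algo: {quant_algo}"
```

Keeping the qualification makes every use site self-documenting about where `QuantAlgo` comes from, at the cost of slightly longer expressions.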
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py`:
- Line 1: Add the NVIDIA copyright header (with the latest modification year) to
the top of the module file fused_moe_trtllm_gen.py before any imports (e.g.,
before the existing "import inspect" line); ensure the header matches the
project's required header style used for TensorRT-LLM Python files and includes
the correct year, copyright owner (NVIDIA CORPORATION) and any required
license/boilerplate text.
🧹 Nitpick comments (11)
tensorrt_llm/_torch/modules/fused_moe/interface.py (1)
10-12: Prefer module-level imports for the new logger/QuantAlgo additions.
Keeps module namespaces intact and aligns with the repo import guideline.
♻️ Suggested adjustment

```diff
-from tensorrt_llm.logger import logger
-from tensorrt_llm.models.modeling_utils import QuantAlgo
+import tensorrt_llm.logger as trtllm_logger
+import tensorrt_llm.models.modeling_utils as modeling_utils
@@
-    logger.warning(reason)
+    trtllm_logger.logger.warning(reason)
@@
-    quant_algo: Optional[QuantAlgo],
+    quant_algo: Optional[modeling_utils.QuantAlgo],
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tensorrt_llm/_torch/modules/fused_moe/fused_moe_deepgemm.py (1)
7-10: Prefer module-qualified imports for new utility references.
The new imports should keep their module namespaces (including `_warn_and_return`) to align with project style. Consider module-qualified access instead of `from ... import ...`.
♻️ Suggested refactor (namespace imports)

```diff
-from tensorrt_llm._utils import get_sm_version, nvtx_range
-from tensorrt_llm.models.modeling_utils import QuantAlgo
+import tensorrt_llm._utils as trt_utils
+import tensorrt_llm.models.modeling_utils as modeling_utils
+from . import interface as moe_interface
@@
-@nvtx_range("[DG] preprocess_after_permute")
+@trt_utils.nvtx_range("[DG] preprocess_after_permute")
@@
-@nvtx_range("[DG]")
+@trt_utils.nvtx_range("[DG]")
@@
-@nvtx_range("[DG] forward")
+@trt_utils.nvtx_range("[DG] forward")
@@
-    quant_algo: Optional[QuantAlgo],
+    quant_algo: Optional[modeling_utils.QuantAlgo],
@@
-        from .interface import _warn_and_return
-
-        sm_version = get_sm_version()
+        sm_version = trt_utils.get_sm_version()
@@
-        return _warn_and_return(
+        return moe_interface._warn_and_return(
@@
-        if quant_algo == QuantAlgo.FP8_BLOCK_SCALES:
+        if quant_algo == modeling_utils.QuantAlgo.FP8_BLOCK_SCALES:
             return True, None
-        return _warn_and_return(
+        return moe_interface._warn_and_return(
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py (1)
4-13: Prefer module-qualified imports for new utility references.
New imports should retain module namespaces per project style. Consider module-qualified access for `_utils` and `modeling_utils`.
♻️ Suggested refactor (namespace imports)

```diff
-from tensorrt_llm._utils import get_sm_version
-from tensorrt_llm.models.modeling_utils import QuantAlgo
+import tensorrt_llm._utils as trt_utils
+import tensorrt_llm.models.modeling_utils as modeling_utils
@@
-    _SUPPORTED_QUANT_ALGOS = {
-        QuantAlgo.NVFP4,
+    _SUPPORTED_QUANT_ALGOS = {
+        modeling_utils.QuantAlgo.NVFP4,
@@
-        sm_version = get_sm_version()
+        sm_version = trt_utils.get_sm_version()
@@
-    quant_algo: Optional[QuantAlgo],
+    quant_algo: Optional[modeling_utils.QuantAlgo],
@@
-        sm_version = get_sm_version()
+        sm_version = trt_utils.get_sm_version()
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tensorrt_llm/_torch/modules/fused_moe/fused_moe_cute_dsl.py (1)
7-9: Prefer module-qualified imports for new utility references.
To align with namespace-import guidance, consider module-qualified access for `_utils` and `modeling_utils`.
♻️ Suggested refactor (namespace imports)

```diff
-from tensorrt_llm._utils import get_sm_version, is_sm_100f
-from tensorrt_llm.models.modeling_utils import QuantAlgo
+import tensorrt_llm._utils as trt_utils
+import tensorrt_llm.models.modeling_utils as modeling_utils
@@
-    if is_sm_100f():
+    if trt_utils.is_sm_100f():
@@
-    quant_algo: Optional[QuantAlgo],
+    quant_algo: Optional[modeling_utils.QuantAlgo],
@@
-        sm_version = get_sm_version()
+        sm_version = trt_utils.get_sm_version()
@@
-        if quant_algo == QuantAlgo.NVFP4:
+        if quant_algo == modeling_utils.QuantAlgo.NVFP4:
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py (1)
10-12: Prefer module-qualified imports for new utility references.
To follow the namespace-import guideline, consider module-qualified access for `_utils` and `modeling_utils`, and update the table references accordingly.
♻️ Suggested refactor (namespace imports)

```diff
-from tensorrt_llm._utils import get_sm_version
-from tensorrt_llm.models.modeling_utils import QuantAlgo
+import tensorrt_llm._utils as trt_utils
+import tensorrt_llm.models.modeling_utils as modeling_utils
@@
-    QuantAlgo.FP8: {
+    modeling_utils.QuantAlgo.FP8: {
@@
-        sm_version = get_sm_version()
+        sm_version = trt_utils.get_sm_version()
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tests/unittest/_torch/modules/moe/quantize_utils.py (1)
21-27: Prefer module-qualified helper imports.
For consistency with project style, import helper modules and qualify usages (apply the same change across all call sites).
♻️ Suggested refactor (namespace imports)

```diff
-from _torch.helpers import (
-    calc_woq_tolerence,
-    per_block_cast_to_fp8,
-    per_block_cast_to_fp8_e8m0,
-    per_token_cast_to_fp8_e8m0,
-)
-from utils.util import check_accuracy
+import _torch.helpers as torch_helpers
+import utils.util as util
@@
-    check_accuracy(output, ref_output, rtol=2e-1, atol=2e-1, percent=0.96)
+    util.check_accuracy(output, ref_output, rtol=2e-1, atol=2e-1, percent=0.96)
@@
-    quant_fn = per_block_cast_to_fp8_e8m0 if use_e8m0_scale else per_block_cast_to_fp8
+    quant_fn = (torch_helpers.per_block_cast_to_fp8_e8m0
+                if use_e8m0_scale else torch_helpers.per_block_cast_to_fp8)
@@
-    act_fp8, act_sf = per_token_cast_to_fp8_e8m0(permuted_data)
+    act_fp8, act_sf = torch_helpers.per_token_cast_to_fp8_e8m0(permuted_data)
@@
-    atol = calc_woq_tolerence(ref_output, weight_dtype)
+    atol = torch_helpers.calc_woq_tolerence(ref_output, weight_dtype)
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tests/unittest/_torch/modules/moe/test_moe_backend.py (5)
242-246: Unused `quant_algo` parameter.
The `quant_algo` parameter is declared but never used in this function. If it's included for API consistency with other `should_skip_*` functions or reserved for future use, consider prefixing it with an underscore to signal intent:

```diff
 def should_skip_gptoss(
     backend_type: MoeBackendType,
-    quant_algo: Optional[QuantAlgo],
+    _quant_algo: Optional[QuantAlgo],
     gptoss_style: bool,
 ) -> Optional[str]:
```
275-278: Unused `quant_algo` parameter.
Similar to `should_skip_gptoss`, the `quant_algo` parameter is included but not used. Consider prefixing it with an underscore, or removing it if not needed for API consistency:

```diff
 def supports_autotuner_capture(
     backend_type: MoeBackendType,
-    quant_algo: Optional[QuantAlgo],
+    _quant_algo: Optional[QuantAlgo],
 ) -> bool:
```
902-902: Consider using `tempfile` for the cache path.
The hardcoded `/tmp/moe_autotuner_cache.json` path could cause issues in multi-user environments or parallel test runs. Consider using `tempfile` for safer temporary file handling:
🛡️ Proposed fix

```diff
+import tempfile
+import os
 ...
-    with torch.inference_mode(), autotune(cache_path="/tmp/moe_autotuner_cache.json"):
+    # Use a unique temp file to avoid conflicts in parallel test runs
+    with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as f:
+        cache_path = f.name
+    try:
+        with torch.inference_mode(), autotune(cache_path=cache_path):
+            _ = run_moe()
+    finally:
+        if os.path.exists(cache_path):
+            os.unlink(cache_path)
```

Alternatively, if the cache file is intentionally shared across test runs for performance, document this in a comment.
894-898: AutoTuner state modifications may leak to other tests.
The test modifies `AutoTuner` singleton state (`warmup`, `repeat`, `stream_delay_micro_secs`) without restoring the original values. If tests run in the same process, this could affect subsequent tests. Consider saving and restoring the original values:
🛠️ Proposed fix

```diff
     # Configure AutoTuner for faster profiling (reduce warmup/repeat for unit tests)
     autotuner = AutoTuner.get()
+    original_warmup = autotuner.warmup
+    original_repeat = autotuner.repeat
+    original_stream_delay = autotuner.stream_delay_micro_secs
     autotuner.warmup = 0  # default: 2
     autotuner.repeat = 1  # default: 10
     autotuner.stream_delay_micro_secs = 10  # default: 1000
+
+    try:
+        # ... rest of the test ...
+    finally:
+        # Restore original AutoTuner state
+        autotuner.warmup = original_warmup
+        autotuner.repeat = original_repeat
+        autotuner.stream_delay_micro_secs = original_stream_delay
```

Alternatively, if the test is always skipped (as indicated by the skip marker), this may be a non-issue, but it's good practice for when the skip is removed.
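The save/restore idea above generalizes: a small context manager (a generic sketch, not TRT-LLM's actual API) can patch singleton attributes and guarantee restoration even when the test body raises.

```python
# Generic sketch of the save/restore pattern for singleton state; FakeTuner
# is a hypothetical stand-in for an object like AutoTuner.get().
from contextlib import contextmanager


@contextmanager
def patched_attrs(obj, **overrides):
    # Snapshot current values, apply overrides, and always restore.
    saved = {name: getattr(obj, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(obj, name, value)
        yield obj
    finally:
        for name, value in saved.items():
            setattr(obj, name, value)


class FakeTuner:  # stand-in for the tuner singleton
    warmup = 2
    repeat = 10


with patched_attrs(FakeTuner, warmup=0, repeat=1):
    assert FakeTuner.warmup == 0 and FakeTuner.repeat == 1
assert FakeTuner.warmup == 2 and FakeTuner.repeat == 10  # restored
```

The `finally` clause makes restoration unconditional, which is exactly what the explicit try/finally in the proposed fix achieves.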
790-792: Direct assignment to `mapping.rank` after construction.
Assigning `mapping.rank = mpi_rank()` directly after creating a `Mapping()` object works but is unusual. Consider passing the rank during construction if the `Mapping` class supports it:

```python
mapping = Mapping(rank=mpi_rank())
```

If the class doesn't support this, the current approach is fine.
/bot run --disable-fail-fast
PR_Github #34161 [ run ] triggered by Bot. Commit:
PR_Github #34161 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #34374 [ run ] triggered by Bot. Commit:
…E backends Signed-off-by: xxi <xxi@nvidia.com>
/bot run --disable-fail-fast
PR_Github #34377 [ run ] triggered by Bot. Commit:
PR_Github #34377 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #34464 [ run ] triggered by Bot. Commit:
PR_Github #34464 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #34505 [ run ] triggered by Bot. Commit:
PR_Github #34505 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #34575 [ run ] triggered by Bot. Commit:
PR_Github #34575 [ run ] completed with state
…E backends (NVIDIA#11128) Signed-off-by: xxi <xxi@nvidia.com>
Description
Summary
This PR introduces a unified test framework for MoE (Mixture of Experts) backends that enables systematic testing of all backend implementations through their backend-level interfaces (`quantize_input` + `run_moe`), rather than the high-level `forward()` interface, which will be deprecated in the future.
Key Changes
Added a standardized can_implement() classmethod to all MoE backend classes:
Each implementation checks:
Implemented a new test module that provides:
Pre-computed test parameters at module load time for fast test collection
Test coverage for:
Extended quantization utilities to support the test framework with additional helper classes and methods.
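The `can_implement()` contract described above can be sketched as follows; the backend name, algo strings, and SM threshold here are illustrative assumptions, not the PR's exact code.

```python
# Illustrative backend using the (supported, reason) capability contract;
# DemoMoEBackend and its support table are hypothetical.
from typing import Optional, Tuple


class DemoMoEBackend:
    _SUPPORTED_QUANT_ALGOS = {"FP8", "NVFP4"}  # hypothetical support table

    @classmethod
    def can_implement(
        cls,
        quant_algo: Optional[str],
        sm_version: int,
    ) -> Tuple[bool, Optional[str]]:
        # Return (True, None) when the backend can run this configuration,
        # otherwise (False, reason) so the test framework can skip the case
        # with a human-readable message instead of failing.
        if sm_version < 90:
            return False, f"requires SM90+, got SM{sm_version}"
        if quant_algo is not None and quant_algo not in cls._SUPPORTED_QUANT_ALGOS:
            return False, f"unsupported quant_algo: {quant_algo}"
        return True, None
```

A test harness can then call `can_implement()` for every (backend, quant config) pair at collection time and turn a `False` result into a pytest skip carrying the returned reason.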
Design Goals
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...` — Provides a user-friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
Details
`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill` — Kill all running builds associated with the pull request.

skip

`skip --comment COMMENT` — Skip testing for the latest commit on the pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline` — Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
New Features
Tests