
[#10580][fix] re-enable NemotronH MOE MMLU test #10594

Merged
suyoggupta merged 1 commit into NVIDIA:main from nv-auto-deploy:sg/fix-mmlu
Jan 12, 2026
Conversation

suyoggupta (Collaborator) commented Jan 12, 2026

fixes #10580

Summary by CodeRabbit

  • Tests
    • Updated integration test configurations for improved model evaluation accuracy and reliability.


Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
suyoggupta (Collaborator, Author) commented:
/bot run

coderabbitai bot (Contributor) commented Jan 12, 2026

📝 Walkthrough

This PR modifies a single test file: it lowers a memory configuration parameter from 0.5 to 0.4 in the BF16 test, and uncomments the sampling-parameters code in the FP8 test so that MMLU evaluation runs again.

Changes

Cohort: Test Configuration Updates
File(s): tests/integration/defs/accuracy/test_llm_api_autodeploy.py
Summary: Modified free_mem_ratio from 0.5 to 0.4 in test_bf16; reintroduced the commented-out sampling parameters in test_fp8 to enable MMLU evaluation alongside the FP8 model and GSM8K evaluations.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
  • Description check ⚠️ Warning — The PR description is incomplete, missing the Description and Test Coverage sections required by the repository template. Resolution: add a Description section explaining the issue from #10580 and why the test was disabled, and a Test Coverage section listing the modified test cases that validate this fix.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (1 passed)
  • Title check ✅ Passed — The title clearly and specifically identifies the fix as re-enabling the NemotronH MOE MMLU test, directly matching the main change in the test file.



📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e0dbba and 14d2fa2.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing Python modules, even if only one class or function from a module is used
Python filenames should use snake_case (e.g., some_file.py)
Python classes should use PascalCase (e.g., class SomeClass)
Python functions and methods should use snake_case (e.g., def my_awesome_function():)
Python local variables should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL)
Python constants should use upper snake_case (e.g., MY_CONSTANT)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Use comments in Python for code within a function, or interfaces that are local to a file
Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with the format """<type>: Description"""
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block for the main logic

Files:

  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification

Files:

  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 6915
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:4010-4012
Timestamp: 2025-08-14T23:23:27.449Z
Learning: For MOE (Mixture of Experts) code reviews in TensorRT-LLM, avoid repeatedly suggesting finalize fusion validation checks and safety assertions. The user djns99 has indicated these suggestions are repetitive and unwanted across multiple MOE-related changes.
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
🧬 Code graph analysis (1)
tests/integration/defs/accuracy/test_llm_api_autodeploy.py (1)
tests/unittest/_torch/modeling/test_modeling_out_of_tree.py (1)
  • sampling_params (58-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/accuracy/test_llm_api_autodeploy.py (2)

208-210: Reasonable memory tuning for test stability.

The reduction of free_mem_ratio from 0.5 to 0.4, combined with the existing max_batch_size reduction to 32, addresses the OOM issues mentioned in the TODO comment about multi-stream MOE memory usage. This is a pragmatic fix to re-enable the test.
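To make the tuning concrete, here is a small illustrative calculation. This is not TensorRT-LLM code: the helper name, the 40 GiB free-memory figure, and the reading of free_mem_ratio as "fraction of currently free GPU memory handed to the KV cache" are assumptions for the example, chosen to show why a lower ratio leaves more headroom for transient allocations such as multi-stream MOE workspaces.

```python
def kv_cache_budget(free_mem_bytes: int, free_mem_ratio: float) -> int:
    """Illustrative helper (not TensorRT-LLM's implementation).

    Returns the slice of currently free device memory that would be
    reserved for the KV cache under a given free_mem_ratio.
    """
    if not 0.0 <= free_mem_ratio <= 1.0:
        raise ValueError("free_mem_ratio must be in [0, 1]")
    return int(free_mem_bytes * free_mem_ratio)


free = 40 * 1024**3                          # assume 40 GiB free after weights load
old_budget = kv_cache_budget(free, 0.5)      # previous setting
new_budget = kv_cache_budget(free, 0.4)      # setting after this PR
headroom_gained = old_budget - new_budget    # slack returned to the allocator
print(headroom_gained // 1024**3)            # → 4 (GiB of extra headroom)
```

Under these assumed numbers, dropping the ratio by 0.1 frees about 4 GiB for everything outside the KV cache, which is the kind of margin a transient MOE workspace OOM needs.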


229-231: MMLU evaluation correctly re-enabled with consistent pattern.

The re-enabled code follows the same pattern used in test_bf16 and other test methods: MMLU evaluation with sampling_params, followed by GSM8K without. This properly addresses the PR objective of re-enabling the NemotronH MOE MMLU test.
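For reference, the pattern the review describes (MMLU evaluated with explicit sampling parameters, then GSM8K with defaults) would look roughly like the sketch below. This is a hypothetical reconstruction, not the file's actual contents: the task class names, the evaluate signature, and the variable names are assumptions based only on the pattern stated above.

```python
# Hypothetical sketch of the re-enabled pattern, not the actual
# contents of test_llm_api_autodeploy.py.
with llm:  # an already-constructed LLM instance
    task = MMLU(MODEL_NAME)
    task.evaluate(llm, sampling_params=sampling_params)  # MMLU with explicit sampling
    task = GSM8K(MODEL_NAME)
    task.evaluate(llm)                                   # GSM8K with default sampling
```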



tensorrt-cicd (Collaborator) commented:
PR_Github #31487 [ run ] triggered by Bot. Commit: 14d2fa2

tensorrt-cicd (Collaborator) commented:
PR_Github #31487 [ run ] completed with state SUCCESS. Commit: 14d2fa2
/LLM/main/L0_MergeRequest_PR pipeline #24342 completed with status: 'SUCCESS'

suyoggupta merged commit a138524 into NVIDIA:main on Jan 12, 2026
13 checks passed
Copilot AI (Contributor) left a comment


Pull request overview

This PR re-enables the previously disabled MMLU accuracy test for the Nemotron MOE FP8 model and adjusts memory configuration for the BF16 test to prevent out-of-memory issues.

Changes:

  • Reduced free_mem_ratio from 0.5 to 0.4 in the BF16 test to accommodate multi-stream MOE memory requirements
  • Re-enabled MMLU evaluation in the FP8 test by uncommenting the test code


videodanchik pushed a commit to videodanchik/TensorRT-LLM that referenced this pull request Jan 14, 2026
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>

Labels: None yet

Projects: None yet

Development

Successfully merging this pull request may close these issues:
  • [Bug]: AutoDeploy Fix the NemotronMOE MMLU OOM issue

3 participants