
[#10580][fix] re-enable NemotronH MOE MMLU test #10594

Merged
suyoggupta merged 1 commit into NVIDIA:main from nv-auto-deploy:sg/fix-mmlu
Jan 12, 2026
Conversation

suyoggupta (Collaborator) commented Jan 12, 2026

fixes #10580

Summary by CodeRabbit

  • Tests
    • Updated integration test configurations for improved model evaluation accuracy and reliability.


Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
suyoggupta (Collaborator, Author) commented:
/bot run

coderabbitai bot (Contributor) commented Jan 12, 2026

📝 Walkthrough

This PR modifies a single test file: it lowers a memory configuration parameter from 0.5 to 0.4 in the BF16 test, and uncomments the sampling-parameters code in the FP8 test so that MMLU evaluation runs again.

Changes

Cohort: Test Configuration Updates
File(s): tests/integration/defs/accuracy/test_llm_api_autodeploy.py
Summary: Modified free_mem_ratio from 0.5 to 0.4 in test_bf16; reintroduced the commented-out sampling parameters in test_fp8 to enable MMLU evaluation alongside the FP8 model and GSM8K evaluations.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
  • Description check ⚠️ Warning — The PR description is incomplete, missing the Description and Test Coverage sections required by the repository template. Resolution: add a Description section explaining the issue from #10580 and why the test was disabled, and a Test Coverage section listing the modified test cases that validate this fix.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (1 passed)
  • Title check ✅ Passed — The title clearly and specifically identifies the fix as re-enabling the NemotronH MOE MMLU test, directly matching the main change in the test file.



📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e0dbba and 14d2fa2.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces. Do not use tabs
Always maintain the namespace when importing Python modules, even if only one class or function from a module is used
Python filenames should use snake_case (e.g., some_file.py)
Python classes should use PascalCase (e.g., class SomeClass)
Python functions and methods should use snake_case (e.g., def my_awesome_function():)
Python local variables should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL)
Python constants should use upper snake_case (e.g., MY_CONSTANT)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Use comments in Python for code within a function, or interfaces that are local to a file
Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with the format """<type>: Description"""
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of errors possible
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block for the main logic

Files:

  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
**/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification

Files:

  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 6915
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:4010-4012
Timestamp: 2025-08-14T23:23:27.449Z
Learning: For MOE (Mixture of Experts) code reviews in TensorRT-LLM, avoid repeatedly suggesting finalize fusion validation checks and safety assertions. The user djns99 has indicated these suggestions are repetitive and unwanted across multiple MOE-related changes.
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
🧬 Code graph analysis (1)
tests/integration/defs/accuracy/test_llm_api_autodeploy.py (1)
tests/unittest/_torch/modeling/test_modeling_out_of_tree.py (1)
  • sampling_params (58-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/accuracy/test_llm_api_autodeploy.py (2)

208-210: Reasonable memory tuning for test stability.

The reduction of free_mem_ratio from 0.5 to 0.4, combined with the existing max_batch_size reduction to 32, addresses the OOM issues mentioned in the TODO comment about multi-stream MOE memory usage. This is a pragmatic fix to re-enable the test.
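To make the tuning concrete, here is a small illustrative calculation. This is not TensorRT-LLM code: the helper name, the 40 GiB free-memory figure, and the reading of free_mem_ratio as "fraction of currently free GPU memory handed to the KV cache" are assumptions for the example, chosen to show why a lower ratio leaves more headroom for transient allocations such as multi-stream MOE workspaces.

```python
def kv_cache_budget(free_mem_bytes: int, free_mem_ratio: float) -> int:
    """Illustrative helper (not TensorRT-LLM's implementation).

    Returns the slice of currently free device memory that would be
    reserved for the KV cache under a given free_mem_ratio.
    """
    if not 0.0 <= free_mem_ratio <= 1.0:
        raise ValueError("free_mem_ratio must be in [0, 1]")
    return int(free_mem_bytes * free_mem_ratio)


free = 40 * 1024**3                          # assume 40 GiB free after weights load
old_budget = kv_cache_budget(free, 0.5)      # previous setting
new_budget = kv_cache_budget(free, 0.4)      # setting after this PR
headroom_gained = old_budget - new_budget    # slack returned to the allocator
print(headroom_gained // 1024**3)            # → 4 (GiB of extra headroom)
```

Under these assumed numbers, dropping the ratio by 0.1 frees about 4 GiB for everything outside the KV cache, which is the kind of margin a transient MOE workspace OOM needs.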


229-231: MMLU evaluation correctly re-enabled with consistent pattern.

The re-enabled code follows the same pattern used in test_bf16 and other test methods: MMLU evaluation with sampling_params, followed by GSM8K without. This properly addresses the PR objective of re-enabling the NemotronH MOE MMLU test.
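For reference, the pattern the review describes (MMLU evaluated with explicit sampling parameters, then GSM8K with defaults) would look roughly like the sketch below. This is a hypothetical reconstruction, not the file's actual contents: the task class names, the evaluate signature, and the variable names are assumptions based only on the pattern stated above.

```python
# Hypothetical sketch of the re-enabled pattern, not the actual
# contents of test_llm_api_autodeploy.py.
with llm:  # an already-constructed LLM instance
    task = MMLU(MODEL_NAME)
    task.evaluate(llm, sampling_params=sampling_params)  # MMLU with explicit sampling
    task = GSM8K(MODEL_NAME)
    task.evaluate(llm)                                   # GSM8K with default sampling
```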



tensorrt-cicd (Collaborator) commented:
PR_Github #31487 [ run ] triggered by Bot. Commit: 14d2fa2

tensorrt-cicd (Collaborator) commented:
PR_Github #31487 [ run ] completed with state SUCCESS. Commit: 14d2fa2
/LLM/main/L0_MergeRequest_PR pipeline #24342 completed with status: 'SUCCESS'

suyoggupta merged commit a138524 into NVIDIA:main on Jan 12, 2026
13 checks passed
Copilot AI (Contributor) left a comment


Pull request overview

This PR re-enables the previously disabled MMLU accuracy test for the Nemotron MOE FP8 model and adjusts memory configuration for the BF16 test to prevent out-of-memory issues.

Changes:

  • Reduced free_mem_ratio from 0.5 to 0.4 in the BF16 test to accommodate multi-stream MOE memory requirements
  • Re-enabled MMLU evaluation in the FP8 test by uncommenting the test code


videodanchik pushed a commit to videodanchik/TensorRT-LLM that referenced this pull request Jan 14, 2026
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>

Labels: None yet

Projects: None yet

Development

Successfully merging this pull request may close these issues:
  • [Bug]: AutoDeploy Fix the NemotronMOE MMLU OOM issue

3 participants