[None][fix] Nemotron Super fix #11425
Conversation
/bot run
📝 Walkthrough
Modified KV cache configuration logic in

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/pyexecutor/_util.py`:
- Around line 121-127: The draft KV cache size calculation overcounts when MTP
falls back to the target model config: in
_should_create_separate_draft_kv_cache() you call _get_effective_draft_config()
and then kv_cache_manager_cls.get_cache_size_per_token(...) which uses the full
target layer count; change this to compute and pass the actual draft layer count
used by _create_one_model_draft_kv_cache_manager (call get_num_spec_layers or
derive num_draft_layers from layer_mask) so get_cache_size_per_token receives a
draft-specific config or an explicit num_draft_layers parameter (i.e., create a
short-lived draft_config with the correct layer count or add num_draft_layers
arg) to ensure the estimated kv_size_per_token matches the allocated draft KV
cache.
elif self._should_create_separate_draft_kv_cache():
    # One-model draft with separate KV cache layout
    effective_draft_config = self._get_effective_draft_config()
    kv_size_per_token += self._kv_cache_manager_cls.get_cache_size_per_token(
-       self._draft_config,
+       effective_draft_config,
        mapping,
        tokens_per_block=self._tokens_per_block)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find the relevant Python files
echo "=== Finding tensorrt_llm module files ==="
# Note: fd takes a regex; inside double quotes "\|" reaches fd as an escaped
# (literal) pipe, so use a bare "|" for alternation.
fd -t f -e py "kv.*manager|executor.*util" | head -20
Repository: NVIDIA/TensorRT-LLM
Length of output: 104
🏁 Script executed:
#!/bin/bash
# Look at the specific file mentioned in the review
echo "=== Checking _util.py file ==="
cat -n tensorrt_llm/_torch/pyexecutor/_util.py | head -150
Repository: NVIDIA/TensorRT-LLM
Length of output: 8269
🏁 Script executed:
#!/bin/bash
# Search for _get_effective_draft_config method
echo "=== Search for _get_effective_draft_config ==="
rg -n "_get_effective_draft_config" -C 5 --type=py
Repository: NVIDIA/TensorRT-LLM
Length of output: 3579
🏁 Script executed:
#!/bin/bash
# Search for _create_one_model_draft_kv_cache_manager
echo "=== Search for _create_one_model_draft_kv_cache_manager ==="
rg -n "_create_one_model_draft_kv_cache_manager" -C 10 --type=py
Repository: NVIDIA/TensorRT-LLM
Length of output: 4131
🏁 Script executed:
#!/bin/bash
# Search for get_cache_size_per_token implementations
echo "=== Search for get_cache_size_per_token implementations ==="
rg -n "def get_cache_size_per_token" -C 15 --type=py
Repository: NVIDIA/TensorRT-LLM
Length of output: 12828
🏁 Script executed:
#!/bin/bash
# Read the full implementation of KVCacheManager.get_cache_size_per_token
echo "=== KVCacheManager.get_cache_size_per_token full implementation ==="
sed -n '802,900p' tensorrt_llm/_torch/pyexecutor/resource_manager.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 4914
🏁 Script executed:
#!/bin/bash
# Read the full implementation of MambaHybridCacheManager.get_cache_size_per_token
echo "=== MambaHybridCacheManager.get_cache_size_per_token ==="
rg -n "class MambaHybridCacheManager" --type=py -A 200 | head -250
Repository: NVIDIA/TensorRT-LLM
Length of output: 10094
🏁 Script executed:
#!/bin/bash
# Read the full implementation of _create_one_model_draft_kv_cache_manager
echo "=== _create_one_model_draft_kv_cache_manager full implementation ==="
sed -n '557,620p' tensorrt_llm/_torch/pyexecutor/_util.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 3353
🏁 Script executed:
#!/bin/bash
# Check what get_num_attention_layers returns from the model config
echo "=== Searching for get_num_attention_layers method ==="
rg -n "def get_num_attention_layers" --type=py -A 3
# Also check ModelConfig to understand its structure
echo "=== Searching for ModelConfig class definition ==="
rg -n "class ModelConfig" --type=py -A 10 | head -50
Repository: NVIDIA/TensorRT-LLM
Length of output: 4173
🏁 Script executed:
#!/bin/bash
# Search for _create_kv_cache_manager helper function to see if it uses num_layers
echo "=== Searching for _create_kv_cache_manager function ==="
rg -n "def _create_kv_cache_manager" --type=py -C 5
Repository: NVIDIA/TensorRT-LLM
Length of output: 3294
🏁 Script executed:
#!/bin/bash
# Look at the _create_kv_cache_manager function fully to see how it handles num_layers
echo "=== Full _create_kv_cache_manager function (module-level) ==="
sed -n '646,800p' tensorrt_llm/_torch/pyexecutor/_util.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 6966
🏁 Script executed:
#!/bin/bash
# Check what the complete get_num_attention_layers method returns
echo "=== Complete get_num_attention_layers ==="
sed -n '770,780p' tensorrt_llm/_torch/model_config.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 803
🏁 Script executed:
#!/bin/bash
# Check if get_cache_size_per_token in MambaHybridCacheManager is overridden
echo "=== Search for get_cache_size_per_token in MambaHybridCacheManager ==="
rg -n "class MambaHybridCacheManager" --type=py -A 300 | grep -A 30 "def get_cache_size_per_token"
Repository: NVIDIA/TensorRT-LLM
Length of output: 133
🏁 Script executed:
#!/bin/bash
# Look at how get_num_spec_layers works
echo "=== Search for get_num_spec_layers ==="
rg -n "def get_num_spec_layers" --type=py -A 10
Repository: NVIDIA/TensorRT-LLM
Length of output: 1004
🏁 Script executed:
#!/bin/bash
# Check if get_cache_size_per_token for MambaHybridCacheManager uses layer_mask
echo "=== Check KVCacheManager.get_cache_size_per_token signature ==="
sed -n '802,850p' tensorrt_llm/_torch/pyexecutor/resource_manager.py | head -60
Repository: NVIDIA/TensorRT-LLM
Length of output: 2343
🏁 Script executed:
#!/bin/bash
# Now let's trace through the exact values for nemotron_hybrid
# First check: does get_cache_size_per_token accept layer_mask parameter?
echo "=== Checking if get_cache_size_per_token accepts layer_mask ==="
rg -n "def get_cache_size_per_token.*layer_mask" --type=py
# And check KVCacheManagerV2.get_cache_size_per_token
echo "=== Checking KVCacheManagerV2.get_cache_size_per_token ==="
rg -n "class KVCacheManagerV2" --type=py -A 50 | grep -A 20 "def get_cache_size_per_token"
Repository: NVIDIA/TensorRT-LLM
Length of output: 185
🏁 Script executed:
#!/bin/bash
# Let me understand the actual issue better by checking the calculation in detail
# For nemotron_hybrid, when layer_mask is None (in size estimation), what does it count?
echo "=== Look at the full get_cache_size_per_token code path for nemotron_hybrid ==="
sed -n '802,855p' tensorrt_llm/_torch/pyexecutor/resource_manager.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 2625
🏁 Script executed:
#!/bin/bash
# Now let's verify the actual numbers by checking the code flow one more time
# The key question: Does get_cache_size_per_token for nemotron_hybrid use get_num_attention_layers()?
echo "=== Verify the calculation path ==="
cat tensorrt_llm/_torch/model_config.py | sed -n '770,782p'
# For nemotron_hybrid, get_num_attention_layers returns pattern.count("*")
# This is used in get_cache_size_per_token at line 832:
echo ""
echo "=== Line 832 in resource_manager.py uses get_num_attention_layers() ==="
sed -n '830,835p' tensorrt_llm/_torch/pyexecutor/resource_manager.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 1202
🏁 Script executed:
#!/bin/bash
# Now verify: In _create_one_model_draft_kv_cache_manager, what layer count is actually passed?
echo "=== What num_draft_layers is passed to _create_kv_cache_manager? ==="
sed -n '569,608p' tensorrt_llm/_torch/pyexecutor/_util.py | grep -A 40 "num_draft_layers"
Repository: NVIDIA/TensorRT-LLM
Length of output: 1979
🏁 Script executed:
#!/bin/bash
# Trace the exact calculation: estimation vs actual for nemotron_hybrid
echo "=== Verify MTP num_nextn_predict_layers vs target attention layers ==="
rg -n "num_nextn_predict_layers" --type=py -C 3 | head -40
Repository: NVIDIA/TensorRT-LLM
Length of output: 3499
🏁 Script executed:
#!/bin/bash
# Check if the layer_mask is actually used in the size estimation
echo "=== Does get_cache_size_per_token accept layer_mask parameter? ==="
rg -n "def get_cache_size_per_token" --type=py -B 2 -A 1 | head -30
Repository: NVIDIA/TensorRT-LLM
Length of output: 1820
🏁 Script executed:
#!/bin/bash
# Verify the complete picture: for nemotron_hybrid, what happens in _get_kv_size_per_token
echo "=== Full verification: nemotron_hybrid MTP case ==="
echo "1. _get_effective_draft_config for MTP returns:"
sed -n '540,555p' tensorrt_llm/_torch/pyexecutor/_util.py | tail -5
echo ""
echo "2. For nemotron_hybrid, get_num_attention_layers() returns pattern.count('*')"
echo " This is used in get_cache_size_per_token at line 832"
echo ""
echo "3. But actual allocation passes num_draft_layers (from get_num_spec_layers):"
sed -n '569,572p' tensorrt_llm/_torch/pyexecutor/_util.py
echo ""
echo "4. The layer_mask is created with only num_draft_layers True values:"
sed -n '575,577p' tensorrt_llm/_torch/pyexecutor/_util.py
echo ""
echo "Conclusion: Estimation uses full target attention layers, allocation uses only draft layers"
Repository: NVIDIA/TensorRT-LLM
Length of output: 1159
🏁 Script executed:
#!/bin/bash
# Final check: is the size estimation actually used for memory calculation?
echo "=== Is kv_size_per_token used in memory calculation? ==="
sed -n '130,147p' tensorrt_llm/_torch/pyexecutor/_util.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 1222
Draft KV cache size estimation overcounts for MTP on hybrid models.
When effective_draft_config falls back to the target model config for MTP, get_cache_size_per_token is called with the full target config, including all of its attention layers (hybrid_override_pattern.count("*") for nemotron_hybrid models). However, the actual draft KV cache, created in _create_one_model_draft_kv_cache_manager, only allocates for num_draft_layers (from get_num_spec_layers), specified via layer_mask. Because of this mismatch, the memory budget reserved for the draft is significantly larger than what is actually needed, which reduces the capacity available for the target KV cache.
Pass num_draft_layers to the size calculation or use a draft-specific config with the correct layer count to avoid over-reserving memory.
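To make the mismatch concrete, here is a minimal, self-contained sketch of the over-counting and the proposed fix. ToyConfig and its size formula are invented for illustration; only the ideas (a spec-layer count from get_num_spec_layers, a short-lived draft config) mirror the review discussion, and the real TensorRT-LLM signatures differ.

```python
# Hypothetical sketch: ToyConfig and this size formula are illustrative
# stand-ins, not the real TensorRT-LLM ModelConfig / KVCacheManager API.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyConfig:
    num_attention_layers: int  # for nemotron_hybrid: hybrid_override_pattern.count("*")
    num_kv_heads: int = 8
    head_dim: int = 128
    dtype_bytes: int = 2  # fp16 / bf16


def get_cache_size_per_token(cfg: ToyConfig) -> int:
    # K and V tensors for every attention layer, per token.
    return 2 * cfg.num_attention_layers * cfg.num_kv_heads * cfg.head_dim * cfg.dtype_bytes


# Target hybrid model with 12 attention layers; the MTP draft actually
# allocates for only 1 speculative layer (what get_num_spec_layers returns).
target_config = ToyConfig(num_attention_layers=12)
num_draft_layers = 1

# Buggy estimate: MTP falls back to the full target config.
overcounted = get_cache_size_per_token(target_config)

# Proposed fix: short-lived draft config with the real draft layer count,
# so the estimate matches what layer_mask actually allocates.
draft_config = replace(target_config, num_attention_layers=num_draft_layers)
corrected = get_cache_size_per_token(draft_config)

print(overcounted, corrected)  # the buggy estimate is 12x too large here
```

Under these toy numbers the estimation reserves 49152 bytes per token for the draft while the allocation only needs 4096, which is exactly the kind of over-reservation the review describes.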
PR_Github #35521 [ run ] triggered by Bot. Commit:
PR_Github #35521 [ run ] completed with state
/bot run
b78dff9 to b954710 Compare
PR_Github #35578 [ run ] triggered by Bot. Commit:
b954710 to 9713821 Compare
/bot run
PR_Github #35587 [ run ] triggered by Bot. Commit:
PR_Github #35587 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #35644 [ run ] triggered by Bot. Commit:
PR_Github #35644 [ run ] completed with state
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
1be697e to 4ae5969 Compare
/bot run --disable-fail-fast
PR_Github #35686 [ run ] triggered by Bot. Commit:
brb-nv left a comment:
Minor refactor comment. LGTM.
PR_Github #35686 [ run ] completed with state
/bot run
/bot run
PR_Github #35945 [ run ] triggered by Bot. Commit:
/bot run --disable-fail-fast
PR_Github #35956 [ run ] triggered by Bot. Commit:
PR_Github #35956 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #36014 [ run ] triggered by Bot. Commit:
PR_Github #36014 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #36086 [ run ] triggered by Bot. Commit:
PR_Github #36086 [ run ] completed with state
127d746 to bc98799 Compare
/bot run --disable-fail-fast
PR_Github #36261 [ run ] triggered by Bot. Commit:
PR_Github #36261 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #36294 [ run ] triggered by Bot. Commit:
PR_Github #36294 [ run ] completed with state
bc98799 to d3efb9a Compare
/bot run --disable-fail-fast
PR_Github #36303 [ run ] triggered by Bot. Commit:
PR_Github #36303 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #36315 [ run ] triggered by Bot. Commit:
PR_Github #36315 [ run ] completed with state
Summary by CodeRabbit
Release Notes
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]
Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.
--disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline plus the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill
Kill all running builds associated with the pull request.
skip
skip --comment COMMENT
Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
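As an illustration of how the flags above combine, a typical invocation posted as a PR comment might look like the following. The stage and GPU names are the examples from the help text, not required values:

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1" --gpu-type "A30"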