[None][fix] Nemotron Super fix #11425

Merged

MartinMarciniszyn merged 3 commits into NVIDIA:main from IzzyPutterman:iputterman/nemotron-0210 on Feb 20, 2026

Conversation

@IzzyPutterman (Collaborator) commented Feb 10, 2026

Summary by CodeRabbit

Release Notes

  • Improvements
    • Enhanced KV cache management for speculative decoding in single-model configurations.
    • Improved draft configuration handling to automatically use target model configuration when explicit configuration is not provided.
    • Optimized layer selection and masking logic for hybrid cache scenarios, enhancing flexibility for various decoding strategies.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. This ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
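
For example, a typical invocation combining the flags documented above (the stage name is illustrative):

/bot run --stage-list "A10-PyTorch-1" --disable-fail-fast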

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping validation without care can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since skipping validation without care can break the top of tree.

@IzzyPutterman requested a review from a team as a code owner on February 10, 2026 17:03
@IzzyPutterman
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Modified KV cache configuration logic in _util.py to determine effective draft config for one-model scenarios, updated KV cache size and creation logic accordingly, and enhanced layer mask handling in MambaHybridCacheManager for speculative decoding.

Changes

Cohort / File(s): KV Cache Configuration, tensorrt_llm/_torch/pyexecutor/_util.py
Summary: Added a _get_effective_draft_config() helper to resolve the draft KV cache configuration. Updated the one-model draft KV cache size computation and creation logic to use the effective draft config for manager selection and dtype/model_config propagation. Enhanced layer_mask handling in MambaHybridCacheManager to compute hybrid and mamba layer masks from provided masks in speculative decoding scenarios, falling back to the hybrid_override_pattern behavior when no mask is provided.
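
As a rough illustration of that control flow, here is a minimal Python sketch, assuming simplified names and signatures; the actual helpers in _util.py and MambaHybridCacheManager are more involved and may differ in detail:

```python
# Hedged sketch only: attribute and helper names are assumptions, not the
# actual TensorRT-LLM implementation.

def _get_effective_draft_config(self):
    # Use the explicit draft config when one is provided; otherwise fall
    # back to the target model config (the MTP one-model case).
    if self._draft_config is not None:
        return self._draft_config
    return self._model_config

def resolve_layer_masks(layer_mask, hybrid_override_pattern):
    # In the hybrid cache manager, a provided layer_mask (speculative
    # decoding) selects the layers this manager handles; without one, fall
    # back to hybrid_override_pattern, where "*" marks an attention (KV
    # cache) layer. The derivation of the Mamba mask here is an assumption.
    if layer_mask is not None:
        attention_mask = list(layer_mask)
        mamba_mask = [not selected for selected in layer_mask]
    else:
        attention_mask = [ch == "*" for ch in hybrid_override_pattern]
        mamba_mask = [ch != "*" for ch in hybrid_override_pattern]
    return attention_mask, mamba_mask
```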

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: 1 passed, 2 failed (1 warning, 1 inconclusive)

❌ Failed checks

  • Description check (⚠️ Warning): The PR description is entirely empty; all required sections (Description, Test Coverage, PR Checklist details) contain only template comments with no actual content from the author. Resolution: provide a detailed description of the issue being fixed and the solution implemented, list the relevant test cases, and complete the PR Checklist.
  • Title check (❓ Inconclusive): The title '[None][fix] Nemotron Super fix' is too vague about what the code changes accomplish. Resolution: use a specific title, e.g. '[None][fix] Fix KV cache configuration for one-model speculative decoding'.

✅ Passed checks

  • Docstring Coverage (✅ Passed): Docstring coverage is 80.00%, which meets the required threshold of 80.00%.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1


Comment on lines 121 to 127:

```diff
 elif self._should_create_separate_draft_kv_cache():
     # One-model draft with separate KV cache layout
     effective_draft_config = self._get_effective_draft_config()
     kv_size_per_token += self._kv_cache_manager_cls.get_cache_size_per_token(
-        self._draft_config,
+        effective_draft_config,
         mapping,
         tokens_per_block=self._tokens_per_block)
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find the relevant Python files
echo "=== Finding tensorrt_llm module files ==="
fd -t f -e py "kv.*manager\|executor.*util" | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 104


🏁 Script executed:

#!/bin/bash
# Look at the specific file mentioned in the review
echo "=== Checking _util.py file ==="
cat -n tensorrt_llm/_torch/pyexecutor/_util.py | head -150

Repository: NVIDIA/TensorRT-LLM

Length of output: 8269


🏁 Script executed:

#!/bin/bash
# Search for _get_effective_draft_config method
echo "=== Search for _get_effective_draft_config ==="
rg -n "_get_effective_draft_config" -C 5 --type=py

Repository: NVIDIA/TensorRT-LLM

Length of output: 3579


🏁 Script executed:

#!/bin/bash
# Search for _create_one_model_draft_kv_cache_manager
echo "=== Search for _create_one_model_draft_kv_cache_manager ==="
rg -n "_create_one_model_draft_kv_cache_manager" -C 10 --type=py

Repository: NVIDIA/TensorRT-LLM

Length of output: 4131


🏁 Script executed:

#!/bin/bash
# Search for get_cache_size_per_token implementations
echo "=== Search for get_cache_size_per_token implementations ==="
rg -n "def get_cache_size_per_token" -C 15 --type=py

Repository: NVIDIA/TensorRT-LLM

Length of output: 12828


🏁 Script executed:

#!/bin/bash
# Read the full implementation of KVCacheManager.get_cache_size_per_token
echo "=== KVCacheManager.get_cache_size_per_token full implementation ==="
sed -n '802,900p' tensorrt_llm/_torch/pyexecutor/resource_manager.py

Repository: NVIDIA/TensorRT-LLM

Length of output: 4914


🏁 Script executed:

#!/bin/bash
# Read the full implementation of MambaHybridCacheManager.get_cache_size_per_token
echo "=== MambaHybridCacheManager.get_cache_size_per_token ==="
rg -n "class MambaHybridCacheManager" --type=py -A 200 | head -250

Repository: NVIDIA/TensorRT-LLM

Length of output: 10094


🏁 Script executed:

#!/bin/bash
# Read the full implementation of _create_one_model_draft_kv_cache_manager
echo "=== _create_one_model_draft_kv_cache_manager full implementation ==="
sed -n '557,620p' tensorrt_llm/_torch/pyexecutor/_util.py

Repository: NVIDIA/TensorRT-LLM

Length of output: 3353


🏁 Script executed:

#!/bin/bash
# Check what get_num_attention_layers returns from the model config
echo "=== Searching for get_num_attention_layers method ==="
rg -n "def get_num_attention_layers" --type=py -A 3

# Also check ModelConfig to understand its structure
echo "=== Searching for ModelConfig class definition ==="
rg -n "class ModelConfig" --type=py -A 10 | head -50

Repository: NVIDIA/TensorRT-LLM

Length of output: 4173


🏁 Script executed:

#!/bin/bash
# Search for _create_kv_cache_manager helper function to see if it uses num_layers
echo "=== Searching for _create_kv_cache_manager function ==="
rg -n "def _create_kv_cache_manager" --type=py -C 5

Repository: NVIDIA/TensorRT-LLM

Length of output: 3294


🏁 Script executed:

#!/bin/bash
# Look at the _create_kv_cache_manager function fully to see how it handles num_layers
echo "=== Full _create_kv_cache_manager function (module-level) ==="
sed -n '646,800p' tensorrt_llm/_torch/pyexecutor/_util.py

Repository: NVIDIA/TensorRT-LLM

Length of output: 6966


🏁 Script executed:

#!/bin/bash
# Check what the complete get_num_attention_layers method returns
echo "=== Complete get_num_attention_layers ==="
sed -n '770,780p' tensorrt_llm/_torch/model_config.py

Repository: NVIDIA/TensorRT-LLM

Length of output: 803


🏁 Script executed:

#!/bin/bash
# Check if get_cache_size_per_token in MambaHybridCacheManager is overridden
echo "=== Search for get_cache_size_per_token in MambaHybridCacheManager ==="
rg -n "class MambaHybridCacheManager" --type=py -A 300 | grep -A 30 "def get_cache_size_per_token"

Repository: NVIDIA/TensorRT-LLM

Length of output: 133


🏁 Script executed:

#!/bin/bash
# Look at how get_num_spec_layers works
echo "=== Search for get_num_spec_layers ==="
rg -n "def get_num_spec_layers" --type=py -A 10

Repository: NVIDIA/TensorRT-LLM

Length of output: 1004


🏁 Script executed:

#!/bin/bash
# Check if get_cache_size_per_token for MambaHybridCacheManager uses layer_mask
echo "=== Check KVCacheManager.get_cache_size_per_token signature ==="
sed -n '802,850p' tensorrt_llm/_torch/pyexecutor/resource_manager.py | head -60

Repository: NVIDIA/TensorRT-LLM

Length of output: 2343


🏁 Script executed:

#!/bin/bash
# Now let's trace through the exact values for nemotron_hybrid
# First check: does get_cache_size_per_token accept layer_mask parameter?
echo "=== Checking if get_cache_size_per_token accepts layer_mask ==="
rg -n "def get_cache_size_per_token.*layer_mask" --type=py

# And check KVCacheManagerV2.get_cache_size_per_token
echo "=== Checking KVCacheManagerV2.get_cache_size_per_token ==="
rg -n "class KVCacheManagerV2" --type=py -A 50 | grep -A 20 "def get_cache_size_per_token"

Repository: NVIDIA/TensorRT-LLM

Length of output: 185


🏁 Script executed:

#!/bin/bash
# Let me understand the actual issue better by checking the calculation in detail
# For nemotron_hybrid, when layer_mask is None (in size estimation), what does it count?
echo "=== Look at the full get_cache_size_per_token code path for nemotron_hybrid ==="
sed -n '802,855p' tensorrt_llm/_torch/pyexecutor/resource_manager.py

Repository: NVIDIA/TensorRT-LLM

Length of output: 2625


🏁 Script executed:

#!/bin/bash
# Now let's verify the actual numbers by checking the code flow one more time
# The key question: Does get_cache_size_per_token for nemotron_hybrid use get_num_attention_layers()?
echo "=== Verify the calculation path ==="
cat tensorrt_llm/_torch/model_config.py | sed -n '770,782p'

# For nemotron_hybrid, get_num_attention_layers returns pattern.count("*")
# This is used in get_cache_size_per_token at line 832:
echo ""
echo "=== Line 832 in resource_manager.py uses get_num_attention_layers() ==="
sed -n '830,835p' tensorrt_llm/_torch/pyexecutor/resource_manager.py

Repository: NVIDIA/TensorRT-LLM

Length of output: 1202
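
To make the counting concrete: per the snippet above, get_num_attention_layers() for a hybrid model counts the "*" entries in the override pattern. A tiny illustration with a hypothetical pattern (not from any real checkpoint):

```python
# "*" marks an attention layer; other characters mark non-attention
# (e.g. Mamba) layers. The pattern string is made up for illustration.
hybrid_override_pattern = "M*MM*MM*"
num_attention_layers = hybrid_override_pattern.count("*")  # -> 3
```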


🏁 Script executed:

#!/bin/bash
# Now verify: In _create_one_model_draft_kv_cache_manager, what layer count is actually passed?
echo "=== What num_draft_layers is passed to _create_kv_cache_manager? ==="
sed -n '569,608p' tensorrt_llm/_torch/pyexecutor/_util.py | grep -A 40 "num_draft_layers"

Repository: NVIDIA/TensorRT-LLM

Length of output: 1979


🏁 Script executed:

#!/bin/bash
# Trace the exact calculation: estimation vs actual for nemotron_hybrid
echo "=== Verify MTP num_nextn_predict_layers vs target attention layers ==="
rg -n "num_nextn_predict_layers" --type=py -C 3 | head -40

Repository: NVIDIA/TensorRT-LLM

Length of output: 3499


🏁 Script executed:

#!/bin/bash
# Check if the layer_mask is actually used in the size estimation
echo "=== Does get_cache_size_per_token accept layer_mask parameter? ==="
rg -n "def get_cache_size_per_token" --type=py -B 2 -A 1 | head -30

Repository: NVIDIA/TensorRT-LLM

Length of output: 1820


🏁 Script executed:

#!/bin/bash
# Verify the complete picture: for nemotron_hybrid, what happens in _get_kv_size_per_token
echo "=== Full verification: nemotron_hybrid MTP case ==="
echo "1. _get_effective_draft_config for MTP returns:"
sed -n '540,555p' tensorrt_llm/_torch/pyexecutor/_util.py | tail -5

echo ""
echo "2. For nemotron_hybrid, get_num_attention_layers() returns pattern.count('*')"
echo "   This is used in get_cache_size_per_token at line 832"

echo ""
echo "3. But actual allocation passes num_draft_layers (from get_num_spec_layers):"
sed -n '569,572p' tensorrt_llm/_torch/pyexecutor/_util.py

echo ""
echo "4. The layer_mask is created with only num_draft_layers True values:"
sed -n '575,577p' tensorrt_llm/_torch/pyexecutor/_util.py

echo ""
echo "Conclusion: Estimation uses full target attention layers, allocation uses only draft layers"

Repository: NVIDIA/TensorRT-LLM

Length of output: 1159


🏁 Script executed:

#!/bin/bash
# Final check: is the size estimation actually used for memory calculation?
echo "=== Is kv_size_per_token used in memory calculation? ==="
sed -n '130,147p' tensorrt_llm/_torch/pyexecutor/_util.py

Repository: NVIDIA/TensorRT-LLM

Length of output: 1222


Draft KV cache size estimation overcounts for MTP on hybrid models.

When effective_draft_config falls back to the target model config for MTP, get_cache_size_per_token is called with the full target config—including all attention layers from hybrid_override_pattern.count("*") for nemotron_hybrid models. However, the actual draft KV cache (created in _create_one_model_draft_kv_cache_manager) only allocates for num_draft_layers (from get_num_spec_layers), specified via layer_mask. This mismatch means the memory budget reserved for the draft is significantly larger than what's actually needed, reducing available capacity for the target KV cache.

Pass num_draft_layers to the size calculation or use a draft-specific config with the correct layer count to avoid over-reserving memory.
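
To see the scale of the over-reservation, a toy calculation with made-up numbers (none of these values come from the PR):

```python
# Toy illustration of the estimation/allocation mismatch; all numbers are
# hypothetical, not measured from TensorRT-LLM.
target_attention_layers = 20   # what the estimate counts via the target config
num_draft_layers = 1           # what the draft manager actually allocates (MTP)
bytes_per_layer_per_token = 4096

estimated = target_attention_layers * bytes_per_layer_per_token  # 81920 bytes
allocated = num_draft_layers * bytes_per_layer_per_token         # 4096 bytes
over_reserved = estimated - allocated  # 77824 bytes/token never used by draft
```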

🤖 Prompt for AI Agents
In `@tensorrt_llm/_torch/pyexecutor/_util.py` around lines 121 - 127, The draft KV
cache size calculation overcounts when MTP falls back to the target model
config: in _should_create_separate_draft_kv_cache() you call
_get_effective_draft_config() and then
kv_cache_manager_cls.get_cache_size_per_token(...) which uses the full target
layer count; change this to compute and pass the actual draft layer count used
by _create_one_model_draft_kv_cache_manager (call get_num_spec_layers or derive
num_draft_layers from layer_mask) so get_cache_size_per_token receives a
draft-specific config or an explicit num_draft_layers parameter (i.e., create a
short-lived draft_config with the correct layer count or add num_draft_layers
arg) to ensure the estimated kv_size_per_token matches the allocated draft KV
cache.

@tensorrt-cicd
Collaborator

PR_Github #35521 [ run ] triggered by Bot. Commit: b78dff9

@tensorrt-cicd
Collaborator

PR_Github #35521 [ run ] completed with state SUCCESS. Commit: b78dff9
/LLM/main/L0_MergeRequest_PR pipeline #27427 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@IzzyPutterman
Collaborator Author

/bot run

@IzzyPutterman force-pushed the iputterman/nemotron-0210 branch from b78dff9 to b954710 on February 11, 2026 03:43
@tensorrt-cicd
Collaborator

PR_Github #35578 [ run ] triggered by Bot. Commit: b954710

@IzzyPutterman force-pushed the iputterman/nemotron-0210 branch from b954710 to 9713821 on February 11, 2026 03:46
@IzzyPutterman
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35587 [ run ] triggered by Bot. Commit: 9713821

@tensorrt-cicd
Collaborator

PR_Github #35587 [ run ] completed with state SUCCESS. Commit: 9713821
/LLM/main/L0_MergeRequest_PR pipeline #27488 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@IzzyPutterman
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35644 [ run ] triggered by Bot. Commit: 1be697e

@tensorrt-cicd
Collaborator

PR_Github #35644 [ run ] completed with state DISABLED
CI server is currently disabled for unplanned maintenance. Estimated completion time: 8 AM PST on 2/11.

@IzzyPutterman force-pushed the iputterman/nemotron-0210 branch from 1be697e to 4ae5969 on February 11, 2026 23:17
@IzzyPutterman requested reviews from a team as code owners on February 11, 2026 23:17
@IzzyPutterman requested a review from brb-nv on February 11, 2026 23:17
@mikeiovine
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35686 [ run ] triggered by Bot. Commit: 4ae5969

@brb-nv (Collaborator) left a comment

Minor refactor comment. LGTM.

@tensorrt-cicd
Collaborator

PR_Github #35686 [ run ] completed with state SUCCESS. Commit: 4ae5969
/LLM/main/L0_MergeRequest_PR pipeline #27563 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@IzzyPutterman
Collaborator Author

/bot run

@IzzyPutterman
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35945 [ run ] triggered by Bot. Commit: 0270e95

@IzzyPutterman
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35956 [ run ] triggered by Bot. Commit: 0270e95

@tensorrt-cicd
Collaborator

PR_Github #35956 [ run ] completed with state SUCCESS. Commit: 0270e95
/LLM/main/L0_MergeRequest_PR pipeline #27771 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@IzzyPutterman
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36014 [ run ] triggered by Bot. Commit: 0270e95

@tensorrt-cicd
Collaborator

PR_Github #36014 [ run ] completed with state SUCCESS. Commit: 0270e95
/LLM/main/L0_MergeRequest_PR pipeline #27822 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@mikeiovine
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36086 [ run ] triggered by Bot. Commit: 127d746

@tensorrt-cicd
Collaborator

PR_Github #36086 [ run ] completed with state SUCCESS. Commit: 127d746
/LLM/main/L0_MergeRequest_PR pipeline #27883 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@IzzyPutterman force-pushed the iputterman/nemotron-0210 branch from 127d746 to bc98799 on February 19, 2026 17:18
@IzzyPutterman
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36261 [ run ] triggered by Bot. Commit: bc98799

@tensorrt-cicd
Collaborator

PR_Github #36261 [ run ] completed with state SUCCESS. Commit: bc98799
/LLM/main/L0_MergeRequest_PR pipeline #28038 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@IzzyPutterman
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36294 [ run ] triggered by Bot. Commit: bc98799

@tensorrt-cicd
Collaborator

PR_Github #36294 [ run ] completed with state SUCCESS. Commit: bc98799
/LLM/main/L0_MergeRequest_PR pipeline #28067 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@IzzyPutterman force-pushed the iputterman/nemotron-0210 branch from bc98799 to d3efb9a on February 20, 2026 01:47
@IzzyPutterman
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36303 [ run ] triggered by Bot. Commit: d3efb9a

@tensorrt-cicd
Copy link
Collaborator

PR_Github #36303 [ run ] completed with state SUCCESS. Commit: d3efb9a
/LLM/main/L0_MergeRequest_PR pipeline #28076 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@IzzyPutterman
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36315 [ run ] triggered by Bot. Commit: d3efb9a

@tensorrt-cicd
Collaborator

PR_Github #36315 [ run ] completed with state SUCCESS. Commit: d3efb9a
/LLM/main/L0_MergeRequest_PR pipeline #28087 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.


@MartinMarciniszyn merged commit d33c1e3 into NVIDIA:main on Feb 20, 2026
5 checks passed