[TRTLLM-10857][chore] Move SaveHiddenStates spec dec mode to 1 model#11241
mikeiovine merged 4 commits into NVIDIA:main from
Conversation
Force-pushed 97f094f to feafa60
📝 Walkthrough

The pull request refactors the speculative decoding hidden-states saving mechanism from a Drafter-based to a ResourceManager-based architecture. The SaveHiddenStatesDrafter is replaced with SaveHiddenStatesResourceManager and SaveHiddenStatesSpecMetadata. The post-forward flow now calls the resource manager's process_and_save method instead of the drafter's post-hook, and the drafter's run_drafter_post method is removed.
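The refactor described in the walkthrough can be sketched as follows. This is a minimal illustration, not the repository code: the class and method names (SaveHiddenStatesResourceManager, process_and_save, the SPEC_RESOURCE_MANAGER key) mirror the walkthrough, but the bodies and the post_forward helper are assumptions for illustration.

```python
from typing import Any, Dict, Optional


class SaveHiddenStatesResourceManager:
    """Illustrative stand-in for the new resource-manager-based saver."""

    def __init__(self) -> None:
        self.saved = []

    def process_and_save(self, spec_metadata: Any, hidden_states: Any) -> None:
        # The real implementation would write the captured hidden states for
        # the configured layers to the configured output directory; here we
        # just record the call.
        self.saved.append((spec_metadata, hidden_states))


def post_forward(resource_managers: Dict[str, Any],
                 spec_metadata: Optional[Any],
                 hidden_states: Any) -> None:
    # Replaces the old drafter post-hook (run_drafter_post): the executor loop
    # looks up the spec resource manager and delegates saving to it, so no
    # second Drafter model is needed.
    mgr = resource_managers.get("SPEC_RESOURCE_MANAGER")
    if isinstance(mgr, SaveHiddenStatesResourceManager) and spec_metadata is not None:
        mgr.process_and_save(spec_metadata, hidden_states)
```

The key design point is that saving becomes a property of the execution resources rather than of a drafter object, which is why the drafter hook could be deleted.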
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
tensorrt_llm/llmapi/llm_args.py (2)
1-5: ⚠️ Potential issue | 🟡 Minor — Add required NVIDIA copyright header.
This TensorRT‑LLM source file starts directly with imports; please add the standard NVIDIA header with the latest modification year at the top of the file.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
1051-1067: ⚠️ Potential issue | 🟠 Major — Align SaveHiddenStates validation with the new default-capture behavior.
`model_post_init` now treats `eagle3_layers_to_capture=None` as valid, but `validate()` still rejects falsy values. If `validate()` is invoked, the default-capture path will fail. Either allow `None` explicitly or remove the default path/documentation.

🔧 Proposed fix

```diff
 def validate(self) -> None:
-    if self.output_directory is None or not self.eagle3_layers_to_capture:
-        raise ValueError(
-            "Save directory and layers to capture must be provided")
+    if self.output_directory is None:
+        raise ValueError("Save directory must be provided")
+    if self.eagle3_layers_to_capture is not None and len(self.eagle3_layers_to_capture) == 0:
+        raise ValueError("layers_to_capture must be non-empty when provided")
```

tensorrt_llm/_torch/speculative/__init__.py (1)
1-3: ⚠️ Potential issue | 🟡 Minor — Add required NVIDIA copyright header.
Please add the standard NVIDIA header with the latest modification year at the top of this source file.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.

tensorrt_llm/_torch/speculative/utils.py (1)
1-3: ⚠️ Potential issue | 🟡 Minor — Add required NVIDIA copyright header.
Please add the standard NVIDIA header with the latest modification year at the top of this source file.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/pyexecutor/py_executor.py`:
- Around lines 1598-1607: The SaveHiddenStates feature is not validated against pipeline parallelism, causing a silent failure when pp_size > 1, because only _executor_loop calls spec_resource_mgr.process_and_save. Either enforce pp_size == 1 in the SaveHiddenStatesDecodingConfig initializer (add an assertion/validation referencing SaveHiddenStatesDecodingConfig and llm_args.pp_size), or add the same hook into _executor_loop_pp (check for ResourceManagerType.SPEC_RESOURCE_MANAGER, getattr(self.model_engine, 'spec_metadata', None), and call spec_resource_mgr.process_and_save for the final pipeline rank), so that SaveHiddenStates always runs or fails loudly when PP is enabled.
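The first option suggested above (failing loudly at config time) could look like the following sketch. The class and field names (SaveHiddenStatesDecodingConfig, pp_size) come from the comment, but the dataclass structure and the validation message are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class SaveHiddenStatesDecodingConfig:
    output_directory: str
    pp_size: int = 1

    def __post_init__(self) -> None:
        # Only the non-pipeline-parallel executor loop invokes
        # process_and_save, so reject pp_size > 1 up front instead of
        # silently producing no saved hidden states.
        if self.pp_size != 1:
            raise ValueError(
                f"SaveHiddenStates does not support pipeline parallelism "
                f"(got pp_size={self.pp_size}); use pp_size=1.")
```

Validating at construction time keeps the failure close to the misconfiguration rather than deep inside the executor loop.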
In `@tensorrt_llm/_torch/speculative/save_hidden_state.py`:
- Around lines 67-68: The function get_needed_resource_to_completion currently declares an unused parameter request, which triggers the ARG002 lint rule. Either rename request to _request in the signature (e.g., def get_needed_resource_to_completion(self, _request: LlmRequest):) or explicitly discard the variable by adding del request at the top of the body, so the linter stops flagging the unused parameter while the existing return behavior is preserved.
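A minimal sketch of the rename option, under stated assumptions: the zero return value and the absence of per-request resources are illustrative guesses, not the repository's actual behavior.

```python
class SaveHiddenStatesResourceManager:
    def get_needed_resource_to_completion(self, _request) -> int:
        # The leading underscore marks the parameter as intentionally unused,
        # which satisfies Ruff's ARG002 check without changing the call
        # signature expected by the resource-manager interface.
        # Returning 0 here is an assumption for illustration.
        return 0
```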
- Around line 135-171: In SaveHiddenStatesSpecMetadata.__post_init__, the
default capture computation calls
_get_eagle3_default_capture_layers(self.num_layers) which uses the wrong field;
change it to call _get_eagle3_default_capture_layers(self.num_model_layers) so
defaults are based on the actual model layer count, keep the rest of the logic
(sorting layers_to_capture, handling -1 last-layer marker, and setting
num_capture_layers) unchanged and ensure the reference to
SaveHiddenStatesSpecMetadata and method __post_init__ are updated accordingly.
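The corrected default-capture computation can be sketched as below. The field names (num_model_layers, layers_to_capture) follow the comment; the dataclass shape is an assumption for illustration, and the -1 last-layer marker handling mentioned above is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def _get_eagle3_default_capture_layers(num_layers: int) -> Tuple[int, int, int]:
    # Default Eagle3 capture indices, as shown in the suggested-docstring
    # diff elsewhere in this review.
    return (1, num_layers // 2 - 1, num_layers - 4)


@dataclass
class SaveHiddenStatesSpecMetadata:
    num_model_layers: int
    layers_to_capture: Optional[Tuple[int, ...]] = None

    def __post_init__(self) -> None:
        if self.layers_to_capture is None:
            # The fix: derive defaults from the actual model layer count
            # (num_model_layers), not from the wrong num_layers field.
            self.layers_to_capture = _get_eagle3_default_capture_layers(
                self.num_model_layers)
        self.layers_to_capture = tuple(sorted(self.layers_to_capture))
        self.num_capture_layers = len(self.layers_to_capture)
```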
- Around line 1-4: Add the standard NVIDIA copyright header (with the latest
modification year) at the very top of
tensorrt_llm/_torch/speculative/save_hidden_state.py before any imports; ensure
the header matches the project's canonical NVIDIA header text and formatting
used across other TensorRT-LLM source files and includes the appropriate year of
last meaningful modification.
🧹 Nitpick comments (5)
tensorrt_llm/_torch/speculative/eagle3.py (1)
111-112: Add a Google-style docstring for the new helper. Keeps the new utility compliant and clarifies the tuple semantics.
✍️ Suggested update
```diff
 def _get_eagle3_default_capture_layers(num_layers: int):
+    """Return default Eagle3 layer indices to capture.
+
+    Args:
+        num_layers: Total number of layers in the model.
+
+    Returns:
+        Tuple of layer indices to capture.
+    """
     return (1, num_layers // 2 - 1, num_layers - 4)
```

As per coding guidelines: Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx.
tensorrt_llm/_torch/speculative/interface.py (1)
135-136: Consider documenting the updated `has_spec_drafter` behavior.

✍️ Suggested update
```diff
 def has_spec_drafter(self):
+    """Return True if this mode uses a spec drafter."""
     return self.is_eagle3() or self.is_draft_target() or self.is_ngram(
     ) or self.is_user_provided() or self.is_mtp_eagle()
```

As per coding guidelines: Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx.
tensorrt_llm/_torch/speculative/__init__.py (1)
6-22: Keep save_hidden_state imports namespaced. The guidelines require module-namespace imports. Consider importing the module and re-exporting the symbols.
♻️ Suggested refactor
```diff
-from .save_hidden_state import (SaveHiddenStatesResourceManager,
-                                SaveHiddenStatesSpecMetadata)
+from . import save_hidden_state
+
+SaveHiddenStatesResourceManager = save_hidden_state.SaveHiddenStatesResourceManager
+SaveHiddenStatesSpecMetadata = save_hidden_state.SaveHiddenStatesSpecMetadata
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
tensorrt_llm/_torch/speculative/utils.py (1)
17-18: Keep save_hidden_state imports namespaced. To comply with the import-namespace guideline, import the module and qualify usages.
♻️ Suggested refactor
```diff
-from .save_hidden_state import (SaveHiddenStatesResourceManager,
-                                SaveHiddenStatesSpecMetadata)
+from . import save_hidden_state
@@
-    return SaveHiddenStatesSpecMetadata(
+    return save_hidden_state.SaveHiddenStatesSpecMetadata(
@@
-    return SaveHiddenStatesResourceManager(
+    return save_hidden_state.SaveHiddenStatesResourceManager(
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
Also applies to: 84-95, 145-151
tensorrt_llm/_torch/speculative/save_hidden_state.py (1)
2-12: Keep internal imports namespaced. To comply with the namespace-import guideline, import the modules and qualify the base classes.
♻️ Suggested refactor
```diff
-from ..pyexecutor.resource_manager import BaseResourceManager
-from .interface import SpecMetadata
+from ..pyexecutor import resource_manager
+from . import interface
@@
-class SaveHiddenStatesResourceManager(BaseResourceManager):
+class SaveHiddenStatesResourceManager(resource_manager.BaseResourceManager):
@@
-class SaveHiddenStatesSpecMetadata(SpecMetadata):
+class SaveHiddenStatesSpecMetadata(interface.SpecMetadata):
```

As per coding guidelines: Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.
Also applies to: 18-18, 136-136
Force-pushed feafa60 to b70c4ec
/bot run --disable-fail-fast

PR_Github #34689 [ run ] triggered by Bot. Commit:
PR_Github #34689 [ run ] completed with state

Force-pushed b70c4ec to daffa6e

/bot run --disable-fail-fast

PR_Github #34813 [ run ] triggered by Bot. Commit:
PR_Github #34813 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #34978 [ run ] triggered by Bot. Commit:
PR_Github #34978 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35120 [ run ] triggered by Bot. Commit:
PR_Github #35120 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35373 [ run ] triggered by Bot. Commit:
PR_Github #35373 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35518 [ run ] triggered by Bot. Commit:
PR_Github #35518 [ run ] completed with state
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Force-pushed 876761b to c92f8a6
/bot run --disable-fail-fast

PR_Github #35645 [ run ] triggered by Bot. Commit:
PR_Github #35645 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35808 [ run ] triggered by Bot. Commit:
PR_Github #35808 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35921 [ run ] triggered by Bot. Commit:

/bot run

PR_Github #35937 [ run ] triggered by Bot. Commit:
PR_Github #35937 [ run ] completed with state

/bot run

PR_Github #36087 [ run ] triggered by Bot. Commit:
PR_Github #36087 [ run ] completed with state
Description
Remove another dependency on the deprecated 2-model `Drafter` machinery. Also attempt to document the existing behavior in the code.

Test Coverage
Existing tests pass.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help

`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user-friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.

Details

`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

`kill`

Kill all running builds associated with the pull request.

`skip --comment COMMENT`

Skip testing for the latest commit on the pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

`reuse-pipeline`

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit
Release Notes
API Changes
Improvements