
[#10826][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation #11073

Merged
govind-ramnarayan merged 7 commits into NVIDIA:main from nv-auto-deploy:gramnarayan/eagle-wrapper
Feb 2, 2026

Conversation


@govind-ramnarayan govind-ramnarayan commented Jan 28, 2026

fixes: #10826

--Manual Summary--
Implements the Eagle one-model flow in a prefill-only setting. This means no cached attention, so sequences are provided to the model in full. It is intended for tracing and export in AutoDeploy.

Testing plan: Since this implements end-to-end Eagle support in the prefill-only setting, it can be tested by checking the acceptance rate of the model. This not only tests the correctness of the "glue code" (EagleWrapper) in this PR, but also verifies the correctness of the Eagle checkpoint architecture and model loading.

The test (test_eagle_wrapper_forward()) sets up a custom "target model with capture" that manually captures hidden states from preconfigured layers, creates a resource manager to store these target hidden states, then instantiates the EagleWrapper with the target model, resource manager, and draft model. Then it runs the EagleWrapper autoregressively and checks the acceptance rate for decode requests. This is done for batch sizes 1 and 2.

A note: the batch-size-2 tests are somewhat hacky because the sequences grow at different rates as we speculate, and we are dealing with a padded (not packed) representation here. To compensate, we discard all speculated tokens between runs and compute the acceptance rate by manually running those speculated tokens through the target model in a separate step. This makes the autoregressive loop identical to the normal decoding loop, while still checking the accuracy of the speculated tokens at each step.
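The acceptance-rate bookkeeping described above can be sketched as follows. This is a simplified, hypothetical illustration (the helper names are not from the PR); the real test compares speculated tokens against the target model's greedy outputs at each step:

```python
def count_accepted(draft_tokens, target_tokens):
    """Length of the longest prefix of drafted tokens that matches the
    tokens the target model would have produced greedily."""
    accepted = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        accepted += 1
    return accepted


def acceptance_rate(per_step_drafts, per_step_targets):
    """Fraction of all speculated tokens that the target model accepts."""
    total_drafted = sum(len(step) for step in per_step_drafts)
    total_accepted = sum(
        count_accepted(d, t) for d, t in zip(per_step_drafts, per_step_targets)
    )
    return total_accepted / max(total_drafted, 1)


# Two decode steps with 3 drafted tokens each; the target agrees on the
# first 2 tokens in step one and all 3 in step two: 5 of 6 accepted.
rate = acceptance_rate(
    per_step_drafts=[[5, 9, 2], [7, 1, 4]],
    per_step_targets=[[5, 9, 8], [7, 1, 4]],
)
```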

Summary by CodeRabbit

  • New Features

    • Introduced EagleWrapper for speculative decoding, enabling efficient token verification through draft and target model collaboration.
    • Added support for reusing target model embeddings and language head in draft models, reducing memory footprint.
    • Enhanced configuration management for Eagle-based drafting with improved model-type handling.
  • Tests

    • Added comprehensive integration tests for speculative decoding workflows with multi-batch support and token verification.
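The embedding and language-head reuse mentioned above can be sketched roughly like this. It is a minimal sketch with toy classes (`TinyTarget`, `TinyDrafter`, and the flag names are illustrative stand-ins, not the actual EagleWrapper code): sharing the target's modules means the drafter adds no extra embedding or LM-head memory.

```python
import torch.nn as nn


class TinyTarget(nn.Module):
    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)


class TinyDrafter(nn.Module):
    def __init__(self, target, load_embedding_from_target=True,
                 load_lm_head_from_target=True):
        super().__init__()
        vocab = target.embed_tokens.num_embeddings
        hidden = target.embed_tokens.embedding_dim
        # Reuse the target's modules instead of allocating fresh weights.
        self.embed_tokens = (target.embed_tokens if load_embedding_from_target
                             else nn.Embedding(vocab, hidden))
        self.lm_head = (target.lm_head if load_lm_head_from_target
                        else nn.Linear(hidden, vocab, bias=False))


target = TinyTarget()
drafter = TinyDrafter(target)
# Shared modules mean shared storage: same underlying weight tensor.
shared = (drafter.embed_tokens.weight.data_ptr()
          == target.embed_tokens.weight.data_ptr())
```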


Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/eagle-wrapper branch 2 times, most recently from 46745b6 to 57cf48a on January 28, 2026 at 23:44
@govind-ramnarayan govind-ramnarayan changed the title [#10826][Feature] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation [#10826][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation Jan 28, 2026
@govind-ramnarayan (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #33930 [ run ] triggered by Bot. Commit: 57cf48a

@govind-ramnarayan govind-ramnarayan marked this pull request as ready for review January 29, 2026 00:08
@govind-ramnarayan govind-ramnarayan requested a review from a team as a code owner January 29, 2026 00:08

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

This pull request implements a wrapper-based speculative decoding flow for Eagle models by introducing the EagleWrapper class with embedding/LM head management, implementing EagleConfig for configuration handling, enhancing Eagle3DrafterForCausalLM with conditional component loading, and adding comprehensive test infrastructure for prefill-only Eagle workflows.

Changes

  • Core Eagle Model Implementation: tensorrt_llm/_torch/auto_deploy/models/custom/modeling_eagle.py
    Added dtype management to Eagle3DecoderLayer and Eagle3Model for consistent embedding casting; introduced conditional embedding/LM head initialization via load_embedding_from_target/load_lm_head_from_target flags; replaced hard-coded 3-layer fc fusion with conditional self.fc based on num_capture_layers; modified forward signatures to accept inputs_embeds instead of input_ids. New classes: EagleWrapper (nn.Module), EagleWrapperOutput, EagleWrapperConfig with helper methods for embedding application, sampling (greedy and sample_and_verify), and prefill-only forward routing.
  • Eagle Configuration Management: tensorrt_llm/_torch/auto_deploy/models/eagle.py
    Added EagleConfigInfo dataclass (holding drafter class and per-model defaults) and EagleConfig class (PretrainedConfig subclass for Eagle-specific configuration); replaced the simple model_type-to-drafter mapping with _drafter_mapping linking model_type to EagleConfigInfo; updated EagleDrafterFactory._build_model to look up the mapping and construct a concrete EagleConfig by merging defaults.
  • Test Infrastructure & Helpers: tests/integration/defs/examples/test_ad_speculative_decoding.py
    Added PrefillOnlyEagleResourceManager for hidden state buffer allocation; introduced LlamaModelWithCapture and LlamaForCausalLMWithCapture to capture and share hidden states across specified layers; added build_eagle_wrapper, generate_target_outputs, print_token_analysis, manual_sample_and_verify, and verify_eagle_wrapper_output helper functions; expanded test_eagle_wrapper_forward to exercise the prefill-only EagleWrapper workflow with single- and multi-batch scenarios.
  • Unit Test Updates: tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_eagle.py
    Updated the MockEagle3ModelForCausalLM.forward signature to accept an input_embeds parameter; changed to compute logits via Eagle3DraftOutput; updated MockEagleDrafterFactory._drafter_mapping to use EagleConfigInfo instances; adjusted export path and logging to reflect the new inputs_embeds-centric flow.
  • Test List Updates: tests/integration/test_lists/test-db/l0_h100.yml
    Added two new test cases, test_eagle_wrapper_forward[1] and test_eagle_wrapper_forward[2], to the autodeploy test suite.
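The conditional fc fusion mentioned above (concatenating hidden states captured from several target layers and projecting back to the model width) could look roughly like this. The shapes and the num_capture_layers gating are illustrative, not the actual implementation:

```python
import torch
import torch.nn as nn

hidden_size, num_capture_layers, seq_len = 16, 3, 5

# Only materialize the fusion projection when more than one layer is captured.
fc = (nn.Linear(hidden_size * num_capture_layers, hidden_size, bias=False)
      if num_capture_layers > 1 else None)

# Hidden states captured from num_capture_layers target layers.
captured = [torch.randn(1, seq_len, hidden_size) for _ in range(num_capture_layers)]
fused = torch.cat(captured, dim=-1)             # (1, seq_len, 3 * hidden_size)
fused = fc(fused) if fc is not None else fused  # back to (1, seq_len, hidden_size)
```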

Sequence Diagrams

sequenceDiagram
    actor User
    participant EagleWrapper
    participant TargetModel
    participant DraftModel
    participant Sampler

    User->>EagleWrapper: sample_and_verify(inputs_embeds, draft_input_ids, ...)
    EagleWrapper->>TargetModel: forward(inputs_embeds)
    TargetModel-->>EagleWrapper: logits, hidden_state
    EagleWrapper->>DraftModel: apply_eagle3_fc(hidden_state)
    DraftModel-->>EagleWrapper: compressed_hidden_state
    EagleWrapper->>DraftModel: apply_lm_head(compressed_hidden_state)
    DraftModel-->>EagleWrapper: draft_logits
    EagleWrapper->>Sampler: sample_greedy(draft_logits)
    Sampler-->>EagleWrapper: draft_token_ids
    EagleWrapper->>EagleWrapper: compare_with_target(draft_ids, target_logits)
    EagleWrapper-->>User: accepted_tokens, num_newly_accepted
sequenceDiagram
    actor Caller
    participant EagleWrapper as EagleWrapper._forward_prefill_only
    participant TargetModel
    participant DraftModel

    Caller->>EagleWrapper: forward(inputs_embeds, position_ids, num_previously_accepted=None)
    activate EagleWrapper
    EagleWrapper->>TargetModel: generate logits iteratively
    TargetModel-->>EagleWrapper: target_logits
    EagleWrapper->>DraftModel: draft_tokens via sample_and_verify
    DraftModel-->>EagleWrapper: accepted_tokens, hidden_states
    EagleWrapper->>EagleWrapper: consolidate hidden_state_history
    EagleWrapper-->>Caller: EagleWrapperOutput(new_tokens, new_tokens_lens)
    deactivate EagleWrapper

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 45.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check ⚠️ Warning: The PR description includes a manual summary and testing plan but lacks structured completion of the required template sections (Description, Test Coverage, PR Checklist). Resolution: complete the Description and Test Coverage sections with clear explanations, and verify all PR Checklist items are properly addressed and checked.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: The title clearly summarizes the main feature being added, a prefill-only implementation of Eagle One-Model for AutoDeploy, which aligns directly with the code changes.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/auto_deploy/models/eagle.py (1)

1-2: Update the NVIDIA copyright year to 2026.

This file was meaningfully modified in this PR; the header should reflect the latest year.

✅ Suggested update
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
As per coding guidelines “All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.”
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_eagle.py (1)

56-77: Fix inputs_embeds initialization when input_embeds is provided.

As written, inputs_embeds is undefined when input_embeds is non-None, which will raise UnboundLocalError.

✅ Suggested fix
-        if input_embeds is None:
-            inputs_embeds = self.model.embed_tokens(input_ids)
+        if input_embeds is None:
+            inputs_embeds = self.model.embed_tokens(input_ids)
+        else:
+            inputs_embeds = input_embeds
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/auto_deploy/models/custom/modeling_eagle.py`:
- Around line 452-455: The apply_eagle3_fc method can return None when
self.model.fc is absent, breaking callers expecting a tensor; modify
apply_eagle3_fc (in the class containing self.model) to return the original
hidden_states when self.model.fc is None (i.e., use an explicit
else/early-return returning hidden_states) so callers always receive a
torch.Tensor.
- Around line 588-694: The code is slicing tensors using 0-d CUDA tensors
(num_previously_accepted, num_newly_accepted_tokens), which will fail on CUDA;
convert those index tensors to Python ints before using them for
slicing/indexing (use .item() or int(... .cpu().item()) as appropriate). Update
usages in sample_and_verify where unchecked_input_ids and
unchecked_target_logits are built (the list comprehensions using
num_previously_accepted[i] and seq_len), the loop that builds draft_input_ids
(prev_accepted = input_ids[i, 1 : int(num_previously_accepted[i])],
newly_accepted slice using num_previously_accepted[i] and
num_newly_accepted_tokens[i], next_token indexing into
unchecked_output_ids[i][0][int(num_newly_accepted_tokens[i])]), and when
selecting bonus_logit from unchecked_target_logits (index with
int(num_newly_accepted_tokens[i])). After changes, run a CUDA test to verify
slicing no longer errors.
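The suggested conversion can be illustrated with a minimal CPU example. Here `num_accepted` is a stand-in for the 0-d count tensors named above; converting to a Python int before slicing avoids relying on implicit device-to-host indexing behavior:

```python
import torch

input_ids = torch.arange(10)
num_accepted = torch.tensor(4)  # 0-d tensor, e.g. a count computed on device

# Explicit host-side conversion before using it as a slice bound.
idx = int(num_accepted.item())
prev_accepted = input_ids[1:idx]  # tokens at positions 1..3
```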

In `@tests/integration/defs/examples/test_ad_speculative_decoding.py`:
- Around line 603-613: In LlamaModelWithCapture.forward change the kwarg passed
to self.model from input_embeds to inputs_embeds (or rename the parameter to
inputs_embeds) so the call uses inputs_embeds=input_embeds; update the forward
signature or the call site accordingly (function: LlamaModelWithCapture.forward,
symbol: self.model(... input_embeds=... ) and logits = self.lm_head(...)) to
ensure exactly one of input_ids or inputs_embeds is provided.
- Around line 623-624: The method get_output_embeddings in LlamaModelWithCapture
currently returns self.model.lm_head which will raise because
LlamaModelWithCapture does not define model.lm_head; change
get_output_embeddings to return the module's actual output head (e.g.,
self.lm_head) or access the correct attribute on the wrapped model (e.g.,
getattr(self, "lm_head", getattr(self.model, "lm_head", None))) so it returns
the real output embedding layer; update get_output_embeddings to reference the
existing attribute (lm_head) on LlamaModelWithCapture rather than
self.model.lm_head.
🧹 Nitpick comments (5)
tensorrt_llm/_torch/auto_deploy/models/eagle.py (1)

24-33: Use module-qualified imports and annotate _drafter_mapping as ClassVar.

This aligns with the namespace import guideline and resolves the mutable class-attribute lint.

♻️ Suggested refactor
-from dataclasses import dataclass
-from typing import Dict, Type
+import dataclasses
+import typing

-from transformers import PretrainedConfig, PreTrainedModel
+import transformers
@@
-@dataclass
+@dataclasses.dataclass
 class EagleConfigInfo:
@@
-    config_class: Type[PreTrainedModel]
+    config_class: typing.Type[transformers.PreTrainedModel]
@@
-    _drafter_mapping: Dict[str, EagleConfigInfo] = {
+    _drafter_mapping: typing.ClassVar[typing.Dict[str, EagleConfigInfo]] = {
As per coding guidelines “Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.”

Also applies to: 94-104

tensorrt_llm/_torch/auto_deploy/models/custom/modeling_eagle.py (1)

470-497: Align EagleWrapperOutput.new_tokens type with actual return value.

_forward_prefill_only returns a list of tensors, but new_tokens is typed as Optional[torch.Tensor]. Consider updating the annotation (or stacking to a tensor).

♻️ Suggested type update
-    new_tokens: Optional[torch.Tensor] = None
+    new_tokens: Optional[list[torch.Tensor]] = None
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_eagle.py (1)

25-30: Use module-qualified imports per namespace guideline.

Prefer importing the module and qualifying symbols to keep namespaces intact.

♻️ Suggested refactor
-from tensorrt_llm._torch.auto_deploy.models.custom.modeling_eagle import (
-    Eagle3DrafterForCausalLM,
-    Eagle3DraftOutput,
-)
-from tensorrt_llm._torch.auto_deploy.models.eagle import EagleConfigInfo, EagleDrafterFactory
+import tensorrt_llm._torch.auto_deploy.models.custom.modeling_eagle as modeling_eagle
+import tensorrt_llm._torch.auto_deploy.models.eagle as eagle_models
@@
-class MockEagle3ModelForCausalLM(Eagle3DrafterForCausalLM):
+class MockEagle3ModelForCausalLM(modeling_eagle.Eagle3DrafterForCausalLM):
@@
-        return Eagle3DraftOutput(logits=logits, last_hidden_state=draft_output.last_hidden_state)
+        return modeling_eagle.Eagle3DraftOutput(
+            logits=logits, last_hidden_state=draft_output.last_hidden_state
+        )
@@
-class MockEagleDrafterFactory(EagleDrafterFactory):
+class MockEagleDrafterFactory(eagle_models.EagleDrafterFactory):
@@
-        "llama": EagleConfigInfo(
+        "llama": eagle_models.EagleConfigInfo(
             config_class=MockEagle3ModelForCausalLM,
As per coding guidelines “Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.”
tests/integration/defs/examples/test_ad_speculative_decoding.py (2)

18-31: Use module-qualified imports per namespace guideline.

Several new from ... import ... statements violate the namespace import rule.

♻️ Suggested refactor (pattern)
-from dataclasses import dataclass
+import dataclasses
@@
-from typing import Optional, Set
+import typing
@@
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from transformers.masking_utils import create_causal_mask
-from transformers.modeling_outputs import BaseModelOutputWithPast
-from transformers.models.llama.modeling_llama import LlamaModel
-from transformers.utils.generic import ModelOutput
+import transformers

Then reference as dataclasses.dataclass, typing.Optional, transformers.AutoModelForCausalLM, transformers.models.llama.modeling_llama.LlamaModel, etc.

As per coding guidelines “Always maintain the namespace when importing Python modules, even if only one class or function from a module is used.”

663-672: Remove or use max_seq_len in build_eagle_wrapper.

It’s currently unused; either drop it or wire it into buffer sizing to avoid confusion.

@tensorrt-cicd (Collaborator)

PR_Github #33930 [ run ] completed with state SUCCESS. Commit: 57cf48a
/LLM/main/L0_MergeRequest_PR pipeline #26170 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@govind-ramnarayan (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #34101 [ run ] triggered by Bot. Commit: b3e57fd

@tensorrt-cicd (Collaborator)

PR_Github #34101 [ run ] completed with state SUCCESS. Commit: b3e57fd
/LLM/main/L0_MergeRequest_PR pipeline #26312 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lucaslie (Member) left a comment


overall looks good and it looks helpful for testing. My only concern is that this is very detached from what eventually will be the e2e workflow with AutoDeploy and looks more like a HF-style implementation to run it e2e. That being said, it's a good intermediate milestone and I just wanted to make you aware that things may have to change once it gets integrated more closely into AD

@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/eagle-wrapper branch from b3e57fd to 8578338 on January 30, 2026 at 20:11
@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34252 [ run ] triggered by Bot. Commit: 8578338

@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/eagle-wrapper branch from f15b347 to 66162fb on January 30, 2026 at 21:12
@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34256 [ run ] triggered by Bot. Commit: 66162fb

@tensorrt-cicd (Collaborator)

PR_Github #34256 [ run ] completed with state SUCCESS. Commit: 66162fb
/LLM/main/L0_MergeRequest_PR pipeline #26419 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/eagle-wrapper branch from 66162fb to 6ea54f1 on January 31, 2026 at 01:52
@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34271 [ run ] triggered by Bot. Commit: 6ea54f1

@tensorrt-cicd (Collaborator)

PR_Github #34271 [ run ] completed with state SUCCESS. Commit: 6ea54f1
/LLM/main/L0_MergeRequest_PR pipeline #26431 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34324 [ run ] triggered by Bot. Commit: 6ea54f1

@tensorrt-cicd (Collaborator)

PR_Github #34324 [ run ] completed with state SUCCESS. Commit: 6ea54f1
/LLM/main/L0_MergeRequest_PR pipeline #26474 completed with status: 'SUCCESS'

@govind-ramnarayan govind-ramnarayan enabled auto-merge (squash) February 1, 2026 06:02
@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34339 [ run ] triggered by Bot. Commit: 794d8c1

@tensorrt-cicd (Collaborator)

PR_Github #34339 [ run ] completed with state SUCCESS. Commit: 794d8c1
/LLM/main/L0_MergeRequest_PR pipeline #26488 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34344 [ run ] triggered by Bot. Commit: 794d8c1

@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/eagle-wrapper branch from 794d8c1 to 3a94eac on February 1, 2026 at 08:30
@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34347 [ run ] triggered by Bot. Commit: 3a94eac

@tensorrt-cicd (Collaborator)

PR_Github #34347 [ run ] completed with state SUCCESS. Commit: 3a94eac
/LLM/main/L0_MergeRequest_PR pipeline #26495 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

… code with Llama + Eagle3. Acceptance rate seems good

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
…ded in full

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/eagle-wrapper branch from 3a94eac to 56ae5af on February 1, 2026 at 19:28
@govind-ramnarayan (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34369 [ run ] triggered by Bot. Commit: 56ae5af

@tensorrt-cicd (Collaborator)

PR_Github #34369 [ run ] completed with state SUCCESS. Commit: 56ae5af
/LLM/main/L0_MergeRequest_PR pipeline #26515 completed with status: 'SUCCESS'

@govind-ramnarayan govind-ramnarayan merged commit 585fbb2 into NVIDIA:main Feb 2, 2026
5 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in AutoDeploy Board Feb 2, 2026

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[AutoDeploy][Feature]: Eagle3 One Model [2/n]: One model impl (Vanilla PyTorch)

3 participants