[None][feat] Initial PR for trtllm-gen attention backend#10784

Merged
yihwang-nv merged 19 commits into NVIDIA:main from yihwang-nv:yihwang/trtllm-gen-attn-01
Feb 11, 2026

Conversation

@yihwang-nv
Collaborator

@yihwang-nv commented Jan 18, 2026

Description

This is the initial PR to introduce the trtllm-gen attention backend. It will use only trtllm-gen fmha kernels and will be an experimental path of the TrtllmAttention backend.

This backend is enabled only when TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION=1 is set.

Summary by CodeRabbit

  • New Features
    • Added support for TRTLLM-Gen attention backend, controllable via environment flag for opt-in activation. The backend selection mechanism allows switching between attention implementations without altering the public API.


@yihwang-nv marked this pull request as ready for review January 18, 2026 16:46
@yihwang-nv requested a review from a team as a code owner January 18, 2026 16:46
@yihwang-nv requested a review from yuxianq January 18, 2026 16:46
@yihwang-nv force-pushed the yihwang/trtllm-gen-attn-01 branch from 5d09597 to 0c0c796 on January 18, 2026 16:48
@yihwang-nv
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Jan 18, 2026

Caution

Docstrings generation - FAILED

An unexpected error occurred while opening a pull request: Reference update failed - https://docs.github.com/rest/git/refs#create-a-reference

@NVIDIA deleted a comment from coderabbitai bot Jan 18, 2026
@coderabbitai
Contributor

coderabbitai bot commented Jan 18, 2026

Oops, something went wrong! Please try again later. 🐰 💔

@coderabbitai
Contributor

coderabbitai bot commented Jan 18, 2026

📝 Walkthrough

Introduces a feature-flag-controlled alternative attention backend (TRTLLM-Gen) that conditionally routes attention operations based on an environment variable. The new backend function is defined with extensive documentation but remains unimplemented (raises NotImplementedError).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Conditional routing in TrtllmAttentionWrapper (tensorrt_llm/_torch/attention_backend/trtllm.py) | Adds module-level feature flag _TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION (sourced from an environment variable, default "0"). Conditionally imports trtllm_gen_attention from .trtllm_gen. In TrtllmAttentionWrapper.run(), routes attention calls to either trtllm_gen_attention() or the existing thop.attention() based on flag state, maintaining argument parity. |
| New attention backend stub (tensorrt_llm/_torch/attention_backend/trtllm_gen.py) | Adds a new public attention() function with a comprehensive docstring covering prefill/decode phases, paged KV cache, MLA, speculative decoding, and quantization support. The function signature accepts an extensive tensor and parameter list but raises NotImplementedError, marking it as a placeholder awaiting implementation. |

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
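The placeholder described in the second row can be sketched as follows. This is a simplified signature for illustration; the real attention() in trtllm_gen.py takes an extensive tensor and parameter list:

```python
from typing import Any


def attention(q: Any, k: Any, v: Any, **kwargs: Any) -> Any:
    """Compute attention with the trtllm-gen fmha kernels.

    Per the walkthrough, the real docstring covers prefill/decode
    phases, paged KV cache, MLA, speculative decoding, and
    quantization; the body stays a stub until the kernels are wired up.
    """
    raise NotImplementedError(
        "The TRTLLM-Gen attention backend is not implemented yet.")
```

Defining the full signature and docstring up front while raising NotImplementedError lets the dispatch code land first and keeps later kernel-integration PRs focused on the body.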

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Description check ⚠️ Warning: The PR description is incomplete. It lacks the Test Coverage section and the PR Checklist required by the template; only the Description section is partially filled. Resolution: add a Test Coverage section explaining the relevant tests safeguarding the changes, and complete the PR Checklist by addressing items such as CODEOWNERS updates and documentation.

✅ Passed checks (2 passed)
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Title check ✅ Passed: The title clearly describes the main change, introducing the trtllm-gen attention backend.



@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `tensorrt_llm/_torch/attention_backend/trtllm_gen.py`:
- Around lines 1-4: This file is missing the required NVIDIA copyright header. Add the standard NVIDIA copyright block at the very top of trtllm_gen.py (before any imports), with the year of the latest meaningful modification and owner information consistent with other TensorRT-LLM source files (include the SPDX/license tag if the project uses it), so that the header matches the project coding guidelines.
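For reference, a header of this shape is common in NVIDIA open-source projects (illustrative only; the exact wording, years, and license identifier must be copied from the other TensorRT-LLM source files rather than from this sketch):

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```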
🧹 Nitpick comments (2)
tensorrt_llm/_torch/attention_backend/trtllm.py (2)

28-34: Global variable naming convention.

Per the coding guidelines, Python global variables should use upper snake_case with prefix G (e.g., G_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION). The leading underscore indicates internal use, which is fine, but the G_ prefix is the project convention for globals.

Suggested naming
-# Enable TRTLLM-Gen attention backend via environment variable.
-_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION = os.environ.get(
-    "TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION", "0") == "1"
+# Enable TRTLLM-Gen attention backend via environment variable.
+G_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION = os.environ.get(
+    "TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION", "0") == "1"

-if _TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION:
+if G_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION:
     from .trtllm_gen import attention as trtllm_gen_attention

As per coding guidelines, Python global variables should use upper snake_case with prefix G.


507-588: LGTM - Feature-gated dispatch with functional parity.

The conditional dispatch correctly routes to trtllm_gen_attention when the feature flag is enabled, with identical arguments passed to both paths. This ensures functional parity once the TRTLLM-Gen backend is implemented.

Note that enabling TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION=1 will currently raise NotImplementedError since the backend is a placeholder. Consider adding a log message when the flag is enabled to inform users that this is an experimental/unimplemented path.

Optional: Add informational log at import time
 if _TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION:
     from .trtllm_gen import attention as trtllm_gen_attention
+    logger.info("TRTLLM-Gen attention backend enabled (experimental)")

@tensorrt-cicd
Collaborator

PR_Github #32465 [ run ] triggered by Bot. Commit: 0c0c796

@tensorrt-cicd
Collaborator

PR_Github #32465 [ run ] completed with state SUCCESS. Commit: 0c0c796
/LLM/main/L0_MergeRequest_PR pipeline #25150 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yihwang-nv force-pushed the yihwang/trtllm-gen-attn-01 branch from 0c0c796 to 88db82c on January 21, 2026 08:03
@yihwang-nv
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #32922 [ run ] triggered by Bot. Commit: 88db82c

@tensorrt-cicd
Collaborator

PR_Github #32922 [ run ] completed with state SUCCESS. Commit: 88db82c
/LLM/main/L0_MergeRequest_PR pipeline #25464 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #33062 [ run ] triggered by Bot. Commit: 88db82c

@tensorrt-cicd
Collaborator

PR_Github #33062 [ run ] completed with state SUCCESS. Commit: 88db82c
/LLM/main/L0_MergeRequest_PR pipeline #25558 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@juney-nvidia changed the title from "[None][feat] Initial patch for trtllm-gen attention backend" to "[None][feat] Initial PR for trtllm-gen attention backend" Jan 26, 2026
@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #33746 [ run ] triggered by Bot. Commit: 88db82c

@tensorrt-cicd
Collaborator

PR_Github #33746 [ run ] completed with state FAILURE. Commit: 88db82c
/LLM/main/L0_MergeRequest_PR pipeline #26028 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yihwang-nv force-pushed the yihwang/trtllm-gen-attn-01 branch from 88db82c to d21470e on January 28, 2026 02:35
@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #33801 [ run ] triggered by Bot. Commit: d21470e

@tensorrt-cicd
Collaborator

PR_Github #33801 [ run ] completed with state SUCCESS. Commit: d21470e
/LLM/main/L0_MergeRequest_PR pipeline #26069 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yihwang-nv requested a review from a team as a code owner January 29, 2026 07:52
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35182 [ run ] triggered by Bot. Commit: 0762274

@tensorrt-cicd
Collaborator

PR_Github #35182 [ run ] completed with state SUCCESS. Commit: 0762274
/LLM/main/L0_MergeRequest_PR pipeline #27171 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35195 [ run ] triggered by Bot. Commit: 0762274

@tensorrt-cicd
Collaborator

PR_Github #35195 [ run ] completed with state SUCCESS. Commit: 0762274
/LLM/main/L0_MergeRequest_PR pipeline #27183 completed with status: 'SUCCESS'

Signed-off-by: Yihan Wang <yihwang@nvidia.com>
@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35220 [ run ] triggered by Bot. Commit: d478b02

@tensorrt-cicd
Collaborator

PR_Github #35220 [ run ] completed with state SUCCESS. Commit: d478b02
/LLM/main/L0_MergeRequest_PR pipeline #27207 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35259 [ run ] triggered by Bot. Commit: d478b02

@tensorrt-cicd
Collaborator

PR_Github #35259 [ run ] completed with state SUCCESS. Commit: d478b02
/LLM/main/L0_MergeRequest_PR pipeline #27224 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35297 [ run ] triggered by Bot. Commit: d478b02

@tensorrt-cicd
Collaborator

PR_Github #35297 [ run ] completed with state SUCCESS. Commit: d478b02
/LLM/main/L0_MergeRequest_PR pipeline #27256 completed with status: 'SUCCESS'

@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35596 [ run ] triggered by Bot. Commit: d478b02

@tensorrt-cicd
Collaborator

PR_Github #35596 [ run ] completed with state ABORTED. Commit: d478b02
LLM/main/L0_MergeRequest_PR #27494 (Blue Ocean) completed with status: ABORTED

@yihwang-nv
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35607 [ run ] triggered by Bot. Commit: d478b02

@tensorrt-cicd
Collaborator

PR_Github #35607 [ run ] completed with state SUCCESS. Commit: d478b02
/LLM/main/L0_MergeRequest_PR pipeline #27504 completed with status: 'SUCCESS'

@yihwang-nv merged commit e8b8609 into NVIDIA:main Feb 11, 2026
5 checks passed
ekou24 pushed a commit to ekou24/TensorRT-LLM that referenced this pull request Feb 16, 2026
Signed-off-by: Yihan Wang <yihwang@nvidia.com>