[None][feat] Initial PR for trtllm-gen attention backend #10784
yihwang-nv merged 19 commits into NVIDIA:main from
Conversation
5d09597 to 0c0c796
/bot run
Caution: Docstrings generation FAILED. An unexpected error occurred while opening a pull request: Reference update failed - https://docs.github.com/rest/git/refs#create-a-reference
## 📝 Walkthrough

Introduces a feature-flag-controlled alternative attention backend (TRTLLM-Gen) that conditionally routes attention operations based on an environment variable. The new backend function is defined with extensive documentation but remains unimplemented (raises NotImplementedError).

## Changes

| Cohort / File(s) | Summary |
|---|---|
| Conditional routing in TrtllmAttentionWrapper: tensorrt_llm/_torch/attention_backend/trtllm.py | Adds module-level feature flag `_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION` (sourced from an environment variable, default "0"). Conditionally imports `trtllm_gen_attention` from `.trtllm_gen`. In `TrtllmAttentionWrapper.run()`, routes attention calls to either `trtllm_gen_attention()` or the existing `thop.attention()` based on the flag state, maintaining argument parity. |
| New attention backend stub: tensorrt_llm/_torch/attention_backend/trtllm_gen.py | Adds a new public `attention()` function with a comprehensive docstring covering prefill/decode phases, paged KV cache, MLA, speculative decoding, and quantization support. The function signature accepts an extensive tensor and parameter list but raises NotImplementedError, marking this as a placeholder awaiting implementation. |

## Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1 — Failed checks (1 warning)
✅ Passed checks (2 passed)
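The stub file described in the walkthrough can be pictured with a minimal sketch. This is hypothetical and heavily abbreviated: the real `attention()` in `trtllm_gen.py` accepts a much longer tensor and parameter list.

```python
# Hypothetical, abbreviated sketch of the placeholder backend described in
# the walkthrough; the real signature accepts many more tensors/parameters.
def attention(q, k, v, **kwargs):
    """Compute attention with the TRTLLM-Gen backend.

    Per the PR's docstring, this is intended to cover prefill/decode
    phases, paged KV cache, MLA, speculative decoding, and quantization,
    but it is currently an unimplemented placeholder.
    """
    raise NotImplementedError("TRTLLM-Gen attention is not implemented yet")


# Calling the stub raises, which is the expected behavior until the
# kernels are wired up.
try:
    attention(None, None, None)
except NotImplementedError as err:
    print(err)  # → TRTLLM-Gen attention is not implemented yet
```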
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/attention_backend/trtllm_gen.py`:
- Around line 1-4: This file is missing the required NVIDIA copyright header;
add the standard NVIDIA copyright block at the very top of trtllm_gen.py (before
any imports) with the appropriate year of latest meaningful modification and
owner information consistent with other TensorRT-LLM source files (include
SPDX/license tag if your project uses it) so the header matches the project
coding guidelines.
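For reference, such a header typically looks like the following. This is a sketch only: the exact year, owner line, and SPDX tag are assumptions and should be copied from the headers used by other TensorRT-LLM source files.

```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```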
🧹 Nitpick comments (2)
tensorrt_llm/_torch/attention_backend/trtllm.py (2)
28-34: Global variable naming convention. Per the coding guidelines, Python global variables should use upper snake_case with the prefix `G_` (e.g., `G_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION`). The leading underscore indicates internal use, which is fine, but the `G_` prefix is the project convention for globals.

Suggested naming:

```diff
-# Enable TRTLLM-Gen attention backend via environment variable.
-_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION = os.environ.get(
-    "TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION", "0") == "1"
+# Enable TRTLLM-Gen attention backend via environment variable.
+G_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION = os.environ.get(
+    "TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION", "0") == "1"

-if _TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION:
+if G_TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION:
     from .trtllm_gen import attention as trtllm_gen_attention
```

As per coding guidelines, Python global variables should use upper snake_case with the prefix `G_`.
507-588: LGTM - Feature-gated dispatch with functional parity. The conditional dispatch correctly routes to `trtllm_gen_attention` when the feature flag is enabled, with identical arguments passed to both paths. This ensures functional parity once the TRTLLM-Gen backend is implemented.

Note that enabling `TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION=1` will currently raise `NotImplementedError` since the backend is a placeholder. Consider adding a log message when the flag is enabled to inform users that this is an experimental/unimplemented path.

Optional: add an informational log at import time:

```diff
 if _TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION:
     from .trtllm_gen import attention as trtllm_gen_attention
+    logger.info("TRTLLM-Gen attention backend enabled (experimental)")
```
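The gating pattern under review can be sketched in isolation. The names below are hypothetical stand-ins, not the actual TensorRT-LLM code, which passes a long argument list through `TrtllmAttentionWrapper.run()`.

```python
import os

# Hypothetical minimal sketch of the feature-flag dispatch described above.
# In the PR, the flag is read once at module import time.
FLAG_ENABLED = os.environ.get(
    "TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION", "0") == "1"


def trtllm_gen_attention(*args, **kwargs):
    # Placeholder backend: documented but unimplemented, as in the PR.
    raise NotImplementedError("TRTLLM-Gen attention is not implemented yet")


def thop_attention(*args, **kwargs):
    # Stand-in for the existing thop.attention() path.
    return "thop-attention"


def run_attention(flag_enabled, *args, **kwargs):
    # Both branches receive identical arguments (argument parity), so the
    # backends stay interchangeable once TRTLLM-Gen is implemented.
    if flag_enabled:
        return trtllm_gen_attention(*args, **kwargs)
    return thop_attention(*args, **kwargs)


print(run_attention(False))  # → thop-attention
```

With the flag disabled the existing path is taken; with it enabled, the call currently raises `NotImplementedError`, which is why the reviewer suggests logging the experimental status at import time.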
PR_Github #32465 [ run ] triggered by Bot. Commit:
PR_Github #32465 [ run ] completed with state
0c0c796 to 88db82c
/bot run
PR_Github #32922 [ run ] triggered by Bot. Commit:
PR_Github #32922 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #33062 [ run ] triggered by Bot. Commit:
PR_Github #33062 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #33746 [ run ] triggered by Bot. Commit:
PR_Github #33746 [ run ] completed with state
88db82c to d21470e
/bot run --disable-fail-fast
PR_Github #33801 [ run ] triggered by Bot. Commit:
PR_Github #33801 [ run ] completed with state
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
/bot run --disable-fail-fast
PR_Github #35182 [ run ] triggered by Bot. Commit:
PR_Github #35182 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #35195 [ run ] triggered by Bot. Commit:
PR_Github #35195 [ run ] completed with state
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
/bot run --disable-fail-fast
PR_Github #35220 [ run ] triggered by Bot. Commit:
PR_Github #35220 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #35259 [ run ] triggered by Bot. Commit:
PR_Github #35259 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #35297 [ run ] triggered by Bot. Commit:
PR_Github #35297 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #35596 [ run ] triggered by Bot. Commit:
PR_Github #35596 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #35607 [ run ] triggered by Bot. Commit:
PR_Github #35607 [ run ] completed with state
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
Description

This is the initial PR to introduce the trtllm-gen attention backend. It will only use trtllm-gen fmha kernels and will be an experimental path of the `TrtllmAttention` backend. This backend is enabled iff `TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION=1`.
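The enabling condition can be illustrated with a short sketch. This is hypothetical, not the actual TensorRT-LLM module; only the environment variable name matches the PR. Because the flag is read at module import time, it must be set in the environment before the backend module is imported.

```python
import os

# Hypothetical sketch: set the flag before any import of the backend
# module, since the real module reads it once at import time.
os.environ["TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION"] = "1"


def trtllm_gen_enabled() -> bool:
    # Mirrors the module-level check: only the string "1" enables the
    # experimental TRTLLM-Gen path; anything else keeps the default.
    return os.environ.get("TRTLLM_ENABLE_TRTLLM_GEN_ATTENTION", "0") == "1"


print(trtllm_gen_enabled())  # → True
```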