[#9525][feat] add L2 norm pattern matcher and fusion transform #10767

lucaslie merged 1 commit into NVIDIA:main from
Conversation
📝 Walkthrough

This PR introduces L2Norm pattern matching and fusion transforms for TensorRT-LLM's PyTorch auto-deploy system. It adds configuration entries, implements pattern detection and backend-specific fusion logic, and includes comprehensive test coverage for the new transforms.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Input as GraphModule
    participant PMatcher as MatchL2NormPattern
    participant Registry as PatternRegistry
    participant Fusion as FuseL2Norm
    participant Backend as Backend Ops<br/>(fla/torch)
    actor Output as GraphModule
    Input->>PMatcher: Apply pattern matching
    PMatcher->>Registry: Register L2Norm patterns<br/>(with/without dtype cast)
    Registry-->>PMatcher: Pattern matches found
    PMatcher->>PMatcher: Replace matches with<br/>torch_l2norm op
    PMatcher-->>Output: Intermediate graph
    Output->>Fusion: Apply fusion transform
    Fusion->>Fusion: Validate backend<br/>("fla" or "torch")
    Fusion->>Fusion: Traverse graph nodes
    Fusion->>Backend: Swap torch_l2norm ops
    Backend-->>Fusion: Backend-specific ops
    Fusion->>Fusion: Recompile graph
    Fusion-->>Output: Fused GraphModule
```
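At the math level, the canonical op that stage one substitutes computes an L2 normalization over the last dimension, i.e. `x / sqrt(sum(x², -1) + eps)`. A minimal NumPy sketch of that computation (the function name and eps default are illustrative, not the actual op's signature):

```python
import numpy as np

def l2norm_ref(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reference L2 normalization over the last dimension.

    Mirrors the decomposed pattern the matcher detects:
    x * rsqrt(sum(x^2, dim=-1, keepdim=True) + eps).
    """
    sq_sum = np.sum(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(sq_sum + eps)

x = np.array([[3.0, 4.0]])
y = l2norm_ref(x)
# Each row of the result has (approximately) unit L2 norm.
print(np.linalg.norm(y, axis=-1))
```

The small eps keeps the division stable for near-zero rows, which is why the matched pattern carries an explicit add before the rsqrt.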
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py`:
- Line 1: The file-level docstring in l2_norm.py is missing the required NVIDIA
copyright header; add the standard multi-line NVIDIA copyright/header block
(matching the header used in adjacent TensorRT-LLM Python sources) at the top of
the file above the existing module docstring, update the modification year to
2026, and ensure the header formatting and SPDX/license lines exactly match the
project's other source files (use l2_norm.py and the module-level docstring to
locate the insertion point).
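For reference, a header in the SPDX style that comment describes might look like the fragment below; the exact wording, years, and license identifier must be copied from the project's existing sources rather than from this sketch:

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```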
🧹 Nitpick comments (2)
tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py (1)

3-21: Use module-namespace imports for internal modules

The file uses multiple `from ... import ...` statements (internal modules, pydantic, torch.fx, typing). Please switch to module imports and qualify usages (e.g., `node_utils.is_op`, `pattern_matcher.ADPatternMatcherPass`) to preserve namespaces. As per coding guidelines.

tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_l2norm.py (1)
6-6: Avoid wildcard import; keep namespace for custom ops

Switch to a module import to preserve the namespace while still registering the ops.
♻️ Proposed change

```diff
-from tensorrt_llm._torch.auto_deploy.custom_ops.l2norm import *  # noqa
+import tensorrt_llm._torch.auto_deploy.custom_ops.l2norm as l2norm_ops  # noqa: F401
```

Note: tests under `tests/` don't require NVIDIA headers. As per coding guidelines; based on learnings.
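The namespace-preserving style the review asks for, shown with a stdlib module purely for illustration (`json` stands in for the internal modules named above):

```python
# Module import keeps the namespace visible at every call site,
# instead of pulling individual names into the local scope.
import json

data = json.dumps({"backend": "torch"})  # qualified: clear where dumps comes from
restored = json.loads(data)
print(restored["backend"])
```

The side-effect of registering custom ops still happens at import time, so the module-level import is sufficient even when no name from it is referenced directly (hence the `# noqa: F401`).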
13bfe98 to 8e55921

lucaslie left a comment:

looks great. just a small comment. Please update it and then feel free to get it merged :)

8e55921 to 40d2f50

/bot run
PR_Github #32779 [ run ] triggered by Bot. Commit:
PR_Github #32779 [ run ] completed with state

b2f95de to 52d4bb6

/bot run --add-multi-gpu-test
PR_Github #32992 [ run ] triggered by Bot. Commit:
PR_Github #32992 [ run ] completed with state

52d4bb6 to c69136c

/bot run --add-multi-gpu-test
PR_Github #33006 [ run ] triggered by Bot. Commit:
PR_Github #33006 [ run ] completed with state

c69136c to 47c2fea

/bot run --add-multi-gpu-test
PR_Github #33190 [ run ] triggered by Bot. Commit:
PR_Github #33190 [ run ] completed with state

47c2fea to 1c1c55b

/bot run --add-multi-gpu-test
PR_Github #33756 [ run ] triggered by Bot. Commit:
PR_Github #33756 [ run ] completed with state

/bot run --add-multi-gpu-test
PR_Github #33767 [ run ] triggered by Bot. Commit:
PR_Github #33767 [ run ] completed with state

1c1c55b to 8a43c7d

Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>

8a43c7d to c8f76e0

/bot run
PR_Github #33776 [ run ] triggered by Bot. Commit:
PR_Github #33776 [ run ] completed with state
Description
This PR adds L2 normalization pattern matching and fusion transforms to the TensorRT-LLM AutoDeploy system, following the established two-stage pattern matching approach used by RMS norm (see #9969).
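The two-stage approach can be sketched as a toy: stage one collapses the decomposed op sequence into a single canonical op, and stage two swaps that op for a backend-specific one. The real transforms operate on torch.fx graphs; the list-of-ops "graph" and all op names below are purely illustrative:

```python
from typing import List

# Stage 1 (pattern matching): collapse the decomposed sequence into one canonical op.
L2NORM_PATTERN = ["pow", "sum", "add_eps", "rsqrt", "mul"]

def match_l2norm(graph: List[str]) -> List[str]:
    out, i = [], 0
    while i < len(graph):
        if graph[i:i + len(L2NORM_PATTERN)] == L2NORM_PATTERN:
            out.append("torch_l2norm")       # canonical reference op
            i += len(L2NORM_PATTERN)
        else:
            out.append(graph[i])
            i += 1
    return out

# Stage 2 (fusion): swap the canonical op for a backend-specific kernel.
def fuse_l2norm(graph: List[str], backend: str) -> List[str]:
    if backend not in ("fla", "torch"):
        raise ValueError(f"unsupported backend: {backend}")
    return [f"{backend}_l2norm" if op == "torch_l2norm" else op for op in graph]

g = ["linear", "pow", "sum", "add_eps", "rsqrt", "mul", "linear"]
print(fuse_l2norm(match_l2norm(g), "fla"))
# ['linear', 'fla_l2norm', 'linear']
```

Splitting matching from fusion keeps the canonicalized graph backend-agnostic until the final swap, which is the same structure the RMS norm transforms use.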
Configuration
The transforms are configured in
default.yaml.

PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
Summary by CodeRabbit
New Features
Tests