[#9525][feat] add L2 norm pattern matcher and fusion transform #10767

lucaslie merged 1 commit into NVIDIA:main from
Conversation
📝 Walkthrough

This PR introduces L2Norm pattern matching and fusion transforms for TensorRT-LLM's PyTorch auto-deploy system. It adds configuration entries, implements pattern detection and backend-specific fusion logic, and includes comprehensive test coverage for the new transforms.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Input as GraphModule
    participant PMatcher as MatchL2NormPattern
    participant Registry as PatternRegistry
    participant Fusion as FuseL2Norm
    participant Backend as Backend Ops<br/>(fla/torch)
    actor Output as GraphModule
    Input->>PMatcher: Apply pattern matching
    PMatcher->>Registry: Register L2Norm patterns<br/>(with/without dtype cast)
    Registry-->>PMatcher: Pattern matches found
    PMatcher->>PMatcher: Replace matches with<br/>torch_l2norm op
    PMatcher-->>Output: Intermediate graph
    Output->>Fusion: Apply fusion transform
    Fusion->>Fusion: Validate backend<br/>("fla" or "torch")
    Fusion->>Fusion: Traverse graph nodes
    Fusion->>Backend: Swap torch_l2norm ops
    Backend-->>Fusion: Backend-specific ops
    Fusion->>Fusion: Recompile graph
    Fusion-->>Output: Fused GraphModule
```
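At the math level, the canonical op that stage one substitutes computes an L2 normalization over the last dimension, i.e. `x / sqrt(sum(x², -1) + eps)`. A minimal NumPy sketch of that computation (the function name and eps default are illustrative, not the actual op's signature):

```python
import numpy as np

def l2norm_ref(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reference L2 normalization over the last dimension.

    Mirrors the decomposed pattern the matcher detects:
    x * rsqrt(sum(x^2, dim=-1, keepdim=True) + eps).
    """
    sq_sum = np.sum(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(sq_sum + eps)

x = np.array([[3.0, 4.0]])
y = l2norm_ref(x)
# Each row of the result has (approximately) unit L2 norm.
print(np.linalg.norm(y, axis=-1))
```

The small eps keeps the division stable for near-zero rows, which is why the matched pattern carries an explicit add before the rsqrt.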
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py`:
- Line 1: The file-level docstring in l2_norm.py is missing the required NVIDIA
copyright header; add the standard multi-line NVIDIA copyright/header block
(matching the header used in adjacent TensorRT-LLM Python sources) at the top of
the file above the existing module docstring, update the modification year to
2026, and ensure the header formatting and SPDX/license lines exactly match the
project's other source files (use l2_norm.py and the module-level docstring to
locate the insertion point).
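For reference, a header in the SPDX style that comment describes might look like the fragment below; the exact wording, years, and license identifier must be copied from the project's existing sources rather than from this sketch:

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```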
🧹 Nitpick comments (2)
tensorrt_llm/_torch/auto_deploy/transform/library/l2_norm.py (1)

3-21: Use module-namespace imports for internal modules

The file uses multiple `from ... import ...` statements (internal modules, pydantic, torch.fx, typing). Please switch to module imports and qualify usages (e.g., `node_utils.is_op`, `pattern_matcher.ADPatternMatcherPass`) to preserve namespaces. As per coding guidelines.

tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_fuse_l2norm.py (1)
6-6: Avoid wildcard import; keep namespace for custom ops

Switch to a module import to preserve the namespace while still registering the ops.
♻️ Proposed change

```diff
-from tensorrt_llm._torch.auto_deploy.custom_ops.l2norm import *  # noqa
+import tensorrt_llm._torch.auto_deploy.custom_ops.l2norm as l2norm_ops  # noqa: F401
```

Note: tests under `tests/` don't require NVIDIA headers. As per coding guidelines; based on learnings.
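The namespace-preserving style the review asks for, shown with a stdlib module purely for illustration (`json` stands in for the internal modules named above):

```python
# Module import keeps the namespace visible at every call site,
# instead of pulling individual names into the local scope.
import json

data = json.dumps({"backend": "torch"})  # qualified: clear where dumps comes from
restored = json.loads(data)
print(restored["backend"])
```

The side-effect of registering custom ops still happens at import time, so the module-level import is sufficient even when no name from it is referenced directly (hence the `# noqa: F401`).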
13bfe98 to 8e55921

lucaslie left a comment:

looks great. just a small comment. Please update it and then feel free to get it merged :)

8e55921 to 40d2f50

/bot run
PR_Github #32779 [ run ] triggered by Bot. Commit:
PR_Github #32779 [ run ] completed with state

b2f95de to 52d4bb6

/bot run --add-multi-gpu-test
PR_Github #32992 [ run ] triggered by Bot. Commit:
PR_Github #32992 [ run ] completed with state

52d4bb6 to c69136c

/bot run --add-multi-gpu-test
PR_Github #33006 [ run ] triggered by Bot. Commit:
PR_Github #33006 [ run ] completed with state

c69136c to 47c2fea

/bot run --add-multi-gpu-test
PR_Github #33190 [ run ] triggered by Bot. Commit:
PR_Github #33190 [ run ] completed with state

47c2fea to 1c1c55b

/bot run --add-multi-gpu-test
PR_Github #33756 [ run ] triggered by Bot. Commit:
PR_Github #33756 [ run ] completed with state

/bot run --add-multi-gpu-test
PR_Github #33767 [ run ] triggered by Bot. Commit:
PR_Github #33767 [ run ] completed with state

1c1c55b to 8a43c7d

Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>

8a43c7d to c8f76e0

/bot run
PR_Github #33776 [ run ] triggered by Bot. Commit:
PR_Github #33776 [ run ] completed with state
Description
This PR adds L2 normalization pattern matching and fusion transforms to the TensorRT-LLM AutoDeploy system, following the established two-stage pattern matching approach used by RMS norm (see #9969).
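The two-stage approach can be sketched as a toy: stage one collapses the decomposed op sequence into a single canonical op, and stage two swaps that op for a backend-specific one. The real transforms operate on torch.fx graphs; the list-of-ops "graph" and all op names below are purely illustrative:

```python
from typing import List

# Stage 1 (pattern matching): collapse the decomposed sequence into one canonical op.
L2NORM_PATTERN = ["pow", "sum", "add_eps", "rsqrt", "mul"]

def match_l2norm(graph: List[str]) -> List[str]:
    out, i = [], 0
    while i < len(graph):
        if graph[i:i + len(L2NORM_PATTERN)] == L2NORM_PATTERN:
            out.append("torch_l2norm")       # canonical reference op
            i += len(L2NORM_PATTERN)
        else:
            out.append(graph[i])
            i += 1
    return out

# Stage 2 (fusion): swap the canonical op for a backend-specific kernel.
def fuse_l2norm(graph: List[str], backend: str) -> List[str]:
    if backend not in ("fla", "torch"):
        raise ValueError(f"unsupported backend: {backend}")
    return [f"{backend}_l2norm" if op == "torch_l2norm" else op for op in graph]

g = ["linear", "pow", "sum", "add_eps", "rsqrt", "mul", "linear"]
print(fuse_l2norm(match_l2norm(g), "fla"))
# ['linear', 'fla_l2norm', 'linear']
```

Splitting matching from fusion keeps the canonicalized graph backend-agnostic until the final swap, which is the same structure the RMS norm transforms use.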
Configuration
The transforms are configured in
default.yaml.

PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
Summary by CodeRabbit
New Features
Tests