[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM#11462

Merged
pcastonguay merged 13 commits into NVIDIA:main from chang-l:feat/dev_aigv
Feb 13, 2026
Conversation


@chang-l chang-l commented Feb 12, 2026

This PR introduces initial support for vision generation models in TRT-LLM, starting with the Wan 2.1 / Wan 2.2 models and covering both Text-to-Video (T2V) and Image-to-Video (I2V) workflows.

To run:
Wan 2.1 Text-to-Video Example:

python examples/visual_gen/visual_gen_wan_t2v.py \
    --height 480 \
    --width 832 \
    --num_frames 33 \
    --model_path ${PATH_TO_MODEL} \
    --prompt "A cute cat playing piano" \
    --output_path wan_cat_piano.png

Wan 2.2 Image-to-Video Example:

python examples/visual_gen/visual_gen_wan_i2v.py \
    --height 480 \
    --width 832 \
    --num_frames 81 \
    --model_path ${PATH_TO_MODEL} \
    --image_path examples/visual_gen/cat_piano.png \
    --prompt "It snows as the cat plays piano, lots of snow \
    appearing all over the screen, snowflakes, blizzard,
    gradually more snow" \
    --negative_prompt "blurry, low quality" \
    --output_path wan22_i2v_cat_piano_optimized.gif \
    --linear_type trtllm-fp8-blockwise \
    --attention_backend TRTLLM \
    --enable_teacache \
    --teacache_thres 0.2 \
    --guidance_scale 6.0 \
    --guidance_scale_2 5.0 \
    --boundary_ratio 0.85

Co-authored with @o-stoner @QiJune @JunyiXu-nv @yibinl-nvidia @chang-l

Summary by CodeRabbit

Release Notes

New Features

  • Visual Generation API: Added text-to-image (FLUX2), text-to-video (WAN), and image-to-video generation capabilities with OpenAI-compatible endpoints.
  • Synchronous and Asynchronous Workflows: Support for both blocking and non-blocking generation requests.
  • Video Management: List, retrieve metadata, download, and delete generated videos.
  • Performance Optimizations: TeaCache acceleration and FP8 quantization for reduced memory usage.
  • Distributed Inference: Multi-GPU support with sequence parallelism for faster generation.

Documentation

  • Added comprehensive README files with setup instructions, quick start guides, and usage examples for all visual generation models.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@chang-l chang-l requested review from a team as code owners February 12, 2026 02:27

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

This pull request adds comprehensive visual generation capabilities to TensorRT LLM, including distributed pipelines for text-to-video (WAN), text-to-image (FLUX2), and video+audio generation (LTX2). It introduces baseline examples using HuggingFace Diffusers, a complete TensorRT LLM implementation with distributed execution, OpenAI-compatible API endpoints, quantization support, and optimization features like TeaCache.

Changes

Cohort / File(s) Summary
Visual Generation Example Scripts
examples/visual_gen/hf_*.py, examples/visual_gen/visual_gen_wan_*.py, examples/visual_gen/output_handler.py
Adds baseline example scripts for FLUX2 image generation, WAN and LTX2 video generation, and utility modules for post-processing and saving outputs from diffusion models.
Visual Generation Bash Scripts
examples/visual_gen/*.sh
Introduces execution scripts for running baseline HuggingFace Diffusers tests and WAN T2V/I2V examples with various optimization configurations and multi-GPU support.
Visual Generation API Examples
examples/visual_gen/serve/*.py, examples/visual_gen/serve/configs/*.yml
Adds OpenAI SDK-based example scripts for synchronous and asynchronous image/video generation, video deletion, and YAML configuration files for model optimization settings.
Visual Generation Documentation
examples/visual_gen/README.md, examples/visual_gen/serve/README.md
Comprehensive documentation covering prerequisites, quick start, environment variables, usage guides, API endpoints, and troubleshooting for visual generation examples.
Core Visual Gen Configuration
tensorrt_llm/_torch/visual_gen/config.py
Introduces comprehensive configuration dataclasses (DiffusionArgs, DiffusionModelConfig, AttentionConfig, ParallelConfig, TeaCacheConfig) for managing diffusion pipeline parameters, quantization, and parallelism settings.
Attention Backend Implementations
tensorrt_llm/_torch/visual_gen/attention_backend/*.py
Adds multiple attention backend implementations: VANILLA (PyTorch SDPA), TRTLLM (optimized), Ulysses (sequence parallelism), with factory utilities and configuration interfaces.
WAN Pipeline and Transformer
tensorrt_llm/_torch/visual_gen/models/wan/*.py
Implements WAN text-to-video and image-to-video pipelines with two-stage denoising, TeaCache integration, RoPE embeddings, and full transformer architecture with quantization support.
Visual Gen Core Infrastructure
tensorrt_llm/_torch/visual_gen/{executor,pipeline,pipeline_loader,pipeline_registry,output,utils,teacache,parallelism}.py
Foundational components including distributed executor with IPC, base pipeline orchestration, model loading, registry-based factory, output container, video post-processing, TeaCache optimization, and sequence parallelism setup.
Quantization Support
tensorrt_llm/_torch/visual_gen/quantization/*.py
Implements FP8 per-tensor and blockwise quantization operations, dynamic weight loader for diffusion models supporting fused QKV configurations, with integration into the loading pipeline.
Weight Checkpointing
tensorrt_llm/_torch/visual_gen/checkpoints/*.py
Adds checkpoint loading utilities supporting both standalone and pipeline-formatted models, with safetensors and PyTorch format support, including progress tracking.
Module-level Initializers
tensorrt_llm/_torch/visual_gen/__init__.py, tensorrt_llm/_torch/visual_gen/modules/__init__.py, tensorrt_llm/_torch/visual_gen/models/__init__.py, tensorrt_llm/_torch/visual_gen/models/wan/__init__.py, tensorrt_llm/_torch/visual_gen/attention_backend/__init__.py, tensorrt_llm/_torch/visual_gen/checkpoints/__init__.py, tensorrt_llm/_torch/visual_gen/quantization/__init__.py
Establishes public API surfaces and module organization for visual generation components.
Distributed Utilities
tensorrt_llm/_torch/distributed/__init__.py, tensorrt_llm/_torch/distributed/ops.py
Adds all-to-all 4D redistribution operation for tensor parallelism in diffusion models, exported via module's public API.
Linear Module Enhancements
tensorrt_llm/_torch/modules/linear.py
Extends Linear module handling to support multi-dimensional inputs by flattening/reshaping for GEMM operations while preserving output shapes.
Modeling Utilities
tensorrt_llm/_torch/models/modeling_utils.py
Minor formatting change to init_ops set literal in MetaInitMode for improved readability.
Command Line Utilities
tensorrt_llm/commands/utils.py, tensorrt_llm/commands/serve.py
Adds model path resolution, downloading, diffusion model detection, and dispatcher logic for visual generation serving; rewires server instantiation to use generalized generator parameter.
Visual Generation Front-end API
tensorrt_llm/llmapi/visual_gen.py
Implements high-level VisualGen API with distributed multiprocessing executor, async/sync generation methods, IPC-based communication, and future-like result handling via DiffusionRemoteClient.
API Module Exports
tensorrt_llm/llmapi/__init__.py
Exports VisualGen and VisualGenParams for public API access.
Input Data Types
tensorrt_llm/inputs/data.py
Adds TypedDict types and validation for visual generation prompts (text and token-based) with optional negative prompts.
API Utilities
tensorrt_llm/llmapi/utils.py, tensorrt_llm/llmapi/disagg_utils.py
Introduces generic HuggingFace partial downloader and new ServerRole.VISUAL_GEN enum value for server role identification.
Server Integration
tensorrt_llm/serve/openai_server.py, tensorrt_llm/serve/openai_protocol.py, tensorrt_llm/serve/visual_gen_utils.py, tensorrt_llm/serve/media_storage.py
Integrates visual generation into OpenAI-compatible server with image/video generation and editing endpoints, request parsing, media storage, and comprehensive protocol support including async job management.
Ray Stub Update
tensorrt_llm/ray_stub.py
Defers Ray import-time error to usage time when MPI is disabled, improving startup experience when Ray is unavailable.
Dependencies
requirements.txt
Adds python-multipart dependency for multipart form data handling.
Comprehensive Test Suite
tests/unittest/_torch/visual_gen/test_*.py, tests/unittest/_torch/visual_gen/multi_gpu/*.py
Extensive unit and integration tests covering FP8 quantization operations, fused QKV loading, attention backends, performance benchmarks, model loading pipelines, WAN and I2V pipeline variants, distributed CFG parallelism, TeaCache optimization, and server endpoints.
Integration Tests
tests/integration/defs/examples/test_visual_gen.py
Adds VBench-based dimension scoring comparison tests for WAN TRT-LLM outputs against diffusers reference.
Test Configuration Updates
tests/integration/test_lists/test-db/l0_b200.yml, tests/integration/test_lists/test-db/l0_dgx_b200.yml
Registers new visual generation unit tests, multi-GPU tests, and integration tests in the test suite configuration.
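One of the changes above, the Linear module's flatten-for-GEMM handling, can be sketched as follows. This is an illustrative standalone function, not the actual `tensorrt_llm/_torch/modules/linear.py` implementation: GEMM kernels want 2-D operands, so higher-rank activations are flattened, multiplied, and restored.

```python
import torch

def linear_nd(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Apply y = x @ weight.T for inputs of rank >= 2.

    weight has shape (out_features, in_features); x may be
    (batch, seq, hidden) or plain (rows, hidden).
    """
    original_shape = x.shape
    if x.dim() > 2:
        # Collapse all leading dims so the GEMM sees a 2-D matrix.
        x = x.reshape(-1, original_shape[-1])
    y = x @ weight.t()
    if len(original_shape) > 2:
        # Restore leading dims, with out_features as the last dim.
        y = y.reshape(*original_shape[:-1], y.shape[-1])
    return y

x = torch.randn(2, 5, 8)   # (batch, seq, hidden)
w = torch.randn(16, 8)     # (out_features, in_features)
assert linear_nd(x, w).shape == (2, 5, 16)
```

The same function also passes 2-D inputs through untouched, which mirrors why the reshape must be guarded rather than applied unconditionally.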

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 55.45%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Description check ⚠️ Warning — The PR description is incomplete and lacks critical information required by the template. Resolution: add a comprehensive 'Description' section explaining the issue, solution, and design; add a 'Test Coverage' section listing relevant tests; complete all checklist items and verify the PR follows CODING_GUIDELINES.

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title '[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM' is clear and directly related to the main purpose of the PR, which is adding vision generation model support.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/modules/linear.py (1)

1239-1270: ⚠️ Potential issue | 🔴 Critical

Reshape logic crashes for Fp4QuantizedTensor and tuple inputs.

The apply() method accepts input: Union[torch.Tensor, Fp4QuantizedTensor] (passed from Linear.forward()), but the reshape block (lines 1241–1245) runs before _input_prepare() and calls .shape and .dim() unconditionally:

  • For tuple inputs: .shape (line 1243) → AttributeError
  • For Fp4QuantizedTensor inputs: .dim() (line 1244) → AttributeError (has .shape property but no .dim() method)

Guard the reshape to only apply to plain torch.Tensor inputs:

Proposed fix
     def apply(self, module: Linear, input: torch.Tensor,
               bias: Optional[torch.Tensor]):
-        # Handle multi-dimensional inputs (e.g., 3D: batch, seq, hidden)
-        # GEMM ops require 2D matrices
-        original_shape = input.shape
-        if input.dim() > 2:
-            input = input.reshape(-1, input.shape[-1])
-
+        # Handle multi-dimensional inputs (e.g., 3D: batch, seq, hidden)
+        # GEMM ops require 2D matrices
+        original_shape = None
+        if isinstance(input, torch.Tensor) and input.dim() > 2:
+            original_shape = input.shape
+            input = input.reshape(-1, input.shape[-1])
+
         act_fp4, act_sf = self._input_prepare(module, input)
         # Use unified interface - supports CUTLASS, cuBLASLt, CuteDSL
         ...
-        # Reshape output back to original shape (with out_features as last dim)
-        if len(original_shape) > 2:
-            output = output.reshape(*original_shape[:-1], output.shape[-1])
-
+        # Reshape output back to original shape (with out_features as last dim)
+        if original_shape is not None and len(original_shape) > 2:
+            output = output.reshape(*original_shape[:-1], output.shape[-1])
+
         if bias is not None:
tensorrt_llm/commands/serve.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add/update the NVIDIA Apache-2.0 header for this modified source file.

🛠️ Proposed fix
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  import asyncio

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py`:
- Around line 218-226: The code uses seq_len (and seq_len_kv) which may be
torch.Tensor scalars when computing shapes for q.view/k.view/v.view and when
calling self.metadata.prepare; coerce these to Python ints before any arithmetic
or .view() calls (e.g., convert seq_len and seq_len_kv to int(seq_len.item())
when they are tensors) so batch_size * seq_len yields an int and .view()
receives integer shape args; update the section around q = q.view(...), k =
k.view(...), v = v.view(...) and the call to self.metadata.prepare(batch_size,
seq_len) to use the converted int values.
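The coercion the comment asks for can be sketched as below; `as_int` is a hypothetical helper, not the actual `trtllm.py` code, but it shows why `.view()` and shape arithmetic need Python ints rather than tensor scalars:

```python
import torch

def as_int(x) -> int:
    """Coerce a possibly-tensor scalar sequence length to a Python int."""
    return int(x.item()) if isinstance(x, torch.Tensor) else int(x)

seq_len = torch.tensor(16)   # metadata sometimes carries lengths as tensors
batch_size = 2
# batch_size * as_int(seq_len) is a plain int, so .view() gets integer shapes.
q = torch.randn(batch_size * as_int(seq_len), 8)
q = q.view(batch_size, as_int(seq_len), 8)
assert q.shape == (2, 16, 8)
```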

In `@tensorrt_llm/_torch/visual_gen/executor.py`:
- Around line 164-178: The broadcast loop in serve_forever currently lets rank 0
continue when the queue is empty, causing other ranks to still call
dist.broadcast_object_list and deadlock; change the rank 0 path (using self.rank
and self.requests_ipc) to block on receiving a request (use a blocking get()
instead of poll+continue) so rank 0 always produces a value for obj_list before
calling dist.broadcast_object_list, ensuring all ranks participate in each
broadcast and stay synchronized.
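A minimal sketch of the pattern being requested, assuming a queue-based request source (the function name and queue are illustrative; `torch.distributed.broadcast_object_list` is the real collective): rank 0 blocks on `get()`, so every rank reaches the broadcast together and no rank is left waiting in the collective.

```python
import os
import queue
import torch.distributed as dist

def serve_once(rank: int, requests_queue: queue.Queue):
    """One iteration of the rank-synchronized request loop."""
    obj_list = [None]
    if rank == 0:
        # Blocking get(): rank 0 never skips a broadcast that the
        # other ranks are already waiting in.
        obj_list[0] = requests_queue.get()
    dist.broadcast_object_list(obj_list, src=0)
    return obj_list[0]

# Single-process demonstration (world_size=1) with the gloo backend.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29512")
dist.init_process_group("gloo", rank=0, world_size=1)
q = queue.Queue()
q.put({"prompt": "a cute cat playing piano"})
request = serve_once(0, q)
dist.destroy_process_group()
```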

In `@tensorrt_llm/commands/utils.py`:
- Around line 18-25: In _get_lock, os.makedirs is creating the parent of
lock_dir instead of the lock_dir itself; change the call in the _get_lock
function to ensure lock_dir (not os.path.dirname(lock_dir)) is created (use the
lock_dir variable, preserving exist_ok=True and any intended mode/permissions)
before constructing the FileLock with lock_file_name and temp_dir fallback
logic.
- Around line 72-95: The function is_diffusers_model_path currently annotates
its return type as -> True but returns boolean values; change the return
annotation to -> bool and update the docstring to accurately reflect it returns
a boolean (true if model_index.json exists and contains "_diffusers_version",
false otherwise) so the signature and documentation match the implementation in
is_diffusers_model_path.
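A sketch matching the corrected signature might look like the following; the actual implementation in `tensorrt_llm/commands/utils.py` may differ in details, but the return type and semantics are what the comment describes:

```python
import json
import os

def is_diffusers_model_path(model_path: str) -> bool:
    """Return True iff model_path contains a model_index.json that
    declares a "_diffusers_version" key; False otherwise."""
    index_file = os.path.join(model_path, "model_index.json")
    if not os.path.isfile(index_file):
        return False
    try:
        with open(index_file) as f:
            return "_diffusers_version" in json.load(f)
    except (OSError, json.JSONDecodeError):
        # Unreadable or malformed index: treat as not a diffusers model.
        return False
```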

In `@tensorrt_llm/serve/visual_gen_utils.py`:
- Around line 53-56: The code assumes media_storage_path is a string and
request.input_reference is an UploadFile; update the block handling
request.input_reference in visual_gen_utils.py (the params.input_reference
assignment and file copy) to first ensure media_storage_path is not None (either
raise a clear ValueError or create/use a temporary directory) and then handle
both allowed types for request.input_reference (if it's a str treat it as an
existing path and set params.input_reference to that path; if it’s an UploadFile
use os.path.join(media_storage_path, f"{id}_reference.png") and
shutil.copyfileobj(request.input_reference.file, open_path)). Ensure you only
call .file when request.input_reference has that attribute and only call
os.path.join with a non-None media_storage_path.
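The dual-type handling being asked for can be sketched as follows. `resolve_input_reference` is a hypothetical helper, and the uploaded-file object is assumed to expose a `.file` attribute as starlette's `UploadFile` does:

```python
import os
import shutil

def resolve_input_reference(input_reference, media_storage_path, job_id):
    """Accept either an existing path (str) or an uploaded file object."""
    if isinstance(input_reference, str):
        # Already a path on disk: use it as-is.
        return input_reference
    if media_storage_path is None:
        raise ValueError("media_storage_path must be set to store uploads")
    dest = os.path.join(media_storage_path, f"{job_id}_reference.png")
    with open(dest, "wb") as out:
        shutil.copyfileobj(input_reference.file, out)
    return dest
```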

In `@tests/unittest/_torch/visual_gen/test_attention_integration.py`:
- Around line 399-419: The test currently prints results but doesn't fail on
mismatches; update the test_trtllm_cached_prepare logic to assert the checks
instead of only printing: for the per-iteration equality check (compare
outputs_integrated[i] vs outputs_naive[i] and the boolean all_passed), add an
assertion that each max-difference is below tolerance (e.g., assert diff < 1e-6)
or assert all_passed at the end; for the cross-iteration diversity check (the
outputs_differ boolean computed from outputs_integrated), add assert
outputs_differ to ensure outputs across iterations are not identical; reference
outputs_integrated, outputs_naive, all_passed and outputs_differ to locate and
implement the assertions.
- Around line 245-261: Replace the silent return-and-print pattern in
test_self_attention_equivalence (and the other tests
test_cross_attention_equivalence, test_trtllm_cached_prepare,
test_trtllm_varying_seq_len) with explicit pytest assertions: use
torch.testing.assert_allclose(out_naive, out_integrated, rtol=1e-2, atol=1e-3)
or assert is_close with a helpful failure message that includes max_diff and
mean_diff, and remove the final return; alternatively call pytest.fail(...) when
differences exceed thresholds so test failures are reported by pytest; locate
variables out_naive, out_integrated, is_close, max_diff, mean_diff in each test
to implement the assertion.
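The assertion pattern these comments request can be sketched with `torch.testing.assert_close` (a real API); the helper and the stand-in tensors are illustrative, not the test file's actual code:

```python
import torch

def check_equivalence(out_naive, out_integrated, rtol=1e-2, atol=1e-3):
    """Fail the test on mismatch instead of silently printing diagnostics."""
    max_diff = (out_naive - out_integrated).abs().max().item()
    torch.testing.assert_close(
        out_integrated, out_naive, rtol=rtol, atol=atol,
        msg=f"attention outputs diverge (max_diff={max_diff:.3e})",
    )

check_equivalence(torch.ones(4), torch.ones(4))  # equal tensors: passes
```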
🟠 Major comments (37)
tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py-366-373 (1)

366-373: ⚠️ Potential issue | 🟠 Major

Avoid hard‑coding the text context length to 512.

The image/text split assumes a fixed 512‑token text length. If max_sequence_length differs, image_context_length can become negative or mis‑split the context. Please derive the text length from config or pass it explicitly.

tensorrt_llm/_torch/visual_gen/modules/attention.py-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache 2.0 header to this new source file.

This file is new and currently lacks the required NVIDIA copyright/license header.

🔧 Suggested header (place above imports)
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
 from enum import Enum

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."

tensorrt_llm/serve/openai_protocol.py-1-2 (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add/update the NVIDIA Apache 2.0 header in this modified source file.

The file is a .py source file and should include the standard NVIDIA header with the latest modification year.

🔧 Suggested header placement (top of file)
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
 # Adapted from
 # https://github.com/vllm-project/vllm/blob/4db5176d9758b720b05460c50ace3c01026eb158/vllm/entrypoints/openai/protocol.py

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."

tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py-1-5 (1)

1-5: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache 2.0 header to this new source file.

This file is new and currently lacks the required NVIDIA copyright/license header.

🔧 Suggested header (place above imports)
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
 import json

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."

tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py-1-2 (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache 2.0 header to this new source file.

This file is new and currently lacks the required NVIDIA copyright/license header.

🔧 Suggested header (place above imports)
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
 import math

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."

tensorrt_llm/_torch/visual_gen/modules/attention.py-12-55 (1)

12-55: ⚠️ Potential issue | 🟠 Major

Runtime failure when config=None due to TYPE_CHECKING import.

DiffusionModelConfig is only imported under TYPE_CHECKING, but the constructor instantiates it at runtime. This will raise NameError if config isn’t passed.

🔧 Suggested fix
-from typing import TYPE_CHECKING, Optional, Tuple
+from typing import Optional, Tuple
 ...
-if TYPE_CHECKING:
-    from ..config import DiffusionModelConfig
+from ..config import DiffusionModelConfig
tensorrt_llm/_torch/visual_gen/modules/attention.py-16-185 (1)

16-185: ⚠️ Potential issue | 🟠 Major

QKVMode.FUSE_KV is silently treated as separate QKV.

FUSE_KV exists in the enum but isn’t implemented; falling through to the separate‑QKV path can mis-handle fused KV weights. Consider raising until supported (or implement fused‑KV projections).

🔧 Guard unsupported mode
     def _init_qkv_proj(self) -> None:
+        if self.qkv_mode == QKVMode.FUSE_KV:
+            raise NotImplementedError("QKVMode.FUSE_KV is not implemented yet.")
         if self.qkv_mode == QKVMode.FUSE_QKV:
             qkv_out_dim = self.q_dim + 2 * self.kv_dim
tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py-64-134 (1)

64-134: ⚠️ Potential issue | 🟠 Major

Recompute is_wan22 and instantiate transformer_2 after reading model_index.json.

boundary_ratio can be loaded from model_index.json, but is_wan22 and transformer_2 are set earlier. If the config doesn’t include boundary_ratio, Wan2.2 will be treated as Wan2.1 and the second transformer never created.

🔧 Suggested update after parsing model_index.json
                 if "boundary_ratio" in model_index:
                     self.boundary_ratio = model_index["boundary_ratio"]
+                    self.is_wan22 = True
                     logger.info(f"Found boundary_ratio in model_index.json: {self.boundary_ratio}")
 ...
                 has_transformer_2 = (
                     transformer_2_spec is not None and transformer_2_spec[0] is not None
                 )
                 logger.info(f"transformer_2 in model_index.json: {has_transformer_2}")
+        if has_transformer_2 and self.transformer_2 is None:
+            self.transformer_2 = WanTransformer3DModel(model_config=self.model_config)
tests/unittest/_torch/visual_gen/test_trtllm_serve_e2e.py-142-156 (1)

142-156: ⚠️ Potential issue | 🟠 Major

Add request timeout and narrow the exception type in the readiness loop.

requests.get without a timeout can hang indefinitely, and the broad except Exception violates the coding guideline requiring specific exception handling. Use a timeout and catch requests.RequestException. Also apply timeouts to the other requests in this file.

🔧 Safer readiness loop
         while True:
             try:
-                if requests.get(url).status_code == 200:
+                if requests.get(url, timeout=5).status_code == 200:
                     return
-            except Exception as err:
+            except requests.RequestException as err:
                 result = self.proc.poll()
                 if result is not None and result != 0:
                     raise RuntimeError("Visual-gen server exited unexpectedly.") from err
examples/visual_gen/serve/sync_video_gen.py-91-103 (1)

91-103: ⚠️ Potential issue | 🟠 Major

Missing timeout on requests.post calls.

Both HTTP calls (lines 91 and 94) lack a timeout parameter. Video generation can take a long time, but the call should still have a generous timeout to avoid hanging indefinitely.

Proposed fix
-            response_video = requests.post(endpoint, data=form_data, files=files)
+            response_video = requests.post(endpoint, data=form_data, files=files, timeout=600)
-            response_video = requests.post(
-                endpoint,
-                json={...},
-            )
+            response_video = requests.post(
+                endpoint,
+                json={...},
+                timeout=600,
+            )
examples/visual_gen/serve/sync_video_gen.py-82-91 (1)

82-91: ⚠️ Potential issue | 🟠 Major

Resource leak: file handle never closed.

open(input_reference, "rb") on line 85 creates a file handle that is never explicitly closed. Use a context manager or close the handle after the request.

Proposed fix
-            # Add the file
-            ## Note: The content-type must be multipart/form-data.
-            files = {
-                "input_reference": (
-                    Path(input_reference).name,
-                    open(input_reference, "rb"),
-                    "multipart/form-data",
-                )
-            }
-
-            print("\n   Uploading reference image and generating video...")
-            response_video = requests.post(endpoint, data=form_data, files=files)
+            # Add the file
+            ## Note: The content-type must be multipart/form-data.
+            with open(input_reference, "rb") as img_file:
+                files = {
+                    "input_reference": (
+                        Path(input_reference).name,
+                        img_file,
+                        "multipart/form-data",
+                    )
+                }
+
+                print("\n   Uploading reference image and generating video...")
+                response_video = requests.post(endpoint, data=form_data, files=files, timeout=300)
examples/visual_gen/hf_ltx2.py-145-149 (1)

145-149: ⚠️ Potential issue | 🟠 Major

Bug: hardcoded frame_rate=24.0 ignores the function parameter.

The frame_rate parameter is accepted by test_ltx2_baseline (Line 26) but the save call at Line 148 hardcodes 24.0 instead of using the frame_rate variable.

🐛 Proposed fix
     OutputHandler.save(
-        output=MediaOutput(video=video, audio=audio), output_path=output_path, frame_rate=24.0
+        output=MediaOutput(video=video, audio=audio), output_path=output_path, frame_rate=frame_rate
     )
tensorrt_llm/_torch/visual_gen/parallelism.py-68-100 (1)

68-100: ⚠️ Potential issue | 🟠 Major

Ranks outside cfg_size × ulysses_size get use_parallelism=True with pg=None.

When world_size > cfg_size * ulysses_size (e.g., world_size=8, cfg_size=2, ulysses_size=2 → only ranks 0–3 are assigned), ranks 4–7 iterate the loop without matching any group. They'll get ulysses_pg=None and ulysses_rank=0, but the function still returns use_parallelism=True. Downstream code that checks use_parallelism and then uses parallelism_pg could crash on None.

Consider either:

  1. Raising an error when total_parallel != world_size (if all ranks must participate), or
  2. Returning use_parallelism=False for orphan ranks, or
  3. Adding a clear check/warning for this scenario.
♻️ Option 1: Enforce exact match
     total_parallel = cfg_size * ulysses_size
-    if total_parallel > world_size:
+    if total_parallel != world_size:
         raise ValueError(
-            f"cfg_size ({cfg_size}) * ulysses_size ({ulysses_size}) = "
-            f"{total_parallel} exceeds world_size ({world_size})"
+            f"cfg_size ({cfg_size}) * ulysses_size ({ulysses_size}) = "
+            f"{total_parallel} must equal world_size ({world_size})"
         )
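Option 1 can be sketched as a pure-Python rank-grid assignment (hypothetical helper, not the PR's function) that refuses to leave any rank unassigned:

```python
def assign_parallel_ranks(world_size, cfg_size, ulysses_size):
    """Map each global rank to (cfg_rank, ulysses_rank) on a cfg x ulysses
    grid. Raises when the grid does not cover the world exactly, so no
    rank is left without a process group."""
    total = cfg_size * ulysses_size
    if total != world_size:
        raise ValueError(
            f"cfg_size ({cfg_size}) * ulysses_size ({ulysses_size}) = "
            f"{total} must equal world_size ({world_size})"
        )
    return {rank: (rank // ulysses_size, rank % ulysses_size)
            for rank in range(world_size)}

print(assign_parallel_ranks(4, 2, 2))
# {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
```

With world_size=8 and a 2x2 grid, the same call raises immediately instead of producing four orphan ranks with pg=None.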
examples/visual_gen/serve/async_video_gen.py-78-83 (1)

78-83: ⚠️ Potential issue | 🟠 Major

Resource leak: file handle opened but never closed.

The file opened at Line 83 is passed directly to create_params and never explicitly closed. If an exception occurs before or during the API call, the handle leaks.

🛡️ Proposed fix
         # Add input reference if provided (TI2V mode)
         if input_reference:
             if not Path(input_reference).exists():
                 print(f"\n❌ Error: Input reference image not found: {input_reference}")
                 return False
-            create_params["input_reference"] = open(input_reference, "rb")
+            ref_file = open(input_reference, "rb")
+            create_params["input_reference"] = ref_file
+            # ...then close ref_file in a finally block once the API call
+            # has completed, so the handle is released even on errors.

Alternatively, consider reading the file content into bytes beforehand if the API accepts it, or use a with block wrapping the entire request section.

tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py-96-114 (1)

96-114: ⚠️ Potential issue | 🟠 Major

Shape assertion will break if seq_len is a torch.Tensor.

The type hint allows Union[int, torch.Tensor], but q.shape[2] == seq_len produces a torch.Tensor (not a bool) when seq_len is a tensor. Inside an assert with a Python and chain, a multi-element tensor raises RuntimeError: Boolean value of Tensor with more than one element is ambiguous, while a 0-dim tensor silently coerces to a truth value and may mask shape mismatches. Either narrow the type to int or convert explicitly.

Option: convert to int before assertions
+        seq_len = int(seq_len) if isinstance(seq_len, torch.Tensor) else seq_len
         # Validate tensor shapes - flexible for Ulysses head sharding
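The normalization step can be sketched with numpy standing in for torch (the failure mode of comparing an int against a 0-dim array is analogous; the helper name is illustrative):

```python
import numpy as np

def normalize_seq_len(seq_len):
    """Coerce an int-or-0-dim-array seq_len to a plain Python int so that
    downstream shape comparisons always yield real booleans."""
    if isinstance(seq_len, np.ndarray):
        return int(seq_len.item())
    return int(seq_len)

print(normalize_seq_len(np.array(16)), normalize_seq_len(16))  # 16 16
```

After this coercion, `q.shape[2] == seq_len` is an ordinary int comparison and is safe inside assert chains.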
tensorrt_llm/serve/visual_gen_utils.py-15-19 (1)

15-19: ⚠️ Potential issue | 🟠 Major

Python 3.10+ union syntax and id shadows built-in.

Line 16 uses X | Y | Z union syntax, which requires Python 3.10+ (or from __future__ import annotations). The coding guidelines require Python 3.8+ compatibility. Additionally, the id parameter shadows the built-in id() function, which the guidelines explicitly warn against.

Proposed fix
+from typing import Any, Dict, List, Optional, Union
+
 def parse_visual_gen_params(
-    request: ImageGenerationRequest | VideoGenerationRequest | ImageEditRequest,
-    id: str,
+    request: Union[ImageGenerationRequest, VideoGenerationRequest, ImageEditRequest],
+    request_id: str,
     media_storage_path: Optional[str] = None,
 ) -> VisualGenParams:

Then update all references to id within the function to request_id (Line 54).

tensorrt_llm/commands/utils.py-28-30 (1)

28-30: ⚠️ Potential issue | 🟠 Major

local_dir parameter is declared but never used, and str | None requires Python 3.10+.

The local_dir parameter is never referenced in the function body (confirmed by static analysis). Additionally, str | None union syntax requires Python 3.10+, while the coding guidelines require Python 3.8+ compatibility — use Optional[str] instead.

Proposed fix
 def _maybe_download_model(
-    model_name_or_path: str, local_dir: str | None = None, download: bool = True
+    model_name_or_path: str, download: bool = True
 ) -> str:

If local_dir is intended for future use, at minimum fix the type annotation:

-    model_name_or_path: str, local_dir: str | None = None, download: bool = True
+    model_name_or_path: str, local_dir: Optional[str] = None, download: bool = True
tests/unittest/_torch/visual_gen/test_model_loader.py-12-30 (1)

12-30: ⚠️ Potential issue | 🟠 Major

Module-level assert in _llm_models_root() prevents collection of all tests when checkpoint paths are unavailable.

_llm_models_root() is called at module import time (Line 29) and raises AssertionError if none of the hardcoded paths exist and LLM_MODELS_ROOT is not set. This blocks pytest from collecting the entire module, including pure unit tests like test_diffusion_args_to_quant_config and test_diffusion_args_from_dict that don't require any checkpoint.

Proposed fix — defer checkpoint resolution and guard with pytest.skip
-def _llm_models_root() -> str:
-    """Return LLM_MODELS_ROOT path if it is set in env, assert when it's set but not a valid path."""
-    root = Path("/home/scratch.trt_llm_data_ci/llm-models/")
-    if "LLM_MODELS_ROOT" in os.environ:
-        root = Path(os.environ["LLM_MODELS_ROOT"])
-    if not root.exists():
-        root = Path("/scratch.trt_llm_data/llm-models/")
-    assert root.exists(), (
-        "You shall set LLM_MODELS_ROOT env or be able to access scratch.trt_llm_data to run this test"
-    )
-    return str(root)
+def _llm_models_root() -> Optional[str]:
+    """Return LLM_MODELS_ROOT path if available, or None."""
+    if "LLM_MODELS_ROOT" in os.environ:
+        root = Path(os.environ["LLM_MODELS_ROOT"])
+        if root.exists():
+            return str(root)
+    for candidate in [
+        "/home/scratch.trt_llm_data_ci/llm-models/",
+        "/scratch.trt_llm_data/llm-models/",
+    ]:
+        if Path(candidate).exists():
+            return candidate
+    return None
 
 
-# Skip if checkpoint not available
-# Set DIFFUSION_MODEL_PATH env var to run integration tests
-CHECKPOINT_PATH = os.environ.get(
-    "DIFFUSION_MODEL_PATH",
-    os.path.join(_llm_models_root(), "Wan2.1-T2V-1.3B-Diffusers"),
-)
+def _get_checkpoint_path() -> Optional[str]:
+    if "DIFFUSION_MODEL_PATH" in os.environ:
+        return os.environ["DIFFUSION_MODEL_PATH"]
+    root = _llm_models_root()
+    if root:
+        return os.path.join(root, "Wan2.1-T2V-1.3B-Diffusers")
+    return None
+
+CHECKPOINT_PATH = _get_checkpoint_path()
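The deferred-resolution pattern pairs naturally with a module-level skipif marker, so checkpoint-dependent tests skip cleanly while pure unit tests still collect (a sketch; the marker and test names are illustrative):

```python
import os
import pytest

CHECKPOINT_PATH = os.environ.get("DIFFUSION_MODEL_PATH")  # None when unset

# Tests decorated with this marker are skipped (not errored) when the
# checkpoint is unavailable, so collection of the module never fails.
requires_checkpoint = pytest.mark.skipif(
    CHECKPOINT_PATH is None or not os.path.isdir(CHECKPOINT_PATH),
    reason="Set DIFFUSION_MODEL_PATH to run checkpoint-dependent tests",
)

@requires_checkpoint
def test_load_pipeline():
    ...
```

Unit tests that need no checkpoint simply omit the marker and run everywhere.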
tests/unittest/_torch/visual_gen/test_wan_i2v.py-1076-1082 (1)

1076-1082: ⚠️ Potential issue | 🟠 Major

test_mismatched_image_size catches all exceptions, making it a no-op test.

The bare except Exception on Line 1080 catches any error (including unrelated bugs like AttributeError or TypeError) and prints a success message. This means the test passes regardless of what happens, providing zero value.

If the intent is to verify the model handles non-standard image sizes gracefully, restrict the catch to the expected error type or verify the specific behavior:

Proposed fix
             try:
                 image_embeds = pipeline._encode_image(small_image)
                 assert image_embeds is not None
                 print("\n✓ Handled non-standard image size gracefully")
-            except Exception as e:
-                # Some error is expected
-                print(f"\n✓ Raised appropriate error for mismatched size: {type(e).__name__}")
+            except (ValueError, RuntimeError) as e:
+                # These specific errors are expected for mismatched sizes
+                print(f"\n✓ Raised appropriate error for mismatched size: {type(e).__name__}")
tensorrt_llm/_torch/visual_gen/quantization/loader.py-167-181 (1)

167-181: ⚠️ Potential issue | 🟠 Major

Hardcoded .cuda() may place tensors on wrong device in multi-GPU setups; potential None group_size.

  1. Line 171: weight.cuda() moves the tensor to the default CUDA device (cuda:0). In multi-GPU or distributed setups, the model may reside on a different device. Consider passing the target device explicitly or inferring it from the module's existing parameters.

  2. Line 176: self.quant_config.group_size may be None if not explicitly configured, which would pass None as block_size to quantize_fp8_blockwise. Add a fallback:

Proposed fix
-        # Move to GPU only if needed
-        if weight.device.type != "cuda":
-            weight = weight.cuda()
+        # Move to GPU only if needed (use current CUDA device)
+        if weight.device.type != "cuda":
+            weight = weight.to(torch.cuda.current_device())
 
         if quant_algo == QuantAlgo.FP8:
             qweight, scale = quantize_fp8_per_tensor(weight)
         elif quant_algo == QuantAlgo.FP8_BLOCK_SCALES:
-            block_size = self.quant_config.group_size if self.quant_config else 128
+            block_size = (
+                getattr(self.quant_config, "group_size", None) if self.quant_config else None
+            ) or 128
             qweight, scale = quantize_fp8_blockwise(weight, block_size=block_size)
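The block-size fallback matters because blockwise quantization partitions each row into fixed-size tiles with one scale per tile. A numpy sketch of the idea (not the PR's quantize_fp8_blockwise kernel; the 448 constant is the float8_e4m3 max magnitude):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8_e4m3

def quantize_blockwise(weight, block_size=128):
    """Per-block absmax scaling along the last dim; returns scaled values
    clipped to the FP8 range plus one scale per (row, block)."""
    rows, cols = weight.shape
    assert cols % block_size == 0, "cols must be a multiple of block_size"
    blocks = weight.reshape(rows, cols // block_size, block_size)
    scale = np.abs(blocks).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scale = np.where(scale == 0.0, 1.0, scale)  # all-zero blocks: avoid /0
    qweight = np.clip(blocks / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return qweight.reshape(rows, cols), scale.squeeze(-1)

w = np.random.default_rng(0).standard_normal((4, 256)).astype(np.float32)
qw, s = quantize_blockwise(w, block_size=128)
print(qw.shape, s.shape)  # (4, 256) (4, 2)
```

Passing block_size=None here would crash in reshape, which is why the loader needs the `or 128` fallback.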
tensorrt_llm/serve/media_storage.py-356-371 (1)

356-371: ⚠️ Potential issue | 🟠 Major

Fragile path manipulation and overly broad exception handler.

Two concerns here:

  1. output_path.replace(".mp4", ".png") (Lines 362, 370) performs a naive string replace that will mangle paths containing .mp4 inside a directory name (e.g., outputs.mp4/video.mp4 becomes outputs.png/video.png). Use os.path.splitext for safe extension replacement.

  2. The bare except Exception (Line 364) silently swallows all errors—including programming bugs (e.g., TypeError, AttributeError)—and falls back to saving a single PNG frame without raising. This hides real issues. Consider catching only av-specific encoding errors or at minimum re-logging with the full traceback at a higher severity.

Proposed fix for path manipulation
-            png_path = output_path.replace(".mp4", ".png")
+            base, _ = os.path.splitext(output_path)
+            png_path = base + ".png"

Apply this pattern on both Lines 362 and 370.
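A minimal illustration of the difference (helper name is ours, not the module's):

```python
import os

def with_extension(path, new_ext):
    """Swap only the final extension; directory components are untouched,
    because splitext only considers the last path segment."""
    base, _ = os.path.splitext(path)
    return base + new_ext

# Naive str.replace rewrites the directory component as well:
print("outputs.mp4/video.mp4".replace(".mp4", ".png"))  # outputs.png/video.png
print(with_extension("outputs.mp4/video.mp4", ".png"))  # outputs.mp4/video.png
```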

tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan.py-182-184 (1)

182-184: ⚠️ Potential issue | 🟠 Major

Storing entire weights dict keeps large tensors in memory.

self._weights_dict = weights (Line 184) holds a reference to the full checkpoint weights dictionary. Since the weights are loaded into the model modules, this reference likely prevents garbage collection of potentially multi-GB tensors. If this is only needed for deferred transformer_2 loading, consider clearing it after loading completes (e.g., at the end of load_weights or in post_load_weights).

Proposed fix
     def load_weights(self, weights: dict) -> None:
         # Store weights for later use (in case transformer_2 is created after this call)
         self._weights_dict = weights
         ...
         # At the end of load_weights:
+        # Clear reference to avoid keeping large checkpoint in memory
+        self._weights_dict = None
tests/unittest/_torch/visual_gen/test_wan_i2v.py-54-71 (1)

54-71: ⚠️ Potential issue | 🟠 Major

Module-level _llm_models_root() call will crash imports in environments without scratch paths.

CHECKPOINT_PATH is evaluated at module import time (Line 68–71), calling _llm_models_root() which has an assert root.exists(). This means merely importing this test module (e.g., during pytest collection) will raise AssertionError on machines that lack both /home/scratch.trt_llm_data_ci/ and /scratch.trt_llm_data/ and don't set LLM_MODELS_ROOT.

Consider deferring the path resolution to a fixture or using a pytest.importorskip-style guard, or replacing the assert with a graceful fallback (e.g., returning None and letting fixtures skip):

Proposed fix
 def _llm_models_root() -> str:
-    """Return LLM_MODELS_ROOT path if it is set in env, assert when it's set but not a valid path."""
+    """Return LLM_MODELS_ROOT path, or None if unavailable."""
     root = Path("/home/scratch.trt_llm_data_ci/llm-models/")
     if "LLM_MODELS_ROOT" in os.environ:
         root = Path(os.environ["LLM_MODELS_ROOT"])
     if not root.exists():
         root = Path("/scratch.trt_llm_data/llm-models/")
-    assert root.exists(), (
-        "You shall set LLM_MODELS_ROOT env or be able to access scratch.trt_llm_data to run this test"
-    )
+    if not root.exists():
+        return None
     return str(root)
 
 
 # Checkpoint paths
+_models_root = _llm_models_root()
 CHECKPOINT_PATH = os.environ.get(
     "DIFFUSION_MODEL_PATH",
-    os.path.join(_llm_models_root(), "Wan2.2-I2V-A14B-Diffusers"),
+    os.path.join(_models_root, "Wan2.2-I2V-A14B-Diffusers") if _models_root else "",
 )
tensorrt_llm/serve/openai_server.py-1248-1256 (1)

1248-1256: ⚠️ Potential issue | 🟠 Major

Handle multi-image outputs when saving to disk.

When num_images_per_prompt > 1, output.image can be a list, and save_image() expects a single tensor. Save each image separately and use the list for the response.

🛠️ Proposed fix
-            output_images = output.image
-            MediaStorage.save_image(
-                output_images,
-                self.media_storage_path / f"{image_id}.png",
-            )
-
-            if not isinstance(output_images, list):
-                output_images = [output_images]
+            output_images = output.image
+            images = output_images if isinstance(output_images, list) else [output_images]
+            for idx, image in enumerate(images):
+                MediaStorage.save_image(
+                    image,
+                    self.media_storage_path / f"{image_id}_{idx}.png",
+                )
...
-                data = [
+                data = [
                     ImageObject(
                         b64_json=base64.b64encode(MediaStorage.convert_image_to_bytes(image)).decode('utf-8'),
                         revised_prompt=request.prompt
-                    ) for image in output_images
+                    ) for image in images
                 ]

Also applies to: 1310-1318

tensorrt_llm/_torch/visual_gen/executor.py-132-139 (1)

132-139: ⚠️ Potential issue | 🟠 Major

Guard against diffusion_config=None before .copy().

diffusion_config is optional; the default None path will raise AttributeError and prevent any worker from starting.

🛠️ Proposed fix
-            config_dict = self.diffusion_config.copy()
+            config_dict = (self.diffusion_config or {}).copy()
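The copy-then-update shape of the worker config makes the guard easy to see in isolation (a hypothetical helper illustrating the pattern, not the executor's API):

```python
def merged_config(diffusion_config=None, **overrides):
    """Copy-then-update that tolerates diffusion_config=None: the `or {}`
    substitutes an empty dict before .copy() can raise AttributeError."""
    config = (diffusion_config or {}).copy()
    config.update(overrides)
    return config

print(merged_config(None, steps=30))                       # {'steps': 30}
print(merged_config({"steps": 50, "cfg": 6.0}, steps=30))  # {'steps': 30, 'cfg': 6.0}
```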
tensorrt_llm/_torch/visual_gen/config.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  import json

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

tensorrt_llm/serve/openai_server.py-475-479 (1)

475-479: ⚠️ Potential issue | 🟠 Major

Guard /metrics for generators that lack get_stats_async().

register_visual_gen_routes() exposes /metrics, but VisualGen does not implement get_stats_async(), causing 500s. Return an empty list or add a stub.

🛠️ Proposed fix
     async def get_iteration_stats(self) -> JSONResponse:
-        stats = []
-        async for stat in self.generator.get_stats_async(2):
-            stats.append(stat)
-        return JSONResponse(content=stats)
+        if not hasattr(self.generator, "get_stats_async"):
+            return JSONResponse(content=[])
+        stats = []
+        async for stat in self.generator.get_stats_async(2):
+            stats.append(stat)
+        return JSONResponse(content=stats)
tensorrt_llm/serve/openai_server.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add/update the NVIDIA Apache-2.0 header for this modified source file.

🛠️ Proposed fix (the header goes after the shebang so the interpreter line stays first)
  #!/usr/bin/env python
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

tensorrt_llm/_torch/visual_gen/executor.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  import os

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

tensorrt_llm/llmapi/visual_gen.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  import asyncio

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

tensorrt_llm/serve/openai_server.py-189-194 (1)

189-194: ⚠️ Potential issue | 🟠 Major

Avoid a fixed /tmp storage path without ownership/permission hardening.

/tmp is world-writable and can be abused via symlinks. Prefer a private temp dir or enforce restrictive permissions.

🛠️ Proposed fix
+import tempfile
...
     def _init_visual_gen(self):
         self.processor = None
         self.model_config = None
-        self.media_storage_path = Path(os.getenv("TRTLLM_MEDIA_STORAGE_PATH", "/tmp/trtllm_generated"))
-        self.media_storage_path.mkdir(exist_ok=True, parents= True)
+        storage_env = os.getenv("TRTLLM_MEDIA_STORAGE_PATH")
+        if storage_env:
+            self.media_storage_path = Path(storage_env)
+        else:
+            self.media_storage_path = Path(tempfile.mkdtemp(prefix="trtllm_generated_"))
+        self.media_storage_path.mkdir(exist_ok=True, parents=True, mode=0o700)
         self.video_gen_tasks = {}
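tempfile.mkdtemp gives the hardened behavior for free: the directory is created atomically with owner-only permissions, sidestepping symlink races in a shared /tmp. A quick check (POSIX permissions assumed):

```python
import os
import stat
import tempfile

# mkdtemp creates the directory readable, writable, and searchable
# only by the creating user, regardless of the process umask.
path = tempfile.mkdtemp(prefix="trtllm_generated_")
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o700 on POSIX
os.rmdir(path)
```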
tensorrt_llm/_torch/visual_gen/teacache.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  import inspect

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

tensorrt_llm/_torch/visual_gen/pipeline.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  import time

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

tensorrt_llm/serve/openai_server.py-1520-1526 (1)

1520-1526: ⚠️ Potential issue | 🟠 Major

Mark async jobs as failed when output.video is missing.

Returning an error response from the background task doesn’t update the job status, leaving it stuck in queued.

🛠️ Proposed fix
             if output.video is None:
-                return self.create_error_response(
-                    message="Video generation failed",
-                    err_type="InternalServerError",
-                    status_code=HTTPStatus.INTERNAL_SERVER_ERROR,
-                )
+                job = await VIDEO_STORE.get(video_id)
+                if job:
+                    job.status = "failed"
+                    job.completed_at = int(time.time())
+                    job.error = "Video generation failed"
+                    await VIDEO_STORE.upsert(video_id, job)
+                return
tensorrt_llm/commands/serve.py-580-585 (1)

580-585: ⚠️ Potential issue | 🟠 Major

Guard against yaml.safe_load() returning None.

Empty YAML files yield None, which will cause:

  • Line 585: TypeError/AttributeError in update_llm_args_with_extra_dict() when attempting membership test on None
  • Line 635: dict.update() raises TypeError: 'NoneType' object is not iterable
Proposed fix
        if extra_llm_api_options is not None:
            with open(extra_llm_api_options, 'r') as f:
-                llm_args_extra_dict = yaml.safe_load(f)
+                llm_args_extra_dict = yaml.safe_load(f) or {}

...
        if extra_visual_gen_options is not None:
            with open(extra_visual_gen_options, 'r') as f:
-                visual_gen_extra_args = yaml.safe_load(f)
+                visual_gen_extra_args = yaml.safe_load(f) or {}
tensorrt_llm/serve/openai_server.py-1222-1241 (1)

1222-1241: ⚠️ Potential issue | 🟠 Major

Offload blocking generate() calls from the event loop.

These async endpoints call the synchronous VisualGen.generate() directly, which blocks the event loop and will make the server unresponsive under load. Use asyncio.to_thread() to keep the server responsive, consistent with how blocking operations are handled elsewhere in the codebase (e.g., input_processor).

🛠️ Proposed fix
-            output = self.generator.generate(inputs=inputs, params=params)
+            output = await asyncio.to_thread(self.generator.generate, inputs=inputs, params=params)

Apply the same change at lines 1240, 1302, and 1362.
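The offloading pattern in miniature (blocking_generate is a stand-in for the synchronous VisualGen.generate(), not the real API):

```python
import asyncio
import time

def blocking_generate(prompt):
    """Stand-in for a long synchronous generate() call."""
    time.sleep(0.1)
    return f"video for: {prompt}"

async def handler():
    # Offloading to a worker thread keeps the event loop free to serve
    # other requests while generation runs.
    return await asyncio.to_thread(blocking_generate, "a cute cat playing piano")

result = asyncio.run(handler())
print(result)  # video for: a cute cat playing piano
```

asyncio.to_thread requires Python 3.9+; on older interpreters, loop.run_in_executor achieves the same effect.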

tensorrt_llm/_torch/visual_gen/pipeline.py-52-55 (1)

52-55: ⚠️ Potential issue | 🟠 Major

Fix the device property—nn.Module does not have a .device attribute.

Accessing self.transformer.device will raise AttributeError. Derive the device from the module's parameters instead, with proper None-handling.

Proposed fix
    @property
     def device(self):
-        return self.transformer.device
+        if self.transformer is None:
+            return torch.device("cpu")
+        try:
+            return next(self.transformer.parameters()).device
+        except StopIteration:
+            return torch.device("cpu")

QiJune and others added 2 commits February 11, 2026 19:18
Signed-off-by: Freddy Qi <junq@nvidia.com>

[TRTLLM-10629][feat] Basic trtllm-serve functionality support for AIGV

Signed-off-by: Junyi Xu <junyix@nvidia.com>

unify outputs dict to MediaOutput class

Signed-off-by: Freddy Qi <junq@nvidia.com>

[TRTLLM-10630][feat] Support text+image to video generation

Signed-off-by: Junyi Xu <junyix@nvidia.com>

[None][fix] Add dependency to requirements.txt for media storage

Signed-off-by: Junyi Xu <junyix@nvidia.com>

Wan modeling + dynamic quant + custom attention support

Signed-off-by: Olivia Stoner <ostoner@nvidia.com>

[None][chore] format code

Signed-off-by: Freddy Qi <junq@nvidia.com>

Support HuggingFace Hub model IDs for checkpoint loading

Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>

Remove PyAV pkg

Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>

Wan 2.2 + I2V updates

Signed-off-by: Olivia Stoner <ostoner@nvidia.com>

[TRTLLM-10898][chore] Unify VisualGen generate inputs to LLM

Signed-off-by: Junyi Xu <junyix@nvidia.com>

[None][feat] support ulysses parallelism in WAN

Signed-off-by: Freddy Qi <junq@nvidia.com>

Implement VBench CI Test

Signed-off-by: Yibin Li <yibinl@nvidia.com>

[None][feat] Add e2e/endpoints tests for trtllm-serve

Signed-off-by: Junyi Xu <junyix@nvidia.com>

Use llm_model_root for ckpt path for CI

Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
@JunyiXu-nv
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35726 [ run ] triggered by Bot. Commit: 515354c

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Add nosec B108 suppression since the path is configurable via
TRTLLM_MEDIA_STORAGE_PATH environment variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #35726 [ run ] completed with state FAILURE. Commit: 515354c
/LLM/main/L0_MergeRequest_PR pipeline #27592 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
@chang-l
Collaborator Author

chang-l commented Feb 12, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35848 [ run ] triggered by Bot. Commit: 6cdd728

@tensorrt-cicd
Collaborator

PR_Github #35848 [ run ] completed with state SUCCESS. Commit: 6cdd728
/LLM/main/L0_MergeRequest_PR pipeline #27687 completed with status: 'FAILURE'


@chang-l
Collaborator Author

chang-l commented Feb 13, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35868 [ run ] triggered by Bot. Commit: 6cdd728

@tensorrt-cicd
Collaborator

PR_Github #35868 [ run ] completed with state SUCCESS. Commit: 6cdd728
/LLM/main/L0_MergeRequest_PR pipeline #27700 completed with status: 'FAILURE'


@chang-l
Collaborator Author

chang-l commented Feb 13, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35875 [ run ] triggered by Bot. Commit: 6cdd728

This reverts commit 3bc17e1.

Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
@tensorrt-cicd
Collaborator

PR_Github #35875 [ run ] completed with state FAILURE. Commit: 6cdd728
/LLM/main/L0_MergeRequest_PR pipeline #27706 completed with status: 'FAILURE'


@chang-l
Collaborator Author

chang-l commented Feb 13, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35882 [ run ] triggered by Bot. Commit: f55056f

Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
@chang-l
Collaborator Author

chang-l commented Feb 13, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35884 [ run ] triggered by Bot. Commit: b828cde

@tensorrt-cicd
Collaborator

PR_Github #35884 [ run ] completed with state SUCCESS. Commit: b828cde
/LLM/main/L0_MergeRequest_PR pipeline #27713 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@JunyiXu-nv
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #35910 [ run ] triggered by Bot. Commit: b828cde

@tensorrt-cicd
Collaborator

PR_Github #35910 [ run ] completed with state SUCCESS. Commit: b828cde
/LLM/main/L0_MergeRequest_PR pipeline #27732 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@chang-l
Collaborator Author

chang-l commented Feb 13, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35930 [ run ] triggered by Bot. Commit: b828cde

@tensorrt-cicd
Collaborator

PR_Github #35930 [ run ] completed with state SUCCESS. Commit: b828cde
/LLM/main/L0_MergeRequest_PR pipeline #27747 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@chang-l
Collaborator Author

chang-l commented Feb 13, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #35936 [ run ] triggered by Bot. Commit: b828cde

@pcastonguay pcastonguay enabled auto-merge (squash) February 13, 2026 21:39
@tensorrt-cicd
Collaborator

PR_Github #35936 [ run ] completed with state SUCCESS. Commit: b828cde
/LLM/main/L0_MergeRequest_PR pipeline #27752 completed with status: 'SUCCESS'

@pcastonguay pcastonguay merged commit 26901e4 into NVIDIA:main Feb 13, 2026
5 checks passed
peihu-nv pushed a commit to peihu-nv/TensorRT-LLM that referenced this pull request Feb 19, 2026
…#11462)

Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Signed-off-by: peihu-nv <259410613+peihu-nv@users.noreply.github.com>
8 participants