[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM #11462
pcastonguay merged 13 commits into NVIDIA:main
Conversation
📝 Walkthrough

This pull request adds comprehensive visual generation capabilities to TensorRT LLM, including distributed pipelines for text-to-video (WAN), text-to-image (FLUX2), and video+audio generation (LTX2). It introduces baseline examples using HuggingFace Diffusers, a complete TensorRT LLM implementation with distributed execution, OpenAI-compatible API endpoints, quantization support, and optimization features like TeaCache.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ❌ 2 failed (2 warnings) | ✅ 1 passed
Actionable comments posted: 7
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tensorrt_llm/_torch/modules/linear.py (1)
1239-1270: ⚠️ Potential issue | 🔴 Critical

Reshape logic crashes for `Fp4QuantizedTensor` and `tuple` inputs.

The `apply()` method accepts `input: Union[torch.Tensor, Fp4QuantizedTensor]` (passed from `Linear.forward()`), but the reshape block (lines 1241–1245) runs before `_input_prepare()` and calls `.shape` and `.dim()` unconditionally:

- For `tuple` inputs: `.shape` (line 1243) → `AttributeError`
- For `Fp4QuantizedTensor` inputs: `.dim()` (line 1244) → `AttributeError` (it has a `.shape` property but no `.dim()` method)

Guard the reshape to only apply to plain `torch.Tensor` inputs:

Proposed fix

```diff
 def apply(self, module: Linear, input: torch.Tensor,
           bias: Optional[torch.Tensor]):
-    # Handle multi-dimensional inputs (e.g., 3D: batch, seq, hidden)
-    # GEMM ops require 2D matrices
-    original_shape = input.shape
-    if input.dim() > 2:
-        input = input.reshape(-1, input.shape[-1])
-
+    # Handle multi-dimensional inputs (e.g., 3D: batch, seq, hidden)
+    # GEMM ops require 2D matrices
+    original_shape = None
+    if isinstance(input, torch.Tensor) and input.dim() > 2:
+        original_shape = input.shape
+        input = input.reshape(-1, input.shape[-1])
+
     act_fp4, act_sf = self._input_prepare(module, input)
     # Use unified interface - supports CUTLASS, cuBLASLt, CuteDSL
     ...
-    # Reshape output back to original shape (with out_features as last dim)
-    if len(original_shape) > 2:
-        output = output.reshape(*original_shape[:-1], output.shape[-1])
-
+    # Reshape output back to original shape (with out_features as last dim)
+    if original_shape is not None and len(original_shape) > 2:
+        output = output.reshape(*original_shape[:-1], output.shape[-1])
+
     if bias is not None:
```

tensorrt_llm/commands/serve.py (1)
1-1: ⚠️ Potential issue | 🟠 Major

Add/update the NVIDIA Apache-2.0 header for this modified source file.

🛠️ Proposed fix

```diff
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import asyncio
```

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
🤖 Fix all issues with AI agents
In `@tensorrt_llm/_torch/visual_gen/attention_backend/trtllm.py`:
- Around line 218-226: The code uses seq_len (and seq_len_kv) which may be
torch.Tensor scalars when computing shapes for q.view/k.view/v.view and when
calling self.metadata.prepare; coerce these to Python ints before any arithmetic
or .view() calls (e.g., convert seq_len and seq_len_kv to int(seq_len.item())
when they are tensors) so batch_size * seq_len yields an int and .view()
receives integer shape args; update the section around q = q.view(...), k =
k.view(...), v = v.view(...) and the call to self.metadata.prepare(batch_size,
seq_len) to use the converted int values.
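The coercion described above can be sketched with a small helper (the name `as_py_int` is hypothetical, not from the PR; the duck-typed `.item()` check stands in for an `isinstance(x, torch.Tensor)` test so the sketch runs without torch):

```python
def as_py_int(x):
    """Coerce a scalar that may be a 0-dim torch.Tensor to a plain int.

    Tensor scalars expose .item(); plain Python ints do not, so the
    hasattr check keeps this sketch framework-agnostic. In the real
    code an isinstance(x, torch.Tensor) check would be equivalent.
    """
    return int(x.item()) if hasattr(x, "item") else int(x)


# Usage sketch (names mirror the instruction, not the actual module):
# seq_len = as_py_int(seq_len)
# seq_len_kv = as_py_int(seq_len_kv)
# q = q.view(batch_size * seq_len, num_heads, head_dim)  # ints only
# self.metadata.prepare(batch_size, seq_len)
```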
In `@tensorrt_llm/_torch/visual_gen/executor.py`:
- Around line 164-178: The broadcast loop in serve_forever currently lets rank 0
continue when the queue is empty, causing other ranks to still call
dist.broadcast_object_list and deadlock; change the rank 0 path (using self.rank
and self.requests_ipc) to block on receiving a request (use a blocking get()
instead of poll+continue) so rank 0 always produces a value for obj_list before
calling dist.broadcast_object_list, ensuring all ranks participate in each
broadcast and stay synchronized.
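The blocking-receive pattern the instruction calls for can be sketched with a plain `queue.Queue` standing in for the IPC channel (helper name `next_request_rank0` is hypothetical; the `torch.distributed` collective is shown only as comments since it needs a live process group):

```python
import queue


def next_request_rank0(requests_ipc: queue.Queue):
    """Blocking receive for rank 0 in the broadcast loop.

    A poll-then-continue pattern lets rank 0 skip an iteration when the
    queue is empty, while the other ranks still enter
    dist.broadcast_object_list and wait forever. Blocking here
    guarantees rank 0 contributes a value to every broadcast.
    """
    return requests_ipc.get()  # blocks until a request arrives


# Per-iteration sketch (distributed calls shown as comments):
# if self.rank == 0:
#     obj_list = [next_request_rank0(self.requests_ipc)]
# else:
#     obj_list = [None]
# dist.broadcast_object_list(obj_list, src=0)  # every rank participates
```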
In `@tensorrt_llm/commands/utils.py`:
- Around line 18-25: In _get_lock, os.makedirs is creating the parent of
lock_dir instead of the lock_dir itself; change the call in the _get_lock
function to ensure lock_dir (not os.path.dirname(lock_dir)) is created (use the
lock_dir variable, preserving exist_ok=True and any intended mode/permissions)
before constructing the FileLock with lock_file_name and temp_dir fallback
logic.
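The directory-creation fix can be sketched as follows (helper name `ensure_lock_dir` and the temp-dir fallback are illustrative assumptions; the real `_get_lock` would then construct its `FileLock` inside the returned directory):

```python
import os
import tempfile


def ensure_lock_dir(lock_dir: str) -> str:
    """Create the lock directory itself, not its parent.

    os.makedirs(os.path.dirname(lock_dir), exist_ok=True) would leave
    lock_dir missing, so creating the lock file inside it could fail.
    Fall back to the system temp dir if the target is not writable.
    """
    try:
        os.makedirs(lock_dir, exist_ok=True)
        return lock_dir
    except OSError:
        return tempfile.gettempdir()
```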
- Around line 72-95: The function is_diffusers_model_path currently annotates
its return type as -> True but returns boolean values; change the return
annotation to -> bool and update the docstring to accurately reflect it returns
a boolean (true if model_index.json exists and contains "_diffusers_version",
false otherwise) so the signature and documentation match the implementation in
is_diffusers_model_path.
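A corrected signature matching the behavior described above might look like this (a sketch of the described contract, not the PR's actual implementation):

```python
import json
import os


def is_diffusers_model_path(model_path: str) -> bool:
    """Return True if model_path looks like a HuggingFace Diffusers checkpoint.

    True when model_index.json exists in the directory and contains the
    "_diffusers_version" key; False otherwise.
    """
    index_file = os.path.join(model_path, "model_index.json")
    if not os.path.isfile(index_file):
        return False
    try:
        with open(index_file, "r") as f:
            index = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    return "_diffusers_version" in index
```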
In `@tensorrt_llm/serve/visual_gen_utils.py`:
- Around line 53-56: The code assumes media_storage_path is a string and
request.input_reference is an UploadFile; update the block handling
request.input_reference in visual_gen_utils.py (the params.input_reference
assignment and file copy) to first ensure media_storage_path is not None (either
raise a clear ValueError or create/use a temporary directory) and then handle
both allowed types for request.input_reference (if it's a str treat it as an
existing path and set params.input_reference to that path; if it’s an UploadFile
use os.path.join(media_storage_path, f"{id}_reference.png") and
shutil.copyfileobj(request.input_reference.file, open_path)). Ensure you only
call .file when request.input_reference has that attribute and only call
os.path.join with a non-None media_storage_path.
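The two-type handling described above can be sketched as a helper (name `resolve_input_reference` is hypothetical; FastAPI's `UploadFile` is modeled only through its `.file` attribute so the sketch stays self-contained):

```python
import os
import shutil


def resolve_input_reference(input_reference, media_storage_path, request_id):
    """Accept either an existing path (str) or an UploadFile-like object.

    Strings are treated as paths already on disk; anything else is
    assumed to expose a .file binary stream (as FastAPI's UploadFile
    does) and is copied under media_storage_path.
    """
    if isinstance(input_reference, str):
        return input_reference
    if media_storage_path is None:
        raise ValueError(
            "media_storage_path is required to persist an uploaded reference"
        )
    dest = os.path.join(media_storage_path, f"{request_id}_reference.png")
    with open(dest, "wb") as out:
        shutil.copyfileobj(input_reference.file, out)
    return dest
```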
In `@tests/unittest/_torch/visual_gen/test_attention_integration.py`:
- Around line 399-419: The test currently prints results but doesn't fail on
mismatches; update the test_trtllm_cached_prepare logic to assert the checks
instead of only printing: for the per-iteration equality check (compare
outputs_integrated[i] vs outputs_naive[i] and the boolean all_passed), add an
assertion that each max-difference is below tolerance (e.g., assert diff < 1e-6)
or assert all_passed at the end; for the cross-iteration diversity check (the
outputs_differ boolean computed from outputs_integrated), add assert
outputs_differ to ensure outputs across iterations are not identical; reference
outputs_integrated, outputs_naive, all_passed and outputs_differ to locate and
implement the assertions.
- Around line 245-261: Replace the silent return-and-print pattern in
test_self_attention_equivalence (and the other tests
test_cross_attention_equivalence, test_trtllm_cached_prepare,
test_trtllm_varying_seq_len) with explicit pytest assertions: use
torch.testing.assert_allclose(out_naive, out_integrated, rtol=1e-2, atol=1e-3)
or assert is_close with a helpful failure message that includes max_diff and
mean_diff, and remove the final return; alternatively call pytest.fail(...) when
differences exceed thresholds so test failures are reported by pytest; locate
variables out_naive, out_integrated, is_close, max_diff, mean_diff in each test
to implement the assertion.
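The print-and-return replacement can be illustrated with a pure-Python stand-in for `torch.testing.assert_close` (helper name `assert_outputs_close` and flat float lists are assumptions for a runnable sketch; the real tests would compare tensors):

```python
def assert_outputs_close(out_naive, out_integrated, atol=1e-3):
    """Fail with a diagnostic instead of printing and silently returning.

    Mirrors the intent of torch.testing.assert_close on flat lists of
    floats; the failure message carries max/mean diff so CI logs stay
    useful when the backends diverge.
    """
    diffs = [abs(a - b) for a, b in zip(out_naive, out_integrated)]
    max_diff = max(diffs)
    mean_diff = sum(diffs) / len(diffs)
    assert max_diff < atol, (
        f"outputs diverge: max_diff={max_diff:.3e}, mean_diff={mean_diff:.3e}"
    )
```

Because the check is an `assert` rather than a `print`, pytest reports a genuine failure (with the diff magnitudes in the message) instead of a passing test.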
🟠 Major comments (37)
tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py-366-373 (1)
366-373: ⚠️ Potential issue | 🟠 Major

Avoid hard-coding the text context length to 512.

The image/text split assumes a fixed 512-token text length. If `max_sequence_length` differs, `image_context_length` can become negative or mis-split the context. Please derive the text length from config or pass it explicitly.

tensorrt_llm/_torch/visual_gen/modules/attention.py-1-3 (1)
1-3: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache 2.0 header to this new source file.

This file is new and currently lacks the required NVIDIA copyright/license header.

🔧 Suggested header (place above imports)

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from enum import Enum
```

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."
tensorrt_llm/serve/openai_protocol.py-1-2 (1)
1-2: ⚠️ Potential issue | 🟠 Major

Add/update the NVIDIA Apache 2.0 header in this modified source file.

The file is a .py source file and should include the standard NVIDIA header with the latest modification year.

🔧 Suggested header placement (top of file)

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # Adapted from
 # https://github.com/vllm-project/vllm/blob/4db5176d9758b720b05460c50ace3c01026eb158/vllm/entrypoints/openai/protocol.py
```

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."
tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py-1-5 (1)
1-5: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache 2.0 header to this new source file.

This file is new and currently lacks the required NVIDIA copyright/license header.

🔧 Suggested header (place above imports)

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
```

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."
tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py-1-2 (1)
1-2: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache 2.0 header to this new source file.

This file is new and currently lacks the required NVIDIA copyright/license header.

🔧 Suggested header (place above imports)

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import math
```

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files."
tensorrt_llm/_torch/visual_gen/modules/attention.py-12-55 (1)
12-55: ⚠️ Potential issue | 🟠 Major

Runtime failure when `config=None` due to TYPE_CHECKING import.

`DiffusionModelConfig` is only imported under `TYPE_CHECKING`, but the constructor instantiates it at runtime. This will raise `NameError` if `config` isn't passed.

🔧 Suggested fix

```diff
-from typing import TYPE_CHECKING, Optional, Tuple
+from typing import Optional, Tuple
 ...
-if TYPE_CHECKING:
-    from ..config import DiffusionModelConfig
+from ..config import DiffusionModelConfig
```

tensorrt_llm/_torch/visual_gen/modules/attention.py-16-185 (1)
16-185: ⚠️ Potential issue | 🟠 Major

`QKVMode.FUSE_KV` is silently treated as separate QKV.

`FUSE_KV` exists in the enum but isn't implemented; falling through to the separate-QKV path can mis-handle fused KV weights. Consider raising until supported (or implement fused-KV projections).

🔧 Guard unsupported mode

```diff
 def _init_qkv_proj(self) -> None:
+    if self.qkv_mode == QKVMode.FUSE_KV:
+        raise NotImplementedError("QKVMode.FUSE_KV is not implemented yet.")
     if self.qkv_mode == QKVMode.FUSE_QKV:
         qkv_out_dim = self.q_dim + 2 * self.kv_dim
```

tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan_i2v.py-64-134 (1)
64-134: ⚠️ Potential issue | 🟠 Major

Recompute `is_wan22` and instantiate `transformer_2` after reading `model_index.json`.

`boundary_ratio` can be loaded from `model_index.json`, but `is_wan22` and `transformer_2` are set earlier. If the config doesn't include `boundary_ratio`, Wan2.2 will be treated as Wan2.1 and the second transformer never created.

🔧 Suggested update after parsing model_index.json

```diff
 if "boundary_ratio" in model_index:
     self.boundary_ratio = model_index["boundary_ratio"]
+    self.is_wan22 = True
     logger.info(f"Found boundary_ratio in model_index.json: {self.boundary_ratio}")
 ...
 has_transformer_2 = (
     transformer_2_spec is not None and transformer_2_spec[0] is not None
 )
 logger.info(f"transformer_2 in model_index.json: {has_transformer_2}")
+if has_transformer_2 and self.transformer_2 is None:
+    self.transformer_2 = WanTransformer3DModel(model_config=self.model_config)
```

tests/unittest/_torch/visual_gen/test_trtllm_serve_e2e.py-142-156 (1)
142-156: ⚠️ Potential issue | 🟠 Major

Add a request timeout and narrow the exception type in the readiness loop.

`requests.get` without a timeout can hang indefinitely, and the broad `except Exception` violates the coding guideline requiring specific exception handling. Use a timeout and catch `requests.RequestException`. Also apply timeouts to the other requests in this file.

🔧 Safer readiness loop

```diff
 while True:
     try:
-        if requests.get(url).status_code == 200:
+        if requests.get(url, timeout=5).status_code == 200:
             return
-    except Exception as err:
+    except requests.RequestException as err:
         result = self.proc.poll()
         if result is not None and result != 0:
             raise RuntimeError("Visual-gen server exited unexpectedly.") from err
```

examples/visual_gen/serve/sync_video_gen.py-91-103 (1)
91-103: ⚠️ Potential issue | 🟠 Major

Missing timeout on `requests.post` calls.

Both HTTP calls (lines 91 and 94) lack a `timeout` parameter. Video generation can take a long time, but the call should still have a generous timeout to avoid hanging indefinitely.

Proposed fix

```diff
-    response_video = requests.post(endpoint, data=form_data, files=files)
+    response_video = requests.post(endpoint, data=form_data, files=files, timeout=600)
```

```diff
-    response_video = requests.post(
-        endpoint,
-        json={...},
-    )
+    response_video = requests.post(
+        endpoint,
+        json={...},
+        timeout=600,
+    )
```

examples/visual_gen/serve/sync_video_gen.py-82-91 (1)
82-91: ⚠️ Potential issue | 🟠 Major

Resource leak: file handle never closed.

`open(input_reference, "rb")` on line 85 creates a file handle that is never explicitly closed. Use a context manager or close the handle after the request.

Proposed fix

```diff
-    # Add the file
-    ## Note: The content-type must be multipart/form-data.
-    files = {
-        "input_reference": (
-            Path(input_reference).name,
-            open(input_reference, "rb"),
-            "multipart/form-data",
-        )
-    }
-
-    print("\n Uploading reference image and generating video...")
-    response_video = requests.post(endpoint, data=form_data, files=files)
+    # Add the file
+    ## Note: The content-type must be multipart/form-data.
+    with open(input_reference, "rb") as img_file:
+        files = {
+            "input_reference": (
+                Path(input_reference).name,
+                img_file,
+                "multipart/form-data",
+            )
+        }
+
+        print("\n Uploading reference image and generating video...")
+        response_video = requests.post(endpoint, data=form_data, files=files, timeout=300)
```

examples/visual_gen/hf_ltx2.py-145-149 (1)
145-149: ⚠️ Potential issue | 🟠 Major

Bug: hardcoded `frame_rate=24.0` ignores the function parameter.

The `frame_rate` parameter is accepted by `test_ltx2_baseline` (line 26) but the save call at line 148 hardcodes `24.0` instead of using the `frame_rate` variable.

🐛 Proposed fix

```diff
 OutputHandler.save(
-    output=MediaOutput(video=video, audio=audio), output_path=output_path, frame_rate=24.0
+    output=MediaOutput(video=video, audio=audio), output_path=output_path, frame_rate=frame_rate
 )
```

tensorrt_llm/_torch/visual_gen/parallelism.py-68-100 (1)
68-100: ⚠️ Potential issue | 🟠 Major

Ranks outside `cfg_size × ulysses_size` get `use_parallelism=True` with `pg=None`.

When `world_size > cfg_size * ulysses_size` (e.g., `world_size=8`, `cfg_size=2`, `ulysses_size=2` → only ranks 0–3 are assigned), ranks 4–7 iterate the loop without matching any group. They'll get `ulysses_pg=None` and `ulysses_rank=0`, but the function still returns `use_parallelism=True`. Downstream code that checks `use_parallelism` and then uses `parallelism_pg` could crash on `None`.

Consider either:

- Raising an error when `total_parallel != world_size` (if all ranks must participate), or
- Returning `use_parallelism=False` for orphan ranks, or
- Adding a clear check/warning for this scenario.

♻️ Option 1: Enforce exact match

```diff
 total_parallel = cfg_size * ulysses_size
-if total_parallel > world_size:
+if total_parallel != world_size:
     raise ValueError(
-        f"cfg_size ({cfg_size}) * ulysses_size ({ulysses_size}) = "
-        f"{total_parallel} exceeds world_size ({world_size})"
+        f"cfg_size ({cfg_size}) * ulysses_size ({ulysses_size}) = "
+        f"{total_parallel} must equal world_size ({world_size})"
     )
```

examples/visual_gen/serve/async_video_gen.py-78-83 (1)
78-83: ⚠️ Potential issue | 🟠 Major

Resource leak: file handle opened but never closed.

The file opened at line 83 is passed directly to `create_params` and never explicitly closed. If an exception occurs before or during the API call, the handle leaks.

🛡️ Proposed fix

```diff
 # Add input reference if provided (TI2V mode)
 if input_reference:
     if not Path(input_reference).exists():
         print(f"\n❌ Error: Input reference image not found: {input_reference}")
         return False
-    create_params["input_reference"] = open(input_reference, "rb")
+    ref_file = open(input_reference, "rb")
+    try:
+        create_params["input_reference"] = ref_file
+    except Exception:
+        ref_file.close()
+        raise
```

Alternatively, consider reading the file content into bytes beforehand if the API accepts it, or use a `with` block wrapping the entire request section.

tensorrt_llm/_torch/visual_gen/attention_backend/vanilla.py-96-114 (1)
96-114: ⚠️ Potential issue | 🟠 Major

Shape assertion will break if `seq_len` is a `torch.Tensor`.

The type hint allows `Union[int, torch.Tensor]`, but `q.shape[2] == seq_len` produces a `torch.Tensor` (not `bool`) when `seq_len` is a tensor. This tensor in a Python `and` chain inside `assert` will raise `RuntimeError: Boolean value of Tensor with more than one element is ambiguous` (or silently pass for 0-dim tensors without proper truthiness). Either narrow the type to `int` or convert explicitly.

Option: convert to int before assertions

```diff
+seq_len = int(seq_len) if isinstance(seq_len, torch.Tensor) else seq_len
 # Validate tensor shapes - flexible for Ulysses head sharding
```

tensorrt_llm/serve/visual_gen_utils.py-15-19 (1)
15-19: ⚠️ Potential issue | 🟠 Major

Python 3.10+ union syntax, and `id` shadows a built-in.

Line 16 uses `X | Y | Z` union syntax, which requires Python 3.10+ (or `from __future__ import annotations`). The coding guidelines require Python 3.8+ compatibility. Additionally, the `id` parameter shadows the built-in `id()` function, which the guidelines explicitly warn against.

Proposed fix

```diff
+from typing import Any, Dict, List, Optional, Union
+
 def parse_visual_gen_params(
-    request: ImageGenerationRequest | VideoGenerationRequest | ImageEditRequest,
-    id: str,
+    request: Union[ImageGenerationRequest, VideoGenerationRequest, ImageEditRequest],
+    request_id: str,
     media_storage_path: Optional[str] = None,
 ) -> VisualGenParams:
```

Then update all references to `id` within the function to `request_id` (line 54).

tensorrt_llm/commands/utils.py-28-30 (1)
28-30: ⚠️ Potential issue | 🟠 Major

`local_dir` parameter is declared but never used, and `str | None` requires Python 3.10+.

The `local_dir` parameter is never referenced in the function body (confirmed by static analysis). Additionally, `str | None` union syntax requires Python 3.10+, while the coding guidelines require Python 3.8+ compatibility; use `Optional[str]` instead.

Proposed fix

```diff
 def _maybe_download_model(
-    model_name_or_path: str, local_dir: str | None = None, download: bool = True
+    model_name_or_path: str, download: bool = True
 ) -> str:
```

If `local_dir` is intended for future use, at minimum fix the type annotation:

```diff
-    model_name_or_path: str, local_dir: str | None = None, download: bool = True
+    model_name_or_path: str, local_dir: Optional[str] = None, download: bool = True
```

tests/unittest/_torch/visual_gen/test_model_loader.py-12-30 (1)
12-30: ⚠️ Potential issue | 🟠 Major

Module-level `assert` in `_llm_models_root()` prevents collection of all tests when checkpoint paths are unavailable.

`_llm_models_root()` is called at module import time (line 29) and raises `AssertionError` if none of the hardcoded paths exist and `LLM_MODELS_ROOT` is not set. This blocks pytest from collecting the entire module, including pure unit tests like `test_diffusion_args_to_quant_config` and `test_diffusion_args_from_dict` that don't require any checkpoint.

Proposed fix: defer checkpoint resolution and guard with pytest.skip

```diff
-def _llm_models_root() -> str:
-    """Return LLM_MODELS_ROOT path if it is set in env, assert when it's set but not a valid path."""
-    root = Path("/home/scratch.trt_llm_data_ci/llm-models/")
-    if "LLM_MODELS_ROOT" in os.environ:
-        root = Path(os.environ["LLM_MODELS_ROOT"])
-    if not root.exists():
-        root = Path("/scratch.trt_llm_data/llm-models/")
-    assert root.exists(), (
-        "You shall set LLM_MODELS_ROOT env or be able to access scratch.trt_llm_data to run this test"
-    )
-    return str(root)
+def _llm_models_root() -> Optional[str]:
+    """Return LLM_MODELS_ROOT path if available, or None."""
+    if "LLM_MODELS_ROOT" in os.environ:
+        root = Path(os.environ["LLM_MODELS_ROOT"])
+        if root.exists():
+            return str(root)
+    for candidate in [
+        "/home/scratch.trt_llm_data_ci/llm-models/",
+        "/scratch.trt_llm_data/llm-models/",
+    ]:
+        if Path(candidate).exists():
+            return candidate
+    return None

-# Skip if checkpoint not available
-# Set DIFFUSION_MODEL_PATH env var to run integration tests
-CHECKPOINT_PATH = os.environ.get(
-    "DIFFUSION_MODEL_PATH",
-    os.path.join(_llm_models_root(), "Wan2.1-T2V-1.3B-Diffusers"),
-)
+def _get_checkpoint_path() -> Optional[str]:
+    if "DIFFUSION_MODEL_PATH" in os.environ:
+        return os.environ["DIFFUSION_MODEL_PATH"]
+    root = _llm_models_root()
+    if root:
+        return os.path.join(root, "Wan2.1-T2V-1.3B-Diffusers")
+    return None
+
+CHECKPOINT_PATH = _get_checkpoint_path()
```

tests/unittest/_torch/visual_gen/test_wan_i2v.py-1076-1082 (1)
1076-1082: ⚠️ Potential issue | 🟠 Major

`test_mismatched_image_size` catches all exceptions, making it a no-op test.

The bare `except Exception` on line 1080 catches any error (including unrelated bugs like `AttributeError` or `TypeError`) and prints a success message. This means the test passes regardless of what happens, providing zero value.

If the intent is to verify the model handles non-standard image sizes gracefully, restrict the catch to the expected error type or verify the specific behavior:

Proposed fix

```diff
 try:
     image_embeds = pipeline._encode_image(small_image)
     assert image_embeds is not None
     print("\n✓ Handled non-standard image size gracefully")
-except Exception as e:
-    # Some error is expected
-    print(f"\n✓ Raised appropriate error for mismatched size: {type(e).__name__}")
+except (ValueError, RuntimeError) as e:
+    # These specific errors are expected for mismatched sizes
+    print(f"\n✓ Raised appropriate error for mismatched size: {type(e).__name__}")
```

tensorrt_llm/_torch/visual_gen/quantization/loader.py-167-181 (1)
167-181: ⚠️ Potential issue | 🟠 Major

Hardcoded `.cuda()` may place tensors on the wrong device in multi-GPU setups; potential `None` group_size.

Line 171: `weight.cuda()` moves the tensor to the default CUDA device (`cuda:0`). In multi-GPU or distributed setups, the model may reside on a different device. Consider passing the target device explicitly or inferring it from the module's existing parameters.

Line 176: `self.quant_config.group_size` may be `None` if not explicitly configured, which would pass `None` as `block_size` to `quantize_fp8_blockwise`. Add a fallback:

Proposed fix

```diff
-# Move to GPU only if needed
-if weight.device.type != "cuda":
-    weight = weight.cuda()
+# Move to GPU only if needed (use current CUDA device)
+if weight.device.type != "cuda":
+    weight = weight.to(torch.cuda.current_device())
 if quant_algo == QuantAlgo.FP8:
     qweight, scale = quantize_fp8_per_tensor(weight)
 elif quant_algo == QuantAlgo.FP8_BLOCK_SCALES:
-    block_size = self.quant_config.group_size if self.quant_config else 128
+    block_size = (
+        getattr(self.quant_config, "group_size", None) if self.quant_config else None
+    ) or 128
     qweight, scale = quantize_fp8_blockwise(weight, block_size=block_size)
```

tensorrt_llm/serve/media_storage.py-356-371 (1)
356-371: ⚠️ Potential issue | 🟠 Major

Fragile path manipulation and overly broad exception handler.

Two concerns here:

- `output_path.replace(".mp4", ".png")` (lines 362, 370) performs a naive string replace that will mangle paths containing `.mp4` in directory names. Use `os.path.splitext` for safe extension replacement.
- The bare `except Exception` (line 364) silently swallows all errors, including programming bugs (e.g., `TypeError`, `AttributeError`), and falls back to saving a single PNG frame without raising. This hides real issues. Consider catching only av-specific encoding errors, or at minimum re-logging the full traceback at a higher severity.

Proposed fix for path manipulation

```diff
-png_path = output_path.replace(".mp4", ".png")
+base, _ = os.path.splitext(output_path)
+png_path = base + ".png"
```

Apply this pattern on both lines 362 and 370.
tensorrt_llm/_torch/visual_gen/models/wan/pipeline_wan.py-182-184 (1)
182-184: ⚠️ Potential issue | 🟠 Major

Storing the entire weights dict keeps large tensors in memory.

`self._weights_dict = weights` (line 184) holds a reference to the full checkpoint weights dictionary. Since the weights are loaded into the model modules, this reference likely prevents garbage collection of potentially multi-GB tensors. If this is only needed for deferred `transformer_2` loading, consider clearing it after loading completes (e.g., at the end of `load_weights` or in `post_load_weights`).

Proposed fix

```diff
 def load_weights(self, weights: dict) -> None:
     # Store weights for later use (in case transformer_2 is created after this call)
     self._weights_dict = weights
     ...
     # At the end of load_weights:
+    # Clear reference to avoid keeping large checkpoint in memory
+    self._weights_dict = None
```

tests/unittest/_torch/visual_gen/test_wan_i2v.py-54-71 (1)
54-71: ⚠️ Potential issue | 🟠 Major

Module-level `_llm_models_root()` call will crash imports in environments without scratch paths.

`CHECKPOINT_PATH` is evaluated at module import time (lines 68–71), calling `_llm_models_root()` which has an `assert root.exists()`. This means merely importing this test module (e.g., during pytest collection) will raise `AssertionError` on machines that lack both `/home/scratch.trt_llm_data_ci/` and `/scratch.trt_llm_data/` and don't set `LLM_MODELS_ROOT`.

Consider deferring the path resolution to a fixture or using a `pytest.importorskip`-style guard, or replacing the `assert` with a graceful fallback (e.g., returning `None` and letting fixtures skip):

Proposed fix

```diff
 def _llm_models_root() -> str:
-    """Return LLM_MODELS_ROOT path if it is set in env, assert when it's set but not a valid path."""
+    """Return LLM_MODELS_ROOT path, or None if unavailable."""
     root = Path("/home/scratch.trt_llm_data_ci/llm-models/")
     if "LLM_MODELS_ROOT" in os.environ:
         root = Path(os.environ["LLM_MODELS_ROOT"])
     if not root.exists():
         root = Path("/scratch.trt_llm_data/llm-models/")
-    assert root.exists(), (
-        "You shall set LLM_MODELS_ROOT env or be able to access scratch.trt_llm_data to run this test"
-    )
+    if not root.exists():
+        return None
     return str(root)

 # Checkpoint paths
+_models_root = _llm_models_root()
 CHECKPOINT_PATH = os.environ.get(
     "DIFFUSION_MODEL_PATH",
-    os.path.join(_llm_models_root(), "Wan2.2-I2V-A14B-Diffusers"),
+    os.path.join(_models_root, "Wan2.2-I2V-A14B-Diffusers") if _models_root else "",
 )
```

tensorrt_llm/serve/openai_server.py-1248-1256 (1)
1248-1256: ⚠️ Potential issue | 🟠 Major

Handle multi-image outputs when saving to disk.

When `num_images_per_prompt > 1`, `output.image` can be a list, and `save_image()` expects a single tensor. Save each image separately and use the list for the response.

🛠️ Proposed fix

```diff
-output_images = output.image
-MediaStorage.save_image(
-    output_images,
-    self.media_storage_path / f"{image_id}.png",
-)
-
-if not isinstance(output_images, list):
-    output_images = [output_images]
+output_images = output.image
+images = output_images if isinstance(output_images, list) else [output_images]
+for idx, image in enumerate(images):
+    MediaStorage.save_image(
+        image,
+        self.media_storage_path / f"{image_id}_{idx}.png",
+    )
 ...
 data = [
     ImageObject(
         b64_json=base64.b64encode(MediaStorage.convert_image_to_bytes(image)).decode('utf-8'),
         revised_prompt=request.prompt
-    ) for image in output_images
+    ) for image in images
 ]
```

Also applies to: 1310-1318
tensorrt_llm/_torch/visual_gen/executor.py-132-139 (1)
132-139: ⚠️ Potential issue | 🟠 Major

Guard against `diffusion_config=None` before `.copy()`.

`diffusion_config` is optional; the default `None` path will raise `AttributeError` and prevent any worker from starting.

🛠️ Proposed fix

```diff
-config_dict = self.diffusion_config.copy()
+config_dict = (self.diffusion_config or {}).copy()
```

tensorrt_llm/_torch/visual_gen/config.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix

```diff
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import json
```

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

tensorrt_llm/serve/openai_server.py-475-479 (1)
tensorrt_llm/serve/openai_server.py (1)

475-479: ⚠️ Potential issue | 🟠 Major

Guard `/metrics` for generators that lack `get_stats_async()`.

`register_visual_gen_routes()` exposes `/metrics`, but `VisualGen` does not implement `get_stats_async()`, causing 500s. Return an empty list or add a stub.

🛠️ Proposed fix

```diff
     async def get_iteration_stats(self) -> JSONResponse:
-        stats = []
-        async for stat in self.generator.get_stats_async(2):
-            stats.append(stat)
-        return JSONResponse(content=stats)
+        if not hasattr(self.generator, "get_stats_async"):
+            return JSONResponse(content=[])
+        stats = []
+        async for stat in self.generator.get_stats_async(2):
+            stats.append(stat)
+        return JSONResponse(content=stats)
```
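The same guard pattern, reduced to a runnable sketch with stand-in generator classes (none of these names are the real TRT-LLM types):

```python
import asyncio


class StatsGenerator:
    async def get_stats_async(self, timeout):
        # Illustrative async generator: yields a couple of stat dicts.
        for i in range(2):
            yield {"iter": i, "timeout": timeout}


class NoStatsGenerator:
    pass  # Deliberately lacks get_stats_async, like a generator type without metrics.


async def collect_stats(generator):
    # Guard mirrors the proposed fix: fall back to an empty list instead of a 500.
    if not hasattr(generator, "get_stats_async"):
        return []
    return [stat async for stat in generator.get_stats_async(2)]


print(asyncio.run(collect_stats(NoStatsGenerator())))  # []
print(asyncio.run(collect_stats(StatsGenerator())))
```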
tensorrt_llm/serve/openai_server.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add/update the NVIDIA Apache-2.0 header for this modified source file.

🛠️ Proposed fix

```diff
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 #!/usr/bin/env python
```

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
tensorrt_llm/_torch/visual_gen/executor.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix

```diff
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 import os
```

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
tensorrt_llm/llmapi/visual_gen.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix

```diff
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 import asyncio
```

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
tensorrt_llm/serve/openai_server.py (1)

189-194: ⚠️ Potential issue | 🟠 Major

Avoid a fixed `/tmp` storage path without ownership/permission hardening.

`/tmp` is world-writable and can be abused via symlinks. Prefer a private temp dir or enforce restrictive permissions.

🛠️ Proposed fix

```diff
+import tempfile
 ...
     def _init_visual_gen(self):
         self.processor = None
         self.model_config = None
-        self.media_storage_path = Path(os.getenv("TRTLLM_MEDIA_STORAGE_PATH", "/tmp/trtllm_generated"))
-        self.media_storage_path.mkdir(exist_ok=True, parents= True)
+        storage_env = os.getenv("TRTLLM_MEDIA_STORAGE_PATH")
+        if storage_env:
+            self.media_storage_path = Path(storage_env)
+        else:
+            self.media_storage_path = Path(tempfile.mkdtemp(prefix="trtllm_generated_"))
+        self.media_storage_path.mkdir(exist_ok=True, parents=True, mode=0o700)
         self.video_gen_tasks = {}
```
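The `tempfile.mkdtemp()` behavior the fix relies on can be checked directly; this is a minimal POSIX-oriented sketch, independent of the server code:

```python
import os
import stat
import tempfile
from pathlib import Path

# mkdtemp creates a directory readable, writable, and searchable only by the
# creating user (0o700 on POSIX), avoiding the symlink risks of a fixed,
# world-writable /tmp path shared across users.
private_dir = Path(tempfile.mkdtemp(prefix="trtllm_generated_"))
print(oct(stat.S_IMODE(os.stat(private_dir).st_mode)))  # 0o700 on POSIX
private_dir.rmdir()
```

`mkdtemp` also guarantees a fresh, uniquely named directory, so two server instances never collide on the same path.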
tensorrt_llm/_torch/visual_gen/teacache.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix

```diff
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 import inspect
```

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
tensorrt_llm/_torch/visual_gen/pipeline.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add NVIDIA Apache-2.0 header at the top of this new file.

🛠️ Proposed fix

```diff
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 import time
```

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
tensorrt_llm/serve/openai_server.py (1)

1520-1526: ⚠️ Potential issue | 🟠 Major

Mark async jobs as failed when `output.video` is missing.

Returning an error response from the background task doesn't update the job status, leaving it stuck in `queued`.

🛠️ Proposed fix

```diff
         if output.video is None:
-            return self.create_error_response(
-                message="Video generation failed",
-                err_type="InternalServerError",
-                status_code=HTTPStatus.INTERNAL_SERVER_ERROR,
-            )
+            job = await VIDEO_STORE.get(video_id)
+            if job:
+                job.status = "failed"
+                job.completed_at = int(time.time())
+                job.error = "Video generation failed"
+                await VIDEO_STORE.upsert(video_id, job)
+            return
```
tensorrt_llm/commands/serve.py (1)

580-585: ⚠️ Potential issue | 🟠 Major

Guard against `yaml.safe_load()` returning `None`.

Empty YAML files yield `None`, which will cause:

- Line 585: `TypeError`/`AttributeError` in `update_llm_args_with_extra_dict()` when attempting a membership test on `None`
- Line 635: `dict.update()` raises `TypeError: 'NoneType' object is not iterable`

🛠️ Proposed fix
1222-1241:⚠️ Potential issue | 🟠 MajorOffload blocking
generate()calls from the event loop.These async endpoints call the synchronous
VisualGen.generate()directly, which blocks the event loop and will make the server unresponsive under load. Useasyncio.to_thread()to keep the server responsive, consistent with how blocking operations are handled elsewhere in the codebase (e.g.,input_processor).🛠️ Proposed fix
- output = self.generator.generate(inputs=inputs, params=params) + output = await asyncio.to_thread(self.generator.generate, inputs=inputs, params=params)Apply the same change at lines 1240, 1302, and 1362.
tensorrt_llm/_torch/visual_gen/pipeline.py-52-55 (1)
52-55:⚠️ Potential issue | 🟠 MajorFix the device property—
nn.Moduledoes not have a.deviceattribute.Accessing
self.transformer.devicewill raiseAttributeError. Derive the device from the module's parameters instead, with proper None-handling.Proposed fix
`@property` def device(self): - return self.transformer.device + if self.transformer is None: + return torch.device("cpu") + try: + return next(self.transformer.parameters()).device + except StopIteration: + return torch.device("cpu")
Signed-off-by: Freddy Qi <junq@nvidia.com>

[TRTLLM-10629][feat] Basic trtllm-serve functionality support for AIGV
Signed-off-by: Junyi Xu <junyix@nvidia.com>

unify outputs dict to MediaOutput class
Signed-off-by: Freddy Qi <junq@nvidia.com>

[TRTLLM-10630][feat] Support text+image to video generation
Signed-off-by: Junyi Xu <junyix@nvidia.com>

[None][fix] Add dependency to requirements.txt for media storage
Signed-off-by: Junyi Xu <junyix@nvidia.com>

Wan modeling + dynamic quant + custom attention support
Signed-off-by: Olivia Stoner <ostoner@nvidia.com>

[None][chore] format code
Signed-off-by: Freddy Qi <junq@nvidia.com>

Support HuggingFace Hub model IDs for checkpoint loading
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>

Remove PyAV pkg
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>

Wan 2.2 + I2V updates
Signed-off-by: Olivia Stoner <ostoner@nvidia.com>

[TRTLLM-10898][chore] Unify VisualGen generate inputs to LLM
Signed-off-by: Junyi Xu <junyix@nvidia.com>

[None][feat] support ulysses parallelism in WAN
Signed-off-by: Freddy Qi <junq@nvidia.com>

Implement VBench CI Test
Signed-off-by: Yibin Li <yibinl@nvidia.com>

[None][feat] Add e2e/endpoints tests for trtllm-serve
Signed-off-by: Junyi Xu <junyix@nvidia.com>

Use llm_model_root for ckpt path for CI
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Force-pushed from 5f999b6 to 515354c
/bot run

PR_Github #35726 [ run ] triggered by Bot. Commit:
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Add nosec B108 suppression since the path is configurable via TRTLLM_MEDIA_STORAGE_PATH environment variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
PR_Github #35726 [ run ] completed with state
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
/bot run

PR_Github #35848 [ run ] triggered by Bot. Commit:

PR_Github #35848 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35868 [ run ] triggered by Bot. Commit:

PR_Github #35868 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35875 [ run ] triggered by Bot. Commit:
This reverts commit 3bc17e1.

Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
PR_Github #35875 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35882 [ run ] triggered by Bot. Commit:
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
/bot run --disable-fail-fast

PR_Github #35884 [ run ] triggered by Bot. Commit:

PR_Github #35884 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #35910 [ run ] triggered by Bot. Commit:

PR_Github #35910 [ run ] completed with state

/bot run

PR_Github #35930 [ run ] triggered by Bot. Commit:

PR_Github #35930 [ run ] completed with state

/bot run

PR_Github #35936 [ run ] triggered by Bot. Commit:

PR_Github #35936 [ run ] completed with state
…#11462)

Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Signed-off-by: peihu-nv <259410613+peihu-nv@users.noreply.github.com>
This PR introduces initial support for visual generation models in TRT-LLM, covering Wan 2.1 / Wan 2.2 for both Text-to-Video (T2V) and Image-to-Video (I2V) workflows.
To run:
Wan2.1 Text-to-Video Example:
WAN 2.2 Image-to-Video Example:
Co-authored with @o-stoner @QiJune @JunyiXu-nv @yibinl-nvidia @chang-l
Summary by CodeRabbit
Release Notes
New Features
Documentation
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message.

See details below for each supported subcommand.
Details
`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see
`docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`

Kill all running builds associated with pull request.
skip

`skip --comment COMMENT`

Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
`reuse-pipeline`

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
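For illustration only, the subcommand grammar documented above could be modeled with `argparse`; the flag names follow the help text, while the parser itself is a hypothetical reconstruction, not the bot's actual implementation:

```python
import argparse

# Hypothetical model of the /bot command grammar described in the help text.
parser = argparse.ArgumentParser(prog="/bot")
sub = parser.add_subparsers(dest="command", required=True)

run = sub.add_parser("run")
run.add_argument("--reuse-test", nargs="?", metavar="pipeline-id")
run.add_argument("--disable-fail-fast", action="store_true")
run.add_argument("--skip-test", action="store_true")
run.add_argument("--stage-list")
run.add_argument("--gpu-type")
run.add_argument("--post-merge", action="store_true")

sub.add_parser("kill")
skip = sub.add_parser("skip")
skip.add_argument("--comment", required=True)  # reason is mandatory per the docs
sub.add_parser("reuse-pipeline")

args = parser.parse_args(["run", "--disable-fail-fast", "--gpu-type", "A30, H100_PCIe"])
print(args.command, args.disable_fail_fast, args.gpu_type)
```

Modeling the grammar this way makes the mandatory `--comment` on `skip` and the optional pipeline-id on `--reuse-test` explicit.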