[None][feat] Add performance alignment to layer-wise benchmarks #11018

yuantailing merged 21 commits into NVIDIA:main
Conversation
📝 Walkthrough

This patch introduces a comprehensive layer-wise calibration and benchmarking system. It adds a Calibrator class with COLLECT/MARK/REPLAY modes, integrates it into MOE routing and PyExecutor, provides benchmarking utilities for trace parsing and kernel correlation, and delivers workflow scripts for end-to-end performance alignment analysis.

Changes
Sequence Diagram(s)

sequenceDiagram
actor App as Application
participant Cal as Calibrator
participant MOE as MOE Module
participant GPU as GPU Buffer
participant File as File Storage
App->>Cal: init(mode=COLLECT, ...)
activate Cal
Cal->>GPU: allocate fixed buffers
deactivate Cal
App->>Cal: start()
activate Cal
Cal->>Cal: reset state
deactivate Cal
loop Per Iteration
App->>Cal: pre_step(it)
App->>MOE: forward()
activate MOE
MOE->>Cal: maybe_collect_or_replay_slots()
activate Cal
Cal->>GPU: record slot data
deactivate Cal
MOE-->>App: token_selected_slots
deactivate MOE
App->>Cal: post_step(it)
activate Cal
Cal->>GPU: copy iteration metadata
deactivate Cal
end
App->>Cal: stop()
activate Cal
Cal->>File: save collected data (all ranks)
deactivate Cal
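The COLLECT flow in the diagram above can be sketched as plain Python. All names here (Calibrator, pre_step, maybe_collect_or_replay_slots, ...) follow the diagram and are illustrative stand-ins, not the real TensorRT-LLM API:

```python
# Minimal mock of the COLLECT-mode lifecycle shown in the diagram.
class Calibrator:
    def __init__(self, mode):
        self.mode = mode
        self.slots = {}      # stands in for the fixed GPU buffers
        self.metadata = {}
        self.current_iter = None

    def start(self):
        # reset state
        self.slots.clear()
        self.metadata.clear()

    def pre_step(self, it):
        self.current_iter = it

    def maybe_collect_or_replay_slots(self, token_selected_slots):
        # COLLECT: record the routing decision for this iteration
        self.slots[self.current_iter] = list(token_selected_slots)
        return token_selected_slots

    def post_step(self, it):
        # copy iteration metadata
        self.metadata[it] = {"num_slots": len(self.slots[it])}

    def stop(self):
        # in the real system this would be saved to a file on all ranks
        return {"slots": self.slots, "metadata": self.metadata}


cal = Calibrator(mode="COLLECT")
cal.start()
for it in range(2):
    cal.pre_step(it)
    # the MOE forward() would call into the calibrator here
    cal.maybe_collect_or_replay_slots([it, it + 1])
    cal.post_step(it)
saved = cal.stop()
```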
sequenceDiagram
actor App as Application
participant Cal as Calibrator
participant File as File Storage
participant MOE as MOE Module
participant GPU as GPU Buffer
App->>File: read calibration file
App->>Cal: init(mode=REPLAY, file_path=..., layer_indices=...)
activate Cal
Cal->>Cal: load and validate replay data across ranks
Cal->>GPU: allocate graph-compatible buffers
deactivate Cal
App->>Cal: start()
activate Cal
Cal->>Cal: initialize replay state
deactivate Cal
loop Per Iteration
App->>Cal: pre_step(it)
activate Cal
Cal->>Cal: prepare replay data for iteration
deactivate Cal
App->>MOE: forward()
activate MOE
MOE->>Cal: maybe_collect_or_replay_slots()
activate Cal
Cal->>GPU: load and apply replay slot data
Cal-->>MOE: replayed token_selected_slots
deactivate Cal
MOE-->>App: result
deactivate MOE
App->>Cal: post_step(it)
activate Cal
Cal->>Cal: record actual metadata for verification
deactivate Cal
end
App->>Cal: stop()
activate Cal
Cal->>Cal: verify actual vs. recorded metadata
deactivate Cal
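The REPLAY flow above can be mocked the same way: previously collected slots are injected in place of the live routing result, and the recorded metadata is verified at stop(). Names and the `replay_db` contents are illustrative only, not the real TensorRT-LLM API:

```python
# Mock of the REPLAY-mode loop from the diagram.
replay_db = {0: [3, 1], 1: [2, 2]}  # iteration -> recorded token_selected_slots

actual_metadata = {}
for it in sorted(replay_db):
    # pre_step(it): prepare replay data; forward() then receives the
    # recorded slots instead of computing its own routing
    token_selected_slots = replay_db[it]
    # post_step(it): record actual metadata for later verification
    actual_metadata[it] = {"num_slots": len(token_selected_slots)}

# stop(): verify actual vs. recorded metadata
for it, slots in replay_db.items():
    assert actual_metadata[it]["num_slots"] == len(slots)
```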
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 20
🤖 Fix all issues with AI agents
In `@examples/layer_wise_benchmarks/correlation_template.html`:
- Line 47: The current lookup for referenceData uses
s.series.search('reference') == 0 which is unclear and brittle; update the
predicate to use s.series.startsWith('reference') and guard against undefined
series (e.g., s.series && s.series.startsWith('reference')) so rawData.find(...)
reliably finds entries whose series begins with "reference"; update the
referenceData declaration accordingly.
In `@examples/layer_wise_benchmarks/correlation.py`:
- Around line 1-6: Add the missing NVIDIA copyright header to the top of
correlation.py: insert the project's standard multi-line NVIDIA copyright notice
with the year of latest meaningful modification as used across the repo
(matching other source files), placed above all imports and module code (before
the existing imports like kernel_short_name and shortest_common_supersequence)
so the file now includes the required header.
In `@examples/layer_wise_benchmarks/middleware/mpi_env_from_ompi`:
- Around line 3-8: The script uses set -u so missing OMPI_* vars produce an
unhelpful "unbound variable" error; add explicit validation before exporting
WORLD_SIZE, RANK, LOCAL_RANK, and NODE_RANK by checking OMPI_COMM_WORLD_SIZE,
OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_LOCAL_RANK, and OMPI_COMM_WORLD_NODE_RANK
and printing clear, actionable error messages and exiting non‑zero if any are
unset or empty (or alternatively provide explicit defaults if intended); update
the export block (the four export lines) to run only after these checks so
failures are deterministic and readable when not run under Open MPI.
In `@examples/layer_wise_benchmarks/parse_e2e.py`:
- Around line 25-27: The CLI argument "--target-gen-reqs" is parsed as a string
which causes mismatches when compared to integer values (leading to empty
eager_iters); update the call to parser.add_argument("--target-gen-reqs") to
parse integers by adding type=int so comparisons with parsed ints succeed—modify
the parser.add_argument call for "--target-gen-reqs" (and any related usage that
assumes an int) to use type=int.
- Around line 1-7: Add the standard NVIDIA TensorRT-LLM copyright header
(updated year 2026) at the very top of the file parse_e2e.py, placing it before
any imports; ensure the header matches the project's canonical NVIDIA header
format and includes the copyright notice, license text and modification year
2026.
- Around line 191-195: The range check uses the leaked loop variable
eager_layers instead of the intended list for iteration 0; update the
comparisons in the block that builds eager_per_layer_kernels to reference
per_layer_eager_layers[0] explicitly (e.g., replace
eager_layers[eager_layers_idx][1] with
per_layer_eager_layers[0][eager_layers_idx][1]) so eager_layers_idx and
eager_kernel are validated against the correct layer list; keep existing bisect
on per_layer_eager_layers[0] and ensure all accesses use that same explicit
list.
- Around line 206-212: The code uses bisect.bisect(..., key=...) which is
unsupported before Python 3.10; replace that call by precomputing a list of
start indices from super_per_layer_kernels (e.g., starts = [t[0] for t in
super_per_layer_kernels]) and then use bisect.bisect(starts, j) - 1 to compute
layer_idx; update the logic that references
super_per_layer_kernels[layer_idx][1] and appends to
graph_per_layer_kernels[layer_idx] accordingly so behavior remains identical.
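The `bisect` fix above can be sketched as follows: on Python < 3.10, `bisect.bisect` has no `key=` parameter, so the start indices are precomputed instead. `super_per_layer_kernels` here is a stand-in with made-up data, not the real trace structure:

```python
import bisect

# Hypothetical (start_index, kernels) tuples mirroring super_per_layer_kernels
super_per_layer_kernels = [(0, ["a"]), (4, ["b"]), (9, ["c"])]

# Precompute the sort keys instead of using bisect.bisect(..., key=...)
starts = [t[0] for t in super_per_layer_kernels]


def layer_for(j):
    # Index of the layer whose start index is the rightmost one <= j
    return bisect.bisect(starts, j) - 1
```

The behavior is identical to the `key=`-based call: kernel index `j` maps to the layer whose range contains it.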
In `@examples/layer_wise_benchmarks/parser_utils.py`:
- Around line 1-6: Add the standard NVIDIA copyright/header to the top of this
new module (parser_utils.py) with the current modification year 2026; place the
header comment block before the first import and ensure it matches the project's
standard TensorRT-LLM header format (including copyright notice, license
reference, and any required SPDX or contribution lines) so that the file begins
with the exact required NVIDIA header followed by the existing imports (re,
subprocess, sys, numpy).
In `@examples/layer_wise_benchmarks/README.md`:
- Around line 214-253: Update the MARK-mode profile filename and fix the
numbered list ordering in the README: when instructing to run Step 1 again in
MARK mode, change the recommended nsys output argument from "-o
profiles/report_e2e_collect_rank%q{RANK}.nsys-rep" to a MARK-specific name such
as "-o profiles/report_mark_rank%q{RANK}.nsys-rep" to avoid confusion, and
renumber the “Here are explanations of every argument” list sequentially (1
through 8) so entries for NP, --load-format, --layer-indices, --batch-size,
--seq-len-q, --seq-len-kv-cache, --replay-file-path, and
--replay-start/--replay-stop appear in logical order and match the CLI example;
references to config fields (cuda_graph_config and
layer_wise_benchmarks_config.calibration_mode) remain unchanged.
In `@tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py`:
- Line 42: The calibrator is being invoked unconditionally in configurable_moe
(call to get_calibrator) even when routing is fused and token_selected_slots is
None; change the top-level import to keep the module namespace (import
tensorrt_llm.tools.layer_wise_benchmarks as layer_wise_benchmarks) and then
guard the calibrator call so it only runs when token_selected_slots is not None
(e.g., if token_selected_slots is not None: calibrator =
layer_wise_benchmarks.get_calibrator(...)); apply the same guard to the other
nearby calls around the existing get_calibrator usage (the block spanning the
current lines ~627–632).
In `@tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`:
- Line 27: Update LayerwiseBenchmarksConfig to include a new field
replay_verify_metadata: Optional[bool] so the config can carry the flag; change
the import in py_executor_creator.py to preserve module namespace by using
import tensorrt_llm.tools.layer_wise_benchmarks as layer_wise_benchmarks
(instead of importing get_calibrator directly); then locate the call to
calibrator.init(...) in py_executor_creator.py and add the
replay_verify_metadata argument (pass the value from LayerwiseBenchmarksConfig)
so REPLAY mode no longer raises ValueError("missing replay_verify_metadata").
In `@tensorrt_llm/_torch/pyexecutor/py_executor.py`:
- Around line 38-40: Replace the direct submodule import so the module namespace
is preserved: change the import statement in py_executor.py from "from
tensorrt_llm.tools.layer_wise_benchmarks import get_calibrator" to "from
tensorrt_llm.tools import layer_wise_benchmarks", then update all calls to
get_calibrator (e.g., the invocation around line 740) to use
layer_wise_benchmarks.get_calibrator(); apply the same import-and-call pattern
to other files that currently import get_calibrator directly to comply with the
coding guideline.
In `@tensorrt_llm/llmapi/llm_args.py`:
- Around line 847-871: The Literal type for calibration_mode is missing
"REPLAY", causing validate_calibration_file_path (in LayerwiseBenchmarksConfig)
to reject REPLAY; update the calibration_mode field definition to include
"REPLAY" as an allowed Literal value (so calibration_mode can be set to "NONE",
"MARK", "COLLECT", or "REPLAY") and ensure the model_validator method
validate_calibration_file_path continues to check self.calibration_mode for
["COLLECT", "REPLAY"] as before.
In `@tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py`:
- Around line 1-11: This new source file (calibrator.py) is missing the required
NVIDIA TensorRT-LLM copyright header; add the standard NVIDIA header block at
the top of the file with the latest modification year 2026, ensuring it matches
the project's header style and includes the copyright notice, license statement,
and any required contributor/ownership lines before the existing imports
(base64, functools, json, zlib, etc.) so the file complies with repository
coding guidelines.
- Around line 101-113: Before setting self.mode and calling _init_collect_mode
or _init_replay_mode, validate required inputs and raise clear ValueError
messages: ensure dist is provided when mode is COLLECT, ensure mapping and
layer_indices (and replay_verify_metadata) are provided when mode is REPLAY.
Update the block around Mode[mode] assignment to check mapping, dist, and
layer_indices early (referencing Mode, self.mode, mapping, dist, layer_indices,
replay_verify_metadata) and raise explicit errors before calling
_init_collect_mode or _init_replay_mode so missing inputs fail fast with clear
messages.
- Around line 672-698: The method get_replay_iteration_range currently computes
start_iter and stop_iter as the first and last element of sorted self._replay_db
keys but returns stop_iter inclusive while the docstring promises an exclusive
upper bound; change the return to return start_iter, stop_iter + 1 so callers
receive [start_iter, stop_iter) as documented, and keep the contiguous-range
verification using local_iterations != list(range(start_iter, stop_iter + 1))
unchanged (it still validates contiguity against the inclusive last iter).
Ensure the docstring remains the same and update any callers only if they relied
on the inclusive behavior.
- Around line 540-569: post_step() can overrun the fixed-size buffers
_collected_metadata_idx, _collected_slots_cpu, and
_collected_actual_metadata_idx because record_idx is computed from dynamic list
lengths without bounds checks; add explicit index bounds checks before accessing
these arrays in both Mode.COLLECT and Mode.REPLAY branches (i.e., before using
record_idx with _collected_metadata_idx.copy_, _collected_slots_cpu[record_idx],
and _collected_actual_metadata_idx.copy_), and raise a clear RuntimeError (or
ValueError) that includes the offending record_idx, the buffer name, and the
buffer capacity when record_idx >= len(buffer); keep behavior otherwise
unchanged.
- Around line 79-88: The init method uses Python 3.9+ style annotation
`list[int]` which breaks on 3.8 and also doesn't allow None as noted in the
docstring; update the signature of initializer `init` to use typing.Optional and
typing.List (e.g., layer_indices: Optional[List[int]]), ensure Optional and List
are imported from typing, and if relevant set the default for layer_indices to
None or handle None in the method body (since mode "COLLECT" may pass None).
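The exclusive-bound fix for get_replay_iteration_range can be sketched as a standalone function; `_replay_db` is modeled as a plain dict and the names follow the comments above, not the actual calibrator implementation:

```python
def get_replay_iteration_range(replay_db):
    """Return [start_iter, stop_iter) covering the recorded iterations."""
    local_iterations = sorted(replay_db.keys())
    start_iter, last_iter = local_iterations[0], local_iterations[-1]
    # Contiguity check still validates against the inclusive last iteration
    if local_iterations != list(range(start_iter, last_iter + 1)):
        raise ValueError("replay iterations are not contiguous")
    # Exclusive upper bound, as the docstring promises
    return start_iter, last_iter + 1
```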
In `@tensorrt_llm/tools/layer_wise_benchmarks/runner.py`:
- Around line 446-448: The code accesses model.model.layers[layer_indices[0]] to
set residual_fusion without ensuring layer_indices is non-empty, which can raise
IndexError; modify the logic around the residual_fusion assignment (before the
loop that iterates over layer_indices) to first check if layer_indices is
truthy/has length > 0 and only then access layer_indices[0], otherwise set
residual_fusion to a safe default (e.g., False) or handle the empty case
appropriately so the subsequent loop over layer_indices is safe; update any
downstream assumptions in the loop that use residual_fusion or expect at least
one layer.
- Around line 444-460: The constructor sets up a local layer_indices used by the
nested forward() but never assigns it to the instance, causing
replace_routing_method_ctx to fail when accessing self.layer_indices; fix by
assigning the incoming layer_indices parameter to self.layer_indices in the same
initializer (e.g., in __init__ assign self.layer_indices = layer_indices) so
both the closure-based forward and the method replace_routing_method_ctx
reference the same instance attribute.
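The empty-list guard suggested above can be sketched as follows; the layer class and `False` default are illustrative stand-ins for the real runner code:

```python
class _Layer:
    # Hypothetical stand-in for a model layer with a residual_fusion attribute
    def __init__(self, residual_fusion):
        self.residual_fusion = residual_fusion


def pick_residual_fusion(model_layers, layer_indices):
    # Guard before indexing, as the review comment suggests; False is an
    # assumed safe default when no layers are selected
    if not layer_indices:
        return False
    return model_layers[layer_indices[0]].residual_fusion


layers = [_Layer(False), _Layer(True)]
```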
🧹 Nitpick comments (9)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)
11-11: Prefer module import to preserve namespace. This aligns with the Python import guideline and keeps the call site explicit.
As per coding guidelines, keep the module namespace on import.

♻️ Proposed refactor

```diff
-from tensorrt_llm.tools.layer_wise_benchmarks import get_calibrator
+import tensorrt_llm.tools.layer_wise_benchmarks as layer_wise_benchmarks
@@
-        token_selected_slots = get_calibrator().maybe_collect_or_replay_slots(
+        token_selected_slots = layer_wise_benchmarks.get_calibrator().maybe_collect_or_replay_slots(
             self.num_slots, token_selected_slots)
```

Also applies to: 443-444
tensorrt_llm/_torch/modules/fused_moe/fused_moe_cutlass.py (1)
11-11: Prefer module import to preserve namespace. This follows the Python import guideline and keeps the call site explicit.

As per coding guidelines, keep the module namespace on import.

♻️ Proposed refactor

```diff
-from tensorrt_llm.tools.layer_wise_benchmarks import get_calibrator
+import tensorrt_llm.tools.layer_wise_benchmarks as layer_wise_benchmarks
@@
-        token_selected_slots = get_calibrator().maybe_collect_or_replay_slots(
+        token_selected_slots = layer_wise_benchmarks.get_calibrator().maybe_collect_or_replay_slots(
             self.num_slots, token_selected_slots)
```

Also applies to: 538-539
tensorrt_llm/llmapi/llm_args.py (1)
865-871: Ruff TRY003: avoid long inline exception message. Ruff flags long messages inline in raise. Consider moving it to a constant/class var for readability.

♻️ Suggested tweak

```diff
 class LayerwiseBenchmarksConfig(StrictBaseModel):
     """
     Configuration for layer-wise benchmarks calibration.
     """
+    CALIBRATION_FILE_REQUIRED_MSG: ClassVar[str] = (
+        "calibration_file_path must be set when calibration_mode is COLLECT or REPLAY."
+    )
@@
     def validate_calibration_file_path(self) -> 'LayerwiseBenchmarksConfig':
         if self.calibration_mode in ["COLLECT", "REPLAY"
                                      ] and not self.calibration_file_path:
-            raise ValueError(
-                f"Expect calibration_file_path not to be empty when work on {self.calibration_mode} mode"
-            )
+            raise ValueError(self.CALIBRATION_FILE_REQUIRED_MSG)
         return self
```

examples/layer_wise_benchmarks/parse.py (2)
353-354: Enable Jinja2 autoescape for XSS mitigation. While this generates local HTML files, enabling autoescape is a security best practice. The static analysis tool flagged this (S701).

Proposed fix

```diff
 loader = jinja2.FileSystemLoader(Path(__file__).parent)
-template = jinja2.Environment(loader=loader).get_template("breakdown_template.html")
+template = jinja2.Environment(loader=loader, autoescape=True).get_template("breakdown_template.html")
```
378-399: Consider adding strict=True to zip() for safety. The static analysis tool flagged zip() without strict= (B905). While problem_set and kernels should have the same length by construction, adding strict=True provides a runtime check that catches mismatches early.

Proposed fix

```diff
 correlation = []
-for problem, runs in zip(problem_set, kernels):
+for problem, runs in zip(problem_set, kernels, strict=True):
     timeline = []
```

examples/layer_wise_benchmarks/sample_performance_alignment.sh (1)
121-130: Consider using parallel execution for independent parsing tasks. The xargs -I% runs sequentially. For improved performance with multiple ranks, consider adding -P for parallel execution:

Proposed optimization

```diff
-seq 0 $((NP - 1)) | xargs -I% python3 parse_e2e.py \
+seq 0 $((NP - 1)) | xargs -P$NP -I% python3 parse_e2e.py \
     --eager-trace "$PROFILE_DIR/report_e2e_mark_rank%.nsys-rep" \
     --graph-trace "$PROFILE_DIR/report_e2e_collect_rank%.nsys-rep" \
     --layer-indices 5,6,7 \
     --warmup-times 5 \
     -o "$PROFILE_DIR/report_e2e_collect_rank%.json"
-seq 0 $((NP - 1)) | xargs -I% python3 parse.py \
+seq 0 $((NP - 1)) | xargs -P$NP -I% python3 parse.py \
     --profile-dir "$PROFILE_DIR" \
     --world-size $NP \
     --rank %
```

examples/layer_wise_benchmarks/correlation.py (2)
93-96: Enable Jinja2 autoescape for XSS mitigation. The static analysis tool flagged this (S701). While generating local HTML files, enabling autoescape is a security best practice.

Proposed fix

```diff
 loader = jinja2.FileSystemLoader(Path(__file__).parent)
-template = jinja2.Environment(loader=loader).get_template("correlation_template.html")
+template = jinja2.Environment(loader=loader, autoescape=True).get_template("correlation_template.html")
 with open(args.output, "w") as f:
     f.write(template.render(rawData=data))
```
86-88: Consider adding strict=True to zip() for safety. The static analysis tool flagged zip() without strict= (B905). Adding strict=True provides a runtime check that x_tgt and tgt_data["timeline"] have matching lengths.

Proposed fix

```diff
             "duration": o["duration"] / 1000,
             "end": o["end"] / 1000,
         }
-        for x, o in zip(x_tgt, tgt_data["timeline"])
+        for x, o in zip(x_tgt, tgt_data["timeline"], strict=True)
     ],
```
10-15: Keep parser_utils namespace in imports. The guidelines require preserving the module namespace instead of importing symbols directly. Please switch to a module import and update call sites accordingly. As per coding guidelines, keep module namespaces in imports.

♻️ Suggested change

```diff
-from parser_utils import (
-    kernel_short_name,
-    lazy_convert_sqlite,
-    shortest_common_supersequence,
-    warned_names,
-)
+import parser_utils
```

Then update usages to parser_utils.kernel_short_name, parser_utils.lazy_convert_sqlite, parser_utils.shortest_common_supersequence, and parser_utils.warned_names.
3834e50 to bfceb88 (Compare)

/bot run --disable-fail-fast

PR_Github #33687 [ run ] triggered by Bot. Commit:

/bot run --disable-fail-fast

PR_Github #33689 [ run ] triggered by Bot. Commit:

/bot run --disable-fail-fast

PR_Github #33727 [ run ] triggered by Bot. Commit:
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
de6bf6f to bd13002 (Compare)
/bot run --disable-fail-fast

PR_Github #33731 [ run ] triggered by Bot. Commit:

PR_Github #33731 [ run ] completed with state

/bot run --disable-fail-fast
Superjomn left a comment:

LGTM on the llmapi changes.
PR_Github #33800 [ run ] triggered by Bot. Commit:

PR_Github #33800 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #33831 [ run ] triggered by Bot. Commit:

/bot run --disable-fail-fast

PR_Github #33845 [ run ] triggered by Bot. Commit:

PR_Github #33845 [ run ] completed with state
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
/bot skip --comment "only docs changes"

PR_Github #33980 [ skip ] triggered by Bot. Commit:

PR_Github #33980 [ skip ] completed with state
Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]

Launch build/test pipelines. All previously running jobs will be killed.
- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

kill: Kill all running builds associated with the pull request.

skip

skip --comment COMMENT: Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline: Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.