[None][fix] Respect CUDA_LAUNCH_BLOCKING by fixing doCheckError#11261

Merged
mikeiovine merged 2 commits into NVIDIA:main from hnover-nv:fix_doCheckError
Feb 9, 2026

Conversation

@hnover-nv (Collaborator) commented Feb 4, 2026

Description

The C++ macro sync_check_cuda_error(stream) is intended to conditionally synchronize the thread with the given stream and then check for any CUDA errors, based on user control and build type. Specifically, the user can set the environment variable CUDA_LAUNCH_BLOCKING to 0 or 1, and that setting should be respected. If the variable is unset, we should sync and check in debug builds but not in other builds. (Additionally, we never sync and check while in the middle of capturing a CUDA graph.)

To avoid looking up the environment variable on every call, the existing code used a thread-local variable firstCall to run the lookup only once. But a conditional was inverted, so the lookup never ran, and we always fell back to the build-type behavior. Even if it had run, although the code returned an optional, after the first call it would always have been set to true or false, losing the "environment variable not set" behavior.
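To make the bug concrete, here is an illustrative reconstruction (not the exact original source, whose details are not shown in this PR page): the guard is true on the first call, so the inverted test means the lookup body never executes.

```cpp
// Illustrative reconstruction of the bug shape; names and layout are
// assumptions, not the verbatim original code.
#include <cstdlib>
#include <optional>
#include <string>

std::optional<bool> isCudaLaunchBlockingBuggy()
{
    thread_local bool firstCall = true;
    thread_local std::optional<bool> result;
    if (!firstCall) // BUG: inverted; firstCall is true on the first call,
    {               // so this lookup body never runs.
        char const* env = std::getenv("CUDA_LAUNCH_BLOCKING");
        result = (env != nullptr && std::string(env) == "1");
        firstCall = false;
    }
    // Always nullopt: the env var is silently ignored and callers fall
    // back to the build-type behavior.
    return result;
}
```

Note that even with the conditional fixed, assigning result unconditionally inside the branch would collapse the "env var unset" case into false, which is the second problem described above.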

Here we fix all of that, simplifying the code to use a lambda-initialized static const variable instead of the thread-local and combining the two functions into one.
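A minimal standalone sketch of the fixed logic follows; isCapturing is stubbed out here and the stream parameter is dropped for brevity (in the real header the capture check queries the CUDA stream), so treat this as an illustration rather than the exact merged code.

```cpp
#include <cstdlib>
#include <string>

// Stub for illustration; the real code checks whether the given
// cudaStream_t is currently capturing a CUDA graph.
static bool isCapturing()
{
    return false;
}

bool doCheckError()
{
    // The lambda runs exactly once, on first use, and the result is cached
    // in a static const, so later calls pay no env-lookup cost.
    static bool const sDoCheckIfNotCapturing = []()
    {
        char const* env = std::getenv("CUDA_LAUNCH_BLOCKING");
        if (env != nullptr)
        {
            return std::string(env) == "1"; // explicit user setting wins
        }
#ifndef NDEBUG
        return true;  // unset: sync and check in debug builds
#else
        return false; // unset: skip in release builds
#endif
    }();
    // Never sync/check while capturing a CUDA graph.
    return sDoCheckIfNotCapturing && !isCapturing();
}
```

One consequence of the static (rather than thread-local) cache is that the environment variable is read once per process, which matches the intent since its value cannot meaningfully change mid-run.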

Test Coverage

There do not seem to be any existing tests that directly exercise this behavior. Two test decorators reference it:

  • The compute_sanitizer decorator in triton_kernels/testing.py should exercise it, but either we don't use that decorator, or nothing terrible happens when the env var is ignored (because synchronization isn't needed, or because those are debug builds where we did synchronize).
  • Tests using the assert_no_cuda_sync decorator in tests/unittest/utils/util.py will skip if the environment variable is set to true.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • [x] Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@coderabbitai bot changed the title to [refactor] Cache CUDA_LAUNCH_BLOCKING with lazy init Feb 4, 2026
@coderabbitai bot (Contributor) commented Feb 4, 2026

📝 Walkthrough

Refactored error checking logic in cudaUtils.h by removing the public isCudaLaunchBlocking() function and replacing it with a static, lazily-initialized doCheckIfNotCapturing flag that caches the CUDA_LAUNCH_BLOCKING environment variable value to eliminate per-call overhead.

Changes

Cohort / File(s): Error Checking Refactor (cpp/include/tensorrt_llm/common/cudaUtils.h)
Summary: Removed the public isCudaLaunchBlocking() function. Refactored doCheckError(cudaStream_t) to use a static, lazily-initialized doCheckIfNotCapturing flag computed once from the CUDA_LAUNCH_BLOCKING env var instead of checking on each call. Final error checking now combines this cached flag with the isCapturing(stream) result while preserving the debug/non-debug build-type behavior differences.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 33.33%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly and specifically describes the main change: fixing doCheckError to respect the CUDA_LAUNCH_BLOCKING environment variable.
  • Description check ✅ Passed: The PR description is well-structured and follows the template, explaining the bug, solution, test coverage, and checklist.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/include/tensorrt_llm/common/cudaUtils.h (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update copyright year to include 2026.

The copyright header shows 2022-2024, but this file is being meaningfully modified in 2026. As per coding guidelines, the copyright header should reflect the year of latest meaningful modification.

- * SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🧹 Nitpick comments (1)
cpp/include/tensorrt_llm/common/cudaUtils.h (1)

166-178: Static variable naming should use 's' prefix.

Per coding guidelines, locally visible static variables should use camelCase prefixed with 's'. The variable doCheckIfNotCapturing should be renamed to sDoCheckIfNotCapturing.

📝 Proposed fix
-    static bool const doCheckIfNotCapturing = []()
+    static bool const sDoCheckIfNotCapturing = []()
     {
         char const* env = std::getenv("CUDA_LAUNCH_BLOCKING");
         if (env != nullptr)
         {
             return std::string(env) == "1";
         }
 #ifndef NDEBUG
         return true;
 #else
         return false;
 #endif
     }();
-    return doCheckIfNotCapturing && !isCapturing(stream);
+    return sDoCheckIfNotCapturing && !isCapturing(stream);

@hnover-nv changed the title from [refactor] Cache CUDA_LAUNCH_BLOCKING with lazy init to [None][fix] Respect CUDA_LAUNCH_BLOCKING by fixing doCheckError Feb 4, 2026
@hnover-nv (Collaborator, Author)

@coderabbitai apply the suggested copyright and variable naming fixes

@coderabbitai bot (Contributor) commented Feb 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Ensure PR #11261 is checked out =="
gh pr checkout 11261 -R NVIDIA/TensorRT-LLM >/dev/null

echo "== Basic repo context =="
git rev-parse --show-toplevel
git branch --show-current
git status -s

echo "== List changed files in PR =="
gh pr diff 11261 -R NVIDIA/TensorRT-LLM --name-only

echo "== Save full diff to pr_diff.patch (for reference) =="
gh pr diff 11261 -R NVIDIA/TensorRT-LLM > pr_diff.patch
wc -l pr_diff.patch || true

echo "== Fetch review threads and comments JSON =="
gh pr view 11261 -R NVIDIA/TensorRT-LLM --json reviewThreads,comments > pr_comments.json
jq '{
  reviewThreads: (.reviewThreads // []) | length,
  comments: (.comments // []) | length
}' pr_comments.json

echo "== Extract any suggestions or notes mentioning copyright or variable naming =="
jq -r '
  [
    .reviewThreads[]?.comments[]?.body? // empty,
    .comments[]?.body? // empty
  ]
  | map(select(test("(?i)copyright|variable|naming|style|identifier|Hungarian")))
  | .[]' pr_comments.json || true

echo "== Show the changed file header and key functions (cudaUtils.h) if present =="
if [ -f cpp/include/tensorrt_llm/common/cudaUtils.h ]; then
  echo "--- Header (first 40 lines) ---"
  sed -n '1,40p' cpp/include/tensorrt_llm/common/cudaUtils.h || true

  echo "--- doCheckError occurrences ---"
  rg -n -C3 --type=cpp '\bdoCheckError\s*\(' cpp/include/tensorrt_llm/common/cudaUtils.h || true

  echo "--- CUDA_LAUNCH_BLOCKING occurrences ---"
  rg -n -C3 --type=cpp 'CUDA_LAUNCH_BLOCKING' cpp/include/tensorrt_llm/common/cudaUtils.h || true

  echo "--- static const flag names around doCheckError ---"
  rg -n -C3 --type=cpp 'static\s+const|const\s+static' cpp/include/tensorrt_llm/common/cudaUtils.h || true
else
  echo "File cpp/include/tensorrt_llm/common/cudaUtils.h not found in working tree."
fi

Length of output: 1110


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Fetch reviews and comments with available fields =="
gh pr view 11261 -R NVIDIA/TensorRT-LLM --json reviews,comments,latestReviews > pr_reviews.json

echo "== Extract review comments mentioning copyright, variable, naming, style =="
jq -r '
  [
    (.reviews[]?.body? // empty),
    (.latestReviews[]?.body? // empty),
    (.comments[]?.body? // empty)
  ]
  | map(select(. != null and . != ""))
  | map(select(test("(?i)copyright|variable|naming|style|identifier|Hungarian|doCheck|static|const")))
  | .[]' pr_reviews.json || echo "No matching comments found in JSON"

echo ""
echo "== Read the PR diff to identify changes =="
cat pr_diff.patch

echo ""
echo "== Full content of cudaUtils.h on PR branch =="
cat cpp/include/tensorrt_llm/common/cudaUtils.h

Length of output: 62744


@hnover-nv thanks — applying both suggestions below.

  • Update SPDX header year to 2026.
  • Rename the static local flag in doCheckError() to follow the s-prefixed static naming convention.

✏️ Learnings added
Learnt from: hnover-nv
URL: https://github.com/NVIDIA/TensorRT-LLM/pull/11261

Timestamp: 2026-02-04T16:48:11.405Z
Learning: TensorRT-LLM C++ style: Locally scoped static variables (e.g., inside functions) use camelCase with an 's' prefix, e.g., sDoCheckIfNotCapturing in cpp/include/tensorrt_llm/common/cudaUtils.h.

Learnt from: hnover-nv
URL: https://github.com/NVIDIA/TensorRT-LLM/pull/11261

Timestamp: 2026-02-04T16:48:11.405Z
Learning: SPDX header policy: The SPDX-FileCopyrightText year range in C++ headers should include the latest meaningful modification year (e.g., 2022-2026 for edits made in 2026).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: NVIDIA/TensorRT-LLM PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2026-01-06T03:07:15.754Z
Learning: Applies to **/*.{cpp,cc,cxx,h,hpp,hxx,cu,cuh} : Do not use Hungarian notation in C++ except for 'apps hungarian' (e.g., 'nb' to indicate count: `mNbLayers`)


@hnover-nv (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34837 [ run ] triggered by Bot. Commit: 06cefcc

@tensorrt-cicd (Collaborator)

PR_Github #34837 [ run ] completed with state SUCCESS. Commit: 06cefcc
/LLM/main/L0_MergeRequest_PR pipeline #26875 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@hnover-nv (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34910 [ run ] triggered by Bot. Commit: 06cefcc

@tensorrt-cicd (Collaborator)

PR_Github #34910 [ run ] completed with state FAILURE. Commit: 06cefcc
/LLM/main/L0_MergeRequest_PR pipeline #26929 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@hnover-nv (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #34919 [ run ] triggered by Bot. Commit: 07debb0

@tensorrt-cicd (Collaborator)

PR_Github #34919 [ run ] completed with state SUCCESS. Commit: 07debb0
/LLM/main/L0_MergeRequest_PR pipeline #26936 completed with status: 'SUCCESS'

@hnover-nv (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #35009 [ run ] triggered by Bot. Commit: 92c796a

@mikeiovine (Collaborator) left a comment

Thanks for cleaning it up

@tensorrt-cicd (Collaborator)

PR_Github #35009 [ run ] completed with state FAILURE. Commit: 92c796a
/LLM/main/L0_MergeRequest_PR pipeline #27012 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@hnover-nv (Collaborator, Author)

/bot run

1 similar comment
@hnover-nv (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #35031 [ run ] triggered by Bot. Commit: 8ffa120

@tensorrt-cicd (Collaborator)

PR_Github #35031 [ run ] completed with state ABORTED. Commit: 8ffa120
LLM/main/L0_MergeRequest_PR #27034 (Blue Ocean) completed with status: ABORTED

@hnover-nv (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #35142 [ run ] triggered by Bot. Commit: 4b18a76

@tensorrt-cicd (Collaborator)

PR_Github #35142 [ run ] completed with state SUCCESS. Commit: 4b18a76
/LLM/main/L0_MergeRequest_PR pipeline #27133 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
@hnover-nv (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #35200 [ run ] triggered by Bot. Commit: 6daad3f

@tensorrt-cicd (Collaborator)

PR_Github #35200 [ run ] completed with state SUCCESS. Commit: 6daad3f
/LLM/main/L0_MergeRequest_PR pipeline #27188 completed with status: 'SUCCESS'

@hnover-nv (Collaborator, Author)

/bot skip --comment "passed last night, accidentally updated branch instead of merging, but don't need to retest"

@tensorrt-cicd (Collaborator)

PR_Github #35243 Bot args parsing error: Failed to parse bot args

@hnover-nv (Collaborator, Author)

/bot skip --comment "updated to main by accident, already passed"

@tensorrt-cicd (Collaborator)

PR_Github #35244 [ skip ] triggered by Bot. Commit: dac963b

@tensorrt-cicd (Collaborator)

PR_Github #35244 [ skip ] completed with state SUCCESS. Commit: dac963b
Skipping testing for commit dac963b

@mikeiovine mikeiovine merged commit 100bfdc into NVIDIA:main Feb 9, 2026
5 checks passed
@hnover-nv hnover-nv deleted the fix_doCheckError branch February 9, 2026 19:34
inciaf pushed a commit to inciaf/trtllm-energy-monitoring that referenced this pull request Feb 18, 2026
…IA#11261)

Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
Signed-off-by: Ahmet Inci <ainci@nvidia.com>
3 participants