
[None][fix] glm engine build dtype #11246

Merged
litaotju merged 1 commit into NVIDIA:main from mandroid6:patch-1
Feb 12, 2026

Conversation

@mandroid6 mandroid6 (Contributor) commented Feb 4, 2026

Currently, the GLM example uses `auto` for the checkpoint-conversion step, which produces a bfloat16 output dtype, while the `trtllm-build` step assumes the GEMM dtype is float16.

This change updates `--gemm_plugin` to use bfloat16.
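One way to avoid this class of mismatch is to check what the conversion actually produced before running `trtllm-build`. Converted TensorRT-LLM checkpoints carry a `config.json` that records the output dtype. A minimal sketch, assuming the helper name is illustrative and noting that the exact key layout varies across TensorRT-LLM versions (some place `dtype` at the top level, others nest it under `pretrained_config`):

```python
import json
from pathlib import Path

def checkpoint_dtype(ckpt_dir: str) -> str:
    """Return the dtype recorded in a converted checkpoint's config.json.

    Assumption: a top-level "dtype" key; some TensorRT-LLM versions nest it
    under "pretrained_config" instead, so fall back to that location.
    """
    config = json.loads((Path(ckpt_dir) / "config.json").read_text())
    if "dtype" in config:
        return config["dtype"]
    return config["pretrained_config"]["dtype"]
```

Whatever dtype this reports for the converted checkpoint is the value to pass to `--gemm_plugin`.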

Summary by CodeRabbit

  • Documentation
    • Updated example documentation to reflect the latest GEMM plugin data type configuration.


Signed-off-by: Mandar Deshpande <razzormandar@gmail.com>
@mandroid6 mandroid6 requested review from a team as code owners February 4, 2026 00:11
@coderabbitai coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Documentation example updated in a GLM-4-9B README file. The GEMM plugin data type parameter in a trtllm-build command example was changed from float16 to bfloat16.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation Update**<br>`examples/models/core/glm-4-9b/README.md` | Updated GEMM plugin data type parameter from float16 to bfloat16 in a command example. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description explains the issue (dtype mismatch between checkpoint_conversion and trtllm-build) and the solution (updating --gemm_plugin to bfloat16), but is missing the required Description section structure and Test Coverage section from the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title '[fix] glm engine build dtype' directly addresses the main change: fixing a dtype mismatch in the GLM engine build by updating GEMM plugin from float16 to bfloat16. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/models/core/glm-4-9b/README.md (2)

217-232: ⚠️ Potential issue | 🟠 Major

Potential dtype mismatch in smooth quantization example.

Similar to the weight-only quantization case, the checkpoint conversion at lines 223-227 does not specify --dtype, which likely produces bfloat16 output. However, line 231 still uses --gemm_plugin float16, creating the same mismatch that this PR addresses for the base example.

Consider updating line 231 to use --gemm_plugin bfloat16, or explicitly specify --dtype float16 in the conversion command at line 223.
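Applied to the smooth-quantization build command, the suggested edit would look roughly like this (the checkpoint and output paths are illustrative; the actual paths come from the README's conversion step):

```diff
 trtllm-build --checkpoint_dir trt_ckpt/glm_4_9b/sq/1-gpu \
-        --gemm_plugin float16 \
+        --gemm_plugin bfloat16 \
         --output_dir trt_engines/glm_4_9b/sq/1-gpu
```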


199-208: ⚠️ Potential issue | 🟠 Major

Fix dtype mismatch in int8 weight-only quantization example.

The checkpoint conversion at lines 200-203 defaults to --dtype auto, which produces bfloat16 output for GLM-4-9B (as fixed in commit 21ddab7). However, line 207 still specifies --gemm_plugin float16, creating a dtype mismatch identical to the one already corrected in the fp16 example.

Update line 207 to use --gemm_plugin bfloat16:

```diff
 # glm_4_9b: single-gpu engine with int8 weight only quantization, GPT Attention plugin, Gemm plugin
 trtllm-build --checkpoint_dir trt_ckpt/glm_4_9b/int8_wo/1-gpu \
-        --gemm_plugin float16 \
+        --gemm_plugin bfloat16 \
         --output_dir trt_engines/glm_4_9b/int8_wo/1-gpu
```

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Feb 4, 2026
@karljang karljang changed the title [fix] glm engine build dtype [None][fix] glm engine build dtype Feb 10, 2026
@karljang (Collaborator)

/bot skip --comment "doc only change"

@karljang (Collaborator)

Thank you for your contribution! I've updated the title to resolve the failed title check.

@tensorrt-cicd (Collaborator)

PR_Github #35458 [ skip ] triggered by Bot. Commit: 21ddab7

@tensorrt-cicd (Collaborator)

PR_Github #35458 [ skip ] completed with state SUCCESS. Commit: 21ddab7
Skipping testing for commit 21ddab7

@litaotju litaotju (Collaborator) left a comment

LGTM! ✅

Simple documentation fix - correcting dtype in example command from float16 to bfloat16.

Trivial change - auto-approved:

  • Documentation only
  • 1 line changed
  • CI passing

Thanks for the contribution!

@litaotju litaotju merged commit 936220e into NVIDIA:main Feb 12, 2026
7 of 9 checks passed
ekou24 pushed a commit to ekou24/TensorRT-LLM that referenced this pull request Feb 16, 2026
Signed-off-by: Mandar Deshpande <razzormandar@gmail.com>

Labels

Community want to contribute PRs initiated from Community


5 participants