
[None][fix] glm engine build dtype #11246

Merged
litaotju merged 1 commit into NVIDIA:main from mandroid6:patch-1
Feb 12, 2026

Conversation

@mandroid6 mandroid6 (Contributor) commented Feb 4, 2026

Currently, the GLM example uses `auto` for the checkpoint-conversion step, which produces a bfloat16 output dtype, while the `trtllm-build` step assumes the GEMM dtype is float16.

This change updates `--gemm_plugin` to use bfloat16.
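One way to avoid this class of mismatch is to check what the conversion actually produced before running `trtllm-build`. Converted TensorRT-LLM checkpoints carry a `config.json` that records the output dtype. A minimal sketch, assuming the helper name is illustrative and noting that the exact key layout varies across TensorRT-LLM versions (some place `dtype` at the top level, others nest it under `pretrained_config`):

```python
import json
from pathlib import Path

def checkpoint_dtype(ckpt_dir: str) -> str:
    """Return the dtype recorded in a converted checkpoint's config.json.

    Assumption: a top-level "dtype" key; some TensorRT-LLM versions nest it
    under "pretrained_config" instead, so fall back to that location.
    """
    config = json.loads((Path(ckpt_dir) / "config.json").read_text())
    if "dtype" in config:
        return config["dtype"]
    return config["pretrained_config"]["dtype"]
```

Whatever dtype this reports for the converted checkpoint is the value to pass to `--gemm_plugin`.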

Summary by CodeRabbit

  • Documentation
    • Updated example documentation to reflect the latest GEMM plugin data type configuration.


Signed-off-by: Mandar Deshpande <razzormandar@gmail.com>
@mandroid6 mandroid6 requested review from a team as code owners February 4, 2026 00:11
@coderabbitai coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Documentation example updated in a GLM-4-9B README file. The GEMM plugin data type parameter in a trtllm-build command example was changed from float16 to bfloat16.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation Update**<br>`examples/models/core/glm-4-9b/README.md` | Updated GEMM plugin data type parameter from float16 to bfloat16 in a command example. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description explains the issue (dtype mismatch between checkpoint_conversion and trtllm-build) and the solution (updating --gemm_plugin to bfloat16), but is missing the required Description section structure and Test Coverage section from the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title '[fix] glm engine build dtype' directly addresses the main change: fixing a dtype mismatch in the GLM engine build by updating GEMM plugin from float16 to bfloat16. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/models/core/glm-4-9b/README.md (2)

217-232: ⚠️ Potential issue | 🟠 Major

Potential dtype mismatch in smooth quantization example.

Similar to the weight-only quantization case, the checkpoint conversion at lines 223-227 does not specify --dtype, which likely produces bfloat16 output. However, line 231 still uses --gemm_plugin float16, creating the same mismatch that this PR addresses for the base example.

Consider updating line 231 to use --gemm_plugin bfloat16, or explicitly specify --dtype float16 in the conversion command at line 223.
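Applied to the smooth-quantization build command, the suggested edit would look roughly like this (the checkpoint and output paths are illustrative; the actual paths come from the README's conversion step):

```diff
 trtllm-build --checkpoint_dir trt_ckpt/glm_4_9b/sq/1-gpu \
-        --gemm_plugin float16 \
+        --gemm_plugin bfloat16 \
         --output_dir trt_engines/glm_4_9b/sq/1-gpu
```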


199-208: ⚠️ Potential issue | 🟠 Major

Fix dtype mismatch in int8 weight-only quantization example.

The checkpoint conversion at lines 200-203 defaults to --dtype auto, which produces bfloat16 output for GLM-4-9B (as fixed in commit 21ddab7). However, line 207 still specifies --gemm_plugin float16, creating a dtype mismatch identical to the one already corrected in the fp16 example.

Update line 207 to use --gemm_plugin bfloat16:

```diff
 # glm_4_9b: single-gpu engine with int8 weight only quantization, GPT Attention plugin, Gemm plugin
 trtllm-build --checkpoint_dir trt_ckpt/glm_4_9b/int8_wo/1-gpu \
-        --gemm_plugin float16 \
+        --gemm_plugin bfloat16 \
         --output_dir trt_engines/glm_4_9b/int8_wo/1-gpu
```

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Feb 4, 2026
@karljang karljang changed the title [fix] glm engine build dtype [None][fix] glm engine build dtype Feb 10, 2026
@karljang (Collaborator)

/bot skip --comment "doc only change"

@karljang (Collaborator)

Thank you for your contribution! I've updated the title to resolve the failed title check.

@tensorrt-cicd (Collaborator)

PR_Github #35458 [ skip ] triggered by Bot. Commit: 21ddab7

@tensorrt-cicd (Collaborator)

PR_Github #35458 [ skip ] completed with state SUCCESS. Commit: 21ddab7
Skipping testing for commit 21ddab7

@litaotju litaotju (Collaborator) left a comment

LGTM! ✅

Simple documentation fix - correcting dtype in example command from float16 to bfloat16.

Trivial change - auto-approved:

  • Documentation only
  • 1 line changed
  • CI passing

Thanks for the contribution!

@litaotju litaotju merged commit 936220e into NVIDIA:main Feb 12, 2026
7 of 9 checks passed
ekou24 pushed a commit to ekou24/TensorRT-LLM that referenced this pull request Feb 16, 2026
Signed-off-by: Mandar Deshpande <razzormandar@gmail.com>

Labels

Community want to contribute PRs initiated from Community


5 participants