[None][fix] glm engine build dtype #11246
Conversation
Currently the GLM example uses `auto` for the checkpoint conversion step, which results in a bfloat16 output dtype, while the `trtllm-build` step assumed the GEMM is float16. This change updates `--gemm_plugin` to use bfloat16.

Signed-off-by: Mandar Deshpande <razzormandar@gmail.com>
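The general rule behind the fix — the `--gemm_plugin` dtype must match the dtype the checkpoint conversion actually produced — can be sketched with a small hypothetical helper. The function name and config layout here are illustrative, not part of the TensorRT-LLM API:

```python
def pick_gemm_plugin_dtype(checkpoint_config: dict, requested: str = "auto") -> str:
    """Resolve a --gemm_plugin value consistent with a converted checkpoint.

    With --dtype auto, checkpoint conversion inherits the Hugging Face
    checkpoint's dtype (bfloat16 for GLM-4-9B), so the build step must
    use the same dtype rather than assuming float16.
    """
    ckpt_dtype = checkpoint_config.get("dtype", "float16")
    if requested == "auto":
        # Follow whatever the conversion step produced.
        return ckpt_dtype
    if requested != ckpt_dtype:
        raise ValueError(
            f"--gemm_plugin {requested} does not match checkpoint dtype {ckpt_dtype}"
        )
    return requested

# GLM-4-9B converted with --dtype auto yields a bfloat16 checkpoint config,
# so the matching build flag is --gemm_plugin bfloat16.
config = {"dtype": "bfloat16"}
print(pick_gemm_plugin_dtype(config))  # bfloat16
```

This mirrors the mismatch the PR fixes: requesting `float16` against a bfloat16 checkpoint would be rejected instead of silently building an inconsistent engine.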
📝 Walkthrough: Documentation example updated in the GLM-4-9B README file. The GEMM plugin data type parameter in a `trtllm-build` command example was changed from float16 to bfloat16.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/models/core/glm-4-9b/README.md (2)
217-232: ⚠️ Potential issue | 🟠 Major — Potential dtype mismatch in the smooth quantization example.

Similar to the weight-only quantization case, the checkpoint conversion at lines 223-227 does not specify `--dtype`, which likely produces bfloat16 output. However, line 231 still uses `--gemm_plugin float16`, creating the same mismatch that this PR addresses for the base example. Consider updating line 231 to use `--gemm_plugin bfloat16`, or explicitly specifying `--dtype float16` in the conversion command at line 223.
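The suggested change for the smooth quantization example would follow the same pattern as the int8 fix below. This is a sketch only — the checkpoint and output directory paths are illustrative, since the README lines in question are not shown in this thread:

```diff
 trtllm-build --checkpoint_dir trt_ckpt/glm_4_9b/sq/1-gpu \
-    --gemm_plugin float16 \
+    --gemm_plugin bfloat16 \
     --output_dir trt_engines/glm_4_9b/sq/1-gpu
```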
199-208: ⚠️ Potential issue | 🟠 Major — Fix dtype mismatch in the int8 weight-only quantization example.

The checkpoint conversion at lines 200-203 defaults to `--dtype auto`, which produces bfloat16 output for GLM-4-9B (as fixed in commit 21ddab7). However, line 207 still specifies `--gemm_plugin float16`, creating a dtype mismatch identical to the one already corrected in the fp16 example. Update line 207 to use `--gemm_plugin bfloat16`:

```diff
 # glm_4_9b: single-gpu engine with int8 weight only quantization, GPT Attention plugin, Gemm plugin
 trtllm-build --checkpoint_dir trt_ckpt/glm_4_9b/int8_wo/1-gpu \
-    --gemm_plugin float16 \
+    --gemm_plugin bfloat16 \
     --output_dir trt_engines/glm_4_9b/int8_wo/1-gpu
```
/bot skip --comment "doc only change"
Thank you for your contribution! I've updated the title to reflect a failed check.
PR_Github #35458 [ skip ] triggered by Bot. Commit:

PR_Github #35458 [ skip ] completed with state
litaotju left a comment:
LGTM! ✅
Simple documentation fix - correcting dtype in example command from float16 to bfloat16.
Trivial change - auto-approved:
- Documentation only
- 1 line changed
- CI passing
Thanks for the contribution!