convert: support Mistral 3 Large MoE #17730
Conversation
So far so good with this; in a couple of hours I will be able to test generation.
Seems to work and produce coherent results!
This PR still needs to be cleaned up before it is ready for review 😅
# remap hparams from Mistral MoE format to DeepseekV2 format
# we do it this way to be able to reuse DeepseekV2Model set_gguf_parameters logic
Somewhat ugly but an acceptable trade-off.
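For readers following along, the idea discussed here is to rewrite the Mistral-MoE-style hyperparameter names into the keys that DeepseekV2Model already consumes, so its set_gguf_parameters() logic can be reused unchanged. A minimal sketch of that remapping, assuming illustrative key names (not necessarily the ones the PR actually uses):

```python
# Illustrative sketch only: rewrite Mistral-MoE-style hparams into
# DeepseekV2-style keys so existing DeepseekV2 parameter-writing code
# can be reused. The key names below are assumptions for illustration,
# not the exact keys used in the PR.
def remap_moe_hparams(hparams: dict) -> dict:
    remapped = dict(hparams)  # keep everything else untouched
    key_map = {
        "num_experts": "n_routed_experts",             # assumed Mistral-side key
        "num_experts_per_tok": "num_experts_per_tok",  # same name on both sides
        "moe_intermediate_size": "moe_intermediate_size",
    }
    for src, dst in key_map.items():
        if src in hparams:
            remapped[dst] = hparams[src]
    return remapped
```

The actual remap in the PR touches more keys (temperature_length and mscale_all_dim among them, per the commit list), but the mechanism is the same.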
@ngxson Thank you so much for this. I've tried your Q4_K_M and it seems to be working just fine. Is there any other setting or change needed for the conversion?
It disappeared?? 👀 I can re-upload if necessary, I guess... Only difference is using the mistral format.
Yeah I've used the mistral format. Then I guess I have a corrupted bf16 version (I cannot think of anything else): https://github.com/csabakecskemeti/ministral-3_dequantizer_fp8-bf16
I can see it here: https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-BF16
You're right, they just removed it from the collection (if it was ever there :p), that's where I was looking. My bad.
It looks like @ngxson forgot |
CISC left a comment
@csabakecskemeti This should work.
I can confirm it works with the changes suggested by @CISC |
Co-authored-by: Sigbjørn Skjæret <[email protected]>
@ngxson Ouch, that second suggestion should not have been directly applied; GitHub messes up changes outside of the preview area. :(
Co-authored-by: Sigbjørn Skjæret <[email protected]>
* convert: support Mistral 3 Large MoE
* filter out vision tensors, add missing keys
* handle vocab
* add temperature_length
* fix mscale_all_dim
* clean up
* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* fix
* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
WIP, the code is quite ugly for now, but I just want to get it working.
Remember to convert with the `--mistral-format` argument, as the weight is not yet transformers-compatible.
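For reference, a conversion invocation would look roughly like `python convert_hf_to_gguf.py --mistral-format /path/to/Mistral-Large-3-675B-Instruct-2512 --outfile mistral-large-3-f16.gguf` (the local path and output filename are placeholders; `--mistral-format` is the flag mentioned above, and the exact set of other flags you need may differ).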
Output:
F16 weight is 1.35 terabytes
Q8_0 weight is 716GB and I don't have enough hw to test it
Edit: thanks @bartowski1182 for testing it!
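A quick back-of-the-envelope check on those numbers (my own arithmetic, assuming ~675B parameters from the model name, F16 at 2 bytes per weight, and Q8_0 at 34 bytes per block of 32 weights, i.e. roughly 8.5 bits per weight):

```python
# Rough size estimate; the parameter count (675B) is taken from the model name,
# and Q8_0 is assumed to cost 34 bytes per block of 32 weights (int8 weights
# plus an fp16 scale), i.e. about 8.5 bits per weight.
params = 675e9

f16_bytes = params * 2           # 2 bytes per weight
q8_0_bytes = params * 34 / 32    # ~1.06 bytes per weight

print(f"F16  ~ {f16_bytes / 1e12:.2f} TB")  # ~1.35 TB
print(f"Q8_0 ~ {q8_0_bytes / 1e9:.0f} GB")  # ~717 GB
```

Both come out close to the figures quoted above.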
NOTE: this PR only covers the conversion to GGUF. The C++ code is still missing the llama 4 scaling needed for this to work; that will be another PR.
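For context on that missing piece: the "llama 4 scaling" refers to the position-dependent attention temperature scaling used by Llama 4, which (in the HF transformers implementation) multiplies the query states by a factor of roughly the form sketched below. Whether Mistral 3 Large uses the same hyperparameter names, or exactly this formula, is an assumption on my part, not something confirmed by this PR.

```python
import math

# Sketch of the Llama-4-style attention temperature scaling referenced above,
# following the shape of the HF transformers Llama 4 implementation.
# floor_scale / attn_scale are hyperparameter names assumed for illustration.
def attn_temperature_scale(pos: int, floor_scale: float, attn_scale: float) -> float:
    # position-dependent factor applied to the query states
    return math.log(math.floor((pos + 1) / floor_scale) + 1) * attn_scale + 1.0
```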