convert: support Mistral 3 Large MoE #17730
Conversation
So far so good with this; in a couple of hours I will be able to test generation.
Seems to work and produce coherent results!
This PR still needs to be cleaned up before it is ready for review 😅
# remap hparams from Mistral MoE format to DeepseekV2 format
# we do it this way to be able to reuse DeepseekV2Model set_gguf_parameters logic
Somewhat ugly but an acceptable trade-off.
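For readers following along, the idea discussed here is to rewrite the Mistral-MoE-style hyperparameter names into the keys that DeepseekV2Model already consumes, so its set_gguf_parameters() logic can be reused unchanged. A minimal sketch of that remapping, assuming illustrative key names (not necessarily the ones the PR actually uses):

```python
# Illustrative sketch only: rewrite Mistral-MoE-style hparams into
# DeepseekV2-style keys so existing DeepseekV2 parameter-writing code
# can be reused. The key names below are assumptions for illustration,
# not the exact keys used in the PR.
def remap_moe_hparams(hparams: dict) -> dict:
    remapped = dict(hparams)  # keep everything else untouched
    key_map = {
        "num_experts": "n_routed_experts",             # assumed Mistral-side key
        "num_experts_per_tok": "num_experts_per_tok",  # same name on both sides
        "moe_intermediate_size": "moe_intermediate_size",
    }
    for src, dst in key_map.items():
        if src in hparams:
            remapped[dst] = hparams[src]
    return remapped
```

The actual remap in the PR touches more keys (temperature_length and mscale_all_dim among them, per the commit list), but the mechanism is the same.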
@ngxson Thank you so much for this. I've tried your Q4_K_M and it seems to be working just fine. Is there any other setting or change needed for the conversion?
It disappeared?? 👀 I can re-upload if necessary, I guess... Only difference is using the mistral format.
Yeah I've used the mistral format. Then I guess I have a corrupted bf16 version (I cannot think of anything else): https://github.com/csabakecskemeti/ministral-3_dequantizer_fp8-bf16
I can see it here: https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512-BF16
You're right, they just removed it from the collection (if it was ever there :p), that's where I was looking. My bad.
It looks like @ngxson forgot |
CISC left a comment
@csabakecskemeti This should work.
I can confirm it works with the changes suggested by @CISC |
Co-authored-by: Sigbjørn Skjæret <[email protected]>
@ngxson Ouch, that second suggestion should not have been directly applied; GitHub messes up changes outside of the preview area. :(
Co-authored-by: Sigbjørn Skjæret <[email protected]>
* convert: support Mistral 3 Large MoE
* filter out vision tensors, add missing keys
* handle vocab
* add temperature_length
* fix mscale_all_dim
* clean up
* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* fix
* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>
WIP, the code is quite ugly for now, but I just want to get it working.
Remember to convert with the `--mistral-format` argument, as the weight is not yet transformers-compatible.
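For reference, a conversion invocation would look roughly like `python convert_hf_to_gguf.py --mistral-format /path/to/Mistral-Large-3-675B-Instruct-2512 --outfile mistral-large-3-f16.gguf` (the local path and output filename are placeholders; `--mistral-format` is the flag mentioned above, and the exact set of other flags you need may differ).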
Output:
F16 weight is 1.35 terabytes
Q8_0 weight is 716GB and I don't have enough hw to test it
Edit: thanks @bartowski1182 for testing it!
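A quick back-of-the-envelope check on those numbers (my own arithmetic, assuming ~675B parameters from the model name, F16 at 2 bytes per weight, and Q8_0 at 34 bytes per block of 32 weights, i.e. roughly 8.5 bits per weight):

```python
# Rough size estimate; the parameter count (675B) is taken from the model name,
# and Q8_0 is assumed to cost 34 bytes per block of 32 weights (int8 weights
# plus an fp16 scale), i.e. about 8.5 bits per weight.
params = 675e9

f16_bytes = params * 2           # 2 bytes per weight
q8_0_bytes = params * 34 / 32    # ~1.06 bytes per weight

print(f"F16  ~ {f16_bytes / 1e12:.2f} TB")  # ~1.35 TB
print(f"Q8_0 ~ {q8_0_bytes / 1e9:.0f} GB")  # ~717 GB
```

Both come out close to the figures quoted above.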
NOTE: this PR only covers the conversion to GGUF. The C++ code is still missing the llama 4 scaling needed for this to work; that will be another PR.
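For context on that missing piece: the "llama 4 scaling" refers to the position-dependent attention temperature scaling used by Llama 4, which (in the HF transformers implementation) multiplies the query states by a factor of roughly the form sketched below. Whether Mistral 3 Large uses the same hyperparameter names, or exactly this formula, is an assumption on my part, not something confirmed by this PR.

```python
import math

# Sketch of the Llama-4-style attention temperature scaling referenced above,
# following the shape of the HF transformers Llama 4 implementation.
# floor_scale / attn_scale are hyperparameter names assumed for illustration.
def attn_temperature_scale(pos: int, floor_scale: float, attn_scale: float) -> float:
    # position-dependent factor applied to the query states
    return math.log(math.floor((pos + 1) / floor_scale) + 1) * attn_scale + 1.0
```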