
Conversation

@ngxson (Collaborator) commented Dec 3, 2025

Continuation of #17730.

This should allow Mistral Large to go past a 16K context length (hopefully someone with enough VRAM can verify whether or not it works).

@github-actions github-actions bot added the model Model specific label Dec 3, 2025
@DocShotgun (Contributor) commented:

I cherry-picked this commit onto the current latest master (08f9d3c) and loaded a 4.94 bpw quant of Mistral Large 3 675B Instruct with a 32k sequence length. It produces coherent text on both a short prompt of ~1.5k tokens and a longer prompt of ~19k tokens.
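For anyone wanting to repeat the check described above, it amounts to cherry-picking the PR's commit onto master, rebuilding, and running a long prompt at an extended context length. A rough sketch follows; the master hash 08f9d3c is from this thread, but the PR commit placeholder, model path, prompt file, and build flags are assumptions to adjust for your own setup:

```shell
# Check out the current master (08f9d3c at the time of the test above)
# and apply the PR's commit on top of it.
git checkout master && git pull
git cherry-pick <pr-commit-sha>   # placeholder: the commit from this PR

# Rebuild llama.cpp (CUDA build shown; pick flags for your hardware).
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Load the quant with a 32k context and feed it a long prompt to see
# whether output stays coherent past the old 16K limit.
# Model path and prompt file are placeholders.
./build/bin/llama-cli \
  -m /models/mistral-large-3-675b-instruct-4.94bpw.gguf \
  -c 32768 \
  -f long-prompt-19k-tokens.txt \
  -n 128
```

The key signal is qualitative: if the attention change is wrong, generations past 16K tokens typically degrade into incoherent text rather than failing outright.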

@ngxson (Collaborator, Author) commented Dec 7, 2025

@DocShotgun thanks for testing, I guess it's good to merge then.

@ngxson ngxson merged commit 4d37262 into ggml-org:master Dec 7, 2025
76 of 80 checks passed
