Releases: ggml-org/llama.cpp

b7315

07 Dec 22:07
4d37262

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

model: add llama 4 scaling for mistral-large (deepseek arch) (#17744)

macOS/iOS:

Linux:

Windows:

b7314

07 Dec 18:01
08f9d3c

Vulkan: improve mul_mat_vec_iq1_m (#16907)

  • Optimize Vulkan shader for matrix-vector multiplication

  • Revert changes on compute_outputs and main

Refactor compute_outputs to handle remaining rows correctly.

  • Fix trailing whitespace
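
The compute_outputs change above concerns how a matrix-vector multiply splits its output rows into fixed-size blocks and still produces the rows left over when the row count is not a multiple of the block size. A minimal CPU-side C++ sketch of that remainder-handling pattern (illustrative only, not the Vulkan shader touched by this PR; all names are made up):

    #include <cstddef>
    #include <vector>

    // Illustrative: compute y = A * x by processing output rows in blocks of
    // NUM_ROWS, then letting a final call handle whatever rows remain.
    constexpr std::size_t NUM_ROWS = 4; // rows per "workgroup" in this sketch

    static void mat_vec_rows(const std::vector<std::vector<float>> & A,
                             const std::vector<float> & x,
                             std::vector<float> & y,
                             std::size_t first_row, std::size_t nrows) {
        for (std::size_t r = first_row; r < first_row + nrows; ++r) {
            float acc = 0.0f;
            for (std::size_t c = 0; c < x.size(); ++c) {
                acc += A[r][c] * x[c];
            }
            y[r] = acc;
        }
    }

    static void mat_vec(const std::vector<std::vector<float>> & A,
                        const std::vector<float> & x,
                        std::vector<float> & y) {
        const std::size_t total = A.size();
        std::size_t row = 0;
        for (; row + NUM_ROWS <= total; row += NUM_ROWS) {
            mat_vec_rows(A, x, y, row, NUM_ROWS);    // full blocks
        }
        if (row < total) {
            mat_vec_rows(A, x, y, row, total - row); // remaining rows
        }
    }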

macOS/iOS:

Linux:

Windows:

b7313

07 Dec 14:44
0a540f9

ci : add windows-cuda 13.1 release (#17839)

macOS/iOS:

Linux:

Windows:

b7312

07 Dec 03:18
2257758

common : change --color to accept on/off/auto, default to auto (#17827)
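
With --color now tri-state, the usual handling is: on forces colour, off disables it, and auto enables colour only when output goes to a terminal. A hedged C++ sketch of that on/off/auto pattern (generic illustration, not the actual common-library implementation; the auto-means-isatty behaviour is an assumption):

    #include <cstdio>
    #include <cstring>
    #include <unistd.h> // isatty (POSIX)

    enum class color_mode { on, off, automatic };

    // Parse an on/off/auto value; anything else falls back to auto (the default).
    static color_mode parse_color(const char * v) {
        if (std::strcmp(v, "on")  == 0) return color_mode::on;
        if (std::strcmp(v, "off") == 0) return color_mode::off;
        return color_mode::automatic;
    }

    // Assumption: under "auto", colour is used only when stdout is a terminal.
    static bool use_color(color_mode m) {
        switch (m) {
            case color_mode::on:  return true;
            case color_mode::off: return false;
            default:              return isatty(fileno(stdout)) != 0;
        }
    }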

macOS/iOS:

Linux:

Windows:

b7311

07 Dec 02:36
d9e03db

sycl: add missing BF16 conversion support for Intel oneAPI (#17780)

  • sycl: add missing BF16 conversion support for Intel oneAPI

  • Fix Line 645: Trailing whitespace
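
BF16 stores the upper 16 bits of an IEEE-754 binary32 value, so widening BF16 to FP32 amounts to shifting the bits into the high half of a float's bit pattern, and narrowing is the reverse with rounding. A generic C++ illustration of the conversion, not the SYCL/oneAPI code path touched here:

    #include <cstdint>
    #include <cstring>

    // BF16 -> FP32: the 16 BF16 bits become the high 16 bits of the float.
    static float bf16_to_f32(std::uint16_t h) {
        std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
        float f;
        std::memcpy(&f, &bits, sizeof(f));
        return f;
    }

    // FP32 -> BF16 with round-to-nearest-even (NaN payloads are not special-cased).
    static std::uint16_t f32_to_bf16(float f) {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        bits += 0x7FFFu + ((bits >> 16) & 1u);
        return static_cast<std::uint16_t>(bits >> 16);
    }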

macOS/iOS:

Linux:

Windows:

b7310

06 Dec 21:38
db97837

vulkan: perf_logger improvements (#17672)

  • vulkan: perf_logger improvements
  • Move perf_logger from device to ctx.
  • Add an env var to control how often the stats are dumped. If you set a very
    large value, it just dumps when the ctx is destroyed (see the sketch after this list).
  • Add a fusion info string to the tracking, only log one item per fused op.
  • Fix MUL_MAT_ID flops calculation.
  • fix vector sizes
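
The dump-interval control mentioned above follows a common pattern: read an interval from an environment variable, flush every N recorded ops, and otherwise flush only when the owning context is destroyed. A sketch under assumed names (the env var and struct below are hypothetical, not the ones used by the Vulkan backend):

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    // Hypothetical sketch of an env-var-controlled stats dump interval.
    struct perf_logger_sketch {
        std::uint64_t dump_every = UINT64_MAX; // huge default: dump only on destroy
        std::uint64_t n_records  = 0;

        perf_logger_sketch() {
            if (const char * s = std::getenv("EXAMPLE_PERF_DUMP_INTERVAL")) {
                dump_every = std::strtoull(s, nullptr, 10);
            }
        }
        void record() {
            n_records++;
            if (dump_every > 0 && n_records % dump_every == 0) {
                dump();
            }
        }
        void dump() const {
            std::printf("perf: %llu ops recorded\n", (unsigned long long) n_records);
        }
        ~perf_logger_sketch() { dump(); } // the ctx owning the logger dumps on teardown
    };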

macOS/iOS:

Linux:

Windows:

b7307

06 Dec 19:30
09c7c50

ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985)

  • Feat: Added vulkan circular tiling support

  • Feat: Added cpu circular tiling support

  • Feat: Added cuda kernels

  • Added tests

  • Added tests

  • Removed non-pad operations

  • Removed unneeded changes

  • removed backend non pad tests

  • Update test-backend-ops.cpp

  • Fixed comment on pad test

  • removed trailing whitespace

  • Removed unneeded test in test-backend-ops

  • Removed the removed test from its call sites

  • Update ggml/src/ggml-vulkan/vulkan-shaders/pad.comp

Co-authored-by: Ruben Ortlam [email protected]

  • Fixed alignment

  • Formatting

Co-authored-by: Aman Gupta [email protected]

  • Format pad

  • Format

  • Clang format

  • format

  • format

  • don't change so much stuff

  • clang format and update to bool

  • fix duplicates

  • don't need to fix the padding

  • make circular bool

  • duplicate again

  • rename vulkan to wrap around

  • Don't need indent

  • moved to const expr

  • removed unneeded extra line break

  • More readable method calls

  • Minor wording changes

  • Added final newline

  • Update ggml/include/ggml.h

Co-authored-by: Georgi Gerganov [email protected]

  • Update ggml/include/ggml.h

Co-authored-by: Georgi Gerganov [email protected]

  • Added circular pad ext tests

  • Gate non-circular pad devices

  • Cleaned gating of non-circular pad devices


Co-authored-by: Phylliida [email protected]
Co-authored-by: Ruben Ortlam [email protected]
Co-authored-by: Aman Gupta [email protected]
Co-authored-by: Georgi Gerganov [email protected]
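
Circular (wrap-around) padding maps out-of-range destination coordinates back into the source modulo its extent, which is what makes the padded result tileable for seamless textures. A standalone 1-D C++ illustration of that indexing idea, not the actual ggml/Vulkan/CUDA kernels added by this PR:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Wrap an index into [0, n) so reads past either edge come from the opposite side.
    static std::int64_t wrap_index(std::int64_t i, std::int64_t n) {
        std::int64_t m = i % n;
        return m < 0 ? m + n : m;
    }

    // Pad a 1-D source to length n + pad_left + pad_right with circular tiling.
    static std::vector<float> pad_circular_1d(const std::vector<float> & src,
                                              std::int64_t pad_left,
                                              std::int64_t pad_right) {
        const std::int64_t n = (std::int64_t) src.size();
        std::vector<float> dst((std::size_t)(n + pad_left + pad_right));
        for (std::int64_t i = 0; i < (std::int64_t) dst.size(); ++i) {
            dst[(std::size_t) i] = src[(std::size_t) wrap_index(i - pad_left, n)];
        }
        return dst;
    }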

macOS/iOS:

Linux:

Windows:

b7306

06 Dec 15:08
f334b79

HIP: fix RDNA3 FP16/BF16 matrix multiplication (#17817)

macOS/iOS:

Linux:

Windows:

b7302

06 Dec 13:50
7b43f55

ggml : improve error handling for search path existence checks (#17653)

  • Improve error handling for search path existence checks

Refactor existence checks for search paths using std::error_code to handle potential errors.

  • Improve cache file existence check with error code

Update fs::exists to use std::error_code for error handling.

  • Simplify existence check for search paths

  • Fix logging path in error message for posix_stat

  • Update ggml/src/ggml-backend-reg.cpp

Co-authored-by: Aman Gupta [email protected]

  • Adapt to the coding standard

Co-authored-by: Aman Gupta [email protected]
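
std::filesystem::exists has a non-throwing overload that reports failures through a std::error_code, which is the mechanism the refactor above uses for the search-path and cache-file checks. A minimal generic C++ sketch of that usage (not the exact ggml-backend-reg.cpp code):

    #include <cstdio>
    #include <filesystem>
    #include <system_error>
    #include <vector>

    namespace fs = std::filesystem;

    // Keep only paths whose existence can be confirmed; a failed check is logged
    // and skipped instead of throwing.
    static std::vector<fs::path> filter_existing(const std::vector<fs::path> & search_paths) {
        std::vector<fs::path> out;
        for (const auto & p : search_paths) {
            std::error_code ec;
            if (fs::exists(p, ec)) {
                out.push_back(p);
            } else if (ec) {
                std::fprintf(stderr, "skipping %s: %s\n",
                             p.string().c_str(), ec.message().c_str());
            }
        }
        return out;
    }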

macOS/iOS:

Linux:

Windows:

b7301

06 Dec 13:16
444f00b

llama : remove quantization sanity check (#17788)

  • llama : remove quantization sanity check

This commit removes the quantization sanity check for attention layers.

The motivation for this is that there are hybrid models that have recurrent layers,
expert layers, and attention layers. For these models the current check fails,
because the expert layers are not taken into account. After consideration, it was
decided that this check is not strictly necessary and can be removed to allow for
more flexible model architectures.

  • llama : remove unused pruned_attention_w and is_clip_model vars

macOS/iOS:

Linux:

Windows: