Releases: ggml-org/llama.cpp
b7315
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
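For deployment scripts that must keep working across the switch, one option is to choose the extractor from the file suffix. The following is a minimal Python sketch. It assumes the Linux assets can be told apart by a `.zip` versus `.tar.gz` (or `.tgz`) suffix. `extract_release` is a hypothetical helper, and the actual asset names are not listed in these notes.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> None:
    """Extract a release archive in either the old .zip or the new .tar.gz format.

    Assumption: the format can be inferred from the file name suffix.
    """
    name = archive.name.lower()
    if name.endswith((".tar.gz", ".tgz")):
        dest.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tf:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: use the standard safe-extraction filter.
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
    elif name.endswith(".zip"):
        dest.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive format: {archive.name}")
```

Where the running Python supports it, `filter="data"` applies the standard library's safe extraction filter, which blocks tar members that would land outside the destination directory.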
model: add llama 4 scaling for mistral-large (deepseek arch) (#17744)
b7314
Vulkan: improve mul_mat_vec_iq1_m (#16907)
- Optimize Vulkan shader for matrix-vector multiplication
- Revert changes on compute_outputs and main
Refactor compute_outputs to handle remaining rows correctly.
- Fix trailing whitespace
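The note about handling remaining rows refers to the usual tail problem in blocked kernels: a shader that computes several output rows per workgroup must also cover the leftover rows when the row count is not a multiple of the block size. Below is a CPU-side Python sketch of that pattern. `rows_per_block` and the body of `compute_outputs` are assumptions (the helper only borrows its name from the commit message), and the real shader operates on IQ1_M-quantized weights on the GPU, which this sketch does not model.

```python
def compute_outputs(matrix, vec, out, first_row, num_rows):
    """Write num_rows dot products, starting at first_row, into out."""
    for r in range(first_row, first_row + num_rows):
        acc = 0
        for a, x in zip(matrix[r], vec):
            acc += a * x
        out[r] = acc


def mat_vec(matrix, vec, rows_per_block=4):
    """Row-blocked matrix-vector product that also covers a partial last block."""
    n_rows = len(matrix)
    out = [0] * n_rows
    full_blocks, remaining = divmod(n_rows, rows_per_block)
    for b in range(full_blocks):
        # Full blocks: every row in [b*rows_per_block, (b+1)*rows_per_block) exists.
        compute_outputs(matrix, vec, out, b * rows_per_block, rows_per_block)
    if remaining:
        # Tail: without this call the last `remaining` outputs would never be written.
        compute_outputs(matrix, vec, out, full_blocks * rows_per_block, remaining)
    return out
```

Passing the row count into the shared helper keeps one code path for full and partial blocks, instead of duplicating the inner loop for the tail.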
b7313
ci : add windows-cuda 13.1 release (#17839)
b7312
common : change --color to accept on/off/auto, default to auto (#17827)
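One way to resolve a tri-state `--color` setting is to treat `on` and `off` as explicit overrides and let `auto` follow whether the output stream is a terminal. The Python sketch below assumes that terminal-detection rule. `resolve_color` is a hypothetical name, and llama.cpp's common code may apply additional checks.

```python
import sys


def resolve_color(mode: str = "auto", stream=None) -> bool:
    """Turn an on/off/auto color setting into a yes/no decision."""
    if stream is None:
        stream = sys.stdout
    mode = mode.strip().lower()
    if mode == "on":
        return True  # always emit color codes
    if mode == "off":
        return False  # never emit color codes
    if mode == "auto":
        # Assumed rule: colorize only when the stream is an interactive terminal.
        isatty = getattr(stream, "isatty", None)
        return bool(isatty is not None and isatty())
    raise ValueError(f"invalid --color value {mode!r}: expected on, off or auto")
```

With output redirected to a file or pipe, `auto` resolves to `False`, so logs are not littered with escape sequences, while an interactive session still gets colors.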
b7311
sycl: add missing BF16 conversion support for Intel oneAPI (#17780)
- sycl: add missing BF16 conversion support for Intel oneAPI
- Fix line 645: trailing whitespace
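BF16 keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits of an IEEE 754 float32, so a conversion amounts to keeping the upper 16 bits of the float32 bit pattern, usually with rounding. The Python sketch below shows the common round-to-nearest-even approach. It is a generic illustration and does not reproduce the oneAPI or SYCL conversion routines, which may use hardware intrinsics or a different rounding mode.

```python
import struct


def f32_to_bf16_bits(value: float) -> int:
    """Round a float32 value to bfloat16 (round to nearest, ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: truncate and force a quiet-NaN bit so rounding cannot turn it into infinity.
        return (bits >> 16) | 0x0040
    # Adding 0x7FFF plus the lowest kept bit rounds to nearest, with ties going to an even result.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(bf16: int) -> float:
    """Widen bfloat16 bits to float32: they become the upper 16 bits, the rest are zero."""
    return struct.unpack("<f", struct.pack("<I", (bf16 & 0xFFFF) << 16))[0]
```

Plain truncation (dropping the low 16 bits) is cheaper but always rounds toward zero in magnitude; the bias trick above avoids that systematic error at the cost of one add.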
b7310
vulkan: perf_logger improvements (#17672)
- vulkan: perf_logger improvements
- Move perf_logger from device to ctx.
- Add an env var that controls how often the stats are dumped. If it is set to a very large value, the stats are dumped only when the ctx is destroyed.
- Add a fusion info string to the tracking, and log only one item per fused op.
- Fix MUL_MAT_ID flops calculation.
- Fix vector sizes
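These notes do not show the corrected MUL_MAT_ID formula. Under the common convention of counting one multiply and one add per term, a dense product of an m x k matrix with a k x n matrix costs `2*m*n*k` FLOPs. A routed mixture-of-experts product can then be sketched as below, assuming each token is multiplied only by its `n_expert_used` selected experts rather than by every expert in the weight tensor. Both function names are hypothetical.

```python
def mul_mat_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k


def mul_mat_id_flops(m: int, k: int, n_tokens: int, n_expert_used: int) -> int:
    """FLOPs for a routed (mixture-of-experts) matmul.

    Assumption: each token goes through only its n_expert_used selected experts,
    so the effective column count is n_tokens * n_expert_used; the total number
    of experts stored in the weight tensor does not appear.
    """
    return mul_mat_flops(m, n_tokens * n_expert_used, k)
```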
b7307
ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985)
- Feat: Added Vulkan circular tiling support
- Feat: Added CPU circular
- Feat: Added CUDA kernels
- Added tests
- Added tests
- Removed non-pad operations
- Removed unneeded changes
- removed backend non pad tests
- Update test-backend-ops.cpp
- Fixed comment on pad test
- removed trailing whitespace
- Removed unneeded test in test-backend-ops
- Removed removed test from calls
- Update ggml/src/ggml-vulkan/vulkan-shaders/pad.comp
Co-authored-by: Ruben Ortlam
- Fixed alignment
- Formatting
Co-authored-by: Aman Gupta
- Format pad
- Format
- Clang format
- format
- format
- don't change so much stuff
- clang format and update to bool
- fix duplicates
- don't need to fix the padding
- make circular bool
- duplicate again
- rename vulkan to wrap around
- Don't need indent
- moved to constexpr
- removed unneeded extra line break
- More readable method calls
- Minor wording changes
- Added final newline
- Update ggml/include/ggml.h
Co-authored-by: Georgi Gerganov
- Update ggml/include/ggml.h
Co-authored-by: Georgi Gerganov
- Added circular pad ext tests
- Gate non-circular pad devices
- Cleaned gating of non-circular pad devices
Co-authored-by: Phylliida
Co-authored-by: Ruben Ortlam
Co-authored-by: Aman Gupta
Co-authored-by: Georgi Gerganov
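Circular (wrap-around) padding fills each padded cell with the value from the opposite edge of the input, which is why the result tiles without visible seams. A minimal Python sketch of the index arithmetic follows, on nested lists rather than ggml tensors. `pad_circular_2d` is a hypothetical helper and does not reflect the actual ggml API added in this change.

```python
def pad_circular_2d(src, top, bottom, left, right):
    """Pad a 2-D grid by wrapping around its edges (periodic / circular padding).

    Output cell (y, x) reads source cell ((y - top) mod H, (x - left) mod W),
    so values that leave one edge re-enter from the opposite edge.
    """
    h, w = len(src), len(src[0])
    out_h, out_w = h + top + bottom, w + left + right
    return [
        [src[(y - top) % h][(x - left) % w] for x in range(out_w)]
        for y in range(out_h)
    ]
```

Python's `%` already returns a non-negative result for a positive divisor. In C, C++, or CUDA, `%` can yield a negative result for a negative left operand, so the equivalent wrapped index is usually written `((i % n) + n) % n`.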
b7306
HIP: fix RDNA3 FP16/BF16 matrix multiplication (#17817)
b7302
ggml : improve error handling for search path existence checks (#17653)
- Improve error handling for search path existence checks
Refactor existence checks for search paths using std::error_code to handle potential errors.
- Improve cache file existence check with error code
Update the fs::exists call to use std::error_code for error handling.
- Simplify existence check for search paths
- Fix logging path in error message for posix_stat
- Update ggml/src/ggml-backend-reg.cpp
Co-authored-by: Aman Gupta
- Adapt to the coding standard
Co-authored-by: Aman Gupta
b7301
llama : remove quantization sanity check (#17788)
- llama : remove quantization sanity check
This commit removes the quantization sanity check for attention layers. The motivation is that some hybrid models have recurrent layers, expert layers, and attention layers. For these models the check fails because the expert layers are not taken into account. After consideration, it was decided that the check is not strictly necessary, so it has been removed to allow for more flexible model architectures.
- llama : remove unused pruned_attention_w and is_clip_model vars
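To see how such a check can break, consider a hypothetical version that expects one attention value-projection tensor for every layer that is not recurrent. The Python sketch below is illustrative only: the layer labels and `attention_count_matches` are invented for this example, and it does not reproduce the removed llama.cpp code.

```python
def attention_count_matches(layer_kinds, n_attention_wv):
    """Hypothetical sanity check: one attention tensor per non-recurrent layer.

    Assumption: the expected count is "all layers minus recurrent layers".
    A hybrid model that also has layers which are neither attention nor
    recurrent (here labeled "experts") breaks that assumption.
    """
    n_layer = len(layer_kinds)
    n_recurrent = sum(1 for kind in layer_kinds if kind == "recurrent")
    expected = n_layer - n_recurrent
    return n_attention_wv == expected
```

For a layout of `["attention", "recurrent", "experts", "attention"]`, the model really has two attention tensors, but the check expects three, so a valid model is rejected.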