@ggerganov
Member

Moving all CPU quantization code into separate source files.
This is a refactoring change - there should be no functional difference.

  • Rename k_quants.h/.c -> ggml-quants.h/.c
  • Move quantization code from ggml.c into ggml-quants.c
  • Remove GGML_USE_K_QUANTS ifdefs (i.e. always build with K-quants support)
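
For readers unfamiliar with the pattern, here is a minimal sketch of what dropping a compile-time guard like this amounts to. Only the `GGML_USE_K_QUANTS` macro name comes from this PR; the two function names are hypothetical and do not exist in ggml.

```c
#include <stdbool.h>

/* Before (illustrative): availability depends on how the library was built. */
static bool sketch_k_quants_available_before(void) {
#ifdef GGML_USE_K_QUANTS
    return true;   /* built with -DGGML_USE_K_QUANTS */
#else
    return false;  /* built without the macro */
#endif
}

/* After (illustrative): the guard is gone, so the code path is always
 * compiled in and the answer no longer depends on build flags. */
static bool sketch_k_quants_available_after(void) {
    return true;
}
```

The practical consequence is that callers can no longer opt out of K-quants at build time, so any opt-out has to become a runtime choice.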

In the future, ggml-quants.h should probably be moved into a ggml-impl.h private API header as discussed in ggml-org/ggml#549

@ggerganov changed the title from "Ggml quants" to "ggml : quantization refactoring" on Oct 28, 2023
@cebtenzzre
Collaborator

Building without GGML_USE_K_QUANTS is currently the only way to quantize a pure Q4_0 model, which has been helpful while developing the Nomic Vulkan backend, as we run the whole model on the GPU but did not have a Q6_K matmul shader at first.

Maybe we should add a flag to quantize to disable the k-quants logic?
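
A minimal sketch of what such a flag could do, assuming the k-quants logic in question is a per-tensor rule that overrides the requested type for some tensors (here, hypothetically, `output.weight` is upgraded to Q6_K). The enum, the function name, and the rule itself are illustrative assumptions, not the actual quantize code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags; the real ggml type enum has many more entries. */
enum sketch_type { SKETCH_Q4_0, SKETCH_Q6_K };

/* Hypothetical helper: choose the quantization type for one tensor.
 * Without `pure`, an assumed mixture rule upgrades the output tensor to
 * Q6_K. With `pure`, every tensor keeps the requested base type. */
static enum sketch_type sketch_pick_type(const char *tensor_name,
                                         enum sketch_type base,
                                         bool pure) {
    assert(tensor_name != NULL);
    if (!pure && strcmp(tensor_name, "output.weight") == 0) {
        return SKETCH_Q6_K;
    }
    return base;
}
```

With the flag set, `sketch_pick_type("output.weight", SKETCH_Q4_0, true)` yields `SKETCH_Q4_0`, so a backend that lacks a Q6_K kernel would only ever see Q4_0 tensors.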

@ggerganov
Member Author

> Maybe we should add a flag to quantize to disable the k-quants logic?

Yes. Do you want to add it to this PR?

@cebtenzzre
Collaborator

> Yes. Do you want to add it to this PR?

Done.

@ggerganov merged commit d69d777 into master on Oct 29, 2023
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Oct 30, 2023
* ggml : factor all quantization code in ggml-quants

ggml-ci

* ggml-quants : fix Zig and Swift builds + quantize tool

ggml-ci

* quantize : --pure option for disabling k-quant mixtures

---------

Co-authored-by: cebtenzzre <[email protected]>
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023