gguf : deduplicate #2629

ggerganov · 2023-08-16T11:59:51Z

Following things work:

ggerganov · 2023-08-16T12:01:57Z

Overall, it is looking good - need to test and fix CUDA.

The only problem I notice is that there are no more new-lines in the generated text.
Maybe something wrong after the tokenizer fixes.
cc @goerch - any ideas?

monatis · 2023-08-16T12:17:10Z

@ggerganov Btw, I removed the inner part of llama_load_session_file_internal() and its save counterpart, thinking of migrating them to GGUF format later on. So these need to be copied from old llama.cpp again if we don't want to do so.

My idea was to use GGUF for all serialization / deserialization ops, including session and LoRA files, but we may choose to keep things simple.

klosax · 2023-08-16T12:22:53Z

> Overall, it is looking good - need to test and fix CUDA

Compiling with cuBLAS seems to work fine. Tested main, perplexity and quantize on openllama-3b-v2.

klosax · 2023-08-16T12:25:17Z

Still got this compiler warning:

In file included from /usr/include/string.h:495,
                 from /usr/include/c++/9/cstring:42,
                 from llama.cpp:72:
In function ‘char* strncpy(char*, const char*, size_t)’,
    inlined from ‘int llama_token_to_str_with_model(const llama_model*, llama_token, char*, int)’ at llama.cpp:4793:20:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:34: warning: ‘char* __builtin_strncpy(char*, const char*, long unsigned int)’ output truncated before terminating nul copying 3 bytes from a string of the same length [-Wstringop-truncation]
  106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
      |          ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
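For context, the warning fires because strncpy is asked to copy exactly as many bytes as the source string holds, so no terminating NUL is ever written. One way to make that intent explicit is to copy with memcpy and return the byte count. The following is only a minimal sketch: `copy_token_text` and its return convention are hypothetical and are not taken from llama.cpp.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not the llama.cpp function): copies the token text
// into buf WITHOUT a trailing NUL and returns the number of bytes written.
// If the buffer is too small, nothing is copied and the negated required
// size is returned so the caller can retry with a larger buffer.
int copy_token_text(const std::string & text, char * buf, int length) {
    const int n = static_cast<int>(text.size());
    if (length < n) {
        return -n;
    }
    if (n > 0) {
        std::memcpy(buf, text.data(), static_cast<size_t>(n));
    }
    return n;
}
```

A caller would then build the string from the returned length, for example `std::string(buf, n)`, rather than relying on a terminator.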

ggerganov · 2023-08-16T12:25:28Z

@monatis Yup, I will fix these soon.

@klosax Cool

Regarding the "new-line" issue - I can see the new-line token (13) being sampled, but it is detokenized to empty string '' instead of new-line by the llama_token_to_str() function. We should dig into this and fix it

klosax · 2023-08-16T12:26:14Z

Maybe the warning above is the problem?

klosax · 2023-08-16T12:44:41Z

Line feed is working now. I guess it was a typo: + instead of -.

ggerganov · 2023-08-16T12:58:23Z

Perfect - the tokenizer tests don't work because they need new vocabs:

 $ ▶ ./bin/test-tokenizer-0 ../models/ggml-vocab-llama.bin 
main : reading vocab from: '../models/ggml-vocab-llama.bin'
gguf_init_from_file: invalid magic number 67676a74
Segmentation fault (core dumped)

We need to generate new vocab files.
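The value 67676a74 in the log spells "ggjt" in ASCII, which is the magic of the older file format, so the loader is rejecting an old-format vocab file. A rough sketch of such a check, assuming the file starts with the four ASCII bytes "GGUF" as in the published GGUF format (the byte order used on this branch at the time may have differed); `has_gguf_magic` is a hypothetical name:

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns true when the file at `path` can be opened
// and its first four bytes are the ASCII characters "GGUF".
bool has_gguf_magic(const std::string & path) {
    std::FILE * f = std::fopen(path.c_str(), "rb");
    if (f == nullptr) {
        return false;
    }
    char magic[4] = {0, 0, 0, 0};
    const size_t n = std::fread(magic, 1, sizeof(magic), f);
    std::fclose(f);
    return n == sizeof(magic) && std::memcmp(magic, "GGUF", 4) == 0;
}
```

Running a check like this before handing the file to the loader would turn the crash above into a plain "wrong file format" message.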

klosax · 2023-08-16T13:04:39Z

Maybe it is better to use a model file instead of a separate vocab? And note that the tests only work with the original LLaMA models.

klosax · 2023-08-16T13:05:23Z

Later we will need tokenizer tests for other models like MPT, Falcon and GPT-NeoX.

ggerganov · 2023-08-16T13:07:18Z

We need small vocab files inside the repository in order to run the CI

klosax · 2023-08-16T13:08:59Z

Ok. The vocab files are model files without the tensors, right?

ggerganov · 2023-08-16T13:11:57Z

Yes, just the meta data: key-value pairs + tensor info. No tensor data

klosax · 2023-08-16T13:22:37Z

Should not have tensor_count = 0 and include tensor info? Maybe skip tensor info?

ggerganov · 2023-08-16T13:30:43Z

> Should not have tensor_count = 0 and include tensor info? Maybe skip tensor info?

Both ways work - whichever is simpler

monatis · 2023-08-16T13:37:12Z

Setting the tensor count to 0 works perfectly without breaking the file format; I previously used it for testing.

We could define a flag in the conversion script, metadata_only = False; setting it to True would output a metadata-only file without tensors for that model.

ggerganov · 2023-08-16T13:40:10Z

There is already --vocab-only arg in convert.py:

https://github.com/ggerganov/llama.cpp/blob/5b94b14d5dd59e14f2ab443a79413c445726c98f/convert.py#L1305

I consider the other convert scripts that we currently use just temporary until we update convert.py to work with gguf.py, so not sure it's worth adding more features to them

Edit: this is regarding the LLaMA models. For other models, we should likely have separate convert scripts

klosax · 2023-08-16T14:23:24Z

> Edit: this is regarding the LLaMA models. For other models, we should likely have separate convert scripts

In that case I think the lazy tensor reader from convert.py should be separated so it can be used by the other scripts.


klosax · 2023-08-16T14:49:52Z

LLaMA v1 13b works in main and perplexity.

klosax · 2023-08-16T15:14:45Z

Tested the q4_0 70b model in main, but the output is not good and perplexity returns -inf.

klosax · 2023-08-16T15:18:09Z

LLaMA v1 33b works in main and perplexity

klosax · 2023-08-16T15:22:12Z

OpenLLaMA v2 3b and 7b work in main and perplexity.

klosax · 2023-08-16T15:26:42Z

LLaMA v1 7b also works.

ggerganov · 2023-08-16T16:25:20Z

Weird that 70B does not work - I can't test as I don't have HF 70B. Just the original one

Will merge like this and fix the convert script. When ready, will test 70B and try to resolve

Green-Sky · 2023-08-16T21:16:40Z

> Weird that 70B does not work - I can't test as I don't have HF 70B. Just the original one
>
> Will merge like this and fix the convert script. When ready, will test 70B and try to resolve

iirc there was a fix on master after gguf branched. (can't find it rn, so might be bad memory)

klosax · 2023-08-16T21:43:46Z

> iirc there was a fix on master after gguf branched.

If you were thinking of #2427, it is already included.
I can't fully test the 70b model on my system.

Green-Sky · 2023-08-16T22:57:48Z

> If you were thinking of #2427, it is already included.
> I can't fully test the 70b model on my system.

hm yea, i think i meant that. never mind then.

goerch · 2023-08-17T05:14:18Z

Just tried the branch. Converting with

python.exe convert-llama-h5-to-gguf.py models\llama-7B-hf 1

looks fine:

output.weight, shape 2, torch.float16 --> float16
gguf: model successfully exported to 'models\llama-7B-hf/ggml-model-f16.gguf'

The resulting gguf file is around 13 GB. When starting inference I get

main: build = 1168 (d864596)
main: seed  = 1692248927
gguf_init_from_file: invalid magic number 00000000

This happens with clang-cl on Windows.

klosax · 2023-08-17T07:19:08Z

> gguf_init_from_file: invalid magic number 00000000

I got this when I accidentally tried to open a directory instead of a model file.

monatis · 2023-08-17T07:34:49Z

> I got this when accidentally tried to open a directory instead of a model

Me too, and it is reproducible.

$ ./main -m ~

main: build = 1157 (5ec1893)
main: seed  = 1692257530
gguf_init_from_file: invalid magic number 00000000
Segmentation fault

klosax · 2023-08-17T07:44:44Z

> gguf: model successfully exported to 'models\llama-7B-hf/ggml-model-f16.gguf'

There are both backslashes and slashes in this path. Are you using this exact path to load the model?

ggerganov · 2023-08-17T07:50:44Z

I don't have Windows to test

I'm a bit confused - does the wrong magic of 00000000 occur for directories, or does it occur with a converted F16 model file?

klosax · 2023-08-17T07:53:07Z

I get the error when trying to open a directory instead of a model file. On ubuntu.

goerch · 2023-08-17T07:54:16Z

My mistake was trying to open a non-existent file. Otherwise things work fine now.

monatis · 2023-08-17T07:54:27Z

> does the wrong magic of 00000000 occur for directories, or does it occur with a converted F16 model?

On my end, I couldn't reproduce it with a GGUF file, but passing a directory path leads to the wrong magic on WSL, regular Windows and regular Ubuntu.

monatis · 2023-08-17T07:55:33Z

Maybe we can make that error message a little more verbose, e.g. "Are you sure that this path holds a GGUF file?"

klosax · 2023-08-17T07:56:45Z

Better: "Cannot open a directory. Should be a .gguf model file."
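A pre-flight check along these lines could produce such a message before the loader ever looks at the magic number. This is only a sketch under stated assumptions: `check_model_path` is a hypothetical helper, the message wording is illustrative, and it needs C++17 for std::filesystem.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical pre-flight check (requires C++17). Returns an empty string
// when the path exists and is not a directory, otherwise a readable error.
std::string check_model_path(const std::string & path) {
    namespace fs = std::filesystem;
    std::error_code ec;  // use the non-throwing overloads
    if (!fs::exists(path, ec)) {
        return "file not found: " + path;
    }
    if (fs::is_directory(path, ec)) {
        return "cannot open a directory, expected a .gguf model file: " + path;
    }
    return "";
}
```

This covers both failure modes reported in the thread (a directory and a non-existent file), while a real file with the wrong contents would still be caught by the magic check.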

ggerganov · 2023-08-17T08:56:53Z

@goerch

I'm running the test-tokenizer-0 test using the new F16 GGUF model, and it is currently failing.
We made the following change recently:

from:

static uint8_t llama_byte_to_char(const llama_vocab & vocab, uint8_t byte) {
    if (llama_vocab_type(vocab) == "spm") {
        return byte + 3;
    }

to:

        return byte - 3;

This was needed because the new-line wasn't printing correctly.
However, with this change, the test-tokenizer-0 is now failing.

If I revert it, it no longer fails, but the de-tokenized strings do not match the original strings.
For example:

main : ' ' tokenized to ' ▁'
main : ' Hello World' tokenized to '  Hello World'
main : ' Hello World!' tokenized to '  Hello World!'
main : ' Hello world' tokenized to '  Hello world'
main : ' this is 🦙.cpp' tokenized to '  this is .cpp'
main : 'Hello World' tokenized to ' Hello World'
main : 'Hello world' tokenized to ' Hello world'
main : 'w048 7tuijk dsdfhu' tokenized to ' w048 7tuijk dsdfhu'
main : 'нещо на Български' tokenized to ' нещо на Български'
main : 'កាន់តែពិសេសអាចខលចេញ' tokenized to ' កាន祑礕祈礜礽ស祇ស礨ា礋礇礡礋祇礏'
main : '🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)' tokenized to '  (normal) ‍️ (multiple emojis concatenated) ✅ (only emoji that has its own token)'

I suspect one of 2 things:

- either I introduced a bug when merging the tokenizer fixes PR
- or we are not exporting the vocab properly using the new convert scripts

Will be looking into this now - any help would be appreciated

ggerganov · 2023-08-17T09:29:13Z

I think I fixed it:

7b6ae89

We were incorrectly using llama_byte_to_char() when tokenizing a string.
To fix it, I added llama_char_to_byte() and use it instead
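The two directions are inverses of each other, which is why a single function with a fixed sign could not serve both tokenizing and detokenizing. A minimal sketch, assuming (from the +3 / -3 snippet and the new-line token id 13 mentioned earlier) that the byte tokens of the SPM vocab start at id 3; the function names here are hypothetical and do not claim to match which llama.cpp function does which direction:

```cpp
#include <cstdint>

// Assumption inferred from the thread: byte value b maps to token id b + 3.
constexpr int kByteTokenOffset = 3;

// Tokenizing direction: raw byte -> token id.
int byte_to_token_id(uint8_t byte) {
    return static_cast<int>(byte) + kByteTokenOffset;
}

// Detokenizing direction: token id -> raw byte.
// Only meaningful for ids in the byte-token range [3, 258].
uint8_t token_id_to_byte(int id) {
    return static_cast<uint8_t>(id - kByteTokenOffset);
}
```

Under that assumption the new-line byte (10) tokenizes to id 13, and id 13 detokenizes back to byte 10, which matches the symptom that was fixed by switching the sign on the detokenizing side.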

cebtenzzre · 2023-08-17T17:14:15Z

llama.cpp

-    }
-    void read_vocab() {
-        vocab.id_to_token.resize(hparams.n_vocab);
+        ctx_gguf = gguf_init_from_file(fname.c_str(), params);


There's a missing check for ctx_gguf == NULL here, which is why "invalid magic number" leads to a segfault.
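The shape of the missing check could look roughly like this. It is a sketch only: `fake_ctx` and `fake_init_from_file` are stand-ins so that the example compiles without ggml, and the error text is illustrative.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for the real context type (hypothetical).
struct fake_ctx {
    int n_tensors;
};

// Stand-in for an init function that returns nullptr on failure.
// Here an empty file name simulates the "invalid magic number" case.
fake_ctx * fake_init_from_file(const std::string & fname) {
    static fake_ctx ctx = {0};
    return fname.empty() ? nullptr : &ctx;
}

// The caller checks the result before touching it, turning a would-be
// null dereference into an ordinary, reportable error.
fake_ctx * load_or_throw(const std::string & fname) {
    fake_ctx * ctx = fake_init_from_file(fname);
    if (ctx == nullptr) {
        throw std::runtime_error("failed to load model from '" + fname + "'");
    }
    return ctx;
}
```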

cebtenzzre · 2023-08-17T17:41:24Z

And checking ferror(file) if gguf_fread_el returns false here should make it more obvious if the user passes a directory instead of a file:
https://github.com/ggerganov/llama.cpp/blob/795ec7070cba4356112836bd0de3a718cb04e8bf/ggml.c#L18717-L18721
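To illustrate the idea: on some platforms fopen() succeeds on a directory and only the first fread() fails, with the stream's error indicator set. Distinguishing that from a file that is simply too short lets the caller print a more specific message. A rough sketch with hypothetical names (`read_status`, `read_u32`), not the ggml code:

```cpp
#include <cstdint>
#include <cstdio>

enum class read_status { ok, short_read, io_error };

// Reads four raw bytes into a 32-bit value (no byte-order conversion).
// On failure, ferror() separates a genuine I/O error (for example reading
// from a directory) from a file that ended too early.
read_status read_u32(std::FILE * f, uint32_t & out) {
    if (std::fread(&out, sizeof(out), 1, f) == 1) {
        return read_status::ok;
    }
    return std::ferror(f) ? read_status::io_error : read_status::short_read;
}
```

A loader could then report "could not read from path (is it a directory?)" for io_error and keep the existing "invalid magic number" message for the other cases.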

klosax · 2023-08-17T17:57:33Z

I think it is better if you review in #2398 instead.

@klosax

* gguf : first API pass * gguf : read header + meta data * gguf : read tensor info * gguf : initial model loading - not tested * gguf : add gguf_get_tensor_name() * gguf : do not support passing existing ggml_context to gguf_init * gguf : simplify gguf_get_val * gguf : gguf.c is now part of ggml.c * gguf : read / write sample models * gguf : add comments * refactor : reduce code duplication and better API (#2415) * gguf : expose the gguf_type enum through the API for now * gguf : add array support * gguf.py : some code style changes * convert.py : start a new simplified implementation by removing old stuff * convert.py : remove GGML vocab + other obsolete stuff * GGUF : write tensor (#2426) * WIP: Write tensor * GGUF : Support writing tensors in Python * refactor : rm unused import and upd todos * fix : fix errors upd writing example * rm example.gguf * gitignore *.gguf * undo formatting * gguf : add gguf_find_key (#2438) * gguf.cpp : find key example * ggml.h : add gguf_find_key * ggml.c : add gguf_find_key * gguf : fix writing tensors * gguf : do not hardcode tensor names to read * gguf : write sample tensors to read * gguf : add tokenization constants * quick and dirty conversion example * gguf : fix writing gguf arrays * gguf : write tensors one by one and code reuse * gguf : fix writing gguf arrays * gguf : write tensors one by one * gguf : write tensors one by one * gguf : write tokenizer data * gguf : upd gguf conversion script * Update convert-llama-h5-to-gguf.py * gguf : handle already encoded string * ggml.h : get array str and f32 * ggml.c : get arr str and f32 * gguf.py : support any type * Update convert-llama-h5-to-gguf.py * gguf : fix set is not subscriptable * gguf : update convert-llama-h5-to-gguf.py * constants.py : add layer norm eps * gguf.py : add layer norm eps and merges * ggml.h : increase GGML_MAX_NAME to 64 * ggml.c : add gguf_get_arr_n * Update convert-llama-h5-to-gguf.py * add gptneox gguf example * Makefile : add gptneox gguf example * 
Update convert-llama-h5-to-gguf.py * add gptneox gguf example * Update convert-llama-h5-to-gguf.py * Update convert-gptneox-h5-to-gguf.py * Update convert-gptneox-h5-to-gguf.py * Update convert-llama-h5-to-gguf.py * gguf : support custom alignment value * gguf : fix typo in function call * gguf : mmap tensor data example * fix : update convert-llama-h5-to-gguf.py * Update convert-llama-h5-to-gguf.py * convert-gptneox-h5-to-gguf.py : Special tokens * gptneox-main.cpp : special tokens * Update gptneox-main.cpp * constants.py : special tokens * gguf.py : accumulate kv and tensor info data + special tokens * convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens * gguf : gguf counterpart of llama-util.h * gguf-util.h : update note * convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens * convert-llama-h5-to-gguf.py : special tokens * Delete gptneox-common.cpp * Delete gptneox-common.h * convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer * gptneox-main.cpp : gpt2 bpe tokenizer * gpt2 bpe tokenizer (handles merges and unicode) * Makefile : remove gptneox-common * gguf.py : bytesarray for gpt2bpe tokenizer * cmpnct_gpt2bpe.hpp : comments * gguf.py : use custom alignment if present * gguf : minor stuff * Update gptneox-main.cpp * map tensor names * convert-gptneox-h5-to-gguf.py : map tensor names * convert-llama-h5-to-gguf.py : map tensor names * gptneox-main.cpp : map tensor names * gguf : start implementing libllama in GGUF (WIP) * gguf : start implementing libllama in GGUF (WIP) * rm binary commited by mistake * upd .gitignore * gguf : calculate n_mult * gguf : inference with 7B model working (WIP) * gguf : rm deprecated function * gguf : start implementing gguf_file_saver (WIP) * gguf : start implementing gguf_file_saver (WIP) * gguf : start implementing gguf_file_saver (WIP) * gguf : add gguf_get_kv_type * gguf : add gguf_get_kv_type * gguf : write metadata in gguf_file_saver (WIP) * gguf : write metadata in gguf_file_saver (WIP) * gguf : 
write metadata in gguf_file_saver * gguf : rm references to old file formats * gguf : shorter name for member variable * gguf : rm redundant method * gguf : get rid of n_mult, read n_ff from file * Update gguf_tensor_map.py * Update gptneox-main.cpp * gguf : rm references to old file magics * gguf : start implementing quantization (WIP) * gguf : start implementing quantization (WIP) * gguf : start implementing quantization (WIP) * gguf : start implementing quantization (WIP) * gguf : start implementing quantization (WIP) * gguf : start implementing quantization (WIP) * gguf : quantization is working * gguf : roper closing of file * gguf.py : no need to convert tensors twice * convert-gptneox-h5-to-gguf.py : no need to convert tensors twice * convert-llama-h5-to-gguf.py : no need to convert tensors twice * convert-gptneox-h5-to-gguf.py : simplify nbytes * convert-llama-h5-to-gguf.py : simplify nbytes * gptneox-main.cpp : n_layer --> n_block * constants.py : n_layer --> n_block * gguf.py : n_layer --> n_block * convert-gptneox-h5-to-gguf.py : n_layer --> n_block * convert-llama-h5-to-gguf.py : n_layer --> n_block * gptneox-main.cpp : n_layer --> n_block * Update gguf_tensor_map.py * convert-gptneox-h5-to-gguf.py : load model in parts to save memory * convert-llama-h5-to-gguf.py : load model in parts to save memory * convert : write more metadata for LLaMA * convert : rm quantization version * convert-gptneox-h5-to-gguf.py : add file_type key * gptneox-main.cpp : add file_type key * fix conflicts * gguf : add todos and comments * convert-gptneox-h5-to-gguf.py : tensor name map changes * Create gguf_namemap.py : tensor name map changes * Delete gguf_tensor_map.py * gptneox-main.cpp : tensor name map changes * convert-llama-h5-to-gguf.py : fixes * gguf.py : dont add empty strings * simple : minor style changes * gguf : use UNIX line ending * Create convert-llama-7b-pth-to-gguf.py * llama : sync gguf-llama.cpp with latest llama.cpp (#2608) * llama : sync gguf-llama.cpp 
with latest llama.cpp * minor : indentation + assert * llama : refactor gguf_buffer and gguf_ctx_buffer * llama : minor * gitignore : add gptneox-main * llama : tokenizer fixes (#2549) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies * convert : update convert-new.py with tokenizer fixes (#2614) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies * Adapt convert-new.py (and fix a clang-cl compiler error on windows) * llama : sync gguf-llama with llama (#2613) * llama : sync gguf-llama with llama * tests : fix build + warnings (test-tokenizer-1 still fails) * tests : fix wstring_convert * convert : fix layer names * llama : sync gguf-llama.cpp * convert : update HF converter to new tokenizer voodoo magics * llama : update tokenizer style * convert-llama-h5-to-gguf.py : add token types * constants.py : add token types * gguf.py : add token types * convert-llama-7b-pth-to-gguf.py : add token types * gguf-llama.cpp : fix n_head_kv * convert-llama-h5-to-gguf.py : add 70b gqa support * gguf.py : add tensor data layout * convert-llama-h5-to-gguf.py : add tensor data layout * convert-llama-7b-pth-to-gguf.py : add tensor data layout * gptneox-main.cpp : add tensor data layout * convert-llama-h5-to-gguf.py : clarify the reverse permute * llama : refactor model loading code (#2620) * llama : style formatting + remove helper methods * llama : fix quantization using gguf tool * llama : simplify gguf_file_saver * llama : fix method names * llama : simplify write_header() * llama : no need to pass full file loader to the file saver just gguf_ctx * llama : gguf_file_saver write I32 * llama : refactor tensor names (#2622) * gguf: update tensor names searched in quantization * gguf : define tensor names as constants * gguf : initial write API (not tested yet) * gguf : write to file API (not tested) * gguf : initial write API ready + example * gguf : fix header write * gguf : fixes + simplify example + add ggml_nbytes_pad() * gguf : minor * 
llama : replace gguf_file_saver with new gguf write API * gguf : streaming support when writing files * gguf : remove oboslete write methods * gguf : remove obosolete gguf_get_arr_xxx API * llama : simplify gguf_file_loader * llama : move hparams and vocab from gguf_file_loader to llama_model_loader * llama : merge gguf-util.h in llama.cpp * llama : reorder definitions in .cpp to match .h * llama : minor simplifications * llama : refactor llama_model_loader (WIP) wip : remove ggml_ctx from llama_model_loader wip : merge gguf_file_loader in llama_model_loader * llama : fix shape prints * llama : fix Windows build + fix norm_rms_eps key * llama : throw error on missing KV paris in model meta data * llama : improve printing + log meta data * llama : switch print order of meta data --------- Co-authored-by: M. Yusuf Sarıgöz <[email protected]> * gguf : deduplicate (#2629) * gguf : better type names * dedup : CPU + Metal is working * ggml : fix warnings about unused results * llama.cpp : fix line feed and compiler warning * llama : fix strncpy warning + note token_to_str does not write null * llama : restore the original load/save session implementation Will migrate this to GGUF in the future * convert-llama-h5-to-gguf.py : support alt ctx param name * ggml : assert when using ggml_mul with non-F32 src1 * examples : dedup simple --------- Co-authored-by: klosax <[email protected]> * gguf.py : merge all files in gguf.py * convert-new.py : pick #2427 for HF 70B support * examples/gguf : no need to keep q option for quantization any more * llama.cpp : print actual model size * llama.cpp : use ggml_elements() * convert-new.py : output gguf (#2635) * convert-new.py : output gguf (WIP) * convert-new.py : add gguf key-value pairs * llama : add hparams.ctx_train + no longer print ftype * convert-new.py : minor fixes * convert-new.py : vocab-only option should work now * llama : fix tokenizer to use llama_char_to_byte * tests : add new ggml-vocab-llama.gguf * convert-new.py : 
tensor name mapping * convert-new.py : add map for skipping tensor serialization * convert-new.py : convert script now works * gguf.py : pick some of the refactoring from #2644 * convert-new.py : minor fixes * convert.py : update to support GGUF output * Revert "ci : disable CI temporary to not waste energy" This reverts commit 7e82d25. * convert.py : n_head_kv optional and .gguf file extension * convert.py : better always have n_head_kv and default it to n_head * llama : sync with recent PRs on master * editorconfig : ignore models folder ggml-ci * ci : update ".bin" to ".gguf" extension ggml-ci * llama : fix llama_model_loader memory leak * gptneox : move as a WIP example * llama : fix lambda capture ggml-ci * ggml : fix bug in gguf_set_kv ggml-ci * common.h : .bin --> .gguf * quantize-stats.cpp : .bin --> .gguf * convert.py : fix HF tensor permuting / unpacking ggml-ci * llama.cpp : typo * llama : throw error if gguf fails to init from file ggml-ci * llama : fix tensor name grepping during quantization ggml-ci * gguf.py : write tensors in a single pass (#2644) * gguf : single pass for writing tensors + refactoring writer * gguf : single pass for writing tensors + refactoring writer * gguf : single pass for writing tensors + refactoring writer * gguf : style fixes in simple conversion script * gguf : refactor gptneox conversion script * gguf : rename h5 to hf (for HuggingFace) * gguf : refactor pth to gguf conversion script * gguf : rm file_type key and method * gguf.py : fix vertical alignment * gguf.py : indentation --------- Co-authored-by: Georgi Gerganov <[email protected]> * convert-gptneox-hf-to-gguf.py : fixes * gguf.py : gptneox mapping * convert-llama-hf-to-gguf.py : fixes * convert-llama-7b-pth-to-gguf.py : fixes * ggml.h : reverse GGUF_MAGIC * gguf.py : reverse GGUF_MAGIC * test-tokenizer-0.cpp : fix warning * llama.cpp : print kv general.name * llama.cpp : get special token kv and linefeed token id * llama : print number of tensors per type + print 
arch + style * tests : update vocab file with new magic * editorconfig : fix whitespaces * llama : re-order functions * llama : remove C++ API + reorganize common source in /common dir * llama : minor API updates * llama : avoid hardcoded special tokens * llama : fix MPI build ggml-ci * llama : introduce enum llama_vocab_type + remove hardcoded string constants * convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested * falcon-main.cpp : falcon inference example * convert-falcon-hf-to-gguf.py : remove extra kv * convert-gptneox-hf-to-gguf.py : remove extra kv * convert-llama-7b-pth-to-gguf.py : remove extra kv * convert-llama-hf-to-gguf.py : remove extra kv * gguf.py : fix for falcon 40b * falcon-main.cpp : fix for falcon 40b * convert-falcon-hf-to-gguf.py : update ref * convert-falcon-hf-to-gguf.py : add tensor data layout * cmpnct_gpt2bpe.hpp : fixes * falcon-main.cpp : fixes * gptneox-main.cpp : fixes * cmpnct_gpt2bpe.hpp : remove non-general stuff * Update examples/server/README.md Co-authored-by: slaren <[email protected]> * cmpnct_gpt2bpe.hpp : cleanup * convert-llama-hf-to-gguf.py : special tokens * convert-llama-7b-pth-to-gguf.py : special tokens * convert-permute-debug.py : permute debug print * convert-permute-debug-master.py : permute debug for master * convert-permute-debug.py : change permute type of attn_q * convert.py : 70b model working (change attn_q permute) * Delete convert-permute-debug-master.py * Delete convert-permute-debug.py * convert-llama-hf-to-gguf.py : fix attn_q permute * gguf.py : fix rope scale kv * convert-llama-hf-to-gguf.py : rope scale and added tokens * convert-llama-7b-pth-to-gguf.py : rope scale and added tokens * llama.cpp : use rope scale kv * convert-llama-7b-pth-to-gguf.py : rope scale fix * convert-llama-hf-to-gguf.py : rope scale fix * py : fix whitespace * gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682) * First pass at converting GGMLv3 LLaMA models to GGUF * Cleanups, better 
output during conversion * Fix vocab space conversion logic * More vocab conversion fixes * Add description to converted GGUF files * Improve help text, expand warning * Allow specifying name and description for output GGUF * Allow overriding vocab and hyperparams from original model metadata * Use correct params override var name * Fix wrong type size for Q8_K Better handling of original style metadata * Set default value for gguf add_tensor raw_shape KW arg * llama : improve token type support (#2668) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies * Adapt convert-new.py (and fix a clang-cl compiler error on windows) * Improved tokenizer test But does it work on MacOS? * Improve token type support - Added @klosax code to convert.py - Improved token type support in vocabulary * Exclude platform dependent tests * More sentencepiece compatibility by eliminating magic numbers * Restored accidentally removed comment * llama : add API for token type ggml-ci * tests : use new tokenizer type API (#2692) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies * Adapt convert-new.py (and fix a clang-cl compiler error on windows) * Improved tokenizer test But does it work on MacOS? * Improve token type support - Added @klosax code to convert.py - Improved token type support in vocabulary * Exclude platform dependent tests * More sentencepiece compatibility by eliminating magic numbers * Restored accidentally removed comment * Improve commentary * Use token type API in test-tokenizer-1.cpp * py : cosmetics * readme : add notice about new file format ggml-ci --------- Co-authored-by: M. Yusuf Sarıgöz <[email protected]> Co-authored-by: klosax <[email protected]> Co-authored-by: goerch <[email protected]> Co-authored-by: slaren <[email protected]> Co-authored-by: Kerfuffle <[email protected]>

ggerganov added 2 commits August 16, 2023 14:37

gguf : better type names

d72a23e

dedup : CPU + Metal is working

1891c92

ggerganov changed the title ~~Gguf deduplicate~~ gguf : deduplicate Aug 16, 2023

ggml : fix warnings about unused results

dd6eaa3

ggerganov mentioned this pull request Aug 16, 2023

GGUF #2398

Merged

34 tasks

llama.cpp : fix line feed and compiler warning

a499313

llama : fix strncpy warning + note token_to_str does not write null

5b94b14

llama : restore the original load/save session implementation

6412e97

Will migrate this to GGUF in the future

convert-llama-h5-to-gguf.py : support alt ctx param name

f3e90f2

ggerganov added 2 commits August 16, 2023 19:19

ggml : assert when using ggml_mul with non-F32 src1

c290f3e

examples : dedup simple

795ec70

ggerganov merged commit 88b5769 into gguf Aug 16, 2023

ggerganov deleted the gguf-deduplicate branch August 16, 2023 16:25

cebtenzzre reviewed Aug 17, 2023
