examples: fix utf8 decoding error #5935

zhangfuwen · 2024-03-08T09:54:04Z

Some models have a tokenizer that decodes a single token id into an incomplete UTF-8 sequence, so the decoded bytes need to be validated and, if incomplete, held until the next token arrives.
One example is https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_0.gguf, where token 18137 is one such id.
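
The "validate and wait for next token" idea can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `is_complete_utf8` and `Utf8Accumulator` are hypothetical names, and the check only looks at lead and continuation byte patterns (it does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is a complete, structurally valid
// UTF-8 byte sequence. Only lead/continuation byte patterns are checked.
static bool is_complete_utf8(const std::string & s) {
    const size_t n = s.size();
    size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        size_t len;
        if (c < 0x80) {
            len = 1;                 // 0xxxxxxx: ASCII
        } else if ((c & 0xE0) == 0xC0) {
            len = 2;                 // 110xxxxx: 2-byte sequence
        } else if ((c & 0xF0) == 0xE0) {
            len = 3;                 // 1110xxxx: 3-byte sequence
        } else if ((c & 0xF8) == 0xF0) {
            len = 4;                 // 11110xxx: 4-byte sequence
        } else {
            return false;            // stray continuation byte or invalid lead byte
        }
        if (i + len > n) {
            return false;            // sequence is cut off at the end of the buffer
        }
        for (size_t k = 1; k < len; ++k) {
            // every following byte must look like 10xxxxxx
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) {
                return false;
            }
        }
        i += len;
    }
    return true;
}

// Hypothetical helper: collects decoded token pieces and releases text only
// once the buffered bytes form complete UTF-8.
struct Utf8Accumulator {
    std::string pending;

    // Appends `piece`; returns the text that is safe to emit (possibly empty).
    std::string push(const std::string & piece) {
        pending += piece;
        if (!is_complete_utf8(pending)) {
            return std::string();    // wait for the next token
        }
        std::string out;
        out.swap(pending);           // hand over the buffer and reset it
        return out;
    }
};
```

With this sketch, pushing the first two bytes of a three-byte character returns an empty string, and pushing the final byte returns the whole character. Note that a genuinely invalid byte would keep the buffer pending indefinitely, so a real implementation would probably want a size cap or a fallback.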


* examples: fix utf8 decoding error

  some models have a tokenizer that decodes an id into an incomplete utf8 sequence, need to validate and wait for next token
  one example would be: https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_0.gguf and an example of the token is 18137

* android : minor

---------

Co-authored-by: zhangfuwen <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

zhangfuwen marked this pull request as ready for review March 8, 2024 09:55

android : minor

13d21fa

ggerganov approved these changes Mar 10, 2024


ggerganov merged commit 7ab7b73 into ggml-org:master Mar 10, 2024
