When offloading to an iGPU (UHD 770) in a Docker container from https://github.com/mudler/LocalAI (b2128), llama.cpp crashes with the following error:
```
The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)
Exception caught at file:/build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp, line:12708
```
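For anyone triaging this, here is a minimal standalone SYCL sketch (assuming a working oneAPI/SYCL toolchain; this is not part of LocalAI or llama.cpp) that queries the device's work-group limits, to confirm whether the iGPU really caps out at {512, 512, 512} as the error reports:

```cpp
// Diagnostic sketch: print the work-group limits of the default SYCL GPU.
// Assumes a oneAPI/SYCL 2020 toolchain (e.g. icpx -fsycl).
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::queue q{sycl::gpu_selector_v};
    auto dev = q.get_device();

    std::cout << "Device: "
              << dev.get_info<sycl::info::device::name>() << "\n";
    std::cout << "Max work-group size: "
              << dev.get_info<sycl::info::device::max_work_group_size>() << "\n";

    // Per-dimension limits, corresponding to the {x, y, z} triple in the error text.
    auto sizes = dev.get_info<sycl::info::device::max_work_item_sizes<3>>();
    std::cout << "Max work-item sizes: {"
              << sizes[0] << ", " << sizes[1] << ", " << sizes[2] << "}\n";
}
```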
From trial and error, it happens when the number of predicted tokens exceeds 256: if I cap the prediction at 256 tokens, the crash does not occur. A workaround sketch based on that cap follows below.
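Until this is fixed, capping generation at 256 tokens per request avoids the crash for me. Here is a minimal sketch of such a request using libcurl against LocalAI's OpenAI-compatible `/v1/completions` endpoint; the URL, port, and model name (`mistral-7b`) are placeholders for whatever your deployment exposes:

```cpp
// Workaround sketch: send a completion request with max_tokens capped at 256.
// Assumes libcurl is installed (link with -lcurl); endpoint and model name
// are placeholders, not values from the report.
#include <curl/curl.h>
#include <string>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // Cap the completion at 256 tokens to stay under the crash threshold.
    const std::string body = R"({
        "model": "mistral-7b",
        "prompt": "Hello",
        "max_tokens": 256
    })";

    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/v1/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());

    CURLcode res = curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```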
Tested with multiple 7B Mistral models, with both Q6 and Q8 quantization.