
SYCL backend error PI_ERROR_INVALID_WORK_GROUP_SIZE on iGPU UHD 770 #5467

@fakezeta

Description


When offloading to the iGPU (UHD 770) in a Docker container from https://github.com/mudler/LocalAI (b2128), llama.cpp crashes with the following error:

The number of work-items in each dimension of a work-group cannot exceed {512, 512, 512} for this device -54 (PI_ERROR_INVALID_WORK_GROUP_SIZE)
Exception caught at file:/build/backend/cpp/llama/llama.cpp/ggml-sycl.cpp, line:12708
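For reference, the limits the message refers to can be queried directly from the device. A minimal sketch, assuming Intel oneAPI DPC++ or another SYCL 2020 toolchain (the GPU selector here is illustrative and may pick a different device on other systems):

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::queue q{sycl::gpu_selector_v};   // first available GPU, e.g. the UHD 770 iGPU
    sycl::device dev = q.get_device();

    // Total number of work-items allowed in a single work-group.
    size_t max_wg = dev.get_info<sycl::info::device::max_work_group_size>();

    // Per-dimension limits -- the {512, 512, 512} in the error message.
    sycl::range<3> max_sizes =
        dev.get_info<sycl::info::device::max_work_item_sizes<3>>();

    std::cout << "device: " << dev.get_info<sycl::info::device::name>() << "\n"
              << "max_work_group_size: " << max_wg << "\n"
              << "max_work_item_sizes: {" << max_sizes[0] << ", "
              << max_sizes[1] << ", " << max_sizes[2] << "}\n";
    return 0;
}
```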

From trial and error, it only happens when the number of predicted tokens exceeds 256: if I limit the prediction to 256 tokens, the crash does not occur.
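This pattern is consistent with a kernel whose local range is derived from the token/batch count, so it exceeds the device limit once the prediction grows past a threshold. Purely as an illustration (a hypothetical helper, not the actual ggml-sycl.cpp code), clamping the requested local range to the device limits would avoid this kind of invalid launch:

```cpp
#include <sycl/sycl.hpp>
#include <algorithm>

// Hypothetical helper: clamp each dimension of a requested local range to the
// device's per-dimension work-item limit. Note the total product would also
// need to respect max_work_group_size for a fully valid launch.
sycl::range<3> clamp_local_range(const sycl::device& dev, sycl::range<3> req) {
    sycl::range<3> lim =
        dev.get_info<sycl::info::device::max_work_item_sizes<3>>();
    for (int i = 0; i < 3; ++i)
        req[i] = std::min(req[i], lim[i]);
    return req;
}
```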

Tested with multiple 7B Mistral models, with both Q6 and Q8 quantization.

Intel oneAPI version 2024.0
