Loading the Llama 2 70B model from TheBloke with rustformers/llm appears to succeed, but inference fails. llama.cpp raises an assertion regardless of the `use_gpu` option:
```
Loading of model complete
Model size = 27262.60 MB / num tensors = 723
[2023-07-29T14:24:19Z INFO  actix_server::builder] starting 10 workers
[2023-07-29T14:24:19Z INFO  actix_server::server] Actix runtime found; starting in Actix runtime
GGML_ASSERT: llama-cpp/ggml.c:6192: ggml_nelements(a) == ne0*ne1*ne2
```
This might be related to the model files, but the models from TheBloke are usually reliable.
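One possible cause worth checking (an assumption on my part, not confirmed): Llama 2 70B uses grouped-query attention (8 key/value heads versus 64 query heads), and ggml builds that predate GQA support can hit exactly this kind of element-count assert when reshaping the K/V tensors with the per-query-head dimensions. A rough sketch of the mismatch, using the published 70B hyperparameters:

```python
# Hypothetical illustration of the reshape mismatch; the dimensions below are
# Llama 2 70B's published hyperparameters, not values read from this model file.
n_embd = 8192                 # hidden size
n_head = 64                   # query heads
n_head_kv = 8                 # key/value heads (grouped-query attention)
head_dim = n_embd // n_head   # 128

# Elements actually present in a K/V projection weight under GQA:
kv_proj_elements = n_embd * (n_head_kv * head_dim)   # 8192 * 1024

# Elements a pre-GQA reshape would expect (one K/V head per query head):
expected_elements = n_embd * (n_head * head_dim)     # 8192 * 8192

print(kv_proj_elements, expected_elements)
# The two differ by the 8x grouping factor, so a reshape asserting
# ggml_nelements(a) == ne0*ne1*ne2 with the expected shape would fail.
assert kv_proj_elements != expected_elements
```

If that is the issue, it would explain why the failure is independent of `use_gpu` and why smaller Llama 2 models (which have no GQA) load fine with the same code path.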
Running on a MacBook Pro M1 Max with 32 GB RAM, macOS 14.0.0 (23A5301g).