During my work on DeepSeek-V2 I noticed a failing assert:
https://github.com/ggerganov/llama.cpp/blob/9afdffe70ebf3166d429b4434783bb0b7f97bdeb/llama.cpp#L4923
Since n_embd_gqa is set to n_embd_v_gqa, this assert only holds for models where n_embd_k_gqa == n_embd_v_gqa, that is, when n_embd_head_k == n_embd_head_v. It fails whenever n_embd_head_k != n_embd_head_v. Is this intentional behavior?
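
To make the mismatch concrete, here is a minimal standalone sketch of the condition as I understand it. It is not the actual llama.cpp code: `hparams_sketch` and `gqa_assert_holds` are hypothetical names, and it assumes that n_embd_k_gqa and n_embd_v_gqa are the per-head K/V sizes multiplied by n_head_kv. The head sizes used in the note after the block are illustrative only.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the relevant hyperparameters.
// Assumption: the GQA widths are the per-head sizes times n_head_kv.
struct hparams_sketch {
    uint32_t n_head_kv;
    uint32_t n_embd_head_k;
    uint32_t n_embd_head_v;

    uint32_t n_embd_k_gqa() const { return n_embd_head_k * n_head_kv; }
    uint32_t n_embd_v_gqa() const { return n_embd_head_v * n_head_kv; }
};

// Mirrors the situation described above: n_embd_gqa is taken from the V side
// and then compared against the K side, so the check can only pass when the
// K and V head sizes are equal.
inline bool gqa_assert_holds(const hparams_sketch & hp) {
    const uint32_t n_embd_gqa = hp.n_embd_v_gqa();
    return n_embd_gqa == hp.n_embd_k_gqa();
}
```

With equal head sizes, for example `hparams_sketch{8, 128, 128}`, the check passes. With a K head larger than the V head, for example `hparams_sketch{8, 192, 128}`, the two widths differ and the check fails.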