ggml-cuda: fixed assertion in ggml_cuda_cpy (#18140, #18341) #18433
CISC merged 4 commits into ggml-org:master
Hi @am17an / @JohannesGaessler,
This is not correct: the pointers inside the kernel are calculated as signed 32-bit integers, which is why these asserts exist in the first place.
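To illustrate the point about signed 32-bit pointer arithmetic, here is a minimal host-side sketch. The helper names (`byte_offset`, `offset_fits_int32`, `checked_offset`) are hypothetical and not taken from ggml; the sketch only shows why an offset has to be checked against the `int32_t` range before it is handed to code that indexes with 32-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: byte offset of element `index`, computed in 64 bits
// so that the multiplication itself cannot overflow for realistic sizes.
inline int64_t byte_offset(int64_t index, int64_t elem_size) {
    return index * elem_size;
}

// True when the offset is representable as a signed 32-bit integer.
inline bool offset_fits_int32(int64_t index, int64_t elem_size) {
    return byte_offset(index, elem_size) <= std::numeric_limits<int32_t>::max();
}

// Narrow to 32 bits only after the range check has passed
// (this mirrors the role of the asserts discussed above).
inline int32_t checked_offset(int64_t index, int64_t elem_size) {
    assert(offset_fits_int32(index, elem_size));
    return static_cast<int32_t>(byte_offset(index, elem_size));
}
```

For example, element 600,000,000 of a 4-byte type sits at byte 2,400,000,000, which is above `INT_MAX` (2,147,483,647), so a kernel doing this arithmetic in 32 bits could not address it.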
Ok, so the data types of the arguments of other functions that accept We are passing,
It's fine to add an assert that checks that the number of CUDA blocks is still representable as an unsigned 32-bit integer. As it is, a copy of that size would require significantly more memory than is available even on the biggest GPUs currently on the market.
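A rough sketch of such a block-count check is shown below. It is not the code from this PR: the block size, the helper names, and the exact bounds are assumptions for illustration. The bounds used are the documented CUDA grid limits (2^31 - 1 for the x dimension, 65535 for y and z), which are stricter than "fits in an unsigned 32-bit integer".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical values for illustration; not taken from ggml.
constexpr int64_t kBlockSize = 64;
// CUDA grid limits: x may go up to 2^31 - 1, y and z up to 65535.
constexpr int64_t kMaxGridX  = std::numeric_limits<int32_t>::max();
constexpr int64_t kMaxGridYZ = 65535;

// Ceiling division done in 64 bits, so large element counts do not wrap.
inline int64_t num_blocks_for(int64_t ne) {
    return (ne + kBlockSize - 1) / kBlockSize;
}

// True when every grid dimension is within the launchable range.
inline bool grid_is_launchable(int64_t gx, int64_t gy, int64_t gz) {
    return gx >= 1 && gx <= kMaxGridX &&
           gy >= 1 && gy <= kMaxGridYZ &&
           gz >= 1 && gz <= kMaxGridYZ;
}

// Assert-style wrapper, analogous to a check placed before a kernel launch.
inline void assert_grid_is_launchable(int64_t gx, int64_t gy, int64_t gz) {
    assert(grid_is_launchable(gx, gy, gz));
}
```

Computing the counts as `int64_t` and narrowing only after the check keeps the validation itself free of overflow.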
Hi Johannes, I have added the asserts.
Hi @JohannesGaessler / @am17an, can you confirm whether these changes are enough?
Hi @CISC / @JohannesGaessler,
Nah, it failed because some of the RISCV64 runners don't have git lfs initialized (vocabs test fails, but you can only see that by downloading the whole log). @alitariq4589 Please look into this. |
…gml-org#18433)
* ggml-cuda: fixed assertion in ggml_cuda_cpy (ggml-org#18140)
* ggml-cuda: changes in data types to int64_t
* ggml-cuda: added asserts for CUDA block numbers
* ggml-cuda: changed the condition for y and z dimension
Taking a look.
The return type of the `ggml_nbytes` function is `size_t`, and in the assertion we were comparing it with `INT_MAX`; for this reason the assertion was failing. Replacing `INT_MAX` with `SIZE_MAX` will make the comparison correct.

I have tested this change on an NVIDIA L4 GPU with over 131k tokens (as suggested in issue #18140) and it was working.

This PR solves these issues: #18140, #18341
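The two comparisons from the description can be sketched side by side as follows. The helper names are hypothetical and the byte counts are made up for illustration; nothing here calls into ggml. Note that a `size_t` value can never exceed `SIZE_MAX`, so the second form always holds and no longer bounds the size.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the original style of check, which rejects any
// byte count that does not fit in a signed int.
inline bool fits_int_max(size_t nbytes) {
    return nbytes <= static_cast<size_t>(INT_MAX);
}

// Hypothetical helper: the check with SIZE_MAX. Every size_t value
// satisfies it, so it passes for any input.
inline bool fits_size_max(size_t nbytes) {
    return nbytes <= SIZE_MAX;
}
```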