ggml-cuda: fixed assertion in ggml_cuda_cpy (#18140, #18341) #18433

Meet91721 · 2025-12-28T18:18:55Z

The return type of the ggml_nbytes function is size_t, but the assertion compares it with INT_MAX; for this reason, the assertion fails for large tensors.

Replacing INT_MAX with SIZE_MAX will make the comparison correct.
I have tested this change on an NVIDIA L4 GPU with over 131k tokens (as suggested in #18140), and it worked.
This PR fixes issues #18140 and #18341.
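A minimal host-side sketch of the two checks being discussed. The helper names and the roughly 3 GiB tensor size are hypothetical, chosen only for illustration. It shows that the original check rejects a tensor larger than INT_MAX bytes, and that, because ggml_nbytes returns size_t, a comparison against SIZE_MAX can never fail, so that replacement would effectively disable the check rather than correct it.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the original check: ggml_nbytes(src0) <= INT_MAX.
bool passes_int_max_check(std::size_t nbytes) {
    return nbytes <= static_cast<std::size_t>(INT_MAX);
}

// Hypothetical stand-in for the proposed check: ggml_nbytes(src0) <= SIZE_MAX.
// Every size_t value satisfies this, so the check can never fire.
bool passes_size_max_check(std::size_t nbytes) {
    return nbytes <= SIZE_MAX;
}

// About 3 GiB, a made-up size for a large tensor (illustration only).
constexpr std::size_t kExampleNbytes = std::size_t(3) * 1024 * 1024 * 1024;

// Sanity check of the example value itself.
static_assert(kExampleNbytes > static_cast<std::size_t>(INT_MAX), "example exceeds INT_MAX");
```

Here `passes_int_max_check(kExampleNbytes)` is false while `passes_size_max_check(kExampleNbytes)` is true, which is why the SIZE_MAX version makes the crash disappear without addressing what the check protects.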

Meet91721 · 2025-12-28T18:41:21Z

Hi @am17an / @JohannesGaessler,
This is my first PR to llama.cpp; could you please review this change?

JohannesGaessler · 2025-12-28T19:14:44Z

This is not correct, because the pointers inside the kernel are being calculated as signed 32-bit integers; that is why these asserts exist in the first place.
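To illustrate the point, the host-side sketch below (not the actual kernel code) computes a byte offset `i * nb` once with 64-bit and once with 32-bit arithmetic. The function names and example values are hypothetical. The 32-bit version multiplies in uint32_t (well-defined wrap-around) and converts the result to int32_t, which mimics what an overflowing 32-bit signed offset typically ends up as; an actual signed overflow would be undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>

// Offset computed in 64-bit arithmetic: correct for any realistic tensor size.
int64_t offset_i64(int64_t i, int64_t nb) {
    return i * nb;
}

// Offset computed as if with 32-bit integers. The product wraps modulo 2^32 in
// uint32_t and is then converted to int32_t (modular conversion, guaranteed since
// C++20 and the usual behavior of earlier compilers), so large offsets turn negative.
int32_t offset_i32(int32_t i, int32_t nb) {
    const uint32_t wrapped = static_cast<uint32_t>(i) * static_cast<uint32_t>(nb);
    return static_cast<int32_t>(wrapped);
}
```

For `i = 600000000` and `nb = 4`, the 64-bit offset is 2400000000 bytes, which exceeds INT_MAX, while the 32-bit offset comes out negative and would point outside the buffer.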

Meet91721 · 2025-12-29T08:24:07Z

OK, so the data types of the arguments of the other functions that accept ne and nb should be converted to int64_t, correct?
I also have another question: functions like ggml_cpy_scalar_contiguous_cuda contain the following:

    const int64_t num_blocks = (ne + CUDA_CPY_BLOCK_SIZE - 1) / CUDA_CPY_BLOCK_SIZE;
    cpy_scalar_contiguous<src_t, dst_t><<<num_blocks, CUDA_CPY_BLOCK_SIZE, 0, stream>>>
        (cx, cdst, ne);

We are passing num_blocks, which is of type int64_t, but the components of dim3 are unsigned int, so an implicit type conversion happens. Is that safe even when ne is much larger?
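A host-side sketch of the conversion in question. `Dim3Like` is a hypothetical stand-in for CUDA's dim3 (whose x, y, z members are unsigned int) so the example compiles without the CUDA toolkit; it assumes a 32-bit unsigned int. The int64_t block count is converted implicitly and silently reduced modulo 2^32, so no error is raised when the value does not fit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, whose x, y, z members are unsigned int.
struct Dim3Like {
    unsigned int x, y, z;
    Dim3Like(unsigned int x_ = 1, unsigned int y_ = 1, unsigned int z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Same ceiling division as in the quoted snippet.
int64_t num_blocks_for(int64_t ne, int64_t block_size) {
    return (ne + block_size - 1) / block_size;
}

// The int64_t argument is implicitly converted to unsigned int: no error is raised,
// only the value modulo 2^32 survives.
Dim3Like make_grid(int64_t num_blocks) {
    return Dim3Like(num_blocks);
}
```

For example, `make_grid((int64_t(1) << 32) + 5).x` is 5, so a launch with that grid would quietly cover only a tiny fraction of the tensor.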

JohannesGaessler · 2025-12-29T11:12:53Z

It's fine to add an assert that checks that the number of CUDA blocks is still representable as an unsigned 32-bit integer. As it is, a copy of that size would require significantly more memory than is available even on the biggest GPUs currently on the market.
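A minimal sketch of such a guard, together with the arithmetic behind the memory argument. The block size of 64 is an assumed value for CUDA_CPY_BLOCK_SIZE, and `checked_num_blocks` is a hypothetical helper rather than the merged code; ggml itself would use GGML_ASSERT instead of the standard assert.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

constexpr int64_t kCpyBlockSize = 64;  // assumed value of CUDA_CPY_BLOCK_SIZE

// Hypothetical helper: compute the block count and check that it is still
// representable as an unsigned 32-bit integer before it goes into dim3.
unsigned int checked_num_blocks(int64_t ne) {
    const int64_t num_blocks = (ne + kCpyBlockSize - 1) / kCpyBlockSize;
    assert(num_blocks <= static_cast<int64_t>(UINT_MAX));  // GGML_ASSERT in ggml itself
    return static_cast<unsigned int>(num_blocks);
}

// Rough size at which the check would start to fail: 2^32 blocks of 64 elements.
constexpr int64_t kLimitElements = (static_cast<int64_t>(UINT_MAX) + 1) * kCpyBlockSize;
// For F32 data (4 bytes per element) that is 2^40 bytes = 1 TiB for the source alone.
constexpr int64_t kLimitBytesF32 = kLimitElements * 4;
static_assert(kLimitBytesF32 == (int64_t(1) << 40), "2^32 * 64 * 4 bytes = 1 TiB");
```

Under these assumptions the check can only trip for a copy on the order of 2^38 elements, far beyond the memory of any single GPU. Note that the CUDA C++ Programming Guide documents a maximum grid size of 2^31 - 1 blocks in x and 65535 in y and z, so a check of this kind guards the integer conversion rather than the launch limits themselves.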

Meet91721 · 2025-12-29T12:47:49Z

Hi Johannes, I have added asserts.

Meet91721 · 2025-12-31T12:43:29Z

Hi @JohannesGaessler / @am17an, can you confirm whether these changes are sufficient?
I have tested the latest commit, and it is working fine.

Meet91721 · 2026-01-01T22:46:45Z

Hi @CISC / @JohannesGaessler,
I think ubuntu-cpu-cmake-riscv64-native failed because Qwen3 Coder did not generate a proper JSON response for tool calling (line 85975), not because of the changes made in this PR. If that is true, can we merge this PR?

CISC · 2026-01-01T23:17:41Z

Nah, it failed because some of the RISCV64 runners don't have git lfs initialized (the vocabs test fails, but you can only see that by downloading the whole log). @alitariq4589 please look into this.

…gml-org#18433)

* ggml-cuda: fixed assertion in ggml_cuda_cpy (ggml-org#18140)
* ggml-cuda: changes in data types to int64_t
* ggml-cuda: added asserts for CUDA block numbers
* ggml-cuda: changed the condition for y and z dimension

alitariq4589 · 2026-01-02T10:49:31Z

Taking a look.

ggml-cuda: fixed assertion in ggml_cuda_cpy (ggml-org#18140)

a581354

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Dec 28, 2025

loci-dev mentioned this pull request Dec 28, 2025

UPSTREAM PR #18433: ggml-cuda: fixed assertion in ggml_cuda_cpy (#18140, #18341) auroralabs-loci/llama.cpp#728

Open

ggml-cuda: changes in data types to int64_t

acf30c3

ggml-cuda: added asserts for CUDA block numbers

b7710d1

CISC reviewed Dec 31, 2025

Comment thread on ggml/src/ggml-cuda/cpy.cu (outdated)

ggml-cuda: changed the condition for y and z dimension

ac5597a

Meet91721 requested a review from CISC December 31, 2025 17:03

JohannesGaessler approved these changes Dec 31, 2025

CISC merged commit e86f3c2 into ggml-org:master Jan 1, 2026
70 of 71 checks passed

alitariq4589 mentioned this pull request Jan 4, 2026

Initialized git lfs in every test for RISC-V CI #18590

Merged

This was referenced Jan 9, 2026

Synchronize llama/#18433 (cuda : fix copy of large tensors (ggml_nbytes <= INT_MAX assertion)) ggml-org/ggml#1413

Closed

feat: add wan2.1/2.2 support leejet/stable-diffusion.cpp#778

Merged

ivaigult mentioned this pull request Apr 18, 2026

Crash (SIGABRT) with Ministral-3 14B + Parallel 8: GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed in cpy.cu ollama/ollama#13887

Open

CISC merged 4 commits into ggml-org:master from Meet91721:fix/assert
4 participants