Fix more int overflow during quant (PPL/CUDA). #6563
Conversation
It would be good to have a set of tests in
Given blockIdx.x, blockDim.x, and threadIdx.x are all basically uint32_t, we could keep some of those as uint32_t and only cast them to uint64_t or int64_t when actually necessary.
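For illustration, a minimal sketch of that approach (hypothetical kernel and names, not code from this PR):

```cpp
#include <cstdint>

// blockIdx.x, blockDim.x and threadIdx.x stay 32-bit; the global index is
// widened to 64 bits only where the multiplication could overflow.
__global__ void copy_f32(const float * src, float * dst, const int64_t n) {
    const int64_t i = (int64_t) blockIdx.x*blockDim.x + threadIdx.x;
    if (i >= n) {
        return;
    }
    dst[i] = src[i];
}
```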
Edit: Ignore below. I was using the wrong environment. Retesting.
There are two disadvantages with 64 bit integers over 32 bit integers: they need 2 registers and they are slower. But for dequantize kernels I would intuitively assume that this is not going to matter because you need very few registers and you're going to be heavily IO bound anyways. So for simplicity I would say to just use 64 bits throughout unless someone can demonstrate that this actually makes a performance difference (I'm not seeing any performance difference on my RTX 3090, my other GPUs are currently busy).
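For comparison, a sketch of the "64 bits throughout" variant on a dequantize-style kernel (hypothetical block layout and names, not ggml's actual definitions):

```cpp
#include <cstdint>

// Hypothetical quant block: one scale plus 32 8-bit values.
struct block_q8 {
    float  d;
    int8_t qs[32];
};

// All index arithmetic is done in int64_t. The extra register per index is
// unlikely to matter since the kernel is memory-bandwidth bound anyway.
__global__ void dequantize_q8(const block_q8 * x, float * y, const int64_t k) {
    const int64_t i = (int64_t) blockDim.x*blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    const int64_t ib = i/32; // block index
    const int64_t iq = i%32; // position within the block
    y[i] = x[ib].d * x[ib].qs[iq];
}
```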
Just saw @JohannesGaessler's comment (after I pushed the revert). I can revert the revert if that is decided to be the right approach.
I personally would in this case prefer to just consistently use 64 bit ints, but ultimately I would say either way is fine. The biggest issue would have been the additional effort from actually changing the code but this has already been done anyways.
I completely forgot about this PR. @slaren even without the tests, do you think we should just merge it, given that it seems to fix the issue for at least one backend?
Yes absolutely, we should merge this now if it solves the immediate problem. The changes look good to me. |
ggml-cuda.cu (Outdated)
Wait, why is this being changed? I thought the problem was that certain ints had too few bits for large models.
Did you maybe, in response to one of my earlier comments, accidentally change more places than just the ones originally touched in this PR?
@JohannesGaessler This one was reverted following an earlier comment questioning why it was changed in the first place. As previously mentioned, I have limited knowledge about these vars and rely on others' expertise for the review. Because of the large number of ints that were overflowing, I had to guess and change them in batches until all the crashes were fixed, so I most likely changed more than needed.
It's fine to change more int to int64_t than necessary. But this is a change where a value that was int64_t on master becomes int with your PR. I think this was done by accident when you reverted some of your other changes.
That is because my previous PR was merged into master, this is a subsequent PR. I can revert them back if needed.
The revert is in a single commit dranger003@9acb43d so if these are all fine I can delete that one commit.
Just delete the commit I'd say. Using int64_t has no disadvantages other than maybe slightly worse performance and I was not able to measure any performance difference whatsoever.
I pushed a rebase to remove the revert commit.
It seems this particular change is still there. Revert it and I'll merge.
cublasGemmEx takes an int anyway, so this doesn't really matter. There is a 64-bit interface to cublas, but I don't think there are any cases where a single dimension is larger than 2^31-1.
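To illustrate (call site and names are assumptions, not the PR's code): dimensions can stay int64_t on the ggml side and only be narrowed at the cuBLAS boundary, which is safe as long as no single dimension exceeds 2^31-1.

```cpp
#include <cublas_v2.h>
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around cublasGemmEx: the 32-bit interface takes int
// dimensions, so 64-bit sizes are checked and narrowed at the call site.
static cublasStatus_t gemm_f32(cublasHandle_t handle,
                               int64_t m, int64_t n, int64_t k,
                               const float * alpha, const float * A, int64_t lda,
                               const float * B, int64_t ldb,
                               const float * beta, float * C, int64_t ldc) {
    assert(m <= INT32_MAX && n <= INT32_MAX && k <= INT32_MAX);
    assert(lda <= INT32_MAX && ldb <= INT32_MAX && ldc <= INT32_MAX);
    return cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                        (int) m, (int) n, (int) k,
                        alpha, A, CUDA_R_32F, (int) lda,
                        B, CUDA_R_32F, (int) ldb,
                        beta, C, CUDA_R_32F, (int) ldc,
                        CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
}
```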
ggml-cuda.cu (Outdated)
Same question, why the int64_t -> int change?
ggml-cuda/convert.cu (Outdated)
Same question.
ggml-cuda/convert.cu (Outdated)
Same question.
ggml-cuda/convert.cu (Outdated)
Same question.
ggml-cuda/convert.cu (Outdated)
Same question.
These were all originally int and I reverted them to avoid changing more than needed.
No they were not. Go to the "files changed" tab and look at the combined changes of all of your commits relative to master.
I changed them in PR #6491.
Closes #6948.
Running perplexity on Command-R+ using CUDA is currently broken without this commit (more info here: #6491 (comment)).
Although perplexity now works with all tested quants, I may have moved more vars to int64_t than needed.