-
-
Notifications
You must be signed in to change notification settings - Fork 14.3k
Add checks for gpu-kernel calling conv #149991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
rustbot has assigned @WaffleLapkin. Use |
b6cb0c2 to
0a9373c
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
0a9373c to
1e9b1dc
Compare
This comment has been minimized.
This comment has been minimized.
1e9b1dc to
10b32d6
Compare
This comment has been minimized.
This comment has been minimized.
10b32d6 to
88ad16c
Compare
|
r? workingjubilee As I'm completely missing context |
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AST-level code looks good.
Some details on messaging here. I'm not committed to a precise message on these, which is why it's a bit "multiple choice" here, just wondering if these could be improved. In one or two cases it is a must-change.
Should we be enforcing a maximum number of arguments, also? Probably not if there's no cross-driver consensus on that, but maybe?
| /// This lint is issued when it detects a probable mistake in a signature. | ||
| IMPROPER_GPU_KERNEL_ARG, | ||
| Warn, | ||
| "simple arguments of gpu-kernel functions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should capture the reason for the lint, not what it is checking for, so something like
| "simple arguments of gpu-kernel functions" | |
| "GPU kernel entry points have a limited calling convention" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I’ll call it ABI instead of calling convention, it seems like Rust calls it ABI in most places. (Unless there’s a difference I’m missing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically ABI is a superset of calling convention, notionally.
I'm cool with people abusing the term a bit when it's clear what it means from context and is more concise, such as in diagnostic messaging here.
|
Reminder, once the PR becomes ready for a review, use |
88ad16c to
01f1e1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Some context on the internal workings:
- On the CPU side, a program passes arguments to a kernel
- The “API” takes these arguments and writes them into GPU memory
- The kernel on the GPU gets a pointer to this memory
- When the kernel accesses arguments, it reads from this memory
(I think nvidia and amd work the same here. I’m not too familiar with nvidia, but this seems to suggest so: https://github.com/Rust-GPU/rust-cuda/blob/44c44baf6fb738d5ffec25aac5db8af02514e890/crates/rustc_codegen_nvvm/src/abi.rs#L60)
So, number of arguments or size of arguments doesn’t really matter, it’s all memory anyways.
And, we could make struct arguments work (maybe, I didn’t look into the details), but Rust would need to take them by value, currently it changes them to pass by pointer.
| /// This lint is issued when it detects a probable mistake in a signature. | ||
| IMPROPER_GPU_KERNEL_ARG, | ||
| Warn, | ||
| "simple arguments of gpu-kernel functions" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I’ll call it ABI instead of calling convention, it seems like Rust calls it ABI in most places. (Unless there’s a difference I’m missing)
This comment has been minimized.
This comment has been minimized.
6e7c9a0 to
6868f66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some context on the internal workings:
- On the CPU side, a program passes arguments to a kernel
- The “API” takes these arguments and writes them into GPU memory
- The kernel on the GPU gets a pointer to this memory
- When the kernel accesses arguments, it reads from this memory
Ah, yeah. I know of the general idea here, though I am foggy on specifics in many cases.
And even in some hypothetical where the driver isn't writing them into the GPU memory, it would have to invoke some dark magic on the GPU to put it directly into registers anyways, which is just saying "did you know: some computer hardware has memory-mapped registers?"
( I only am making that comment because I vaguely remember one GPU driver involving something similar. )
At some point we are left with doing codegen to accept arguments and we can expect that to have some target-specific nuances, even if it's just stack alignment.
For structs, I'd rather we avoid thinking too hard about the struct question for cases where they aren't just repr(transparent) primitives until we have tinkered with things a bit more. If repr(C) structs are widely supported and fully intentionally that could be worth tackling.
| | ty::Slice(_) | ||
| | ty::Str | ||
| | ty::Tuple(_) | ||
| | ty::UnsafeBinder(_) => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the answer for ty::UnsafeBinder should be the recursive answer to this question by default (I think the implied result is quite useless given the answer to other things, but that's fine).
Thus this should... probably use an impl of TypeFolder to apply the recursive traversal, I think? I do not consider that required, but I thought I should note it because this seems like a sorta-classic ..super_fold.. case, where we want to only consider the "true" type instead of binders and such: https://rustc-dev-guide.rust-lang.org/ty-fold.html
| | ty::CoroutineClosure(_, _) | ||
| | ty::CoroutineWitness(..) | ||
| | ty::Dynamic(_, _) | ||
| | ty::Error(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either return or continue if the type is so erroneous.
| | ty::Infer(_) | ||
| | ty::Never | ||
| | ty::Param(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normalizing should have handled Infer and Param cases I think?
The `gpu-kernel` calling convention has several restrictions that were not enforced by the compiler until now. Add the following restrictions: 1. Cannot be async 2. Cannot be called 3. Cannot return values, return type must be `()` or `!` 4. Arguments should be primitives, i.e. passed by value. More complicated types can work when you know what you are doing, but it is rather unintuitive, one needs to know ABI/compiler internals. 5. Export name should be unmangled, either through `no_mangle` or `export_name`. Kernels are searched by name on the CPU side, having a mangled name makes it hard to find and probably almost always unintentional.
|
I tried rewriting the lint pass using |
The
gpu-kernelcalling convention has several restrictions that were not enforced by the compiler until now.Add the following restrictions:
()or!no_mangleorexport_name. Kernels are searched by name on the CPU side, having a mangled name makes it hard to find and probably almost always unintentional.Tracking issue: #135467
amdgpu target tracking issue: #135024
@workingjubilee, these should be all the restrictions we talked about a year ago.
cc @RDambrosio016 @kjetilkjeka for nvptx