Conversation

@Sa4dUs
Contributor

@Sa4dUs Sa4dUs commented Dec 27, 2025

Allows modifying the workgroup and thread grid dimensions directly from the intrinsic call.

core::intrinsics::offload(_kernel_1, [256, 1, 1], [32, 1, 1], (x,))
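For a sense of scale, here is a minimal sketch of the arithmetic behind such a launch configuration. It assumes the first array is the number of workgroups per axis and the second the number of threads per workgroup; the PR description does not spell this out. `total_threads` is a hypothetical helper for illustration, not part of the intrinsic or of this PR.

```rust
/// Hypothetical helper (not part of the PR): total number of threads implied
/// by a launch configuration, ASSUMING `workgroup_dims` counts workgroups per
/// axis and `thread_dims` counts threads per workgroup.
/// Returns `None` if the product does not fit in a `u64`.
fn total_threads(workgroup_dims: [u32; 3], thread_dims: [u32; 3]) -> Option<u64> {
    workgroup_dims
        .iter()
        .chain(thread_dims.iter())
        .try_fold(1u64, |acc, &d| acc.checked_mul(u64::from(d)))
}

fn main() {
    // The dimensions from the example call above.
    let n = total_threads([256, 1, 1], [32, 1, 1]);
    println!("{n:?}");
}
```

Under that assumption, the example call would launch 256 * 32 = 8192 threads along the first axis.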

r? @ZuseZ4

@rustbot
Collaborator

rustbot commented Dec 27, 2025

The rustc-dev-guide subtree was changed. If this PR only touches the dev guide, consider submitting a PR directly to rust-lang/rustc-dev-guide; otherwise, thank you for updating the dev guide with your changes.

cc @BoxyUwU, @jieyouxu, @Kobzol, @tshepang

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-rustc-dev-guide Area: rustc-dev-guide labels Dec 27, 2025
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 27, 2025
@ZuseZ4 ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Dec 27, 2025
@ZuseZ4
Member

ZuseZ4 commented Dec 27, 2025

@kevinsala it would be nice to have an example in our docs where passing the same dimensions as runtime values (e.g. via the command line) vs. passing them at compile time (like in the example here) makes a measurable perf difference. I assume it's hard to come up with an artificial example exactly for that, or do you know how to get one?

@kevinsala

@ZuseZ4 I'll try to find an example.

@Sa4dUs Sa4dUs force-pushed the offload-intrinsic2 branch from efcf026 to 330d170 Compare December 28, 2025 10:05
@ZuseZ4
Member

ZuseZ4 commented Dec 30, 2025

As per discussion with the offload devs, this should be u32, not i32; otherwise lgtm for now.

Once we have a proper macro frontend for our intrinsic we can also consider changing it to [Option<NonZeroU32>; 3], but until then, the interface would be way too cumbersome to call manually and [u32; 3] is the best approximation. Certainly better than having the hardcoded values from one benchmark.
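To illustrate why the `Option<NonZeroU32>` form is cumbersome to write by hand, here is a rough sketch comparing the two shapes. The aliases `LaunchDims` and `CheckedDims` and the helper `to_checked` are hypothetical names introduced for illustration. What a `None` entry would mean to the intrinsic is not stated in this thread; the sketch only shows that a zero maps to `None`.

```rust
use std::num::NonZeroU32;

// Hypothetical aliases for the two candidate argument types.
type LaunchDims = [u32; 3];
type CheckedDims = [Option<NonZeroU32>; 3];

/// Hypothetical conversion: each zero entry becomes `None`,
/// every other entry becomes `Some(non_zero_value)`.
fn to_checked(dims: LaunchDims) -> CheckedDims {
    dims.map(NonZeroU32::new)
}

fn main() {
    // Written by hand, the stricter type needs one constructor call per axis...
    let manual: CheckedDims = [NonZeroU32::new(256), NonZeroU32::new(1), NonZeroU32::new(1)];
    // ...whereas the plain form is just a literal.
    let plain: LaunchDims = [256, 1, 1];
    assert_eq!(manual, to_checked(plain));

    // `Option<NonZeroU32>` is guaranteed to have the same size as `u32`,
    // so the stricter type costs nothing in layout.
    assert_eq!(
        std::mem::size_of::<CheckedDims>(),
        std::mem::size_of::<LaunchDims>()
    );
}
```

A macro frontend could perform this kind of conversion on the caller's behalf, which is presumably why the stricter type is deferred until one exists.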

@Sa4dUs Sa4dUs force-pushed the offload-intrinsic2 branch from 330d170 to 6e0d0da Compare December 30, 2025 21:24
@Sa4dUs Sa4dUs force-pushed the offload-intrinsic2 branch from 6e0d0da to 33d39a9 Compare December 31, 2025 11:35
@ZuseZ4 ZuseZ4 mentioned this pull request Dec 31, 2025
5 tasks
@Sa4dUs Sa4dUs force-pushed the offload-intrinsic2 branch from 33d39a9 to 58e2610 Compare January 2, 2026 10:50
@rustbot
Collaborator

rustbot commented Jan 2, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@ZuseZ4
Member

ZuseZ4 commented Jan 2, 2026

Thanks @oli-obk!

As per LLVM side discussion, having the kernel launch dimensions at compile time isn't always beneficial. There was a recent case where LLVM actually generated worse code when it was given those dimensions at compile time rather than runtime. Compiler heuristics are fun. So we'll ignore that topic here, and I'll just post a note to our design overview in the dev guide.
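For readers unfamiliar with the distinction, the following sketch shows the two ways the dimensions could reach a launch site: as a constant visible to the compiler, or as a value parsed at run time. It does not call the offload intrinsic, which needs a nightly toolchain with `gpu_offload` enabled. `COMPILE_TIME_DIMS`, `parse_dims`, and the `256,1,1` argument format are assumptions made up for this example.

```rust
/// Dimensions known at compile time: the optimizer can see these values.
const COMPILE_TIME_DIMS: [u32; 3] = [256, 1, 1];

/// Hypothetical parser for a command-line argument such as "256,1,1".
/// Returns `None` unless the input is exactly three comma-separated `u32` values.
fn parse_dims(arg: &str) -> Option<[u32; 3]> {
    let mut parts = arg.split(',').map(|p| p.trim().parse::<u32>().ok());
    // First `?` handles a missing field, second `?` handles a parse failure.
    let dims = [parts.next()??, parts.next()??, parts.next()??];
    if parts.next().is_some() {
        return None; // more than three fields
    }
    Some(dims)
}

fn main() {
    // Dimensions known only at run time: opaque to the optimizer.
    let arg = std::env::args().nth(1);
    let runtime_dims = arg.as_deref().and_then(parse_dims);

    // Fall back to the constant when no valid argument was given.
    let dims = runtime_dims.unwrap_or(COMPILE_TIME_DIMS);
    println!("launching with workgroup dims {dims:?}");
}
```

Either array could then be passed as a launch-dimension argument. As the comment above notes, which variant yields better generated code depends on compiler heuristics.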

@bors r+

@bors
Collaborator

bors commented Jan 2, 2026

📌 Commit 58e2610 has been approved by ZuseZ4

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 2, 2026
bors added a commit that referenced this pull request Jan 2, 2026
…uwer

Rollup of 6 pull requests

Successful merges:

 - #150425 (mapping an error from cmd.spawn() in npm::install)
 - #150444 (Expose kernel launch options as offload intrinsic args)
 - #150495 (Correct hexagon "unwinder_private_data_size")
 - #150578 (Fix a typo in the docs of AsMut for #149609)
 - #150581 (mir_build: Separate match lowering for string-equality and scalar-equality)
 - #150594 (Fix typo in the docs of `CString::from_vec_with_nul`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 58f5089 into rust-lang:main Jan 2, 2026
11 checks passed
@rustbot rustbot added this to the 1.94.0 milestone Jan 2, 2026
rust-timer added a commit that referenced this pull request Jan 2, 2026
Rollup merge of #150444 - Sa4dUs:offload-intrinsic2, r=ZuseZ4

Expose kernel launch options as offload intrinsic args

Allows modifying the workgroup and thread grid dimensions directly from the intrinsic call.

```rust
core::intrinsics::offload(_kernel_1, [256, 1, 1], [32, 1, 1], (x,))
```

r? `@ZuseZ4`
github-actions bot pushed a commit to rust-lang/rustc-dev-guide that referenced this pull request Jan 2, 2026
…uwer

Rollup of 6 pull requests

Successful merges:

 - rust-lang/rust#150425 (mapping an error from cmd.spawn() in npm::install)
 - rust-lang/rust#150444 (Expose kernel launch options as offload intrinsic args)
 - rust-lang/rust#150495 (Correct hexagon "unwinder_private_data_size")
 - rust-lang/rust#150578 (Fix a typo in the docs of AsMut for rust-lang/rust#149609)
 - rust-lang/rust#150581 (mir_build: Separate match lowering for string-equality and scalar-equality)
 - rust-lang/rust#150594 (Fix typo in the docs of `CString::from_vec_with_nul`)

r? `@ghost`
`@rustbot` modify labels: rollup

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-rustc-dev-guide Area: rustc-dev-guide F-gpu_offload `#![feature(gpu_offload)]` S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

6 participants