I'm opening this as a tracking issue for the SIMD intrinsics in the `{std,core}::arch::wasm32` module. Eventually we're going to want to stabilize these intrinsics for the WebAssembly target, so I think it's good to have a canonical place to talk about them! I'm also going to update the `#![unstable]` annotations to point to this issue to direct users here if they want to use these intrinsics.
The WebAssembly SIMD proposal is currently in "phase 3". I would say that we probably don't want to consider stabilizing these intrinsics until the proposal has at least reached "phase 4" where it's being standardized, because there are still changes to the proposal happening over time (small ones at this point, though). As a brief overview, the WebAssembly SIMD proposal adds a new type, `v128`, and a suite of instructions to perform data processing with this type. The intention is that this is readily portable to a lot of architectures so usage of SIMD can be fast in lots of places.
For Rust stabilization purposes the code for all these intrinsics lives in the rust-lang/stdarch git repository. All code lives in `crates/core_arch/src/wasm32/simd128.rs`. I've got a large refactoring and sync queued up for that module, so I'm going to be writing this issue with the assumption that it will land mostly as designed there.
Currently the design principles for the SIMD intrinsics are:
- Like the existing `memory_size`, `memory_grow`, and `unreachable` intrinsics, most intrinsics are named after the instruction they represent. There is generally a 1:1 mapping between new WebAssembly instructions and intrinsics in the module.
- The type signature of each intrinsic is intended to match the textual description of the corresponding instruction.
- Each intrinsic has `#[target_feature(enable = "simd128")]`, which forces them all to be `unsafe`.
- Some gotchas for specific intrinsics are:
  - `v128.const` is exposed through a suite of `const` functions, one for each vector type (but not unsigned, just signed integers). Additionally, the arguments are not actually required to be constant, so it's expected that the compiler will make the best choice about how to generate a runtime vector.
  - Instructions using lane indices, such as `v8x16_shuffle` and `*_{extract,replace}_lane`, use const generics to represent constant arguments. This is different from x86_64, which uses the older `#[rustc_args_required_const]` attribute.
  - Shuffles are provided for `v16x8`, `v32x4`, and `v64x2` as conveniences instead of only providing `v8x16_shuffle`. All of them are implemented in terms of the `v8x16.shuffle` instruction, however.
  - There is a singular `v128` type, not a type for each size of vector that intrinsics operate with.
  - The `extract_lane` intrinsics return the value type associated with the intrinsic name; they do not all return `i32`, unlike the actual WebAssembly instruction. This means we do not have `extract_lane_s` and `extract_lane_u` intrinsics, because the compiler will select the appropriate one depending on context.
It's important to note that clang has an implementation of these intrinsics in the `wasm_simd128.h` header. The current design of the Rust `wasm32` module differs in that:
- The prefix `wasm_*` isn't used.
- Only one datatype, `v128`, is exposed instead of types for each size/kind of vector.
- Naming can differ depending on the intrinsic. For example, clang has `wasm_i16x8_load_8x8` and `wasm_u16x8_load_8x8` while Rust has `i16x8_load8x8_s` and `i16x8_load8x8_u`.
Most of these differences are largely stylistic, but some are conveniences (like the extra forms of shuffles) which might be nice to expose in Rust as well. All the conveniences still compile down to one instruction; the difference is only in how users spell the instruction in code. I believe it should be possible for conveniences to live outside the standard library as well, however.
## How SIMD will be used
If the SIMD proposal were to move to phase 4 today, I think we're in a really good spot for stabilization. #74320 is a pretty serious bug we will want to fix before full stabilization, but I don't believe the fix will be hard to land in LLVM (I've already talked with some folks on that side).
Other than that, SIMD-in-wasm differs from other platforms in that a binary containing SIMD will refuse to run on engines that do not have SIMD support. In that sense there is no runtime feature detection available to SIMD consumers (at least not natively).
After rust-lang/stdarch#874 lands, programs will simply use `#[target_feature(enable = "...")]` or `RUSTFLAGS` and everything should work. The SIMD intrinsics will always be exposed from the standard library (though the standard library itself will not use them) and available to users. If programs don't use the intrinsics then SIMD won't get emitted; otherwise, the binary will use `v128`.
## Open Questions
A set of things we'll need to settle on before stabilizing (and this will likely expand over time) is:
- The `*_load_*` and `*_store_*` instructions. Primarily the instructions that load 64 bits (8x8, 16x4, ...); I'm unsure of the types of their pointer arguments.
- The naming of `v8x16_shuffle` and lane management instructions.
- Whether `i8x16_extract_lane_s` is ok (e.g. having `i8x16_extract_lane` returning `i8` is all we need), same for `i16x8`.
- The `#[target_feature]` "requires unsafe" rules for these WebAssembly intrinsics. Intrinsics like `f32x4_splat` have no fundamental reason they need to be `unsafe`. The only reason they're `unsafe` is because `#[target_feature]` is used on them to ensure that SIMD instructions are generated in LLVM.
- Changing `*_{any,all}_true` to return a `bool`.