Add `bf16`, `f64f64` and `f80` types #3456

ecnelises · 2023-07-10T16:48:54Z

Previous RFC #3451 mixes proposal for IEEE-754 compliant f16/f128 and such non-standard types, split this off from it to focus on the target related ones.

This revision also contains comments addressed from reviewers in RFC rust-lang#3451.

VitWW · 2023-07-10T17:21:04Z

f64f64 is a bad name for type. Maybe fxd64 (float extended double 64) is better.

Having unified rule for naming is a benefit. For example,

it must starts with f:
common letter must follow f, for example x - extended, n - non-standard, c - custom, a - alternative
And we get
fx80
fxd64
fxb16

ecnelises · 2023-07-10T17:42:07Z

Since f80 is x86-only and f64f64 is PowerPC-only, x86_f80 and ppc_f128 look clear and consistent with LLVM. If leading letter should be f, f80_x86 and f128_ppc ... are okay but a little weird? Especially for f128_ppc, maybe confusing with f128. Or f128xppc, but f80xx86 is a bad name.

programmerjake · 2023-07-10T18:23:16Z

text/add-bf16-f64f64-and-f80-type.md

+# Drawbacks
+[drawbacks]: #drawbacks
+
+`bf16` is not a IEEE-754 standard type, so adding it as primitive type may break existing consistency for builtin float types. The truncation after calculation on targets not supporting `bf16` natively also breaks how Rust treats precision loss in other cases.


correctly rounding bf16 can be relatively easily implemented. bf16 add/sub/mul/div/sqrt can then just convert to f32, do a single operation, and round to bf16, which will always give the correct bf16 result.
round to bf16 code (not tested):

fn f32_to_bf16(v: f32) -> bf16 { let b32 = v.to_bits(); bf16::from_bits(if v.is_nan() { (b32 >> 16) as u16 } else if b32 & 0xFFFF == 0x8000 { let b16 = (b32 >> 16) as u16; b16.wrapping_add(b16 & 1) } else { (b32.wrapping_add(0x8000) >> 16) as u16 }) }

round to bf16 code staying entirely in SSE registers on x86_64 (also untested):
https://rust.godbolt.org/z/85Ks9sPP6

The code you provided looks like direct truncation. I need to confirm if the rounding behavior is fixed (tozero? tonearest? toinfinity?) or depending on system rounding mode.

Also, clang provides an option -fbfloat16-excess-precision to specify the 'merging' behavior of bfloat operations. For example, will the intermediate result of a-b+c be rounded? But I think that's not an issue for Rust, the value should be none (no merging will be performed).

The code you provided looks like direct truncation.

it is round to nearest, ties to even. truncation (round towards zero) would be only:
bf16::from_bits((f32::to_bits(v) >> 16) as u16)

I need to confirm if the rounding behavior is fixed (tozero? tonearest? toinfinity?) or depending on system rounding mode.

LLVM assumes the rounding mode is round to nearest, ties to even, unless you use the constrained fp intrinsics that rustc doesn't support (yet?).

text/add-bf16-f64f64-and-f80-type.md

programmerjake · 2023-07-10T18:28:54Z

I think bfloat16 or bf16 are the only good options for that type, because they are the de-facto standardized names (ignoring the extra underlines C/C++ compilers like to add). I strongly dislike fxb16 and f16b.

clarfonthey · 2023-07-10T18:50:19Z

Where will bf16 be located in libcore? The PPC and x86 floats will go in the relevant arch modules, but since bfloat is not arch-specific, it seems relevant to ask.

programmerjake · 2023-07-10T19:13:36Z

Where will bf16 be located in libcore? The PPC and x86 floats will go in the relevant arch modules, but since bfloat is not arch-specific, it seems relevant to ask.

I would expect bf16 to be a primitive type, so it would always be available like f32 (in a new edition) and be in the prelude (for old editions) and core::primitive.

lygstate · 2023-07-10T20:24:20Z

f64f64 is a bad name for type. Maybe fxd64 (float extended double 64) is better.

Having unified rule for naming is a benefit. For example,

it must starts with f:

common letter must follow f, for example x - extended, n - non-standard, c - custom, a - alternative
And we get
fx80
fxd64
fxb16

f64f64 comes from double double， as f64=double， that's makes sense

digama0 · 2023-07-10T20:49:45Z

f64f64 comes from double double， as f64=double， that's makes sense

That's not how rust's naming convention works. Considering that this is a 128 bit float format with a slightly different exponent split than the usual f128, I would strongly recommend using a variation on f128 like fx128, f128ppc, core::arch::power_pc::f128 or similar. Based on the description it's not even correct to describe it as "two f64's", it is one f64 and then a u64 with a bunch of extra mantissa bits. A type which is actually "two f64's" would be f64x2 and I would expect it to come up when representing complex numbers or a small SIMD float.

programmerjake · 2023-07-10T21:16:13Z

Based on the description it's not even correct to describe it as "two f64's", it is one f64 and then a u64 with a bunch of extra mantissa bits.

it is actually two f64's. the number is represented as the sum of two f64 values where one is approximately 2^53 larger than the other so the mantissa bits of one f64 stop about where they start in the other f64. you could also think of it as a f64 and another f64 telling you how far off the first f64 is, approximately doubling the precision.

clarfonthey · 2023-07-10T23:22:28Z

Based on the description it's not even correct to describe it as "two f64's", it is one f64 and then a u64 with a bunch of extra mantissa bits.

it is actually two f64's. the number is represented as the sum of two f64 values where one is approximately 2^53 larger than the other so the mantissa bits of one f64 stop about where they start in the other f64. you could also think of it as a f64 and another f64 telling you how far off the first f64 is, approximately doubling the precision.

In this case f64pf64 might be better to convey it's an f64 value plus another, rather than just the word f64 twice, which seems weird.

lygstate · 2023-07-11T05:45:04Z

Based on the description it's not even correct to describe it as "two f64's", it is one f64 and then a u64 with a bunch of extra mantissa bits.

it is actually two f64's. the number is represented as the sum of two f64 values where one is approximately 2^53 larger than the other so the mantissa bits of one f64 stop about where they start in the other f64. you could also think of it as a f64 and another f64 telling you how far off the first f64 is, approximately doubling the precision.

In this case f64pf64 might be better to convey it's an f64 value plus another, rather than just the word f64 twice, which seems weird.

doubledouble doesn't looks weird, so would f64f64, the p of f64pfp64 are weird becase for new one have no context won't know p is plus, and what's plus for? is that are +?

programmerjake · 2023-07-11T08:16:44Z

one other option is we could copy the existing double-double crate and call it twofloat

lygstate · 2023-07-11T09:27:36Z

one other option is we could copy the existing double-double crate and call it twofloat

this comes with an issue that not the rust style like f16,f32,f64,f128

tgross35 · 2023-07-11T19:52:50Z

this comes with an issue that not the rust style like f16,f32,f64,f128

To echo something said by scottmcm on another thread: types representing 80-bit extended precision (f80) and double-double (f64f64) are specialized types that we want to make available but don't want to encourage common use of (they will forever live in core::arch), so they don't need to match Rust's primitive naming style. "ugly" names along the lines of __m128bh are fine.

lygstate · 2023-07-11T19:59:53Z

this comes with an issue that not the rust style like f16,f32,f64,f128

To echo something said by scottmcm on another thread: types representing 80-bit extended precision (f80) and double-double (f64f64) are specialized types that we want to make available but don't want to encourage common use of (they will forever live in core::arch), so they don't need to match Rust's primitive naming style. "ugly" names along the lines of __m128bh are fine.

If "ugly" names is accepted, __float80 and __ibm128 can be used and that comes from GCC, we should using existing one(for language),
LLVM's x86_fp80 and ppc_fp128 is for IR, not for C/C++ language, so that's should be avoided.

But if ugly names is not accepted, and look at Float80(https://developer.apple.com/documentation/swift/float80) from Swift, I think f80 and f64f64 is still a good idea as beautiful names, but why we would want "ugly" names,
f80 and f64f64 always comes core:arch, it's already prefixed with core::arch string, there is no reason makes it ugly.

clarfonthey · 2023-07-12T07:20:07Z

I'm not exactly sure why the double-underscore convention was adopted for the x86 types, but if we're going for consistency, f80 should be called __m80. I'm not sure if x86 has SIMD types involving 80-bit floats (I sure hope not) but if so we could also use similar naming here.

lygstate · 2023-07-12T07:32:24Z

I'm not exactly sure why the double-underscore convention was adopted for the x86 types, but if we're going for consistency, f80 should be called __m80. I'm not sure if x86 has SIMD types involving 80-bit floats (I sure hope not) but if so we could also use similar naming here.

Where does the __m80 comes from? consistency with what?
x86 have no SIMD for 80-bit floats for sure

programmerjake · 2023-07-12T07:35:58Z

I'm not exactly sure why the double-underscore convention was adopted for the x86 types, but if we're going for consistency, f80 should be called __m80.

the double underscore is likely because C reserves all identifiers starting with __ for the implementation.

the __m64/__m128/... types are likely named for MMX (the predecessor to SSE), they are always SIMD types afaik. f80 is not a SIMD type, so imho naming it __m80 is incorrect.

lygstate · 2023-07-12T07:41:32Z

this comes with an issue that not the rust style like f16,f32,f64,f128

To echo something said by scottmcm on another thread: types representing 80-bit extended precision (f80) and double-double (f64f64) are specialized types that we want to make available but don't want to encourage common use of (they will forever live in core::arch), so they don't need to match Rust's primitive naming style. "ugly" names along the lines of __m128bh are fine.

BTW, __m128bh are comes from https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/intrinsics-for-avx-512-bf16-instructions.html, and it's for SIMD,
but f80 and f64f64 is for c FFI, that's a different story.

skogseth · 2023-07-13T08:14:41Z

Based on the description it's not even correct to describe it as "two f64's", it is one f64 and then a u64 with a bunch of extra mantissa bits.

it is actually two f64's. the number is represented as the sum of two f64 values where one is approximately 2^53 larger than the other so the mantissa bits of one f64 stop about where they start in the other f64. you could also think of it as a f64 and another f64 telling you how far off the first f64 is, approximately doubling the precision.

In this case f64pf64 might be better to convey it's an f64 value plus another, rather than just the word f64 twice, which seems weird.

doubledouble doesn't looks weird, so would f64f64, the p of f64pfp64 are weird becase for new one have no context won't know p is plus, and what's plus for? is that are +?

Hard disagree. doubledouble looks weird, same with f64f64. But yeah, the p is just confusing

text/add-bf16-f64f64-and-f80-type.md

clarfonthey · 2023-07-13T13:54:14Z

the double underscore is likely because C reserves all identifiers starting with __ for the implementation.

This isn't C, though.

the __m64/__m128/... types are likely named for MMX (the predecessor to SSE), they are always SIMD types afaik. f80 is not a SIMD type, so imho naming it __m80 is incorrect.

Hmm, when poking around a few x86 references I found that people used m80 or m80fp to refer to these float args, but I guess that it was just a weird convention? I wasn't under the impression that the m here stood for MMX, but memory, since x86 uses rN to refer to registers, immN to refer to immediates, and mN to refer to memory.

I guess if we wanted to go with the prefix meaning the instruction set, we could go with fp80 since that's closer to what Intel uses.

programmerjake · 2023-07-13T17:15:16Z

the double underscore is likely because C reserves all identifiers starting with __ for the implementation.

This isn't C, though.

but the __m128 naming comes from the x86 intrinsics which are designed for C.

eddyb · 2023-07-13T19:53:39Z

Just randomly saw this, and this is some good timing, because I have rustc_apfloat news:
a bunch of f80 bug fixes, and support for bf16, are included in:

Advance the port all the way to the end of 2022. rustc_apfloat#1

However, I would strongly advise staying away from `f64f64` aka `ppc_f128`

Unlike IEEE formats (which have all of their behavior parameterized by their exponent and significant bitwidths) and x87's 80-bit weird format _{(which is mostly IEEE-like outside of some weird extra NaNs in the form of non-subnormals lacking the "integer bit")}, llvm::APFloat/rustc_apfloat's support for the f64f64/ppc_f128 "double double" format lacks specialized implementations for many operations, relying instead on a lossy fallback to a custom IEEE-style format, that cannot represent some of the nastier edge cases.

(IIRC f64f64 aka ppc_f128 aka "the uniquely weird PPC double-double format", allows its two f64s to have exponents so different, so you would require a massive IEEE-style format to losslessly contain its effective significand - something like f2113 if I had to guess, and requiring rustc_apfloat::ieee::IeeeFloat to have sig: [u128; 16] instead of sig: [u128; 1] - but that's wasteful because most of those bits will always be 0)

Since f80 is x86-only and f64f64 is PowerPC-only, x86_f80 and ppc_f128 look clear and consistent with LLVM.

I think LLVM made a mistake here with x86 (though I can kinda see why they chose that), llvm::APFloat correctly calls it x87 - this is not a format that x86 will use, it's specifically the internal "transiently higher precision" format of the x87 FPU.

x87_f80 and fx87_80 both look kind of silly to me, though (see below for a better solution).

I would strongly recommend using a variation on f128 like fx128, f128ppc, core::arch::power_pc::f128 or similar.

I was about to suggest that last one, i.e. scoping these under core::arch::* (though I would avoid calling it 128 - it's not 128 anything other than storage, it's "the sum of two standard IEEE f64s", maybe core::arch::power_pc::double_f64?).

I think core::arch::x86::f80 or core::arch::x86::x87_f80 would work great.

8573 · 2023-07-14T22:11:08Z

bf16 as builtin type for 'Brain floating format', widely used in machine learning, different from IEEE-754 standard binary16 representation

Putting this in the global namespace seems very under-motivated. Can't users who need it use core::...::bf16?
If "b" stands for "Brain" rather than (as I would have assumed) "binary", why give this the cryptic and generic name bf16, making it sound like a standard (albeit still niche) "binary float, 16-bit", rather than, say, brain_f16? Then machine learning projects can use core::...::brain_f16 as bf16.

text/add-bf16-f64f64-and-f80-type.md

konsumlamm · 2023-08-07T22:06:05Z

text/add-bf16-f64f64-and-f80-type.md

+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+This proposal does not contain information for FFI with C's `_Float128` and `__float128` type. Because they are not so commonly used compared to `long double`, and they are even more complex than the situation of `c_longdouble` (for example, their semantics are different under C or C++ mode).


Suggested change

This proposal does not contain information for FFI with C's `_Float128` and `__float128` type. Because they are not so commonly used compared to `long double`, and they are even more complex than the situation of `c_longdouble` (for example, their semantics are different under C or C++ mode).

This proposal does not contain information for FFI with C's `_Float128` and `__float128` type, because they are not so commonly used compared to `long double`, and they are even more complex than the situation of `c_longdouble` (for example, their semantics are different under C and C++).

I think this is not the reason, indeed, on some target c_longdouble needs _Float128, The real reason is because we have a different RFC3453 for it. Do not said this as it's misleading.
I think we needs say in conjunction with RFC3453, we can define c_longdouble properly on all target

PowerPC has option to control what long double means: f64, f128 or f64f64. But since it is in transition to f128 as of the time of writing, we can drop a little history burden and set f128 on little-endian 64-bit targets.

I don't make sure if x86 has similar option. If not, we can confidently introduce c_longdouble.

text/add-bf16-f64f64-and-f80-type.md

Co-authored-by: Jacob Lifshay <[email protected]>

Co-authored-by: teor <[email protected]>

Co-authored-by: konsumlamm <[email protected]>

jacobbramley · 2023-10-26T13:41:47Z

text/add-bf16-f64f64-and-f80-type.md

+
+However, besides the disadvantage of usage inconsistency between primitive types and types from crates, there are still issues around those bindings.
+
+The availablity of additional float types heavily depends on CPU/OS/ABI/features of different targets. Evolution of LLVM may also unlock the possibility of the types on new targets. Implementing them in the compiler handles the stuff at the best location.


This is an important point. Rust's AArch64 Neon (and prototype SVE) intrinsics currently lack f16 and bf16 vector support precisely because Rust cannot produce the representation that LLVM expects without the real scalar types; new-type wrappers around u16 won't work here.

This proposal (and the related #3453) enable those gaps to be filled in, I think.

jacobbramley · 2023-10-26T15:10:41Z

text/add-bf16-f64f64-and-f80-type.md

+`bf16` is available on all targets. The operators and constants defined for `f32` are also available for `bf16`.
+
+For `f64f64` and `f80`, their availability is limited to the following targets, but this may change over time:
+
+- `f64f64` is supported on `powerpc-*` and `powerpc64(le)-*`, available in `core::arch::{powerpc, powerpc64}`
+- `f80` is supported on `i[356]86-*` and `x86_64-*`, available in `core::arch::{x86, x86_64}`


How useful (and feasible) would it be to also make these types target_feature-dependent?

bf16 is an odd one in a way, because hardware accelerations (where they exist) tend to just use f32 and truncate anyway. Emulating that with f32 hardware is likely to be cheap.

However, there are new, emerging formats that are unlikely to have that property. AArch64 has some 8-bit FP formats on the way, for example. Their incorporation into Rust would have complexities, but they're different enough from existing formats that we probably wouldn't want a polyfill for hardware that doesn't have them. Instead, I'd expect them to need a target_feature guard or similar (like Neon and SVE types).

Finally: something we observed during SVE prototyping (#3268) is that sometimes, we'd really like the target feature to be associated with the type, rather than the function. That's not quite the same as gating availability that way, but it's perhaps related.

jcranmer · 2023-11-08T04:55:22Z

One interesting functionality that f64f64 and f80 bring in is that both of these types have non-canonical representations. f80 essentially has one bit that is completely determined by the other 79 bits, and if that bit is incorrectly set, it is a non-canonical number of some kind. IEEE 754 does have a concept of canonical and noncanonical numbers, but this applies only to f80 (which is itself an implementation of an extended-precision binary64) and the decimal floating-point types. f64f64 has various interesting kinds of non-canonical representations, but that is the detailed extent of my knowledge.

f64f64 would be the first floating-point type added to Rust that cannot be described directly with IEEE 754 semantics (which are parameterized on a base/# digits/maximum exponent basis); concepts like "number of mantissa digits" is not well-defined, and I don't know how this problem is solved in the C/C++ libraries that exist. This does add risks for representing this type.

IEEE 754-2019 adds a section on augmented arithmetic operations, which includes addition, subtraction, and multiplication, but not division (for reasons I don't know and will not speculate on). It may be the case that future versions will grow a more general double-double library functionality for extra precision.

Jules-Bertholet · 2023-11-12T19:20:47Z

However, I would strongly advise staying away from f64f64 aka ppc_f128

Unlike IEEE formats (which have all of their behavior parameterized by their exponent and significant bitwidths) and x87's 80-bit weird format (which is mostly IEEE-like outside of some weird extra NaNs in the form of non-subnormals lacking the "integer bit"), llvm::APFloat/rustc_apfloat's support for the f64f64/ppc_f128 "double double" format lacks specialized implementations for many operations, relying instead on a lossy fallback to a custom IEEE-style format, that cannot represent some of the nastier edge cases.

Is a complete softfloat implementation strictly necessary? We could just forbid operations on ppc_f128 in const contexts.

tgross35 · 2023-11-12T19:23:51Z

I think that is the goal - everything here (except for bf16) would be in std::arch, only available wherever there is hardware support

sorear · 2024-02-20T15:28:58Z

text/add-bf16-f64f64-and-f80-type.md

+
+## `f80` type
+
+`f80` represents the extended precision floating point type on x86 targets, with 1 sign bit, 15 bits of exponent and 63 bits of mantissa.


What is the size and alignment of f80 on x86? gcc can change it using the -m96bit-long-double and -m128bit-long-double options, although only one is conformant with the ABI.

Do we also use f80 for the 80-bit floating point format on m68k? It is nearly identical to the Intel format, although it supports normalized numbers with a biased exponent of 0 and is big endian.

Alignment would be set by the ABI.

For reference, ABI says 16 bytes (same as f128)

oh, according to gcc, the ABI size is 96 bits on x86 and 128 bits on x86_64: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-m96bit-long-double

Interesting, the alignment is also reduced to make use of that extra space

(From i386 abi table 2.1 at https://www.uclibc.org/docs/psABI-i386.pdf)

lygstate · 2024-03-25T07:00:55Z

Better split bf16 out of this, I think the main reason f64f64 and f80 is for keep ABI compatible with existing C libraries, but bf16 is not just for ABI compatible only, but also for acceleration

plimkilde · 2024-04-18T17:37:41Z

At this point, I've personally come to expect that Rust types named f* represent an IEEE 754 basic/interchange format. I'm not sure if naming the x87 floats f80 would be clear enough in distinguishing these from the cross-platform types. Thus, to make it clear that the type is platform-specific, I think I'd prefer the type to be named x87_f80, x86_f80 or similar.

Jules-Bertholet · 2024-04-18T17:43:57Z

Because these live in core::arch::..., you can establish the convention of always using them as x86::f80 instead of importing the type directly.

VitWW · 2024-04-18T18:14:00Z

At this point, I've personally come to expect that Rust types named f* represent an IEEE 754 basic/interchange format.

"f" in "f*" means "Float point number format".

But, fair point, we expect, that "f[number]" is a part of IEEE 754.

plimkilde · 2024-04-18T18:22:26Z

Because these live in core::arch::..., you can establish the convention of always using them as x86::f80 instead of importing the type directly.

True - but what about the general case? If some other platform has a quirky "f64" that we would like to support, how should that be named? I could imagine that shadowing with the standard f64 could get messy.

kennytm · 2024-04-18T19:25:22Z

You can use std::primitive::f64 to refer to the standard one and std::arch::quirky_platform::f64 for the quirky one.

programmerjake · 2024-04-18T19:50:54Z

or, you can just use a different name: like how f64f64 is not named core::arch::powerpc::legacy::f128

programmerjake · 2024-04-18T19:55:42Z

f80 doesn't conflict with any ieee 754 standard exchange types since there are none between f64 and f128. f80 is a ieee 754 extended type tho, so is in the group mentioned in the standard as a category of types with requirements. the main designer of the 8087 is also one of the main people behind the original ieee 754 version, so they did try to make it ieee754 compliant.

tgross35 · 2024-04-18T23:42:02Z

text/add-bf16-f64f64-and-f80-type.md

+The types listed above may be widely used in existing native code, but are not available on all targets. Their underlying representations are quite different from 16-bit and 128-bit binary floating format defined in IEEE 754.
+
+In respective targets (namely PowerPC and x86), the target-specific extended types are referenced by `long double`, which makes `long double` ambiguous in the context of FFI. Thus defining `c_longdouble` should help interoperating with C code using the `long double` type.


This section needs some stronger motivation - bf16 is not widely used (yet), and being widely used in C isn't strong enough motivation on its own for Rust to do anything. Ideas to add:

bf16 is popular in GPU work, and is supported as a storage format on multiple platforms (especially ARM)

f80 can be used for platform-specific performance improvements (over f128), like a SIMD type

We will have something compatible with C's long double on every platform. Currently we only have f60 and f128.

tgross35 · 2024-04-18T23:46:48Z

text/add-bf16-f64f64-and-f80-type.md

+
+`bf16` consists of 1 sign bit, 8 bits of exponent, 7 bits of mantissa. Some ARM, AArch64, x86 and x86_64 targets support `bf16` operations natively. For other targets, they will be promoted into `f32` before computation and truncated back into `bf16`.
+
+`bf16` will generate the `bfloat` type in LLVM IR.


Is there a place where bf16 ABI is defined, since it is a nonstandard float type? We need to make sure that GCC and LLVM are compatible here.

tgross35 · 2024-04-18T23:51:09Z

text/add-bf16-f64f64-and-f80-type.md

+`core::ffi::c_longdouble` will always represent whatever `long double` does in C. Rust will defer to the compiler backend (LLVM) for what exactly this represents, but it will approximately be:
+
+- `f80` extended precision on `x86` and `x86_64`
+- `f64` double precision with MSVC
+- `f128` quadruple precision on AArch64
+- `f64f64` on PowerPC


Rust will defer to the compiler backend (LLVM) for what exactly this represents

I think you mean to say that we will make it match Clang, since there is no way to query LLVM as to what a long double is (that logic lives in Clang, not the backend).

ARM is another notable platform where long double = f64

tgross35 · 2024-04-18T23:53:09Z

text/add-bf16-f64f64-and-f80-type.md

+# Drawbacks
+[drawbacks]: #drawbacks
+
+`bf16` is not an IEEE 754 standard type, so adding it as primitive type may break existing consistency for builtin float types. The truncation after calculations on targets not supporting `bf16` natively also breaks how Rust treats precision loss in other cases.


I don't get what this is saying - what consistency is broken that is specific to bf16? None of the float types specified here are fully specified in IEE 754 (though f80 is compatible with its extended precision definition).

I think the idea is that on targets not supporting bf16, a technically-incorrect and lazy but fast and sometimes good enough approximation is commonly used: doing the operations as f32 and then taking the high half of the f32 result, which has incorrect rounding (that f32 to bf16 conversion truncates instead of rounding to nearest, ties to even like all other FP normal operations).

tgross35 · 2024-04-18T23:55:18Z

text/add-bf16-f64f64-and-f80-type.md

+
+`bf16` is not an IEEE 754 standard type, so adding it as primitive type may break existing consistency for builtin float types. The truncation after calculations on targets not supporting `bf16` natively also breaks how Rust treats precision loss in other cases.
+
+`c_longdouble` are not uniquely determined by architecture, OS and ABI. On the same target, C compiler options may change what representation `long double` uses.


I think you are referring to options like -mlong-double-128 here. This doesn't strike me as a drawback. Instead, I would mention in the c_longdouble section that what exactly long double represents can be changed at compile time in C but Rust won't have this option.

tgross35 · 2024-04-18T23:59:54Z

text/add-bf16-f64f64-and-f80-type.md

+
+And since third party tools also rely on Rust internal code, implementing additional float types in the compiler also helps the tools to recognize them.
+
+# Prior art


Also:

https://docs.rs/half/latest/half/struct.bf16.html

https://docs.rs/twofloat/0.7.0/twofloat/

tgross35 · 2024-04-19T00:05:35Z

text/add-bf16-f64f64-and-f80-type.md

+
+This proposal does not contain information for FFI with C's `_Float128` and `__float128` type. [RFC #3453](https://github.com/rust-lang/rfcs/pull/3453) focuses on type conforming to IEEE 754 `binary128`.
+
+Although statements like `X target supports A type` is used in above text, some target may only support some type when some target features are enabled. Such features are assumed to be enabled, with precedents like `core::arch::x86_64::__m256d` (which is part of AVX).


Could you list what exactly these target features are in the reference-level explanation section? This RFC should propose whether we want to just disallow the types without relevant target features (probably acceptable) or try to polyfill them somehow (I hope not, unless somebody is extremely motivated).

tgross35 · 2024-04-19T00:05:54Z

text/add-bf16-f64f64-and-f80-type.md

+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+This proposal does not contain information for FFI with C's `_Float128` and `__float128` type. [RFC #3453](https://github.com/rust-lang/rfcs/pull/3453) focuses on type conforming to IEEE 754 `binary128`.


This can probably be dropped since f128 is in nightly now.

tgross35 · 2024-04-19T00:08:35Z

We need to taper the name bikeshed on this thread so other important topics don't get lost. I created a Zulip thread for naming discussion, please do continue talking about it there: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Additional.20float.20types.20RFC.20naming.20bikeshed

Some of the bigger open questions as I understand it:

Should 16-bit brain float always be in the namespace, or should it have to be imported from core? Add bf16, f64f64 and f80 types #3456 (comment) Add bf16, f64f64 and f80 types #3456 (comment)
Target feature association of these types Add bf16, f64f64 and f80 types #3456 (comment). There are some other related issues, such as The ABI of float types can be changed by -Ctarget-feature rust#116344
Should these types have literals? Related to Add bf16, f64f64 and f80 types #3456 (comment) and Add bf16, f64f64 and f80 types #3456 (comment), the conversion to/from IEEE types is not always lossless

Jules-Bertholet · 2024-04-19T02:03:41Z

In most cases, the ABI of a C _Complex float type is the same as a struct with two float fields, but for x86 f80 this is not the case. Should there be a dedicated type or repr for this?

RalfJung · 2024-04-20T10:51:07Z

text/add-bf16-f64f64-and-f80-type.md

+
+## `bf16` type
+
+`bf16` consists of 1 sign bit, 8 bits of exponent, 7 bits of mantissa. Some ARM, AArch64, x86 and x86_64 targets support `bf16` operations natively. For other targets, they will be promoted into `f32` before computation and truncated back into `bf16`.


Is that equivalent to whatever the IEEE semantics of bf16 are, if such a thing exists (i.e., a hypothetical IEEE type with 8 bits of exponent, 7 bits of mantissa)?

Reading other comments below it seems like this is actually an incorrect emulation. So it will be the case, for the first time in Rust history, that primitive operations such as multiplying two elements of bf16 have target-dependent behavior even when no NaNs are involved. That's a major downside of the RFC and needs to be discussed and justified more explicitly. The RFC should also state explicitly what is guaranteed to be true about bf16 arithmetic on all targets -- that's needed e.g. for unsafe code authors to know what they can rely on in terms of soundness. Furthermore, the RFC needs to specify whether on targets that have native bf16 support, it is correct for the compiler to do compile-time optimizations using emulated f32 semantics (IOW, the RFC needs to say whether there are any guarantees that bf16 on such a target will actually behave like the native bf16 of the hardware.)

It should be possible to emulate correct rounding semantics, so this frankly seems like a bug in those softfp impls.

It does require temporarily switching to RTZ mode, and then you can truncate the result with RTN-ties-even.

Edit: Actually NVM, the above procedure still has an error from the correctly rounded result, of at most -2^17*ULP. You'd have to first check FE_INEXACT and just how you round accordingly.

Or maybe alternatively the answer for all these float types is -- the resulting bit patterns are entirely unspecified and not guaranteed to be portable in any way.

But even then we need to document how deterministic they are. Does passing the exact same inputs to an operation multiple times during a program execution always definitely produce the exact same outputs, on all targets and optimization levels? For regular floats, the answer turns out to be "only when there are no NaNs" -- that's what #3514 is all about. Sometimes, the same operation with the same inputs on the same target can produce different results depending on optimization levels and how obfuscated the surrounding code is. Even if we don't want to specify the bits that are produced by these operations, we need to specify whether results are consistent across all programs on a given target (define "target" -- is it per-triple or per-architecture), or only consistent across all operations in a single execution, or arbitrarily inconsistent?

iirc, you can do a single add/sub/mul/div/sqrt bf16 operation by promoting to f32, doing the operation in f32, and then rounding back to bf16 using the same rounding mode, not truncating. That is assuming, of course, that bf16 actually meets the conditions which are iirc something like having bf16's mantissa bit count be less than half of f32's mantissa bit count minus 1 or 2.

this is like how you can do that with f32 and f64, which is how you can do f32 operations in JavaScript by using Math.fround.

That's true for the regular IEEE formats, yes -- it was proven here.

But I don't know if bf16 is enough like an IEEE format to make that theorem apply.

Also, does LLVM when it compiles bf16 to f32 guarantee to do the rounding back to bf16 after each and every operation, never doing more than one operation "at once" in f32 mode?

That's true for the regular IEEE formats, yes -- it was proven here.

But I don't know if bf16 is enough like an IEEE format to make that theorem apply.

it is, bf16 is just f16 with a few more exponent bits and a few less mantissa bits, everything else is the same.

bf16 is just f32 with the lower 16 mantissa bits dropped. As f64 values that can be rounded to f32 are effectively f32 values with 29 extra mantissa bits, there would be no difference here.

Also, does LLVM when it compiles bf16 to f32 guarantee to do the rounding back to bf16 after each and every operation, never doing more than one operation "at once" in f32 mode?

that I don't know, but I hope LLVM at least tries to be correct in non-fast-math mode

that I don't know, but I hope LLVM at least tries to be correct in non-fast-math mode

That should definitely be noted as something to figure out before stabilization.

RalfJung · 2024-04-20T10:51:50Z

text/add-bf16-f64f64-and-f80-type.md

+
+`f64f64` is the legacy extended floating point format used on PowerPC targets. It consists of two `f64`s, with the former acting as a normal `f64` and the latter for an extended mantissa.
+
+The following `From` traits are implemented in `core::arch::{powerpc, powerpc64}` for conversion between `f64f64` and other floating point types:


It sounds like this type does not have semantics that are equivalent to any IEEE float type. But we need some document to explain exactly what their semantics are, i.e. the exact bits you get out when doing arithmetic on values of this type. Does such a document exist?

That's a good question. For the most part, it would act like an f64x2 vector (that multiplication/division, etc. would cross), but when exactly bits will move between the two is a question that would need to be answered.

For the most part, it would act like an f64x2 vector

wait, that's not at all how f64f64 arithmetic works, it instead works more like a big-integer. e.g. here's multiplying two double-double values in the twofloat crate: https://docs.rs/twofloat/0.7.0/src/twofloat/arithmetic.rs.html#145

Yeah that's more like what I expected. Is there an explanation somewhere of what the "meaning" of such an (a, b) pair is, i.e. what is its mathematical-valued semantics? Is it a + b (where this is mathematical inf-precision + on rational numbers)?

Is the behavior of all basic operations on that kind of representation exactly guaranteed, the same way IEEE exactly guarantees behavior for our regular floats?

Given the unusual semantics and the somewhat legacy nature of f64f64, would it be better to just provide a type with no methods/trait implementations (apart from Copy/Clone, similar to the arch-specific SIMD types), and leave a fully featured f64f64 implementation to crates like twofloat? AFAIK PowerPC doesn't provide any hardware acceleration for f64f64 specifically, so the only thing that couldn't be done outside the compiler/std would be supporting the f64f64 C ABI, which external crates can then use in a #[repr(transparent)] struct.

Given the unusual semantics and the somewhat legacy nature of f64f64, would it be better to just provide a type with no methods/trait implementations

sounds good to me! though I'd at least have Copy, Clone, Default, and Debug, where Debug could just be as if it was: struct f64f64 { high: f64, low: f64 }

Not knowing the details of the fast_two_sum function, that looks to be a binomial product, which is what I was referring to with "multiplication/division, etc., would cross" though I guess I wasn't quite clear on that.

ok, yeah. addition and subtraction also don't act like a f64x2, about the only ops that act like f64x2 are neg, abs, and copy.

Not knowing the details of the fast_two_sum function, that looks to be a binomial product, which is what I was referring to with "multiplication/division, etc., would cross" though I guess I wasn't quite clear on that.

So -- what is the mathematical value of a pair (a, b) then, the rational number this represents?

Yeah that's more like what I expected. Is there an explanation somewhere of what the "meaning" of such an (a, b) pair is, i.e. what is its mathematical-valued semantics? Is it a + b (where this is mathematical inf-precision + on rational numbers)?

yes, it is high + low where the number is the exact mathematical sum of two f64s

Is the behavior of all basic operations on that kind of representation exactly guaranteed, the same way IEEE exactly guarantees behavior for our regular floats?

I've heard that many special functions (like sin, cos, etc.) don't even always return canonical values (as in the result is represented differently than the exact same number would be by arithmetic ops), idk which ones.

Oh, there are non-canonical representations? That then also opens interesting questions for ==...

Not knowing the details of the fast_two_sum function, that looks to be a binomial product, which is what I was referring to with "multiplication/division, etc., would cross" though I guess I wasn't quite clear on that.

That's also an option, of course.

RalfJung · 2024-04-20T10:52:25Z

text/add-bf16-f64f64-and-f80-type.md

+All proposed types do not have literal representation. Instead, they can be converted to or from IEEE 754 compliant types.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation


For all of these types, what is their interaction with #3514 -- i.e., what exactly is guaranteed (or not) about their NaN values? Is there any other kind of non-determinism for any of them?

RalfJung · 2024-04-20T10:52:58Z

text/add-bf16-f64f64-and-f80-type.md

+
+## `f80` type
+
+`f80` represents the extended precision floating point type on x86 targets, with 1 sign bit, 15 bits of exponent and 63 bits of mantissa.


So this is following strict IEEE float semantics, just with different exponent/mantissa sizes than the existing types we have? That should be stated explicitly.

Ish. x87 has some weirdness in subnormal and nonfinite values, and it has an explicit integer bit, unlike the other interchange formats (which directly results in the aformentioned weirdness).

lygstate · 2025-09-20T15:04:00Z

bf16 should split out, it's a generic type

Initial draft from rust-lang#3451

9302977

This revision also contains comments addressed from reviewers in RFC rust-lang#3451.

ecnelises mentioned this pull request Jul 10, 2023

Additional float types #3451

Closed

Give RFC number

f34867e

programmerjake reviewed Jul 10, 2023

View reviewed changes

text/add-bf16-f64f64-and-f80-type.md Outdated Show resolved Hide resolved

ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jul 10, 2023

teor2345 reviewed Jul 13, 2023

View reviewed changes

text/add-bf16-f64f64-and-f80-type.md Outdated Show resolved Hide resolved

konsumlamm reviewed Aug 7, 2023

View reviewed changes

ecnelises and others added 7 commits October 17, 2023 17:13

Fix mention of SSE to AVX

26436fa

Co-authored-by: Jacob Lifshay <[email protected]>

Fix typo of f80

2117dda

Co-authored-by: teor <[email protected]>

Update text/add-bf16-f64f64-and-f80-type.md

cb88346

Co-authored-by: konsumlamm <[email protected]>

Update text/add-bf16-f64f64-and-f80-type.md

bdbf8fe

Co-authored-by: konsumlamm <[email protected]>

Update text/add-bf16-f64f64-and-f80-type.md

c57d867

Co-authored-by: konsumlamm <[email protected]>

Syntax and typo fix from @konsumlamm

8fdcc60

Co-authored-by: konsumlamm <[email protected]>

Explain why C _Float128 not mentioned

892334b

jacobbramley reviewed Oct 26, 2023

View reviewed changes

sorear reviewed Feb 20, 2024

View reviewed changes

tgross35 reviewed Apr 19, 2024

View reviewed changes

RalfJung reviewed Apr 20, 2024

View reviewed changes

clarfonthey mentioned this pull request May 23, 2024

ACP: primitive numeric traits rust-lang/libs-team#371

Open

	This proposal does not contain information for FFI with C's `_Float128` and `__float128` type. Because they are not so commonly used compared to `long double`, and they are even more complex than the situation of `c_longdouble` (for example, their semantics are different under C or C++ mode).
	This proposal does not contain information for FFI with C's `_Float128` and `__float128` type, because they are not so commonly used compared to `long double`, and they are even more complex than the situation of `c_longdouble` (for example, their semantics are different under C and C++).


		However, besides the disadvantage of usage inconsistency between primitive types and types from crates, there are still issues around those bindings.

		The availablity of additional float types heavily depends on CPU/OS/ABI/features of different targets. Evolution of LLVM may also unlock the possibility of the types on new targets. Implementing them in the compiler handles the stuff at the best location.


		## `f80` type

		`f80` represents the extended precision floating point type on x86 targets, with 1 sign bit, 15 bits of exponent and 63 bits of mantissa.

		The types listed above may be widely used in existing native code, but are not available on all targets. Their underlying representations are quite different from 16-bit and 128-bit binary floating format defined in IEEE 754.

		In respective targets (namely PowerPC and x86), the target-specific extended types are referenced by `long double`, which makes `long double` ambiguous in the context of FFI. Thus defining `c_longdouble` should help interoperating with C code using the `long double` type.


		`bf16` consists of 1 sign bit, 8 bits of exponent, 7 bits of mantissa. Some ARM, AArch64, x86 and x86_64 targets support `bf16` operations natively. For other targets, they will be promoted into `f32` before computation and truncated back into `bf16`.

		`bf16` will generate the `bfloat` type in LLVM IR.


		`bf16` is not an IEEE 754 standard type, so adding it as primitive type may break existing consistency for builtin float types. The truncation after calculations on targets not supporting `bf16` natively also breaks how Rust treats precision loss in other cases.

		`c_longdouble` are not uniquely determined by architecture, OS and ABI. On the same target, C compiler options may change what representation `long double` uses.


		And since third party tools also rely on Rust internal code, implementing additional float types in the compiler also helps the tools to recognize them.

		# Prior art


		This proposal does not contain information for FFI with C's `_Float128` and `__float128` type. [RFC #3453](https://github.com/rust-lang/rfcs/pull/3453) focuses on type conforming to IEEE 754 `binary128`.

		Although statements like `X target supports A type` is used in above text, some target may only support some type when some target features are enabled. Such features are assumed to be enabled, with precedents like `core::arch::x86_64::__m256d` (which is part of AVX).


		## `bf16` type

		`bf16` consists of 1 sign bit, 8 bits of exponent, 7 bits of mantissa. Some ARM, AArch64, x86 and x86_64 targets support `bf16` operations natively. For other targets, they will be promoted into `f32` before computation and truncated back into `bf16`.


		`f64f64` is the legacy extended floating point format used on PowerPC targets. It consists of two `f64`s, with the former acting as a normal `f64` and the latter for an extended mantissa.

		The following `From` traits are implemented in `core::arch::{powerpc, powerpc64}` for conversion between `f64f64` and other floating point types:

Add bf16, f64f64 and f80 types #3456

Are you sure you want to change the base?

Add bf16, f64f64 and f80 types #3456

Conversation

ecnelises commented Jul 10, 2023

Uh oh!

VitWW commented Jul 10, 2023

Uh oh!

ecnelises commented Jul 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

programmerjake Jul 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

programmerjake Jul 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

programmerjake commented Jul 10, 2023

Uh oh!

clarfonthey commented Jul 10, 2023

Uh oh!

programmerjake commented Jul 10, 2023

Uh oh!

lygstate commented Jul 10, 2023

Uh oh!

digama0 commented Jul 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

programmerjake commented Jul 10, 2023

Uh oh!

clarfonthey commented Jul 10, 2023

Uh oh!

lygstate commented Jul 11, 2023

Uh oh!

programmerjake commented Jul 11, 2023

Uh oh!

lygstate commented Jul 11, 2023

Uh oh!

tgross35 commented Jul 11, 2023

Uh oh!

lygstate commented Jul 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clarfonthey commented Jul 12, 2023

Uh oh!

lygstate commented Jul 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

programmerjake commented Jul 12, 2023

Uh oh!

lygstate commented Jul 12, 2023

Uh oh!

skogseth commented Jul 13, 2023

Uh oh!

Uh oh!

clarfonthey commented Jul 13, 2023

Uh oh!

programmerjake commented Jul 13, 2023

Uh oh!

eddyb commented Jul 13, 2023

However, I would strongly advise staying away from f64f64 aka ppc_f128

Uh oh!

8573 commented Jul 14, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Add `bf16`, `f64f64` and `f80` types #3456

Add `bf16`, `f64f64` and `f80` types #3456

ecnelises commented Jul 10, 2023 •

edited

Loading

programmerjake Jul 10, 2023 •

edited

Loading

programmerjake Jul 10, 2023 •

edited

Loading

digama0 commented Jul 10, 2023 •

edited

Loading

lygstate commented Jul 11, 2023 •

edited

Loading

lygstate commented Jul 12, 2023 •

edited

Loading

However, I would strongly advise staying away from `f64f64` aka `ppc_f128`

lygstate Aug 8, 2023 •

edited

Loading

However, I would strongly advise staying away from `f64f64` aka `ppc_f128`

tgross35 Apr 14, 2024 •

edited

Loading

lygstate commented Mar 25, 2024 •

edited

Loading