Right now, there is a massive rats’ nest and pull request war around the semantics and handling of LLVM’s floating-point minimum and maximum operations, of which there are three different sets.

The LangRef as it is currently live on llvm.org at the time of writing (again, pull request war) lists the following sets of operations:

llvm.minnum.* and llvm.maxnum.*

These operations are defined to match the minNum and maxNum operations defined by IEEE 754-2008, with the exception that they always treat -0.0 as less than +0.0. If one operand is a qNaN but not the other, the non-NaN one is returned.

If either input operand is a signaling NaN, these operations return a qNaN. This means that LLVM’s sNaN/qNaN nondeterminism “leaks” into the operations’ semantics, which appears to make them impossible to implement soundly. The “Semantics” section somewhat acknowledges this.

These operations are not lowered correctly on many backends. For instance, the x86 backend does not distinguish between -0.0 and +0.0, and doesn’t treat sNaN differently from qNaN.
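
For concreteness, here is what the documented semantics look like at the IR level (a sketch; the results in the comments follow the LangRef text described above, not what every backend actually produces):

declare float @llvm.minnum.f32(float, float)

define float @minnum_example(float %x) {
  ; Per the LangRef: minnum(qNaN, 1.0) = 1.0 (the non-NaN operand wins),
  ; minnum(sNaN, 1.0) = some qNaN, and signed zeros are ordered,
  ; so minnum(-0.0, +0.0) = -0.0.
  %r = call float @llvm.minnum.f32(float %x, float 1.0)
  ret float %r
}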

llvm.minimumnum.* and llvm.maximumnum.*

These operations are defined to match the minimumNumber and maximumNumber operations defined by IEEE754-2019, except using LLVM’s NaN semantics instead of IEEE754’s. They can do this because the newly-revised operations do not treat sNaN inputs differently from qNaN inputs. LLVM’s language reference says that if both operands are NaN, these operations return “a NaN”, linking to the section about LLVM’s semantics and how it treats qNaN and sNaN identically.

These operations also treat -0.0 as less than +0.0, but this time it’s actually lowered correctly on x86. This behavior can be opted out of via the nsz flag.
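
For example (a sketch; fast-math flags like nsz go on the call site):

; With nsz, the operation may return either zero when the inputs are
; -0.0 and +0.0, so x86 can lower it without extra sign handling.
%r = call nsz float @llvm.maximumnum.f32(float %a, float %b)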

llvm.minimum.* and llvm.maximum.*

These operations are defined to match the minimum and maximum operations defined by IEEE754-2019, except using LLVM’s NaN semantics instead of IEEE754’s.

These behave identically to llvm.minimumnum.* and llvm.maximumnum.*, with the exception that they return a NaN if either input operand is NaN.
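
Side by side, assuming exactly one NaN input (a sketch):

; minimumnum: a lone NaN operand is ignored.
;   minimumnum(NaN, 1.0) = 1.0
%a = call float @llvm.minimumnum.f32(float %x, float %y)

; minimum: any NaN operand propagates.
;   minimum(NaN, 1.0) = NaN
%b = call float @llvm.minimum.f32(float %x, float %y)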

The problem

Right now, the semantics of llvm.minnum.* and llvm.maxnum.* are not actually respected across architectures. This was noted by nikic in llvm/llvm-project PR #138451, “Revert ‘LangRef: Clarify llvm.minnum and llvm.maxnum about sNaN and signed zero (#112852)’”:

I’m not a big fan of the change itself (because it is incoherent with LLVM’s general sNaN semantics), but even if we want to do it, it needs to be phased in a lot more carefully than what has actually happened. You can’t just change the semantics in LangRef and then completely ignore the consequences of the change on optimization behavior.

The bigger problem, also included in that quote above, is that LLVM’s NaN semantics are in direct contradiction with IEEE754-2008’s minNum and maxNum operations. LLVM’s specification directly says “Floating-point math operations are allowed to treat all NaNs as if they were quiet NaNs.” The minNum and maxNum operations cannot treat all NaNs as if they were quiet NaNs.

My proposal

(EDIT: I’ve revised this a bit. These three steps are ordered from easiest to hardest, and shouldn’t all be done at once.)

  1. llvm.minnum.* and llvm.maxnum.* intrinsics should be deprecated and eventually removed. During the deprecation period, they should be treated as llvm.minimumnum.* and llvm.maximumnum.* with the nsz flag set (see the IR sketch after this list), since their “doesn’t really treat -0.0 as less than +0.0” behavior is important for performance on x86. Their documentation should be updated to reflect this. The existing “intrinsics comparison” table should be changed to not mention sNaN or qNaN at all, just “NaN”, in accordance with LLVM’s semantics.

  2. Constrained intrinsics and strictfp should differentiate between signaling and quiet NaNs. The “Behavior of Floating-Point NaN values” section points you towards the constrained intrinsics, but as far as I can tell, they don’t mention signaling NaNs anywhere. In my opinion, sNaN handling should be part of the additional scope of strictfp and the constrained intrinsics, just like rounding mode and exceptions are now. I’m not sure if this would require any actual codebase changes.

  3. The llvm.minimumnum.* and llvm.maximumnum.* operations can sometimes introduce overhead, even on platforms which do support native signed-zero handling, if they implement the older IEEE754-2008 versions that handle signaling NaNs differently (for instance, ARM). In these cases, extra “canonicalize” operations need to be inserted in order to comply with the spec. An nsnan (“no signaling NaN”) fast-math flag could potentially be added to avoid these canonicalize operations. I’m not sure how that would interact with LLVM’s existing semantics, or whether it violates the “sNaN is equivalent to qNaN” rule, so it’ll require a lot of further discussion.
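
As an IR sketch of step 1, the deprecation-period treatment would rewrite the old intrinsic like this (the upgrade rule is my proposal, not existing behavior):

; Before: the deprecated intrinsic.
%old = call float @llvm.maxnum.f32(float %a, float %b)

; After: the same behavior, expressed with the newer intrinsic
; plus the nsz fast-math flag.
%new = call nsz float @llvm.maximumnum.f32(float %a, float %b)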


I expect what would happen is similar to pow: when either input is SNaN, the function non-deterministically does one of the following

  • it returns a NaN (using the usual NaN propagation rules, i.e. this could be an SNaN or QNaN)
  • it behaves as-if all input NaNs are first quieted

This is not unsound or impossible. It just means that the existing non-determinism for whether operations return a quiet or signaling NaN can be “amplified” to affect whether a result is a NaN at all or not. There’s nothing unsound about that, but it is surprising, and it could lead to unsoundness if some LLVM optimization misses this detail and gets it wrong.
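
Concretely, under these semantics (a sketch):

; If %x is an SNaN, this may return either 1.0 (as if %x were first
; quieted, so the non-NaN operand wins) or a NaN (the usual NaN
; propagation); the choice is non-deterministic.
%r = call float @llvm.minnum.f32(float %x, float 1.0)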

The more concerning point, for me, is that these semantics for minnum/maxnum break the property that has been true and documented for a long time, that if exactly one input is NaN, then the other input is returned. This property is now enshrined in the guaranteed-stable documentation of public Rust functions. If LLVM decides to just change the deal here, that could become quite the hassle.
EDIT: Seems like this documentation was already only correct for some backends (like x86_64) before, but was incorrect e.g. on aarch64. :confused:

So I believe that the original motivation for these changes went something like this:

  • While minnum/maxnum documented a specific sNaN behavior, it was not actually respected on all platforms. So it was necessary to fix things in some direction.
  • The previous “sNaN treated like qNaN” behavior can be achieved using minimumnum/maximumnum combined with the nsz flag – under the assumption of a spec change that tightens the semantics of nsz for the float min/max intrinsic family to only allow choice between the two arguments, and not de-novo materialization of zeroes of different sign.
  • As the old behavior is representable in that way, we can define minnum/maxnum with the old IEEE 2008 sNaN semantics. On some hardware, this is the only min/max variant that’s natively available, so exposing it in some form is valuable.

I think that in principle, this motivation is reasonable. However, there are two big issues here:

  • Because LLVM’s (non-strictfp) NaN semantics allow treating sNaN as qNaN, and allow the omission of canonicalizing operations that would turn sNaN into qNaN, it is not actually possible to guarantee a specific sNaN behavior in any sensible way. By itself, this is not a problem – this just means that the result is going to be non-deterministic. But I think this is something that needs to be explicitly acknowledged in the LangRef wording, not just treated as an emergent property.
  • The way this change has been phased in was absolutely terrible. The correct way to make this change was to a) introduce maximumnum/minimumnum and full lowering and optimization support for it, b) change frontends (and LLVM’s canonical form) to use maximumnum nsz instead of maxnum to retain previous semantics and c) change the semantics of maxnum – or better yet, introduce a new intrinsic like llvm.maxnum.ieee2008 and upgrade the old one to maximumnum nsz instead. Instead what happened is that LangRef got changed, and then a haphazard set of changes was landed that partially migrated things here and there, and now we’re in this huge mess where nobody knows how anything is even supposed to work anymore.

At this point, I’m not sure that trying to revert these changes really makes sense. As we’re halfway into this migration, it might make more sense to pull it all the way through.

Some changes that are going to be needed towards that are:

  • Update LangRef to explicitly specify minnum/maxnum as non-deterministic wrt sNaN.
  • Complete optimization support for maximumnum/minimumnum.
  • Make them the canonical forms, instead of the current maxnum/minnum.
  • Update frontends to use minimumnum/maximumnum + nsz (including Rust @RalfJung).

Somebody needs to own this work and complete it in a reasonable time frame. @arsenm and @wzssyqa seem to be the people who are pushing for this direction, so I hope they’ll invest some effort into bringing this to a consistent state.

(That is assuming we have a consensus that this is the end state we want to have, rather than reverting everything and fixing backends that failed to comply with the old minnum/maxnum definition.)


I think “the old floating-point operations have completely busted semantics, and the new ones aren’t implemented everywhere” is an issue with the roundeven/rint/nearbyint operations too. It’s not just a matter of lowering to the correct instruction sequences, because if you end up falling back to emitting a libcall, the operations you want only exist in C23 and might be missing in the libm/libc you link against:

I wanted to use roundeven at first. But Windows libc didn’t support it, so I switched to rint, while keeping the roundeven intrinsics around to make it easier to make the switch in the future.
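
For reference, the pattern in question (a sketch; the comments describe the usual lowering strategy):

declare float @llvm.roundeven.f32(float)

define float @round_ties_to_even(float %x) {
  ; Lowers to a native instruction where one exists (e.g. FRINTN on
  ; AArch64); otherwise it falls back to a roundeven() libcall, a
  ; symbol that only C23-era libms provide.
  %r = call float @llvm.roundeven.f32(float %x)
  ret float %r
}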

Is there a good solution here? It seems like nailing libcall behavior down could make for an RFC in and of itself.

I agree with this bullet and already mentioned it in https://github.com/llvm/llvm-project/pull/168838

I suggest we update LangRef to specify that ±0 handling is non-deterministic too. We cannot keep the current statement while not fixing the backends that don’t support it. And it looks odd to me if we respect part of the IEEE 2008 semantics but disregard the other part.

While sNaN handling is going to be trickier to sort out, I think specifying signed-zero handling as nondeterministic can be done pretty straightforwardly. The rationale given in https://github.com/llvm/llvm-project/pull/112852 for specifying -0.0 < +0.0 seems very dubious:

Since some architectures such as aarch64/MIPSr6/LoongArch, have instructions that implements +0.0>-0.0.
So Let’s define llvm.minnum and llvm.maxnum to IEEE754-2008 with +0.0>-0.0.

The architectures without such instructions can implements NSZ flavor to speed up,
and the frontend, such as clang, can call them with nsz attribute.

To me, this is basically saying “some architectures support ordering signed zeros, so it’s fine to require that behavior for all of them, even though it’s beyond what IEEE754-2008 requires. Architectures without those instructions can just emulate them at significant cost, and use a scary fast-math flag to get the actual IEEE754-2008 behavior. I haven’t actually implemented that emulation, by the way.” Citing MIPSr6 and LoongArch while glossing over the performance penalty on x86 is a particularly bold editorial decision.

Most of the discussion in that PR seems to be about the much trickier topic of sNaN behavior, so the (significantly less justifiable) signed-zero behavior change slipped mostly under the radar.

It also seems worth documenting more precisely that this attribute can make otherwise deterministic operations non-deterministic, and what exactly the set of allowed non-deterministic outcomes is. (And double-checking that scalar evolution, and anywhere else where this is relevant, account for this non-determinism that can occur even for non-NaN inputs.)

FWIW I created a PR to do that for pow which has the same issue. That could be a good testing ground for hashing out the exact semantics and wording we want here.

I think the idea here is that some targets do natively support the “IEEE 2008 sNaN semantics + deterministic signed zero”, and in that case exposing those semantics is useful, esp. because it means you can lower maximumnum to canonicalize + maxnum without extra signed zero handling.
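
i.e., roughly this expansion (a sketch; llvm.canonicalize quiets signaling NaNs):

; maximumnum lowered for a target whose native max instruction has the
; IEEE 2008 sNaN behavior and deterministic signed zeros: quiet the
; inputs, then the 2008-style maxnum yields the 2019 maximumNumber result.
%ca = call float @llvm.canonicalize.f32(float %a)
%cb = call float @llvm.canonicalize.f32(float %b)
%r  = call float @llvm.maxnum.f32(float %ca, float %cb)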

Basically, specifying deterministic signed zero and allowing nsz to be set to get the non-deterministic behavior is strictly more expressive. So I think that does make sense. (But yes, of course the backend also needs to respect this. This is part of the ongoing mess…)

That should work for Rust, yeah. (nsz = “no signed zeros” sure sounds like an attribute promising “the compiler may assume that there are no signed zeros here”, which would not be what we want. But it seems that’s just a misnomer and what it really means is more like “non-deterministically alter the sign of all zeros before applying the usual semantics of this operation”.)

In my understanding, for Rust we very deliberately want the SNaN treatment of IEEE754-2019 maximumNumber, not the 2008 variant, as the 2008 variant is non-associative. We also want the 2008-style treatment of signed zeros to get efficient codegen on x86. So as long as we have some way to express that, we should be good. Now, it turns out that as of today we don’t actually correctly implement the 2019 SNaN treatment on aarch64, so by now I do agree that the old status quo before the docs got changed was also broken. However, we should ensure that during the transition, we don’t regress targets that already correctly implement the desired SNaN treatment, such as x86. So in that sense it may still be worth reverting this commit. (EDIT: And it seems that happened when this PR landed. :tada: )

My key point is that if a user writes code like this:

#include <math.h>

float f(float a, float b) {
    return fmaxf(a, b);
}

they won’t get the surprise of code generated by LLVM producing a different result than libm.so, since the user will have read glibc’s documentation.

This is not persuasive. Backends have full control over lowering every intrinsic. E.g., we have different code in X86 to lower maxnum and maximumnum respectively, so the lowering of maximumnum doesn’t necessarily rely on maxnum. It looks to me like the only benefit is that some targets can share their code for maxnum, at the cost of other targets having to emulate it with more code.

As a candidate lowering target, it’s absolutely fine for glibc to have stricter definitions and produce different results than LLVM, as long as both conform to their specifications. For example, when -fno-math-errno is specified, some targets will use native instructions, which of course produce different results where errno is involved. Similarly, ignoring sNaN and signed zero is also allowed for target acceleration. The only difference here is that the former needs to be explicitly enabled, while the latter needs to be explicitly disabled (through strictfp and/or the new C functions fminimum/fminimum_num).

Yes. IEEE has cursed us to deal with these semantics somewhere. It’s completely inescapable in CodeGen, and I think there’s some value in surfacing that into the IR.

They have never been correct across all backends. IIRC only AMDGPU implemented the quieting correctly, so AArch64 and other targets with hardware instructions were effectively already providing the newly documented behavior. For targets emitting calls to libm’s fmin/fmax, glibc used to implement the signaling-is-quiet behavior and then switched to the 2008 behavior in some version.

The consequence of this is that I receive an endless stream of questions and complaints asking why there are extra instructions compared to other targets. In the real world, nobody cares about signaling NaNs, and everybody does care about not having spurious extra instructions. I’m confused by the recent burst of interest in this; where are the signaling NaNs coming from? Clang doesn’t implement initialize-to-snan, so are users manually synthesizing signaling NaNs?

Almost; we should not fully remove them, but users should generally not be using them. They will still be useful as a tool for implementing optimizations that avoid unnecessary canonicalizes.

This was always supposed to be the case since the start of strictfp. I’m not sure a plan was ever established for the experimental constrained minnum/maxnum.

These are the semantically better operations that more closely match the old documented behavior. We can do a better job of eliminating the canonicalizes if we preserve the IEEE 2008 behavior intrinsics, as an aid to expansion of the new 2019 operations.

+1, this is basically the reason to do it. The nondeterminism is a hindrance to signed-zero value tracking. You can get back the codegen benefit with nsz.

The problem is not one of lowering, but of analysis. If an operation is more permissive, value tracking needs to give more conservative results.

The problem here is that the signaling NaN behavior is radically different; it’s not a question of strictness or not. You cannot simply ignore it.

Is this about adding extra instructions to emulate IEEE754-2008’s behavior?

This unfortunately goes the other way too, since the newer IEEE754-2019 minimumNumber/maximumNumber require canonicalizing the input operands on AArch64.

I think there are two sets of semantics we could choose from to allow these operations to not generate extra instructions:

  • Add the nsnan flag and use it in those operations, so they care about and handle signaling NaNs unless you opt out.
  • Allow the operations to assume that neither input is a signaling NaN, but add constrained versions that do handle signaling NaNs.

For the versions that implement precise signaling NaN semantics, the implication is that LLVM’s “Unchanged NaN propagation” semantics are invalid, and optimizing max(x, x) to x therefore cannot happen. That makes me more inclined to go with the latter choice, since constrained operations are the established way to opt out of LLVM’s floating-point semantics.
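
For illustration (a sketch of why the fold is invalid under precise sNaN semantics):

; If %x may be an sNaN, folding this call to plain %x is wrong:
; precise semantics require the result to be a *quiet* NaN, while
; %x itself would still be signaling.
%r = call float @llvm.maxnum.f32(float %x, float %x)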

I believe this is the case, yes. It’s not something you’d expect people to actually do in practice, but it’s perfectly legal to construct a signaling NaN using a bitcast and then store it in a struct or something.
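
For example (a sketch; 0x7F800001 is a 32-bit sNaN: all-ones exponent, quiet bit clear, non-zero payload):

define void @make_snan(ptr %p) {
  ; Materialize a signaling NaN bit pattern; nothing quiets it until
  ; it flows into an actual floating-point operation.
  %snan = bitcast i32 2139095041 to float   ; 0x7F800001
  store float %snan, ptr %p
  ret void
}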

The annoying part is the tradeoff you mentioned between nondeterminism and constant folding. If you say that a signaling NaN operand results in flat-out undefined behavior or returns a poison value or something, you can have both the value-tracking optimizations and the more efficient instruction lowerings, but that’s problematic for safe languages like Rust.

There might be a way to formalize ad-hoc concepts like “we are allowed to eliminate operations that would convert a sNaN to a qNaN, still pretend that things like minnum and maxnum have deterministic behavior during constant folding, and not introduce a poison value anywhere”, but I’m skeptical. Typically, you come up with a set of operational semantics and then derive the permitted optimizations from there, not vice versa.

Would a nsnan flag on minimumnum/maximumnum work equally well?

If the constrained/strictfp operations already handle qNaN and sNaN (or are at least supposed to, even if that’s not implemented in all backends), is it just a matter of mentioning it in the language reference then?


It seems like we may need to define specific operational semantics/special values for floating-point types. IIUC, LLVM provides the following facilities for working with deferred UB:

  • undef, which represents an arbitrary, non-fixed, but valid bit pattern for a given type. It’ll be removed eventually.
  • poison, which represents an arbitrary, non-fixed, potentially invalid bit pattern for a given type.
  • The freeze instruction, which converts a poison value to an arbitrary, fixed, valid bit pattern of a given type.

For floating-point operational semantics, and I’m spitballing here, perhaps there could be something like poison or undef but narrower. Instead of “any arbitrary bit pattern”, it would be a limited set of values, such as “NaN or this specific constant”, “-0.0 or +0.0”, etc. The freeze instruction would arbitrarily choose one of those values.
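
For reference, today’s freeze looks like this; the hypothetical narrower construct would behave the same way, but choose from a restricted set (e.g. “NaN or 1.0”) instead of all bit patterns:

define float @pick(float %p) {
  ; If %p is poison, freeze replaces it with some fixed, valid float,
  ; chosen arbitrarily but consistently across all uses of %f.
  %f = freeze float %p
  ret float %f
}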

What’s the value of value tracking on minnum/maxnum? Both C’s fmin/fmax and IEEE clarify that sensitivity to the sign of zero is not required. And we have the new minimum/minimumnum functions and intrinsics that track it well. I applaud using nsz for minimum/minimumnum, but I oppose using it for minnum/maxnum.

I’m confused too. If they are not important, why did they need to be highlighted in #112852? Why did you oppose #168838, which relaxes it?

And the reasoning applies to signed zero too.

Supporting signaling NaN behavior is systematic work. We need to disable all optimizations that misbehave on sNaN and respect sNaN exceptions at function granularity, i.e., the strictfp environment. Just as you cannot use a single constrained intrinsic in a non-strictfp function, defining and implementing a single intrinsic that complies with signaling NaN semantics is meaningless. We are not halfway there; there is no way ahead.

No, the opposite. The hardware instruction has the IEEE-754-2008 behavior, and we insert canonicalizes to get the signaling-nan-as-quiet behavior (i.e., it is identical to the minimumnum expansion). The problem with the old intrinsic definition is there’s no way to get to the underlying instruction behavior. So this is the same situation AArch64 is in, it’s just AArch64 wasn’t/isn’t using the expansion to insert the canonicalizes and violated the old specification.

There aren’t any free bits for fast-math flags. There’s an older RFC out scrounging for bits to add new ones. In this particular case, since these are intrinsics, you could maybe get away with adding nofpclass(snan) on the callsite’s arguments. We don’t have a way to emit that from a source language, though. But if we’re asserting no signaling NaNs, that means the result is poison, which isn’t what we want either.
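
Concretely, that call-site annotation would look like this (a sketch):

; Assert at the call site that neither argument is a signaling NaN;
; if one actually is, that argument value is poison.
%r = call float @llvm.maxnum.f32(float nofpclass(snan) %a, float nofpclass(snan) %b)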

By constrained versions, do you mean an additional pair of intrinsics, which are not strictfp, with these semantics? Just punting this to “use strictfp” is part of the problematic status quo we’ve been in for years. It doesn’t address the production non-strictfp case, and strictfp is a largely stalled project. But this is more or less how I envisioned minnum/maxnum landing. They are there if you know the target supports them, but otherwise should be avoided.

Yes

Not really. Turning signaling nan into poison is too strong. Users who just want the hardware instruction shouldn’t need to introduce UB on signaling nan.

If we wanted to insert nsnan or nofpclass(snan)s automatically, we’re still limited by the lack of guarantees in the IR for when quieting will occur to actually introduce that. An nsnan flag would be more useful in codegen, particularly if we strengthened the rules in SDAG/gMIR to mandate canonicalizing operations actually canonicalize.

The main one would be proving the sign bit is 0, which can enable downstream no-signed-zeros assumptions; e.g., the result of maxnum(x, 0.0) with fuzzy zero handling can’t be assumed to have a 0 sign bit.
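
E.g. (a sketch):

; With deterministic signed-zero ordering, this can never return -0.0,
; so later passes may assume a cleared sign bit. With fuzzy zero
; handling, maxnum(-0.0, +0.0) may return -0.0 and the assumption fails.
%r = call float @llvm.maxnum.f32(float %x, float 0.0)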

It’s not required, but ordering the sign bit is a better quality of implementation (and I’m not aware of any hardware implementation which doesn’t respect this). As the possible codegen benefit is equally expressible with the nsz bit, strengthening the signed zero handling is strictly more expressive.

nsz is orthogonal to any particular operation. You can equally attach nsz to either, it doesn’t make sense to distinguish these. Forcibly embedding nsz into the underlying instruction definition doesn’t buy any benefit.

They’re not important, but that doesn’t mean we can just choose to miscompile a signaling NaN this way. We’re allowed to not quiet, but returning an entirely different result is a big problem. All of the real-world users who do not care about signaling NaNs pay the cost of additional instructions to quiet them. We’ve gone a very long time with AArch64 and other targets not handling signaling NaNs as written, without any complaints that I’ve seen.

Signed zero is much, much more important than signaling nans.

Getting the correct value and raising floating-point exceptions are different cases. minnum/maxnum shouldn’t require full exception support.

Whether they are important or not depends on the justifications, and on majority opinion about those justifications.

I don’t know about other architectures, but more than one person has pointed out that X86 doesn’t respect it. That is precisely our motivation for objecting to this change. Quoting the Intel SDM regarding MAXPD:

If the values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned.

This is not a direct answer to my original question.

  • `maximumnum(x, 0.0)` covers it well;
  • For middle-end usage, maxnum already carries `nsz` (implicitly or explicitly) as generated by the FE;
  • For backend usage, backends know the signed-zero sensitivity of their instructions and have the ability to manipulate them;

There’s a clear distinction here. By the (original) definition, minnum implies `nsz` while minimum/minimumnum doesn’t. All the LLVM bitcode produced until now (since the FE still hasn’t been changed to emit `nsz`) using minnum/maxnum has the inherent flag. All of it will regress once we change the X86 implementation to reflect the new definition, which is unacceptable from my point of view.

In fact, what we need is an inverse flag like `hsz` for them. An alternative intrinsic like minnum_ieee could achieve the same goal just as well.

So, quite the contrary: changing the definition doesn’t buy any benefit, but causes a lot of harm.

A correct value from a local intrinsic is meaningless if the global path is indeterminate. Just like constant folding, exceptions can short-circuit the flow graph around minnum/maxnum intrinsics too. Determinism can only be guaranteed when both are determinate.

MAXPD does not implement any variant of maxnum. It implements a compare-and-select, so it’s not accurate to say it doesn’t respect this behavior; it doesn’t have this operation at all.

Of course you could write it that way, but we want to optimally handle the existing code in the wild that uses maxnum.

The bitcode compatibility promise is that old bitcode should remain functionally correct under newer bitcode readers. This is merely a performance regression for old bitcode, which is fine. Preserving the performance of old bitcode should be a non-goal.

No. Operations should be in the most conservative possible form, with relaxation annotations on them. You cannot have a fast-math flag that introduces new constraints.

MAXPD is the cornerstone for implementing every maxnum/maximum/maximumnum. There are no alternative instructions (except the very recent VMINMAXPD, which isn’t supported by any products on the market) to implement them. So it’s accurate to conclude that signed-zero handling is not respected on X86.

This is confused. What, in particular, does “the existing code” refer to?

  • The C/Rust code? Compared with the nsz + minimum/minimumnum proposals, isn’t a new minnum_ieee the least invasive solution?
  • LLVM bitcode? Are you saying performance is not important to others, but only to you?
  • Anything else?

It is not maxnum, nor is it claiming to be. The fact that you may use it to implement the other operations doesn’t change anything. It simply means you can make use of nsz in the lowering.

All of the code in the world using fmin/fmax or whatever the equivalent in the source language is.

As for bitcode auto-upgrade, there should be no expectation of optimal performance on auto-upgrade. If you want optimal performance, compile every component with the latest compiler. We’ve never declined an IR change simply because it would regress the performance of auto-upgraded legacy bitcode.