Pre-RFC: Add explicitly-named numeric conversion APIs

SimonSapin · 2019-11-29T13:31:34.492Z

Feature Name: numeric-conversions
Start Date: 2019-10-30
RFC PR: rust-lang/rfcs#0000
Rust Issue: rust-lang/rust#0000

Summary

Add explicitly-named standard library APIs for conversion between primitive number types with various semantics: truncating, saturating, rounding, etc.

This RFC does not attempt to define general-purpose traits that are intended to be implemented by non-primitive types, or to support code that wants to be generic over number types.

Motivation

Status quo as of Rust 1.39

The as keyword allows converting between any two of Rust’s primitive number types:

u8
u16
u32
u64
u128
i8
i16
i32
i64
i128
usize
isize
f32
f64

However the semantics of that conversion vary based on the combination of input and output type. The Rustonomicon documents:

casting between two integers of the same size (e.g. i32 -> u32) is a no-op

casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate

casting from a smaller integer to a larger integer (e.g. u8 -> u32) will

zero-extend if the source is unsigned

sign-extend if the source is signed

casting from a float to an integer will round the float towards zero

NOTE: currently this will cause Undefined Behavior if the rounded value cannot be represented by the target integer type. This includes Inf and NaN. This is a bug and will be fixed.

casting from an integer to float will produce the floating point representation of the integer, rounded if necessary (rounding to nearest, ties to even)

casting from an f32 to an f64 is perfect and lossless

casting from an f64 to an f32 will produce the closest possible value (rounding to nearest, ties to even)

(Note: the proposed fix for the float to integer case is to make the conversion saturating.)
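The rules above can be observed with a few in-range casts. This is only an illustrative sketch: the function name `as_cast_examples` is made up for this example, and out-of-range float-to-integer casts are deliberately avoided because their behavior is the open question noted above.

```rust
/// Returns the results of a handful of in-range `as` casts, one per rule above.
/// (`as_cast_examples` is a hypothetical helper, not part of any proposal.)
pub fn as_cast_examples() -> (u32, u8, u32, i32, i32, f32) {
    let same_size = -1_i32 as u32; // same size: bits are reinterpreted, gives u32::MAX
    let truncated = 0x1234_u32 as u8; // larger to smaller: keeps the low 8 bits, 0x34
    let zero_extended = 200_u8 as u32; // unsigned source: zero-extended, stays 200
    let sign_extended = -2_i8 as i32; // signed source: sign-extended, stays -2
    let toward_zero = -1.9_f64 as i32; // float to integer: rounds toward zero, -1
    let nearest = 16_777_217_i32 as f32; // 2^24 + 1 is a tie and rounds to even: 2^24
    (same_size, truncated, zero_extended, sign_extended, toward_zero, nearest)
}
```

Note that nothing at the call sites distinguishes these six behaviors; they all follow from the operand types alone.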

Additionally, the general-purpose From trait (and therefore TryFrom through the blanket impl<T, U> TryFrom<U> for T where U: Into<T>) is implemented in cases where the conversion is exact: when every value of the input type is converted to a distinct value of the output type that represents exactly the same real number.

The TryFrom trait is also implemented for the remaining combinations of integer types, returning an error when the input value is outside of the MIN..=MAX range supported by the output type. For this purpose usize and isize are conservatively considered to be potentially any size of at least 16 bits, to avoid having non-portable From impls that only exist on some platforms.

The table below exhaustively lists those impls, with F indicating a From impl and TF indicating (only) TryFrom. Rows are input types, columns outputs.

↬	u8	u16	u32	u64	u128	i8	i16	i32	i64	i128	usize	isize	f32	f64
u8	F	F	F	F	F	TF	F	F	F	F	F	F	F	F
u16	TF	F	F	F	F	TF	TF	F	F	F	F	TF	F	F
u32	TF	TF	F	F	F	TF	TF	TF	F	F	TF	TF		F
u64	TF	TF	TF	F	F	TF	TF	TF	TF	F	TF	TF
u128	TF	TF	TF	TF	F	TF	TF	TF	TF	TF	TF	TF
i8	TF	TF	TF	TF	TF	F	F	F	F	F	TF	F	F	F
i16	TF	TF	TF	TF	TF	TF	F	F	F	F	TF	F	F	F
i32	TF	TF	TF	TF	TF	TF	TF	F	F	F	TF	TF		F
i64	TF	TF	TF	TF	TF	TF	TF	TF	F	F	TF	TF
i128	TF	TF	TF	TF	TF	TF	TF	TF	TF	F	TF	TF
usize	TF	TF	TF	TF	TF	TF	TF	TF	TF	TF	F	TF
isize	TF	TF	TF	TF	TF	TF	TF	TF	TF	TF	TF	F
f32													F	F
f64														F
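As a small sketch of how the table reads in practice: an F cell means `From` can be used directly, while a TF cell requires handling a possible error. The helper names `widen`, `narrow` and `to_index` are invented for this example.

```rust
use std::convert::TryFrom;

/// u8 -> u32 is an "F" cell: the conversion is exact and cannot fail.
pub fn widen(x: u8) -> u32 {
    u32::from(x)
}

/// u16 -> u8 is a "TF" cell: values above u8::MAX produce an error.
pub fn narrow(x: u16) -> Option<u8> {
    u8::try_from(x).ok()
}

/// u32 -> usize is also "TF", because usize may be as small as 16 bits.
pub fn to_index(x: u32) -> Option<usize> {
    usize::try_from(x).ok()
}
```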

Preferring explicit semantics

When looking at code with a $expr as $ty cast expression, the semantics of the conversion are often not obvious to human readers. Deducing the type of the input expression usually requires looking at other parts of the code, possibly distant ones. In some cases it’s even possible to make the compiler infer the output type, with syntax like foo() as _.

It’s also possible for those types to change when a possibly-distant part of the code is modified. A cast that was previously exact could suddenly have truncation semantics, which might be incorrect for a given algorithm.

To avoid this, it’s preferable to use for example an explicit u32::from(foo) call instead of casting with as. In fact Clippy has a lint for exactly this (though a silent by default one).

In some other cases however, truncation or some other conversion semantics might be the desired behavior. Communicating that intent to human readers is just as useful then as it would be with a from call.

(Not yet) deprecating the `as` keyword

Because of the ambiguity described above, deprecating as casts entirely has been discussed before.

Providing an alternative with something like this RFC would be a prerequisite, but this RFC is not proposing such a deprecation.

Guide-level explanation

For the purpose of conversion semantics, Rust has two kinds of primitive number types: floating point and integer. This makes four combinations of input and output kind.

For a given conversion let’s call:

I the input type
i the input value
O the output type
o the output value, the result of the conversion: let o: O = convert(i: I);

Exact conversions

For combinations of primitive number types where they are implemented, the general-purpose convert::Into and convert::From traits offer exact conversion: o always represents the exact same real number as i.

The I::into(self) -> O method and O::from(I) -> Self constructor are available without importing the corresponding trait explicitly, since the traits are in the prelude.

Integer to integer

For all combinations of primitive integer types I and O, the standard library additionally provides:

The I::try_into<O>(self) -> Result<O, E> method and O::try_from<I>(I) -> Result<Self, E> constructor for fallible conversion.

These are inherent methods of primitive integers that delegate to the general-purpose convert::TryInto and convert::TryFrom traits. Although these traits are not in the prelude, they do not need to be in scope for the inherent methods to be called.

This returns an error when i is outside of the range that O can represent. The error type E is either convert::Infallible (where a From is also implemented) or num::TryFromIntError.
The ~~I::modulo_to<O>(self) -> O~~ I::wrapping_to<O>(self) -> O method for wrapping conversion, also known as bit-truncating conversion.

In terms of arithmetic, o is the only value that O can represent such that o = i + k×2ⁿ where k is an integer and n is the number of bits of O.

In terms of memory representation, this takes the n lower bits of the input value. The upper bits are truncated off. This is in a sense the opposite of float-to-integer truncation, where the less-significant fractional part is truncated off.

For example, 0xCAFE_u16 maps to 0xFE_u8, and 130_u32 to -126_i8.

Note: This is the behavior of the as operator.
The I::saturating_to<O>(self) -> O method for saturating conversion. o is the value arithmetically closest to i that O can represent. This is O::MIN or O::MAX for underflow or overflow respectively.
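The two integer-to-integer semantics above can be approximated today with stable APIs. The following is a sketch only: the free functions are hypothetical stand-ins for the proposed generic methods, fixed to one pair of types each, and `wrapping_u32_to_i8_arith` exists just to check the arithmetic definition o = i + k×2ⁿ against the bit-truncating one.

```rust
use std::convert::TryFrom;

/// Stand-in for the proposed `u32::wrapping_to::<i8>()`: keep the low 8 bits.
pub fn wrapping_u32_to_i8(i: u32) -> i8 {
    i as i8
}

/// Same conversion from the arithmetic definition: the unique o in
/// i8::MIN..=i8::MAX such that o = i + k * 2^8 for some integer k.
pub fn wrapping_u32_to_i8_arith(i: u32) -> i8 {
    let r = i64::from(i).rem_euclid(256); // residue in 0..=255
    let o = if r > 127 { r - 256 } else { r }; // shift into -128..=127
    i8::try_from(o).expect("o is within i8 range by construction")
}

/// Stand-in for the proposed `i32::saturating_to::<u8>()`: clamp to the
/// nearest representable value.
pub fn saturating_i32_to_u8(i: i32) -> u8 {
    u8::try_from(i).unwrap_or(if i < 0 { u8::MIN } else { u8::MAX })
}
```

For instance, `wrapping_u32_to_i8(130)` gives -126 as in the example above, while `saturating_i32_to_u8(300)` gives 255.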

Float to float, integer to float

For all combinations of primitive number types I (floating point or integer) and primitive floating point type O, the standard library additionally provides:

~~I::round_to<O>(self) -> O~~ I::approx_to<O>(self) -> O for approximate conversion.

o is the value arithmetically closest to i that O can represent. Overflow produces infinity of the same sign as i.

For floating point I, rounding may happen due to precision loss through fewer mantissa bits. For integer I, rounding may happen for large values (positive or negative).

Rounding is according to roundTiesToEven mode as defined in IEEE 754-2008 §4.3.1: pick the nearest floating point number, preferring the one with an even least significant digit if exactly halfway between two floating point numbers.

Note: This is the behavior of the as operator.
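Since this matches what as does today, the approximate conversion can be sketched with plain casts. The function names below are hypothetical, type-specific stand-ins for the proposed generic `approx_to` method.

```rust
/// Stand-in for the proposed `f64::approx_to::<f32>()`.
pub fn approx_f64_to_f32(i: f64) -> f32 {
    i as f32 // nearest f32, ties to even; overflow gives an infinity
}

/// Stand-in for the proposed `u32::approx_to::<f32>()`.
pub fn approx_u32_to_f32(i: u32) -> f32 {
    i as f32 // exact up to 2^24, rounded to nearest (ties to even) beyond
}
```

With these, `approx_u32_to_f32(16_777_217)` (2^24 + 1, a tie) rounds down to 16_777_216.0, `approx_u32_to_f32(16_777_219)` rounds up to 16_777_220.0, and `approx_f64_to_f32(1e40)` overflows to positive infinity.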

Float to integer

For all combinations of primitive floating point type I and primitive integer type O, the standard library additionally provides:

I::saturating_to<O>(self) -> O for saturating truncating conversion.

The fractional part of i is truncated off in order to keep the integral part. That is, the value is rounded towards zero.

Underflow maps to O::MIN. Overflow maps to O::MAX. NaN maps to zero.

Note: this may become the behavior of the as operator in a future Rust version.
I::unchecked_to<O>(self) -> O for unsafe truncating conversion.

The fractional part of i is truncated off in order to keep the integral part. That is, the value is rounded towards zero.

This method is an unsafe fn. It has Undefined Behavior if i is infinite, is NaN, or cannot be represented exactly in O after truncation.

Note: This is the behavior of the as operator as of Rust 1.39, even though it can be used outside of any unsafe block or function.
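A minimal sketch of the saturating truncating semantics described above, written so that the only cast performed is on a value already known to be in range. The function name is hypothetical and fixed to f64 and i32 for brevity.

```rust
/// Stand-in for the proposed `f64::saturating_to::<i32>()`.
pub fn saturating_f64_to_i32(i: f64) -> i32 {
    if i.is_nan() {
        0 // NaN maps to zero
    } else if i <= i32::MIN as f64 {
        i32::MIN // underflow (this also covers negative infinity)
    } else if i >= i32::MAX as f64 {
        i32::MAX // overflow (this also covers positive infinity)
    } else {
        // Strictly between MIN and MAX: truncation stays in range,
        // so the cast is exact.
        i.trunc() as i32
    }
}
```

Both bounds are exactly representable in f64, which is what makes the comparisons safe here; for a 64-bit or 128-bit output type the bounds would need more care.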

Reference-level explanation

Everything discussed in this RFC is defined in the core crate and reexported in the std crate.

Exact, fallible, and unsafe truncating conversions described above already exist in the standard library. FIXME: this assumes PR #66852 and PR #66841 are accepted and have landed.

Inherent methods are added that delegate calls to the corresponding trait method. They are generic to support multiple return types. Some of these impls are macro-generated, to reduce source code duplication:

impl $Int {
    // Added in https://github.com/rust-lang/rust/pull/66852
    pub fn try_from<T>(value: T) -> Result<Self, Self::Error>
        where Self: TryFrom<T> { /* … */}
    pub fn try_into<T>(self) -> Result<T, Self::Error>
        where Self: TryInto<T> { /* … */}

    pub fn wrapping_to<T>(self) -> T where Self: IntToInt<T> { /* … */}
    pub fn saturating_to<T>(self) -> T where Self: IntToInt<T> { /* … */}

    pub fn approx_to<T>(self) -> T where Self: IntToFloat<T> { /* … */}
}

impl $Float {
    pub fn approx_to<T>(self) -> T where Self: FloatToFloat<T> { /* … */}

    pub fn saturating_to<T>(self) -> T where Self: FloatToInt<T> { /* … */}

    // Added in https://github.com/rust-lang/rust/pull/66841
    pub unsafe fn unchecked_to<T>(self) -> T where Self: FloatToInt<T> { /* … */}
}

Four supporting traits are added to the convert module:

mod private {
    pub trait Sealed {}
}

pub trait IntToInt<T>: self::private::Sealed {
    // Supporting methods…
}

pub trait IntToFloat<T>: self::private::Sealed {
    // Supporting methods…
}

pub trait FloatToFloat<T>: self::private::Sealed {
    // Supporting methods…
}

pub trait FloatToInt<T>: self::private::Sealed {
    // Supporting methods…
}

Each trait has methods with the same signatures as inherent methods that delegate calls to them.

The sealed trait pattern is used to prevent impls outside of the standard library. This will allow adding more methods after the traits are stabilized. See Future possibilities below.

The traits are implemented for all relevant combinations of types. Again, some of these impls are macro-generated:

impl IntToInt<$OutputInt> for $InputInt { /* … */ }

impl IntToFloat<$OutputFloat> for $InputInt { /* … */ }

impl FloatToFloat<$OutputFloat> for $InputFloat { /* … */ }

impl FloatToInt<$OutputInt> for $InputFloat { /* … */ }
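To make the shape concrete, here is a compilable sketch of the sealed-trait arrangement for a single input type. It is an assumption-laden illustration, not the proposed implementation: inherent methods cannot be added to primitives outside the standard library, so generic free functions (`wrapping_to`, `saturating_to`) stand in for them, and the trait method names are guesses at what the "supporting methods" might be.

```rust
mod private {
    // Public trait in a private module: other crates cannot name it,
    // so they cannot implement the traits that require it.
    pub trait Sealed {}
}

/// Supporting trait, mirroring the RFC's `IntToInt<T>` (method names assumed).
pub trait IntToInt<T>: private::Sealed {
    fn wrapping_to(self) -> T;
    fn saturating_to(self) -> T;
}

impl private::Sealed for u32 {}

impl IntToInt<u8> for u32 {
    fn wrapping_to(self) -> u8 {
        self as u8
    }
    fn saturating_to(self) -> u8 {
        if self > u32::from(u8::MAX) { u8::MAX } else { self as u8 }
    }
}

impl IntToInt<i8> for u32 {
    fn wrapping_to(self) -> i8 {
        self as i8
    }
    fn saturating_to(self) -> i8 {
        if self > i8::MAX as u32 { i8::MAX } else { self as i8 }
    }
}

/// Hypothetical stand-in for the inherent `u32::wrapping_to::<T>()`.
pub fn wrapping_to<T>(i: u32) -> T
where
    u32: IntToInt<T>,
{
    <u32 as IntToInt<T>>::wrapping_to(i)
}

/// Hypothetical stand-in for the inherent `u32::saturating_to::<T>()`.
pub fn saturating_to<T>(i: u32) -> T
where
    u32: IntToInt<T>,
{
    <u32 as IntToInt<T>>::saturating_to(i)
}
```

A call such as `wrapping_to::<i8>(130)` then selects the output type with a turbofish, much as the proposed inherent methods would.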

Drawbacks

This adds a significant number of items to libcore. However primitive number types already have numerous inherent methods and trait methods, so this isn’t unprecedented.

If the as keyword is never deprecated or until it is, we would in many cases have two ways of doing the same thing.

Rationale and alternatives

The “shape” of the API could be different. Namely, instead of inherent methods that delegate to supporting traits we could have:

Plain trait methods, with traits that need to be imported into scope. This is less convenient for users.
Plain trait methods, with traits in the prelude. The bar is generally high to add anything to the prelude.
Non-generic inherent methods that include the name of the return type in their name: wrapping_to_u8, wrapping_to_i8, wrapping_to_u16, … This causes a multiplicative explosion of the number of new items.

This RFC however makes no active attempt at supporting callers who are themselves generic to support multiple number types. Traits are only used as a way to avoid multiplicative explosion.

This RFC proposes adding multiple conversion methods with various semantics even for combinations of types where they are “useless” because the conversion is always exact. For example, u8::wrapping_to<i32> and u8::saturating_to<i32> both behave the same as <u8 as Into<i32>>::into. This avoids the question of what to do about the portability of impls for usize and isize.

In the case of float to float conversion specifically, I = f64 and O = f32 is the only combination that is really useful. We could have only f64::approx_to(self) -> f32 instead of generic methods with a trait. Keeping a trait anyway makes this more consistent with the other kinds of conversions, and is compatible with a future addition of new primitive floating point types (f16, f80, …) in case those are ever desired.

Prior art

FIXME

Unresolved questions

FIXME

Future possibilities

This pattern of API is extensible and supports adding more methods with different conversion semantics. For example:

Wrapping approximate floating point to integer conversion that “wraps around” instead of saturating. (But what to do about infinities and NaN?)
Fallible approximate floating point to floating point conversion that returns an error instead of mapping a finite value to infinity
Fallible approximate floating point to integer conversion that returns an error for NaN and for out-of-range values, instead of saturating to MAX or MIN.
Fallible exact conversion that never rounds and returns an error if the input value doesn’t have an exact representation in the output type, for some subset or all of:
- Integer to floating point
- Floating point to integer
- Floating point to floating point

This RFC doesn’t explore which of these (or others) are useful enough to merit adding to the standard library.
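As one illustration of the last item, a fallible exact integer-to-float conversion can be sketched by converting and checking the round trip. The name `exact_i32_to_f32` is hypothetical, and nothing here is proposed by the RFC.

```rust
/// Hypothetical exact conversion: returns the float only if it represents
/// exactly the same number as the input.
pub fn exact_i32_to_f32(i: i32) -> Option<f32> {
    let f = i as f32; // nearest f32, possibly rounded
    // |f| is at most 2^31, so converting back to i64 is always in range
    // and exact; comparing in i64 detects any rounding that took place.
    if f as i64 == i64::from(i) {
        Some(f)
    } else {
        None
    }
}
```

Every i32 of magnitude up to 2^24 converts exactly; beyond that only values whose low bits are zero do, so for example 16_777_217 and i32::MAX are rejected while i32::MIN is accepted.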

SimonSapin · 2019-11-29T13:32:16.153Z

Any feedback or help with the prior art or unresolved questions sections is appreciated!

Shnatsel · 2019-11-29T13:59:51.352Z

SimonSapin:

In fact Clippy has a lint for exactly this (though a silent by default one).

I am surprised that it is not enabled by default. I am sure I have encountered this lint without passing --pedantic or any other flags, and found its output helpful.

https://github.com/rust-lang/rust-clippy/tree/master/clippy_lints/src does not list this lint, so I cannot trace its history directly. I could not find any relevant issues on the bug tracker either.

I would also be glad to see Clippy lints suggesting the explicit methods in place of as, if this RFC is accepted.

Shnatsel · 2019-11-29T14:20:44.763Z

I find the behavior of NaN being silently turned into 0 on float-to-int conversion very surprising. It feels like a major footgun. I suggest making this method fallible, just like try_into(), and returning an error on NaN and possibly +/-inf.

Looking through the original issue, it seems this has been already proposed and discussed in detail. It seems that the CPUs allow efficient detection of inf and NaN conversions to integer, too. It was not an option for the as keyword since that must be infallible, but it sounds like a great option for this API.

Bikeshedding ahead: It is also surprising to have modulo_to produce a negative value - anything called "modulo" should return only non-negative numbers in my mind. It is also inconsistent with try_into(). I suggest calling it truncate_into(). Also, use of the word "round" in float-to-float conversions is surprising as it's already reserved for turning a float into an integer - see e.g. the built-in round() function.

Other than that, looks very good to me! I would be glad to see this in the language.

phaylon · 2019-11-29T14:23:19.764Z

Shnatsel:

https://github.com/rust-lang/rust-clippy/tree/master/clippy_lints/src does not list this lint, so I cannot trace its history directly. I could not find any relevant issues on the bug tracker either.

It's in the types submodule: Link.

Shnatsel · 2019-11-29T14:25:57.485Z

Thanks for the link! I've found the commit that made it pedantic, I'll go dispute it.

XAMPPRocky · 2019-11-29T14:39:26.793Z

I really like this change, however I wouldn't be happy with the implementation proposed. I would love to be able to use these functions in a generic context just like I can use From/Into and TryFrom/TryInto. I think it would be better to have a generic like SaturatingTo<usize> rather than having to have two different methods for when it's IntToInt and FloatToInt. It would help reduce a lot of boilerplate that currently is only solved by writing macros, and personally I would like to be able to move away from writing macros for writing numeric generic code and towards using traits.

I also don't think these methods should be behind sealed traits as there are plenty of numeric types in the library space such as bignum that should also be able to implement these traits so they can feel and be used more interchangeably with the numeric types in the language.

Nokel81 · 2019-11-29T14:53:17.462Z

I can get behind the NaN and +/-inf returning an error. Even quite explicit ones,

enum RoundError {
    Nan,
    InfinityPositive,
    InfinityNegative
}

impl f64 {
    type Output = u32;

    fn try_round_to<Output>(self) -> Result<Output, RoundError>;
}

Or something similar (since this RFC doesn't recommend traits, though I think that it should)
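For comparison, here is a compilable variant of that idea as a free function. It is a sketch under several assumptions: it truncates rather than rounds, it is fixed to f64 and u32, the function name is made up, and the `OutOfRange` variant is an addition of this example to cover finite values that do not fit.

```rust
#[derive(Debug, PartialEq)]
pub enum RoundError {
    Nan,
    InfinityPositive,
    InfinityNegative,
    /// Not in the snippet above: a finite value outside the u32 range.
    OutOfRange,
}

/// Hypothetical fallible truncating conversion from f64 to u32.
pub fn try_trunc_f64_to_u32(i: f64) -> Result<u32, RoundError> {
    if i.is_nan() {
        return Err(RoundError::Nan);
    }
    if i == f64::INFINITY {
        return Err(RoundError::InfinityPositive);
    }
    if i == f64::NEG_INFINITY {
        return Err(RoundError::InfinityNegative);
    }
    let t = i.trunc(); // drop the fractional part first
    if t < 0.0 || t > f64::from(u32::MAX) {
        return Err(RoundError::OutOfRange);
    }
    Ok(t as u32) // in range, so the cast is exact
}
```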

pcpthm · 2019-11-29T15:08:34.939Z

I agree that as should be replaced by more explicit methods. Below are just nits.

For combinations of primitive number types where they are implemented, the general-purpose convert::Into and convert::From traits offer exact conversion: o always represents the exact same real number as i.

Floating-point types are not subsets of the real numbers: they also include +/- Infinity and NaN. Also, there is -0. Does converting -0f32 to f64 preserve -0? Can it become 0? Is the bit pattern of a NaN preserved?

The I::modulo_to<O>(self) -> O method for modulo conversion, also known as bit-truncating conversion.

I find this naming confusing as modulo_to seems like a binary operation. I propose I::bit_truncate_to.

Write that this method is implemented if and only if the number of bits of O is less than the number of bits of I.

The I::saturating_to<O>(self) -> O method for saturating conversion. o is the value arithmetically closest to i that O can represent. This is O::MIN or O::MAX for underflow or overflow respectively.

The condition of the method is the same as above.

"Arithmetically close" is not a usual term and just "close" is enough I think. Also, the term "underflow" has a different meaning when used for floating-point numbers. Alternative terms are "negative overflow" and "positive overflow".

Rounding is according to roundTiesToEven mode as defined in IEEE 754-2008 §4.3.1: pick the nearest floating point number, preferring the one with an even least significant digit if exactly halfway between two floating point numbers.

In my opinion, the rounding mode should be more explicit. I propose to name it I::round_ties_to_even and preferably provide other rounding modes.

SimonSapin · 2019-11-29T15:15:45.389Z

I’d rather not go into the merits of mapping NaN to zero. Please consider that part as a placeholder for “whatever as will do after https://github.com/rust-lang/rust/issues/10184 is fixed”. If you’d prefer as to have some other behavior, please propose it in that thread.

It’s quite possible that I’m using the word modulo incorrectly. I see that we already have APIs like wrapping_add, so wrapping_to is probably a better name.

Regarding to v.s. into I’ll refer to API naming guidelines. The Into trait has this name because it is general-purpose and also used with some input types that are !Copy. Primitive number types however are Copy, and taking self by value does not transfer ownership.

Would approximate_to or approx_to sound better than round_to? “Truncate” seems incorrect in that case since the approximation is to the nearest, not towards zero.

SimonSapin · 2019-11-29T15:15:56.521Z

The standard library already makes it a non-goal to support code generic over number types. Before 1.0 it had Int and Float traits that were moved to the num crate. I feel changing that should come before supporting conversions in generic-number code. And it’s stuff for another RFC.

I also (and this is for the unwritten Prior Art section) wanted to depart from the FromLossy / TryFromLossy RFC which proposes general-purpose traits. IMO a big problem of that RFC is that it’s hard to give a single useful definition of “lossy” when there can be such a variety of conversion semantics for conversions between two given types.

Not supporting something like bignum is also deliberate: details of those precise semantics might not apply, and general-purpose is hard. A library is free to define its own APIs (including extension traits) for interacting with primitive number types.

SimonSapin · 2019-11-29T15:21:48.103Z

Nokel81:

I can get behind the NaN and +/-inf returning an error.

That sounds like fallible rounding conversion discussed in Future possibilities.

pcpthm:

Floating-point types are not subsets of the real numbers: they also include +/- Infinity and NaN. Also, there is -0. Does converting -0f32 to f64 preserve -0? Can it become 0? Is the bit pattern of a NaN preserved?

This is an attempt at describing the existing behavior of as. The Nomicon only has this to say: “casting from an f32 to an f64 is perfect and lossless”

pcpthm:

Write that this method is implemented if and only if the number of bits of O is less than the number of bits of I.

That’s not what this RFC proposes. The Alternatives section mentions it.

pcpthm · 2019-11-29T15:25:46.229Z

SimonSapin:

That’s not what this RFC proposes. The Alternatives section mentions it.

Sorry, I didn't read that section. ~~But then~~

In terms of arithmetic, o is the only value that O can represent such that o = i + k×2ⁿ where k is an integer and n is the number of bits of O .

~~is not well defined.~~

Edit: I was wrong.

SimonSapin · 2019-11-29T15:27:59.860Z

pcpthm:

is not well defined.

Could you say more about what’s wrong with it?

Shnatsel · 2019-11-29T15:33:50.942Z

I see now that fallible float-to-int conversion is listed under future possibilities, and we probably do need an explicit method that matches the behavior of as, so I retract my objection. I still feel that fallible float-to-int conversion is important and essential enough to pursue in tandem with the other methods described here, but it is not a blocker for this RFC.

approx_to does indeed sound better than round_to.

XAMPPRocky · 2019-11-29T15:39:23.292Z

SimonSapin:

The standard library already makes it a non-goal to support code generic over number types. Before 1.0 it had Int and Float traits that were moved to the num crate. I feel changing that should come before supporting conversions in generic-number code. And it’s stuff for another RFC.

I feel like that argument is moving the goal posts of what I proposed. What I proposed would be no more generic or general purpose than TryFrom<usize> or using any of the ops traits both of which are already in the standard library.

I also don't accept the argument that supporting generic numeric code or interoperability with library numerics is a non-goal when this is not documented anywhere and was not decided in an open process.

SimonSapin:

Not supporting something like bignum is also deliberate: details of those precise semantics might not apply, and general-purpose is hard. A library is free to define its own APIs (including extension traits) for interacting with primitive number types.

To me this is an argument against including this API in the standard library. General purpose is hard, which is what this API is, even if it's the exact semantics of the as operator. So wouldn't it make sense for this API to also belong in num and not in std?

I don't see why this would be the special case. Yes, using the as operator is a pain, but so is a lot of how the language special-cases numerics, and so is the incompleteness of operators having equivalent traits. Why is this one different?

robinm · 2019-11-29T16:15:22.489Z

Maybe I missed it, but I think that the conversions from float to int/float towards positive and negative infinity, as well as toward zero are missing. From what I saw, you only propose rounding toward nearest.

SimonSapin · 2019-11-29T16:38:11.065Z

So in general this RFC does not immediately propose adding every kind of conversion there is, mostly those that are done with as today. And make it so that more can be added later.

For those specifically, can they be implemented any more efficiently than chaining with .ceil(), .floor(), or .trunc()? If not, do you still think it’s worth having dedicated APIs?

tspiteri · 2019-11-29T17:23:43.657Z

I did some experimenting in this area and wrote the az crate, which uses traits.

You can write i.az::<O>() or cast::<I, O>(i) instead of i as O, with overflow panicking with debug_assertions or wrapping otherwise.
There are other traits/casts for checked, saturating, wrapping and overflowing conversions, e.g. i.checked_as::<O>().

For floating-point to integer conversions I chose truncation by default, but rounding is supported via a Round wrapper, so that

assert_eq!(0.6f32.az::<i32>(), 0);
assert_eq!(Round(0.6f32).az::<i32>(), 1);
// Ties round to even.
assert_eq!(Round(0.5f32).az::<i32>(), 0);
assert_eq!(Round(1.5f32).az::<i32>(), 2);

I also wrote conversions into e.g Wrapping<i32>, which does wrapping.
For floating-point to integer conversions I used bitwise manipulation instead of intrinsic conversions as it wasn't clear what the guarantees were; this is probably much slower but every behavior is defined. And I was more interested in the API at this point.

robinm · 2019-11-29T17:32:34.616Z

I may be missing something, but for me, .ceil(), .floor() and .trunc() don't cover the same need.

// example with positive numbers
// 10.2 and 10.7 can round to 10 or 11 but not all functions give the same results

assert(approx_to<i32>(10.2_f64) == 10);
assert(approx_toward_infinity<i32>(10.2_f64) == 11);
assert(approx_toward_negative_infinity<i32>(10.2_f64) == 10);
assert(approx_toward_zero<i32>(10.2_f64) == 10);

assert(approx_to<i32>(10.7_f64) == 11); // different rounding
assert(approx_toward_infinity<i32>(10.7_f64) == 11);
assert(approx_toward_negative_infinity<i32>(10.7_f64) == 10);
assert(approx_toward_zero<i32>(10.7_f64) == 10);

// example with negative numbers
// likewise -10.2 and -10.7 can round to -10 or -11 but not all functions give the
// same results (except for the sign) as with positive numbers

assert(approx_to<i32>(-10.2_f64) == -10);
assert(approx_toward_infinity<i32>(-10.2_f64) == -10); // different rounding than with +10.2
assert(approx_toward_negative_infinity<i32>(-10.2_f64) == -11); // different rounding than with +10.2
assert(approx_toward_zero<i32>(-10.2_f64) == -10);

assert(approx_to<i32>(-10.7_f64) == -11);
assert(approx_toward_infinity<i32>(-10.7_f64) == -10);
assert(approx_toward_negative_infinity<i32>(-10.7_f64) == -11);
assert(approx_toward_zero<i32>(-10.7_f64) == -10);

Chaining functions may work for floats into integers, but I don't think it is possible to do the same when converting f64 into f32 (or any big float to a smaller float). Note that I only used integers in the above example, because it's easier to see what the result will be when rounding compared to the same operation with floats.

Also ceil() is like approx_toward_infinity, but only for positive numbers and it doesn't work for f64 to f32. If you want approx_toward_positive_infinity for negative numbers, you need to use floor() with the same limitation.
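For the float-to-integer half of this discussion, the chaining approach can be sketched with methods that exist on f64 today. The helper name `directed_roundings` is made up, the inputs are assumed to be well within i32 range, and f64::round breaks ties away from zero rather than to even, so it only approximates the nearest mode. This does nothing for directed rounding from f64 to f32, which is the case raised above.

```rust
/// Returns (nearest, toward +infinity, toward -infinity, toward zero)
/// for a value assumed to be well within i32 range.
pub fn directed_roundings(x: f64) -> (i32, i32, i32, i32) {
    (
        x.round() as i32, // nearest; ties go away from zero, not to even
        x.ceil() as i32,  // toward positive infinity
        x.floor() as i32, // toward negative infinity
        x.trunc() as i32, // toward zero
    )
}
```

For the values used above, `directed_roundings(10.2)` gives (10, 11, 10, 10) and `directed_roundings(-10.7)` gives (-11, -10, -11, -10).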