Proposal: Better addition semantics #7416

Open
SpexGuy opened this issue Dec 12, 2020 · 7 comments

SpexGuy commented Dec 12, 2020

As I understand it, the binary operators + and - currently behave as follows:

  1. use peer type resolution between the two operands to compute the intermediate type
  2. coerce both operands to the intermediate type
  3. perform the operation, with overflow being checked illegal behavior
  4. coerce the result from the intermediate type to the result type, if applicable.

Peer type resolution only supports certain combinations though, so there are valid computations that are not expressible with these rules. For example: say I have a u32 with value 0xFFFFFFFF, and I want to add to that the i32 -4, and store the result in a u32. This addition is valid, but there is currently no way to do this without using +% and completely giving up overflow checks. To make this compile with overflow checks, you either need to cast the u32 to an i32 (trips overflow check), cast the i32 to a u32 (trips overflow check), or use i33 (unnecessary perf hit in release). This is a problem that needs to be fixed.
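
For concreteness, here is roughly what those workarounds look like today. This is my sketch; the function and variable names are made up for illustration, using the two-argument builtin spellings current at the time:

fn addExample(a: u32, b: i32) u32 {
    // Workaround 1: cast the u32 to i32. Traps, since 0xFFFFFFFF > maxInt(i32):
    //   return @intCast(u32, @intCast(i32, a) + b);
    // Workaround 2: cast the i32 to u32. Traps, since -4 is negative:
    //   return a + @intCast(u32, b);
    // Workaround 3: widen both to i33. Correct, but pays for the extra bit in release-fast:
    return @intCast(u32, @intCast(i33, a) + b);
}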


Examples of things that I think should and shouldn't work:

// helper function to make values runtime-known
fn as(comptime T: type, val: T) T { return val; }

// Each statement below stands on its own; x is redeclared for brevity.
export fn tests() void {
    var x = as(u16, 32) + as(i16, -10); // compile error, no type for x
    var x = as(u16, 32) + as(u16, 10); // resolves to unsigned
    var x = as(i16, 32) + as(i16, -39); // resolves to signed
    var x: u16 = as(u16, 32) + as(i16, -10); // should work, does not overflow
    var x: u16 = as(u16, 32) + as(i16, -45) + as(i16, 64); // should work, does not overflow if the first + uses a signed intermediate.
    var x: u16 = as(u16, 65530) + as(i16, -45); // should work, does not overflow with an unsigned intermediate.
    var x: u16 = as(u16, 65530) + as(u16, 20); // UB, result overflows.
    var x: u16 = as(u16, 65530) + as(u16, 20) + as(i16, -100); // UB, intermediate overflows.
    var x: u16 = as(i16, 32750) + as(i16, 32750); // should work, does not overflow unsigned intermediate.
    var x: i16 = as(i16, -32750) + as(u16, 200); // should work, does not overflow signed intermediate
    var x: i16 = as(i16, -32750) + as(u16, 65530) + as(i16, -32750); // should work, does not overflow unsigned intermediate
    var x: i16 = as(i16, -32750) - as(u16, 200) + as(u16, 1000); // UB, intermediate overflows

    var x: u32 = (as(u16, 32) + as(i16, -10)) * 4; // compile error, not known if lhs of multiply is signed
    _ = @TypeOf(as(u16, 32) + as(i16, -10)); // compile error, signedness not known
}

To solve this, I'll introduce the concept of a "hybrid integer", which is simultaneously signed and unsigned. Before I do, though, I want to make it clear that this type is not part of the type system; it's an internal type used only by the compiler. Hybrid integers have a bit width and a "bias", which can be signed, unsigned, or indeterminate. If the type of a hybrid integer is observed through the type system, it is implicitly cast to its bias type. If this cast overflows, that is checked UB. If the bias is indeterminate, it causes a compile error indicating that peer type resolution cannot decide between unsigned and signed.

Without loss of generality, I'll use 16 bit integers throughout this proposal. Replace 16 with any other integer size and it will still work.

The range of a hybrid integer h16 extends from the minimum value of i16 to the maximum value of u16. Attempting to encode a value outside this range is checked UB. h16s are the result of peer type resolution between integers, as follows:

u16, u16 -> h16 with unsigned bias (h16[u])
i16, i16 -> h16 with signed bias (h16[i])
u16, i16 -> h16 with indeterminate bias (h16[?])

The + and - operators are the only operators that are aware of hybrid integers. They do the following type coercions:

u16    + h16[u] -> h16[u]
h16[u] + h16[u] -> h16[u]
i16    + h16[i] -> h16[i]
h16[i] + h16[i] -> h16[i]
all other combinations -> h16[?]
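
To make the bias rules concrete, here is a small standalone model of them. This is my own sketch, not compiler code, and the names are invented. A plain uN operand contributes an unsigned bias and a plain iN a signed bias, so the whole table reduces to "agreement keeps the bias, disagreement yields indeterminate":

const Bias = enum { unsigned, signed, indeterminate };

fn combine(lhs: Bias, rhs: Bias) Bias {
    // Matching biases survive; any mismatch becomes indeterminate, and
    // indeterminate operands stay indeterminate.
    return if (lhs == rhs) lhs else .indeterminate;
}

// combine(.unsigned, .unsigned) == .unsigned      (u16 + u16 -> h16[u])
// combine(.signed, .signed) == .signed            (i16 + i16 -> h16[i])
// combine(.unsigned, .signed) == .indeterminate   (u16 + i16 -> h16[?])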

When the two operands have different bit widths, the smaller one is promoted to the larger bit width, preserving its original signedness. If the smaller operand is a hybrid integer, it is first cast to its bias type, then extended.
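
For example, my reading of that rule, with hypothetical operands:

// as(u8, 200) + as(i16, -100):
//   the u8 is first widened to u16, preserving its unsigned signedness,
//   and u16 + i16 then peer-resolves to h16[?].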

Hybrid integers may coerce to signed or unsigned types of the same or larger bit width, regardless of bias. If the result does not fit, this is checked UB. If the type of a hybrid int is inferred (for example as an input to multiplication, or with var x = ...), it yields the bias type. If the bias type is indeterminate, this causes a compile error stating that peer type resolution does not work between signed and unsigned integers.

The use of UB in this proposal has been carefully designed so that hybrid integers do not need extra bits in release-fast to distinguish between values > max(i16) or < 0. So these semantics give all the safety of using i17 to do your math, without the runtime cost of that extra bit in release-fast.
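
One way to see why no extra bit is needed (my sketch, not part of the proposal): in two's complement, a mixed-signedness add whose true result fits the result type is exactly one wrapping add at the common width, and the checked-UB cases above are precisely the cases where it doesn't fit:

// Computes u32 + i32 with a single 32-bit add; exact whenever the
// mathematical result fits in u32.
fn addMixed(a: u32, b: i32) u32 {
    return a +% @bitCast(u32, b);
}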

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Dec 12, 2020
@andrewrk andrewrk added this to the 0.8.0 milestone Dec 12, 2020

matu3ba commented Dec 17, 2020

@SpexGuy Are you suggesting that we allow implicit casting?

I don't understand the advantage over being forced to explicitly annotate type conversions. You can also do type conversion with an overflow check.


SpexGuy commented Feb 4, 2021

I think you have misunderstood my examples. The compiler is not allowed to propagate literals through the as function I have defined; it returns a runtime-known value, as if it had come from a runtime variable. All of the semantics described in my proposal apply only to runtime-known values. Comptime-known integers are not related to anything proposed here.


lerno commented Feb 4, 2021

@SpexGuy sorry, my bad. I was not reading the examples correctly. Rereading...

And I think this is an interesting idea. The non-biased versions make the rules somewhat complex and opaque, though. I would offer a somewhat more arbitrary solution to this: make the bias follow the leftmost type.


SpexGuy commented Feb 4, 2021

The goal of the biases is to catch cases where you are mixing unsigned and signed inputs and expecting the compiler to infer a result type. I think that should remain a compile error, and require you to explicitly specify the result type you want. The simple way to explain them is this: if all inputs to a tree of +/- are signed, the result is signed. If all inputs are unsigned, the result is unsigned. Otherwise you must specify the result type.

Using the leftmost type is nice in some ways because it simplifies the definition of +=, but I think it would cause unexpected types to appear if you accidentally mix signedness in an expression. It also means a + b != b + a, which is kind of weird.
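
For example, under that leftmost rule (hypothetical operands):

// some_i16 + some_u16 would type as i16,
// some_u16 + some_i16 would type as u16,
// so swapping the operands changes the result type and where it can trap.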


lerno commented Feb 7, 2021

An even simpler idea: make the conversion implicit, with UB on lossy conversion, defaulting to the signed version:

Basically some_i16 + some_u16 => some_i16 + @intCast(i16, some_u16), relying on the @intCast trap/UB.


lerno commented Mar 1, 2021

A variant of this proposal is, I think, simple implicit bit widening and narrowing. I have described this elsewhere.

For example

u16var = some_i16 + some_u16; 
// => 
u16var = @intCast(u16, @intCast(i32, some_i16) + @intCast(i32, some_u16));

One can also choose to simply cast to i17; that might be more appropriate for Zig. The implicit @intCast is only allowed where the maximum bit width of the operands does not exceed the bit width of the target type, so assigning this result to an 8-bit variable would still require an explicit cast. Although this costs an extra bit of information, the resulting type is always well defined, which is a plus when working at compile time. Taking the type of a hybrid is largely undefined, and while that may be ok for add and sub, trying to divide a "hybrid" fails, as udiv and sdiv have different behaviour.
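
A sketch of that i17 variant (my spelling of the casts):

u16var = @intCast(u16, @as(i17, some_i16) + @as(i17, some_u16));
// Both i16 and u16 coerce losslessly into i17. The i17 addition traps if
// the sum leaves i17's range, and the final narrowing traps if the sum is
// negative or exceeds maxInt(u16).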

Having a small hack that only works in certain cases just adds corners to the language and makes the cost much higher.

There is precedent for promoting to a common, wider type: C# uses this strategy, where i32 + u32 results in an i64. However, this does not extend to i64 + u64; there is no corresponding i128 promotion. In Zig this is not a limiting factor, though.

However, it has some other consequences. For example, if we now introduce a special implicit (trapped) narrowing from i17 to i16, how does this mesh with the current state of conversion?

For example if this works:

u16var = some_i16 + some_u16; 

– by inserting an explicit trap on the result being negative.

Then why couldn't this work:

u16var = some_i16 + 0; 

The traps that are built into Zig for addition and subtraction of something like u16 + u16 actually work as if the code had been this:

// u16var = some_u16 + some_u16 behaves as:
u16var = @intCast(u16, @intCast(u17, some_u16) + @intCast(u17, some_u16));

In the above code the trap does not occur in the addition but in the final truncation instead. Still, one could argue that the Zig model on trapping addition could also be seen as stepwise implicit trapping truncation after addition using infinitely ranged integers.

Viewed from that angle, u16var = some_i16 with implicit trapping isn't more implicit than the simple + and - operators.

I am not advocating one method or the other, I just want to highlight this.

I also want to note that my proposal #7967 touches on the subject.

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023

lerno commented Apr 20, 2023

I would like to suggest that, at minimum, Zig disallow implicit widening of complex expressions, where "complex" means any expression that has more than one potential way of doing the widening. In practice that's most binary expressions, but not unary ones.

To illustrate the problem

 // Example expression
 some_u64 = some_u32 + another_u32;

 // conversion 1:
 some_u64 = @intCast(u64, some_u32) + @intCast(u64, another_u32);

 // conversion 2:
 some_u64 = @intCast(u64, some_u32 + another_u32);

As we see, the first will not overflow for any sum that fits in a u64, but the second will.
In other places I've advocated for (1) becoming the default, but I have since come to the conclusion that it is not clear that this is desired. For example, if +% is used somewhere, the actual wrapping becomes very confusing.

It also turns out that the main reason you want implicit widening is in situations with simple expressions, e.g. some_u64 = some_u32 and some_u64 = call_returning_u32().

The check for valid widening is fairly straightforward to implement: recursively search the expression to be implicitly widened, and if any node isn't a "simple" expression type, disallow it. This avoids a whole class of bugs, due to unexpected trapping/UB, that could instead be caught at compile time.
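
A minimal model of that check, as I imagine it; the AST shape and names here are invented for illustration, not actual Zig compiler internals:

const Expr = union(enum) {
    identifier,
    call,
    negate: *const Expr,
    add: struct { lhs: *const Expr, rhs: *const Expr },
};

// Leaves and unary chains are "simple"; a binary arithmetic node has more
// than one place the widening could go, so it is rejected.
fn mayImplicitlyWiden(e: *const Expr) bool {
    return switch (e.*) {
        .identifier, .call => true,
        .negate => |inner| mayImplicitlyWiden(inner),
        .add => false,
    };
}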
