std.math: add lerp #13002

Merged: 4 commits merged into ziglang:master on Apr 20, 2023

Conversation

@wooster0 (Contributor)

I think it'd be cool if we could have a basic lerp function in the std.

@wooster0 marked this pull request as ready for review September 28, 2022 18:41
@steveno commented Sep 29, 2022

This is a drive-by comment, so I apologize for that. I was just curious why you picked the algorithm you did. Wikipedia [1], for example, lists two versions: an approximate one and a precise one.

[1] https://en.wikipedia.org/wiki/Linear_interpolation

@wooster0 (Contributor, Author) commented Sep 29, 2022

Mostly for performance. I compared the two and the imprecise method (unsurprisingly) is faster. You are definitely making a good point that the user may want precision over speed. Not sure how to go about that.
Maybe impreciseLerp and preciseLerp?
It seems C++ has a lerp as well:

return __t * __b + (1 - __t) * __a;

and it looks like it chooses the precise but slower variant by default?
However down here it seems to have the imprecise variant as well:
const _Fp __x = __a + __t * (__b - __a);

Hmmm... I will investigate this a bit more.

@wooster0 (Contributor, Author) commented Sep 29, 2022

FYI here you can see the differences in codegen: https://zig.godbolt.org/z/GPG8YG9qd
Just go to the exported main and swap in the lerp you want to test.
C++'s way is the worst, as you can see. So I think having two lerps, impreciseLerp and preciseLerp, is probably best.

For longevity:

const std = @import("std");
const assert = std.debug.assert;

pub fn impreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return a + (b - a) * t;
}

pub fn preciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return (1 - t) * a + t * b;
}

pub fn cppLerp(comptime T: type, a: T, b: T, t: T) T {
    if ((a <= 0 and b >= 0) or (a >= 0 and b <= 0))
        return t * b + (1 - t) * a;

    if (t == 1) return b;

    const x = a + t * (b - a);
    if ((t > 1) == (b > a)) {
        return if (b < x) x else b;
    } else {
        return if (x < b) x else b;
    }
}

export fn main(a: f32, b: f32, t: f32) f32 {
    return impreciseLerp(f32, a, b, t);
}

@bndbsh (Contributor) commented Sep 30, 2022

Maybe fastLerp and preciseLerp? That way each version advertises why you'd use it in the name.

@wooster0 (Contributor, Author) commented Sep 30, 2022

That's not a bad idea either. But I think that name focuses a bit too much on speed over the fact that it's simply a bit less precise. It might also add a bit of an "unsafe" notion ("fast" as in "no safety checks") even though it's perfectly safe. I still wouldn't be against that name if others prefer it.

Maybe lerp and preciseLerp because in most cases I believe it's perfectly fine to use the slightly less precise lerp until you actually need so much precision.
But it's less descriptive and makes the two go less hand in hand than before so I'm not sure.

@InKryption (Contributor)

Does using @mulAdd affect performance/precision significantly in either of these?

@wooster0 (Contributor, Author) commented Sep 30, 2022

Wow, thanks for bringing that up! That is one very interesting builtin that I haven't considered here.
Have a look: https://zig.godbolt.org/z/qcMehj4jo
mulAddImpreciseLerp is indeed the fastest here (I did some research; not just going by the LOC here).
So now we actually have 4 ways to do lerp. I think it's time to wrap it all in a single std.math.lerp and make it accept an enum that allows you to specify the implementation. Perfect.

Because we use @mulAdd, a function that the average user will probably just not think of for this, we now have a very good reason to add this to std.math.

For longevity:

const std = @import("std");
const assert = std.debug.assert;

// 4 assembly LOC
pub fn impreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return a + (b - a) * t;
}

// 7 assembly LOC
pub fn preciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return (1 - t) * a + t * b;
}

// 3 assembly LOC
pub fn mulAddImpreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return @mulAdd(T, (b - a), t, a);
}

// 6 assembly LOC
pub fn mulAddPreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return @mulAdd(T, a, (1 - t), t * b);
}

// too many assembly LOC
pub fn cppLerp(comptime T: type, a: T, b: T, t: T) T {
    if ((a <= 0 and b >= 0) or (a >= 0 and b <= 0))
        return t * b + (1 - t) * a;

    if (t == 1) return b;

    const x = a + t * (b - a);
    if ((t > 1) == (b > a)) {
        return if (b < x) x else b;
    } else {
        return if (x < b) x else b;
    }
}

export fn main(a: f32, b: f32, t: f32) f32 {
    return mulAddPreciseLerp(f32, a, b, t);
}

@wooster0 (Contributor, Author)

Now the lerp looks like this:

    std.math.lerp(f32, .{}, 0, 100, 0.5); // 50

And in .{} you are free to alter the way it calculates.
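
For context, here is a minimal sketch of what such an options parameter might look like (the field and strategy names are my guesses based on the four variants above, not necessarily what the branch actually contains):

const std = @import("std");
const assert = std.debug.assert;

/// Hypothetical options struct; the real names in the patch may differ.
pub const LerpOptions = struct {
    implementation: enum { imprecise, precise, mul_add_imprecise, mul_add_precise } = .mul_add_imprecise,
};

pub fn lerp(comptime T: type, comptime options: LerpOptions, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return switch (options.implementation) {
        .imprecise => a + (b - a) * t,
        .precise => (1 - t) * a + t * b,
        .mul_add_imprecise => @mulAdd(T, b - a, t, a),
        .mul_add_precise => @mulAdd(T, a, 1 - t, t * b),
    };
}

Passing .{} then selects the default strategy, matching the call above.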

@InKryption (Contributor)

Hmm, if that is the case, is there any reason to not always use @mulAdd?

@wooster0 (Contributor, Author)

Well, I thought it'd be nice to offer more options for all kinds of cases, but I guess it would be a ton simpler if there were only two options.

@wooster0 (Contributor, Author)

Or maybe most_precise: bool or precisest: bool or something?

(Resolved review thread on lib/std/math.zig, now outdated.)
@wooster0 (Contributor, Author) commented Oct 1, 2022

I think this is starting to look pretty good now. We're using a builtin function that most users wouldn't think of to provide speed and accuracy. I think this is very well worth merging.

@andrewrk (Member) left a comment

There's no reason to have an enum for the strategy. Just have different functions for the different strategies or preferably just pick the main implementation used in most cases and only have one implementation.

@andrewrk (Member) commented Oct 1, 2022

Also try to avoid the word "fast". "precise" and "lossy" are okay words. I suggest that the generally useful one has no suffix at all.

@wooster0 (Contributor, Author) commented Oct 1, 2022

Alright. I too think it's perfectly fine to just go with the slightly "less precise" option even though it's still really accurate. In practical usage I can't really think of a situation where you really need the slightly more precise option.
Also, if you need more precision I think you could also just turn up the float bits by e.g. using f64 instead of f32, maybe only temporarily for one lerp and then go back to f32, or something like that.

And you're right: "lossy" and "precise" are a much better combination of words. I couldn't think of them in the moment.
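
As a rough illustration of that widening idea (my sketch, not part of this PR): do the one interpolation at f64 and narrow back afterwards.

// Sketch: widen f32 operands, lerp once at f64 precision, then narrow back.
fn lerpWide(a: f32, b: f32, t: f32) f32 {
    const a64 = @as(f64, a);
    const b64 = @as(f64, b);
    const t64 = @as(f64, t);
    return @floatCast(f32, @mulAdd(f64, b64 - a64, t64, a64));
}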

@wooster0 force-pushed the patch-15 branch 2 times, most recently from 2a17a1e to ebada7c on October 1, 2022 09:00
@wooster0 (Contributor, Author) commented Oct 1, 2022

Perfect.

@wooster0 requested a review from andrewrk October 1, 2022 09:26
@wooster0 (Contributor, Author) commented Oct 1, 2022

Alright so the first thing I can tell from the CI failures is that there are some problems with f16 for Wasm here...
This line in the "lerp" test in particular:

    try testing.expectEqual(@as(f16, 0.0), lerp(f16, 1.0e4, 1.0, 1.0));

fails like this:

$ zig test x.zig -target wasm32-wasi --test-cmd wasmtime --test-cmd-bin
expected 0.0e+00, found 1.0e+00
0 passed; 0 skipped; 1 failed.
test failureError: failed to run main module `test.wasm`

Caused by:
    0: failed to invoke command default
    1: wasm trap: wasm `unreachable` instruction executed
       wasm backtrace:
           0: 0x4205 - <unknown>!os.abort
           1: 0x2806 - <unknown>!builtin.default_panic
           2: 0x1e63 - <unknown>!test_runner.main
           3: 0x1de7 - <unknown>!_start
       note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable to may show more debugging information

error: the following test command failed with exit code 134:
wasmtime test.wasm

...while it works with zig test x.zig on Linux.
So am I supposed to not use f16? Is it known to be broken on Wasm?
Found some possibly related things: #4952 #9900

Apart from that, I also found #7702, which I think is about how @mulAdd doesn't behave the same on all platforms? Of course it's expected that a single instruction for it doesn't exist everywhere, but I'd expect it to be substituted with multiple instructions that do the same thing.

Given all these remaining problems, I think I will just replace the odd f16 and f128 sizes in the test with common ones, since they're usually not natively supported by the hardware and are harder to support.

(Resolved review thread on lib/std/math.zig, now outdated.)
@andrewrk (Member) commented Oct 4, 2022

2 questions for you @r00ster91:

  • if an additional function for lerp for integers would be added, what might be its fully qualified name in the std lib, and what might its function signature be?

For example, in a side project I have this function (for better or worse, avoiding float operations):

fn lerp(start: i32, end: i32, num: isize, den: isize) i32 {
    return @intCast(i32, start + @divTrunc(num * (end - start), den));
}

  • have you considered this function signature for the float version? fn lerp(a: anytype, b: anytype, t: anytype) @TypeOf(a, b, t)

@wooster0 (Contributor, Author) commented Oct 4, 2022

  • if an additional function for lerp for integers would be added, what might be its fully qualified name in the std lib, and what might its function signature be?

At first I was gonna say let's overload the existing lerp but seeing that you have another parameter for what I suppose is the denominator, that's an interesting question.

I can definitely understand when someone might not want to depend on floating-point numbers for linear interpolation.
I'm not sure how exactly your integer version of lerp is used, but if we can do it with 3 parameters, maybe we can overload this same lerp further with integers in the future, since we can just check whether the args are integers.

However, if that doesn't work out, lerpInt seems fine too? I do think that lerp functions are generally associated with floats; the float version is the main version, and the integer version is a bit more "niche", for cases when you really need that extra performance and/or can't afford floating-point calculations (maybe all your floats are implemented in software).

So std.math.lerp will stay the primary lerp function and all other possible additions will have prefixes or suffixes to "lerp" meaning more specialized versions of the standard lerp function.

I'm also interested in something like fixed-point numbers for this but we don't even know if that will be added to the language yet so it seems a bit premature to spend a lot of thought on that just yet. Maybe it will be possible to just make them compatible with the existing lerp (and other functions using floats).

  • have you considered this function signature for the float version? fn lerp(a: anytype, b: anytype, t: anytype) @TypeOf(a,b,t)

Sounds good. It seems like in most cases this will mean writing out fewer types, and I feel like this could be done for a lot more of the std.

@wooster0 (Contributor, Author) commented Oct 7, 2022

I should also note that although the codegen for this looks great on x86, this may codegen poorly on platforms that don't have a native FMA:

const std = @import("std");
const assert = std.debug.assert;

export fn lerp1(a: f32, b: f32, t: f32) f32 {
    assert(t >= 0 and t <= 1);
    return @mulAdd(f32, b - a, t, a);
}

export fn lerp2(a: f32, b: f32, t: f32) f32 {
    assert(t >= 0 and t <= 1);
    return ((b - a) * t) + a;
}

$ zig build-lib x.zig -target wasm32-freestanding -O ReleaseSmall -dynamic --strip
$ wasm2wat x.wasm
(module
  (type (;0;) (func (param f32 f32 f32) (result f32)))
  (func (;0;) (type 0) (param f32 f32 f32) (result f32)
    local.get 1
    local.get 0
    f32.sub
    local.get 2
    local.get 0
    call 2)
  (func (;1;) (type 0) (param f32 f32 f32) (result f32)
    local.get 1
    local.get 0
    f32.sub
    local.get 2
    f32.mul
    local.get 0
    f32.add)
  (func (;2;) (type 0) (param f32 f32 f32) (result f32)
    local.get 0
    f64.promote_f32
    local.get 1
    f64.promote_f32
    f64.mul
    local.get 2
    f64.promote_f32
    f64.add
    f32.demote_f64)
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "lerp1" (func 0))
  (export "lerp2" (func 1)))

Notice call 2 in func (;0;): it calls func (;2;) (fmaf), which is another bunch of instructions, whereas the non-@mulAdd lerp2 is func (;1;) and consists of only that code.
So on Wasm this performs worse with @mulAdd.

  • Should @mulAdd emit a handleable error if the target does not have a native FMA so that we can use return ((b - a) * t) + a instead?
  • Should we fiddle around with @import("builtin").cpu to try to detect if the target has a native FMA and otherwise use return ((b - a) * t) + a?
  • Should we always use return ((b - a) * t) + a and then just hope that it gets turned into an FMA under @setFloatMode(.Optimized)? (See the sketch after this list.)
    https://ziglang.org/documentation/master/#setFloatMode

    Perform floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add).
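
A minimal sketch of that third option (not what was merged): write the plain formula and let the float mode permit contraction into an FMA where the target supports one.

// Sketch: rely on float-mode contraction instead of an explicit @mulAdd.
// Whether the multiply and add actually get fused then depends on the
// backend and target.
pub fn lerpContract(comptime T: type, a: T, b: T, t: T) T {
    @setFloatMode(.Optimized);
    return (b - a) * t + a;
}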

@topolarity (Contributor)

Alright so the first thing I can tell from the CI failures is that there are some problems with f16 for Wasm here...

Just a heads up, this will be fixed by #13100

Review thread on lib/std/math.zig (outdated):
@compileError("T must be a float type");

assert(t >= 0 and t <= 1);
return @mulAdd(T, b - a, t, a);
@topolarity (Contributor) commented Oct 13, 2022

I know y'all have been digging into lots of different implementations, but I wanted to put forward my recommendation:

return if (t <= 0.5) a+(b-a)*t else b+(b-a)*(t-1.0);

This includes a branch, but it is also precise at both endpoints and additionally is monotonic (lerp(a, b, t) >= lerp(a, b, t') if t > t') and for nearby t values the branch should be well-predicted.

I think it's better to prioritize these numerical properties over performance for the "generally useful" variant, since they mean this operation will compose better with surrounding code. Of course, if a user knows that they don't need precision at the endpoints, they are free to write (b - a) * t + a themselves.
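
Wrapped up as a full function (my transcription of the one-liner above, with the same assert used elsewhere in this thread):

const std = @import("std");
const assert = std.debug.assert;

pub fn lerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    // The branch keeps the result bounded and exact at both endpoints.
    return if (t <= 0.5) a + (b - a) * t else b + (b - a) * (t - 1.0);
}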

(Contributor)

Any issues with changing that to if (t <= 0.5) @mulAdd(T, t, b - a, a) else b - (b - a) * (1.0 - t)?

@topolarity (Contributor) commented Oct 13, 2022

No issues with that - if (t <= 0.5) @mulAdd(T, b - a, t, a) else @mulAdd(T, b - a, t - 1.0, b) would be just fine too

@wooster0 (Contributor, Author) commented Oct 13, 2022

I'm sorry but I'm not entirely convinced, at least right now.
I use lerp in some deep places in my code where I could really use the best performance, which we can get quite easily here while still having a lot of precision.

The formula (b - a)*t + a is pretty obvious

Not to me personally. If I wrote it often enough I could memorize it, but generally I would look it up every time, or derive it from scratch, and at least for me that takes a lot longer than just using std.math.lerp.
On top of that, what this std implementation provides is that it also uses @mulAdd for more precision and performance.
This is something most people just won't really think of if they write a quick (b - a) * t + a function.

but it is also precise at both endpoints

It sounds like this will make it so that "This does not guarantee returning b if t is 1 due to floating-point errors." is no longer true.
But still for general use cases is the precision with @mulAdd added not enough?
I mean, for general use when do you actually need it to be this precise at both endpoints? Do you have some kind of example?

and additionally is monotonic

Is it not already monotonic in its current form? Is the "This is monotonic." doc comment I included wrong?

In this case I would like to go back to the solution of having multiple implementations to choose from. So we would have one lerp for general use cases and then a lossyLerp or a preciseLerp for other cases depending on what that lerp is going to be.

I'm not convinced a lerpFast is worth including in the stdlib.

But yeah, we should probably try to avoid doing something like that.

Maybe I will change my mind later. There are just really a lot of ways to do this. It's not that easy sadly.

@wooster0 (Contributor, Author) commented Oct 13, 2022

One area where lerp is very useful is gamedev. In gamedev people will not care about these tiny differences in precision; at least I wouldn't. In gamedev I would rather appreciate the performance. If you just use it for visuals and such, it's either not going to make a difference or not going to matter, I believe. Maybe I'm wrong.

Do you have an example or can you name some other area where this kind of additional precision is necessary for lerp?

@topolarity (Contributor)

I guess it depends on whether we want std.math to be general-purpose or ultra-high-performance.

My recommendation for extremely performance-sensitive applications would be to use a third-party lib or write the optimized version yourself. There's also nothing wrong with adding a comment for ultra-high-perf use cases:

/// Guarantees for finite a, b:
///    - lerp(a, b, 0.0) == a
///    - lerp(a, b, 1.0) == b
///    - lerp(a, b, t) >= lerp(a, b, t') for t >= t' and b >= a
///    - lerp(a, b, t) <= lerp(a, b, t') for t >= t' and b <= a
///
/// If numerical imprecision is tolerable, a faster alternative is @mulAdd(T, b - a, t, a)
fn lerp(a: anytype, b: anytype, t: anytype) @TypeOf(a,b,t) {

That said, if we'd like to commit to providing "fast, but imprecise" functions in stdlib, maybe it's worth considering a separate std.fastmath namespace.

I don't have more input on this one, so I'll defer to @andrewrk to make the right call

@topolarity (Contributor) commented Oct 14, 2022

P.S. It looks like on x86-64 LLVM is smart enough to optimize this branch away and use vblend instead:

lerp:
        vaddsd  xmm4, xmm2, qword ptr [rip + .LCPI0_0]
        vsubsd  xmm3, xmm1, xmm0
        vfmadd231sd     xmm0, xmm3, xmm2
        vfmadd213sd     xmm4, xmm3, xmm1
        vcmpnlesd       xmm1, xmm2, qword ptr [rip + .LCPI0_1]
        vblendvpd       xmm0, xmm0, xmm4, xmm1
        ret

@topolarity (Contributor) commented Oct 14, 2022

Do you have an example or can you name some other area where this kind of additional precision is necessary for lerp?

The main advantage of this alternative is that it is bounded by [a, b].

For example, acos(lerp(a, b, t) / max(abs(a), abs(b))) is never nan for the precise version (unless both a and b are huge, i.e. min(abs(a),abs(b)) >= std.math.floatMax(T) / 2). Using the imprecise version here results in nan sometimes (e.g. a = -0.55, b = 1.1, t = 1.0)

If I have to write an equation with a restricted-domain function like this (e.g. sqrt, log, acos, etc.), then I'd have to clamp the output of lerp to avoid producing erroneous results.

The imprecise lerp is still a good choice in many situations of course, and I agree there's no one-size-fits-all here. Nonetheless, I think bounded-ness is a useful property to expect for "general-purpose" applications.
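
For reference, the clamp workaround mentioned above might look like this (my sketch; the bounding costs two extra comparisons per call):

const std = @import("std");

// Sketch: keep the lossy lerp but force the result back into
// [min(a, b), max(a, b)] before feeding it to a restricted-domain
// function such as acos.
fn lerpClamped(a: f64, b: f64, t: f64) f64 {
    const x = @mulAdd(f64, b - a, t, a);
    return std.math.clamp(x, std.math.min(a, b), std.math.max(a, b));
}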

@wooster0 (Contributor, Author) commented Oct 13, 2022

Or actually, could we possibly make use of the currently set float mode here? There is enum std.builtin.FloatMode with values Strict and Optimized, and we have @setFloatMode to set the mode.
So if it's Strict, we could use some other even more precise version and in Optimized we use the fastest one (like the one we have now).

However, I'm not sure how to actually find out what the currently set float mode is at runtime. Maybe that's not possible (indeed, it's not).

@Opioid commented Oct 26, 2022

I often interpolate between vectors with a scalar value. In your opinion, would it make sense to handle this use case here as well?

@wooster0 (Contributor, Author) commented Oct 26, 2022

Vectors with a scalar value? So 2 vectors, each with 1 scalar?
In that case you can do something like this, right?

    _ = lerp(f32, (@Vector(1, f32){0.5})[0], (@Vector(1, f32){1.0})[0], 0.5);

I'm not sure 2 vectors with 1 scalar each is common enough to justify shortening this, but I don't work often enough with vectors to have a sophisticated-enough opinion on this, either, so I'm not sure.

@Opioid commented Oct 26, 2022

No, I mean more or less like this:

    fn lerp(a: anytype, b: anytype, t: f32) @TypeOf(a, b) {
        switch (@typeInfo(@TypeOf(a))) {
            .Float => {
                const u = 1.0 - t;
                return u * a + t * b;
            },
            .Vector => |v| {
                const u = 1.0 - t;
                return @splat(v.len, u) * a + @splat(v.len, t) * b;
            },
            else => unreachable,
        }
    }

It is useful for interpolating colors for example.
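
For example (my illustration on top of the snippet above), blending two RGBA colors:

    const red = @Vector(4, f32){ 1.0, 0.0, 0.0, 1.0 };
    const blue = @Vector(4, f32){ 0.0, 0.0, 1.0, 1.0 };
    const purple = lerp(red, blue, 0.5); // { 0.5, 0.0, 0.5, 1.0 }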

EDIT:

I thought about it, and it is probably more Zig-like style to simply call lerp(vector_a, vector_b, @splat(4, scalar_t)) or something like that. It is also more flexible in theory, so probably a solid choice to just keep it simple as a general-purpose std function. But I don't love the ergonomics, so I would just keep using my own implementation.

FWIW, I'm also in favor of using the more precise algorithm for a std function.

@wooster0 (Contributor, Author) commented Oct 27, 2022

I would just keep using my own implementation

I'm also in favor of using the more precise algorithm

Clearly we're all split on the implementation because there are so many different ways of doing interpolation; it probably isn't incredibly important what the implementation ends up being (just waiting for Andrew to decide). In all more sophisticated situations we'll probably use our own implementations anyway, so std.math.lerp will be good as a general solution for when its implementation is not too important, or for prototyping, testing, etc.

I tried to add support for vectors but I don't think it will work out unless we change the API and add a comptime Type: type parameter because of @TypeOf(a, b, t). And, as you said, maybe it is better to keep it simple (so, as it is).

@eLeCtrOssSnake

I would focus on the codegen and performance (same thing?) rather than on being absolutely robust, because otherwise we might end up with an unoptimized std piece that is used across different projects and becomes the bottleneck everywhere. That would be disastrous. Reminds me of the C++ STL in some cases.

Considering that there are still problems with these (including f80 probably),
I think it's fine to exclude these and only test the common types f32 and f64.
We shouldn't rely on this test to provide test coverage for f16, f80, and f128, anyway.
@andrewrk merged commit 7d90410 into ziglang:master on Apr 20, 2023
der-teufel-programming pushed a commit to der-teufel-programming/zig that referenced this pull request Apr 21, 2023
@Manuzor (Contributor) commented Oct 14, 2023

Can anyone explain the hard requirement for t being in the range [0,1]? Coming from a gamedev background, I've never seen a lerp implementation that enforces the expected range of t. lerp(10.0, 20.0, 2.0) // = 30 is very useful behavior to have and mathematically it works out as well.
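
For reference, what Manuzor describes would amount to dropping the assert (a sketch, not what std.math.lerp does as merged):

// Sketch: t may lie outside [0, 1]; the formula extrapolates naturally,
// e.g. lerpUnclamped(f32, 10.0, 20.0, 2.0) == 30.0.
fn lerpUnclamped(comptime T: type, a: T, b: T, t: T) T {
    return @mulAdd(T, b - a, t, a);
}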

@wooster0 (Contributor, Author)

You're free to open an issue or a PR for that.

@kcbanner (Contributor) commented Nov 6, 2023

It looks like one of these tests relies on t being a constant. If you apply this diff:

-try testing.expectEqual(@as(f64, 0.0), lerp(@as(f64, 1.0e16), 1.0, 1.0));
+try testing.expectEqual(@as(f64, 0.0), lerp(@as(f64, 1.0e16), @as(f64, 1.0), @as(f64, 1.0)));

AIR when using @as:

# Begin Function AIR: math.lerp__anon_931:
# Total AIR+Liveness bytes: 524B
# AIR Instructions:         28 (252B)
# AIR Extra Data:           32 (128B)
# Liveness tomb_bits:       16B
# Liveness Extra Data:      4 (16B)
# Liveness special table:   2 (16B)
  %0 = arg(f64, 0)
  %1 = arg(f64, 1)
  %2 = arg(f64, 2)
  %3!= save_err_return_trace_index()
  %5!= dbg_block_begin()
  %6!= dbg_stmt(4:13)
  %9!= dbg_block_begin()
  %10!= dbg_stmt(5:41)
  %11 = cmp_gte(%2, <f64, 0>)
  %12 = block(bool, {
    %16!= cond_br(%11!, {
      %14 = cmp_lte(%2, <f64, 1>)
      %15!= br(%12, %14!)
    }, {
      %13!= br(%12, @Air.Inst.Ref.bool_false)
    })
  } %11!)
  %17!= dbg_stmt(5:41)
  %18!= call(<fn (bool) void, (function 'assert')>, [%12!])
  %19!= dbg_block_end()
  %21!= dbg_stmt(14:28)
  %23 = sub(%1!, %0)
  %24 = mul_add(%23!, %2!, %0!)
  %25!= dbg_stmt(14:5)
  %27!= dbg_block_end()
  %26!= ret(%24!)
# End Function AIR: math.lerp__anon_931

AIR when not using @as:

# Begin Function AIR: math.lerp__anon_931:
# Total AIR+Liveness bytes: 363B
# AIR Instructions:         19 (171B)
# AIR Extra Data:           20 (80B)
# Liveness tomb_bits:       16B
# Liveness Extra Data:      0 (0B)
# Liveness special table:   0 (0B)
  %0 = arg(f64, 0)
  %1!= save_err_return_trace_index()
  %3!= dbg_block_begin()
  %4!= dbg_stmt(4:13)
  %7!= dbg_block_begin()
  %8!= dbg_stmt(5:41)
  %9!= call(<fn (bool) void, (function 'assert')>, [@Air.Inst.Ref.bool_true])
  %10!= dbg_block_end()
  %12!= dbg_stmt(14:28)
  %14 = sub(<f64, 1>, %0)
  %15 = mul_add(%14!, <f64, 1>, %0!)
  %16!= dbg_stmt(14:5)
  %18!= dbg_block_end()
  %17!= ret(%15!)
# End Function AIR: math.lerp__anon_931

This ends up getting optimized to not even do the multiply in the @mulAdd, so the result comes out as zero. The problem is that if this optimization doesn't happen, we get 0x20000000000000 (4.450147717014403e-308).

I discovered this during some CBE changes that resulted in this optimization not happening; I'm not actually sure what the resolution should be. It could actually be a bug in the fma implementation in our compiler_rt, assuming the result should be zero?
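
Working the rounding through by hand, zero does look like the IEEE-correct answer, which would point at compiler_rt's fma: 1.0 - 1.0e16 falls exactly halfway between two representable f64 values, ties-to-even rounds it to exactly -1.0e16, and a correct fma then computes -1.0e16 * 1.0 + 1.0e16 exactly as 0. A small sketch checking both steps (my addition, not from the thread):

const std = @import("std");

test "lerp(1.0e16, 1.0, 1.0) endpoint rounding" {
    // 1.0 - 1.0e16 rounds (ties-to-even) to exactly -1.0e16.
    const d: f64 = 1.0 - 1.0e16;
    try std.testing.expectEqual(@as(f64, -1.0e16), d);
    // A correct fma computes (-1.0e16 * 1.0) + 1.0e16 exactly.
    try std.testing.expectEqual(@as(f64, 0.0), @mulAdd(f64, d, 1.0, 1.0e16));
}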
