std.math: add lerp #13002
Conversation
This is a drive-by comment, so I apologize for that. I was just curious why you picked the algorithm you picked? Wikipedia, for example, lists two versions: an approximate one and a precise one.
Mostly for performance. I compared the two, and the imprecise method is (unsurprisingly) faster. You are definitely making a good point that the user may want precision over speed; I'm not sure how to go about that. (Line 621 in 716d923) It looks like it chooses the precise but slower variant by default? However, down here it seems to have the imprecise variant as well: (line 624 in 716d923). Hmmm... I will investigate this a bit more.
FYI here you can see the differences in codegen: https://zig.godbolt.org/z/GPG8YG9qd For longevity:

```zig
const std = @import("std");
const assert = std.debug.assert;

pub fn impreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return a + (b - a) * t;
}

pub fn preciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return (1 - t) * a + t * b;
}

pub fn cppLerp(comptime T: type, a: T, b: T, t: T) T {
    if ((a <= 0 and b >= 0) or (a >= 0 and b <= 0))
        return t * b + (1 - t) * a;
    if (t == 1) return b;
    const x = a + t * (b - a);
    if ((t > 1) == (b > a)) {
        return if (b < x) x else b;
    } else {
        return if (x < b) x else b;
    }
}

export fn main(a: f32, b: f32, t: f32) f32 {
    return impreciseLerp(f32, a, b, t);
}
```
Maybe `lerpFast`?
That's not a bad idea either. But I think that name focuses a bit too much on speed over the fact that it's simply a bit less precise. And it might add a bit of an "unsafe" notion to it ("fast" as in "no safety checks") even though it's perfectly safe. I still wouldn't be against using that name if others prefer it, though maybe a different name would fit better.
Does using `@mulAdd` change anything here?
Wow, thanks for bringing that up! That is one very interesting builtin that I haven't considered here. I compared the codegen again with `@mulAdd` variants added. For longevity:

```zig
const std = @import("std");
const assert = std.debug.assert;

// 4 assembly LOC
pub fn impreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return a + (b - a) * t;
}

// 7 assembly LOC
pub fn preciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return (1 - t) * a + t * b;
}

// 3 assembly LOC
pub fn mulAddImpreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return @mulAdd(T, (b - a), t, a);
}

// 6 assembly LOC
pub fn mulAddPreciseLerp(comptime T: type, a: T, b: T, t: T) T {
    assert(t >= 0 and t <= 1);
    return @mulAdd(T, a, (1 - t), t * b);
}

// too many assembly LOC
pub fn cppLerp(comptime T: type, a: T, b: T, t: T) T {
    if ((a <= 0 and b >= 0) or (a >= 0 and b <= 0))
        return t * b + (1 - t) * a;
    if (t == 1) return b;
    const x = a + t * (b - a);
    if ((t > 1) == (b > a)) {
        return if (b < x) x else b;
    } else {
        return if (x < b) x else b;
    }
}

export fn main(a: f32, b: f32, t: f32) f32 {
    return mulAddPreciseLerp(f32, a, b, t);
}
```
Now the lerp looks like this: `std.math.lerp(f32, .{}, 0, 100, 0.5); // 50`
Hmm, if that is the case, is there any reason to not always use `@mulAdd`?
Well, I thought it'd be nice to offer more options for all kinds of cases, but I guess it would be a ton simpler if there were only 2 options.
Or maybe there's a simpler option.
I think this is starting to look pretty good now. We're using a builtin function that most users wouldn't think of to provide speed and accuracy. I think this is very well worth merging.
There's no reason to have an enum for the strategy. Just have different functions for the different strategies or preferably just pick the main implementation used in most cases and only have one implementation.
Also try to avoid the word "fast". "precise" and "lossy" are okay words. I suggest that the generally useful one has no suffix at all.
Alright. I too think it's perfectly fine to just go with the slightly "less precise" option, even though it's still really accurate. In practical usage I can't really think of a situation where you need the slightly more precise option. And you're right; "lossy" and "precise" are a much better combination of words. I couldn't think of it in that moment.
(Force-pushed 2a17a1e to ebada7c)
Perfect.
Alright, so the first thing I can tell from the CI failures is that there are some problems with `f16`:

```zig
try testing.expectEqual(@as(f16, 0.0), lerp(f16, 1.0e4, 1.0, 1.0));
```

fails, while the corresponding tests for `f32` and `f64` pass. Apart from that, I also found #7702, which I think is also talking about problems with `f16` math. Considering that there are still all these problems, I think I will just replace the odd float types in the tests with the common ones.
2 questions for you @r00ster91:
For example, in a side project I have this function (for better or worse, avoiding float operations):

```zig
fn lerp(start: i32, end: i32, num: isize, den: isize) i32 {
    return @intCast(i32, start + @divTrunc(num * (end - start), den));
}
```
At first I was gonna say let's overload the existing `lerp`. I can definitely understand why someone might not want to depend on floating-point numbers for linear interpolation. However, if that doesn't work out, that's fine too. I'm also interested in something like fixed-point numbers for this, but we don't even know if that will be added to the language yet, so it seems a bit premature to spend a lot of thought on that just yet. Maybe it will be possible to just make them compatible with the existing `lerp`.
Sounds good. It seems like in most cases this will mean having to type fewer types, and I feel like this could be done for a lot more of the std.
I should also note that although the codegen for this looks great on x86, this may codegen poorly on platforms that don't have a native FMA:

```zig
const std = @import("std");
const assert = std.debug.assert;

export fn lerp1(a: f32, b: f32, t: f32) f32 {
    assert(t >= 0 and t <= 1);
    return @mulAdd(f32, b - a, t, a);
}

export fn lerp2(a: f32, b: f32, t: f32) f32 {
    assert(t >= 0 and t <= 1);
    return ((b - a) * t) + a;
}
```

```
$ zig build-lib x.zig -target wasm32-freestanding -O ReleaseSmall -dynamic --strip
$ wasm2wat x.wasm
(module
  (type (;0;) (func (param f32 f32 f32) (result f32)))
  (func (;0;) (type 0) (param f32 f32 f32) (result f32)
    local.get 1
    local.get 0
    f32.sub
    local.get 2
    local.get 0
    call 2)
  (func (;1;) (type 0) (param f32 f32 f32) (result f32)
    local.get 1
    local.get 0
    f32.sub
    local.get 2
    f32.mul
    local.get 0
    f32.add)
  (func (;2;) (type 0) (param f32 f32 f32) (result f32)
    local.get 0
    f64.promote_f32
    local.get 1
    f64.promote_f32
    f64.mul
    local.get 2
    f64.promote_f32
    f64.add
    f32.demote_f64)
  (memory (;0;) 16)
  (global (;0;) (mut i32) (i32.const 1048576))
  (export "memory" (memory 0))
  (export "lerp1" (func 0))
  (export "lerp2" (func 1)))
```

Notice how `lerp1` has to `call 2`, a helper that emulates the fused multiply-add by promoting to `f64`, while `lerp2` compiles to a plain `f32.mul` and `f32.add`.
Just a heads up, this will be fixed by #13100 |
lib/std/math.zig (Outdated)

```zig
@compileError("T must be a float type");
// …
assert(t >= 0 and t <= 1);
return @mulAdd(T, b - a, t, a);
```
I know y'all have been digging into lots of different implementations, but I wanted to put forward my recommendation:
```zig
return if (t <= 0.5) a + (b - a) * t else b + (b - a) * (t - 1.0);
```

This includes a branch, but it is also precise at both endpoints and additionally is monotonic (`lerp(a, b, t) >= lerp(a, b, t')` if `t > t'` and `b >= a`), and for nearby `t` values the branch should be well-predicted.
I think it's better to prioritize these numerical properties over the performance for the "generally useful" variant, since they mean this operation will compose better with surrounding code. Of course, if a user knows that they don't need precision at the endpoints, they are free to write `(b - a) * t + a` themselves.
Any issues with changing that to `if (t <= 0.5) @mulAdd(t, b - a, a) else b - (b - a) * (1.0 - t)`?
No issues with that; `if (t <= 0.5) @mulAdd(b - a, t, a) else @mulAdd(b - a, t - 1.0, b)` would be just fine too.
I'm sorry, but I'm not entirely convinced, at least right now. I use `lerp` in some deep places in my code where I could really use the best performance, which we can get quite easily here while still having a lot of precision.

> The formula `(b - a) * t + a` is pretty obvious

Not to me personally. If I wrote it often enough I could memorize it, but generally I would look it up every time, or I could think it up from scratch, but at least for me it would take a lot longer than if I can just use `std.math.lerp`. On top of that, this std implementation also uses `@mulAdd` for more precision and performance, which is something most people just won't think of if they write a quick `(b - a) * t + a` function.

> but it is also precise at both endpoints

It sounds like this will make it so that "This does not guarantee returning b if t is 1 due to floating-point errors." is no longer true. But still, for general use cases, is the precision with `@mulAdd` added not enough? I mean, for general use, when do you actually need it to be this precise at both endpoints? Do you have some kind of example?

> and additionally is monotonic

Is it not already monotonic in its current form? Is the documentation "This is monotonic." I included wrong?

In this case I would like to go back to the solution of having multiple implementations to choose from. So we would have one `lerp` for general use cases and then a `lossyLerp` or a `preciseLerp` for other cases, depending on what that `lerp` is going to be.

> I'm not convinced a `lerpFast` is worth including in the stdlib.

But yeah, we should probably try to avoid doing something like that. Maybe I will change my mind later. There are just really a lot of ways to do this. It's not that easy, sadly.
One area where lerp is very useful is gamedev. In gamedev people will not care about these tiny differences in precision; at least I wouldn't. There I would rather appreciate the performance. If you just use it for visuals and such, it's not going to make a noticeable difference, I believe. Maybe I'm wrong.
Do you have an example or can you name some other area where this kind of additional precision is necessary for lerp?
I guess it depends on whether we want `std.math` to be general-purpose or ultra-high-performance.
My recommendation for extremely performance-sensitive applications would be to use a third-party lib or write the optimized version yourself. There's also nothing wrong with adding a comment for ultra-high-perf use cases:

```zig
/// Guarantees for finite a, b:
/// - lerp(a, b, 0.0) == a
/// - lerp(a, b, 1.0) == b
/// - lerp(a, b, t) >= lerp(a, b, t') for t >= t' and b >= a
/// - lerp(a, b, t) <= lerp(a, b, t') for t >= t' and b <= a
///
/// If numerical imprecision is tolerable, a faster alternative is @mulAdd(T, b - a, t, a)
fn lerp(a: anytype, b: anytype, t: anytype) @TypeOf(a, b, t) {
```

That said, if we'd like to commit to providing "fast, but imprecise" functions in stdlib, maybe it's worth considering a separate `std.fastmath` namespace.
I don't have more input on this one, so I'll defer to @andrewrk to make the right call
P.S. It looks like on x86-64 LLVM is smart enough to optimize this branch away and use `vblendvpd` instead:

```asm
lerp:
        vaddsd      xmm4, xmm2, qword ptr [rip + .LCPI0_0]
        vsubsd      xmm3, xmm1, xmm0
        vfmadd231sd xmm0, xmm3, xmm2
        vfmadd213sd xmm4, xmm3, xmm1
        vcmpnlesd   xmm1, xmm2, qword ptr [rip + .LCPI0_1]
        vblendvpd   xmm0, xmm0, xmm4, xmm1
        ret
```
> Do you have an example or can you name some other area where this kind of additional precision is necessary for lerp?

The main advantage of this alternative is that it is bounded by `[a, b]`. For example, `acos(lerp(a, b, t) / max(abs(a), abs(b)))` is never `nan` for the precise version (unless both a and b are huge, i.e. `min(abs(a), abs(b)) >= std.math.floatMax(T) / 2`). Using the imprecise version here sometimes results in `nan` (e.g. `a = -0.55, b = 1.1, t = 1.0`).
If I have to write an equation with a restricted-domain function like this (e.g. sqrt, log, acos, etc.), then I'd have to clamp the output of lerp to avoid producing erroneous results.
The imprecise lerp is still a good choice in many situations of course, and I agree there's no one-size-fits-all here. Nonetheless, I think bounded-ness is a useful property to expect for "general-purpose" applications.
Or actually, could we possibly make use of the currently set float mode here? There is the `std.builtin.FloatMode` enum. However, I'm not sure how to actually find out what the currently set float mode is at runtime. Maybe that's not possible (indeed; it's not).
I often interpolate between vectors with a scalar value. In your opinion, would it make sense to handle this use case here as well?
Vectors with a scalar value? So 2 vectors, each with 1 scalar?

```zig
_ = lerp(f32, (@Vector(1, f32){0.5})[0], (@Vector(1, f32){1.0})[0], 0.5);
```

I'm not sure 2 vectors with 1 scalar each is common enough to justify shortening this, but I don't work often enough with vectors to have a sophisticated-enough opinion on this either, so I'm not sure.
No, I mean more or less `lerp(vector_a, vector_b, scalar_t)`. It is useful for interpolating colors, for example.

EDIT: I thought about it, and it probably is more Zig-like style to simply call `lerp(vector_a, vector_b, @Splat(4, scalar_t))` or something like that. It is also more flexible in theory, so probably a solid choice to just keep it simple as a general-purpose std function. But I don't love the ergonomics, so I would just keep using my own implementation. FWIW, I'm also in favor of using the more precise algorithm for a std function.
Clearly we're all so split on the implementation because there are so many different ways of doing interpolation that it probably isn't incredibly important what the implementation ends up being (just waiting for Andrew to decide); in all more sophisticated situations we'll probably all use our own implementation anyway. I tried to add support for vectors, but I don't think it will work out unless we change the API.
I would focus on codegen and performance (same thing?) rather than on being absolutely robust, because otherwise we might end up with an unoptimized std piece that, used across different projects, becomes the bottleneck everywhere. That would be disastrous. Reminds me of the C++ STL in some cases.
(Force-pushed 6e60529 to 10c16a1)
Considering that there are still problems with these (including f80, probably), I think it's fine to exclude these and only test the common types f32 and f64. We shouldn't rely on this test to provide test coverage for f16, f80, and f128 anyway.
Can anyone explain the hard requirement for t being in the range [0, 1]? Coming from a gamedev background, I've never seen a lerp implementation that enforces the expected range of `t`.
You're free to open an issue or a PR for that.
It looks like one of these tests relies on implicit casts:

```diff
-try testing.expectEqual(@as(f64, 0.0), lerp(@as(f64, 1.0e16), 1.0, 1.0));
+try testing.expectEqual(@as(f64, 0.0), lerp(@as(f64, 1.0e16), @as(f64, 1.0), @as(f64, 1.0)));
```

The AIR differs depending on whether the explicit `@as` casts are used, and in one of the two cases this ends up getting optimized to not even do the multiply. I discovered this during some CBE changes that resulted in this optimization not happening. I'm not actually sure what the resolution should be. It could actually be a bug in the fma implementation in our compiler_rt, assuming the result should be zero?
I think it'd be cool if we could have a basic lerp function in the std.