Language: Infinity and NaN #818
Comments
Partially addressed in #781.
|
So add NaN? |
The problem with capitalizing it is that the convention in Wren is for methods (even if they're effectively constants) to begin with a lower case letter. So I wouldn't be against
|
NaN is a bit complicated. There are many byte representations for NaN
values, and some of them have special meanings to the CPU and to Wren, which
uses them as value holders. I'm not against adding `Num::nan`, but its value
should be carefully thought out.
|
Yes. Like I said, in C++, you can specify both quiet NaN and signaling NaN. However, other languages have only one NaN. So yes, we should be careful, but we use just 3 bits from the mantissa, so we still have 48 bits (52 mantissa bits; the sign bit is used to distinguish pointers, the highest mantissa bit is set to mean quiet rather than signaling NaN, and the 3 least significant bits denote the type). We just need to be careful to not use the About capitalizing, Java for example uses I also think |
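For readers unfamiliar with NaN tagging, here is a minimal C sketch of the kind of bit layout being discussed. The masks and helper names (`QNAN`, `is_plain_number`, `collides_with_tags`) are assumptions for illustration, not a copy of Wren's source. In particular, this sketch reserves one extra mantissa bit beyond the quiet bit so that an ordinary hardware NaN still reads as a number.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Illustrative masks (assumptions, modeled on the scheme described above).
#define SIGN_BIT ((uint64_t)1 << 63)             // set for object pointers
#define QNAN     ((uint64_t)0x7ffc000000000000)  // exponent all ones + top two mantissa bits
#define TAG_MASK ((uint64_t)7)                   // 3 low bits select a singleton type

// Copy the raw IEEE 754 bits out of a double (memcpy avoids aliasing issues).
uint64_t bits_of(double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}

// A 64-bit slot holds a plain number unless every QNAN bit is set.
bool is_plain_number(uint64_t slot) {
  return (slot & QNAN) != QNAN;
}

// True if a double coming out of arithmetic would be misread as a tagged value.
bool collides_with_tags(double d) {
  return !is_plain_number(bits_of(d));
}
```

Under these assumed masks, the common quiet NaN patterns `0x7ff8000000000000` and `0xfff8000000000000` still count as plain numbers, while any pattern with all `QNAN` bits set is treated as a tagged value. That boundary is what the caution in this thread is about.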
The real problem is the CPU/libc. They produce NaN where expected, but not
always the same (or a sane) value, which makes Wren a little bit fragile.
|
So, by |
Although @mhermier is right to inject a note of caution, ISTM that the problems of NaNs are already with us, whether we adopt this proposal or not. Left to their own devices, I suspect that nearly everybody will generate a NaN value using the canonical 0.0/0.0 calculation and that the only real benefit of this proposal is to save them from having to declare their own variable to hold such a value. As long as Num.nan (or whatever we're going to call it) generates the NaN value in this way and it's made clear in the documentation that this is what it's doing, the position is therefore going to be the same as it is already and anybody who wishes to generate their NaN values using a different calculation will, of course, still be free to do so. The documentation for the existing Num.isNan method already states that it returns true for the result of 0/0 (presumably with good reason) so at least we'd be consistent here. |
The biggest problem I have is that `Num.nan` would not be a real singleton
(modulo its sign). There is no guarantee; it depends entirely on the CPU/libc.
(And to make it worse, `wrenValuesSame` already shows different behavior
depending on the Value representation of doubles.)
|
Sure, there's a theoretical concern here. But does it really matter in practice if the bit values produced by 0/0 (say) may differ as long as they're within the accepted NaN range, are recognized as such by Num.isNan and don't interfere with Wren's 'NaN tagging' approach? Even when the bit values are exactly the same, NaNs are never going to compare equal in any case. |
Yes, you can catch them using `Num::isNan`, but then what is the point of
producing a NaN in code? I see only 3 cases:
- Get the value to perform a compare, which implies that it is a stable
singleton.
- Write a math function that is not in C. One can argue that you can
shortcut the function to produce NaN early, but considering the nature of
these functions, short-circuiting in Wren would be quite *costly*, and you'll
probably want to optimise that function in C anyway.
- Pass errors by using NaN, but we can use more reliable singletons as
errors, like `null` or a user-defined one.
If you have counterexamples, I would like to see some.
|
I don't follow why NaN needs to be a stable singleton to perform a compare. AFAIK, NaN values never compare as equal even if they have exactly the same bit representation. Indeed, checking that a value isn't equal to itself is one of the standard ways of checking for a NaN. But, as I said earlier, the only value I see in this proposal is as a convenience method to generate a NaN using the 0/0 calculation that folks would probably have used anyway. We already know (or think we know) that applying isNan to the result will return true because the documentation says so. So, the question is does anyone know of an actual CPU/libc set up which contradicts this? If such a setup exists, then not only is the proposal pointless but we shouldn't be using NaNs anyway for their usual purpose of indicating an absent value or propagating errors. It would be better to use 'null' instead for this purpose even though Null and Num are distinct types in Wren. |
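The self-comparison check mentioned above can be sketched in C as follows. This is a minimal illustration with hypothetical helper names, and it assumes the compiler follows IEEE 754 comparison rules (flags such as `-ffast-math` can break that assumption).

```c
#include <math.h>
#include <stdbool.h>

// Under IEEE 754 rules, x != x is true only when x is NaN,
// regardless of which NaN bit pattern x holds.
bool is_nan_by_self_compare(double x) {
  return x != x;
}

// Cross-check against the standard isnan() macro from <math.h>.
bool agrees_with_isnan(double x) {
  return is_nan_by_self_compare(x) == (isnan(x) != 0);
}
```

Note that neither check inspects the bit pattern, which is why `isNan`-style tests remain reliable even when different operations produce different NaN encodings.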
This is all about `Object::same(,)`, which tests instance equality and
not value equality. If `Num::nan` is introduced without warning, one could
assume it is a singleton used for every operation that returns a NaN.
That would lead to confusion about why `Object.same(Num.nan,
some_nan_result)` could return true or false depending on the CPU/libc.
I know for sure that at least x86 and ARM don't use the same default NaN
values for math operations. And I remember Wren writing "-NaN" on the
console in the days when `Num::toString` relied on glibc, before that was
handled in Wren itself so that the output is stable across platforms.
|
OK, that's a good point and, so that we're not just discussing this in a vacuum, I've written a few examples on my x86-64 machine which demonstrate that, even on the same CPU, NaNs produced in different ways don't always have the same bit representations:
var a = 0/0
var b = 0/0
var c = 1/0 - 1/0
var d = (-1).sqrt
var e = (-1).log
var f = (-2).asin
System.print(a == b) // false as both Nan
System.print(a == a) // ditto
System.print(Object.same(a, b)) // true, equivalent state
System.print(Object.same(a, a)) // true, equivalent state
System.print(Object.same(a, c)) // true
System.print(Object.same(a, d)) // true
System.print(e.isNan) // true
System.print(Object.same(a, e)) // false!
System.print(f.isNan) // true
System.print(Object.same(a, f)) // false!
System.print(Object.same(e, f)) // true
System.print(e == f) // false
But my point is that we're living with this sort of nonsense already and that introducing a Num.nan method which always produces a NaN using 0/0 isn't going to make this any worse than it already is. It will also give us the opportunity to stress to people in the docs that NaNs are tricky things and that some thought is required before using them. So, I'm still marginally in favor of the proposal. |
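To look at the bit patterns behind results like these at the C level, something along the following lines could be used. It is only a sketch: the helper names are made up for illustration, and the patterns printed for expressions such as `sqrt(-1.0)` or `log(-1.0)` depend on the CPU and libc, so no particular output is claimed.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Copy the raw IEEE 754 bits out of a double.
uint64_t bits_of(double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Bitwise identity: true only if every bit matches, so two NaNs with
// different sign or payload bits are reported as different.
bool same_bits(double a, double b) {
  return bits_of(a) == bits_of(b);
}

// Print a labelled value together with its bit pattern.
void print_bits(const char *label, double d) {
  printf("%-12s isnan=%d bits=0x%016llx\n", label, isnan(d) != 0,
         (unsigned long long)bits_of(d));
}
```

Calling `print_bits` on the C equivalents of `a` through `f` above is one way to check whether a given platform shows the same split between the NaNs.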
So I think that the behavior of
// ECMA#sec-samevalue
// This algorithm differs from the Strict Equality Comparison Algorithm in its
// treatment of signed zeroes and NaNs.
void CodeStubAssembler::BranchIfSameValue(SloppyTNode<Object> lhs,
                                          SloppyTNode<Object> rhs,
                                          Label* if_true, Label* if_false,
                                          SameValueMode mode) {
  TVARIABLE(Float64T, var_lhs_value);
  TVARIABLE(Float64T, var_rhs_value);
  Label do_fcmp(this);
  // Immediately jump to {if_true} if {lhs} == {rhs}, because - unlike
  // StrictEqual - SameValue considers two NaNs to be equal.
  GotoIf(TaggedEqual(lhs, rhs), if_true);
  // Check if the {lhs} is a Smi.
  Label if_lhsissmi(this), if_lhsisheapobject(this);
  Branch(TaggedIsSmi(lhs), &if_lhsissmi, &if_lhsisheapobject);
  BIND(&if_lhsissmi);
  {
    // Since {lhs} is a Smi, the comparison can only yield true
    // iff the {rhs} is a HeapNumber with the same float64 value.
    Branch(TaggedIsSmi(rhs), if_false, [&] {
      GotoIfNot(IsHeapNumber(CAST(rhs)), if_false);
      var_lhs_value = SmiToFloat64(CAST(lhs));
      var_rhs_value = LoadHeapNumberValue(CAST(rhs));
      Goto(&do_fcmp);
    });
  }
  BIND(&if_lhsisheapobject);
  {
    // Check if the {rhs} is a Smi.
    Branch(
        TaggedIsSmi(rhs),
        [&] {
          // Since {rhs} is a Smi, the comparison can only yield true
          // iff the {lhs} is a HeapNumber with the same float64 value.
          GotoIfNot(IsHeapNumber(CAST(lhs)), if_false);
          var_lhs_value = LoadHeapNumberValue(CAST(lhs));
          var_rhs_value = SmiToFloat64(CAST(rhs));
          Goto(&do_fcmp);
        },
        [&] {
          // Now this can only yield true if either both {lhs} and {rhs} are
          // HeapNumbers with the same value, or both are Strings with the
          // same character sequence, or both are BigInts with the same
          // value.
          Label if_lhsisheapnumber(this), if_lhsisstring(this),
              if_lhsisbigint(this);
          const TNode<Map> lhs_map = LoadMap(CAST(lhs));
          GotoIf(IsHeapNumberMap(lhs_map), &if_lhsisheapnumber);
          if (mode != SameValueMode::kNumbersOnly) {
            const TNode<Uint16T> lhs_instance_type =
                LoadMapInstanceType(lhs_map);
            GotoIf(IsStringInstanceType(lhs_instance_type), &if_lhsisstring);
            GotoIf(IsBigIntInstanceType(lhs_instance_type), &if_lhsisbigint);
          }
          Goto(if_false);
          BIND(&if_lhsisheapnumber);
          {
            GotoIfNot(IsHeapNumber(CAST(rhs)), if_false);
            var_lhs_value = LoadHeapNumberValue(CAST(lhs));
            var_rhs_value = LoadHeapNumberValue(CAST(rhs));
            Goto(&do_fcmp);
          }
          if (mode != SameValueMode::kNumbersOnly) {
            BIND(&if_lhsisstring);
            {
              // Now we can only yield true if {rhs} is also a String
              // with the same sequence of characters.
              GotoIfNot(IsString(CAST(rhs)), if_false);
              const TNode<Object> result = CallBuiltin(
                  Builtins::kStringEqual, NoContextConstant(), lhs, rhs);
              Branch(IsTrue(result), if_true, if_false);
            }
            BIND(&if_lhsisbigint);
            {
              GotoIfNot(IsBigInt(CAST(rhs)), if_false);
              const TNode<Object> result = CallRuntime(
                  Runtime::kBigIntEqualToBigInt, NoContextConstant(), lhs, rhs);
              Branch(IsTrue(result), if_true, if_false);
            }
          }
        });
  }
  BIND(&do_fcmp);
  {
    TNode<Float64T> lhs_value = UncheckedCast<Float64T>(var_lhs_value.value());
    TNode<Float64T> rhs_value = UncheckedCast<Float64T>(var_rhs_value.value());
    BranchIfSameNumberValue(lhs_value, rhs_value, if_true, if_false);
  }
}
void CodeStubAssembler::BranchIfSameNumberValue(TNode<Float64T> lhs_value,
                                                TNode<Float64T> rhs_value,
                                                Label* if_true,
                                                Label* if_false) {
  Label if_equal(this), if_notequal(this);
  Branch(Float64Equal(lhs_value, rhs_value), &if_equal, &if_notequal);
  BIND(&if_equal);
  {
    // We still need to handle the case when {lhs} and {rhs} are -0.0 and
    // 0.0 (or vice versa). Compare the high word to
    // distinguish between the two.
    const TNode<Uint32T> lhs_hi_word = Float64ExtractHighWord32(lhs_value);
    const TNode<Uint32T> rhs_hi_word = Float64ExtractHighWord32(rhs_value);
    // If x is +0 and y is -0, return false.
    // If x is -0 and y is +0, return false.
    Branch(Word32Equal(lhs_hi_word, rhs_hi_word), if_true, if_false);
  }
  BIND(&if_notequal);
  {
    // Return true iff both {rhs} and {lhs} are NaN.
    GotoIf(Float64Equal(lhs_value, lhs_value), if_false);
    Branch(Float64Equal(rhs_value, rhs_value), if_false, if_true);
  }
}
Here's what the code is doing: first compare by bits (if the bits are equal, the values must be the same). If they are equal, we're done: return true. Otherwise, if either one is not a number, return false. Otherwise (both are numbers), compare them using the CPU's floating-point compare. If they compare equal, compare the most significant words bitwise: if those are equal, return true; otherwise one is minus zero and the other is zero (they're equal but not the same), so return false (I guess they could also compare only the most significant bit or byte, but the above is probably faster). If the floating-point comparison says not equal, check whether both are NaNs: if so, return true (same); if not, return false. Yes, this is longer and slower than a plain bitwise compare, but it is the only correct way. |
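Stripped of V8's tagging machinery, the number-only part of that algorithm can be sketched in plain C roughly as follows. The function name `same_number_value` is hypothetical, and this is an illustration of the SameValue idea for doubles, not a proposed patch to `wrenValuesSame`.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Returns true if a and b are the "same value" in the SameValue sense:
// any NaN matches any other NaN, while +0.0 and -0.0 are kept distinct.
bool same_number_value(double a, double b) {
  if (a == b) {
    // Equal by floating-point compare. Only +0.0 and -0.0 can compare
    // equal while differing in bits, and they differ in the sign bit.
    uint64_t a_bits, b_bits;
    memcpy(&a_bits, &a, sizeof a_bits);
    memcpy(&b_bits, &b, sizeof b_bits);
    return (a_bits >> 63) == (b_bits >> 63);
  }
  // Not equal by floating-point compare: the same only if both are NaN.
  return isnan(a) && isnan(b);
}
```

With this definition, two NaNs produced by different operations are reported as the same regardless of their sign or payload bits, which is the behavior being argued for here.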
I'm surprised that Object.same() and the equals operator don't give the same results for NaNs unless their bit representations differ, because the docs had led me to believe that they would for a value type such as Num. In practice, I always use the == operator for Nums anyway and wouldn't have known there was a difference if @mhermier hadn't pointed it out. It's probably been done that way in the interests of performance and (judging by the V8 code) keeping Wren's code nice and tight. I wouldn't call it wrong, just unexpected :) One advantage is that it does enable us to pinpoint when NaNs are different, though how useful this is in the context of writing Wren code is questionable. As I intimated earlier, I don't think this invalidates your proposal as long as the docs make it clear how the NaN is being calculated and perhaps include a note on the treatment of NaNs generally. |
There is nothing wrong as long as the behaviour is documented.
I think that the IEEE 754 committee did a *poor* job by not specifying a
default value for math errors. This led to gazillions of possible error
values, instead of one error value (possibly with a negative version) plus
user-defined *errors*.
|
I think we do need to follow JS in this manner. |
Either way a clarification of the docs can't be bad. |
|
Updated #781 with preliminary support for QNAN checking, and |
So let's do the branch directly in |
My experiment shows that it has almost zero impact, at least on my CPU. I would like to see some confirmation from others. |
Why take the risk? Nothing bad will happen from an acceleration of some nanoseconds. Also, we only have tiny benches for Wren, and to accurately estimate the impact we need real-world use cases. |
What really makes me nervous is that a rogue math function/user/CPU could manage to produce a NaN that is compatible with a valid object. Not that it matters a lot yet, but it is a security issue. |
I already said that (#818 (comment)):
|
Not really likely. But one possibility could be to trick a C function binding that parses a string like "-NaN(0x012345)" and inject a valid object like |
I think we can live with this, because by definition, NaN tagging is not portable. This is also stated in the code. So if we meet a processor that at some point generates a NaN that interferes with other objects, we can turn NaN tagging off. |
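For completeness, one possible mitigation for the "rogue NaN" concern is to collapse every incoming NaN to a single canonical pattern at the boundary where doubles enter the VM. The sketch below is purely illustrative: the helper names and the choice of `0x7ff8000000000000` as the canonical pattern are assumptions, and nothing in this thread says Wren does this.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

// Raw-bit conversions via memcpy (avoids aliasing problems).
uint64_t to_bits(double d) {
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  return bits;
}

double from_bits(uint64_t bits) {
  double d;
  memcpy(&d, &bits, sizeof d);
  return d;
}

// Hypothetical boundary check: any NaN, whatever its sign or payload,
// is replaced by one canonical quiet NaN; other values pass through.
double canonicalize_nan(double d) {
  if (!isnan(d)) return d;
  return from_bits(0x7ff8000000000000ULL);  // assumed canonical pattern
}
```

The cost is one extra `isnan` test per value crossing the boundary, which is the kind of branch whose performance impact is being debated above.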
Addressed in 0.4.0. |
Currently, due to C's `double` nature (and IEEE-754), you can get `NaN` via `0 / 0`, `Infinity` via `1 / 0`, and `-Infinity` via `-1 / 0` (or `-(1 / 0)`, i.e. `-Infinity`). Example:

I think there are two things to improve here:

- `.toString` is bad. Really 😺. I think it would be better to capitalize it, that is, `NaN`, `Infinity`, and `-Infinity`.
- `NaN` and `Infinity`. C and C++ use macros (`NAN` and `INFINITY` in `<math.h>` or `<cmath>`, since C99 and C++ 11), which are similar. `double.NaN`, `double.PositiveInfinity`, and `double.NegativeInfinity` (or their `float` counterparts). Java has similar names: `Double.NaN`, `Double.POSITIVE_INFINITY`, `Double.NEGATIVE_INFINITY`. Python has `math.nan` and `math.inf` since 3.5; before that, the recommended way was to convert from string (see https://stackoverflow.com/a/7781273/7884305, https://stackoverflow.com/a/19374296/7884305). Rust has `f64::NAN`, `f64::INFINITY`, and `f64::NEG_INFINITY` (or `f32` counterparts). C++ also has methods, inside `<limits>`, `std::numeric_limits<{double, float, long double}>::{infinity, quiet_NaN, signaling_NaN}()`.

I think the methods approach is better. However, do we need to also insert `Num.negativeInfinity`, or not, because users can just do `-Num.infinity`?