Language: Infinity and NaN #818

ChayimFriedman2 · 2020-10-05T01:19:52Z

Currently, due to C's double nature (and IEEE-754), you can get NaN via 0 / 0, Infinity via 1 / 0, and -Infinity via -1 / 0 (or -(1 / 0), i.e. -Infinity). Example:

var NaN = 0 / 0
var Infinity = 1 / 0
var MinusInfinity = -Infinity
System.print("NaN: %(NaN), Infinity: %(Infinity), MinusInfinity: %(MinusInfinity)") // NaN: nan, Infinity: infinity, MinusInfinity: -infinity
System.print("NaN == NaN? %(NaN == NaN)") // NaN == NaN? false

I think there are two things two improve here:

The .toString is bad. Really 😺. I think it would be better to capitalize it, that is, NaN, Infinity, and -Infinity.
We should have a way to construct them. There are two manners in different programming languages (you're welcome to add more languages):
1. JavaScript uses keywords: NaN and Infinity. C and C++ use macros (NAN and INFINITY in <math.h> or <cmath>, since C99 and C++ 11), which are similar.
2. Python, C#, Java and Rust uses methods/properties/constant. In C# it's double.NaN, double.PositiveInfinity, and double.NegativeInfinity (or their float counterparts). Java has similar names: Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY. Python has math.nan and math.inf since 3.5; before that, the recommended way was to convert from string (see https://stackoverflow.com/a/7781273/7884305, https://stackoverflow.com/a/19374296/7884305). Rust has f64::NAN, f64::INFINITY, and f64::NEG_INFINITY (or f32 counterparts). C++ also has methods, inside <limits>, std::numeric_limits<{double, float, long double}>::{infinity, quiet_NaN, signaling_NaN}().

I think the methods approach is better. However, do we need to also insert Num.negativeInfinity or no because users can just do -Num.infinity?

The text was updated successfully, but these errors were encountered:

mhermier · 2020-10-05T07:09:06Z

Partially addressed in #781.

ChayimFriedman2 · 2020-10-05T08:11:38Z

So add NaN?

PureFox48 · 2020-10-05T08:30:10Z

The problem with capitalizing it is that the convention in Wren is for methods (even if they're effectively constants) to begin with a lower case letter. So Num.NaN wouldn't sit too well with Num.pi.

I wouldn't be against Num.nan though.

Num.e might also be worth adding now that Num.expis being introduced in the next version, though I suppose one could just write 1.exp.

mhermier · 2020-10-05T09:13:46Z

NaN is a bit complicated. There are many byte representations for NaN values, and some have some meanings in the CPU and in wren as value holders. I'm not against adding `Num::nan` but it's value should be carefully thought.

ChayimFriedman2 · 2020-10-05T21:26:23Z

Yes. Like I said, in C++, you can specify both quiet NaN and signaling NaN. However, other languages have only one NaN. So yes, we should be careful, but we use just 3 bits from the mantissa ,so we still have 48 bits (52 of mantissa - sign bit is used to distinguish pointers - highest mantissa bit is set to mean quiet and not signaling NaN - 3 LSB bits denote type). We just need to be careful to not use the <math.h>'s NAN constant (it's implementation defined).

About capitalizing, Java for example uses NaN although its convention for constants says NAN should be used. I guess that's because this is abbr.. Both variants are acceptable, however (Python for example uses math.nan).

I also think Num.exp will be good (C++ 20, C#, JS, Python, all have it in addition to the exponent function).

mhermier · 2020-10-05T23:17:11Z

The real problem is the CPU/libc. It produce NaN where expected, but not always the same/sane value, making wren a little bit fragile.

ChayimFriedman2 · 2020-10-06T00:09:21Z

So, by 0 / 0, users can get, say, a string? (and a page fault of course)

PureFox48 · 2020-10-06T09:20:57Z

Although @mhermier is right to inject a note of caution, ISTM that the problems of NaNs are already with us, whether we adopt this proposal or not.

Left to their own devices, I suspect that nearly everybody will generate a NaN value using the canonical 0.0/0.0 calculation and that the only real benefit of this proposal is to save them from having to declare their own variable to hold such a value.

As long as Num.nan (or whatever we're going to call it) generates the NaN value in this way and it's made clear in the documentation that this is what it's doing, the position is therefore going to be the same as it is already and anybody who wishes to generate their NaN values using a different calculation will, of course, still be free to do so.

The documentation for the existing Num.isNan method already states that it returns true for the result of 0/0 (presumably with good reason) so at least we'd be consistent here.

mhermier · 2020-10-06T10:27:32Z

The biggest problem I have is that `Num.nan` would not be a real singleton (modulo it's sign). There is no guarantee, it is only CPU/libc dependent. (And to make it worse `wrenValuesSame` already shows a different behavior depending on Value representations of doubles).

PureFox48 · 2020-10-06T11:02:13Z

Sure, there's a theoretical concern here.

But does it really matter in practice if the bit values produced by 0/0 (say) may differ as long as they're within the accepted NaN range, are recognized as such by Num.isNan and don't interfere with Wren's 'NaN tagging' approach?

Even when the bit values are exactly the same, NaNs are never going to compare equal in any case.

mhermier · 2020-10-06T13:48:14Z

Yes, you can catch them using `Num::isNan` but then what is the point of producing a NaN in code? I mean I see only 3 cases. - Get the value to perform a compare, that emplies that it is a stable singleton. - Write a math function that is not in C. One can argue that you can shortcut the function to produce NaN early, but considering the nature of these functions short-circuit in wren would be quite *costly*, and you'll probably want to optimise that function in C anyway. - Pass errors by using NaN, but we can use more reliable singleton as errors like `null` or user defined. If you have counter examples, I would like to see some.

PureFox48 · 2020-10-06T15:03:42Z

I don't follow why NaN needs to be a stable singleton to perform a compare. AFAIK, NaN values never compare as equal even if they have exactly the same bit representation. Indeed, checking that a value isn't equal to itself is one of the standard ways of checking for a NaN.

But, as I said earlier, the only value I see in this proposal is as a convenience method to generate a NaN using the 0/0 calculation that folks would probably have used anyway. We already know (or think we know) that applying isNan to the result will return true because the documentation says so.

So, the question is does anyone know of an actual CPU/libc set up which contradicts this?

If such a setup exists, then not only is the proposal pointless but we shouldn't be using NaNs anyway for their usual purpose of indicating an absent value or propagating errors. It would be better to use 'null' instead for this purpose even though Null and Num are distinct types in Wren.

mhermier · 2020-10-06T15:46:08Z

This is all about `Object::same(,)` that test about instance equality and not value equality. If `Num::nan` is introduced without warning, one could assume it is a singleton used for every operation that returns a NaN. Leading to some incomprehension about why `Object.same(Num.nan, some_nan_result)` could return true or false depending on the CPU/libc. I know for sure that at least x86 and arm don't use the same default NaN values on math operations. And I remember, wren writting "-NaN" on the console in the days of dealing with `Num::toString` from glibc, before it was dealt in wren so it is stable across platform.

PureFox48 · 2020-10-06T17:14:16Z

OK, that's a good point and so we're not just discussing this in a vacuum, I've written a few examples on my x86-64 machine which demonstrate that, even on the same CPU, NaNs produced in different ways don't always have the same bit representations:

var a = 0/0
var b = 0/0
var c = 1/0 - 1/0
var d = (-1).sqrt
var e = (-1).log
var f = (-2).asin

System.print(a == b)             // false as both Nan
System.print(a == a)             // ditto

System.print(Object.same(a, b))  // true, equivalent state
System.print(Object.same(a, a))  // true, equivalent state
 
System.print(Object.same(a, c))  // true 
System.print(Object.same(a, d))  // true

System.print(e.isNan)            // true
System.print(Object.same(a, e))  // false!

System.print(f.isNan)            // true
System.print(Object.same(a, f))  // false!
System.print(Object.same(e, f))  // true
System.print(e == f)             // false

But my point is that we're living with this sort of nonsense already and that introducing a Num.nan method which always produces a NaN using 0/0 isn't going to make this any worse than it already is.

It will also give us the opportunity to stress on people in the docs that NaNs are tricky things and that some thought is required before using them.

So, I'm still marginally in favor of the proposal.

ChayimFriedman2 · 2020-10-06T21:57:40Z

So I think that the behavior of Object.same() is wrong. JS, for example, treats all of the above numbers as same (Object.is()). If you look into V8's implementation:

// ECMA#sec-samevalue
// This algorithm differs from the Strict Equality Comparison Algorithm in its
// treatment of signed zeroes and NaNs.
void CodeStubAssembler::BranchIfSameValue(SloppyTNode<Object> lhs,
                                          SloppyTNode<Object> rhs,
                                          Label* if_true, Label* if_false,
                                          SameValueMode mode) {
  TVARIABLE(Float64T, var_lhs_value);
  TVARIABLE(Float64T, var_rhs_value);
  Label do_fcmp(this);


  // Immediately jump to {if_true} if {lhs} == {rhs}, because - unlike
  // StrictEqual - SameValue considers two NaNs to be equal.
  GotoIf(TaggedEqual(lhs, rhs), if_true);


  // Check if the {lhs} is a Smi.
  Label if_lhsissmi(this), if_lhsisheapobject(this);
  Branch(TaggedIsSmi(lhs), &if_lhsissmi, &if_lhsisheapobject);


  BIND(&if_lhsissmi);
  {
    // Since {lhs} is a Smi, the comparison can only yield true
    // iff the {rhs} is a HeapNumber with the same float64 value.
    Branch(TaggedIsSmi(rhs), if_false, [&] {
      GotoIfNot(IsHeapNumber(CAST(rhs)), if_false);
      var_lhs_value = SmiToFloat64(CAST(lhs));
      var_rhs_value = LoadHeapNumberValue(CAST(rhs));
      Goto(&do_fcmp);
    });
  }


  BIND(&if_lhsisheapobject);
  {
    // Check if the {rhs} is a Smi.
    Branch(
        TaggedIsSmi(rhs),
        [&] {
          // Since {rhs} is a Smi, the comparison can only yield true
          // iff the {lhs} is a HeapNumber with the same float64 value.
          GotoIfNot(IsHeapNumber(CAST(lhs)), if_false);
          var_lhs_value = LoadHeapNumberValue(CAST(lhs));
          var_rhs_value = SmiToFloat64(CAST(rhs));
          Goto(&do_fcmp);
        },
        [&] {
          // Now this can only yield true if either both {lhs} and {rhs} are
          // HeapNumbers with the same value, or both are Strings with the
          // same character sequence, or both are BigInts with the same
          // value.
          Label if_lhsisheapnumber(this), if_lhsisstring(this),
              if_lhsisbigint(this);
          const TNode<Map> lhs_map = LoadMap(CAST(lhs));
          GotoIf(IsHeapNumberMap(lhs_map), &if_lhsisheapnumber);
          if (mode != SameValueMode::kNumbersOnly) {
            const TNode<Uint16T> lhs_instance_type =
                LoadMapInstanceType(lhs_map);
            GotoIf(IsStringInstanceType(lhs_instance_type), &if_lhsisstring);
            GotoIf(IsBigIntInstanceType(lhs_instance_type), &if_lhsisbigint);
          }
          Goto(if_false);


          BIND(&if_lhsisheapnumber);
          {
            GotoIfNot(IsHeapNumber(CAST(rhs)), if_false);
            var_lhs_value = LoadHeapNumberValue(CAST(lhs));
            var_rhs_value = LoadHeapNumberValue(CAST(rhs));
            Goto(&do_fcmp);
          }


          if (mode != SameValueMode::kNumbersOnly) {
            BIND(&if_lhsisstring);
            {
              // Now we can only yield true if {rhs} is also a String
              // with the same sequence of characters.
              GotoIfNot(IsString(CAST(rhs)), if_false);
              const TNode<Object> result = CallBuiltin(
                  Builtins::kStringEqual, NoContextConstant(), lhs, rhs);
              Branch(IsTrue(result), if_true, if_false);
            }


            BIND(&if_lhsisbigint);
            {
              GotoIfNot(IsBigInt(CAST(rhs)), if_false);
              const TNode<Object> result = CallRuntime(
                  Runtime::kBigIntEqualToBigInt, NoContextConstant(), lhs, rhs);
              Branch(IsTrue(result), if_true, if_false);
            }
          }
        });
  }


  BIND(&do_fcmp);
  {
    TNode<Float64T> lhs_value = UncheckedCast<Float64T>(var_lhs_value.value());
    TNode<Float64T> rhs_value = UncheckedCast<Float64T>(var_rhs_value.value());
    BranchIfSameNumberValue(lhs_value, rhs_value, if_true, if_false);
  }
}


void CodeStubAssembler::BranchIfSameNumberValue(TNode<Float64T> lhs_value,
                                                TNode<Float64T> rhs_value,
                                                Label* if_true,
                                                Label* if_false) {
  Label if_equal(this), if_notequal(this);
  Branch(Float64Equal(lhs_value, rhs_value), &if_equal, &if_notequal);


  BIND(&if_equal);
  {
    // We still need to handle the case when {lhs} and {rhs} are -0.0 and
    // 0.0 (or vice versa). Compare the high word to
    // distinguish between the two.
    const TNode<Uint32T> lhs_hi_word = Float64ExtractHighWord32(lhs_value);
    const TNode<Uint32T> rhs_hi_word = Float64ExtractHighWord32(rhs_value);


    // If x is +0 and y is -0, return false.
    // If x is -0 and y is +0, return false.
    Branch(Word32Equal(lhs_hi_word, rhs_hi_word), if_true, if_false);
  }


  BIND(&if_notequal);
  {
    // Return true iff both {rhs} and {lhs} are NaN.
    GotoIf(Float64Equal(lhs_value, lhs_value), if_false);
    Branch(Float64Equal(rhs_value, rhs_value), if_false, if_true);
  }
}

Here's what the code doing: compare by bits (because if bits equal that must be equal). If equal - done, return true. If not: if either one is not a number - done, return false. Else (both are numbers) - compare them using CPU's floating-point compare. If equal, compare the MSB word (bitwise compare). If equal, return true. Else one is minus zero and the other is zero (they're equal but not same), return false (I guess they could also compare only one MSB bit or MSB byte, but probably the above is faster). If the floating-point comparison is not equal, check if both are NaNs. If true, return true (same). If false, return false.

Yeah, this is longer and slower than just bitwise-compare, but this is the only correct way.

PureFox48 · 2020-10-07T08:35:27Z

I'm surprised that Object.same() and the equals operator don't give the same results for NaNs unless their bit representations differ because the docs had led me to believe that they would do for a value type such as Num. In practice, I always use the == operator for Nums anyway and wouldn't have known there was a difference if @mhermier hadn't pointed it out.

It's probably been done that way in the interests of performance and (judging by the V8 code) keeping Wren's code nice and tight.

I wouldn't call it wrong just unexpected :)

One advantage is that it does enable us to pinpoint when NaNs are different though how useful this is in the context of writing Wren code is questionable.

As I intimated earlier, I don't think this invalidates your proposal as long as the docs make it clear how the NaN is being calculated and perhaps include a note on the treatment of NaNs generally.

mhermier · 2020-10-07T09:06:33Z

There is nothing wrong as long as the behaviour is documented. I think that the ieee754 committee did a *poor* job by not specifying a default value for math errors. This led to gazillions possible error values, instead of having one error (with possibly negative version) and user defined *errors*.

ChayimFriedman2 · 2020-10-07T12:44:21Z

I think we do need to follow JS in this manner. Object.same() is not usually called, and so probably won't be a performance bottleneck. The code won't be too bad, too: V8's code scares anyway 😉. The reason for the length of the code for Object.is() in V8 is related to the fact that V8's builtins aren't C++, this is something like assembly.

ChayimFriedman2 · 2020-10-07T12:45:43Z

Either way a clarification of the docs can't be bad.

mhermier · 2020-10-07T16:34:04Z

Object::same(,) is not often called but wrenValuesSame is the used by the shortcut of wrenValueEqual which is the default equality and is used a lot. So if the solution is to test for qNaN (which should be the requirement in wren) then it has to be bench marked to see the impact of it. And with that solution, we should be in the clear to add Num::nan with a minimum of documentation.

mhermier · 2020-10-07T18:54:02Z

Updated #781 with preliminary support for QNAN checking, and Num::nan implementation. This is to have more feedback.

ChayimFriedman2 · 2020-10-07T22:05:04Z

Object::same(,) is not often called but wrenValuesSame is the used by the shortcut of wrenValueEqual which is the default equality and is used a lot.

So let do the branch directly in Object.same()!

mhermier · 2020-10-07T23:43:46Z

My experiment shows that is has almost zero impact at least on my CPU. I would like to see some confirmation from other.

ChayimFriedman2 · 2020-10-08T00:54:01Z

Why take the risk? Nothing bad will happen from an acceleration of some nanoseconds.

Also, we only have tiny benches for Wren, and to accurately estimate the impact we need real-world use cases.

mhermier · 2020-10-08T06:54:15Z

What really makes me nervous is that a rogue math function/user/CPU achieve to produce a NaN that is compatible with a valid object, not that it matters a lot yet, but is a security issue.

ChayimFriedman2 · 2020-10-08T07:11:49Z

I already said that (#818 (comment)):

So, by 0 / 0, users can get, say, a string? (and a page fault of course)

mhermier · 2020-10-08T07:20:07Z

Not really likely. But one possibility could be to trick a C function binding that parse a string like "-NaN(0x012345)" and inject a valid valid object like strtod does on some platform.

ChayimFriedman2 · 2020-10-08T07:38:52Z

I think we can live with this, because by definition, NaN tagging is not portable. This is also stated in the code. So if we meet a processor that somewhen generates a NaN that interferes with other objects, we can turn NaN tagging off.

ChayimFriedman2 · 2020-12-10T15:25:18Z

Addressed in 0.4.0.

ChayimFriedman2 closed this as completed Dec 10, 2020

PureFox48 mentioned this issue Sep 12, 2021

Are 0.0 and -0.0 equal or not? #1052

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Language: Infinity and NaN #818

Language: Infinity and NaN #818

ChayimFriedman2 commented Oct 5, 2020 •

edited

mhermier commented Oct 5, 2020 via email

ChayimFriedman2 commented Oct 5, 2020

PureFox48 commented Oct 5, 2020

mhermier commented Oct 5, 2020 via email

ChayimFriedman2 commented Oct 5, 2020

mhermier commented Oct 5, 2020 via email

ChayimFriedman2 commented Oct 6, 2020 •

edited

PureFox48 commented Oct 6, 2020

mhermier commented Oct 6, 2020 via email

PureFox48 commented Oct 6, 2020

mhermier commented Oct 6, 2020 via email

PureFox48 commented Oct 6, 2020

mhermier commented Oct 6, 2020 via email

PureFox48 commented Oct 6, 2020

ChayimFriedman2 commented Oct 6, 2020 •

edited

PureFox48 commented Oct 7, 2020

mhermier commented Oct 7, 2020 via email

ChayimFriedman2 commented Oct 7, 2020

ChayimFriedman2 commented Oct 7, 2020

mhermier commented Oct 7, 2020

mhermier commented Oct 7, 2020

ChayimFriedman2 commented Oct 7, 2020

mhermier commented Oct 7, 2020

ChayimFriedman2 commented Oct 8, 2020

mhermier commented Oct 8, 2020

ChayimFriedman2 commented Oct 8, 2020

mhermier commented Oct 8, 2020

ChayimFriedman2 commented Oct 8, 2020 •

edited

ChayimFriedman2 commented Dec 10, 2020

Language: Infinity and NaN #818

Language: Infinity and NaN #818

Comments

ChayimFriedman2 commented Oct 5, 2020 • edited

mhermier commented Oct 5, 2020 via email

ChayimFriedman2 commented Oct 5, 2020

PureFox48 commented Oct 5, 2020

mhermier commented Oct 5, 2020 via email

ChayimFriedman2 commented Oct 5, 2020

mhermier commented Oct 5, 2020 via email

ChayimFriedman2 commented Oct 6, 2020 • edited

PureFox48 commented Oct 6, 2020

mhermier commented Oct 6, 2020 via email

PureFox48 commented Oct 6, 2020

mhermier commented Oct 6, 2020 via email

PureFox48 commented Oct 6, 2020

mhermier commented Oct 6, 2020 via email

PureFox48 commented Oct 6, 2020

ChayimFriedman2 commented Oct 6, 2020 • edited

PureFox48 commented Oct 7, 2020

mhermier commented Oct 7, 2020 via email

ChayimFriedman2 commented Oct 7, 2020

ChayimFriedman2 commented Oct 7, 2020

mhermier commented Oct 7, 2020

mhermier commented Oct 7, 2020

ChayimFriedman2 commented Oct 7, 2020

mhermier commented Oct 7, 2020

ChayimFriedman2 commented Oct 8, 2020

mhermier commented Oct 8, 2020

ChayimFriedman2 commented Oct 8, 2020

mhermier commented Oct 8, 2020

ChayimFriedman2 commented Oct 8, 2020 • edited

ChayimFriedman2 commented Dec 10, 2020

ChayimFriedman2 commented Oct 5, 2020 •

edited

ChayimFriedman2 commented Oct 6, 2020 •

edited

ChayimFriedman2 commented Oct 6, 2020 •

edited

ChayimFriedman2 commented Oct 8, 2020 •

edited