Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Language: Infinity and NaN #818

Closed
ChayimFriedman2 opened this issue Oct 5, 2020 · 29 comments
Closed

Language: Infinity and NaN #818

ChayimFriedman2 opened this issue Oct 5, 2020 · 29 comments

Comments

@ChayimFriedman2
Copy link
Contributor

ChayimFriedman2 commented Oct 5, 2020

Currently, due to C's double nature (and IEEE-754), you can get NaN via 0 / 0, Infinity via 1 / 0, and -Infinity via -1 / 0 (or -(1 / 0), i.e. -Infinity). Example:

var NaN = 0 / 0
var Infinity = 1 / 0
var MinusInfinity = -Infinity
System.print("NaN: %(NaN), Infinity: %(Infinity), MinusInfinity: %(MinusInfinity)") // NaN: nan, Infinity: infinity, MinusInfinity: -infinity
System.print("NaN == NaN? %(NaN == NaN)") // NaN == NaN? false

I think there are two things two improve here:

  1. The .toString is bad. Really 😺. I think it would be better to capitalize it, that is, NaN, Infinity, and -Infinity.
  2. We should have a way to construct them. There are two manners in different programming languages (you're welcome to add more languages):
    1. JavaScript uses keywords: NaN and Infinity. C and C++ use macros (NAN and INFINITY in <math.h> or <cmath>, since C99 and C++ 11), which are similar.
    2. Python, C#, Java and Rust uses methods/properties/constant. In C# it's double.NaN, double.PositiveInfinity, and double.NegativeInfinity (or their float counterparts). Java has similar names: Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY. Python has math.nan and math.inf since 3.5; before that, the recommended way was to convert from string (see https://stackoverflow.com/a/7781273/7884305, https://stackoverflow.com/a/19374296/7884305). Rust has f64::NAN, f64::INFINITY, and f64::NEG_INFINITY (or f32 counterparts). C++ also has methods, inside <limits>, std::numeric_limits<{double, float, long double}>::{infinity, quiet_NaN, signaling_NaN}().

I think the methods approach is better. However, do we need to also insert Num.negativeInfinity or no because users can just do -Num.infinity?

@mhermier
Copy link
Contributor

mhermier commented Oct 5, 2020 via email

@ChayimFriedman2
Copy link
Contributor Author

So add NaN?

@PureFox48
Copy link
Contributor

The problem with capitalizing it is that the convention in Wren is for methods (even if they're effectively constants) to begin with a lower case letter. So Num.NaN wouldn't sit too well with Num.pi.

I wouldn't be against Num.nan though.

Num.e might also be worth adding now that Num.expis being introduced in the next version, though I suppose one could just write 1.exp.

@mhermier
Copy link
Contributor

mhermier commented Oct 5, 2020 via email

@ChayimFriedman2
Copy link
Contributor Author

Yes. Like I said, in C++, you can specify both quiet NaN and signaling NaN. However, other languages have only one NaN. So yes, we should be careful, but we use just 3 bits from the mantissa ,so we still have 48 bits (52 of mantissa - sign bit is used to distinguish pointers - highest mantissa bit is set to mean quiet and not signaling NaN - 3 LSB bits denote type). We just need to be careful to not use the <math.h>'s NAN constant (it's implementation defined).

About capitalizing, Java for example uses NaN although its convention for constants says NAN should be used. I guess that's because this is abbr.. Both variants are acceptable, however (Python for example uses math.nan).

I also think Num.exp will be good (C++ 20, C#, JS, Python, all have it in addition to the exponent function).

@mhermier
Copy link
Contributor

mhermier commented Oct 5, 2020 via email

@ChayimFriedman2
Copy link
Contributor Author

ChayimFriedman2 commented Oct 6, 2020

So, by 0 / 0, users can get, say, a string? (and a page fault of course)

@PureFox48
Copy link
Contributor

Although @mhermier is right to inject a note of caution, ISTM that the problems of NaNs are already with us, whether we adopt this proposal or not.

Left to their own devices, I suspect that nearly everybody will generate a NaN value using the canonical 0.0/0.0 calculation and that the only real benefit of this proposal is to save them from having to declare their own variable to hold such a value.

As long as Num.nan (or whatever we're going to call it) generates the NaN value in this way and it's made clear in the documentation that this is what it's doing, the position is therefore going to be the same as it is already and anybody who wishes to generate their NaN values using a different calculation will, of course, still be free to do so.

The documentation for the existing Num.isNan method already states that it returns true for the result of 0/0 (presumably with good reason) so at least we'd be consistent here.

@mhermier
Copy link
Contributor

mhermier commented Oct 6, 2020 via email

@PureFox48
Copy link
Contributor

Sure, there's a theoretical concern here.

But does it really matter in practice if the bit values produced by 0/0 (say) may differ as long as they're within the accepted NaN range, are recognized as such by Num.isNan and don't interfere with Wren's 'NaN tagging' approach?

Even when the bit values are exactly the same, NaNs are never going to compare equal in any case.

@mhermier
Copy link
Contributor

mhermier commented Oct 6, 2020 via email

@PureFox48
Copy link
Contributor

I don't follow why NaN needs to be a stable singleton to perform a compare. AFAIK, NaN values never compare as equal even if they have exactly the same bit representation. Indeed, checking that a value isn't equal to itself is one of the standard ways of checking for a NaN.

But, as I said earlier, the only value I see in this proposal is as a convenience method to generate a NaN using the 0/0 calculation that folks would probably have used anyway. We already know (or think we know) that applying isNan to the result will return true because the documentation says so.

So, the question is does anyone know of an actual CPU/libc set up which contradicts this?

If such a setup exists, then not only is the proposal pointless but we shouldn't be using NaNs anyway for their usual purpose of indicating an absent value or propagating errors. It would be better to use 'null' instead for this purpose even though Null and Num are distinct types in Wren.

@mhermier
Copy link
Contributor

mhermier commented Oct 6, 2020 via email

@PureFox48
Copy link
Contributor

OK, that's a good point and so we're not just discussing this in a vacuum, I've written a few examples on my x86-64 machine which demonstrate that, even on the same CPU, NaNs produced in different ways don't always have the same bit representations:

var a = 0/0
var b = 0/0
var c = 1/0 - 1/0
var d = (-1).sqrt
var e = (-1).log
var f = (-2).asin

System.print(a == b)             // false as both Nan
System.print(a == a)             // ditto

System.print(Object.same(a, b))  // true, equivalent state
System.print(Object.same(a, a))  // true, equivalent state
 
System.print(Object.same(a, c))  // true 
System.print(Object.same(a, d))  // true

System.print(e.isNan)            // true
System.print(Object.same(a, e))  // false!

System.print(f.isNan)            // true
System.print(Object.same(a, f))  // false!
System.print(Object.same(e, f))  // true
System.print(e == f)             // false

But my point is that we're living with this sort of nonsense already and that introducing a Num.nan method which always produces a NaN using 0/0 isn't going to make this any worse than it already is.

It will also give us the opportunity to stress on people in the docs that NaNs are tricky things and that some thought is required before using them.

So, I'm still marginally in favor of the proposal.

@ChayimFriedman2
Copy link
Contributor Author

ChayimFriedman2 commented Oct 6, 2020

So I think that the behavior of Object.same() is wrong. JS, for example, treats all of the above numbers as same (Object.is()). If you look into V8's implementation:

// ECMA#sec-samevalue
// This algorithm differs from the Strict Equality Comparison Algorithm in its
// treatment of signed zeroes and NaNs.
void CodeStubAssembler::BranchIfSameValue(SloppyTNode<Object> lhs,
                                          SloppyTNode<Object> rhs,
                                          Label* if_true, Label* if_false,
                                          SameValueMode mode) {
  TVARIABLE(Float64T, var_lhs_value);
  TVARIABLE(Float64T, var_rhs_value);
  Label do_fcmp(this);


  // Immediately jump to {if_true} if {lhs} == {rhs}, because - unlike
  // StrictEqual - SameValue considers two NaNs to be equal.
  GotoIf(TaggedEqual(lhs, rhs), if_true);


  // Check if the {lhs} is a Smi.
  Label if_lhsissmi(this), if_lhsisheapobject(this);
  Branch(TaggedIsSmi(lhs), &if_lhsissmi, &if_lhsisheapobject);


  BIND(&if_lhsissmi);
  {
    // Since {lhs} is a Smi, the comparison can only yield true
    // iff the {rhs} is a HeapNumber with the same float64 value.
    Branch(TaggedIsSmi(rhs), if_false, [&] {
      GotoIfNot(IsHeapNumber(CAST(rhs)), if_false);
      var_lhs_value = SmiToFloat64(CAST(lhs));
      var_rhs_value = LoadHeapNumberValue(CAST(rhs));
      Goto(&do_fcmp);
    });
  }


  BIND(&if_lhsisheapobject);
  {
    // Check if the {rhs} is a Smi.
    Branch(
        TaggedIsSmi(rhs),
        [&] {
          // Since {rhs} is a Smi, the comparison can only yield true
          // iff the {lhs} is a HeapNumber with the same float64 value.
          GotoIfNot(IsHeapNumber(CAST(lhs)), if_false);
          var_lhs_value = LoadHeapNumberValue(CAST(lhs));
          var_rhs_value = SmiToFloat64(CAST(rhs));
          Goto(&do_fcmp);
        },
        [&] {
          // Now this can only yield true if either both {lhs} and {rhs} are
          // HeapNumbers with the same value, or both are Strings with the
          // same character sequence, or both are BigInts with the same
          // value.
          Label if_lhsisheapnumber(this), if_lhsisstring(this),
              if_lhsisbigint(this);
          const TNode<Map> lhs_map = LoadMap(CAST(lhs));
          GotoIf(IsHeapNumberMap(lhs_map), &if_lhsisheapnumber);
          if (mode != SameValueMode::kNumbersOnly) {
            const TNode<Uint16T> lhs_instance_type =
                LoadMapInstanceType(lhs_map);
            GotoIf(IsStringInstanceType(lhs_instance_type), &if_lhsisstring);
            GotoIf(IsBigIntInstanceType(lhs_instance_type), &if_lhsisbigint);
          }
          Goto(if_false);


          BIND(&if_lhsisheapnumber);
          {
            GotoIfNot(IsHeapNumber(CAST(rhs)), if_false);
            var_lhs_value = LoadHeapNumberValue(CAST(lhs));
            var_rhs_value = LoadHeapNumberValue(CAST(rhs));
            Goto(&do_fcmp);
          }


          if (mode != SameValueMode::kNumbersOnly) {
            BIND(&if_lhsisstring);
            {
              // Now we can only yield true if {rhs} is also a String
              // with the same sequence of characters.
              GotoIfNot(IsString(CAST(rhs)), if_false);
              const TNode<Object> result = CallBuiltin(
                  Builtins::kStringEqual, NoContextConstant(), lhs, rhs);
              Branch(IsTrue(result), if_true, if_false);
            }


            BIND(&if_lhsisbigint);
            {
              GotoIfNot(IsBigInt(CAST(rhs)), if_false);
              const TNode<Object> result = CallRuntime(
                  Runtime::kBigIntEqualToBigInt, NoContextConstant(), lhs, rhs);
              Branch(IsTrue(result), if_true, if_false);
            }
          }
        });
  }


  BIND(&do_fcmp);
  {
    TNode<Float64T> lhs_value = UncheckedCast<Float64T>(var_lhs_value.value());
    TNode<Float64T> rhs_value = UncheckedCast<Float64T>(var_rhs_value.value());
    BranchIfSameNumberValue(lhs_value, rhs_value, if_true, if_false);
  }
}


void CodeStubAssembler::BranchIfSameNumberValue(TNode<Float64T> lhs_value,
                                                TNode<Float64T> rhs_value,
                                                Label* if_true,
                                                Label* if_false) {
  Label if_equal(this), if_notequal(this);
  Branch(Float64Equal(lhs_value, rhs_value), &if_equal, &if_notequal);


  BIND(&if_equal);
  {
    // We still need to handle the case when {lhs} and {rhs} are -0.0 and
    // 0.0 (or vice versa). Compare the high word to
    // distinguish between the two.
    const TNode<Uint32T> lhs_hi_word = Float64ExtractHighWord32(lhs_value);
    const TNode<Uint32T> rhs_hi_word = Float64ExtractHighWord32(rhs_value);


    // If x is +0 and y is -0, return false.
    // If x is -0 and y is +0, return false.
    Branch(Word32Equal(lhs_hi_word, rhs_hi_word), if_true, if_false);
  }


  BIND(&if_notequal);
  {
    // Return true iff both {rhs} and {lhs} are NaN.
    GotoIf(Float64Equal(lhs_value, lhs_value), if_false);
    Branch(Float64Equal(rhs_value, rhs_value), if_false, if_true);
  }
}

Here's what the code doing: compare by bits (because if bits equal that must be equal). If equal - done, return true. If not: if either one is not a number - done, return false. Else (both are numbers) - compare them using CPU's floating-point compare. If equal, compare the MSB word (bitwise compare). If equal, return true. Else one is minus zero and the other is zero (they're equal but not same), return false (I guess they could also compare only one MSB bit or MSB byte, but probably the above is faster). If the floating-point comparison is not equal, check if both are NaNs. If true, return true (same). If false, return false.

Yeah, this is longer and slower than just bitwise-compare, but this is the only correct way.

@PureFox48
Copy link
Contributor

I'm surprised that Object.same() and the equals operator don't give the same results for NaNs unless their bit representations differ because the docs had led me to believe that they would do for a value type such as Num. In practice, I always use the == operator for Nums anyway and wouldn't have known there was a difference if @mhermier hadn't pointed it out.

It's probably been done that way in the interests of performance and (judging by the V8 code) keeping Wren's code nice and tight.

I wouldn't call it wrong just unexpected :)

One advantage is that it does enable us to pinpoint when NaNs are different though how useful this is in the context of writing Wren code is questionable.

As I intimated earlier, I don't think this invalidates your proposal as long as the docs make it clear how the NaN is being calculated and perhaps include a note on the treatment of NaNs generally.

@mhermier
Copy link
Contributor

mhermier commented Oct 7, 2020 via email

@ChayimFriedman2
Copy link
Contributor Author

I think we do need to follow JS in this manner. Object.same() is not usually called, and so probably won't be a performance bottleneck. The code won't be too bad, too: V8's code scares anyway 😉. The reason for the length of the code for Object.is() in V8 is related to the fact that V8's builtins aren't C++, this is something like assembly.

@ChayimFriedman2
Copy link
Contributor Author

Either way a clarification of the docs can't be bad.

@mhermier
Copy link
Contributor

mhermier commented Oct 7, 2020

Object::same(,) is not often called but wrenValuesSame is the used by the shortcut of wrenValueEqual which is the default equality and is used a lot. So if the solution is to test for qNaN (which should be the requirement in wren) then it has to be bench marked to see the impact of it. And with that solution, we should be in the clear to add Num::nan with a minimum of documentation.

@mhermier
Copy link
Contributor

mhermier commented Oct 7, 2020

Updated #781 with preliminary support for QNAN checking, and Num::nan implementation. This is to have more feedback.

@ChayimFriedman2
Copy link
Contributor Author

Object::same(,) is not often called but wrenValuesSame is the used by the shortcut of wrenValueEqual which is the default equality and is used a lot.

So let do the branch directly in Object.same()!

@mhermier
Copy link
Contributor

mhermier commented Oct 7, 2020

My experiment shows that is has almost zero impact at least on my CPU. I would like to see some confirmation from other.

@ChayimFriedman2
Copy link
Contributor Author

Why take the risk? Nothing bad will happen from an acceleration of some nanoseconds.

Also, we only have tiny benches for Wren, and to accurately estimate the impact we need real-world use cases.

@mhermier
Copy link
Contributor

mhermier commented Oct 8, 2020

What really makes me nervous is that a rogue math function/user/CPU achieve to produce a NaN that is compatible with a valid object, not that it matters a lot yet, but is a security issue.

@ChayimFriedman2
Copy link
Contributor Author

I already said that (#818 (comment)):

So, by 0 / 0, users can get, say, a string? (and a page fault of course)

@mhermier
Copy link
Contributor

mhermier commented Oct 8, 2020

Not really likely. But one possibility could be to trick a C function binding that parse a string like "-NaN(0x012345)" and inject a valid valid object like strtod does on some platform.

@ChayimFriedman2
Copy link
Contributor Author

ChayimFriedman2 commented Oct 8, 2020

I think we can live with this, because by definition, NaN tagging is not portable. This is also stated in the code. So if we meet a processor that somewhen generates a NaN that interferes with other objects, we can turn NaN tagging off.

@ChayimFriedman2
Copy link
Contributor Author

Addressed in 0.4.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants