Should ECMAScript implementations be allowed to canonicalize NaN when writing into an Array? #635

littledan · 2016-07-07T22:27:16Z

V8 has several representations of Arrays based on different kinds of unboxed content, called ElementKind. One ElementKind is FAST_HOLEY_DOUBLE_ELEMENTS. With this kind, the Array may contain holes (missing properties which delegate to the prototype), but all non-hole elements are unboxed doubles (Numbers). The trick is to canonicalize NaNs when written into such an array, and use a particular NaN value for "the hole" to indicate that a lookup on the prototype is needed.

This implementation technique violates the ECMAScript specification because, in SetValueInBuffer, there is the following text:

An implementation must always choose the same encoding for each implementation distinguishable NaN value.

I believe this text was written with the intent of creating the cross-cutting effect that NaNs should always carry around their payload, and only get a new payload when a "new" NaN is created. (We could use some better spec text here to back that intention with more clear mechanics, and I've promised to do that, except I want to get this other issue worked out first.) A big piece of motivation for that is to prevent information leaks, e.g., in writing properties to frozen objects.

I'm wondering, is there a way that we could weaken this requirement? V8's practice of canonicalizing NaNs just in order to write them into certain objects seems to improve our performance, and it seems harmless to me, as I don't know of any information leaks that we're creating.

For a strawman: what if we simply removed that sentence and let SetValueInBuffer write whatever NaN representation it felt like? Is there another way we could convince ourselves that information leaks won't happen?

cc @bakkot @anba @erights @leobalter @verwaest

ljharb · 2016-07-08T00:08:23Z

-1 from me - the fewer places we allow differentiable NaN values, the better.

However, I think this ties into the last meeting's discussion where we wanted, instead of the current situation, to enumerate exactly where implementations must canonicalize, and then let the rest remain unspecified?

erights · 2016-07-08T00:17:16Z

and then let the rest remain unspecified

This was not the outcome of the last tc39 mtg, and would not have addressed the information leak concern. The concern is the leakage of implementation choices creating an exploitable side channel that others can use to communicate information. The more choices are left to the implementation, the greater the danger of this side channel.

In particular, the opaque passing around of a value --- loading, storing, passing as argument, etc... --- should not cause the value to be visibly mutated. This is the assumption that pervades coding practice. No one thinks "If I store this value in variable x and then read it from variable x, it might be different." Certainly, no one is prepared to defensive code against that possibility. If attacker A causes defender B to engage in one load/store pattern vs another before defender B passes that same value to collaborator C, then C can tell which way A provoked B.

littledan · 2016-07-08T00:18:25Z

@ljharb Yes, it does tie into that. I was suggesting to formalize what I felt to be the current intention for spec semantics (where the wording isn't so consistent right now), but then it turned out that these are too restrictive to describe V8's behavior. I want to figure this issue out first because it changes how we'd go about clearing up the spec.

The upside of making NaN canonicalization more weakly specified is that it gives implementations more flexibility to optimize. Fast holey double arrays are really useful for cases where you call the Array constructor with a length and then write all Numbers into it. At the same time, not doing NaN boxing saves memory for pointers.

The downsides are mostly theoretical from what I can see: a hypothetical risk of information leaks, and more unspecified behavior, where it's already pretty loose what happens. Where would users want to take advantage of the current invariants of NaN semantics, which they wouldn't be able to do if semantics were weakened?

littledan · 2016-07-08T00:21:31Z

@erights Agreed that this is a separate topic from what we discussed in Munich.

What if we think about it differently, that the NaN value doesn't actually "have" a payload (like in ES5 days), and SetValueInBuffer materializes a payload out of thin air? I'm not sure what people would use these payloads for anyway. They don't have semantics that lend them to use for data storage in any web browser currently.

erights · 2016-07-08T00:22:09Z

canonicalize (multiple uses)

I want to be sure we all mean the same thing by this term. The normal English and Computer Science use of "canonicalize" is to reduce all values in the same equivalence class to one agreed representative value from that class. For example, if we take all NaNs to be in one equivalence class, then canonicalizing any NaN would produce the same given canonical NaN.

As I recall from the TC39 mtg, until this was pointed out, people we using "canonicalize" in almost the opposite sense. Since @littledan started this thread with "canonicalize" in the title: @littledan , what do you mean by "canonicalize" or "canonical" here?

erights · 2016-07-08T00:24:22Z

@ljharb Please also clarify what you meant by "canonicalize" in your post. Thanks.

ljharb · 2016-07-08T00:49:06Z

@erights sorry, what i meant was: if i take two differentiable NaN values and pass them through the same code path "foo", and I get out two undifferentiable NaN values, then I consider that code path one that "canonicalizes". If I'm using the term wrong please do correct me.

ljharb · 2016-07-08T00:50:25Z

@erights

No one thinks "If I store this value in variable x and then read it from variable x, it might be different."

In this case, writing into an array isn't storing it in a variable, it's storing it on an object property - and if nobody expects that behavior with object properties, then why does the language have setters?

erights · 2016-07-08T00:56:38Z

Getters and setters are functions. "storing" and "reading" from an accessor property are function invocations. If those functions merely pass the value around opaquely, for example, if the setter merely stores it in a variable it shares with the getter, and the getter merely reads that variable and returns its value, then the same constraint applies. If the setter or getter explicitly calculates a new NaN, that's a completely different story. It would be the same as invoking any other function that explicitly calculates a new NaN.

erights · 2016-07-08T01:02:01Z

@ljharb @littledan We only care about what is observable. If the only way to observe the bits of a NaN value is via SetValueInBuffer, and if this always canonicalizes so that we always observe only the same bit pattern for all NaNs, then, as far as is observable, this bit pattern is the only bit pattern that implementation assigns to NaN values. Such an implementation would agree with the consensus from the last tc39 mtg.

concavelenz · 2016-07-08T15:02:00Z

I'm a little confused. In what way are the different NaN values observable
from JS? If you can't write code to detect different NaN values what does
it matter whether the values are canonicalized or not?

On Thu, Jul 7, 2016 at 6:02 PM, Mark S. Miller notifications@github.com
wrote:

@ljharb https://github.com/ljharb @littledan
https://github.com/littledan We only care about what is observable. If
the only way to observe the bits of a NaN value is via SetValueInBuffer,
and if this always canonicalizes so that we always observe only the same
bit pattern for all NaNs, then, as far as is observable, this bit
pattern is the only bit pattern that implementation assigns to NaN values.
Such an implementation would agree with the consensus from the last tc39
mtg.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#635 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/ABMDKiozv7F04IgiLHnWx9kjZ7Enq8Joks5qTaGOgaJpZM4JHj3w
.

erights · 2016-07-08T16:05:37Z

Serializing them to a typed array whose storage is then viewed via a different type.

ljharb · 2016-07-08T16:37:13Z

@concavelenz http://npmjs.com/get-nans has a few examples.

littledan · 2016-07-08T16:59:54Z

Sorry, by 'canonicalize' I meant in the obscure sense of replacing a NaN with any other NaN. Any suggestions for another word for this operation?

Replacing with a particular other NaN value on SetValueInBuffer would have serious negative performance implications for V8, and we previously got consensus at TC39 not to do that for this reason.

leobalter · 2016-07-08T17:23:53Z

Replacing with a particular other NaN value on SetValueInBuffer would have serious negative performance implications for V8, and we previously got consensus at TC39 not to do that for this reason.

I remember that and I believe that should not be the point to canonicalize NaN values. I also guess having the same NaN generated by every distinguishable operations affects performance.

The trick is to canonicalize NaNs when written into such an array, and use a particular NaN value for "the hole" to indicate that a lookup on the prototype is needed.

How would that differentiate from SetValueInBuffer for TypedArrays?

I'm wondering, is there a way that we could weaken this requirement?

By weakening that requirement you mean something like this: An implementation may choose any IEEE 754 valid encoding for each implementation distinguishable NaN value. ?

Canonicalization would still be valid as keeping the payload would be valid as well. That would be implementation defined and the standards would not enforce distinguishable encodings we could observe.

what if we simply removed that sentence and let SetValueInBuffer write whatever NaN representation it felt like? Is there another way we could convince ourselves that information leaks won't happen?

I'm +1 for that or anything that looks like that phrase I mentioned.

allenwb · 2016-07-08T17:32:58Z

An implementation must always choose the same encoding for each implementation distinguishable NaN value.

I believe this text was written with the intent of creating the cross-cutting effect that NaNs should always carry around their payload, and only get a new payload when a "new" NaN is created. (We could use some better spec text here to back that intention with more clear mechanics, and I've promised to do that, except I want to get this other issue worked out first.) A big piece of motivation for that is to prevent information leaks, e.g., in writing properties to frozen objects.

Rather than guessing at the intent of that spec. text it would be better to take a look at the bugs.ecmascript.org issues (and meeting notes and possibly some es-discuss posts) that led to its creation. That's why we keep those records.

Now, I'll violate my own advice and from memory try to explain the intent.

The spec. only cares things that are observable. We don't care about NaN encodings at all, unless there is a way to observe details of the actual encodings. ArrayBuffers+Typed Arrays, depending upon how they are implemented, may provide such an observation platform. If SetValueInBuffer was specified to store NaN values using a specific NaN encoding (ie, it canonicalized the NaN encodings), we wouldn't have that observability (ignoring any implementation defined features). But there may be overhead in doing such canonicalization that some engine implementors wanted to avoid.

One problem with observable, implementation-dependent encodings is that we loose deterministic behavior across independent implementations. EG, a program might behave differently on V8 then it does on Edge. In this case, people seemed willing to accept that. But a bigger problem would be if the same program behaved non-determinatically on the same implementation. EG, each time you ran your program on V8 it behaves differently. So, if we couldn't require cross implementation deterministic non-observability of internal NaN encodings (via ArrayBuffers) the decision was made to require internal consistency within an implementation. For example,

let NaN1234=generateNaNNumberValueWithEncoding(1234); //however
let NaN5678=generateNaNNumberValueWithEncoding(5678);
let buff1 = new Float64Array(2);
buff1[0]=NaN1234;
buff1[1]=NaN5678;
// it is implementation dependent whether the first 8 bytes of buff1.buffer are the same as the last 8 bytes

let buff2 = new Float64Array(2);
buff2[0]=NaN1234;
buff2[1]=NaN5678;
//the spec. requires that buff1.buffer and buff2.buffer contain the same sequence of byte values

That is what was intended to be required by the spec. language:

An implementation must always choose the same encoding for each implementation distinguishable NaN value.

Regarding FAST_HOLEY_DOUBLE_ELEMENTS, I don't think the above language even comes into play. To me, this array representation is not essentially different from what happens when implementations use NaN boxing of pointers. In either case, certain NaN encodings can not be used as Number values, so when those forbidden encodings are encountered in a Number context (usually via GetValueFromBuffer, but in the HOLEY case perhaps via [[DefineOwnProperty]]) they must be replaced with a different NaN encoding. This replacement may be observable via ArrayBuffer inspection.

I don't believe there is anything in the spec. that says you can't do such an encoding substitution when storing a value as an ordinary property value. Remember that property values hold ECMAScript language values of which Number is one possible types. The Number type does not deal with NaN encodings. It considers all Number NaN encodings used by the implementation to be the equivalent value. When NaN is stored as the Number value of a property and then retrieved the value you get back must be NaN. But nothing says the retrieved value must have the same internal NaN encoding as the NaN value that was stored.

leobalter · 2016-07-08T18:01:31Z

Chrome and Firefox:

var buffer = new ArrayBuffer(4);
a = new Float32Array(buffer, 0);
b = new Uint8Array(buffer, 0)

a[0] = NaN;
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 127

a[0] = Math.pow(-1, 0.5);
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 255

When NaN is stored as the Number value of a property and then retrieved the value you get back must be NaN. But nothing says the retrieved value must have the same internal NaN encoding as the NaN value that was stored.

Yes. The problem comes when we don't retrieve the from same encoding chain as in the example above.

So, if we couldn't require cross implementation deterministic non-observability of internal NaN encodings (via ArrayBuffers) the decision was made to require internal consistency within an implementation.

Thanks for explaining this, it makes all the sense.

SetValueInBuffer: If value is NaN, rawValue may be set to any implementation chosen IEEE 754-2008 binary32 format Not-a-Number encoding. An implementation must always choose the same encoding for each implementation distinguishable NaN value.

It's important to notice the value may be set to any encoding. The spec only requires the encoding is preserved for each NaN value if the implementation chooses any different encoding.

So I guess the spec already allows the proposed canonicalization, I just missed this explanation before. We can solve this with a note.

erights · 2016-07-08T20:45:42Z

Just a very quick note for now until I have time to explain.

@allenwb No that is less that we agreed on and less than we need.
@leobalter No we cannot solve this with a note.

All, could someone please look at the notes for the last mtg. At least, could someone please find the term we came up with to use instead of "canonicalize" when we mean the opposite of canonicalize.

littledan · 2016-07-08T21:03:22Z

Re the name of canonicalize: How about "exchange NaN"?

@allenwb 's interpretation provides semantics that I would personally be happy with, though it is a new one to me.

If we do want to say that Numbers explicitly carry around a NaN payload (as we were discussing at the previous meeting), we could make sure Allen's semantics are clearly specified if we simply apply an explicit ExchangeNaN operation on them during ValidateAndApplyPropertyDescriptor. That would make it clear that there is no side channel, as it would only be called in the path where the property is writable and we actually are mutating the object.

Alternatively, if we want to go back to saying that NaNs don't carry a payload, what if we say, normatively (rather than in a note), that ValidateAndApplyPropertyDescriptor must not have any observable side effect? Would that handle the side channel issue? Thanks to @bakkot for this suggestion.

erights · 2016-07-08T21:14:16Z

"exchange" implies that two things trade places. When I exchange my money for your rock, afterwards it is your money and my rock.

The mtg notes did record the perfect term: "scramble".
@littledan The notes attribute that to you, so thanks ;).

littledan · 2016-12-03T11:57:11Z

In the November 2016 TC39 meeting, we reached consensus that the SetValueInBuffer definition can be loosened to allow optional canonicalization to a particular fixed value. This change would allow the behavior in V8 described in this thread. I'll follow up with a PR.

However, @jfbastien pointed out that NaN payloads sometimes change within processors in other ways. We did not come to a resolution on what to do about that issue.

jfbastien · 2016-12-03T16:57:50Z

Specifically, this can happen with x86's x87 stack load / store instructions fldl and fstpl. x87 has a bunch of issues, but it's used to pass floating-point values in the cdecl calling convention. We could still disallow NaN canonicalization on load / store, but I would suggest better research: this example should have been known, there may be others we're missing.

littledan · 2016-12-04T06:41:27Z

@jfbastien How would you suggest semantics should be specified?

jfbastien · 2016-12-04T17:05:00Z

@littledan I'd back up a bit and instead ask: is there a point in specifying NaN bits? If JS says "a NaN is a NaN, we don't guarantee any particular encoding" what do we lose?

For comparison, we're trying to guarantee certain bits for WebAssembly because we want to allow NaN-boxing to work on top of WebAssembly. It's tricky! I don't think that's a goal for JS, but maybe I'm wrong.

One downside is that cmpxchg of FP values can always fail if NaNs aren't preserved (because cmpxchg instructions works on the bits as if through memcmp / memcpy). Maybe NaNs around SAB need to be preserved when copied, and elsewhere not.

Are there other downsides?

littledan · 2016-12-04T23:11:25Z

Not sure what you mean by 'around'--how could you preserve a scalar NaN just some of the time?

It sounds to me like saying that NaN values are scrambled (given arbitrary payloads) when writing into a TypedArray/ArrayBuffer would provide a correct description of behavior. I would be happy with this specification. However, @erights has argued that this would hurt programmers' ability to reason about their code.

littledan · 2016-12-04T23:12:10Z

@jfbastien Note that NaN payloads are preserved when inside a TypedArray, e.g. you can read the bytes out.

This patch legalizes V8's occasional canonicalization of NaNs by changing SetValueInBuffer to *either* a particular value for the implementation-distinguishable NaN value *or* an implementation-defined canonical value. This semantic change reached consensus at the November 2016 TC39 meeting. Closes tc39#635

littledan · 2020-01-03T14:57:33Z

To clarify the status of this issue: From the discussion in #758 , it's not clear if optional canonicalization would be enough: @jfbastien pointed out that various issues may make even #758 reasonably unimplementable.

I'm not sure what, if any, guarantees we could make about the serialized NaN bit pattern--my intuition is to just remove the guarantees we currently have, as @waldemarhorwat has previously advocated (if I understood him correctly).

I don't plan to write a PR for this right now, but if someone else is interested, I'm happy to mentor them through the process and give a more detailed explanation of the history.

(Thanks for the ping on this issue, @ljharb !)

erights · 2020-01-03T18:53:17Z

may make even #758 reasonably implementable.

Did you mean "unimplementable"? Pointed out where?

littledan · 2020-01-03T19:25:39Z

@erights In these notes, @jfbastien said:

JFB: Not always true. Sometimes moving a NaN changes its value on hardware.

@waldemarhorwat 's suggestion was:

WH: I don't see what the quest to nail this down to anything more than "SetValueInBuffer stores an arbitrary NaN" achieves. Arithmetic operations, maps, etc. generate arbitrary NaNs and the same concerns apply to those.

anba · 2020-05-27T13:54:33Z

Rather than guessing at the intent of that spec. text it would be better to take a look at the bugs.ecmascript.org issues (and meeting notes and possibly some es-discuss posts) that led to its creation. That's why we keep those records.

https://tc39.es/archives/bugzilla/3508/

littledan linked a pull request Dec 28, 2016 that will close this issue

Normative: Allow NaN values to be *optionally* canoncalized #758

Open

acutmore mentioned this issue Feb 5, 2022

Requiring NaN canonicalization tc39/proposal-record-tuple#291

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should ECMAScript implementations be allowed to canonicalize NaN when writing into an Array? #635

Should ECMAScript implementations be allowed to canonicalize NaN when writing into an Array? #635

littledan commented Jul 7, 2016

ljharb commented Jul 8, 2016

erights commented Jul 8, 2016

littledan commented Jul 8, 2016 •

edited

Loading

littledan commented Jul 8, 2016 •

edited

Loading

erights commented Jul 8, 2016

erights commented Jul 8, 2016

ljharb commented Jul 8, 2016

ljharb commented Jul 8, 2016

erights commented Jul 8, 2016 •

edited

Loading

erights commented Jul 8, 2016

concavelenz commented Jul 8, 2016

erights commented Jul 8, 2016

ljharb commented Jul 8, 2016

littledan commented Jul 8, 2016

leobalter commented Jul 8, 2016

allenwb commented Jul 8, 2016

leobalter commented Jul 8, 2016

erights commented Jul 8, 2016

littledan commented Jul 8, 2016

erights commented Jul 8, 2016

littledan commented Dec 3, 2016

jfbastien commented Dec 3, 2016

littledan commented Dec 4, 2016

jfbastien commented Dec 4, 2016

littledan commented Dec 4, 2016

littledan commented Dec 4, 2016

littledan commented Jan 3, 2020 •

edited

Loading

erights commented Jan 3, 2020

littledan commented Jan 3, 2020

anba commented May 27, 2020

Should ECMAScript implementations be allowed to canonicalize NaN when writing into an Array? #635

Should ECMAScript implementations be allowed to canonicalize NaN when writing into an Array? #635

Comments

littledan commented Jul 7, 2016

ljharb commented Jul 8, 2016

erights commented Jul 8, 2016

littledan commented Jul 8, 2016 • edited Loading

littledan commented Jul 8, 2016 • edited Loading

erights commented Jul 8, 2016

erights commented Jul 8, 2016

ljharb commented Jul 8, 2016

ljharb commented Jul 8, 2016

erights commented Jul 8, 2016 • edited Loading

erights commented Jul 8, 2016

concavelenz commented Jul 8, 2016

erights commented Jul 8, 2016

ljharb commented Jul 8, 2016

littledan commented Jul 8, 2016

leobalter commented Jul 8, 2016

allenwb commented Jul 8, 2016

leobalter commented Jul 8, 2016

erights commented Jul 8, 2016

littledan commented Jul 8, 2016

erights commented Jul 8, 2016

littledan commented Dec 3, 2016

jfbastien commented Dec 3, 2016

littledan commented Dec 4, 2016

jfbastien commented Dec 4, 2016

littledan commented Dec 4, 2016

littledan commented Dec 4, 2016

littledan commented Jan 3, 2020 • edited Loading

erights commented Jan 3, 2020

littledan commented Jan 3, 2020

anba commented May 27, 2020

littledan commented Jul 8, 2016 •

edited

Loading

littledan commented Jul 8, 2016 •

edited

Loading

erights commented Jul 8, 2016 •

edited

Loading

littledan commented Jan 3, 2020 •

edited

Loading