Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should ECMAScript implementations be allowed to canonicalize NaN when writing into an Array? #635

Open
littledan opened this issue Jul 7, 2016 · 30 comments · May be fixed by #758
Open

Comments

@littledan
Copy link
Member

V8 has several representations of Arrays based on different kinds of unboxed content, called ElementKind. One ElementKind is FAST_HOLEY_DOUBLE_ELEMENTS. With this kind, the Array may contain holes (missing properties which delegate to the prototype), but all non-hole elements are unboxed doubles (Numbers). The trick is to canonicalize NaNs when written into such an array, and use a particular NaN value for "the hole" to indicate that a lookup on the prototype is needed.

This implementation technique violates the ECMAScript specification because, in SetValueInBuffer, there is the following text:

An implementation must always choose the same encoding for each implementation distinguishable NaN value.

I believe this text was written with the intent of creating the cross-cutting effect that NaNs should always carry around their payload, and only get a new payload when a "new" NaN is created. (We could use some better spec text here to back that intention with more clear mechanics, and I've promised to do that, except I want to get this other issue worked out first.) A big piece of motivation for that is to prevent information leaks, e.g., in writing properties to frozen objects.

I'm wondering, is there a way that we could weaken this requirement? V8's practice of canonicalizing NaNs just in order to write them into certain objects seems to improve our performance, and it seems harmless to me, as I don't know of any information leaks that we're creating.

For a strawman: what if we simply removed that sentence and let SetValueInBuffer write whatever NaN representation it felt like? Is there another way we could convince ourselves that information leaks won't happen?

cc @bakkot @anba @erights @leobalter @verwaest

@ljharb
Copy link
Member

ljharb commented Jul 8, 2016

-1 from me - the fewer places we allow differentiable NaN values, the better.

However, I think this ties into the last meeting's discussion where we wanted, instead of the current situation, to enumerate exactly where implementations must canonicalize, and then let the rest remain unspecified?

@erights
Copy link

erights commented Jul 8, 2016

and then let the rest remain unspecified

This was not the outcome of the last tc39 mtg, and would not have addressed the information leak concern. The concern is the leakage of implementation choices creating an exploitable side channel that others can use to communicate information. The more choices are left to the implementation, the greater the danger of this side channel.

In particular, the opaque passing around of a value --- loading, storing, passing as argument, etc... --- should not cause the value to be visibly mutated. This is the assumption that pervades coding practice. No one thinks "If I store this value in variable x and then read it from variable x, it might be different." Certainly, no one is prepared to defensive code against that possibility. If attacker A causes defender B to engage in one load/store pattern vs another before defender B passes that same value to collaborator C, then C can tell which way A provoked B.

@littledan
Copy link
Member Author

littledan commented Jul 8, 2016

@ljharb Yes, it does tie into that. I was suggesting to formalize what I felt to be the current intention for spec semantics (where the wording isn't so consistent right now), but then it turned out that these are too restrictive to describe V8's behavior. I want to figure this issue out first because it changes how we'd go about clearing up the spec.

The upside of making NaN canonicalization more weakly specified is that it gives implementations more flexibility to optimize. Fast holey double arrays are really useful for cases where you call the Array constructor with a length and then write all Numbers into it. At the same time, not doing NaN boxing saves memory for pointers.

The downsides are mostly theoretical from what I can see: a hypothetical risk of information leaks, and more unspecified behavior, where it's already pretty loose what happens. Where would users want to take advantage of the current invariants of NaN semantics, which they wouldn't be able to do if semantics were weakened?

@littledan
Copy link
Member Author

littledan commented Jul 8, 2016

@erights Agreed that this is a separate topic from what we discussed in Munich.

What if we think about it differently, that the NaN value doesn't actually "have" a payload (like in ES5 days), and SetValueInBuffer materializes a payload out of thin air? I'm not sure what people would use these payloads for anyway. They don't have semantics that lend them to use for data storage in any web browser currently.

@erights
Copy link

erights commented Jul 8, 2016

canonicalize (multiple uses)

I want to be sure we all mean the same thing by this term. The normal English and Computer Science use of "canonicalize" is to reduce all values in the same equivalence class to one agreed representative value from that class. For example, if we take all NaNs to be in one equivalence class, then canonicalizing any NaN would produce the same given canonical NaN.

As I recall from the TC39 mtg, until this was pointed out, people we using "canonicalize" in almost the opposite sense. Since @littledan started this thread with "canonicalize" in the title: @littledan , what do you mean by "canonicalize" or "canonical" here?

@erights
Copy link

erights commented Jul 8, 2016

@ljharb Please also clarify what you meant by "canonicalize" in your post. Thanks.

@ljharb
Copy link
Member

ljharb commented Jul 8, 2016

@erights sorry, what i meant was: if i take two differentiable NaN values and pass them through the same code path "foo", and I get out two undifferentiable NaN values, then I consider that code path one that "canonicalizes". If I'm using the term wrong please do correct me.

@ljharb
Copy link
Member

ljharb commented Jul 8, 2016

@erights

No one thinks "If I store this value in variable x and then read it from variable x, it might be different."

In this case, writing into an array isn't storing it in a variable, it's storing it on an object property - and if nobody expects that behavior with object properties, then why does the language have setters?

@erights
Copy link

erights commented Jul 8, 2016

Getters and setters are functions. "storing" and "reading" from an accessor property are function invocations. If those functions merely pass the value around opaquely, for example, if the setter merely stores it in a variable it shares with the getter, and the getter merely reads that variable and returns its value, then the same constraint applies. If the setter or getter explicitly calculates a new NaN, that's a completely different story. It would be the same as invoking any other function that explicitly calculates a new NaN.

@erights
Copy link

erights commented Jul 8, 2016

@ljharb @littledan We only care about what is observable. If the only way to observe the bits of a NaN value is via SetValueInBuffer, and if this always canonicalizes so that we always observe only the same bit pattern for all NaNs, then, as far as is observable, this bit pattern is the only bit pattern that implementation assigns to NaN values. Such an implementation would agree with the consensus from the last tc39 mtg.

@concavelenz
Copy link

I'm a little confused. In what way are the different NaN values observable
from JS? If you can't write code to detect different NaN values what does
it matter whether the values are canonicalized or not?

On Thu, Jul 7, 2016 at 6:02 PM, Mark S. Miller notifications@github.com
wrote:

@ljharb https://github.com/ljharb @littledan
https://github.com/littledan We only care about what is observable. If
the only way to observe the bits of a NaN value is via SetValueInBuffer,
and if this always canonicalizes so that we always observe only the same
bit pattern
for all NaNs, then, as far as is observable, this bit
pattern is the only bit pattern that implementation assigns to NaN values.
Such an implementation would agree with the consensus from the last tc39
mtg.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#635 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/ABMDKiozv7F04IgiLHnWx9kjZ7Enq8Joks5qTaGOgaJpZM4JHj3w
.

@erights
Copy link

erights commented Jul 8, 2016

Serializing them to a typed array whose storage is then viewed via a different type.

@ljharb
Copy link
Member

ljharb commented Jul 8, 2016

@concavelenz http://npmjs.com/get-nans has a few examples.

@littledan
Copy link
Member Author

Sorry, by 'canonicalize' I meant in the obscure sense of replacing a NaN with any other NaN. Any suggestions for another word for this operation?

Replacing with a particular other NaN value on SetValueInBuffer would have serious negative performance implications for V8, and we previously got consensus at TC39 not to do that for this reason.

@leobalter
Copy link
Member

Replacing with a particular other NaN value on SetValueInBuffer would have serious negative performance implications for V8, and we previously got consensus at TC39 not to do that for this reason.

I remember that and I believe that should not be the point to canonicalize NaN values. I also guess having the same NaN generated by every distinguishable operations affects performance.

The trick is to canonicalize NaNs when written into such an array, and use a particular NaN value for "the hole" to indicate that a lookup on the prototype is needed.

How would that differentiate from SetValueInBuffer for TypedArrays?

I'm wondering, is there a way that we could weaken this requirement?

By weakening that requirement you mean something like this: An implementation may choose any IEEE 754 valid encoding for each implementation distinguishable NaN value. ?

Canonicalization would still be valid as keeping the payload would be valid as well. That would be implementation defined and the standards would not enforce distinguishable encodings we could observe.

what if we simply removed that sentence and let SetValueInBuffer write whatever NaN representation it felt like? Is there another way we could convince ourselves that information leaks won't happen?

I'm +1 for that or anything that looks like that phrase I mentioned.

@allenwb
Copy link
Member

allenwb commented Jul 8, 2016

An implementation must always choose the same encoding for each implementation distinguishable NaN value.

I believe this text was written with the intent of creating the cross-cutting effect that NaNs should always carry around their payload, and only get a new payload when a "new" NaN is created. (We could use some better spec text here to back that intention with more clear mechanics, and I've promised to do that, except I want to get this other issue worked out first.) A big piece of motivation for that is to prevent information leaks, e.g., in writing properties to frozen objects.

Rather than guessing at the intent of that spec. text it would be better to take a look at the bugs.ecmascript.org issues (and meeting notes and possibly some es-discuss posts) that led to its creation. That's why we keep those records.

Now, I'll violate my own advice and from memory try to explain the intent.

The spec. only cares things that are observable. We don't care about NaN encodings at all, unless there is a way to observe details of the actual encodings. ArrayBuffers+Typed Arrays, depending upon how they are implemented, may provide such an observation platform. If SetValueInBuffer was specified to store NaN values using a specific NaN encoding (ie, it canonicalized the NaN encodings), we wouldn't have that observability (ignoring any implementation defined features). But there may be overhead in doing such canonicalization that some engine implementors wanted to avoid.

One problem with observable, implementation-dependent encodings is that we loose deterministic behavior across independent implementations. EG, a program might behave differently on V8 then it does on Edge. In this case, people seemed willing to accept that. But a bigger problem would be if the same program behaved non-determinatically on the same implementation. EG, each time you ran your program on V8 it behaves differently. So, if we couldn't require cross implementation deterministic non-observability of internal NaN encodings (via ArrayBuffers) the decision was made to require internal consistency within an implementation. For example,

let NaN1234=generateNaNNumberValueWithEncoding(1234); //however
let NaN5678=generateNaNNumberValueWithEncoding(5678);
let buff1 = new Float64Array(2);
buff1[0]=NaN1234;
buff1[1]=NaN5678;
// it is implementation dependent whether the first 8 bytes of buff1.buffer are the same as the last 8 bytes

let buff2 = new Float64Array(2);
buff2[0]=NaN1234;
buff2[1]=NaN5678;
//the spec. requires that buff1.buffer and buff2.buffer contain the same sequence of byte values

That is what was intended to be required by the spec. language:

An implementation must always choose the same encoding for each implementation distinguishable NaN value.

Regarding FAST_HOLEY_DOUBLE_ELEMENTS, I don't think the above language even comes into play. To me, this array representation is not essentially different from what happens when implementations use NaN boxing of pointers. In either case, certain NaN encodings can not be used as Number values, so when those forbidden encodings are encountered in a Number context (usually via GetValueFromBuffer, but in the HOLEY case perhaps via [[DefineOwnProperty]]) they must be replaced with a different NaN encoding. This replacement may be observable via ArrayBuffer inspection.

I don't believe there is anything in the spec. that says you can't do such an encoding substitution when storing a value as an ordinary property value. Remember that property values hold ECMAScript language values of which Number is one possible types. The Number type does not deal with NaN encodings. It considers all Number NaN encodings used by the implementation to be the equivalent value. When NaN is stored as the Number value of a property and then retrieved the value you get back must be NaN. But nothing says the retrieved value must have the same internal NaN encoding as the NaN value that was stored.

@leobalter
Copy link
Member

Chrome and Firefox:

var buffer = new ArrayBuffer(4);
a = new Float32Array(buffer, 0);
b = new Uint8Array(buffer, 0)

a[0] = NaN;
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 127

a[0] = Math.pow(-1, 0.5);
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 255

When NaN is stored as the Number value of a property and then retrieved the value you get back must be NaN. But nothing says the retrieved value must have the same internal NaN encoding as the NaN value that was stored.

Yes. The problem comes when we don't retrieve the from same encoding chain as in the example above.

So, if we couldn't require cross implementation deterministic non-observability of internal NaN encodings (via ArrayBuffers) the decision was made to require internal consistency within an implementation.

Thanks for explaining this, it makes all the sense.

SetValueInBuffer: If value is NaN, rawValue may be set to any implementation chosen IEEE 754-2008 binary32 format Not-a-Number encoding. An implementation must always choose the same encoding for each implementation distinguishable NaN value.

It's important to notice the value may be set to any encoding. The spec only requires the encoding is preserved for each NaN value if the implementation chooses any different encoding.

So I guess the spec already allows the proposed canonicalization, I just missed this explanation before. We can solve this with a note.

@erights
Copy link

erights commented Jul 8, 2016

Just a very quick note for now until I have time to explain.

@allenwb No that is less that we agreed on and less than we need.
@leobalter No we cannot solve this with a note.

All, could someone please look at the notes for the last mtg. At least, could someone please find the term we came up with to use instead of "canonicalize" when we mean the opposite of canonicalize.

@littledan
Copy link
Member Author

Re the name of canonicalize: How about "exchange NaN"?

@allenwb 's interpretation provides semantics that I would personally be happy with, though it is a new one to me.

If we do want to say that Numbers explicitly carry around a NaN payload (as we were discussing at the previous meeting), we could make sure Allen's semantics are clearly specified if we simply apply an explicit ExchangeNaN operation on them during ValidateAndApplyPropertyDescriptor. That would make it clear that there is no side channel, as it would only be called in the path where the property is writable and we actually are mutating the object.

Alternatively, if we want to go back to saying that NaNs don't carry a payload, what if we say, normatively (rather than in a note), that ValidateAndApplyPropertyDescriptor must not have any observable side effect? Would that handle the side channel issue? Thanks to @bakkot for this suggestion.

@erights
Copy link

erights commented Jul 8, 2016

"exchange" implies that two things trade places. When I exchange my money for your rock, afterwards it is your money and my rock.

The mtg notes did record the perfect term: "scramble".
@littledan The notes attribute that to you, so thanks ;).

@littledan
Copy link
Member Author

In the November 2016 TC39 meeting, we reached consensus that the SetValueInBuffer definition can be loosened to allow optional canonicalization to a particular fixed value. This change would allow the behavior in V8 described in this thread. I'll follow up with a PR.

However, @jfbastien pointed out that NaN payloads sometimes change within processors in other ways. We did not come to a resolution on what to do about that issue.

@jfbastien
Copy link

Specifically, this can happen with x86's x87 stack load / store instructions fldl and fstpl. x87 has a bunch of issues, but it's used to pass floating-point values in the cdecl calling convention. We could still disallow NaN canonicalization on load / store, but I would suggest better research: this example should have been known, there may be others we're missing.

@littledan
Copy link
Member Author

@jfbastien How would you suggest semantics should be specified?

@jfbastien
Copy link

@littledan I'd back up a bit and instead ask: is there a point in specifying NaN bits? If JS says "a NaN is a NaN, we don't guarantee any particular encoding" what do we lose?

For comparison, we're trying to guarantee certain bits for WebAssembly because we want to allow NaN-boxing to work on top of WebAssembly. It's tricky! I don't think that's a goal for JS, but maybe I'm wrong.

One downside is that cmpxchg of FP values can always fail if NaNs aren't preserved (because cmpxchg instructions works on the bits as if through memcmp / memcpy). Maybe NaNs around SAB need to be preserved when copied, and elsewhere not.

Are there other downsides?

@littledan
Copy link
Member Author

Not sure what you mean by 'around'--how could you preserve a scalar NaN just some of the time?

It sounds to me like saying that NaN values are scrambled (given arbitrary payloads) when writing into a TypedArray/ArrayBuffer would provide a correct description of behavior. I would be happy with this specification. However, @erights has argued that this would hurt programmers' ability to reason about their code.

@littledan
Copy link
Member Author

@jfbastien Note that NaN payloads are preserved when inside a TypedArray, e.g. you can read the bytes out.

littledan added a commit to littledan/ecma262 that referenced this issue Dec 28, 2016
This patch legalizes V8's occasional canonicalization of NaNs
by changing SetValueInBuffer to *either* a particular value
for the implementation-distinguishable NaN value *or* an
implementation-defined canonical value.

This semantic change reached consensus at the November 2016
TC39 meeting.

Closes tc39#635
littledan added a commit to littledan/ecma262 that referenced this issue Mar 8, 2018
This patch legalizes V8's occasional canonicalization of NaNs
by changing SetValueInBuffer to *either* a particular value
for the implementation-distinguishable NaN value *or* an
implementation-defined canonical value.

This semantic change reached consensus at the November 2016
TC39 meeting.

Closes tc39#635
@littledan
Copy link
Member Author

littledan commented Jan 3, 2020

To clarify the status of this issue: From the discussion in #758 , it's not clear if optional canonicalization would be enough: @jfbastien pointed out that various issues may make even #758 reasonably unimplementable.

I'm not sure what, if any, guarantees we could make about the serialized NaN bit pattern--my intuition is to just remove the guarantees we currently have, as @waldemarhorwat has previously advocated (if I understood him correctly).

I don't plan to write a PR for this right now, but if someone else is interested, I'm happy to mentor them through the process and give a more detailed explanation of the history.

(Thanks for the ping on this issue, @ljharb !)

@erights
Copy link

erights commented Jan 3, 2020

may make even #758 reasonably implementable.

Did you mean "unimplementable"? Pointed out where?

@littledan
Copy link
Member Author

@erights In these notes, @jfbastien said:

JFB: Not always true. Sometimes moving a NaN changes its value on hardware.

@waldemarhorwat 's suggestion was:

WH: I don't see what the quest to nail this down to anything more than "SetValueInBuffer stores an arbitrary NaN" achieves. Arithmetic operations, maps, etc. generate arbitrary NaNs and the same concerns apply to those.

@anba
Copy link
Contributor

anba commented May 27, 2020

Rather than guessing at the intent of that spec. text it would be better to take a look at the bugs.ecmascript.org issues (and meeting notes and possibly some es-discuss posts) that led to its creation. That's why we keep those records.

https://tc39.es/archives/bugzilla/3508/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants