Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upShould ECMAScript implementations be allowed to canonicalize NaN when writing into an Array? #635
Comments
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ljharb
Jul 8, 2016
Member
-1 from me - the fewer places we allow differentiable NaN values, the better.
However, I think this ties into the last meeting's discussion where we wanted, instead of the current situation, to enumerate exactly where implementations must canonicalize, and then let the rest remain unspecified?
|
-1 from me - the fewer places we allow differentiable NaN values, the better. However, I think this ties into the last meeting's discussion where we wanted, instead of the current situation, to enumerate exactly where implementations must canonicalize, and then let the rest remain unspecified? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
and then let the rest remain unspecified
This was not the outcome of the last tc39 mtg, and would not have addressed the information leak concern. The concern is the leakage of implementation choices creating an exploitable side channel that others can use to communicate information. The more choices are left to the implementation, the greater the danger of this side channel.
In particular, the opaque passing around of a value --- loading, storing, passing as argument, etc... --- should not cause the value to be visibly mutated. This is the assumption that pervades coding practice. No one thinks "If I store this value in variable x and then read it from variable x, it might be different." Certainly, no one is prepared to defensive code against that possibility. If attacker A causes defender B to engage in one load/store pattern vs another before defender B passes that same value to collaborator C, then C can tell which way A provoked B.
erights
commented
Jul 8, 2016
This was not the outcome of the last tc39 mtg, and would not have addressed the information leak concern. The concern is the leakage of implementation choices creating an exploitable side channel that others can use to communicate information. The more choices are left to the implementation, the greater the danger of this side channel. In particular, the opaque passing around of a value --- loading, storing, passing as argument, etc... --- should not cause the value to be visibly mutated. This is the assumption that pervades coding practice. No one thinks "If I store this value in variable x and then read it from variable x, it might be different." Certainly, no one is prepared to defensive code against that possibility. If attacker A causes defender B to engage in one load/store pattern vs another before defender B passes that same value to collaborator C, then C can tell which way A provoked B. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
littledan
Jul 8, 2016
Member
@ljharb Yes, it does tie into that. I was suggesting to formalize what I felt to be the current intention for spec semantics (where the wording isn't so consistent right now), but then it turned out that these are too restrictive to describe V8's behavior. I want to figure this issue out first because it changes how we'd go about clearing up the spec.
The upside of making NaN canonicalization more weakly specified is that it gives implementations more flexibility to optimize. Fast holey double arrays are really useful for cases where you call the Array constructor with a length and then write all Numbers into it. At the same time, not doing NaN boxing saves memory for pointers.
The downsides are mostly theoretical from what I can see: a hypothetical risk of information leaks, and more unspecified behavior, where it's already pretty loose what happens. Where would users want to take advantage of the current invariants of NaN semantics, which they wouldn't be able to do if semantics were weakened?
|
@ljharb Yes, it does tie into that. I was suggesting to formalize what I felt to be the current intention for spec semantics (where the wording isn't so consistent right now), but then it turned out that these are too restrictive to describe V8's behavior. I want to figure this issue out first because it changes how we'd go about clearing up the spec. The upside of making NaN canonicalization more weakly specified is that it gives implementations more flexibility to optimize. Fast holey double arrays are really useful for cases where you call the Array constructor with a length and then write all Numbers into it. At the same time, not doing NaN boxing saves memory for pointers. The downsides are mostly theoretical from what I can see: a hypothetical risk of information leaks, and more unspecified behavior, where it's already pretty loose what happens. Where would users want to take advantage of the current invariants of NaN semantics, which they wouldn't be able to do if semantics were weakened? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
littledan
Jul 8, 2016
Member
@erights Agreed that this is a separate topic from what we discussed in Munich.
What if we think about it differently, that the NaN value doesn't actually "have" a payload (like in ES5 days), and SetValueInBuffer materializes a payload out of thin air? I'm not sure what people would use these payloads for anyway. They don't have semantics that lend them to use for data storage in any web browser currently.
|
@erights Agreed that this is a separate topic from what we discussed in Munich. What if we think about it differently, that the NaN value doesn't actually "have" a payload (like in ES5 days), and SetValueInBuffer materializes a payload out of thin air? I'm not sure what people would use these payloads for anyway. They don't have semantics that lend them to use for data storage in any web browser currently. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
canonicalize (multiple uses)
I want to be sure we all mean the same thing by this term. The normal English and Computer Science use of "canonicalize" is to reduce all values in the same equivalence class to one agreed representative value from that class. For example, if we take all NaNs to be in one equivalence class, then canonicalizing any NaN would produce the same given canonical NaN.
As I recall from the TC39 mtg, until this was pointed out, people we using "canonicalize" in almost the opposite sense. Since @littledan started this thread with "canonicalize" in the title: @littledan , what do you mean by "canonicalize" or "canonical" here?
erights
commented
Jul 8, 2016
I want to be sure we all mean the same thing by this term. The normal English and Computer Science use of "canonicalize" is to reduce all values in the same equivalence class to one agreed representative value from that class. For example, if we take all NaNs to be in one equivalence class, then canonicalizing any NaN would produce the same given canonical NaN. As I recall from the TC39 mtg, until this was pointed out, people we using "canonicalize" in almost the opposite sense. Since @littledan started this thread with "canonicalize" in the title: @littledan , what do you mean by "canonicalize" or "canonical" here? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
@ljharb Please also clarify what you meant by "canonicalize" in your post. Thanks.
erights
commented
Jul 8, 2016
|
@ljharb Please also clarify what you meant by "canonicalize" in your post. Thanks. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ljharb
Jul 8, 2016
Member
@erights sorry, what i meant was: if i take two differentiable NaN values and pass them through the same code path "foo", and I get out two undifferentiable NaN values, then I consider that code path one that "canonicalizes". If I'm using the term wrong please do correct me.
|
@erights sorry, what i meant was: if i take two differentiable NaN values and pass them through the same code path "foo", and I get out two undifferentiable NaN values, then I consider that code path one that "canonicalizes". If I'm using the term wrong please do correct me. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ljharb
Jul 8, 2016
Member
No one thinks "If I store this value in variable x and then read it from variable x, it might be different."
In this case, writing into an array isn't storing it in a variable, it's storing it on an object property - and if nobody expects that behavior with object properties, then why does the language have setters?
In this case, writing into an array isn't storing it in a variable, it's storing it on an object property - and if nobody expects that behavior with object properties, then why does the language have setters? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
Getters and setters are functions. "storing" and "reading" from an accessor property are function invocations. If those functions merely pass the value around opaquely, for example, if the setter merely stores it in a variable it shares with the getter, and the getter merely reads that variable and returns its value, then the same constraint applies. If the setter or getter explicitly calculates a new NaN, that's a completely different story. It would be the same as invoking any other function that explicitly calculates a new NaN.
erights
commented
Jul 8, 2016
•
|
Getters and setters are functions. "storing" and "reading" from an accessor property are function invocations. If those functions merely pass the value around opaquely, for example, if the setter merely stores it in a variable it shares with the getter, and the getter merely reads that variable and returns its value, then the same constraint applies. If the setter or getter explicitly calculates a new NaN, that's a completely different story. It would be the same as invoking any other function that explicitly calculates a new NaN. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
@ljharb @littledan We only care about what is observable. If the only way to observe the bits of a NaN value is via SetValueInBuffer, and if this always canonicalizes so that we always observe only the same bit pattern for all NaNs, then, as far as is observable, this bit pattern is the only bit pattern that implementation assigns to NaN values. Such an implementation would agree with the consensus from the last tc39 mtg.
erights
commented
Jul 8, 2016
|
@ljharb @littledan We only care about what is observable. If the only way to observe the bits of a NaN value is via SetValueInBuffer, and if this always canonicalizes so that we always observe only the same bit pattern for all NaNs, then, as far as is observable, this bit pattern is the only bit pattern that implementation assigns to NaN values. Such an implementation would agree with the consensus from the last tc39 mtg. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
concavelenz
Jul 8, 2016
I'm a little confused. In what way are the different NaN values observable
from JS? If you can't write code to detect different NaN values what does
it matter whether the values are canonicalized or not?
On Thu, Jul 7, 2016 at 6:02 PM, Mark S. Miller notifications@github.com
wrote:
@ljharb https://github.com/ljharb @littledan
https://github.com/littledan We only care about what is observable. If
the only way to observe the bits of a NaN value is via SetValueInBuffer,
and if this always canonicalizes so that we always observe only the same
bit pattern for all NaNs, then, as far as is observable, this bit
pattern is the only bit pattern that implementation assigns to NaN values.
Such an implementation would agree with the consensus from the last tc39
mtg.—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#635 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/ABMDKiozv7F04IgiLHnWx9kjZ7Enq8Joks5qTaGOgaJpZM4JHj3w
.
concavelenz
commented
Jul 8, 2016
|
I'm a little confused. In what way are the different NaN values observable On Thu, Jul 7, 2016 at 6:02 PM, Mark S. Miller notifications@github.com
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
Serializing them to a typed array whose storage is then viewed via a different type.
erights
commented
Jul 8, 2016
|
Serializing them to a typed array whose storage is then viewed via a different type. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
@concavelenz http://npmjs.com/get-nans has a few examples. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
littledan
Jul 8, 2016
Member
Sorry, by 'canonicalize' I meant in the obscure sense of replacing a NaN with any other NaN. Any suggestions for another word for this operation?
Replacing with a particular other NaN value on SetValueInBuffer would have serious negative performance implications for V8, and we previously got consensus at TC39 not to do that for this reason.
|
Sorry, by 'canonicalize' I meant in the obscure sense of replacing a NaN with any other NaN. Any suggestions for another word for this operation? Replacing with a particular other NaN value on SetValueInBuffer would have serious negative performance implications for V8, and we previously got consensus at TC39 not to do that for this reason. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
leobalter
Jul 8, 2016
Member
Replacing with a particular other NaN value on SetValueInBuffer would have serious negative performance implications for V8, and we previously got consensus at TC39 not to do that for this reason.
I remember that and I believe that should not be the point to canonicalize NaN values. I also guess having the same NaN generated by every distinguishable operations affects performance.
The trick is to canonicalize NaNs when written into such an array, and use a particular NaN value for "the hole" to indicate that a lookup on the prototype is needed.
How would that differentiate from SetValueInBuffer for TypedArrays?
I'm wondering, is there a way that we could weaken this requirement?
By weakening that requirement you mean something like this: An implementation may choose any IEEE 754 valid encoding for each implementation distinguishable NaN value. ?
Canonicalization would still be valid as keeping the payload would be valid as well. That would be implementation defined and the standards would not enforce distinguishable encodings we could observe.
what if we simply removed that sentence and let SetValueInBuffer write whatever NaN representation it felt like? Is there another way we could convince ourselves that information leaks won't happen?
I'm +1 for that or anything that looks like that phrase I mentioned.
I remember that and I believe that should not be the point to canonicalize NaN values. I also guess having the same NaN generated by every distinguishable operations affects performance.
How would that differentiate from SetValueInBuffer for TypedArrays?
By weakening that requirement you mean something like this: Canonicalization would still be valid as keeping the payload would be valid as well. That would be implementation defined and the standards would not enforce distinguishable encodings we could observe.
I'm +1 for that or anything that looks like that phrase I mentioned. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
allenwb
Jul 8, 2016
Member
An implementation must always choose the same encoding for each implementation distinguishable NaN value.
I believe this text was written with the intent of creating the cross-cutting effect that NaNs should always carry around their payload, and only get a new payload when a "new" NaN is created. (We could use some better spec text here to back that intention with more clear mechanics, and I've promised to do that, except I want to get this other issue worked out first.) A big piece of motivation for that is to prevent information leaks, e.g., in writing properties to frozen objects.
Rather than guessing at the intent of that spec. text it would be better to take a look at the bugs.ecmascript.org issues (and meeting notes and possibly some es-discuss posts) that led to its creation. That's why we keep those records.
Now, I'll violate my own advice and from memory try to explain the intent.
The spec. only cares things that are observable. We don't care about NaN encodings at all, unless there is a way to observe details of the actual encodings. ArrayBuffers+Typed Arrays, depending upon how they are implemented, may provide such an observation platform. If SetValueInBuffer was specified to store NaN values using a specific NaN encoding (ie, it canonicalized the NaN encodings), we wouldn't have that observability (ignoring any implementation defined features). But there may be overhead in doing such canonicalization that some engine implementors wanted to avoid.
One problem with observable, implementation-dependent encodings is that we loose deterministic behavior across independent implementations. EG, a program might behave differently on V8 then it does on Edge. In this case, people seemed willing to accept that. But a bigger problem would be if the same program behaved non-determinatically on the same implementation. EG, each time you ran your program on V8 it behaves differently. So, if we couldn't require cross implementation deterministic non-observability of internal NaN encodings (via ArrayBuffers) the decision was made to require internal consistency within an implementation. For example,
let NaN1234=generateNaNNumberValueWithEncoding(1234); //however
let NaN5678=generateNaNNumberValueWithEncoding(5678);
let buff1 = new Float64Array(2);
buff1[0]=NaN1234;
buff1[1]=NaN5678;
// it is implementation dependent whether the first 8 bytes of buff1.buffer are the same as the last 8 bytes
let buff2 = new Float64Array(2);
buff2[0]=NaN1234;
buff2[1]=NaN5678;
//the spec. requires that buff1.buffer and buff2.buffer contain the same sequence of byte valuesThat is what was intended to be required by the spec. language:
An implementation must always choose the same encoding for each implementation distinguishable NaN value.
Regarding FAST_HOLEY_DOUBLE_ELEMENTS, I don't think the above language even comes into play. To me, this array representation is not essentially different from what happens when implementations use NaN boxing of pointers. In either case, certain NaN encodings can not be used as Number values, so when those forbidden encodings are encountered in a Number context (usually via GetValueFromBuffer, but in the HOLEY case perhaps via [[DefineOwnProperty]]) they must be replaced with a different NaN encoding. This replacement may be observable via ArrayBuffer inspection.
I don't believe there is anything in the spec. that says you can't do such an encoding substitution when storing a value as an ordinary property value. Remember that property values hold ECMAScript language values of which Number is one possible types. The Number type does not deal with NaN encodings. It considers all Number NaN encodings used by the implementation to be the equivalent value. When NaN is stored as the Number value of a property and then retrieved the value you get back must be NaN. But nothing says the retrieved value must have the same internal NaN encoding as the NaN value that was stored.
Rather than guessing at the intent of that spec. text it would be better to take a look at the bugs.ecmascript.org issues (and meeting notes and possibly some es-discuss posts) that led to its creation. That's why we keep those records. Now, I'll violate my own advice and from memory try to explain the intent. The spec. only cares things that are observable. We don't care about NaN encodings at all, unless there is a way to observe details of the actual encodings. ArrayBuffers+Typed Arrays, depending upon how they are implemented, may provide such an observation platform. If SetValueInBuffer was specified to store NaN values using a specific NaN encoding (ie, it canonicalized the NaN encodings), we wouldn't have that observability (ignoring any implementation defined features). But there may be overhead in doing such canonicalization that some engine implementors wanted to avoid. One problem with observable, implementation-dependent encodings is that we loose deterministic behavior across independent implementations. EG, a program might behave differently on V8 then it does on Edge. In this case, people seemed willing to accept that. But a bigger problem would be if the same program behaved non-determinatically on the same implementation. EG, each time you ran your program on V8 it behaves differently. So, if we couldn't require cross implementation deterministic non-observability of internal NaN encodings (via ArrayBuffers) the decision was made to require internal consistency within an implementation. For example, let NaN1234=generateNaNNumberValueWithEncoding(1234); //however
let NaN5678=generateNaNNumberValueWithEncoding(5678);
let buff1 = new Float64Array(2);
buff1[0]=NaN1234;
buff1[1]=NaN5678;
// it is implementation dependent whether the first 8 bytes of buff1.buffer are the same as the last 8 bytes
let buff2 = new Float64Array(2);
buff2[0]=NaN1234;
buff2[1]=NaN5678;
//the spec. requires that buff1.buffer and buff2.buffer contain the same sequence of byte valuesThat is what was intended to be required by the spec. language:
Regarding FAST_HOLEY_DOUBLE_ELEMENTS, I don't think the above language even comes into play. To me, this array representation is not essentially different from what happens when implementations use NaN boxing of pointers. In either case, certain NaN encodings can not be used as Number values, so when those forbidden encodings are encountered in a Number context (usually via GetValueFromBuffer, but in the HOLEY case perhaps via [[DefineOwnProperty]]) they must be replaced with a different NaN encoding. This replacement may be observable via ArrayBuffer inspection. I don't believe there is anything in the spec. that says you can't do such an encoding substitution when storing a value as an ordinary property value. Remember that property values hold ECMAScript language values of which Number is one possible types. The Number type does not deal with NaN encodings. It considers all Number NaN encodings used by the implementation to be the equivalent value. When NaN is stored as the Number value of a property and then retrieved the value you get back must be NaN. But nothing says the retrieved value must have the same internal NaN encoding as the NaN value that was stored. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
leobalter
Jul 8, 2016
Member
Chrome and Firefox:
var buffer = new ArrayBuffer(4);
a = new Float32Array(buffer, 0);
b = new Uint8Array(buffer, 0)
a[0] = NaN;
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 127
a[0] = Math.pow(-1, 0.5);
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 255When NaN is stored as the Number value of a property and then retrieved the value you get back must be NaN. But nothing says the retrieved value must have the same internal NaN encoding as the NaN value that was stored.
Yes. The problem comes when we don't retrieve the from same encoding chain as in the example above.
So, if we couldn't require cross implementation deterministic non-observability of internal NaN encodings (via ArrayBuffers) the decision was made to require internal consistency within an implementation.
Thanks for explaining this, it makes all the sense.
SetValueInBuffer: If value is NaN, rawValue may be set to any implementation chosen IEEE 754-2008 binary32 format Not-a-Number encoding. An implementation must always choose the same encoding for each implementation distinguishable NaN value.
It's important to notice the value may be set to any encoding. The spec only requires the encoding is preserved for each NaN value if the implementation chooses any different encoding.
So I guess the spec already allows the proposed canonicalization, I just missed this explanation before. We can solve this with a note.
|
Chrome and Firefox: var buffer = new ArrayBuffer(4);
a = new Float32Array(buffer, 0);
b = new Uint8Array(buffer, 0)
a[0] = NaN;
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 127
a[0] = Math.pow(-1, 0.5);
console.log(b[0], b[1], b[2], b[3]); // 0 0 192 255
Yes. The problem comes when we don't retrieve the from same encoding chain as in the example above.
Thanks for explaining this, it makes all the sense.
It's important to notice the value may be set to any encoding. The spec only requires the encoding is preserved for each NaN value if the implementation chooses any different encoding. So I guess the spec already allows the proposed canonicalization, I just missed this explanation before. We can solve this with a note. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
Just a very quick note for now until I have time to explain.
@allenwb No that is less that we agreed on and less than we need.
@leobalter No we cannot solve this with a note.
All, could someone please look at the notes for the last mtg. At least, could someone please find the term we came up with to use instead of "canonicalize" when we mean the opposite of canonicalize.
erights
commented
Jul 8, 2016
|
Just a very quick note for now until I have time to explain. @allenwb No that is less that we agreed on and less than we need. All, could someone please look at the notes for the last mtg. At least, could someone please find the term we came up with to use instead of "canonicalize" when we mean the opposite of canonicalize. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
littledan
Jul 8, 2016
Member
Re the name of canonicalize: How about "exchange NaN"?
@allenwb 's interpretation provides semantics that I would personally be happy with, though it is a new one to me.
If we do want to say that Numbers explicitly carry around a NaN payload (as we were discussing at the previous meeting), we could make sure Allen's semantics are clearly specified if we simply apply an explicit ExchangeNaN operation on them during ValidateAndApplyPropertyDescriptor. That would make it clear that there is no side channel, as it would only be called in the path where the property is writable and we actually are mutating the object.
Alternatively, if we want to go back to saying that NaNs don't carry a payload, what if we say, normatively (rather than in a note), that ValidateAndApplyPropertyDescriptor must not have any observable side effect? Would that handle the side channel issue? Thanks to @bakkot for this suggestion.
|
Re the name of canonicalize: How about "exchange NaN"? @allenwb 's interpretation provides semantics that I would personally be happy with, though it is a new one to me. If we do want to say that Numbers explicitly carry around a NaN payload (as we were discussing at the previous meeting), we could make sure Allen's semantics are clearly specified if we simply apply an explicit ExchangeNaN operation on them during ValidateAndApplyPropertyDescriptor. That would make it clear that there is no side channel, as it would only be called in the path where the property is writable and we actually are mutating the object. Alternatively, if we want to go back to saying that NaNs don't carry a payload, what if we say, normatively (rather than in a note), that ValidateAndApplyPropertyDescriptor must not have any observable side effect? Would that handle the side channel issue? Thanks to @bakkot for this suggestion. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Jul 8, 2016
"exchange" implies that two things trade places. When I exchange my money for your rock, afterwards it is your money and my rock.
The mtg notes did record the perfect term: "scramble".
@littledan The notes attribute that to you, so thanks ;).
erights
commented
Jul 8, 2016
|
"exchange" implies that two things trade places. When I exchange my money for your rock, afterwards it is your money and my rock. The mtg notes did record the perfect term: "scramble". |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
littledan
Dec 3, 2016
Member
In the November 2016 TC39 meeting, we reached consensus that the SetValueInBuffer definition can be loosened to allow optional canonicalization to a particular fixed value. This change would allow the behavior in V8 described in this thread. I'll follow up with a PR.
However, @jfbastien pointed out that NaN payloads sometimes change within processors in other ways. We did not come to a resolution on what to do about that issue.
|
In the November 2016 TC39 meeting, we reached consensus that the SetValueInBuffer definition can be loosened to allow optional canonicalization to a particular fixed value. This change would allow the behavior in V8 described in this thread. I'll follow up with a PR. However, @jfbastien pointed out that NaN payloads sometimes change within processors in other ways. We did not come to a resolution on what to do about that issue. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jfbastien
Dec 3, 2016
Specifically, this can happen with x86's x87 stack load / store instructions fldl and fstpl. x87 has a bunch of issues, but it's used to pass floating-point values in the cdecl calling convention. We could still disallow NaN canonicalization on load / store, but I would suggest better research: this example should have been known, there may be others we're missing.
jfbastien
commented
Dec 3, 2016
|
Specifically, this can happen with x86's x87 stack load / store instructions |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
@jfbastien How would you suggest semantics should be specified? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
jfbastien
Dec 4, 2016
@littledan I'd back up a bit and instead ask: is there a point in specifying NaN bits? If JS says "a NaN is a NaN, we don't guarantee any particular encoding" what do we lose?
For comparison, we're trying to guarantee certain bits for WebAssembly because we want to allow NaN-boxing to work on top of WebAssembly. It's tricky! I don't think that's a goal for JS, but maybe I'm wrong.
One downside is that cmpxchg of FP values can always fail if NaNs aren't preserved (because cmpxchg instructions works on the bits as if through memcmp / memcpy). Maybe NaNs around SAB need to be preserved when copied, and elsewhere not.
Are there other downsides?
jfbastien
commented
Dec 4, 2016
|
@littledan I'd back up a bit and instead ask: is there a point in specifying NaN bits? If JS says "a NaN is a NaN, we don't guarantee any particular encoding" what do we lose? For comparison, we're trying to guarantee certain bits for WebAssembly because we want to allow NaN-boxing to work on top of WebAssembly. It's tricky! I don't think that's a goal for JS, but maybe I'm wrong. One downside is that cmpxchg of FP values can always fail if NaNs aren't preserved (because cmpxchg instructions works on the bits as if through memcmp / memcpy). Maybe NaNs around SAB need to be preserved when copied, and elsewhere not. Are there other downsides? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
littledan
Dec 4, 2016
Member
Not sure what you mean by 'around'--how could you preserve a scalar NaN just some of the time?
It sounds to me like saying that NaN values are scrambled (given arbitrary payloads) when writing into a TypedArray/ArrayBuffer would provide a correct description of behavior. I would be happy with this specification. However, @erights has argued that this would hurt programmers' ability to reason about their code.
|
Not sure what you mean by 'around'--how could you preserve a scalar NaN just some of the time? It sounds to me like saying that NaN values are scrambled (given arbitrary payloads) when writing into a TypedArray/ArrayBuffer would provide a correct description of behavior. I would be happy with this specification. However, @erights has argued that this would hurt programmers' ability to reason about their code. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
littledan
Dec 4, 2016
Member
@jfbastien Note that NaN payloads are preserved when inside a TypedArray, e.g. you can read the bytes out.
|
@jfbastien Note that NaN payloads are preserved when inside a TypedArray, e.g. you can read the bytes out. |
littledan commentedJul 7, 2016
V8 has several representations of Arrays based on different kinds of unboxed content, called ElementKind. One ElementKind is FAST_HOLEY_DOUBLE_ELEMENTS. With this kind, the Array may contain holes (missing properties which delegate to the prototype), but all non-hole elements are unboxed doubles (Numbers). The trick is to canonicalize NaNs when written into such an array, and use a particular NaN value for "the hole" to indicate that a lookup on the prototype is needed.
This implementation technique violates the ECMAScript specification because, in SetValueInBuffer, there is the following text:
I believe this text was written with the intent of creating the cross-cutting effect that NaNs should always carry around their payload, and only get a new payload when a "new" NaN is created. (We could use some better spec text here to back that intention with more clear mechanics, and I've promised to do that, except I want to get this other issue worked out first.) A big piece of motivation for that is to prevent information leaks, e.g., in writing properties to frozen objects.
I'm wondering, is there a way that we could weaken this requirement? V8's practice of canonicalizing NaNs just in order to write them into certain objects seems to improve our performance, and it seems harmless to me, as I don't know of any information leaks that we're creating.
For a strawman: what if we simply removed that sentence and let SetValueInBuffer write whatever NaN representation it felt like? Is there another way we could convince ourselves that information leaks won't happen?
cc @bakkot @anba @erights @leobalter @verwaest