Require Unicode 8.0.0 #300


@littledan
Contributor

Interpretation of some basic things like whitespace changed after
Unicode 5.1. This patch requires the latest Unicode standard.

@littledan
Contributor

A note with respect to that thread: Chrome runs on Windows 7, but it supports Unicode 8.0.0 (with a couple of exceptions in V8 that still use 7.0, though a fix is in progress). I think we should be OK with saying that software which doesn't update to fairly recent Unicode versions isn't implementing the latest ECMAScript spec. @bterlson What do you think, speaking for Microsoft?

@bterlson
Member

I am a fan of this personally, although I have concerns about how well Chakra will be able to support it, as we depend on the platform in many cases. I'll dig into this more. In the meantime, this is a simple change, yet it is still something to discuss in committee. I bet we can get a quick sign-off without going through the normal proposal process.

@mathiasbynens mathiasbynens referenced this pull request in whatwg/javascript Feb 8, 2016
Closed

Remove Unicode version requirement #28

@littledan
Contributor

At the January 2016 TC39 meeting, we reached consensus in support of this proposal. Is anything else needed to merge this? I fixed the merge conflict.

@mathiasbynens
Contributor

Relevant meeting notes: https://github.com/rwaldron/tc39-notes/blob/master/es7/2016-01/2016-01-26.md#unicode-fix-httpsgithubcomtc39ecma262pull300-de

It’s probably intentional, but just to make sure this is not being overlooked — this patch leaves the following section intact: https://tc39.github.io/ecma262/#sec-white-space

<p>ECMAScript implementations must recognize as <emu-nt><a href="#prod-WhiteSpace">WhiteSpace</a></emu-nt> code points listed in the “Separator, space” (Zs) category by Unicode 5.1. ECMAScript implementations may also recognize as <emu-nt><a href="#prod-WhiteSpace">WhiteSpace</a></emu-nt> additional category Zs code points from subsequent editions of the Unicode Standard.</p>

I.e., WhiteSpace is still the union of the Zs code points in Unicode 5.1.0 and those in Unicode 8 or later, so U+180E (Zs in 5.1.0, but no longer Zs in Unicode 8) is still considered whitespace. I’m not sure whether this is necessary for backwards compatibility.
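
To make the observable effect concrete, here is a minimal sketch (not part of the patch) that probes how the running engine treats U+180E. The `probeMvs` helper name is hypothetical. It assumes an engine that supports `\p{…}` property escapes, which were added in a later edition than the one this PR targets. On an engine that still honors the Unicode 5.1 requirement, the first three results should differ from those of an engine that implements the change.

```javascript
// Hypothetical helper: reports how the running engine classifies U+180E.
function probeMvs() {
  const mvs = '\u180E'; // MONGOLIAN VOWEL SEPARATOR

  return {
    // Does the regular-expression whitespace class match it?
    matchesWhitespaceClass: /\s/.test(mvs),
    // Does String.prototype.trim leave it in place?
    survivesTrim: mvs.trim().length === 1,
    // A string consisting only of whitespace converts to 0; anything else
    // that is not a numeric literal converts to NaN.
    numberOfMvs: Number(mvs),
    // General category according to the engine's own Unicode data.
    isZs: /\p{Zs}/u.test(mvs),
    isCf: /\p{Cf}/u.test(mvs),
  };
}

console.log(probeMvs());
```

If the engine follows the updated text and ships Unicode 6.3 or later data, the probe should report that U+180E is not matched by `\s`, survives `trim()`, converts to `NaN`, and is in category Cf rather than Zs.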

@littledan
Contributor

Oh, I missed that section. Actually, this patch was partly motivated by getting U+180E out of whitespace! This was all discussed pretty explicitly at the meeting, so I'll upload a new patch with that modified.

@mathiasbynens
Contributor

👍

@mathiasbynens mathiasbynens added a commit to mathiasbynens/regexpu-fixtures that referenced this pull request Feb 8, 2016
@mathiasbynens mathiasbynens Only use Unicode v8.0.0 for whitespace
ECMAScript 6 required Unicode v5.1.0 `Zs` symbols to be recognized as whitespace in addition to any `Zs` symbols in whatever Unicode version the engine implemented.

Per tc39/ecma262#300 this is no longer the case in ES2016. 🎉

The only observable change is that U+180E is no longer considered whitespace.
b3f7ff4
@mathiasbynens mathiasbynens added a commit to mathiasbynens/regexpu-core that referenced this pull request Feb 8, 2016
@mathiasbynens mathiasbynens Only use Unicode v8.0.0 for whitespace
ECMAScript 6 required Unicode v5.1.0 `Zs` symbols to be recognized as whitespace in addition to any `Zs` symbols in whatever Unicode version the engine implemented.

Per tc39/ecma262#300 this is no longer the case in ES2016. 🎉

The only observable change is that U+180E is no longer considered whitespace.
9b10d2a
@littledan littledan Require Unicode 8.0.0
Interpretation of some basic things like whitespace changed after
Unicode 5.1. This patch requires the latest Unicode standard.
536f361
@bterlson
Member

Committed as 24dad16. Thanks @littledan!

@bterlson bterlson closed this Feb 10, 2016
@littledan
Contributor

Thanks for the reviews and landing, everyone!

@mathiasbynens mathiasbynens added a commit to whatwg/javascript that referenced this pull request Feb 11, 2016
@mathiasbynens mathiasbynens Remove Unicode database version requirement
It’s now part of the ECMAScript spec: tc39/ecma262#300
28eea7e
@mathiasbynens mathiasbynens added a commit to whatwg/javascript that referenced this pull request Feb 11, 2016
@mathiasbynens mathiasbynens Remove Unicode database version requirement
It’s now part of the ECMAScript spec: tc39/ecma262#300

Closes #28.
4f1a517
@mathiasbynens mathiasbynens added a commit to mathiasbynens/test262 that referenced this pull request Jun 29, 2016
@mathiasbynens mathiasbynens Ensure U+180E is no longer considered whitespace b15967a
@dilijev dilijev added a commit to dilijev/ChakraCore that referenced this pull request Dec 5, 2016
@dilijev dilijev Removed U+180E MONGOLIAN VOWEL SEPARATOR from Whitespace classification.
This character was recategorized from Zs to Cf in Unicode 6.3, and remains categorized as such in Unicode 9.0 (current target version). The current version of ECMAScript requires that only characters classed as Zs in the target version of Unicode be recognized as Whitespace. This change is now reflected in Test262 so making this change will improve our Test262 score rather than regress it.

Additionally, update comments about location of UnicodeData.txt and about the definition of Whitespace characters.

See: tc39/ecma262#300

See: tc39/test262@3a5a09e

See: mathiasbynens/regexpu-core@9b10d2a

Fixes #2120
b1a37f6
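
The rule described in the commit message above (only code points classed as Zs in the target Unicode version count, alongside the code points the spec names explicitly) can be sketched roughly as follows. This is an illustration, not ChakraCore's implementation: `isWhiteSpace` and `EXPLICIT_WHITE_SPACE` are hypothetical names, and the Zs check relies on whatever Unicode data the running engine ships, via the `\p{Zs}` property escape.

```javascript
// Code points the WhiteSpace production names explicitly and that are not
// already covered by category Zs: TAB, VT, FF, and ZWNBSP (U+FEFF).
// SP (U+0020) and NBSP (U+00A0) are themselves in Zs.
const EXPLICIT_WHITE_SPACE = new Set([0x0009, 0x000b, 0x000c, 0xfeff]);

// Matches a single code point whose general category is Zs,
// according to the engine's Unicode data.
const ZS_PATTERN = /^\p{Zs}$/u;

// Hypothetical classifier for the WhiteSpace production.
// Line terminators (LF, CR, LS, PS) are a separate production and are
// intentionally not included here.
function isWhiteSpace(codePoint) {
  if (EXPLICIT_WHITE_SPACE.has(codePoint)) {
    return true;
  }
  return ZS_PATTERN.test(String.fromCodePoint(codePoint));
}

for (const cp of [0x0020, 0x00a0, 0x3000, 0x180e, 0x000a]) {
  const label = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
  console.log(label, isWhiteSpace(cp));
}
```

Deriving the set from the category instead of hard-coding a list means a recategorization such as that of U+180E is picked up when the Unicode data is updated, without touching the classifier.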
@chakrabot chakrabot pushed a commit to Microsoft/ChakraCore that referenced this pull request Dec 7, 2016
@dilijev dilijev [MERGE #2121 @dilijev] Removed U+180E MONGOLIAN VOWEL SEPARATOR from …
…Whitespace classification.

Merge pull request #2121 from dilijev:regex-ws

04074af
@chakrabot chakrabot pushed a commit to Microsoft/ChakraCore that referenced this pull request Dec 7, 2016
@dilijev dilijev [1.4>master] [MERGE #2121 @dilijev] Removed U+180E MONGOLIAN VOWEL SE…
…PARATOR from Whitespace classification.

361cd68