Issues with normative references to Unicode spec. #726
Comments
Good finds!
bterlson
Nov 14, 2016
Member
What wording do you think would need to be updated? I don't see any inbound references to sec-bibliography, and dependent clauses refer to the reference by name (e.g. "Unicode Standard").
bterlson
Nov 14, 2016
Member
I also note that all the non-Unicode Standard references are found in a note in String#localeCompare. What is indispensable about a clause that contains a note with a reference? Moving the Unicode Standard makes sense but I'm not sure about the others.
allenwb
Nov 14, 2016
Member
As noted in tc39/proposal-regexp-unicode-property-escapes#13, I noticed this while reviewing https://github.com/mathiasbynens/es-regexp-unicode-property-escapes, which needs to add additional such references. Seems like a good reason to get our normative reference act together.
Other places where we have a missing or improper normative reference to a Unicode doc:
- https://tc39.github.io/ecma262/#sec-string.prototype.normalize
- https://tc39.github.io/ecma262/#sec-string.prototype.tolowercase
- https://tc39.github.io/ecma262/#sec-string.prototype.touppercase
- https://tc39.github.io/ecma262/#sec-runtime-semantics-canonicalize-ch
- probably multiple places in clause 11
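For readers wondering why these clauses depend on Unicode Consortium documents at all, here is a small TypeScript sketch (not part of the original comment). The helper name `unicodeDependentResults` is made up for illustration; the string methods and `RegExp` are the standard built-ins, and the results shown in the comments follow from Unicode data files such as SpecialCasing.txt and CaseFolding.txt.

```typescript
// Illustrative only: `unicodeDependentResults` is a hypothetical helper name.
function unicodeDependentResults() {
  return {
    // Full case mapping (SpecialCasing.txt): U+00DF uppercases to two letters.
    upper: "stra\u00DFe".toUpperCase(),
    // U+0130 lowercases to "i" followed by U+0307 COMBINING DOT ABOVE.
    lower: "\u0130".toLowerCase(),
    // With the `u` flag, Canonicalize uses simple case folding (CaseFolding.txt),
    // so U+212A KELVIN SIGN matches "k".
    kelvinUnicode: new RegExp("\u212A", "iu").test("k"),
    // Without `u`, Canonicalize goes through toUpperCase, and the two do not match.
    kelvinLegacy: new RegExp("\u212A", "i").test("k"),
  };
}

console.log(unicodeDependentResults());
```

None of these outcomes can be derived from the code point repertoire alone, which is the sense in which the Unicode Consortium data is "indispensable" here.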
allenwb
Nov 14, 2016
Member
Well here is how the ISO version of the Unicode standards references such things:
Normalization forms are the mechanisms allowing the selection of a unique coded representation among alternative but equivalent coded text representations of the same text. Normalization forms for use with this International Standard are specified in the Unicode Standard UAX#15 (see Clause 3) and shall be used in the context of this International Standard. There are four normalization forms:
and their clause 3 has the following normative reference:
Unicode Standard Annex, UAX #15, Unicode Normalization Forms:
http://www.unicode.org/reports/tr15/tr15-41.html.
I think the parenthetical "see Clause 3" is a little bit much. Overall, I think your formulation (UAX #15 Unicode Normalization Forms) is probably fine and a bit more useful.
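As a concrete illustration (not part of the original comment), the four normalization forms mentioned in the quoted ISO text correspond to the `form` argument of `String.prototype.normalize`. The sketch below assumes a hypothetical helper, `allNormalizationForms`, and a sample string chosen only to make the four results differ.

```typescript
// Illustrative only: `allNormalizationForms` is a hypothetical helper name.
type Form = "NFC" | "NFD" | "NFKC" | "NFKD";

function allNormalizationForms(text: string): Record<Form, string> {
  const forms: Form[] = ["NFC", "NFD", "NFKC", "NFKD"];
  const result = {} as Record<Form, string>;
  for (const form of forms) {
    // The algorithm behind each form is defined in UAX #15.
    result[form] = text.normalize(form);
  }
  return result;
}

// U+00E9 (precomposed e-acute) followed by U+FB01 (the "fi" ligature).
const sample = allNormalizationForms("\u00E9\uFB01");
console.log(sample);
```

The canonical forms (NFC, NFD) only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) also replace the ligature with the two letters "fi".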
allenwb commented Nov 1, 2016
While reviewing some Unicode related proposals I developed some concerns about how ECMA-262 currently references Unicode related standards:
In ECMAScript 2016 we switched to an undated "current" edition usage of the Unicode standard. However, the normative reference in clause 3 is still a dated reference to "ISO/IEC 10646:2003" plus assorted amendments. The title listed for that ISO standard is also obsolete. The normative reference should be:
ISO/IEC 10646 Information Technology — Universal Coded Character Set (UCS)
Unfortunately, ISO/IEC 10646, published by ISO, and The Unicode Standard, published by the Unicode Consortium, are different documents. There is material in The Unicode Standard which is not included in ISO/IEC 10646 but which is "indispensable" for the application of ECMA-262. However, neither The Unicode Standard nor its relevant related documents are included in clause 3. Instead they are listed in the ECMA-262 Bibliography.
The reason for this is that ECMA (and ISO) apparently prefer to normatively reference only documents published by ISO-recognized organizations. The Unicode Consortium documents seem to be in the Bibliography because it was assumed that they don't meet the criteria to be a normative reference. But this assumption is easy to disprove. As shown in the following image, ISO/IEC 10646 itself normatively references Unicode Consortium documents:
If ISO/IEC 10646 can normatively reference Unicode Consortium documents then ECMA-262 also can. Subclauses 11.6 and 21.1.3.10 and perhaps other subclauses have "indispensable" dependencies upon Unicode Consortium documents that are currently in the Bibliography. The indispensable documents should be moved to Clause 3 and the language in the dependent subclauses may need to be adjusted accordingly.