
Issues with normative references to Unicode spec. #726

Open · allenwb opened this issue on Nov 1, 2016 · 6 comments

allenwb commented Nov 1, 2016

While reviewing some Unicode-related proposals, I developed some concerns about how ECMA-262 currently references Unicode-related standards:

  • Normative reference to ISO/IEC 10646 in clause 3 should be undated

In ECMAScript 2016 we switched to an undated "current" edition usage of the Unicode Standard. However, the normative reference in clause 3 is still a dated reference to "ISO/IEC 10646:2003" plus assorted amendments. The title listed for that ISO standard is also obsolete. The normative reference should be:

ISO/IEC 10646 Information Technology — Universal Coded Character Set (UCS)

  • Some "indispensable" documents are referenced in the Bibliography rather than in clause 3

Unfortunately, ISO/IEC 10646 published by ISO and The Unicode Standard published by the Unicode Consortium are different documents. There is material in The Unicode Standard which is not included in ISO/IEC 10646 but which is "indispensable" for the application of ECMA-262. However, neither The Unicode Standard nor its relevant related documents are included in clause 3. Instead, they are listed in the ECMA-262 Bibliography.

The reason for this is that ECMA (and ISO) apparently prefer to normatively reference only documents published by ISO-recognized organizations. The Unicode Consortium documents seem to be in the Bibliography because it was assumed that they don't meet the criteria for a normative reference. But this assumption is easy to disprove. As shown in the following image, ISO/IEC 10646 itself normatively references Unicode Consortium documents:

[Screenshot: page 10 of 146 of the ISO/IEC 10646 PDF, showing its normative references to Unicode Consortium documents]

If ISO/IEC 10646 can normatively reference Unicode Consortium documents, then ECMA-262 can too. Subclauses 11.6 and 21.1.3.10, and perhaps others, have "indispensable" dependencies upon Unicode Consortium documents that are currently in the Bibliography. The indispensable documents should be moved to Clause 3, and the language in the dependent subclauses may need to be adjusted accordingly.
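As a concrete illustration of the 11.6 dependency: whether a code point may begin an identifier is decided by its Unicode ID_Start property (ECMA-262 additionally admits `$` and `_`), a property defined by Unicode Consortium material (UAX #31 and the Unicode Character Database). The TypeScript sketch below simply asks the engine by compiling a one-line declaration. The helper name `canStartIdentifier` is hypothetical; the sketch assumes the `Function` constructor is available (a Content Security Policy can disable it), accepts exactly one code point so the probe cannot be widened into arbitrary code, and reports whatever Unicode version the engine happens to implement.

```typescript
// Hypothetical helper: reports whether a single code point may start an
// ECMAScript identifier in the running engine. It compiles (but never calls)
// a strict-mode body of the form `let <ch>;` and treats any compile error as "no".
function canStartIdentifier(ch: string): boolean {
  // Restrict input to one code point so the probe stays a single declaration.
  if (Array.from(ch).length !== 1) {
    throw new RangeError("expected exactly one code point");
  }
  try {
    new Function(`"use strict"; let ${ch};`);
    return true;
  } catch {
    return false;
  }
}

console.log(canStartIdentifier("π")); // true: U+03C0 GREEK SMALL LETTER PI has ID_Start
console.log(canStartIdentifier("☃")); // false: U+2603 SNOWMAN is a symbol, not ID_Start
console.log(canStartIdentifier("1")); // false: digits are ID_Continue but not ID_Start
```

Because the answer comes from the engine's own Unicode data, two engines built against different Unicode versions can disagree about newly encoded letters.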

bterlson commented Nov 2, 2016

Good finds!

bterlson commented Nov 14, 2016

What wording do you think would need to be updated? I don't see any inbound references to sec-bibliography, and dependent clauses refer to the reference by name (e.g. "Unicode Standard").

bterlson commented Nov 14, 2016

I also note that all the non-Unicode Standard references are found in a note in String#localeCompare. What is indispensable about a clause that contains a note with a reference? Moving the Unicode Standard makes sense but I'm not sure about the others.

allenwb commented Nov 14, 2016

As noted in tc39/proposal-regexp-unicode-property-escapes#13, I noticed this while reviewing https://github.com/mathiasbynens/es-regexp-unicode-property-escapes, which needs to add additional such references. That seems like a good reason to get our normative-reference act together.

Other places where we have a missing or improper normative reference to a Unicode doc:

bterlson commented Nov 14, 2016

@allenwb in terms of wording updates after those references are moved to normative references, what do you want to see? Is it fine to refer to it by standard name (e.g. "UAX #15 Unicode Normalization Forms") if that wording is used under the normative references clause?

allenwb commented Nov 14, 2016

Well, here is how the ISO version of the Unicode Standard references such things:

Normalization forms are the mechanisms allowing the selection of a unique coded representation among alternative; but equivalent coded text representations of the same text. Normalization forms for use with this International Standard are specified in the Unicode Standard UAX#15 (see Clause 3) and shall be used in the context of this International Standard. There are four normalization forms:

and their clause 3 has the following normative reference:
Unicode Standard Annex, UAX #15, Unicode Normalization Forms:
http://www.unicode.org/reports/tr15/tr15-41.html.

I think the parenthetical "see Clause 3" is a little bit much. Overall, I think your formulation (UAX #15 Unicode Normalization Forms) is probably fine and a bit more useful.
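For reference, the four normalization forms UAX #15 defines (NFC, NFD, NFKC, NFKD) are exactly the values ECMA-262 accepts for `String.prototype.normalize`, which is one place where the spec leans on UAX #15. A small TypeScript sketch, assuming an ES2017 or later compile target (for `Object.entries` and `padEnd`); the sample strings and the `codePoints` helper are illustrative choices, not taken from the discussion:

```typescript
// The four normalization forms defined by UAX #15; String.prototype.normalize
// accepts these names and throws a RangeError for any other form name.
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"] as const;

// Render a string as space-separated code points, e.g. "U+0065 U+0301".
function codePoints(s: string): string {
  return Array.from(s)
    .map((ch) => "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"))
    .join(" ");
}

// Illustrative samples (assumptions, not from the thread).
const samples: Record<string, string> = {
  precomposed: "\u00E9", // é as a single code point
  decomposed: "e\u0301", // e followed by U+0301 COMBINING ACUTE ACCENT
  ligature: "\uFB01", // ﬁ LATIN SMALL LIGATURE FI, a compatibility character
};

// Print each sample in each normalization form.
for (const [name, text] of Object.entries(samples)) {
  for (const form of FORMS) {
    console.log(`${name.padEnd(12)} ${form.padEnd(4)} ${codePoints(text.normalize(form))}`);
  }
}
```

NFC and NFD map the precomposed and decomposed "é" onto the same code point sequence, while only NFKC and NFKD fold the ﬁ ligature into plain "f" "i". ECMA-262 defers to UAX #15 for that canonical vs. compatibility distinction rather than restating it.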
