IDNA #53

annevk opened this Issue Jul 30, 2015 · 14 comments



annevk commented Jul 30, 2015

This issue tracks faults in UTS #46, since Unicode doesn't really do that well. If you find an issue, report it to Unicode and then report back here.


I’ve just submitted the following to Unicode. I’ll update this when I get a response.

Subject: xn-- prefix never added in UTS # 46

In the Processing steps, the algorithm looks for an xn-- prefix and decodes the rest of the label per Punycode when it is present.

In ToASCII, however, the xn-- prefix is never added:

Convert each label with non-ASCII characters into Punycode [RFC3492]. This may record an error.

This should probably be replaced with something like:

For each label with non-ASCII characters, replace the label with “xn--” followed by the encoding of the label according to Punycode [RFC3492]. This may record an error.
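
To make the asymmetry concrete, here is a minimal sketch of a label encoder that adds the prefix and a decoder that looks for it. It is not the UTS #46 algorithm (no mapping, normalization, or validation); the function names are hypothetical, and it assumes Python's built-in `punycode` codec as the RFC 3492 implementation.

```python
ACE_PREFIX = "xn--"


def to_ascii_label(label: str) -> str:
    """Encode one label; the ACE prefix is added only when Punycode is needed."""
    if label.isascii():
        # Pure-ASCII labels are left untouched.
        return label
    # Python's "punycode" codec implements the RFC 3492 encoding, without any prefix.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_unicode_label(label: str) -> str:
    """Decode one label; only labels carrying the ACE prefix are Punycode-decoded."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


print(to_ascii_label("bücher"))           # xn--bcher-kva
print(to_unicode_label("xn--bcher-kva"))  # bücher
```

Without the prefix step in the encoder, the decoder would never recognize the output as Punycode, so the round trip would fail.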

@SimonSapin SimonSapin referenced this issue in servo/rust-url Jul 30, 2015

IDNA support #119


My report to Unicode from some time ago which seems to not be fixed yet:

The Format section (8.1) under Conformance Testing in UTS46 is confusing.

The explanation for toASCII and toUnicode says to use the provided processing_option for toUnicode and to always use nontransitional for toASCII.
However, the implementation section for toUnicode (4.3) says to always call the processing step with nontransitional, while the toASCII parameter list does provide a processing_option.

It looks to me as if the descriptions for toASCII and toUnicode in the conformance testing section got mixed up. This also applies to the descriptions in the header of IdnaTest.txt.

The other thing is that there's only a single IdnaTest file, with no explanation of which algorithm it applies to. Is it for IDNA2008, IDNA2003, or UTS46? It seems to be categorized by Unicode standard instead of by IDNA reference, which makes this really confusing. I haven't reported that one yet, though.


@Sebmaster regarding the other thing, UTS46 explains how "To test for conformance to UTS46" using IdnaTest.txt.


@SimonSapin I'm not sure that's totally correct either since:

Bn for Bidi Rule #n from Section 2. The Bidi Rule, in Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA) [IDNA2008]
Cn for ContextJ tests in, Appendix A.n in The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) [IDNA2008]. Thus C1 = Appendix A.1. ZERO WIDTH NON-JOINER, and C2 = Appendix A.2. ZERO WIDTH JOINER. The CONTEXTO tests are optional for client software, and not tested here.

is not described in TR46 at all. It's imported from the IDNA2008 standard, which has no relevance in the TR46 spec... I think 😕


Got a mail today from Unicode (regarding conformance test description):

This was discussed at the UTC meeting in July, and has been forwarded to the author of the UTS for consideration in a subsequent version.

So that's pretty sweet.


Oh yeah, I came back into this and recall that the IdnaTest.txt is really bad at telling you how to process it.

The ToASCII column uses nontransitional processing (read IdnaTest.txt's commented header) and UseSTD3ASCIIRules=true (see §8 of the input). However, they definitely appear to have some extra rules not described in their algorithm (for example, ToUnicode should never produce an [A4_1] or [A4_2] error, since those are specific to the ToASCII regime and ToUnicode never calls ToASCII, yet you can clearly see for yourself that they do).
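
For reference, a rough sketch of the two checks those codes seem to point at. This assumes A4_1 and A4_2 name the DNS length restrictions in ToASCII (whole name 1 to 253 characters excluding a trailing root dot, each label 1 to 63); that mapping is my reading, and the function name is hypothetical.

```python
def dns_length_errors(ascii_domain: str) -> list[str]:
    """Return the length-related error codes for an already ASCII-encoded domain."""
    errors = []
    # Ignore a single trailing dot (the root label) before measuring.
    name = ascii_domain[:-1] if ascii_domain.endswith(".") else ascii_domain
    if not 1 <= len(name) <= 253:
        errors.append("A4_1")  # assumed: overall domain name length
    if any(not 1 <= len(label) <= 63 for label in name.split(".")):
        errors.append("A4_2")  # assumed: per-label length
    return errors


print(dns_length_errors("example.com"))       # []
print(dns_length_errors("a" * 64 + ".com"))   # ['A4_2']
```

Since these checks operate on the ASCII form, it is hard to see where a ToUnicode-only implementation would produce them.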


I got a response to #53 (comment):

This has been added to the feedback document for next week's meeting.


… and today:

I was directed by the UTC to let you know that this has been sent to the editor for review during the next update cycle.


As per servo/rust-url#160, I submitted feedback regarding validation rule no. 2: "The label must not contain a U+002D HYPHEN-MINUS character in both the third and fourth positions."
This isn't enforced by all UAs. YouTube, for example, uses domains that break that rule.
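
A minimal sketch of that rule as a raw string check, to show what kind of label trips it. The function name and the sample labels are hypothetical, not taken from a real host name.

```python
def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    """True when both the third and fourth characters are U+002D HYPHEN-MINUS."""
    # Positions are 1-based in the rule, so they map to indices 2 and 3.
    return label[2:4] == "--"


print(has_hyphens_in_positions_3_and_4("r3---sn-example"))  # True
print(has_hyphens_in_positions_3_and_4("example"))          # False
```

Note that a raw check like this also matches any `xn--` label, so where the check runs relative to Punycode decoding matters.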

@domenic domenic referenced this issue in jsdom/whatwg-url Apr 15, 2016

Bug in parsing URLs #50

srl295 commented May 12, 2016 edited

@valenting Your feedback is tracked as part of PRI317 (being discussed now).

By the way @SimonSapin I'd think the right way to track is via UTC agenda items


It seems like Unicode has closed that ticket without removing the -- validity requirement 😞

Does anybody have the ability to look into the Unicode ... process to see what's going on there?

@annevk annevk added the parser label Dec 20, 2016

domenic commented Jan 6, 2017 edited

The validation rule problem mentioned at #53 (comment) doesn't seem to have made it into by my reading. What's the latest? It's item A, nevermind

@bagder bagder referenced this issue Jan 30, 2017

IDNA2008 #223

@annevk annevk added the idna label Feb 10, 2017

annevk commented Feb 13, 2017

Going forward, rather than tracking all UTS 46 feedback here, I suggest we just create new issues against this repository, so we can discuss each problem in isolation. I created an idna label that we can use to group them all.
