IDNA: avoid defining valid domain string in terms of the parser #245

Open
annevk opened this issue Feb 10, 2017 · 3 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. topic: idna topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing)

Comments

@annevk
Member

annevk commented Feb 10, 2017

This is basically something we need to raise again with the IDNA folks as their document does not really address it. This used to be tracked by https://www.w3.org/Bugs/Public/show_bug.cgi?id=25334.

As part of fixing this we should make it clear they are at least ASCII case-insensitive.

@annevk added the i18n-tracker, topic: model, and topic: validation labels and removed the topic: model label on May 4, 2020
@whatwg whatwg deleted a comment Nov 20, 2022
@whatwg whatwg deleted a comment from Boody6343 Nov 21, 2022
@dubzzz

dubzzz commented Sep 6, 2023

Maybe I'm wrong but aren't valid domains defined in the RFCs below?

* https://www.ietf.org/rfc/rfc1034.txt
* https://www.ietf.org/rfc/rfc1123.txt

The first one saying:

<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>

The second one says, in effect, that a domain can start with a digit.
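
For illustration, the grammar quoted above could be sketched as a regular expression in Python. This is only a rough sketch: the names `LABEL_RFC1034`, `LABEL_RFC1123`, and `matches_preferred_syntax` are made up for this example, and the 63-character label limit is taken from the prose of RFC 1034 rather than from the BNF itself.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# A label starts with a letter, ends with a letter or digit, and contains
# only letters, digits, and hyphens in between.
LABEL_RFC1034 = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

# RFC 1123 section 2.1 relaxes the first character to a letter or a digit.
LABEL_RFC1123 = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def matches_preferred_syntax(name: str, label_re: re.Pattern = LABEL_RFC1034) -> bool:
    """Return True if `name` fits <domain> from the quoted grammar.

    Assumption: the 63-character label limit is enforced here as well,
    although RFC 1034 states it in prose and not in the BNF.
    """
    if name == " ":  # the grammar's second alternative for <domain>
        return True
    # <subdomain> is one or more labels separated by "."
    return all(
        len(label) <= 63 and label_re.fullmatch(label) is not None
        for label in name.split(".")
    )
```

With this sketch, `example.com` matches, `3com.com` matches only with `LABEL_RFC1123`, and `a_b.example` matches neither pattern.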

@annevk
Member Author

annevk commented Sep 6, 2023

It's not entirely wrong, but those definitions don't account for IDNA and also don't seem to account for ASCII code points that happen to work in practice, such as _. What we want is a definition that does account for both, and for which applying the host parser defined in the URL Standard to any matching string never returns failure.

@pspacek

pspacek commented Feb 17, 2025

Hi,

A DNS guy here. Allow me to describe this from the DNS perspective:

> Maybe I'm wrong but aren't valid domains defined in the RFCs below?
>
> * https://www.ietf.org/rfc/rfc1034.txt
> * https://www.ietf.org/rfc/rfc1123.txt
>
> The first one saying:
>
> <domain> ::= <subdomain> | " "

Indeed, that is wrong in a subtle way. This quote comes from section 3.5, "Preferred name syntax", of RFC 1034, with emphasis on "preferred".

The real limits of the DNS protocol are made clear in section 11, "Name syntax", of RFC 2181. TL;DR: anything goes, including binary 0 (ASCII NUL) and ".". These weird-but-permissible-in-DNS names are then encoded into ASCII strings like \000\..example.com. where the leftmost label consists of two ASCII characters:

  • NUL
  • . - which is a character inside the leftmost label, not a label separator
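
To make the escaping concrete, here is a minimal Python sketch that decodes such a string back into raw labels. It assumes the escape rules of the master file format in RFC 1035 section 5.1 (`\DDD` is a decimal octet value, `\X` quotes a non-digit character); the function name `parse_presentation_name` is made up for this example, and the sketch does not enforce length limits or reject empty labels.

```python
def parse_presentation_name(text: str) -> list[bytes]:
    """Split a presentation-format domain name into raw labels.

    Assumed escapes (RFC 1035 section 5.1): a backslash followed by three
    decimal digits is one octet; a backslash followed by a non-digit
    character is that character taken literally.
    """
    labels: list[bytes] = []
    current = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            ddd = text[i + 1 : i + 4]
            if len(ddd) == 3 and ddd.isascii() and ddd.isdigit():
                value = int(ddd)
                if value > 255:
                    raise ValueError(f"escape out of range: \\{ddd}")
                current.append(value)
                i += 4
            elif i + 1 < len(text) and not text[i + 1].isdigit():
                # Quoted character, e.g. "\." is a dot inside a label.
                current += text[i + 1].encode("ascii")
                i += 2
            else:
                raise ValueError(f"bad escape at offset {i}")
        elif ch == ".":
            # An unescaped dot is a label separator.
            labels.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            current += ch.encode("ascii")
            i += 1
    if current:  # name written without the trailing dot
        labels.append(bytes(current))
    return labels
```

Calling `parse_presentation_name(r"\000\..example.com.")` yields three labels, the first of which is the two octets NUL and "." described above.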

We could argue that URL should be concerned only with host names (as opposed to domains), and then the quote might be more fitting, but that ignores IDNA completely. RFC 5890 defines a stricter subset of permissible names in ASCII encoding...
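
As a rough illustration of that subset, the label categories from RFC 5890 section 2.3.1 could be sketched in Python like this. The function name `classify_label` and the returned strings are made up for this example, and the sketch does not check whether an XN-label is actually a valid A-label (that would need Punycode decoding plus the IDNA validity rules).

```python
import re

# LDH label: ASCII letters, digits, and hyphens, with no hyphen at the
# beginning or end.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    """Classify one label using the RFC 5890 section 2.3.1 terminology."""
    if len(label) > 63 or _LDH.fullmatch(label) is None:
        return "not LDH"
    if label[2:4] != "--":
        # No "--" in the third and fourth positions: non-reserved LDH.
        return "NR-LDH"
    if label[:4].lower() == "xn--":
        # Reserved label carrying the IDNA prefix (compared case-insensitively).
        return "XN-label"
    return "other R-LDH"
```

For example, `example` is classified as NR-LDH, `xn--bcher-kva` as an XN-label, and `a_b` as not LDH at all, even though names with `_` are permissible in the DNS.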

I'm happy to discuss further if there's interest!
