Can a registrable domain or public suffix have a trailing dot? #693

Closed
annevk opened this issue Apr 29, 2022 · 5 comments · Fixed by #694
Labels
security/privacy There are security or privacy implications topic: model For issues with the abstract-but-normative bits

Comments

@annevk
Member

annevk commented Apr 29, 2022

Normally example.com. and example.com are considered distinct origins, but https://github.com/publicsuffix/list/wiki/Format#formal-algorithm suggests

> A domain or rule can be split into a list of labels using the separator "." (dot). The separator is not part of any of the labels. Empty labels are not permitted, meaning that leading and trailing dots are ignored.

they might be considered same-site? That seems wrong to me and I would expect a trailing dot to be preserved as we preserve it everywhere else.
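
To make the concern concrete, here is a minimal Python sketch of the label-splitting step as quoted above. `split_labels` is a hypothetical helper name, not something the PSL wiki defines, and dropping empty labels is only one reading of "leading and trailing dots are ignored".

```python
def split_labels(domain: str) -> list[str]:
    """Split a domain on "." and drop empty labels (hypothetical helper).

    Dropping empty labels is what makes a leading or trailing dot invisible
    to anything that only looks at the resulting label list.
    """
    return [label for label in domain.split(".") if label]


# Both spellings produce the same labels, so the trailing dot is lost here.
with_dot = split_labels("example.com.")
without_dot = split_labels("example.com")
```

Under this reading, any later step that works purely on the label list cannot tell the two hosts apart unless the caller remembers the dot separately.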

Neither URL nor HTML has trailing dot examples so I'm not sure if we explicitly considered this before.

cc @mikewest @sleevi @valenting @achristensen07 @dnsguru

(If the trailing dot is indeed significant, follow-up is needed in at least httpwg/http-extensions#1758 and Fetch's "localhost" check, which should then also consider "localhost.".)
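
As a rough illustration of that follow-up, a check that treats both spellings as localhost could look like the sketch below. `is_localhost_name` is a hypothetical name, it covers only the two strings mentioned here, and it is not the text of Fetch or any other specification.

```python
def is_localhost_name(host: str) -> bool:
    """Return True for "localhost" with or without a trailing dot.

    Hypothetical helper: only the two spellings named in this issue are
    handled; anything else is out of scope for this sketch.
    """
    return host in ("localhost", "localhost.")
```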

@annevk added the editorial, security/privacy, and topic: model labels, then removed the editorial label, on Apr 29, 2022
@sleevi

sleevi commented Apr 29, 2022

> they might be considered same-site?

Curious, why so? That is, I would think the PSL storage algorithm independent of the site processing algorithm, but is there a dependency?

> That seems wrong to me and I would expect a trailing dot to be preserved as we preserve it everywhere else.

Yes, I would expect it to be preserved, even if the storage format of the PSL is that all entries are implicitly relative to the root (i.e. presumed to have a trailing dot, even if not stored as such, nor reflected as a transformation in output).

@annevk
Member Author

annevk commented Apr 29, 2022

Well, one way of reading the above is that if you pass in example.com., its public suffix is "com", not "com.". (I'm not sure if that's what led to the error in Fetch; per whatwg/fetch#1257 it might just be an oversight, but I justified it not listing "localhost." to myself at least once because of that.)

@sleevi

sleevi commented Apr 29, 2022

Totally agree that there is an opportunity for wording clarification - I hope it doesn’t seem that I’m dismissive of the problem statement. And yes, I acknowledge that’s one way to read the algorithm.

Another way to read it is that the rules are all relative to the root and therefore don’t need the "." encoded, either in the list or while processing, but that the caller of the algorithm can (and in the case of browser implementations, AIUI, does) preserve the trailing "." from inputs, since the trailing "." is significant to resolver APIs and (in the case of unencrypted connections) the logical origin.

That is:

  • The PSL always stores rules in root-relative form.
  • The algorithm itself processes all labels of an input except the empty root label.
  • For both example.com and example.com., the output of the PSL algorithm is "com".
  • The effective TLD is formed based on the input: e.g. if the input was example.com., the root label is appended to the output of the PSL algorithm to get the effective TLD "com.".

Or, put differently, the length of the eTLD in the input is len(psl_output) + (input[len(input) - 1] == '.' ? 1 : 0)
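
The reading above can be sketched in Python as follows. Everything here is illustrative: `TOY_SUFFIXES`, `psl_output`, `effective_tld`, and `etld_length` are hypothetical names, and the three-entry rule set stands in for the real Public Suffix List, whose wildcard and exception rules are omitted.

```python
# Toy rule set for illustration only; the real Public Suffix List is far
# larger and also has wildcard and exception rules, which are omitted here.
TOY_SUFFIXES = {"com", "uk", "co.uk"}


def psl_output(host: str) -> str:
    """Longest matching toy suffix, always in root-relative form (no dot)."""
    labels = [label for label in host.split(".") if label]
    # Try the longest candidate first, then drop labels from the left.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in TOY_SUFFIXES:
            return candidate
    # Assumption: fall back to the last label when no toy rule matches.
    return labels[-1] if labels else ""


def effective_tld(host: str) -> str:
    """Re-append the root label's dot when the input carried one."""
    suffix = psl_output(host)
    return suffix + "." if host.endswith(".") else suffix


def etld_length(host: str) -> int:
    """Length of the eTLD within the input: suffix plus an optional dot."""
    return len(psl_output(host)) + (1 if host.endswith(".") else 0)
```

The list lookup never sees the dot; only the caller-side wrappers do, which is the split of responsibilities described in the bullets.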

@sleevi

sleevi commented Apr 29, 2022

@annevk
Member Author

annevk commented Apr 29, 2022

Thanks, so I think what we should do here is update the two caller algorithms in https://url.spec.whatwg.org/#host-miscellaneous so that a trailing dot in the input is preserved in the output. On top of that, we should add at least one example that illustrates trailing-dot preservation and ideally also extend the same-site examples with one that illustrates the mismatch.
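
A minimal sketch of what trailing-dot preservation in a caller could look like, and the mismatch it produces, is below. It is not the URL or HTML algorithm text: `registrable_domain` and `same_site` are simplified hypothetical helpers, the rule set is a toy stand-in for the Public Suffix List, and the comparison looks only at hosts.

```python
from typing import Optional

# Toy rule set for illustration only; wildcard and exception rules omitted.
TOY_SUFFIXES = {"com", "uk", "co.uk"}


def registrable_domain(host: str) -> Optional[str]:
    """Public suffix plus one label, keeping the input's trailing dot."""
    trailing_dot = "." if host.endswith(".") else ""
    labels = [label for label in host.split(".") if label]
    for start in range(len(labels)):
        if ".".join(labels[start:]) in TOY_SUFFIXES:
            if start == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[start - 1:]) + trailing_dot
    return None  # no toy rule matched; a real list has a default rule


def same_site(host_a: str, host_b: str) -> bool:
    """Compare registrable domains as strings, so the dot is significant."""
    domain_a = registrable_domain(host_a)
    domain_b = registrable_domain(host_b)
    return domain_a is not None and domain_a == domain_b
```

With these helpers, `www.example.com.` yields `example.com.`, and `same_site("example.com.", "example.com")` is false while `same_site("www.example.com", "example.com")` is true.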

After that, follow up on the parenthetical in the OP.

annevk added a commit that referenced this issue May 2, 2022
Also update the PSL algorithm reference.

Fixes #692 and fixes #693.
annevk added a commit to whatwg/fetch that referenced this issue May 2, 2022
annevk added a commit to whatwg/fetch that referenced this issue May 3, 2022
annevk added a commit that referenced this issue May 3, 2022
Also update the PSL algorithm reference.

Fixes #692 and fixes #693.