Define hosts' public suffix and registrable domain. #391

Merged (7 commits) on Jun 7, 2018
118 changes: 118 additions & 0 deletions url.bs
@@ -272,6 +272,124 @@ for further processing.
U+0020 SPACE, U+0023 (#), U+0025 (%), U+002F (/), U+003A (:), U+003F (?), U+0040 (@), U+005B ([),
U+005C (\), or U+005D (]).

<p>A <a for=/>host</a>'s <dfn for=host export>public suffix</dfn> is the portion of a
<a for=/>host</a> which is included on the <cite>Public Suffix List</cite>. To obtain
<var>host</var>'s <a for=host>public suffix</a>, run these steps: [[!PSL]]

<ol>
<li><p>If <var>host</var> is not a <a>domain</a>, then return null.

<li><p>Return the <a for=host>public suffix</a> obtained by executing the
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List on
<var>host</var>. [[!PSL]]
</ol>

<p>A <a for=/>host</a>'s <dfn for=host export>registrable domain</dfn> is a <a>domain</a> formed by
the most specific public suffix, along with the domain label immediately preceding it, if any. To
obtain <var>host</var>'s <a for=host>registrable domain</a>, run these steps:

<ol>
<li><p>If <var>host</var>'s <a for=host>public suffix</a> is null or <var>host</var>'s
<a for=host>public suffix</a> <a for=host>equals</a> <var>host</var>, then return null.

<li><p>Return the <a for=host>registrable domain</a> obtained by executing the
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List on
<var>host</var>. [[!PSL]]
</ol>
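
The two algorithms above can be sketched in Python as follows. This is a minimal illustration, not the Public Suffix List algorithm in full: `PSL_RULES` is a hypothetical three-entry excerpt, wildcard and exception rules are ignored, and `is_domain` approximates "is a domain" by rejecting anything that parses as an IP address. The function names are assumptions made for this sketch.

```python
import ipaddress
from typing import Optional

# Hypothetical excerpt of the Public Suffix List; a real client loads the
# full list and also handles wildcard ("*.") and exception ("!") rules.
PSL_RULES = {"com", "github.io", "xn--kgbechtv"}


def is_domain(host: str) -> bool:
    """Approximate the spec's "host is a domain" check (assumption)."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host != ""
    return False  # IP addresses are hosts, but not domains


def public_suffix(host: str) -> Optional[str]:
    if not is_domain(host):
        return None
    labels = host.split(".")
    # Try the longest candidate first so the most specific rule wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in PSL_RULES:
            return candidate
    # PSL default rule "*": with no match, the last label is the suffix.
    return labels[-1]


def registrable_domain(host: str) -> Optional[str]:
    suffix = public_suffix(host)
    if suffix is None or suffix == host:
        return None
    # The public suffix plus the single label immediately preceding it.
    suffix_len = suffix.count(".") + 1
    return ".".join(host.split(".")[-(suffix_len + 1):])
```

The inputs here are assumed to be already parsed and normalized hosts (lowercase, A-labels), matching the distinction the review draws between a host and host-parser input.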

<div class=example id=example-host-psl>
<table>
<tr>
<th>Host input
<th>Public suffix
<th>Registrable domain
<tr>
<td><code>com</code>
<td><code>com</code>
<td><i>null</i>
<tr>
<td><code>example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>www.example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>sub.www.example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>EXAMPLE.COM</code>
Member

This is not a host, but input to the host parser.

Member Author

I think it's helpful to point out that no matter how folks spell the URL, it's going to be normalized. Perhaps shifting this table to include a URL rather than a host would make that point, especially for the punycode bits?

Member

I think it's fine to just list hosts, but we should label it "host input" or some such, to not confuse it with host as a concept, which is already parsed and normalized.

<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>github.io</code>
<td><code>github.io</code>
<td><i>null</i>
<tr>
Isn't this row duplicated? The previous one looks the same.

<td><code>whatwg.github.io</code>
<td><code>github.io</code>
<td><code>whatwg.github.io</code>
<tr>
<td><code>إختبار</code>
Member

Same as above. And also applies below.

<td><code>xn--kgbechtv</code>
<td><i>null</i>
<tr>
<td><code>example.إختبار</code>
<td><code>xn--kgbechtv</code>
<td><code>example.xn--kgbechtv</code>
@sleevi, Jun 4, 2018

So one of the things is that the PSL doesn't specify whether it returns a U-Label or an A-Label (that's left to the implementation). I'm curious about the documentation here for the A-Label: is this an expectation of the contract?

That is, are you trying to show that either U-Label or A-Label can be returned regardless of U-Label or A-Label input, or are you trying to state that A-Labels should be the consistent return?

Member

Currently we don't rely on this anywhere (assuming it's consistent to be one or the other, is that at least required?), but A-label seems preferable as that'd be consistent with how the platform exposes URLs and origins overall.

I suspect this will only matter if we add an API, but it really depends on whether PSL dependencies keep getting added or not.

<tr>
<td><code>sub.example.إختبار</code>
<td><code>xn--kgbechtv</code>
<td><code>example.xn--kgbechtv</code>
</table>
</div>
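
The "Host input" rows above that use uppercase or non-ASCII labels end up as lowercase A-labels. A rough way to reproduce that normalization is sketched below. It assumes Python's built-in `idna` codec, which implements IDNA 2003 rather than the UTS #46 processing the URL Standard's host parser uses, so it is only an approximation that happens to agree on these inputs. `to_a_label` is a hypothetical helper name.

```python
def to_a_label(host_input: str) -> str:
    """Approximate host normalization: A-labels, lowercased (assumption)."""
    # The built-in "idna" codec converts each non-ASCII label to its
    # Punycode "xn--" form; ASCII labels pass through with case preserved,
    # hence the explicit lower().
    return host_input.encode("idna").decode("ascii").lower()


# U+0625 U+062E U+062A U+0628 U+0627 U+0631, the Arabic label from the
# table, written with escapes to avoid bidirectional rendering issues.
ARABIC_TEST_LABEL = "\u0625\u062e\u062a\u0628\u0627\u0631"
```

For instance, `to_a_label("example." + ARABIC_TEST_LABEL)` yields `example.xn--kgbechtv`, the A-label form shown in the table.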

<p>Two <a for=/>hosts</a>, <var>A</var> and <var>B</var>, are said to be
<dfn for=host export>same site</dfn> with each other if either of the following statements is true:

<ul class=brief>
<li><p><var>A</var> <a for=host>equals</a> <var>B</var> and <var>A</var>'s
<a for=host>registrable domain</a> is non-null.

<li><p><var>A</var>'s <a for=host>registrable domain</a> is <var>B</var>'s
<a for=host>registrable domain</a> and is non-null.
</ul>

<div class=example id=example-same-site>
<p>Assuming that <code>suffix.example</code> is a <a for=host>public suffix</a> and that
<code>example.com</code> is not:

<ul>
<li><p><code>example.com</code>, <code>sub.example.com</code>, <code>other.example.com</code>,
<code>sub.sub.example.com</code>, and <code>sub.other.example.com</code> are all <a>same site</a>
with each other (and themselves), as their <a for=host>registrable domains</a> are
<code>example.com</code>.

<li><p><code>registrable.suffix.example</code>, <code>sub.registrable.suffix.example</code>,
<code>other.registrable.suffix.example</code>, <code>sub.sub.registrable.suffix.example</code>,
and <code>sub.other.registrable.suffix.example</code> are all <a>same site</a> with each other
(and themselves), as their <a for=host>registrable domains</a> are
<code>registrable.suffix.example</code>.

<li><p><code>example.com</code> and <code>registrable.suffix.example</code> are not
<a>same site</a> with each other, as their <a for=host>registrable domains</a> differ.

<li><p><code>suffix.example</code> is not <a>same site</a> with <code>suffix.example</code>, as
it is a <a for=host>public suffix</a>, and therefore has a null
<a for=host>registrable domain</a>.
</ul>
</div>
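
The same-site examples above can be checked mechanically. The sketch below takes the registrable-domain lookup as a parameter so that the comparison itself stays independent of any particular Public Suffix List snapshot. `toy_registrable_domain` and `EXAMPLE_SUFFIXES` are hypothetical stand-ins that encode only the assumptions stated in the example (`com` and `suffix.example` are public suffixes).

```python
from typing import Callable, Optional

# Assumption taken from the example: only these two public suffixes exist.
EXAMPLE_SUFFIXES = {"com", "suffix.example"}


def toy_registrable_domain(host: str) -> Optional[str]:
    """Stand-in lookup: most specific suffix plus the label before it."""
    labels = host.split(".")
    # Start at 1 so that a host equal to a public suffix yields None.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in EXAMPLE_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None


def same_site(
    a: str,
    b: str,
    registrable_domain: Callable[[str], Optional[str]] = toy_registrable_domain,
) -> bool:
    reg_a = registrable_domain(a)
    reg_b = registrable_domain(b)
    # First statement: equal hosts whose registrable domain is non-null.
    if a == b and reg_a is not None:
        return True
    # Second statement: identical, non-null registrable domains.
    return reg_a is not None and reg_a == reg_b
```

Note that a public suffix is not same site even with itself, because its registrable domain is null in both statements.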

<p class=warning>Specifications should avoid depending on "<a for=host>public suffix</a>",
"<a for=host>registrable domain</a>", and "<a>same site</a>". The Public Suffix List will diverge
from client to client, and cannot be relied upon to provide a hard security boundary. Specifications
which ignore this advice are encouraged to carefully consider whether URLs' schemes ought to be
incorporated into any decision made based upon whether or not two <a for=/>hosts</a> are
<a>same site</a>. HTML's <a>same origin-domain</a> concept is a reasonable example of this
consideration in practice.


<h3 id=idna>IDNA</h3>
