Editorial: clarify URL validity #666

Merged
merged 2 commits into from
Dec 9, 2022

annevk (Member) commented Oct 21, 2021

Closes #595.


@annevk annevk requested review from domenic and rmisev October 21, 2021 11:44
url.bs Outdated
@@ -1170,7 +1170,9 @@ unified model would be, please file an issue.

 <li><p>The <a>URL serializer</a> takes a <a for=/>URL</a> and returns an <a>ASCII string</a>. (If
 that string is then <a lt="URL parser">parsed</a>, the result will <a for=url>equal</a> the <a
-for=/>URL</a> that was <a lt="URL serializer">serialized</a>.)
+for=/>URL</a> that was <a lt="URL serializer">serialized</a>.) The output of the
+<a>URL serializer</a> is not always a <a>valid URL string</a>. I.e., not all <a for=/>URLs</a> are
Member

It might be worth adding a pointer to #379, because I am still hoping we can change this.

Member Author

You don't think it's useful that \ in HTTP URLs shows up as something you probably want to fix?

Member

I mean, we can rehash that thread here if you want... yes, I think the serializer should always produce valid URLs, either by expanding the definition of valid, or changing the serializer.

Member Author

Oh sorry, \ is not applicable then, I think. I got this confused with the people who think all inputs ought to be valid or rejected. You don't necessarily think that, but you do think that the invalid inputs that are accepted ought to be transformed into something valid when they are serialized again.

So yeah, the reason for that is mainly encouraging RFC 3986 interop. But I'm not sure anyone is really appreciative of that.
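
The distinction drawn in this thread can be illustrated with a small sketch. It assumes the standard `URL` API found in browsers and Node.js; the example inputs are made up for illustration and are not taken from the PR. A backslash in an `https:` URL is repaired by the parser, so it never reaches the serializer's output, whereas square brackets in the path, query, and fragment are passed through unchanged even though they are not URL code points.

```typescript
// Sketch only: uses the standard URL API; the inputs are illustrative.

// Invalid input that the parser repairs: in special schemes such as https,
// "\" is treated like "/", so the serialized form contains "/" instead.
const repaired = new URL("https://example.com\\a\\b").href;

// Input that is accepted and kept as-is: "[" and "]" are not percent-encoded
// in the path, query, or fragment, so they reappear in the serialization.
const kept = new URL("https://example.com/a[b]?q=[1]#[x]").href;

// Round trip: parsing the serializer's output and serializing it again
// yields the same string.
const roundTrips = new URL(kept).href === kept;

console.log(repaired);   // https://example.com/a/b
console.log(kept);       // https://example.com/a[b]?q=[1]#[x]
console.log(roundTrips); // true
```

The second case is the one the added spec sentence covers: the serialization round-trips, yet it is not a valid URL string.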

url.bs Outdated
@@ -1160,7 +1160,7 @@ unified model would be, please file an issue.

 <ul>
 <li><p>The <a>URL parser</a> takes an arbitrary string and returns either failure or a
-<a for=/>URL</a>.
+<a for=/>URL</a>. It might also record zero or more <a>validation errors</a>.
Member

Do we know if

"URL parser records zero validation errors" implies "input string is a valid URL string"?

How about the other direction?

It would be great to clarify the purpose of these validation errors.

Member

We do not, and I strongly suspect they are not equivalent. I think there are some open issues on it.

My preferred strategy has been to instrument whatwg-url with both modes of validation and fuzz to find examples where they mismatch. I haven't made the time to do so yet though.
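
A rough sketch of what such a differential check could look like follows. Everything in it is hypothetical: `findMismatches`, `Mismatch`, and the two predicates are names invented here, not whatwg-url APIs, and the stub predicates are crude placeholders that make no claim about what the real parser or the real validity rules do. In an actual experiment they would be replaced by an instrumented parser and an implementation of the URL writing rules, fed with fuzzer-generated inputs.

```typescript
// Hypothetical harness: compares two notions of "valid" over a set of inputs.
type Predicate = (input: string) => boolean;

interface Mismatch {
  input: string;
  noValidationErrors: boolean; // parser accepted the input and recorded no validation errors
  validUrlString: boolean;     // input matches the valid URL string definition
}

function findMismatches(
  inputs: readonly string[],
  parsesWithoutValidationErrors: Predicate,
  isValidUrlString: Predicate,
): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const input of inputs) {
    const noValidationErrors = parsesWithoutValidationErrors(input);
    const validUrlString = isValidUrlString(input);
    // Report only the inputs on which the two modes of validation disagree.
    if (noValidationErrors !== validUrlString) {
      mismatches.push({ input, noValidationErrors, validUrlString });
    }
  }
  return mismatches;
}

// Placeholder predicates (assumptions for demonstration only, NOT spec behavior).
const stubParser: Predicate = (s) => !s.includes("\\");
const stubValidator: Predicate = (s) => !s.includes("\\") && !s.includes(" ");

const found = findMismatches(
  ["https://example.com/", "https://example.com\\a", "https://example.com/a b"],
  stubParser,
  stubValidator,
);
console.log(found.map((m) => m.input));
```

With the placeholder predicates, only the third input is reported, because it is the only one on which the two stubs disagree.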

@annevk annevk requested a review from domenic December 9, 2022 10:28
@annevk annevk added the topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing) label Dec 9, 2022
@annevk annevk merged commit 2885626 into main Dec 9, 2022
@annevk annevk deleted the annevk/valid branch December 9, 2022 10:38
Development

Successfully merging this pull request may close these issues.

Parsing square brackets ([]) in path, query, and fragment
3 participants