Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow all C0 control and DEL codepoints in hosts #673

Closed
wants to merge 1 commit into from
Closed

Disallow all C0 control and DEL codepoints in hosts #673

wants to merge 1 commit into from

Conversation

alwinb
Copy link
Contributor

@alwinb alwinb commented Nov 22, 2021

A first pass at fixing #627 and #319

  • At least two implementers are interested (and none opposed):
    • Safari
    • Chrome already implement this behaviour
  • Tests are written and can be reviewed and commented upon at:
  • Implementation bugs are filed:
    • Chrome: …
    • Firefox: …
    • Safari: …

This adds the C0 controls and delete to the forbidden host codepoints, as discussed in #627.
It also adapts the opaque-host-parser to percent-encode rather than fail on these code points.

For this I added an opaque-host percent encode set that refers to the forbidden host codepoints.
Please review.


Preview | Diff

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not adding percent-encoding of additional code points to opaque hosts at the same time. I.e., restrict this change to additional failure cases of domains.

What I was envisioning is that we have two constants:

  • Forbidden host code points (this can be renamed to forbidden domain code points). We would modify this as you did.
  • Forbidden opaque-host code points (this would be the old forbidden host code points).

@karwa
Copy link
Contributor

karwa commented Dec 24, 2021

One thing I'd like to add (before I'm too full of "festive cheer" and forget it):

Failing on certain forbidden host code-points can unlock optimisation opportunities. For example, currently when parsing the hostname portion of a URL string, we have to precisely track whether we're inside or outside of a square bracket (in order to determine whether a colon is part of an IPv6 address or is a hostname/port delimiter). This slows down all URL strings. I've been experimenting with an approach that simply splits on the first colon after a closing square bracket.

So for the hostname "][::][:80", the split is:

Method Host Port
Std "][::][" "80"
Mine "][" ":][:80"

Currently that's fine (I think?). Since the hostname doesn't begin with an opening bracket, it won't be parsed as IPv6, and ultimately the opaque host parser will reject them both because square brackets are forbidden. If we changed to percent-encode forbidden code-points, this input would potentially be accepted.

Perhaps this could be limited to the setter function?

@alwinb
Copy link
Contributor Author

alwinb commented Dec 27, 2021

Ok. So it makes sense to investigate this more thoroughly; how percent coding opaque hosts could/ should interact with the ipv6 address syntax, and make it a separate issue (as was suggested).

Maybe we can take care of that first? I’m fine with handling this one first, but I myself just can’t really get behind having two distinct forbidden host codepoints sets… it may just be a minor thing, I realise. But still…

Is failing on all controls in opaque hosts really not an option? Maybe there are there solutions?

@annevk
Copy link
Member

annevk commented Jan 25, 2022

I expect we'll take #685 at which point we should probably close this.

@alwinb
Copy link
Contributor Author

alwinb commented Jan 25, 2022

Of course. I’m happy @karwa picked this up.

@karwa
Copy link
Contributor

karwa commented Jan 25, 2022

Sorry, this was just blocking some interop stuff that I was working on, and since there seemed to be agreement on the idea of blocking C0s and delete, I figured I'd just get it done and take the opportunity to make the tests more comprehensive.

@alwinb
Copy link
Contributor Author

alwinb commented Jan 25, 2022

Oh no it’s excellent. I felt like I was blocking the progress here, and now this issue is taken care of and the opaque hosts can maybe be looked at later. I’ll close the PR.

@alwinb alwinb closed this Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants