Unescaped '>' should probably not be allowed in URLs #291

Closed
bzbarsky opened this issue Apr 5, 2017 · 2 comments

bzbarsky commented Apr 5, 2017

The standard way, going back to at least the mid-90s, to mark up URLs in text is <url>. This, of course, relies on unescaped > not being allowed in URLs. This is clearly stated, with exactly this rationale, in RFC 1738 section 2.2. The URL standard should have similar provisions.

I don't know what that should mean for URL parsing, but in terms of serialization '>' should always be escaped in URLs, imo.
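
To make the dependency concrete, here is a minimal sketch of the kind of consumer that relies on the <url> convention. The helper name `extractBracketedUrls` and the sample sentences are assumptions made up for illustration; nothing here comes from the URL standard itself.

```javascript
// Hypothetical helper: pull <url>-style references out of plain text.
// It treats the first '>' after '<' as the closing delimiter, which is
// exactly the assumption the <url> convention makes.
function extractBracketedUrls(text) {
  const found = [];
  const pattern = /<([^<>\s]+)>/g;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// A raw '>' inside the URL ends the match early and truncates the URL.
const rawText = "See <http://test/foo>bar> for details.";
// With '>' serialized as %3E the delimiters stay unambiguous.
const escapedText = "See <http://test/foo%3Ebar> for details.";

console.log(extractBracketedUrls(rawText));     // [ 'http://test/foo' ]
console.log(extractBracketedUrls(escapedText)); // [ 'http://test/foo%3Ebar' ]
```

If serialization always escapes '>', a consumer like this never sees the truncated form.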

I just tested browser behavior, and:

  • Firefox consistently escapes '>' in path, userinfo, query, fragment. '>' in host or port causes parsing failure.
  • Safari escapes '>' in path, userinfo, query. It allows '>' unchanged in host and fragment. '>' in port causes parsing failure.
  • Chrome escapes '>' in path, userinfo, query, host. It allows '>' unchanged in fragment. '>' in port causes parsing failure.
  • Edge escapes '>' in path and host. It allows '>' unchanged in fragment and query. '>' in port causes parsing failure. Presence of userinfo causes parsing failure no matter what.

Testcase used:

<pre><script>
  var strs = [
    "http://test>test/foo\\bar",
    "http://a>b@test/foo\\bar",
    "http://test/foo\\bar/#a>b",
    "http://test/foo\\bar/?a=c>d",
    "http://test:2>3/foo\\bar",
    "http://test/foo>bar\\baz",
  ];
  for (var str of strs) {
    var a = document.createElement("a");
    a.setAttribute("href", str);
    var href;
    try {
      href = a.href;
    } catch(e) {
      href = "href getter threw";
    }
    var url;
    try {
      url = (new URL(str).href);
    } catch(e) {
      url = "constructor threw";
    }
    document.writeln(str, " -- ", href, " -- ", url);
  }
</script>

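The \\ bits are in there as a way to tell whether parsing failed in the href case: a successful parse of an http URL turns the backslash into a forward slash, while on failure the href getter returns the attribute value unchanged.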
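
For running the same inputs outside a page, here is a rough DOM-free variant that only exercises the URL constructor. The helper name `serializeOrReport` is an assumption for illustration. Results for host, userinfo, query and fragment depend on the engine, which is the point of the comparison above, so only the uncontroversial cases are annotated.

```javascript
// Same inputs as the testcase above; "\\" in a JS string literal is one backslash.
const testInputs = [
  "http://test>test/foo\\bar",
  "http://a>b@test/foo\\bar",
  "http://test/foo\\bar/#a>b",
  "http://test/foo\\bar/?a=c>d",
  "http://test:2>3/foo\\bar",
  "http://test/foo>bar\\baz",
];

// Hypothetical helper: return the serialized URL, or a marker on failure.
function serializeOrReport(input) {
  try {
    return new URL(input).href;
  } catch (e) {
    return "constructor threw";
  }
}

for (const input of testInputs) {
  console.log(input, " -- ", serializeOrReport(input));
}

// A backslash in an http URL is normalized to '/', so it vanishes on success.
console.log(serializeOrReport("http://test/foo\\bar")); // http://test/foo/bar
```

There is no equivalent of the href getter here, so a failed parse shows up as an exception rather than as a surviving backslash.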

bzbarsky commented Apr 5, 2017

Note also that there are various other standards (e.g. the one for the Link HTTP header) that rely on being able to put <> around a URL to delimit it.
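
As a sketch of what that reliance looks like for the Link header, the snippet below formats and re-reads a single link value of the form `<target>; rel="..."`. Both helpers (`formatLinkValue`, `parseLinkTarget`) are hypothetical and deliberately simplistic; a real Link header parser handles multiple values and parameters.

```javascript
// Hypothetical helper: format one Link header value as <target>; rel="...".
// It refuses targets containing a raw '<' or '>' because those would be
// indistinguishable from the delimiters.
function formatLinkValue(target, rel) {
  if (/[<>]/.test(target)) {
    throw new Error("target contains an unescaped angle bracket: " + target);
  }
  return "<" + target + '>; rel="' + rel + '"';
}

// Hypothetical helper: read the target back out of a single link value.
function parseLinkTarget(value) {
  const match = /^\s*<([^>]*)>/.exec(value);
  return match ? match[1] : null;
}

// The URL serializer escapes '>' in the path, so the round trip is lossless.
const serializedTarget = new URL("http://test/foo>bar").href;
const linkValue = formatLinkValue(serializedTarget, "next");
console.log(linkValue);                                       // <http://test/foo%3Ebar>; rel="next"
console.log(parseLinkTarget(linkValue) === serializedTarget); // true

// Without escaping, the reader stops at the first '>'.
console.log(parseLinkTarget('<http://test/foo>bar>; rel="next"')); // http://test/foo
```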

annevk pushed a commit that referenced this issue Dec 5, 2017
Currently, we percent-encode characters in "fragment state" using the C0
control percent-encode set. Firefox encodes more than that, and it seems
reasonable to align around that behavior for reasons spelled out in #291
and the comments of #344.

This patch adds a new "fragment percent-encode set" which contains the
C0 control percent-encode set, along with:

* 0x20 (SP)
* 0x22 (")
* 0x3C (<)
* 0x3E (>)
* 0x60 (`)

Tests: web-platform-tests/wpt#7776.

Closes #344.
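
The set described in the commit message can be sketched as a byte-level check. This is only an illustration of set membership, not the parser's fragment state: the names `FRAGMENT_EXTRAS`, `inFragmentPercentEncodeSet` and `percentEncodeFragment` are assumptions, and the code relies on the C0 control percent-encode set covering C0 controls plus every code point above U+007E.

```javascript
// Code points the commit message adds on top of the
// C0 control percent-encode set: SP, ", <, >, `.
const FRAGMENT_EXTRAS = new Set([0x20, 0x22, 0x3c, 0x3e, 0x60]);

// The C0 control percent-encode set covers C0 controls (0x00-0x1F) and
// everything above 0x7E; in UTF-8 that means every byte outside 0x20-0x7E.
function inFragmentPercentEncodeSet(byte) {
  return byte <= 0x1f || byte > 0x7e || FRAGMENT_EXTRAS.has(byte);
}

// Hypothetical helper: percent-encode a fragment string byte by byte.
function percentEncodeFragment(fragment) {
  let out = "";
  for (const byte of new TextEncoder().encode(fragment)) {
    if (inFragmentPercentEncodeSet(byte)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    } else {
      out += String.fromCharCode(byte);
    }
  }
  return out;
}

console.log(percentEncodeFragment("a>b"));      // a%3Eb
console.log(percentEncodeFragment('x <"`> y')); // x%20%3C%22%60%3E%20y
```

Note that '%' itself is not in the set, so sequences that are already percent-encoded pass through unchanged.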

annevk commented May 6, 2020

Apart from host it seems this is in order: https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly9leGFtcGxlLmNvbS88Pj88PiM8Pg==&base=YWJvdXQ6Ymxhbms=. Probably due to #347.
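
For readers who don't want to open the viewer: the `url` and `base` values in that link are base64-encoded. A small sketch that decodes them and feeds the result to the URL constructor follows; the variable names are assumptions, and the annotated href is what an implementation following the current standard should produce.

```javascript
// The "url" and "base" values from the fragment of the viewer link above.
const encodedInput = "aHR0cHM6Ly9leGFtcGxlLmNvbS88Pj88PiM8Pg==";
const encodedBase = "YWJvdXQ6Ymxhbms=";

// atob decodes base64 to a binary string; that is fine here because
// both values are plain ASCII.
const viewerInput = atob(encodedInput);
const viewerBase = atob(encodedBase);
console.log(viewerInput); // https://example.com/<>?<>#<>
console.log(viewerBase);  // about:blank

// The input is absolute, so the base does not affect the result.
const viewerParsed = new URL(viewerInput, viewerBase);
console.log(viewerParsed.href); // https://example.com/%3C%3E?%3C%3E#%3C%3E
```

This covers path, query and fragment only; the host case is the part tracked separately.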

Host is tracked by #458.

annevk closed this as completed May 6, 2020