"/\\example.com" is treated as an unsafe protocol-relative url, not a path, by most browsers #59

Closed
hillbrad opened this issue Aug 10, 2015 · 14 comments

Comments

@hillbrad

Am I missing the place where this is described by the parsing algorithm? Seems it should be between:

https://url.spec.whatwg.org/#no-scheme-state

and

https://url.spec.whatwg.org/#path-or-authority-state

@domenic
Member

domenic commented Aug 10, 2015

Can you quantify "many" browsers and provide a repro (e.g. with <a> tags)?

Confirmed that the current spec (as implemented by jsdom/whatwg-url) will treat such input as an invalid URL and fail to parse it entirely.

@hillbrad
Author

[Screenshot: chromecap]

This is Chrome 44 on Yosemite. Same behavior for Safari and FF.

@hillbrad
Author

Hmm, so the above is the behavior when you type such a link directly into the URL bar, or when the resource is loaded from file:. But when you load from http, it treats "image.jpg" as a hostname. So "/\" is treated as the start of a protocol-relative URL, ignoring the initial /.
[Screenshot: cap2]
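
The "unsafe" part of this behavior shows up when a server checks that a redirect target is a local path by looking only for a leading "/". The following is a minimal sketch of a stricter check, assuming the browser behavior described above (a backslash is handled like a forward slash for http URLs). The name `is_local_path` is hypothetical and the check is illustrative rather than exhaustive.

```python
def is_local_path(target: str) -> bool:
    """Return True only if target looks like a same-origin absolute path."""
    # Browsers drop tabs and newlines anywhere in a URL before parsing it.
    cleaned = "".join(ch for ch in target if ch not in "\t\n\r")
    # For http(s) URLs a backslash is handled like a forward slash,
    # so normalise before inspecting the prefix.
    normalised = cleaned.replace("\\", "/")
    # Exactly one leading slash is a path; two or more start an authority.
    return normalised.startswith("/") and not normalised.startswith("//")


for candidate in ["/account", "//example.com", "/\\example.com"]:
    print(repr(candidate), is_local_path(candidate))
```

Only the first candidate is accepted. The other two would be resolved against the page's scheme with example.com as the host.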

@hillbrad hillbrad changed the title "/\\foobar.txt" is treated as a file: scheme by many browsers "/\\example.com" is treated as an unsafe protocol-relative url, not a path, by most browsers Aug 10, 2015
@annevk
Member

annevk commented Aug 11, 2015

Yeah, this is defined by the parser. What is the problem?

@domenic
Member

domenic commented Aug 11, 2015

The parser currently rejects this input whereas browsers do not, it seems.

@annevk
Member

annevk commented Aug 11, 2015

It does? Step 7 of https://url.spec.whatwg.org/#scheme-state handles this, no?

@domenic
Member

domenic commented Aug 11, 2015

Oh, you're right, I forgot to input a base URL when parsing it. The spec seems OK here (and matches Chrome, although not Firefox or IE11). A good case for the test suite.

https://jsbin.com/rilorigufo/edit?html,console,output in case anyone wants to test.

@hillbrad
Author

I don't see how you get to step 7 of scheme state for a URL with no ":" in it.

Here's my manual walkthrough of the algorithm:

base URL: "http://foobar.com"
URL: "/\example.jpg"
Non-relative flag is unset
State override is not given

Begin:
scheme start state, pointer at "/"
scheme start state choice 2: c is not ASCII alpha, so state = no scheme state, decrease pointer
increase pointer, no scheme state, pointer at "/"
no scheme state choice 3: base URL non-relative flag is unset, so state = relative state, decrease pointer
increase pointer, relative state, pointer at "/"
relative state: set scheme to base URL scheme of http
choice 2, state = relative slash state
increase pointer, relative slash state, pointer at "\"
relative slash state: choice 1, url is special and c is "\"
subchoice 1: c is "\", return parse error

@annevk
Member

annevk commented Aug 11, 2015

You're right, but at the end you're wrong. "Parse error" is just an indication of something being wrong, it's not something that's returned.
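
To make the distinction concrete, here is a toy sketch of the states in the walkthrough above, in which a parse error is appended to a list and parsing carries on. It is not the spec algorithm: `ToyURL`, `parse_scheme_less`, and `char_at` are hypothetical names, only inputs starting with "/" and special-scheme base URLs are modelled, and host validation is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SPECIAL_SCHEMES = {"http", "https", "ftp", "ws", "wss"}
SLASHES = ("/", "\\")


@dataclass
class ToyURL:
    scheme: str
    host: str
    path: str = "/"
    parse_errors: List[str] = field(default_factory=list)


def char_at(text: str, index: int) -> str:
    """Return the character at index, or "" past the end (stands in for EOF)."""
    return text[index] if index < len(text) else ""


def parse_scheme_less(raw: str, base: Optional[ToyURL]) -> Optional[ToyURL]:
    # no scheme state: without a base URL the parser returns failure.
    if base is None:
        return None
    if base.scheme not in SPECIAL_SCHEMES or char_at(raw, 0) != "/":
        raise NotImplementedError("outside the scope of this sketch")

    # relative state: inherit the scheme, see "/", move to relative slash state.
    url = ToyURL(scheme=base.scheme, host=base.host)
    pointer = 1

    # relative slash state.
    c = char_at(raw, pointer)
    if c in SLASHES:
        if c == "\\":
            # Report the parse error and continue; nothing is returned here.
            url.parse_errors.append('"\\" used in place of "/"')
        pointer += 1
        # special authority ignore slashes state: skip any further slashes.
        while char_at(raw, pointer) in SLASHES:
            url.parse_errors.append("extra slash before host")
            pointer += 1
        rest = raw[pointer:]
        cuts = [i for i in (rest.find("/"), rest.find("\\")) if i != -1]
        end = min(cuts, default=len(rest))
        if end == 0:
            return None  # empty host
        url.host = rest[:end]
        url.path = rest[end:].replace("\\", "/") or "/"
    else:
        # Ordinary absolute path: keep the base host.
        url.path = raw.replace("\\", "/")
    return url


base = ToyURL(scheme="http", host="foobar.com")
print(parse_scheme_less("/\\example.jpg", base))
print(parse_scheme_less("/\\example.jpg", None))
```

With the base URL, the first call yields a result whose host is example.jpg plus one recorded parse error. The second call, without a base, is the failure case that comes from forgetting to supply a base URL.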

@hillbrad
Author

So parse error "falls through" rather than terminating the algorithm.

I guess, "It's the Web, always return SOMETHING."

Thanks, I was confused about that for sure.

@sideshowbarker
Contributor

> So parse error "falls through" rather than terminating the algorithm. I guess, "It's the Web, always return SOMETHING." Thanks, I was confused about that for sure.

I think it’s unfortunate that Hixie chose the term parse error for this when he originally set the conventions for the parsing algorithm in the HTML spec. It makes any discussion of parsing behavior ambiguous between “parse errors” and “errors in parsing”, and specs shouldn’t coin terms that are inherently ambiguous. It would have been less misleading and confusing if it had been named parse warning or parsing conformance violation or something similar from the start.

@annevk
Member

annevk commented Aug 11, 2015

I would be okay with renaming this "report a parse error" or "report a violation" or some such. Perhaps best to do that as a new issue?

@sideshowbarker
Contributor

> I would be okay with renaming this "report a parse error" or "report a violation" or some such. Perhaps best to do that as a new issue?

Raised #60

@hillbrad
Author

I'd be fine with "report a parse error and continue." where the "and continue" is the more important bit of clarification for the casual reader.
