"D:\foo" should be parsed as "file:///D:/foo" #271

Open
domenic opened this issue Mar 10, 2017 · 9 comments
Labels
topic: file Aren't file: URLs the best? topic: parser

Comments

@domenic
Member

domenic commented Mar 10, 2017

https://quuz.org/url/liveview.html#D:/foo

At least Edge and Chrome on Windows parse this as a file URL, which I think is much friendlier. Firefox does not, but it has some special logic so that when you enter D:\foo in the URL bar, it translates it to file:///D:/foo.

They also parse https://quuz.org/url/liveview.html#D:b/foo as a file URL, so it's not about the path name starting with /... maybe they treat all single-character schemes this way?

Discovered in nodejs/node-eps#51 (comment) by @jkrems
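
To make the ambiguity concrete, here is a minimal sketch in Python. It uses `urllib.parse.urlsplit` only as a stand-in for a generic, non-browser URL parser (it is not the WHATWG parser), and `drive_path_to_file_url` is a hypothetical helper showing the result this issue asks for, not anything the spec defines.

```python
from urllib.parse import quote, urlsplit


def generic_scheme(text: str) -> str:
    """Scheme that Python's generic splitter assigns to the input."""
    return urlsplit(text).scheme


def drive_path_to_file_url(path: str) -> str:
    """Hypothetical helper: map an absolute drive path to a file: URL."""
    forward = path.replace("\\", "/")  # normalise separators
    return "file:///" + quote(forward, safe="/:")  # keep "/" and the drive colon


# A generic splitter reads the drive letter as a one-letter scheme.
print(generic_scheme("D:/foo"))           # d
print(generic_scheme("D:b/foo"))          # d
# The outcome requested in this issue.
print(drive_path_to_file_url("D:\\foo"))  # file:///D:/foo
```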

@annevk
Member

annevk commented Mar 15, 2017

For the record, the address bar is out-of-scope.

I guess allowing this basically means giving up on single-code-point schemes, indeed. Not sure what the right trade-off is there.

On the upside, no such schemes are registered at http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml, but nothing currently prohibits that either.

@zcorpan
Member

zcorpan commented Mar 15, 2017

httparchive

SELECT * FROM (
  SELECT page, url, REGEXP_EXTRACT(LOWER(body), r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:/[^>]+>)') AS match
  FROM [httparchive:har.2017_01_15_chrome_requests_bodies]
  WHERE page = url
) WHERE match != "null"
Row	page	url	match	 
1	http://www.xm-n-tax.gov.cn/	http://www.xm-n-tax.gov.cn/	<img src="d:/piaochuang/piaochuang.jpg" width="150px" height="90px;" onclick="javascript:window.open('/content/n4676.html');"/>	 

Page has changed.

2	http://www.newsforshoppers.com/	http://www.newsforshoppers.com/	<link href="s://plus.google.com/102103991664781080361" rel="publisher" />	 

rel="publisher" has no effect for browsers

3	http://www.aaai.org/	http://www.aaai.org/	<script src=s://seal.verisign.com/getseal?host_name=www.aaai.org&size=s&use_flash=no&use_transparent=no&lang=en>	 
4	http://www.mathematichka.ru/	http://www.mathematichka.ru/	<base href="d:/mathematichka/web/">

These are commented out.

Possibly there is content such as documentation on CDs that rely on this? Maybe a use counter could help?

@annevk
Member

annevk commented Mar 15, 2017

Well, ideally the URL parser should be generally applicable, including beyond browsers. Part of the reason we're doing this is so that non-browsers can still browse the web.

@zcorpan
Member

zcorpan commented Mar 16, 2017

Sure, I was just trying to find out if there were strong compat reasons for browsers to behave one way or the other for such URLs. I don't think there are, for publicly accessible web content at least.

@annevk
Member

annevk commented Mar 22, 2017

Actually, we could maybe support this by branching on the backslash, which is normally non-conforming and doesn't occur in the examples above.
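
A rough sketch of what branching on the backslash could look like as a pre-parse step, in Python. The function name and the regular expression are assumptions made for illustration. Nothing here is spec text, and a real change would live inside the parser's state machine rather than in front of it.

```python
import re
from urllib.parse import quote

# Assumed trigger: one ASCII letter, a colon, then a backslash.
_DRIVE_BACKSLASH = re.compile(r"^[A-Za-z]:\\")


def rewrite_if_backslash_drive(text: str) -> str:
    """Rewrite D:\\... inputs to file: URLs and leave everything else alone."""
    if not _DRIVE_BACKSLASH.match(text):
        return text  # e.g. "d:/foo" keeps its one-letter scheme
    forward = text.replace("\\", "/")
    return "file:///" + quote(forward, safe="/:")


print(rewrite_if_backslash_drive("D:\\foo"))  # file:///D:/foo
print(rewrite_if_backslash_drive("d:/piaochuang/piaochuang.jpg"))  # unchanged
```

With this rule the forward-slash examples from the httparchive results above are left untouched.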

@zcorpan
Member

zcorpan commented Mar 22, 2017

Oops, the query only looked for forward slash. New query. Also removed the WHERE page = url which was limiting to top-level resources.

SELECT * FROM (
  SELECT page, url, REGEXP_EXTRACT(LOWER(body), r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:[/\\][^>]+>)') AS match
  FROM [httparchive:har.2017_01_15_chrome_requests_bodies]
) WHERE match != "null"

22 rows. https://gist.github.com/zcorpan/98a61be4877858d3de18c19d8939a3be
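
For anyone who wants to try the two patterns locally, here is a small Python sketch that applies them to sample markup. The patterns are copied from the two queries. The `extract` helper and the backslash and `https` samples are made up for illustration, and Python's `re` stands in for BigQuery's regex engine.

```python
import re

# Pattern from the first query: drive letter followed by a forward slash only.
FORWARD_ONLY = re.compile(
    r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:/[^>]+>)'
)
# Pattern from the second query: forward slash or backslash.
EITHER_SLASH = re.compile(
    r'(<[a-z][^>]+\s(?:src|href)\s*=\s*["\']?[a-z]:[/\\][^>]+>)'
)


def extract(pattern, body):
    """Mimic REGEXP_EXTRACT(LOWER(body), ...): first capture, or None."""
    found = pattern.search(body.lower())
    return found.group(1) if found else None


forward = '<img src="d:/piaochuang/piaochuang.jpg" width="150px">'
backslash = '<IMG SRC="D:\\photos\\a.jpg">'        # hypothetical sample
ordinary = '<img src="https://example.com/a.png">'  # hypothetical sample

print(extract(FORWARD_ONLY, backslash))  # None: the first query misses it
print(extract(EITHER_SLASH, backslash))  # <img src="d:\photos\a.jpg">
print(extract(EITHER_SLASH, ordinary))   # None
```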

@annevk
Member

annevk commented Mar 22, 2017

These look mostly like errors (and things that won't work anyway, since we don't want http -> file to do anything but produce a network error), but I think all of those with a backslash expect the behavior the OP asks for.

@annevk
Member

annevk commented May 10, 2020

I confirmed that this is a quirk of IE6+ and Chrome (on Windows only). They do it for both d:/foo and d:\foo. In fact, they do it for any a-z scheme. IE6 also does it for a 0-9 or -/+ scheme; I'll consider those to be bugs. (Firefox's address bar quirk is only with a backslash, not a forward slash.)

Thoughts on only adopting this when a backslash is used? Or should we add a platform-specific quirk here similar to https://w3c.github.io/FileAPI/#convert-line-endings-to-native and make single-letter-scheme URLs impossible forever on that platform?

cc @sleevi @valenting @achristensen07 @jasnell

@domenic
Member Author

domenic commented May 10, 2020

I'm -1 on platform-specific behavior (seems especially bad in contexts like HTTP servers and proxies).

I'm neutral on treating backslash specially vs. just treating all single-letter schemes as drive letters.

I'm +1 on addressing this in general. It would be great if full Windows file paths could be parsed as URLs as simply as passing them to the URL constructor.
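
To see where the two options diverge, here is a short Python sketch that classifies a few inputs under each rule. Both regular expressions are assumptions about how the rules might be phrased, and the sample inputs are taken from earlier in this thread, plus one ordinary URL.

```python
import re

# Option 1 (assumed phrasing): only a drive letter followed by a backslash.
BACKSLASH_ONLY = re.compile(r"^[A-Za-z]:\\")
# Option 2 (assumed phrasing): any single-letter scheme is a drive letter.
ANY_SINGLE_LETTER = re.compile(r"^[A-Za-z]:")


def classify(text: str) -> dict:
    """Report whether each rule would treat the input as a drive path."""
    return {
        "backslash_only": bool(BACKSLASH_ONLY.match(text)),
        "any_single_letter": bool(ANY_SINGLE_LETTER.match(text)),
    }


for sample in ["d:\\foo", "d:/foo", "D:b/foo",
               "s://plus.google.com/102103991664781080361",
               "https://example.com/"]:
    print(sample, classify(sample))
```

The rules agree on `d:\foo` and on ordinary URLs. They differ on `d:/foo`, `D:b/foo`, and the `s://` typos, which only the single-letter rule would turn into file URLs.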


3 participants