Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to file URL path normalization #544

Merged
merged 3 commits into from
Sep 30, 2020
Merged

Changes to file URL path normalization #544

merged 3 commits into from
Sep 30, 2020

Conversation

alwinb
Copy link
Contributor

@alwinb alwinb commented Sep 25, 2020

Changes to file URL path normalization as proposed in #405.

Summary:

Applicable only to file URLs:

  • Leading empty path segements are no longer removed.
  • If absent, the host is copied from the base URL,
  • Otherwise the host is preserved with one exception:
  • If the host is localhost then it is set to the empty string.

Intends to close #405 and #302.


Preview | Diff

url.bs Outdated Show resolved Hide resolved
url.bs Outdated Show resolved Hide resolved
@alwinb
Copy link
Contributor Author

alwinb commented Sep 28, 2020

The build fails because bikeshed is down at the moment.

@annevk annevk requested a review from domenic September 29, 2020 09:54
@annevk annevk merged commit 47efa00 into whatwg:master Sep 30, 2020
annevk pushed a commit to web-platform-tests/wpt that referenced this pull request Sep 30, 2020
@annevk
Copy link
Member

annevk commented Sep 30, 2020

Thanks a lot @alwinb for your work on this!

One thing I failed to check in on is bugs. I'll file one for Node.js and rust-url. I don't think Chrome/Firefox is needed as they have bigger issues. Should there be one for Safari?

@alwinb
Copy link
Contributor Author

alwinb commented Sep 30, 2020

Yes, there are corner cases where Safari is different. My understanding is this happens only when a host-less url is parsed against a base url. Quote:

As an aside, Safari fails the mentioned tests because it (sometimes? or always?) does not copy the host from the base URL when parsing host-less URLs. I suspect that this is not intentional;

From #405 (comment). Note that this is not drive letter specific behaviour.

And, thanks, it's the first time I did something like this so it's nice to see it work out!
Should I file a bug for Safari? I've not done that before either.

domenic pushed a commit to jsdom/whatwg-url that referenced this pull request Sep 30, 2020
@annevk
Copy link
Member

annevk commented Oct 1, 2020

Thanks, I filed https://bugs.webkit.org/show_bug.cgi?id=217170. (You're free to do this yourself next time if you want, https://github.com/whatwg/meta/blob/master/MAINTAINERS.md has some instructions.)

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Oct 3, 2020
…ation proposal, a=testonly

Automatic update from web-platform-tests
Changes URL tests for file path normalization proposal

For whatwg/url#544.
--

wpt-commits: 050308a616a8388f1ad5d6e87eac0270fd35023f
wpt-pr: 25716
watilde added a commit to watilde/node that referenced this pull request Oct 5, 2020
@guybedford
Copy link

guybedford commented Oct 5, 2020

There is a concern here for Node.js modules that import '//path' and import '///path' will effectively now reference separate module URL specifiers despite referring to the same file path which might lead to unexpected identity issues. In turn this might affect scope lookups for import maps as well.

There are also existing examples of new URL('file:///' + path.resolve(path)) to get cross-platform URL resolution of paths which will now similarly return a non unique URL.

Are these properties hard constraints on the change here? The alternative in the Node.js modules system might be to throw for these file:///////path return cases in the resolver and hope that isn't too much of a breaking change. We are discussing it at the meeting on Wednesday so I can report back further here, but any thoughts on the above would help to hear before then too.

@domenic
Copy link
Member

domenic commented Oct 5, 2020

In general the mapping from URLs to files on the filesystem will not be 1:1. E.g. on Windows /path and /PATH are the same file. This just adds one more case of this type.

This generally comes down to the question as to how module maps are to be keyed: by "request URL" (~ module specifier) or "response URL" (~file inode).

@achristensen07
Copy link
Collaborator

It is true that how a filesystem handles a file URL is outside of the scope of the URL specification. We support both case-sensitive and non-case-sensitive filesystems, which causes strange problems like domenic said.
While I expressed support in changing the spec to match Safari and Firefox, I don't have a strong preference here.

@alwinb
Copy link
Contributor Author

alwinb commented Oct 5, 2020

For NodeJS, another thing to maybe take into account is code portability. I think that e.g. import '//path' was non-portable in the old situation, where in a http context it would resolve to http://path/.

@guybedford
Copy link

Various JS platforms and tools rely on file URLs now as their module registry identifier to align with browsers and support HTTPs loading alongside files. Node.js and Deno are two. More build tools may follow suit here as well.

Yes this is very much about edge cases. A better example might be something like:

export async function loadFileSystemPath (path) {
  return await import('file:///' + path);
}

or any other registry full URL construction.

Case sensitivity is a problem Node.js had before the shift to URLs so is probably best treated out of scope certainly.

Might it be possible to more easily support these base separator renormalization cases here?

It not, Node.js will likely implement some extra renormalization / validations before landing these changes. But it's a class of bugs for server JS platforms going forward so if there's ways to make it easier that could be useful. Otherwise these separator normalization issues could cause these edge case bugs in environments that are not aware of the pitfalls.

@guybedford
Copy link

Also note these cases may not even be that rare. Here is the list of tests that needed to be changed in Node.js core for this PR to land - nodejs/node#35429 (comment).

@domenic
Copy link
Member

domenic commented Oct 5, 2020

I think handling this in a normalization step is the right thing to do. The URL parser can produce many URLs that map to the same file; it's the function of systems built on top of the URL parser (such as HTTP servers, module registries, or standalone helpers like function urlToFilePath) to handle that.

@alwinb
Copy link
Contributor Author

alwinb commented Oct 5, 2020

I am not sure if the example is to be taken literally, but just in case, new URL('file://' + path.resolve(filepath)) would solve some issues (assuming resolve returns absolute paths and collapses slashes). Better yet, you can use the pathname setter to be safe about # and ? characters in directory/file names, but you do still have to manually escape percent signs, and backslashes in names within posix style paths. Ideally there would be a path.toURL library method for such conversions.

@bmeck
Copy link

bmeck commented Oct 5, 2020

@alwinb url.pathToFileURL exists already. I'm not entirely sure the recommendation here as the test for policies at least is failing with the existing implementation of that. We could update that helper but is the expectation that all comparisons of these leading / prefixes be coalesced when comparing? If so, does that mean URL comparison should not be done using === of the string representation / components?

@guybedford
Copy link

@alwinb the specific issue is that the path being concatenated will be either C:\\some\\path or /some/path. Certainly pathToFileURL is recommended for encoding handling etc, but people still do the above.

@domenic
Copy link
Member

domenic commented Oct 5, 2020

URL comparison can be done using ===. file://foo and file://///foo are different URLs.

File comparison cannot be done using === comparison on URLs. file://foo and file:////foo point to the same file, despite being different URLs.

Whether you want to use URL comparison, or file comparison, for your module specifiers, is up to you.

@bmeck
Copy link

bmeck commented Oct 5, 2020

@domenic so, to clarify, if file://foo and file:///foo are expected to point to the same file does that mean that scoping mechanisms should treat them as 2 different URL spaces for things like import maps?

@domenic
Copy link
Member

domenic commented Oct 5, 2020

Import maps (and the web in general) do not have a concept of files, only URLs. I don't know how Node, or "scoping mechanisms", want to deal with files.

(So in particular, the "are expected to" is not a valid premise on the web.)

@guybedford
Copy link

@domenic in both Node.js and Deno, import.meta.url is a file:/// URL from the module registry. In addition resolution is URL resolution when importing.

@bmeck
Copy link

bmeck commented Oct 5, 2020

@domenic https://fetch.spec.whatwg.org/ is vague on the behavior of "file" URLs and I'm more just trying to ensure we don't have some problem with forwards compatibility compared to current behavior for what URLs are assigned when you use files (such as ). It seems like prohibiting fetch operations from succeeding for more than 2 / would be forwards compatible for now and possible to relax in the future? I guess a different question is, would the web ever have a place where it expects to generate a URL with more than 2 leading / for a file. More asking to avoid divergence as much as possible rather than having opinions.

@domenic
Copy link
Member

domenic commented Oct 5, 2020

Right, the translation from URLs to files on the filesystem is not specified (certainly not by Fetch), and not really part of the web. It's browser- and OS-specific. I'm not sure how to give compatibility guidance about that subject.

I guess a different question is, would the web ever have a place where it expects to generate a URL with more than 2 leading / for a file.

Yes, for example if someone did <a href="file://////foo">, or new URL("file://///foo"), those would all generate such URLs.

@masinter
Copy link

masinter commented Oct 5, 2020

don't forget there are normalization issues with Unicode non-ascii file names.
Case insensitive file name matching is only a small part of the problem.

@ExE-Boss
Copy link

ExE-Boss commented Oct 5, 2020

In general the mapping from URLs to files on the filesystem will not be 1:1. E.g. on Windows /path and /PATH are the same file. This just adds one more case of this type.

Not if you have the case sensitive flag enabled on the directory in which the file is located:

I have it enabled for a good chunk of my file system.

ziransun pushed a commit to ziransun/wpt that referenced this pull request Oct 6, 2020
@alwinb alwinb deleted the file-path-normalization-issue-405 branch October 8, 2020 08:54
watilde added a commit to nodejs/node that referenced this pull request Oct 12, 2020
Refs: whatwg/url#544
Refs: web-platform-tests/wpt#25716

PR-URL: #35477
Fixes: #35429
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
ambroff pushed a commit to ambroff/gecko that referenced this pull request Nov 4, 2020
…ation proposal, a=testonly

Automatic update from web-platform-tests
Changes URL tests for file path normalization proposal

For whatwg/url#544.
--

wpt-commits: 050308a616a8388f1ad5d6e87eac0270fd35023f
wpt-pr: 25716
joesepi pushed a commit to joesepi/node that referenced this pull request Jan 8, 2021
Refs: whatwg/url#544
Refs: web-platform-tests/wpt#25716

PR-URL: nodejs#35477
Fixes: nodejs#35429
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Bradley Farias <bradley.meck@gmail.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Review slashes after file:/// and file path normalization
9 participants