Preserve "localhost" in file URLs? #618

Open
karwa opened this issue Jul 6, 2021 · 5 comments
Labels
topic: file (Aren't file: URLs the best?), topic: parser

Comments

@karwa
Contributor

karwa commented Jul 6, 2021

Currently, this standard does not allow file URLs to have the hostname "localhost", instead replacing it with an empty hostname.

Unfortunately, while both an empty hostname and "localhost" refer to the local machine, they may imply different access patterns on Windows.

  • For DOS-style paths (i.e. with drive letters), Chrome and Edge can resolve both file:///C:/Windows and file://localhost/C:/Windows.
  • For UNC paths (i.e. with share names), Chrome and Edge can resolve file://localhost/SomeShare but fail to resolve file:///SomeShare.
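The normalization described above can be observed directly. This is a minimal sketch that assumes a runtime whose global `URL` follows the current WHATWG URL Standard (Node.js, for example); the variable names are only for illustration. Under that assumption, both spellings parse to the same URL, so the "localhost" hint is gone by the time anything tries to resolve the path.

```javascript
// Assumes a global URL class that implements the WHATWG URL Standard.
const withLocalhost = new URL("file://localhost/SomeShare");
const withEmptyHost = new URL("file:///SomeShare");

// The parser replaces "localhost" with an empty host for file URLs.
console.log(withLocalhost.host === "");

// After parsing, the two inputs are indistinguishable.
console.log(withLocalhost.href === withEmptyHost.href);
```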

There are two ways to resolve this. Either we change the standard to preserve "localhost", along with the implication that the URL may refer to a UNC path, or browsers will need to parse the first path component, detect whether it is a drive letter, and treat the URL as a reference to a local UNC share if it isn't (so that file:///SomeShare can actually be resolved). Both are kind of reasonable, I suppose.
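To make the second option concrete, here is a rough sketch of what that detection could look like. `localFileUrlPathToWindowsPath` and `DRIVE_LETTER_SEGMENT` are hypothetical names, not anything a browser or the Standard defines, and the sketch skips percent-decoding and trailing slashes.

```javascript
// Hypothetical helper: maps the pathname of a host-less file URL to a Windows path.
// A first segment such as "C:" or "C|" is treated as a drive letter (an assumption
// modelled on the Standard's "Windows drive letter" definition).
const DRIVE_LETTER_SEGMENT = /^[A-Za-z][:|]$/;

function localFileUrlPathToWindowsPath(pathname) {
  const segments = pathname.split("/").filter((segment) => segment.length > 0);
  if (segments.length === 0) return null; // nothing to resolve

  const [first, ...rest] = segments;
  if (DRIVE_LETTER_SEGMENT.test(first)) {
    // DOS-style path: "/C:/Windows" becomes "C:\Windows".
    return first[0] + ":\\" + rest.join("\\");
  }
  // No drive letter: treat the first segment as a share on the local machine.
  return "\\\\localhost\\" + segments.join("\\");
}

console.log(localFileUrlPathToWindowsPath("/C:/Windows"));
console.log(localFileUrlPathToWindowsPath("/SomeShare"));
```

The cost of this approach is that the meaning of a URL depends on the shape of its first path segment rather than on its host.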

FWIW, the Windows shell APIs also get this wrong. If you call GetFullPathName (to normalize the path) followed by UrlCreateFromPath, it will also remove "localhost" and return a broken file URL.

@annevk
Member

annevk commented Jul 20, 2021

See #302 and #544 for recent changes in this area.

cc @alwinb

@alwinb
Contributor

alwinb commented Oct 10, 2021

I don't know.

I don't know the reason for localhost being normalised away. I don't see how not doing that would cause issues.

As for the other approach, this is problematic, right? The syntax cannot distinguish UNC paths from ordinary file paths, and wouldn't it sit badly with normalisation of dotted segments? I'd have to read up on this properly.

What's your preference, @karwa? It seems to me you're the best person to make a call on this.

@karwa
Contributor Author

karwa commented Oct 25, 2021

@alwinb

As for why this is the way it is, my guess is that, since it only affects Windows users, and both Chrome and Edge support file://localhost/SomeShare, it just hasn't been much of an issue. It may be more of an abstract concern that this standard can't express file URLs to local UNC paths.

Since I'm not a Windows user, I can't judge the impact of not fixing this. My naive assumption is that UNC is mostly used by businesses, who will likely also use the default browser (Edge), so it may become an issue when Chromium aligns with this part of the standard.

I don't know if the browsers have a way to gather telemetry on how often this comes up, but I think that would be a good idea for Chrome/Edge since this would cause a loss in functionality. @TimothyGu?

@TimothyGu
Member

TimothyGu commented Oct 25, 2021

Check out the recent https://crrev.com/c/3088038 in Chromium, in particular this part:

Second, it changes the conditions required for removing host from

      Windows-only,    any host,    path has drive letter
                              to
         any OS,    localhost-only, path has drive letter

This is a step towards compliance with the WHATWG URL Standard, and
partially fixes bug 688961. For reference, the spec and Safari do host
removal with:
         any OS,    localhost-only,       any path
while Firefox uses:
         any OS,       any host,          any path

While my intention was to eventually do a follow-up that aligns with the current Standard, in the context of this issue I think Chromium's behavior after the CL is reasonable too. (I.e., strip localhost only if the path has a drive letter.) Notice that this takes care of the OP, and can resolve all three URLs that should be resolvable:

  • file:///C:/Windows
  • file://localhost/C:/Windows (localhost stripped during parsing)
  • file://localhost/SomeShare (localhost not stripped)

With regard to not stripping localhost at all, I think it's probably technically doable, but there isn't much to be gained over Chromium's solution at that point.
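For comparison, the four host-removal conditions from the CL description can be written as predicates. This is only a model of the table quoted above: the policy names, the `os` strings, and the `hasDriveLetter` check are assumptions made for the sketch, not code from any of the engines.

```javascript
// Hypothetical drive-letter check on a URL pathname such as "/C:/Windows".
const hasDriveLetter = (pathname) => /^\/[A-Za-z][:|](?:\/|$)/.test(pathname);

// Each policy answers: should the host be removed from this file URL?
const policies = {
  chromiumBeforeCL: ({ os, pathname }) => os === "windows" && hasDriveLetter(pathname),
  chromiumAfterCL: ({ host, pathname }) => host === "localhost" && hasDriveLetter(pathname),
  specAndSafari: ({ host }) => host === "localhost",
  firefox: () => true,
};

function hostAfterParsing(policyName, input) {
  return policies[policyName](input) ? "" : input.host;
}

const inputs = [
  { os: "windows", host: "", pathname: "/C:/Windows" },
  { os: "windows", host: "localhost", pathname: "/C:/Windows" },
  { os: "windows", host: "localhost", pathname: "/SomeShare" },
];

for (const input of inputs) {
  for (const name of Object.keys(policies)) {
    console.log(name, JSON.stringify(input), "->", JSON.stringify(hostAfterParsing(name, input)));
  }
}
```

In this model, only the conditions that also look at the path keep "localhost" on file://localhost/SomeShare.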

@TimothyGu
Member

I realize that Chromium's solution is going to be a bit tricky to implement in the Standard, given the way the parser is set up currently. We would have to make two additions:

  • in the path state: to check whether the host is localhost and strip it if needed
  • counterintuitively, also in the host state: to check whether the path starts with a drive letter

The path state addition takes care of the parsing-from-scratch case as well as u.pathname = 'C:/abc'. The host state addition is needed for u.host = 'localhost' on a URL that already has a Windows drive letter.
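A toy model shows why both hooks are needed. `FileUrlModel` is a hypothetical class, not the Standard's state machine; the point is only that the same check has to run whether the host or the path was the last thing to change.

```javascript
// Toy model of a file URL record with two setters. Not the URL Standard's parser.
class FileUrlModel {
  #host;
  #pathname;

  constructor(host, pathname) {
    this.#host = host;
    this.#pathname = pathname;
    this.applyHostRemoval(); // parsing-from-scratch case
  }

  get host() { return this.#host; }
  get pathname() { return this.#pathname; }

  set host(value) {
    this.#host = value;
    this.applyHostRemoval(); // the "host state" addition
  }

  set pathname(value) {
    this.#pathname = value.startsWith("/") ? value : "/" + value;
    this.applyHostRemoval(); // the "path state" addition
  }

  // Strip "localhost" only when the path starts with a drive letter.
  applyHostRemoval() {
    const startsWithDriveLetter = /^\/[A-Za-z][:|](?:\/|$)/.test(this.#pathname);
    if (this.#host === "localhost" && startsWithDriveLetter) this.#host = "";
  }
}

const u = new FileUrlModel("localhost", "/SomeShare");
console.log(JSON.stringify(u.host)); // localhost is kept for the share path
u.pathname = "C:/abc";
console.log(JSON.stringify(u.host)); // the path setter triggers the removal

const v = new FileUrlModel("", "/C:/abc");
v.host = "localhost";
console.log(JSON.stringify(v.host)); // the host setter triggers the removal
```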


Related to this, I've really started to prefer separate parsing and canonicalizing stages, rather than putting everything in a single parser algorithm. That architecture not only simplifies implementing this change, but also makes proving roundtrippability easier (it's easier to show that canonicalization is idempotent).
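As a rough illustration of the two-stage idea, here is a sketch with a deliberately naive parser and a separate canonicalization step. All three function names are hypothetical, the parser only understands `file://host/path` input, and the only canonicalization rules modelled are host lowercasing and the drive-letter-conditional localhost removal discussed in this thread.

```javascript
// Stage 1: parse without normalizing anything.
function parseFileUrl(input) {
  const match = /^file:\/\/([^\/]*)(\/.*)?$/.exec(input);
  if (match === null) throw new TypeError("not a file URL with an authority: " + input);
  return { host: match[1], pathname: match[2] || "/" };
}

// Stage 2: a pure function from record to record.
function canonicalize(record) {
  const host = record.host.toLowerCase();
  const startsWithDriveLetter = /^\/[A-Za-z][:|](?:\/|$)/.test(record.pathname);
  const strip = host === "localhost" && startsWithDriveLetter;
  return { host: strip ? "" : host, pathname: record.pathname };
}

function serialize(record) {
  return "file://" + record.host + record.pathname;
}

// Idempotence can be checked on the canonicalization stage in isolation.
for (const input of ["file://LOCALHOST/C:/Windows", "file://localhost/SomeShare", "file:///C:/Windows"]) {
  const once = canonicalize(parseFileUrl(input));
  const twice = canonicalize(once);
  console.log(input, "->", serialize(once), serialize(once) === serialize(twice));
}
```

Because `canonicalize` takes and returns the same record shape, setters can reuse it after any field changes, which would avoid duplicating the check across parser states.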


4 participants