Is an URL’s path a list of strings or a single string? #33

Closed
SimonSapin opened this issue Jul 1, 2015 · 15 comments

@SimonSapin
Contributor

A URL’s path is a list of zero or more ASCII string holding data, usually identifying a location in hierarchical form. It is initially the empty list.

Sounds good.

(Just to name the things in the list, this could be "[…] a list of zero or more <a>path components</a> holding […] A <dfn>path component</dfn> is an ASCII string.")

An absolute URL must be a scheme, followed by ":", followed by either a scheme-relative URL, or if URL is not special, a path, optionally followed by "?" and a query.

Here, it looks like a path is a single string that is concatenated with other strings. "a path" here probably should be something like "a path as components separated with /." Also, should there be an initial / before the first component?

A scheme-relative URL must be "//", followed by a host, optionally followed by ":" and a port, optionally followed by a path that starts with "/".

Same here. Are components separated by /? What does it mean for a list of strings to start with "/"? Is that the value of the first component?

A path must be zero or more URL units, excluding "?".

URL units being code points, this sounds like a path is a single string.

Set url’s object to a structured clone of the entry in the blob URL store corresponding to the first string in url’s path. [HTML]

… and a list of strings again. (Same in various places in the parser.)

@annevk
Member

annevk commented Jul 1, 2015

The "URL writing" section describes how you write a component. It wouldn't make sense to refer to a data structure there, since there isn't any yet. It's about the eventual input to the URL parser.

@SimonSapin
Contributor Author

If for the purpose of that section "a URL’s path" is a different concept than in the rest of the spec, it should not link to #concept-url-path.

@annevk
Member

annevk commented Jul 1, 2015

I guess that would require a whole set of fresh identifiers then... Since none of them are model components... They're all syntax components. Meh.

@SimonSapin
Contributor Author

Another option is to have the path be a single string everywhere. Path components are not actually used outside the parser as far as I know, and could still be obtained by splitting on /.
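
As a rough illustration of the "single string, split on demand" idea, here is a minimal Python sketch. The function name `split_path` is hypothetical, and the treatment of the leading "/" (it introduces the first component rather than being one) is an assumption, not something the spec text quoted above settles.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split a single-string path into its components on "/".

    Assumption: a leading "/" introduces the first component and is
    not itself a component; the empty string maps to the empty list.
    """
    if path == "":
        return []
    if path.startswith("/"):
        path = path[1:]
    # Empty components (from "//" or a trailing "/") are preserved.
    return path.split("/")


print(split_path("/a/b/c"))
print(split_path("/a//b/"))
```

Under these assumptions the blob URL lookup quoted earlier ("the first string in url's path") would become `split_path(path)[0]`.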

@annevk
Member

annevk commented Jul 1, 2015

That would not solve this problem. E.g. IPv4 address is a 32-bit integer, but that's not how you write it. If we make port a 16-bit integer, it likewise doesn't represent syntax. And even port being a string you could argue that the syntax thing is different since it can have leading 0s and such.
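
The gap between model and syntax described here can be sketched in Python. Both helper names (`serialize_ipv4`, `parse_port`) are hypothetical and the validation rules are simplified assumptions; the point is only that one stored integer corresponds to a dotted string, and that several written ports map to the same 16-bit value.

```python
def serialize_ipv4(address: int) -> str:
    """Write a 32-bit integer as dotted-decimal text."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit unsigned integer")
    # Take the four bytes from most to least significant.
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def parse_port(text: str) -> int:
    """Parse a written port into a 16-bit integer; leading zeros are lost."""
    if text == "" or any(c not in "0123456789" for c in text):
        raise ValueError("port must be one or more ASCII digits")
    port = int(text)
    if port > 0xFFFF:
        raise ValueError("port does not fit in 16 bits")
    return port


print(serialize_ipv4(0xC0A80001))
print(parse_port("0080") == parse_port("80"))
```

Because `parse_port` discards leading zeros, the integer alone cannot reproduce the original spelling, which is why the written form and the stored form are different things.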

@SimonSapin
Contributor Author

There is a "Host writing" section that describes how to represent an IPv4 address as a string with some .s. Should there be a similar section (or just a sentence) for a URL’s path with /s?
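
One way such a sentence could read in code form, as a sketch only: the name `serialize_path` is hypothetical, and writing a "/" before every component (including the first) is an assumption about the answer to the "initial /" question raised above.

```python
from typing import List


def serialize_path(components: List[str]) -> str:
    """Write a list-of-strings path as a single string.

    Assumption: every component, including the first, is preceded
    by "/", so the empty list is written as the empty string.
    """
    return "".join("/" + component for component in components)


print(serialize_path(["a", "b", "c"]))
print(repr(serialize_path([])))
```

With this rule, a path "starts with /" exactly when the list is non-empty, which would give the scheme-relative URL wording a meaning in terms of the list model.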

@SimonSapin
Contributor Author

(Looking a bit more at the spec…) Namely, I think this sentence:

A path must be zero or more URL units, excluding "?".

should mention path components and slashes.

@annevk
Member

annevk commented Aug 15, 2015

Since fixing this is not happening today, this is what I want to do when I get back to it, hopefully soon:

  • Bring back "path segment" in the URL syntax section.
  • Attempt to explain the special path segments "." and "..".
  • Consider whether or not Windows drive letters need to be a parse error or part of the URL syntax section. Likely the former? Although that kind of obsoletes Windows from the perspective of the specification...
  • Introduce railroad diagrams to accompany the prose.
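
For the second item, the usual meaning of the special segments can be sketched as follows. This is a simplified illustration, not the parser's algorithm: `resolve_dot_segments` is a hypothetical name, and details such as trailing-slash handling, percent-encoded dots, and Windows drive letters are deliberately left out.

```python
from typing import List


def resolve_dot_segments(segments: List[str]) -> List[str]:
    """Resolve "." and ".." in a list of path segments (simplified)."""
    output: List[str] = []
    for segment in segments:
        if segment == ".":
            # "." refers to the current level and contributes nothing.
            continue
        if segment == "..":
            # ".." removes the previous segment, if there is one.
            if output:
                output.pop()
            continue
        output.append(segment)
    return output


print(resolve_dot_segments(["a", ".", "b", "..", "c"]))
```

In this sketch a ".." at the root is silently dropped rather than treated as an error, which is one of the choices an explanation of these segments would need to state.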

@domenic
Member

domenic commented Aug 15, 2015

Consider whether or not Windows drive letters need to be a parse error or part of the URL syntax section. Likely the former? Although that kind of obsoletes Windows from the perspective of the specification...

Yes, let's not do the former, please. Remember that UAs are used ~95% of the time on Windows, even if developers prefer other OSs.

@sideshowbarker
Contributor

The planned changes outlined in #33 (comment) look great to me. The one other somewhat-related thing I’m still hoping for is normative requirements for what code points are allowed in a domain, as raised at https://www.w3.org/Bugs/Public/show_bug.cgi?id=25334

@annevk
Member

annevk commented Aug 16, 2015

@domenic I kind of wish we could fade out file URLs entirely. But I guess they still have legitimate use in node.js (or Node.js?) development?

It's a bit tricky too to define the syntax constructs for them since it heavily depends on the base URL, but I'll try to figure something out.

@masinter

https://www.ietf.org/mail-archive/web/apps-discuss/current/msg14575.html

A proposed updated IETF spec for the 'file:' URI scheme; check it out.

@annevk
Member

annevk commented Aug 16, 2015

@masinter we did, see w3ctag/design-reviews#59.

@domenic
Member

domenic commented Aug 16, 2015

They have legit uses in pretty much any system which deals with both files and URLs, yeah. Getting them documented and nailed down would be very helpful, especially if the URL Standard wants to be more than just the standard for browsers, but instead the standard for anything that interoperates with browsers.

@annevk
Member

annevk commented Aug 17, 2015

I've decided to address railroad diagrams separately. See #67.
