Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why are baseUrl and path separate fields? #1

Closed
mgiuca opened this issue Aug 29, 2019 · 16 comments
Closed

Why are baseUrl and path separate fields? #1

mgiuca opened this issue Aug 29, 2019 · 16 comments

Comments

@mgiuca
Copy link

mgiuca commented Aug 29, 2019

It seems simpler and equally expressive to just have a single field representing the origin and path parts of the scope. I'm not sure what to call it, perhaps just baseUrl?

{
    baseUrl: self.origin,
    path: '/foo/?*',
    search: '*',
}

becomes:

{
    baseUrl: self.origin + 'foo/?*',
    search: '*',
}

Alternatively, if you want to keep them separate, why not rename baseUrl to origin?

@wanderview
Copy link
Member

Partly its allow future extension to control parts of the origin via patterns as well. I'm not sure if that would be an origin field or if it would be separate fields for host, scheme, port, etc.

Also, the existing URL() API supports a separate argument for a base URL. So this seems similar to that API.

Finally, we have to provide base URLs in some way in order to support relative paths. This seems valuable to me in general, but is also needed for backward compat with the service worker scopes behavior.

@mgiuca
Copy link
Author

mgiuca commented Aug 30, 2019

Partly its allow future extension to control parts of the origin via patterns as well. I'm not sure if that would be an origin field or if it would be separate fields for host, scheme, port, etc.

Would those be alternatives to baseURL? (i.e., you can either use baseURL or the more specific parts?) I feel like you could just make baseURL (or "origin") a pattern.

If you intend this to be an origin, it should be called origin. Is it allowed to also contain a path part?

Also, the existing URL() API supports a separate argument for a base URL. So this seems similar to that API.

That's fairly different from the origin. The base parameter to the URL constructor is an absolute URL string that the supplied relative URL string is resolved against. For example, new URL("baz", "https://example.com/foo/bar") gives "https://example.com/foo/baz".

Do you intend baseURL to be used in this way? (For example, being able to supply a bunch of path segments, which the supplied pattern is resolved against?) I don't really see the point of being able to do that; it would be simpler if it were just the origin.

But then it would be simpler to just combine into a single field.

Finally, we have to provide base URLs in some way in order to support relative paths. This seems valuable to me in general, but is also needed for backward compat with the service worker scopes behavior.

I don't really understand this one. SW scope today is resolved against the script URL as base (so it can be relative). If we combined baseURL and path into a single field, that could also be a relative URL resolved against the script URL.

@wanderview
Copy link
Member

The baseUrl is not just for origin. It is to support relative paths. Not sure how to more clearly state that.

For example, the intent is so someone can write their script like:

let pattern = new URLPattern({
  baseUrl: self.location,
  path: 'content/index.html'
});

And then the app can be re-hosted to a new location at a possibly different sub-path and it will still work correctly to its relative content/index.html file location.

@mgiuca
Copy link
Author

mgiuca commented Sep 2, 2019

But why does the user need to supply an absolute URL at all?

Let's say you combined baseUrl and path into a single field (let's call it originAndPath as a strawman). If you wanted an absolute URL, you'd supply one:

let pattern = new URLPattern({
  originAndPath: 'https://example.com/myapp/content/index.html'
});

or, if you wanted a relative URL, you'd simply omit the origin and any part of the path that's part of the current page's location, supplying a relative URL as a pattern:

let pattern = new URLPattern({
  originAndPath: 'content/index.html'
});

There's no need to explicitly supply self.location as a base URL, because it would be implied if your originAndPath does not start with a slash. (And this has the advantage of being able to use all of the URL resolution rules, such as starting with a slash but not a scheme uses the current origin but replaces the path.)

@wanderview
Copy link
Member

I guess a relative path is the same as a pattern with a variable host, start-of-path, etc. That could be considered, but it is definitely not something we would want for service workers. We must have an absolute URL prefix in order to do scope matching.

I can't recall if I explicitly called this out above, but the other reason I favor separate fields like path is that it simplifies the parsing. There are fewer characters that are permitted in the path; e.g. : is used in the scheme, but not the path. Also ? is used in the query string, but not in the path, etc.

@mgiuca
Copy link
Author

mgiuca commented Sep 5, 2019

I guess a relative path is the same as a pattern with a variable host, start-of-path, etc. That could be considered, but it is definitely not something we would want for service workers. We must have an absolute URL prefix in order to do scope matching.

The current scope is a relative URL: https://www.w3.org/TR/service-workers-1/#navigator-service-worker-register

Else, set scopeURL to the result of parsing options.scope with the context object’s relevant settings object’s API base URL.

Scope is resolved relative to the URL of the document calling navigator.serviceWorker.register. It's done so at registration time, so there is no performance overhead each time it is used.

Why can't you resolve the URLPattern against the document URL in the same way?

@wanderview
Copy link
Member

Personally I don't mind an implicit base URL, but I anticipate opposition to that like in whatwg/url#71.

@mgiuca
Copy link
Author

mgiuca commented Sep 6, 2019

I view whatwg/url#71 as being somewhat of a special case. That is a discussion about the behaviour of the URL constructor itself, wherein I agree with the opposition: the constructor of a simple data-holding class such as URL should not consume data from the surrounding environment, they should just hold the data you give them.

On the other hand, nearly every other thing on the entire web platform that consumes a URL string does so relative to the document URL (or the worker script URL, if inside a worker). A few examples that come to mind:

In fact, this is so common that the HTML spec defines "parse a URL" wrapper around the URL parser, which automatically resolves the given URL string against the document URL.

I think any new web API that takes a URL string should allow it to be relative and resolve it against the document URL. Technically, you're not taking a URL string, you're defining a new syntax for a "URL pattern", but I think the same principle should apply.

@wanderview
Copy link
Member

URLPattern seems to have very close parallels to URL to me. All the APIs you reference there operate in the context of a specific context. I think URLPattern is more like URL in that its not clear what context it will be used in when you construct it. That is why I followed the lead from URL.

And if we remove the dictionary in favor of simple strings like you request in #4 then we end up with pretty much the same constructor API shape as URL().

Anyway, I'd really like to get @annevk's input on this. I'm hoping we can discuss at TPAC.

@wanderview
Copy link
Member

Note, if we dropped URLPattern completely and had a service worker specific pattern match then I think it would be more reasonable to automatically apply the base URL.

@mgiuca
Copy link
Author

mgiuca commented Sep 13, 2019

The way I would view it is as an analogy with URL: we wouldn't be saying that when you construct a URLPattern, it is automatically resolved against the document URL (just like when you construct a URL, it isn't automatically resolved against the document URL).

Rather, there would be "absolute URLPatterns" and "relative URLPatterns" (based on the same logic as a URL: if the URLPattern begins with a scheme then it's absolute, otherwise relative), then when the URLPattern is applied in a real-world usage such as being used in a service worker scope, relative patterns are resolved against the base URL (just like whenever a URL is used in a specific context, it gets resolved against the base URL).

For example, new URLPattern("/foo") doesn't automatically get converted into a URLPattern at https://example.com/foo (if that's your current origin), it is stored as a relative URLPattern, /foo. But when you call navigator.serviceWorker.register with that as the scope, it makes sense that the relative URL pattern would be resolved against the current origin, rather than being an error or something.

EDIT: Oops, I realised the above is a mistake, since the URL constructor does not allow a relative URL string (i.e., there is no such thing as a "relative URL" (object), but there is a "relative URL string"). Still, I think the principle applies: ideally we would want there to be such a thing as a "relative URL pattern". The awkward thing about comparing this to URL is that even though there is a URL class defined, it is never used for any APIs --- all APIs take strings to represent URLs; the URL class is just a convenience class for users and it also defines the behaviour of internal URL parsing. So I think we want URLPattern to follow the lead of URL strings, not URL objects, then what I said above still makes sense.

@domenic
Copy link
Member

domenic commented Sep 16, 2019

Personally I agree that there shouldn't be an implicit base URL. However, I am still unsure whether including baseURL + path is a good idea. In particular the following alternatives seem nicer to me:

// Actually orthogonal components, but just as powerful:
new URLPattern({
  origin: self.origin,
  pathname: "foo/bar"
});

new URLPattern({
  origin: self.origin,
  pathname: new URL("relative/bar", self.location).pathname
});

// Combined
new URLPattern({
  prefix: new URL("foo/bar", self.origin)
});

new URLPattern({
  prefix: new URL("foo/bar", self.location)
});

The URL constructor as precendent does not by itself seem like a reason to include a baseURL option. In particular the URL constructor kind of serves two purposes: constructing an absolute URL, and resolving a relative URL. I think for URLPattern we could constrain it to just constructing a (necessarily absolute) URL pattern, and not for resolving a relative URL pattern; I think the "relative URL pattern" concept is best eliminated in this way.

@domenic domenic mentioned this issue Sep 16, 2019
@mgiuca
Copy link
Author

mgiuca commented Sep 16, 2019

I think the "relative URL pattern" concept is best eliminated in this way.

Why are we trying to eliminate the "relative URL pattern"? Relative URLs are super useful as a concept; for one thing they let you build a site that's independent of where it's hosted, and it seems like being able to specify patterns (such as service worker scope) relative is just as useful. (Current service worker scope is possible to use a relative URL prefix.)

Why invent a less natural way to do that when the concept of a relative URL already exists?

The URL class is not a good analogy here, because as I said above, APIs do not directly use URL; they take URL strings and all APIs that I know of let you supply a relative URL string, resolving against the document as a base URL:

https://url.spec.whatwg.org/#url-apis-elsewhere :

A standard that exposes URLs, should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation.

So if you're going to use URL class as an analogy for URLPattern (citing the precedent that URL doesn't auto-resolve against the document URL, therefore neither should URLPattern) then by extension of that analogy, no APIs should actually expose URLPattern directly; they should take a string that is interpreted as a URL pattern, and that string should allow the pattern to be relative.

Alternatively, if you want URLPattern to be an object that you are expected to pass instances of to APIs like Service Worker register, it should be designed so that relative patterns are possible.

In particular the following alternatives seem nicer to me

I don't understand why those alternatives seem nicer. All of those require at least some dynamic logic as opposed to being completely declarative. None of these are possible in the web app manifest where it's just JSON (no code).

Why not new URLPattern({prefix: "/foo/bar"}) for an absolute-path-relative-to-current-origin and new URLPattern({prefix: "relative/bar"}) for a relative-to-current-document? Anyone who understands relative URLs can understand that.

@wanderview
Copy link
Member

I've attempted to the split the difference in the new revamped proposal. Some of this is in the detailed design doc, however, since I tried to keep the explainer focused on use cases.

To summarize the new proposal:

  1. There is a "long form" that takes an options dictionary with a baseURL attribute and separate attributes for each component part of a URL.
    a. If baseURL is provided then each component pattern defaults to the value from the baseURL.
    b. If baseURL is not provided, then each component pattern default to * accepting any value in that field.
    c. Any specified component patterns then override these defaults.
  2. There is a "short form" that takes a pattern string and an optional second baseURL parameter.
    a. This should provide a concise shortcut for many common use cases, but it may be necessary to use the long form for certain cases due to parsing difficulty.
    b. This will only require the baseURL if the first string is a relative pattern.

So, if you are trying to match any resources loading a jpg, can you write it briefly as:

new URLPattern({pathname: '/*.jpg'}).

An example of the short form might be matching any subdomain html documents:

new URLPattern('https://*.foo.com/*.html')

I'm hoping this is an acceptable compromise between the need I feel for a verbose featureful syntax and the desire for a more concise syntax.

@wanderview
Copy link
Member

I'm going to close this for now, but please feel free to re-open it if you think further discussion is needed. Or possibly open a new issue specific to the new revamped proposal. Thank you!

wanderview added a commit that referenced this issue Dec 17, 2020
@bathos
Copy link

bathos commented Nov 18, 2021

Old thread but since it doesn’t appear to have been mentioned: one benefit of a distinct baseURL is that it helps avoid common concatenation errors like the one in the initial post:

    baseUrl: self.origin + 'foo/?*'

It’s pretty easy to forget that "/" is the start of the pathname component rather than the end of the origin component — this would be e.g. https://github.comfoo/?* rather than https://github.com/foo?*.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants