Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking changes important to the polyfill #13

Closed
kenchris opened this issue Oct 6, 2020 · 19 comments
Closed

Tracking changes important to the polyfill #13

kenchris opened this issue Oct 6, 2020 · 19 comments

Comments

@kenchris
Copy link
Contributor

kenchris commented Oct 6, 2020

I made a small polyfill on top of path-to-regex to play around with this and found a few differences.

First of all, something like

pathname: '/*.:imagetype(jpg|gif|png)'

doesn't work, the wildcard has to be (.*) instead. I do like the /* wildcard so maybe we can just document this as an improvement.

You expose named groups as result.pathname.groups - but JS regexp actually supports named groups now (path-to-regex doesn't use that feature) and that it exposed to result.groups.

Basically this

const imagePattern = new URLPattern({
  pathname: '/(.*).:imagetype(jpg|gif|png)'
});

let result = imagePattern.exec("/photo.jpg");

console.log(result)
console.log(result.pathname.groups['imagetype']);

could be done with regexps like:

const imageRegex = new RegExp("/(.*)\.(?<imagetype>jpg|gif|png)");
let res = imageRegex.exec("/photo.jpg");

console.log(res);
console.log(res.groups['imagetype']);
@wanderview
Copy link
Member

I plan to target the 1.7 version of path-to-regexp that includes * wildcards. Its possible path-to-regexp might also add it back to latest versions if we proceed with this plan.

You can see this by using the express route tester with versions set to 1.7.0 and options set to strict and end:

https://forbeslindesay.github.io/express-route-tester/

You expose named groups as result.pathname.groups - but JS regexp actually supports named groups now (path-to-regex doesn't use that feature) and that it exposed to result.groups.

I'm not sure what you are suggesting here. If you use the route tester above you can see that path-to-regexp does pull out named groups. It does it via a keys optional param to populate with the results. See:

https://github.com/pillarjs/path-to-regexp#named-parameters

@kenchris
Copy link
Contributor Author

kenchris commented Oct 6, 2020

Oh, so that support was there in 1.7 and removed in future version? I think the support would be quite useful.

I'm not sure what you are suggesting here. If you use the route tester above you can see that path-to-regexp does pull out named groups. It does it via a keys optional param to populate with the results. See:

That I know, as I polyfilled it, but you added the results to result.pathname.groups but regex.exec already adds named groups to result.groups if using the named groups feature in JS regexps today (which path-to-regexp doesn't use).

So as the explainer suggested aligning with current JS regexp APIs, we should use .groups instead of .pathname.groups

@wanderview
Copy link
Member

But in URLPattern we have multiple patterns. You can have a pathname pattern and a separate hostname pattern, etc. This is like having separate regex's for the different URL components. So to distinguish these I added the component name into the result structure.

I'm not sure how else to handle this to be honest. I could forbid conflicting named groups in the pathname and hostname patterns, for example, but that would not help with non-named groups where an index is used.

@wanderview
Copy link
Member

wanderview commented Oct 6, 2020

For example:

new URLPattern({
  pathname: '/:product/index.html',
  hostname: ':product.example.com',
});

@wanderview
Copy link
Member

Basically there is not a single path-to-regexp instance per URLPattern. There are multiple. One for each component of the URL. This is because our use case is to match the entire URL and not just the path which has been path-to-regexp primary goal.

@kenchris
Copy link
Contributor Author

kenchris commented Oct 6, 2020

I see, that different would be nice to point out in the explainer.

@wanderview
Copy link
Member

I can add that. I guess I was trying to keep the explainer focused on use cases and avoid going into too much API detail there. Hard to strike the right balance. I must admit I didn't anticipate anyone would attempt to polyfill from just the explainer.

There are a lot more details about URLPattern in the design doc. It goes into both the issues you encountered here.

@kenchris
Copy link
Contributor Author

kenchris commented Oct 7, 2020

Could you give me comment rights?

The constructor builds a new URLPattern object from the given URLPatternInit dictionary.
Makes sense

A pattern is then compiled for each URL part. If the init specifies a pattern value for the part, then it is used.
Makes sense

Otherwise the pattern value is set to explicitly match the URL part from the baseURL.
You mean HREF part? like new URL(baseURL).href

Also what does pattern value refer to, there are many pattern values here (pathname, password, port etc)?

You mean that pathname, protocol and host ?

Also what is supposed to happen if protocol is set to "https" but baseURL is say "http://google.com/maps"?

If the baseURL is missing then then pattern value is set to a wildcard to match any value.

What pattern value again? You mean that "host" should be matched against ""? What is protocol is not given, should that also be "" then?

In the case of the path, it is set to "/*" to distinguish from a relative path.

So I read this as if pathname is not given and there is no baseURL it should be "/*"

Is baseURL always supposed to be an actual valid URL, or do you allow people to set o a pattern as dicussed above { baseURL: "*" }?

@wanderview
Copy link
Member

wanderview commented Oct 7, 2020

So URLPattern does pattern matching on URL components individually. If any component fails to match, then the whole match fails.

All component patterns default to *, except pathname which defaults to /*.

If you specify a baseUrl then its used to populate the pattern for each URL component. It takes the component value from the baseUrl. So if the baseUrl is "https://example.com/foo", then protocol's pattern is "https", hostname's pattern is "example.com", pathname's pattern is "/foo", and all other components are set to the empty string.

The URLPattern init dictionary can then override any component pattern. For example:

new URLPattern({
  baseUrl: 'https://example.com/foo',
  pathname: '/*.jpg'
})

Ends up with the same component patterns I listed above, but the pathname pattern is overridden to /*.jpg.

I think this will become clearer when I actually write a spec with a processing model.

Also note, this conversation helped me realize there is a bug in my current design. There is currently a toRegexp() method on URLPattern, that currently returns a single regexp. That needs to change in some way to account for each URL component having a separate regexp.

@wanderview
Copy link
Member

Overall I expect the API to continue to evolve a bit as I prototype things and discover issues. Sorry its still in flux.

@wanderview
Copy link
Member

To extend the example above, if you did this instead:

new URLPattern({
  pathname: '/*.jpg'
});

Then protocol, hostname, and other components are all *. Just the pathname would have /*.jpg. This means it would match any jpg resource regardless of the origin, query params, etc.

@kenchris
Copy link
Contributor Author

kenchris commented Oct 7, 2020

Also note, this conversation helped me realize there is a bug in my current design. There is currently a toRegexp() method on URLPattern, that currently returns a single regexp.

Yeah, I noticed that myself.

Also it doesn't make it clear what value is in the exec case. Like result.pathname.value I assume is the result of the regexp. But as there are multiple, what is result.value then?

@kenchris
Copy link
Contributor Author

kenchris commented Oct 7, 2020

I have a pretty simple polyfill (so far only for URLPattern object) here: https://github.com/kenchris/urlpattern-polyfill/blob/main/src/index.js

It seems to work pretty well and makes it quite easy to play around with the API shape

@wanderview
Copy link
Member

That looks great! I will try to play around with it when I get a chance.

@blakeembrey
Copy link

A couple of things I'd like to flag on the syntax, and this could be a whole different issue if that's more appropriate:

The expectation of * wasn't well defined which is the core reason for its removal - does * act as if .* (regexp) or as (.*) (unnamed matching group) or as /:x* (repeated segment). The implication of the first is that /foo/bar roughly becomes /foo/.*, the second that it'd become /foo(?:/.*)? and the third /foo(/[^\/]+)*.

This starts to touch on #14 and other issues flagged in the repo related to hostname and other segment matching. One thing I'd probably recommend, but does make it incompatible with path-to-regexp, is killing the magic and making the user write /foo{/:bar}?. This would also make things work with hostnames automatically, as you can then write {:host.}*.example.com.

@blakeembrey
Copy link

Another thing to flag with *, especially as you think about hostnames, is potential confusion with existing solutions. It's probably not a big deal though, since it seems inconsistent across the board already, but in a lot of hostname software/SSL certs I believe *.example.com is only a single layer (e.g. x.example.com and not x.y.example.com). Other places like Cloudflare Workers are just complete wildcard matches without "understanding" a segment of the URL.

@wanderview wanderview changed the title Notable differences from path-to-regex Tracking changes important to the polyfill Dec 4, 2020
@wanderview
Copy link
Member

Let's use this issue for tracking changes that are relevant to the polyfill.

Some code is starting to land in chromium. The changes you may want to update for right now are:

  1. Its based on path-to-regexp 6.2 and not the older version I originally thought I would use.
  2. C++ code does not yet support *, but I expect to layer that on top of 6.2. Maybe this can be upstreamed to path-to-regexp.
  3. The test() and exec() methods take a structured dictionary as input as a shorthand instead of requiring a full url. Probably the best thing to do if you want to duplicate this is to look at the code.
  4. You can see the options I'm using for path-to-regexp in each component here.

@wanderview
Copy link
Member

And I'm starting to add WPT tests, but I recommend waiting for this CL to land before trying to use them. It uses a separate data file of cases that should make it easier to integrate tests with other implementations/platforms.

@wanderview
Copy link
Member

I'm going to close this for now. In the future I'll open issues on:

https://github.com/kenchris/urlpattern-polyfill

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants