Tracking changes important to the polyfill #13

kenchris · 2020-10-06T09:12:50Z

I made a small polyfill on top of path-to-regex to play around with this and found a few differences.

First of all, something like

pathname: '/*.:imagetype(jpg|gif|png)'

doesn't work, the wildcard has to be (.*) instead. I do like the /* wildcard so maybe we can just document this as an improvement.

You expose named groups as result.pathname.groups - but JS regexp actually supports named groups now (path-to-regex doesn't use that feature) and that it exposed to result.groups.

Basically this

const imagePattern = new URLPattern({
  pathname: '/(.*).:imagetype(jpg|gif|png)'
});

let result = imagePattern.exec("/photo.jpg");

console.log(result)
console.log(result.pathname.groups['imagetype']);

could be done with regexps like:

const imageRegex = new RegExp("/(.*)\.(?<imagetype>jpg|gif|png)");
let res = imageRegex.exec("/photo.jpg");

console.log(res);
console.log(res.groups['imagetype']);

The text was updated successfully, but these errors were encountered:

wanderview · 2020-10-06T14:17:41Z

I plan to target the 1.7 version of path-to-regexp that includes * wildcards. Its possible path-to-regexp might also add it back to latest versions if we proceed with this plan.

You can see this by using the express route tester with versions set to 1.7.0 and options set to strict and end:

https://forbeslindesay.github.io/express-route-tester/

You expose named groups as result.pathname.groups - but JS regexp actually supports named groups now (path-to-regex doesn't use that feature) and that it exposed to result.groups.

I'm not sure what you are suggesting here. If you use the route tester above you can see that path-to-regexp does pull out named groups. It does it via a keys optional param to populate with the results. See:

https://github.com/pillarjs/path-to-regexp#named-parameters

kenchris · 2020-10-06T15:03:05Z

Oh, so that support was there in 1.7 and removed in future version? I think the support would be quite useful.

I'm not sure what you are suggesting here. If you use the route tester above you can see that path-to-regexp does pull out named groups. It does it via a keys optional param to populate with the results. See:

That I know, as I polyfilled it, but you added the results to result.pathname.groups but regex.exec already adds named groups to result.groups if using the named groups feature in JS regexps today (which path-to-regexp doesn't use).

So as the explainer suggested aligning with current JS regexp APIs, we should use .groups instead of .pathname.groups

wanderview · 2020-10-06T15:35:51Z

But in URLPattern we have multiple patterns. You can have a pathname pattern and a separate hostname pattern, etc. This is like having separate regex's for the different URL components. So to distinguish these I added the component name into the result structure.

I'm not sure how else to handle this to be honest. I could forbid conflicting named groups in the pathname and hostname patterns, for example, but that would not help with non-named groups where an index is used.

wanderview · 2020-10-06T15:36:39Z

For example:

new URLPattern({
  pathname: '/:product/index.html',
  hostname: ':product.example.com',
});

wanderview · 2020-10-06T15:40:32Z

Basically there is not a single path-to-regexp instance per URLPattern. There are multiple. One for each component of the URL. This is because our use case is to match the entire URL and not just the path which has been path-to-regexp primary goal.

kenchris · 2020-10-06T16:09:47Z

I see, that different would be nice to point out in the explainer.

wanderview · 2020-10-06T21:26:26Z

I can add that. I guess I was trying to keep the explainer focused on use cases and avoid going into too much API detail there. Hard to strike the right balance. I must admit I didn't anticipate anyone would attempt to polyfill from just the explainer.

There are a lot more details about URLPattern in the design doc. It goes into both the issues you encountered here.

kenchris · 2020-10-07T13:58:50Z

Could you give me comment rights?

The constructor builds a new URLPattern object from the given URLPatternInit dictionary.
Makes sense

A pattern is then compiled for each URL part. If the init specifies a pattern value for the part, then it is used.
Makes sense

Otherwise the pattern value is set to explicitly match the URL part from the baseURL.
You mean HREF part? like new URL(baseURL).href

Also what does pattern value refer to, there are many pattern values here (pathname, password, port etc)?

You mean that pathname, protocol and host ?

Also what is supposed to happen if protocol is set to "https" but baseURL is say "http://google.com/maps"?

If the baseURL is missing then then pattern value is set to a wildcard to match any value.

What pattern value again? You mean that "host" should be matched against ""? What is protocol is not given, should that also be "" then?

In the case of the path, it is set to "/*" to distinguish from a relative path.

So I read this as if pathname is not given and there is no baseURL it should be "/*"

Is baseURL always supposed to be an actual valid URL, or do you allow people to set o a pattern as dicussed above { baseURL: "*" }?

wanderview · 2020-10-07T15:51:11Z

So URLPattern does pattern matching on URL components individually. If any component fails to match, then the whole match fails.

All component patterns default to *, except pathname which defaults to /*.

If you specify a baseUrl then its used to populate the pattern for each URL component. It takes the component value from the baseUrl. So if the baseUrl is "https://example.com/foo", then protocol's pattern is "https", hostname's pattern is "example.com", pathname's pattern is "/foo", and all other components are set to the empty string.

The URLPattern init dictionary can then override any component pattern. For example:

new URLPattern({
  baseUrl: 'https://example.com/foo',
  pathname: '/*.jpg'
})

Ends up with the same component patterns I listed above, but the pathname pattern is overridden to /*.jpg.

I think this will become clearer when I actually write a spec with a processing model.

Also note, this conversation helped me realize there is a bug in my current design. There is currently a toRegexp() method on URLPattern, that currently returns a single regexp. That needs to change in some way to account for each URL component having a separate regexp.

wanderview · 2020-10-07T15:51:54Z

Overall I expect the API to continue to evolve a bit as I prototype things and discover issues. Sorry its still in flux.

wanderview · 2020-10-07T16:05:44Z

To extend the example above, if you did this instead:

new URLPattern({
  pathname: '/*.jpg'
});

Then protocol, hostname, and other components are all *. Just the pathname would have /*.jpg. This means it would match any jpg resource regardless of the origin, query params, etc.

kenchris · 2020-10-07T17:56:58Z

Also note, this conversation helped me realize there is a bug in my current design. There is currently a toRegexp() method on URLPattern, that currently returns a single regexp.

Yeah, I noticed that myself.

Also it doesn't make it clear what value is in the exec case. Like result.pathname.value I assume is the result of the regexp. But as there are multiple, what is result.value then?

kenchris · 2020-10-07T20:12:37Z

I have a pretty simple polyfill (so far only for URLPattern object) here: https://github.com/kenchris/urlpattern-polyfill/blob/main/src/index.js

It seems to work pretty well and makes it quite easy to play around with the API shape

wanderview · 2020-10-07T20:23:48Z

That looks great! I will try to play around with it when I get a chance.

blakeembrey · 2020-10-15T04:16:17Z

A couple of things I'd like to flag on the syntax, and this could be a whole different issue if that's more appropriate:

The expectation of * wasn't well defined which is the core reason for its removal - does * act as if .* (regexp) or as (.*) (unnamed matching group) or as /:x* (repeated segment). The implication of the first is that /foo/bar roughly becomes /foo/.*, the second that it'd become /foo(?:/.*)? and the third /foo(/[^\/]+)*.

This starts to touch on #14 and other issues flagged in the repo related to hostname and other segment matching. One thing I'd probably recommend, but does make it incompatible with path-to-regexp, is killing the magic and making the user write /foo{/:bar}?. This would also make things work with hostnames automatically, as you can then write {:host.}*.example.com.

blakeembrey · 2020-10-15T04:19:41Z

Another thing to flag with *, especially as you think about hostnames, is potential confusion with existing solutions. It's probably not a big deal though, since it seems inconsistent across the board already, but in a lot of hostname software/SSL certs I believe *.example.com is only a single layer (e.g. x.example.com and not x.y.example.com). Other places like Cloudflare Workers are just complete wildcard matches without "understanding" a segment of the URL.

wanderview · 2020-12-04T22:34:44Z

Let's use this issue for tracking changes that are relevant to the polyfill.

Some code is starting to land in chromium. The changes you may want to update for right now are:

Its based on path-to-regexp 6.2 and not the older version I originally thought I would use.
C++ code does not yet support *, but I expect to layer that on top of 6.2. Maybe this can be upstreamed to path-to-regexp.
The test() and exec() methods take a structured dictionary as input as a shorthand instead of requiring a full url. Probably the best thing to do if you want to duplicate this is to look at the code.
You can see the options I'm using for path-to-regexp in each component here.

wanderview · 2020-12-04T22:39:50Z

And I'm starting to add WPT tests, but I recommend waiting for this CL to land before trying to use them. It uses a separate data file of cases that should make it easier to integrate tests with other implementations/platforms.

wanderview · 2021-03-12T21:59:32Z

I'm going to close this for now. In the future I'll open issues on:

https://github.com/kenchris/urlpattern-polyfill

blakeembrey mentioned this issue Oct 15, 2020

Patterns as a string #22

Closed

wanderview changed the title ~~Notable differences from path-to-regex~~ Tracking changes important to the polyfill Dec 4, 2020

wanderview closed this as completed Mar 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking changes important to the polyfill #13

Tracking changes important to the polyfill #13

kenchris commented Oct 6, 2020

wanderview commented Oct 6, 2020

kenchris commented Oct 6, 2020

wanderview commented Oct 6, 2020

wanderview commented Oct 6, 2020 •

edited

wanderview commented Oct 6, 2020

kenchris commented Oct 6, 2020

wanderview commented Oct 6, 2020

kenchris commented Oct 7, 2020

wanderview commented Oct 7, 2020 •

edited

wanderview commented Oct 7, 2020

wanderview commented Oct 7, 2020

kenchris commented Oct 7, 2020

kenchris commented Oct 7, 2020

wanderview commented Oct 7, 2020

blakeembrey commented Oct 15, 2020

blakeembrey commented Oct 15, 2020

wanderview commented Dec 4, 2020

wanderview commented Dec 4, 2020

wanderview commented Mar 12, 2021

Tracking changes important to the polyfill #13

Tracking changes important to the polyfill #13

Comments

kenchris commented Oct 6, 2020

wanderview commented Oct 6, 2020

kenchris commented Oct 6, 2020

wanderview commented Oct 6, 2020

wanderview commented Oct 6, 2020 • edited

wanderview commented Oct 6, 2020

kenchris commented Oct 6, 2020

wanderview commented Oct 6, 2020

kenchris commented Oct 7, 2020

wanderview commented Oct 7, 2020 • edited

wanderview commented Oct 7, 2020

wanderview commented Oct 7, 2020

kenchris commented Oct 7, 2020

kenchris commented Oct 7, 2020

wanderview commented Oct 7, 2020

blakeembrey commented Oct 15, 2020

blakeembrey commented Oct 15, 2020

wanderview commented Dec 4, 2020

wanderview commented Dec 4, 2020

wanderview commented Mar 12, 2021

wanderview commented Oct 6, 2020 •

edited

wanderview commented Oct 7, 2020 •

edited