Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support matching strings with templates #4

Closed
tfausak opened this issue Apr 6, 2020 · 6 comments · Fixed by #10
Closed

Support matching strings with templates #4

tfausak opened this issue Apr 6, 2020 · 6 comments · Fixed by #10

Comments

@tfausak
Copy link
Owner

tfausak commented Apr 6, 2020

Motivated by #3. It should be possible to match a string, like http://search.example/?query=hello, against a template, like http://search.example/{?query}, to produce values, like [("query", stringValue "hello")].

This parsing is potentially ambiguous. If this functionality is provided, it could produce a list of possible matches. Or it could follow some rule, like expressions are as greedy as possible.

Since this is not something that's defined by the RFC, it would be worthwhile to come up with some test cases exercising this using a variety of operators and values.

@tfausak
Copy link
Owner Author

tfausak commented Apr 6, 2020

Once this is done, we should consider supporting using quasi quoted templates as Template Haskell patterns. For example:

case "http://search.example/?query=hello" of
  [uriTemplate|http://search.example/{?query}|] -> print query
  _ -> pure ()

@tfausak tfausak mentioned this issue Apr 6, 2020
@tfausak
Copy link
Owner Author

tfausak commented Apr 6, 2020

I think it could make sense to use URI templates for routing web applications, but doing so with a straightforward match function or a series of quasi quoted pattern matches would require checking each case in order. It might be worthwhile to see how various routing libraries solve this problem.

However I don't think it's necessary, especially for a first stab, to support matching a bunch of URI templates in a performant fashion. It's just something to think about.

@tfausak
Copy link
Owner Author

tfausak commented Apr 10, 2020

One interesting thing about this is that if you consider parsing to march along left to right, you could use earlier parses to disambiguate repeated variables. Not something I'd expect to ever come up in real life, but here's an example:

template: "{a}/{+a}{+b}"
input:    "a/a/b"

Naively the {+a} part could match a/b, but since the earlier {a} had to match a, we know that the later {+a} only matches a. That leaves {b} to match /b.

You don't even really need the left to right process to make that work. It's just that your end result has to be consistent for all the variables matched.

tfausak added a commit that referenced this issue Apr 11, 2020
@tfausak
Copy link
Owner Author

tfausak commented Apr 12, 2020

One minor problem with pattern matching on URI template quasi quotes is that variable names may not be valid Haskell identifiers. For example:

  • {A}: Anything starting with an uppercase letter can't be used as a value.
  • {0}: Anything starting with a number can't be used as a value.
  • {%00}: Percent signs will be parsed as operators.
  • {a.b}: Periods are very overloaded in Haskell. Suffice to say they can't be used in identifiers.
  • {_}: Underscores are allowed, but they mess with unused variable warnings.

Maybe none of that matters. I think TH can generate identifiers that the parser would reject, so it probably won't fail. And since the template is known at compile time, the user can simply choose variable names that work.

@tfausak
Copy link
Owner Author

tfausak commented Apr 18, 2020

I'm making progress on the gh-4-match branch. I've got extremely basic matching working right now. That means no operators, no modifiers, and only strings. Supporting everything else is going to be a challenge, that's for sure. Fortunately the existing test suite exercises all the interesting cases when expanding templates, which can be rejiggered to test parsing them too.

Off the top of my head, I expect the most challenging things will be:

  • Supporting prefix modifiers and maintaining consistency with them. For example if you match /a/ab against /{a:1}/{a}, it should return a =: "ab". This is going to require a new type to keep partial match information.

  • Dealing with + and # operators could involve a lot of backtracking. This could easily become a performance problem. In fact, I think you could get terrible performance just matching something like aaaaaaaa against {a}{a}. Increase the number of as until things slow to a crawl. (For a first cut, I don't mind bad performance that much. But if the goal is to match against user-supplied URLs, denial of service via pathological inputs should be guarded against.)

  • Support list and dictionary values, especially for + and # operators. Like prefixes, this may require a new data type to express that we're not sure what type of thing we're dealing with yet. It could be a string that happens to include separators, or it could be a list. In general I think matching should prefer simpler values. That is, strings preferred over lists and lists preferred ovre dictionaries.

@tfausak
Copy link
Owner Author

tfausak commented May 6, 2020

This has really nerd sniped me. I'm now making progress in the rewrite branch, which as you might guess has a bit wider scope than just matching. Regardless, it's now pretty close to matching on everything except variables with modifiers and non-string values. I can't claim that the performance is any good, but just getting it working is an important first step.

I think supporting prefix modifiers shouldn't be too bad. I have no plans to support matching lists, associative arrays, or explode modifiers. They're just too complicated.

@tfausak tfausak mentioned this issue May 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant