Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encode and decode path segments in urls when generating and recognizing, respectively #55

Closed
wants to merge 13 commits into from

Conversation

bantic
Copy link
Contributor

@bantic bantic commented Jun 19, 2015

This allows handlers like "/foo/:bar" to match a path with a url encoded
segment such as "/foo/abc%2Fdef" -> { params: { bar: "abc/def" } } (instead of incorrectly recognizing the value of ":bar" as "abc%2Fdef").

It also changes the route-recognizer to encode segments in the same way when generating a url.
For example, generating a postShow route with the path "/posts/:id" and params {id: 'abc/def'} will return the url "/posts/abc%2Fdef" instead of the incorrect "/posts/abc/def".

This code will fix this related ember issue: emberjs/ember.js#11497

Decoding and encoding URI segments is the way the rails router also works. I put together a demo Rails app to demonstrate.

Adds a flag RouteRecognizer.ENCODE_AND_DECODE_PATH_SEGMENTS that is initially true and can be set to false by consumers to opt-out of this bugfix.

@bantic bantic changed the title Decode encoded path segments in urls encode and decode path segments in urls when generating and recognizing, respectively Jun 21, 2015
@bantic
Copy link
Contributor Author

bantic commented Jun 21, 2015

Updated to handle the two complementary sides (recognizing urls and generating urls). This code will now fix the ember issue linked to in the description. Added a link to a sample rails repository showing that this the the way rails routing works, too.

@rwjblue
Copy link
Collaborator

rwjblue commented Jun 21, 2015

@machty - Can you review?

@krisselden
Copy link
Collaborator

@bantic this is the correct thing to do but will break an app that is manually encoding/decoding the dynamic segment. I would love to land this though because your test would show that the other PRs are flawed that are trying to do something similar in this area.

@bantic
Copy link
Contributor Author

bantic commented Apr 20, 2016

@krisselden What can I do to help move it along? Are you suggesting I can should/can the fix here so that an app that is manually decoding/encoding won't be broken?

@rwjblue
Copy link
Collaborator

rwjblue commented Apr 20, 2016

@bantic - I think that I would personally consider this a BUGFIX in Ember, even though it would break for folks that have already been manually working around the bug. I think we could add some sort of flag here to opt-in to the behavior so that we can land as a feature in Ember (which will give us more time to test and figure out an appropriate path forward). Looks like we also need a rebase when you have time...

@bantic bantic force-pushed the allow-encoded-segments-in-params branch 2 times, most recently from 18012da to 7a9da30 Compare April 20, 2016 14:56
@bantic
Copy link
Contributor Author

bantic commented Apr 20, 2016

@rwjblue @krisselden Rebased. I will add a flag to opt out of the behavior. How about
RouteRecognizer.DO_NOT_DECODE_AND_ENCODE_PATH_SEGMENTS = true; ?

@rwjblue
Copy link
Collaborator

rwjblue commented Apr 20, 2016

@bantic - Seems good to me, we can bikeshed the name if needed 😈 ...

@bantic bantic force-pushed the allow-encoded-segments-in-params branch from 5507657 to 3ae32bc Compare April 20, 2016 15:30
@bantic
Copy link
Contributor Author

bantic commented Apr 20, 2016

@rwjblue Done. I decided RouteRecognizer.ENCODE_AND_DECODE_PATH_SEGMENTS (default: true) was a better name for the flag. Consumers that want the old behavior can do RouteRecognizer.ENCODE_AND_DECODE_PATH_SEGMENTS = false;. LMK if that seems ok.

@rwjblue
Copy link
Collaborator

rwjblue commented Apr 20, 2016

Ya, on Ember's side, we will need to:

  • update and force ENCODE_AND_DECODE_PATH_SEGMENTS to false
  • add a feature flag that when enabled sets ENCODE_AND_DECODE_PATH_SEGMENTS to true
  • add some tests for both sides of the feature flag

@bantic
Copy link
Contributor Author

bantic commented Apr 20, 2016

👍 I'll make a PR there, hopefully be the end of the day today.

@stefanpenner
Copy link
Collaborator

With any flag, we need a plan in place to get off it. Can I assume that is just going to be when the feature flag in ember is removed, and the code moves to stable?

@rwjblue
Copy link
Collaborator

rwjblue commented Apr 20, 2016

@stefanpenner - Confirm. We will have to come up with a very tailored path for this feature. Using the feature in Ember to toggle the flag here is only the first step. The issue here is that this is likely a breaking change for folks already working around the bug that this fixes. We definitely want to fix this for others so they don't need to also rely on the broken behavior, but need to tackle the whole situation with care...

@stefanpenner
Copy link
Collaborator

@rwjblue 👍

This allows handlers like "/foo/:bar" to match a path with a url encoded
segment such as "/foo/abc%Fdef" -> { params: { bar: "abc/def" } }

See related issue emberjs/ember.js#11497
Setting this to false allows consumers to opt-out of the
encoding/decoding code (in case they were relying on manually handling
URLs with uri-encoded segments).
@bantic
Copy link
Contributor Author

bantic commented Apr 24, 2016

I'm working on a more robust PR that adds some coverage for more complex types of paths and also adds tests for star (*) segments. I'll update this later tonight.

@rwjblue
Copy link
Collaborator

rwjblue commented Apr 24, 2016

Awesome, thanks @bantic!

@bantic bantic force-pushed the allow-encoded-segments-in-params branch from 3ae32bc to 8790609 Compare April 25, 2016 01:33
@bantic bantic force-pushed the allow-encoded-segments-in-params branch from dc29b1d to ba99b23 Compare April 25, 2016 03:16
@bantic
Copy link
Contributor Author

bantic commented Apr 25, 2016

please do not merge
I've updated this after some feedback from @krisselden and I'm hoping to get feedback on this approach. I've added several additional test cases to clarify some of the edge cases that make this tricky, as well as codifying some of the existing behavior. The scope of this PR has grown a bit.

The current code in master includes a line path = decodeURI(path) in the router.recognize code that is intended to sort-of normalize paths when they are recognized. It means that if you give it the path "/ünicode" (which is not a valid URI) or the uri-encoded same path "/%C3%BCnicode" they will be "normalized" into the same string "/ünicode" before being matched against the router's static segments. This ensures that if you have added a route handler using the path string "/ünicode" then it will still be recognized from the uri-encoded path.

decodeURI isn't a safe normalizing method, though, because the percent character gets encoded as "%25" by both encodeURI and encodeURIComponent, so when the code in this PR attempts to decodeURIComponent a matching dynamic segment, it can encounter a malformed URI due to the previous decodeURI call.

For example, with the dynamic path "/user/:id" and a user id of "100%", it should be turned into a URI like so:

path = "/user/" + encodeURIComponent("100%"); // => "/user/100%25"

If, when recognizing, the path is "normalized" using decodeURI, the dynamic segment in the path gets doubly-decoded, and the following happens:

  • path become "/user/100%" after decodeURI call in router.recognize
  • dynamic segment ":id" matches string "100%"
  • decoding the dynamic segment fails: decodeURIComponent("100%") => URIError: URI malformed

Removing the path = decodeURI(path) call fixes this problem but causes recognizing a path with unicode in it to fail:

  • router has had the unicode path added: router.add(([{ path: "/ünicode", ... }]);
  • when recognizing, the uri-encoded path input string "/%C3%BCnicode" does not get "normalized" to "/ünicode"
  • it never gets recognized

In order to fix this, this PR changes the process so that instead of decoding-to-normalize the path when recognizing, it attempts to encode static path segments when they are added to the router so that the router always has (internally) a valid URI for each static path segment that it knows about.

Because it seems unlikely that users should be required to always pass in encoded paths when adding routes, this code adds a heuristic that attempts to determine when a static segment has been added that should be encoded (it should be encoded if: it has characters that need encoding or it appears to have "%" characters that are not intended to indicate percent-encoded sequences) and encodes the static segments (using encodeURIComponent) that appear to need encoding. This allows adding a route in its raw form (router.add([{ path: "/ünicode" }])) and in encoded form (router.add([{ path: "/%C3%BCnicode" }])). It makes the assumption that router.recognize(path) will be given a uri-encoded path, which may also be a breaking change.

I would like to get some feedback on whether this approach seems correct. I can improve the heuristic (or change or remove it) if necessary.
If it's helpful I can come up with some more examples that highlight how some existing behavior would be broken by this PR to help clarify the possible impact of the change.


function decodeURIWithoutPercents(string) {
return string.split(percentEncodedPercent).map(function(part) {
return decodeURI(part);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this closure doesn't need to be created every time this is called.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tring.split(percentEncodedPercent).map(decoeURI)

@cibernox
Copy link

cibernox commented May 3, 2016

I just want to chime in to highlight a bug that I found using ember-cli-mirage that I think it's related to this.

In the server, doing store.query('foo', { filter: { name: 'John' } }) I generate a URL that looks like this:

/foos?filter%5Bname%5D=John

But in ember-cli-mirage, doing request.queryParams I get { "filter[name]": "John" } instead of
{ filter: { name: 'John' } }.

I've tracked this to the parseQueryString method on this library.

I think that this method should be like the jQuery deparam method found here: https://github.com/chrissrogers/jquery-deparam/blob/master/jquery-deparam.js

This behaves the same way (afaik) than rails when parsing params, and probably other frameworks too:

Rack::Utils.parse_nested_query("filter%5Bname%5D=John") #=> {"filter"=>{"name"=>"John"}}

Is this issue fixed by this PR?

@bantic
Copy link
Contributor Author

bantic commented May 4, 2016

@cibernox This PR doesn't address query params, so it won't fix your issue, but that is definitely a bug.

@bantic
Copy link
Contributor Author

bantic commented May 4, 2016

This PR has grown in scope since it was originally opened, so I'm going to close it and re-open a new PR with a better description of the problem space and the attempted solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants