Various Zend\Mvc\Router\Http routers turn + into a space in path segments #2952

gnicol commented Nov 13, 2012

A plus located in a path segment should be treated as a literal +.

Updating all http routers to use rawurlencode/rawurldecode to ensure this is the case.

Added tests to verify all characters are properly handled (including space) when present in either their raw or encoded forms.

gnicol added some commits Nov 7, 2012
@gnicol gnicol Updating the regex route to do a rawurldecode when
parsing params, previously the use of urldecode
would erroneously turn + characters into spaces.
@gnicol gnicol Updating all http routes to use rawurlencode and rawurldecode when
dealing with path segments. The previous use of urlencode/decode
would erroneously transform + characters into spaces.

Also updated tests with new checks to verify all characters which
don't absolutely require encoding pass through unchanged and that
encoding works correctly for all characters (even if they do not
strictly speaking require it).
DASPRiD commented Nov 13, 2012

For the tests, was there a reason to not use the data providers?

gnicol commented Nov 13, 2012

Yes, I wanted to add a test that the param foo+bar made it through OK. When done via the data provider, the assemble test generates foo%2Bbar, which fails because the output doesn't match the starting value. The match test (the one I wanted) works fine, but I see no way to skip assemble.

The assemble method uses rawurlencode, which is strict/conservative on output: anything other than letters, digits, and -_.~ will be percent-encoded.
The match method uses rawurldecode, which is flexible on input: an un-encoded + is quite acceptable here.

I couldn't see a good way to fit testing the raw '+' case into the current provider; certainly open to suggestion though.
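
To make the asymmetry described above concrete, here is a minimal sketch using only PHP's built-in encode/decode functions. The variable names are illustrative and are not taken from the router code or its tests.

```php
<?php
// A raw '+' in a path segment, as it might arrive in a request URI.
$raw = 'foo+bar';

// urldecode() applies form-encoding rules, so '+' becomes a space.
$viaUrldecode = urldecode($raw);          // "foo bar"

// rawurldecode() only decodes percent-escapes, so '+' survives.
$viaRawurldecode = rawurldecode($raw);    // "foo+bar"

// Both decoders agree on the percent-encoded form.
$encodedPlus = rawurldecode('foo%2Bbar'); // "foo+bar"

// On output, rawurlencode() percent-encodes '+', which is why a
// match -> assemble round trip of "foo+bar" yields "foo%2Bbar".
$assembled = rawurlencode($viaRawurldecode); // "foo%2Bbar"

// Spaces differ too: rawurlencode() emits %20, urlencode() emits '+'.
$rawSpace  = rawurlencode('foo bar');     // "foo%20bar"
$formSpace = urlencode('foo bar');        // "foo+bar"
```

This is why a single data-provider entry cannot cover the raw + case when the provider runs both match and assemble: the decoded value is the same, but the assembled path is not the original input.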

DASPRiD commented Nov 13, 2012

So both rawurlencode() and urlencode() will encode the + character?

gnicol commented Nov 13, 2012

Correct, both rawurlencode and urlencode encode the +.
The key difference is that rawurldecode keeps a raw + as +, whereas urldecode turns it into a space (erroneously in a path segment).
The segment router actually defines a $urlencodeCorrectionMap to turn characters such as ! and $ (and now +) back into their raw versions after encoding. The other routers lack this functionality, so they percent-encode these characters on the way out.

If this could be pushed into a shared location it would make sense for the other routers, but I wasn't clear where to move it; possibly creating a filter would make sense.

For our use case, simply having the router accept a raw + as input and not turn it into a space was all that's needed. If the system puts out %2B instead of + on output, it isn't ideal, but it is no worse than the existing handling of ! or $ for non-segment routers. Forgiving on input and strict on output is a pretty classic approach.
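
For illustration, the correction-map idea mentioned above could look roughly like the following. This is a sketch only: the function name `encodePathSegment` and the contents of the map are assumptions made for the example, not the actual `$urlencodeCorrectionMap` from the segment router.

```php
<?php
// Hypothetical helper, not part of Zend\Mvc\Router: encode strictly
// with rawurlencode(), then restore a few characters that may appear
// unencoded in a path segment.
function encodePathSegment($value)
{
    // Illustrative map; the real router's map may contain more entries.
    $correctionMap = [
        '%21' => '!',
        '%24' => '$',
        '%2B' => '+',
    ];

    return strtr(rawurlencode($value), $correctionMap);
}

// The corrected characters come back raw; the space stays encoded.
$assembled = encodePathSegment('a!b$c+d e'); // "a!b$c+d%20e"

// rawurldecode() then recovers the original value.
$matched = rawurldecode($assembled);         // "a!b$c+d e"
```

With a map like this in a shared location, a value containing + would round-trip through assemble and match unchanged, which is the behaviour the segment router already has.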

@weierophinney weierophinney added a commit that referenced this pull request Nov 19, 2012
@weierophinney weierophinney Merge branch 'hotfix/2952' into develop
Forward port #2952
@weierophinney weierophinney added a commit that closed this pull request Nov 19, 2012
@weierophinney weierophinney Merge branch 'hotfix/2952'
Close #2952