url quote in wsgi.HTTPRequest #688

Closed
dram opened this Issue Mar 4, 2013 · 6 comments

dram commented Mar 4, 2013

When HTTPRequest quotes PATH_INFO, some safe characters are quoted as well, which causes URL routing mismatches.

According to an answer on SO [1] and the code in Python 2.7.3 (urllib.py L182), an additional safe argument is needed:

quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]|")

[1] http://stackoverflow.com/a/845595
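
To illustrate the report, here is a minimal sketch of how the default `quote` call differs from one that passes the proposed safe set. It uses Python 3's `urllib.parse` rather than the Python 2.7 `urllib` module discussed in the thread, and the path `/tags/c++` is a made-up example, not one taken from the reporter's app.

```python
from urllib.parse import quote

# Safe-character set proposed in the comment above.
WIDE_SAFE = "%/:=&?~#+!$,;'@()*[]|"

# Hypothetical PATH_INFO value containing plus signs.
path_info = "/tags/c++"

# By default only "/" (plus unreserved characters) is left unescaped.
default_quoted = quote(path_info)

# With the wider safe set, "+" passes through unchanged.
wide_quoted = quote(path_info, safe=WIDE_SAFE)

print(default_quoted)  # /tags/c%2B%2B
print(wide_quoted)     # /tags/c++
```

A route pattern written against the literal `/tags/c++` would match the second form but not the first.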

Owner

bdarnell commented Mar 5, 2013

Can you give more details about how exactly this is a problem? The line you quoted here is about dealing with buggy servers that return nonconformant Location headers; is this about buggy wsgi containers or is there another way this can come up? PEP 3333 suggests quoting the PATH_INFO variable with no added safe characters: http://www.python.org/dev/peps/pep-3333/#url-reconstruction

Some of those characters definitely don't look "safe" in this context - if your PATH_INFO contains a ? or # character you don't want to let it through unencoded.
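
The concern about `?` and `#` can be sketched as follows. This is an illustration with Python 3's `urllib.parse` and a hypothetical path and host, not code from Tornado: if those characters are left unencoded, a URL parser reads them as the start of the query string and fragment instead of as part of the path.

```python
from urllib.parse import quote, urlsplit

# Safe set proposed earlier in the thread; it includes "?" and "#".
WIDE_SAFE = "%/:=&?~#+!$,;'@()*[]|"

# Hypothetical PATH_INFO whose path component really contains "?" and "#".
path_info = "/docs/a?b#c"

strict = quote(path_info)                 # "?" and "#" are percent-encoded
loose = quote(path_info, safe=WIDE_SAFE)  # "?" and "#" pass through

strict_parts = urlsplit("http://example.com" + strict)
loose_parts = urlsplit("http://example.com" + loose)

print(strict_parts.path)  # /docs/a%3Fb%23c
print(loose_parts.path)   # /docs/a
print(loose_parts.query, loose_parts.fragment)  # b c
```

With the wide safe set, the reconstructed URL no longer refers to the same path.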

dram commented Mar 5, 2013

I ran an app in which some URLs contain a plus sign (+). It works well with tornado's own server, but fails under WebTest.

If PEP 3333 suggests that, I think we should keep consistency between those two.

Owner

bdarnell commented Mar 6, 2013

Ah, OK. Plus signs are strange since they're redundant (anywhere you can use a + you can also use %20), but they don't have their special meaning in all parts of a url. I think perhaps for wsgi PATH_INFO we should be using urllib.unquote instead of unquote_plus, but the rules here are complicated and perhaps underspecified. The PEP warns that it may not even be possible to reconstruct the url as entered by the user because url encoding is not one-to-one.
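
A small sketch of the difference between the two decoders, again using Python 3's `urllib.parse` and a made-up path. It is only meant to show why `unquote_plus` is a questionable fit for a path component and why the encoding is not one-to-one; it is not the actual wsgi.py code.

```python
from urllib.parse import quote, unquote, unquote_plus

# Hypothetical request path with a literal plus sign.
raw_path = "/a+b"

kept = unquote(raw_path)           # "+" is left alone
spaced = unquote_plus(raw_path)    # "+" is decoded as a space

# Re-quoting the unquote_plus result does not give back the original path.
requoted = quote(spaced)

print(repr(kept))      # '/a+b'
print(repr(spaced))    # '/a b'
print(requoted)        # /a%20b

# Two different encoded paths decode to the same string, so the original
# spelling cannot be recovered from the decoded PATH_INFO alone.
same_decoded = unquote("/a%2Bb") == unquote("/a+b")
print(same_decoded)    # True
```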

dram commented Mar 6, 2013

Yes, it is a bit complicated. Anyway, it's better not to use those symbols in URLs; I'll get rid of them.

Owner

bdarnell commented May 12, 2013

I've changed wsgi.py to use unquote instead of unquote_plus.

bdarnell closed this May 12, 2013
