url quote in wsgi.HTTPRequest #688

Closed
dram opened this Issue · 6 comments

2 participants

@dram

When PATH_INFO is quoted in HTTPRequest, some safe characters are quoted as well, which causes URL routing mismatches.

According to an answer on SO [1] and the code in Python 2.7.3 (urllib.py L182), an additional safe argument is needed:

quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]|")

[1] http://stackoverflow.com/a/845595

@bdarnell
Owner

Can you give more details about how exactly this is a problem? The line you quoted here is about dealing with buggy servers that return nonconformant Location headers; is this about buggy wsgi containers or is there another way this can come up? PEP 3333 suggests quoting the PATH_INFO variable with no added safe characters: http://www.python.org/dev/peps/pep-3333/#url-reconstruction

Some of those characters definitely don't look "safe" in this context - if your PATH_INFO contains a ? or # character you don't want to let it through unencoded.
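To make the concern concrete, here is a minimal sketch comparing the default quoting with the extended safe set proposed above. It uses Python 3's `urllib.parse.quote` as a stand-in for the Python 2 `urllib.quote` discussed in this thread; the sample path is hypothetical and this is not Tornado's actual code.

```python
from urllib.parse import quote

# Safe set proposed in the issue (copied from the quote() call above).
EXTENDED_SAFE = "%/:=&?~#+!$,;'@()*[]|"

# Hypothetical decoded PATH_INFO that happens to contain '?' and '#'.
path_info = "/search?next#top"

# Default safe set is just '/', so '?' and '#' are percent-encoded.
default_quoted = quote(path_info)

# With the extended safe set, '?' and '#' pass through unencoded and
# would be read as query and fragment delimiters in a rebuilt URL.
extended_quoted = quote(path_info, safe=EXTENDED_SAFE)

print(default_quoted)   # /search%3Fnext%23top
print(extended_quoted)  # /search?next#top
```

With the extended set, the rebuilt URL no longer distinguishes a literal `?` in the path from the start of the query string.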

@dram

I ran an app in which some URLs contain a plus sign (+). It works well with Tornado's own server, but fails when run under WebTest.

If PEP 3333 suggests that, I think we should keep consistency between those two.

@bdarnell
Owner

Ah, OK. Plus signs are strange since they're redundant (anywhere you can use a + you can also use %20), but they don't have their special meaning in all parts of a url. I think perhaps for wsgi PATH_INFO we should be using urllib.unquote instead of unquote_plus, but the rules here are complicated and perhaps underspecified. The PEP warns that it may not even be possible to reconstruct the url as entered by the user because url encoding is not one-to-one.
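The difference between the two decoders, and the point about encoding not being one-to-one, can be sketched as follows. This assumes Python 3's `urllib.parse` in place of the Python 2 `urllib` functions named above, and the sample paths are made up for illustration.

```python
from urllib.parse import quote, unquote, unquote_plus

# Hypothetical request path as sent on the wire.
raw_path = "/tags/c+b%20d"

# unquote leaves '+' alone, which suits the path component.
path_only = unquote(raw_path)

# unquote_plus also turns '+' into a space (form-encoding behavior).
form_style = unquote_plus(raw_path)

print(path_only)   # /tags/c+b d
print(form_style)  # /tags/c b d

# Decoding is not one-to-one: two different encoded paths collapse to
# the same decoded value, so re-quoting cannot recover the original.
same_decoded = unquote("/a%2Fb") == unquote("/a/b")
requoted = quote(unquote("/a%2Fb"))

print(same_decoded)  # True
print(requoted)      # /a/b
```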

@dram

Yes, it is a bit complicated. Anyway, it's better not to use those symbols in URLs, so I'll get rid of them.

@bdarnell
Owner

I've changed wsgi.py to use unquote instead of unquote_plus.

@bdarnell bdarnell closed this