Tornado httpclient fails requesting a url that urllib works with. #559

Open
mitechie opened this Issue Jul 5, 2012 · 4 comments

@mitechie
mitechie commented Jul 5, 2012

I've hit a URL that httpclient fails on but that works with urllib. The snippet below includes the URL and shows that it produces a 400 Bad Request on the httpclient side.

import urllib
from tornado import httpclient

url = "https://blogs.msdn.com/b/jmeier/archive/2012/05/13/the-rapid-research-method.aspx?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed: jmeier (J.D. Meier's Blog)&Redirected=true"

fh = urllib.urlopen(url)
# This will load up the content just peachy...
content = fh.read()

# This will get me a 400 bad request response.
http = httpclient.HTTPClient()
try:
    response = http.fetch(url)
    print "Content should be in here."
except Exception, e:
    print "but it goes BOOM!"
@bdarnell
tornadoweb member
bdarnell commented Jul 6, 2012

Technically that URL is invalid because URLs are not supposed to contain spaces. Browsers and other HTTP clients tend to "helpfully" rewrite the invalid URL, although I'm not sure if there are any rules as to the right way to do it (it's not as simple as urllib.quote, since you don't want to encode the ampersands and other characters that would normally be percent-encoded).

@mitechie
mitechie commented Jul 9, 2012

Yea, I was trying to find some way to break it apart and manually escape it, but when it worked in urllib I wondered whether that handling is something that could be picked up and ported to httpclient.

@ajkerrigan

Under the covers, urllib.urlopen() calls urllib.quote() with a safe character set of "%/:=&?~#+!$,;'@()*[]|". Is it worth adding that "helpfulness" to Tornado's httpclient? If that type of magic is better handled outside Tornado, perhaps this issue can be closed.
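
For anyone who needs a workaround outside Tornado in the meantime, here is a minimal sketch of that quoting step. It assumes Python 3, where `urllib.parse.quote` is the counterpart of the Python 2 `urllib.quote` mentioned above; the helper name `quote_like_urllib` is made up for illustration and is not part of Tornado or urllib. It only mirrors the safe character set quoted in this comment and is not a claim about what browsers do.

```python
from urllib.parse import quote

# Safe set copied from the comment above; these characters are left as-is.
# Because "%" is in the set, existing percent-escapes are not double-encoded.
URLLIB_SAFE = "%/:=&?~#+!$,;'@()*[]|"


def quote_like_urllib(url):
    """Hypothetical helper: percent-encode everything outside URLLIB_SAFE."""
    return quote(url, safe=URLLIB_SAFE)


url = (
    "https://blogs.msdn.com/b/jmeier/archive/2012/05/13/"
    "the-rapid-research-method.aspx?utm_source=feedburner&utm_medium=feed"
    "&utm_campaign=Feed: jmeier (J.D. Meier's Blog)&Redirected=true"
)

# With the wide safe set, only the spaces are escaped.
fixed = quote_like_urllib(url)

# With the default safe set ("/"), the scheme colon, "?", "=" and "&" are
# escaped too, which is why a plain quote() call is not enough here.
naive = quote(url)

print(fixed)
print(naive)
```

The `fixed` string could then be passed to `http.fetch()` in place of the raw URL; whether the server accepts it is not something this sketch verifies.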

@bdarnell
tornadoweb member
bdarnell commented Dec 1, 2013

Is there any documentation of what browsers do (or should do) here? (maybe in html5?) If there's a standard to follow then I'm OK with adding that to Tornado, but I'd rather not add a bit of copy/pasted magic that may or may not be the same as what's used elsewhere.

@bdarnell bdarnell added the httpclient label Jul 16, 2014