Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scheme and host erroneously passed to HTTPConnection request method #8

Closed
mapio opened this issue Nov 14, 2011 · 6 comments · Fixed by #83 or #85
Closed

Scheme and host erroneously passed to HTTPConnection request method #8

mapio opened this issue Nov 14, 2011 · 6 comments · Fixed by #83 or #85

Comments

@mapio
Copy link

mapio commented Nov 14, 2011

I think there is a problem in the use of httplib.HTTPConnection method request when called at
line 213 of urllib3/connectionpool.py where you pass it the full URL, containing the scheme and host, instead of just the path (and query part), as show in httplib usage examples.

This ends up in a wrong HTTP request performed to the server. To see it, you can for instance run

python -m SimpleHTTPServer

in a shell and then, in another one, run

python -c 'from urllib3 import PoolManager; http = PoolManager(); http.request( "GET", "http://localhost:8000/this/is/an/example" )'

and compare what the access log in the first shell reports as compared to what happens if you do

curl  "http://localhost:8000/this/is/an/example"

I can submit a patch, but I'm not an urllib3 expert so I will probably miss some other place where the same error occurs.

@shazow
Copy link
Member

shazow commented Nov 14, 2011

Hi there, thank you for the report!

This was a conscious decision, but perhaps not the correct one.

The goal was to reduce complexity and avoid inexplicit behaviour. That is, when you make a request to "http://localhost:8000/this/is/an/example", that's exactly the request that urllib3 should be making.

The current workaround to achieve what you want is:

from urllib3 import PoolManager
http = PoolManager()
conn = http.connection_from_url("http://localhost:8000")
response = conn.request("GET", "/this/is/an/example")

When we do PoolManager.request, it does the same thing behind the scenes except it doesn't strip away the host like we did here manually. I agree that there should be an option to strip away the host (perhaps even by default).

Should this option be specified in the PoolManager constructor? Such as PoolManager(strip_host=True).

But then when should the stripping occur? If it happens in urlopen, then should we backport the same functionality outside of PoolManager? (ie. into ConnectionPool objects.)

@n1m3
Copy link

n1m3 commented Nov 14, 2011

The request should definitely be made with the path (and the query) only, because urllib3 is a HTTP/1.1 client.
RFC2616:

To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the
absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.

@kennethreitz
Copy link

Excellent info. Thanks :)

@kennethreitz
Copy link

This isn't a bug in urllib3. It's doing exactly what it's told.

@shazow
Copy link
Member

shazow commented Jun 23, 2012

Btw, if anyone is in dire need, here's a handy basic recipe for doing "proper" url passing with redirection in urllib3:

import urlparse
import urllib3

http = urllib3.PoolManager()

def request(method, url, conn=None):
    if conn:
        # Request within the current host connection (used for redirect handling)
        if not url.startswith('/'):
            url = '/' + url
        r = conn.request(method, url, redirect=False, assert_same_host=False)
    else:
        p = urlparse.urlparse(url)
        conn = http.connection_from_host(p.hostname, p.port, p.scheme)
        r = conn.request(method, p.path, redirect=False, assert_same_host=False)

    is_redirect = r.get_redirect_location()
    if not is_redirect:
        return r
    print "Redirecting: %s" % is_redirect

    if '://' not in is_redirect:
        # Redirect to same host
        return request('GET', is_redirect, conn)

    return request('GET', is_redirect)

shazow added a commit that referenced this issue Jul 1, 2012
* Built-in redirect will switch method to 'GET' if status code is 303. (Issue #11)
* ``urllib3.PoolManager`` strips the scheme and host before sending the request uri. (Issue #8)
@shazow shazow closed this as completed Aug 4, 2012
@shazow
Copy link
Member

shazow commented Aug 4, 2012

Fixed in v1.5.

njsmith added a commit to njsmith/urllib3 that referenced this issue Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants