"Usage" example at top of docs raises MaxRetryError #28

Closed

brandon-rhodes opened this issue Jan 16, 2012 · 7 comments

@brandon-rhodes
Contributor

The "Getting Started" section of the documentation suggests this simple test:

>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'http://google.com/')

But actually running this code results in an exception:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    r = http.request('GET', 'http://www.google.com/')
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/request.py", line 65, in request
    **urlopen_kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/request.py", line 78, in request_encode_url
    return self.urlopen(method, url, **urlopen_kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 109, in urlopen
    return conn.urlopen(method, url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/connectionpool.py", line 309, in urlopen
    raise MaxRetryError(url)
urllib3.exceptions.MaxRetryError: Max retries exceeded for url: http://www.google.com/

The reason is that each attempt to make a connection is dying with HostChangedError because the is_same_host() method in connectionpool.py is getting back the tuple ('http', 'www.google.com', None) from get_host(url) but the slightly different tuple ('http', 'www.google.com', 80) when it combines self.scheme with self.host and self.port.
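For illustration, here is a minimal sketch of why the comparison fails (this is a stand-in, not urllib3's actual code, and it uses the modern urllib.parse for brevity): the tuple parsed from the URL carries None for the port, while the pool's own tuple carries the explicit 80.

```python
from urllib.parse import urlsplit

def get_host(url):
    # Hypothetical stand-in for urllib3's get_host(): the port comes
    # back as None when the URL does not spell one out.
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port

# The tuple parsed from the URL vs. the tuple the pool builds
# from self.scheme, self.host, and self.port:
from_url = get_host('http://www.google.com/')   # ('http', 'www.google.com', None)
from_pool = ('http', 'www.google.com', 80)
print(from_url == from_pool)                    # False: None != 80
```

Because the tuples never compare equal, every retry dies the same way until MaxRetryError is raised.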

Attempting the test with the URL http://google.com:80/ also fails because a first successful request is made that redirects to http://www.google.com/ which then dies as well with the None != 80 problem.

Only a test with the URL http://www.google.com:80/ succeeds, because only in that case is the explicit port number present to make is_same_host() succeed, and no subsequent redirect occurs to break things.

It looks like either get_host() needs to throw in the port 80 when building its tuple, or — if that would ruin the purity and usefulness of that function — the is_same_host() function needs to detect the None port number coming back from get_host() and upgrade it to 80 or 443 as appropriate.
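A rough sketch of that second option, with the port-upgrade step written as a standalone function (the names here are illustrative; urllib3's real is_same_host() is a method on the connection pool):

```python
# Default ports per scheme, in the spirit of urllib3's port_by_scheme mapping.
port_by_scheme = {'http': 80, 'https': 443}

def is_same_host(pool, url_parts):
    # pool and url_parts are (scheme, host, port) tuples; a None port
    # parsed from the URL is upgraded to the scheme's default port
    # before the two tuples are compared.
    scheme, host, port = url_parts
    if port is None:
        port = port_by_scheme.get(scheme)
    return pool == (scheme, host, port)

print(is_same_host(('http', 'www.google.com', 80),
                   ('http', 'www.google.com', None)))   # True after the upgrade
```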

Or: is the problem that the self.port that is_same_host() is grabbing used to have the value None as well, and it's the port having the value 80 so early in the process that is the problem here?

If more experienced project members could point me in the right direction here, I would be happy to contribute a patch and pull request. I could even add a test for the usage example in the documentation, so that it stays working. :) Thanks for your work on urllib3!

@shazow
Member

shazow commented Jan 16, 2012

That is an amazingly complete and informative bug report, thank you Brandon!

I think the first thing to do is to write a basic test which catches the desired version of this scenario without depending on an external service, such as with dummyserver. I'd prefer to avoid tests that depend on external servers because they make things really slow, even if that means the main example breaks occasionally. ;)

I would prefer to maintain the 'purity and usefulness' of the function as you put it, but I have a sneaking suspicion this will have to be broken eventually. But let's delay it if we can help it.

If you're up for contributing, I'm happy to help you along with this patch. Let me know if you need help writing the test or anything else. :)

@brandon-rhodes
Contributor Author

Thanks, shazow! Let's start with the test. I see a funky "dummyserver" full of operations (like redirection and gzipping) that actually belong to different tests over in the "test" directory. Are you opposed in principle to more contained tests that use real sockets but that do not require a separate process to be spawned? If I wrote such a test as an example would you be willing to take a look at it and critique it? (Or is the issue here that it has to be a separate process for reasons, like Windows, that I'm not privy to by merely reading over the source code?) And note that not using a separate process would get rid of the time.sleep(0.1) magic that hopes that the subprocess starts fast enough. :)

@shazow
Member

shazow commented Jan 16, 2012

The reason why it's a separate thread is so that the tests can spawn their own server and not block (the meat is in dummyserver/testcase.py). Also the problem with "real sockets" is that you'd still have to implement an HTTP server on top of it. :-)

Originally you had to run dummyserver as a standalone thing separately while running tests. Have a look at the tests in the test/with_dummyserver/ directory for relevant examples.

If you can write a test that doesn't need an HTTP server by just using the core urllib3 logic, even better! Though I suspect it wouldn't be easy in this case without some additional refactoring/abstraction, which might not be worth the complexity.

@brandon-rhodes
Contributor Author

I have started my foray into testing with a quite modest test case: whether a connection pool survives when the server hangs up on a socket that the pool then tries to re-use. The test succeeds (though it took me quite a bit of work to figure out how, and I learned some things about httplib in the process):

https://github.com/brandon-rhodes/urllib3/commit/e0ee3a26bf2528063eeb0695579e265cc66570d7

The test illustrates how I tend to write tests for low-level HTTP functionality. I wrote two small test functions: start_server() is a function that opens a listening server socket in a separate thread, then hands control over to a callback function provided by the test; and read_request() is a function that helps test "server code" read an HTTP request off of a socket.

With these two functions in place, my new test_recovery_when_server_closes_connection() has two simple parts: a server that twice accepts connections, answers a single HTTP request, then hangs up on the client; and a client that makes two requests through an HTTPConnectionPool to see whether the server hanging up on the first socket ruins the ability for the pool to answer the second request.

You will see that I use a simple Event to avoid the race condition of whether the socket is really closed by the time the client tries re-using it, which works great since the server is simply a thread. A similar arrangement with a little temporary Queue over in start_server() makes tests extra-stable by assuring that the server has the socket open, bound, and listening before the client ever tries making a connection (thus avoiding sleep(0.1) and the delay it introduces).
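The readiness pattern described above can be sketched roughly like this (a simplified illustration, not the code in the linked commit; the helper and its signature are hypothetical):

```python
import socket
import threading

def start_server(handler):
    # Open a listening socket in a background thread, and return only
    # once it is bound and listening, so the client never needs a
    # sleep(0.1) to wait for the server to come up.
    ready = threading.Event()
    result = {}

    def serve():
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(('127.0.0.1', 0))            # let the OS pick a free port
        sock.listen(1)
        result['port'] = sock.getsockname()[1]
        ready.set()                            # safe for the client to connect now
        handler(sock)                          # the test's "server code"
        sock.close()

    thread = threading.Thread(target=serve)
    thread.start()
    ready.wait()                               # block until the server is listening
    return result['port'], thread
```

A test's "server code" then lives in handler, which receives the listening socket and can accept connections, answer requests, and hang up as the scenario requires.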

I am now going to try writing an actual test for the situation reported in this bug. The situation is a bit of a problem, since it can only happen with port 80, which the test cannot (usually) open. My solution will probably be to temporarily insert a test scheme into poolmanager.port_by_scheme and use that scheme without a port number in my sample URL, but we'll see.

For now, I wanted you to be looking over this new flavor of test and letting me know what improvements I could make that might make it an acceptable pattern for urllib3 tests as I continue to write more. Thanks!

@brandon-rhodes
Contributor Author

And I now have a failing test case! Again, thanks to using a tiny thread for the server, both the client and server code needed for the test get to live inside of the test function, making it self-contained.

https://github.com/brandon-rhodes/urllib3/commit/6cd6854b4cf6bb09ad7ab84d115f5c942cd25503

@shazow
Member

shazow commented Jan 18, 2012

Big +1 on removing the need for a sleep(). I wish we could build this into dummyserver eventually so that we don't need this boilerplate sitting around beside the actual tests.

At first glance, it looks good to me. Great job, Brandon!

I'll take a closer look this weekend and merge it in then. I left a couple of minor style comments, but nothing big. :-)

@shazow shazow closed this as completed Jan 21, 2012
njsmith added a commit to njsmith/urllib3 that referenced this issue Apr 4, 2018
runtests: Stop displaying unrelated stack traces
@amitcs100

I am seeing this error with urllib3==1.26.9 as well.
