"Usage" example at top of docs raises MaxRetryError #28

Closed

brandon-rhodes opened this issue Jan 16, 2012 · 7 comments

@brandon-rhodes
Contributor

The "Getting Started" section of the documentation suggests this simple test:

>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'http://google.com/')

But actually running this code results in an exception:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    r = http.request('GET', 'http://www.google.com/')
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/request.py", line 65, in request
    **urlopen_kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/request.py", line 78, in request_encode_url
    return self.urlopen(method, url, **urlopen_kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 113, in urlopen
    return self.urlopen(method, e.new_url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/poolmanager.py", line 109, in urlopen
    return conn.urlopen(method, url, **kw)
  File "/home/brandon/v/local/lib/python2.7/site-packages/urllib3/connectionpool.py", line 309, in urlopen
    raise MaxRetryError(url)
urllib3.exceptions.MaxRetryError: Max retries exceeded for url: http://www.google.com/

The reason is that each attempt to make a connection is dying with HostChangedError because the is_same_host() method in connectionpool.py is getting back the tuple ('http', 'www.google.com', None) from get_host(url) but the slightly different tuple ('http', 'www.google.com', 80) when it combines self.scheme with self.host and self.port.
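For illustration, here is a minimal sketch of why the comparison fails (this is a stand-in, not urllib3's actual code, and it uses the modern urllib.parse for brevity): the tuple parsed from the URL carries None for the port, while the pool's own tuple carries the explicit 80.

```python
from urllib.parse import urlsplit

def get_host(url):
    # Hypothetical stand-in for urllib3's get_host(): the port comes
    # back as None when the URL does not spell one out.
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port

# The tuple parsed from the URL vs. the tuple the pool builds
# from self.scheme, self.host, and self.port:
from_url = get_host('http://www.google.com/')   # ('http', 'www.google.com', None)
from_pool = ('http', 'www.google.com', 80)
print(from_url == from_pool)                    # False: None != 80
```

Because the tuples never compare equal, every retry dies the same way until MaxRetryError is raised.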

Attempting the test with the URL http://google.com:80/ also fails because a first successful request is made that redirects to http://www.google.com/ which then dies as well with the None != 80 problem.

Only a test with the URL http://www.google.com:80/ succeeds, because only in that case is the explicit port number present to make is_same_host() succeed, and no subsequent redirect occurs to break things.

It looks like either get_host() needs to throw in the port 80 when building its tuple, or — if that would ruin the purity and usefulness of that function — the is_same_host() function needs to detect the None port number coming back from get_host() and upgrade it to 80 or 443 as appropriate.
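A rough sketch of that second option, with the port-upgrade step written as a standalone function (the names here are illustrative; urllib3's real is_same_host() is a method on the connection pool):

```python
# Default ports per scheme, in the spirit of urllib3's port_by_scheme mapping.
port_by_scheme = {'http': 80, 'https': 443}

def is_same_host(pool, url_parts):
    # pool and url_parts are (scheme, host, port) tuples; a None port
    # parsed from the URL is upgraded to the scheme's default port
    # before the two tuples are compared.
    scheme, host, port = url_parts
    if port is None:
        port = port_by_scheme.get(scheme)
    return pool == (scheme, host, port)

print(is_same_host(('http', 'www.google.com', 80),
                   ('http', 'www.google.com', None)))   # True after the upgrade
```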

Or: is the problem that the self.port that is_same_host() is grabbing used to have the value None as well, and it's the port having the value 80 so early in the process that is the problem here?

If more experienced project members could point me in the right direction here, I would be happy to contribute a patch and pull request. I could even add a test for the usage example in the documentation, so that it stays working. :) Thanks for your work on urllib3!

@shazow
Member

shazow commented Jan 16, 2012

That is an amazingly complete and informative bug report, thank you Brandon!

I think the first thing to do is to write a basic test which catches the desired version of this scenario without depending on an external service, such as with dummyserver. I'd prefer to avoid tests that depend on external servers because they make things really slow, even if that means the main example breaks occasionally. ;)

I would prefer to maintain the 'purity and usefulness' of the function as you put it, but I have a sneaking suspicion this will have to be broken eventually. But let's delay it if we can help it.

If you're up for contributing, I'm happy to help you along with this patch. Let me know if you need help writing the test or anything else. :)

@brandon-rhodes
Contributor Author

Thanks, shazow! Let's start with the test. I see a funky "dummyserver" full of operations (like redirection and gzipping) that actually belong to different tests over in the "test" directory. Are you opposed in principle to more contained tests that use real sockets but that do not require a separate process to be spawned? If I wrote such a test as an example would you be willing to take a look at it and critique it? (Or is the issue here that it has to be a separate process for reasons, like Windows, that I'm not privy to by merely reading over the source code?) And note that not using a separate process would get rid of the time.sleep(0.1) magic that hopes that the subprocess starts fast enough. :)

@shazow
Member

shazow commented Jan 16, 2012

The reason why it's a separate thread is so that the tests can spawn their own server and not block (the meat is in dummyserver/testcase.py). Also the problem with "real sockets" is that you'd still have to implement an HTTP server on top of it. :-)

Originally you had to run dummyserver as a standalone thing separately while running tests. Have a look at the tests in the test/with_dummyserver/ directory for relevant examples.

If you can write a test that doesn't need an HTTP server by just using the core urllib3 logic, even better! Though I suspect it wouldn't be easy in this case without some additional refactoring/abstraction, which might not be worth the complexity.

@brandon-rhodes
Contributor Author

I have started my foray into testing with a quite modest test case: whether a connection pool survives when the server hangs up on a socket that the pool then tries to re-use. The test succeeds (though it took me quite a bit of work to figure out how, and I learned some things about httplib in the process):

https://github.com/brandon-rhodes/urllib3/commit/e0ee3a26bf2528063eeb0695579e265cc66570d7

The test illustrates how I tend to write tests for low-level HTTP functionality. I wrote two small test functions: start_server() is a function that opens a listening server socket in a separate thread, then hands control over to a callback function provided by the test; and read_request() is a function that helps test "server code" read an HTTP request off of a socket.

With these two functions in place, my new test_recovery_when_server_closes_connection() has two simple parts: a server that twice accepts connections, answers a single HTTP request, then hangs up on the client; and a client that makes two requests through an HTTPConnectionPool to see whether the server hanging up on the first socket ruins the ability for the pool to answer the second request.

You will see that I use a simple Event to avoid the race condition of whether the socket is really closed by the time the client tries re-using it, which works great since the server is simply a thread. A similar arrangement with a little temporary Queue over in start_server() makes tests extra-stable by assuring that the server has the socket open, bound, and listening before the client ever tries making a connection (thus avoiding sleep(0.1) and the delay it introduces).
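The readiness pattern described above can be sketched roughly like this (a simplified illustration, not the code in the linked commit; the helper and its signature are hypothetical):

```python
import socket
import threading

def start_server(handler):
    # Open a listening socket in a background thread, and return only
    # once it is bound and listening, so the client never needs a
    # sleep(0.1) to wait for the server to come up.
    ready = threading.Event()
    result = {}

    def serve():
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(('127.0.0.1', 0))            # let the OS pick a free port
        sock.listen(1)
        result['port'] = sock.getsockname()[1]
        ready.set()                            # safe for the client to connect now
        handler(sock)                          # the test's "server code"
        sock.close()

    thread = threading.Thread(target=serve)
    thread.start()
    ready.wait()                               # block until the server is listening
    return result['port'], thread
```

A test's "server code" then lives in handler, which receives the listening socket and can accept connections, answer requests, and hang up as the scenario requires.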

I am now going to try writing an actual test for the situation reported in this bug. The situation is a bit of a problem, since it can only happen with port 80, which the test cannot (usually) open. My solution will probably be to temporarily insert a test scheme into poolmanager.port_by_scheme and use that scheme without a port number in my sample URL, but we'll see.

For now, I wanted you to be looking over this new flavor of test and letting me know what improvements I could make that might make it an acceptable pattern for urllib3 tests as I continue to write more. Thanks!

@brandon-rhodes
Contributor Author

And I now have a failing test case! Again, thanks to using a tiny thread for the server, both the client and server code needed for the test get to live inside of the test function, making it self-contained.

https://github.com/brandon-rhodes/urllib3/commit/6cd6854b4cf6bb09ad7ab84d115f5c942cd25503

@shazow
Member

shazow commented Jan 18, 2012

Big +1 on removing the need for a sleep(). I wish we could build this into dummyserver eventually so that we don't need this boilerplate sitting around beside the actual tests.

At first glance, it looks good to me. Great job, Brandon!

I'll take a closer look this weekend and merge it in then. I left a couple of minor style comments, but nothing big. :-)

@shazow shazow closed this as completed Jan 21, 2012
njsmith added a commit to njsmith/urllib3 that referenced this issue Apr 4, 2018
runtests: Stop displaying unrelated stack traces
@amitcs100

I am seeing this error with urllib3==1.26.9 as well.
