Fix deadlocks with pool_block=True and retries #1174

Merged
18 commits merged into urllib3:master on Jun 22, 2017

Conversation

@mattbillenstein
Contributor

mattbillenstein commented May 8, 2017

Replaces #1171

I'll look at including a test this week.

@Lukasa

Cool, this seems like we're getting a bit closer!

We definitely still need tests for these, but I'd be happy to take a patch in this form. We'll also want to add read() calls prior to the release in the two raise/return branches as well, but if you'd rather not deal with it straight away we can handle those in a separate PR.

@mattbillenstein

Contributor

mattbillenstein commented May 8, 2017

Ah, yes, updated the PR.

@Lukasa

Fab, so all we need now is some testing! You up for doing that?

# Drain and release the connection for this response, since
# we're not returning it to be released manually.
response.read()
response.release_conn()

@Lukasa

Lukasa May 9, 2017

Contributor

Cool, so with all that code duplication about, it'd be nice to pull this out into a handy-dandy helper function. Declaring it inline in urlopen would probably be enough.

@mattbillenstein

mattbillenstein May 9, 2017

Contributor

Hmm, yeah, I was sort of thinking it would be nice if the response did this for us. It wouldn't make sense to release a connection back to the pool with data still in the pipe, and it seems a bit awkward that a consumer of this library would have to write the same pair of calls in their code.

@Lukasa

Lukasa May 10, 2017

Contributor

@mattbillenstein Yeah, I had that thought too. It's possible that release_conn should check connection state. However, for release_conn the calculation is different: if the response is not complete when release_conn is called, the connection should probably be closed, not drained. Draining the connection here is still a specific-to-this-use-case operation, because here we have confidence that the connection is fine. When a user calls release_conn for an arbitrary reason, we have no idea what state the connection is in and no good way to track it.

So, I'd consider the enhancement to release_conn to be a separate concern, IMO: in this case we'll still want to drain the connection.
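
To make that distinction concrete, here is a minimal sketch of the two behaviours being weighed (the helper name is hypothetical; this is not urllib3's actual API):

# Hypothetical sketch of the drain-vs-close distinction above;
# finish_with_connection is not a real urllib3 function.
def finish_with_connection(response, connection_is_trusted):
    if connection_is_trusted:
        # Retry/redirect path: we know the response framing is fine,
        # so drain the rest of the body and reuse the socket.
        response.read()
        response.release_conn()
    else:
        # An arbitrary release_conn() call: we can't know what is
        # left on the wire, so the safe move is to close.
        response.close()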

@sigmavirus24

sigmavirus24 May 10, 2017

Member

What if we add a new method exhaustively_release?

@Lukasa

Lukasa May 10, 2017

Contributor

I'm disinclined to extend the API surface of Response for this. It represents a particularly unusual use-case.

@mattbillenstein

mattbillenstein May 10, 2017

Contributor

response.release_conn(drain=True) ?

@Lukasa

Lukasa May 11, 2017

Contributor

Nope. =) This still represents an API extension, and it's the kind of API wart where a method's behaviour varies entirely based on a boolean flag. It's always better to have two methods than one method like that.
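
As a quick illustration of that design point (method names are hypothetical):

# Two explicit methods: each call site says what happens.
response.release_conn()
response.drain_and_release_conn()  # hypothetical name

# One flag-switched method: the behaviour hides in a boolean.
response.release_conn(drain=True)  # the rejected shape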

@codecov-io

codecov-io commented May 11, 2017

Codecov Report

Merging #1174 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1174   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          21      21           
  Lines        1973    1979    +6     
======================================
+ Hits         1973    1979    +6
Impacted Files              Coverage Δ
urllib3/connectionpool.py   100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e478d6a...047ef37.

@Lukasa

Ok cool, I'm happy with how that code looks. Time for us to come up with some tests!

@mattbillenstein

Contributor

mattbillenstein commented Jun 8, 2017

Added tests for this. There seems to be a single test failure on Python 3.3; could you take a look?

@@ -666,6 +666,10 @@ def urlopen(self, method, url, body=None, headers=None, retries=None,
                release_conn=release_conn, body_pos=body_pos,
                **response_kw)

        def drain_and_release_conn(response):
            response.read()

@Lukasa

Lukasa Jun 9, 2017

Contributor

Here's a question: should we catch exceptions here? I feel like the answer is probably yes: they'll ultimately be a bit misleading if we hit them.

@mattbillenstein

mattbillenstein Jun 9, 2017

Contributor

Maybe yes.

In the case of a redirect, the entire response has probably already been read off the wire and is sitting in a buffer. And that's probably true for many of the non-200 cases as well?

I'm going to push this to my production systems and see if I find any exceptions in this.

@Lukasa

Lukasa Jun 9, 2017

Contributor

The general concern is that we have, in the past, bumped into malformed responses, including on redirects.

@mattbillenstein

mattbillenstein Jun 9, 2017

Contributor

Wouldn't that raise before getting here? We've already parsed the redirect headers, and this is just throwing away the response body, but I haven't traced through the code yet.

In any case, to be safe, happy to catch and discard the conn, but I'm not completely sure which types of exceptions we should try to catch here.

@Lukasa

Lukasa Jun 9, 2017

Contributor

Not necessarily: the body may be chunk-encoded, or overlong, or suffer from any number of other wacky framing errors.
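
A quick illustration of the kind of framing problem meant here (the bytes are purely illustrative):

# A redirect whose chunked body lies about its chunk size: reading
# (draining) this response fails partway through.
b'HTTP/1.1 302 Found\r\n'
b'Location: /elsewhere\r\n'
b'Transfer-Encoding: chunked\r\n'
b'\r\n'
b'ff\r\nonly-a-few-bytes'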

@Lukasa

Lukasa Jun 20, 2017

Contributor

"The handler above"?

@mattbillenstein

mattbillenstein Jun 20, 2017

Contributor

The exception handler directly above this function...

@Lukasa

Lukasa Jun 20, 2017

Contributor

Ah, I see. Yeah, we probably need most of those. CertificateError we don't need, but the rest seem like good things to catch.

@mattbillenstein

mattbillenstein Jun 20, 2017

Contributor

Ok, so that leads us back to how to properly discard the connection without modifying the Response interface?

@Lukasa

Lukasa Jun 20, 2017

Contributor

Nope. =) Response will handle this appropriately. In fact, we can even remove the explicit call to release_conn here as Response will do that automatically once the entire body is read.
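
Putting the thread together, the inline helper under discussion plausibly ends up in this shape (a sketch only; the exception names mirror the tuple quoted later in the thread, as imported in urllib3/connectionpool.py):

def drain_and_release_conn(response):
    try:
        # Discarding the remaining body also releases the connection:
        # Response hands it back to the pool once the whole body has
        # been read.
        response.read()
    except (TimeoutError, HTTPException, SocketError, ProtocolError,
            BaseSSLError, SSLError):
        # A framing error mid-drain means the connection can't be
        # trusted; swallow the error and let the retry proceed.
        pass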

@Lukasa

Cool, this is looking pretty good! We're much closer now; I have a few smaller notes in the diff.

    def setUp(self):
        retries = Retry(
            total=3,
            backoff_factor=0.01,

@Lukasa

Lukasa Jun 20, 2017

Contributor

Do we need a backoff factor?

class TestRedirectPoolSize(HTTPDummyServerTestCase):
    def setUp(self):
        self.pool = HTTPConnectionPool(self.host, self.port, maxsize=10, retries=5, block=True)

@Lukasa

Lukasa Jun 20, 2017

Contributor

It might be worth adding some more verbose retry setup here to guarantee that redirects are in fact turned on, just in case this API changes for any reason.

total=3,
raise_on_status=False,
status_forcelist=[404],
)

@Lukasa

Lukasa Jun 20, 2017

Contributor

This needs to set redirect=True to avoid breaking if we ever change the kwarg defaults.

@Lukasa

Alright very nice! One small problem with coverage and we'll be good to go!

total=3,
raise_on_status=False,
status_forcelist=[404],
redirect = True,

@Lukasa

Lukasa Jun 20, 2017

Contributor

Mind removing the two spaces? Sorry. 😢

            response.read()
        except (TimeoutError, HTTPException, SocketError, ProtocolError,
                BaseSSLError, SSLError) as e:
            pass

@Lukasa

Lukasa Jun 20, 2017

Contributor

We're going to need a test that covers this sensibly: that'll basically mean sending back a malformed HTTP response to trigger this error, or at least triggering a timeout.
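
For reference, a test along those lines could look roughly like this, in the style of urllib3's socket-level tests (the handler details and the mismatched Content-Length are illustrative, not the PR's exact code):

def test_retry_drain_fail(self):
    def socket_handler(listener):
        for _ in range(2):
            sock = listener.accept()[0]
            sock.recv(65536)
            # Advertise more body bytes than we send, then hang up:
            # draining this response raises a ProtocolError.
            sock.send(b'HTTP/1.1 404 Not Found\r\n'
                      b'Content-Length: 1000\r\n'
                      b'\r\n')
            sock.close()

    self._start_server(socket_handler)
    retries = Retry(total=1, raise_on_status=False, status_forcelist=[404])
    pool = HTTPConnectionPool(self.host, self.port, maxsize=10,
                              retries=retries, block=True)
    self.addCleanup(pool.close)
    # The request completing at all shows the failed drain did not
    # deadlock the blocking pool.
    pool.urlopen('GET', '/not_found', preload_content=False)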

@mattbillenstein

mattbillenstein Jun 20, 2017

Contributor

Is there an example of how to do something like this in one of the existing tests?

@Lukasa

Lukasa Jun 21, 2017

Contributor

Sure, take a look at this one.

@mattbillenstein

mattbillenstein Jun 21, 2017

Contributor

Added a test and verified it raises ProtocolError in that function...

self.addCleanup(pool.close)
pool.urlopen('GET', '/not_found', preload_content=False)
self.assertEquals(pool.num_connections, 1)

@Lukasa

Lukasa Jun 21, 2017

Contributor

Can this be moved to test_socketlevel.py please?

@Lukasa

Contributor

Lukasa commented Jun 21, 2017

You've got a flake8 violation: mind resolving it? 😁

@mattbillenstein

Contributor

mattbillenstein commented Jun 21, 2017

Yeah, sorry, I keep forgetting to check that. It should be clean now.

@Lukasa

Contributor

Lukasa commented Jun 21, 2017

Still failing. 😉

    def test_pool_size_retry_drain_fail(self):
        def socket_handler(listener):
            while 1:

@Lukasa

Lukasa Jun 21, 2017

Contributor

We don't want this to run forever: can we make sure it only runs the number of times we expect it to? (Should be twice)

@@ -1399,7 +1399,7 @@ class TestRetryPoolSizeDrainFail(SocketDummyServerTestCase):
     def test_pool_size_retry_drain_fail(self):
         def socket_handler(listener):
-            while 1:
+            for _ in xrange(2):

@Lukasa

Lukasa Jun 21, 2017

Contributor

xrange isn't present on Python 3. We don't mind an intermediate list being created on Py 2 because it's only two elements long, so regular range is fine.

@Lukasa

Cool, this looks really good. Can you add an entry to the changelog for me please?

@Lukasa

Lukasa approved these changes Jun 22, 2017

Fantastic, thanks so much for your work!

@Lukasa Lukasa merged commit fec00a7 into urllib3:master Jun 22, 2017

4 checks passed

- codecov/patch: 100% of diff hit (target 100%)
- codecov/project: 100% (target 100%)
- continuous-integration/appveyor/pr: AppVeyor build succeeded
- continuous-integration/travis-ci/pr: The Travis CI build passed