Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove race conditions in PoolManager causing ClosedPoolErrors (#1252) #1257

Closed
wants to merge 13 commits into from

Conversation

reversefold
Copy link
Contributor

@reversefold reversefold commented Sep 3, 2017

Fix for #1252. The implementation here is a little clunky for now as I was trying to change the existing code/logic as little as possible to avoid regressions.

List of changes:

  • Introduces test_retry_with_too_many_pools which shows the ClosedPoolError happening in master
  • Fixes an issue in the dummyserver's retry_after handler in Python 3 when using the default value for status
  • Adds a PersistentRetry class which wraps Retry and allows for the same object to be used after increment is called
  • Changes the retry logic in HTTPConnectionPool and PoolManager to loop instead of recurse
  • Moves the base HTTPConnectionPool.urlopen functionality to HTTPConnectionPool.urlopen_only to allow for it to be called by PoolManager with no retry logic
  • Refactors all retry logic in HTTPConnectionPool.urlopen into URLOpenWithRetries which takes a function to call for urlopen_only functionality.
    • Both PoolManager and HTTPConnectionPool end up calling HTTPConnectionPool.urlopen_only eventually
    • HTTPConnectionPool's logic should be pretty much exactly the same as it was
    • PoolManager passes in a function for urlopen_only which acquires a pool from self.pools then calls urlopen_only on it. This means that if URLOpenWithRetries ends up retrying (currently will only happen if a Retry-After header is received) then PoolManager will create a new pool if the old one has been removed, avoiding the ClosedPoolError that can happen.
  • PoolManager acquires self.pools.lock a second time now, before running its internal urlopen_only logic, to ensure that the lock stays acquired until the HTTPConnectionPool calls _get_conn. It passes the lock to HTTPConnectionPool.urlopen_only so that it can release the lock immediately once _get_conn is called.

Caveats/Potential improvements:

  • Both URLOpenWithRetries and PoolManager still implement different redirect/retry logic. It would be nice to move all retry logic into URLOpenWithRetries.
    • Since PoolManager forces redirect=False when calling HTTPConnectionPool.urlopen(_only) there is only the Retry-After logic which can potentially cause a race condition on retry. Refactoring this further would reduce the complexity and remove the need for setting redirect=False.
  • Passing the lock into HTTPConnectionPool.urlopen_only is a little hacky as currently implemented. It would be nice to have the connection acquiring logic also refactored out so that passing it through like this isn't needed and we can use the lock as a context manager.
  • PersistentRetry may not be needed. This was implemented to ensure that the retries were maintained properly through the refactor as there are extra calls it may need to be passed through to maintain the context.
  • Retry logic changed to loops makes the tracebacks slightly less informative with regards to retries but reduces the frame stack when retrying.
    • This should be able to be reverted if wanted. I originally made this change to make the logic a little easier to reason about and avoid potential issues when refactoring.
  • URLOpenWithRetries is an odd name
  • URLOpenWithRetries could possibly be changed to be a mixin, parent class, or wrapper instead of taking a function as a parameter to make its use more obvious
  • {[testenv]setenv} change may not be needed but I ran into a lot of parsing errors with tox in some environments due to this.

I tested these changes on OSX (10.12.2) and on Ubuntu 16.04 with tox, as well as on travis-ci. There are random test failures, as mentioned in #1256, but the failures seem to correspond with those seen randomly on master in my tests. I have had each tox environment pass with no errors multiple times so I am fairly sure that there are no regressions.

https://travis-ci.org/reversefold/urllib3/builds/271452531 shows a fully passing run of this branch (except for gae, which appears to be broken).


This change is Reviewable

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Sep 3, 2017

So this is generally an interesting direction, but I think it suffers from a bit of excessive complexity. 😁 The general goal seems to be achievable with slightly less overhead by invoking a careful refactor of urlopen to call a helper function that does no redirection, with redirection logic elsewhere.

I think we should consider removing all retry logic from the ConnectionPool classes, which I think should make your life easier. Would it?

@reversefold
Copy link
Contributor Author

@Lukasa Removing the retry logic in the ConnectionPool classes would certainly reduce the complexity. As I said, I was trying to make this work without changing any existing functionality as I wasn't sure how much of this is intended to be externally exposed. Removing the retries from ConnectionPool seems like it could be a break in backwards compatibility.

If you're fine with moving that functionality to PoolManager I can make an attempt at that but it will likely take some test changes, making it a little harder to ensure that there are no regressions in top-level functionality.

Are there any places other than PoolManager which would need access to the retry logic.

…Wrapper manages it.

Rename URLOpenWithRetry to URLOpenRetryWrapper.
@codecov-io
Copy link

codecov-io commented Sep 4, 2017

Codecov Report

Merging #1257 into master will decrease coverage by 0.1%.
The diff coverage is 98.07%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1257      +/-   ##
==========================================
- Coverage     100%   99.89%   -0.11%     
==========================================
  Files          21       21              
  Lines        1985     1994       +9     
==========================================
+ Hits         1985     1992       +7     
- Misses          0        2       +2
Impacted Files Coverage Δ
urllib3/poolmanager.py 99.27% <93.33%> (-0.73%) ⬇️
urllib3/connectionpool.py 99.62% <98.87%> (-0.38%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5206734...39822b8. Read the comment docs.

@reversefold
Copy link
Contributor Author

Removed PoolManager's redirect code as it isn't needed any more with the retry wrapper (with the addition of the urljoin call to support relative redirects). Now all retry/redirect code is in the wrapper.

Also renamed URLOpenWithRetries to URLOpenRetryWrapper and made a few other touchups.

The urlopen functions in PoolManager and HTTPConnectionPool are now fairly minimal as all of the logic now resides in urlopen_only and URLOpenRetryWrapper.

…or PoolManager and HTTPConnectionPool.

Remove self.proxy and self.retries as they were not really needed.
@reversefold
Copy link
Contributor Author

Ok, I've made the retry logic into a mixin and removed some more cruft that I'd ended up with during the initial work. This feels pretty solid to me. The only major issues I still have with it are the name urlopen_only and the passing of the lock into urlopen_only. This still passes all tests without any test changes, so no tested logic should have regressed.

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Sep 4, 2017

Cool, so for my sake I think this needs to be broken into smaller PRs with consistent logical changes. 😁

@reversefold
Copy link
Contributor Author

I'll see what I can do.

@reversefold
Copy link
Contributor Author

@Lukasa I've opened 3 issues and PRs for the smaller parts of this PR. Once those are merged I'll work on the larger PRs which build upon them.

@sethmlarson
Copy link
Member

I'm closing this PR due to staleness (see #1370) if you want this to get merged still I'll reopen following an update to this patch. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants