
More granular retry support #260

Closed

kevinburke opened this issue Oct 16, 2013 · 26 comments

Comments

@kevinburke
Contributor

I'm not sure what the interface for this would look like, or even at what level this would be implemented, but essentially, more function-level control over retries would be awesome. This is roughly the behavior we build around a requests.request call:

  • For idempotent requests (GET, PUT, DELETE), we want to retry pretty much any response that isn't in the 2xx range, because these methods are safe to retry.
  • We want to retry all connection errors and connection timeouts - these indicate the request never made it to the other machine. We usually use exponential backoff here in the case of network failure.
  • In general, we don't want to retry failed POST requests, whether they were timeouts, closed connections, or 5xx errors - consider a customer getting billed multiple times, or duplicate accounts being created, or something else bad.

Again, this might not be appropriate to build in at the library level (maybe we need better primitives for this internally), but adding all this logic around our http requests leads to pretty gnarly code.
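
A minimal sketch of the kind of wrapper this describes, assuming requests is available; the helper name, retry counts, and backoff constants are all illustrative:

import time
import requests

IDEMPOTENT_METHODS = {'HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'}

def request_with_retries(method, url, retries=3, backoff_factor=0.5, **kwargs):
    """Illustrative wrapper: retry connection errors/timeouts with
    exponential backoff, and retry non-2xx responses only for
    idempotent methods."""
    for attempt in range(retries + 1):
        try:
            # ConnectTimeout subclasses ConnectionError in requests,
            # so this covers both failure modes described above.
            response = requests.request(method, url, **kwargs)
        except requests.ConnectionError:
            # The request never reached the server; always safe to retry.
            if attempt == retries:
                raise
        else:
            if 200 <= response.status_code < 300:
                return response
            if method.upper() not in IDEMPOTENT_METHODS or attempt == retries:
                # e.g. a failed POST: don't risk double-billing by retrying.
                return response
        time.sleep(backoff_factor * (2 ** attempt))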

Some kind of Retries object would be nice, especially because it mirrors the Timeout interface, and there are a lot of things you could specify if you wanted to run wild with it, say the following:

  • connection_retries, the number of times to retry a connection error
  • what I'd like to call read_retries: the number of times to retry in a situation where a connection was made to the server but the request failed afterwards - a timeout, a closed connection, or an unacceptable HTTP status code, like 500
  • retry_codes, a list of integers representing HTTP status codes; could also have named constants for NON_200, 5XX_ERROR, etc. Would default to the 500 range
  • method_whitelist, a list of HTTP methods to retry, defaults to HEAD, GET, PUT, DELETE, OPTIONS, TRACE, per the spec
  • backoff_factor, some kind of multiplier to control how fast backoff occurs. Defaults to 0 (no sleeping). The algorithm would be something like backoff_factor * (2 ** retry_attempt_number), so with a factor of 1: 1, 2, 4, 8...
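
A hypothetical construction using those parameters; the Retries name and every kwarg here are the proposal above, not a shipped API:

retries = Retries(
    connection_retries=3,               # retry connection errors up to 3 times
    read_retries=2,                     # retries after a connection was made
    retry_codes=list(range(500, 600)),  # which status codes are retryable
    method_whitelist=['HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'],
    backoff_factor=1,                   # sleep 1, 2, 4, 8... seconds between tries
)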
@shazow
Member

shazow commented Oct 16, 2013

@kevinburke Could you aggregate the discussion in #245, #145 (comment) and other places into this bug description?

Would be nice to present some code examples of how we want the API to look.

Also you make a good point about the HTTP methods having different retry behaviour. I'd like for this to be configurable, maybe some kind of Retry(..., method_whitelist=['GET', 'PUT', 'DELETE']) which would of course have a sensible default, too.

@kevinburke
Contributor Author

Yeah, I was thinking a Retries object would be nice, especially because it mirrors the Timeout interface and there are a lot of things you could specify if you wanted to run wild with it, say the following:

  • connection_retries, the number of times to retry a connection error
  • what I'd like to call read_retries: the number of times to retry in a situation where a connection was made to the server but the request failed afterwards - a timeout, a closed connection, or an unacceptable HTTP status code, like 500
  • retry_codes, a list of integers representing HTTP status codes; could also have named constants for NON_200, 5XX_ERROR, etc. Would default to the 500 range
  • either method_whitelist, a list of methods, or retry_idempotent_requests, a boolean indicating whether to retry only idempotent HTTP methods or all of them
  • backoff_factor, some kind of multiplier to control how fast backoff occurs. Defaults to 1. The algorithm would be something like backoff_factor * (2 ** retry_attempt_number), so 1, 2, 4, 8...

@shazow
Member

shazow commented Oct 17, 2013

Sounds good.

Two comments:

  • I'm -1 on retry_idempotent_requests because idempotency is very subjective to the server implementation; not everyone does RESTful services. I still advocate for method_whitelist with a sensible default.
  • backoff_factor should default to 0, to mimic current behavior (it's also not our place to time.sleep(...) in people's apps by default).

@kevinburke
Contributor Author

Ok. I copied the "spec" into the issue description.

@kevinburke
Contributor Author

This would also be neat to write as a decorator, though this would not really be possible inside the library:

@retry(connect=3, read=3, methods=['GET'])
def make_request(method, url, headers, query_string, data):
    """ make the request """

@shazow
Member

shazow commented Oct 18, 2013

This would also be neat to write as a decorator, ...

@kevinburke Cute thought. Maybe something for https://github.com/shazow/unstdlib.py. :P

@shazow
Member

shazow commented Oct 18, 2013

So what does our retry object look like so far? I imagine it something like...

class Retry(object):
    DEFAULT_METHOD_WHITELIST = set([
        'HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'])

    DEFAULT_STATUS_WHITELIST = [range(100, 400), 408, 504]

    def __init__(self,
        total=None, redirect=5,
        error_total=5, error_timeout=None, error_connect=None, error_read=None,
        method_whitelist=None,   # None falls back to DEFAULT_METHOD_WHITELIST,
        status_whitelist=None,   # since `self` isn't available in defaults
        backoff_factor=0):
        self.method_whitelist = (method_whitelist if method_whitelist is not None
                                 else self.DEFAULT_METHOD_WHITELIST)
        # ... same fallback for status_whitelist, then store the rest.
        ...

The Retry constructor params should probably avoid mentioning "retry" and "retries" as it would be redundant. Also should be consistent whether we use singular or plural, suffix or prefix. I could be convinced in either direction. (timeout_errors instead of error_timeout?)

Presumably each retry count parameter would act as a maximum. That is, if I specify a total=5, and error_timeout=10, it'll still raise a MaxRetryError after 5 retries. (Any debate here?)

And finally, the main thing we want is to allow people to bring in their own retry logic easily. This means:

  1. Contained logic which can be overridden by extending our Retry class.
  2. Reasonable external API such that people can bring in their own Retry-like duck-typed classes altogether.

Perhaps the external API should be some method where we pass it an exception which represents the failure. This method can decide whether to raise a MaxRetryError or not. retry_obj.assert(e)? Or something else... Or maybe it should just return True/False instead of raising. Not sure.
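
As a sketch of that duck-typed interface, subclassing the Retry sketch above: note assert is a reserved word in Python, so this uses a hypothetical check_raise name, and the method and its signature are speculative:

import logging

from urllib3.exceptions import MaxRetryError, ReadTimeoutError

log = logging.getLogger(__name__)

class MyRetry(Retry):
    """Hypothetical duck-typed policy: the pool would hand each failure
    to check_raise(), which either raises or allows the retry."""

    def check_raise(self, pool, url, error):
        if isinstance(error, ReadTimeoutError):
            # Give up immediately on read timeouts.
            raise MaxRetryError(pool, url, reason=error)
        log.warning("retrying %s after %r", url, error)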

@piotr-dobrogost @alsroot @pasha-r Are we missing anything?

@kevinburke
Contributor Author

How about naming it RetryPolicy and having the kwarg be retry_policy?

Pros:

  • much clearer what the object represents for someone reading the code
  • no messing around with isinstance on the retries kwarg
  • avoid awkwardness of retries=Retry(5) or retries=Retries(foo=bar)

Cons:

  • No symmetry between retry and timeout util names
  • one param named retries and one named retry_policy

@shazow
Member

shazow commented Oct 19, 2013

  • much clearer what the object represents for someone reading the code

What would you assume a Retry object represents, if not some Retry state? By the same logic, Timeout should be TimeoutPolicy. IMO the suffix is superfluous given the context of where it's used.

  • no messing around with isinstance on the retries kwarg

It's just in one place and purely a convenience shortcut to building a simple Retry object.

Alternatively, what happens if someone provides both a retries int and a retry_policy object? I think that's even more confusing.

  • avoid awkwardness of retries=Retry(5) or retries=Retries(foo=bar)

This looks pretty sensible to me:

http.request('GET', 'google.com', retries=Retry(error_connect=3, backoff_factor=2))

Or better yet...

retry_policy = Retry(error_connect=3, backoff_factor=2, method_whitelist=['GET'])

...
http.request('GET', 'google.com', retries=retry_policy)

Certainly no worse than timeout=Timeout(...).

I'm still leaning towards Retry over RetryPolicy. I could be convinced to go with Retries, but I'm a fan of keeping things singular by default. Also consistent with Timeout which is singular.

@kevinburke
Contributor Author

Ok fair enough :)

@kevinburke
Contributor Author

A bit worried about the need for separate read and timeout retries, plus a total number, since different errors imply different things; but maybe this is because I can't foresee myself using them.

@shazow
Member

shazow commented Oct 20, 2013

I didn't foresee myself using anything other than total, either. If that was the case, there wouldn't be a need for this :)

See the linked GitHub issues for examples of people wanting different things.

Really I think we need a total and a map of {exception -> num_retries}, which could be one of the params we have people pass. But it would be annoying to have to build a map every time, so some sane shortcuts sound like a good idea. We can discuss the shortcuts after the fact if you think this is a better starting point.
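
For illustration, that map idea might look like this; the per_exception kwarg is hypothetical, though the exception classes are real urllib3 names:

from urllib3.exceptions import ConnectTimeoutError, ReadTimeoutError

# Hypothetical: per-exception retry budgets alongside a total cap.
retry_map = {
    ConnectTimeoutError: 3,  # the request never reached the server
    ReadTimeoutError: 0,     # never retry once the server has the request
}

retries = Retry(total=5, per_exception=retry_map)  # `per_exception` is made up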

@kevinburke
Contributor Author

So... the semantics of total are a little tricky.

  • If total_errors = 3 and connect_errors = 3, pretty straightforward, no conflicts.
  • If total_errors = 1 and connect_errors = 3, seems like total_errors should win.
  • However if total_errors = 1 and connect_errors = 0 and there is a connect error, which wins? Do we retry one time or raise?

@shazow
Member

shazow commented Oct 21, 2013

0 should be raise immediately, no retries.

@pasha-r

pasha-r commented Oct 21, 2013

Great design guys! I'm loving it :)

A couple of possible problem spots, though.

  1. 4XX HTTP error codes - if you get one, it usually means the same request won't succeed on a retry either, so it's not good to retry a 403'd request (at least not by default)
  2. when I get a 5XX, oftentimes I want to do something with the reply - log it, parse it, and if there was something specific in the reply I may reconsider retrying, etc.

Possible solutions.

  1. Defaults should be optimal for simple use cases. Like no POST in the method whitelist and no 300-499 codes in the status whitelist.
  2. Maybe there should be some point in the Retry class clearly designed for code injection?

@shazow
Member

shazow commented Oct 21, 2013

@pasha-r

  1. Defaults should be optimal for simple use cases. Like no POST in the method whitelist and no 300-499 codes in the status whitelist.

+1 on the codes_whitelist (maybe it should be status_whitelist). Could allow including (x)range objects, such as status_whitelist=[range(100,400), 408, 504] in Python 3. In fact, maybe this is a good sensible default? (Thoughts?)
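
One wrinkle with mixing range objects and bare codes: a plain `in` test against that list compares each element as a whole, so codes inside a range aren't matched. Membership would need a small helper; a sketch, with a hypothetical name:

def status_allowed(code, whitelist):
    """Hypothetical helper: True if `code` equals a bare status code or
    falls inside any range object in the whitelist."""
    return any(
        code in item if isinstance(item, range) else code == item
        for item in whitelist
    )

status_allowed(302, [range(100, 400), 408, 504])  # True (inside the range)
status_allowed(408, [range(100, 400), 408, 504])  # True (exact match)
status_allowed(500, [range(100, 400), 408, 504])  # False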

  2. Maybe there should be some point in the Retry class clearly designed for code injection?

How do you feel about extending the Retry class for your logging/parsing code? We'll make sure to have a nice method you could override. I want to avoid doing hook registries in urllib3 if we can help it.

@kevinburke,

Also another consideration: Does a redirect count as a retry? Probably should have a redirect value, too. Updated #260 (comment) accordingly. (Should we move this example into the original issue description?)

@shazow
Member

shazow commented Oct 21, 2013

Also, what happens when you do a POST while it's not whitelisted, but you get a 302? Does it count as a GET, since the 302 converts the request, so it performs the redirect and consumes a redirect retry? (I think so.)

Either way, this scenario needs to be well-documented. :)

@kevinburke
Contributor Author

Trying to summarize this...

  • +1 on having some kind of method to log a response, will add a dummy method that can be overridden.
  • I'd imagine each HTTP request going over the wire should count as a retry. We already decrement the retry counter for redirects, seems logical to continue this behavior. And agree on the need for documentation :)
  • 0 retries can raise immediately, sure, but my question still stands: if you pass in total_errors = 2 and connect_errors = 1 and get 1 connection error, does the total or the connect number win?

@shazow
Member

shazow commented Oct 21, 2013

  • 0 retries can raise immediately, sure, but my question still stands: if you pass in total_errors = 2 and connect_errors = 1 and get 1 connection error, does the total or the connect number win?

error_connect=1 means that we will retry once for a connect-related error. error_total=2 means we retry 2 times for errors of any kind. If there is 1 connection error, everything is fine.

If there are 2 connection errors, then error_connect=1 would prompt a MaxRetryError (maybe we need a MaxConnectRetryError subclass?).

If there is 1 connect error and 1 read error, everything is fine (error_total=2 allows for 2 error retries).

If there is 1 connect error and 2 read errors, then error_total=2 would cause a MaxRetryError.
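
A sketch of that bookkeeping; the class, the method, and the way it raises are illustrative, not a committed API (a kind-specific MaxRetryError subclass could replace the generic one, as suggested above):

from urllib3.exceptions import MaxRetryError

class ErrorBudget(object):
    """Illustrative counting: whichever limit is exceeded first wins."""

    def __init__(self, error_total=2, error_connect=1, error_read=None):
        self.limits = {'total': error_total,
                       'connect': error_connect,
                       'read': error_read}
        self.counts = {'total': 0, 'connect': 0, 'read': 0}

    def record(self, kind, error):
        """Count one error of `kind` ('connect' or 'read'); raise if any
        applicable limit is now exceeded. A limit of None means unlimited."""
        self.counts[kind] += 1
        self.counts['total'] += 1
        for name in (kind, 'total'):
            limit = self.limits[name]
            if limit is not None and self.counts[name] > limit:
                raise MaxRetryError(None, None, reason=error)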

@kevinburke
Contributor Author

So with that then the semantics become:

  • if the user specifies total_errors, ignore (or raise ValueError on) any of connect_errors, read_errors, redirect, etc. that are also passed in; those values become meaningless otherwise, if I've thought the problem through correctly.

@shazow
Member

shazow commented Jan 22, 2014

Nah, total_errors is the maximum of combined errors. If connect_errors or whatever is exceeded first, you get a respective connect error.

Another example:

Retry(total_errors=3, connect_errors=2, read_errors=2)

  • If you get 3 read errors, you get a ReadError-related exception.
  • If you get 3 connect errors, you get a ConnectError-related exception.
  • If, say, you get 2 read errors and 2 connect errors, then you get a total_errors-related exception.

Specifying total_errors is meaningful alongside other error maximums. No need for ValueError or what have you.

In the scenario where total_errors is larger than the combined other maximums, then yes: you will never see a total_errors-related exception. That's okay.
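
Walking that example through the illustrative ErrorBudget sketch from earlier:

err = ConnectionError("example failure")
budget = ErrorBudget(error_total=3, error_connect=2, error_read=2)

budget.record('read', err)     # 1 read error: fine
budget.record('read', err)     # 2 read errors: at the read limit, still fine
budget.record('connect', err)  # 3 errors total: at the total limit, fine
budget.record('connect', err)  # 4th error exceeds total_errors=3: raises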

@kevinburke
Contributor Author

So... I am becoming less confident in the interface as we've laid it out. I'm just not sure it's as intuitive as it could be, or that it solves the use cases I have in mind. Will try to think some more about this.

@shazow
Member

shazow commented Jan 22, 2014

Can you share these use cases? I'm just keeping in mind the use cases of the feature requests urllib3 has had over the years (see mentioned issues in above replies). :)

By default pool.request(..., retries=Retry(n)) works exactly the same way pool.request(..., retries=n) works today. I feel this is important and intuitive. If more granular constraints are required, they can be specified. Combined with the other maximums, total_errors acts as an upper bound, which is important when building clients for services or crawlers with request/time limits (i.e. I need to make sure I don't make more than N requests per hour).
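
That default equivalence, spelled out (hypothetical usage of the proposed API):

pool.request('GET', '/', retries=3)         # current integer form
pool.request('GET', '/', retries=Retry(3))  # proposed, behaves identically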

@kevinburke
Contributor Author

I had an idea to do something like https://gist.github.com/kevinburke/8565777, but reading that, it's not really intuitive either.

I'll keep thinking about it. I just want to make sure the interface makes it pretty clear what's going to happen in every case.

@shazow
Member

shazow commented Jan 22, 2014

Can you expand on what you feel is unclear in the scenarios I described?

@kevinburke kevinburke mentioned this issue Jan 29, 2014
@shazow
Member

shazow commented Jul 3, 2014

\o/

@shazow shazow closed this as completed Jul 3, 2014