
More granular retry support #260

Closed

kevinburke opened this issue Oct 16, 2013 · 26 comments

Comments

@kevinburke
Contributor

I'm not sure what the interface for this would look like, or even at what level this would be implemented, but essentially, more function-level control over retries would be awesome. This is roughly the behavior we build around a requests.request call:

  • For idempotent requests (GET, PUT, DELETE), we want to retry pretty much any response that isn't in the 2xx range, because these methods are safe to retry.
  • We want to retry all connection errors and connection timeouts - these indicate the request never made it to the other machine. We usually use exponential backoff here in the case of network failure.
  • In general, we don't want to retry failed POST requests, whether they were timeouts, closed connections, or 5xx errors - consider a customer getting billed multiple times, or duplicate accounts being created, or something else bad.

Again, this might not be appropriate to build in at the library level (maybe we need better primitives for this internally), but adding all this logic around our http requests leads to pretty gnarly code.
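
A minimal sketch of the kind of wrapper this describes, assuming requests is available; the helper name, retry counts, and backoff constants are all illustrative:

import time
import requests

IDEMPOTENT_METHODS = {'HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'}

def request_with_retries(method, url, retries=3, backoff_factor=0.5, **kwargs):
    """Illustrative wrapper: retry connection errors/timeouts with
    exponential backoff, and retry non-2xx responses only for
    idempotent methods."""
    for attempt in range(retries + 1):
        try:
            # ConnectTimeout subclasses ConnectionError in requests,
            # so this covers both failure modes described above.
            response = requests.request(method, url, **kwargs)
        except requests.ConnectionError:
            # The request never reached the server; always safe to retry.
            if attempt == retries:
                raise
        else:
            if 200 <= response.status_code < 300:
                return response
            if method.upper() not in IDEMPOTENT_METHODS or attempt == retries:
                # e.g. a failed POST: don't risk double-billing by retrying.
                return response
        time.sleep(backoff_factor * (2 ** attempt))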

Some kind of Retries object would be nice, especially because it mirrors the Timeout interface, and there are a lot of things you could specify if you wanted to run wild with it, say the following:

  • connection_retries, the number of times to retry a connection error
  • what I'd like to call read_retries: the number of times to retry in a situation where a connection was made to the server but the request failed afterwards - a timeout, a closed connection, or an unacceptable HTTP status code, like 500
  • retry_codes, a list of integers representing HTTP status codes; could also have named constants for NON_200, 5XX_ERROR, etc. Would default to the 500 range
  • method_whitelist, a list of HTTP methods to retry, defaults to HEAD, GET, PUT, DELETE, OPTIONS, TRACE, per the spec
  • backoff_factor, some kind of multiplier to control how fast backoff occurs. Defaults to 0 (no sleeping). The algorithm would be something like backoff_factor * (2 ** retry_attempt_number), so with a factor of 1: 1, 2, 4, 8...
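
A hypothetical construction using those parameters; the Retries name and every kwarg here are the proposal above, not a shipped API:

retries = Retries(
    connection_retries=3,               # retry connection errors up to 3 times
    read_retries=2,                     # retries after a connection was made
    retry_codes=list(range(500, 600)),  # which status codes are retryable
    method_whitelist=['HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'],
    backoff_factor=1,                   # sleep 1, 2, 4, 8... seconds between tries
)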
@shazow
Member

shazow commented Oct 16, 2013

@kevinburke Could you aggregate the discussion in #245, #145 (comment) and other places into this bug description?

Would be nice to present some code examples of how we want the API to look.

Also you make a good point about the HTTP methods having different retry behaviour. I'd like for this to be configurable, maybe some kind of Retry(..., method_whitelist=['GET', 'PUT', 'DELETE']) which would of course have a sensible default, too.

@kevinburke
Contributor Author

Yeah, I was thinking a Retries object would be nice, especially because it mirrors the Timeout interface and there are a lot of things you could specify if you wanted to run wild with it, say the following:

  • connection_retries, the number of times to retry a connection error
  • what I'd like to call read_retries: the number of times to retry in a situation where a connection was made to the server but the request failed afterwards - a timeout, a closed connection, or an unacceptable HTTP status code, like 500
  • retry_codes, a list of integers representing HTTP status codes; could also have named constants for NON_200, 5XX_ERROR, etc. Would default to the 500 range
  • either method_whitelist, a list of methods, or retry_idempotent_requests, a boolean indicating whether to retry only idempotent HTTP methods or all of them
  • backoff_factor, some kind of multiplier to control how fast backoff occurs. Defaults to 1. The algorithm would be something like backoff_factor * (2 ** retry_attempt_number), so 1, 2, 4, 8...

@shazow
Member

shazow commented Oct 17, 2013

Sounds good.

Two comments:

  • I'm -1 on retry_idempotent_requests because idempotency is very subjective to the server implementation; not everyone does RESTful services. I still advocate for method_whitelist with a sensible default.
  • backoff_factor should default to 0, to mimic current behavior (it's also not our place to time.sleep(...) in people's apps by default).

@kevinburke
Contributor Author

Ok. I copied the "spec" into the issue description.

@kevinburke
Contributor Author

This would also be neat to write as a decorator, though this would not really be possible inside the library:

@retry(connect=3, read=3, methods=['GET'])
def make_request(method, url, headers, query_string, data):
    """ make the request """

@shazow
Member

shazow commented Oct 18, 2013

This would also be neat to write as a decorator, ...

@kevinburke Cute thought. Maybe something for https://github.com/shazow/unstdlib.py. :P

@shazow
Member

shazow commented Oct 18, 2013

So what does our retry object look like so far? I imagine it something like...

class Retry(object):
    DEFAULT_METHOD_WHITELIST = set([
        'HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'])

    DEFAULT_STATUS_WHITELIST = [range(100, 400), 408, 504]

    def __init__(self,
        total=None, redirect=5,
        error_total=5, error_timeout=None, error_connect=None, error_read=None,
        method_whitelist=None,   # None falls back to DEFAULT_METHOD_WHITELIST,
        status_whitelist=None,   # since `self` isn't available in defaults
        backoff_factor=0):
        self.method_whitelist = (method_whitelist if method_whitelist is not None
                                 else self.DEFAULT_METHOD_WHITELIST)
        # ... same fallback for status_whitelist, then store the rest.
        ...

The Retry constructor params should probably avoid mentioning "retry" and "retries" as it would be redundant. Also should be consistent whether we use singular or plural, suffix or prefix. I could be convinced in either direction. (timeout_errors instead of error_timeout?)

Presumably each retry count parameter would act as a maximum. That is, if I specify a total=5, and error_timeout=10, it'll still raise a MaxRetryError after 5 retries. (Any debate here?)

And finally, the main thing we want is to allow people to bring in their own retry logic easily. This means:

  1. Contained logic which can be overridden by extending our Retry class.
  2. Reasonable external API such that people can bring in their own Retry-like duck-typed classes altogether.

Perhaps the external API should be some method where we pass it an exception which represents the failure. This method can decide whether to raise a MaxRetryError or not. retry_obj.assert(e)? Or something else... Or maybe it should just return True/False instead of raising. Not sure.
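
As a sketch of that duck-typed interface, subclassing the Retry sketch above: note assert is a reserved word in Python, so this uses a hypothetical check_raise name, and the method and its signature are speculative:

import logging

from urllib3.exceptions import MaxRetryError, ReadTimeoutError

log = logging.getLogger(__name__)

class MyRetry(Retry):
    """Hypothetical duck-typed policy: the pool would hand each failure
    to check_raise(), which either raises or allows the retry."""

    def check_raise(self, pool, url, error):
        if isinstance(error, ReadTimeoutError):
            # Give up immediately on read timeouts.
            raise MaxRetryError(pool, url, reason=error)
        log.warning("retrying %s after %r", url, error)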

@piotr-dobrogost @alsroot @pasha-r Are we missing anything?

@kevinburke
Contributor Author

How about naming it RetryPolicy and having the kwarg be retry_policy?

Pros:

  • much clearer what the object represents for someone reading the code
  • no messing around with isinstance on the retries kwarg
  • avoid awkwardness of retries=Retry(5) or retries=Retries(foo=bar)

Cons:

  • No symmetry between retry and timeout util names
  • one param named retries and one named retry_policy

@shazow
Member

shazow commented Oct 19, 2013

  • much clearer what the object represents for someone reading the code

What would you assume a Retry object represents, if not some Retry state? By the same logic, Timeout should be TimeoutPolicy. IMO the suffix is superfluous given the context of where it's used.

  • no messing around with isinstance on the retries kwarg

It's just in one place and purely a convenience shortcut to building a simple Retry object.

Alternatively, what happens if someone provides both a retries int and a retry_policy object? I think that's even more confusing.

  • avoid awkwardness of retries=Retry(5) or retries=Retries(foo=bar)

This looks pretty sensible to me:

http.request('GET', 'google.com', retries=Retry(error_connect=3, backoff_factor=2))

Or better yet...

retry_policy = Retry(error_connect=3, backoff_factor=2, method_whitelist=['GET'])

...
http.request('GET', 'google.com', retries=retry_policy)

Certainly no worse than timeout=Timeout(...).

I'm still leaning towards Retry over RetryPolicy. I could be convinced to go with Retries, but I'm a fan of keeping things singular by default. Also consistent with Timeout which is singular.

@kevinburke
Contributor Author

Ok fair enough :)

@kevinburke
Contributor Author

A bit worried about the need for separate read and timeout retries, plus a total number, since different errors imply different things; but maybe this is because I can't foresee myself using them.

@shazow
Member

shazow commented Oct 20, 2013

I didn't foresee myself using anything other than total, either. If that was the case, there wouldn't be a need for this :)

See the linked GitHub issues for examples of people wanting different things.

Really I think we need a total and a map of {exception -> num_retries}, which could be one of the params we have people pass. But it would be annoying to have to build a map every time, so some sane shortcuts sound like a good idea. We can discuss the shortcuts after the fact if you think this is a better starting point.
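
For illustration, that map idea might look like this; the per_exception kwarg is hypothetical, though the exception classes are real urllib3 names:

from urllib3.exceptions import ConnectTimeoutError, ReadTimeoutError

# Hypothetical: per-exception retry budgets alongside a total cap.
retry_map = {
    ConnectTimeoutError: 3,  # the request never reached the server
    ReadTimeoutError: 0,     # never retry once the server has the request
}

retries = Retry(total=5, per_exception=retry_map)  # `per_exception` is made up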

@kevinburke
Contributor Author

So... the semantics of total are a little tricky.

  • If total_errors = 3 and connect_errors = 3, pretty straightforward, no conflicts.
  • If total_errors = 1 and connect_errors = 3, seems like total_errors should win.
  • However if total_errors = 1 and connect_errors = 0 and there is a connect error, which wins? Do we retry one time or raise?

@shazow
Member

shazow commented Oct 21, 2013

0 should be raise immediately, no retries.

@pasha-r

pasha-r commented Oct 21, 2013

Great design guys! I'm loving it :)

A couple of possible problem spots, though.

  1. 4XX HTTP error codes - if you get one, it usually means the same request won't succeed on a retry either, so it's not good to retry a 403'd request (at least not by default)
  2. when I get a 5XX, oftentimes I want to do something with the reply - log it, parse it, and if there was something specific in the reply I may reconsider retrying, etc.

Possible solutions.

  1. Defaults should be optimal for simple use cases. Like no POST in the method whitelist and no 300-499 codes in the status whitelist.
  2. Maybe there should be some point in the Retry class clearly designed for code injection?

@shazow
Member

shazow commented Oct 21, 2013

@pasha-r

  1. Defaults should be optimal for simple use cases. Like no POST in the method whitelist and no 300-499 codes in the status whitelist.

+1 on the codes_whitelist (maybe it should be status_whitelist). Could allow including (x)range objects, such as status_whitelist=[range(100,400), 408, 504] in Python 3. In fact, maybe this is a good sensible default? (Thoughts?)
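
One wrinkle with mixing range objects and bare codes: a plain `in` test against that list compares each element as a whole, so codes inside a range aren't matched. Membership would need a small helper; a sketch, with a hypothetical name:

def status_allowed(code, whitelist):
    """Hypothetical helper: True if `code` equals a bare status code or
    falls inside any range object in the whitelist."""
    return any(
        code in item if isinstance(item, range) else code == item
        for item in whitelist
    )

status_allowed(302, [range(100, 400), 408, 504])  # True (inside the range)
status_allowed(408, [range(100, 400), 408, 504])  # True (exact match)
status_allowed(500, [range(100, 400), 408, 504])  # False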

  2. Maybe there should be some point in the Retry class clearly designed for code injection?

How do you feel about extending the Retry class for your logging/parsing code? We'll make sure to have a nice method you could override. I want to avoid doing hook registries in urllib3 if we can help it.

@kevinburke,

Also another consideration: Does a redirect count as a retry? Probably should have a redirect value, too. Updated #260 (comment) accordingly. (Should we move this example into the original issue description?)

@shazow
Member

shazow commented Oct 21, 2013

Also, what happens when you do a POST while it's not whitelisted, but you get a 302? Does it count as a GET, since the 302 converts the request, so it performs the redirect and consumes a redirect retry? (I think so.)

Either way, this scenario needs to be well-documented. :)

@kevinburke
Contributor Author

Trying to summarize this...

  • +1 on having some kind of method to log a response, will add a dummy method that can be overridden.
  • I'd imagine each HTTP request going over the wire should count as a retry. We already decrement the retry counter for redirects, seems logical to continue this behavior. And agree on the need for documentation :)
  • 0 retries can raise immediately, sure, but my question still stands: if you pass in total_errors = 2 and connect_errors = 1 and get 1 connection error, does the total or the connect number win?

@shazow
Member

shazow commented Oct 21, 2013

  • 0 retries can raise immediately, sure, but my question still stands: if you pass in total_errors = 2 and connect_errors = 1 and get 1 connection error, does the total or the connect number win?

error_connect=1 means that we will retry once for a connect-related error. error_total=2 means we retry 2 times for errors of any kind. If there is 1 connection error, everything is fine.

If there are 2 connection errors, then error_connect=1 would prompt a MaxRetryError (maybe we need a MaxConnectRetryError subclass?).

If there is 1 connect error and 1 read error, everything is fine (error_total=2 allows for 2 error retries).

If there is 1 connect error and 2 read errors, then error_total=2 would cause a MaxRetryError.
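
A sketch of that bookkeeping; the class, the method, and the way it raises are illustrative, not a committed API (a kind-specific MaxRetryError subclass could replace the generic one, as suggested above):

from urllib3.exceptions import MaxRetryError

class ErrorBudget(object):
    """Illustrative counting: whichever limit is exceeded first wins."""

    def __init__(self, error_total=2, error_connect=1, error_read=None):
        self.limits = {'total': error_total,
                       'connect': error_connect,
                       'read': error_read}
        self.counts = {'total': 0, 'connect': 0, 'read': 0}

    def record(self, kind, error):
        """Count one error of `kind` ('connect' or 'read'); raise if any
        applicable limit is now exceeded. A limit of None means unlimited."""
        self.counts[kind] += 1
        self.counts['total'] += 1
        for name in (kind, 'total'):
            limit = self.limits[name]
            if limit is not None and self.counts[name] > limit:
                raise MaxRetryError(None, None, reason=error)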

@kevinburke
Contributor Author

So with that then the semantics become:

  • if the user specifies total_errors, ignore (or raise ValueError on) any of connect_errors, read_errors, redirect, etc. that are also passed in; those values become meaningless otherwise, if I've thought the problem through correctly.

@shazow
Member

shazow commented Jan 22, 2014

Nah, total_errors is the maximum of combined errors. If connect_errors or whatever is exceeded first, you get a respective connect error.

Another example:

Retry(total_errors=3, connect_errors=2, read_errors=2)

  • If you get 3 read errors, you get a ReadError-related exception.
  • If you get 3 connect errors, you get a ConnectError-related exception.
  • If, say, you get 2 read errors and 2 connect errors, then you get a total_errors-related exception.

Specifying total_errors is meaningful alongside other error maximums. No need for ValueError or what have you.

In the scenario where total_errors is larger than the combined other maximums, then yes: you will never see a total_errors-related exception. That's okay.
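
Walking that example through the illustrative ErrorBudget sketch from earlier:

err = ConnectionError("example failure")
budget = ErrorBudget(error_total=3, error_connect=2, error_read=2)

budget.record('read', err)     # 1 read error: fine
budget.record('read', err)     # 2 read errors: at the read limit, still fine
budget.record('connect', err)  # 3 errors total: at the total limit, fine
budget.record('connect', err)  # 4th error exceeds total_errors=3: raises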

@kevinburke
Contributor Author

So... I am becoming less confident in the interface as we've laid it out. I'm just not sure it's as intuitive as it could be, or that it solves the use cases I have in mind. Will try to think some more about this.

@shazow
Member

shazow commented Jan 22, 2014

Can you share these use cases? I'm just keeping in mind the use cases of the feature requests urllib3 has had over the years (see mentioned issues in above replies). :)

By default pool.request(..., retries=Retry(n)) works exactly the same way pool.request(..., retries=n) works today. I feel this is important and intuitive. If more granular constraints are required, they can be specified. Combined with the other maximums, total_errors acts as an upper bound, which is important when building clients for services or crawlers with request/time limits (i.e. I need to make sure I don't make more than N requests per hour).
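
That default equivalence, spelled out (hypothetical usage of the proposed API):

pool.request('GET', '/', retries=3)         # current integer form
pool.request('GET', '/', retries=Retry(3))  # proposed, behaves identically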

@kevinburke
Contributor Author

I had an idea to do something like https://gist.github.com/kevinburke/8565777, but reading that, it's not really intuitive either.

I'll keep thinking about it. I just want to make sure the interface makes it pretty clear what's going to happen in every case.

@shazow
Member

shazow commented Jan 22, 2014

Can you expand on what you feel is unclear in the scenarios I described?

@kevinburke kevinburke mentioned this issue Jan 29, 2014
@shazow
Member

shazow commented Jul 3, 2014

\o/

@shazow shazow closed this as completed Jul 3, 2014