Add retry logic to requests #33

Merged
merged 9 commits into zalando-incubator:master on Oct 17, 2017

Conversation

Joneser
Contributor

@Joneser Joneser commented Oct 4, 2017

This is a first PR to get feedback on my approach

  • Use the retry library to wrap the promises as retriable
  • Add configurable function in order to perform transient error checks
  • Improve logging/tracking of retried requests
  • Add tests for retry logic

Fixes #18

@shuhei
Contributor

shuhei commented Oct 4, 2017

Nice work! I have a couple of questions:

  1. How will this interact with the circuit breaker? I expect one attempt to be counted as one call in CB. For example, a fault tolerant request that failed twice and made it with the third attempt should be counted as 2 failures and 1 success in CB.
  2. Do we need an additional public method for fault tolerant requests? I think the existing request method can handle retries using the retry count.
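
To make the accounting in point 1 concrete, here is a minimal sketch. It is not perron's implementation: `CountingBreaker` is a hypothetical stand-in that only tallies outcomes, and `requestWithRetries` is an illustrative name. The point is that every attempt reports to the breaker on its own.

```javascript
// Hypothetical stand-in for a circuit breaker: it only tallies outcomes.
class CountingBreaker {
  constructor() {
    this.successes = 0;
    this.failures = 0;
  }

  // Runs a single attempt and records whether it succeeded or failed.
  async run(attempt) {
    try {
      const result = await attempt();
      this.successes += 1;
      return result;
    } catch (error) {
      this.failures += 1;
      throw error;
    }
  }
}

// Illustrative helper: each attempt goes through the breaker separately.
async function requestWithRetries(breaker, attempt, retries) {
  let lastError;
  for (let i = 0; i <= retries; i += 1) {
    try {
      return await breaker.run(attempt);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// A request that fails twice and then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`attempt ${calls} failed`);
  return 'ok';
};

const breaker = new CountingBreaker();
const outcome = requestWithRetries(breaker, flaky, 2).then((result) => ({
  result,
  successes: breaker.successes,
  failures: breaker.failures,
}));
```

With `retries` set to 2, the breaker in this sketch ends up with 2 failures and 1 success.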

@shuhei
Contributor

shuhei commented Oct 4, 2017

Also:

  1. Shouldn't this provide a way for users to specify which errors are transient?
  2. Should existing options like dropRequestAfter be affected by retry?

@Joneser
Contributor Author

Joneser commented Oct 9, 2017

@shuhei

  1. With this implementation it will behave as you expect: if it fails twice and the third attempt succeeds, this will count as 2 failures and 1 success.
  2. Due to the use of the retry logic, I thought it would be easiest to just use a separate method instead of adding additional logic into the existing function. This also has the benefit of not affecting the existing users (not that it should, but just to be safe)
  3. I will add configuration options for the transient error configuration today.
  4. To be honest I'm not sure, I'll also investigate this today.

@grassator
Contributor

@Joneser regarding 2. I'm also with @shuhei — you can split the logic internally, but strive to keep a simple external API. Having a default retry count of 1 should result in the same behavior as we currently have.

@Joneser
Contributor Author

Joneser commented Oct 9, 2017

@grassator @shuhei No problem, I'll move it into the same function while I am adding the transient error configuration

@Joneser
Contributor Author

Joneser commented Oct 11, 2017

@shuhei @grassator I have now moved the retry logic into the same function and have defaulted the config to 0 retries, so that it behaves exactly as it did before.

For the transient error configuration, I decided on a function that can be passed in by the consumer to allow them to customise the configuration any way they want. I thought that would be easier than expecting lists of status codes or error strings. Perhaps this should just be a noop by default instead of what I have there?
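
As an illustration of the function-based approach, a consumer-supplied check might look like the sketch below. `retryOptions`, `retries` and `transientErrorCheck` are the names used in this thread; the error shape (`err.code`, `err.response.statusCode`) and the chosen codes are assumptions made for the example, not perron's documented error format.

```javascript
// Assumption: the error shape (`code`, `response.statusCode`) is illustrative
// and not taken from perron itself.
const TRANSIENT_CODES = new Set(['ECONNRESET', 'ETIMEDOUT']);

const retryOptions = {
  retries: 2,
  // Return true when the error looks temporary and the request may be retried.
  transientErrorCheck(err) {
    if (TRANSIENT_CODES.has(err.code)) {
      return true;
    }
    const status = err.response && err.response.statusCode;
    return status === 502 || status === 503 || status === 504;
  },
};
```

A single function covers both status codes and error strings, which is the flexibility argued for above.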

Does anyone have suggestions on how to approach the logging of the retries? Should we allow the user to pass in their own logger which will get called or some kind of aggregator? Perhaps we should pass the retry count for a request in the response/error object the same way as the timings?

@Joneser Joneser changed the title from "Add fault tolerant (retriable) requests" to "Add retry logic to requests" on Oct 11, 2017
@@ -1,5 +1,7 @@
'use strict';

/* eslint-disable */
Contributor

Is it possible to disable only a certain check and not whole eslint?

Contributor Author

Woops my bad, missed this!
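
For reference, ESLint supports narrower directives than a file-wide `/* eslint-disable */`. A minimal sketch follows; the rule names are only examples, since the thread does not say which rule was failing.

```javascript
// Turns off a single rule for the next line only; all other rules still apply.
// eslint-disable-next-line no-unused-vars
function onRetry(currentAttempt, err) {}

/* eslint-disable no-console */
// Only `no-console` is off in this region.
console.log('retry scheduled');
/* eslint-enable no-console */
```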

README.md Outdated
Occasionally target server can have high latency for a short period of time, or in the case of a stack of servers, one server can be having issues
and retrying the request will allow perron to attempt to access one of the other servers that currently aren't facing issues.

By default `perron` has retry logic implemented, but configured to perform 0 retries. Internally `perron` [node-retry](https://github.com/tim-kos/node-retry) to handle the retry logic
Contributor

probably missing "uses" after "Internally perron"

README.md Outdated
const catWatch = new ServiceClient({
hostname: 'catwatch.opensource.zalan.do',
// These are the default settings
retryOptions: {
Contributor

I think that README should provide useful example options, since that is what users will copy-paste by default. If I read the formula in the node-retry documentation right, the current config will result in a non-linear distribution between 200 and 300 ms, with more than half of the delays being 300 ms.

Contributor Author

@Joneser Joneser Oct 11, 2017

Any recommendations as to what you would prefer here? Currently this example is the same as the default retryOptions configured in perron (but with retries set to 1), so I should also update the defaults there if you believe that the current config is not appropriate.

Contributor

I would just change maxTimeout to 400 probably

Contributor Author

Ok no worries, I'll update :)
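
As a rough check of the reading above, the node-retry README gives the delay as `Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout)`. The sketch below evaluates that formula. `retryDelay` is a hypothetical helper, not part of node-retry, and `factor = 2` is node-retry's documented default rather than a value confirmed for perron (it does not affect the first retry, where the exponent is 0).

```javascript
// Delay before retry number `attempt` (0-based), following the formula in the
// node-retry README. `random` is passed in so the result is reproducible;
// with `randomize: true` node-retry draws it from the range 1 to 2.
function retryDelay(attempt, { minTimeout, maxTimeout, factor = 2 }, random = 1) {
  return Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout);
}

const original = { minTimeout: 200, maxTimeout: 300 };
const suggested = { minTimeout: 200, maxTimeout: 400 };

// First retry with the original settings: anything above 300 ms is clamped.
const clampedAtMidpoint = retryDelay(0, original, 1.5); // 300
const clampedAtTop = retryDelay(0, original, 2); // 300

// With maxTimeout raised to 400 the first retry uses the whole random range.
const unclampedTop = retryDelay(0, suggested, 2); // 400
```

Under these assumptions, every random multiplier of 1.5 or more is clamped to 300 ms with the original settings, which is why raising `maxTimeout` to 400 spreads the first retry more evenly.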

lib/client.js Outdated
randomize: true,
// this is the default function to check transient errors
// users should define their own check in order to handle errors as they require
transientErrorCheck: (err) => {
Contributor

Why shouldn't the default function be just () => true?

README.md Outdated
minTimeout: 200,
maxTimeout: 300,
randomize: true,
transientErrorCheck: (err) => {
Contributor

You don't need an arrow function here:

transientErrorCheck(err) {

lib/client.js Outdated
@@ -248,13 +273,27 @@ class ServiceClient {

return new Promise((resolve, reject) => this.breaker.run(
Contributor

It looks like my comment got lost somehow, but here I'm a little worried that you will call failure callback multiple times inside a single breaker.run call, which doesn't seem logically right and also might cause issues. Can we change that so each retry is also a breaker.run?

Contributor Author

@Joneser Joneser Oct 12, 2017

Sure, I'll look into restructuring it this afternoon.
Any thoughts on how we can provide information on the number of retries performed to the user of perron? I was thinking of possibly adding a value to the response/error object

Contributor

It might be OK, but so far I'm not sure what the best way is to provide information about each stage of the request. I would be open to any ideas, as error / info reporting is not really great in perron at the moment.

Contributor Author

After discussing with @shuhei I will add a callback function like onRetry that the consumer of perron can define in order to track the retry action how they like

lib/client.js Outdated
// it takes the error as an argument
transientErrorCheck(err) {
// by default don't retry if the circuit breaker is open
if (err.type === ServiceClient.CIRCUIT_OPEN) {
Contributor Author

I thought this would be a good idea to have by default, but again if you just want it to return true by default I can remove this

Contributor

How about renaming this as shouldRetry? Open circuit breaker is also literally transient.

Contributor Author

In relation to this naming, it would conflict with here: https://github.com/zalando-incubator/perron/pull/33/files#diff-50cfa59973c04321b5da0c6da0fdf4feR314

What would your naming preference be for the already defined shouldRetry?

Contributor

@shuhei shuhei Oct 16, 2017

The existing shouldRetry is just an internal variable, which matters less than public API. Now I think that the existing one is a result of retry module's decision, which may have already triggered a retry. So how about retrying, retryingNext or willRetry?

Contributor Author

Good point, will update

lib/client.js Outdated
* minTimeout?: number,
* maxTimeout?: number,
* randomize?: boolean
* },
Contributor

It would be nice to have transientErrorCheck and onRetry here.
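
One possible way to document the two callbacks is sketched below as a JSDoc typedef. The callback signatures are assumptions inferred from this review thread, and `RetryOptions` and `exampleRetryOptions` are illustrative names rather than identifiers from the PR.

```javascript
/**
 * Sketch of the documented option type with both callbacks included.
 * The callback signatures are assumptions based on this review thread.
 *
 * @typedef {Object} RetryOptions
 * @property {number} [retries]
 * @property {number} [minTimeout]
 * @property {number} [maxTimeout]
 * @property {boolean} [randomize]
 * @property {function(Error): boolean} [transientErrorCheck]
 * @property {function(number, Error): void} [onRetry]
 */

/** @type {RetryOptions} */
const exampleRetryOptions = {
  retries: 1,
  minTimeout: 200,
  maxTimeout: 400,
  randomize: true,
  transientErrorCheck: () => true,
  onRetry: () => {},
};
```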

lib/client.js Outdated
}, this.options.retryOptions);

if (this.options.retryOptions.minTimeout > this.options.retryOptions.maxTimeout) {
throw new Error('The `retryInterval` must be equal to or greater than the `minTimeout`');
Contributor

@shuhei shuhei Oct 16, 2017

maxTimeout instead of retryInterval?
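
With the option name corrected, the check could read as in this sketch. `validateRetryOptions` is a hypothetical helper used only to make the example self-contained; the diff above performs the check inline.

```javascript
// Illustrative helper: rejects a configuration whose upper bound is below
// its lower bound, and names the right option in the message.
function validateRetryOptions({ minTimeout, maxTimeout }) {
  if (minTimeout > maxTimeout) {
    throw new Error('The `maxTimeout` must be equal to or greater than the `minTimeout`');
  }
}
```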

lib/client.js Outdated
reject(error);
return;
}
onRetry(currentAttempt, error);
Contributor

Shouldn't it be currentAttempt + 1? The retry will be the next attempt.

Contributor Author

The first time onRetry is called, currentAttempt is 1 which I think is correct

lib/client.js Outdated
// it receives the current retry attempt and error as args
// eslint-disable-next-line
onRetry(currentAttempt, err) {
return currentAttempt;
Contributor

Why do you return this?

Contributor Author

It is just a default placeholder, do you think an empty function would be better?

Contributor

@shuhei shuhei Oct 16, 2017

Yes, if we return something here, people will think that they should also return something.

Contributor Author

Agreed

README.md Outdated
attempt the retries or not depending on the type of error. If the function returns true and the number of retries hasn't been exceeded, the request can be retried.

There is also an onRetry function which can be defined by the user of `perron`. This function is called every time a retry request will be triggered.
It is provided the currentAttempt, as well as the error that is causing the retry.
Contributor

It would be nice to refer to what currentAttempt means. Is it 0 or 1 for the first retry?

Contributor Author

The first time onRetry is triggered it is 1, I'll update the readme with more info.
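
The numbering described here can be illustrated with a small synchronous sketch. `runWithRetries` is a hypothetical helper, not perron's API; it only mirrors the convention that attempts are numbered from 1 and that `onRetry` receives the number of the attempt that just failed.

```javascript
// Illustrative synchronous sketch: `runWithRetries` is not perron's API.
// Attempts are numbered from 1, and `onRetry` receives the number of the
// attempt that just failed, so its first call sees 1.
function runWithRetries(attempt, retries, onRetry) {
  for (let currentAttempt = 1; ; currentAttempt += 1) {
    try {
      return attempt();
    } catch (err) {
      if (currentAttempt > retries) {
        throw err; // retries exhausted, no further onRetry call
      }
      onRetry(currentAttempt, err);
    }
  }
}

const seen = [];
let calls = 0;
const result = runWithRetries(
  () => {
    calls += 1;
    if (calls < 3) throw new Error('temporary failure');
    return 'ok';
  },
  2,
  (currentAttempt) => seen.push(currentAttempt)
);
```

In this run the request fails twice, so `onRetry` is called with 1 and then 2 before the third attempt succeeds.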

it('should perform the desired number of retries based on the configuration', (done) => {
let numberOfRetries = 0;
clientOptions.retryOptions = {
retries: 1,
Contributor

I'd put 2 or more just to make sure that it retries multiple times.

assert.equal(err.type, ServiceClient.CIRCUIT_OPEN);
done();
});
});
Contributor

How about adding a test for the transient error function?

Contributor Author

Done :)

lib/client.js Outdated
})
.catch(error => {
failure(); reject(error);
failure();
const shouldRetry = operation.retry(error);
Contributor

Ohh, by the way, this is already triggering retry! This should come after the transient error check.

Contributor Author

@Joneser Joneser Oct 17, 2017

Maybe something like this?

const canRetry = currentAttempt <= retries;
if (!shouldRetry(error) || !canRetry) {
    reject(error);
    return;
}
if(operation.retry(error)) {
    onRetry(currentAttempt, error);
}  

Contributor Author

@Joneser Joneser Oct 17, 2017

Or this probably makes more sense:

if (!shouldRetry(error)) {
    reject(error);
    return;
}
if(!operation.retry(error)) {
    reject(error);
    return;
}
onRetry(currentAttempt, error);

Contributor

Yeah, the second looks good to me.
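
The ordering agreed on here can be exercised in isolation with the sketch below. `createOperationStub` is a hypothetical stand-in for the node-retry operation (whose `retry(err)` returns `false` once the retries are used up), and `handleFailure` is an illustrative wrapper around the second snippet; neither name comes from the PR.

```javascript
// Hypothetical stand-in for the node-retry operation: `retry` schedules the
// next attempt and returns true while retries remain, false otherwise.
function createOperationStub(retriesLeft) {
  return {
    scheduled: 0,
    retry() {
      if (retriesLeft === 0) return false;
      retriesLeft -= 1;
      this.scheduled += 1;
      return true;
    },
  };
}

// Failure handler following the second proposal: ask the consumer first, and
// call `operation.retry` only when the error is considered retriable.
function handleFailure(error, currentAttempt, { operation, shouldRetry, onRetry, reject }) {
  if (!shouldRetry(error)) {
    reject(error);
    return;
  }
  if (!operation.retry(error)) {
    reject(error);
    return;
  }
  onRetry(currentAttempt, error);
}
```

Because `shouldRetry` runs first, an error the consumer refuses to retry never reaches `operation.retry`, so no retry is scheduled as a side effect.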

README.md Outdated
minTimeout: 200,
maxTimeout: 400,
randomize: true,
shouldRetry(err) {
Contributor

Thinking about it, just the error might not be enough. You might need info from original request as well to make a judgement (e.g. only retry GET requests)

Contributor Author

@Joneser Joneser Oct 17, 2017

Good point, I'll add the original request info as well
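
A `shouldRetry` that also looks at the original request might be written as in this sketch. The second argument and its `method` field are assumptions about how the request info would be passed, and the local `CIRCUIT_OPEN` constant stands in for `ServiceClient.CIRCUIT_OPEN` so that the example runs on its own.

```javascript
// Assumptions: the second argument carries the original request options and
// exposes the HTTP method as `req.method`; `CIRCUIT_OPEN` stands in for
// `ServiceClient.CIRCUIT_OPEN`.
const CIRCUIT_OPEN = 'CIRCUIT_OPEN';

const retryOptions = {
  retries: 1,
  minTimeout: 200,
  maxTimeout: 400,
  randomize: true,
  shouldRetry(err, req) {
    // Skip retries while the circuit breaker is open.
    if (err.type === CIRCUIT_OPEN) return false;
    // Only retry requests that are safe to repeat.
    return req.method === 'GET';
  },
};
```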

@Joneser
Contributor Author

Joneser commented Oct 17, 2017

@grassator @shuhei Ready for the next review :)

@grassator
Contributor

👍

@shuhei
Contributor

shuhei commented Oct 17, 2017

👍

@grassator grassator merged commit fd9a203 into zalando-incubator:master Oct 17, 2017
@grassator
Contributor

@Joneser Thank you for the contribution! I released 0.4.0-beta1 to NPM

@Joneser
Contributor Author

Joneser commented Oct 17, 2017

Thanks a million @grassator and @shuhei for the reviews!
