Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(networking)!: Fibonacci instead of fixed backoff #1006

Merged
merged 11 commits into from Jan 13, 2020

Conversation

lukesteensen
Copy link
Member

This is just a quick spike to start the discussion, not quite ready to be merged.

We currently use a fixed delay between retries, which is not ideal. Fibonacci backoff is a simple strategy that does pretty well in evaluations and is easy to implement with no added dependencies.

If we want to switch to this strategy, there are a few things we'll need to decide:

  1. Do we change the retry_backoff_secs config item to specify that it's a base unit of a more complex strategy?
  2. How should we cap the delay? The spike just hardcoded 10 seconds as the max, but this is something that could vary greatly between different users and different downstream systems. Would we add a max_backoff_secs config item to every relevant sink?

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
@binarylogic
Copy link
Contributor

I spoke to @lukesteensen about this offline and mentioned this AWS post on exponential backoffs + jitter.

@LucioFranco @a-rodin do either of you have inisghts or experience with this?

@binarylogic
Copy link
Contributor

Do we change the retry_backoff_secs config item to specify that it's a base unit of a more complex strategy?

Also, this might be a good time to introduce #1022. I'd vote for a retry_policy.strategy attribute that defaults to fibonacci. I think the value is probably pretty low for configurable strategies, but I assume there are scenarios where another strategy would be preferrable. Ex: the AWS post I previously linked to.

@binarylogic binarylogic changed the title enhancement: fibonacci instead of fixed backoff enhancement(networking): fibonacci instead of fixed backoff Oct 14, 2019
@ghost
Copy link

ghost commented Oct 15, 2019

I don't have experience with Fibonacci backoff, but I'm thinking could it make sense to provide ability to set not only initial backoff, but ability to skip steps?

My reasoning is that for exponential backoff we have two parameter to control, initial value and the exponent base. For Fibonacci numbers we can set only the initial value. But as an analog for the base it is possible to take not every number from the Fibonacci sequence, but instead only each nth number where n is a user-controlled parameter

@lukesteensen
Copy link
Member Author

I think the value is probably pretty low for configurable strategies, but I assume there are scenarios where another strategy would be preferable. Ex: the AWS post I previously linked to.

Yeah, the value of supporting a bunch of different strategies is likely very low. The post you linked is specifically about the case of optimistic concurrency control and conflicting updates to a database. I don't really expect that requests from Vector to an external system will ever be considered conflicting and cause correlated failures in the same way. We could do a little simulation of our own, but our model (service being up/down/overloaded) isn't as interesting or precise.

I don't have experience with Fibonacci backoff, but I'm thinking could it make sense to provide ability to set not only initial backoff, but ability to skip steps?

I see the simpler configuration as a benefit of fibonacci over exponential. We could definitely explore an option like this if we see some demand from users, but I don't expect many people to need that level of control.

@LucioFranco
Copy link
Contributor

But as an analog for the base it is possible to take not every number from the Fibonacci sequence, but instead only each nth number where n is a user-controlled parameter

Good idea 👍

Do we change the retry_backoff_secs config item to specify that it's a base unit of a more complex strategy?

what about initial_backoff_delay_secs?

How should we cap the delay? The spike just hardcoded 10 seconds as the max, but this is something that could vary greatly between different users and different downstream systems. Would we add a max_backoff_secs config item to every relevant sink?

I think it makes sense to add that config option. Would we want to default like we do now to max u64?

This overall looks great!

@binarylogic binarylogic changed the title enhancement(networking): fibonacci instead of fixed backoff enhancement(networking): Fibonacci instead of fixed backoff Nov 1, 2019
@binarylogic
Copy link
Contributor

Noting, we are waiting to merge this since it is a breaking change. We'll be bundling it with other configuration changes.

@binarylogic binarylogic added the meta: breaking change Anything that breaks backward compatibility. label Nov 6, 2019
@LucioFranco LucioFranco marked this pull request as ready for review January 13, 2020 20:41
@LucioFranco LucioFranco removed their request for review January 13, 2020 20:41
Signed-off-by: Lucio Franco <luciofranco14@gmail.com>
Signed-off-by: Lucio Franco <luciofranco14@gmail.com>
@LucioFranco
Copy link
Contributor

Ok @lukesteensen this has the rest of the changes if you can take a quick look.

@binarylogic I was having trouble adding a new field to batching_sink.rb, I am going to push now an update to the descriptions but will need some help changing the field names and adding a new one.

As a summary, the changes are as follows:

  • Instead of fixed delays between retries, we use the Fibonacci sequence.
  • retry_backoff_secs has been changed to retry_initial_backoff_secs since it only describes the first delay from first failed request and first retry. This is also the seed of the fib sequence.
  • A new retry_max_duration_secs option which specifies the largest delay between retries, current default is set to 10 seconds.

Signed-off-by: Lucio Franco <luciofranco14@gmail.com>
@LucioFranco LucioFranco added the needs: docs Needs documentation updates label Jan 13, 2020
Signed-off-by: binarylogic <bjohnson@binarylogic.com>
Signed-off-by: binarylogic <bjohnson@binarylogic.com>
Signed-off-by: binarylogic <bjohnson@binarylogic.com>
@binarylogic binarylogic removed the needs: docs Needs documentation updates label Jan 13, 2020
@binarylogic binarylogic changed the title enhancement(networking): Fibonacci instead of fixed backoff enhancement(networking)!: Fibonacci instead of fixed backoff Jan 13, 2020
Signed-off-by: Lucio Franco <luciofranco14@gmail.com>
Copy link
Contributor

@binarylogic binarylogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@binarylogic binarylogic merged commit efec952 into master Jan 13, 2020
@binarylogic binarylogic deleted the fib-backoff branch January 13, 2020 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: breaking change Anything that breaks backward compatibility.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants