Rework the retrier · talaia-labs/rust-teos@f6a60a9


Rework the retrier

This is an attempt to rework the retrier logic to simplify how it works
and make it less error prone. This is done by making the retry manager
object responsible for both:
1- adding new retriers and extending current ones
2- removing retriers when they finish their work

This way, we don't need a mutex to guard the retriers hashmap, and we are
sure that adding/extending retriers and removing them can never happen
at the same time, because only the retry manager does it, not the
individual retriers (i.e. retriers can't remove themselves from the retriers
hashmap).
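A minimal sketch of this ownership model, assuming hypothetical `TowerId`/`Locator` aliases and a stripped-down `Retrier` that only holds pending locators (not the plugin's actual types). The manager owns the `HashMap` directly, with no `Mutex`. Because both mutating methods take `&mut self`, the borrow checker already rules out an add and a remove running at the same time.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the plugin's tower id and appointment locator types.
type TowerId = u32;
type Locator = [u8; 16];

/// A retrier only holds the work for one tower; it never touches the map.
#[derive(Default, Debug)]
struct Retrier {
    pending_appointments: HashSet<Locator>,
}

/// The manager is the single owner of the map, so no Mutex is needed.
#[derive(Default)]
struct RetryManager {
    retriers: HashMap<TowerId, Retrier>,
}

impl RetryManager {
    /// Adds a new retrier for `tower_id`, or extends the existing one.
    fn add_pending(&mut self, tower_id: TowerId, locator: Locator) {
        self.retriers
            .entry(tower_id)
            .or_default()
            .pending_appointments
            .insert(locator);
    }

    /// Removes a retrier once it has finished (or failed).
    fn remove(&mut self, tower_id: TowerId) -> Option<Retrier> {
        self.retriers.remove(&tower_id)
    }
}

fn main() {
    let mut manager = RetryManager::default();
    manager.add_pending(1, [0; 16]);
    manager.add_pending(1, [1; 16]); // extends the existing retrier
    manager.add_pending(2, [0; 16]); // creates a new one
    assert_eq!(manager.retriers[&1].pending_appointments.len(), 2);
    assert!(manager.remove(2).is_some());
    assert_eq!(manager.retriers.len(), 1);
}
```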

The retry manager logic goes as follows:
1- drain the unreachable towers channel until it's empty, and store the pending appointments (locators, to be exact) in each retrier's pending appointments set.
2- remove any finished retriers (ones that succeeded and have no more pending appointments) and failed retriers (ones that failed to send their appointments).
3- start all the non-running retriers left after removing the failed and finished ones.
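The three steps above can be sketched as a single `tick` pass. This is an illustration under assumptions, not the plugin's code: `std::sync::mpsc` stands in for whatever async channel the plugin uses, `RetrierStatus` and its variants are hypothetical, and "starting" a retrier is reduced to flipping its status instead of spawning a task.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::mpsc::{self, Receiver};

type TowerId = u32; // hypothetical
type Locator = u64; // hypothetical

/// Hypothetical status a retrier reports back to the manager.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RetrierStatus {
    Stopped,
    Running,
    Failed,
}

struct Retrier {
    pending: HashSet<Locator>,
    status: RetrierStatus,
}

impl Retrier {
    fn new() -> Self {
        Retrier { pending: HashSet::new(), status: RetrierStatus::Stopped }
    }
    /// Finished: stopped with nothing left to send.
    fn is_finished(&self) -> bool {
        self.status == RetrierStatus::Stopped && self.pending.is_empty()
    }
    fn failed(&self) -> bool {
        self.status == RetrierStatus::Failed
    }
}

/// One pass of the manager loop; returns the towers whose retriers were started.
fn tick(rx: &Receiver<(TowerId, Locator)>, retriers: &mut HashMap<TowerId, Retrier>) -> Vec<TowerId> {
    // 1. Drain the channel until it is empty, adding/extending retriers.
    while let Ok((tower_id, locator)) = rx.try_recv() {
        retriers.entry(tower_id).or_insert_with(Retrier::new).pending.insert(locator);
    }
    // 2. Drop finished and failed retriers.
    retriers.retain(|_, r| !r.is_finished() && !r.failed());
    // 3. Start whatever is left and not already running.
    let mut started = Vec::new();
    for (tower_id, r) in retriers.iter_mut() {
        if r.status == RetrierStatus::Stopped {
            r.status = RetrierStatus::Running; // a real implementation would spawn a task here
            started.push(*tower_id);
        }
    }
    started.sort(); // HashMap iteration order is unspecified
    started
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut retriers = HashMap::new();
    tx.send((1, 10)).unwrap();
    tx.send((2, 20)).unwrap();
    let mut failed = Retrier::new();
    failed.status = RetrierStatus::Failed;
    retriers.insert(3, failed); // failed: will be removed
    retriers.insert(4, Retrier::new()); // finished (stopped, empty): will be removed
    assert_eq!(tick(&rx, &mut retriers), vec![1, 2]);
    assert!(!retriers.contains_key(&3) && !retriers.contains_key(&4));
}
```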

Retriers signal their status so that the retry manager can determine
which retriers to keep, which to remove, and which to restart.
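One way a retrier could signal its status without touching the manager's map is a small shared cell per retrier. The sketch below assumes an `Arc<Mutex<RetrierStatus>>` and uses plain threads in place of async tasks; `send_pending` and `start_retrier` are hypothetical helpers. The only lock guards a single retrier's status, not the retriers hashmap.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RetrierStatus {
    Stopped,
    Running,
    Failed,
}

/// Hypothetical delivery attempt: succeeds only if the tower is "up".
fn send_pending(tower_up: bool) -> Result<(), String> {
    if tower_up { Ok(()) } else { Err("tower unreachable".to_string()) }
}

/// Starts a retrier on its own thread. The status is set to Running before
/// the thread starts, so the manager never sees it as idle and restarts it.
fn start_retrier(tower_up: bool) -> (Arc<Mutex<RetrierStatus>>, thread::JoinHandle<()>) {
    let status = Arc::new(Mutex::new(RetrierStatus::Running));
    let shared = Arc::clone(&status);
    let handle = thread::spawn(move || {
        let outcome = match send_pending(tower_up) {
            Ok(()) => RetrierStatus::Stopped,
            Err(_) => RetrierStatus::Failed,
        };
        // The retrier only reports; removal is left to the manager.
        *shared.lock().unwrap() = outcome;
    });
    (status, handle)
}

fn main() {
    let (ok_status, h1) = start_retrier(true);
    let (bad_status, h2) = start_retrier(false);
    h1.join().unwrap();
    h2.join().unwrap();
    // The manager reads these signals to decide what to keep, remove or restart.
    assert_eq!(*ok_status.lock().unwrap(), RetrierStatus::Stopped);
    assert_eq!(*bad_status.lock().unwrap(), RetrierStatus::Failed);
}
```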

We also set the tower as unreachable when destroying the tower's retrier,
not after completing the backoff. This way, the tower is only marked as
unreachable once its retrier is destroyed, so a manual tower retry by the
user fails with an error until that happens.
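The ordering can be sketched as follows, assuming a simplified, hypothetical `TowerStatus` enum and a `manual_retry` check that only accepts towers flagged as unreachable (names are illustrative, not the plugin's API). The flag is written in the same step that destroys the retrier, so a manual retry issued while the retrier still exists is rejected with an error instead of being silently dropped.

```rust
use std::collections::{HashMap, HashSet};

type TowerId = u32; // hypothetical

/// Simplified, hypothetical tower states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TowerStatus {
    Reachable,
    TemporaryUnreachable,
    Unreachable,
}

struct Plugin {
    towers: HashMap<TowerId, TowerStatus>,
    active_retriers: HashSet<TowerId>,
}

impl Plugin {
    /// Manual retry is only accepted for towers flagged as Unreachable.
    fn manual_retry(&self, id: TowerId) -> Result<(), String> {
        match self.towers.get(&id) {
            Some(TowerStatus::Unreachable) => Ok(()),
            Some(status) => Err(format!("tower {id} is {status:?}, cannot retry yet")),
            None => Err(format!("unknown tower {id}")),
        }
    }

    /// Called by the retry manager when it destroys a failed retrier:
    /// the Unreachable flag is set here, not when the backoff ends.
    fn destroy_failed_retrier(&mut self, id: TowerId) {
        self.active_retriers.remove(&id);
        self.towers.insert(id, TowerStatus::Unreachable);
    }
}

fn main() {
    let mut plugin = Plugin { towers: HashMap::new(), active_retriers: HashSet::new() };
    plugin.towers.insert(1, TowerStatus::TemporaryUnreachable);
    plugin.active_retriers.insert(1);
    // Backoff is over but the retrier still exists: the manual retry errors out.
    assert!(plugin.manual_retry(1).is_err());
    plugin.destroy_failed_retrier(1);
    assert!(plugin.manual_retry(1).is_ok());
}
```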

If we were to set the tower status to unreachable after the backoff, manual
user retries could get discarded silently, without any error: the retrier
would set the tower state to unreachable too early, allowing the user to
request a manual retry, but that retry would never be carried out, since the
retry manager removes the retrier anyway because it failed to deliver its
pending appointments.


mariocynicys authored and sr-gi committed Sep 16, 2022

1 parent c5f9c3a commit f6a60a9

watchtower-plugin/src/main.rs

```diff
@@ -615,8 +615,8 @@ async fn main() -> Result<(), Error> {
                 60
             };
             tokio::spawn(async move {
-                RetryManager::new(state_clone)
-                    .manage_retry(max_elapsed_time, max_interval_time, rx)
+                RetryManager::new(state_clone, rx, max_elapsed_time, max_interval_time)
+                    .manage_retry()
                     .await
             });
             plugin.join().await
```
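The diff moves the channel and the backoff limits from `manage_retry` into `RetryManager::new`. A minimal synchronous sketch of that shape follows, under assumptions: the plugin state argument is left out, the limits are assumed to be `u16` seconds, `std::sync::mpsc` replaces the async channel, and `backoff_limits` is a hypothetical accessor. Capturing everything in the constructor lets the loop and any helper methods read it from `self`.

```rust
use std::sync::mpsc::{self, Receiver};
use std::time::Duration;

type TowerId = u32; // hypothetical
type Locator = u64; // hypothetical

/// Hypothetical manager: everything the loop needs is captured in `new`,
/// so `manage_retry` takes no arguments.
struct RetryManager {
    rx: Receiver<(TowerId, Locator)>,
    max_elapsed_time: Duration,
    max_interval_time: Duration,
}

impl RetryManager {
    fn new(rx: Receiver<(TowerId, Locator)>, max_elapsed_time: u16, max_interval_time: u16) -> Self {
        RetryManager {
            rx,
            max_elapsed_time: Duration::from_secs(max_elapsed_time.into()),
            max_interval_time: Duration::from_secs(max_interval_time.into()),
        }
    }

    /// Backoff limits any retrier started by this manager would use.
    fn backoff_limits(&self) -> (Duration, Duration) {
        (self.max_elapsed_time, self.max_interval_time)
    }

    /// Consumes the manager and runs until every sender is dropped.
    /// Returns how many (tower, locator) pairs it handled.
    fn manage_retry(self) -> usize {
        let mut handled = 0;
        for (_tower_id, _locator) in self.rx.iter() {
            // A real implementation would add/extend a retrier here.
            handled += 1;
        }
        handled
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let manager = RetryManager::new(rx, 3600, 60);
    tx.send((1, 42)).unwrap();
    tx.send((2, 7)).unwrap();
    drop(tx); // closing the channel lets manage_retry return
    assert_eq!(manager.backoff_limits(), (Duration::from_secs(3600), Duration::from_secs(60)));
    assert_eq!(manager.manage_retry(), 2);
}
```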
