Better handle runner concurrency #1197

adejanovski · 2022-05-31T16:29:34Z

The number of concurrent runners is currently limited by counting the runners in the local Reaper instance. Paused runs still hold a runner thread, which gets counted as well, so paused repairs can end up blocking other repairs.
Another problem is that different Reaper instances could start runs in a different order, so which runs get authorized depends on the instance.
This set of changes orders the runners by repair run creation date (the run id is a timeuuid) to prioritize older runs, and also takes the state of the run into account, so that paused runs let running ones be processed.
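As a rough illustration of the idea described above (not the code in this PR), the selection could be sketched as follows. The `Run` record, the `RunState` enum, `allowedRuns` and the `timeUuid` helper are all hypothetical names made up for this sketch; the only thing taken from the description is that runs are ordered by the timestamp embedded in their timeuuid and that paused runs do not take up a slot.

```java
import java.util.Comparator;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class RunnerConcurrencySketch {

    // Hypothetical, simplified run states; the real project may define more.
    enum RunState { RUNNING, PAUSED }

    // Hypothetical minimal view of a repair run: a timeuuid id plus its state.
    record Run(UUID id, RunState state) {}

    /**
     * Returns the ids of the runs allowed to proceed: paused runs are skipped,
     * the rest are ordered oldest first (by timeuuid timestamp) and capped.
     * The result does not depend on the input order, so every instance that
     * sees the same runs picks the same ones.
     */
    static List<UUID> allowedRuns(List<Run> runs, int maxConcurrent) {
        return runs.stream()
            .filter(r -> r.state() != RunState.PAUSED)
            .sorted(Comparator.comparingLong((Run r) -> r.id().timestamp()))
            .limit(maxConcurrent)
            .map(Run::id)
            .collect(Collectors.toList());
    }

    // Demo helper (assumption): builds a version 1 UUID carrying the given timestamp.
    static UUID timeUuid(long ts) {
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
        long lsb = 0x8000000000000000L | 1L; // IETF variant, arbitrary node bits
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        Run oldestButPaused = new Run(timeUuid(100L), RunState.PAUSED);
        Run olderRunning = new Run(timeUuid(200L), RunState.RUNNING);
        Run newerRunning = new Run(timeUuid(300L), RunState.RUNNING);

        List<UUID> allowed =
            allowedRuns(List.of(newerRunning, oldestButPaused, olderRunning), 1);

        // The paused run is ignored, so the older running run gets the single slot.
        System.out.println(allowed.equals(List.of(olderRunning.id()))); // prints true
    }
}
```

Note that `UUID.compareTo` does not order version 1 UUIDs chronologically, which is why the sketch compares `UUID.timestamp()` instead.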

max-melentyev · 2023-02-01T12:23:01Z

Is it possible that this prevents Reaper from running parallel repairs on different clusters? It looks like the old code checked existing runs, filtering them by cluster name, but now there is no filtering.

adejanovski · 2023-02-01T12:28:17Z

> Is it possible that this prevents Reaper from running parallel repairs on different clusters? It looks like the old code checked existing runs, filtering them by cluster name, but now there is no filtering.

Yes, that's the case. We need to address this so that the limit applies per cluster.

adejanovski · 2023-02-01T12:31:07Z

I created an issue to track this.
@max-melentyev, we happily accept PRs if you're willing to contribute the fix ;)

max-melentyev · 2023-02-01T15:12:22Z

I'll take a look and see if I can fix it.
Meanwhile, could you please review #1255 and #1267?

After thelastpickle#1197 active repairs in one cluster prevent repairs in others. This diff makes repairs in different clusters independent again.

adejanovski added 2 commits May 31, 2022 13:37

Improve handling of concurrency limits for repair runners

fd57d21

Include repair runner state in filter to handle concurrency

c23d823

adejanovski requested a review from adutra May 31, 2022 16:29

adutra approved these changes Jun 9, 2022

adejanovski merged commit dde0bba into master Jun 9, 2022

adejanovski mentioned this pull request Feb 1, 2023

Apply concurrent repair run limits per cluster #1269

Closed

max-melentyev mentioned this pull request Feb 1, 2023

Fix running concurrent repairs on different clusters #1270

Merged

adejanovski pushed a commit that referenced this pull request Feb 9, 2023

Fix running concurrent repairs on different clusters (#1270)

3bd8393

After #1197 active repairs in one cluster prevent repairs in others. This diff makes repairs in different clusters independent again.
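
To make the per-cluster behaviour discussed in this thread concrete, here is a minimal sketch of applying the concurrency limit per cluster rather than globally. It is not the code from #1270; the `Run` record, its fields and `allowedPerCluster` are hypothetical names chosen for this example, and a plain `createdAt` number stands in for the timeuuid ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerClusterLimitSketch {

    // Hypothetical minimal view of an active (non-paused) repair run.
    record Run(String id, String clusterName, long createdAt) {}

    /**
     * Walks the runs oldest first and keeps at most maxPerCluster run ids
     * for each cluster, so a busy cluster cannot use up another cluster's slots.
     */
    static Map<String, List<String>> allowedPerCluster(List<Run> activeRuns, int maxPerCluster) {
        Map<String, List<String>> allowed = new TreeMap<>();
        activeRuns.stream()
            .sorted(Comparator.comparingLong(Run::createdAt))
            .forEach(r -> {
                List<String> ids =
                    allowed.computeIfAbsent(r.clusterName(), k -> new ArrayList<>());
                if (ids.size() < maxPerCluster) {
                    ids.add(r.id());
                }
            });
        return allowed;
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
            new Run("a2", "cluster-a", 20L),
            new Run("a1", "cluster-a", 10L),
            new Run("b1", "cluster-b", 15L));

        // With a limit of 1, each cluster still gets its own oldest run.
        System.out.println(allowedPerCluster(runs, 1)); // prints {cluster-a=[a1], cluster-b=[b1]}
    }
}
```

With a single global limit of 1, only `a1` would be allowed here and `cluster-b` would have to wait, which is the regression reported above.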
