Better handle runner concurrency #1197

Merged
merged 2 commits into from Jun 9, 2022

adejanovski
Contributor

Reaper currently limits the number of concurrent runners by counting the runners in the local Reaper instance. Paused runs still hold a runner thread, which is counted as well, so paused repairs can block other repairs.
Another problem is that different Reaper instances could start runs in a different order, so which runs are authorized would depend on the instance.
This set of changes orders the runners by repair run creation date (the run ID is a timeuuid) to prioritize older runs, and also takes the state of each run into account, so that paused runs let running ones proceed.
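
To illustrate the idea described above, here is a minimal Python sketch, not the actual Reaper code (Reaper itself is written in Java). The names `RepairRun`, `timeuuid_from`, `may_proceed` and `max_concurrent` are hypothetical and exist only for this example. It assumes a run ID is a version-1 (time-based) UUID, so sorting by its embedded timestamp gives creation order, and that paused runs are simply left out of the count.

```python
import uuid
from dataclasses import dataclass


def timeuuid_from(timestamp: int, node: int = 0) -> uuid.UUID:
    """Build a version-1 UUID carrying the given 60-bit timestamp (illustration only)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0, node))


@dataclass(frozen=True)
class RepairRun:
    run_id: uuid.UUID  # timeuuid: its timestamp reflects creation time
    state: str         # e.g. "RUNNING" or "PAUSED" (hypothetical state names)


def may_proceed(candidate: RepairRun, all_runs: list, max_concurrent: int) -> bool:
    """Return True if the candidate is among the oldest non-paused runs."""
    # Paused runs are skipped so they cannot take a slot from a running one.
    active = [r for r in all_runs if r.state != "PAUSED"]
    # Older runs first: every instance computes the same order from the IDs alone.
    active.sort(key=lambda r: r.run_id.time)
    allowed = {r.run_id for r in active[:max_concurrent]}
    return candidate.run_id in allowed


runs = [
    RepairRun(timeuuid_from(100), "PAUSED"),
    RepairRun(timeuuid_from(300), "RUNNING"),
    RepairRun(timeuuid_from(200), "RUNNING"),
]
print([r.run_id.time for r in runs if may_proceed(r, runs, max_concurrent=1)])  # [200]
```

Because the decision depends only on run IDs and states, two instances looking at the same set of runs should authorize the same ones, regardless of the order in which each instance started its runner threads.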

@adejanovski adejanovski requested a review from adutra May 31, 2022 16:29
@adejanovski adejanovski merged commit dde0bba into master Jun 9, 2022
@max-melentyev
Contributor

Is it possible that this prevents reaper from running parallel repairs on different clusters? It looks like old code checked existing runs filtering them by cluster name but now there is no filtering.

@adejanovski
Contributor Author

> Is it possible that this prevents reaper from running parallel repairs on different clusters? It looks like old code checked existing runs filtering them by cluster name but now there is no filtering.

Yes, that's the case. We need to address this so that the limit applies per cluster.
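
As a rough sketch of what a per-cluster limit could look like (an assumption for illustration, not the fix that was eventually merged), the runs can be grouped by cluster name before the oldest non-paused ones are picked. The names `ClusterRun`, `allowed_run_ids` and `max_per_cluster` are hypothetical, and a plain integer stands in for the timeuuid run ID (smaller means older).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRun:
    run_id: int   # stand-in for a timeuuid; smaller means created earlier
    cluster: str  # name of the cluster the repair targets
    state: str    # e.g. "RUNNING" or "PAUSED" (hypothetical state names)


def allowed_run_ids(runs: list, max_per_cluster: int) -> set:
    """Pick the oldest non-paused runs independently for each cluster."""
    by_cluster = defaultdict(list)
    for run in runs:
        if run.state != "PAUSED":  # paused runs do not consume a slot
            by_cluster[run.cluster].append(run)

    allowed = set()
    for cluster_runs in by_cluster.values():
        cluster_runs.sort(key=lambda r: r.run_id)  # oldest first
        allowed.update(r.run_id for r in cluster_runs[:max_per_cluster])
    return allowed


runs = [
    ClusterRun(1, "cluster_a", "RUNNING"),
    ClusterRun(2, "cluster_a", "RUNNING"),
    ClusterRun(3, "cluster_b", "RUNNING"),
]
print(sorted(allowed_run_ids(runs, max_per_cluster=1)))  # [1, 3]
```

With a single global limit of 1, only run 1 would be authorized here; grouping by cluster lets run 3 on `cluster_b` proceed as well.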

@adejanovski
Contributor Author

I created an issue to track this.
@max-melentyev, we happily accept PRs if you're willing to contribute the fix ;)

@max-melentyev
Contributor

I'll take a look if I can fix it.
Meanwhile, could you please review #1255 and #1267?

max-melentyev added a commit to max-melentyev/cassandra-reaper that referenced this pull request Feb 1, 2023
After thelastpickle#1197 active repairs in one cluster prevent repairs in others.
This diff makes repairs in different clusters independent again.
adejanovski pushed a commit that referenced this pull request Feb 9, 2023
After #1197 active repairs in one cluster prevent repairs in others.
This diff makes repairs in different clusters independent again.