Fix issues when listing repair runs using state priorities #1307
Conversation
I have a few questions, issues/suggestions here. I haven't manually tested yet pending what you'd like to do about those.
I'm also confused about how the manual testing we did on the last PR failed to catch this issue.
The manual testing I did basically mirrored what was in the previous acceptance test. Do you know why that failed to catch the problems? I don't want to miss another issue this time around...
```diff
@@ -604,9 +604,9 @@ public Response abortRepairRunSegment(@PathParam("id") UUID repairRunId, @PathPa
    * @return all know repair runs for a cluster.
    */
   @GET
-  @Path("/cluster/{cluster_name}")
+  @Path("/cluster/{clusterName}")
```
Suggestion: I think this might be the wrong way around. Generally shouldn't URI query strings use snake case? Would it be better to change this in the UI?
Issue: Also, as this is a breaking change to the API I'm not sure what our approach should be here. Does the API have a versioning policy?
This one is a path param, so it doesn't matter that much. The query param from listRepairRuns, on the other hand, could indeed be a breaking change.
Let me see if I can turn this around.
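To illustrate the distinction being made here (with hypothetical URLs, not Reaper's exact routes): a path template name like `{cluster_name}` is only a server-side placeholder, so renaming it doesn't change what clients send, whereas a query parameter name is part of the client-visible URL, so renaming it breaks existing callers.

```java
// Hypothetical sketch of why a path param rename is safe but a query
// param rename is breaking. URLs are illustrative only.
public class ParamRename {
    public static void main(String[] args) {
        // Path template {cluster_name} -> {clusterName}: the URL a client
        // sends is identical before and after the rename.
        String pathStyle = "/repair_run/cluster/test";
        System.out.println(pathStyle);

        // Query parameter cluster_name -> clusterName: the client-visible
        // URL changes, so old clients would silently stop filtering.
        String oldQuery = "/repair_run?cluster_name=test";
        String newQuery = "/repair_run?clusterName=test";
        System.out.println(oldQuery.equals(newQuery)); // false
    }
}
```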
```diff
@@ -175,7 +175,7 @@ private void prepareStatements() {
   public int getSegmentAmountForRepairRun(UUID runId) {
     return (int) session
-        .execute(getRepairSegmentCountByRunIdAndStatePrepStmt.bind(runId))
+        .execute(getRepairSegmentCountByRunIdPrepStmt.bind(runId))
```
Question: It looks like this didn't actually have enough parameters bound; how was it ever running?
it was failing :)
I don't recall ever seeing it in the logs, and I know manual testing worked. Maybe that was on the other endpoint - weird.
```diff
@@ -415,16 +415,26 @@ Feature: Using Reaper
     Then reaper has no longer the last added cluster in storage
     ${cucumber.upgrade-versions}

   @sidecar
+  @current_test
```
I think you probably want to remove this before merging.
I totally do 👍
I've had a tough time testing this, partly because the current behaviour is so dependent on the order in which particular repair runs were added, as well as on whether all clusters or a specific cluster are filtered on.
When testing I:
- Built reaper from a clean state on this branch.
- Ran the new test.
- Switched to main, rebuilt and ran Reaper.
Looking at the assertions here to compare my results, I found that:
- Without any filtering on cluster, I got the expected number of paused repairs showing up (I didn't check that the number of ABORTED repairs was correct).
- When filtering on the `fake1` cluster, I got the expected display of repairs, both paused and aborted.
- When filtering on the `test` cluster, I got the expected paused repairs, but no aborted repairs, which is indeed wrong.
- When filtering on `fake2`, I got one paused and only 3 aborted repairs, where it should have been 1 and 9 respectively.
Given this behaviour on master, I am not convinced that the new acceptance test necessarily replicates the issues described in the ticket, since the ticket described no filtering on the clusters.
However, I am satisfied that we are diagnosing a bug, since we are missing aborted repairs which could be shown under the given limit settings. I have confirmed that this behaviour is resolved by the new branch, which (in my manual testing) leads to the expected results being displayed for all three clusters.
Once the `@current_test` tag is removed from the acceptance tests, this is ready to merge.
Fixes #1272
Several changes have been made to the `RepairRunResource.java` file, including updating the `getRepairRunsForCluster` method to use `clusterName` instead of `cluster_name` as a path parameter, and updating the `listRepairRuns` method to use `clusterName` instead of `cluster_name` as a query parameter. These changes fix bugs that seem to have existed for a while, but were hidden by repairs being listed in pure chronological order and by client-side filters applied in the UI.

In the `RepairSegmentDao.java` file, the `getSegmentAmountForRepairRun` method has been updated to use `getRepairSegmentCountByRunIdPrepStmt` instead of `getRepairSegmentCountByRunIdAndStatePrepStmt`, which was a bug.

A bug was also fixed in the `RepairRunDao.getRepairRunsForClusterPrioritiseRunning()` method, which was not actually applying a limit on the `flattenedUuids` list: `subList()` returns a view of the list, it doesn't modify in place the list it's applied to.

Additionally, this pull request includes updates to the acceptance tests to add fake clusters and verify that ongoing repairs are prioritized over finished ones when listing them. This checks that repair runs across multiple clusters are prioritized based on their state and that, depending on the filters applied, we get the expected runs displayed.
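The `subList()` pitfall behind the limit bug can be sketched in plain Java (the variable name mirrors the description above; the limit value and list contents are illustrative, not Reaper's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListPitfall {
    public static void main(String[] args) {
        List<Integer> flattenedUuids = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        int limit = 3;

        // Bug: subList() returns a view over the range; calling it and
        // discarding the return value leaves the original list untouched.
        flattenedUuids.subList(0, limit);
        System.out.println(flattenedUuids.size()); // still 5, limit not applied

        // Fix: use the returned sublist (here copied into a new list) so
        // the limit actually takes effect.
        flattenedUuids = new ArrayList<>(flattenedUuids.subList(0, limit));
        System.out.println(flattenedUuids.size()); // 3
    }
}
```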