Fix issues when listing repair runs using state priorities #1307

Merged
merged 6 commits into master from fix-repairs-display-multicluster on Jun 13, 2023

Conversation

@adejanovski adejanovski (Contributor) commented Jun 8, 2023

Fixes #1272

Several changes have been made to RepairRunResource.java, including updating the
getRepairRunsForCluster method to use clusterName instead of cluster_name as a path
parameter, and updating the listRepairRuns method to use clusterName instead of cluster_name
as a query parameter. These changes fix bugs that seem to have existed for a while, but were hidden by the fact that repairs were previously listed in purely chronological order, with client-side filters applied in the UI.
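
As a rough sketch of what the renamed parameters look like in JAX-RS terms (the class skeleton, resource path and method bodies below are illustrative only, not the actual Reaper source):

  import javax.ws.rs.GET;
  import javax.ws.rs.Path;
  import javax.ws.rs.PathParam;
  import javax.ws.rs.QueryParam;
  import javax.ws.rs.core.Response;

  @Path("/repair_run")
  public class RepairRunResourceSketch {

    // Path parameter renamed from {cluster_name} to {clusterName}.
    @GET
    @Path("/cluster/{clusterName}")
    public Response getRepairRunsForCluster(@PathParam("clusterName") String clusterName) {
      // ... fetch all repair runs for the given cluster ...
      return Response.ok().build();
    }

    // Query parameter renamed from cluster_name to clusterName.
    @GET
    public Response listRepairRuns(@QueryParam("clusterName") String clusterName) {
      // ... filter repair runs by cluster name when the parameter is provided ...
      return Response.ok().build();
    }
  }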

In the RepairSegmentDao.java file, the getSegmentAmountForRepairRun method has
been updated to use getRepairSegmentCountByRunIdPrepStmt instead of
getRepairSegmentCountByRunIdAndStatePrepStmt: the latter expects a segment state to be bound in addition to the run id, so calling it with only the run id was a bug.

A bug was also fixed in the RepairRunDao.getRepairRunsForClusterPrioritiseRunning() method, which was not actually applying a limit to the flattenedUuids list: subList() returns a new list rather than modifying the list it is called on, so its result has to be used instead of being discarded.
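
A minimal standalone illustration of that subList() pitfall (the variable name is borrowed from the description above; everything else is hypothetical):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.UUID;

  public class SubListPitfall {
    public static void main(String[] args) {
      List<UUID> flattenedUuids = new ArrayList<>();
      for (int i = 0; i < 10; i++) {
        flattenedUuids.add(UUID.randomUUID());
      }
      int limit = 3;

      // Buggy pattern: subList() returns a new view over the list and leaves
      // the original variable untouched, so this call changes nothing.
      flattenedUuids.subList(0, limit);
      System.out.println(flattenedUuids.size()); // still 10

      // Fix: actually use the returned list (or reassign the variable).
      List<UUID> limited = flattenedUuids.subList(0, Math.min(limit, flattenedUuids.size()));
      System.out.println(limited.size()); // 3
    }
  }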

Additionally, this pull request updates the acceptance tests to add fake
clusters and verify that ongoing repairs are prioritized over finished ones when listing them.
This checks that repair runs across multiple clusters are prioritized based on their state and that, depending on the filters applied, the expected runs are displayed.

@adejanovski adejanovski marked this pull request as ready for review June 8, 2023 17:08
@Miles-Garnsey Miles-Garnsey (Contributor) left a comment

I have a few questions and issues/suggestions here. I haven't manually tested yet, pending what you'd like to do about those.

I'm also confused how the manual testing we did on the last PR failed to catch this issue.

The manual testing I did basically mirrored what was in the previous acceptance test - do you know why that failed to catch the problems? I don't want to miss another issue this time around...

@@ -604,9 +604,9 @@ public Response abortRepairRunSegment(@PathParam("id") UUID repairRunId, @PathPa
   * @return all know repair runs for a cluster.
   */
  @GET
- @Path("/cluster/{cluster_name}")
+ @Path("/cluster/{clusterName}")
Miles-Garnsey (Contributor):

Suggestion: I think this might be the wrong way around. Generally shouldn't URI query strings use snake case? Would it be better to change this in the UI?

Issue: Also, as this is a breaking change to the API, I'm not sure what our approach should be here. Does the API have a versioning policy?

adejanovski (Contributor, Author):

This one is a pathParam, so it doesn't matter that much. The query param from listRepairRuns on the other hand could indeed be a breaking change.
Let me see if I can turn this around.
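
Purely as an illustration of one way such a query-parameter rename could be kept non-breaking (not necessarily what this PR ends up doing), a resource method could accept both spellings and prefer the new one; this is a hypothetical sketch, not the Reaper implementation:

  import javax.ws.rs.GET;
  import javax.ws.rs.Path;
  import javax.ws.rs.QueryParam;
  import javax.ws.rs.core.Response;

  @Path("/repair_run")
  public class ListRepairRunsCompatSketch {

    // Accept both the new and the legacy query parameter name, falling back
    // to the legacy spelling when the new one is not supplied.
    @GET
    public Response listRepairRuns(
        @QueryParam("clusterName") String clusterName,
        @QueryParam("cluster_name") String legacyClusterName) {
      String effectiveCluster = clusterName != null ? clusterName : legacyClusterName;
      // ... filter repair runs by effectiveCluster when it is non-null ...
      return Response.ok().build();
    }
  }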

@@ -175,7 +175,7 @@ private void prepareStatements() {

  public int getSegmentAmountForRepairRun(UUID runId) {
    return (int) session
-       .execute(getRepairSegmentCountByRunIdAndStatePrepStmt.bind(runId))
+       .execute(getRepairSegmentCountByRunIdPrepStmt.bind(runId))
Miles-Garnsey (Contributor):

Question: It looks like this didn't actually have enough parameters bound, how was it ever running?

adejanovski (Contributor, Author):

it was failing :)

Miles-Garnsey (Contributor):

I don't recall ever seeing it in the logs, and I know manual testing worked. Maybe that was on the other endpoint - weird.

@@ -415,16 +415,26 @@ Feature: Using Reaper
Then reaper has no longer the last added cluster in storage
${cucumber.upgrade-versions}

@sidecar
@sidecar
@current_test
Miles-Garnsey (Contributor):

I think you probably want to remove this before merging.

adejanovski (Contributor, Author):

I totally do 👍

Miles-Garnsey previously approved these changes Jun 13, 2023

@Miles-Garnsey Miles-Garnsey (Contributor) left a comment

I've had a tough time testing this, partly because the current behaviour is so dependent on the order in which particular repair runs were added, as well as on whether all clusters or a specific cluster is filtered on.

When testing I:

  1. Built reaper from a clean state on this branch.
  2. Ran the new test.
  3. Switched to main, rebuilt and ran Reaper.

Looking at the assertions here to compare my results, I found that:

  1. Without any filtering on cluster, I got the expected number of paused repairs showing up (I didn't check that the number of ABORTED repairs was correct).
  2. When filtering on the fake1 cluster, I got the expected display of repairs, both paused and aborted.
  3. When filtering on the test cluster, I got the expected paused repairs, but no aborted repairs, which is indeed wrong.
  4. When filtering on fake2, I got one paused and only 3 aborted repairs, where it should have been 1 and 9 respectively.

Given this behaviour on master, I am not convinced that the new acceptance test necessarily replicates the issues described in the ticket, since the issue was reported with no filtering on the clusters.

However, I am satisfied that we are diagnosing a bug, since we are missing aborted repairs that could be shown under the given limit settings. I have confirmed that this behaviour is resolved by the new branch, which (in my manual testing) leads to the expected results being displayed for all three clusters.

Once the @current_test tag is removed from the acceptance tests, this is ready to merge.

@adejanovski adejanovski merged commit 2bfef00 into master Jun 13, 2023
22 checks passed
@adejanovski adejanovski deleted the fix-repairs-display-multicluster branch June 13, 2023 17:06
Linked issue that may be closed by this pull request: Reaper web not displaying all running repairs