
Improve status reporting in UI #1262

Merged · 32 commits · Feb 20, 2023

Conversation

Miles-Garnsey
Contributor

The status page doesn't always show repairs which are actually running. We should sort the returned results according to which statuses are most "interesting" and show the most interesting rows first.

Fixes #1217

@Miles-Garnsey linked an issue Jan 11, 2023 that may be closed by this pull request
…r and state and then query the main `repair_runs` table using the found UUIDs.
@adejanovski (Contributor) left a comment

Here's the Cucumber scenario to verify the feature is correctly implemented:

@sidecar
  Scenario Outline: Verify that ongoing repairs are prioritized over finished ones when listing the runs
    Given that reaper <version> is running
    And reaper has no cluster in storage
    When an add-cluster request is made to reaper with authentication
    Then reaper has the last added cluster in storage
    And a new repair is added for "test" and keyspace "test_keyspace"
    And I add and abort 10 repairs for "test" and keyspace "test_keyspace2"
    Then when I list the last 10 repairs, I can see 1 repairs at "NOT_STARTED" state
    And when I list the last 10 repairs, I can see 9 repairs at "ABORTED" state
    When the last added cluster is deleted
    Then reaper has no longer the last added cluster in storage
  ${cucumber.upgrade-versions}

The 2 new steps to implement are:

And I add and abort ?? repairs for "???" and keyspace "???"
Then when I list the last ?? repairs, I can see ? repairs at "????" state
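
For illustration, the matching step definitions might be skeletons along these lines (a sketch only: the method names, regex captures, and bodies are assumptions rather than the final implementation, and the Cucumber annotation import depends on the Cucumber version in use):

  @And("^I add and abort (\\d+) repairs for \"([^\"]*)\" and keyspace \"([^\"]*)\"$")
  public void addAndAbortRepairs(int count, String clusterName, String keyspace) throws Throwable {
    // create `count` repair runs for the given cluster and keyspace, then abort them
  }

  @Then("^when I list the last (\\d+) repairs, I can see (\\d+) repairs at \"([^\"]*)\" state$")
  public void listRepairsAndCountState(int limit, int expected, String state) throws Throwable {
    // list the last `limit` repair runs and assert that `expected` of them are in `state`
  }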

…iority statuses.

Fix issue in MemoryStorage and add test for it.
Comment on lines 2899 to 2900
@And("^I add 11 and abort the most recent 10 repairs for cluster \"([^\"]*)\" and keyspace \"([^\"]*)\"$")
public void addAndAbortRepairs(String clusterName, String keyspace) throws Throwable {
Contributor

11 and 10 should be variables, so that we can reuse this step if needed with different numbers.

Suggested change
@And("^I add 11 and abort the most recent 10 repairs for cluster \"([^\"]*)\" and keyspace \"([^\"]*)\"$")
public void addAndAbortRepairs(String clusterName, String keyspace) throws Throwable {
@And("^I add (\\d+) and abort the most recent (\\d+) repairs for cluster \"([^\"]*)\" and keyspace \"([^\"]*)\"$")
public void addAndAbortRepairs(int nbRepairRuns, int abortedRepairRuns, String clusterName, String keyspace) throws Throwable {

Contributor Author

I've made the requested changes. Thinking on this, should this not be several steps? So perhaps the spec should be:

  • When I add 11 repairs
  • When I set the state on the most recent 10 repairs to ABORTED

That would better promote reusability I think.


RUNNERS.parallelStream().forEach(runner -> {
  Integer iter = 1;
  while (iter <= 11) {
Contributor

You seem to be creating just 10 repairs here, not 11.
Also I don't think you should use a parallel stream across runners, because each runner will create 10 runs here.
Just pick the first runner by using RUNNERS.get(0).callReaper(...) at line 2913, and remove the surrounding stream.
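
A rough sketch of that shape, for reference (addRepairRun here is a hypothetical helper standing in for the existing logic that goes through RUNNERS.get(0).callReaper(...); the real arguments live in the test harness):

  // Sequential loop over a single runner instead of RUNNERS.parallelStream().
  for (int i = 1; i <= 11; i++) {
    addRepairRun(RUNNERS.get(0), clusterName, keyspace); // hypothetical helper wrapping callReaper(...)
  }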

Contributor Author

I was wondering about that. The runners all run against the same cluster? Changed as suggested.

Contributor Author

RE creating too few repairs, with this logic I create 1,2,3,4,5,6,7,8,9,10,11 - which is 11 I'm pretty sure?

I used my fingers to count, so someone needs to rescind my math postgrad if I'm wrong about this.

@Miles-Garnsey Miles-Garnsey force-pushed the feature/better-status-reporting branch 2 times, most recently from d911355 to b2f6ac3 Compare January 19, 2023 02:14
@adejanovski (Contributor) left a comment

Almost there, I still have a few requests so that we don't create confusion amongst users.

final Collection<RepairRun> repairRuns = context.storage.getRepairRunsForCluster(clusterName, limit);
final Collection<RepairRun> repairRuns = context
    .storage
    .getRepairRunsForClusterPrioritiseRunning(clusterName, limit);
Contributor

You need to do the same on line 665. It's the method that lists repair runs across all registered clusters.
Currently it's still using the old getRepairRunsForCluster() method, which isn't optimized to prioritize running repairs.

Contributor Author

The problem is that on line 665, it is adding the runs from ALL clusters into a list. I'll need to re-sort the runs so that the statuses are in the right order irrespective of cluster. Once you let me know the precise way you want the sorting done, I might just encapsulate it in a function so that we aren't duplicating that logic in multiple places.

Contributor

Right, you'll need to merge lists from different clusters and re-sort to apply the limits.
FYI, repair ids are timeuuids, which makes it possible to apply the sorting on them directly using their time component.
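
For reference, a minimal sketch of that merge-and-sort, assuming getId() returns the run's timeuuid and that cluster1Runs, cluster2Runs and limit are placeholders supplied by the caller:

  import java.util.Comparator;
  import java.util.List;
  import java.util.stream.Collectors;
  import java.util.stream.Stream;

  // Merge the per-cluster lists, order newest-first by the timestamp embedded in the
  // version-1 UUID (timeuuid), then apply the overall limit.
  List<RepairRun> merged = Stream.concat(cluster1Runs.stream(), cluster2Runs.stream())
      .sorted(Comparator.comparingLong((RepairRun r) -> r.getId().timestamp()).reversed())
      .limit(limit)
      .collect(Collectors.toList());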

Contributor Author

This appears to be done and working now, although the cucumber test doesn't test against multiple clusters. I'm not sure if I should add a multi-cluster test to confirm that this endpoint does indeed work?

Contributor

I think that would be fairly hard to achieve with the limited resources we have at our disposal.
A mocked test wouldn't help much I guess, so we're left with manual testing for multi cluster 🤷

Miles-Garnsey and others added 3 commits February 14, 2023 14:49
Co-authored-by: Alexander Dejanovski <alex@thelastpickle.com>
@Miles-Garnsey
Contributor Author

I've created a new method on RepairRun, SortByRunState, which implements the logic you want using the existing function isTerminated(). I've then subbed that method in for MemoryStorage and inside the RepairRunResource calls (CassandraStorage uses different logic which already partitions the terminated/unterminated statuses, so I think that is OK, but please let me know if I'm wrong).

My only concern with this is that I don't like having methods on RepairRun, since it doesn't have many right now and is almost a model object (having few methods beyond a builder); is there a better place for this, do you think? Maybe even in RepairRunResource, but then the storage layer would need to call the presentation layer, which seems questionable too?

Maybe we shouldn't be doing the sorting in the storage layer at all since this type of sorting is purely a presentation layer concern (so perhaps implement in 'RepairRunResource' and leave responses from storage layers unsorted)? I'm open to feedback on the design here, as you know the patterns in this codebase better than me.
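
For illustration only, a helper along those lines might look roughly like this (a sketch that assumes RunState.isTerminated() as used elsewhere in this thread, not the exact method that was committed):

  import java.util.Comparator;
  import java.util.List;

  // Sort in place so that non-terminated runs (running/paused/not started) come before
  // terminated ones; Boolean ordering puts false (not terminated) first.
  public static void sortByRunState(List<RepairRun> runs) {
    runs.sort(Comparator.comparing(run -> run.getRunState().isTerminated()));
  }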

@Miles-Garnsey Miles-Garnsey force-pushed the feature/better-status-reporting branch 2 times, most recently from 2e09f7a to 093633c Compare February 16, 2023 05:13
@Miles-Garnsey
Contributor Author

Manual testing suggests that this isn't working again for some reason. I've tried flipping the ordering in these conditions, but neither ordering appears to give us the result we want in the UI:

  if (!o1.getRunState().isTerminated() && o2.getRunState().isTerminated()) {
    return 1;  // o2 appears first.
  } else if (o1.getRunState().isTerminated() && !o2.getRunState().isTerminated()) {
    return -1; // o1 appears first.

More concerning, I haven't seen the cucumber tests failing either way, so they don't appear to be detecting errors.

It may be the case that there is some ordering being applied in the UI, since I note that the repair runs are always ordered by start time (not creation time, which the UUID should give us, I think). I'll investigate further.

@Miles-Garnsey
Contributor Author

Manual testing confirms this PR works as intended. The outstanding questions are:

  1. Should we restructure the cucumber test functions so that starting and aborting repairs are separate functions?
  2. Should we make this a multicluster test to better cover the repair_runs/ endpoint (which queries all clusters and isn't currently fully covered by the e2e tests)?

adejanovski previously approved these changes Feb 17, 2023

@adejanovski (Contributor) left a comment

I have a suggestion to simplify the code a little bit.
No need to try testing multi-cluster stuff in cucumber; we won't have enough resources in GHA to do so.
Also, it's fine to keep create/abort in the same cucumber step for now. If we need it later, we can refactor it into two separate steps.

Comment on lines 971 to 978
for (RunState state :
    Arrays
        .stream(RunState.values())
        .filter(v ->
            Arrays.asList("RUNNING", "PAUSED", "NOT_STARTED")
                .contains(v.toString()))
        .collect(Collectors.toList())
) {
Contributor

suggestion (non-blocking): Do you think this could be simplified to the following?

for (String state : Arrays.asList("RUNNING", "PAUSED", "NOT_STARTED")) {

I'm not sure why we need the stream().filter().collect() here, but I could be missing something.

Contributor Author

We could, it would be simpler. I'll make this change.
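
As an aside, iterating the enum constants directly would avoid the string comparison entirely (a sketch, assuming the loop body only needs RunState values):

  import java.util.Arrays;

  // Only the "interesting" states, as enum constants rather than strings.
  for (RunState state : Arrays.asList(RunState.RUNNING, RunState.PAUSED, RunState.NOT_STARTED)) {
    // existing loop body, unchanged
  }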

@Miles-Garnsey Miles-Garnsey merged commit ed9c4da into master Feb 20, 2023
Successfully merging this pull request may close these issues:

Force running repairs to always show up in Running on Repairs page