
Topology change resilience for incremental repair #1235

Merged
merged 22 commits into master on Nov 16, 2022

Conversation

Miles-Garnsey
Contributor

@Miles-Garnsey Miles-Garnsey commented Oct 27, 2022

Ensure that, in an incremental repair, the replica list is updated on every repair run. This addresses cases where the node IPs change.

Fixes #1213

@Miles-Garnsey Miles-Garnsey marked this pull request as draft October 27, 2022 01:32
@Miles-Garnsey
Contributor Author

This is still a work in progress. I have written a test (which is passing), but it isn't clear to me that this should actually be working.

  1. On line 470 I make a call to clusterFacade.getRangeToEndpointMap without actually having called .connect.
  2. When extracting the token range from the segment, I only take the first and last tokens. I need to confirm that this still works when vnodes are present (given my limited understanding of what a segment is, I think it may be OK). A rough sketch of the idea follows below.
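For illustration only, here is a rough sketch of refreshing the replicas for a segment from a range-to-endpoint map. The `ClusterFacade` and `RingRange` shapes and the string-keyed map signature are assumptions made for the sketch, not Reaper's actual API; it also shows why the vnode question in point 2 matters, since a naive containment check only handles a segment that falls inside a single range.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;

// Rough sketch only: the ClusterFacade and RingRange shapes below are assumptions
// made for illustration, not Reaper's actual API.
public class ReplicaRefreshSketch {

  // Hypothetical stand-in for a repair segment's token range.
  static final class RingRange {
    final BigInteger start;
    final BigInteger end;

    RingRange(BigInteger start, BigInteger end) {
      this.start = start;
      this.end = end;
    }
  }

  // Hypothetical facade: token ranges (as [start, end] string pairs) mapped to
  // replica endpoints, analogous to the getRangeToEndpointMap call mentioned above.
  interface ClusterFacade {
    Map<List<String>, List<String>> getRangeToEndpointMap(String keyspace);
  }

  // Re-derive the replicas for a segment from its first and last tokens.
  static List<String> replicasForSegment(ClusterFacade facade, String keyspace, RingRange segment) {
    for (Map.Entry<List<String>, List<String>> entry
        : facade.getRangeToEndpointMap(keyspace).entrySet()) {
      BigInteger rangeStart = new BigInteger(entry.getKey().get(0));
      BigInteger rangeEnd = new BigInteger(entry.getKey().get(1));
      // Naive containment check: only correct when the segment sits inside a single
      // range, which is exactly the vnode question raised in point 2 above.
      if (rangeStart.compareTo(segment.start) <= 0 && rangeEnd.compareTo(segment.end) >= 0) {
        return entry.getValue();
      }
    }
    return List.of();
  }
}
```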

@Miles-Garnsey
Contributor Author

Miles-Garnsey commented Oct 28, 2022

Manual testing of this change suggests there is still room for improvement, but it does address the specific requirements in the ticket.

First, I tried starting the repair, pausing it, then doing a rolling restart on the cluster. I've included a log below showing the results. The problem there was that no coordinators could be found. test-dc1-reaper-94769775c-npd4n.log

But that problem is subtly different from the one we are trying to address here. The matter at hand is not a full rolling restart (where every IP changes) but the case where a single node's IP changes. I recommend creating another ticket to handle the more extreme case by redoing the DNS lookup, so that the list of potential coordinators is refreshed when an FQDN is used as the contact point (a minimal sketch of that idea follows below).
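As a minimal sketch of that follow-up idea (not Reaper code, and the service FQDN below is made up), refreshing the potential coordinators could come down to re-resolving the contact point with the JDK's resolver, keeping in mind that JVM DNS caching may serve stale results unless tuned.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Minimal illustration of re-resolving an FQDN contact point so the list of
// potential coordinators can be rebuilt after a full rolling restart.
public class ContactPointRefresh {

  static List<String> resolveContactPoints(String fqdn) throws UnknownHostException {
    List<String> ips = new ArrayList<>();
    // getAllByName returns every address record behind the name; freshness is
    // subject to the JVM's DNS cache settings (networkaddress.cache.ttl).
    for (InetAddress address : InetAddress.getAllByName(fqdn)) {
      ips.add(address.getHostAddress());
    }
    return ips;
  }

  public static void main(String[] args) throws UnknownHostException {
    // Hypothetical service name; substitute the real contact point.
    System.out.println(resolveContactPoints("test-dc1-service.test.svc.cluster.local"));
  }
}
```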

So to make this test more specific I created a new repair, paused it, took note of the host which the current segment was running against, and restarted just that node (leaving the other two in place).

I do still get the below error, but the repair appears to reschedule correctly:

java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
	at java.base/java.util.Objects.checkIndex(Objects.java:372)
	at java.base/java.util.ArrayList.get(ArrayList.java:459)
	at io.cassandrareaper.service.RepairRunner.isAllowedToRun(RepairRunner.java:267)
	at io.cassandrareaper.service.RepairRunner.run(RepairRunner.java:234)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
	at java.base/java.lang.Thread.run(Thread.java:829)

You'll note from the screenshot below that it does switch to a different coordinator for the second two segments, which I think is the behaviour we want.

[Screenshot: Screen Shot 2022-10-28 at 2.59.15 pm, showing the segments switching coordinator]

@Miles-Garnsey Miles-Garnsey marked this pull request as ready for review October 28, 2022 04:11
@Miles-Garnsey
Contributor Author

Miles-Garnsey commented Nov 8, 2022

We took this back to the drawing board because our JMX calls were not returning any way to identify the primary replica when doing range -> endpoint queries.

This new approach instead tracks the hostID associated with a given segment and looks up the current IP associated with it when starting each run. It then uses that IP as the coordinator to ensure we are hitting the same endpoint even if the IP has changed.
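As a rough sketch of that flow (names and signatures here are illustrative, not the actual Reaper classes): each segment keeps the coordinator's host ID, and the current endpoint is resolved from it just before the segment runs.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Illustrative sketch of the host ID -> current IP lookup described above.
// The ClusterFacade interface below is an assumption for the sketch; Cassandra
// itself exposes endpoint/host ID mappings over JMX.
public class HostIdCoordinatorSketch {

  interface ClusterFacade {
    // endpoint (IP) -> host ID, as reported by the cluster at run time
    Map<String, UUID> getEndpointToHostId();
  }

  // Resolve the segment's stored host ID back to whichever IP currently owns it.
  static Optional<String> currentEndpointForHostId(ClusterFacade facade, UUID segmentHostId) {
    return facade.getEndpointToHostId().entrySet().stream()
        .filter(entry -> segmentHostId.equals(entry.getValue()))
        .map(Map.Entry::getKey)
        .findFirst();
  }

  // The resolved IP is then used as the coordinator for the run, so the repair
  // keeps targeting the same node even after its IP has changed.
  static List<String> potentialCoordinators(ClusterFacade facade, UUID segmentHostId) {
    return currentEndpointForHostId(facade, segmentHostId)
        .map(List::of)
        .orElse(List.of());
  }
}
```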

…e correctly propagated into the potentialCoordinators so that repairs correctly pick them up.
Contributor

@adejanovski adejanovski left a comment


Awesome sauce @Miles-Garnsey! Approved ✅

@adejanovski adejanovski merged commit 7cc571b into master Nov 16, 2022
adejanovski pushed a commit that referenced this pull request Nov 16, 2022
Allows incremental repair to survive nodes changing IP address during the repair.
The hostID is now stored in each segment and the IP address is recomputed from it when the segment runs.

Successfully merging this pull request may close these issues.

Incremental Repair Stalls on Cassandra 4