Skip to content

Commit

Permalink
Another fix for `IllegalStateException: startTime must be set if segm…
Browse files Browse the repository at this point in the history
…ent is RUNNING or DONE`

 this time due to context.storage.updateRepairSegment() not properly setting startTime after coordinator.triggerRepair() returns 0 and segment is marked as DONE.

The segment's startTime is not set until in `processTriggeredSegment(..)`
  But in `runRepair()` there's a number of exit (return) points possible before this method is called (and without startTime being set).

ref: #291 (comment)
  • Loading branch information
michaelsembwever committed Jan 24, 2018
1 parent ff0214f commit 67d6721
Showing 1 changed file with 31 additions and 23 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -245,38 +245,50 @@ private boolean runRepair() {
.endTime(DateTime.now())
.build(segment.getRunId()));
repairRunner.killAndCleanupRunner();

context.storage.updateRepairSegment(
segment
.with()
.state(RepairSegment.State.DONE)
.startTime(DateTime.now())
.endTime(DateTime.now())
.build(segmentId));
return false;

return false;
}

LOG.debug("Enter synchronized section with segment ID {}", segmentId);
synchronized (condition) {
commandId = coordinator.triggerRepair(
segment.getStartToken(),
segment.getEndToken(),
keyspace,
validationParallelism,
tablesToRepair,
fullRepair,
repairUnit.getDatacenters());
try {
LOG.debug("Enter synchronized section with segment ID {}", segmentId);
synchronized (condition) {

if (commandId == 0) {
LOG.info("Nothing to repair for keyspace {}", keyspace);
context.storage.updateRepairSegment(
segment.with().coordinatorHost(coordinator.getHost()).state(RepairSegment.State.DONE).build(segmentId));
SEGMENT_RUNNERS.remove(segment.getId());
closeJmxConnection(Optional.fromNullable(coordinator));
return true;
segment.with().coordinatorHost(coordinator.getHost()).startTime(DateTime.now()).build(segmentId));

commandId = coordinator.triggerRepair(
segment.getStartToken(),
segment.getEndToken(),
keyspace,
validationParallelism,
tablesToRepair,
fullRepair,
repairUnit.getDatacenters());

if (0 != commandId) {
processTriggeredSegment(segment, coordinator);
} else {
LOG.info("Nothing to repair for segment {} in keyspace {}", segmentId, keyspace);

context.storage.updateRepairSegment(
segment.with().state(RepairSegment.State.DONE).endTime(DateTime.now()).build(segmentId));

SEGMENT_RUNNERS.remove(segment.getId());
closeJmxConnection(Optional.fromNullable(coordinator));
}
}
processTriggeredSegment(segment, coordinator);
closeJmxConnection(Optional.fromNullable(coordinator));
} finally {
LOG.debug("Exiting synchronized section with segment ID {}", segmentId);
}
closeJmxConnection(Optional.fromNullable(coordinator));
}
} catch (ReaperException e) {
LOG.warn("Failed to connect to a coordinator node for segment {}", segmentId, e);
Expand All @@ -290,15 +302,11 @@ private boolean runRepair() {
.histogram(MetricRegistry.name(SegmentRunner.class, "open-files"))
.update(getOpenFilesAmount());
}
LOG.debug("Exiting synchronized section with segment ID {}", segmentId);
return true;
}

private void processTriggeredSegment(final RepairSegment segment, final JmxProxy coordinator) {

context.storage.updateRepairSegment(
segment.with().coordinatorHost(coordinator.getHost()).startTime(DateTime.now()).build(segmentId));

repairRunner.updateLastEvent(
String.format("Triggered repair of segment %s via host %s", segment.getId(), coordinator.getHost()));

Expand Down

3 comments on commit 67d6721

@sergeyyegournov
Copy link

@sergeyyegournov sergeyyegournov commented on 67d6721 Jan 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelsembwever, I think this commit does not fix the #291

I still see the same behavior in case of "Nothing to repair for segment" gets logged - and that is probably because startTime is still not set.

LOG.info("Nothing to repair for segment {} in keyspace {}", segmentId, keyspace); context.storage.updateRepairSegment(segment.with().state(RepairSegment.State.DONE).endTime(DateTime.now()).build(segmentId)
So this results in

INFO   [2018-01-28 02:15:08,613] [cassteamcitychi:f6737830-03fa-11e8-bc86-f717f3d3716d:f6741473-03fa-11e8-bc86-f717f3d3716d] i.c.s.SegmentRunner - Nothing to repair for segment f6741473-03fa-11e8-bc86-f717f3d3716d in keyspace tvr
INFO   [2018-01-28 02:15:08,618] [cassteamcitychi:f6737830-03fa-11e8-bc86-f717f3d3716d] i.c.s.RepairRunner - Running segment for range (-8771981655974388929,-1468077576244025793]
ERROR  [2018-01-28 02:15:08,619] [cassteamcitychi:f6737830-03fa-11e8-bc86-f717f3d3716d:f6741473-03fa-11e8-bc86-f717f3d3716d] i.c.s.RepairRunner - Executing SegmentRunner failed
java.lang.IllegalStateException: if endTime is set, so must startTime be set
        at com.google.common.base.Preconditions.checkState(Preconditions.java:174)
        at io.cassandrareaper.core.RepairSegment$Builder.build(RepairSegment.java:202)
        at io.cassandrareaper.service.SegmentRunner.runRepair(SegmentRunner.java:282)
        at io.cassandrareaper.service.SegmentRunner.run(SegmentRunner.java:133)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

So I tried changing this to

context.storage.updateRepairSegment(
                  segment.with().state(RepairSegment.State.DONE)
                  .startTime(DateTime.now())
                  .endTime(DateTime.now()).build(segmentId));

and it solved the problem

@michaelsembwever
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the segment variable was not being updated with the new state on line 264.

thanks for the catch.
fix is in #323

@sergeyyegournov would you be able to test and verify the PR?

@sergeyyegournov
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelsembwever Thanks. Verified - I do not see the issue any longer

Please sign in to comment.