Metrics that would be good to have #207

elsmorian · 2017-09-25T10:08:16Z

After a chat with @adejanovski about whether Reaper exports metrics, he mentioned that this is currently in master but that not many metrics are exported yet, and suggested I open a GitHub issue with any requests. For us, useful metrics would be:

- Number of segments pending repair (so we can chart progress over time, similar to Cassandra's own Repair PendingTasks metric)
- Number of segments repaired per second (likely to be low for big repairs but still handy)
- Number of repair events postponed per second because of high load or repairs already running

elsmorian · 2017-10-10T10:48:50Z

Oh, and the current number of nodes in each data centre that are up or down, that would be super helpful!

rzvoncek · 2017-10-25T10:59:04Z

Hi!

I'll look into this one. And while I'm at it, I'll add two more metrics:

- Repair progress (per cluster): once plotted in a dashboard, a flat progress line will nicely show stalled repairs.
  - This is somewhat similar to the number of segments above; let me see which one (or both) to include.
- Time of last successful repair (per cluster): this will make it easy to spot missing repairs on a cluster.
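
To make the two proposed values concrete, here is a minimal sketch of how they might be derived per cluster. It is not Reaper's code: `RepairRun` and its fields are hypothetical names invented for illustration, and the pending-segment count is included only because it is the closely related metric requested above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RepairRun:
    """Hypothetical snapshot of one repair run; not Reaper's real data model."""
    cluster: str
    segments_total: int
    segments_done: int
    succeeded_at: Optional[float] = None  # epoch seconds; None if not finished successfully


def repair_progress(run: RepairRun) -> float:
    """Fraction of segments repaired, between 0.0 and 1.0."""
    if run.segments_total == 0:
        return 0.0
    return run.segments_done / run.segments_total


def segments_pending(run: RepairRun) -> int:
    """Segments still waiting to be repaired in this run."""
    return run.segments_total - run.segments_done


def seconds_since_last_repair(
    runs: Iterable[RepairRun], cluster: str, now: float
) -> Optional[float]:
    """Seconds since the most recent successful run on a cluster, or None if there is none."""
    finished = [
        r.succeeded_at
        for r in runs
        if r.cluster == cluster and r.succeeded_at is not None
    ]
    if not finished:
        return None
    return now - max(finished)
```

Charted over time, a `repair_progress` value that stops rising points to a stalled repair, and a `seconds_since_last_repair` value that keeps growing points to a cluster that is no longer being repaired.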

elsmorian · 2017-10-25T22:16:14Z

@rzvoncek 👍 totally agree on that, thanks for having a look into this!

rzvoncek · 2017-10-28T09:50:50Z

Hi.

I've ended up not adding the postpones metric, because there already is something similar:

"io.cassandrareaper.service.SegmentRunner.postpone.null.testcluster.keyspace1" : {
      "count" : 13
    },

The null part of the name should be the coordinator host, but for some reason it doesn't populate for me :-/.
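
For anyone who wants to chart the existing counter in the meantime, here is a rough sketch of pulling the postpone counts out of a metrics JSON dump. The key layout (prefix, then coordinator host, cluster, keyspace) is inferred only from the sample above, so treat it as an assumption; the function name and the well-formed sample object are made up for illustration. Splitting from the right keeps a dotted host such as an IP address in one piece, provided cluster and keyspace names contain no dots.

```python
import json

# Prefix taken from the sample metric name above.
PREFIX = "io.cassandrareaper.service.SegmentRunner.postpone."


def postpone_counts(metrics_json: str) -> dict:
    """Map (host, cluster, keyspace) to the postpone count found in a metrics dump."""
    metrics = json.loads(metrics_json)
    counts = {}
    for name, value in metrics.items():
        if not name.startswith(PREFIX):
            continue
        # Assumed layout after the prefix: <host>.<cluster>.<keyspace>
        parts = name[len(PREFIX):].rsplit(".", 2)
        if len(parts) != 3:
            continue  # name does not match the assumed layout
        host, cluster, keyspace = parts
        counts[(host, cluster, keyspace)] = value["count"]
    return counts


sample = json.dumps({
    "io.cassandrareaper.service.SegmentRunner.postpone.null.testcluster.keyspace1": {
        "count": 13
    }
})
print(postpone_counts(sample))
```

Since the host currently shows up as the literal string null, summing the counts per cluster and keyspace avoids depending on that field.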

elsmorian · 2017-11-01T15:11:35Z

Thanks for adding these in :)

rzvoncek self-assigned this Oct 25, 2017

rzvoncek pushed a commit that referenced this issue Oct 28, 2017

Add metrics for repair progress + time since last repair. Fixes #207.

e24a3a2

rzvoncek closed this as completed in ee37c5a Oct 31, 2017

rzvoncek added a commit that referenced this issue Oct 31, 2017

Merge pull request #256 from thelastpickle/radovan/207

c6723c7
