Incorrect values for millisSinceLastRepairForSchedule in prometheusMetrics #1454

rtib · 2024-01-03T09:35:33Z

I'm running Reaper in sidecar mode and tried to monitor it via Prometheus metrics. I've created a dashboard showing repair progress along with the time since the last scheduled repair. While repair progress works fine, time since last repair shows a correct value for only one schedule; the others are stuck at the epoch.

Reaper shows on the webui that all schedules have run within 7 days

and the schedules are still active

When looking at the prometheusMetrics endpoint, however, the values exported are wrong

However, this is not a prometheus-exporter issue; the Dropwizard report shows the same problem

Looking into it, I found that millis since epoch is the fallback value for a repair schedule if no repairs from this schedule have completed.

cassandra-reaper/src/server/src/main/java/io/cassandrareaper/service/RepairScheduleService.java

Line 190 in 0661688

    
           .orElse(DateTime.now().getMillis()); // Return epoch if no repairs from this schedule were completed

rtib · 2024-01-03T09:56:53Z

Taking a look at the database, the corresponding repair_run entry does have a valid end_time.

rtib · 2024-01-03T10:50:10Z

Digging a bit deeper revealed that the last_run field of repair_schedule_v1 is null for all but one entry. That makes millisSinceLastRepairForSchedule fall back to returning epoch.
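To illustrate the effect, here is a minimal sketch of the fallback pattern, not Reaper's actual code. The class `LastRepairFallbackSketch` and the method `millisSinceLastRepair` are hypothetical names, and it uses `java.time.Instant` rather than Joda-Time so that it runs on its own. It assumes the metric is computed from an optional end time of the last run, with "now" in epoch millis as the fallback, as in the `.orElse(DateTime.now().getMillis())` line quoted above.

```java
import java.time.Instant;
import java.util.Optional;

class LastRepairFallbackSketch {

  // Hypothetical helper: milliseconds elapsed since the last completed run.
  // When no run is recorded (e.g. last_run is null), it falls back to the
  // current time in epoch millis, so the result reads as "time since 1970".
  static long millisSinceLastRepair(Optional<Instant> lastRunEnd, Instant now) {
    return lastRunEnd
        .map(end -> now.toEpochMilli() - end.toEpochMilli())
        .orElse(now.toEpochMilli());
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-01-03T10:50:10Z");
    Instant twoDaysAgo = now.minusSeconds(2 * 24 * 3600);

    // Schedule with a recorded last run: a plausible elapsed time (two days in millis).
    System.out.println(millisSinceLastRepair(Optional.of(twoDaysAgo), now));

    // Schedule without a recorded last run: the fallback, decades' worth of millis.
    System.out.println(millisSinceLastRepair(Optional.empty(), now));
  }
}
```

Under these assumptions, every schedule whose last run is missing reports the same very large value, which would match a dashboard where only one schedule shows a sensible number.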

Nassz · 2024-01-03T10:55:06Z

This metric also does not work with multiple hosts (e.g. a 3-node cluster), which may explain why you have only one KS metric. Alternatively, you can use 7days - io_cassandrareaper_service_RepairRunner_millisSinceLastRepair :)

rtib · 2024-01-03T13:39:24Z

That would also contain manually started repairs, which is okay, but then it is hard to distinguish between schedules and manual runs.
