[docdb] Expose timeout metrics from the RPC layer #2196
Labels
area/docdb
YugabyteDB core features
kind/enhancement
This is an enhancement of an existing feature
priority/medium
Medium priority issue
Jira Link: DB-1595
In the absence of explicit application-level metrics, timeouts at the TS layer are probably the best available indicator of user impact. If we expose timeout counters from the top-level TS RPCs (read/write), and potentially also from the proxies (cql/ysql), we can use them as a high-level filter for system issues (back-pressure, network hiccups, etc.).
Furthermore, checking that these counters stay flat at 0 during rolling restarts would help validate that we provide zero-downtime operations!
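To make the ask concrete, here is a rough sketch of the idea, not a proposal for the actual implementation. It assumes one monotonically increasing timeout counter per RPC method and a snapshot/diff check that could run around a rolling restart. All names (`RpcTimeoutMetrics`, `record_timeout`, `timeouts_since`, the `"Read"`/`"Write"` method labels) are hypothetical and do not correspond to the existing metrics code.

```python
from collections import Counter


class RpcTimeoutMetrics:
    """Hypothetical per-method timeout counters (illustration only)."""

    def __init__(self):
        self._timeouts = Counter()

    def record_timeout(self, method: str) -> None:
        # Would be called by the RPC layer whenever a call to `method` times out.
        self._timeouts[method] += 1

    def snapshot(self) -> dict:
        # Copy of the current counter values, e.g. what a metrics endpoint would report.
        return dict(self._timeouts)


def timeouts_since(before: dict, after: dict) -> dict:
    """Return per-method counter increases between two snapshots (non-zero only)."""
    deltas = {m: count - before.get(m, 0) for m, count in after.items()}
    return {m: d for m, d in deltas.items() if d > 0}
```

For the rolling-restart validation, one would take a snapshot before the restart, another after it, and expect `timeouts_since(before, after)` to be empty.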