[Platform] Add metrics for xCluster replication #3820
rkarthik007 changed the title from "[YW] Add metrics for 2DC" to "[Platform] Add metrics for xCluster replication" on Jun 1, 2020
Status update: currently in design.
andrewc-dev pushed a commit that referenced this issue on Jul 6, 2020
Summary: Add a "Replication" tab to the universe overview, if enabled, and query for the replication lag metric as implemented in https://phabricator.dev.yugabyte.com/D8733. For now, show the latest value of the committed lag metric. Convert the metric value to human-readable units where possible, i.e. if the lag is large enough, show minutes or seconds instead of microseconds.

Test Plan: Note: the user's feature config must set `universes.details.replication: 'available'` for the Replication tab to appear. Go to the universe overview and then the Replication tab. Confirm that the page displays the metric for `tserver_async_replication_lag_micros` as set in the DocDB layer. Confirm in Prometheus that the value is correct.
Example of Replication page with lag 0: {F13767}
Example of metric number with large lag: {F13769}

Reviewers: ram, rahuldesirazu, sshevchenko
Reviewed By: sshevchenko
Subscribers: ui, jenkins-bot
Differential Revision: https://phabricator.dev.yugabyte.com/D8738
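The unit conversion described in the commit summary could look roughly like the sketch below. It is not the code from the referenced diff: the function name `format_lag`, the unit labels, and the two-decimal rounding are all assumptions made for illustration. The only detail taken from the commit message is that the raw metric is in microseconds and should be shown in seconds or minutes when large enough.

```python
def format_lag(lag_micros: float) -> str:
    """Render a lag given in microseconds using the largest unit that fits.

    Hypothetical helper: the name, labels, and rounding are illustrative only.
    """
    if lag_micros < 0:
        raise ValueError("lag must be non-negative")
    # (unit label, size of that unit in microseconds), largest unit first
    units = [
        ("min", 60_000_000),
        ("s", 1_000_000),
        ("ms", 1_000),
    ]
    for label, size in units:
        if lag_micros >= size:
            return f"{lag_micros / size:.2f} {label}"
    # Below one millisecond, keep the raw microsecond value
    return f"{lag_micros:.0f} us"
```

For example, `format_lag(2_500_000)` yields a value in seconds rather than a seven-digit microsecond count, which is the readability gain the summary is after.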
Phase 1 is mostly complete. We currently do not show the status on the destination cluster, only on the source cluster. This seems to be a limitation of the CDC architecture as currently implemented. @ndeodhar
andrewc-dev pushed a commit that referenced this issue on Jul 14, 2020
This is the master task for adding xCluster replication metrics to Platform. The following items need to be added:
Phase 1
✅ Add the cluster-wide max-lag metric to the Prometheus metrics exported by YugabyteDB
✅ Add the max lag metric as a metric graph into Platform
✅ Add a new tab for replication in Platform (universe details)
✅ Replication tab: show if the cluster is caught up or not (max lag is 0 if caught up).
✅ Replication tab: show by default on the source cluster. If not caught up, show the max lag as a time value (seconds, etc.).
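The "caught up" check in the Phase 1 items above can be sketched as follows. This is an illustration, not Platform's implementation. It assumes the lag is read from Prometheus with an instant query (`/api/v1/query`) over `tserver_async_replication_lag_micros`, the metric named in the linked commit, and that the standard JSON response has already been decoded into a dict. The function names are hypothetical, and so is the choice to treat an empty result as "unknown" rather than "caught up".

```python
from typing import Optional

# Metric name taken from the commit message referenced in this issue
METRIC = "tserver_async_replication_lag_micros"


def max_lag_query(metric: str = METRIC) -> str:
    """Build a PromQL expression for the cluster-wide maximum lag."""
    return f"max({metric})"


def max_lag_micros(response: dict) -> Optional[float]:
    """Return the largest sample in a Prometheus instant-query response.

    Returns None when the query produced no samples (assumption: no data
    means the state is unknown, not that replication is caught up).
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    samples = response["data"]["result"]
    # Each sample looks like {"metric": {...}, "value": [timestamp, "value"]}
    values = [float(sample["value"][1]) for sample in samples]
    return max(values) if values else None


def is_caught_up(response: dict) -> bool:
    """Per the checklist above: caught up means the max lag is 0."""
    return max_lag_micros(response) == 0
```

Taking the maximum on the client side as well as in the query keeps the helper usable if the expression is later changed to return one series per tserver.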
Phase 2 - v2.3
⬜️ Send alerts on high replication lag
⬜️ Ability to configure replication lag thresholds for sending alerts
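As a rough illustration of the Phase 2 items, configurable thresholds could be modeled as below. Nothing here comes from an existing design: `LagThresholds`, `alert_level`, and the two-level warning/critical split are hypothetical, and the real feature may well be expressed as Prometheus alerting rules instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LagThresholds:
    """Hypothetical per-universe alert settings, in microseconds."""
    warning_micros: int
    critical_micros: int

    def __post_init__(self) -> None:
        if not 0 <= self.warning_micros <= self.critical_micros:
            raise ValueError("expected 0 <= warning <= critical")


def alert_level(lag_micros: float, thresholds: LagThresholds) -> Optional[str]:
    """Return "critical", "warning", or None for the given max lag."""
    # Check the most severe level first so it takes precedence
    if lag_micros >= thresholds.critical_micros:
        return "critical"
    if lag_micros >= thresholds.warning_micros:
        return "warning"
    return None
```

Keeping thresholds in the same unit as the metric (microseconds) avoids a conversion step at evaluation time; a UI could still accept seconds and convert on save.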
Phase 3
⬜️ Give table level breakdown for the above in the replication status page
⬜️ Max lag in terms of op ID at cluster level
⬜️ Max lag per table in terms of op ID
cc: @bmatican @ramkumarvs @rkarthik007 @schoudhury
Aha! Link: https://yugabyte-test.aha.io/features/PLATFORM-644