
[DocDB] The statistic threads_running_thread_pool for YCQL shows incorrect value #13575

Closed
fritshoogland-yugabyte opened this issue Aug 11, 2022 · 7 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@fritshoogland-yugabyte

fritshoogland-yugabyte commented Aug 11, 2022

Jira Link: DB-3154

Description

When querying the metrics for the YCQL (12000) endpoint, I get the following values:

  • metrics endpoint: 18446744073709551609
  • prometheus-metrics endpoint: -7

These do not seem like realistic numbers, which gives the impression this statistic is broken.
I currently see these values with version 2.15.2.0-b83, but have witnessed them for quite some time.

@fritshoogland-yugabyte fritshoogland-yugabyte added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Aug 11, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Aug 11, 2022
@bmatican bmatican assigned rthallamko3 and unassigned bmatican Aug 11, 2022
@bmatican
Contributor

@rthallamko3, something to add to the tooling bucket. I'm also intrigued that this shows up now but didn't before; perhaps some components were moved around, leading to some reporting on the YCQL webserver?

@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Aug 11, 2022
@fritshoogland-yugabyte
Author

fritshoogland-yugabyte commented Aug 11, 2022 via email

@rthallamko3
Contributor

@yusong-yan, this is a good ramp-up item for metrics.

@fritshoogland-yugabyte
Author

This seems to have gone silent?

@yusong-yan
Contributor

This seems to have gone silent?

I already have the fix, and it is under review. Sorry for the delay!

@fritshoogland-yugabyte
Author

Sorry! Is there a way I could have found this out myself?

@bmatican
Contributor

@fritshoogland-yugabyte For work in progress, you'd have to check Phabricator, or be explicitly added to those diffs as a subscriber. See https://phabricator.dev.yugabyte.com/D19359

yusong-yan added a commit that referenced this issue Oct 20, 2022
Summary:
To track threads_started_category and threads_running_category correctly, rewrite ThreadCategoryTracker. The previous implementation could hold only one metric_entity across multiple servers: when a new metric entity arrived, the tracker replaced the old one with it. For example, after the tablet server was created, ThreadCategoryTracker tracked threads for yb.tabletserver; when the cql_server metric entity was created, the tracker stopped tracking yb.tabletserver and tracked only cql_server. This is the root cause of the negative threads_running_category value on the CQL server seen in GitHub issue #13575.

The new version of ThreadCategoryTracker registers a unique copy of these metrics for each server. It also uses function gauges, in the same style as the other metrics tracked in thread.cc, such as cpu_stime, threads_started, and voluntary_context_switches.

Test Plan: Tested on a local machine; both 12000/prometheus-metrics and 9000/prometheus-metrics report the same values for threads_running_category and threads_started_category.

Reviewers: amitanand, rsami, rthallam

Reviewed By: rsami, rthallam

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D19359