Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/mongodbreceiver] Added new mongodb metrics to achieve parity with Telegraf #37227

Merged
merged 24 commits into from
Mar 7, 2025

Conversation

chan-tim-sumo
Copy link
Contributor

@chan-tim-sumo chan-tim-sumo commented Jan 14, 2025

Description

  mongodb.queries_per_sec
  mongodb.inserts_per_sec
  mongodb.commands_per_sec
  mongodb.getmores_per_sec
  mongodb.deletes_per_sec
  mongodb.updates_per_sec
  mongodb.flushes_per_sec
  mongodb.active.writes
  mongodb.active.reads
  mongodb.wtcache.bytes.read
  mongodb.cache.dirty.percent
  mongodb.cache.used.percent
  mongodb.page_faults

Link to tracking issue

Testing

Screenshot examples:

Documentation

@chan-tim-sumo chan-tim-sumo requested a review from a team as a code owner January 14, 2025 20:18
@github-actions github-actions bot requested a review from schmikei January 14, 2025 20:19
@chan-tim-sumo chan-tim-sumo force-pushed the chan-tim_mongodbMetrics branch 2 times, most recently from 7a46972 to fcd84f3 Compare January 14, 2025 20:44
@chan-tim-sumo chan-tim-sumo force-pushed the chan-tim_mongodbMetrics branch from cb0771b to 9c33d36 Compare January 14, 2025 20:46
@chan-tim-sumo
Copy link
Contributor Author

actively fixing the build errors and merge conflicts :)

@chan-tim-sumo
Copy link
Contributor Author

chan-tim-sumo commented Jan 16, 2025

only failing check is lychee:

# Summary

| Status        | Count |
|---------------|-------|
| 🔍 Total      | 0     |
| ✅ Successful | 0     |
| ⏳ Timeouts   | 0     |
| 🔀 Redirected | 0     |
| 👻 Excluded   | 0     |
| ❓ Unknown    | 0     |
| 🚫 Errors     | 0     |
No links were found. This usually indicates a configuration error.
If this was expected, set 'failIfEmpty: false' in the args.

not really sure how to fix this, doesn't seem like anything is wrong? this check isn't required to pass to merge this PR.

@atoulme atoulme changed the title [receiver/mongodbreceiver] Added new mongodb metrics to acheive parity with Telegraf [receiver/mongodbreceiver] Added new mongodb metrics to achieve parity with Telegraf Jan 17, 2025
@chan-tim-sumo
Copy link
Contributor Author

chan-tim-sumo commented Jan 28, 2025

@schmikei saw that you're the code owner for this repo. can you take a look when you're free?

as well as a 2nd pr related to this (broken into 2 parts): #37517

thanks!

Copy link
Contributor

@schmikei schmikei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, thank you for your contribution! I think I want to deliberate on some metric naming and potentially consolidate the command rate metrics into a single one.

I also am getting torn between having some version checks for reading from WiredTiger as if my memory recollects it was added to the summaryStats in 4.4

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/15ab1613406a841bfdb4c7a969291807a08d8f7f/receiver/mongodbreceiver/scraper.go#L52

value_type: double
aggregation_temporality: delta
monotonic: false
mongodb.commands_per_sec:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like that mongodb.commands.rate could better consolidate metrics here and combine via a metric attribute

  • insert
  • getmore
  • delete
  • update
  • find

Pretty much the commands in this table I think could be consolidated into a single rate metric with unit {query}/s: https://www.mongodb.com/docs/manual/reference/command/#query-and-write-operation-commands

Copy link
Contributor Author

@chan-tim-sumo chan-tim-sumo Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is a good idea, the reason why I didn't go with this approach was because this is to match with telegraf metrics which has specific (naming?) rate metrics like:

queries per sec
inserts per sec
etc

while putting this into consolidate metric where I'm thinking something like mongodb.operation.rate similar to mongodb.operation.count and while the metric shows up, it'll be under an operation instead of its own metric (if that makes sense? including an screenshot here
here

and I'm not exactly sure if telegraf will know the difference between:

 1. mongodb.operation.rate with operation=query or operation=insert
 2. mongodb.queries.rate

just to be safe, I went with mongodb.queries_per_sec (or after your suggestion mongodb.queries.rate).

let me know what you think!

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@chan-tim-sumo chan-tim-sumo force-pushed the chan-tim_mongodbMetrics branch from 7b0d4fc to ab9fee4 Compare February 26, 2025 01:28
@chan-tim-sumo chan-tim-sumo force-pushed the chan-tim_mongodbMetrics branch from 4881ac6 to 4b7896b Compare February 26, 2025 01:42
Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to calculate the rate in this receiver for already emitted set of metrics? Calculating rate from cumulative seems like something that can easily be done on any metrics backend AFAIK. Even if it's needed to be done in the collector, why not make a generic solution for that?

I believe it must be even possible already by applying:

  1. Cumulative to delta processor
  2. Scaling the values deviding them by the collection interval using the scale_metric function in transform processor

@chan-tim-sumo
Copy link
Contributor Author

@schmikei if you can take a look at this PR again would appreciate it! the failures seems to be from the kafkatopicsobserver, not related to my change 🤔

@github-actions github-actions bot removed the Stale label Feb 26, 2025
@chan-tim-sumo
Copy link
Contributor Author

chan-tim-sumo commented Feb 26, 2025

Why do we need to calculate the rate in this receiver for already emitted set of metrics? Calculating rate from cumulative seems like something that can easily be done on any metrics backend AFAIK. Even if it's needed to be done in the collector, why not make a generic solution for that?

I believe it must be even possible already by applying:

  1. Cumulative to delta processor
  2. Scaling the values deviding them by the collection interval using the scale_metric function in transform processor

whoops just saw this, we commented at the same time yesterday 😆

the reason I was thinking of calculating the rate in this receiver was because while there are already a set of emitted metrics which I believe to be mongodb.operation.count which stores the total count for specific/all the operations. so was thinking of adding rate metrics to be in line with telegraf metrics

oh wow hm, didn't know that there was a cumulativetodeltaprocessor. not too familiar with this, will look into this today!

@chan-tim-sumo
Copy link
Contributor Author

taking a look at the readme

As a result, the cumulativetodelta processor may not work as expected
if used in a deployment of multiple collectors. When using this processor 
it is best for the data source to being sending data to a single collector.

not sure if this is a viable solution since this is best for a single collector 🤔

@chan-tim-sumo
Copy link
Contributor Author

bump @schmikei

@chan-tim-sumo
Copy link
Contributor Author

@atoulme can you take a look when you're free? thanks!

@atoulme
Copy link
Contributor

atoulme commented Mar 6, 2025

I can take a look, but please resolve conflicts first.

@chan-tim-sumo chan-tim-sumo force-pushed the chan-tim_mongodbMetrics branch from 69aec2d to ec4795b Compare March 6, 2025 20:42
Copy link
Contributor

@schmikei schmikei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this generally looks good, the only concern I have is around how useful these rate metrics are going to be reported as the collector process if restarted we might get gaps in data depending on the scrape_interval which is why I think we generally want to avoid adding rate calcs in receivers, but I don't feel like blocking your efforts for too much longer.

I see the value in these metrics and think its a good change, I don't have full approver status here so would love to get someone more familiar with the data model if they have a different perspective on it.

@chan-tim-sumo
Copy link
Contributor Author

I think this generally looks good, the only concern I have is around how useful these rate metrics are going to be reported as the collector process if restarted we might get gaps in data depending on the scrape_interval which is why I think we generally want to avoid adding rate calcs in receivers, but I don't feel like blocking your efforts for too much longer.

I see the value in these metrics and think its a good change, I don't have full approver status here so would love to get someone more familiar with the data model if they have a different perspective on it.

Also open to other comments as well from someone have a different perspective. If people wouldn't want to use these metrics, they can set it to false right? and thanks for taking a look Keith, appreciate it!

@atoulme atoulme merged commit db099f2 into open-telemetry:main Mar 7, 2025
156 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants