
[exporter][prometheusremotewrite] Additional metrics for the exporter #37912

Conversation

gotjosh
Contributor

@gotjosh gotjosh commented Feb 13, 2025

Description

Often, we have to do analysis of the current state of the exporter, which includes the current level of concurrency and throughput.

For concurrency, we currently have to rely on reviewing each individual configuration, which is time consuming. Here I'm proposing to expose it as a metric that we can keep a record of and observe over time.

For throughput, while `otelcol_processor_batch_batch_send_size_count` technically exists, it cannot be broken down by type (e.g. metrics vs. traces) and is not specific to the exporter. While I'm also interested in splitting `batch_send_size_count` by type (because it also includes OTLP batches and this metric is only for remote write), I think that is part of a separate conversation, and it doesn't hurt to have a single counter that's specific to remote write.

There's one more thing here (and please advise if you'd rather see a different PR for it): I'm also adding the `url` as an attribute to ALL the metrics, as I think it is key to identifying where the exporter is pointing.

This is technically my first contribution to the repo, so I'd appreciate any guidance on what else I need to do. Telemetry tests appear to be auto-generated, so I didn't add any to this PR, but please let me know otherwise.

Link to tracking issue

This is technically a prerequisite for the work I'm about to do as part of open-telemetry/opentelemetry-collector#12255, but it is not directly related and can go in on its own.

Signed-off-by: gotjosh <josue.abreu@gmail.com>


linux-foundation-easycla bot commented Feb 13, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

Member

@ArthurSens ArthurSens left a comment


I had a super quick look; I'll do proper testing and a full review tomorrow, but I already have one doubt.

```diff
@@ -101,6 +111,7 @@ func newPRWTelemetry(set exporter.Settings) (prwTelemetry, error) {
 		telemetryBuilder: telemetryBuilder,
 		otelAttrs: []attribute.KeyValue{
 			attribute.String("exporter", set.ID.String()),
+			attribute.String("endpoint", endpointURL.String()),
```
Member


If we add this attribute to otelAttrs, will this be added to all metrics of the exporter? Or just the new ones you're adding in the PR?

Contributor Author


Depends on what you mean by "all metrics of the exporter".

It'll certainly be added to all metrics defined specifically for this exporter, but not to the generic ones. In this case, we'll add it to:

  • otelcol_exporter_prometheusremotewrite_consumers
  • otelcol_exporter_prometheusremotewrite_sent_batch_count
  • otelcol_exporter_prometheusremotewrite_translated_time_series
  • otelcol_exporter_prometheusremotewrite_failed_translations

but not to other metrics such as `otelcol_exporter_queue_capacity` or `otelcol_exporter_send_failed_metric_points`.
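For illustration, here is a minimal sketch of that pattern, assuming the plain OTel Go metric API rather than the exporter's generated TelemetryBuilder; the meter name, metric name, unit, and helper names below are assumptions for the example, not the component's actual code.

```go
// Minimal sketch (not the exporter's generated code): exporter-scoped attributes
// are stored once and attached to every component-specific metric, while generic
// collector metrics are untouched.
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

type prwTelemetrySketch struct {
	sentBatches metric.Int64Counter
	otelAttrs   []attribute.KeyValue // shared by all exporter-specific metrics
}

func newPRWTelemetrySketch(exporterID, endpoint string) (*prwTelemetrySketch, error) {
	meter := otel.Meter("prometheusremotewrite") // assumed meter name for the example
	sentBatches, err := meter.Int64Counter(
		"otelcol_exporter_prometheusremotewrite_sent_batches",
		metric.WithDescription("Number of remote write batches sent"),
		metric.WithUnit("{batch}"),
	)
	if err != nil {
		return nil, err
	}
	return &prwTelemetrySketch{
		sentBatches: sentBatches,
		otelAttrs: []attribute.KeyValue{
			attribute.String("exporter", exporterID),
			attribute.String("endpoint", endpoint),
		},
	}, nil
}

func (t *prwTelemetrySketch) recordSentBatch(ctx context.Context) {
	// Every record call attaches the shared attribute set, so the endpoint label
	// shows up on the exporter-specific series only.
	t.sentBatches.Add(ctx, 1, metric.WithAttributes(t.otelAttrs...))
}

func main() {
	t, err := newPRWTelemetrySketch("prometheusremotewrite/example", "https://prw.example.com/api/v1/push")
	if err != nil {
		panic(err)
	}
	t.recordSentBatch(context.Background())
}
```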

@gotjosh gotjosh force-pushed the additional-metrics-for-remote-write-exporter branch from 942ecaf to d459f90 Compare February 14, 2025 09:22
the exporter automatically adds `total` at the end so I think this is good enough.
@gotjosh gotjosh marked this pull request as ready for review February 14, 2025 14:51
@gotjosh gotjosh requested a review from a team as a code owner February 14, 2025 14:51
@gotjosh
Contributor Author

gotjosh commented Feb 14, 2025

I think this is now ready for review. I have manually tested my changes and they look good from my side.

Does this need a changelog entry? I couldn't tell if we do changelog entries for new metrics.
[screenshots from manual testing]

Comment on lines 120 to 131
```diff
-	prwTelemetry, err := newPRWTelemetry(set)
+	telemetry, err := newPRWTelemetry(set, endpointURL)
```
Contributor Author


renamed this variable because it was shadowing a package name.

Member

@ArthurSens ArthurSens left a comment


I think we should add the remoteWriteURL as an attribute in a different PR; it's very easy to split, and we can easily roll back PRs in case something unrelated goes wrong.

Could someone with permissions trigger the CI?

Signed-off-by: gotjosh <josue.abreu@gmail.com>
@gotjosh
Contributor Author

gotjosh commented Feb 14, 2025

> I think we should add the remoteWriteURL as an attribute in a different PR; it's very easy to split, and we can easily roll back PRs in case something unrelated goes wrong.

Done in f535f8e

We don't need it anymore

Signed-off-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
@gotjosh
Contributor Author

gotjosh commented Feb 15, 2025

Thanks @ArthurSens, I would appreciate it if someone could please kick off the CI again. I believe I have addressed everything, now that I have figured out how to run the tests and linters locally.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
Member

@ArthurSens ArthurSens left a comment


LGTM

Signed-off-by: gotjosh <josue.abreu@gmail.com>
```
@@ -14,6 +22,14 @@ Number of translation operations that failed to translate metrics from Otel to Prometheus

| Unit | Metric Type | Value Type | Monotonic |
| ---- | ----------- | ---------- | --------- |
| 1 | Sum | Int | true |

### otelcol_exporter_prometheusremotewrite_sent_batch
```

Contributor Author

@gotjosh gotjosh Feb 19, 2025


Should I also re-add `count`, which I had previously removed thinking it was redundant?

Use count Instead of Pluralization for UpDownCounters

Contributor


This is a Counter, not an UpDownCounter, so the UpDownCounter guidance does not apply here.

> Metric names SHOULD NOT be pluralized, unless the value being recorded represents discrete instances of a countable quantity. Generally, the name SHOULD be pluralized only if the unit of the metric in question is a non-unit (like {fault} or {operation}).

This is a count of discrete instances of a countable quantity, so it should be plural.

Contributor Author


Done in dd3c466

Appreciate the feedback - can you please kick off the CI again?

Signed-off-by: gotjosh <josue.abreu@gmail.com>
```yaml
enabled: true
description: Number of configured workers to use to fan out the outgoing requests
unit: "{consumer}"
sum:
```
Contributor


Why is this a sum, and not a gauge? In my understanding this should be a value representing the current configuration, not a counter/sum of any form.

Contributor Author


I thought the same 😄 , but @dashpole and I had a conversation about it: #37912 (comment)

Let me know what you think.

Contributor


I think the part that was mostly throwing me off was using "Add" to set the value, but I see that it's only set on exporter initialization, so functionally it should be fine. 👍
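For reference, a minimal sketch of the pattern settled on here, assuming the plain OTel Go metric API rather than the exporter's generated TelemetryBuilder; the meter name and the exact instrument kind (shown as a non-monotonic sum) are assumptions for the example, with the real definition living in the component's metadata.yaml.

```go
// Minimal sketch of the "record once at initialization" pattern discussed above.
// Because Add is called exactly once at exporter startup, the resulting sum
// simply reports the configured number of consumers as a constant value.
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func recordConsumers(ctx context.Context, exporterID string, numConsumers int64) error {
	meter := otel.Meter("prometheusremotewrite") // assumed meter name for the example
	// Shown as an Int64UpDownCounter (non-monotonic sum); the actual monotonicity
	// follows the component's metadata.yaml.
	consumers, err := meter.Int64UpDownCounter(
		"otelcol_exporter_prometheusremotewrite_consumers",
		metric.WithDescription("Number of configured workers to use to fan out the outgoing requests"),
		metric.WithUnit("{consumer}"),
	)
	if err != nil {
		return err
	}
	// Set once at startup; the value never changes afterwards, so Add effectively
	// acts as a one-time "set".
	consumers.Add(ctx, numConsumers, metric.WithAttributes(
		attribute.String("exporter", exporterID),
	))
	return nil
}

func main() {
	if err := recordConsumers(context.Background(), "prometheusremotewrite/example", 5); err != nil {
		panic(err)
	}
}
```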

Contributor

@jmichalek132 jmichalek132 left a comment


LGTM.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
@andrzej-stencel andrzej-stencel added the "ready to merge" label Mar 3, 2025
@andrzej-stencel andrzej-stencel merged commit f2747bd into open-telemetry:main Mar 3, 2025
168 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 3, 2025
@gotjosh gotjosh deleted the additional-metrics-for-remote-write-exporter branch March 3, 2025 15:01
Labels: exporter/prometheusremotewrite, ready to merge (Code review completed; ready to merge by maintainers)