-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporter][prometheusremotewrite] Additional metrics for the exporter #37912
[exporter][prometheusremotewrite] Additional metrics for the exporter #37912
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a super quick look, I'll do proper testing and review tomorrow but I had one doubt already
@@ -101,6 +111,7 @@ func newPRWTelemetry(set exporter.Settings) (prwTelemetry, error) { | |||
telemetryBuilder: telemetryBuilder, | |||
otelAttrs: []attribute.KeyValue{ | |||
attribute.String("exporter", set.ID.String()), | |||
attribute.String("endpoint", endpointURL.String()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add this attribute to otelAttrs, will this be added to all metrics of the exporter? Or just the new ones you're adding in the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on what you mean with "all metrics of the exporter".
It'll certainly be added to all metrics defined specifically for the exporter but not the generic ones, in this case we'll add them to:
otelcol_exporter_prometheusremotewrite_consumers
otelcol_exporter_prometheusremotewrite_sent_batch_count
otelcol_exporter_prometheusremotewrite_translated_time_series
otelcol_exporter_prometheusremotewrite_failed_translations
but not to other metrics such asotelcol_exporter_queue_capacity
or otelcol_exporter_send_failed_metric_points
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Often, we have to do anylsis of the current state of the exporter which includes the current level of concurrency and throughput. For the concurrency, we need to rely on revising each individual configuration which is time consuming, here I'm proposing to have it as a metric that we can keep a record of and see it through observability. For the throughput, while tecnically `otelcol_processor_batch_batch_send_size_count` exists, this is not divisible by type (e.g. metrics vs traces) and is not specific to exporter. Whilst I'm also interested in splitting `batch_send_size_count` by type (because that also includes OTLP batches and this metric is only for RW), I think this is part of a separate conversation and it doesn't hurt to have the single counter that's specific for remote write. There's one more thing here (and please advise if you'd rather see a different PR for it) - I'm also adding the `url` as attribute to _ALL_ the metrics as I think this is key to identify where the exporter is pointing to. This is tecnically my first contribution to the repo so I'd appreciate any guidiance on what else can I need to do. Telemetry tests appear to be auto generated so I didn't add any to this PR but please let me know otherwise. Signed-off-by: gotjosh <josue.abreu@gmail.com>
942ecaf
to
d459f90
Compare
the exporter automatically adds `total` at the end so I think this is good enough.
prwTelemetry, err := newPRWTelemetry(set) | ||
telemetry, err := newPRWTelemetry(set, endpointURL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed this variable because it was shadowing a package name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add the remoteWriteURL as an attribute in a different PR, it's very easy to split and we can easily rollback PRs in case something unrelated goes wrong.
Could someone with permissions trigger the CI?
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Done in f535f8e |
We don't need it anymore Signed-off-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Thanks @ArthurSens, I would appreciate if anyone could please kick in the CI again - I believe I have addressed everything now that I have figured out how to run the test / linters locally. |
Signed-off-by: gotjosh <josue.abreu@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: gotjosh <josue.abreu@gmail.com>
@@ -14,6 +22,14 @@ Number of translation operations that failed to translate metrics from Otel to P | |||
| ---- | ----------- | ---------- | --------- | | |||
| 1 | Sum | Int | true | | |||
|
|||
### otelcol_exporter_prometheusremotewrite_sent_batch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on https://github.com/open-telemetry/semantic-conventions/blob/e70853397b73ca9cb85b1df45182b6a3e299c85e/docs/general/naming.md#naming-rules-for-counters-and-updowncounters, it seems like this should be batches
, rather than batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I also re-add count which I had previously removed thinking it was redudant?
Use count Instead of Pluralization for UpDownCounters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a Counter, not an updowncounter, so the updowncounter guidance does not apply here
Metric names SHOULD NOT be pluralized, unless the value being recorded represents discrete instances of a countable quantity. Generally, the name SHOULD be pluralized only if the unit of the metric in question is a non-unit (like {fault} or {operation}).
This is a count of discrete instances of a countable quantity, so it should be plural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in dd3c466
Appreciate the feedback - can you please kick off the CI again?
Signed-off-by: gotjosh <josue.abreu@gmail.com>
enabled: true | ||
description: Number of configured workers to use to fan out the outgoing requests | ||
unit: "{consumer}" | ||
sum: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a sum, and not a gauge? In my understanding this should be a value representing the current configuration, not a counter/sum of any form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same 😄 , but @dashpole and I had a conversation about it: #37912 (comment)
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the part that was mostly throwing me off was using "Add" to set the value, but I see that it's only set on exporter initialization, so functionally it should be fine. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Description
Often, we have to do anylsis of the current state of the exporter which includes the current level of concurrency and throughput.
For the concurrency, we need to rely on revising each individual configuration which is time consuming, here I'm proposing to have it as a metric that we can keep a record of and see it through observability.
For the throughput, while tecnically
otelcol_processor_batch_batch_send_size_count
exists, this is not divisible by type (e.g. metrics vs traces) and is not specific to exporter. Whilst I'm also interested in splittingbatch_send_size_count
by type (because that also includes OTLP batches and this metric is only for RW), I think this is part of a separate conversation and it doesn't hurt to have the single counter that's specific for remote write.There's one more thing here (and please advise if you'd rather see a different PR for it) - I'm also adding theurl
as attribute to ALL the metrics as I think this is key to identify where the exporter is pointing to.This is tecnically my first contribution to the repo so I'd appreciate any guidiance on what else can I need to do. Telemetry tests appear to be auto generated so I didn't add any to this PR but please let me know otherwise.
Link to tracking issue
This technically a prequite for the work that I'm about to do as part of open-telemetry/opentelemetry-collector#12255 but is not directly related and can go individually.
Signed-off-by: gotjosh josue.abreu@gmail.com