metricstarttimeprocessor: Copy true reset strategy from prometheus receiver #37855
Conversation
metricstarttime:
  # specify the strategy to use for setting the start time
  strategy: true_reset_point
What are other strategies that you envision this processor supporting? Using another metric's value for the start time (process_start_time_seconds)? Wouldn't those also need to use the true_reset_point strategy to deal with resets?
If a metric's value resets, it seems like we should always reset the start time, but I don't think that should involve inserting a "true reset" point. I think it should be implicit for all adjustment strategies (maybe with an on/off config option, similar to fallback to collector start time?).
I looked at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#resets-and-gaps hoping to find a term for the other common mode -- but didn't find one.
If I had to make one up, it would be "missing_start_point" or similar, which is nearly identical logic to the presentation in this PR, except the value is unchanged. The first time the processor sees the series w/ a missing start time, it will enter the first timestamp it sees as the start time w/ unchanged value. Subsequent resets will be detected using the heuristic for cumulative values.
I think this option should only determine how to handle the initial missing start point. If we know a series has reset (e.g. because the value decreases), we should transition to a generic "use the most recent point's start timestamp going forward" handler.
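A minimal sketch of the behavior discussed above, assuming a hypothetical "missing_start_point"-style strategy (the strategy name, helper, and types below are illustrative, not part of this PR): the first time a series is seen, its own timestamp becomes the start time and the value is left unchanged; a later decrease in the cumulative value is treated as a reset and advances the start time, which is then carried forward.

package starttimesketch

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// seriesInfo is hypothetical per-series state, keyed elsewhere by series identity.
type seriesInfo struct {
	startTS   pcommon.Timestamp
	prevValue float64
}

func adjustSumPoint(dp pmetric.NumberDataPoint, info *seriesInfo, seen bool) {
	value := dp.DoubleValue()
	switch {
	case !seen:
		// First observation with no start time: use this point's own
		// timestamp as the start time, value unchanged.
		info.startTS = dp.Timestamp()
	case value < info.prevValue:
		// Cumulative value went down: the series reset. Move the start time
		// to this point's timestamp and keep using it going forward.
		info.startTS = dp.Timestamp()
	}
	info.prevValue = value
	dp.SetStartTimestamp(info.startTS)
}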
LGTM overall
I think the JobsMap should be renamed and use a hash of the resource attributes instead, but I saw you created a TODO for it. Feel free to create an issue and assign it to me.
func (a *Adjuster) AdjustMetrics(_ context.Context, metrics pmetric.Metrics) (pmetric.Metrics, error) {
	for i := 0; i < metrics.ResourceMetrics().Len(); i++ {
		rm := metrics.ResourceMetrics().At(i)
		// TODO: Produce a hash of all resource attributes, rather than just job + instance.
[nit] Create an issue for this and assign to me?
fwiw I like to use the OTel Go attributes.Set for this sort of mapping.
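A small sketch of the suggestion above (resourceKey is a made-up helper, not from this PR): build an OTel Go attribute.Set from all resource attributes and use its Distinct value as a comparable map key, instead of keying on job + instance only.

package starttimesketch

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/otel/attribute"
)

func resourceKey(attrs pcommon.Map) attribute.Distinct {
	kvs := make([]attribute.KeyValue, 0, attrs.Len())
	attrs.Range(func(k string, v pcommon.Value) bool {
		kvs = append(kvs, attribute.String(k, v.AsString()))
		return true
	})
	// NewSet sorts and de-duplicates keys; Equivalent returns a value that is
	// comparable and therefore usable as a map key.
	return attribute.NewSet(kvs...).Equivalent()
}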
}

if currentDist.Flags().NoRecordedValue() {
	// TODO: Investigate why this does not reset.
[nit] Is there an issue for this already? Not sure if this is being tracked anywhere yet (I also see the TODO in the prometheusreceiver, just don't want to lose this when we remove the adjuster from there).
I assume this is copied from existing Prometheus code?
This is copied from the prometheus receiver. There isn't an issue for this. Not sure it is big enough to warrant one.
-type Config struct{}
+type Config struct {
+	Strategy   string        `mapstructure:"strategy"`
+	GCInterval time.Duration `mapstructure:"gc_interval"`
Shouldn't the gc_interval configuration option be documented? Also, it looks like this option is specific to the true_reset_point strategy and not a general processor option?
gc_interval will be used by other strategies once they are added. @ridwanmsharif can you document gc_interval in a follow-up?
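A hedged sketch of how gc_interval could be shared by any strategy that keeps per-series state: periodically drop series that have not been seen within the interval. The types and field names below are illustrative, not the processor's actual implementation.

package starttimesketch

import (
	"sync"
	"time"
)

type trackedSeries struct {
	lastSeen time.Time
	// start timestamp, previous value, etc. would live here
}

type seriesCache struct {
	mu         sync.Mutex
	series     map[string]*trackedSeries
	gcInterval time.Duration
	lastGC     time.Time
}

// maybeGC removes stale series at most once per gcInterval.
func (c *seriesCache) maybeGC(now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if now.Sub(c.lastGC) < c.gcInterval {
		return
	}
	for key, s := range c.series {
		if now.Sub(s.lastSeen) > c.gcInterval {
			delete(c.series, key)
		}
	}
	c.lastGC = now
}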
Description
Copy the "metric adjuster" from the prometheus receiver to a new "true_reset_point" strategy in the metricstarttimeprocessor
Link to tracking issue
Part of #37186
Testing
Copied unit tests
Documentation
Updated the README.
Notes for reviewers
I updated AdjustMetrics to match the processMetrics function so that it doesn't need an additional wrapper layer.
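A compile-time check, not code from the PR, illustrating the note above: processorhelper.ProcessMetricsFunc is a func(context.Context, pmetric.Metrics) (pmetric.Metrics, error), so an AdjustMetrics method with that exact shape can be handed to the helper directly, with no wrapper closure. The Adjuster stub and package name below are stand-ins.

package starttimesketch

import (
	"context"

	"go.opentelemetry.io/collector/pdata/pmetric"
	"go.opentelemetry.io/collector/processor/processorhelper"
)

// Adjuster is a stand-in with the same method shape as the PR's Adjuster.
type Adjuster struct{}

func (a *Adjuster) AdjustMetrics(_ context.Context, metrics pmetric.Metrics) (pmetric.Metrics, error) {
	return metrics, nil
}

// Because the signatures line up, the method value can be used directly as a
// processorhelper.ProcessMetricsFunc — no extra wrapper layer needed.
var _ processorhelper.ProcessMetricsFunc = (&Adjuster{}).AdjustMetrics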