Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metricstarttimeprocessor: Copy true reset strategy from prometheus receiver #37855

Merged
merged 10 commits into from
Mar 3, 2025

Conversation

dashpole
Copy link
Contributor

Description

Copy the "metric adjuster" from the prometheus receiver to a new "true_reset_point" strategy in the metricstarttimeprocessor

Link to tracking issue

Part of #37186

Testing

Copied unit tests

Documentation

Updated the README.

Notes for reviewers

  • I would recommend reviewing commit-by-commit. The first commit is copied verbatim, with minimal changes other than splitting into multiple files.
  • I changed the function signature of AdjustMetrics to match the processMetrics function so that it doesn't need an additional wrapper layer.
  • I removed the MetricsAdjuster interface, since it isn't needed (we implement the processor function for metrics now).
  • I had to remove the validation of job + instance being present because it didn't pass generated tests. Regardless, we should not rely on those in a generic processor.

@dashpole dashpole force-pushed the true_reset branch 5 times, most recently from 3260ddf to 769ca83 Compare February 11, 2025 20:45
@dashpole dashpole added the enhancement New feature or request label Feb 12, 2025
@dashpole dashpole marked this pull request as ready for review February 12, 2025 18:42
@dashpole dashpole requested a review from a team as a code owner February 12, 2025 18:42
@dashpole
Copy link
Contributor Author

cc @ridwanmsharif @jmacd @ArthurSens

metricstarttime:

# specify the strategy to use for setting the start time
strategy: true_reset_point
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are other strategies that you envision this processor supporting? using another metric's value for the start time (process_start_time_seconds)? Wouldn't those also need to use the true_reset_point strategy to deal with resets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a metric's value resets, it seems like we should always reset the start time, but I don't think that should involve inserting a "true reset" point. I think it should be implicit for all adjustment strategies (maybe with an on/off config option, similar to fallback to collector start time?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#resets-and-gaps hoping to find a term for the other common mode -- but didn't find one.

If I had to make one up, it would be "missing_start_point" or similar, which is nearly identical logic to the presentation in this PR, except the value is unchanged. The first time the processor sees the series w/ a missing start time, it will enter the first timestamp it sees as the start time w/ unchanged value. Subsequent resets will be detected using the heuristic for cumulative values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this option should only determine how to handle the initial missing start point. If we know a series has reset (e.g. because the value decreases), we should transition to a generic "use the most recent point's start timestamp going forward" handler.

Copy link
Member

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

I think the JobsMap should be renamed and use a hash of the resource attributes instead but I saw you created a TODO for it. Feel free to create an issue and assign it to me.

func (a *Adjuster) AdjustMetrics(_ context.Context, metrics pmetric.Metrics) (pmetric.Metrics, error) {
for i := 0; i < metrics.ResourceMetrics().Len(); i++ {
rm := metrics.ResourceMetrics().At(i)
// TODO: Produce a hash of all resource attributes, rather than just job + instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Create an issue for this and assign to me?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I like to use the OTel Go attributes.Set for this sort of mapping.

}

if currentDist.Flags().NoRecordedValue() {
// TODO: Investigate why this does not reset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Is there an issue for this already? Not sure if this is being tracked anywhere yet (I also see the TODO int he prometheusreceiver, just don't want to lose this when we remove the adjuster from there)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is copied from existing Prometheus code?

Copy link
Contributor Author

@dashpole dashpole Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from the prometheus receiver. There isn't an issue for this. Not sure it is big enough to warrant one.

metricstarttime:

# specify the strategy to use for setting the start time
strategy: true_reset_point
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#resets-and-gaps hoping to find a term for the other common mode -- but didn't find one.

If I had to make one up, it would be "missing_start_point" or similar, which is nearly identical logic to the presentation in this PR, except the value is unchanged. The first time the processor sees the series w/ a missing start time, it will enter the first timestamp it sees as the start time w/ unchanged value. Subsequent resets will be detected using the heuristic for cumulative values.

func (a *Adjuster) AdjustMetrics(_ context.Context, metrics pmetric.Metrics) (pmetric.Metrics, error) {
for i := 0; i < metrics.ResourceMetrics().Len(); i++ {
rm := metrics.ResourceMetrics().At(i)
// TODO: Produce a hash of all resource attributes, rather than just job + instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I like to use the OTel Go attributes.Set for this sort of mapping.

}

if currentDist.Flags().NoRecordedValue() {
// TODO: Investigate why this does not reset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is copied from existing Prometheus code?

@dashpole dashpole added the ready to merge Code review completed; ready to merge by maintainers label Mar 2, 2025
type Config struct{}
type Config struct {
Strategy string `mapstructure:"strategy"`
GCInterval time.Duration `mapstructure:"gc_interval"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the gc_interval configuration option be documented?

Also it looks like this option is specific to the true_reset_point strategy and not a general processor option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gc_interval will be used by other strategies once they are added. @ridwanmsharif can you document gc_interval in a follow-up?

@andrzej-stencel andrzej-stencel merged commit a2f2b8c into open-telemetry:main Mar 3, 2025
167 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 3, 2025
@dashpole dashpole deleted the true_reset branch March 3, 2025 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request processor/metricstarttime ready to merge Code review completed; ready to merge by maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants