Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement query pushdown for a subset of aggregations #4917

Merged
merged 4 commits into from Dec 16, 2021

Conversation

fpetkovski
Copy link
Contributor

@fpetkovski fpetkovski commented Nov 30, 2021

Certain aggregations can be executed safely on leaf nodes without
worrying about data duplication or overlap. One such example is the max
function which can be computed on local data by the leaves before it is
computed globally by the querier.

This commit implements local aggregation in the Prometheus sidecar for
all functions which are safe to execute locally. The feature can be enabled by
passing the --enable-feature evaluate-queries option to the sidecar.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@fpetkovski fpetkovski force-pushed the query-pushdown branch 5 times, most recently from 8cb1b92 to 753be73 Compare December 1, 2021 09:44
pkg/store/prometheus.go Outdated Show resolved Hide resolved
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, good start! Looks generally good, but the topk/bottomk I don't think can be pushdown-ed easily 🤔 Essentially the top from node X might be not the top from part Y.

Method: "GET",
}

matrix, _, err := p.client.QueryRange(ctx, p.base, r.ToPromQL(), r.MinTime, r.MaxTime, r.QueryHints.Step/1000, queryOpts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it always range query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs to be because of lookback, but I can double check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can always use a query range, we just need to use a positive step for instant queries.

pkg/store/storepb/custom.go Outdated Show resolved Hide resolved
pkg/store/storepb/custom.go Outdated Show resolved Hide resolved
pkg/store/storepb/rpc.proto Outdated Show resolved Hide resolved
@bwplotka
Copy link
Member

bwplotka commented Dec 1, 2021

cc @GiedriusS what if we start with this slowly into main?

@fpetkovski
Copy link
Contributor Author

We could also add a flag to the sidecar to toggle this feature on and off. Something like a feature flag.

pkg/store/prometheus.go Outdated Show resolved Hide resolved
pkg/store/prometheus.go Outdated Show resolved Hide resolved
pkg/store/prometheus.go Outdated Show resolved Hide resolved
pkg/store/prometheus.go Outdated Show resolved Hide resolved
@fpetkovski
Copy link
Contributor Author

I think we might want to add some tests as well to make sure we don't introduce a regression. I'll try to do that by EOW

@bwplotka
Copy link
Member

bwplotka commented Dec 2, 2021

Related to #305

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we reuse the same --enable-feature pattern for this instead of a separate flag?

@fpetkovski fpetkovski force-pushed the query-pushdown branch 6 times, most recently from c8823d6 to f91d036 Compare December 6, 2021 12:04
bwplotka
bwplotka previously approved these changes Dec 9, 2021
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this work - solid PR. LGTM, just small nits. Also we are missing, docs but we can add them in separate PR!

cc @GiedriusS @squat @brancz

cmd/thanos/config.go Outdated Show resolved Hide resolved
pkg/store/prometheus.go Outdated Show resolved Hide resolved
pkg/api/query/v1.go Show resolved Hide resolved
return err
}

matrix = make(model.Matrix, 0, len(vector))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO to myself: Add issue about remote read understanding aggr or streamed PromQL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/store/prometheus.go Outdated Show resolved Hide resolved
pkg/store/storepb/query_hints.go Outdated Show resolved Hide resolved
pkg/store/storepb/rpc.proto Outdated Show resolved Hide resolved
test/e2e/e2ethanos/services.go Outdated Show resolved Hide resolved
GiedriusS
GiedriusS previously approved these changes Dec 10, 2021
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Played around with this locally - looks good modulo @bwplotka's comments. I'm not sure how I dismissed Bartek's review with a commit that I didn't even author or push 😂 Let's play with it behind a feature flag and see how we can further improve upon this base 👍

}
matrix = result
} else {
vector, _, err := p.client.QueryInstant(s.Context(), p.base, r.ToPromQL(), time.Unix(r.MaxTime/1000, 0), opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vector, _, err := p.client.QueryInstant(s.Context(), p.base, r.ToPromQL(), time.Unix(r.MaxTime/1000, 0), opts)
vector, _, err := p.client.QueryInstant(s.Context(), p.base, r.ToPromQL(), timestamp.Time(r.MaxTime), opts)

@GiedriusS
Copy link
Member

@fpetkovski any update on this? I'd like to merge this as a good base so that we could continue further push down related work.

@fpetkovski
Copy link
Contributor Author

Ah sorry I saw it was approved and didn't see the comments that followed up. I'll address them today so that we can merge it 👍

@fpetkovski
Copy link
Contributor Author

fpetkovski commented Dec 16, 2021

@GiedriusS all comments should be addressed. The e2e tests seem to be failing but I don't think the failure is related to this PR

pkg/query/querier.go Outdated Show resolved Hide resolved
Certain aggregations can be executed safely on leaf nodes without
worrying about data duplication or overlap. One such example is the max
function which can be computed on local data by the leaves before it is
computed globally by the querier.

This commit implements local aggregation in the Prometheus sidecar for
all functions which are safe to execute locally. The feature can be
enabled by passing the `--enable-feature evaluate-queries` flag to the sidecar.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
GiedriusS
GiedriusS previously approved these changes Dec 16, 2021
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
GiedriusS
GiedriusS previously approved these changes Dec 16, 2021
@GiedriusS GiedriusS enabled auto-merge (squash) December 16, 2021 14:07
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS GiedriusS merged commit 79e70da into thanos-io:main Dec 16, 2021
@bwplotka
Copy link
Member

💪🏽

"max_over_time",
"min",
"min_over_time",
"group",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not count() btw.? I guess some others would also work, but I guess you're just starting with a few...

Copy link
Member

@GiedriusS GiedriusS Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then PromQL engine goes ahead and calculates count() again on the already correct results i.e. pushed down data thus corrupting it. I have suggested something here but it got no attention so I haven't pursued it further. An alternative is to edit the AST of the query but it got denied as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I had assumed there was already some facility for changing the outer operation (in the count case, to sum), but that's not required for min/max/group.

The comment on prometheus/prometheus#10101 is interesting though. Indeed, if the same series can exist on two leafs (I think this is only the case for HA deduplication in Thanos, right?), then pushing rate() or even count() down in a correct way would be impossible (if you do the count across two HA replicas and you get 10 on each, you don't know if you really counted the same 10 series both times or counted different ones, or some overlapping sets).

"max",
"max_over_time",
"min",
"min_over_time",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why avg_over_time is not safe? Is it because in Thanos we do deduplication at Query time?
If we do write time deduplication like Cortex then this seems fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the reason. I believe it should be safe if you have unique data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants