-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deduplication in StoreAPI #2700
Comments
No, we dedup the duplicated, identical chunks yes, but only if the series is duplicated without "replica label removing" phase. This ticket is about doing the replica rem phase and some kind of initial dedup on Store API. I think it would be a solid request, except that it might be highly expensive. This is because for true deduplication you have to decode chunks and encode them again (because StoreAPI exposes chunks). On Querier, however, PromQL operates on samples so we decode + duplicate in one go, there is little overhead. This is something to be aware of here. There is option C, where we could do partial deduplication, so we could remove labels and merge chunks without encoding/decoding in a way that: For 1:1 dups we kill dups, for time overlapping chunks that are not identical we just pass them through. Querier sorts chunks and deduplicates them anyway, so that could work. Still, for Prometheus replicas, it's not very helpful - chunks are rather never identical due to scrape interval randomness. So.. at then end, I don't really see much value. I think we should rather port deduplication to |
I agree doing deduplication in thanos-remote-read makes sense if there's not really a use case elsewhere. The attraction for me is the code for deduplication isn't currently a public API, but I'm sure I can work something out. (I still think there are cases where being able to push some of the querier work to other places has value but I suspect it's not a priority right now as there's plenty of more important things to work on. e.g. It might be useful for a custom store API server to know that deduplication is happening or not to optimise its responses.) |
Hello 👋 Looks like there was no activity on this issue for last 30 days. |
Hello 👋 Looks like there was no activity on this issue for last 30 days. |
Closing for now as promised, let us know if you need this to be reopened! 🤗 |
Currently deduplication is done in the querier, at the Select stage. This means Store API gRPC queries, even going via query do not get deduplication. For thanos-remote-read this means it does not get deduplication via querier.
Add a store API request option to ask for deduplication.
This could be useful for other purposes, e.g. it would be possible to push deduplication down to a child query node, to avoid sending extra data over the network (likely would need to be optional, even if #2699 made it possible to do this in a streaming way this would mean decoding chunks twice).
The text was updated successfully, but these errors were encountered: