Odd store and Query deduplication behaviour #6419
Unanswered
avidpontoon
asked this question in
Questions & Answers
Replies: 1 comment
-
This looks a lot like #6257, which I just ran into as well, with results similar to yours. If it is the same issue, downgrading Thanos will help.
-
Hello Everyone.
I am using Thanos Receive in a StatefulSet with two pods and a replication factor of 2. Here are my args:
The first label presented by the Receive pods is receive_replica; it makes the metrics coming from each Receive pod distinct.
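To make the effect of that label concrete, here is a minimal Python sketch of the label sets such a setup would be expected to produce for one scraped series. It is only an illustration: the pod names and the sample series are hypothetical, and it assumes that with two Receive pods and a replication factor of 2 every sample ends up on both pods.

```python
from itertools import product

# Hypothetical replica names, chosen only for illustration.
prometheus_replicas = ["prom-0", "prom-1"]
receive_replicas = ["receive-0", "receive-1"]


def stored_copies(base_labels, prom_replicas, recv_replicas):
    """Return one label set per (Prometheus replica, Receive replica) pair.

    Assumption: every Prometheus replica remote-writes the same series and
    each write is replicated to every Receive pod listed in recv_replicas.
    """
    copies = []
    for prom, recv in product(prom_replicas, recv_replicas):
        labels = dict(base_labels)
        labels["prometheus_replica"] = prom
        labels["receive_replica"] = recv
        copies.append(labels)
    return copies


copies = stored_copies(
    {"__name__": "up", "instance": "host-a"},
    prometheus_replicas,
    receive_replicas,
)
for labels in copies:
    print(labels)
```

Under those assumptions a single series shows up as four distinct label sets, which differ only in prometheus_replica and receive_replica.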
This feeds into object storage, served by MinIO at the moment, with a Store Gateway set up in front of it. It is a pretty generic setup, nothing special there.
The issue I am seeing is with deduplication of data across Receive and Store. Deduplication works fine when the data comes only from Receive:
This is an example of non-deduplicated data for a single instance coming only from Receive:
Data has been sanitised, but all the red sections are exactly the same across all the metrics.
When I enable deduplication, I get a single metric like I expect:
This has basically removed the prometheus_replica and receive_replica labels as per my Query config:
However, if I look at the graph from a couple of hours ago, I see some duplicated metrics:
The maroon metric is the one that has the whole series; the other two have only a small portion. These are coming from the Store Gateway. I know this because they disappear when I remove the --store config pointing to the Store Gateway from Query.
I don't understand why Query is unable to deduplicate these or what is happening. If I change the execution time in Query to a time that matches some duplication on the graph, I see all 3 metrics even with deduplication on:
The top two are the duplicate ones and still have the prometheus_replica label attached to them, even though Query should be deduplicating on that label.
Looking at those 3 metrics un-deduplicated:
These are exactly the same as before, so I'm not sure why Query is having a hard time with them.
I've tried doing deduplication in storage using the Compactor, but it seems you can only deduplicate on the external labels recorded in meta.json, and these come only from Receive, so it seems the Compactor cannot deduplicate based on prometheus_replica.
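The difference between deduplicating on both replica labels and on receive_replica alone can be sketched in a few lines of Python. This is not Thanos code and says nothing about how Query or the Compactor are actually implemented; it only models deduplication as "series that are identical once the replica labels are ignored belong together". The helper names and the sample series are hypothetical.

```python
from collections import defaultdict


def dedup_key(labels, replica_labels):
    """Series identity with the given replica labels ignored."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in replica_labels))


def group_by_dedup_key(series, replica_labels):
    """Group label sets that become identical once replica labels are ignored."""
    groups = defaultdict(list)
    for labels in series:
        groups[dedup_key(labels, replica_labels)].append(labels)
    return dict(groups)


# Hypothetical sample: one series seen via 2 Prometheus replicas x 2 Receive pods.
series = [
    {"__name__": "up", "instance": "host-a", "prometheus_replica": p, "receive_replica": r}
    for p in ("prom-0", "prom-1")
    for r in ("receive-0", "receive-1")
]

both = group_by_dedup_key(series, {"prometheus_replica", "receive_replica"})
receive_only = group_by_dedup_key(series, {"receive_replica"})

print(len(both))          # 1: all four copies collapse into one series
print(len(receive_only))  # 2: one series per prometheus_replica value remains
```

In this model, ignoring only receive_replica still leaves one series per Prometheus replica, which is the shape of result described above for deduplication limited to the Receive external labels.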
Does anyone have any ideas what is going on here? It's fine if I only have a single Prometheus replica; as soon as I add a second, this issue appears. If I remove the Store Gateway from the Query configuration, the issue goes away too, so I can tell this is coming from Store.
Likewise, if I remove both Receives from my Query config and leave only Store in there, I get this for the time above:
Interestingly, there are two different values there. And when I try to deduplicate those, I am left with two metrics instead of one:
I have seen Query deduplicate different values from Receive previously (seen in the first two screenshots), so I am not sure whether this is my problem. Is this different when the StoreAPI is getting data from object storage?
I would really like to know if I can solve this, as it is currently preventing us from scaling up the Prometheus instances that remote write into Receive: as soon as we do, we start getting these duplicates.
I am running version 0.31.0 for all Thanos services.