Issue with deduplication algorithm in Thanos #7364
Comments
The query in the UI uses [1m]; from the samples it looks like you have a 30s scrape interval. Does it also happen with [5m]? I wrote a quick test with your inputs, and the resulting series looks roughly like this:
It looks like all samples are there, with a proper 30s interval between them. It could be that your 1m windows are aligned so that only one sample falls inside a window, which would break rate(). I think the problem is the window being too small; the deduplication result looks roughly correct to me, except that there is one extra sample at the beginning.
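To illustrate the window-size point, here is a small Python sketch (an illustration only, not Thanos or Prometheus code) that checks, at a few hypothetical evaluation times, whether a 1m or 5m window over replica occ-node-A's raw samples from the issue description contains enough samples for rate(). It assumes rate() needs at least two samples in the range and treats both window boundaries as inclusive, which is a simplification; the helper names are made up for this example.

```python
# Sketch: does a rate() window contain enough samples? (illustration only)

# Timestamps (Unix seconds) of replica occ-node-A, copied from the raw data in the issue.
REPLICA_A = [1713668216.753, 1713668306.753, 1713668336.753,
             1713668426.753, 1713668456.753, 1713668486.753]

# Hypothetical evaluation times, 30s apart, inside the range covered by the data.
EVAL_TIMES = [1713668340 + 30 * i for i in range(6)]


def samples_in_window(timestamps, end, window):
    """Samples t with end - window <= t <= end (both bounds inclusive: a simplification)."""
    return [t for t in timestamps if end - window <= t <= end]


def rate_computable(timestamps, end, window):
    """Assumption: rate() needs at least two samples in the range to return a value."""
    return len(samples_in_window(timestamps, end, window)) >= 2


def coverage(timestamps, eval_times, window):
    """For each evaluation time, report whether rate() could produce a value."""
    return [rate_computable(timestamps, end, window) for end in eval_times]


print("1m:", coverage(REPLICA_A, EVAL_TIMES, 60))   # 1m: [True, False, False, False, True, True]
print("5m:", coverage(REPLICA_A, EVAL_TIMES, 300))  # 5m: [True, True, True, True, True, True]
```

With the 90s gaps in occ-node-A, three of the six 1m windows hold fewer than two samples, while every 5m window still has enough.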
This seems related to issue #981.
We have a fairly straightforward Thanos setup consisting of a querier, two Prometheus replicas and two sidecars, each sidecar running alongside its own Prometheus instance. Both Prometheus replicas share exactly the same configuration and scrape the same set of targets. The sidecars use the Prometheus remote read API for querying.
Recently, one of the Prometheus replicas experienced scrape timeouts for one target, which created gaps in the collected data. The other Prometheus replica did not face any such issues and had no gaps for that target.
We expected that when querying this target through the Thanos Querier, the deduplication algorithm would automatically fill these gaps. However, this did not happen: Thanos selected data from the replica that had the gaps.
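For reference, here is a simplified Python sketch of a penalty-based merge of two replicas, written from our understanding of how the Thanos querier's deduplication iterator works. It is not the actual Thanos code: the 5000 ms initial penalty and the "twice the last delta" rule are assumptions, and dedup() is a hypothetical helper. It is fed the millisecond timestamps from the raw data in this issue (values omitted, since only timing affects which replica is picked).

```python
import math

# Millisecond timestamps from the raw data in this issue.
REPLICA_A_MS = [1713668216753, 1713668306753, 1713668336753,
                1713668426753, 1713668456753, 1713668486753]  # occ-node-A, with gaps
REPLICA_B_MS = [1713668224198 + 30_000 * i for i in range(10)]  # oce-node-A, every 30s

INITIAL_PENALTY_MS = 5000  # assumption: penalty used before any delta is known


def dedup(a, b):
    """Merge two sorted replica timestamp lists using a penalty-based strategy.

    At each step the earliest candidate sample wins. The replica that was not
    picked gets a penalty of twice the delta to the previously emitted sample,
    so its next candidate must lie at least that far past the last emitted
    timestamp. Returns (replica, timestamp) pairs.
    """
    out = []
    last_t = None
    pen_a = pen_b = 0
    ia = ib = 0
    while True:
        floor = -math.inf if last_t is None else last_t + 1
        # Seek each replica forward to its first sample at or after floor + penalty.
        while ia < len(a) and a[ia] < floor + pen_a:
            ia += 1
        while ib < len(b) and b[ib] < floor + pen_b:
            ib += 1
        a_ok, b_ok = ia < len(a), ib < len(b)
        if not a_ok and not b_ok:
            return out
        if a_ok and (not b_ok or a[ia] <= b[ib]):
            t = a[ia]
            pen_b = INITIAL_PENALTY_MS if last_t is None else 2 * (t - last_t)
            pen_a = 0
            out.append(("a", t))
        else:
            t = b[ib]
            pen_a = INITIAL_PENALTY_MS if last_t is None else 2 * (t - last_t)
            pen_b = 0
            out.append(("b", t))
        last_t = t


merged = dedup(REPLICA_A_MS, REPLICA_B_MS)
print(len(merged), merged[:2])  # 11 [('a', 1713668216753), ('b', 1713668224198)]
```

In this sketch the gaps are filled: after the very first occ-node-A sample, every sample comes from oce-node-A. That first sample sits only about 7.4s before oce-node-A's first sample, which matches the "one extra sample at the beginning" observation in the comments.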
Here's the graph with deduplication disabled (first replica selected):
Here's the graph with deduplication disabled (second replica selected):
Here's the graph with deduplication enabled:
Here is the raw data from both replicas for the same time range:
Query:
node_cpu_seconds_total{mode='iowait',instance='<masked>',cpu="0"}[5m]
_replica=occ-node-A (value, Unix timestamp in seconds):
9389.87 1713668216.753
9390.03 1713668306.753
9390.33 1713668336.753
9391.36 1713668426.753
9391.38 1713668456.753
9393.49 1713668486.753
_replica=oce-node-A (value, Unix timestamp in seconds):
9389.94 1713668224.198
9389.95 1713668254.198
9390.02 1713668284.198
9390.03 1713668314.198
9390.33 1713668344.198
9390.83 1713668374.198
9391.13 1713668404.198
9391.38 1713668434.198
9391.61 1713668464.198
9393.53 1713668494.198
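One quick way to confirm where the gaps are is to look for consecutive samples that are further apart than the scrape interval. The Python sketch below assumes a 30s interval (inferred from oce-node-A's spacing) and flags any spacing above 1.5 times that; find_gaps and its tolerance parameter are hypothetical helpers, not part of any Thanos or Prometheus tooling.

```python
# Timestamps (Unix seconds) from the raw data above.
REPLICA_A = [1713668216.753, 1713668306.753, 1713668336.753,
             1713668426.753, 1713668456.753, 1713668486.753]  # occ-node-A
REPLICA_B = [1713668224.198 + 30 * i for i in range(10)]      # oce-node-A

SCRAPE_INTERVAL_S = 30  # assumption: inferred from oce-node-A's 30s spacing


def find_gaps(timestamps, interval=SCRAPE_INTERVAL_S, tolerance=1.5):
    """Return (prev, cur, missed) for consecutive samples more than tolerance * interval apart.

    `missed` estimates how many scrapes are absent between the two samples.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta > tolerance * interval:
            gaps.append((prev, cur, round(delta / interval) - 1))
    return gaps


for name, ts in (("occ-node-A", REPLICA_A), ("oce-node-A", REPLICA_B)):
    print(name, find_gaps(ts))
```

For this data it reports two 90s gaps in occ-node-A, each corresponding to two missed scrapes, and none in oce-node-A.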
Thanos version: 0.33.0
Prometheus version: 2.51.1