irate/rate query issue on the deduplicated metrics #5025

gabrielbatir · 2022-01-04T09:41:40Z

Thanos, Prometheus and Golang version used:

Thanos is deployed with kube-prometheus. I first noticed the issue on Thanos 0.23.1. I upgraded only Thanos query to 0.24.0, but the issue is still there.

Thanos compact:
compact --wait --log.level=info --log.format=logfmt --objstore.config=$(OBJSTORE_CONFIG) --data-dir=/var/thanos/compact --debug.accept-malformed-index --retention.resolution-raw=360d --retention.resolution-5m=360d --retention.resolution-1h=360d --delete-delay=8h --deduplication.replica-label=prometheus_replica --deduplication.replica-label=rule_replica --deduplication.replica-label=replica

Thanos query:
query --grpc-address=0.0.0.0:10901 --http-address=0.0.0.0:9090 --log.level=debug --log.format=logfmt --query.replica-label=prometheus_replica --query.replica-label=rule_replica --query.replica-label=replica --store=dnssrv+_grpc._tcp.prometheus-network-monitoring-thanos-sidecar.network-monitoring.svc.cluster.local --store=dnssrv+_grpc._tcp.thanos-store-network-monitoring.network-monitoring.svc.cluster.local --query.timeout=5m --query.lookback-delta=15m --query.auto-downsampling

Prometheus: quay.io/prometheus/prometheus:v2.29.2
--web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --config.file=/etc/prometheus/config_out/prometheus.env.yaml --storage.tsdb.path=/prometheus --storage.tsdb.retention.time=5d --web.enable-lifecycle --storage.tsdb.no-lockfile --web.external-url=https://prom-network-monitoring --web.route-prefix=/ --storage.tsdb.max-block-duration=2h --storage.tsdb.min-block-duration=2h

Thanos sidecar version: 0.23.1

Object Storage Provider:
Minio

What happened:
This is one of the metrics where this issue manifests:

This is what an irate looks like:

The red line is the irate of the deduplicated metric while the yellow and blue are the irates for the metrics from prometheus.

After pressing Execute several times, the graph sometimes looks like this:

What you expected to happen:
I would expect the irate of the deduplicated metric to look like the irate of the original metrics from Prometheus.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:
I could not find any error in thanos compact or thanos query logs.

Anything else we need to know:
This is what a rate query looks like on the same metric:

It looks somewhat better than the irate query, but I would have expected it to be closer to the query on the original data from Prometheus.

This is what the values of the original and deduplicated metrics look like:

The third line is the deduplicated metric.

Is this a bug or something wrong in my configuration?


yeya24 · 2022-01-04T09:58:23Z

Deduplication doesn't work in your case because it is 1:1 deduplication, which mainly fits receiver setups.
With HA Prometheus the replicas cannot be deduplicated that way, so the two series are merged into one. I think that's the reason why irate won't work for you. If you disable deduplication then it should work.
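
To make the 1:1 point concrete, here is a minimal Python sketch. It assumes, for illustration only, that one-to-one deduplication drops a sample only when another replica carries a sample with the same timestamp and value. The function name `one_to_one_dedup` and the sample values are hypothetical, and this is not the Thanos implementation.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, counter value)


def one_to_one_dedup(a: List[Sample], b: List[Sample]) -> List[Sample]:
    """Merge two replica series, dropping only exact duplicate samples."""
    # The set union removes samples identical in both timestamp and value;
    # sorting restores time order.
    return sorted(set(a) | set(b))


# Receiver-style replication (assumed): both copies hold the same samples.
receiver_a = [(100, 1.0), (115, 5.0)]
receiver_b = [(100, 1.0), (115, 5.0)]

# HA Prometheus (assumed): each replica scrapes on its own schedule,
# so the timestamps are offset by a second.
ha_a = [(101, 1.0), (116, 5.0)]
ha_b = [(100, 1.0), (115, 5.0)]

print(one_to_one_dedup(receiver_a, receiver_b))  # two samples remain
print(one_to_one_dedup(ha_a, ha_b))              # all four samples remain
```

Under that assumption the receiver copies collapse cleanly, while the HA replicas end up interleaved in a single series with twice as many points.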

gabrielbatir · 2022-01-04T10:01:31Z

Now I noticed that the deduplicated stream actually contains the metrics from both prometheus replicas and in most cases the values are identical for both.
I am not really sure how irate works but could this be the issue?

Could this be fixed by using --deduplication.func=penalty for the compactor? From the docs this option seems to break irate in other ways.

gabrielbatir · 2022-01-04T11:43:31Z

@yeya24 From what I can see at https://thanos.io/tip/components/compact.md/#enabling-vertical-compaction, I need to use that hidden flag in order to enable vertical compaction.
From that page I understand that it is not enabled by default. Yet in my case, as you can see above, I do not set that flag and compact still does vertical compaction.

Maybe the note at thanos/cmd/thanos/compact.go, line 719 in 56d99eb,

           "NOTE: This flag is ignored and (enabled) when --deduplication.replica-label flag is set.").

could be put in the documentation.

RayHuangCN · 2022-01-05T06:52:26Z

I have run into the same issue.

yeya24 · 2022-01-05T07:02:47Z

> Could this be fixed by using --deduplication.func=penalty for the compactor? From the docs this option seems to break irate in other ways.

Ideally it should solve the issue, but right now that functionality is not very stable when downsampling is also enabled, so I wouldn't recommend enabling it.

> I am not really sure how irate works but could this be the issue?

irate works by calculating the rate only based on the last two data points from your series.
For example, we have two series:

{replica=1} 1@1641055871  10000@1641055886
{replica=2} 1@1641055870  10001@1641055885

If the two series are merged into one and irate is evaluated at time 1641055885, the last two points are 1@1641055871 and 10001@1641055885, so the value diff is 10000.
If irate is evaluated at any time in 1641055871 ~ 1641055884, the last two points are 1@1641055870 and 1@1641055871, so the value diff is 0.
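
The arithmetic can be checked with a small Python sketch. It is a simplified model of irate, not the Prometheus code: it ignores the range-selector window, and it assumes the two replica series are simply interleaved by timestamp. The helper name `irate` and the variable names are chosen here for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, counter value)


def irate(samples: List[Sample], at: int) -> Optional[float]:
    """Per-second rate from the last two samples at or before `at`."""
    seen = [s for s in samples if s[0] <= at]
    if len(seen) < 2:
        return None  # two samples are needed
    (t_prev, v_prev), (t_last, v_last) = seen[-2], seen[-1]
    # A drop in value is treated as a counter reset: count up from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


replica_1 = [(1641055871, 1.0), (1641055886, 10000.0)]
replica_2 = [(1641055870, 1.0), (1641055885, 10001.0)]
# Assumption: the merged series is both replicas interleaved by timestamp.
merged = sorted(replica_1 + replica_2)

for at in (1641055880, 1641055885, 1641055886):
    print(at, irate(merged, at))
```

On the merged series this gives 0 at 1641055880, roughly 714 per second at 1641055885 (10000 over 14 seconds), and 10000 per second at 1641055886, because 10000 following 10001 is read as a counter reset.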

This is just an example. If you are using rate, it can go wrong too, when the merged counter values mistakenly look like a reset.
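
A rough Python sketch of the rate case, using the two example series above. It assumes the replicas are interleaved by timestamp and computes only the raw counter increase with reset correction; the extrapolation that rate applies on top is left out, and `raw_increase` is a name made up for this illustration.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, counter value)


def raw_increase(samples: List[Sample]) -> float:
    """Counter increase across the samples, with reset correction only."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop is read as a reset, so the whole new value counts as increase.
        total += cur if cur < prev else cur - prev
    return total


replica_1 = [(1641055871, 1.0), (1641055886, 10000.0)]
replica_2 = [(1641055870, 1.0), (1641055885, 10001.0)]
# Assumption: the merged series is both replicas interleaved by timestamp.
merged = sorted(replica_1 + replica_2)

print(raw_increase(replica_1))  # 9999.0
print(raw_increase(replica_2))  # 10000.0
print(raw_increase(merged))     # 20000.0
```

Each replica on its own reports an increase of about 10000, but the merged series reports double that, because the step from 10001 down to 10000 is counted as a reset.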

> Maybe the note at could be put in the documentation.

Yeah that's a good idea. Would you like to contribute and update the doc?

stale · 2022-04-17T05:54:19Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale · 2022-05-01T16:12:20Z

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale bot added the stale label Apr 17, 2022

stale bot closed this as completed May 1, 2022

jonasmatthias mentioned this issue Dec 22, 2022

DATA LOSS: thanos-compact deduplication is experimental and should not be enabled by default thanos-io/kube-thanos#290

Open
