No metrics displayed with Thanos engine ranged query (go_routines[1d]) in query distributes installation (Prometheus engine returns correct output) #8078

s0rl0v · 2025-01-29T13:47:00Z

Thanos, Prometheus and Golang version used:
Thanos - v0.37.2
Prometheus - v2.55.1
Golang - v1.23.4

Object Storage Provider:
Azure
Huawei OBS

What happened:
I've set up Thanos Query in distributed mode with the following stores (aka local queries) in config:

query-az.int.zone
query-hw.int.zone

extraArgs:

--query.timeout=5m
--query.mode=distributed

The local queries have the Thanos Gateway hostnames in their corresponding configurations (Prometheus installations with Thanos sidecars).

When I run a range query (e.g. go_threads[1d]) against the distributed query via the Thanos engine, it returns nothing; the Prometheus engine returns correct output.

Querying a local query directly also yields correct results with both the Thanos and Prometheus engines.

What you expected to happen:
The distributed query returns the same output with the Prometheus and Thanos engines.

How to reproduce it (as minimally and precisely as possible):

Have a Thanos installation with Thanos Gateways.
Set up a distributed Query with two local queries in the --store extra args.
Query any range selector (e.g. go_threads[1d]) with the Thanos engine.
See nothing.
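
The comparison in the steps above can also be scripted against the Query HTTP API instead of the UI. The following is a minimal sketch, not a verified reproduction: the querier address `query-global.int.zone:10902` is hypothetical, and the `engine` request parameter is an assumption based on the engine switch in the Query UI, so check the API documentation for your Thanos version.

```python
from urllib.parse import urlencode

# Hypothetical address of the distributed (global) querier; replace with your own.
QUERIER = "http://query-global.int.zone:10902"


def instant_query_url(base, expr, engine):
    """Build an instant-query URL for a PromQL expression and a query engine.

    The `engine` parameter is an assumption (see the note above the block).
    """
    params = urlencode({"query": expr, "engine": engine})
    return f"{base.rstrip('/')}/api/v1/query?{params}"


def series_count(response_json):
    """Count the series in a Prometheus-style query response body."""
    return len(response_json.get("data", {}).get("result", []))


if __name__ == "__main__":
    # Print one URL per engine; fetch both and compare series_count() of each body.
    for engine in ("prometheus", "thanos"):
        print(engine, instant_query_url(QUERIER, "go_threads[1d]", engine))
```

If the behaviour described in this issue is present, the response for the Thanos engine would be expected to contain zero series while the Prometheus engine returns a non-empty matrix.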

Full logs to relevant components:
no errors whatsoever

Anything else we need to know:

dosubot · 2025-01-29T13:50:26Z

Hey there, @s0rl0v! I'm here to assist you with any bugs, questions, or contributions you might have. Let's tackle this together!

The issue you're experiencing with Thanos not displaying metrics for range queries in distributed mode could be related to several factors specific to the Thanos engine's handling of distributed queries. Here are some potential areas to investigate:

Query Mode Configuration: Ensure that the --query.mode=distributed flag is correctly set for all relevant components. This mode is essential for enabling distributed query execution across multiple stores [1].
Thanos Gateway Configuration: Verify that the Thanos Gateways are correctly configured and accessible from the Thanos Query component. Any misconfiguration here could lead to issues in fetching data from the stores.
Replica Labels and Deduplication: Check if the --query.replica-label is set correctly to handle deduplication across your stores. Incorrect deduplication settings might result in missing data in the query results [2].
Partial Response Handling: Consider enabling partial responses using the --query.partial-response flag. This can help in scenarios where some stores might not be responding, allowing the query to return partial results instead of nothing [2].
Logs and Debugging: Although you mentioned there are no errors, enabling more verbose logging might provide additional insights into what might be going wrong during the query execution.
Version-Specific Issues: While there are no known issues specifically documented for Thanos v0.37.2 related to distributed queries, it's worth checking the release notes or any recent issues in the Thanos repository for any updates or patches that might address this behavior.

By reviewing these configurations and settings, you might be able to identify the root cause of the issue with the Thanos engine in distributed mode.

To continue talking to Dosu, mention @dosu.

s0rl0v · 2025-01-29T13:56:40Z

> Query Mode Configuration: Ensure that the --query.mode=distributed flag is correctly set for all relevant components. This mode is essential for enabling distributed query execution across multiple stores [1].

The flag is set only for the global query; the local ones run without it.

> Thanos Gateway Configuration: Verify that the Thanos Gateways are correctly configured and accessible from the Thanos Query component. Any misconfiguration here could lead to issues in fetching data from the stores.

The gateways are present in the Stores menu, and there are no errors in the query logs associated with store unavailability.

> Replica Labels and Deduplication: Check if the --query.replica-label is set correctly to handle deduplication across your stores. Incorrect deduplication settings might result in missing data in the query results [2].

The replica label is set to "site"; Azure metrics have the value "azure", Huawei ones have "huawei".

> Partial Response Handling: Consider enabling partial responses using the --query.partial-response flag. This can help in scenarios where some stores might not be responding, allowing the query to return partial results instead of nothing [2].

Enabled.

> Logs and Debugging: Although you mentioned there are no errors, enabling more verbose logging might provide additional insights into what might be going wrong during the query execution.

I'll check this out.

> Version-Specific Issues: While there are no known issues specifically documented for Thanos v0.37.2 related to distributed queries, it's worth checking the release notes or any recent issues in the Thanos repository for any updates or patches that might address this behavior.

There are no known issues; I used the search before creating this issue.

fpetkovski · 2025-01-29T14:09:18Z

Could you click on the Endpoints tab and share a screenshot?

s0rl0v · 2025-01-29T14:38:29Z

@fpetkovski Sure!
This is what distributed endpoints look like:

These are the endpoints from a local query (non-distributed):

Lavaerius · 2025-03-05T14:39:57Z

Hi there. Is there a chance this is related to this issue?

#7757

ibrahimasow1 · 2025-03-28T13:39:15Z

I have this same exact issue.
I defined --query.replica-label replica --query.replica-label host in the central querier.
Here are the debug logs from the querier when we are using the prometheus query engine vs the thanos query engine.

# PROMETHEUS QUERY ENGINE
Mar 28 13:31:57 server thanos[269408]: ts=2025-03-28T13:31:57.600201832Z caller=proxy.go:320 level=debug component=proxy request="min_time:1743082317501 max_time:1743168717501 matchers:<name:\"__name__\" value:\"go_threads\" > max_resolution_window:3600000 aggregates:COUNT aggregates:SUM partial_response_strategy:ABORT without_replica_labels:\"host\" without_replica_labels:\"replica\" " msg="Series: started fanout streams" status="Store Addr: 10.0.1.88:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.88\", replica=\"replica-3\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.10:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.10\", replica=\"replica-2\"} MinTime: 1742817892408 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.175:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.175\", replica=\"replica-1\"} MinTime: 1742817889672 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.204:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.204\", replica=\"replica-1\"} MinTime: 1742817904708 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.96:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.96\", replica=\"replica-2\"} MinTime: 1742817892386 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.148:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.148\", replica=\"replica-3\"} MinTime: 1742817887992 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.197:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.197\", replica=\"replica-1\"} MinTime: 1742817882624 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.217:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.217\", replica=\"replica-2\"} MinTime: 1742817880379 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.70:11901 LabelSets: 
{@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.70\", replica=\"replica-2\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.243:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.243\", replica=\"replica-1\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.101:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.101\", replica=\"replica-1\"} MinTime: 1742817882624 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.171:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.171\", replica=\"replica-3\"} MinTime: 1742817909187 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.211:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.211\", replica=\"replica-3\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.231:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.231\", replica=\"replica-1\"} MinTime: 1742817885892 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.64:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.64\", replica=\"replica-2\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.112:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.112\", replica=\"replica-2\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.21:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.21\", replica=\"replica-3\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.250:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.250\", replica=\"replica-3\"} MinTime: 1742817888076 MaxTime: 9223372036854775807 queried"

# THANOS QUERY ENGINE
Mar 28 13:32:51 server thanos[269408]: ts=2025-03-28T13:32:51.019767929Z caller=remote_engine.go:250 level=debug msg="Executed remote query" query=go_threads[1d] time=46.534866ms remote_peak_samples=0 remote_total_samples=0

I was using thanos-0.34.1, then upgraded to thanos-0.37.2, but I am still facing the same issue.
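
The telling detail in the Thanos engine log above is `remote_total_samples=0`: the remote query ran but returned no samples. As a rough aid for scanning querier debug logs for this pattern, here is a small sketch that parses logfmt-style lines; the helper names are hypothetical and the parser only handles the simple key=value and key="quoted value" forms seen in these logs.

```python
import re

# Matches key=value and key="quoted value" pairs in a logfmt-style line.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key/value pairs found in a logfmt-style log line as a dict."""
    fields = {}
    for key, value in _PAIR.findall(line):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def remote_query_was_empty(line):
    """True if the line is an 'Executed remote query' entry with zero samples."""
    fields = parse_logfmt(line)
    return (
        fields.get("msg") == "Executed remote query"
        and fields.get("remote_total_samples") == "0"
    )
```

Feeding the querier's debug output through `remote_query_was_empty` line by line would flag every remote query that came back without samples, which may help separate this issue from store connectivity problems.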

MichaHoffmann · 2025-03-30T05:01:09Z

Are you using storage GW? If so what's your retention?

MichaHoffmann · 2025-03-30T05:04:04Z

Can you try this flag please:

      queryDistributedWithOverlappingInterval := cmd.Flag("query.distributed-with-overlapping-interval", "Allow for distributed queries using an engines lowest MinT.").Hidden().Default("false").Bool()

RainbowHerbicides · 2025-03-31T10:09:00Z

I have pretty much the same problem, with the same versions of almost all components (Thanos and Prometheus are identical).
@MichaHoffmann https://github.com/thanos-io/thanos/blob/v0.37.2/cmd/thanos/query.go#L80 There is no such flag defined in Thanos 0.37.2.

RainbowHerbicides · 2025-03-31T10:57:49Z

My quick and dirty fix was to add 1s in time partitioning (I have 7 days retention in Receive):

      timePartitioning:
        - min: "-26w"
          max: "-7d1s"

Still, a permanent solution would be better.
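
To make the arithmetic of this workaround explicit, the sketch below parses the relative time values from the config above and compares the gateway's upper bound with the 7 day Receive retention. It only illustrates that the two ranges end up one second apart; it does not claim to reproduce how Thanos itself evaluates these bounds, and the parser is a simplified stand-in for the duration format, written for this example.

```python
import re
from datetime import timedelta

# Simplified unit table for relative durations such as "-7d1s" or "-26w".
_UNITS = {
    "ms": timedelta(milliseconds=1),
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "y": timedelta(days=365),
}
_TOKEN = re.compile(r"(\d+)(ms|[smhdwy])")


def parse_relative(value):
    """Parse a relative duration like '-7d1s' into a signed timedelta."""
    sign = -1 if value.startswith("-") else 1
    body = value.lstrip("+-")
    total, pos = timedelta(), 0
    for match in _TOKEN.finditer(body):
        if match.start() != pos:
            raise ValueError(f"unexpected text in duration: {value!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(body):
        raise ValueError(f"invalid duration: {value!r}")
    return sign * total


def gap_to_retention(gateway_max, retention):
    """How far the gateway's upper bound lies beyond the retention boundary."""
    return -parse_relative(gateway_max) - retention


if __name__ == "__main__":
    # Values from the comment: 7 days retention in Receive, max set to "-7d1s".
    print(gap_to_retention("-7d1s", timedelta(days=7)))
```

With `max: "-7d1s"` the gap is one second, whereas `max: "-7d"` would give a gap of zero, meaning the gateway range would end exactly at the retention boundary.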

MichaHoffmann · 2025-03-31T12:45:31Z

The flag might only be on main and in the upcoming 0.38 release. Can you try with the 0.38 rc and the flag?

ibrahimasow1 · 2025-04-01T11:05:33Z

@MichaHoffmann I can confirm that 0.38.0-rc.1 with the flag --query.distributed-with-overlapping-interval enabled fixes my issue.

RainbowHerbicides · 2025-04-03T10:23:25Z

@MichaHoffmann I made a quick and dirty installation in our development cluster, and it looks like it does indeed fix this problem and makes my workaround redundant 👍

MichaHoffmann · 2025-04-03T18:41:51Z

It's an interesting issue that I think we understand but that's tricky to fix automatically. I hope one day we won't need the flag anymore!

RainbowHerbicides · 2025-04-04T10:26:13Z

@MichaHoffmann The workaround could be documented for anyone who does not intend to use this flag, since even a 1s offset in the Storage Gateway's time partitioning is more than enough to make long-range queries work correctly. (I do understand that it looks more like a hack than a correct solution, but if it works and solves my issue without the need to add anything or perform an update, it's good in my opinion.)

dosubot bot added bug component: query labels Jan 29, 2025

s0rl0v changed the title ~~No metrics displayed via Thanos engine ranged query (go_routines[1d]) in query distributes installation~~ No metrics displayed with Thanos engine ranged query (go_routines[1d]) in query distributes installation (Prometheus engine returns correct output) Jan 29, 2025
