Thanos multi cluster query, hierarchical works, is all-to-all possible/advisable? #8172
-
Hey there, @mrtworo! I'm here to help you with any bugs or questions you have. Let's work together to solve the issue you're facing.

Using an all-to-all configuration with Thanos Query as StoreAPI endpoints across multiple clusters is possible, but it can lead to issues like query amplification, which you've experienced. This occurs when queries are redundantly sent across clusters, leading to increased resource usage and potential OOM (Out of Memory) errors.

To mitigate query amplification, consider the following configurations:
While an all-to-all setup is technically feasible, it requires careful configuration and resource management to avoid the issues you've encountered. If the hierarchical approach works well, it might be more stable and efficient for your environment.
-
Hey All!
We are working on multi-cluster observability in our 3-cluster environment; let's call the clusters A, B and C.
The Thanos Query gRPC services are exposed and reachable across clusters.
When Thanos Query in central cluster A is configured to be aware of its own Thanos Sidecars and Store, plus the Thanos Query instances from clusters B and C, everything works like a charm. The hierarchical approach was delivered without any issues.
However, we wanted visibility into all clusters from each cluster. When clusters B and C were also configured to be aware of the other clusters' Thanos Query instances, in addition to their own Sidecars and Stores, there was a reactor meltdown. It seemed that queries were amplified between clusters, causing OOM in all components involved.
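To make the suspected amplification concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under a loud assumption: every Query forwards each incoming request to all Query peers it is configured with, and nothing stops a request from bouncing back. The `max_hops` cap is artificial and exists only to keep the count finite; it is not a Thanos setting, and the model says nothing about how Thanos actually handles loops.

```python
# Toy model, NOT Thanos code: count Query-to-Query requests triggered by
# a single user query, assuming every Query blindly forwards to all of
# its configured Query peers (assumption made for illustration only).

def count_forwarded_calls(topology, start, max_hops):
    """Return how many Query-to-Query calls one request at `start` causes.

    topology: dict mapping a cluster name to the list of peer Queries it
              is configured with (leaf Sidecars/Stores are not modelled).
    max_hops: artificial cap on forwarding depth so the recursion ends.
    """
    if max_hops == 0:
        return 0
    total = 0
    for peer in topology.get(start, []):
        # One call to the peer, plus whatever the peer forwards in turn.
        total += 1 + count_forwarded_calls(topology, peer, max_hops - 1)
    return total


# Hierarchical: only central cluster A knows about remote Queries.
hierarchical = {"A": ["B", "C"], "B": [], "C": []}

# All-to-all: every Query lists the other two Queries as endpoints.
all_to_all = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

print(count_forwarded_calls(hierarchical, "A", max_hops=5))  # 2
print(count_forwarded_calls(all_to_all, "A", max_hops=5))    # 62
```

In this model the hierarchical layout stays at 2 forwarded calls no matter how large the hop cap is, while the full mesh roughly doubles with every extra hop, which would be consistent with the OOMs we saw.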
At the time, only the Metric Label browser in Grafana was being used, just to confirm connectivity.
Alternatively, we could configure all-to-all by specifying every endpoint that the remote Query uses, i.e. cluster A is aware of its own Stores and Sidecars as well as all Stores and Sidecars in clusters B and C. It seems inconvenient, but it should avoid query amplification.
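A minimal sketch of what that flat layout could look like, written as a small Python helper that renders the argument list for each cluster's Query. All hostnames and ports are hypothetical placeholders, and it assumes the `--endpoint` flag of `thanos query` (older releases used `--store`). The point is only that every Query lists leaf Sidecars and Stores and never another Query.

```python
# Hypothetical leaf StoreAPI addresses per cluster (placeholders only);
# replace with the real Sidecar/Store gRPC endpoints.
LEAF_ENDPOINTS = {
    "A": ["sidecar.a.example:10901", "store.a.example:10901"],
    "B": ["sidecar.b.example:10901", "store.b.example:10901"],
    "C": ["sidecar.c.example:10901", "store.c.example:10901"],
}


def flat_endpoint_flags(leaf_endpoints):
    """Build per-cluster `thanos query` endpoint flags for a flat layout.

    Every cluster's Query receives the same list: all leaf Sidecars and
    Stores from all clusters, and no Query-to-Query endpoints.
    """
    all_leaves = [
        addr
        for cluster in sorted(leaf_endpoints)
        for addr in leaf_endpoints[cluster]
    ]
    return {
        cluster: [f"--endpoint={addr}" for addr in all_leaves]
        for cluster in leaf_endpoints
    }


flags = flat_endpoint_flags(LEAF_ENDPOINTS)
for cluster in sorted(flags):
    print(f"cluster {cluster}: thanos query " + " ".join(flags[cluster]))
```

The downside is the one mentioned above: every new Sidecar or Store has to be added to the endpoint list in all three clusters.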
Is an all-to-all scenario using just Thanos Query as StoreAPI endpoints doable? Did we miss some configuration?