Thanos queries taking too long #7088

Open
dgaponcic opened this issue Jan 23, 2024 · 2 comments

Comments

@dgaponcic

Hi everyone,

We have been using Thanos for a few months now. It is deployed using sidecars, and the historical data is stored in S3.

Our setup:

  • Thanos version 0.32.5
  • Kubernetes v1.25.3
  • Prometheus keeps metrics locally for 2 weeks.
  • The compactor is running and the data is compacted and downsampled:
    * --retention.resolution-raw=30d
    * --retention.resolution-5m=30d
    * --retention.resolution-1h=10y

The problem

I want to see some statistics for the last 30 days, but the dashboard takes a very long time to load. When I query the period now-30d to now, the dashboard takes 25 seconds to load.

When I change the timeframe to now-45d to now-15d (also 30 days, but the data is available only in S3), the same dashboard takes only 10 seconds to load.

I can see in the logs that all the sidecars are contacted. I'm not sure exactly how the response is aggregated; presumably the most recent 2 weeks of data come from the sidecars and the rest from the bucket.

Similarly, if I query for 2 weeks (14 days) of data:

  • now-14d to now - takes 25 seconds to load
  • now-30d to now-16d - takes 8 seconds to load

It seems that when the timeframe includes the last 2 weeks of data (the period for which the data is still kept locally by Prometheus) the queries take much longer.


How can I speed up the queries? Am I doing something wrong in our setup? It seems like an obvious use case, but it takes very long to load, to the point where the dashboard is unusable.

Thank you for any potential ideas!

@douglascamata
Contributor

douglascamata commented Jan 23, 2024

Unless you have properly configured the min and max time on your Thanos Store Gateways to exclude the timeframe covered by the sidecars, you're getting a lot of duplicated data that the Thanos Queriers have to work hard to deduplicate. Hopefully you are also configuring labels correctly for deduplication 😬
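
As a rough sketch of what that time partitioning could look like with the 2-week local retention described above: the values below are assumptions, not tested recommendations. `-13d` is picked only to leave about a day of overlap with the sidecars, and `replica` is a hypothetical label name. All other required flags are omitted.

```shell
# Store Gateway: serve only blocks older than about 13 days,
# so recent data is answered by the sidecars alone.
thanos store --max-time=-13d

# Sidecar: serve only the most recent 2 weeks, matching the local Prometheus retention.
thanos sidecar --min-time=-2w

# Querier: deduplicate series that differ only in the replica label.
# "replica" is an assumed name; use the external label that distinguishes your Prometheus replicas.
thanos query --query.replica-label=replica
```

Keeping a small overlap between the two ranges is meant to avoid a gap while the newest blocks are still being uploaded to S3.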

@MichaHoffmann
Contributor

Can you maybe provide a trace of the problematic query?
