Announce highest TSDB min time #6253

fpetkovski · 2023-04-01T07:57:34Z

The distributed query engine queries both store and ingest components in parallel and deduplicates samples by prioritizing those coming from components with a higher max time. Different Prometheus/Receive components can apply retention independetly of each other which can cause remote aggregates to be executed on partial data. This will lead to incorrect results because these partial aggregates will be prioritized over data coming from stores. The problem is first described in thanos-io/promql-engine#187.

To address this problem, this commit exposes an info field called guaranteed_min_time (better names are welcome :) ) from all stores APIs. The field indicates what is the minimum time that a store API guarantees to hold across all TSDBs. For stores APIs holding a single TSDB (Prometheus, Ruler) this field is calculated by using the retention and min block duration parameters. The Querier/Proxy component calculate the value by taking the highest guaranteed_min_time across all TSDBs.

Using this field, the distributed engine can scope the start time of remote queries to a time for which data is guaranteed to be complete.

I added CHANGELOG entry for this change.
Change is not relevant to the end user.

Changes

Expose guaranteed_min_time info field.

Verification

Tested manually and using unit tests. The field is also exposed in the UI.

The distributed query engine queries both store and ingest components in parallel and deduplicates samples by prioritizing those coming from components with a higher max time. Different Prometheus/Receive components can apply retention independetly of each other which can cause remote aggregates to be executed on partial data. This will lead to incorrect results because these partial aggregates will be prioritized over data coming from stores. The problem is first described in thanos-io/promql-engine#187. To address this problem, this commit exposes an info field called guaranteed_min_time from all stores APIs. The field indicates what is the minimum time that the store guarantees to hold across all TSDBs. For stores holding a single TSDB (Prometheus, Ruler) this field is calculated by using the retention and min block duration parameters. The Querier/Proxy component calculate the value by taking the highest guaranteed_min_time across all TSDBs. Using this field, the distributed engine can scope the start time of remote queries to a time for which data is guaranteed to be complete. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

The retention flag in Prometheus has been renamed to `storage.tsdb.retention.time` while Thanos still uses `storage.tsdb.retention`. This leads to the guaranteed min time being calculated incorrectly for Prometheus instances. This commit fixes the issue. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

GiedriusS

This feels a bit like a hack because it sounds to me like the core of the problem is that InfoResponse only contains one StoreInfo whereas nowadays multiple TSDBs are supported in one component. 🤔 I wonder if it would be a lot of work to instead introduce repeated StoreInfo. Also, what makes this StoreInfo special? Perhaps we would like to have repeated TargetsInfo and so on?

fpetkovski · 2023-04-27T13:17:42Z

Yeah the root of the problem is that StoreInfo has multiple LabelSets but only one min and max time for all of them. Ideally we want each LabelSet to have its own range. We could then propagate this information to engine so it can decide how to scope a query. In practice we want to make sure that replicas of a label set are always queried on the same time range.

So maybe StoreInfo needs to have a repeated TSDBInfo, and TSDBInfo would have ZLabelSet and min and max time.

fpetkovski · 2023-05-02T15:47:44Z

@matej-g @GiedriusS this is what this suggestion would look like: #6329

stale · 2023-06-18T07:10:36Z

Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next week, this issue will be closed (we can always reopen a PR if you get back to this!). Alternatively, use remind command if you wish to be reminded at some point in future.

pull-request-size bot added the size/L label Apr 1, 2023

fpetkovski force-pushed the guaranteed-min-time branch 2 times, most recently from 2f6cd59 to 209d0b3 Compare April 12, 2023 14:09

fpetkovski force-pushed the guaranteed-min-time branch from 209d0b3 to fce6436 Compare April 12, 2023 14:09

GiedriusS reviewed Apr 27, 2023

View reviewed changes

stale bot added the stale label Jun 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Announce highest TSDB min time #6253

Announce highest TSDB min time #6253

Uh oh!

fpetkovski commented Apr 1, 2023

Uh oh!

GiedriusS left a comment

Uh oh!

fpetkovski commented Apr 27, 2023

Uh oh!

fpetkovski commented May 2, 2023

Uh oh!

stale bot commented Jun 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Announce highest TSDB min time #6253

Are you sure you want to change the base?

Announce highest TSDB min time #6253

Uh oh!

Conversation

fpetkovski commented Apr 1, 2023

Changes

Verification

Uh oh!

GiedriusS left a comment

Choose a reason for hiding this comment

Uh oh!

fpetkovski commented Apr 27, 2023

Uh oh!

fpetkovski commented May 2, 2023

Uh oh!

stale bot commented Jun 18, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants