
Upgrade from v2.28.1 to v2.29.0: an error was reported #5862

Closed · BigbigY opened this issue Nov 4, 2022 · 15 comments · Fixed by #6082

BigbigY commented Nov 4, 2022

Error message:

Error executing query: expanding series: proxy Series(): rpc error: code = Aborted desc = No StoreAPIs matched for this query

Symptom: the same query produces this error after upgrading to v2.29.0; rolling back to v2.28.1 restores the query.

The error is shown in the attached screenshot.

GiedriusS (Member) commented Nov 4, 2022

Hmm, maybe something related to #5296. Do you know how we could reproduce this? 🤔 Also, do you see all expected StoreAPI nodes in the Stores page?

BigbigY (Author) commented Nov 5, 2022

I did not look into the specific problem and error; I rolled the version back directly. Looking forward to a fix.

Also, the new 'thanos_frontend_sharding_middleware_queries_total' metric added to query-frontend was not found.

hanjm (Member) commented Nov 7, 2022

I upgraded to 0.29.0-rc early and did not hit this problem. I mainly use sidecar mode, so I only upgraded the sidecar, querier, and query-frontend, not the receiver.

Today, when I upgraded the receiver, the 'No StoreAPIs matched for this query' problem occurred.

It only errors when querying the tenant that uses receive mode through thanos-query; querying other tenants works fine, and querying via thanos-query-frontend works fine.

The receiver prints these logs:

2022-11-07 18:27:56 level=warn ts=2022-11-07T10:27:56.053906788Z caller=proxy.go:282 component=receive component=proxy request="min_time:1667816575801 max_time:1667816875801 matchers:<name:\"app\" value:\"xxx\" > matchers:<name:\"namespace\" value:\"xxx\" > matchers:<name:\"server\" value:\"xxx\" > matchers:<name:\"tenant_id\" value:\"default\" > matchers:<name:\"__name__\" value:\"process_resident_memory_bytes\" > aggregates:COUNT aggregates:SUM partial_response_disabled:true " err="No StoreAPIs matched for this query" stores="store LabelSets: {receive_cluster=\"cluster_xxx\", receive_replica=\"thanos-receiver-ingestor-xxx-0\", tenant_id=\"xxx\"} Mint: 1667436599098 Maxt: 9223372036854775807 filtered out: external labels [{receive_cluster=\"cluster_xxx\", receive_replica=\"thanos-receiver-ingestor-xxx-0\", tenant_id=\"xxx\"}] does not match request label matchers: [app=\"xxx\" namespace=\"xxx\" server=\"xx\" tenant_id=\"default\" __name__=\"process_resident_memory_bytes\"];store LabelSets: {receive_cluster=\"cluster_xxx\", receive_replica=\"thanos-receiver-ingestor-xxx-0\", tenant_id=\"default\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1667816575801,1667816875801]. Store time ranges: [9223372036854775807,9223372036854775807]"

I think PR #4886 may be relevant.

hanjm (Member) commented Nov 7, 2022

Hi @GiedriusS @matej-g,

I notice that the Prometheus code initializes minTime to math.MaxInt64, but the Thanos multi-TSDB code uses math.MinInt64, so the query gets filtered out unexpectedly?

https://github.com/prometheus/prometheus/blob/6dd4e907a31a58d1b738aa5debbfc9c5e1ed32ac/tsdb/head.go#L288-L289

thanos/pkg/store/tsdb.go

Lines 115 to 116 in ef3a331

func (s *TSDBStore) TimeRange() (int64, int64) {
	var minTime int64 = math.MinInt64
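For reference, here is a tiny, self-contained illustration (not actual Prometheus or Thanos code) of the two sentinels: an empty Prometheus head keeps its minTime at math.MaxInt64 until the first sample arrives, which is exactly the Mint: 9223372036854775807 value visible for the empty tenant in the receiver log above, whereas the Thanos TSDBStore default of math.MinInt64 means "no lower bound":

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Sentinel an empty Prometheus head reports for minTime
	// (only lowered once the first sample is appended).
	promEmptyHeadMinTime := int64(math.MaxInt64)

	// Default used by the Thanos TSDBStore.TimeRange quoted above, i.e. "no lower bound".
	thanosDefaultMinTime := int64(math.MinInt64)

	fmt.Println(promEmptyHeadMinTime) // 9223372036854775807 -- the Mint seen in the log
	fmt.Println(thanosDefaultMinTime) // -9223372036854775808
}
```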

matej-g (Collaborator) commented Nov 10, 2022

Hey @hanjm, it's possible, although I cannot see how this breaks the time range filtering.

GiedriusS (Member) commented:

@BigbigY do you use Thanos Receive?

hanjm (Member) commented Nov 12, 2022

@matej-g
The receiver instance has data for two tenants:

  1. tenant_id=comp with time range Mint: 1667436599098 Maxt: 9223372036854775807
  2. tenant_id=default with time range Mint: 9223372036854775807 Maxt: 9223372036854775807

The receiver exposes label sets for multiple tenants, but only a single, combined time range.

thanos/pkg/store/proxy.go

Lines 218 to 236 in ef3a331

func (s *ProxyStore) TimeRange() (int64, int64) {
	stores := s.stores()
	if len(stores) == 0 {
		return math.MinInt64, math.MaxInt64
	}
	var minTime, maxTime int64 = math.MaxInt64, math.MinInt64
	for _, s := range stores {
		storeMinTime, storeMaxTime := s.TimeRange()
		if storeMinTime < minTime {
			minTime = storeMinTime
		}
		if storeMaxTime > maxTime {
			maxTime = storeMaxTime
		}
	}
	return minTime, maxTime
}

When the querier queries the receiver, the receiver is not filtered out:

store Addr: *.*.*.*:10907 LabelSets: {receive_cluster=\"cluster_comp\", receive_replica=\"thanos-receiver-ingestor-comp-0\", tenant_id=\"comp\"},{receive_cluster=\"cluster_comp\", receive_replica=\"thanos-receiver-ingestor-comp-0\", tenant_id=\"default\"} Mint: 1667436599098 Maxt: 9223372036854775807 queried

When the request arrives at the receiver, both inner stores are filtered out:

2022-11-07 18:27:56 level=warn ts=2022-11-07T10:27:56.053906788Z caller=proxy.go:282 component=receive component=proxy request="min_time:1667816575801 max_time:1667816875801 matchers:<name:\"app\" value:\"xxx\" > matchers:<name:\"namespace\" value:\"xxx\" > matchers:<name:\"server\" value:\"xxx\" > matchers:<name:\"tenant_id\" value:\"default\" > matchers:<name:\"__name__\" value:\"process_resident_memory_bytes\" > aggregates:COUNT aggregates:SUM partial_response_disabled:true " err="No StoreAPIs matched for this query" stores="store LabelSets: {receive_cluster=\"cluster_xxx\", receive_replica=\"thanos-receiver-ingestor-xxx-0\", tenant_id=\"comp\"} Mint: 1667436599098 Maxt: 9223372036854775807 filtered out: external labels [{receive_cluster=\"cluster_xxx\", receive_replica=\"thanos-receiver-ingestor-xxx-0\", tenant_id=\"comp\"}] does not match request label matchers: [app=\"symphony\" namespace=\"Production\" server=\"piano\" tenant_id=\"default\" __name__=\"process_resident_memory_bytes\"];store LabelSets: {receive_cluster=\"cluster_xxx\", receive_replica=\"thanos-receiver-ingestor-comp-0\", tenant_id=\"default\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1667816575801,1667816875801]. Store time ranges: [9223372036854775807,9223372036854775807]"

Then the receiver returns an Aborted error.
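To make the filtering concrete, here is a minimal, self-contained sketch (hypothetical types and helper names, not the actual Thanos proxy code) using the values from the logs above: the comp store fails the tenant_id="default" matcher, and the empty default store advertises [MaxInt64, MaxInt64] and so never overlaps the query window, leaving zero matched stores:

```go
package main

import (
	"fmt"
	"math"
)

// storeInfo is a simplified stand-in for a per-tenant TSDB store inside a receiver.
type storeInfo struct {
	tenantID   string
	minT, maxT int64
}

// overlaps mimics the time-range check: a store is only queried if its
// advertised range intersects the requested range.
func overlaps(s storeInfo, reqMin, reqMax int64) bool {
	return s.minT <= reqMax && s.maxT >= reqMin
}

func main() {
	// Values taken from the receiver log above.
	stores := []storeInfo{
		{tenantID: "comp", minT: 1667436599098, maxT: math.MaxInt64},
		{tenantID: "default", minT: math.MaxInt64, maxT: math.MaxInt64}, // empty TSDB
	}
	reqMin, reqMax := int64(1667816575801), int64(1667816875801)
	reqTenant := "default" // from the tenant_id="default" matcher

	matched := 0
	for _, s := range stores {
		if s.tenantID != reqTenant {
			continue // filtered out: external labels do not match the request matchers
		}
		if !overlaps(s, reqMin, reqMax) {
			continue // filtered out: [MaxInt64, MaxInt64] never overlaps a real window
		}
		matched++
	}
	fmt.Println("matched stores:", matched) // 0 -> "No StoreAPIs matched for this query"
}
```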

hanjm (Member) commented Nov 12, 2022

I think the receiver should not return an Aborted error even if no client matched in the multi-TSDB, right?

yeya24 (Contributor) commented Nov 13, 2022

Sounds like a valid bug. Help wanted

yeya24 added the bug label Nov 13, 2022

trevorriles commented:

We are seeing this and are not using Thanos Receive.

BigbigY (Author) commented Nov 18, 2022

> @BigbigY do you use Thanos Receive?

Thanos Receive has not been upgraded

matej-g (Collaborator) commented Nov 18, 2022

It looks like this might then be because of changes in the querier? Either way, thanks @hanjm for the report, I'll try to take a look at whether I can reproduce / fix this.

matej-g (Collaborator) commented Nov 25, 2022

Hey @hanjm, thanks for the pointers. I did not try to reproduce it, just went through the code (although it should not be too hard to create a test case if my thinking is correct), and I think you're right, but it's not only because of the time range.

I believe the filtering in the downstream proxy store (in the receiver) works correctly: because one store cannot match the labels (different tenant) and the other cannot match the time range, it correctly reports that no stores were found.

The problem seems to be that the upstream proxy store (in the querier) does not have enough information to filter out this downstream proxy store. When the downstream store reports its info upstream, it reports the labels and time range of both stores combined, so the upstream store cannot filter it out. However, once we hit the downstream proxy, we find that in fact neither of the two stores matches, we hit the 'no matching store' error, and that is what we see in the upstream proxy store (https://github.com/thanos-io/thanos/blob/main/pkg/store/proxy.go#L322).

I'm not sure how this worked before #5296 and whether this could have changed, but does this make sense @hanjm @GiedriusS?
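A small, self-contained illustration of this point, using the numbers from the logs above (the min/max builtins need Go 1.21+, and this is not the actual Thanos code): the receiver advertises the union of its inner stores' time ranges, and that union overlaps the query window even though neither inner store matches on its own:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Per-tenant ranges inside the receiver, from the logs above.
	compMint, compMaxt := int64(1667436599098), int64(math.MaxInt64)
	defMint, defMaxt := int64(math.MaxInt64), int64(math.MaxInt64) // empty TSDB

	// The receiver's ProxyStore reports the combined range upstream
	// (the same shape as the TimeRange loop quoted earlier in the thread).
	combinedMint := min(compMint, defMint)
	combinedMaxt := max(compMaxt, defMaxt)
	fmt.Println(combinedMint, combinedMaxt) // 1667436599098 9223372036854775807

	// The query window from the logs overlaps the combined range, so the querier
	// forwards the request -- even though neither inner store will actually match.
	reqMin, reqMax := int64(1667816575801), int64(1667816875801)
	fmt.Println("querier forwards:", combinedMint <= reqMax && combinedMaxt >= reqMin) // true
}
```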

bwplotka (Member) commented Dec 22, 2022

This was also brought to us by @SuperQ at Community Hours.

Generally, the problem Ben is having is that when multiple Queriers are chained, a 'no Store matched' situation might not be an error. It was made an error to make debugging configuration mistakes easier, but now it is impacting correct queries.

Sounds like #5937 is the solution, we will take a look, thanks!

SuperQ (Contributor) commented Dec 22, 2022

In #5937, I wonder if we need to get that fancy with a control flag.
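For illustration, a rough, self-contained sketch of the direction such a fix could take (placeholder names and a simplified signature, not the actual Thanos proxy code nor the exact shape of #5937): when matcher/time-range filtering leaves no stores, return an empty successful response instead of an Aborted error, optionally keeping a strict mode for debugging misconfiguration:

```go
package main

import (
	"errors"
	"fmt"
)

// store is a placeholder, not the real Thanos StoreAPI client type.
type store struct{ name string }

// seriesSketch shows only the behavioral change around an empty match set.
func seriesSketch(matchedStores []store, strict bool) error {
	if len(matchedStores) == 0 {
		if strict {
			// Old behavior: surfaces upstream as
			// "rpc error: code = Aborted desc = No StoreAPIs matched for this query".
			return errors.New("No StoreAPIs matched for this query")
		}
		// New behavior: an empty result is normal when queriers are chained or
		// external labels simply do not overlap the request, so return no error.
		return nil
	}
	// ... fan the Series request out to matchedStores as usual ...
	return nil
}

func main() {
	fmt.Println(seriesSketch(nil, true))  // old: an error the upstream querier reports
	fmt.Println(seriesSketch(nil, false)) // new: <nil>, i.e. just an empty response
}
```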

SuperQ added a commit to SuperQ/thanos that referenced this issue Jan 29, 2023
It's normal and not an error if a query does not match due to no
downstream stores. This is common when querying with external labels and
tiered query servers.

This bug was introduced in thanos-io#5296

Fixes: thanos-io#5862

Signed-off-by: SuperQ <superq@gmail.com>
bwplotka pushed a commit that referenced this issue Feb 2, 2023
It's normal and not an error if a query does not match due to no
downstream stores. This is common when querying with external labels and
tiered query servers.

This bug was introduced in #5296

Fixes: #5862

Signed-off-by: SuperQ <superq@gmail.com>
ngraham20 pushed a commit to ngraham20/thanos that referenced this issue Apr 17, 2023
It's normal and not an error if a query does not match due to no
downstream stores. This is common when querying with external labels and
tiered query servers.

This bug was introduced in thanos-io#5296

Fixes: thanos-io#5862

Signed-off-by: SuperQ <superq@gmail.com>