Add proposal for store instance high availability #404

mattbostock · 2018-07-03T09:32:40Z

The format of this proposal mirrors an existing draft proposal:
#387

We should to arrive at consensus on the open issues described in the proposal (under the 'Open issues' section) before merging.

/cc @xjewer @fabxc @Bplotka

The format of this proposal mirrors an existing draft proposal: thanos-io#387

domgreen · 2018-07-04T11:13:20Z

docs/proposals/201807_store_instance_high_availability.md

+
+## Open issues
+
+### Querying from multiple buckets or object stores


I think I personally prefer this approach, would allow us to run queries that span multiple buckets.

Example - we collect info from two tenants each of which gets uploaded to their own bucket tenantA and tenantB if we wanted to query the metrics for both adding the a Identifier to the InfoResponse we could have the bucket name on there and it would be valid to wait for both responses.

A further use case of this could be to have buckets that are historical / archive data and some that are within x days and would mean that you could spin up store nodes quicker for the recent bucket as would require less overhead at start.

Agree here with @domgreen. Maybe slight change in gateway: bool into something like gateway string with bucket name would allow us for basic functional sharding of stores in addition HA. What do you think @mattbostock ?

it makes sense having a string instead of bool

I agree, I think using a string identifier is the way forward.

The Protobuf definition could be:

bool deduplicated = 4 string store_group_id = 5

The deduplicated boolean would determine how the query instance handles retrieving the data (i.e. whether to wait for all results). I think I prefer deduplicated over gateway as it's more specific to the functionality we care about in this case. Sidecar instances would have deduplicated set to false. Bucket stores would have deduplicated set to true.

It'd be up to the specific implementation to determine what the store_group_id string is. For bucket stores, I think concatenating the object store URL and bucket name would make sense. For all other stores, I'd leave it empty for now until we have a good reason to use it.

The logic would work like this:

if deduplicated: query_and_use_fastest_response_per_store_group else query_and_wait_for_all_responses

Stores would be grouped by store_group_id. Stores having an empty store_group_id field would be considered all part of the same group.

Since boolean fields are false by default in Protobufs, deduplicated would be false and existing instances would be treated as before for backwards compatibility (i.e. the query instance will wait for all responses, subject to timeouts).

If you agree, I'll update the proposal to match.

do we really need a redundant deduplicated option? Isn't it enough having specified store_group_id as a object store+bucket_name as identifier? Thus, we don't have to wait for all responses if stores have the same store_group_id.

we can account only non-zero string in store_group_id for the deduplication, hence older version would work as before

I think relying on an empty string is too implicit as it alters the behaviour of how responses are treated.

I don't think this's a disadvantage, if you describe the behaviour in the documentation and the help. Otherwise you have to bear in mind both options to be working as expected.

Sorry for the massive delay in reviews and thanks for this (:

store_group_id makes sense to me. deduplicated is tricky. I don't like it because both store gateways with query_and_use_fastest_response_per_store_group and sidecars with query_and_wait_for_all_responses logic are kind of duplicated, so we can work on naming definitely.

However, what about another idea:
In Info let's add:

store_group_id

peer_type (like it is in gossip cluster.PeerState.Type).

I am proposing this because I am trying to figure out how to make deduplication flow consistent across ALL components. Including not yet solved: Ruler deduplication!

Basically, with this we can deduct correct behavior everywhere:

Store gateway: store_group_id would be as @mattbostock suggested and peerType=store so query can perform query_and_use_fastest_response_per_store_group logic.

Sidecars: store_group_id would be all external labels as it is ID of those. peerType=source so query can assume query_and_wait_for_all_responses flow with some replica label deduplication.

Ruler: The tricky part is that we might want these in HA, but it's hard to define external labels for these, so having some arbitrary store_group_id would be perfect. Plus ruler is peerType=source so same replica deduplication might apply here as well.

I am not saying we want solve ruler with this proposal, just that we might want to reuse same thing and move ALL to store_group_id for deduplication logic for consistency.

Tricky part here:
4 Rules (in 2 different groups) - it would be not trivial to implement this, because the query_and_wait_for_all_responses + dedup is done truly by removing (or not) replica label. But that is something not for proposal anyway.

What do you think?

Thanks for the comments all.

@Bplotka: I think your proposal makes sense, I'll update this PR to use that.

bwplotka

Thanks for it!

bwplotka · 2018-07-04T23:02:21Z

docs/proposals/201807_store_instance_high_availability.md

+failed but similar data was recorded by another Prometheus server in the same
+failure domain. To support deduplication, Thanos must wait for all Thanos
+sidecar servers to return their data (subject to timeouts) before returning a
+response to a client.


It is worth to mention, that we DO not ignore these timeouts, even with the HA sidecar scenario we treat one replica being inaccessible as an "error case" - which might be wrong or confusing and definitely different to store gateway case (:

Good shout, I'll make that explicit.

On second thoughts, I'm not sure if we need to expand on this as I want to avoid confusing the reader by giving too much detail. Maybe I should remove the 'subject to timeouts' part?

bwplotka · 2018-07-04T23:07:24Z

docs/proposals/201807_store_instance_high_availability.md

+that do not explicitly set a value for `gateway` will not be affected.
+
+If a store instance is a gateway, query instances will treat each store
+instance in a label group as having access to the same data. Query instances


How to tell if stores are from same label groups if they do not put any labels in current logic?

I think this is related to the alternative mentioned below. will comment there

bwplotka · 2018-07-04T23:19:16Z

docs/proposals/201807_store_instance_high_availability.md

+
+## Open issues
+
+### Querying from multiple buckets or object stores


Agree here with @domgreen. Maybe slight change in gateway: bool into something like gateway string with bucket name would allow us for basic functional sharding of stores in addition HA. What do you think @mattbostock ?

mattbostock · 2018-07-17T17:13:09Z

Updated based on feedback, comments welcome.

bwplotka · 2018-07-19T10:46:09Z

@mattbostock I don't know why we did not ask about this, but....

For HA, is that not enough to just put multiple replicas of store gateways behind some loadbalancer (for example 2 replica statefulset behind k8s service)? (:

This will not work for gossip (as in gossip pods talks to pods), but wil work for static --store flag just fine

bwplotka · 2018-08-27T09:04:54Z

Thanks for your work here, but I think we need to merge it as rejected ): We want gossip to be removed soon and this gives clear, easy win to achieve HA for store gateways -> just by using K8s service or any other simple RR balancer.

However this proposal might be a good base for other features in soon future:

exposing some store gateway external labels (configured statically)
having single store gateway supporting multiple buckets

bwplotka · 2018-10-26T23:46:31Z

Merged, just to move it to rejected state in next PR and have it for future reference.

Add proposal for store instance high availability

1ca587d

The format of this proposal mirrors an existing draft proposal: thanos-io#387

domgreen reviewed Jul 4, 2018

View reviewed changes

domgreen added feature request/improvement proposal labels Jul 4, 2018

mattbostock mentioned this pull request Jul 4, 2018

HA handling for store nodes #199

Closed

bwplotka reviewed Jul 4, 2018

View reviewed changes

Update HA store proposal based on feedback

41403db

bwplotka merged commit 6795acd into thanos-io:master Oct 26, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add proposal for store instance high availability #404

Add proposal for store instance high availability #404

mattbostock commented Jul 3, 2018

domgreen Jul 4, 2018 •

edited

bwplotka Jul 4, 2018

xjewer Jul 5, 2018 •

edited

mattbostock Jul 9, 2018

xjewer Jul 10, 2018

xjewer Jul 10, 2018

mattbostock Jul 11, 2018

xjewer Jul 12, 2018

bwplotka Jul 13, 2018

mattbostock Jul 17, 2018

bwplotka left a comment

bwplotka Jul 4, 2018

mattbostock Jul 9, 2018 •

edited

mattbostock Jul 17, 2018

bwplotka Jul 4, 2018

bwplotka Jul 4, 2018

bwplotka Jul 4, 2018

mattbostock commented Jul 17, 2018

bwplotka commented Jul 19, 2018 •

edited

bwplotka commented Aug 27, 2018

bwplotka commented Oct 26, 2018 •

edited


		## Open issues

		### Querying from multiple buckets or object stores

Add proposal for store instance high availability #404

Add proposal for store instance high availability #404

Conversation

mattbostock commented Jul 3, 2018

domgreen Jul 4, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xjewer Jul 5, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bwplotka left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattbostock Jul 9, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattbostock commented Jul 17, 2018

bwplotka commented Jul 19, 2018 • edited

bwplotka commented Aug 27, 2018

bwplotka commented Oct 26, 2018 • edited

domgreen Jul 4, 2018 •

edited

xjewer Jul 5, 2018 •

edited

mattbostock Jul 9, 2018 •

edited

bwplotka commented Jul 19, 2018 •

edited

bwplotka commented Oct 26, 2018 •

edited