Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal for store instance high availability #404

Merged
merged 2 commits into from
Oct 26, 2018

Conversation

mattbostock
Copy link
Contributor

The format of this proposal mirrors an existing draft proposal:
#387

We should to arrive at consensus on the open issues described in the proposal (under the 'Open issues' section) before merging.

/cc @xjewer @fabxc @Bplotka

The format of this proposal mirrors an existing draft proposal:
thanos-io#387

## Open issues

### Querying from multiple buckets or object stores
Copy link
Contributor

@domgreen domgreen Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I personally prefer this approach, would allow us to run queries that span multiple buckets.

Example - we collect info from two tenants each of which gets uploaded to their own bucket tenantA and tenantB if we wanted to query the metrics for both adding the a Identifier to the InfoResponse we could have the bucket name on there and it would be valid to wait for both responses.

A further use case of this could be to have buckets that are historical / archive data and some that are within x days and would mean that you could spin up store nodes quicker for the recent bucket as would require less overhead at start.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree here with @domgreen. Maybe slight change in gateway: bool into something like gateway string with bucket name would allow us for basic functional sharding of stores in addition HA. What do you think @mattbostock ?

Copy link
Contributor

@xjewer xjewer Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense having a string instead of bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think using a string identifier is the way forward.

The Protobuf definition could be:

bool deduplicated = 4
string store_group_id = 5

The deduplicated boolean would determine how the query instance handles retrieving the data (i.e. whether to wait for all results). I think I prefer deduplicated over gateway as it's more specific to the functionality we care about in this case. Sidecar instances would have deduplicated set to false. Bucket stores would have deduplicated set to true.

It'd be up to the specific implementation to determine what the store_group_id string is. For bucket stores, I think concatenating the object store URL and bucket name would make sense. For all other stores, I'd leave it empty for now until we have a good reason to use it.

The logic would work like this:

if deduplicated:
  query_and_use_fastest_response_per_store_group
else
   query_and_wait_for_all_responses

Stores would be grouped by store_group_id. Stores having an empty store_group_id field would be considered all part of the same group.

Since boolean fields are false by default in Protobufs, deduplicated would be false and existing instances would be treated as before for backwards compatibility (i.e. the query instance will wait for all responses, subject to timeouts).

If you agree, I'll update the proposal to match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need a redundant deduplicated option? Isn't it enough having specified store_group_id as a object store+bucket_name as identifier? Thus, we don't have to wait for all responses if stores have the same store_group_id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can account only non-zero string in store_group_id for the deduplication, hence older version would work as before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think relying on an empty string is too implicit as it alters the behaviour of how responses are treated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this's a disadvantage, if you describe the behaviour in the documentation and the help. Otherwise you have to bear in mind both options to be working as expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the massive delay in reviews and thanks for this (:

store_group_id makes sense to me. deduplicated is tricky. I don't like it because both store gateways with query_and_use_fastest_response_per_store_group and sidecars with query_and_wait_for_all_responses logic are kind of duplicated, so we can work on naming definitely.

However, what about another idea:
In Info let's add:

  • store_group_id
  • peer_type (like it is in gossip cluster.PeerState.Type).

I am proposing this because I am trying to figure out how to make deduplication flow consistent across ALL components. Including not yet solved: Ruler deduplication!

Basically, with this we can deduct correct behavior everywhere:

  • Store gateway: store_group_id would be as @mattbostock suggested and peerType=store so query can perform query_and_use_fastest_response_per_store_group logic.
  • Sidecars: store_group_id would be all external labels as it is ID of those. peerType=source so query can assume query_and_wait_for_all_responses flow with some replica label deduplication.
  • Ruler: The tricky part is that we might want these in HA, but it's hard to define external labels for these, so having some arbitrary store_group_id would be perfect. Plus ruler is peerType=source so same replica deduplication might apply here as well.

I am not saying we want solve ruler with this proposal, just that we might want to reuse same thing and move ALL to store_group_id for deduplication logic for consistency.

Tricky part here:
4 Rules (in 2 different groups) - it would be not trivial to implement this, because the query_and_wait_for_all_responses + dedup is done truly by removing (or not) replica label. But that is something not for proposal anyway.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments all.

@Bplotka: I think your proposal makes sense, I'll update this PR to use that.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for it!

failed but similar data was recorded by another Prometheus server in the same
failure domain. To support deduplication, Thanos must wait for all Thanos
sidecar servers to return their data (subject to timeouts) before returning a
response to a client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth to mention, that we DO not ignore these timeouts, even with the HA sidecar scenario we treat one replica being inaccessible as an "error case" - which might be wrong or confusing and definitely different to store gateway case (:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shout, I'll make that explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, I'm not sure if we need to expand on this as I want to avoid confusing the reader by giving too much detail. Maybe I should remove the 'subject to timeouts' part?

that do not explicitly set a value for `gateway` will not be affected.

If a store instance is a gateway, query instances will treat each store
instance in a label group as having access to the same data. Query instances
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to tell if stores are from same label groups if they do not put any labels in current logic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is related to the alternative mentioned below. will comment there


## Open issues

### Querying from multiple buckets or object stores
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree here with @domgreen. Maybe slight change in gateway: bool into something like gateway string with bucket name would allow us for basic functional sharding of stores in addition HA. What do you think @mattbostock ?

@mattbostock
Copy link
Contributor Author

Updated based on feedback, comments welcome.

@bwplotka
Copy link
Member

bwplotka commented Jul 19, 2018

@mattbostock I don't know why we did not ask about this, but....

For HA, is that not enough to just put multiple replicas of store gateways behind some loadbalancer (for example 2 replica statefulset behind k8s service)? (:

This will not work for gossip (as in gossip pods talks to pods), but wil work for static --store flag just fine

@bwplotka
Copy link
Member

Thanks for your work here, but I think we need to merge it as rejected ): We want gossip to be removed soon and this gives clear, easy win to achieve HA for store gateways -> just by using K8s service or any other simple RR balancer.

However this proposal might be a good base for other features in soon future:

  • exposing some store gateway external labels (configured statically)
  • having single store gateway supporting multiple buckets

@bwplotka bwplotka merged commit 6795acd into thanos-io:master Oct 26, 2018
@bwplotka
Copy link
Member

bwplotka commented Oct 26, 2018

Merged, just to move it to rejected state in next PR and have it for future reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants