Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide on Shard Replication Scheme #16

Closed
fhaynes opened this issue Oct 19, 2018 · 0 comments
Closed

Decide on Shard Replication Scheme #16

fhaynes opened this issue Oct 19, 2018 · 0 comments

Comments

@fhaynes
Copy link
Member

fhaynes commented Oct 19, 2018

This relates to issue #14.

ElasticSearch Shards

This describes the basics of ES sharding.

Primary Shards

In ES, an index is created with a certain number of Primary shards (5 is common). These Primary shards are writable, and when data is written to an index, a Primary shard is chosen to receive it.

The number of primary shards for an index cannot be changed after creation.

Replica Shards

Each Primary shard has 0 or more Replica shards. After a Primary shard writes data, it is replicated to the Replica shards belonging to that Primary shard. Replica shards can be used to handle read requests, but not write. Ideally, all Replica shards are located on a different node/server.

Rebalancing

This means redistributing data amongst the cluster members. It is usually done in two situations.

Node Addition or Removal

If more nodes are added to a cluster, shards should be rebalanced to evenly distribute the load. This can be done in many ways. With consistent hashing, the algorithm itself will tell you what needs to be moved where.

With a leader architecture, a process will need to choose which shards to move where. This can be based on CPU, user-defined tags, memory usage or any other metadata we have about the state of the system.

Hot Shard Problem

It is common for a situation to arise where one node is overloaded because it holds shards for popular indices. This may be a temporary situation, or a longer-term issue. In this scenario, it is desireable to redistribute the hot shards to machines with less load.

Toshi

I suggest using Consul to track shard assignments along with leader election.

@fhaynes fhaynes mentioned this issue Oct 19, 2018
17 tasks
@fhaynes fhaynes closed this as completed Dec 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant