This relates to issue #14.
Elasticsearch Shards
This describes the basics of ES sharding.
Primary Shards
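In ES, an index is created with a fixed number of Primary shards (5 was the default before version 7.0; later versions default to 1). Primary shards are writable: when a document is written to an index, ES hashes its routing value (the document ID by default) to choose the Primary shard that receives it.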
The number of Primary shards for an index cannot be changed in place after creation; getting a different count means building a new index with the desired number of shards.
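To see why the shard count is hard to change, here is a minimal sketch of hash-based routing. It assumes a simple "hash modulo shard count" rule, and CRC32 is only a stand-in hash picked for determinism; the function name `route` is hypothetical and ES's real hash function and formula differ in detail.

```python
import zlib


def route(doc_id: str, num_primary_shards: int) -> int:
    """Pick a Primary shard by hashing the document ID.

    CRC32 is an assumption made for this sketch, not the hash ES uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
with_five = {d: route(d, 5) for d in doc_ids}
with_six = {d: route(d, 6) for d in doc_ids}

# Count documents whose shard would change if the index went from 5 to 6 shards.
moved = sum(1 for d in doc_ids if with_five[d] != with_six[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Under this rule, most documents land on a different shard as soon as the divisor changes, so existing data would no longer be found where routing expects it.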
Replica Shards
Each Primary shard has 0 or more Replica shards. After a Primary shard writes data, the write is replicated to the Replica shards belonging to that Primary shard. Replica shards can serve read requests, but they do not accept writes directly. Ideally, each Replica shard is located on a different node/server from its Primary, so that losing one node does not lose every copy of the data.
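The write and read paths described above can be sketched roughly as follows. `ShardGroup` is a hypothetical name, replication is modelled as a synchronous in-memory copy, and reads rotate round-robin across copies; all of these are simplifying assumptions, not how ES implements it.

```python
import itertools


class ShardGroup:
    """One Primary shard plus its Replica shards, each on a named node."""

    def __init__(self, primary_node, replica_nodes):
        if primary_node in replica_nodes:
            raise ValueError("a Replica should not share a node with its Primary")
        self.primary_node = primary_node
        # One in-memory document store per copy, keyed by node name.
        self.copies = {node: {} for node in [primary_node, *replica_nodes]}
        self._readers = itertools.cycle(list(self.copies))

    def write(self, doc_id, doc):
        # Writes go to the Primary first, then are copied to every Replica.
        self.copies[self.primary_node][doc_id] = doc
        for node, store in self.copies.items():
            if node != self.primary_node:
                store[doc_id] = doc

    def read(self, doc_id):
        # Any copy may serve a read; this sketch rotates through them.
        node = next(self._readers)
        return node, self.copies[node].get(doc_id)


group = ShardGroup("node-a", ["node-b", "node-c"])
group.write("1", {"title": "hello"})
served_by = [group.read("1")[0] for _ in range(3)]
```

Three consecutive reads are served by three different nodes, which is the read-scaling benefit of adding Replicas.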
Rebalancing
This means redistributing shards amongst the cluster members. It is usually done in two situations: when nodes are added or removed, and when a node becomes a hot spot.
Node Addition or Removal
If nodes are added to or removed from a cluster, shards should be rebalanced to evenly distribute the load. This can be done in many ways. With consistent hashing, the algorithm itself determines which shards need to move and where.
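As an illustration of the consistent hashing option, the sketch below builds a small hash ring with virtual nodes and reports which shards change owner when a node joins. `HashRing`, `moves_after_adding`, the shard naming scheme, and the choice of SHA-256 with 64 virtual nodes are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """A minimal consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node) ring positions
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._points, (self._hash(f"{node}#{i}"), node))

    def owner(self, key):
        # First ring point at or after the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._points, (self._hash(key), ""))
        return self._points[idx % len(self._points)][1]


def moves_after_adding(ring, shards, new_node):
    """Return {shard: (old_owner, new_owner)} for shards that change owner."""
    before = {s: ring.owner(s) for s in shards}
    ring.add(new_node)
    return {s: (before[s], ring.owner(s)) for s in shards if ring.owner(s) != before[s]}


shards = [f"index-{i}/shard-{j}" for i in range(20) for j in range(5)]
ring = HashRing(["node-a", "node-b", "node-c"])
moves = moves_after_adding(ring, shards, "node-d")
print(f"{len(moves)} of {len(shards)} shards move, all to node-d")
```

Every shard that moves goes to the new node and the rest stay put, so no separate planning step is needed to decide the moves.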
With a leader architecture, a process will need to choose which shards to move where. This can be based on CPU, user-defined tags, memory usage or any other metadata we have about the state of the system.
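A leader's decision process could look something like the greedy planner below. This is one possible sketch, not a proposed design: it balances on shard count only, treats a user-defined tag as a filter on which nodes take part, and the names `Node` and `plan_moves` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    tags: set = field(default_factory=set)
    shards: list = field(default_factory=list)


def plan_moves(nodes, required_tag=None):
    """Greedy plan: move shards from the fullest to the emptiest eligible node
    until shard counts differ by at most one.

    Returns a list of (shard, source, target) moves; the nodes are not mutated.
    """
    eligible = [n for n in nodes if required_tag is None or required_tag in n.tags]
    held = {n.name: list(n.shards) for n in eligible}
    moves = []
    while len(held) > 1:
        source = max(held, key=lambda name: len(held[name]))
        target = min(held, key=lambda name: len(held[name]))
        if len(held[source]) - len(held[target]) <= 1:
            break
        shard = held[source].pop()
        held[target].append(shard)
        moves.append((shard, source, target))
    return moves


cluster = [
    Node("node-a", {"ssd"}, ["s0", "s1", "s2", "s3"]),
    Node("node-b", {"ssd"}, ["s4", "s5", "s6", "s7"]),
    Node("node-c", {"ssd"}, []),  # newly added, empty node
]
plan = plan_moves(cluster, required_tag="ssd")
```

Swapping the shard count for CPU or memory usage only changes the two `key` functions and the stopping condition; the leader still produces an explicit list of moves to carry out.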
Hot Shard Problem
It is common for one node to become overloaded because it holds shards for popular indices. This may be a temporary situation or a longer-term issue. In this scenario, it is desirable to redistribute the hot shards to machines with less load.
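One way to pick a hot shard to relocate is sketched below. It assumes a per-shard load metric (for example requests per second) and a fixed threshold for what counts as "hot"; both, along with the function name `relieve_hot_node`, are illustrative assumptions.

```python
def relieve_hot_node(shard_load, placement, threshold):
    """Suggest one move off the busiest node, or None if no move helps.

    shard_load: {shard: load}, a hypothetical metric such as requests/second.
    placement:  {node: [shards held by that node]}.
    threshold:  node load above which the node counts as hot.
    """
    if not placement:
        return None
    node_load = {
        node: sum(shard_load[s] for s in shards)
        for node, shards in placement.items()
    }
    hot = max(node_load, key=node_load.get)
    cold = min(node_load, key=node_load.get)
    if node_load[hot] <= threshold or hot == cold:
        return None
    gap = node_load[hot] - node_load[cold]
    # Skip shards so heavy that the target would end up at least as
    # loaded as the source is now; that would only relocate the hot spot.
    candidates = [s for s in placement[hot] if 0 < shard_load[s] < gap]
    if not candidates:
        return None
    shard = max(candidates, key=shard_load.get)
    return shard, hot, cold


shard_load = {"logs/0": 90, "logs/1": 40, "users/0": 10, "users/1": 5}
placement = {
    "node-a": ["logs/0", "logs/1"],
    "node-b": ["users/0"],
    "node-c": ["users/1"],
}
move = relieve_hot_node(shard_load, placement, threshold=100)
```

Running this repeatedly with fresh metrics handles both the temporary and the longer-term case, since a move is only suggested while some node stays above the threshold.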
Toshi
I suggest using Consul to track shard assignments along with leader election.