RFC: Follower Snapshot #135

BusyJay · 2018-10-30T10:55:47Z

For now, a snapshot is always sent from the leader to the follower, which is not always efficient. For example, consider 5 nodes in two data centers, (1, 2) and (3, 4, 5). If 5 is the leader and 1 needs a snapshot, the data has to be transferred across data centers.

But in fact any node in the cluster can send a snapshot once the requested logs are applied. So 1 could request a snapshot aggressively from 2 and keep the data transfer inside the data center.
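The routing decision described above can be sketched as a small function. This is a minimal sketch, not raft-rs code: it assumes something outside Raft knows each peer's data center and applied index, and the `Peer` struct and `pick_snapshot_source` function are hypothetical names. It prefers a peer in the requester's data center that has applied far enough, and otherwise falls back to the leader.

```rust
/// Hypothetical view of a peer, as a scheduler outside Raft might see it.
#[derive(Debug, Clone)]
struct Peer {
    id: u64,
    data_center: &'static str,
    applied_index: u64,
}

/// Hypothetical helper: choose which node should send a snapshot to `requester`.
/// Prefers a peer in the same data center that has applied at least
/// `required_index`; otherwise returns `leader` (today's behavior).
fn pick_snapshot_source(peers: &[Peer], requester: u64, leader: u64, required_index: u64) -> u64 {
    // Without knowing where the requester lives, keep the default.
    let dc = match peers.iter().find(|p| p.id == requester) {
        Some(p) => p.data_center,
        None => return leader,
    };
    peers
        .iter()
        .filter(|p| p.id != requester && p.data_center == dc && p.applied_index >= required_index)
        .max_by_key(|p| p.applied_index)
        .map(|p| p.id)
        .unwrap_or(leader)
}

fn main() {
    // Layout from the issue: (1, 2) in one data center, (3, 4, 5) in the other; 5 leads.
    let peers = vec![
        Peer { id: 1, data_center: "dc-a", applied_index: 10 },
        Peer { id: 2, data_center: "dc-a", applied_index: 120 },
        Peer { id: 3, data_center: "dc-b", applied_index: 120 },
        Peer { id: 4, data_center: "dc-b", applied_index: 118 },
        Peer { id: 5, data_center: "dc-b", applied_index: 120 },
    ];
    // Node 1 needs a snapshot covering at least index 100: node 2 can serve it locally.
    assert_eq!(pick_snapshot_source(&peers, 1, 5, 100), 2);

    // If node 2 lags behind, fall back to the leader.
    let mut lagging = peers.clone();
    lagging[1].applied_index = 50;
    assert_eq!(pick_snapshot_source(&lagging, 1, 5, 100), 5);
}
```

Falling back to the leader keeps the current behavior as the default, so the optimization only changes where the data comes from, not whether a snapshot is sent.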

Supporting aggressive snapshot requests is also useful for recovering from corrupted snapshot files.

siddontang · 2018-10-31T09:26:05Z

We can name this feature Follower snapshot :-)

Maybe we can even support Follower replication 🤔

Hoverbear · 2018-10-31T10:59:35Z

We've previously talked about how this would be a desirable feature for our use case in TiKV, and I think it can be useful for others as well.

I'd definitely love to do this, but if anyone else wants to tackle it, please feel encouraged to let us know and start it!

It sounds like the generalized, simplest definition of the feature is:

The ability to request snapshots from any node to any other node, so long as the snapshot only contains committed data.
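On the responding side, "only contains committed data" can be expressed as a check before a node agrees to build a snapshot. A minimal sketch, assuming the responder snapshots at its applied index and the requester states the minimum index it needs; `SnapshotDecision` and `check_snapshot_request` are hypothetical names, not part of raft-rs.

```rust
/// Hypothetical outcome of a follower-snapshot request.
#[derive(Debug, PartialEq)]
enum SnapshotDecision {
    /// Build and send a snapshot taken at this index.
    Send { index: u64 },
    /// The responder has not applied far enough; ask again later or ask another node.
    TooStale { applied: u64, required: u64 },
    /// The snapshot would include uncommitted entries (should never happen in a correct node).
    Uncommitted { applied: u64, committed: u64 },
}

/// Hypothetical check run by the node asked for a snapshot.
fn check_snapshot_request(applied: u64, committed: u64, required: u64) -> SnapshotDecision {
    // Defensive: Raft guarantees applied <= committed, so only committed data is exposed.
    if applied > committed {
        return SnapshotDecision::Uncommitted { applied, committed };
    }
    // The snapshot must cover everything the requester is missing.
    if applied < required {
        return SnapshotDecision::TooStale { applied, required };
    }
    SnapshotDecision::Send { index: applied }
}

fn main() {
    assert_eq!(check_snapshot_request(120, 125, 100), SnapshotDecision::Send { index: 120 });
    assert_eq!(
        check_snapshot_request(90, 125, 100),
        SnapshotDecision::TooStale { applied: 90, required: 100 }
    );
}
```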

I think doing this would open the door to a second, future feature: follower replication. However, that feature will require more consideration. Let's plan for it after this one? I can open another issue for it so we can separate the discussion.

Fullstop000 · 2019-01-11T06:48:14Z

I'd like to work on this, and I came up with a few questions that probably need to be discussed:

If a node tries to get a snapshot from another node, should it ignore messages from the leader (even heartbeats) to avoid log replication?
Do we need a timeout mechanism for this kind of snapshot request in case the target node cannot deliver the message to the requester?

siddontang · 2019-01-11T07:03:39Z

Thanks @Fullstop000

In the current Raft implementation, while the leader is sending a snapshot to a follower, it can still send heartbeat messages to keep the follower alive, so I think we still need to let the leader do this.
Currently there is no timeout mechanism for sending snapshots; we will handle this outside the Raft library.
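A timeout handled outside the Raft library might look roughly like the following. A minimal sketch, assuming the application counts logical ticks in the same way Raft drives elections and heartbeats; `PendingSnapshotRequest` and `TickOutcome` are hypothetical. It leaves the leader's heartbeats untouched, as described above.

```rust
/// Hypothetical result of advancing the timer by one tick.
#[derive(Debug, PartialEq)]
enum TickOutcome {
    Waiting,
    /// The current source timed out; re-request the snapshot from this node.
    RetryFrom(u64),
}

/// Hypothetical tracker for one outstanding follower-snapshot request,
/// kept by the application rather than by the Raft library.
struct PendingSnapshotRequest {
    target: u64,
    fallback: u64,
    elapsed_ticks: u32,
    timeout_ticks: u32,
}

impl PendingSnapshotRequest {
    fn new(target: u64, fallback: u64, timeout_ticks: u32) -> Self {
        Self { target, fallback, elapsed_ticks: 0, timeout_ticks }
    }

    fn target(&self) -> u64 {
        self.target
    }

    /// Called once per logical tick.
    fn tick(&mut self) -> TickOutcome {
        self.elapsed_ticks += 1;
        if self.elapsed_ticks < self.timeout_ticks {
            return TickOutcome::Waiting;
        }
        // Timed out: switch to the fallback (e.g. the leader) and restart the timer.
        self.target = self.fallback;
        self.elapsed_ticks = 0;
        TickOutcome::RetryFrom(self.target)
    }

    /// Any progress from the current source (e.g. a snapshot chunk) resets the timer.
    fn on_progress(&mut self) {
        self.elapsed_ticks = 0;
    }
}

fn main() {
    // Node 1 asks node 2 first; node 5 (the leader) is the fallback; time out after 3 ticks.
    let mut req = PendingSnapshotRequest::new(2, 5, 3);
    assert_eq!(req.tick(), TickOutcome::Waiting);
    req.on_progress();
    assert_eq!(req.tick(), TickOutcome::Waiting);
    assert_eq!(req.tick(), TickOutcome::Waiting);
    assert_eq!(req.tick(), TickOutcome::RetryFrom(5));
    assert_eq!(req.target(), 5);
}
```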

siddontang · 2019-01-11T07:05:52Z

@Fullstop000
You can send your WeChat account to my email tl@pingcap.com if you want to have a real-time discussion.

Fullstop000 · 2019-01-11T07:41:38Z

@siddontang Thanks for the quick reply.

Regarding the efficiency point @BusyJay mentioned: with 2 IDCs, once the leader has applied the AddNode, it will start communicating with the new node. Sending the snapshot across IDCs this way can be less efficient than transferring it within an IDC, and it increases the leader's workload.

We could save 1.5 round trips, as well as the cross-IDC data transfer from the leader, if the node ignored those messages, but that may introduce extra complexity into the Raft algorithm.

I prefer to keep the Raft layer clean and let a third party (such as PD?) handle the control in general.

betwins · 2022-03-05T10:09:38Z

I agree with siddontang; it is better to keep the Raft layer clean. Raft is complicated and is critical to protecting data consistency across nodes. Any new mechanism must be guaranteed not to break the Raft protocol.

Hoverbear added the Feature, Optimization, and Request for Comment labels Oct 31, 2018

Hoverbear changed the title ~~Support requesting snapshot aggressively~~ RFC: Support requesting snapshot aggressively Oct 31, 2018

Hoverbear mentioned this issue Oct 31, 2018

RFC: Support Follower Replication #136 (Open)

Hoverbear changed the title ~~RFC: Support requesting snapshot aggressively~~ RFC: Follower Snapshot Nov 7, 2018

Fullstop000 mentioned this issue May 12, 2019

[WIP] Introduce Follower replication #238 (Closed)

BusyJay mentioned this issue Sep 17, 2021

A simple algorithm for fully utilizing bandwidth for scaling in tikv/pd#4137 (Open)
