aborted rewrite-rewrite causes data loss #288

Closed
tabokie opened this issue Jan 9, 2023 · 2 comments · Fixed by #290
Labels
bug Something isn't working

Comments

@tabokie
Member

tabokie commented Jan 9, 2023

Say we have the data:

append: [11, 15]
rewrite: [1, 10]

A rewrite of the whole rewrite queue (rewrite-rewrite) panicked midway. The old rewrite queue was not cleaned up, and only part of the new output had been written:

append: [11, 15]
rewrite: [1, 10] [1, 5]

After restart, only the logs in [11, 15] are recovered; the logs in [1, 10] are lost.
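
To make the failure concrete, here is a toy model of the replay, not the actual raft-engine code. It assumes (this is an assumption, not something stated in this thread) that replay applies rewrite-queue batches before append-queue batches, that a batch overlapping or extending the current range overwrites from its first index, and that a batch leaving a gap discards everything before it. `ToyMemTable` and `recover` are hypothetical names.

```rust
/// Toy model: the inclusive (first, last) index range held for one Raft group.
#[derive(Debug, Default, PartialEq)]
struct ToyMemTable {
    range: Option<(u64, u64)>,
}

impl ToyMemTable {
    /// Replays a batch covering the inclusive range [first, last].
    fn replay(&mut self, first: u64, last: u64) {
        self.range = match self.range {
            // Assumed rule: a batch that overlaps or directly follows the
            // current range keeps the old prefix and overwrites from `first`,
            // so the old tail past `last` is truncated.
            Some((old_first, old_last)) if first >= old_first && first <= old_last + 1 => {
                Some((old_first, last))
            }
            // Assumed rule: a gap (or an earlier start) breaks continuity,
            // so the older entries are discarded.
            _ => Some((first, last)),
        };
    }
}

/// Replays the rewrite queue first, then the append queue.
fn recover(rewrite: &[(u64, u64)], append: &[(u64, u64)]) -> Option<(u64, u64)> {
    let mut table = ToyMemTable::default();
    for &(first, last) in rewrite.iter().chain(append) {
        table.replay(first, last);
    }
    table.range
}
```

Under these assumptions, `recover(&[(1, 10), (1, 5)], &[(11, 15)])` first shrinks the range to [1, 5] when the partial output is replayed, then hits a gap at 11 and keeps only [11, 15]. Without the partial output, `recover(&[(1, 10)], &[(11, 15)])` yields the full [1, 15].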

tabokie added the bug label Jan 9, 2023

@BusyJay
Member

BusyJay commented Jan 9, 2023

I see two phenomena in my tests:

  1. After the restart, one Raft group lost data and TiKV panicked again.
  2. After the second panic, another group that had started successfully in 1 also became corrupted.

This issue can only explain 2, right?

@tabokie
Member Author

tabokie commented Jan 9, 2023

Both. As long as a `purge_expired_files` call is in progress at restart time, it's possible to lose some Raft logs at the front.

tabokie added a commit that referenced this issue Jan 30, 2023
A second attempt to fix #288.

This PR introduces the concept of an "atomic group". The rewrite-rewrite operation uses it to make sure the rewrite of each region is perceived as an atomic operation.

A group of writes is made atomic by having each write carry a special marker. During recovery, log batches carrying the marker are stashed until all parts of the group are found. Caveats of this approach are documented near `AtomicGroupBuilder`.
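
The stashing behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the `Marker` variants, the `Batch` type, the numeric group id, and the `recover` function are all hypothetical, and the real marker encoding and the caveats noted near `AtomicGroupBuilder` are not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical marker carried by every batch that belongs to an atomic group.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Marker {
    Begin,
    Middle,
    End,
}

/// Simplified log batch: an optional (group id, marker) plus a payload label.
#[derive(Clone, Debug, PartialEq)]
struct Batch {
    group: Option<(u64, Marker)>,
    payload: &'static str,
}

/// Replays batches in order and returns the payloads that become visible.
/// Batches with a marker are stashed until the group's `End` is seen.
fn recover(batches: Vec<Batch>) -> Vec<&'static str> {
    let mut applied = Vec::new();
    let mut stashed: HashMap<u64, Vec<&'static str>> = HashMap::new();
    for batch in batches {
        match batch.group {
            // Ordinary batch: apply immediately.
            None => applied.push(batch.payload),
            // First part of a group: start a fresh stash for this id.
            Some((id, Marker::Begin)) => {
                stashed.insert(id, vec![batch.payload]);
            }
            // Middle part: only meaningful if the group was started.
            Some((id, Marker::Middle)) => {
                if let Some(parts) = stashed.get_mut(&id) {
                    parts.push(batch.payload);
                }
            }
            // Last part: the group is complete, so apply all parts together.
            Some((id, Marker::End)) => {
                if let Some(mut parts) = stashed.remove(&id) {
                    parts.push(batch.payload);
                    applied.extend(parts);
                }
            }
        }
    }
    // Groups still stashed here never saw their End and are dropped whole.
    applied
}
```

In this sketch, a rewrite-rewrite that crashed after writing only its `Begin` part leaves the previously applied batches untouched, because the incomplete group is never applied.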

Also fixes a bug where a partial rewrite-rewrite (caused by a batch being split) was not applied correctly.

Signed-off-by: tabokie <xy.tao@outlook.com>