
Optimize log file format for faster recovery #83

Merged: 29 commits into tikv:master, Aug 19, 2021

Conversation

@MrCroxx (Member) commented Aug 5, 2021

The new log file format can be found in the annotation on
`LogBatch::encode_to_bytes`.

What's changed?

  • Optimize log file format for faster recovery.
  • Refine EntryIndex fields in memtable.

TODO:

  • There are 2 [PLEASE REVIEW] todos.
  • Optimize read_file for the new log format.
  • Benchmark

Signed-off-by: MrCroxx <mrcroxx@outlook.com>
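The format itself is documented in the code annotation, but the recovery-speed idea can be sketched: give each batch a small fixed-size header carrying the payload length, so recovery can locate batch boundaries without decoding full payloads. The field layout below is an illustrative assumption, not the actual on-disk format:

```rust
// Hypothetical batch header: length plus checksum. The real layout lives
// in the `LogBatch::encode_to_bytes` annotation; the 12-byte size and
// field order here are illustrative only.
fn encode_header(len: u64, checksum: u32) -> Vec<u8> {
    let mut buf = Vec::with_capacity(12);
    buf.extend_from_slice(&len.to_le_bytes());
    buf.extend_from_slice(&checksum.to_le_bytes());
    buf
}

fn decode_header(buf: &[u8]) -> Option<(u64, u32)> {
    if buf.len() < 12 {
        return None; // truncated header, e.g. torn write at file tail
    }
    let len = u64::from_le_bytes(buf[0..8].try_into().ok()?);
    let checksum = u32::from_le_bytes(buf[8..12].try_into().ok()?);
    Some((len, checksum))
}
```

With the length known up front, recovery can seek past or into a batch instead of parsing the entire file.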

@MrCroxx (Member, Author) commented Aug 5, 2021

@tabokie I found that `LogBatch::entries_size` was replaced with `LogBatch::approximate_size` in raft-engine#82, but it is not used for now. I'm confused about why we need the field. Is it reserved for metrics?

Review thread on src/memtable.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Instead of reading the whole file, recovery now reads only the needed
parts of the log file, which saves recovery time.

What's changed?
+ `LogBatchFileReader`: yields recovered `LogBatch`es from the log file,
internally maintaining a buffer.
+ Move common file operation functions from `crate::file_pipe_log` to `crate::util`.
+ All tests passed.

Signed-off-by: MrCroxx <mrcroxx@outlook.com>
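The buffered, header-first reading described above can be sketched like this (the names, the 8-byte length prefix, and the API shape are assumptions for illustration; the real `LogBatchFileReader` differs):

```rust
use std::io::{Read, Seek, SeekFrom};

// Assumed fixed-size length prefix; not the crate's actual format.
const HEADER_LEN: usize = 8;

// Sketch of a reader that pulls only the bytes each batch needs,
// rather than reading the whole file into memory.
struct BatchReader<R: Read + Seek> {
    file: R,
    offset: u64,
}

impl<R: Read + Seek> BatchReader<R> {
    fn next_batch(&mut self) -> std::io::Result<Option<Vec<u8>>> {
        self.file.seek(SeekFrom::Start(self.offset))?;
        let mut header = [0u8; HEADER_LEN];
        if self.file.read_exact(&mut header).is_err() {
            return Ok(None); // EOF: no more batches
        }
        let len = u64::from_le_bytes(header) as usize;
        let mut body = vec![0u8; len];
        self.file.read_exact(&mut body)?;
        self.offset += (HEADER_LEN + len) as u64;
        Ok(Some(body))
    }
}
```

Each call reads one header, then exactly the batch body it announces, so recovery I/O is proportional to what is actually replayed.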
Review thread on src/log_batch.rs (outdated, resolved)
Only write the first index of a `LogItem::Entries`, since the entries
it carries are always contiguous.

Signed-off-by: MrCroxx <mrcroxx@outlook.com>
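Because the entries are contiguous, the full index list can be rebuilt from the first index and a count alone; a minimal sketch (names are illustrative, not the crate's fields):

```rust
// Rebuild contiguous entry indexes from the first index and a count,
// instead of persisting every index in the log file.
fn expand_indexes(first_index: u64, count: u64) -> Vec<u64> {
    (first_index..first_index + count).collect()
}
```

This trades a little recomputation at recovery time for fewer bytes written per batch.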
Review thread on src/memtable.rs (outdated, resolved)
Review thread on src/memtable.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Review thread on src/reader.rs (outdated, resolved)
+ Add bench `bench_recovery`.
+ Add a failpoint in `open` to disable the page cache when benchmarking recovery.

Signed-off-by: MrCroxx <mrcroxx@outlook.com>
@tabokie (Member) left a comment:

Good work, let's see some results! BTW, you can put the benchmark into a separate PR.

Review thread on Cargo.toml (outdated, resolved)
Review thread on benches/bench_recovery.rs (outdated, resolved)
Review thread on benches/bench_recovery.rs (outdated, resolved)
@MrCroxx MrCroxx changed the title Optimize log file format for faster recovery. Optimize log file format for faster recovery Aug 13, 2021
@MrCroxx (Member, Author) commented Aug 13, 2021

/rebuild

Review thread on src/engine.rs (outdated, resolved):
let mut offset = (FILE_MAGIC_HEADER.len() + Version::len()) as u64;
debug!("recover from log file {:?}:{:?}", queue, file_id);
let fd = self.get_queue(queue).get_fd(file_id)?;
let mut reader = LogBatchFileReader::new(
Member:

How about we implement this logic in `recover_queue`, so as to reuse `LogBatchFileReader` and avoid buffering log batches?

MrCroxx (Member, Author):

That's a good idea. But right now some functions like `get_queue` are only implemented for `FilePipeLog`, while `recover_queue` is implemented for the trait `PipeLog`. Is it fine to implement a `recover_queue` only for the `FilePipeLog` version?

Member:

Indeed, ideally we shouldn't expose the log file. Maybe we could replace `read_file_into_log_batch` with `replay_log_batch`, which takes in a closure `Fn(LogBatch)`.
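A closure-taking replay API of the kind proposed here might be sketched as follows (the types and names are illustrative assumptions, not the crate's actual API):

```rust
// Illustrative stand-in for the decoded batch type.
struct LogBatch {
    bytes: Vec<u8>,
}

// Decoding is elided; the point is that callers receive batches through a
// closure instead of reading the log file themselves.
fn replay_log_batches<F: FnMut(LogBatch)>(raw_batches: Vec<Vec<u8>>, mut on_batch: F) {
    for bytes in raw_batches {
        on_batch(LogBatch { bytes });
    }
}
```

`FnMut` is used here so the caller can accumulate state (e.g. fill a memtable) while replaying; the comment above mentions `Fn(LogBatch)`, which is the stricter variant of the same idea.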

MrCroxx (Member, Author):

In the next PR I'll implement parallel recovery. I'm not sure if this modification is good for it. What about modifying it in the next PR or a separate PR?

Member:

Actually, I have a big PR coming up tomorrow and will fix this issue there.

Review thread on src/reader.rs (outdated, resolved)
Review thread on src/reader.rs (outdated, resolved)
Review thread on src/file_pipe_log.rs (resolved)
Review thread on src/reader.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Previously, `LogBatch` was partially filled (only `entries_index` needed
to be recovered), which was unreasonable. This commit separates entries
from items, and adds structs to hold them.

Signed-off-by: MrCroxx <mrcroxx@outlook.com>
@tabokie (Member) left a comment:

Code changes look good overall, but we still need some benchmarks to evaluate the regression in write performance and IO usage.

Review thread on src/reader.rs (outdated, resolved)
Review thread on src/reader.rs (outdated, resolved)
Review thread on src/reader.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Review thread on src/log_batch.rs (outdated, resolved)
Review thread on src/reader.rs (outdated, resolved)
Review thread on src/reader.rs (outdated, resolved)
@MrCroxx (Member, Author) commented Aug 18, 2021

Benchmark (recovery):

configs:

| config | 0 - default | 1 - compressed | 2 - small-batch | 3 - 10GB |
|---|---|---|---|---|
| total size | 1GB | 1GB | 1GB | 10GB |
| region count | 100 | 100 | 100 | 1000 |
| batch size | 1MB | 1MB | 1KB | 1MB |
| item size | 1KB | 1KB | 256B | 1KB |
| entry size | 256B | 256B | 32B | 256B |
| compression threshold | - | 8KB | - | - |

env: slow SSD, `fadvise(DONTNEED)` has been called before reads to disable the page cache.

| config | 0 - default | 1 - compressed | 2 - small-batch | 3 - 10GB |
|---|---|---|---|---|
| old | 5.3239s | 5.7234s | 17.207s | 64.410s |
| new | 1.1660s | 1.2687s | 7.5835s | 15.332s |
| change | -78.1% | -77.8% | -55.9% | -76.2% |

@MrCroxx (Member, Author) commented Aug 19, 2021

Benchmark (stress):

old:

--compression-threshold 8KB --regions 1000 --write-entry-count 10 --write-region-count 10 --write-sync true --write-threads 100 --time 600
[write]
Throughput(QPS) = 2991.55
Latency(μs) min = 224, avg = 33403.32, p50 = 26607, p90 = 65503, p95 = 78463, p99 = 116799, p99.9 = 184831, max = 332543
Fairness = 99.2%
Write Bandwidth = 2.6MiB/s

new:

--compression-threshold 8KB --regions 1000 --write-entry-count 10 --write-region-count 10 --write-sync true --write-threads 100 --time 600
[write]
Throughput(QPS) = 3270.73
Latency(μs) min = 289, avg = 30545.84, p50 = 26719, p90 = 52479, p95 = 65791, p99 = 92479, p99.9 = 133503, max = 422655
Fairness = 99.1%
Write Bandwidth = 3.8MiB/s

(Write Bandwidth / QPS), new relative to old: (3.8 / 3270.73) / (2.6 / 2991.55) × 100% ≈ 133% (includes randomness).
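As a sanity check, the quoted ratio follows directly from the two result sets above:

```rust
// Ratio of per-query write bandwidth (WB / QPS), new over old, as a
// percentage, using the stress-test numbers quoted above.
fn wb_per_qps_ratio(wb_new: f64, qps_new: f64, wb_old: f64, qps_old: f64) -> f64 {
    (wb_new / qps_new) / (wb_old / qps_old) * 100.0
}
```

With WB in MiB/s, `wb_per_qps_ratio(3.8, 3270.73, 2.6, 2991.55)` comes out to roughly 133%.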

@MrCroxx (Member, Author) commented Aug 19, 2021

@tabokie Is this PR ready to merge?

@MrCroxx (Member, Author) commented Aug 19, 2021

Benchmark (recovery, limited by batch count):

configs:

| config | 0 - default | 1 - compressed | 2 - 10k |
|---|---|---|---|
| total batch count | 1000 | 1000 | 10000 |
| region count | 100 | 100 | 1000 |
| batch size | 1MB | 1MB | 1MB |
| item size | 1KB | 1KB | 1KB |
| entry size | 256B | 256B | 256B |
| compression threshold | - | 8KB | - |

env: slow SSD, `fadvise(DONTNEED)` has been called before reads to disable the page cache.

| config | 0 - default | 1 - compressed | 2 - 10k |
|---|---|---|---|
| old | 4.1913s | 4.3947s | 50.943s |
| new | 0.9803s | 1.0350s | 11.382s |
| change | -77.1% | -76.4% | -77.7% |

tabokie merged commit f18b737 into tikv:master Aug 19, 2021
tabokie pushed a commit that referenced this pull request Sep 7, 2021
tabokie pushed a commit to tabokie/raft-engine that referenced this pull request Sep 8, 2021
tabokie pushed a commit to tabokie/raft-engine that referenced this pull request Sep 8, 2021