Various improvements to fuzzers. #214

Merged
merged 5 commits into from
Oct 25, 2022

Conversation

jamii
Contributor

@jamii jamii commented Oct 21, 2022

  • Track written sectors in test/storage and use this to calculate space amplification in fuzzers.
  • Checkpoint the superblock in fuzzers. (If we don't do this then used grid blocks are never released and we quickly exhaust storage.)
  • Set puts_since_compact and compacts_per_checkpoint to approximate production settings.
  • Run fuzzers in release mode first, and only run in debug mode if we find a crash.
  • Ensure that fuzzers produce stack traces even in release builds.
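A minimal sketch of the first item, written in Python rather than Zig and not taken from this PR's diff. It assumes that "space amplification" means bytes of storage touched divided by bytes of live data; the names `TrackingStorage`, `space_amplification`, and the 4096-byte `SECTOR_SIZE` are hypothetical.

```python
SECTOR_SIZE = 4096  # assumed sector size, for illustration only


class TrackingStorage:
    """Toy in-memory storage that remembers which sectors were ever written."""

    def __init__(self, size: int) -> None:
        assert size % SECTOR_SIZE == 0
        self.size = size
        self.written = set()  # indexes of sectors written at least once

    def write_sectors(self, offset: int, length: int) -> None:
        # Bounds and alignment checks, in the spirit of test/storage.
        assert offset % SECTOR_SIZE == 0 and length % SECTOR_SIZE == 0
        assert offset + length <= self.size
        first = offset // SECTOR_SIZE
        self.written.update(range(first, first + length // SECTOR_SIZE))

    def size_used(self) -> int:
        return len(self.written) * SECTOR_SIZE


def space_amplification(storage: TrackingStorage, logical_bytes: int) -> float:
    """Bytes of storage touched per byte of live data."""
    return storage.size_used() / logical_bytes


storage = TrackingStorage(size=16 * SECTOR_SIZE)
storage.write_sectors(0, 2 * SECTOR_SIZE)
storage.write_sectors(SECTOR_SIZE, 2 * SECTOR_SIZE)  # overlaps sector 1
amplification = space_amplification(storage, logical_bytes=SECTOR_SIZE)
```

Tracking a set of sector indexes rather than a running byte count means that rewriting the same sector does not inflate the measurement.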

I removed c64b81b since we don't currently have a way to test it.

@jamii jamii force-pushed the jamii-space-amplification branch 3 times, most recently from 263620d to 2051b3a Compare October 21, 2022 18:31
Contributor Author

jamii commented Oct 21, 2022

This is interesting:

info(fuzz): Fuzz seed = 7434994071298363834
info(lsm_tree_fuzz): fuzz_op_distribution = { 0.33, 0.27, 0.07, 0.33 }
...
debug(lsm_tree_fuzz): Running fuzz_ops[50537] == FuzzOp{ .compact = void }
debug(lsm_tree_fuzz): storage.size_used = 1073774592/1765949440
debug(tree): lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key)_test: compact_start: op=16813 op_min=16812 beat=2/4
debug(manifest_log): 146395948273679478347972844105912659700: flush: writing 0 block(s)
debug(manifest_log): 146395948273679478347972844105912659700: insert: level=0 checksum=21016152226608822075864703892615501686 address=16383 flags=0 snapshot=16812..18446744073709551615
debug(manifest_log): 146395948273679478347972844105912659700: compacted: checksum=81452542766762941911685952732550488579 address=16379 frees=3/4
debug(superblock_manifest): remove: tree=146395948273679478347972844105912659700 checksum=81452542766762941911685952732550488579 address=16379 manifest.blocks=0/65536
debug(lsm_tree_fuzz): Running fuzz_ops[50538] == FuzzOp{ .compact = void }
debug(lsm_tree_fuzz): storage.size_used = 1073774592/1765949440
debug(tree): lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key)_test: compact_start: op=16814 op_min=16814 beat=3/4
debug(tree): lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key)_test: compacting immutable table to level 0 (values.len=1 snapshot_min=16812 compaction.op_min=16814 table_count=2)
debug(tree): lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key)_test: compact_tick() for immutable table to level 0
debug(manifest_log): 146395948273679478347972844105912659700: insert: level=0 checksum=21016152226608822075864703892615501686 address=16383 flags=0 snapshot=16812..16815
debug(tree): lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key)_test: compact_tick() complete for immutable table to level 0
debug(lsm_tree_fuzz): Running fuzz_ops[50539] == FuzzOp{ .compact = void }
debug(lsm_tree_fuzz): storage.size_used = 1073774592/1765949440
debug(tree): lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key)_test: compact_start: op=16815 op_min=16814 beat=4/4
debug(tree): lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key)_test: compact_tick() for immutable table to level 0
thread 275160 panic: reached unreachable code
/nix/store/g94blv8pk5z0h4maivk3ymcrafrcbwl8-zig-0.9.1/lib/zig/std/debug.zig:225:14: 0x22197b in std.debug.assert (lsm_tree_fuzz)
    if (!ok) unreachable; // assertion failure
             ^
/home/jamie/tigerbeetle/src/test/storage.zig:326:15: 0x2a95c4 in test.storage.Storage.assert_bounds_and_alignment (lsm_tree_fuzz)
        assert(offset + buffer.len <= storage.size);
              ^
/home/jamie/tigerbeetle/src/test/storage.zig:276:44: 0x289569 in test.storage.Storage.write_sectors (lsm_tree_fuzz)
        storage.assert_bounds_and_alignment(buffer, offset_in_storage);
                                           ^
/home/jamie/tigerbeetle/src/lsm/grid.zig:300:50: 0x2c7004 in lsm.grid.GridType(test.storage.Storage).start_write (lsm_tree_fuzz)
            grid.superblock.storage.write_sectors(
                                                 ^
/home/jamie/tigerbeetle/src/lsm/grid.zig:278:29: 0x2b3aa7 in lsm.grid.GridType(test.storage.Storage).write_block (lsm_tree_fuzz)
            grid.start_write(write);
                            ^
/home/jamie/tigerbeetle/src/lsm/compaction.zig:357:44: 0x2b2348 in lsm.compaction.CompactionType(lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key),test.storage.Storage,lsm.table_immutable.TableImmutableIteratorType).io_write_start (lsm_tree_fuzz)
                compaction.grid.write_block(
                                           ^
/home/jamie/tigerbeetle/src/lsm/compaction.zig:334:38: 0x29c2ef in lsm.compaction.CompactionType(lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key),test.storage.Storage,lsm.table_immutable.TableImmutableIteratorType).compact_tick (lsm_tree_fuzz)
            compaction.io_write_start(.data);
                                     ^
/home/jamie/tigerbeetle/src/lsm/tree.zig:671:40: 0x282bbc in lsm.tree.TreeType(lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key),test.storage.Storage,[]const u8{108,115,109,46,116,97,98,108,101,46,84,97,98,108,101,84,121,112,101,40,75,101,121,44,86,97,108,117,101,44,75,101,121,46,99,111,109,112,97,114,101,95,107,101,121,115,44,75,101,121,46,107,101,121,95,102,114,111,109,95,118,97,108,117,101,44,40,115,116,114,117,99,116,32,75,101,121,32,99,111,110,115,116,97,110,116,41,44,75,101,121,46,116,111,109,98,115,116,111,110,101,44,75,101,121,46,116,111,109,98,115,116,111,110,101,95,102,114,111,109,95,107,101,121,41,95,116,101,115,116}).compact_tick (lsm_tree_fuzz)
                compaction.compact_tick(Tree.compact_tick_callback_table_immutable);
                                       ^
/home/jamie/tigerbeetle/src/lsm/tree.zig:649:34: 0x2781bc in lsm.tree.TreeType(lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key),test.storage.Storage,[]const u8{108,115,109,46,116,97,98,108,101,46,84,97,98,108,101,84,121,112,101,40,75,101,121,44,86,97,108,117,101,44,75,101,121,46,99,111,109,112,97,114,101,95,107,101,121,115,44,75,101,121,46,107,101,121,95,102,114,111,109,95,118,97,108,117,101,44,40,115,116,114,117,99,116,32,75,101,121,32,99,111,110,115,116,97,110,116,41,44,75,101,121,46,116,111,109,98,115,116,111,110,101,44,75,101,121,46,116,111,109,98,115,116,111,110,101,95,102,114,111,109,95,107,101,121,41,95,116,101,115,116}).compact_drive (lsm_tree_fuzz)
                tree.compact_tick(&tree.compaction_table_immutable);
                                 ^
/home/jamie/tigerbeetle/src/lsm/tree.zig:514:31: 0x26c991 in lsm.tree.TreeType(lsm.table.TableType(Key,Value,Key.compare_keys,Key.key_from_value,(struct Key constant),Key.tombstone,Key.tombstone_from_key),test.storage.Storage,[]const u8{108,115,109,46,116,97,98,108,101,46,84,97,98,108,101,84,121,112,101,40,75,101,121,44,86,97,108,117,101,44,75,101,121,46,99,111,109,112,97,114,101,95,107,101,121,115,44,75,101,121,46,107,101,121,95,102,114,111,109,95,118,97,108,117,101,44,40,115,116,114,117,99,116,32,75,101,121,32,99,111,110,115,116,97,110,116,41,44,75,101,121,46,116,111,109,98,115,116,111,110,101,44,75,101,121,46,116,111,109,98,115,116,111,110,101,95,102,114,111,109,95,107,101,121,41,95,116,101,115,116}).compact (lsm_tree_fuzz)
            tree.compact_drive();
                              ^
/home/jamie/tigerbeetle/src/lsm/tree_fuzz.zig:218:25: 0x26c080 in Environment.compact (lsm_tree_fuzz)
        env.tree.compact(tree_compact_callback, op);
                        ^
/home/jamie/tigerbeetle/src/lsm/tree_fuzz.zig:281:32: 0x25a139 in Environment.run (lsm_tree_fuzz)
                    env.compact(op);
                               ^
/home/jamie/tigerbeetle/src/lsm/tree_fuzz.zig:339:24: 0x253139 in run_fuzz_ops (lsm_tree_fuzz)
    try Environment.run(&storage, fuzz_ops);
                       ^
/home/jamie/tigerbeetle/src/lsm/tree_fuzz.zig:407:21: 0x24b171 in main (lsm_tree_fuzz)
    try run_fuzz_ops(fuzz_ops);

The last few ops are all compacts. The first two don't increase storage.size_used but the last one crashes.

Contributor Author

jamii commented Oct 21, 2022

With puts only on id=0 and no removes, we can still exhaust arbitrary storage. The cause seems likely to be https://github.com/tigerbeetledb/tigerbeetle/blob/43d3587ce24a683eccc17830cbc8a9dd27d97540/src/lsm/compaction.zig#L521.

@jamii jamii mentioned this pull request Oct 22, 2022
Contributor Author

jamii commented Oct 22, 2022

I added code to release grid blocks associated with level_a in compaction. I haven't taken the time to understand the surrounding code so my changes are probably wrong.

However, I can't test them because the only compaction that is ever performed in the fuzzer is from the immutable table to level 0 (run `zig build lsm_tree_fuzz -- --seed 7434994071298363834 2>&1 | grep -i compacting`). That seems surprising.

Contributor Author

jamii commented Oct 22, 2022

Same for zig run src/lsm/test.zig.

Contributor Author

jamii commented Oct 22, 2022

It looks like level 0 only ever has one table. With lsm_table_size_max = 64 * 1024 * 1024 none of our tests are generating enough keys to fill a single table.
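
A rough back-of-the-envelope check of that claim, as a Python sketch. It assumes 128-byte values (the account size quoted later in this thread) and ignores any per-table overhead such as index blocks or headers, so the real capacity would be somewhat lower.

```python
lsm_table_size_max = 64 * 1024 * 1024  # bytes, from the comment above
value_size = 128  # bytes per value; assumed, see the lead-in

# Upper bound on how many values fit in one table, ignoring overhead.
values_per_table = lsm_table_size_max // value_size
print(values_per_table)  # 524288
```

For comparison, the compaction in the log above reports `values.len=1`.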

Contributor Author

jamii commented Oct 22, 2022

Aha, grid blocks are only released when the superblock checkpoints, which the fuzzers are not currently doing.
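
A toy model of that behaviour, in Python and not based on the actual grid or superblock code: released blocks only become reusable at a checkpoint. The names `FreeSet` and `survives` and the block counts are hypothetical.

```python
class FreeSet:
    """Toy block allocator: released blocks stay reserved until checkpoint."""

    def __init__(self, block_count: int) -> None:
        self.free = set(range(block_count))
        self.pending_release = set()

    def acquire(self) -> int:
        if not self.free:
            raise RuntimeError("storage exhausted")
        address = min(self.free)
        self.free.remove(address)
        return address

    def release(self, address: int) -> None:
        # Not reusable yet: it only returns to the free pool at checkpoint.
        self.pending_release.add(address)

    def checkpoint(self) -> None:
        self.free |= self.pending_release
        self.pending_release.clear()


def survives(block_count: int, compactions: int, checkpoint_every: int) -> bool:
    """Rewrite one block per compaction; checkpoint_every=0 means never."""
    free_set = FreeSet(block_count)
    current = free_set.acquire()
    for i in range(1, compactions + 1):
        try:
            replacement = free_set.acquire()
        except RuntimeError:
            return False
        free_set.release(current)
        current = replacement
        if checkpoint_every and i % checkpoint_every == 0:
            free_set.checkpoint()
    return True


without_checkpoint = survives(block_count=40, compactions=1000, checkpoint_every=0)
with_checkpoint = survives(block_count=40, compactions=1000, checkpoint_every=32)
print(without_checkpoint, with_checkpoint)  # False True
```

In this model the live data never grows beyond one block, yet the run without checkpoints still exhausts storage, which matches the symptom seen in the fuzzer.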

@jamii jamii force-pushed the jamii-space-amplification branch 3 times, most recently from a157223 to 3b83ed1 Compare October 24, 2022 17:43
Contributor Author

jamii commented Oct 24, 2022

Suppose we have enough accounts to fill a single table in level 0. Account has 10 indexes, so that's 10 tables. Every compaction of the immutable table into level 0 acquires storage for 10 new tables, and the old storage only gets freed when the superblock checkpoints. This happens on average every 32 compacts in the fuzzer (and exactly every 32 compacts in prod?). So we have at minimum 32x space amplification from garbage collection alone.
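
The 32x figure can be reproduced with a small Python sketch. It assumes, as in the comment, that each compaction rewrites all 10 tables and that nothing is freed until the checkpoint; the variable names are illustrative.

```python
indexes_per_account = 10      # tables rewritten per compaction (from the comment)
compacts_per_checkpoint = 32  # fuzzer average (from the comment)

live_tables = indexes_per_account
garbage_tables = 0
for _ in range(compacts_per_checkpoint):
    # The old copies become garbage but wait for the checkpoint to be freed;
    # the same number of new live tables replaces them.
    garbage_tables += live_tables

garbage_ratio = garbage_tables / live_tables
print(garbage_ratio)  # 32.0
```

The ratio does not depend on the number of indexes, only on how many compactions happen between checkpoints.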

Contributor Author

jamii commented Oct 24, 2022

If we put 1e6 accounts in the fuzz run, then we need at minimum 1e6 * 128 bytes * 32 garbage = 4.096e9 bytes (about 3.8 GiB) just for the account object tree, let alone secondary indexes. So we should expect the forest fuzzer to exhaust 1 GB of storage pretty quickly, certainly long before it can test compaction of level 0.
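
The estimate above, worked through in Python. The inputs are the numbers from the comment; the variable names are illustrative.

```python
accounts = 1_000_000
account_size = 128   # bytes per account object
garbage_factor = 32  # compactions between checkpoints

bytes_needed = accounts * account_size * garbage_factor
gib_needed = bytes_needed / 2**30

print(bytes_needed)          # 4096000000
print(round(gib_needed, 2))  # 3.81
```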

@jamii jamii changed the title Draft: investigating fuzz crashes from exhausting storage Various improvements to fuzzers. Oct 24, 2022
@jamii jamii marked this pull request as ready for review October 24, 2022 18:46
@jamii jamii merged commit 92dfb9f into main Oct 25, 2022
@jamii jamii deleted the jamii-space-amplification branch October 25, 2022 16:50