LSM: Revert groove to use update + insert and vendor hasUniqueRepresentation #1239
Conversation
Save a lookup by requiring the old object to be passed in to the Groove, and only check if constants.verify is true.
src/stdx.zig
Outdated
```zig
.Struct => |info| {
    // Packed structs are always unique.
    if (info.backing_integer != null) {
```
If a struct is packed, we don't need to recursively check it like the unpacked case below, since Zig ensures packed structs can only contain other packed structs.
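A minimal comptime sketch of that branch (illustrative names and shape, not the vendored code): once we see a backing integer, we can return early without recursing into the fields.

```zig
// Sketch only - the shape of a hasUniqueRepresentation-style check.
// The packed branch returns early: Zig guarantees a packed struct's
// fields are themselves packed, so there is nothing further to recurse into.
fn hasUniqueRepresentation(comptime T: type) bool {
    return switch (@typeInfo(T)) {
        .Int => @bitSizeOf(T) == 8 * @sizeOf(T),
        .Struct => |info| blk: {
            // Packed structs are always unique (padding bits aside,
            // which the thread discusses separately).
            if (info.backing_integer != null) break :blk true;
            inline for (info.fields) |field| {
                if (!hasUniqueRepresentation(field.type)) break :blk false;
            }
            break :blk true;
        },
        else => false,
    };
}
```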
Let's add a couple of tests here though, just to sanity-check things.
Also, for good measure, let's assert that the bit size of all fields adds up to the size of the whole struct (that is, that we don't have padding bits).
I am thinking about cases like
```zig
const S = packed struct {
    x: u24,
    y: u3,
};
```
I think the language may work both as "the spare bits after y are zeroed" and "the spare bits after y are arbitrary". I think they are actually zeroed, but let's assert that we just don't have such weird structs altogether.
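A sketch of the proposed assertion (hypothetical helper name): for `u24` + `u3` the backing integer is 27 bits wide inside a 4-byte value, so the check fails at comptime.

```zig
const std = @import("std");

// Sketch: only accept packed structs whose backing integer fills whole
// bytes, so there are no spare (possibly arbitrary) padding bits.
fn assertByteAlignedPacked(comptime T: type) void {
    const info = @typeInfo(T).Struct;
    const Backing = info.backing_integer.?;
    // e.g. packed struct { x: u24, y: u3 } has Backing == u27 but
    // @sizeOf(T) == 4, so 32 != 27 and this assert fires at comptime.
    comptime std.debug.assert(@sizeOf(T) * 8 == @bitSizeOf(Backing));
}
```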
The expectation is that the state machine will call .update() when there is in fact an update. We still check stdx.equal_bytes, gated behind constants.verify. We also double-check key_from_value and that the update is not a tombstone.
Can we add
…rder
* target-first style for copies gets const protection, which we can't take advantage of here.
* There's no real convention, because this .update() method doesn't update a parameter with an existing one, but rather updates the groove based off of both.
* There's no performance cost to using a struct (asm generated on godbolt was identical, and double-checked on the benchmark).
src/lsm/groove.zig
Outdated
```zig
}

groove.objects.put(new);
groove.objects_cache.upsert(new);
```
This makes my Rust eye twitch a bit. Isn't this shared mutable aliasing? I think the old object might be the object from the cache (maybe(old == old_from_cache) in the if above). If that's the case, I think we might be clobbering old here, in which case the inline for below would get wrong results when accessing old fields?
We have an explicit guard against this above, since I've been bitten by this before here 😄:
```zig
assert(new != old);
```
I think that's enough to prevent it from happening?
No, I think that's different. The new is an object on the state_machine stack:

tigerbeetle/src/state_machine.zig, line 870 in c367d1d:
```zig
var dr_account_new = dr_account.*;
```

The old is an object from the cache:

tigerbeetle/src/state_machine.zig, line 813 in c367d1d:
```zig
const dr_account = self.forest.grooves.accounts.get(t.debit_account_id) orelse return .debit_account_not_found;
```

So new and old are different.
However, when we do groove.objects_cache.upsert(new); this might still invalidate old, if it was in the cache. That is, after this line I would expect new.* and old.* to be equal, because the old memory was overwritten when we upserted new.
Not 100% sure that this could happen, but I think it does?
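The hazard being described can be reduced to a toy model (sketch only; a plain integer standing in for the cached Account, not the actual groove code):

```zig
const std = @import("std");

// Toy model: the "cache" owns the object and `old` points into it.
// "Upserting" new writes through the same slot, so old silently becomes new.
test "upsert clobbers old" {
    var cache_slot: u64 = 1; // pretend this is the cached Account
    const old: *const u64 = &cache_slot;
    var new: u64 = old.* + 1; // stack copy with a modification

    cache_slot = new; // upsert(new) overwrites the slot `old` points at
    // old now aliases new's value, so a later old-vs-new field comparison
    // (to decide which secondary indexes changed) sees no change at all.
    try std.testing.expect(old.* == new);
}
```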
Ahhh yes, that is the case. Good catch!
In the past, I took an explicit stack copy of old within the function, which prevented that. We could move the object cache upsert to the end - there's no reason it needs to be done before the index trees - with an explicit comment explaining the above. That way we get correct behaviour without a stack copy, but it still feels a bit fragile...
I think that's ok, as long as we document that with a maybe.
I wonder if we can even require that this is the object from the cache? It must be, if we are updating, right?
I am also reeeeeally wondering why CI is green here 🤔 In release, I can see the compiler hoisting the loads to before the cache update, which would give the correct behavior.
But in debug, I expect the overwrite to be reflected robustly. Could you also take a look at what's up with tests passing here? Either my theory about the overwrite is wrong (good case), or we somehow don't test this case at all (which would be pretty scary).
> Could you also take a look at what's up with tests passing here? Either my theory about the overwrite is wrong (good case), or we somehow don't test this case at all (which would be pretty scary).

As far as I can tell, we don't have any tests (unit or fuzz) that actually check the result of the secondary index trees, and in this case, if old.* == new.* it'll just skip updating the secondary index, so there's nothing that's hit on the way down 😬...
Also just noticed the forest fuzz (which I forgot to run / update in this PR) has also been broken for a few weeks. I need to look at proper infra for fuzzing ASAP, but maybe as a workaround for now we can just run each fuzz for a fixed time on CI?
We previously did run some more fuzzes in CI, but with a fixed seed for a single iteration.
> I wonder if we can even require that this is the object from the cache? It must be, if we are updating, right?

It could in theory be a stack copy, but we don't have an explicit use case for that yet, so I've made the check assert(old_from_cache == old).
> Also just noticed the forest fuzz (which I forgot to run / update in this PR) has also been broken for a few weeks.
I think that is fixed as part of #1234, which merged this morning.
src/lsm/groove.zig
Outdated
```zig
if (constants.verify) {
    const old_from_cache = groove.objects_cache.get(@field(old, primary_field)).?;
    assert(@field(old_from_cache, primary_field) == @field(old, primary_field));
    assert(old_from_cache.timestamp == old.timestamp);
```
Can you assert that stdx.bytes_equal(Object, old, old_from_cache)?
I'm now asserting they're actually one and the same: #1239 (comment)
```zig
// objects_cache. If we upsert first, old.* == new.* and no secondary indexes will
// be updated!
groove.objects_cache.upsert(new);
groove.objects.put(new);
```
For an extra triple check, can we assert here that the old and new contents are now the same?
Ok - so that won't actually hold; if the old value comes from the stash of the cache_map, and we insert a new value, it'll go into the cache - so now old.* != new.*.
The assertion at the start, assert(old_from_cache == old), will still hold, because if the value comes from the stash in the caller, it should still come from the stash inside.
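My reading of that, sketched (assuming a two-level cache_map layout, a cache in front of a stash; illustrative fragment, not the actual code):

```zig
// Sketch: get() checks the cache, then the stash; upsert() writes into
// the cache level only.
const old = groove.objects_cache.get(id).?; // found in the *stash*
groove.objects_cache.upsert(new);           // lands in the *cache*
// The stash entry `old` points at is untouched, so old.* != new.* here,
// while assert(old_from_cache == old) held earlier because both get()
// calls resolved to the same stash slot.
```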
src/stdx.zig
Outdated
```zig
return true;
// Only consider packed structs unique if they are byte aligned.
if (info.backing_integer) |backing_integer| {
    return @sizeOf(@Type(.{ .Struct = info })) * 8 == @bitSizeOf(backing_integer);
```
@sizeOf(T) would be simpler, I think?
... yes, I haven't had my Monster yet. 🤣
I think it's time to go pull an espresso too... ☕
@@ -910,9 +910,7 @@ pub fn GrooveType(
```zig
// Sanity check to ensure the caller didn't accidentally pass in an alias.
assert(new != old);

if (has_id) {
    assert(old.id == new.id);
```
👍 I actually do like the one-line-without-braces form quite a bit more, when it works
Yes, we don't need to use kernel style indented returns etc.
Motivation

Whenever we update an object, we always have the old object around. Previously, we used to do a lookup to find the old one in any case, to give a simpler upsert function signature to the state machine - but this was resulting in duplicated work.

Changes

Revert that, going back to having an explicit update and insert, with the state machine choosing which is better for its needs. Additionally, switch to using stdx.equal_bytes in the Groove update hot path. This required vendoring a small fix for ziglang/zig#17592.

Benchmarks

These two things combined give a nice performance bump:
```shell
# on tmpfs
./scripts/benchmark.sh --print-batch-timings --transfer-count=10_000_000 --account-count=10 --id-order=sequential
```
Before:
After: