state_machine: reduce memory usage by about 200 MiB #1429

matklad · 2024-01-15T15:29:00Z

This one is tricky! The big picture here is that we have a cache of objects, which is a normal cache with an arbitrary eviction policy.

However, we want to maintain an invariant: no object touched by a bar of events may be evicted during that bar.

To achieve that, we place a stash below the cache. The job of the stash is to catch every object that falls out of the cache within a single bar (between bars, the stash is reset).
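As a rough model of the cache-plus-stash idea, here is a minimal Python sketch. It assumes an LRU eviction policy purely as a stand-in (the PR only says the policy is arbitrary), and the names `StashedCache`, `put`, `get` and `end_bar` are hypothetical, not TigerBeetle's actual API. Every object evicted during a bar lands in the stash, so it stays reachable until the bar ends.

```python
from collections import OrderedDict


class StashedCache:
    """Hypothetical model: a bounded cache (LRU as a stand-in policy) plus a
    stash that keeps objects evicted during the current bar reachable."""

    def __init__(self, cache_capacity: int, stash_capacity: int):
        self.cache_capacity = cache_capacity
        self.stash_capacity = stash_capacity
        self.cache: OrderedDict = OrderedDict()  # oldest entry first
        self.stash: dict = {}

    def get(self, key):
        # Look in the cache first, then fall back to the stash.
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        return self.stash.get(key)

    def put(self, key, value):
        # A key re-entering the cache no longer needs its stashed copy,
        # so the stash only ever holds distinct keys.
        self.stash.pop(key, None)
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.cache_capacity:
            # Evict the least recently used entry into the stash.
            evicted_key, evicted_value = self.cache.popitem(last=False)
            self.stash[evicted_key] = evicted_value
            # The stash must be sized to absorb a whole bar of evictions.
            if len(self.stash) > self.stash_capacity:
                raise OverflowError("stash overflow within a single bar")

    def end_bar(self):
        # Between bars, the stash is reset.
        self.stash.clear()


c = StashedCache(cache_capacity=2, stash_capacity=4)
c.put(1, "a")
c.put(2, "b")
c.put(3, "c")            # evicts key 1 from the cache into the stash
assert c.get(1) == "a"   # still reachable during this bar
c.end_bar()
assert c.get(1) is None  # gone once the bar is over
```

The question the rest of this PR answers is how large `stash_capacity` has to be so that the overflow branch can never fire.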

What's the size of the stash that we need?

A conservative estimate is the number of queries to the cache, that is, inserts plus lookups. Using the old logic, that is:

```zig
@as(u32, ObjectTree.Table.value_count_max) + // inserts
    (options.prefetch_entries_max * constants.lsm_batch_multiple) // lookups
```

The insight of this commit is that this formula counts a lookup and an insert _for the same key_ twice.

In other words, what we are interested in is not the total number of queries to the cache, but the number of _different keys_ those queries touch.
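To make the double counting concrete, here is a small Python sketch over a hypothetical bar in which every event prefetches a key and then inserts an updated object for that same key. The `(kind, key)` trace format and both function names are invented for illustration; the point is only that counting queries (the old conservative estimate) gives twice the number of distinct keys in this case.

```python
from typing import List, Tuple

# Hypothetical trace format: (kind, key), where kind is "lookup" or "insert".
Query = Tuple[str, int]


def queries_bound(trace: List[Query]) -> int:
    """Conservative estimate: every lookup and every insert counts once."""
    return len(trace)


def distinct_keys_bound(trace: List[Query]) -> int:
    """Tighter estimate: only the number of different keys matters."""
    return len({key for _, key in trace})


# A bar of 4 events, each looking up a key and then updating that same key.
bar = [(kind, key) for key in range(4) for kind in ("lookup", "insert")]
print(queries_bound(bar))        # 8
print(distinct_keys_bound(bar))  # 4
```

A key that is only looked up counts once under both estimates; the savings come entirely from keys that are both looked up and inserted within the same bar.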

And for most operations, we are going to update exactly the keys we've prefetched.

The three exceptions are:

- lookup transfers
- lookup accounts
- fetching the dependent transfer for posting/voiding



matklad assigned cb22 Jan 15, 2024

matklad force-pushed the matklad/thinner-cache-map branch 3 times, most recently from b40a869 to 13a5fcf, January 16, 2024 15:31

matklad force-pushed the matklad/thinner-cache-map branch 2 times, most recently from dcabfce to a3eb0a7, January 17, 2024 12:37

cb22 approved these changes Jan 19, 2024


matklad added this pull request to the merge queue Jan 19, 2024

Merged via the queue into main with commit f7cf98e Jan 19, 2024
25 checks passed

matklad deleted the matklad/thinner-cache-map branch January 19, 2024 13:42
