Snapshot is the process of persisting the current data in the live store (in memory) to disk. This lets a dimension table recover quickly instead of replaying every event, and it allows redo logs to be merged and purged.
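The sketch below illustrates why a snapshot makes redo logs purgeable. It makes several assumptions. Redo-log files are numbered in write order. The snapshot records the ID of the redo-log file that holds its last read record. `purgeable_redo_logs` is a hypothetical helper, not part of any real API.

```python
def purgeable_redo_logs(redo_log_ids, snapshot_file_id):
    """Return IDs of redo-log files whose records are all contained in the snapshot.

    Hypothetical model: the snapshot covers every record up to some offset
    inside `snapshot_file_id`. Only files strictly older than that file are
    fully covered. The file itself may still hold records written after the
    snapshot, so it is kept.
    """
    return sorted(fid for fid in redo_log_ids if fid < snapshot_file_id)


# Usage: the snapshot's last read record lives in file 3, so files 1 and 2 can go.
print(purgeable_redo_logs([3, 1, 2, 4], snapshot_file_id=3))
```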
File layout on disk
Based on table-level configuration, each time the scheduler ticks it checks whether the number of mutations on the live store is over a threshold or a predefined time interval has passed. If either condition holds for a dimension table, a snapshot is created for that table.
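The trigger rule can be sketched as a per-table check that runs on every scheduler tick. This is a minimal illustration under some assumptions. The names `SnapshotConfig`, `TableStatus`, `mutation_threshold`, `interval_seconds`, `should_snapshot`, and `on_tick` are hypothetical and do not come from the actual configuration or code.

```python
from dataclasses import dataclass


@dataclass
class SnapshotConfig:
    # Hypothetical table-level settings; names are illustrative only.
    mutation_threshold: int
    interval_seconds: float


@dataclass
class TableStatus:
    mutations_since_snapshot: int
    last_snapshot_time: float  # seconds since epoch


def should_snapshot(cfg, status, now):
    """True if the mutation count is over the threshold OR the interval has passed."""
    over_threshold = status.mutations_since_snapshot > cfg.mutation_threshold
    interval_passed = now - status.last_snapshot_time >= cfg.interval_seconds
    return over_threshold or interval_passed


def on_tick(tables, now):
    """Called on each scheduler tick; returns the dimension tables to snapshot."""
    return [name for name, (cfg, status) in tables.items()
            if should_snapshot(cfg, status, now)]
```

Evaluating each table independently means one busy table can be snapshotted often while quiet tables are still covered by the time-based trigger.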
The snapshot manager first records the current live store status, namely the number of mutations and the last read record. It then persists the live shards to disk. Finally, it updates the live store and the metastore with the latest status.
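The three steps above can be sketched as follows. This is a single-threaded model that assumes no writes arrive while shards are copied. `LiveStore`, `Metastore`, `snapshot_mutations`, and `create_snapshot` are hypothetical stand-ins. A plain dict plays the role of the on-disk files.

```python
from dataclasses import dataclass, field


@dataclass
class LiveStore:
    # Hypothetical in-memory stand-in for a dimension table's live store.
    shards: dict                 # shard_id -> list of rows
    mutations: int = 0           # mutations applied so far
    last_read_record: int = 0    # position of the last redo-log record applied
    snapshot_mutations: int = 0  # mutation count at the last snapshot (hypothetical)


@dataclass
class Metastore:
    snapshots: dict = field(default_factory=dict)  # table -> latest snapshot info


def create_snapshot(table, store, metastore, disk):
    # Step 1: record the current live store status before persisting.
    mutations = store.mutations
    last_read = store.last_read_record
    # Step 2: persist each live shard ("disk" is a dict standing in for files).
    for shard_id, rows in store.shards.items():
        disk[(table, last_read, shard_id)] = list(rows)
    # Step 3: update the live store and the metastore with the recorded status.
    store.snapshot_mutations = mutations
    metastore.snapshots[table] = {"last_read_record": last_read,
                                  "mutations": mutations}
    return last_read
```

Recording the status before copying ties the persisted shards to a known redo-log position. Recovery can then resume from that position.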
When a table is bootstrapped, the recovery process queries the metastore for the latest snapshot info and uses the latest available snapshot to rebuild the table quickly.
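A minimal sketch of the bootstrap path follows. It loads the latest snapshot recorded in the metastore and then replays only the redo-log records newer than that snapshot's last read record. The replay step is an assumption of this sketch. So are the data layout and the `recover_table` name, all of which are hypothetical. Each shard is modeled as a key/value dict so that replayed records overwrite older values.

```python
def recover_table(table, snapshots, disk, redo_log):
    """Rebuild `table` from its latest snapshot, then apply newer redo-log records.

    snapshots: table -> {"last_read_record": int}   (metastore stand-in)
    disk:      (table, last_read_record, shard_id) -> {key: value}
    redo_log:  list of (position, shard_id, key, value), in position order
    """
    shards = {}
    start = 0  # with no snapshot, every redo-log record is replayed
    info = snapshots.get(table)
    if info is not None:
        start = info["last_read_record"]
        for (t, pos, shard_id), rows in disk.items():
            if t == table and pos == start:
                shards[shard_id] = dict(rows)  # copy so the snapshot stays intact
    # Replay only records the snapshot does not already contain.
    for position, shard_id, key, value in redo_log:
        if position > start:
            shards.setdefault(shard_id, {})[key] = value
    return shards
```

With a snapshot at position 2, only records 3 and later are replayed. Without one, the whole log is replayed, which is the slow path that snapshots are meant to avoid.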