Side-by-side comparison of three modern table/storage formats through runnable Python wrappers and agent-driven exploration.
| Format | Category | Key concept |
|---|---|---|
| Delta Lake | ACID data-lake table | Transaction log (_delta_log) drives versioning; checkpoints compact the log |
| Apache Iceberg | Analytic table format | Snapshot/manifest tree separates metadata from data; branches are snapshot refs |
| SlateDB | Embedded LSM key-value store | WAL → L0 SSTs → compacted runs; every state transition is visible in manifests |
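
To make the Delta Lake row concrete, here is a minimal standard-library sketch of how a reader can replay a transaction log to find the live data files at each version. It relies only on the documented layout of zero-padded commit files holding one JSON action per line; the helper names (`replay_delta_log`, `demo`) and the file names in the fake log are made up for illustration and are not part of this repo's wrappers. It also ignores checkpoints, so it assumes every JSON commit is still present.

```python
import json
import tempfile
from pathlib import Path


def replay_delta_log(log_dir: Path) -> dict[int, set[str]]:
    """Return the set of live data files after each committed version."""
    live: set[str] = set()
    history: dict[int, set[str]] = {}
    # Commit files are zero-padded version numbers, so a lexical sort
    # is also a version sort. Skip anything that is not a plain commit.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
        history[int(commit.stem)] = set(live)
    return history


def demo() -> dict[int, set[str]]:
    """Build a made-up three-commit log in a temp dir and replay it."""
    commits = [
        [{"add": {"path": "part-0000.parquet"}}],
        [{"add": {"path": "part-0001.parquet"}}],
        # A compaction-style commit: two small files replaced by one.
        [
            {"remove": {"path": "part-0000.parquet"}},
            {"remove": {"path": "part-0001.parquet"}},
            {"add": {"path": "part-0002.parquet"}},
        ],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "_delta_log"
        log_dir.mkdir()
        for version, actions in enumerate(commits):
            body = "\n".join(json.dumps(a) for a in actions)
            (log_dir / f"{version:020d}.json").write_text(body + "\n")
        return replay_delta_log(log_dir)


if __name__ == "__main__":
    for version, files in demo().items():
        print(version, sorted(files))
```

Pointing `replay_delta_log` at a real `_delta_log` directory should work the same way as long as no commits have been cleaned up behind a checkpoint.
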
This repo is for educational purposes and is designed to be explored with a coding agent:
- Clone the repo and install dependencies.
- Open a chat.
- Ask questions like "show me what happens to Iceberg metadata when I append a row" or "walk me through the SlateDB LSM phases".
- The agent writes and runs code using the wrappers in `table_formats_demo/`, saves tables under `demo_output/<format>/`, and exports human-readable YAML explanation files under `demo_output/<format>/scratch/`.
- The agent explains what changed and links to the generated files.
Rule for agents: always use the wrappers in `table_formats_demo/` (or write code that calls their APIs) to generate outputs. Never hand-craft YAML or JSON manifests. See `AGENTS.md` for full guidance.
All code in this repo is generated by a coding agent, with only a very brief review by the author. The repo is meant for demo and educational purposes only.
table_formats_demo/
├── base/ # Shared models (TableRow, OperationResult, …) and abstract TableFormat
├── delta/ # DeltaFormat – wraps deltalake
├── iceberg/ # IcebergFormat – wraps pyiceberg with SQLite catalog
├── slatedb/ # SlateDBFormat – wraps slatedb (LSM-tree key-value store)
└── utils/ # yaml_helpers, logging_config
demos/ # Runnable end-to-end demo scripts
tests/ # pytest test suite (one file per wrapper)
demo_output/ # Generated at runtime, git-ignored
  delta/
    users/ # Delta table files
    scratch/ # YAML explanation files (transaction log entries)
  iceberg/
    default/users/ # Iceberg data + metadata
    catalog/ # SQLite catalog
    scratch/ # YAML explanation files (metadata snapshots, manifests)
  slatedb/
    users/ # SlateDB table files (wal/, manifest/, compacted/)
    scratch/ # YAML explanation files (manifest versions)
Prerequisites: Python 3.11+, uv
git clone https://github.com/vigneshc/TableFormatsDemo.git
cd TableFormatsDemo
uv sync --dev

Each demo script creates a fresh table, runs a complete lifecycle of operations, and writes YAML scratch files for inspection.
uv run python demos/delta_demo.py
uv run python demos/iceberg_demo.py
uv run python demos/slatedb_demo.py

Output is written to `demo_output/<format>/`. Scratch files land in `demo_output/<format>/scratch/`.
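
After running the demos, a small script can summarize which scratch files were produced for each format. The following is a sketch using only the standard library; `list_scratch_files` is a hypothetical helper, not part of the repo, and it assumes the scratch files use a `.yaml` or `.yml` extension.

```python
from pathlib import Path


def list_scratch_files(output_root: str = "demo_output") -> dict[str, list[str]]:
    """Map each format directory to its scratch YAML files (relative paths)."""
    root = Path(output_root)
    found: dict[str, list[str]] = {}
    if not root.is_dir():
        return found  # nothing has been generated yet
    for format_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        scratch = format_dir / "scratch"
        if scratch.is_dir():
            # Assumption: scratch files end in .yaml or .yml.
            found[format_dir.name] = sorted(
                p.relative_to(scratch).as_posix()
                for p in scratch.rglob("*")
                if p.is_file() and p.suffix in {".yaml", ".yml"}
            )
        else:
            found[format_dir.name] = []
    return found


if __name__ == "__main__":
    for fmt, names in list_scratch_files().items():
        print(f"{fmt}: {len(names)} scratch file(s)")
        for name in names:
            print(f"  {name}")
```

Run it from the repo root so the default `demo_output` path resolves; a format with no `scratch/` directory is reported with an empty list.
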
uv run pytest
uv run pytest --cov=table_formats_demo --cov-report=html

from table_formats_demo.delta.delta_format import DeltaFormat
delta = DeltaFormat(base_path="demo_output/delta", table_name="users")
delta.create_table(initial_data=...) # writes _delta_log/00000000000000000000.json
delta.append_data(...) # new version entry in transaction log
delta.perform_maintenance() # checkpoint + vacuum + compact
delta.export_scratch("demo_output/delta/scratch") # YAML per log entry

from table_formats_demo.iceberg.iceberg_format import IcebergFormat
iceberg = IcebergFormat(base_path="demo_output/iceberg", table_name="users")
iceberg.create_table(initial_data=...) # snapshot 0, manifest list/manifest
iceberg.append_data(...) # new snapshot with new manifest
iceberg.perform_maintenance() # full-table rewrite → compacted snapshot
iceberg.create_branch("feature") # named snapshot ref in metadata
iceberg.export_scratch("demo_output/iceberg/scratch") # YAML per .json and .avro file

from table_formats_demo.slatedb.slatedb_format import SlateDBFormat
db = SlateDBFormat(base_path="demo_output/slatedb", table_name="users")
db.create_table(initial_data=...) # opens DB, writes rows
db.flush_wal_only() # WAL SST on disk, memtable unchanged
db.flush_memtable_to_l0() # memtable → L0 SST, manifest updated
db.compact_l0_to_lower_levels() # L0 SSTs → compacted runs
db.create_clone(clone_name="snapshot") # zero-copy checkpoint-based clone
db.export_scratch("demo_output/slatedb/scratch") # YAML per manifest version

| Concept | Delta Lake | Iceberg | SlateDB |
|---|---|---|---|
| Versioning unit | Transaction log entry | Snapshot | Manifest |
| Metadata format | JSON (+ Parquet checkpoint) | JSON + Avro | FlatBuffer (readable via admin API) |
| Compaction | `optimize.compact()` | Full-table overwrite | L0 → compacted runs |
| Branching | Not supported (use clone) | Snapshot refs | Checkpoint-based clone |
| Catalog | None (path-based) | SQL (SQLite here) | N/A |
MIT