[codex] Implement native tabular cache path by k82cn · Pull Request #464 · xflops/flame

k82cn · 2026-05-19T01:30:43Z

Summary

Add native Arrow table cache payload support in the Rust object cache and disk storage.
Update FlamePy to cache PyArrow tables/batches and optional pandas/polars DataFrames directly instead of wrapping them as opaque bytes.
Refresh the RFE318 cache design notes and add Rust, Python unit, and E2E coverage for the native path.

Impact

DataSet/DataFrame-style cache writes can now preserve Arrow schemas and record batches as first-class cache payloads while keeping the existing opaque object path, ObjectRef shape, and legacy cache file compatibility.

Validation

cargo check -p flame-object-cache
cargo test -p flame-object-cache
cargo clippy -p flame-object-cache --all-targets -- -D warnings
cargo fmt --all -- --check
sdk/python/.venv/bin/python -m pytest sdk/python/tests/test_cache.py
sdk/python/.venv/bin/ruff check sdk/python/src/flamepy/core/cache.py sdk/python/tests/test_cache.py e2e/tests/test_cache.py
sdk/python/.venv/bin/ruff format --check sdk/python/src/flamepy/core/cache.py sdk/python/tests/test_cache.py e2e/tests/test_cache.py
git diff --check

Note

The new E2E tests were added but not run against a live local Flame cache here because the local flame.yaml lacks a current-context.

gemini-code-assist

Code Review

This pull request implements a native direct cache path for tabular data (PyArrow, pandas, and polars) to avoid the overhead of cloudpickle and opaque binary wrapping. Key changes include a new ObjectPayload enum in Rust to distinguish between opaque and native Arrow table payloads, updated storage logic to persist original Arrow schemas with reserved metadata, and a Python-side classifier to detect tabular objects. Feedback highlights an efficiency concern in get_flight_info where a full disk load is triggered just to retrieve a schema, and a reliability issue in the Python SDK where pa.ipc.new_file should use a context manager to ensure proper file closure during exceptions.
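To illustrate the Python-side classifier mentioned above, here is a minimal sketch. It is not the code from this PR: `classify_payload`, `NATIVE_TYPES`, and the label strings are hypothetical names. It assumes the classifier can identify tabular objects by module and class name, which keeps pandas and polars optional because neither is imported.

```python
from typing import Any

# Hypothetical labels keyed by (top-level module, class name).
# The real classifier in the PR may use different names and checks.
NATIVE_TYPES = {
    ("pyarrow", "Table"): "arrow_table",
    ("pyarrow", "RecordBatch"): "arrow_batch",
    ("pandas", "DataFrame"): "pandas_frame",
    ("polars", "DataFrame"): "polars_frame",
}


def classify_payload(obj: Any) -> str:
    """Return a native tabular label, or "opaque" for everything else."""
    # Walk the MRO so user subclasses of a DataFrame are still detected.
    for cls in type(obj).__mro__:
        root = (cls.__module__ or "").split(".")[0]
        label = NATIVE_TYPES.get((root, cls.__name__))
        if label is not None:
            return label
    # Anything unrecognized stays on the existing opaque (pickled) path.
    return "opaque"
```

Matching on names rather than `isinstance` means the optional libraries are never imported just to classify a value; the trade-off is that an unrelated class with the same module root and name would be misclassified.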

gemini-code-assist · 2026-05-19T01:36:20Z

+            match self.cache.get(&object_key).await {
+                Ok(object) => match object.payload {
+                    ObjectPayload::ArrowTable { schema, .. } => {
+                        Bytes::from(encode_schema(&schema)?)
+                    }
+                    ObjectPayload::Opaque(_) => Bytes::new(),
+                },
+                Err(_) => Bytes::new(),
+            }


Calling self.cache.get(&object_key).await inside get_flight_info can be inefficient for large native tables that are not currently resident in memory. This operation triggers a full load of all record batches from disk just to extract the schema. Consider implementing a lighter-weight storage operation that only retrieves the schema from the Arrow IPC file header.

gemini-code-assist · 2026-05-19T01:36:20Z

+        writer = pa.ipc.new_file(object_path, payload.schema)
+        for batch in payload.batches:
+            writer.write_batch(batch)
        writer.close()


The pa.ipc.RecordBatchFileWriter (returned by pa.ipc.new_file) should be used as a context manager to ensure the file is properly closed and the footer is written even if an exception occurs during write_batch. While writer.close() is called at the end of the block, it won't be reached if an error occurs in the loop.

Suggested change

-        writer = pa.ipc.new_file(object_path, payload.schema)
-        for batch in payload.batches:
-            writer.write_batch(batch)
-        writer.close()
+        with pa.ipc.new_file(object_path, payload.schema) as writer:
+            for batch in payload.batches:
+                writer.write_batch(batch)

codecov · 2026-05-19T02:11:50Z

Codecov Report

❌ Patch coverage is 89.75904% with 17 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
object_cache/src/cache.rs	88.54%	11 Missing ⚠️
object_cache/src/storage/disk.rs	94.11%	4 Missing ⚠️
object_cache/src/storage/none.rs	0.00%	2 Missing ⚠️

Implement native tabular cache path

73ed2f8

k82cn marked this pull request as ready for review May 19, 2026 01:32

gemini-code-assist Bot reviewed May 19, 2026

Address native cache review feedback

5b7752b

k82cn merged commit 644a414 into xflops:main May 19, 2026
6 checks passed

k82cn deleted the flm_318_3 branch May 19, 2026 02:11

k82cn mentioned this pull request May 19, 2026

Flame Object Cache enhancement #318

Closed

k82cn merged 2 commits into xflops:main from k82cn:flm_318_3
