Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions docs/designs/RFE318-cache/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- Design for issue #318 item 3: native DataSet/DataFrame cache payloads stored as Arrow schemas and record batches instead of opaque binary rows.
- Native Arrow table payload support in object-cache Flight and disk storage paths.
- FlamePy native tabular payload classification for PyArrow tables/batches plus optional pandas/polars DataFrames.
- Unit and E2E coverage for native Arrow cache payloads.
- `patch` operation support in `ObjectCache` and `FlightCacheServer` (PR #6)
- `patch_object` function in Python SDK (PR #6)
- Append-only semantics for object updates (PR #6)
Expand Down
251 changes: 200 additions & 51 deletions docs/designs/RFE318-cache/FS.md

Large diffs are not rendered by default.

17 changes: 16 additions & 1 deletion docs/designs/RFE318-cache/STATUS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,20 @@
# Object Cache Implementation Status

## ✅ Implemented: Native DataSet/DataFrame Direct Cache Path

Issue #318 item 3 is implemented for tabular payloads that can be represented as Arrow data directly in object cache, instead of wrapping them as pickled objects or Arrow IPC bytes inside the opaque `{version, data}` row.

Scope:
- Detect PyArrow tables/batches, pandas DataFrames, polars DataFrames, and adapter-backed Dataset-like objects in FlamePy without adding mandatory pandas/polars/datasets dependencies.
- Stream and persist native tabular payloads as Arrow schemas and record batches with reserved `flame.cache.*` schema metadata.
- Preserve the existing opaque object path, existing `ObjectRef` shape, and legacy `version/data` cache files.
- Keep native tabular patch/merge semantics out of scope for this item.

Verification:
- Rust object-cache unit coverage includes native Arrow payload storage, reload, and Flight `do_get` schema/batch streaming.
- Python unit coverage includes native payload classification, direct remote `do_put` schema emission, and native Arrow `do_get` parsing.
- E2E tests were added for native Arrow table put/get and update paths.

## Migration from Naive Cache

✅ **Completed**: The naive HTTP-based cache in `flame-executor-manager` has been removed. The following changes were made:
Expand All @@ -15,7 +30,7 @@ The object cache is now provided as an embedded library within the `flame-execut
### Core Implementation
- ✅ Rust Arrow Flight server implemented (`object_cache/src/cache.rs`)
- ✅ Disk persistence using Arrow IPC format
- ✅ Key-based storage organization (`session_id/object_id`)
- ✅ Key-based storage organization (`app_name/session_id/object_id`)
- ✅ In-memory index with HashMap
- ✅ Object loading from disk on startup
- ✅ Configuration support (flame-cluster.yaml with storage path)
Expand Down
33 changes: 33 additions & 0 deletions e2e/tests/test_cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
import uuid

import flamepy.core.cache as cache_module
import pyarrow as pa
import pyarrow.flight as flight
import pytest
from flamepy.core import FlameContext, ObjectRef, get_object, patch_object, put_object, update_object
Expand All @@ -37,6 +38,38 @@ def test_cache_put_and_get():
assert result == test_data


def test_cache_put_and_get_native_arrow_table():
"""Test that Arrow tables round-trip through the native cache path."""
key_prefix = f"test-app/test-native-arrow-{uuid.uuid4().hex[:8]}"
table = pa.table({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

ref = put_object(key_prefix, table)

assert ref.version == 1

result = get_object(ref)
assert isinstance(result, pa.Table)
assert result.to_pydict() == table.to_pydict()
assert b"flame.cache.format" not in (result.schema.metadata or {})


def test_cache_native_arrow_table_update():
"""Test that native Arrow table updates rewrite the base object."""
key_prefix = f"test-app/test-native-arrow-update-{uuid.uuid4().hex[:8]}"
table = pa.table({"value": [1, 2, 3]})
updated = pa.table({"value": [4, 5], "label": ["x", "y"]})

ref = put_object(key_prefix, table)
updated_ref = update_object(ref, updated)

assert updated_ref.key == ref.key
assert updated_ref.version == ref.version + 1

result = get_object(updated_ref)
assert isinstance(result, pa.Table)
assert result.to_pydict() == updated.to_pydict()


def test_cache_update():
"""Test update operation."""
key_prefix = "test-app/test-session-002"
Expand Down
Loading
Loading