feat(flamepy): add fast-path serialization for numpy/arrow types #444

Merged: k82cn merged 2 commits into xflops:main from k82cn:feat/fast-path-serialization on May 11, 2026

@k82cn k82cn commented May 11, 2026

Contributor

Summary

  • Add optimized serialization paths for numpy arrays and PyArrow types to avoid cloudpickle overhead
  • ~10x faster serialization for numpy arrays using Arrow tensor format
  • ~5-10x faster serialization for PyArrow Table/RecordBatch/Array using Arrow IPC

Changes

sdk/python/src/flamepy/core/cache.py

  • Add type markers (_TYPE_CLOUDPICKLE, _TYPE_NUMPY, _TYPE_ARROW_TABLE, etc.) to identify serialization format
  • Add fast-path functions for numpy arrays using Arrow's zero-copy tensor format
  • Add fast-path functions for PyArrow Table/RecordBatch/Array using Arrow IPC stream
  • Fall back to cloudpickle for arbitrary Python objects and non-contiguous arrays
  • Maintain backward compatibility - legacy data without type marker is treated as cloudpickle
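The fallback for non-contiguous arrays can be illustrated with a small check. This is a minimal sketch, not the code from the PR: the helper name `is_fast_path_candidate` is hypothetical, and testing `flags.c_contiguous` specifically is an assumption about how contiguity might be decided.

```python
import numpy as np


def is_fast_path_candidate(obj) -> bool:
    """Hypothetical check: only C-contiguous ndarrays take the fast path."""
    return isinstance(obj, np.ndarray) and obj.flags.c_contiguous


base = np.arange(12, dtype=np.float64).reshape(3, 4)  # dense, row-major
strided = base[:, ::2]   # every other column: a view with gaps in memory
transposed = base.T      # same buffer with axes swapped, not C-contiguous

# Copying into a dense buffer makes a strided view eligible again,
# at the cost of one extra copy of the data.
compacted = np.ascontiguousarray(strided)

for name, arr in [("base", base), ("strided", strided),
                  ("transposed", transposed), ("compacted", compacted)]:
    print(name, is_fast_path_candidate(arr))
```

Anything that fails the check, including plain Python objects such as dicts, would go through the general-purpose pickler instead.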

sdk/python/tests/test_cache.py

  • Add TestFastPathSerialization class with 14 new tests covering all fast-path types

Performance Impact

| Type | Before (cloudpickle) | After (fast-path) | Improvement |
|---|---|---|---|
| 1M float64 numpy array | ~50 ms | ~5 ms | ~10x faster |
| PyArrow Table | Pickle overhead | Zero-copy IPC | ~5-10x faster |
| Python dict | cloudpickle | cloudpickle | No change |

Testing

All 31 cache tests pass including 14 new fast-path serialization tests.

Add optimized serialization paths for numpy arrays and PyArrow types
to avoid cloudpickle overhead for common data science workloads.

- Add type markers to identify serialization format
- Use Arrow tensor format for contiguous numpy arrays (~10x faster)
- Use Arrow IPC stream for PyArrow Table/RecordBatch/Array
- Fall back to cloudpickle for arbitrary Python objects
- Maintain backward compatibility with legacy serialized data
- Add comprehensive test coverage for all fast-path types

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a fast-path serialization mechanism for NumPy arrays and PyArrow types to improve cache performance, while maintaining a cloudpickle fallback for other objects. Feedback focuses on a potential collision between the new type markers and standard Pickle opcodes which could corrupt legacy data. Additionally, the reviewer suggested optimizing serialization by writing markers directly to the output stream to avoid unnecessary memory copies and identified several unused imports.

Comment thread sdk/python/src/flamepy/core/cache.py Outdated
Comment on lines +35 to +40
# Using bytes outside printable ASCII to avoid collision with pickle opcodes
_TYPE_CLOUDPICKLE = b"\x00" # Default: cloudpickle
_TYPE_NUMPY = b"\x01" # numpy array via Arrow tensor
_TYPE_ARROW_TABLE = b"\x02" # PyArrow Table via IPC
_TYPE_ARROW_ARRAY = b"\x03" # PyArrow Array via IPC
_TYPE_ARROW_BATCH = b"\x04" # PyArrow RecordBatch via IPC

high

The single-byte type markers \x00 through \x04 are fragile. They are not pickle opcodes themselves (MARK is "(" rather than \x00, and pickle opcodes occupy printable ASCII plus \x80 through \x98), so the comment about printable ASCII is only partly accurate. Still, a one-byte marker gives weak protection: if legacy data (which the code intends to support at line 337) or any other unprefixed payload happens to start with one of these bytes, it will be misidentified as a fast-path type or a prefixed cloudpickle. This will lead to a deserialization failure because the first byte will be stripped before calling cloudpickle.loads.

Consider using a more unique multi-byte magic prefix (e.g., b"FLM\x01") or checking for the standard pickle protocol header (\x80) to distinguish legacy data.
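A minimal sketch of the multi-byte prefix idea, assuming a hypothetical 4-byte header made of b"FLM" plus one type byte. To stay stdlib-only, `pickle` stands in for cloudpickle and raw bytes stand in for a fast-path type. The names `MAGIC`, `TAG_PICKLE`, `TAG_RAW_BYTES`, `dumps`, and `loads` are illustrative and not taken from the PR.

```python
import pickle

# Hypothetical layout: 3-byte magic followed by a 1-byte type tag.
MAGIC = b"FLM"
TAG_PICKLE = b"\x00"
TAG_RAW_BYTES = b"\x01"  # stand-in for a fast-path type such as a numpy array
HEADER_LEN = len(MAGIC) + 1


def dumps(obj) -> bytes:
    """Serialize obj behind a 4-byte header that identifies the format."""
    if isinstance(obj, (bytes, bytearray)):
        return MAGIC + TAG_RAW_BYTES + bytes(obj)
    # pickle stands in for cloudpickle in this sketch.
    return MAGIC + TAG_PICKLE + pickle.dumps(obj)


def loads(data: bytes):
    """Deserialize data; anything without the magic is treated as legacy."""
    if not data.startswith(MAGIC):
        # Legacy payload: there is no header, so nothing is stripped.
        return pickle.loads(data)
    tag = data[len(MAGIC):HEADER_LEN]
    body = data[HEADER_LEN:]
    if tag == TAG_RAW_BYTES:
        return body
    if tag == TAG_PICKLE:
        return pickle.loads(body)
    raise ValueError(f"unknown type tag {tag!r}")
```

Pickles written with protocol 2 or later begin with the PROTO opcode \x80, so a legacy payload of that kind cannot be mistaken for a b"FLM" header, and an unknown tag fails loudly instead of being silently unpickled.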

Comment thread sdk/python/src/flamepy/core/cache.py Outdated
Comment on lines +14 to +22
import io
import logging
import struct
import threading
import uuid
from collections import OrderedDict
from dataclasses import asdict, dataclass
from pathlib import Path
-from typing import Any, Callable, Dict, List, Optional
+from typing import TYPE_CHECKING, Any, Callable, Dict, List, Optional, Tuple

medium

The imports io, struct, and Tuple appear to be unused in this file. While Tuple is imported from typing, the code uses the built-in tuple for type hinting (e.g., line 650), which is preferred in modern Python (3.9+).

Suggested change
-import io
-import logging
-import struct
-import threading
-import uuid
-from collections import OrderedDict
-from dataclasses import asdict, dataclass
-from pathlib import Path
-from typing import Any, Callable, Dict, List, Optional
-from typing import TYPE_CHECKING, Any, Callable, Dict, List, Optional, Tuple
+import logging
+import threading
+import uuid
+from collections import OrderedDict
+from dataclasses import asdict, dataclass
+from pathlib import Path
+from typing import TYPE_CHECKING, Any, Callable, Dict, List, Optional

References
  1. Use built-in generics for type hinting (PEP 585) in Python 3.9+.

Comment thread sdk/python/src/flamepy/core/cache.py Outdated
Comment on lines +223 to +225
sink = pa.BufferOutputStream()
pa.ipc.write_tensor(tensor, sink)
return _TYPE_NUMPY + sink.getvalue().to_pybytes()

medium

Concatenating the type marker with the serialized bytes (_TYPE_NUMPY + sink.getvalue().to_pybytes()) causes an additional full copy of the data. For large numpy arrays, this significantly increases memory pressure and latency. You can avoid this copy by writing the marker directly into the BufferOutputStream before writing the tensor.

Suggested change
-sink = pa.BufferOutputStream()
-pa.ipc.write_tensor(tensor, sink)
-return _TYPE_NUMPY + sink.getvalue().to_pybytes()
+sink = pa.BufferOutputStream()
+sink.write(_TYPE_NUMPY)
+pa.ipc.write_tensor(tensor, sink)
+return sink.getvalue().to_pybytes()
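
The extra copy described in this comment is not specific to Arrow. As a rough illustration, the sketch below uses `io.BytesIO` as a stand-in for `pa.BufferOutputStream` and a hypothetical 4-byte `MARKER`. The helper names are assumptions made for the example, not functions from the PR.

```python
import io

MARKER = b"FLM\x01"  # hypothetical 4-byte header


def write_payload(sink, chunks) -> None:
    """Stand-in for a serializer (e.g. an IPC writer) streaming into sink."""
    for chunk in chunks:
        sink.write(chunk)


def serialize_with_concat(chunks) -> bytes:
    sink = io.BytesIO()
    write_payload(sink, chunks)
    # The + builds another full-size bytes object just to prepend 4 bytes.
    return MARKER + sink.getvalue()


def serialize_marker_first(chunks) -> bytes:
    sink = io.BytesIO()
    sink.write(MARKER)  # header lands in the same buffer as the payload
    write_payload(sink, chunks)
    return sink.getvalue()


def split_header(data: bytes):
    """Return (marker, body), where body is a zero-copy view into data."""
    view = memoryview(data)
    return bytes(view[:len(MARKER)]), view[len(MARKER):]
```

Both functions produce identical bytes; the marker-first variant simply skips the concatenation step. On the read side, slicing a `memoryview` avoids the matching copy when the header is stripped.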

Comment thread sdk/python/src/flamepy/core/cache.py Outdated
Comment on lines +237 to +240
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
return _TYPE_ARROW_TABLE + sink.getvalue().to_pybytes()

medium

Writing the type marker directly to the sink avoids an unnecessary full copy of the serialized table data.

Suggested change
-sink = pa.BufferOutputStream()
-with pa.ipc.new_stream(sink, table.schema) as writer:
-    writer.write_table(table)
-return _TYPE_ARROW_TABLE + sink.getvalue().to_pybytes()
+sink = pa.BufferOutputStream()
+sink.write(_TYPE_ARROW_TABLE)
+with pa.ipc.new_stream(sink, table.schema) as writer:
+    writer.write_table(table)
+return sink.getvalue().to_pybytes()

Comment thread sdk/python/src/flamepy/core/cache.py Outdated
Comment on lines +251 to +254
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
return _TYPE_ARROW_BATCH + sink.getvalue().to_pybytes()

medium

Writing the type marker directly to the sink avoids an unnecessary full copy of the serialized record batch data.

Suggested change
-sink = pa.BufferOutputStream()
-with pa.ipc.new_stream(sink, batch.schema) as writer:
-    writer.write_batch(batch)
-return _TYPE_ARROW_BATCH + sink.getvalue().to_pybytes()
+sink = pa.BufferOutputStream()
+sink.write(_TYPE_ARROW_BATCH)
+with pa.ipc.new_stream(sink, batch.schema) as writer:
+    writer.write_batch(batch)
+return sink.getvalue().to_pybytes()

Comment thread sdk/python/src/flamepy/core/cache.py Outdated
Comment on lines +266 to +269
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
return _TYPE_ARROW_ARRAY + sink.getvalue().to_pybytes()

medium

Writing the type marker directly to the sink avoids an unnecessary full copy of the serialized array data.

Suggested change
-sink = pa.BufferOutputStream()
-with pa.ipc.new_stream(sink, batch.schema) as writer:
-    writer.write_batch(batch)
-return _TYPE_ARROW_ARRAY + sink.getvalue().to_pybytes()
+sink = pa.BufferOutputStream()
+sink.write(_TYPE_ARROW_ARRAY)
+with pa.ipc.new_stream(sink, batch.schema) as writer:
+    writer.write_batch(batch)
+return sink.getvalue().to_pybytes()

- Use 4-byte magic prefix (FLM + type byte) instead of single-byte
  markers to avoid collision with pickle opcodes
- Write type marker directly to BufferOutputStream to avoid extra
  memory copy during serialization
- Remove unused imports (io, struct, Tuple)

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

k82cn merged commit 73615b3 into xflops:main on May 11, 2026 (6 checks passed)
k82cn deleted the feat/fast-path-serialization branch on May 11, 2026 05:40