Interoperability layer for Embeddenator: format conversions, FFI bindings, and language integrations.
Independent component extracted from the Embeddenator monolithic repository. Part of the Embeddenator workspace.
Repository: https://github.com/tzervas/embeddenator-interop
Core functionality complete.
Implementation includes:
- Format conversion (JSON, bincode, text)
- C/C++ FFI bindings with automated header generation
- Python bindings (PyO3) - requires the `python` feature
- Envelope compression support (Zstd, LZ4)
- High-level adapter layers
- JSON: Human-readable, cross-language compatible
- Bincode: Efficient binary serialization
- Text: Debug-friendly output format
- Round-trip guarantees for JSON and bincode formats
- Support for all core Embeddenator types (SparseVec, Engram, Manifest, VSAConfig)
- C-compatible interface for cross-language integration
- Opaque handle-based API for memory safety
- Core operations: encode, decode, bundle, bind, cosine similarity
- Serialization to/from JSON for data interchange
- Well-documented safety requirements
- PyO3-based Python API (enable the `python` feature)
- Pythonic interface with property accessors
- Native integration with Python bytes and strings
- JSON and bincode serialization support
- EnvelopeAdapter: Full compression support with Zstd and LZ4 codecs
- FileAdapter: High-level file I/O operations
- StreamAdapter: Streaming encode/decode for large data
- BatchAdapter: Batch operations for efficiency
- AutoFormatAdapter: Automatic format detection
- Zstd: High compression ratio (enable the `compression-zstd` feature)
- LZ4: Fast compression (enable the `compression-lz4` feature)
- None: No compression for maximum speed
- Full round-trip guarantees for all compression codecs
- Automatic C header generation via cbindgen (enable the `c-bindings` feature)
- Headers generated in `include/embeddenator_interop.h`
- C++ compatible with proper include guards
- Full documentation included in generated headers
- Backend-agnostic VSA operations
- Vector store abstraction
- Candidate generation and reranking
- Runtime integration support
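The cosine similarity operation listed above acts on sparse ternary vectors, which the `pos` and `neg` fields of `SparseVec` suggest are stored as lists of positive and negative indices. The following is a minimal, std-only sketch of how such a similarity could be computed. `TernaryVec` and `cosine` are hypothetical names used for illustration, not the `embeddenator_vsa` API, and the sketch assumes each index appears at most once per vector.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a sparse ternary vector:
/// `pos` holds the indices set to +1, `neg` the indices set to -1.
#[derive(Debug, Clone, PartialEq)]
pub struct TernaryVec {
    pub pos: Vec<usize>,
    pub neg: Vec<usize>,
}

/// Cosine similarity between two sparse ternary vectors.
/// Assumes every index appears at most once per vector.
/// Returns 0.0 when either vector has no non-zero entries.
pub fn cosine(a: &TernaryVec, b: &TernaryVec) -> f64 {
    let a_pos: HashSet<usize> = a.pos.iter().copied().collect();
    let a_neg: HashSet<usize> = a.neg.iter().copied().collect();

    // Dot product: matching signs add 1, opposite signs subtract 1.
    let mut dot: i64 = 0;
    for i in &b.pos {
        if a_pos.contains(i) {
            dot += 1;
        }
        if a_neg.contains(i) {
            dot -= 1;
        }
    }
    for i in &b.neg {
        if a_neg.contains(i) {
            dot += 1;
        }
        if a_pos.contains(i) {
            dot -= 1;
        }
    }

    // Every non-zero entry is +1 or -1, so the norm is sqrt(number of entries).
    let norm_a = ((a.pos.len() + a.neg.len()) as f64).sqrt();
    let norm_b = ((b.pos.len() + b.neg.len()) as f64).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot as f64 / (norm_a * norm_b)
}
```

Under these assumptions, a vector compared with itself scores close to 1, a vector compared with its sign-flipped copy scores close to -1, and vectors with no shared indices score 0.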
use embeddenator_interop::formats::{sparse_vec_to_format, sparse_vec_from_format, OutputFormat};
use embeddenator_vsa::SparseVec;
// Create a vector
let vec = SparseVec {
    pos: vec![1, 2, 3],
    neg: vec![4, 5],
};
// Convert to JSON
let json_bytes = sparse_vec_to_format(&vec, OutputFormat::JsonPretty).unwrap();
let from_json = sparse_vec_from_format(&json_bytes, OutputFormat::Json).unwrap();
// Convert to bincode
let bincode_bytes = sparse_vec_to_format(&vec, OutputFormat::Bincode).unwrap();
let from_bincode = sparse_vec_from_format(&bincode_bytes, OutputFormat::Bincode).unwrap();
// Debug text format
let text = sparse_vec_to_format(&vec, OutputFormat::Text).unwrap();
println!("{}", String::from_utf8(text).unwrap());

use embeddenator_interop::FileAdapter;
use embeddenator_vsa::{SparseVec, ReversibleVSAConfig};
use embeddenator_fs::Manifest;
// Save and load sparse vectors
let vec = SparseVec::new();
FileAdapter::save_sparse_vec("vector.bin", &vec).unwrap();
let loaded = FileAdapter::load_sparse_vec("vector.bin").unwrap();
// Save and load config
let config = ReversibleVSAConfig::default();
FileAdapter::save_vsa_config("config.json", &config).unwrap();
let loaded_config = FileAdapter::load_vsa_config("config.json").unwrap();
// Save and load manifests
let manifest = Manifest {
    files: Vec::new(),
    total_chunks: 0,
};
FileAdapter::save_manifest("manifest.json", &manifest).unwrap();
let loaded_manifest = FileAdapter::load_manifest("manifest.json").unwrap();

use embeddenator_interop::BatchAdapter;
use embeddenator_vsa::ReversibleVSAConfig;
let config = ReversibleVSAConfig::default();
let data_chunks = vec![b"hello".as_slice(), b"world".as_slice()];
// Batch encode
let vectors = BatchAdapter::batch_encode(&data_chunks, &config);
// Batch similarity
let query = vectors[0].clone();
let similarities = BatchAdapter::batch_similarity(&query, &vectors);
// Batch bundle
let bundled = BatchAdapter::batch_bundle(&vectors);

#include <string.h>  // strlen
#include "embeddenator_interop.h"
// Create vectors
SparseVecHandle* vec1 = sparse_vec_new();
SparseVecHandle* vec2 = sparse_vec_new();
// Perform operations
SparseVecHandle* bundled = sparse_vec_bundle(vec1, vec2);
double similarity = sparse_vec_cosine(vec1, vec2);
// Encode data
VSAConfigHandle* config = vsa_config_new();
const char* data = "Hello, C!";
SparseVecHandle* encoded = vsa_encode_data(config, (const uint8_t*)data, strlen(data), NULL);
// Serialize to JSON
ByteBuffer json = sparse_vec_to_json(encoded);
// Use json.data, json.len...
byte_buffer_free(json);
// Cleanup
sparse_vec_free(vec1);
sparse_vec_free(vec2);
sparse_vec_free(bundled);
sparse_vec_free(encoded);
vsa_config_free(config);

from embeddenator_interop import SparseVec, VSAConfig
# Create vectors
vec1 = SparseVec.from_indices([1, 2, 3], [4, 5])
vec2 = SparseVec.from_indices([2, 3, 4], [5, 6])
# Operations
bundled = vec1.bundle(vec2)
similarity = vec1.cosine(vec2)
# Encode data
config = VSAConfig.new()
data = b"Hello, Python!"
encoded = config.encode(data, None)
# Serialize
json_str = encoded.to_json()
bytes_data = encoded.to_bytes()

Default features: None
Optional features:
- `python`: Enable Python bindings via PyO3
- `c-bindings`: Enable automated C header generation with cbindgen
- `compression-zstd`: Enable Zstd compression codec
- `compression-lz4`: Enable LZ4 compression codec
- `compression`: Enable all compression codecs (zstd + lz4)
[dependencies]
embeddenator-interop = { version = "0.20.0-alpha.1" }
# With Python support
embeddenator-interop = { version = "0.20.0-alpha.1", features = ["python"] }

# Standard build
cargo build --manifest-path embeddenator-interop/Cargo.toml
# With compression support
cargo build --manifest-path embeddenator-interop/Cargo.toml --features compression
# Generate C headers (creates include/embeddenator_interop.h)
cargo build --manifest-path embeddenator-interop/Cargo.toml --features c-bindings
# With all features
cargo build --manifest-path embeddenator-interop/Cargo.toml --all-features
# With Python support
cargo build --manifest-path embeddenator-interop/Cargo.toml --features python
# Release build
cargo build --manifest-path embeddenator-interop/Cargo.toml --release

# Run all tests
cargo test --manifest-path embeddenator-interop/Cargo.toml
# Run with Python tests
cargo test --manifest-path embeddenator-interop/Cargo.toml --features python

- formats.rs: Format conversion utilities (JSON, bincode, text)
- ffi.rs: C FFI bindings with opaque handles
- bindings.rs: Python bindings via PyO3 (optional)
- adapters.rs: High-level adapter layers
- kernel_interop.rs: Backend-agnostic VSA operations
| Type | JSON | Bincode | Text |
|---|---|---|---|
| SparseVec | ✓ | ✓ | ✓ (read-only) |
| Engram | ✓ | ✓ | ✓ (read-only) |
| Manifest | ✓ | ✓ | ✓ (read-only) |
| SubEngram | ✓ | ✓ | ✓ (read-only) |
| VSAConfig | ✓ | ✓ | ✓ (read-only) |
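Since the same types can arrive as JSON, bincode, or text, a consumer often needs to guess which format a payload uses before decoding it. The sketch below shows one way such detection could work, by inspecting the first non-whitespace byte and checking for printable UTF-8. `DetectedFormat` and `detect_format` are hypothetical names; this is not the heuristic used by AutoFormatAdapter, whose internals are not described here.

```rust
/// Hypothetical result of sniffing a serialized payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DetectedFormat {
    Json,
    Text,
    Binary,
}

/// True when `bytes` is valid UTF-8 with no control characters
/// other than whitespace (newline, tab, carriage return, ...).
fn looks_like_text(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.chars().all(|c| !c.is_control() || c.is_whitespace()),
        Err(_) => false,
    }
}

/// Guess the format of a payload from its content.
/// Empty or whitespace-only input is reported as `Binary`.
pub fn detect_format(bytes: &[u8]) -> DetectedFormat {
    // The first non-whitespace byte decides whether the payload is JSON-shaped.
    let first = bytes.iter().copied().find(|b| !b.is_ascii_whitespace());
    match first {
        Some(b'{') | Some(b'[') if looks_like_text(bytes) => DetectedFormat::Json,
        Some(_) if looks_like_text(bytes) => DetectedFormat::Text,
        _ => DetectedFormat::Binary,
    }
}
```

This is only a heuristic: a binary payload that happens to consist of printable bytes would be misclassified, so a production detector would more likely attempt a real parse and fall back on failure.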
All FFI functions are marked unsafe and require:
- Valid, non-null pointers
- Proper memory management (caller frees returned memory)
- Null-terminated UTF-8 strings
- No use-after-free violations
See FFI documentation for detailed safety contracts.
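The requirements above follow from the opaque handle pattern: Rust allocates a value on the heap, hands the caller a raw pointer, and takes ownership back when the caller frees it. The following is a minimal sketch of that pattern on the Rust side. `DemoVec` and the `demo_vec_*` functions are hypothetical and are not the functions exported by this crate.

```rust
/// Hypothetical payload hidden behind the handle; C callers never see its layout.
pub struct DemoVec {
    pos: Vec<u32>,
    neg: Vec<u32>,
}

// A real export would also carry a no-mangle attribute so that C can link by
// name; it is omitted here to keep the sketch independent of the Rust edition.

/// Allocate an empty vector and pass ownership to the caller as a raw pointer.
pub extern "C" fn demo_vec_new() -> *mut DemoVec {
    Box::into_raw(Box::new(DemoVec {
        pos: Vec::new(),
        neg: Vec::new(),
    }))
}

/// Append a positive index. Returns false for a null handle.
///
/// # Safety
/// `handle` must be null or a live pointer obtained from `demo_vec_new`.
pub unsafe extern "C" fn demo_vec_push_pos(handle: *mut DemoVec, index: u32) -> bool {
    match unsafe { handle.as_mut() } {
        Some(v) => {
            v.pos.push(index);
            true
        }
        None => false,
    }
}

/// Number of non-zero entries, or 0 for a null handle.
///
/// # Safety
/// `handle` must be null or a live pointer obtained from `demo_vec_new`.
pub unsafe extern "C" fn demo_vec_nnz(handle: *const DemoVec) -> usize {
    match unsafe { handle.as_ref() } {
        Some(v) => v.pos.len() + v.neg.len(),
        None => 0,
    }
}

/// Reclaim ownership and drop the vector. A null handle is ignored.
///
/// # Safety
/// `handle` must be null or a pointer from `demo_vec_new` that has not
/// already been freed; the handle must not be used after this call.
pub unsafe extern "C" fn demo_vec_free(handle: *mut DemoVec) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

Checking for null on every entry point keeps the most common caller mistake from becoming undefined behavior, but double frees and use-after-free cannot be detected this way, which is why they remain the caller's responsibility.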
- Include generated header (requires cbindgen)
- Link against the library `embeddenator_interop` (`.a` or `.so`)
- Follow opaque handle pattern
- Always free allocated resources
- Build with `--features python`
- Install as Python module
- Use Pythonic interface with native types
- Serialization integrates with pickle/JSON
- Use adapter layers for high-level operations
- Use formats module for conversion needs
- Use kernel_interop for backend integration
- Direct access to all functionality
- Bincode is ~10x faster than JSON for serialization
- Batch operations reduce overhead for multiple items
- Streaming adapters minimize memory usage for large data
- FFI calls have minimal overhead (single indirection)
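The memory benefit of streaming comes from processing input in bounded chunks instead of loading it whole. The sketch below illustrates the idea with a std-only helper that reads from any `Read` source using a fixed-size buffer and hands each chunk to a callback. `for_each_chunk` is a hypothetical helper for illustration and is not part of StreamAdapter's API.

```rust
use std::io::{self, Read};

/// Read `reader` in chunks of at most `chunk_size` bytes and pass each chunk
/// to `on_chunk`. Only one buffer of `chunk_size` bytes is ever allocated.
/// Returns the total number of bytes processed.
pub fn for_each_chunk<R: Read, F: FnMut(&[u8])>(
    mut reader: R,
    chunk_size: usize,
    mut on_chunk: F,
) -> io::Result<usize> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0;
    loop {
        // Fill the buffer as far as possible; only EOF produces a short chunk.
        let mut filled = 0;
        while filled < chunk_size {
            let n = match reader.read(&mut buf[filled..]) {
                Ok(n) => n,
                // Retry reads that were interrupted by a signal.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
            if n == 0 {
                break; // end of input
            }
            filled += n;
        }
        if filled == 0 {
            return Ok(total);
        }
        on_chunk(&buf[..filled]);
        total += filled;
        if filled < chunk_size {
            return Ok(total); // the short chunk was the last one
        }
    }
}
```

In a real pipeline the callback would encode each chunk into a vector; peak memory then depends on `chunk_size` rather than on the size of the input.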
MIT
- ADR-016 - Component decomposition rationale
- embeddenator - Main repository
- embeddenator-vsa - VSA implementation
- embeddenator-fs - Filesystem types
- embeddenator-io - I/O utilities