Fast, compact, and easy-to-use binary format for time-series data
Combines CSV simplicity with binary performance and compression
BCSV combines the ease of use of CSV files with the performance and storage efficiency of binary formats, specifically optimized for large time-series datasets on both high-performance and embedded platforms.
- No schema files: Define data structures directly in your code (C++, Python, C#)
- Self-documenting: File header contains all type information, like CSV but enforced
- Streaming first: Read/write data larger than available RAM, row by row
- Constant-time operations: Predictable performance for real-time recording
- Time-series optimized: Efficient compression for constant values and binary waveforms
- Crash-resilient: Retrieve data even from incomplete/interrupted writes
- Natural to use: Feels like CSV, works like binary
| Goal | Target | Status |
|---|---|---|
| Sequential recording | 1000 channels @ 1 kHz (STM32F4), 10 kHz (STM32F7/Zynq/RPi) | ✅ Achievable |
| Idle file growth | <1 KB/s for a 1000-channel 10 kHz stream (counter only) | ✅ With ZoH |
| Processing speed | ≥1M rows/sec for 1000-channel streams on Zen3 CPU | 🔄 In progress* |
| Compression ratio | <30% of equivalent CSV size | ✅ 15-25% typical |
| Platform support | C/C++, C#, Python | ✅ All supported |
\* Typical datasets (8-33 columns) achieve 3.6M rows/sec (flexible API) to 7.5M rows/sec (static API); the 1M rows/sec target for 1000-channel streams requires further optimization.
Target users: anyone working on metrology, data acquisition, or telemetry tasks.
```bash
# Header-only library - just copy the include directory
cp -r include/bcsv/ your_project/include/

# Or use CMake
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
```

```cpp
#include <bcsv/bcsv.h>

#include <cstdint>
#include <iostream>

int main() {
    // Create layout
    bcsv::Layout layout;
    layout.addColumn({"timestamp", bcsv::ColumnType::DOUBLE});
    layout.addColumn({"temperature", bcsv::ColumnType::FLOAT});
    layout.addColumn({"sensor_id", bcsv::ColumnType::UINT16});

    // Write data
    bcsv::Writer<bcsv::Layout> writer(layout);
    writer.open("measurements.bcsv", true);
    writer.row().set(0, 1234567890.0);
    writer.row().set(1, 23.5f);
    writer.row().set(2, uint16_t{42});
    writer.writeRow();
    writer.close();

    // Read data
    bcsv::Reader<bcsv::Layout> reader;
    reader.open("measurements.bcsv");
    while (reader.readNext()) {
        auto timestamp = reader.row().get<double>(0);
        auto temp = reader.row().get<float>(1);
        auto id = reader.row().get<uint16_t>(2);
        std::cout << "Sensor " << id << ": " << temp << "°C @ " << timestamp << "\n";
    }
}
```

- Header-only C++20 library: Easy integration, no linking required
- LZ4 compression: Fast compression with excellent ratios (70-85% reduction)
- xxHash64 checksums: Fast, reliable data integrity validation (3-5x faster than CRC32)
- Zero-Order Hold (ZoH): Extreme compression for constant/sparse data
- Dual API: Flexible runtime interface + static compile-time interface
- Type safety: Enforced types per column with compile-time or runtime validation
- Crash recovery: Read last complete row even from interrupted writes (see the sketch after this list)
- Random access: Efficient seeking for non-sequential reads
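As a concrete illustration of the crash-recovery guarantee, here is a minimal sketch reusing the Reader API from the quick start. Because `readNext()` only yields complete rows, scanning a truncated file recovers everything up to the last fully written row (the file name and column layout are illustrative):

```cpp
// Recover rows from a recording that was interrupted mid-write.
// Sketch only: uses the Reader API exactly as in the quick start;
// assumes column 0 is a DOUBLE timestamp as in that example.
#include <bcsv/bcsv.h>
#include <cstddef>
#include <iostream>

int main() {
    bcsv::Reader<bcsv::Layout> reader;
    reader.open("interrupted.bcsv");   // file from a crashed writer

    std::size_t rows = 0;
    double last_timestamp = 0.0;
    while (reader.readNext()) {        // stops after the last complete row
        last_timestamp = reader.row().get<double>(0);
        ++rows;
    }
    std::cout << rows << " rows recovered, last timestamp "
              << last_timestamp << "\n";
}
```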
- C++ API: Modern C++20 header-only library
- C API: Shared library (.dll/.so) for language bindings
- Python: Full pandas integration via PyBCSV package
- C# / Unity: Game development integration with minimal GC impact
- CLI Tools: Professional csv2bcsv and bcsv2csv converters
- 3.6M-7.5M rows/second processing (8-33 columns, flexible/static API, Release build, Zen3)
- 127K+ rows/second for 1000-channel wide datasets
- 40-60% compression with LZ4 on real-world datasets
- 75-96% size reduction vs CSV (ZoH for sparse data)
- 3-5x faster checksums with xxHash64 vs CRC32
- Constant-time writes: No compression spikes (streaming mode)
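The constant-time write behavior follows from the row-at-a-time API: each `writeRow()` does a bounded amount of work. A minimal streaming sketch using only the Writer API from the quick start (file name, rate, and values are illustrative):

```cpp
// Stream a long recording one row at a time, without holding
// the dataset in RAM. Constant values compress well under ZoH.
#include <bcsv/bcsv.h>
#include <cstdint>

int main() {
    bcsv::Layout layout;
    layout.addColumn({"timestamp", bcsv::ColumnType::DOUBLE});
    layout.addColumn({"value", bcsv::ColumnType::FLOAT});

    bcsv::Writer<bcsv::Layout> writer(layout);
    writer.open("stream.bcsv", true);

    for (uint64_t i = 0; i < 1'000'000; ++i) {
        writer.row().set(0, i * 0.001);   // 1 kHz timestamps
        writer.row().set(1, 23.5f);       // constant value, ZoH-friendly
        writer.writeRow();                // one row per iteration
    }
    writer.close();
}
```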
Current Version: v1.2.0-dev (Active Development)
- ✅ Replaced CRC32 with xxHash64 (3-5x faster checksums)
- ✅ Removed Boost dependency (zero external dependencies)
- ✅ Upgraded to C++20 (modern concepts and features)
- ✅ Improved test coverage (59/59 tests passing)
See ARCHITECTURE.md for detailed roadmap.
Next milestones:
- v1.3.0 (Dec 2025): Streaming LZ4 compression (constant write latency)
- v1.4.0 (Jan 2026): File indexing for fast random access
- v1.5.0 (Feb 2026): Variable-length encoding (VLE) for better compression
- v2.0.0 (Q2 2026): Stable release with compatibility guarantees
- ARCHITECTURE.md: Design decisions, performance requirements, technical details
- VERSIONING.md: Automated versioning system
- examples/CLI_TOOLS.md: CLI tool usage and examples
- examples/PERFORMANCE_COMPARISON.md: Benchmark results
- python/README.md: Python package documentation
- unity/README.md: Unity integration guide
| Type | Size | Range/Precision | Use Case |
|---|---|---|---|
| `BOOL` | 1 byte | true/false | Flags, states |
| `UINT8` | 1 byte | 0-255 | Small counters, IDs |
| `UINT16` | 2 bytes | 0-65535 | Sensor IDs, small values |
| `UINT32` | 4 bytes | 0-4.3B | Timestamps, large counters |
| `UINT64` | 8 bytes | 0-18.4E | High-res timestamps |
| `INT8` | 1 byte | -128 to 127 | Small signed values |
| `INT16` | 2 bytes | -32K to 32K | Temperature readings |
| `INT32` | 4 bytes | -2.1B to 2.1B | Standard integers |
| `INT64` | 8 bytes | ±9.2E18 | 64-bit signed data |
| `FLOAT` | 4 bytes | ±3.4E±38, 7 digits | Sensor values |
| `DOUBLE` | 8 bytes | ±1.7E±308, 15 digits | High-precision measurements |
| `STRING` | Variable | UTF-8 text | Labels, descriptions |
Automatic type optimization in CLI tools minimizes storage by selecting smallest appropriate type.
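The same principle applies when defining layouts in code: pick the smallest type that covers each column's range. A minimal sketch using the documented `addColumn` API (column names and type choices are illustrative):

```cpp
// Compact column types keep every stored row small.
// Example choices only; match types to your actual value ranges.
#include <bcsv/bcsv.h>

bcsv::Layout makeCompactLayout() {
    bcsv::Layout layout;
    layout.addColumn({"timestamp", bcsv::ColumnType::DOUBLE});  // full precision needed
    layout.addColumn({"sensor_id", bcsv::ColumnType::UINT16});  // 0-65535 suffices
    layout.addColumn({"status",    bcsv::ColumnType::UINT8});   // small enum code
    layout.addColumn({"valid",     bcsv::ColumnType::BOOL});    // 1-byte flag
    return layout;
}
```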
- C++20 compiler: GCC 10+, Clang 12+, MSVC 2019+
- CMake 3.20+
- Git (for dependency fetching)
```bash
# Configure (Release recommended for performance)
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release

# Build everything
cmake --build build -j

# Run tests
./build/bin/bcsv_gtest

# Run examples
./build/bin/example
./build/bin/performance_benchmark

# Use CLI tools
./build/bin/csv2bcsv data.csv
./build/bin/bcsv2csv data.bcsv output.csv
```

Python package (PyBCSV):

```bash
cd python/
pip install -e .  # Development mode
# or
pip install .     # Standard install
```

CMake Project:
```cmake
add_subdirectory(external/bcsv)
target_link_libraries(your_target PRIVATE bcsv)
```

Header-only (manual):

```cpp
#include "bcsv/bcsv.h"
// No linking needed, just include path
```

```bash
# Convert CSV to BCSV (auto-detect format)
csv2bcsv measurements.csv
# European CSV (semicolon delimiter, comma decimal)
csv2bcsv -d ';' --decimal-separator ',' data.csv
# Convert back to CSV
bcsv2csv measurements.bcsv output.csv
```

```python
import pybcsv
import pandas as pd
# Pandas integration
df = pd.DataFrame({'time': [1, 2, 3], 'value': [10.5, 20.3, 15.7]})
pybcsv.write_dataframe(df, "data.bcsv")
df_read = pybcsv.read_dataframe("data.bcsv")
print(df_read)
```

```csharp
using BCSV;
var layout = new BcsvLayout();
layout.AddColumn("timestamp", BcsvColumnType.DOUBLE);
layout.AddColumn("position_x", BcsvColumnType.FLOAT);
var writer = new BcsvWriter(layout);
writer.Open("player_data.bcsv");
var row = writer.GetRow();
row.SetDouble(0, Time.time);
row.SetFloat(1, transform.position.x);
writer.WriteRow();
writer.Close();
```

Best for dynamic schemas, prototyping, and varying structures:

```cpp
bcsv::Layout layout;
layout.addColumn({"id", bcsv::ColumnType::INT32});
layout.addColumn({"name", bcsv::ColumnType::STRING});
bcsv::Writer<bcsv::Layout> writer(layout);
writer.open("data.bcsv", true);
writer.row().set(0, int32_t{123});
writer.row().set(1, std::string{"Alice"});
writer.writeRow();
```

Best for known schemas and performance-critical code (4-5x faster):

```cpp
using MyLayout = bcsv::LayoutStatic<int32_t, std::string>;
auto layout = MyLayout::create({"id", "name"});
bcsv::Writer<MyLayout> writer(layout);
writer.open("data.bcsv", true);
writer.row().set<0>(int32_t{123}); // Type-safe compile-time index
writer.row().set<1>(std::string{"Alice"});
writer.writeRow();
```

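For completeness, a read-side counterpart to the static example. Only the static writer is shown above, so this is a sketch under the assumption that `Reader<MyLayout>` mirrors the flexible reader with a compile-time `get<I>()` accessor; verify the exact signature in the headers.

```cpp
// ASSUMPTION: static rows expose get<I>() with a compile-time
// column index, mirroring the flexible get<T>(i). Verify against
// the actual bcsv headers before relying on this sketch.
#include <bcsv/bcsv.h>
#include <cstdint>
#include <iostream>
#include <string>

using MyLayout = bcsv::LayoutStatic<int32_t, std::string>;

int main() {
    bcsv::Reader<MyLayout> reader;
    reader.open("data.bcsv");
    while (reader.readNext()) {
        auto id   = reader.row().get<0>();   // int32_t
        auto name = reader.row().get<1>();   // std::string
        std::cout << id << ": " << name << "\n";
    }
}
```
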
| Operation | Speed | Details |
|---|---|---|
| Sequential write (static) | 4.7M rows/sec | 8-column dataset |
| Sequential read (static) | 7.5M rows/sec | 8-column dataset |
| Sequential write (flexible) | 3.6M rows/sec | 8-column dataset |
| Sequential read (flexible) | 2.3M rows/sec | 8-column dataset |
| 1000-channel stream | 127K rows/sec | Wide dataset (1000 columns) |
| CSV → BCSV | 130K rows/sec | Auto type optimization |
| BCSV → CSV | 220K rows/sec | Round-trip conversion |
| Compression ratio | 15-25% | Typical vs CSV |
| ZoH compression | 3-4% | Sparse/constant data |
Dataset: 105K rows × 33 columns (time-series data)
- CSV size: 105 MB
- BCSV size: 16.3 MB (84% reduction)
- Processing: 127K rows/second
- Type optimization: 13 FLOAT + 19 DOUBLE + 1 UINT32 (vs 33 STRING default)
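(For reference: 16.3 MB ÷ 105 MB ≈ 15.5% of the CSV size, i.e. the quoted 84% reduction, consistent with the 15-25% typical ratio above.)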
- LZ4: Embedded in repository (v1.10.0) or system version
- xxHash: Embedded in repository (v0.8.3)
- C++20 Standard Library: No external runtime dependencies
Optional (build/test only):
- Google Test: Auto-fetched by CMake for testing
- Python: For PyBCSV package
- pybind11: Auto-fetched for Python bindings
MIT License - Copyright (c) 2025 Tobias Weber
See LICENSE file for full details.
Contributions welcome! Please:
- Open an issue for bugs or feature requests
- Include benchmarks for performance-related changes
- Add tests for new features
- Update documentation for API changes
```
bcsv/
├── include/bcsv/        # Header-only library (copy this to integrate)
├── examples/            # Usage examples and CLI tools
├── tests/               # Comprehensive test suite
├── python/              # Python package (PyBCSV)
├── unity/               # C# Unity integration
├── cmake/               # Build system utilities
├── CMakeLists.txt       # Main build configuration
├── README.md            # This file
├── ARCHITECTURE.md      # Design and requirements
└── VERSIONING.md        # Version management
```
Ready to get started? Check out the examples/ directory or dive into ARCHITECTURE.md for technical details.