torchforge-data

Zero-copy, streaming data pipeline for edge-native machine learning in Rust.

Part of the torchforge-rs ecosystem.

The Destination

The long-term target of the torchforge-rs ecosystem is Federated Deep Reinforcement Learning (FDRL) at the edge: a fleet of constrained devices, each running a local DRL agent learning from its own physical environment, sharing only gradients — not raw data — with a coordinator. No cloud. No Python. No centralized data collection.

torchforge-data is the foundation layer for that target. The ReplayBuffer interface is designed with the federation boundary in mind from day one: what crosses the wire at v1.x are gradients, not experience tuples. That constraint shapes the data layer now.

This crate is v0.x infrastructure. FDRL is the v1.x target. The claim is not yet earned — the foundation has to be built first.

Why

Python's torch.utils.data.DataLoader has well-documented structural problems on constrained hardware:

Each worker process duplicates memory — consumption grows linearly with num_workers
Worker processes are torn down and recreated each epoch unless persistent_workers=True
Data pipeline stalls leave the GPU idle roughly 26% of the time in profiling studies (arXiv:2211.04908)
No path to deterministic latency — the GIL and Python runtime introduce unpredictable pauses

These are acceptable tradeoffs on a 512 GB cloud GPU node. They are not acceptable on an edge device with 512 MB of RAM running a real-time reinforcement learning policy loop.

torchforge-data is built for the edge-first case: constrained memory, deterministic latency, continuous operation, no Python runtime.

Design Principles

Zero-copy where provably achievable — memory-mapped I/O as the default path; copies are explicit and justified
No multiprocessing overhead — parallelism via rayon (CPU) and tokio (async I/O, v0.5+), not forked processes
Streaming-first — datasets never need to fit in RAM
RL-native — replay buffers are first-class citizens, not afterthoughts
Federation-aware by design — the ReplayBuffer interface is designed with the FDRL boundary in mind: gradients are shared across devices, not raw experience tuples. No federation code ships here; the interface commitment is that none will be needed at v1.x.
Honest about unknowns — where optimal design is an open research question, we say so
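
To make the zero-copy and streaming-first principles concrete, here is a minimal sketch that uses only the standard library. It assumes fixed-width records of little-endian `f32` values; `RecordView` and `decode_f32` are hypothetical names for illustration, not part of the planned API. A plain `&[u8]` stands in for the memory-mapped region (a mapping from a crate such as `memmap2` dereferences to the same slice type).

```rust
/// Borrowed view over fixed-width records of little-endian `f32` values.
/// `RecordView` is a hypothetical name, not part of the torchforge-data API.
pub struct RecordView<'a> {
    bytes: &'a [u8],
    stride: usize, // bytes per record
}

impl<'a> RecordView<'a> {
    /// `floats_per_record` is the number of `f32` values in one record.
    /// Returns `None` if the buffer is not a whole number of records.
    pub fn new(bytes: &'a [u8], floats_per_record: usize) -> Option<Self> {
        let stride = floats_per_record.checked_mul(4)?;
        if stride == 0 || bytes.len() % stride != 0 {
            return None;
        }
        Some(Self { bytes, stride })
    }

    /// Number of complete records in the buffer.
    pub fn len(&self) -> usize {
        self.bytes.len() / self.stride
    }

    /// Borrow record `i` without copying; `None` if out of range.
    pub fn record(&self, i: usize) -> Option<&'a [u8]> {
        let bytes: &'a [u8] = self.bytes;
        let start = i.checked_mul(self.stride)?;
        let end = start.checked_add(self.stride)?;
        bytes.get(start..end)
    }

    /// Iterate over batches of up to `batch_size` records. Each batch is a
    /// sub-slice of the original buffer, so nothing is copied.
    pub fn batches(&self, batch_size: usize) -> std::slice::Chunks<'a, u8> {
        let bytes: &'a [u8] = self.bytes;
        bytes.chunks(batch_size.max(1).saturating_mul(self.stride))
    }
}

/// The explicit copy: decode one record into owned `f32` values.
pub fn decode_f32(record: &[u8]) -> Vec<f32> {
    record
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

The only allocation happens in `decode_f32`, which keeps the copy visible at the call site. Whether the real crate decodes at all, or reinterprets aligned bytes in place, is a design question this sketch does not settle.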

Status

v0.0.1 — Pre-alpha. No stable API. Active design phase.

The repository structure, CI, governance documents, and OSS foundation are complete. Implementation of core abstractions begins at v0.1.0.

See ARCHITECTURE.md for current design decisions and open questions. See TODO.md for the full implementation roadmap.

Roadmap

| Version | Goal |
| --- | --- |
| v0.1.0 | `Dataset`, `DataLoader`, `ReplayBuffer` (AoS baseline), `MmapDataset`: enough for a DQN training loop |
| v0.2.0 | `rayon` parallelism, SoA vs AoS benchmark, allocator comparison, prefetch buffer |
| v0.3.0 | `PrioritizedReplayBuffer`, N-step returns, episode boundary tracking |
| v0.4.0 | File format decision, Python interop reader/writer |
| v0.5.0 | Async `DataLoader` via `tokio` |
| v1.0.0 | Stable API, published edge benchmark, real RL training loop end-to-end |
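
As an illustration of why the v0.3.0 items belong together, the sketch below computes a discounted N-step return that stops at an episode boundary. It is a standalone example under stated assumptions: `Step` and `n_step_return` are hypothetical names, and the crate's eventual representation of episode boundaries may differ.

```rust
/// One stored environment step. Field names are illustrative only.
#[derive(Clone, Copy, Debug)]
pub struct Step {
    pub reward: f32,
    /// True if this step ended its episode.
    pub done: bool,
}

/// Discounted return over at most `n` steps beginning at index `start`.
///
/// Accumulation stops early at an episode boundary so that rewards from the
/// next episode never leak into the return. The result is
/// `(partial_return, steps_used, terminated)`.
pub fn n_step_return(steps: &[Step], start: usize, n: usize, gamma: f32) -> (f32, usize, bool) {
    let mut g = 0.0f32;
    let mut discount = 1.0f32;
    let mut used = 0usize;
    for step in steps.iter().skip(start).take(n) {
        g += discount * step.reward;
        discount *= gamma;
        used += 1;
        if step.done {
            return (g, used, true);
        }
    }
    (g, used, false)
}
```

When `terminated` is false, the caller would typically add a bootstrap term, `gamma.powi(steps_used as i32)` times a value estimate for the state reached after `steps_used` steps. That estimate comes from the learner, not from the data layer, so it is left out here.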

Planned API

⚠️ Illustrative only — will change before v0.1.0 is released.

use torchforge_data::{Dataset, DataLoader, LoaderConfig, MmapDataset, ReplayBuffer};

// Streaming dataset from memory-mapped file
let dataset = MmapDataset::open("observations.bin")?;

let loader = DataLoader::new(dataset, LoaderConfig {
    batch_size: 32,
    shuffle: true,
    prefetch: 2,
    ..Default::default()
});

for batch in &loader {
    let batch = batch?;
    // batch is a zero-copy view into the memory-mapped region
}

// Replay buffer for online RL
// (`Obs`, `Action`, and `transition` stand in for application-defined types and values)
let mut buffer: ReplayBuffer<Obs, Action, f32> = ReplayBuffer::new(50_000);
buffer.push(transition)?;
let batch = buffer.sample(32)?;
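
Since the crate has no implementation yet, the following is a rough, standard-library-only sketch of the behaviour a fixed-capacity replay buffer usually has: the oldest transition is overwritten once the buffer is full, and batches are sampled uniformly with replacement. `RingReplay` is a hypothetical name, and its signatures deliberately differ from the illustrative `ReplayBuffer` above. The small xorshift generator is only there to keep the example dependency-free.

```rust
/// Fixed-capacity ring buffer with uniform sampling.
/// Hypothetical sketch; not the planned `ReplayBuffer` API.
pub struct RingReplay<T> {
    items: Vec<T>,
    capacity: usize,
    next: usize,    // slot the next push writes to
    rng_state: u64, // xorshift64 state, never zero
}

impl<T> RingReplay<T> {
    pub fn new(capacity: usize, seed: u64) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            items: Vec::with_capacity(capacity),
            capacity,
            next: 0,
            rng_state: seed | 1, // force a non-zero state
        }
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Store a transition, overwriting the oldest one when full.
    pub fn push(&mut self, item: T) {
        if self.items.len() < self.capacity {
            self.items.push(item);
        } else {
            self.items[self.next] = item;
        }
        self.next = (self.next + 1) % self.capacity;
    }

    /// Minimal xorshift64 step; adequate for a demo, not for research use.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Sample `batch_size` transitions uniformly with replacement.
    /// Returns borrowed items, or `None` if the buffer is empty.
    pub fn sample(&mut self, batch_size: usize) -> Option<Vec<&T>> {
        let len = self.items.len();
        if len == 0 {
            return None;
        }
        // Draw all indices first so the mutable RNG borrow ends before
        // references into `items` are handed out.
        let indices: Vec<usize> = (0..batch_size)
            .map(|_| (self.next_u64() % len as u64) as usize)
            .collect();
        let items = &self.items;
        Some(indices.into_iter().map(|i| &items[i]).collect())
    }
}
```

Memory is allocated once in `new` and never grows, which matches the constrained-memory goal. Returning references rather than clones keeps sampling copy-free, at the cost of holding a borrow on the buffer while the batch is in use.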

Contributing

See CONTRIBUTING.md for the full guide — prerequisites, branching model, PR process, and what "ready to merge" means for this project.

The most valuable contributions right now are:

Identifying incorrect assumptions in ARCHITECTURE.md
Benchmarks on real edge hardware (Raspberry Pi, Jetson, RISC-V boards)
Prior art we may have missed

Open an issue before submitting a PR — the design is not yet stable enough for unsolicited implementation PRs.

Please read our Code of Conduct before participating. To report a security issue, see SECURITY.md.

License

Apache-2.0. See LICENSE.

Part of the torchforge-rs ecosystem — also see torchforge-viz and torchforge-bench.
