Event-sourced actor system combining the C++ Actor Framework (CAF) for in-process actors, Aeron IPC for sub-microsecond inter-process messaging, and Apache Kudu for low-latency columnar persistence.
Distributed manufacturing telemetry is the design target: its latency budget, audit-trail requirements, and continuous-operation needs shaped every piece of the architecture. The primitives generalize well beyond it, though. Anywhere you need durable event sourcing for many concurrent actors over ultra-low-latency IPC, Reach is the right shape. Observability pipelines, real-time analytics, simulation harnesses, and high-rate sensor networks all fit comfortably.
Reach is part of the weathership.org portfolio of open-source infrastructure projects. It started life as a research example inside our asf-kudu fork; as the CAF/Aeron layer matured beyond what made sense as an "example" inside Kudu's own tree, we promoted it into its own project with Kudu as a submodule dependency.
Reach is a multi-process system. CAF actors model equipment state and orchestrate event flow in one process; a Kudu service handles durable columnar persistence in another; Aeron's shared-memory transport stitches them together with a sub-microsecond round trip.
The split is deliberate — each subsystem keeps its full threading model and fault domain to itself:
- CAF runs an uncontested actor scheduler. Clean, predictable dispatch for tens of thousands of equipment actors with no exogenous I/O stalling the message loop.
- Kudu runs its KRPC threadpool, tablet-server cooperation, and client-side scheduling in their native environment, without contending for resources with anything else.
- Aeron is the bus, and we like it a lot. A decade of production hardening in low-latency finance and aerospace gives us shared-memory IPC with ≈0.25μs RTT, bounded jitter, zero-copy on the hot path, and a mechanical-sympathy design ethos that lines up with manufacturing-floor latency budgets far better than anything TCP-based.
The result:
- A crash in any subsystem stays in that subsystem.
- The cross-process contract is explicit — every message is an Aeron payload you can capture, replay, and reason about.
- Actor compute and persistence I/O scale independently.
- Recovery is parallel: 100 equipment entities replay in ≈23 ms (≈4,347 entities/sec).
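To make the "explicit cross-process contract" point concrete, here is a minimal sketch of a byte-level event envelope that could be captured and replayed. Everything in it is an assumption for illustration: the `EventEnvelope` type, its field names, and the little-endian layout are hypothetical and are not Reach's actual wire format, and no Aeron API is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical envelope; field names and layout are illustrative only.
struct EventEnvelope {
    std::uint64_t entity_id;  // which equipment entity the event belongs to
    std::uint64_t sequence;   // per-entity sequence number
    std::string   payload;    // opaque event body
};

// Append a 64-bit value in little-endian byte order.
inline void put_u64(std::vector<std::uint8_t>& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Read a 64-bit little-endian value from 8 bytes at p.
inline std::uint64_t get_u64(const std::uint8_t* p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return v;
}

// Layout: entity_id (8) | sequence (8) | payload length (8) | payload bytes.
inline std::vector<std::uint8_t> encode(const EventEnvelope& e) {
    std::vector<std::uint8_t> out;
    out.reserve(24 + e.payload.size());
    put_u64(out, e.entity_id);
    put_u64(out, e.sequence);
    put_u64(out, e.payload.size());
    out.insert(out.end(), e.payload.begin(), e.payload.end());
    return out;
}

// Returns std::nullopt when the buffer is truncated or the length field
// does not match the number of payload bytes actually present.
inline std::optional<EventEnvelope> decode(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 24) return std::nullopt;
    const std::uint64_t len = get_u64(buf.data() + 16);
    if (len != buf.size() - 24) return std::nullopt;
    EventEnvelope e;
    e.entity_id = get_u64(buf.data());
    e.sequence  = get_u64(buf.data() + 8);
    e.payload.assign(reinterpret_cast<const char*>(buf.data() + 24),
                     static_cast<std::size_t>(len));
    return e;
}
```

Because the encoded form is a plain byte vector, a captured message can be written to disk and fed back through `decode` later, which is the property the bullet above relies on.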
The examples/phases/ directory preserves the exploratory work that
led to this shape. It is documentation of the journey, not current API
surface.
reach/
├── components/kudu/ # asf-kudu submodule (rch fork — CAF/Aeron-friendly fixes)
├── src/ # libreach: event sourcing + Aeron bridge + chaos metrics
├── apps/ # production binaries (kudu_service_simple, caf_aeron_*, …)
├── examples/
│ ├── caf-stdlib/ # upstream CAF examples (smoke tests for CAF alone)
│ ├── aeron-stdlib/ # plain Aeron pub/sub primitives
│ └── phases/ # historical phase-progression record (phase3 crashes by design)
├── tests/ # C++ unit/integration tests
├── features/ # behave BDD scenarios — operator/workflow level
├── config/ # HOCON config (single source of truth)
├── docs/current/ # mdbook documentation (architecture, evaluation, roadmap)
└── docs/scratch/ # daily work notes (docs/scratch/<iso-date>/)
Reach uses devenv (Nix-based) and
direnv for reproducible builds. The submodule
pins our fork (rch/asf-kudu) on branch rch/devenv, which carries
CAF/Aeron-related stability fixes not yet landed upstream.
git clone --recursive git@github.com:weathership/oss-reach.git reach
cd reach
direnv allow # picks up devenv shell
# One-time: build Kudu's thirdparty + client libs inside the submodule.
just build-kudu # ~30–60 min once thirdparty is cached
# Build reach.
just configure && just build
# Smoke test (requires aeron-driver running; `devenv up` or `aeronmd &`).
./build/tests/test_event_sourcing

Full task list:
just configure # cmake -B build -S .
just build # cmake --build build
just test # ctest
just behave # BDD scenarios (tier-0 + tier-1)
just docs-serve # mdbook live preview
just sync-kudu # update submodule to its tracked branch's tip

- Process isolation is non-negotiable. Anything that drags Kudu client code into a CAF actor process is a regression.
- Event sourcing is deterministic. `Apply(event)` must be a pure function of current state.
- Recovery is parallel. `recovery_coordinator` fans out per-entity replay over Aeron; sequential replay is a perf regression.
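The determinism and parallel-recovery rules above can be sketched in plain C++. This is a simplified illustration under stated assumptions, not Reach's implementation: `Event`, `EquipmentState`, `apply_event`, `replay`, and `recover_all` are hypothetical names, and `std::async` stands in for the Aeron fan-out that `recovery_coordinator` performs.

```cpp
#include <cstdint>
#include <functional>
#include <future>
#include <map>
#include <utility>
#include <vector>

// Hypothetical event and state types; the real ones will differ.
struct Event {
    std::uint64_t sequence;
    std::int64_t  delta;  // e.g. a change in a unit counter
};

struct EquipmentState {
    std::uint64_t last_sequence = 0;
    std::int64_t  units = 0;
};

// Pure: the result depends only on the arguments.
// No clocks, no I/O, no globals, so replay always yields the same state.
inline EquipmentState apply_event(EquipmentState s, const Event& e) {
    s.units += e.delta;
    s.last_sequence = e.sequence;
    return s;
}

// Rebuild one entity's state by folding apply_event over its log in order.
inline EquipmentState replay(const std::vector<Event>& log) {
    EquipmentState s;
    for (const Event& e : log) {
        s = apply_event(s, e);
    }
    return s;
}

using EntityId = std::uint64_t;

// Fan out one task per entity. Entities share no state, so their replays
// are independent and can run concurrently. std::async is an assumption
// standing in for the cross-process fan-out.
inline std::map<EntityId, EquipmentState>
recover_all(const std::map<EntityId, std::vector<Event>>& logs) {
    std::vector<std::pair<EntityId, std::future<EquipmentState>>> pending;
    pending.reserve(logs.size());
    for (const auto& kv : logs) {
        pending.emplace_back(
            kv.first,
            std::async(std::launch::async, replay, std::cref(kv.second)));
    }
    std::map<EntityId, EquipmentState> recovered;
    for (auto& p : pending) {
        recovered.emplace(p.first, p.second.get());
    }
    return recovered;
}
```

Because `apply_event` is pure, calling `recover_all` twice on the same logs produces identical states regardless of how the tasks are scheduled. That is why replay can be parallelized per entity without coordination.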
See CLAUDE.md for the full set of contributor norms and
gotchas.
The full architecture / evaluation / roadmap lives in the mdbook at
docs/current/. Build with just docs-build or serve with
just docs-serve.
Notable chapters:
- architecture/process-isolation.md: why CAF + Kudu in one process crashes, and how Aeron IPC sidesteps it
- architecture/event-sourcing.md: CQRS model, snapshots, and replay
- evaluation/results.md: recovery throughput, telemetry rates, process dependency graph
- roadmap/: CEP via RxCpp, attribute-based encryption, ML integration, FHE for privacy-preserving process analytics, hierarchical-storage FDW
Early — the architecture has been validated end-to-end (see
docs/current/src/architecture/process-isolation.md for the
phase-by-phase record), and the build is green on Linux against
nixpkgs-rolling. The BDD harness, CI, and CAF-mail-API migration are
on the near-term punch-list.
Apache 2.0 — see LICENSE.