reach

License: Apache 2.0

Event-sourced actor system combining the C++ Actor Framework (CAF) for in-process actors, Aeron IPC for sub-microsecond inter-process messaging, and Apache Kudu for low-latency columnar persistence.

Distributed manufacturing telemetry is the design target: its latency budget, audit-trail requirements, and continuous-operation needs shaped every piece of the architecture. The primitives generalize well beyond that, though. Reach fits any workload that needs durable event sourcing for many concurrent actors over ultra-low-latency IPC, such as observability pipelines, real-time analytics, simulation harnesses, and high-rate sensor networks.

Reach is part of the weathership.org portfolio of open-source infrastructure projects. It started life as a research example inside our asf-kudu fork; as the CAF/Aeron layer matured beyond what made sense as an "example" inside Kudu's own tree, we promoted it into its own project with Kudu as a submodule dependency.

Architecture

Reach is a multi-process system. CAF actors model equipment state and orchestrate event flow in one process; a Kudu service handles durable columnar persistence in another; Aeron's shared-memory transport stitches them together with a sub-microsecond round trip.

The split is deliberate — each subsystem keeps its full threading model and fault domain to itself:

  • CAF runs an uncontested actor scheduler. Clean, predictable dispatch for tens of thousands of equipment actors with no exogenous I/O stalling the message loop.
  • Kudu runs its KRPC threadpool, tablet-server cooperation, and client-side scheduling in their native environment, without contending for resources with anything else.
  • Aeron is the bus, and we like it a lot. A decade of production hardening in low-latency finance and aerospace gives us shared-memory IPC with ≈0.25μs RTT, bounded jitter, zero-copy on the hot path, and a mechanical-sympathy design ethos that lines up with manufacturing-floor latency budgets far better than anything TCP-based.

The result:

  • A crash in any subsystem stays in that subsystem.
  • The cross-process contract is explicit — every message is an Aeron payload you can capture, replay, and reason about.
  • Actor compute and persistence I/O scale independently.
  • Recovery is parallel: 100 equipment entities replay in ≈23 ms (≈4,347 entities/sec).
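
To make the "explicit cross-process contract" point concrete, here is a minimal sketch of a self-describing event frame that can be written to a capture file and decoded again later. The EventEnvelope struct, its field names, and the little-endian layout are assumptions made for illustration; this README does not document reach's actual payload format, and no Aeron API is used here.

```cpp
// Hypothetical wire format for illustration only; not reach's real layout.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct EventEnvelope {
    std::uint64_t entity_id;   // which equipment actor the event belongs to
    std::uint64_t sequence;    // per-entity monotonic sequence number
    std::uint16_t event_type;  // application-defined discriminator
    std::string   body;        // opaque event payload
};

// Append an unsigned integer in little-endian order, independent of host endianness.
template <typename T>
void put_le(std::vector<std::uint8_t>& out, T value) {
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Read an unsigned integer written by put_le; returns false if the frame is too short.
template <typename T>
bool get_le(const std::vector<std::uint8_t>& in, std::size_t& pos, T& value) {
    if (in.size() - pos < sizeof(T)) return false;
    value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>(value | (static_cast<T>(in[pos + i]) << (8 * i)));
    pos += sizeof(T);
    return true;
}

// Header (8 + 8 + 2 + 4 bytes) followed by the body bytes.
std::vector<std::uint8_t> encode(const EventEnvelope& e) {
    std::vector<std::uint8_t> out;
    put_le(out, e.entity_id);
    put_le(out, e.sequence);
    put_le(out, e.event_type);
    put_le(out, static_cast<std::uint32_t>(e.body.size()));
    out.insert(out.end(), e.body.begin(), e.body.end());
    return out;
}

// Returns std::nullopt for truncated or over-long frames.
std::optional<EventEnvelope> decode(const std::vector<std::uint8_t>& in) {
    EventEnvelope e{};
    std::size_t pos = 0;
    std::uint32_t len = 0;
    if (!get_le(in, pos, e.entity_id) || !get_le(in, pos, e.sequence) ||
        !get_le(in, pos, e.event_type) || !get_le(in, pos, len))
        return std::nullopt;
    if (in.size() - pos != len) return std::nullopt;
    e.body.assign(in.begin() + static_cast<std::ptrdiff_t>(pos), in.end());
    return e;
}

// Usage: a captured frame decodes back to the event that produced it.
inline void round_trip_demo() {
    EventEnvelope sent{42, 7, 3, "spindle_rpm=1200"};
    std::vector<std::uint8_t> frame = encode(sent);
    std::optional<EventEnvelope> got = decode(frame);
    assert(got && got->entity_id == 42 && got->body == sent.body);
}
```

Because every field has a fixed position and byte order, a frame recorded from the bus can be replayed against a fresh process without access to the sender's memory layout.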

The examples/phases/ directory preserves the exploratory work that led to this shape. It is documentation of the journey, not current API surface.

Repository layout

reach/
├── components/kudu/   # asf-kudu submodule (rch fork — CAF/Aeron-friendly fixes)
├── src/               # libreach: event sourcing + Aeron bridge + chaos metrics
├── apps/              # production binaries (kudu_service_simple, caf_aeron_*, …)
├── examples/
│   ├── caf-stdlib/    # upstream CAF examples (smoke tests for CAF alone)
│   ├── aeron-stdlib/  # plain Aeron pub/sub primitives
│   └── phases/        # historical phase-progression record (phase3 crashes by design)
├── tests/             # C++ unit/integration tests
├── features/          # behave BDD scenarios — operator/workflow level
├── config/            # HOCON config (single source of truth)
├── docs/current/      # mdbook documentation (architecture, evaluation, roadmap)
└── docs/scratch/      # daily work notes (docs/scratch/<iso-date>/)

Quick start

Reach uses devenv (Nix-based) and direnv for reproducible builds. The submodule pins our fork (rch/asf-kudu) on branch rch/devenv, which carries CAF/Aeron-related stability fixes not yet landed upstream.

git clone --recursive git@github.com:weathership/oss-reach.git reach
cd reach
direnv allow                                    # picks up devenv shell

# One-time: build Kudu's thirdparty + client libs inside the submodule.
just build-kudu                                 # ~30–60 min once thirdparty is cached

# Build reach.
just configure && just build

# Smoke test (requires aeron-driver running; `devenv up` or `aeronmd &`).
./build/tests/test_event_sourcing

Full task list:

just configure            # cmake -B build -S .
just build                # cmake --build build
just test                 # ctest
just behave               # BDD scenarios (tier-0 + tier-1)
just docs-serve           # mdbook live preview
just sync-kudu            # update submodule to its tracked branch's tip

Architecture invariants

  • Process isolation is non-negotiable. Anything that drags Kudu client code into a CAF actor process is a regression.
  • Event sourcing is deterministic. Apply(event) must be a pure function of the current state and the event: no clocks, randomness, or I/O.
  • Recovery is parallel. recovery_coordinator fans out per-entity replay over Aeron — sequential replay is a perf regression.
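
The last two invariants reinforce each other: if applying an event is pure and entities share no state, per-entity replay can run in any order or concurrently and still produce the same result. The sketch below illustrates that property under stated assumptions. The Event and State types, apply_event, and both recover functions are hypothetical names, and std::async stands in for the fan-out that recovery_coordinator performs over Aeron; none of this is reach's actual API.

```cpp
// Illustrative only: hypothetical types; std::async stands in for the Aeron fan-out.
#include <cassert>
#include <cstdint>
#include <future>
#include <map>
#include <utility>
#include <vector>

struct Event {
    std::uint64_t entity_id;
    std::int64_t  delta;  // e.g. change in a part counter
};

struct State {
    std::int64_t  counter = 0;
    std::uint64_t applied = 0;  // number of events folded in so far
};

bool operator==(const State& a, const State& b) {
    return a.counter == b.counter && a.applied == b.applied;
}

// Pure: the result depends only on (state, event); no clocks, I/O, or globals.
State apply_event(State s, const Event& e) {
    s.counter += e.delta;
    s.applied += 1;
    return s;
}

// Replay one entity's ordered log starting from the empty state.
State replay(const std::vector<Event>& log) {
    State s;
    for (const Event& e : log) s = apply_event(s, e);
    return s;
}

using Logs = std::map<std::uint64_t, std::vector<Event>>;
using Recovered = std::map<std::uint64_t, State>;

// Sequential baseline: one entity after another.
Recovered recover_sequential(const Logs& logs) {
    Recovered out;
    for (const auto& kv : logs) out[kv.first] = replay(kv.second);
    return out;
}

// Parallel: one task per entity. Safe because each task reads only its own log
// and apply_event has no side effects.
Recovered recover_parallel(const Logs& logs) {
    std::vector<std::pair<std::uint64_t, std::future<State>>> tasks;
    for (const auto& kv : logs)
        tasks.emplace_back(kv.first, std::async(std::launch::async,
                                                [&kv] { return replay(kv.second); }));
    Recovered out;
    for (auto& task : tasks) out[task.first] = task.second.get();
    return out;
}

// Usage: both strategies must agree for any set of logs.
inline void recovery_demo() {
    Logs logs;
    for (std::uint64_t id = 0; id < 8; ++id)
        for (std::int64_t k = 1; k <= 100; ++k)
            logs[id].push_back(Event{id, k});
    assert(recover_parallel(logs) == recover_sequential(logs));
}
```

If apply_event read a wall clock or a shared counter, the equality check in recovery_demo could fail intermittently, which is the kind of regression the determinism invariant is meant to rule out.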

See CLAUDE.md for the full set of contributor norms and gotchas.

Documentation

The full architecture, evaluation, and roadmap documentation lives in the mdbook at docs/current/. Build it with just docs-build or serve it with just docs-serve.

Notable chapters:

  • architecture/process-isolation.md — why CAF + Kudu in one process crashes, and how Aeron IPC sidesteps it
  • architecture/event-sourcing.md — CQRS model, snapshots, and replay
  • evaluation/results.md — recovery throughput, telemetry rates, process dependency graph
  • roadmap/ — CEP via RxCpp, attribute-based encryption, ML integration, FHE for privacy-preserving process analytics, hierarchical-storage FDW

Status

Early — the architecture has been validated end-to-end (see docs/current/src/architecture/process-isolation.md for the phase-by-phase record), and the build is green on Linux against nixpkgs-rolling. The BDD harness, CI, and CAF-mail-API migration are on the near-term punch-list.

License

Apache 2.0 — see LICENSE.
