
ARIA



In every factory there's one person who knows when a machine is about to fail — they hear it. When they retire, that knowledge disappears forever. ARIA captures it, watches the equipment, and diagnoses what goes wrong — so the one who knows is never the last.

Winner — Best Use of Claude Managed Agents · Anthropic x Cerebral Valley "Build With Opus 4.7" Hackathon


The problem

Setting up industrial maintenance software costs €50k–€500k and takes specialists six months. So most plants don't bother — they wait for machines to break.

But that knowledge already exists. The operator hears the change in vibration two days before the bearing fails. They wrote it in the shift log. They told the next operator at handover. It just never makes it into a machine.

ARIA is the bridge. Drop in the manufacturer's PDF, answer some questions, and you're live in ten minutes. After that, ARIA continuously ingests everything that already exists: live signal trends, operator logbook entries, shift notes, machine failure history, and computed KPIs (OEE, MTBF, MTTR) — building a knowledge base that grows with every incident. When something goes wrong, five agents pass the problem like a real maintenance team passes a ticket: detection -> diagnosis -> work order -> memory.


What ARIA does

flowchart LR
    PDF["PDF manual"] --> KBBuilder["KB Builder"]
    Operator["Operator calibration"] --> KBBuilder
    KBBuilder --> DB[("KB + failure history")]

    Signals["Live signals"] --> Sentinel["Sentinel"]
    DB --> Sentinel
    Sentinel -- "drift warning" --> UI["Operator UI"]
    Sentinel -- "anomaly" --> Investigator["Investigator"]

    DB <--> Investigator
    Investigator -- "RCA" --> WOG["Work Order Generator"]
    WOG --> UI

    QA["Q&A"] -- "ask_investigator" --> Investigator
    QA --> UI
  • KB Builder — reads the manufacturer PDF with Opus vision and captures the operator's floor knowledge through a short series of calibration questions — the KB is live before they leave the terminal.
  • Sentinel — watches live signals against the KB; detects breaches, runs forecasts on signal tails, and judges whether a drift warning is worth surfacing — or whether context says stand down.
  • Investigator — diagnoses anomalies with Opus 4.7 extended thinking; writes and runs Python in a sandboxed container to compute exact degradation rates; recalls and builds on past failures.
  • Work Order Generator — turns the RCA into actions, parts list, and intervention window.
  • Q&A — natural-language operator chat; hands off to the Investigator when a deep diagnosis is needed.
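The Sentinel's drift judgment can be sketched as a trend check on the recent signal tail: fit a slope, and warn only while the signal is still below its KB limit but creeping toward it. This is a hedged illustration of the idea, not Sentinel's actual forecasting code — the function name, units, and thresholds are invented for the example.

```python
import statistics

def drift_warning(tail: list[float], limit: float, min_slope: float) -> bool:
    """Hypothetical Sentinel-style check: warn when the recent tail is
    still below the KB limit but trending upward faster than min_slope
    (units per sample)."""
    xs = list(range(len(tail)))
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(tail)
    # Least-squares slope of the tail.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, tail)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return tail[-1] < limit and slope > min_slope

# A vibration tail creeping upward at 0.05 units per sample:
tail = [2.0 + 0.05 * i for i in range(30)]
print(drift_warning(tail, limit=5.0, min_slope=0.01))  # True
```

A breach (the signal already past its limit) is handled separately as an anomaly, which is why the check stands down once `tail[-1]` exceeds the limit.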

Quickstart

Three commands stand the entire stack up:

cp .env.example .env        # fill in ANTHROPIC_API_KEY and JWT_SECRET_KEY
make install                # one-time: backend venv + frontend node_modules (for IDE intellisense)
make up                     # docker compose: db + migrations + 4 simulators + backend + frontend

Open the operator UI:

| Service | URL | Notes |
| --- | --- | --- |
| Frontend | http://localhost:5173 | hot module reload |
| Backend API | http://localhost:8000 | OpenAPI docs at /docs |
| Database | localhost:5432 (aria / aria_dev_password / aria) | TimescaleDB |

Default seeded users — pick one to log in:

| Username | Password | Role |
| --- | --- | --- |
| admin | admin123 | admin |
| operator | operator123 | operator |
| viewer | viewer123 | viewer |

Source is bind-mounted from the host into both backend and frontend containers — every edit reloads instantly without a rebuild.

Note

The first make up builds the simulator, backend, and frontend images. Allow two to three minutes on a cold cache. Subsequent starts are seconds.


How it runs

flowchart LR
    Sims["Simulators<br/>(one container per cell)"]
    DB[("TimescaleDB<br/>operational hypertables<br/>+ agent state")]
    Backend["Backend<br/>FastAPI + 5 agents + MCP server"]
    Front["Frontend<br/>operator UI"]

    Sims --> DB
    Backend <--> DB
    Front <-->|"WebSocket + REST"| Backend

The simulators write machine_status, production_event, and process_signal_data rows into TimescaleDB at 1 Hz. The backend reads from the same database through a 17-tool MCP surface (the agents' only path to data) and pushes events to the frontend over a WebSocket bus.
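The shape of that tool surface — many read tools, a single write path — can be sketched with a plain registry. This is an illustration of the read/write split only; the decorator, registry, and tool names below are invented for the example, not ARIA's FastMCP code.

```python
# Hypothetical sketch: agents reach data only through registered tools,
# and all but one are read-only.
from typing import Callable

TOOLS: dict[str, tuple[Callable, bool]] = {}  # name -> (fn, writes)

def tool(name: str, writes: bool = False):
    def register(fn: Callable):
        TOOLS[name] = (fn, writes)
        return fn
    return register

@tool("get_signal_tail")
def get_signal_tail(cell: str, minutes: int) -> list[float]:
    return []  # would query the process_signal_data hypertable

@tool("create_work_order", writes=True)
def create_work_order(cell: str, actions: list[str]) -> str:
    return "WO-0001"  # the single write path

read_only = [name for name, (_, writes) in TOOLS.items() if not writes]
print(len(TOOLS), len(read_only))  # 2 1
```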

Two simulator modes ship out of the box:

  • SIMULATOR_MODE=realtime (default). Cells idle near nominal; drift is barely visible over a thirty-minute rehearsal. Demo endpoints inject scenario spikes on cue.
  • SIMULATOR_MODE=demo. Compresses a seventy-two-hour failure scenario into roughly four minutes wall-clock. The Bottle Filler's bearing-wear vibration crosses the alert threshold on its own — useful for a hands-off rehearsal loop.
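The demo-mode compression works out to roughly a 1080x clock: seventy-two scenario hours replayed in about four wall-clock minutes. A minimal sketch of that mapping, assuming a simple linear scaling (the function name is illustrative):

```python
# Demo mode: 72 scenario hours in ~4 wall-clock minutes.
COMPRESSION = (72 * 3600) / (4 * 60)  # 1080.0

def scenario_seconds(wall_seconds: float) -> float:
    """Map elapsed wall-clock time to elapsed scenario time."""
    return wall_seconds * COMPRESSION

# After 60 real seconds the scenario is 18 hours in:
print(scenario_seconds(60) / 3600)  # 18.0
```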

The full simulator engine and per-cell scenario walkthrough live in docs/architecture/08-simulators.md.


Stack

Backend. FastAPI on Python 3.12, asyncpg, Pydantic, FastMCP. black + flake8 + pyright enforce the contract.

Database. TimescaleDB (PostgreSQL with hypertables) for the operational time series. JSONB columns on the agent-facing tables, validated at every write through Pydantic mirrors.
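The "Pydantic mirror" pattern — validate the payload in application code before it ever reaches a JSONB column — can be sketched as follows. To keep the example self-contained it uses a plain dataclass where ARIA uses Pydantic models, and the field names are invented:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RcaPayload:
    """Hypothetical mirror of a JSONB column holding an RCA result."""
    root_cause: str
    confidence: float
    evidence: list[str]

    def __post_init__(self):
        # Reject invalid payloads before they are written.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

payload = RcaPayload("bearing wear", 0.92, ["vibration trend", "past failure"])
jsonb_value = json.dumps(asdict(payload))  # what gets written to the column
print(jsonb_value)
```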

Agents. Anthropic Python SDK against Claude Opus 4.7 (reasoning + PDF vision) and Sonnet (routine tool work). The Investigator runs on Claude Managed Agents with hosted MCP and a sandboxed Python container; everything else runs on the Messages API.

Frontend. React + TypeScript + Vite, Tailwind, TanStack Query. Biome for lint and format.

Simulators. Python with asyncpg. One container per monitored cell. Markov state machine plus composable signal behaviors — drift, noise, derived signals, fault triggers — driven entirely by per-scenario configuration.
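The composable-behavior idea can be sketched as small functions of `(t, value)` stacked per scenario config. This is a hedged illustration of the composition pattern, not the simulator engine itself; the behavior names and rates are invented:

```python
import random

def drift(rate: float):
    # Linear drift: value grows with scenario time t.
    return lambda t, v: v + rate * t

def noise(sigma: float, rng: random.Random):
    # Gaussian measurement noise.
    return lambda t, v: v + rng.gauss(0.0, sigma)

def compose(base: float, behaviors):
    """Stack behaviors over a nominal base value, as per-scenario config would."""
    def sample(t: float) -> float:
        v = base
        for behavior in behaviors:
            v = behavior(t, v)
        return v
    return sample

rng = random.Random(42)
vibration = compose(2.0, [drift(0.001), noise(0.05, rng)])
print(round(vibration(600.0), 2))  # nominal 2.0 + 0.6 drift + noise
```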

Infrastructure. Docker Compose, GitHub Actions CI, optional Cloudflare tunnel for hosted-MCP exposure.


Repository layout

ARIA/
├── backend/                FastAPI + agents + MCP server
│   ├── agents/             KB Builder, Sentinel, Forecast, Investigator, WO Generator, Q&A
│   ├── aria_mcp/           FastMCP server — 16 read tools + 1 write tool
│   ├── modules/            Bounded contexts (kb, work_order, signal, kpi, logbook, chat, sandbox, ...)
│   ├── core/               Cross-cutting: ws_manager, thresholds, security, database
│   ├── infrastructure/     SQL migrations + idempotent seeds
│   └── tests/              Unit + integration + e2e smoke
├── frontend/               React app — operator UI
├── simulator/              Standalone Python package — one image, four scenarios
├── docs/                   Architecture, audits, planning, demo, PRD
├── docker-compose.yaml
├── Makefile
└── README.md

Documentation

Full documentation index →

Getting started

Understanding ARIA

Architecture and reference


Development

make help prints the full list. The handful you will actually reach for:

make up           # bring the full stack up (hot-reload on backend + frontend)
make deploy       # bring the full stack up with cloudflare tunnel for managed agents
make ps           # service status
make logs         # tail logs from every container
make down         # stop everything (including the tunnel if it is running)

make check        # all quality gates (CI parity): black + flake8 + pyright + biome + tsc
make format       # auto-format backend (black) + frontend (biome)
make backend.test # pytest unit suite
make e2e          # end-to-end backend smoke (requires the stack up)
make doctor       # detect dependency drift between manifests and running containers

make db.shell     # psql into the database
make db.reset     # drop volume and re-run migrations + seeds (destroys data — confirms first)
make db.seed      # re-apply the demo seeds idempotently

Hosting the Investigator on Anthropic Managed Agents

The Investigator runs on Claude Managed Agents with a sandboxed Python container for numerical diagnostics. It falls back to the Messages API in under five minutes (INVESTIGATOR_USE_MANAGED=false).

1. Generate a path secret. Anthropic's mcp_servers config does not forward custom HTTP headers, so the URL itself is the bearer token.

echo "ARIA_MCP_PATH_SECRET=$(openssl rand -hex 32)" >> .env

2. Expose /mcp via Cloudflare Tunnel. Two flavours:

Persistent tunnel (stable hostname). Create the tunnel in the Cloudflare Zero Trust dashboard mapping a hostname to http://backend:8000, then:

echo "CF_TUNNEL_TOKEN=<token-from-dashboard>" >> .env
make up.tunnel

Quick tunnel (ephemeral URL, no account needed).

docker run --rm --network aria_aria cloudflare/cloudflared:latest \
  tunnel --url http://backend:8000

Either way, append the path secret to the tunnel URL — that becomes ARIA_MCP_PUBLIC_URL:

ARIA_MCP_PUBLIC_URL=https://<your-tunnel>.trycloudflare.com/mcp/<ARIA_MCP_PATH_SECRET>/

Verify with curl $ARIA_MCP_PUBLIC_URL — you should see an MCP protocol response, not a 404.

3. Flip the flag and restart the backend.

echo "INVESTIGATOR_USE_MANAGED=true" >> .env
make restart

Sentinel-triggered investigations now run on Managed Agents. Flip back to false and restart for a sub-five-minute rollback to the Messages API path — both paths share the same external contract.

Behind the scenes

Curious how we planned and shipped ARIA in one week? → See our project board and roadmap
