A todo system the agent cannot cheat.
This repo is the open-source mirror of the watertight Workloft Loop pilot. We
spent a day building enforcement around our agent stack's todo list because
items were rotting (164 open, many overdue two weeks) and nothing forced
resolution. The result is a small set of pieces that together make sure every
item ends in shipped or killed. No third state.
Full write-up: workloft.ai/ships/watertight-todos-2026-05-23.html
The biggest one. Before the agent can end a session, this hook queries the
todo table for items in in_progress status owned by the agent. If any
exist, the hook exits with code 2 and a model-visible message listing each.
Exit code 2 is the Claude Code convention for "do not stop yet". The model
must transition each item to shipped, blocked, or killed before it can close
out. Enforcement at the harness layer, not the prompt layer.
Register in ~/.claude/settings.json:
"hooks": {
"Stop": [{
"hooks": [{
"type": "command",
"command": "/path/to/hooks/loop_stop_audit.py",
"timeout": 5
}]
}]
}Hourly cron. For each Loop item:
- Open past due → Telegram ask with options + 72h expiry + flip to blocked
- Blocked aged 24h+ → louder Telegram reminder
- Blocked past expiry → auto-fire
default_action(constrained tokill,extend-3d, ortransfer-to-alfred)
The escalator can resolve any item without a human.
Snooze requires a --reason and caps at one attempt per item. A second
snooze auto-flips the item to blocked instead of indefinitely deferring.
Snooze was the original rot vector. It is closed now.
Each cron writes a heartbeat file on success. A watchdog (also cron) checks heartbeat ages against expected cadences and fires an edge-triggered Telegram alert if any cron misses its window. Recovery messages too.
No Healthchecks.io, no signup, no external dependency.
08:00 BST morning: shipping today, decisions owed, stale items.
22:00 BST evening: shipped, slipped, queued, archived, plus the pilot
evaluation report inline (gary/status_report.py).
Five metrics with pass/fail thresholds. Run gary status-report (or
python3 gary/status_report.py) for the current scoreboard.
Two migrations in migrations/. Together they add:
ownertext (bob | alfred)stagetext (research | ship | publish)snooze_countint with a hard cap of 1default_actiontext constrained to {kill, extend-3d, transfer-to-alfred}stage_entered_attimestamptz (TTL anchor that resists gaming viaupdated_attouches)blocked_at+escalation_expires_attimestamps for the blocked-state lifecycle- Status enum extended to {open, in_progress, blocked, shipped, done, cancelled, killed}
This is the small viable version of "the agent cannot leave work in a state nobody is tracking". It runs in production at Workloft on a one-person stack.
It is not a framework. It is not a SaaS. It is a small set of scripts you can read, fork, and adapt. The interesting move is the Stop hook, not the schema.
Day 1 of a 7-day pilot. Caught its first contract violation 30 minutes after going live. Day-7 decision gate: pass means generalise to other Gary tags; fail means redesign before generalising. Full pilot rubric in the Ship article.
MIT. Use, fork, learn from, contribute back if you find a hole.
A small Supabase-backed ledger of every public post (LinkedIn, X, future channels). One row per posted artefact: channel, slug, posted_url, posted_at, ship_ref (link to the article being promoted), hero_path, hashtags, char count, source. CLI:
workloft-post log --channel linkedin --slug X --url https://... --ship-ref https://...
workloft-post list [--channel X]
workloft-post show <id-prefix>
workloft-post statsThe point: Maggie-style JSON queues hold intent. A posts ledger holds outcome. Different jobs. Without a separate record of what actually went out, the agent stack cannot tell scheduled-but-never-posted from genuinely-shipped.
Migration: migrations/2026-05-23-workloft-posts.sql.