workloftai/loop-policy-update

loop-policy-update

Outer-loop measurement for LLM-powered scoring policies, based on the two-level autoresearch pattern from arXiv 2605.30003.

Shipped 2026-05-29 by Workloft. Full write-up:

What it does

You have a policy generator (a prompt, a scorer, a routing rule). Its outputs get filed somewhere downstream (tickets, todos, transactions). Each downstream item has an outcome (shipped, succeeded, killed).

This script answers one question: "Are my policy's outputs predicting the outcomes I care about, category by category?"
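Answering that question means pairing each policy output with the downstream item it became. Below is a minimal sketch. It assumes each output carries an `item_id` that also keys the outcome store. The `Pick` type, its fields, and `join_outcomes` are hypothetical names and are not taken from the script.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pick:
    """One policy output (hypothetical shape)."""
    item_id: str   # assumed key shared with the downstream outcome store
    axis: str      # category the policy scored this output against
    score: float   # policy confidence, assumed to be on a 0-10 scale


def join_outcomes(picks: list[Pick], outcomes: dict[str, str]) -> list[tuple[Pick, str]]:
    """Pair each pick with its downstream outcome.

    Picks with no matching downstream item are treated as "open".
    """
    return [(p, outcomes.get(p.item_id, "open")) for p in picks]


picks = [Pick("a", "robotics", 8.0), Pick("b", "robotics", 6.0)]
joined = join_outcomes(picks, {"a": "moved"})
print([outcome for _, outcome in joined])  # ['moved', 'open']
```

In this sketch, outputs that never reached the downstream store count as open. You may prefer to track them in a separate bucket.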

For each axis the policy scores against, it reports:

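  • n_picks: how many policy outputs landed in that axis
  • mean_score: the policy's average score (confidence) for that axis, on a 0-10 scale
  • moved / killed / open: counts of downstream items in each outcome state
  • conversion = moved / n_picks
  • axis_health = conversion / (mean_score / 10). 1.0 means perfectly calibrated. Below 1.0 means the policy is overconfident on that axis; above 1.0 means it is underconfident.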
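The per-axis numbers above can be computed roughly as follows. This is a sketch, not the script's implementation. It assumes each joined record is an `(axis, score, outcome)` tuple with scores on a 0-10 scale. `axis_health_report` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

OUTCOMES = ("moved", "killed", "open")


def axis_health_report(records):
    """Aggregate (axis, score, outcome) records into per-axis metrics.

    Assumes score is on a 0-10 scale and outcome is one of OUTCOMES.
    """
    by_axis = defaultdict(list)
    for axis, score, outcome in records:
        by_axis[axis].append((score, outcome))

    report = {}
    for axis, rows in by_axis.items():
        n_picks = len(rows)
        mean_score = mean(score for score, _ in rows)
        counts = {o: sum(1 for _, out in rows if out == o) for o in OUTCOMES}
        conversion = counts["moved"] / n_picks
        # Hit rate the policy "expects" if its scores were probabilities out of 10.
        expected = mean_score / 10
        report[axis] = {
            "n_picks": n_picks,
            "mean_score": mean_score,
            **counts,
            "conversion": conversion,
            # 1.0 = calibrated; None when mean_score is 0 (ratio undefined).
            "axis_health": conversion / expected if expected else None,
        }
    return report


records = [
    ("robotics", 8, "moved"), ("robotics", 6, "killed"),
    ("robotics", 7, "open"), ("robotics", 9, "moved"),
    ("agents", 5, "moved"), ("agents", 5, "moved"),
]
report = axis_health_report(records)
print(round(report["robotics"]["axis_health"], 2))  # 0.67: scores run ahead of outcomes
print(report["agents"]["axis_health"])  # 2.0: outcomes beat the scores
```

In the example, "robotics" converts 2 of 4 picks (0.5) against a mean score of 7.5, so its health is 0.5 / 0.75 ≈ 0.67.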

Outer loop measures. Inner loop tunes. The framework says outer goes first. Run this before you re-prompt anything.

Workloft-specific dependencies

The Workloft version reads:

  • /home/workloft/walt/data/hf-papers/hf-YYYY-MM-DD.top.json — Walt's daily HF paper picks
  • gary_todos Supabase table — Gary's todo outcomes (status, stage, title)

Swap those two reads for your own policy outputs and your own outcome store, and the rest of the script applies unchanged.
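If your policy outputs are also dated files, selecting the `--days` window can be as simple as the sketch below. It assumes the `hf-YYYY-MM-DD.top.json` naming shown above and a window that includes today. `daily_pick_files` is a hypothetical helper. Parsing each file's contents is left out because the file schema isn't documented here.

```python
from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def daily_pick_files(directory: Path, days: int, today: Optional[date] = None) -> list[Path]:
    """Return existing hf-YYYY-MM-DD.top.json files in the window, oldest first.

    Assumption: the window is `days` calendar days ending today (inclusive).
    """
    today = today or date.today()
    found = []
    for offset in range(days - 1, -1, -1):  # oldest day first
        day = today - timedelta(days=offset)
        path = directory / f"hf-{day.isoformat()}.top.json"
        if path.exists():  # skip days with no picks file
            found.append(path)
    return found
```

For example, `daily_pick_files(Path("/home/workloft/walt/data/hf-papers"), 30)` would list the Workloft pick files from the last 30 days. Missing days are skipped rather than treated as errors.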

Run it

python3 -m weight_loop --days 30 [--json]

Reports land in ./reports/walt-axis-health-<timestamp>.txt (or .json with --json).
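Below is a minimal sketch of the command-line surface described above, using `argparse`. The 30-day default and the `%Y%m%d-%H%M%S` timestamp format are assumptions. `parse_args` and `report_path` are hypothetical names, not the module's actual internals.

```python
from __future__ import annotations

import argparse
from datetime import datetime
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="weight_loop")
    # Lookback window; 30 mirrors the example invocation, the real default may differ.
    parser.add_argument("--days", type=int, default=30)
    # Emit JSON instead of plain text.
    parser.add_argument("--json", action="store_true")
    return parser.parse_args(argv)


def report_path(as_json: bool, now: Optional[datetime] = None,
                root: Path = Path("reports")) -> Path:
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    suffix = ".json" if as_json else ".txt"
    return root / f"walt-axis-health-{stamp}{suffix}"
```

With this sketch, `report_path(parse_args(["--json"]).json)` yields a `reports/walt-axis-health-<timestamp>.json` path. Passing `now` explicitly keeps the filename deterministic in tests.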

What this is not

  • Not RLHF. No model is retrained.
  • Not the inner loop. No prompts are tuned.
  • Not a substitute for human axis choice. The axes are still chosen by you.

It is the layer between "policy exists" and "policy is the right one." Most production stacks skip it.

Licence

MIT. See LICENSE.
