Coherence-Aware Reinforcement Learning
A model becomes an agent when it stops pattern-matching and starts knowing. That transition isn't gradual — it's a phase transition, like water becoming ice. One moment the model is guessing. The next, it's coherent.
Standard training can't see this happening. You watch a loss curve and hope.
CARL measures the moment of crystallization — and rewards it.
```
            Phi (order parameter)
                        │
      guessing          │          knowing
░░░░░░░░░░░░░░░░░░░░░░░░│████████████████████████
                        │
                 crystallization
```
The order parameter Phi measures how coherent a model's probability field is at every token. When Phi crystallizes, the model has found its internal anchor — a fixed point it can navigate from to any concept space without losing itself.
This is alignment you can measure, not just evaluate.
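As a concrete sketch of the measurement, here is the per-token order parameter from the definition given later in this README (Phi = 1 - H(P)/log|V|); the function name `phi` is just for illustration:

```python
import math

def phi(probs):
    """Order parameter Phi = 1 - H(P) / log|V| for one next-token
    distribution. 0.0 = maximum uncertainty, 1.0 = complete coherence."""
    vocab_size = len(probs)
    # Shannon entropy H(P); zero-probability tokens contribute nothing
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(vocab_size)

print(phi([0.25, 0.25, 0.25, 0.25]))  # uniform distribution: guessing → 0.0
print(phi([1.0, 0.0, 0.0, 0.0]))      # one-hot distribution: coherent → 1.0
```

In a real run this would be evaluated on the softmaxed logits of every generated token, giving the Phi trajectory that `carl observe` plots.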
```shell
pip install carl-studio              # core CLI + one-shot observe
pip install 'carl-studio[training]'  # local train/eval
pip install 'carl-studio[hf]'        # status/logs/stop/push/HF Jobs
pip install 'carl-studio[tui]'       # observe --live
pip install 'carl-studio[observe]'   # observe --diagnose
pip install 'carl-studio[all]'       # everything
```

Most users should start with:

```shell
pip install 'carl-studio[training,hf]'
```

CARL Studio does not require a .env, and it does not auto-load one.
- Hugging Face workflows work with either `HF_TOKEN` or a prior `hf auth login` / `huggingface-cli login`
- Claude-powered features use `ANTHROPIC_API_KEY` or `--api-key`
- RunPod uses `RUNPOD_API_KEY`
- Public Trackio observe works without credentials
If you want a template, copy `.env.example` and load it into your shell before running `carl`:

```shell
cp .env.example .env
set -a
source .env
set +a
```

Quick setup:

```shell
hf auth login
export ANTHROPIC_API_KEY=sk-ant-xxx   # only for --diagnose / chat
carl config show
```

Full auth details: docs/auth.md
See inside a Trackio run (no GPU required, base install):
```shell
carl observe --url https://your-trackio-space.hf.space/ --run your-run
```

If the dashboard contains multiple projects, add `--project your-project`.
Train with coherence rewards (`carl-studio[training]`):

```shell
carl project init
carl train --config carl.yaml
```

Or run directly from the CLI:

```shell
carl train --model Tesslate/OmniCoder-9B --method grpo --dataset your-org/your-dataset --output-repo your-org/your-model --compute a100-large
```

Gate a checkpoint (`carl-studio[training]`):
```shell
carl eval --adapter your-username/your-model
```

```
┌─────────┐     ┌─────────┐     ┌─────────┐     ┌──────┐     ┌──────┐
│ Observe │ ──> │ Measure │ ──> │  Train  │ ──> │ Gate │ ──> │ Ship │
│         │     │   Phi   │     │  CARL   │     │      │     │      │
└─────────┘     └─────────┘     └─────────┘     └──────┘     └──────┘
  point at        entropy +       task rewards  cascade      push to
  any run         order param     + coherence   auto-fires   hub
```
Observe — Point CARL at a Trackio dashboard or log file. Instantly see Phi trajectory, entropy, phase state, health.
Measure — Phi = 1 - H(P)/log|V|. Zero means maximum uncertainty. One means complete coherence. Computed per token, every step.
Train — Five reward functions in a cascade. Task rewards teach what. CARL rewards teach how coherently.
Gate — The cascade auto-calibrates from the training signal. No hardcoded thresholds. CARL activates only when the model demonstrates sustained capability.
Ship — Eval gate passes → checkpoint pushed to Hub.
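The gating step can be illustrated with a toy version. This is a hypothetical sketch, not CARL's actual calibration rule: here the coherence reward switches on only once a smoothed task reward has stayed above a cutoff derived from the run's own history, so nothing is hardcoded.

```python
class CascadeGate:
    """Toy self-calibrating gate (illustrative only): the cutoff is a
    fraction of the best smoothed capability the model has shown so far."""

    def __init__(self, decay=0.9, sustain=5):
        self.ema = 0.0          # exponential moving average of task reward
        self.peak = 0.0         # best smoothed reward seen so far
        self.streak = 0         # consecutive steps above the moving cutoff
        self.decay = decay
        self.sustain = sustain

    def update(self, task_reward):
        self.ema = self.decay * self.ema + (1 - self.decay) * task_reward
        self.peak = max(self.peak, self.ema)
        # cutoff is calibrated from the training signal itself
        if self.peak > 0 and self.ema >= 0.5 * self.peak:
            self.streak += 1
        else:
            self.streak = 0
        # coherence rewards activate only after sustained capability
        return self.streak >= self.sustain

gate = CascadeGate()
history = [gate.update(r) for r in [0.1, 0.2, 0.4, 0.6, 0.7, 0.8, 0.8, 0.9]]
print(history)  # stays False until capability is sustained, then True
```

The point of the design is the same as in CARL's cascade: the model earns the coherence reward by demonstrating task capability first, rather than being scored against a threshold fixed before training.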
| Workflow | Command | Install |
|---|---|---|
| One-shot observe | `carl observe --url ... --run ...` | `pip install carl-studio` |
| Live observe | `carl observe --live ...` | `pip install 'carl-studio[tui]'` |
| Claude diagnosis | `carl observe --diagnose ...` | `pip install 'carl-studio[observe]'` |
| Local train/eval | `carl train`, `carl eval` | `pip install 'carl-studio[training]'` |
| HF job management / publish | `carl status`, `carl logs`, `carl stop`, `carl push` | `pip install 'carl-studio[hf]'` |
Managed tiers build on top of these open workflows; extras control local capabilities, not research access.
| Workflow | Auth |
|---|---|
| Local file observe | none |
| Public Trackio observe | none |
| Claude diagnosis / chat | `ANTHROPIC_API_KEY` or `--api-key` |
| Hub jobs / push / gated model access | `HF_TOKEN` or prior HF login |
| RunPod backend | `RUNPOD_API_KEY` |
Trained with CARL on OmniCoder-9B:
| Metric | Value |
|---|---|
| Task completion | 92% |
| Tool format compliance | 99% |
| Mean tool calls per task | 11.09 |
| Phase 2' eval gate | PASS |
80 GRPO steps. Five reward functions. Self-calibrating cascade gate.
The math is published and independently reproducible:
- Bounded Informational Time Crystals — derives the conservation law
- Material Reality — validates across 6,244 trials
- Semantic Realizability — formal proof
Architecture, API, CLI commands, environments, compute backends → docs/reference.md
Credential setup and provider auth → docs/auth.md
terminals.tech · PyPI · Paper · Docs
MIT — Intuition Labs LLC
