
CARL: from chaos to crystal


Coherence-Aware Reinforcement Learning



Why

A model becomes an agent when it stops pattern-matching and starts knowing. That transition isn't gradual — it's a phase transition, like water becoming ice. One moment the model is guessing. The next, it's coherent.

Standard training can't see this happening. You watch a loss curve and hope.

CARL measures the moment of crystallization — and rewards it.

                         Phi (order parameter)
                              │
          guessing            │         knowing
      ░░░░░░░░░░░░░░░░░░░░░░░░│████████████████████████
                              │
                        crystallization

The order parameter Phi measures how coherent a model's probability field is at every token. When Phi crystallizes, the model has found its internal anchor: a fixed point from which it can navigate to any concept space without losing itself.

This is alignment you can measure, not just evaluate.


Install

pip install carl-studio                     # core CLI + one-shot observe
pip install 'carl-studio[training]'        # local train/eval
pip install 'carl-studio[hf]'              # status/logs/stop/push/HF Jobs
pip install 'carl-studio[tui]'             # observe --live
pip install 'carl-studio[observe]'         # observe --diagnose
pip install 'carl-studio[all]'             # everything

Most users should start with:

pip install 'carl-studio[training,hf]'

Auth

CARL Studio does not require a .env, and it does not auto-load one.

  • Hugging Face workflows work with either HF_TOKEN or a prior hf auth login / huggingface-cli login
  • Claude-powered features use ANTHROPIC_API_KEY or --api-key
  • RunPod uses RUNPOD_API_KEY
  • Public Trackio observe works without credentials

If you want a template, copy .env.example and load it into your shell before running carl:

cp .env.example .env
set -a
source .env
set +a

Quick setup:

hf auth login
export ANTHROPIC_API_KEY=sk-ant-xxx   # only for --diagnose / chat
carl config show

Full auth details: docs/auth.md

Use

See inside a Trackio run (no GPU required, base install):

carl observe --url https://your-trackio-space.hf.space/ --run your-run

If the dashboard contains multiple projects, add --project your-project.

Train with coherence rewards (carl-studio[training]):

carl project init
carl train --config carl.yaml

Or run directly from the CLI:

carl train \
  --model Tesslate/OmniCoder-9B \
  --method grpo \
  --dataset your-org/your-dataset \
  --output-repo your-org/your-model \
  --compute a100-large

Gate a checkpoint (carl-studio[training]):

carl eval --adapter your-username/your-model

How It Works

 ┌─────────┐     ┌─────────┐     ┌─────────┐     ┌──────┐     ┌──────┐
 │ Observe │ ──> │ Measure │ ──> │  Train  │ ──> │ Gate │ ──> │ Ship │
 │         │     │   Phi   │     │  CARL   │     │      │     │      │
 └─────────┘     └─────────┘     └─────────┘     └──────┘     └──────┘
  point at        entropy +       task rewards     cascade      push to
  any run         order param     + coherence      auto-fires   hub

Observe — Point CARL at a Trackio dashboard or log file. Instantly see Phi trajectory, entropy, phase state, health.

Measure — Phi = 1 - H(P)/log|V|. Zero means maximum uncertainty. One means complete coherence. Computed per token, every step.
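The formula reads directly as code. A minimal pure-Python sketch of the measurement (the function name `phi` and the probability-list interface are illustrative, not the package's actual API):

```python
import math

def phi(probs):
    """Order parameter Phi = 1 - H(P)/log|V| for one token's distribution.

    probs: the model's probabilities over a vocabulary of size |V|,
    summing to 1. Returns a value in [0, 1]: 0 at the uniform
    distribution (pure guessing), 1 when all probability mass sits on
    a single token (complete coherence).
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)  # H(P) in nats
    return 1.0 - entropy / math.log(len(probs))
```

In training this would run over the softmaxed logits at every token position, every step; the sketch shows a single position.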

Train — Five reward functions in a cascade. Task rewards teach what. CARL rewards teach how coherently.
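One way to picture "task rewards teach what, CARL rewards teach how coherently" is a composition in which the coherence term only participates once the cascade gate (described next) has opened. A toy sketch, not the package's actual five-function cascade (`combined_reward` and `phi_weight` are hypothetical names):

```python
def combined_reward(task_reward: float, mean_phi: float,
                    cascade_open: bool, phi_weight: float = 0.2) -> float:
    """Toy composition of the two reward families: task rewards always
    apply; a coherence bonus (scaled mean Phi over the completion) is
    added only after the cascade gate has opened."""
    coherence_bonus = phi_weight * mean_phi if cascade_open else 0.0
    return task_reward + coherence_bonus
```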

Gate — The cascade auto-calibrates from the training signal. No hardcoded thresholds. CARL activates only when the model demonstrates sustained capability.
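"No hardcoded thresholds" can be illustrated with a rolling statistic: the bar is derived from the run's own reward history, and the gate opens only after the signal clears that bar for several consecutive steps. This is a toy illustration of the idea, not CARL's actual calibration rule (the class and parameter names are hypothetical):

```python
from collections import deque

class CascadeGate:
    """Self-calibrating gate sketch: derive the threshold from a rolling
    window of task rewards (here, the median) and fire only after the
    reward stays above it for `patience` consecutive steps."""

    def __init__(self, window: int = 50, patience: int = 10):
        self.history = deque(maxlen=window)
        self.patience = patience
        self.streak = 0
        self.open = False

    def update(self, task_reward: float) -> bool:
        self.history.append(task_reward)
        if len(self.history) < self.history.maxlen:
            return self.open  # still calibrating: keep the gate shut
        bar = sorted(self.history)[len(self.history) // 2]  # rolling median
        self.streak = self.streak + 1 if task_reward > bar else 0
        if self.streak >= self.patience:
            self.open = True  # sustained capability demonstrated
        return self.open
```

Once open, this gate stays open; a production gate would also need to handle regressions, which the sketch deliberately ignores.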

Ship — Eval gate passes → checkpoint pushed to Hub.


CLI Install Matrix

Workflow                     Command                                        Install
One-shot observe             carl observe --url ... --run ...               pip install carl-studio
Live observe                 carl observe --live ...                        pip install 'carl-studio[tui]'
Claude diagnosis             carl observe --diagnose ...                    pip install 'carl-studio[observe]'
Local train/eval             carl train, carl eval                          pip install 'carl-studio[training]'
HF job management / publish  carl status, carl logs, carl stop, carl push   pip install 'carl-studio[hf]'

Managed tiers build on top of these open workflows; extras control local capabilities, not research access.

Credential Matrix

Workflow                              Auth
Local file observe                    none
Public Trackio observe                none
Claude diagnosis / chat               ANTHROPIC_API_KEY or --api-key
Hub jobs / push / gated model access  HF_TOKEN or prior HF login
RunPod backend                        RUNPOD_API_KEY

Results

Trained with CARL on OmniCoder-9B:

Metric                    Value
Task completion           92%
Tool format compliance    99%
Mean tool calls per task  11.09
Phase 2' eval gate        PASS

80 GRPO steps. Five reward functions. Self-calibrating cascade gate.


Papers

The math is published and independently reproducible.


Reference

Architecture, API, CLI commands, environments, compute backends → docs/reference.md

Credential setup and provider auth → docs/auth.md



terminals.tech · PyPI · Paper · Docs

MIT — Intuition Labs LLC
