Skip to content

trotsky1997/OpenFugu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OpenFugu

An open, runnable reverse-engineering of Sakana AI's Fugu — the "one model to command them all" LLM orchestrator.

Fugu is sold as a single model; it is really a policy over models — a tiny coordinator that, per query, routes work to a pool of frontier LLMs and returns one answer. Sakana's product and trained weights are closed. OpenFugu rebuilds the mechanism from the two papers + released artifacts, verifies it against real weights, trains a Conductor of our own, and serves it behind one OpenAI-compatible endpoint. Four stages, all working: read → run → train → serve.

Independent reimplementation. Not affiliated with Sakana AI. No third-party code/weights are redistributed here — scripts/fetch_artifacts.py pulls them from their licensed sources. See NOTICE.

What's inside

Stage What Evidence
read docs/HOW_FUGU_IS_IMPLEMENTED.md — full math; docs/ARCHITECTURE.md — investigation log, evidence-graded reverse-engineered from papers + author code
run openfugu/mini.py (TRINITY: hidden-state → linear head → worker); openfugu/ultra.py (Conductor: workflow-DAG) mini.py --self-test = 95% agent / 100% role on the 37-case fixture, real weights
train train/train_trinity.py — self-train the TRINITY coordinator from scratch via sep-CMA-ES (no Sakana weights); train/train_conductor.py — GRPO a Conductor on nvidia/ToolScale TRINITY: chance→optimal routing in ~5 generations (mock, runs anywhere); Conductor: reward 1.21 → 1.64 over 100 steps (curve)
serve openfugu/serve.py — one OpenAI-compatible /v1/chat/completions; internal TRINITY loop over a litellm pool curl returns one answer; pool hidden
eval eval/eval_orchestration.py — does per-question routing beat the best single model? trained router +107% over best single worker (query-level routing, not per-step coordination — see results caveat)

Quickstart

pip install -r requirements.txt           # torch, transformers, trl, litellm, ...
python scripts/fetch_artifacts.py         # pull Qwen3-0.6B + model_iter_60.npy + fixture (not redistributed)

export FUGU_MODEL=$(...Qwen3-0.6B path...)
export FUGU_VECTOR=$PWD/artifacts/model_iter_60.npy
export FUGU_FIXTURE=$PWD/artifacts/qwen_router_prompt_eval_cases.json

# READ:  the architecture, evidence-graded
less docs/HOW_FUGU_IS_IMPLEMENTED.md

# RUN:   prove the reconstruction is faithful to the checkpoint
python openfugu/mini.py --self-test       # -> 95% / 100%

# RUN:   route one query (offline mock pool)
python openfugu/mini.py --demo

# RUN (live): real worker pool via litellm
export FUGU_API_KEY=...  FUGU_BASE_URL=...
python openfugu/mini.py --demo --live \
  --slot-models "novita/deepseek/deepseek-v4-flash,novita/zai-org/glm-5,..."

# TRAIN: a Conductor on ToolScale (8x A800-class; HF generation, no vLLM)
python train/train_conductor.py           # reward climbs off zero; saves checkpoint

# TRAIN: Fugu-Ultra recursive topology — Conductor revises its own output (test-time scaling)
python train/train_recursion.py           # mock: +9% over one-shot (toy policy w/ headroom)
python train/train_recursion_real.py      # REAL recursion (round-0 fed back into round-1)
python eval/eval_recursion_real.py        # honest held-out: round-0 vs round-1 → TIE (see results/)

# TRAIN: adaptive k-of-n pool — generalize to arbitrary worker subsets (swap the pool)
python train/train_adaptive_pool.py            # mock: +44% over blind, 94% of oracle
python train/train_adaptive_pool_perstep.py    # REAL per-step: random k-of-n subset masked each turn,
                                               # base 0.625 -> 1.000 (n=8, overfit caveat), PASS

# TRAIN: self-train the TRINITY coordinator from scratch (sep-CMA-ES, mock — no GPU/API)
python train/train_trinity.py             # chance -> optimal routing; PASS in seconds

# SERVE: Fugu as one model (API worker pool via litellm)
python openfugu/serve.py --slot-models "<csv>" --port 8088
curl localhost:8088/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"flatten a nested list in one line"}]}'

# SERVE (real end-to-end): TRAINED per-step head + REAL local worker pool, no API
python openfugu/serve.py --model <qwen3-0.6b dir> --vector model_iter_60.npy \
  --head trinity_perstep.npy --local-models "<llama dir>,<gemma dir>" --port 8088
# prove it end-to-end (boots the server, POSTs a real GSM8K question, checks the answer):
python eval/serve_e2e.py --model <qwen3-0.6b dir> --vector model_iter_60.npy \
  --head trinity_perstep.npy --local-models "<llama dir>,<gemma dir>"   # -> answer 72, PASS

# PIPELINE: train -> serve -> verify in ONE command (the head served is the head just trained)
python pipeline/e2e_train_serve.py --model <qwen3-0.6b dir> --vector model_iter_60.npy \
  --local-models "<llama dir>,<gemma dir>" --port 8097      # trains a fresh head, serves it, PASS
# serve+verify only, reusing an existing head:
python pipeline/e2e_train_serve.py --skip-train --head trinity_perstep.npy \
  --model <qwen3-0.6b dir> --local-models "<llama dir>,<gemma dir>"

# SERVE (Fugu-Ultra): the Conductor workflow-DAG executor, fully local (no API)
python openfugu/ultra.py --query "..." --local-conductor <conductor dir> \
  --local-models "<llama dir>,<deepseek dir>"          # local Conductor emits + executes a DAG
python eval/ultra_e2e.py --conductor-ckpt <conductor dir> \
  --local-models "<llama dir>,<deepseek dir>"          # asserts a parsed, executed workflow

# EVAL: does orchestration beat the best single model? (the central Fugu claim)
python eval/eval_orchestration.py        # trained coordinator +107% over best single, PASS

The mechanism in one breath

A ~0.6B backbone (Qwen3-0.6B) never answers the user. It produces one hidden state at the penultimate token; a bias-free linear head scores each worker; the top worker is dispatched and its reply is returned. ~19.5K trainable numbers (the head + singular-value-fine-tuning offsets on 9 matrices), optimized gradient-free (sep-CMA-ES). Fugu-Ultra swaps the per-turn picker for a 7B Conductor that emits a whole workflow DAG. No worker weights are ever touched — it is macro-level composition over other people's models. Full math, with an EXEC/CODE/DATA evidence grade on every claim, in docs/.

Trained Conductor weights

The Conductor we trained on ToolScale (a fine-tune of Llama-3.2-3B-Instruct) is published on HuggingFace, not in this repo (Llama 3.2 Community License applies — see NOTICE):

huggingface.co/di-zhang-fdu/openfugu-conductor-3b   (see model card)

License

Apache-2.0 for all OpenFugu code (LICENSE). Third-party material is fetched, not redistributed; trained weights carry the Llama 3.2 license. See NOTICE.

About

Open reimplementation of Sakana Fugu — the 'one model to command them all' LLM orchestrator. Read → run → train → serve.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages