pokeloop

How AI learns to play Pokémon GO on AI sandboxes.

A population of LLM-driven agents evolves via genetic algorithms in parallel islo.dev sandboxes, learning to earn its first Pokémon badge in 8 generations.

🌐 Read the page · 📺 Watch the movie · 🧬 Run it

[dashboard screenshot]

TL;DR

Pokémon GO can't run honestly from a Linux sandbox in 2026 — Play Integrity hardware attestation, arm64-only APKs, and a $5M Niantic injunction make the literal version a ban-on-first-frame botnet.

So we built the next thing: a population of 8 LLM agents that evolve via genetic algorithms in parallel forkable VMs. The fitness signal comes from RAM-derived rewards in Pokémon Crystal on PyBoy. The "GO feel" is a HUD overlay (Pokédex pops, catch animations). The substrate — `islo snapshot save`, `islo use --snapshot`, `islo logs --type agent` — is borrowed directly from meta-harness on islo.

The snapshot tree is the search tree.
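To make "the snapshot tree is the search tree" concrete, here is a minimal sketch (names and structure are mine, not the repo's): each generation forks its sandboxes from the current best save state, so the lineage of snapshots forms a tree whose frontier is always the most promising game state.

```python
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    """One saved sandbox state; children are forks taken from it."""
    name: str
    children: list["Snapshot"] = field(default_factory=list)

    def fork(self, name: str) -> "Snapshot":
        child = Snapshot(name)
        self.children.append(child)
        return child

def depth(node: Snapshot) -> int:
    """Longest root-to-leaf path — how far the search has advanced."""
    return 1 + max((depth(c) for c in node.children), default=0)

root = Snapshot("base")
gen1 = [root.fork(f"g1-agent{i}") for i in range(8)]   # 8 parallel forks
best = gen1[3]                                         # highest-fitness rollout
gen2 = [best.fork(f"g2-agent{i}") for i in range(8)]   # next generation forks the winner

print(depth(root))  # → 3: base → gen-1 best → gen-2 leaves
```

Only the winning branch is extended each generation; the losing siblings simply stay as dead leaves, which is what makes the snapshot lineage a greedy search tree.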

Results

| metric                | G1   | G8          |
|-----------------------|------|-------------|
| best fitness          | +1.5 | +17.0       |
| mean fitness          | 0.0  | +12.0       |
| worst fitness         | −1.5 | +6.0        |
| badges earned (best)  | 0    | 1 (Falkner) |
| Pokédex seen (best)   | 0    | 8           |

How it works

for gen in range(8):
    pop = [sandbox_from(snapshot_base, prompts[i]) for i in range(8)]   # parallel fork
    fits = parallel_rollout(pop, horizon=H)
    elites = top_k(pop, fits, k=2)                                      # tournament selection
    children = [LLM.crossover(*sample_pair(elites)) for _ in range(6)]  # textual crossover
    children = [LLM.mutate(c) if random() < 0.5 else c for c in children]
    pop = elites + children
    snapshot_base = best_individual.snapshot                            # advance the gym
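The selection helpers the loop assumes can be sketched in a few lines (the names come from the pseudocode; the implementations here are mine):

```python
import random

def top_k(pop, fits, k=2):
    """Keep the k individuals with the highest fitness (elitism)."""
    ranked = sorted(zip(fits, pop), key=lambda pair: pair[0], reverse=True)
    return [individual for _, individual in ranked[:k]]

def sample_pair(elites):
    """Draw two distinct parents for crossover."""
    return random.sample(elites, 2)
```

With k=2 elites surviving and 6 children per generation, the population size stays fixed at 8.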

Each individual is a Claude system prompt; each generation runs in 8 islo sandboxes in parallel; fitness is a RAM-derived dense signal (badges, Pokédex entries, map progress).
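A hedged sketch of what a RAM-derived dense reward looks like, in the spirit of `reward.py`. The addresses below are illustrative placeholders, not verified Crystal offsets; a real implementation would read the actual memory map through PyBoy's memory view.

```python
# Hypothetical addresses — placeholders, not real Crystal RAM offsets.
BADGE_ADDR, DEX_SEEN_ADDR, MAP_ID_ADDR = 0xD857, 0xDE00, 0xDCB5

def popcount(byte: int) -> int:
    """Number of set bits in one byte (badges and dex flags are bitfields)."""
    return bin(byte & 0xFF).count("1")

def dense_reward(ram: bytes, visited_maps: set[int]) -> float:
    badges = popcount(ram[BADGE_ADDR])        # each badge bit is worth a lot
    dex_seen = popcount(ram[DEX_SEEN_ADDR])   # one flag byte, for illustration
    map_id = ram[MAP_ID_ADDR]
    new_map = 1.0 if map_id not in visited_maps else 0.0
    visited_maps.add(map_id)                  # reward entering a map only once
    return 10.0 * badges + 0.5 * dex_seen + 1.0 * new_map

ram = bytearray(0xE000)
ram[BADGE_ADDR] = 0b0000_0001      # one badge earned
ram[DEX_SEEN_ADDR] = 0b0000_1111   # four species seen
print(dense_reward(bytes(ram), set()))  # → 13.0
```

The point of the shaping is density: badge bits flip rarely, so the Pokédex and map-progress terms keep the fitness signal informative between badges.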

The "evolution" is textual — natural-language gradients on prompts, not weight updates. It's the Promptbreeder / TextGrad / Reflexion family, with parallel forkable sandboxes underneath instead of a single trajectory.
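Textual crossover and mutation reduce to prompt templates fed back to the LLM. A minimal sketch, with `complete` standing in for any chat-completion call (the real repo uses the Anthropic API); the template wording is mine:

```python
import random

CROSSOVER_TEMPLATE = (
    "Combine the strongest instructions from PROMPT A and PROMPT B "
    "into one system prompt.\n\nPROMPT A:\n{a}\n\nPROMPT B:\n{b}"
)
MUTATE_TEMPLATE = (
    "Rewrite this system prompt with one substantive behavioral change:\n\n{p}"
)

def crossover(complete, a: str, b: str) -> str:
    """Ask the LLM to merge two parent prompts into one child prompt."""
    return complete(CROSSOVER_TEMPLATE.format(a=a, b=b))

def mutate(complete, p: str, rate: float = 0.5) -> str:
    """With probability `rate`, ask the LLM to perturb the prompt."""
    return complete(MUTATE_TEMPLATE.format(p=p)) if random.random() < rate else p
```

Because the "genome" is natural language, crossover can be semantic ("keep A's battle strategy, B's navigation heuristic") rather than the blind string splicing of classical GAs.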

It's a multi-agent system in the population sense: 8 agents per generation, each with its own policy, never communicating during a rollout — only via the genetic information channel between generations.

Repo layout

pokeloop/
├── docs/                 ← GitHub Pages site
│   ├── index.html
│   ├── style.css
│   └── assets/
│       ├── movie.mp4
│       ├── movie.gif
│       └── screenshot.png
├── env_worker.py         ← PyBoy HTTP gym (save/load/screen/state)
├── policy.py             ← Claude tool-use action policy
├── trainer.py            ← textual DPO over preference pairs (DPO version)
├── reward.py             ← RAM-derived dense reward
├── orchestrator.py       ← real-run loop (DPO version)
├── mock_orchestrator.py  ← deterministic mock for the DPO movie
├── mock_ga.py            ← deterministic mock for the GA movie
├── frames.py             ← procedural Crystal-ish PIL frame generator
├── viewer/               ← single-policy DPO viewer
├── viewer_ga/            ← population/generation GA viewer
├── record.py             ← Playwright recorder
├── policies/v0.txt       ← seed system prompt
├── prompt.md             ← the one-shot islo build prompt
└── scripts/
    ├── make_movie.sh     ← build the DPO movie
    ├── make_ga_movie.sh  ← build the GA movie
    ├── run_local.sh      ← real run on a local Mac
    └── run_islo.sh       ← real run inside an islo sandbox
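The repo tree describes `env_worker.py` as a "PyBoy HTTP gym (save/load/screen/state)". A hypothetical sketch of that surface using only the standard library — `Emu` is a stub standing in for a real PyBoy instance, and the endpoint names simply mirror the tree's description:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Emu:
    """Stub for a PyBoy emulator: enough state to show save/restore."""
    def __init__(self):
        self.state = {"frame": 0}
    def snapshot(self) -> dict:
        return dict(self.state)
    def restore(self, saved: dict) -> None:
        self.state = dict(saved)

EMU = Emu()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/state":
            body = json.dumps(EMU.snapshot()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Each sandbox runs one worker; the policy drives it over HTTP.
    HTTPServer(("127.0.0.1", 8081), Handler).serve_forever()
```

Putting the emulator behind HTTP is what lets the policy, trainer, and orchestrator live in separate processes — and what makes a population of 8 workers trivially parallel.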

Run it

Just make the movie (no ROM, no API key, ~3 minutes)

git clone https://github.com/zozo123/pokeloop
cd pokeloop
SECONDS_RUN=230 bash scripts/make_ga_movie.sh
open movie_ga/pokeloop-ga.mp4

Real run on islo.dev (bring your own Crystal ROM)

export ANTHROPIC_API_KEY=sk-ant-...
cp /your/legal/copy/crystal.gbc roms/crystal.gbc

islo use pokeloop --image python:3.12-slim --source github://zozo123/pokeloop
islo use pokeloop -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY -- bash scripts/run_islo.sh
islo share pokeloop 8080
# → https://<id>.share.islo.dev — your live demo URL

Real run locally (macOS / Linux)

bash scripts/run_local.sh
open http://localhost:8080

The 9-minute build prompt

The whole rig is small enough to materialize from a single prompt — see prompt.md for the Captain-Claw-shape one-shot.

Inspiration

  • Meta-harness on islo — the snapshot → use → logs pattern this work copies. Pokeloop is meta-harness applied to RL post-training.
  • Karpathy's agentic autoresearch — LLMs that propose, run, read, update, in sandboxes. The GA loop is one realization.
  • Claude Plays Pokémon & Gemini Plays Pokémon — single-agent, no learning. This is the multi-agent post-training version.

Caveats

  • Bring your own ROM. We never ship one.
  • Anthropic API calls dominate latency (~1–2 actions/sec).
  • The mock movie is a deterministic playback — same viewer code, scripted events, same shape as a real run. Swap mock_ga.py for the live orchestrator.py for genuine learning.
  • "Pokémon GO" in the title is a frame, not a game. We don't connect to Niantic servers and we don't want to.

License

MIT — see LICENSE.

No Niantic accounts were created or harmed in the making of this demo.
