
⚠️ Work in Progress: This project is under active development. Core infrastructure is complete, but the neural network training pipeline and full Expert-Iteration loop are not yet implemented.

FlowZero

FlowZero is a search-augmented reinforcement-learning agent, inspired by AlphaZero, designed to solve Flow Free puzzles from first principles. The goal is to combine a hand-rolled Monte Carlo Graph Search with a lightweight ResNet policy-value network in an Expert-Iteration loop, while relying on established libraries only for tensor operations, logging, and continuous integration.


Project Status

✅ Implemented

  • Flow Free game engine with complete board representation and move validation
  • SAT-based puzzle solver for generating verified puzzle datasets
  • Monte Carlo Graph Search (MCGS) with UCB1 selection and configurable rollouts (see the selection sketch after this list)
  • Puzzle generation pipeline (handcrafted, synthetic, and unsolvable examples)
  • Configuration management via YAML
  • CI/CD pipeline with automated testing, linting, and formatting
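
As a rough illustration of the selection rule, here is a minimal UCB1 sketch. The attribute names (visits, total_value) and the exploration constant are illustrative and may not match the actual implementation in flowzero_src/mcgs/mcgs.py:

import math

def ucb1_select(children, exploration_c=1.4):
    """Pick the child with the highest UCB1 score (placeholder attribute names)."""
    total_visits = sum(child.visits for child in children)

    def score(child):
        if child.visits == 0:
            return float("inf")  # expand unvisited children first
        exploitation = child.total_value / child.visits       # average return so far
        exploration = exploration_c * math.sqrt(
            math.log(total_visits) / child.visits              # uncertainty bonus
        )
        return exploitation + exploration

    return max(children, key=score)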

🚧 In Progress

  • MCGS refinement and testing (currently being validated and optimized)
  • Gymnasium environment for RL integration

📋 Not Yet Implemented

  • ResNet policy-value network architecture
  • Expert-Iteration training loop that combines MCGS with neural network learning
  • Self-play data generation pipeline
  • Model checkpointing and evaluation infrastructure

Overview

Flow Free puzzles are cast as deterministic, episodic Markov Decision Processes (MDPs). The planned training will proceed in repeated Expert-Iteration cycles:

  1. Planning (Expert): MCGS runs a fixed number of simulations per move, using the current ResNet's policy and value estimates.
  2. Learning (Apprentice): A 6-block ResNet will be trained to
    • imitate the graph's move distribution (cross-entropy loss), and
    • predict final outcomes (mean-squared error loss).

This self-play framework is intended to yield continual policy improvement without human-labeled data.
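
The sketch below shows what one such cycle might look like in PyTorch. It is only an outline under assumed names and interfaces (the run_mcgs callable, the network's outputs, tensor shapes); the real loop belongs to the not-yet-written flowzero_src/train.py:

import torch
import torch.nn.functional as F

def expert_iteration_step(net, optimizer, puzzles, run_mcgs, simulations=200):
    """One (planning, learning) cycle; every name here is a placeholder."""
    states, target_policies, target_values = [], [], []

    # Planning (expert): MCGS yields an improved move distribution per visited state.
    for puzzle in puzzles:
        for state, visit_counts, outcome in run_mcgs(puzzle, net, simulations):
            states.append(state)
            target_policies.append(visit_counts / visit_counts.sum())
            target_values.append(outcome)

    # Learning (apprentice): fit the ResNet to the expert targets.
    x = torch.stack(states)
    pi = torch.stack(target_policies)
    z = torch.tensor(target_values, dtype=torch.float32)

    policy_logits, value = net(x)
    policy_loss = F.cross_entropy(policy_logits, pi)   # imitate the search's move distribution
    value_loss = F.mse_loss(value.squeeze(-1), z)      # predict the final outcome
    loss = policy_loss + value_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()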


Repository Layout

.
├── flowzero_src/
│   ├── data/
│   │   ├── handcrafted/       # Curated puzzle definitions
│   │   ├── synthetic/         # Automatically generated puzzles
│   │   └── unsolvable/        # Negative examples (e.g. unsolvable_cross.txt)
│   ├── flowfree/
│   │   ├── game.py            # ✅ Board representation, move validation
│   │   └── solver.py          # ✅ SAT encoder & solver for data generation
│   ├── gym/
│   │   └── flowfree_gym_env.py # 🚧 Gymnasium environment wrapper
│   ├── mcgs/
│   │   └── mcgs.py            # 🚧 Monte Carlo Graph Search implementation
│   ├── util/                  # ✅ Helper functions and utilities
│   ├── generate_boards.py     # ✅ Procedural puzzle generator
│   └── train.py               # 📋 Expert-Iteration training (not yet implemented)
├── tests/                     # ✅ pytest suite with comprehensive coverage
│   ├── test_game/
│   ├── test_mcgs/
│   ├── test_utilities/
│   └── conftest.py
├── requirements.txt
├── pyproject.toml
├── config.yaml
├── LICENSE
└── .github/
    └── workflows/ci.yml       # ✅ Linting, formatting, and test automation

Installation

Create and activate a Python 3.10+ virtual environment (Python 3.13 recommended):

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Run tests to verify installation:

pytest

What Works Now

You can currently:

  • Play Flow Free puzzles using the game engine
  • Generate puzzles of various difficulties
  • Solve puzzles using the SAT-based solver to verify solvability
  • Run MCGS simulations on puzzles (experimental, under testing)
  • Experiment with the in-progress Gymnasium environment (see the usage sketch below)
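
A hypothetical quick-start, assuming names inferred from the repository layout (the actual classes, functions, and data file names may differ):

from flowzero_src.flowfree.game import Board
from flowzero_src.flowfree.solver import solve
from flowzero_src.gym.flowfree_gym_env import FlowFreeEnv

# Load a puzzle and check solvability with the SAT-based solver.
# "example.txt" is a placeholder file name.
board = Board.from_file("flowzero_src/data/handcrafted/example.txt")
solution = solve(board)
print("solvable" if solution is not None else "unsolvable")

# Standard Gymnasium interaction loop (the environment is still experimental).
env = FlowFreeEnv(board)
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random agent for illustration
    obs, reward, terminated, truncated, info = env.step(action)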

Roadmap

The following components are planned for future development:

  1. Complete MCGS validation - Finish testing and refining the Monte Carlo Graph Search implementation
  2. ResNet architecture - Implement the 6-block ResNet for policy and value prediction
  3. Training pipeline - Build the Expert-Iteration loop combining MCGS and neural network training
  4. Self-play system - Create infrastructure for generating training data through self-play
  5. Evaluation framework - Develop metrics and benchmarks to track agent improvement
  6. Model management - Add checkpointing, versioning, and model comparison tools

Acknowledgments & References

Expert Iteration: Anthony, Tian & Barber (2017), "Thinking Fast and Slow with Deep Learning and Tree Search" (arXiv:1705.08439)

AlphaZero: Silver et al. (2017), "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (arXiv:1712.01815)

Gymnasium: Towers et al. (2024), "Gymnasium: A Standard Interface for Reinforcement Learning Environments" (arXiv:2407.17032)

Special Thanks: Matt Zucker, Ben Torvaney, Loki Chow, and contributors

License & Disclaimer

Apache 2.0 License. This project is not affiliated with Big Duck Games LLC or DeepMind.
