⚠️ Work in Progress: This project is under active development. Core infrastructure is complete, but the neural network training pipeline and full Expert-Iteration loop are not yet implemented.
FlowZero is a search-augmented reinforcement-learning agent, inspired by AlphaZero, designed to solve Flow Free puzzles from first principles. The goal is to combine a hand-rolled Monte Carlo Graph Search with a lightweight ResNet policy-value network in an Expert-Iteration loop, while relying on established libraries only for tensor operations, logging, and continuous integration.
- ✅ Flow Free game engine with complete board representation and move validation
- ✅ SAT-based puzzle solver for generating verified puzzle datasets
- ✅ Monte Carlo Graph Search (MCGS) with UCB1 selection and configurable rollouts (see the selection sketch after this list)
- ✅ Puzzle-generation pipeline (handcrafted, synthetic, and unsolvable examples)
- ✅ Configuration management via YAML
- ✅ CI/CD pipeline with automated testing, linting, and formatting
- 🚧 MCGS refinement and testing (currently being validated and optimized)
- 🚧 Gymnasium environment for RL integration
- 📋 ResNet policy-value network architecture
- 📋 Expert-Iteration training loop that combines MCGS with neural-network learning
- 📋 Self-play data-generation pipeline
- 📋 Model checkpointing and evaluation infrastructure
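The MCGS selection step uses UCB1. As a minimal sketch of that rule (the `Node` fields and function name here are illustrative, not the actual `mcgs.py` API), each child is scored by its mean simulation value plus an exploration bonus:

```python
import math

# Illustrative node structure; the real mcgs.py API may differ.
class Node:
    def __init__(self):
        self.children = []      # child Node objects
        self.visits = 0         # number of simulations through this node
        self.value_sum = 0.0    # accumulated simulation outcomes

def ucb1_select(node: Node, c: float = 1.414) -> Node:
    """Pick the child maximizing mean value plus an exploration bonus."""
    def score(child: Node) -> float:
        if child.visits == 0:
            return float("inf")  # always try unvisited children first
        mean_value = child.value_sum / child.visits
        exploration = c * math.sqrt(math.log(node.visits) / child.visits)
        return mean_value + exploration
    return max(node.children, key=score)
```

With `c ≈ √2`, the bonus balances exploiting high-value children against revisiting under-explored ones.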
Flow Free puzzles are cast as deterministic, episodic Markov Decision Processes (MDPs). The planned training will proceed in repeated Expert-Iteration cycles:
- Planning (Expert): MCGS runs a fixed number of simulations per move, using the current ResNet's policy and value estimates.
- Learning (Apprentice): a 6-block ResNet will be trained to
  - imitate the search graph's move distribution (cross-entropy loss), and
  - predict final outcomes (mean-squared error loss).
This self-play framework will yield continual policy improvement without human-labeled data.
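Concretely, the apprentice's two loss terms could be combined as in the sketch below. This assumes PyTorch as the tensor library; the tensor names and shapes are illustrative, and only the cross-entropy and MSE terms come from the design above:

```python
import torch
import torch.nn.functional as F

def expert_iteration_loss(policy_logits: torch.Tensor,
                          value_pred: torch.Tensor,
                          search_policy: torch.Tensor,
                          outcome: torch.Tensor) -> torch.Tensor:
    """Cross-entropy to the search's move distribution + MSE to the final outcome."""
    # Policy term: cross-entropy between the network's move distribution
    # and the visit-count distribution produced by MCGS.
    policy_loss = -(search_policy * F.log_softmax(policy_logits, dim=-1)).sum(dim=-1).mean()
    # Value term: mean-squared error against the episode's final outcome.
    value_loss = F.mse_loss(value_pred.squeeze(-1), outcome)
    return policy_loss + value_loss
```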
```
.
├── flowzero_src/
│   ├── data/
│   │   ├── handcrafted/          # Curated puzzle definitions
│   │   ├── synthetic/            # Automatically generated puzzles
│   │   └── unsolvable/           # Negative examples (e.g. unsolvable_cross.txt)
│   ├── flowfree/
│   │   ├── game.py               # ✅ Board representation, move validation
│   │   └── solver.py             # ✅ SAT encoder & solver for data generation
│   ├── gym/
│   │   └── flowfree_gym_env.py   # 🚧 Gymnasium environment wrapper
│   ├── mcgs/
│   │   └── mcgs.py               # 🚧 Monte Carlo Graph Search implementation
│   ├── util/                     # ✅ Helper functions and utilities
│   ├── generate_boards.py        # ✅ Procedural puzzle generator
│   └── train.py                  # 📋 Expert-Iteration training (not yet implemented)
├── tests/                        # ✅ pytest suite with comprehensive coverage
│   ├── test_game/
│   ├── test_mcgs/
│   ├── test_utilities/
│   └── conftest.py
├── requirements.txt
├── pyproject.toml
├── config.yaml
├── LICENSE
└── .github/
    └── workflows/ci.yml          # ✅ Linting, formatting, and test automation
```
Create and activate a Python 3.10+ virtual environment (Python 3.13 recommended), then install the dependencies:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Run the tests to verify the installation:

```bash
pytest
```

You can currently:
- Play Flow Free puzzles using the game engine
- Generate puzzles of various difficulties
- Solve puzzles using the SAT-based solver to verify solvability
- Run MCGS simulations on puzzles (experimental, under testing; see the usage sketch after this list)
- Use the Gymnasium environment for custom RL experiments
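A round trip through these components might look like the following sketch. The module paths match the project tree above, but every class, function, and file name here is a hypothetical stand-in, not the actual API:

```python
# Hypothetical usage sketch; the real names in flowzero_src will differ.
from flowzero_src.flowfree.game import Game     # assumed class in game.py
from flowzero_src.flowfree.solver import solve  # assumed function in solver.py
from flowzero_src.mcgs.mcgs import MCGS         # assumed class in mcgs.py

# Load a curated puzzle and confirm it is solvable via the SAT solver.
game = Game.from_file("flowzero_src/data/handcrafted/example.txt")  # hypothetical file
assert solve(game) is not None

# Run a fixed budget of MCGS simulations and take the best move found.
search = MCGS(game, simulations=200)
move = search.best_move()
```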
The following components are planned for future development:
- Complete MCGS validation - Finish testing and refining the Monte Carlo Graph Search implementation
- ResNet architecture - Implement the 6-block ResNet for policy and value prediction (a rough sketch follows this list)
- Training pipeline - Build the Expert-Iteration loop combining MCGS and neural network training
- Self-play system - Create infrastructure for generating training data through self-play
- Evaluation framework - Develop metrics and benchmarks to track agent improvement
- Model management - Add checkpointing, versioning, and model comparison tools
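For the planned network, a rough sketch of a 6-block policy-value ResNet follows. This assumes PyTorch; the block count comes from the design above, while channel widths, head sizes, and the input encoding are placeholder assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: conv-BN-ReLU-conv-BN, then skip connection and ReLU."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)

class PolicyValueNet(nn.Module):
    """6 residual blocks with separate policy and value heads (sizes are placeholders)."""
    def __init__(self, in_planes: int, channels: int, board_size: int, n_moves: int):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(),
        )
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(6)])
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.Flatten(),
            nn.Linear(2 * board_size * board_size, n_moves),
        )
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(board_size * board_size, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.blocks(self.stem(x))
        return self.policy_head(h), self.value_head(h)
```

The two heads share the residual trunk, so a single forward pass yields both the move distribution for MCGS priors and the value estimate used during backup.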
Expert Iteration: Anthony, Tian & Barber (2017), *Thinking Fast and Slow with Deep Learning and Tree Search* ([arXiv:1705.08439](https://arxiv.org/abs/1705.08439))
AlphaZero: Silver et al. (2017), *Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm* ([arXiv:1712.01815](https://arxiv.org/abs/1712.01815))
Gymnasium: Towers et al. (2024), *Gymnasium: A Standard Interface for Reinforcement Learning Environments* ([arXiv:2407.17032](https://arxiv.org/abs/2407.17032))
Special Thanks: Matt Zucker, Ben Torvaney, Loki Chow, and contributors
Apache 2.0 License. This project is not affiliated with Big Duck Games LLC or DeepMind.