A production-grade Tetris-playing AI trained with Deep Q-Network (DQN) reinforcement learning, with dependencies managed by UV.
- Game Environment: Full Tetris game logic (no external game libraries required for simulation)
- DQN Agent: Deep Q-Network with experience replay and target network
- Epsilon-Greedy Exploration: Balances exploration and exploitation
- Fast Training: Optimized NumPy-based game logic
- Monitoring: Real-time training visualization and metrics
- Checkpointing: Save/load model weights for reproducibility
- UV Package Management: Ultra-fast Python dependency management
tetris_agent/
├── pyproject.toml        # UV project config
├── src/
│   ├── main.py           # Training & evaluation loop
│   ├── tetris.py         # Game environment
│   ├── agent.py          # Not in this version (see model.py)
│   ├── model.py          # DQN architecture & agent
│   ├── utils.py          # Replay buffer & monitoring
│   └── config.py         # Hyperparameters
├── models/               # Saved checkpoints
├── README.md
└── .gitignore
- Install UV (if not already installed):
  pip install uv
- Create and set up the project:
  cd c:\Users\webst\OneDrive\Desktop\Tetris
  uv sync
Start training the agent:
uv run python src/main.py --mode train
Training will:
- Run for 500 episodes (configurable in config.py)
- Save checkpoints every 500 episodes to models/
- Display training statistics every 50 episodes
- Generate a training progress plot at models/training_progress.png
Evaluate a trained model:
uv run python src/main.py --mode eval --model models/tetris_agent_final.pt --eval-episodes 10
- State Representation: 213-dimensional feature vector
  - Board state: 200 dimensions (20×10 flattened)
  - Current piece type: 7 dimensions (one-hot)
  - Piece rotation: 4 dimensions (one-hot)
  - Piece position: 2 dimensions (normalized)
- Actions (4 total):
  - 0: Move left
  - 1: Move right
  - 2: Rotate clockwise
  - 3: Drop piece
- Reward Function:
  - +10 per line cleared
  - -1 per time step
  - -10 for game over
  - -0.5 per piece locked
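As a rough illustration of the state layout and reward terms listed above, the sketch below builds a 213-element vector and sums the reward components for one step. The function names (encode_state, step_reward), the order of the segments, the position scaling, and the assumption that the reward terms are simply added together are hypothetical and are not taken from tetris.py.

```python
from typing import List

BOARD_ROWS, BOARD_COLS = 20, 10
NUM_PIECES, NUM_ROTATIONS = 7, 4


def one_hot(index: int, size: int) -> List[float]:
    """Return a list of `size` zeros with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_state(board, piece_type: int, rotation: int, row: int, col: int) -> List[float]:
    """Concatenate board cells, one-hot piece and rotation, and normalized position."""
    flat_board = [float(cell != 0) for board_row in board for cell in board_row]
    # Assumed normalization: scale row and column indices to [0, 1].
    position = [row / (BOARD_ROWS - 1), col / (BOARD_COLS - 1)]
    return (flat_board
            + one_hot(piece_type, NUM_PIECES)
            + one_hot(rotation, NUM_ROTATIONS)
            + position)


def step_reward(lines_cleared: int, piece_locked: bool, game_over: bool) -> float:
    """Sum the listed reward terms for a single time step (assumed additive)."""
    reward = -1.0                     # per time step
    reward += 10.0 * lines_cleared    # per line cleared
    if piece_locked:
        reward -= 0.5                 # per piece locked
    if game_over:
        reward -= 10.0                # game over
    return reward


empty_board = [[0] * BOARD_COLS for _ in range(BOARD_ROWS)]
state = encode_state(empty_board, piece_type=2, rotation=1, row=0, col=4)
print(len(state))  # 213
print(step_reward(lines_cleared=2, piece_locked=True, game_over=False))  # 18.5
```

The length check (200 + 7 + 4 + 2 = 213) is a quick way to confirm that an encoder matches the input size the network expects.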
Input (213) → FC(256) → ReLU → FC(256) → ReLU → FC(4 actions)
- Architecture: 2 hidden layers with 256 units
- Activation: ReLU
- Output: Q-values for each action
- Loss: Mean Squared Error (MSE)
- Optimizer: Adam (lr=1e-4)
- Algorithm: Deep Q-Learning with Experience Replay
- Batch Size: 32
- Replay Buffer: 100,000 transitions
- Gamma (discount): 0.99
- Epsilon Decay: 0.995 per episode
- Target Update: Every 1000 training steps
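For a sense of model size, the layer widths in the architecture line above imply the parameter count worked out below. This is plain arithmetic and assumes every fully connected layer has a bias term, which the README does not state explicitly; linear_params is a hypothetical helper.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# Input (213) -> FC(256) -> FC(256) -> FC(4 actions)
layer_sizes = [213, 256, 256, 4]
total_params = sum(
    linear_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)
print(total_params)  # 121604
```

At roughly 120k parameters the network is small, so training time is likely to be dominated by environment steps rather than by the forward and backward passes.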
Edit src/config.py to customize:
# Training
LEARNING_RATE = 1e-4
BATCH_SIZE = 32
GAMMA = 0.99
EPSILON_START = 1.0
EPSILON_END = 0.01
EPSILON_DECAY = 0.995
# Network
HIDDEN_DIM = 256
# Episodes
NUM_EPISODES = 500
MAX_STEPS_PER_EPISODE = 10000
Training generates:
- Console Output: Real-time metrics every 50 episodes
  Episode 50 | Avg Reward: 45.32 | Avg Lines: 4.21 | Avg Loss: 0.0234
- Plot: models/training_progress.png with 4 subplots:
  - Episode rewards + moving average
  - Lines cleared per episode
  - Episode length (steps)
  - Training loss
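The moving average in the rewards subplot can be computed with a trailing window. The sketch below is one possible version; the function name and the default window of 50 episodes (chosen to match the logging interval) are assumptions, not values read from utils.py.

```python
def moving_average(values, window=50):
    """Trailing mean over up to `window` most recent values."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / min(i + 1, window))
    return averages


print(moving_average([1, 2, 3, 4, 5], window=3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```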
Models are saved during training:
- models/tetris_agent_ep500.pt - Checkpoint at episode 500
- models/tetris_agent_final.pt - Final trained model
- models/tetris_agent_emergency.pt - Saved if training is interrupted
Load a model:
from model import DQNAgent
agent = DQNAgent(state_dim=213, device="cuda")
agent.load("models/tetris_agent_final.pt")
If you don't have a GPU, edit config.py:
DEVICE = "cpu"  # Change from "cuda"
Reduce batch size in config.py:
BATCH_SIZE = 16  # Reduce from 32
- Reduce NUM_EPISODES for quick testing
- Use DEVICE = "cuda" if available
- Increase UPDATE_FREQ to train less frequently
Experience replay stores transitions in a buffer and samples random batches for training, which breaks the correlation between consecutive samples.
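A minimal sketch of such a buffer, using only the standard library, might look like the following. The class and method names are illustrative and may differ from the implementation in utils.py; the default capacity and batch size mirror the values listed in this README.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[list, int, float, list, bool]


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity: int = 100_000) -> None:
        # A deque with maxlen silently discards the oldest transition when full.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done) -> None:
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32) -> List[Transition]:
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push([step], 0, -1.0, [step + 1], False)
print(len(buffer))  # 3
batch = buffer.sample(2)
```

Training code would normally wait until the buffer holds at least one full batch before sampling, since random.sample raises ValueError when asked for more items than are stored.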
The target network is a separate copy of the Q-network that is updated less frequently, which provides stable Q-value targets and reduces training instability.
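The periodic hard update can be sketched without any deep learning library by treating parameters as a plain dictionary. Everything here (maybe_sync_target, the dictionary layout, the fake gradient step) is a hypothetical stand-in; only the interval of 1000 training steps comes from this README.

```python
import copy

TARGET_UPDATE = 1000  # training steps between syncs, as listed in this README


def maybe_sync_target(online_params: dict, target_params: dict, step: int) -> bool:
    """Copy online parameters into the target every TARGET_UPDATE steps."""
    if step % TARGET_UPDATE == 0:
        target_params.clear()
        target_params.update(copy.deepcopy(online_params))
        return True
    return False


online = {"w": [0.0]}
target = copy.deepcopy(online)
lag_at_1500 = None
for step in range(1, 2001):
    online["w"][0] += 0.001  # stand-in for a gradient update
    maybe_sync_target(online, target, step)
    if step == 1500:
        # The target still holds the weights from step 1000.
        lag_at_1500 = online["w"][0] - target["w"][0]

print(round(lag_at_1500, 3))  # 0.5
```

With PyTorch modules, the same hard update is commonly written as target_net.load_state_dict(online_net.state_dict()), where target_net and online_net are placeholder names.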
Epsilon-greedy is an exploration strategy that selects a random action with probability ε and the greedy action otherwise, enabling the agent to discover new strategies.
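The sketch below shows one way to implement the selection rule, together with the decay schedule implied by the EPSILON_START, EPSILON_END, and EPSILON_DECAY values in config.py. It assumes a multiplicative decay applied once per episode and floored at EPSILON_END; the function names are hypothetical.

```python
import math
import random

EPSILON_START, EPSILON_END, EPSILON_DECAY = 1.0, 0.01, 0.995


def select_action(q_values, epsilon, rng=random):
    """Random action with probability epsilon, otherwise the argmax of q_values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_at(episode):
    """Assumed schedule: multiplicative per-episode decay, floored at EPSILON_END."""
    return max(EPSILON_END, EPSILON_START * EPSILON_DECAY ** episode)


print(select_action([0.1, 0.7, 0.2, 0.0], epsilon=0.0))  # 1
print(round(epsilon_at(500), 3))  # 0.082

# Number of episodes before the floor is reached under this schedule.
episodes_to_floor = math.ceil(
    math.log(EPSILON_END / EPSILON_START) / math.log(EPSILON_DECAY)
)
print(episodes_to_floor)  # 919
```

Under this assumed schedule, a 500-episode run ends with ε near 0.08 and never reaches the 0.01 floor, which takes about 919 episodes. If a fully greedy late phase is wanted, either NUM_EPISODES or EPSILON_DECAY would need adjusting.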
Q-learning learns the optimal action-value function Q(s,a), which estimates the expected return from taking action a in state s and acting optimally thereafter.
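In DQN this estimate is trained toward a one-step bootstrapped target. The sketch below computes that target and the squared error for a single transition, using the discount of 0.99 from this README; the function names are illustrative, and next_q_values stands for the target network's outputs for the next state.

```python
GAMMA = 0.99


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a').

    At a terminal transition there is no next state, so the target is just r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_sa, target):
    """Per-sample MSE term between the predicted Q(s, a) and the target."""
    return (q_sa - target) ** 2


target_value = td_target(reward=9.0, next_q_values=[1.0, 2.0, 0.5, -1.0], done=False)
print(round(target_value, 2))  # 10.98
print(td_target(reward=-11.0, next_q_values=[1.0, 2.0, 0.5, -1.0], done=True))  # -11.0
```

Averaging squared_td_error over a sampled batch gives the MSE loss named in the training settings.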
- Add Double DQN for reduced overestimation
- Implement Dueling DQN architecture
- Add prioritized experience replay
- Frame stacking for temporal awareness
- Pygame visualization during training
- Policy gradient methods (A3C, PPO)
- Rust backend for game physics (pyo3)
MIT License - Feel free to use for learning and research.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
Built with ❤️ for AI + Tetris enthusiasts