🎮 Tetris AI Agent with Deep Q-Learning

A production-grade Tetris-playing agent trained with Deep Q-Network (DQN) reinforcement learning, with dependencies managed by UV.

🎯 Features

- Game Environment: Full Tetris game logic (no external game libraries required for simulation)
- DQN Agent: Deep Q-Network with experience replay and target network
- Epsilon-Greedy Exploration: Balances exploration and exploitation
- Fast Training: Optimized NumPy-based game logic
- Monitoring: Real-time training visualization and metrics
- Checkpointing: Save/load model weights for reproducibility
- UV Package Management: Ultra-fast Python dependency management

📋 Project Structure

tetris_agent/
├── pyproject.toml         # UV project config
├── src/
│   ├── main.py            # Training & evaluation loop
│   ├── tetris.py          # Game environment
│   ├── agent.py           # Not included in this version (the agent lives in model.py)
│   ├── model.py           # DQN architecture & agent
│   ├── utils.py           # Replay buffer & monitoring
│   └── config.py          # Hyperparameters
├── models/                # Saved checkpoints
├── README.md
└── .gitignore

🚀 Quick Start

Installation

Install UV (if not already installed):
```
pip install uv
```

Set up the project from the project root (the directory containing pyproject.toml):

cd tetris_agent
uv sync

Training

Start training the agent:

uv run python src/main.py --mode train

Training will:

- Run for 500 episodes (configurable in config.py)
- Save checkpoints every 500 episodes to models/
- Display training statistics every 50 episodes
- Generate a training progress plot at models/training_progress.png

Evaluation

Evaluate a trained model:

uv run python src/main.py --mode eval --model models/tetris_agent_final.pt --eval-episodes 10
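
The flags used in the training and evaluation commands could be parsed along the following lines. This is only a sketch: the flag names come from the commands above, while the defaults, help strings, and the `build_parser` name are assumptions and may differ from what src/main.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags that appear in the README commands."""
    parser = argparse.ArgumentParser(description="Tetris DQN agent")
    # Defaults below are assumptions made for this sketch.
    parser.add_argument("--mode", choices=["train", "eval"], default="train",
                        help="train a new agent or evaluate a saved one")
    parser.add_argument("--model", type=str, default=None,
                        help="path to a saved checkpoint (used in eval mode)")
    parser.add_argument("--eval-episodes", type=int, default=10,
                        help="number of evaluation episodes")
    return parser


# Parse the evaluation command from the README instead of sys.argv.
args = build_parser().parse_args(
    ["--mode", "eval", "--model", "models/tetris_agent_final.pt",
     "--eval-episodes", "10"]
)
print(args.mode, args.model, args.eval_episodes)
```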

🧠 Architecture

Game Environment (`tetris.py`)

- State Representation: 213-dimensional feature vector
  - Board state: 200 dimensions (20×10 flattened)
  - Current piece type: 7 dimensions (one-hot)
  - Piece rotation: 4 dimensions (one-hot)
  - Piece position: 2 dimensions (normalized)
- Actions (4 total):
  - 0: Move left
  - 1: Move right
  - 2: Rotate clockwise
  - 3: Drop piece
- Reward Function:
  - +10 per line cleared
  - -1 per time step
  - -10 for game over
  - -0.5 per piece locked
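
To make the 213-dimensional layout and the reward terms concrete, here is a minimal sketch in plain Python. The function names (`encode_state`, `step_reward`), the normalization of the position by board width and height, and the assumption that reward terms simply add up within one step are illustrative guesses, not taken from tetris.py.

```python
from typing import List

BOARD_HEIGHT, BOARD_WIDTH = 20, 10
NUM_PIECES, NUM_ROTATIONS = 7, 4


def one_hot(index: int, size: int) -> List[float]:
    """Return a vector of zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_state(board, piece_type: int, rotation: int, x: int, y: int) -> List[float]:
    """Concatenate board (200) + piece type (7) + rotation (4) + position (2)."""
    flat_board = [float(cell) for row in board for cell in row]
    # Assumed normalization: divide by the board dimensions.
    position = [x / BOARD_WIDTH, y / BOARD_HEIGHT]
    return (flat_board + one_hot(piece_type, NUM_PIECES)
            + one_hot(rotation, NUM_ROTATIONS) + position)


def step_reward(lines_cleared: int, piece_locked: bool, game_over: bool) -> float:
    """Sum the reward terms listed above for a single time step."""
    reward = -1.0                     # per time step
    reward += 10.0 * lines_cleared    # per line cleared
    if piece_locked:
        reward -= 0.5                 # per piece locked
    if game_over:
        reward -= 10.0                # game over
    return reward


empty_board = [[0] * BOARD_WIDTH for _ in range(BOARD_HEIGHT)]
state = encode_state(empty_board, piece_type=2, rotation=1, x=4, y=0)
print(len(state))  # 213
```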

Neural Network (`model.py`)

Input (213) → FC(256) → ReLU → FC(256) → ReLU → FC(4 actions)

- Architecture: 2 hidden layers with 256 units each
- Activation: ReLU
- Output: Q-values for each action
- Loss: Mean Squared Error (MSE)
- Optimizer: Adam (lr=1e-4)
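
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out from the layer widths. The sketch assumes every fully connected layer has a bias term, which is the usual default but is not stated above.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer (bias assumed)."""
    return in_features * out_features + out_features


layer_sizes = [213, 256, 256, 4]  # input, two hidden layers, one output per action
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [54784, 65792, 1028] 121604
```

At roughly 120k parameters the model is small, so training on a CPU is plausible.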

Training Algorithm

- Algorithm: Deep Q-Learning with Experience Replay
- Batch Size: 32
- Replay Buffer: 100,000 transitions
- Gamma (discount): 0.99
- Epsilon Decay: 0.995 per episode
- Target Update: Every 1000 training steps
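
The epsilon settings interact with the episode count, which is easy to overlook. The sketch below assumes the common multiplicative schedule, epsilon = max(EPSILON_END, EPSILON_START × EPSILON_DECAY^episode); the actual update rule in the code may differ.

```python
EPSILON_START, EPSILON_END, EPSILON_DECAY = 1.0, 0.01, 0.995


def epsilon_at(episode: int) -> float:
    """Epsilon after `episode` decay steps, clipped at the floor."""
    return max(EPSILON_END, EPSILON_START * EPSILON_DECAY ** episode)


def episodes_to_floor() -> int:
    """Smallest number of episodes after which epsilon reaches EPSILON_END."""
    episode = 0
    while EPSILON_START * EPSILON_DECAY ** episode > EPSILON_END:
        episode += 1
    return episode


print(round(epsilon_at(500), 4))
print(episodes_to_floor())
```

Under this assumption, epsilon is still about 0.08 after the default 500 episodes and only reaches the 0.01 floor after 919 episodes, so a 500-episode run never becomes fully greedy.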

📊 Hyperparameters

Edit src/config.py to customize:

# Training
LEARNING_RATE = 1e-4
BATCH_SIZE = 32
GAMMA = 0.99
EPSILON_START = 1.0
EPSILON_END = 0.01
EPSILON_DECAY = 0.995

# Network
HIDDEN_DIM = 256

# Episodes
NUM_EPISODES = 500
MAX_STEPS_PER_EPISODE = 10000

📈 Monitoring

Training generates:

Console Output: Real-time metrics every 50 episodes

Episode 50 | Avg Reward: 45.32 | Avg Lines: 4.21 | Avg Loss: 0.0234

Plot: models/training_progress.png with 4 subplots:
- Episode rewards + moving average
- Lines cleared per episode
- Episode length (steps)
- Training loss
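
The moving average in the first subplot and the console line format can be reproduced with a few lines of standard-library Python. The helper names and the window handling are assumptions made for this sketch; the monitoring code in utils.py may compute them differently.

```python
from collections import deque


def moving_average(values, window):
    """Trailing mean over the last `window` values (shorter at the start)."""
    averages = []
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


def format_log(episode, avg_reward, avg_lines, avg_loss):
    """Format one console line in the style shown above."""
    return (f"Episode {episode} | Avg Reward: {avg_reward:.2f} | "
            f"Avg Lines: {avg_lines:.2f} | Avg Loss: {avg_loss:.4f}")


print(moving_average([1, 2, 3, 4], window=2))  # [1.0, 1.5, 2.5, 3.5]
print(format_log(50, 45.32, 4.21, 0.0234))
```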

💾 Model Checkpoints

Models are saved during training:

- models/tetris_agent_ep500.pt - Checkpoint at episode 500
- models/tetris_agent_final.pt - Final trained model
- models/tetris_agent_emergency.pt - Saved if training is interrupted
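
One way the three checkpoint files could come about is a training loop that saves periodically and catches Ctrl+C. This is a hedged sketch, not the code in main.py: `run_training`, `RecordingAgent`, and `interrupted_episode` are hypothetical names, and the stand-in agent only records paths instead of writing files.

```python
class RecordingAgent:
    """Stand-in for the real agent; records save paths instead of writing files."""

    def __init__(self):
        self.saved = []

    def save(self, path):
        self.saved.append(path)


def run_training(agent, num_episodes, train_one_episode, checkpoint_every=500):
    """Train, checkpoint periodically, and save an emergency copy on Ctrl+C."""
    try:
        for episode in range(1, num_episodes + 1):
            train_one_episode(agent, episode)
            if episode % checkpoint_every == 0:
                agent.save(f"models/tetris_agent_ep{episode}.pt")
        agent.save("models/tetris_agent_final.pt")
        return True
    except KeyboardInterrupt:
        # Keep whatever was learned so far before stopping.
        agent.save("models/tetris_agent_emergency.pt")
        return False


def interrupted_episode(agent, episode):
    """Simulate the user pressing Ctrl+C during episode 3."""
    if episode == 3:
        raise KeyboardInterrupt


agent = RecordingAgent()
completed = run_training(agent, num_episodes=500, train_one_episode=interrupted_episode)
print(completed, agent.saved)
```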

Load a model:

from model import DQNAgent
agent = DQNAgent(state_dim=213, device="cuda")
agent.load("models/tetris_agent_final.pt")

🔧 Troubleshooting

CUDA Issues

If you don't have a GPU, edit config.py:

DEVICE = "cpu"  # Change from "cuda"

Out of Memory

Reduce batch size in config.py:

BATCH_SIZE = 16  # Reduce from 32

Training Too Slow

- Reduce NUM_EPISODES for quick testing
- Use DEVICE = "cuda" if available
- Increase UPDATE_FREQ to train less frequently

📚 Key Concepts

Experience Replay

Stores transitions in a buffer and samples random batches for training, breaking correlations between consecutive samples.
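
A minimal replay buffer illustrating the idea might look like the following. It is a sketch using a fixed-size ring buffer over a Python list; the class in utils.py may be structured differently, and the method names here are assumptions.

```python
import random


class ReplayBuffer:
    """Fixed-capacity buffer that overwrites the oldest transition when full."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.storage = []
        self.next_index = 0  # slot the next transition will be written to

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_index] = transition
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size=32):
        """Draw a uniform random batch and regroup it field by field."""
        batch = random.sample(self.storage, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.push(state=i, action=0, reward=float(i), next_state=i + 1, done=False)
print(len(buffer), sorted(t[0] for t in buffer.storage))  # 3 [2, 3, 4]
```

Sampling uniformly from the whole buffer is what mixes old and recent transitions within one batch.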

Target Network

A separate copy of the Q-network, updated less frequently than the online network, that supplies the Q-value targets. Holding the targets fixed between updates reduces training instability.
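
The periodic "hard" update can be sketched without any deep learning framework by treating the parameters as a plain dictionary. `TargetSync` is a hypothetical helper for illustration; in PyTorch the copy step typically corresponds to `target.load_state_dict(online.state_dict())`.

```python
import copy


class TargetSync:
    """Keep a frozen copy of the online parameters, refreshed every N steps."""

    def __init__(self, online_params, update_every=1000):
        self.online = online_params                  # updated by the optimizer
        self.target = copy.deepcopy(online_params)   # frozen between syncs
        self.update_every = update_every
        self.steps = 0

    def after_training_step(self) -> bool:
        """Count one training step; return True if the target was refreshed."""
        self.steps += 1
        if self.steps % self.update_every == 0:
            self.target = copy.deepcopy(self.online)
            return True
        return False


online = {"w": [0.0]}
sync = TargetSync(online, update_every=3)
online["w"][0] = 1.0  # pretend the optimizer changed the online weights
refreshed = [sync.after_training_step() for _ in range(3)]
print(refreshed, sync.target)  # [False, False, True] {'w': [1.0]}
```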

Epsilon-Greedy

An exploration strategy that selects a random action with probability ε and the highest-valued action otherwise, so the agent keeps discovering new strategies while exploiting what it has already learned.
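
In code, the rule is only a few lines. The function name `select_action` and the list-of-floats representation of Q-values are assumptions made for this sketch.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_values)), key=lambda action: q_values[action])


q_values = [0.1, 0.9, 0.3, 0.2]  # one value per action: left, right, rotate, drop
print(select_action(q_values, epsilon=0.0))  # 1 (always greedy)
```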

Q-Learning

Learns the optimal action-value function Q(s, a), which estimates the expected discounted return from taking action a in state s and acting optimally afterwards.
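
For a single transition, the standard one-step Q-learning target is r + γ · max Q_target(s', a'), with the bootstrap term dropped when the episode has ended. The sketch below computes that target and the squared error that MSE averages over a batch; the function names are illustrative, and it assumes the next-state Q-values come from the target network.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target for a single transition."""
    if done:
        return reward  # no future return after game over
    return reward + gamma * max(next_q_values)


def squared_error(predicted_q, target):
    """Per-sample loss term; MSE is the mean of this over the batch."""
    return (predicted_q - target) ** 2


target = td_target(reward=10.0, next_q_values=[1.0, 2.0, 0.5, 0.0], done=False)
print(target, squared_error(11.0, target))
```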

🎓 Learning Resources

📝 Future Enhancements

- Add Double DQN for reduced overestimation
- Implement Dueling DQN architecture
- Add prioritized experience replay
- Frame stacking for temporal awareness
- Pygame visualization during training
- Policy gradient methods (A3C, PPO)
- Rust backend for game physics (pyo3)

📄 License

MIT License - Feel free to use for learning and research.

🤝 Contributing

Contributions welcome! Please:

- Fork the repository
- Create a feature branch
- Submit a pull request

Built with ❤️ for AI + Tetris enthusiasts
