# STAC-RL Training on Google Colab GPU

This notebook sets up and runs the STAC-RL chess training with CUDA acceleration.

**Before starting:**
1. Go to Runtime > Change runtime type and select a GPU (T4 or better)
2. Update the `REPO_URL` below with your GitHub repository URL

In [None]:
# Configuration
REPO_URL = "https://github.com/YOUR_USERNAME/YOUR_REPO.git"  # UPDATE THIS!
BRANCH = "main"
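
A small guard can catch a forgotten placeholder before the clone step fails with a less obvious error. The sketch below assumes that "unconfigured" means the URL still contains the literal `YOUR_USERNAME` or `YOUR_REPO` strings from the template. The helper name `is_configured` is hypothetical and not part of the repository.

```python
def is_configured(repo_url: str) -> bool:
    """Return True if the URL is non-empty and free of template placeholders."""
    placeholders = ("YOUR_USERNAME", "YOUR_REPO")
    return bool(repo_url.strip()) and not any(p in repo_url for p in placeholders)


# Example using the template value from the configuration cell.
template_url = "https://github.com/YOUR_USERNAME/YOUR_REPO.git"
if not is_configured(template_url):
    print("REPO_URL still contains placeholders; update it before cloning.")
```

In the notebook, the same check could be applied to `REPO_URL` directly after it is set.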

## 1. Check GPU Availability

In [None]:
!nvidia-smi
!nvcc --version

## 2. Clone Repository

In [None]:
import os

# Clone the repo
if os.path.exists('stac-rl'):
    print("Repository already cloned, pulling latest changes...")
    !cd stac-rl && git pull
else:
    print("Cloning repository...")
    !git clone --branch {BRANCH} {REPO_URL} stac-rl

%cd stac-rl

## 3. Install Dependencies

In [None]:
# Install CMake if needed (Colab usually has it)
!apt-get update -qq
!apt-get install -y cmake build-essential

# Install Python dependencies (data generation and PyTorch training)
!pip install -q python-chess numpy torch

## 4. Compile C++ Code with CUDA

In [None]:
# Create build directory
!mkdir -p build
%cd build

# Configure with CMake (CUDA enabled; architecture 75 targets the T4, see Troubleshooting for other GPUs)
!cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DUSE_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES=75

# Build (use all available cores)
!make -j$(nproc)

%cd ..

## 5. Generate Training Data (Optional)

Generate chess game data using Python (faster for data generation):

In [None]:
# Generate 1000 games for training
!mkdir -p data
!python3 scripts/self_play.py --num-games 1000 --output data/games.npz --max-moves 200
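
Before training, it can help to confirm what the generated file contains. The sketch below lists the arrays stored in an `.npz` archive with their shapes and dtypes. The array names written by `scripts/self_play.py` are not documented here, so the code makes no assumption about them. The helper name `summarize_npz` is hypothetical.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz file."""
    # np.load returns a lazy archive; arrays are read only when indexed.
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Only inspect the file if the data generation step has produced it.
if os.path.exists("data/games.npz"):
    for name, (shape, dtype) in summarize_npz("data/games.npz").items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```

If the archive stores pickled Python objects rather than plain arrays, `np.load` refuses to read them unless `allow_pickle=True` is passed.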

## 6. Run Training with GPU

In [None]:
# Option A: C++ Training (if implemented)
!./build/stac_train

### Or train with PyTorch (if the C++ trainer is not ready):

In [None]:
# Option B: PyTorch Training
!python3 scripts/train_pytorch.py \
    --data data/games.npz \
    --device cuda \
    --d-model 512 \
    --n-layers 8 \
    --n-heads 8 \
    --epochs 20 \
    --batch-size 256 \
    --lr 1e-4

## 7. Monitor GPU Usage (Optional)

In [None]:
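# Colab runs one cell at a time, so start this logger BEFORE the training cell.
# It appends GPU utilization and memory use to gpu_log.csv every 5 seconds.
# Inspect with `!tail -n 5 gpu_log.csv`; stop with `!pkill nvidia-smi`.
!nohup nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total --format=csv -l 5 > gpu_log.csv 2>&1 &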

## 8. Download Trained Model

In [None]:
import os
from google.colab import files

# Download the final model
if os.path.exists('checkpoints/model_final.pt'):
    files.download('checkpoints/model_final.pt')
elif os.path.exists('models/final_model.bin'):
    files.download('models/final_model.bin')
else:
    print("No trained model found!")
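
The `if`/`elif` chain above can also be written as a search over a list of candidate paths, which makes it easier to add more output locations later. This is a sketch only. The helper name `find_checkpoint` is hypothetical, and the two paths are the ones already used in the download cell.

```python
import os


def find_checkpoint(candidates):
    """Return the first path in `candidates` that exists on disk, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


checkpoint = find_checkpoint(["checkpoints/model_final.pt", "models/final_model.bin"])
print(checkpoint or "No trained model found!")
```

On Colab, a non-`None` result can be passed to `files.download(checkpoint)`.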

## Troubleshooting

**If compilation fails:**
- Check CUDA architecture: `!nvidia-smi --query-gpu=compute_cap --format=csv`
- Set `CMAKE_CUDA_ARCHITECTURES` to the reported compute capability without the dot (e.g., 70 for V100, 75 for T4, 80 for A100)
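
The lookup above can be automated. The sketch below queries the first GPU's compute capability and converts it to the form CMake expects. It falls back to a default when `nvidia-smi` is missing or the driver does not support the `compute_cap` query. The helper names and the default of `"75"` (taken from the build cell) are assumptions for illustration.

```python
import shutil
import subprocess


def compute_cap_to_arch(compute_cap: str) -> str:
    """Convert a compute capability such as '7.5' to the CMake form '75'."""
    major, minor = compute_cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_cuda_arch(default: str = "75") -> str:
    """Return the first GPU's architecture number, or `default` on any failure."""
    if shutil.which("nvidia-smi") is None:
        return default
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
        # One line per GPU; use the first.
        return compute_cap_to_arch(result.stdout.splitlines()[0])
    except (subprocess.SubprocessError, OSError, IndexError, ValueError):
        return default


arch = detect_cuda_arch()
print(f"CMAKE_CUDA_ARCHITECTURES={arch}")
```

In a notebook cell, the result can be interpolated into the configure command, for example `-DCMAKE_CUDA_ARCHITECTURES={arch}`.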

**If out of memory:**
- Reduce the batch size (`--batch-size`)
- Reduce the model dimensions (`--d-model`, `--n-layers`)
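
To get a feel for how much each setting matters, the sketch below estimates the weight count of a stack of transformer blocks. It assumes a standard block (four attention projection matrices plus a feed-forward layer with 4x expansion) and ignores embeddings, biases, and layer norms. The actual STAC-RL architecture may differ, so treat the numbers as rough guidance only. The function name is hypothetical.

```python
def estimate_block_params(d_model: int, n_layers: int, ffn_mult: int = 4) -> int:
    """Approximate weight count for a stack of standard transformer blocks.

    Per block: 4 * d^2 for the attention projections (Q, K, V, output)
    plus 2 * ffn_mult * d^2 for the two feed-forward matrices.
    """
    per_block = 4 * d_model**2 + 2 * ffn_mult * d_model**2
    return n_layers * per_block


# First entry matches the defaults in the PyTorch training cell.
for d_model, n_layers in [(512, 8), (256, 8), (512, 4)]:
    params = estimate_block_params(d_model, n_layers)
    print(f"d_model={d_model}, n_layers={n_layers}: ~{params / 1e6:.1f}M weights")
```

Under these assumptions, halving `--d-model` cuts the weight count by about 4x, while halving `--n-layers` cuts it by 2x. Activation memory also grows with batch size, so reducing `--batch-size` is usually the first thing to try.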

**If training is slow:**
- Ensure GPU runtime is selected
- Check GPU utilization with `nvidia-smi`
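
A utilization reading is easier to act on when it is parsed into numbers. The sketch below parses one line of output from `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`. The sample line and the 50% threshold are made up for illustration, and the helper name `parse_gpu_sample` is hypothetical.

```python
def parse_gpu_sample(line: str) -> dict:
    """Parse a line such as '35, 1024, 15360' into utilization and memory fields."""
    util, used, total = (int(field) for field in line.split(","))
    return {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}


# Illustrative sample, not a measured value.
sample = parse_gpu_sample("35, 1024, 15360")
print(sample)
if sample["util_pct"] < 50:  # arbitrary threshold for this sketch
    print("GPU utilization is low; data loading or a CPU-side step may be the bottleneck.")
```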