# 🏈 NFL Analytics Engine - High Accuracy GPU Training

This notebook trains the NFL trajectory prediction model for **maximum accuracy** on a GPU, with settings chosen to guard against both overfitting and underfitting.

## Setup Requirements
1. **GPU Required**: Go to `Runtime → Change runtime type → GPU (T4 or A100 recommended)`
2. **Training Data**: Upload your `input_2023_w*.csv` files to the `train/` directory
3. **Estimated Time**: 2-4 hours on T4, 1-2 hours on A100

## 1️⃣ GPU Verification & Setup

In [None]:
# Verify GPU availability
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"✅ GPU Available: {gpu_name}")
    print(f"   Memory: {gpu_memory:.1f} GB")
    
    # Recommend batch size based on GPU
    if gpu_memory >= 40:  # A100
        recommended_batch = 128
    elif gpu_memory >= 15:  # T4/V100
        recommended_batch = 64
    else:
        recommended_batch = 32
    print(f"   Recommended batch size: {recommended_batch}")
else:
    print("❌ No GPU detected!")
    print("   Go to Runtime → Change runtime type → GPU")
    raise SystemExit("GPU required for training")

In [None]:
# Mount Google Drive (optional - for saving checkpoints)
from google.colab import drive
drive.mount('/content/drive')

# Clone or upload project
# Option 1: Clone from GitHub (replace with your repo)
# !git clone https://github.com/YOUR_USERNAME/NFL.git
# %cd NFL

# Option 2: Copy the project from Drive (if you uploaded it there)
# !cp -r /content/drive/MyDrive/NFL /content/NFL
# %cd /content/NFL

In [None]:
# Run setup script
!bash setup_colab.sh

## 2️⃣ Verify Data

In [None]:
import os
import glob

# Check for training data
data_files = glob.glob('train/input_2023_w*.csv')
print(f"Found {len(data_files)} week files:")
for f in sorted(data_files):
    size_mb = os.path.getsize(f) / 1e6
    print(f"  - {os.path.basename(f)}: {size_mb:.1f} MB")

if len(data_files) == 0:
    print("\n⚠️  No data files found!")
    print("   Upload input_2023_w*.csv files to the train/ directory")
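The cell above only counts files, so a gap in the upload (for example, week 5 missing while weeks 1 to 9 are present) passes silently. A minimal sketch that reports missing week numbers, assuming weeks are numbered contiguously from 1 and filenames follow the `input_2023_wNN.csv` pattern; `find_missing_weeks` is a hypothetical helper, not part of the project.

```python
import re

def find_missing_weeks(filenames):
    """Return week numbers absent from 1..max(found week) (hypothetical helper)."""
    # Capture the integer after '_w' in names like 'train/input_2023_w01.csv'
    pattern = re.compile(r"input_2023_w(\d+)\.csv$")
    weeks = set()
    for name in filenames:
        match = pattern.search(name)
        if match:
            weeks.add(int(match.group(1)))
    if not weeks:
        return []
    # Any week between 1 and the highest week found that has no file
    return [w for w in range(1, max(weeks) + 1) if w not in weeks]

# Usage with the list gathered above:
# print("Missing weeks:", find_missing_weeks(data_files))
```

This cannot detect missing weeks after the highest one uploaded; compare against the `weeks` list in your config for that.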

## 3️⃣ Quick Sanity Check (2 min)

In [None]:
# Run sanity check to verify everything works
!python train_production.py --config configs/sanity.yaml

## 4️⃣ High-Accuracy Training

This configuration uses:
- **Larger model**: 128 hidden dim, 6 GNN layers, 8 attention heads
- **Anti-overfitting**: Dropout 0.15, early stopping (patience 15), data augmentation
- **Anti-underfitting**: Warmup LR, 150 max epochs, multi-task loss
- **Mixed precision**: FP16 for speed and larger batch sizes
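The warmup and early-stopping items above can be sketched in plain Python. This is an illustration under stated assumptions, not the logic inside `train_production.py`: the 5-epoch linear warmup, the cosine decay after it, and the `min_delta` tolerance are assumed; only the 150-epoch budget and the patience of 15 come from the configuration described here.

```python
import math

def warmup_cosine_lr(epoch, base_lr, warmup_epochs=5, max_epochs=150, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr (assumed schedule)."""
    if epoch < warmup_epochs:
        # Ramp linearly: base_lr / warmup_epochs at epoch 0, base_lr at the last warmup epoch
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup phase completed, in [0, 1]
    progress = (epoch - warmup_epochs) / max(1, max_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

To drive a PyTorch optimizer with this schedule, `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda e: warmup_cosine_lr(e, 1.0))` works: `LambdaLR` multiplies the optimizer's initial learning rate by the returned factor, so `base_lr=1.0` turns the function into that factor.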

In [None]:
# Start TensorBoard for monitoring
%load_ext tensorboard
%tensorboard --logdir logs/

In [None]:
# 🚀 Start high-accuracy training
!python train_production.py --config configs/high_accuracy.yaml

## 5️⃣ Monitor Training Progress

### What to look for:

| Metric | Underfitting | Optimal | Overfitting |
|--------|--------------|---------|-------------|
| Train Loss | High & flat | Decreasing smoothly | Very low |
| Val Loss | High | Follows train closely | Increases |
| Train-Val Gap | Small but both high | Small and both low | Large gap |
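The table can be turned into a rough automated check on per-epoch loss lists. The function below is a hypothetical heuristic, not part of the project, and its thresholds (`window`, `gap_ratio`, `rise_tol`) are illustrative assumptions to tune against your own runs. It cannot judge "high" versus "low" in absolute terms, so a `"plateau"` result means underfitting only if the loss has flattened at a level you consider too high.

```python
def diagnose_fit(train_losses, val_losses, window=5, gap_ratio=1.3, rise_tol=0.02):
    """Heuristic reading of per-epoch loss lists using the table's criteria.

    Returns 'too early', 'overfitting', 'plateau', or 'improving'.
    All thresholds are illustrative assumptions.
    """
    if min(len(train_losses), len(val_losses)) < 2 * window:
        return "too early"

    best_val = min(val_losses)
    best_epoch = val_losses.index(best_val)
    recent_train = sum(train_losses[-window:]) / window
    earlier_train = sum(train_losses[-2 * window:-window]) / window
    recent_val = sum(val_losses[-window:]) / window

    # Overfitting, case 1: no new val best in the last `window` epochs,
    # val loss now above its best, while train loss is still falling
    stale = best_epoch < len(val_losses) - window
    rising = val_losses[-1] > best_val * (1 + rise_tol)
    if stale and rising and train_losses[-1] < train_losses[-window]:
        return "overfitting"

    # Overfitting, case 2: large train-val gap
    if recent_val > recent_train * gap_ratio:
        return "overfitting"

    # Train loss essentially flat over the last window
    if earlier_train - recent_train < rise_tol * earlier_train:
        return "plateau"

    return "improving"
```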

### Expected Results (after training):
- **val_ade**: < 2.0 yards (Average Displacement Error)
- **val_fde**: < 3.5 yards (Final Displacement Error)  
- **miss_rate_2yd**: < 25% (Final position > 2 yards off)
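These metrics follow the usual trajectory-forecasting definitions: ADE averages the Euclidean error over every predicted frame, FDE uses only the last frame, and the miss rate is the fraction of final positions more than 2 yards off. A minimal NumPy sketch, assuming predictions and targets are aligned arrays of shape `(num_players, num_frames, 2)` in yards with no padded frames; the project's own implementation may average or mask differently.

```python
import numpy as np

def trajectory_metrics(pred, target, miss_threshold=2.0):
    """ADE, FDE and miss rate for (num_players, num_frames, 2) arrays in yards."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Euclidean error per player per frame, shape (num_players, num_frames)
    dist = np.linalg.norm(pred - target, axis=-1)
    ade = dist.mean()            # mean over all players and frames
    final = dist[:, -1]          # error at the last predicted frame
    fde = final.mean()
    miss_rate = (final > miss_threshold).mean()
    return float(ade), float(fde), float(miss_rate)
```

For example, two players whose per-frame errors are `[0, 5]` and `[0, 1]` yards give ADE 1.5, FDE 3.0 and a miss rate of 0.5.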

In [None]:
# View training curves
import glob
import os

import matplotlib.pyplot as plt
from tensorboard.backend.event_processing import event_accumulator

# Find the most recently modified TensorBoard log
# (mtime rather than string sorting, so version_10 is not ranked before version_9)
log_dirs = glob.glob('logs/nfl_high_accuracy*/tensorboard/*')
if log_dirs:
    latest_log = max(log_dirs, key=os.path.getmtime)
    ea = event_accumulator.EventAccumulator(latest_log)
    ea.Reload()
    logged_tags = set(ea.Tags()['scalars'])
    
    # Plot training curves: (train tag, val tag, panel title)
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    metrics = [
        ('train_loss', 'val_loss_traj', 'Loss'),
        ('train_vel_loss', 'val_ade', 'Train velocity loss / Val ADE (yards)'),
        (None, 'val_fde', 'FDE (yards)'),
        (None, 'val_miss_rate_2yd', 'Miss Rate @ 2yd'),
    ]
    
    for ax, (train_key, val_key, title) in zip(axes.flat, metrics):
        for key, label in ((train_key, 'Train'), (val_key, 'Val')):
            if key in logged_tags:  # skips None and tags that were never logged
                events = ea.Scalars(key)
                ax.plot([e.step for e in events], [e.value for e in events], label=label)
        ax.set_title(title)
        ax.set_xlabel('Logged step')
        if ax.get_legend_handles_labels()[0]:  # avoid an empty-legend warning
            ax.legend()
        ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("No training logs found yet. Run training first.")

## 6️⃣ Save Best Model to Drive

In [None]:
# Copy best checkpoint(s) to Google Drive (requires the Drive mount from step 1)
import glob
import os
import shutil

best_ckpt = glob.glob('checkpoints/*best*.ckpt')
if best_ckpt:
    dest = '/content/drive/MyDrive/NFL_Models/'
    os.makedirs(dest, exist_ok=True)
    
    for ckpt in best_ckpt:
        shutil.copy(ckpt, dest)
        print(f"✅ Saved: {ckpt} → {dest}")
else:
    print("No best checkpoint found yet.")

## 7️⃣ Troubleshooting

### If training is **underfitting** (high loss, poor metrics):
- Increase `hidden_dim` to 256
- Add more layers (try `num_gnn_layers: 8`)
- Decrease `dropout` to 0.1
- Remove label smoothing

### If training is **overfitting** (val loss increases while train decreases):
- Increase `dropout` to 0.2
- Increase `weight_decay` to 5e-4
- Reduce `hidden_dim` to 64
- Enable stronger augmentation

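### If training is **slow** or runs out of memory:
- Reduce `batch_size` if you hit CUDA out-of-memory (OOM) errors
- Reduce `num_workers` to 1 if data-loader workers crash or exhaust system RAM
- Train on fewer weeks initially (`weeks: [1, 2, 3]`)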
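The troubleshooting knobs above live in the YAML config. One way to try them without hand-editing `configs/high_accuracy.yaml` is to load it, replace keys, and save a copy. The helper below is a hypothetical sketch: it assumes the config uses the key names from the lists above (`hidden_dim`, `dropout`, `weight_decay`, `batch_size`, `num_workers`, `weeks`) but makes no assumption about where they are nested, so it replaces a key at every depth where it occurs and raises on keys it cannot find.

```python
import copy

def apply_overrides(config, overrides):
    """Return a copy of `config` with each key in `overrides` replaced wherever it appears.

    Raises KeyError for override keys found nowhere in the config,
    so a typo such as 'hiden_dim' is not silently ignored.
    """
    result = copy.deepcopy(config)  # leave the loaded config untouched
    found = set()

    def visit(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in overrides:
                    node[key] = overrides[key]  # replacing a value is safe mid-iteration
                    found.add(key)
                else:
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(result)
    missing = set(overrides) - found
    if missing:
        raise KeyError(f"Override keys not found in config: {sorted(missing)}")
    return result

# Usage (assumes PyYAML; 'configs/custom.yaml' is an illustrative filename):
# import yaml
# with open('configs/high_accuracy.yaml') as f:
#     cfg = yaml.safe_load(f)
# cfg = apply_overrides(cfg, {'dropout': 0.2, 'weight_decay': 5e-4})
# with open('configs/custom.yaml', 'w') as f:
#     yaml.safe_dump(cfg, f)
```

The saved copy can then be passed to `train_production.py --config` in place of the original file. If the same key appears in several sections (for example, a `dropout` per submodule), all occurrences are replaced.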

In [None]:
# Custom training with adjusted parameters (if needed)
# Uncomment and modify as needed:

# !python train_production.py \
#     --config configs/high_accuracy.yaml \
#     --hidden-dim 256 \
#     --batch-size 32 \
#     --max-epochs 200 \
#     --learning-rate 0.0003