# Intro to W&B
This course teaches you how to use Weights & Biases (W&B) to track and visualize machine learning experiments. We'll cover the basics of W&B: how to log and visualize metrics, hyperparameters, and artifacts, and how to track and compare your models over time.

In [1]:
import math
from pathlib import Path
from types import SimpleNamespace
from tqdm.auto import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from utilities import get_dataloaders

import wandb

## Sprite Classification
We're going to build a model to classify sprites into one of five categories: `hero`, `non-hero`, `food`, `spell` and `side-facing`.
We'll use W&B to track our experiments and compare our models.
Here is an example of some of the sprites and classes:


<img src="sprite_sample.png">

In [2]:
INPUT_SIZE = 3 * 16 * 16
OUTPUT_SIZE = 5
HIDDEN_SIZE = 256
NUM_WORKERS = 2
CLASSES = ['hero', 'non-hero', 'food', 'spell', 'side-facing']
DATA_DIR = Path('./data/')
# Use 'mps' (Metal Performance Shaders) for GPU acceleration on macOS,
# 'cuda' for CUDA GPUs, or 'cpu' to run on the CPU
DEVICE = torch.device('mps')
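Hard-coding `mps` fails on machines that have no Metal-capable GPU. A minimal sketch of choosing the device at runtime instead, assuming a PyTorch build that may or may not expose `torch.backends.mps` (the helper name `pick_device` is hypothetical and not part of the course code):

```python
import torch


def pick_device() -> torch.device:
    """Return the best available device: CUDA first, then MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Older PyTorch builds have no torch.backends.mps, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


DEVICE = pick_device()
print(DEVICE)
```

The rest of the notebook only refers to `DEVICE`, so a change like this would not require touching the model or training code.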

In [3]:
def get_model(dropout):
    """Simple MLP with Dropout"""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(INPUT_SIZE, HIDDEN_SIZE),
        nn.BatchNorm1d(HIDDEN_SIZE),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE)
    ).to(DEVICE)
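A quick way to confirm that the layer sizes line up is to push a dummy batch through the network. The sketch below rebuilds the same architecture on the CPU so that it runs without a GPU, and assumes the sprites are 3-channel 16×16 images, as `INPUT_SIZE` implies. The batch size of 4 is arbitrary.

```python
import torch
import torch.nn as nn

INPUT_SIZE = 3 * 16 * 16
HIDDEN_SIZE = 256
OUTPUT_SIZE = 5

# Same layers as get_model, kept on the CPU for this check
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(INPUT_SIZE, HIDDEN_SIZE),
    nn.BatchNorm1d(HIDDEN_SIZE),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE),
)

dummy_batch = torch.randn(4, 3, 16, 16)  # 4 fake sprites
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([4, 5]): one score per class for each sprite
```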

In [4]:
# Define a config object to store our hyperparameters
config = SimpleNamespace(
    epochs=2,
    batch_size=128,
    lr=1e-5,
    dropout=0.5,
    slice_size=10_000,
    valid_pct=0.2,
)

In [5]:
def train_model(config):
    """Train a model with a given config"""

    wandb.init(
        project="dlai_intro",
        config=config,
    )

    # Get the data
    train_dl, valid_dl = get_dataloaders(DATA_DIR,
                                         config.batch_size,
                                         config.slice_size,
                                         config.valid_pct)
    n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config.batch_size)

    # A simple MLP model
    model = get_model(config.dropout)

    # Make the loss and optimizer
    loss_func = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=config.lr)

    example_ct = 0

    for epoch in tqdm(range(config.epochs), total=config.epochs):
        model.train()

        for step, (images, labels) in enumerate(train_dl):
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            outputs = model(images)
            train_loss = loss_func(outputs, labels)
            optimizer.zero_grad()
            train_loss.backward()
            optimizer.step()

            example_ct += len(images)
            metrics = {
                "train/train_loss": train_loss.item(),  # log a Python float, not a tensor
                "train/epoch": epoch + 1,
                "train/example_ct": example_ct
            }
            wandb.log(metrics)

        # Compute validation metrics at the end of each epoch
        val_loss, accuracy = validate_model(model, valid_dl, loss_func)
        # Log the validation metrics
        val_metrics = {
            "val/val_loss": val_loss,
            "val/val_accuracy": accuracy
        }
        wandb.log(val_metrics)

    wandb.finish()

In [6]:
def validate_model(model, valid_dl, loss_func):
    """Compute the performance of the model on the validation dataset"""
    model.eval()
    val_loss = 0.0
    correct = 0

    with torch.inference_mode():
        for images, labels in valid_dl:
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            # Forward pass
            outputs = model(images)
            # Weight the batch-mean loss by batch size; .item() keeps a plain float
            val_loss += loss_func(outputs, labels).item() * labels.size(0)

            # Compute accuracy and accumulate
            predicted = outputs.argmax(dim=1)
            correct += (predicted == labels).sum().item()

    return val_loss / len(valid_dl.dataset), correct / len(valid_dl.dataset)
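`validate_model` multiplies each batch loss by `labels.size(0)` because `nn.CrossEntropyLoss` returns the mean over a batch by default, and the last batch is usually smaller than the others. A small worked example in plain Python, with made-up batch sizes and losses, shows how the size-weighted average differs from a naive mean of batch means:

```python
# (batch_size, mean loss for that batch); illustrative numbers only
batches = [(128, 0.50), (128, 0.70), (64, 1.00)]

n_examples = sum(size for size, _ in batches)

# Weight each batch mean by its size, as validate_model does
weighted = sum(size * loss for size, loss in batches) / n_examples

# Naive alternative: average the batch means, which over-weights the small batch
naive = sum(loss for _, loss in batches) / len(batches)

print(f"weighted={weighted:.4f} naive={naive:.4f}")
```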


In [7]:
train_model(config)

[34m[1mwandb[0m: Currently logged in as: [33mthatgardnerone[0m. Use [1m`wandb login --relogin`[0m to force relogin



Run history:
train/epoch,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁████████████████████
train/example_ct,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/train_loss,███▇▇▆▆▇▅▆▅▅▅▆▆▆▄▄▄▅▅▄▄▄▄▄▃▃▃▃▃▃▃▃▁▂▂▂▃▂
val/val_accuracy,▁█
val/val_loss,█▁

Run summary:
train/epoch,2.0
train/example_ct,16000.0
train/train_loss,1.23573
val/val_accuracy,0.666
val/val_loss,1.20146
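The summary values can be checked against the config. Assuming `get_dataloaders` (whose source is not shown here) takes `slice_size` examples and holds out `valid_pct` of them for validation, the arithmetic works out as follows:

```python
import math
from types import SimpleNamespace

config = SimpleNamespace(epochs=2, batch_size=128, slice_size=10_000, valid_pct=0.2)

# Assumed split: valid_pct of the slice goes to validation, the rest to training
n_valid = round(config.slice_size * config.valid_pct)
n_train = config.slice_size - n_valid

# Same formula as n_steps_per_epoch in train_model
n_steps_per_epoch = math.ceil(n_train / config.batch_size)

# example_ct grows by len(images) each step, so it counts every training example seen
final_example_ct = n_train * config.epochs

print(n_train, n_steps_per_epoch, final_example_ct)  # 8000 63 16000
```

The result agrees with the logged `train/example_ct` of 16000 after two epochs.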


In [8]:
# Let's raise the learning rate from 1e-5 to 1e-4
# and see how this affects our results.
config.lr = 1e-4
train_model(config)


Run history:
train/epoch,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁████████████████████
train/example_ct,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/train_loss,██▇▇▆▅▅▄▄▄▄▄▄▄▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▂▁▁▁▁
val/val_accuracy,▁█
val/val_loss,█▁

Run summary:
train/epoch,2.0
train/example_ct,16000.0
train/train_loss,0.37025
val/val_accuracy,0.9415
val/val_loss,0.33149


In [9]:
config.lr = 1e-4
train_model(config)


Run history:
train/epoch,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁████████████████████
train/example_ct,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/train_loss,█▇▇▆▆▅▅▅▄▄▃▄▄▃▂▃▃▃▂▃▂▂▂▂▂▁▂▂▂▁▁▂▁▁▂▁▂▁▁▁
val/val_accuracy,▁█
val/val_loss,█▁

Run summary:
train/epoch,2.0
train/example_ct,16000.0
train/train_loss,0.35855
val/val_accuracy,0.944
val/val_loss,0.34253


In [10]:
config.dropout = 0.1
config.epochs = 1
train_model(config)


Run history:
train/epoch,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/example_ct,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
train/train_loss,███▇▇▇▆▆▆▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▁
val/val_accuracy,▁
val/val_loss,▁

Run summary:
train/epoch,1.0
train/example_ct,8000.0
train/train_loss,0.55202
val/val_accuracy,0.9155
val/val_loss,0.49286


In [11]:
config.lr = 1e-3
train_model(config)


Run history:
train/epoch,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/example_ct,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
train/train_loss,█▆▅▄▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val/val_accuracy,▁
val/val_loss,▁

Run summary:
train/epoch,1.0
train/example_ct,8000.0
train/train_loss,0.09373
val/val_accuracy,0.9845
val/val_loss,0.0695
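The five runs above differ in learning rate, dropout, and epoch count. The W&B project dashboard is the natural place to compare them, but the summaries printed in this notebook can also be collected by hand. A small sketch in plain Python, with values copied from the run summaries above (the `runs` list is only an illustration, not a W&B data structure):

```python
# Hyperparameters and final validation metrics, copied from the five run summaries
runs = [
    {"lr": 1e-5, "dropout": 0.5, "epochs": 2, "val_accuracy": 0.666, "val_loss": 1.20146},
    {"lr": 1e-4, "dropout": 0.5, "epochs": 2, "val_accuracy": 0.9415, "val_loss": 0.33149},
    {"lr": 1e-4, "dropout": 0.5, "epochs": 2, "val_accuracy": 0.944, "val_loss": 0.34253},
    {"lr": 1e-4, "dropout": 0.1, "epochs": 1, "val_accuracy": 0.9155, "val_loss": 0.49286},
    {"lr": 1e-3, "dropout": 0.1, "epochs": 1, "val_accuracy": 0.9845, "val_loss": 0.0695},
]

# Pick the run with the highest validation accuracy
best = max(runs, key=lambda r: r["val_accuracy"])

# Print all runs from best to worst
for r in sorted(runs, key=lambda r: r["val_accuracy"], reverse=True):
    print(f"lr={r['lr']:g} dropout={r['dropout']} epochs={r['epochs']} "
          f"acc={r['val_accuracy']:.4f} loss={r['val_loss']:.4f}")

print("best:", best)
```

Each configuration was trained only once or twice, and more than one setting changes between some runs, so these numbers suggest rather than prove that the higher learning rate is what helps.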
