<a href="https://colab.research.google.com/github/willmmoses/wandb-interview/blob/main/W_and_B_Code_Interview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Debug this Colab!

This colab represents a simple ML pipline, loading data, defining a model and fitting the model to the data. It has also been instrumented with Weights and Biases logging tools.

At Weights and Biases, we often help our users debug their pipelines -- both the ML code and the logging code from `wandb` integrated into it.

Your task is to debug this simple pipeline such that the model is able to learn and <u>perform reasonably well</u> (hint: Sweeps) on the given task, without changing the general structure of the model. As you do so, use comments and markdown cells to explain a bit about your process.

In [2]:
!pip install wandb

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wandb
  Downloading wandb-0.12.21-py2.py3-none-any.whl (1.8 MB)
[K     |████████████████████████████████| 1.8 MB 5.1 MB/s 
Collecting pathtools
  Downloading pathtools-0.1.2.tar.gz (11 kB)
Collecting GitPython>=1.0.0
  Downloading GitPython-3.1.27-py3-none-any.whl (181 kB)
[K     |████████████████████████████████| 181 kB 41.4 MB/s 
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting sentry-sdk>=1.0.0
  Downloading sentry_sdk-1.9.0-py2.py3-none-any.whl (156 kB)
[K     |████████████████████████████████| 156 kB 54.9 MB/s 
[?25hCollecting setproctitle
  Downloading setproctitle-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30 kB)
Collecting shortuuid>=0.5.0
  Downloading shortuuid-1.0.9-py3-none-any.whl (9.4 kB)
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.9-

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader

import torchvision
from torchvision import transforms

import wandb

# Data Preprocessing

In this section, I ran the code, saw where an error occured, resolved the error, and repeated this until the errors were resolved.

In [4]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

batch_size = 32

cifar10 = torchvision.datasets.CIFAR10(root='./data', download=True, transform=torchvision.transforms.ToTensor())
pivot = 40000
cifar10 = sorted(cifar10, key=lambda x: x[1])
train_set = torch.utils.data.Subset(cifar10, range(pivot))
val_set = torch.utils.data.Subset(cifar10, range(pivot, len(cifar10)))
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=True)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting ./data/cifar-10-python.tar.gz to ./data


In [5]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        # self.pool = nn.MaxPooling2D(2, 2) # A simple mistype in the function name
        self.pool = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # self.fc1 = nn.Linear(600, 120) # Size mismatch with data, changed it to fit
        # self.fc2 = nn.Linear(120, 2) # Changed size to match input from new fc1, and still work with fc3
        self.fc1 = nn.Linear(400, 32)
        self.fc2 = nn.Linear(32, 2)
        self.fc3 = nn.Linear(2, 10)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # x = torch.Flatten(x, 1) # A simple mistype in the function name
        x = torch.flatten(x,1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()

In [6]:
model_criterion = nn.CrossEntropyLoss()
model_optimizer = torch.optim.SGD(model.parameters(), lr=1e3, momentum=0.9)

# Training and Validation

In this part, you will also need to additionally calculate training and validation accuracy and log it to Weights and Biases.

Same as above. Ran the code, saw the error, fixed the error. 

Note: You can either run this, or the sweeps code, but not both in the same run due both initializing a WandB environment

In [None]:
with wandb.init(project = 'Tier-1-Test', save_code=True) as run:
    for epoch in range(5):
        current_loss = 0

        model.train()

        for i, data in enumerate(train_loader):
            images, labels = data
            outputs = model(images)
            loss = model_criterion(outputs, labels)

            loss.backward()
            model_optimizer.step()

            current_loss += loss
        # run.log('train_loss', current_loss / (i + 1)) # Incorrect input format. Changed input from multiple items to dictionary
        run.log({'train_loss': current_loss / (i + 1)})
        
        model.eval()

        current_loss = 0

        for i, data in enumerate(val_loader):
            images, labels = data
            outputs = model(images)

            loss = model_criterion(outputs, labels)

            current_loss += loss
        # run.log('val_loss', current_loss / (i + 1)) # Incorrect input format. Changed input from multiple items to dictionary
        run.log({'val_loss': current_loss / (i + 1)})

# Sweep Implementation

In this part, I rework the above code to use sweeps for optimization.

In [22]:
# Implementing sweeps

sweep_config = {
    'method' : 'random'
}

metric = {
    'name': 'val_loss',
    'goal': 'minimize'
}

sweep_config['metric'] = metric

parameters =  {
    'fc_1_2_layer_size':{
        'values':[16,32,64]
    },
    'fc_2_3_layer_size':{
        'values':[1,2,4]
    },
    'optimizer': {
        'values':['adam', 'sgd']
    },
    'learning_rate': {
        'values':[0.1, 0.01, 0.001, 0.0]
    },
    'epochs': {
        'values':[1,5,10,20]
    },
    'batch_size': {
        'values':[16,32,64]
    },
}

sweep_config['parameters'] = parameters

sweep_id = wandb.sweep(sweep_config, project='Tier-1-Test')

Create sweep with ID: ewv9b0v0
Sweep URL: https://wandb.ai/ybfomsgplyvbzfhilj/Tier-1-Test/sweeps/ewv9b0v0


In [23]:
# Rewriting given training code to use w/ sweeps
def build_network(fc_1_2_layer_size, fc_2_3_layer_size):
  """
  Builds a neural network based on the Network class defined above and sets the fc layers.
  :param fc_1_2_layer_size: the output size of fc1 and input size of fc2
  :param fc_2_3_layer_size: the output size of fc2 and input size of fc3
  :return: a configured network
  """
  network = Network()
  network.fc1 = nn.Linear(400, fc_1_2_layer_size)
  network.fc2 = nn.Linear(fc_1_2_layer_size, fc_2_3_layer_size)
  network.fc3 = nn.Linear(fc_2_3_layer_size, 10)
  return network


def build_optimizer(network, optimizer, learning_rate):
  """
  Builds an optimizer for the given network based on the type of optimizer given
  (adam or sgd), and sets the learning rate appropriately.
  :param network: the neural network the optimizer is to be built for
  :param optimizer: the type of optimizer to be built (adam or sgd)
  :param learning_rate: the learning rate for the optimizer
  :return: the configured optimizer
  """
  if optimizer == "adam":
    return torch.optim.Adam(network.parameters(), lr=learning_rate)
  elif optimizer == "sgd":
    return torch.optim.SGD(network.parameters(), lr=learning_rate, momentum=0.9)
  else:
    return


def train_epoch(network, loader, optimizer, criterion):
  """
  Trains the given network for an epoch. Returns the average training loss.
  :param network: the neural network to be trained
  :param loader: the data loader used to feed the network
  :param optimizer: the optimizer to be used on the network
  :param criterion: the criterion to be used on the network
  :return: the average loss of the batches.
  """
  cum_loss = 0
  network.train()
  for i, data in enumerate(loader):
    images, labels = data
    outputs = model(images)
    loss = criterion(outputs, labels)
    cum_loss += loss
    loss.backward()
    optimizer.step()
    wandb.log({"train_batch_loss": loss.item()})
  return cum_loss / (i+1)


def val_epoch(network, loader, optimizer, criterion):
  """
  Validates the given model. Returns the average validation loss.
  :param network: the neural network to be validated
  :param loader: the data loader used to feed the network
  :param optimizer: the optimizer to be used on the network
  :param criterion: the criterion to be used on the network
  :return: the average loss of the batches.
  """
  val_cum_loss = 0
  network.eval()
  for i, data in enumerate(val_loader):
      images, labels = data
      outputs = model(images)

      loss = criterion(outputs, labels)

      val_cum_loss += loss
      wandb.log({"val_batch_loss": loss.item()})
  return val_cum_loss / (i+1)


def train(config=None):
  """
  Handles the sweep functionality. Assumes there's a valid WandB sweep config
  named "config"
  :return: none
  """
  with wandb.init(config=config):
    config = wandb.config
    train_loader = DataLoader(train_set, batch_size=config.batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=config.batch_size, shuffle=True)
    network = build_network(config.fc_1_2_layer_size, config.fc_2_3_layer_size)
    optimizer = build_optimizer(network, config.optimizer, config.learning_rate)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(config.epochs):
      avg_train_loss = train_epoch(network, train_loader, optimizer, criterion)
      avg_val_loss = val_epoch(network, val_loader, optimizer, criterion)
      wandb.log({"train_loss": avg_train_loss, "val_loss": avg_val_loss, "epoch" : epoch})



In [None]:
wandb.agent(sweep_id, train)

[34m[1mwandb[0m: Agent Starting Run: xpjzq1mk with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 1
[34m[1mwandb[0m: 	learning_rate: 0
[34m[1mwandb[0m: 	optimizer: adam


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
train_batch_loss,▆▄▆▃▄▅▇▅▂▄▄▄▁▅▅▅▄▄▅▇█▄▃▄█▂▃▅▅▆▄▅▆▇▃▄▄▃▆▅
train_loss,▆▄▇▇▃▄▄▂▄▅▅▇▂█▇▇▇▃▃▁
val_batch_loss,▇▆▂▆▃▅▅▄▃▇▆▅▅▆▆▃▅▄▅▇▁▅▄▄▅▃▇▄▅▇▄▅▄▄▇▃▆▂▅█
val_loss,▁▅▇▄▆▆▅▃▇█▄▇▅▄▄▄▄▃▅▃

0,1
epoch,19.0
train_batch_loss,2.44929
train_loss,2.37355
val_batch_loss,2.29246
val_loss,2.32589


[34m[1mwandb[0m: Agent Starting Run: dvbiapnl with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 1
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
train_batch_loss,▆▄▁█▂▅▇▄▆▅▃▂▇▄▄▄▇▄▄▃▄▆▃▆▄▇▆▆▆▅▅▆▄▇▅▇▅▄▇▆
train_loss,▄▂▂▆▅▅▅▅▄█▁▅▃▅▃▇▇▃▃█
val_batch_loss,▇▄▄▇▅▁▇▄▅▄▅▅▅▅▄▇▂▅▅▅▁▇▇▇▄▃▅▄▅▅█▃▃▆▂▂▆▄▄▅
val_loss,▅▅█▄▄▆▃▄▄▅▂▁▃▄▄▅▅▄▆▇

0,1
epoch,19.0
train_batch_loss,2.39267
train_loss,2.37356
val_batch_loss,2.40815
val_loss,2.32607


[34m[1mwandb[0m: Agent Starting Run: cwkh5o3x with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 32
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▃▅▄▆█▆▆▁▆▄▆▆▆█▆▄▅▄▆▄▃▄▃▆▅▄▇▆▄█▂▂▄▂▆▅▂▅▂▃
train_loss,▆▄▁▅█▅▄▅▇▁
val_batch_loss,▄▁▃▇▂▅▃▂▆▄▅▅▇▃▂▄▆▄▅▂▄▄▅▅█▄▃▅▃▃▃▆▆▇▃▅▅▄▅▄
val_loss,▄▂▇▆▅▆▂█▅▁

0,1
epoch,9.0
train_batch_loss,2.48076
train_loss,2.37356
val_batch_loss,2.27588
val_loss,2.32586


[34m[1mwandb[0m: Agent Starting Run: umz600uh with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▇▅▂▂▆▃▁▃▃▅▅▆▆▃▃▅▅█▅▄▅█▅▆▄▂▂▆▁▂▂▅▃▅▂▃▃▆▅▄
train_loss,█▅█▇▅▃▁▅▁▇
val_batch_loss,▆▆▅▃▄▂▄▃▄▃▄▄▅▄▆▅▅▆▄▁▄▇▇▆▅▅▆█▄▂▃▅▃▄▄▃▆▄▄▅
val_loss,█▃▆██▁████

0,1
epoch,9.0
train_batch_loss,2.37437
train_loss,2.37356
val_batch_loss,2.34271
val_loss,2.32597


[34m[1mwandb[0m: Agent Starting Run: n0tqquki with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▃▆▅▅▄▄▄▄▂▃▅▆▆▄█▁▆▅▃▃▆▅█▆▇▅▃▆▅▅▆▇▅▃▄▂▃▃▅▃
train_loss,▄▄▂▅▁█▄▂▆▃
val_batch_loss,▄█▅▃▂▅▃▇▅▃▅▁▅▆▃▇▄▄▂▄▃▂▄▅▃▅▃▅▁▆▆▂▅▆▄▄▄▆▄▂
val_loss,▂█▄▆▄▅▄▄▁▅

0,1
epoch,9.0
train_batch_loss,2.34892
train_loss,2.37356
val_batch_loss,2.35947
val_loss,2.32599


[34m[1mwandb[0m: Agent Starting Run: 8a794zv6 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▃▅▆█
train_batch_loss,▅▂▆▅▁▅▅▁▃▆▂▃▂▅▄▅▅▄▅▃▄▅▆▂▃▂▆▅▃▅▂▆▇█▁▆▆▆▄▃
train_loss,██▁█▅
val_batch_loss,▁▅▃▆▁▁▃▄▄▃▄▁▅▂▁█▇▃▃▁▂▄▃▁▃▇▅▃▃█▆▂▃▃▅▆▃▃▃▁
val_loss,█▃█▁▆

0,1
epoch,4.0
train_batch_loss,2.40777
train_loss,2.37356
val_batch_loss,2.32559
val_loss,2.32594


[34m[1mwandb[0m: Agent Starting Run: 2c1gno2a with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 1
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▆▃▆▄▄▇█▁▇▄▅▅▃▇▅▄▄▅▄▅▃▄█▆▆▃▅▄▅▄▅▂▆▄▂▆▄▆▅▄
train_loss,▁▆█▃▁▁▂██▄
val_batch_loss,▁▃▇▆▅▆▇▅▇▅█▅█▃▅▃▅▆▇▃▆▃█▇▅▅▄█▄▆▇▆▄▅▅▆▅▄▆▇
val_loss,▆▃▁█▄▁▆▄▆▃

0,1
epoch,9.0
train_batch_loss,2.31701
train_loss,2.37356
val_batch_loss,2.29273
val_loss,2.32589


[34m[1mwandb[0m: Agent Starting Run: fhsi4dmy with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▆▆▄▄▄▅▆▃▅▄▄█▇▃▄▃▇▄▅▅▄▇▅▅▄▅▆▇▁▅▅▇▇▅▄▅▇▄▂▅
train_loss,█▁▄▁▄▆▄▃▅▂
val_batch_loss,▇▆▇▆█▂▄▄▅▆▇█▅▂▆▆▇▆▆▆▅▅▁▆▅▄▇▅▃▆▃▇█▃▄▃▅▅▇▇
val_loss,▁▅▄▄▅█▆▃▁▆

0,1
epoch,9.0
train_batch_loss,2.3555
train_loss,2.37356
val_batch_loss,2.34273
val_loss,2.32597


[34m[1mwandb[0m: Agent Starting Run: lr1y700c with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 1
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▇▄▃█▅█▇▇▅▆▇▅▆▁▄▃▃▄█▅▅▇█▇▃▂█▆█▃▅▇▇▄█▃▅▄▆▅
train_loss,▃▁▃▄▃▂█▃▂▂
val_batch_loss,▃▄▃▆▄▃▂▄▄▄▂▅▃▃▃▁▅▃▄▂▃▂▅▃▅▅▃▄▆▄▆▄▆▂▂▄█▃▂▂
val_loss,█▇▂▃▁▆▅▂▄▃

0,1
epoch,9.0
train_batch_loss,2.50574
train_loss,2.37356
val_batch_loss,2.29254
val_loss,2.32589


[34m[1mwandb[0m: Agent Starting Run: y4u8zlpv with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 32
[34m[1mwandb[0m: 	fc_2_3_layer_size: 1
[34m[1mwandb[0m: 	learning_rate: 0
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
train_batch_loss,▄▂▆▃▆▇▃▅▅▄▄▅▅▅▆▄▅▇▄█▄▃▄▅▃▄▇▇█▆▆▄█▆▇▄▁▃▅▅
train_loss,▇▇▅▅▃▁▆▇▂▃▇▅▅▃▂▅▃▂█▅
val_batch_loss,▄▅▅▆▅▄▃▆▃▄▆▇▁▆▅▇▅▃▄▅▃█▆▄▆▃▄▃▇▆▆▄▆▇▄▇▂▆▄▆
val_loss,▂▂▇▅▆▇▅▃▅▁▂▃▇▇▃▄█▆▂▅

0,1
epoch,19.0
train_batch_loss,2.39524
train_loss,2.37356
val_batch_loss,2.32617
val_loss,2.32594


[34m[1mwandb[0m: Agent Starting Run: 6wn54zdr with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 32
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: adam


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▅▂▅▁▅▇▄▅▅▂▄▅▅█▆▁▃▄▃▄▂▆▂▃▅▃▄▃▅▂▅▇▄▁▇█▃▅▂▃
train_loss,▅▁▄▅▄█▅▁▇▅
val_batch_loss,▆▄▃▃▂▄▆▃▃▄▇▂▄▂▆▄▆▃▄█▅▂▆▄▅▆▆▁▆▆▇▃▆▆▂▆▄▃▆▅
val_loss,▁▅█▂▁▂▂▄▃▁

0,1
epoch,9.0
train_batch_loss,2.38142
train_loss,2.37356
val_batch_loss,2.29233
val_loss,2.32589


[34m[1mwandb[0m: Agent Starting Run: dmc5bbhz with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁
train_batch_loss,▃▅▄▅▇▆▁▅▂▂▃█▄▅▅▃▅▃▄▆▃▄▃▇▄▄▅▃▅▃▆▃▄▇▆▄▄▆▂▃
train_loss,▁
val_batch_loss,▂▄▄▃▄▆▅▄▅▂▅▃▆▄▅▂▃▆█▄▄▅▂▇▄▄▄▇▂▆▆▄▄▆▁▄▃▅▃▁
val_loss,▁

0,1
epoch,0.0
train_batch_loss,2.41532
train_loss,2.37356
val_batch_loss,2.30885
val_loss,2.32591


[34m[1mwandb[0m: Agent Starting Run: o5e3o27y with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▇▅▄▅▃█▅▂▅▃▄▃▃▅▂▄▆▄▄▁▆▄▆▇▅▅▄▆▅▃▃▄▅▇▅▃▅▆▃▆
train_loss,█▂▅▄▄▅▁▁▇▃
val_batch_loss,▄▅▅▄▄▃▄▅▃▃▄▅▄▇▅█▃▃▁▃▃▅▄▆▅▆▇▄▂▅▄▂▆▄▂▄▅▂▆▄
val_loss,▆▁█▃▁▃▃█▁█

0,1
epoch,9.0
train_batch_loss,2.39565
train_loss,2.37356
val_batch_loss,2.35971
val_loss,2.326


[34m[1mwandb[0m: Agent Starting Run: 0ro8o5pp with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 32
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▄▆▇▃▅▅▅▆▄▄▆▇▅▆▄▄▄▃█▂▅▄▁▄▄▃▄▅▆▅▅▃▅▄▄▅▆▃▄▅
train_loss,▃▄▃▆▄▅█▁▄▁
val_batch_loss,▄▅▆▇▅▁▆▅▆▅▅▇▂▇▅▆▆▃▂▅▆▆▅▅▃▅▃▇▂▃▆▃▅▅▃▆▅▆█▆
val_loss,▄▆▄▆█▄▁█▅▇

0,1
epoch,9.0
train_batch_loss,2.39031
train_loss,2.37356
val_batch_loss,2.35877
val_loss,2.32599


[34m[1mwandb[0m: Agent Starting Run: j9thgmcd with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 1
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: adam


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▆▄▅▃▅▄▃▆▅▄▇▄▂▁▄▇▅▄▇▇█▅▄▅▇▃▅▆▆▇▆▁▄▂▅▅▂▆▅▃
train_loss,▅▄▆▁▇█▄▇▇▅
val_batch_loss,▃▅▃▆▆▅▆▄▄▄▄▄▂▇▆▁▆▅▁▇▆▆▄▇▄▁▃▅▃▅▂▃▇▁▃▃▃▄▃█
val_loss,▂█▇▂▃▆▄▂▁▇

0,1
epoch,9.0
train_batch_loss,2.32588
train_loss,2.37356
val_batch_loss,2.35855
val_loss,2.32599


[34m[1mwandb[0m: Agent Starting Run: 2kkc5qud with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0
[34m[1mwandb[0m: 	optimizer: adam


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁
train_batch_loss,▆▄▅▄▅▅▅▄▅▅▆▅▆▄▁▅▃▅▅▆▃▃▆▄▅▄▄▆▆▄▆▄█▄▃▇▄▄▄▃
train_loss,▁
val_batch_loss,▄▃▁▂▆▅█▅▄▄▄▃▄▅▇▇▆▃▄▃▂▅▆█▃▅▃▆▃▃▇▃▄▄▄▇▆▂▅▆
val_loss,▁

0,1
epoch,0.0
train_batch_loss,2.3542
train_loss,2.37356
val_batch_loss,2.37569
val_loss,2.32602


[34m[1mwandb[0m: Agent Starting Run: myxf995i with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁
train_batch_loss,▂▃▄▄▅▇▆▄▆▃█▄▄▅▂▃▄▃▂▄▅▄▇▅▄▁▃▃▅▄▄▃▄▆▆▅▂▄▄▆
train_loss,▁
val_batch_loss,▃▅▅▇▄▆▃▄▄▄▄▂▅▄█▂▄▅▃▄▁█▅▅▄▄▄▅█▅▄▅▂▅▅▁▅▃▇▄
val_loss,▁

0,1
epoch,0.0
train_batch_loss,2.43443
train_loss,2.37356
val_batch_loss,2.32654
val_loss,2.32594


[34m[1mwandb[0m: Agent Starting Run: 44d87lah with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
train_batch_loss,▅▃▆▅▃▄▄█▂▁▃█▄▅▃▆▆▄▃▃▃▃▂▆▂▃▂▃▇▅▄▃▅▅▅▃▆▂▆▆
train_loss,▅▇▄█▁▅▄▂▂▆█▆▄▅▆▄▅██▁
val_batch_loss,▂▅▅▄▄▄▇▅▅▃█▅▅▄▅▅▂▃▃▄▄▃▄▄▁▄▅▂▅▄▇▅▃▅▅▄▅▂▅▃
val_loss,▅▄▄▄▇▅▇▁▅▇▅▅▅█▄▅▅▄▅▃

0,1
epoch,19.0
train_batch_loss,2.31219
train_loss,2.37356
val_batch_loss,2.29314
val_loss,2.32589


[34m[1mwandb[0m: Agent Starting Run: akc70nn7 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁
train_batch_loss,▅▄▆▅▇▆▆▅▃▃█▅▄▃▆▄▂▂▃▄▆▄▁▂▅▄▇▆▄▅▃▄▄▃▅▂█▅▅▆
train_loss,▁
val_batch_loss,▄▅▅▂▆▄▁▅▃▂▂▃▃▆▄▄▅▆▅▄▁▅▅▆▂▆▂▃▇▃▄▁▃▃▄▆▅▃█▃
val_loss,▁

0,1
epoch,0.0
train_batch_loss,2.3818
train_loss,2.37356
val_batch_loss,2.34287
val_loss,2.32597


[34m[1mwandb[0m: Agent Starting Run: 6u5awj9e with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 1
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	optimizer: adam


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▂▃▃▄▅▆▆▇█
train_batch_loss,▅▅▄▅▆▂█▄▃▅▅▃▅▃▄▄▄▂▆▆▃▅▆▂▅▆▆▃▅▄▄▄▆▆▄▁▅▄▅▂
train_loss,▃▃▁▄▄▄▃█▅▄
val_batch_loss,▄▅▃▃▅▃▄▃▁▇▄▇▆▆▆▇▅▅▆▅▆█▅▆▆▅▅▆▅▆▄▆▄▅▆█▅▂▇▅
val_loss,▃▁▅█▅▁▅▃█▅

0,1
epoch,9.0
train_batch_loss,2.29466
train_loss,2.37356
val_batch_loss,2.34283
val_loss,2.32597


[34m[1mwandb[0m: Agent Starting Run: idj1dbk8 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: adam


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▃▅▆█
train_batch_loss,▄▅▅▆▄▄▃█▁▂▅▆▆█▆▅▄▄▅▇▇▅▇▇▆▆▆█▂▅█▆▁▆▇▆▇▂▄▄
train_loss,▇▁█▇▇
val_batch_loss,▅▅▅▄▅▅▆▅▇▅▅▄▄▅▆▇▆▆▂▇▅▄▅▄▅▅▄▅▇▃▃▂▆▁▅▄▂▆▆█
val_loss,▆▁▃▁█

0,1
epoch,4.0
train_batch_loss,2.36214
train_loss,2.37356
val_batch_loss,2.35901
val_loss,2.326


[34m[1mwandb[0m: Agent Starting Run: e6kms1iq with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▃▅▆█
train_batch_loss,▆▆▄▆▅▅▆▅▆▆▆▇█▄█▅▇▆▇▆▇▆▆▅▆▅▅▅▇▇▆▅▆▇▆▇▁▆▆█
train_loss,▁▂▅█▇
val_batch_loss,▅▄▇▆▄▃▅▃▃▂▇▄▂▅▁▂▂▄▄▅▄▄▃▄▄▃▆▃▄▂▄▄▂▅█▂▂▃▃▃
val_loss,█▁█▆▁

0,1
epoch,4.0
train_batch_loss,2.36335
train_loss,2.37356
val_batch_loss,2.30907
val_loss,2.32592


[34m[1mwandb[0m: Agent Starting Run: ignr1o4y with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 16
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
train_batch_loss,▆▂▆▄▅▅▃▃▄▅▄▆█▄▁▄▂▃▅▆▄▄▃▄▃▄▂▃▆▂▁▄▅▄▁▅▄▅▄▄
train_loss,▇▆▆▇▆▇▁▇▅▃▆█▇▃▅▃▃▅▃▇
val_batch_loss,▄▅▄▅▅▃▆▄▅▅█▄▂▄▂▅▅▄▅▂▆▄▄▄▄▇▆▄▆▆▄▇▄▅▂▅▅▁▆▆
val_loss,▅▅▃▇▇▅▅▅▁▃▆▃▅▅▇▅▅█▄▅

0,1
epoch,19.0
train_batch_loss,2.32397
train_loss,2.37356
val_batch_loss,2.34276
val_loss,2.32597


[34m[1mwandb[0m: Agent Starting Run: bbsonc59 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▃▅▆█
train_batch_loss,▇▆▃▆▁▅▅▃▃▆▅▆▅▆▆▄▃▅█▃▅▃▆▄▄▄▄▇▄█▅▃▅▆▂▅█▅▄▄
train_loss,▇▂█▁▅
val_batch_loss,▆▃▃▆▄▆▂▄▆▁▆▂▄▂▄▃▅█▆▃▁▅▅▇▅▆▆▃▆▆▂▃▂▄▄▇▂▃▆▂
val_loss,▁█▇▇▅

0,1
epoch,4.0
train_batch_loss,2.33661
train_loss,2.37356
val_batch_loss,2.32592
val_loss,2.32594


[34m[1mwandb[0m: Agent Starting Run: m7f38map with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
train_batch_loss,▅▇▇▄▆▅▄▆▄▄▄▆▃▁▇▇▂▄▆▄▅▅▆▅▄▆▆█▄▄▄▂▇▆▄▄▆▄▄█
train_loss,▁▅▂▄▃▁▂▃▄▃▃▃▁▃▄▃█▂▃▂
val_batch_loss,▅▄▃▃▄▂█▄▄▅▂▅▄▃▅▅▂▄▄▃▄▂▄▆▃▄▄▄▄▅▅▄▄▅▄▄▅▆▁▂
val_loss,▅▄▃▅▅█▅▃▄▃▅▁▆▅▂▅▅▆▆▂

0,1
epoch,19.0
train_batch_loss,2.2831
train_loss,2.37356
val_batch_loss,2.2769
val_loss,2.32586


[34m[1mwandb[0m: Agent Starting Run: gydi1a5y with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 64
[34m[1mwandb[0m: 	fc_2_3_layer_size: 2
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	optimizer: sgd


VBox(children=(Label(value='0.000 MB of 0.000 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
epoch,▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
train_batch_loss,▂▅▄▂▃▄▃▅▅▃█▄▄▂▁▅▂▅▃▃▂▆▄▃▅▄▄▂▄▅▅▁▄▄▄▄▄▆▃▃
train_loss,▅▇▅▅▄▅▇▃▅▅▅▅▁▇▅▅▂▃▃█
val_batch_loss,▅▄▄▄▄▇▃▅▁▆█▃▅▄▃▄▅▄▆▁▅▃▂▄▅▃▄▄▂▄▅▃▂▂▄▄▄▄▁▂
val_loss,▃▆▅▆█▆▃▅▆▇▅▁▅▂▆▇▃▆▅▅

0,1
epoch,19.0
train_batch_loss,2.30903
train_loss,2.37356
val_batch_loss,2.30937
val_loss,2.32592


[34m[1mwandb[0m: Agent Starting Run: j2akhbws with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	fc_1_2_layer_size: 32
[34m[1mwandb[0m: 	fc_2_3_layer_size: 4
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	optimizer: adam


Now that you have completed the task, please write 3-5 lines sharing your approach to the problem and how you went about solving this task.

I started by just running the code and seeing where it broke. From there, I worked through each error that occurred, fixing the error, then proceeding to the next. 

Once the pipeline was working just fine, I moved on to improving model performance via sweeps. I started with just epochs, then added batch sizes and eventually optimizer parameters and fc layer sizes. 

The sweeps now take some time to run, but should be able to provide more insight. One potential performance improvement would be to swap to bayesian instead of random search, but I was unsure as to what the definition of a small number of parameters was, so I stuck with random.