<a href="https://colab.research.google.com/github/timsetsfire/wandb-examples/blob/main/colab/W%26B_Training_with_Optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch + W&B

The purpose of this lab is to instrument W&B a top of existing ML Workflows which might be leveraging 
* PyTorch
* Tensorboard (for metric tracking)
* Python `logging` (for metric tracking)

We will augment this workflow by leveraging 
* Wandb Experiments and syncing with Tensorboard
* Wandb logging
* Wandb Artifacts for dataset and model logging / versioning
* Tables to surface prediction examples on Test datasets
* track lineage of all artifacts and experiments completed

Lastly, we'll do a simple demo of sweeps and interact with the runs via W&B API to
* query runs and run summaries
* artifacts

In [1]:
%%capture
!pip install wandb easydict optuna --upgrade

In [2]:
%%capture
!pip install tensorboard dill

## Logging In

In [3]:
#@title Enter host address
#@markdown Enter the host url which corresponds to your WB instance.
host = "https://api.wandb.ai" #@param {type: "string"}


In [4]:
import wandb
## when using wandb anywhere other than wandb.ai, you must 
## provide a proper host, so the client knows where to communcate
## details of the experiment
# wandb.login(key = key, host = host)
wandb.login(host = host)

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [5]:
import os
import random
import logging
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.notebook import tqdm
from torch.utils.tensorboard import SummaryWriter

# Ensure deterministic behavior
torch.backends.cudnn.deterministic = True
random.seed(hash("setting random seeds") % 2**32 - 1)
np.random.seed(hash("improves reproducibility") % 2**32 - 1)
torch.manual_seed(hash("by removing stochasticity") % 2**32 - 1)
torch.cuda.manual_seed_all(hash("so runs are repeatable") % 2**32 - 1)

# Device configuration
# if you wind up with any device other than cpu, some code below will need to 
# change specific to the way we are interacting with torch tensors.  
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")

# remove slow mirror from list of MNIST mirrors
torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors
                                      if not mirror.startswith("http://yann.lecun.com")]

## Get Data (and log it)

When we get data and log it, there are obviously tons of way to complete this.  Depending on how you log data, and whether or not you log your retrieval mechanism is a matter of preference and internal guidelines you need to follow.  

In our approach, we will write a `getter` for our data.  The benefit of writing a getting, we can log this getter with our dataset as part of the artifact metadata.



Before we get started it is important to set the name space for your project.  This is going to be accomplished by passing a `project_name` as well as an `entity` to your wandb experiment

`entity` corresponds to the team to which the project will be associated with.  The `entity` could be a team name, or your user name.  

In [6]:
project_name = "demos" #@param {type: "string"}
entity = "tim-w" #@param {type: "string"}

## Logging data

W&B is very unopinionated with regard to how you track your experiments.  We could log data in any number of ways.  
* Log one artifact which represents all the data - training, validation, and test data to one artifact 
* Log several artifacts - one for each of the training, validation, and test data loaders.  

It is a matter of what best suites your needs and workflows and expectations.  

### Anatomy of an artifact 

The `Artifact` class will correspond to an entry in the W&B Artifact registry.  The artifact has 
* a name
* a type
* metadata
* description
* files, directory of files, or references

Example usage 
```
run = wandb.init(project = "my-project")
artifact = wandb.Artifact(name = "my_artifact", type = "data")
artifact.add_file("/path/to/my/file.txt")
run.log_artifact(artifact)
run.finish()
```

In [21]:
## create the data directory locally if it does not already exists
from pathlib import Path
data_path = Path("./data")
data_path.mkdir(exist_ok = True)

## define out data getter 
def get_data(slice=5, train=True):
  '''
  helper function to get data
  args: 
    slice: Int => passed to torch.utils.data.Subset indices argument
    train: Boolean => True to download training data, False for test data
  '''
  full_dataset = torchvision.datasets.MNIST(root=".",
                                            train=train, 
                                            transform=transforms.ToTensor(),
                                            download=True)
  #  equiv to slicing with [::slice] 
  sub_dataset = torch.utils.data.Subset(
    full_dataset, indices=range(0, len(full_dataset), slice))

  return sub_dataset

In [22]:
logging.basicConfig(
                format="%(levelname)s - %(asctime)s - %(message)s",
        )
logger = logging.getLogger("CNN-Logger")
logger.setLevel("INFO")

## Our First W&B Experiment / Run

We are going to 
* get our training and test data
* split the training data into training and validation
* create artifacts for all three dataset
* log those artifacts to W&B.  

In [23]:
#%%wandb -h 600 
import pickle
from dill.source import getsource
from dill import detect
from datetime import datetime 

with wandb.init(project = project_name, job_type = "data-acquisition") as run:

  train, test = get_data(train=True), get_data(train=False)
  train, validation = torch.utils.data.random_split(train, [10000, 2000])

  torch.save(train, './data/training_data.pt')
  torch.save(validation, './data/validation_data.pt')
  torch.save(test, './data/test_data.pt')

  train_artifact = wandb.Artifact(name = "mnist-training-data", type = "dataset", 
                                  description = "training data",
                                  metadata = { 
                                      "data-set": "MNIST training",
                                      "getter": getsource(detect.code(get_data))}
                                  )
  train_artifact.add_file("./data/training_data.pt")

  validation_artifact = wandb.Artifact(name = "mnist-validation-data", type = "dataset", 
                                       description = "validation data",
                                       metadata = { 
                                      "data-set": "MNIST validation",
                                      "getter": getsource(detect.code(get_data))})
  validation_artifact.add_file("./data/validation_data.pt")

  test_artifact = wandb.Artifact(name = "mnist-test-data", type = "dataset", 
                                 description = "test data",
                                 metadata = { 
                                      "data-set": "MNIST test",
                                      "getter": getsource(detect.code(get_data))})
  test_artifact.add_file("./data/test_data.pt")  
  
  run.log_artifact(train_artifact)
  run.log_artifact(validation_artifact)
  run.log_artifact(test_artifact)

VBox(children=(Label(value='0.303 MB of 0.303 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

## Artifact usage (Creating the DAG)

Part of the value of W&B is the ability to capture lineage via Experiments and Artifacts.  Next up for our work flow is to specifiy a model and commence training.  

It is key to remember that experiments create and consume artifacts and we have already completed one  experimemtns where we created dataset artifacts.  

Next up, we will commence an experiment that will consume the artifacts from the previous run for the purposes of training model, then we will create a model artifact.

## Specify the model



In [24]:
# Conventional and convolutional neural network
class ConvNet(nn.Module):
    def __init__(self, kernels, classes=10):
        super(ConvNet, self).__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, kernels[1], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * kernels[-1], classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

In [25]:
def make_loader(dataset, batch_size):
    loader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=batch_size, 
                                         shuffle=True,
                                         pin_memory=True, num_workers=2)
    return loader

## Training

In our first model training experiment, we are going to sync our wandb experiment to tensorboard -> so no wandb specific logging will be instrumented.  


In [26]:
# %%wandb -h 600
# Run training and track with wandb, but no explicit logging.  
# since we were alredy using tensorboard via WritterSumamry, we'll 
# sync w&b to tensorboard.
config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.01,
    dataset="MNIST",
    architecture="CNN"
    )

with wandb.init(project = project_name, 
                 job_type = "training", 
                 config = config,
                 sync_tensorboard = True) as run:

  config = wandb.config
  ## or, ifyou have a nasty nested dictionary for your config
  # config = EasyDict(wandb.config)

  run.use_artifact(f"{run.entity}/{run.project}/mnist-training-data:latest")
  run.use_artifact(f"{run.entity}/{run.project}/mnist-validation-data:latest")
  ## download and instantiation of the artifacts might be necessary.  

  train_loader = make_loader(train, batch_size=config.batch_size)
  validation_loader = make_loader(validation, batch_size=config.batch_size)

  model = ConvNet(config.kernels, config.classes).to(device)
  criterion = nn.CrossEntropyLoss()
  optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)

  writer = SummaryWriter(log_dir = "./wandb/latest-run")
  total_batches = len(train_loader) * config.epochs
  example_ct = 0  # number of examples seen
  batch_ct = 0
  for epoch in tqdm(range(config.epochs)):
    for step, (images, labels) in enumerate(train_loader):
      images, labels = images.to(device), labels.to(device)
      # Forward pass ➡
      outputs = model(images)
      loss = criterion(outputs, labels)
      # Backward pass ⬅
      optimizer.zero_grad()
      loss.backward()
      # Step with optimizer
      optimizer.step()
      example_ct +=  len(images)
      batch_ct += 1
      # Report metrics every 25th batch
      if ((batch_ct + 1) % 25) == 0:
        writer.add_scalar("Train Metrics/loss", loss, batch_ct)
        writer.add_scalar("epoch", loss, batch_ct)
        logger.info(f"Epoch: {epoch}, Loss: {loss.detach().numpy()}")
    with torch.no_grad():
      correct, total = 0, 0
      for images, labels in validation_loader:
          images, labels = images.to(device), labels.to(device)
          outputs = model(images)
          _, predicted = torch.max(outputs.data, 1)
          total += labels.size(0)
          correct += (predicted == labels).sum().item()
          loss = criterion(outputs, labels)
          writer.add_scalar("Validation Metrics/loss", loss, batch_ct)
          writer.add_scalar("epoch", epoch, batch_ct)
      logger.info(f"Epoch {epoch}, Accuracy of the model on the {total} test images: {100 * correct / total}%")
      writer.add_scalar("Validation Metrics/accuracy", correct/total, batch_ct)
      writer.add_scalar("epoch", epoch, batch_ct)

  torch.save(model.state_dict(), "model.pt")
  model_artifact = wandb.Artifact(name = "mnist-model", type = "model")
  model_artifact.add_file("model.pt")
  run.log_artifact(model_artifact)


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.6152558922767639
INFO:CNN-Logger:Epoch: 0, Loss: 0.37864214181900024
INFO:CNN-Logger:Epoch: 0, Loss: 0.33058860898017883
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 test images: 93.25%
INFO:CNN-Logger:Epoch: 1, Loss: 0.18225780129432678
INFO:CNN-Logger:Epoch: 1, Loss: 0.14611569046974182
INFO:CNN-Logger:Epoch: 1, Loss: 0.04811842739582062
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 test images: 94.85%
INFO:CNN-Logger:Epoch: 2, Loss: 0.15746846795082092
INFO:CNN-Logger:Epoch: 2, Loss: 0.030076010152697563
INFO:CNN-Logger:Epoch: 2, Loss: 0.12193747609853745
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 test images: 95.75%
INFO:CNN-Logger:Epoch: 3, Loss: 0.06308609992265701
INFO:CNN-Logger:Epoch: 3, Loss: 0.10632984340190887
INFO:CNN-Logger:Epoch: 3, Loss: 0.10202132165431976
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 test images: 96.0%
INFO:CNN-Logger:Epoch: 4, Loss: 0.049882564693689346
INFO:CNN

VBox(children=(Label(value='0.210 MB of 0.210 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
Train Metrics/loss,█▅▅▃▂▁▃▁▂▁▂▂
Validation Metrics/accuracy,▁▅█
Validation Metrics/loss,█▃▅▁
epoch,▂▂▂▁▁▁▁▃▁▁▁▆▁▁▁█
global_step,▁▂▂▂▃▃▄▄▅▅▆▆▆▇██

0,1
Train Metrics/loss,0.10202
Validation Metrics/accuracy,0.9575
Validation Metrics/loss,0.08218
epoch,3.0
global_step,316.0


## Test Data Evaluation

In [27]:
import pandas as pd
with wandb.init(project = project_name, entity = entity, job_type = "evaluation") as run:
  model_artifact = run.use_artifact(model_artifact.wait())
  ## instantiate the model if necessary
  # model_dir = model_artifact.download()
  # model = ConvNet(config.kernels, config.classes)
  # model.load_state_dict(torch.load(f"{model_dir}/model.pt"))
  run.use_artifact(f"{run.entity}/{run.project}/mnist-test-data:latest")
  ## same goes for the dataset
  test_loader = make_loader(test, batch_size=config.batch_size)

  model.eval()
  # Run the model on some test examples

  with torch.no_grad():
      correct, total = 0, 0
      total_loss = 0
      all_data = []
      for images, labels in test_loader:
          images, labels = images.to(device), labels.to(device)
          outputs = model(images)
          _, predicted = torch.max(outputs.data, 1)
          total += labels.size(0)
          correct += (predicted == labels).sum().item()
          loss = criterion(outputs, labels)*labels.size(0)
          total_loss += loss
          wandb_images = []
          for image in images.numpy():
            temp = wandb.Image(image)
            wandb_images.append(temp) 
          scores = pd.DataFrame( outputs.numpy().tolist(), columns = [f"p{i}" for i in range(outputs.shape[1])]).to_dict(orient = "series")
          data = {"images":wandb_images, "predicted": predicted.numpy().tolist(), "labels": labels.numpy().tolist()}
          data = {**data, **scores}
          all_data.append(pd.DataFrame(data))
      import pandas as pd 
      df = pd.concat(all_data)
      wandb.log({"Predictions vs Actuals": wandb.Table(dataframe = df)})
      run.log({"Test Metrics/loss": total_loss / total, "Test Metrics/accuracy": correct / total})
      logger.info(f"Accuracy of the model on the {total} " +
            f"test images: {100 * correct / total}%")
          

INFO:CNN-Logger:Accuracy of the model on the 2000 test images: 96.8%


0,1
Test Metrics/accuracy,▁
Test Metrics/loss,▁

0,1
Test Metrics/accuracy,0.968
Test Metrics/loss,0.10459


## HPO with Optuna

In [28]:

# @wandbc.track_in_wandb()
# def objective(trial):
#     learning_rate = trial.suggest_float("learning_rate", 0.001, 0.1)
#     batch_size = trial.suggest_categorical(name = "batch_size", choices = [128, 256])
#     config_standard = dict(
#       epochs=5,
#       classes=10,
#       kernels=[16, 32],
#       dataset="MNIST",
#       architecture="CNN",
#       learning_rate = learning_rate, 
#       batch_size = batch_size
#     )
#     wandb.config.update(config_standard)
#     x = trial.suggest_float("x", -10, 10)
#     return (x - 2) ** 2




  """Entry point for launching an IPython kernel.


## Training Function

In [29]:
import optuna
from optuna.integration.wandb import WeightsAndBiasesCallback

In [30]:
wandb_kwargs = {"project": "my-optuna-project-v5"}
wandbc = WeightsAndBiasesCallback(wandb_kwargs=wandb_kwargs, as_multirun=True)

  


In [31]:
@wandbc.track_in_wandb()
def train_func(trial): 

    learning_rate = trial.suggest_float("learning_rate", 0.001, 0.1)
    batch_size = trial.suggest_categorical(name = "batch_size", choices = [128, 256])
    config_standard = dict(
      epochs=5,
      classes=10,
      kernels=[16, 32],
      dataset="MNIST",
      architecture="CNN",
    )
    wandb.config.update(config_standard)
  
    wandb.use_artifact(f"{entity}/{project_name}/mnist-training-data:latest")
    wandb.use_artifact(f"{entity}/{project_name}/mnist-validation-data:latest")

    train_loader = make_loader(train, batch_size=config.batch_size)
    validation_loader = make_loader(validation, batch_size=config.batch_size)

    model = ConvNet(config.kernels, config.classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)

    total_batches = len(train_loader) * config.epochs
    example_ct = 0  # number of examples seen
    batch_ct = 0
    for epoch in tqdm(range(config.epochs)):
      for _, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)
        # Forward pass ➡
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward pass ⬅
        optimizer.zero_grad()
        loss.backward()
        # Step with optimizer
        optimizer.step()
        example_ct +=  len(images)
        batch_ct += 1
        # Report metrics every 25th batch
        if ((batch_ct + 1) % 25) == 0:
          logger.info(f"Epoch: {epoch}, Loss: {loss.detach().numpy()}")
          wandb.log({ "Train Metrics/loss": loss, "epoch": epoch})
      with torch.no_grad():
        correct, total = 0, 0
        for images, labels in validation_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            loss = criterion(outputs, labels)
            wandb.log({"Validation Metrics/loss": loss, "epoch": epoch})
        logger.info(f"Epoch {epoch}, Accuracy of the model on the {total} validation images: {100 * correct / total}%")
        wandb.log({"Validation Metrics/accuracy": correct / total})

    torch.save(model.state_dict(), f"{wandb.run.id}-model.pt")
    model_artifact = wandb.Artifact(name = f"{wandb.run.id}-mnist-model", type = "model")
    model_artifact.add_file(f"{wandb.run.id}-model.pt")
    wandb.log_artifact(model_artifact)
    return loss


  """Entry point for launching an IPython kernel.


In [34]:
study = optuna.create_study()
study.optimize(train_func, n_trials=10, callbacks=[wandbc])

[32m[I 2022-10-24 16:04:39,584][0m A new study created in memory with name: no-name-fb829780-3530-4509-bc48-2372be956dbf[0m


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.2202364057302475
INFO:CNN-Logger:Epoch: 0, Loss: 0.1907530277967453
INFO:CNN-Logger:Epoch: 0, Loss: 0.19717641174793243
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 94.95%
INFO:CNN-Logger:Epoch: 1, Loss: 0.12388281524181366
INFO:CNN-Logger:Epoch: 1, Loss: 0.10667406767606735
INFO:CNN-Logger:Epoch: 1, Loss: 0.05271652340888977
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 97.25%
INFO:CNN-Logger:Epoch: 2, Loss: 0.07366706430912018
INFO:CNN-Logger:Epoch: 2, Loss: 0.030108748003840446
INFO:CNN-Logger:Epoch: 2, Loss: 0.0902952179312706
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 97.0%
INFO:CNN-Logger:Epoch: 3, Loss: 0.018616069108247757
INFO:CNN-Logger:Epoch: 3, Loss: 0.06598187983036041
INFO:CNN-Logger:Epoch: 3, Loss: 0.022139111533761024
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.25%
INFO:CNN-Logger:Epoch: 4, Loss: 0.019

VBox(children=(Label(value='0.225 MB of 0.225 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
Train Metrics/loss,█▇▇▅▄▃▃▂▄▁▃▂▁▁▃
Validation Metrics/accuracy,▁█▇█▇
Validation Metrics/loss,▅█▆▄▄▇▄▃▂▃▃▃▇▄▂▄▃▂▁▅▇▃▃▁▂▂▁▅▃▇▃▅▂▄▄▁▃▂▃▅
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.05613
Validation Metrics/accuracy,0.9705
Validation Metrics/loss,0.17458
batch_size,128.0
epoch,4.0
learning_rate,0.08832
trial_number,0.0
value,0.00094


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.3204013705253601
INFO:CNN-Logger:Epoch: 0, Loss: 0.1375519335269928
INFO:CNN-Logger:Epoch: 0, Loss: 0.06728356331586838
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 94.95%
INFO:CNN-Logger:Epoch: 1, Loss: 0.16734910011291504
INFO:CNN-Logger:Epoch: 1, Loss: 0.08250366896390915
INFO:CNN-Logger:Epoch: 1, Loss: 0.1568121612071991
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.7%
INFO:CNN-Logger:Epoch: 2, Loss: 0.022772399708628654
INFO:CNN-Logger:Epoch: 2, Loss: 0.04573654383420944
INFO:CNN-Logger:Epoch: 2, Loss: 0.03988000005483627
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 97.35%
INFO:CNN-Logger:Epoch: 3, Loss: 0.055933479219675064
INFO:CNN-Logger:Epoch: 3, Loss: 0.06086835637688637
INFO:CNN-Logger:Epoch: 3, Loss: 0.02095036581158638
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.25%
INFO:CNN-Logger:Epoch: 4, Loss: 0.1088

0,1
Train Metrics/loss,█▄▂▅▃▄▁▂▂▂▂▁▃▁▁
Validation Metrics/accuracy,▁▆▇▇█
Validation Metrics/loss,▅▅▃▅█▅▂▅▂▃▂▁▂▄▄▂▄▄▂▂▂▃▄▃▄▂▄▁▃▂▂▂▅▃▃▅▄▂▂▁
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.00392
Validation Metrics/accuracy,0.9755
Validation Metrics/loss,0.02227
batch_size,128.0
epoch,4.0
learning_rate,0.01012
trial_number,1.0
value,0.00078


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.5920660495758057
INFO:CNN-Logger:Epoch: 0, Loss: 0.27357736229896545
INFO:CNN-Logger:Epoch: 0, Loss: 0.2189915031194687
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 94.7%
INFO:CNN-Logger:Epoch: 1, Loss: 0.20672453939914703
INFO:CNN-Logger:Epoch: 1, Loss: 0.19700832664966583
INFO:CNN-Logger:Epoch: 1, Loss: 0.14599210023880005
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.6%
INFO:CNN-Logger:Epoch: 2, Loss: 0.06642049551010132
INFO:CNN-Logger:Epoch: 2, Loss: 0.09510014951229095
INFO:CNN-Logger:Epoch: 2, Loss: 0.10135313868522644
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.95%
INFO:CNN-Logger:Epoch: 3, Loss: 0.09899364411830902
INFO:CNN-Logger:Epoch: 3, Loss: 0.0544300302863121
INFO:CNN-Logger:Epoch: 3, Loss: 0.05054887384176254
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.1%
INFO:CNN-Logger:Epoch: 4, Loss: 0.07150340

VBox(children=(Label(value='0.259 MB of 0.259 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
Train Metrics/loss,█▄▄▃▃▃▂▂▂▂▂▂▂▁▁
Validation Metrics/accuracy,▁▆▇▇█
Validation Metrics/loss,▄▄▅▆▄█▄▄▃▃▄▄▁▃▄▄▂▂▆▃▃▁▄▅▂▄▅▃▅▁▁▃▅▄▁▃▅▄▁▆
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.02118
Validation Metrics/accuracy,0.9745
Validation Metrics/loss,0.2124
batch_size,256.0
epoch,4.0
learning_rate,0.08938
trial_number,2.0
value,0.00077


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.3968081772327423
INFO:CNN-Logger:Epoch: 0, Loss: 0.12533177435398102
INFO:CNN-Logger:Epoch: 0, Loss: 0.018057527020573616
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 96.6%
INFO:CNN-Logger:Epoch: 1, Loss: 0.04383872076869011
INFO:CNN-Logger:Epoch: 1, Loss: 0.11850938200950623
INFO:CNN-Logger:Epoch: 1, Loss: 0.0904545709490776
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.5%
INFO:CNN-Logger:Epoch: 2, Loss: 0.09245205670595169
INFO:CNN-Logger:Epoch: 2, Loss: 0.11973658204078674
INFO:CNN-Logger:Epoch: 2, Loss: 0.047684986144304276
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.6%
INFO:CNN-Logger:Epoch: 3, Loss: 0.023342739790678024
INFO:CNN-Logger:Epoch: 3, Loss: 0.08425422012805939
INFO:CNN-Logger:Epoch: 3, Loss: 0.0288467425853014
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.4%
INFO:CNN-Logger:Epoch: 4, Loss: 0.014514

VBox(children=(Label(value='0.280 MB of 0.280 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
Train Metrics/loss,█▃▁▂▃▂▂▃▂▁▂▁▁▁▁
Validation Metrics/accuracy,▂▁▂██
Validation Metrics/loss,▄▂▅▄▃▂▃▂▂▄▃▂▄▄▃▄▂▂▂▃▂▃▃▃▃▁▃▃▃▂▂▂▁▄▂█▃▁▄▁
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.03891
Validation Metrics/accuracy,0.974
Validation Metrics/loss,0.01173
batch_size,256.0
epoch,4.0
learning_rate,0.04556
trial_number,3.0
value,0.00079


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.41071754693984985
INFO:CNN-Logger:Epoch: 0, Loss: 0.21347413957118988
INFO:CNN-Logger:Epoch: 0, Loss: 0.06579791009426117
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 95.5%
INFO:CNN-Logger:Epoch: 1, Loss: 0.09683655202388763
INFO:CNN-Logger:Epoch: 1, Loss: 0.22325533628463745
INFO:CNN-Logger:Epoch: 1, Loss: 0.0488869808614254
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.45%
INFO:CNN-Logger:Epoch: 2, Loss: 0.06366995722055435
INFO:CNN-Logger:Epoch: 2, Loss: 0.039332177489995956
INFO:CNN-Logger:Epoch: 2, Loss: 0.03154755383729935
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 97.25%
INFO:CNN-Logger:Epoch: 3, Loss: 0.06239454075694084
INFO:CNN-Logger:Epoch: 3, Loss: 0.022603249177336693
INFO:CNN-Logger:Epoch: 3, Loss: 0.04329656809568405
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.45%
INFO:CNN-Logger:Epoch: 4, Loss: 0.01

0,1
Train Metrics/loss,█▅▂▃▅▂▂▂▁▂▁▂▁▁▁
Validation Metrics/accuracy,▁▄▇█▇
Validation Metrics/loss,▄▄▄▆▄▇▄▅▃▂▃▃▄▁▅▅▅▃▁▂▁▃▂▄▃▅█▁▂▄▅▇▄▃▃▂▃▅▂▄
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.01964
Validation Metrics/accuracy,0.9715
Validation Metrics/loss,0.09723
batch_size,128.0
epoch,4.0
learning_rate,0.01569
trial_number,4.0
value,0.00072


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.29345884919166565
INFO:CNN-Logger:Epoch: 0, Loss: 0.15035413205623627
INFO:CNN-Logger:Epoch: 0, Loss: 0.08890308439731598
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 95.9%
INFO:CNN-Logger:Epoch: 1, Loss: 0.20670713484287262
INFO:CNN-Logger:Epoch: 1, Loss: 0.1508227437734604
INFO:CNN-Logger:Epoch: 1, Loss: 0.13893410563468933
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 97.3%
INFO:CNN-Logger:Epoch: 2, Loss: 0.062476929277181625
INFO:CNN-Logger:Epoch: 2, Loss: 0.029634464532136917
INFO:CNN-Logger:Epoch: 2, Loss: 0.06384575366973877
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 97.2%
INFO:CNN-Logger:Epoch: 3, Loss: 0.03379198908805847
INFO:CNN-Logger:Epoch: 3, Loss: 0.046857286244630814
INFO:CNN-Logger:Epoch: 3, Loss: 0.031060097739100456
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.6%
INFO:CNN-Logger:Epoch: 4, Loss: 0.036

0,1
Train Metrics/loss,█▄▃▆▄▄▂▁▂▁▁▁▁▁▂
Validation Metrics/accuracy,▁▇▆█▄
Validation Metrics/loss,█▄▃▅▂▂▆▄▅▄▅▃▂▄▄▃▂▂▄▃▃▃▆▅▂▄▄▄▃▃▄▃▁▄▅▅▃▆▅▃
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.06619
Validation Metrics/accuracy,0.9655
Validation Metrics/loss,0.09086
batch_size,256.0
epoch,4.0
learning_rate,0.04191
trial_number,5.0
value,0.00088


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.2686982750892639
INFO:CNN-Logger:Epoch: 0, Loss: 0.09075188636779785
INFO:CNN-Logger:Epoch: 0, Loss: 0.16386130452156067
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 96.65%
INFO:CNN-Logger:Epoch: 1, Loss: 0.08306562900543213
INFO:CNN-Logger:Epoch: 1, Loss: 0.06270737200975418
INFO:CNN-Logger:Epoch: 1, Loss: 0.04240083321928978
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.4%
INFO:CNN-Logger:Epoch: 2, Loss: 0.05335497483611107
INFO:CNN-Logger:Epoch: 2, Loss: 0.07852634787559509
INFO:CNN-Logger:Epoch: 2, Loss: 0.0659700557589531
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.95%
INFO:CNN-Logger:Epoch: 3, Loss: 0.010773732326924801
INFO:CNN-Logger:Epoch: 3, Loss: 0.00247353152371943
INFO:CNN-Logger:Epoch: 3, Loss: 0.06678922474384308
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.7%
INFO:CNN-Logger:Epoch: 4, Loss: 0.00498

0,1
Train Metrics/loss,█▃▅▃▃▂▂▃▃▁▁▃▁▂▁
Validation Metrics/accuracy,▂▁▄█▅
Validation Metrics/loss,▅▄▄▅▃▃█▆▂▇▅▃▄▇▄█▂▄▅▄▂█▁▅▁▃▄▁▃▃▂▂▅▃▁▆▃▁▂▂
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.0152
Validation Metrics/accuracy,0.971
Validation Metrics/loss,0.05417
batch_size,128.0
epoch,4.0
learning_rate,0.06827
trial_number,6.0
value,0.00085


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.44174179434776306
INFO:CNN-Logger:Epoch: 0, Loss: 0.23112303018569946
INFO:CNN-Logger:Epoch: 0, Loss: 0.19885970652103424
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 94.3%
INFO:CNN-Logger:Epoch: 1, Loss: 0.0911320149898529
INFO:CNN-Logger:Epoch: 1, Loss: 0.04662971571087837
INFO:CNN-Logger:Epoch: 1, Loss: 0.07121451944112778
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 95.65%
INFO:CNN-Logger:Epoch: 2, Loss: 0.05859382450580597
INFO:CNN-Logger:Epoch: 2, Loss: 0.09998374432325363
INFO:CNN-Logger:Epoch: 2, Loss: 0.07784135639667511
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 95.9%
INFO:CNN-Logger:Epoch: 3, Loss: 0.11165395379066467
INFO:CNN-Logger:Epoch: 3, Loss: 0.04304277524352074
INFO:CNN-Logger:Epoch: 3, Loss: 0.016078347340226173
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 96.05%
INFO:CNN-Logger:Epoch: 4, Loss: 0.0334

0,1
Train Metrics/loss,█▅▄▂▂▂▂▂▂▃▁▁▁▂▂
Validation Metrics/accuracy,▁▅▆▆█
Validation Metrics/loss,▃▆▆▂▄▂▃█▃▄▃▃▂▂▂▂▁▃▁▂▃▂▂▃▄▆▂▄▁▃▃▃▅▄▁▁▁▄▃▂
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.07842
Validation Metrics/accuracy,0.9675
Validation Metrics/loss,0.0839
batch_size,128.0
epoch,4.0
learning_rate,0.0285
trial_number,7.0
value,0.00098


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.3242708444595337
INFO:CNN-Logger:Epoch: 0, Loss: 0.2208414077758789
INFO:CNN-Logger:Epoch: 0, Loss: 0.14153040945529938
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 95.85%
INFO:CNN-Logger:Epoch: 1, Loss: 0.2474503368139267
INFO:CNN-Logger:Epoch: 1, Loss: 0.15772059559822083
INFO:CNN-Logger:Epoch: 1, Loss: 0.11678603291511536
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.55%
INFO:CNN-Logger:Epoch: 2, Loss: 0.07085666060447693
INFO:CNN-Logger:Epoch: 2, Loss: 0.016292011365294456
INFO:CNN-Logger:Epoch: 2, Loss: 0.04826488718390465
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.5%
INFO:CNN-Logger:Epoch: 3, Loss: 0.020137270912528038
INFO:CNN-Logger:Epoch: 3, Loss: 0.09969673305749893
INFO:CNN-Logger:Epoch: 3, Loss: 0.03388528525829315
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.0%
INFO:CNN-Logger:Epoch: 4, Loss: 0.03153

VBox(children=(Label(value='0.355 MB of 0.355 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
Train Metrics/loss,█▆▄▆▄▃▂▁▂▁▃▁▁▂▂
Validation Metrics/accuracy,▁▄▄▆█
Validation Metrics/loss,▅▂▄▃▇▅▂▄▃▅▃▄▁▃▃▂▂▂▄▃▃▃▃▆▂▃▁▄▄▂▄▃▁▃▃▃▂▁▄█
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.04008
Validation Metrics/accuracy,0.976
Validation Metrics/loss,0.35578
batch_size,128.0
epoch,4.0
learning_rate,0.0136
trial_number,8.0
value,0.00067


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.33528223633766174
INFO:CNN-Logger:Epoch: 0, Loss: 0.1397526115179062
INFO:CNN-Logger:Epoch: 0, Loss: 0.07320921123027802
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 94.8%
INFO:CNN-Logger:Epoch: 1, Loss: 0.1139756292104721
INFO:CNN-Logger:Epoch: 1, Loss: 0.03201408311724663
INFO:CNN-Logger:Epoch: 1, Loss: 0.06809043884277344
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 95.9%
INFO:CNN-Logger:Epoch: 2, Loss: 0.027186453342437744
INFO:CNN-Logger:Epoch: 2, Loss: 0.09580682963132858
INFO:CNN-Logger:Epoch: 2, Loss: 0.020073655992746353
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.8%
INFO:CNN-Logger:Epoch: 3, Loss: 0.033711690455675125
INFO:CNN-Logger:Epoch: 3, Loss: 0.03615190088748932
INFO:CNN-Logger:Epoch: 3, Loss: 0.05919386073946953
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.2%
INFO:CNN-Logger:Epoch: 4, Loss: 0.02731

0,1
Train Metrics/loss,█▄▂▃▁▂▁▃▁▁▂▂▁▁▂
Validation Metrics/accuracy,▁▄▇██
Validation Metrics/loss,▃▄▄▃▄▄█▂▄▂▂▃▂▂▃▅▂▂▂▃▃▃▃▇▅▂▁▃▁▂▂▃▂▂▃▂▂▃▁▃
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.04821
Validation Metrics/accuracy,0.9725
Validation Metrics/loss,0.09453
batch_size,256.0
epoch,4.0
learning_rate,0.01338
trial_number,9.0
value,0.00077


## Use API to interact with W&B

In [None]:
import pandas as pd
import wandb
api = wandb.Api()
sweep = api.sweep(f"{entity}/{project_name}/{sweep_id}")
temp_data = []
for r in sweep.runs:
 temp_dict = dict(**dict(r.summary), **r.config)
 temp_dict["run_id"] = r.id
 temp_dict["run_name"] = r.name
 temp_data.append( temp_dict)
df = pd.DataFrame(temp_data)
df.set_index("run_id", inplace = True)
best_run_id = sweep.best_run().id
best_run = api.run(f"{entity}/{project_name}/{best_run_id}")
df.loc[best_run_id]

[34m[1mwandb[0m: Sorting runs by +summary_metrics.Validation Metrics/loss


_timestamp                     1663081318.589144
Train Metrics/loss                      0.254352
Validation Metrics/loss                 0.002262
Validation Metrics/accuracy                0.979
_step                                         99
epoch                                          4
_wandb                           {'runtime': 41}
_runtime                               42.174489
epochs                                         5
classes                                       10
dataset                                    MNIST
kernels                                 [16, 32]
batch_size                                   128
architecture                                 CNN
learning_rate                              0.005
run_name                         valiant-sweep-4
Name: 5o6kh8ef, dtype: object

In [None]:
import os
os.kill(os.getpid(), 9)

In [None]:
project_name = "demos" #@param {type: "string"}
entity = "tim-w" #@param {type: "string"}


In [None]:
import pandas as pd
import wandb
api = wandb.Api()
sweep = api.sweep(f"{entity}/{project_name}/{sweep_id}")
temp_data = []
for r in sweep.runs:
 temp_dict = dict(**dict(r.summary), **r.config)
 temp_dict["run_id"] = r.id
 temp_dict["run_name"] = r.name
 temp_data.append( temp_dict)
df = pd.DataFrame(temp_data)
df.set_index("run_id", inplace = True)
best_run_id = sweep.best_run().id
best_run = api.run(f"{entity}/{project_name}/{best_run_id}")
df.loc[best_run_id]