<a href="https://colab.research.google.com/github/timsetsfire/wandb-examples/blob/main/colab/W%26B_Training_with_Optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch + W&B

The purpose of this lab is to instrument W&B a top of existing ML Workflows which might be leveraging 
* PyTorch
* Tensorboard (for metric tracking)
* Python `logging` (for metric tracking)

We will augment this workflow by leveraging 
* Wandb Experiments and syncing with Tensorboard
* Wandb logging
* Wandb Artifacts for dataset and model logging / versioning
* Tables to surface prediction examples on Test datasets
* track lineage of all artifacts and experiments completed

Lastly, we'll do a simple HPO leveraging Optuna and W&B integration.   And we'll finish it off with interacting with the runs via W&B API to
* query runs and run summaries
* artifacts

In [1]:
%%capture
!pip install wandb easydict optuna --upgrade

In [2]:
%%capture
!pip install tensorboard dill

## Logging In

In [3]:
#@title Enter host address
#@markdown Enter the host url which corresponds to your WB instance.
host = "https://api.wandb.ai" #@param {type: "string"}


In [4]:
import wandb
## when using wandb anywhere other than wandb.ai, you must 
## provide a proper host, so the client knows where to communcate
## details of the experiment
# wandb.login(key = key, host = host)
wandb.login(host = host)

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [5]:
import os
import random
import logging
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.notebook import tqdm
from torch.utils.tensorboard import SummaryWriter

# Ensure deterministic behavior
torch.backends.cudnn.deterministic = True
random.seed(hash("setting random seeds") % 2**32 - 1)
np.random.seed(hash("improves reproducibility") % 2**32 - 1)
torch.manual_seed(hash("by removing stochasticity") % 2**32 - 1)
torch.cuda.manual_seed_all(hash("so runs are repeatable") % 2**32 - 1)

# Device configuration
# if you wind up with any device other than cpu, some code below will need to 
# change specific to the way we are interacting with torch tensors.  
# device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")

# remove slow mirror from list of MNIST mirrors
torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors
                                      if not mirror.startswith("http://yann.lecun.com")]

## Get Data (and log it)

When we get data and log it, there are obviously tons of way to complete this.  Depending on how you log data, and whether or not you log your retrieval mechanism is a matter of preference and internal guidelines you need to follow.  

In our approach, we will write a `getter` for our data.  The benefit of writing a getting, we can log this getter with our dataset as part of the artifact metadata.



Before we get started it is important to set the name space for your project.  This is going to be accomplished by passing a `project_name` as well as an `entity` to your wandb experiment

`entity` corresponds to the team to which the project will be associated with.  The `entity` could be a team name, or your user name.  

In [57]:
project_name = "demos" #@param {type: "string"}
entity = "tim-w" #@param {type: "string"}

## Logging data

W&B is very unopinionated with regard to how you track your experiments.  We could log data in any number of ways.  
* Log one artifact which represents all the data - training, validation, and test data to one artifact 
* Log several artifacts - one for each of the training, validation, and test data loaders.  

It is a matter of what best suites your needs and workflows and expectations.  

### Anatomy of an artifact 

The `Artifact` class will correspond to an entry in the W&B Artifact registry.  The artifact has 
* a name
* a type
* metadata
* description
* files, directory of files, or references

Example usage 
```
run = wandb.init(project = "my-project")
artifact = wandb.Artifact(name = "my_artifact", type = "data")
artifact.add_file("/path/to/my/file.txt")
run.log_artifact(artifact)
run.finish()
```

In [21]:
## create the data directory locally if it does not already exists
from pathlib import Path
data_path = Path("./data")
data_path.mkdir(exist_ok = True)

## define out data getter 
def get_data(slice=5, train=True):
  '''
  helper function to get data
  args: 
    slice: Int => passed to torch.utils.data.Subset indices argument
    train: Boolean => True to download training data, False for test data
  '''
  full_dataset = torchvision.datasets.MNIST(root=".",
                                            train=train, 
                                            transform=transforms.ToTensor(),
                                            download=True)
  #  equiv to slicing with [::slice] 
  sub_dataset = torch.utils.data.Subset(
    full_dataset, indices=range(0, len(full_dataset), slice))

  return sub_dataset

In [22]:
logging.basicConfig(
                format="%(levelname)s - %(asctime)s - %(message)s",
        )
logger = logging.getLogger("CNN-Logger")
logger.setLevel("INFO")

## Our First W&B Experiment / Run

We are going to 
* get our training and test data
* split the training data into training and validation
* create artifacts for all three dataset
* log those artifacts to W&B.  

In [23]:
#%%wandb -h 600 
import pickle
from dill.source import getsource
from dill import detect
from datetime import datetime 

with wandb.init(project = project_name, job_type = "data-acquisition") as run:

  train, test = get_data(train=True), get_data(train=False)
  train, validation = torch.utils.data.random_split(train, [10000, 2000])

  torch.save(train, './data/training_data.pt')
  torch.save(validation, './data/validation_data.pt')
  torch.save(test, './data/test_data.pt')

  train_artifact = wandb.Artifact(name = "mnist-training-data", type = "dataset", 
                                  description = "training data",
                                  metadata = { 
                                      "data-set": "MNIST training",
                                      "getter": getsource(detect.code(get_data))}
                                  )
  train_artifact.add_file("./data/training_data.pt")

  validation_artifact = wandb.Artifact(name = "mnist-validation-data", type = "dataset", 
                                       description = "validation data",
                                       metadata = { 
                                      "data-set": "MNIST validation",
                                      "getter": getsource(detect.code(get_data))})
  validation_artifact.add_file("./data/validation_data.pt")

  test_artifact = wandb.Artifact(name = "mnist-test-data", type = "dataset", 
                                 description = "test data",
                                 metadata = { 
                                      "data-set": "MNIST test",
                                      "getter": getsource(detect.code(get_data))})
  test_artifact.add_file("./data/test_data.pt")  
  
  run.log_artifact(train_artifact)
  run.log_artifact(validation_artifact)
  run.log_artifact(test_artifact)

VBox(children=(Label(value='0.303 MB of 0.303 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

## Artifact usage (Creating the DAG)

Part of the value of W&B is the ability to capture lineage via Experiments and Artifacts.  Next up for our work flow is to specifiy a model and commence training.  

It is key to remember that experiments create and consume artifacts and we have already completed one  experimemtns where we created dataset artifacts.  

Next up, we will commence an experiment that will consume the artifacts from the previous run for the purposes of training model, then we will create a model artifact.

## Specify the model



In [24]:
# Conventional and convolutional neural network
class ConvNet(nn.Module):
    def __init__(self, kernels, classes=10):
        super(ConvNet, self).__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, kernels[1], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * kernels[-1], classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

In [25]:
def make_loader(dataset, batch_size):
    loader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=batch_size, 
                                         shuffle=True,
                                         pin_memory=True, num_workers=2)
    return loader

## Training

In our first model training experiment, we are going to sync our wandb experiment to tensorboard -> so no wandb specific logging will be instrumented.  


In [26]:
# %%wandb -h 600
# Run training and track with wandb, but no explicit logging.  
# since we were alredy using tensorboard via WritterSumamry, we'll 
# sync w&b to tensorboard.
config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.01,
    dataset="MNIST",
    architecture="CNN"
    )

with wandb.init(project = project_name, 
                 job_type = "training", 
                 config = config,
                 sync_tensorboard = True) as run:

  config = wandb.config
  ## or, ifyou have a nasty nested dictionary for your config
  # config = EasyDict(wandb.config)

  run.use_artifact(f"{run.entity}/{run.project}/mnist-training-data:latest")
  run.use_artifact(f"{run.entity}/{run.project}/mnist-validation-data:latest")
  ## download and instantiation of the artifacts might be necessary.  

  train_loader = make_loader(train, batch_size=config.batch_size)
  validation_loader = make_loader(validation, batch_size=config.batch_size)

  model = ConvNet(config.kernels, config.classes).to(device)
  criterion = nn.CrossEntropyLoss()
  optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)

  writer = SummaryWriter(log_dir = "./wandb/latest-run")
  total_batches = len(train_loader) * config.epochs
  example_ct = 0  # number of examples seen
  batch_ct = 0
  for epoch in tqdm(range(config.epochs)):
    for step, (images, labels) in enumerate(train_loader):
      images, labels = images.to(device), labels.to(device)
      # Forward pass ➡
      outputs = model(images)
      loss = criterion(outputs, labels)
      # Backward pass ⬅
      optimizer.zero_grad()
      loss.backward()
      # Step with optimizer
      optimizer.step()
      example_ct +=  len(images)
      batch_ct += 1
      # Report metrics every 25th batch
      if ((batch_ct + 1) % 25) == 0:
        writer.add_scalar("Train Metrics/loss", loss, batch_ct)
        writer.add_scalar("epoch", loss, batch_ct)
        logger.info(f"Epoch: {epoch}, Loss: {loss.detach().numpy()}")
    with torch.no_grad():
      correct, total = 0, 0
      for images, labels in validation_loader:
          images, labels = images.to(device), labels.to(device)
          outputs = model(images)
          _, predicted = torch.max(outputs.data, 1)
          total += labels.size(0)
          correct += (predicted == labels).sum().item()
          loss = criterion(outputs, labels)
          writer.add_scalar("Validation Metrics/loss", loss, batch_ct)
          writer.add_scalar("epoch", epoch, batch_ct)
      logger.info(f"Epoch {epoch}, Accuracy of the model on the {total} test images: {100 * correct / total}%")
      writer.add_scalar("Validation Metrics/accuracy", correct/total, batch_ct)
      writer.add_scalar("epoch", epoch, batch_ct)

  torch.save(model.state_dict(), "model.pt")
  model_artifact = wandb.Artifact(name = "mnist-model", type = "model")
  model_artifact.add_file("model.pt")
  run.log_artifact(model_artifact)


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.6152558922767639
INFO:CNN-Logger:Epoch: 0, Loss: 0.37864214181900024
INFO:CNN-Logger:Epoch: 0, Loss: 0.33058860898017883
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 test images: 93.25%
INFO:CNN-Logger:Epoch: 1, Loss: 0.18225780129432678
INFO:CNN-Logger:Epoch: 1, Loss: 0.14611569046974182
INFO:CNN-Logger:Epoch: 1, Loss: 0.04811842739582062
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 test images: 94.85%
INFO:CNN-Logger:Epoch: 2, Loss: 0.15746846795082092
INFO:CNN-Logger:Epoch: 2, Loss: 0.030076010152697563
INFO:CNN-Logger:Epoch: 2, Loss: 0.12193747609853745
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 test images: 95.75%
INFO:CNN-Logger:Epoch: 3, Loss: 0.06308609992265701
INFO:CNN-Logger:Epoch: 3, Loss: 0.10632984340190887
INFO:CNN-Logger:Epoch: 3, Loss: 0.10202132165431976
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 test images: 96.0%
INFO:CNN-Logger:Epoch: 4, Loss: 0.049882564693689346
INFO:CNN

VBox(children=(Label(value='0.210 MB of 0.210 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
Train Metrics/loss,█▅▅▃▂▁▃▁▂▁▂▂
Validation Metrics/accuracy,▁▅█
Validation Metrics/loss,█▃▅▁
epoch,▂▂▂▁▁▁▁▃▁▁▁▆▁▁▁█
global_step,▁▂▂▂▃▃▄▄▅▅▆▆▆▇██

0,1
Train Metrics/loss,0.10202
Validation Metrics/accuracy,0.9575
Validation Metrics/loss,0.08218
epoch,3.0
global_step,316.0


## Test Data Evaluation

In [27]:
import pandas as pd
with wandb.init(project = project_name, entity = entity, job_type = "evaluation") as run:
  model_artifact = run.use_artifact(model_artifact.wait())
  ## instantiate the model if necessary
  # model_dir = model_artifact.download()
  # model = ConvNet(config.kernels, config.classes)
  # model.load_state_dict(torch.load(f"{model_dir}/model.pt"))
  run.use_artifact(f"{run.entity}/{run.project}/mnist-test-data:latest")
  ## same goes for the dataset
  test_loader = make_loader(test, batch_size=config.batch_size)

  model.eval()
  # Run the model on some test examples

  with torch.no_grad():
      correct, total = 0, 0
      total_loss = 0
      all_data = []
      for images, labels in test_loader:
          images, labels = images.to(device), labels.to(device)
          outputs = model(images)
          _, predicted = torch.max(outputs.data, 1)
          total += labels.size(0)
          correct += (predicted == labels).sum().item()
          loss = criterion(outputs, labels)*labels.size(0)
          total_loss += loss
          wandb_images = []
          for image in images.numpy():
            temp = wandb.Image(image)
            wandb_images.append(temp) 
          scores = pd.DataFrame( outputs.numpy().tolist(), columns = [f"p{i}" for i in range(outputs.shape[1])]).to_dict(orient = "series")
          data = {"images":wandb_images, "predicted": predicted.numpy().tolist(), "labels": labels.numpy().tolist()}
          data = {**data, **scores}
          all_data.append(pd.DataFrame(data))
      import pandas as pd 
      df = pd.concat(all_data)
      wandb.log({"Predictions vs Actuals": wandb.Table(dataframe = df)})
      run.log({"Test Metrics/loss": total_loss / total, "Test Metrics/accuracy": correct / total})
      logger.info(f"Accuracy of the model on the {total} " +
            f"test images: {100 * correct / total}%")
          

INFO:CNN-Logger:Accuracy of the model on the 2000 test images: 96.8%


0,1
Test Metrics/accuracy,▁
Test Metrics/loss,▁

0,1
Test Metrics/accuracy,0.968
Test Metrics/loss,0.10459


## HPO with Optuna

In [35]:
import optuna
from optuna.integration.wandb import WeightsAndBiasesCallback

In [54]:
wandb_kwargs = {"project": "my-optuna-project-v6"}
wandbc = WeightsAndBiasesCallback(wandb_kwargs=wandb_kwargs, as_multirun=True)

  


## Training Function

In [55]:
@wandbc.track_in_wandb()
def train_func(trial): 

    learning_rate = trial.suggest_float("learning_rate", 0.001, 0.1)
    batch_size = trial.suggest_categorical(name = "batch_size", choices = [128, 256])
    
    config_standard = dict(
      epochs=5,
      classes=10,
      kernels=[16, 32],
      dataset="MNIST",
      architecture="CNN",
    )
    wandb.config.update(config_standard)
  
    wandb.use_artifact(f"{entity}/{project_name}/mnist-training-data:latest")
    wandb.use_artifact(f"{entity}/{project_name}/mnist-validation-data:latest")

    train_loader = make_loader(train, batch_size=config.batch_size)
    validation_loader = make_loader(validation, batch_size=config.batch_size)

    model = ConvNet(config.kernels, config.classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)

    total_batches = len(train_loader) * config.epochs
    example_ct = 0  # number of examples seen
    batch_ct = 0
    for epoch in tqdm(range(config.epochs)):
      for _, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)
        # Forward pass ➡
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward pass ⬅
        optimizer.zero_grad()
        loss.backward()
        # Step with optimizer
        optimizer.step()
        example_ct +=  len(images)
        batch_ct += 1
        # Report metrics every 25th batch
        if ((batch_ct + 1) % 25) == 0:
          logger.info(f"Epoch: {epoch}, Loss: {loss.detach().numpy()}")
          wandb.log({ "Train Metrics/loss": loss, "epoch": epoch})
      with torch.no_grad():
        correct, total = 0, 0
        for images, labels in validation_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            loss = criterion(outputs, labels)
            wandb.log({"Validation Metrics/loss": loss, "epoch": epoch})
        logger.info(f"Epoch {epoch}, Accuracy of the model on the {total} validation images: {100 * correct / total}%")
        wandb.log({"Validation Metrics/accuracy": correct / total})

    torch.save(model.state_dict(), f"{wandb.run.id}-model.pt")
    model_artifact = wandb.Artifact(name = f"{wandb.run.id}-mnist-model", type = "model")
    model_artifact.add_file(f"{wandb.run.id}-model.pt")
    wandb.log_artifact(model_artifact)
    return loss


  """Entry point for launching an IPython kernel.


In [58]:
study = optuna.create_study()
study.optimize(train_func, n_trials=3, callbacks=[wandbc])

[32m[I 2022-10-24 17:48:44,401][0m A new study created in memory with name: no-name-860cd23f-b1a6-4519-b06b-679b77cde112[0m


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.6680903434753418
INFO:CNN-Logger:Epoch: 0, Loss: 0.17628821730613708
INFO:CNN-Logger:Epoch: 0, Loss: 0.08908680826425552
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 95.25%
INFO:CNN-Logger:Epoch: 1, Loss: 0.11890428513288498
INFO:CNN-Logger:Epoch: 1, Loss: 0.20492930710315704
INFO:CNN-Logger:Epoch: 1, Loss: 0.06894282251596451
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.7%
INFO:CNN-Logger:Epoch: 2, Loss: 0.023337792605161667
INFO:CNN-Logger:Epoch: 2, Loss: 0.04297465458512306
INFO:CNN-Logger:Epoch: 2, Loss: 0.10000427812337875
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.95%
INFO:CNN-Logger:Epoch: 3, Loss: 0.03571633994579315
INFO:CNN-Logger:Epoch: 3, Loss: 0.12120484560728073
INFO:CNN-Logger:Epoch: 3, Loss: 0.03460875153541565
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 96.85%
INFO:CNN-Logger:Epoch: 4, Loss: 0.013

0,1
Train Metrics/loss,█▃▂▂▃▂▁▁▂▁▂▁▁▂▁
Validation Metrics/accuracy,▁▅▆▆█
Validation Metrics/loss,▄▃▅▅▄▆▄▆▁▂▄▅▆▂▂▄▃▂▁▄▃▅▃██▂▂▅▁▄▂▂▇▄▁▄▃▃▆▂
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.02053
Validation Metrics/accuracy,0.9755
Validation Metrics/loss,0.06962
batch_size,128.0
epoch,4.0
learning_rate,0.04509
trial_number,0.0
value,0.06962


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.38173356652259827
INFO:CNN-Logger:Epoch: 0, Loss: 0.2183065265417099
INFO:CNN-Logger:Epoch: 0, Loss: 0.16459240019321442
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 95.4%
INFO:CNN-Logger:Epoch: 1, Loss: 0.1371435821056366
INFO:CNN-Logger:Epoch: 1, Loss: 0.11178731173276901
INFO:CNN-Logger:Epoch: 1, Loss: 0.06877362728118896
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.1%
INFO:CNN-Logger:Epoch: 2, Loss: 0.08115685731172562
INFO:CNN-Logger:Epoch: 2, Loss: 0.023021064698696136
INFO:CNN-Logger:Epoch: 2, Loss: 0.06586712598800659
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.65%
INFO:CNN-Logger:Epoch: 3, Loss: 0.06091821938753128
INFO:CNN-Logger:Epoch: 3, Loss: 0.04546160250902176
INFO:CNN-Logger:Epoch: 3, Loss: 0.06639283150434494
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 95.8%
INFO:CNN-Logger:Epoch: 4, Loss: 0.017961

0,1
Train Metrics/loss,█▅▄▃▃▂▂▁▂▂▂▂▁▂▂
Validation Metrics/accuracy,▁▄▇▃█
Validation Metrics/loss,▆▄▄▃▄▂▃▆▄▄▄▃▄▅▄▃▂▃▁▂▃▃█▆▄▃▆▁▃▆█▄▂▆▃▄▅▅▃▇
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.05044
Validation Metrics/accuracy,0.9685
Validation Metrics/loss,0.22864
batch_size,128.0
epoch,4.0
learning_rate,0.06795
trial_number,1.0
value,0.22864


  0%|          | 0/5 [00:00<?, ?it/s]

INFO:CNN-Logger:Epoch: 0, Loss: 0.3770720362663269
INFO:CNN-Logger:Epoch: 0, Loss: 0.13634097576141357
INFO:CNN-Logger:Epoch: 0, Loss: 0.04603542760014534
INFO:CNN-Logger:Epoch 0, Accuracy of the model on the 2000 validation images: 96.35%
INFO:CNN-Logger:Epoch: 1, Loss: 0.1509110927581787
INFO:CNN-Logger:Epoch: 1, Loss: 0.12618388235569
INFO:CNN-Logger:Epoch: 1, Loss: 0.16642513871192932
INFO:CNN-Logger:Epoch 1, Accuracy of the model on the 2000 validation images: 96.5%
INFO:CNN-Logger:Epoch: 2, Loss: 0.03947776183485985
INFO:CNN-Logger:Epoch: 2, Loss: 0.014395236037671566
INFO:CNN-Logger:Epoch: 2, Loss: 0.10786885768175125
INFO:CNN-Logger:Epoch 2, Accuracy of the model on the 2000 validation images: 96.15%
INFO:CNN-Logger:Epoch: 3, Loss: 0.0834028422832489
INFO:CNN-Logger:Epoch: 3, Loss: 0.037476349622011185
INFO:CNN-Logger:Epoch: 3, Loss: 0.017707500606775284
INFO:CNN-Logger:Epoch 3, Accuracy of the model on the 2000 validation images: 97.2%
INFO:CNN-Logger:Epoch: 4, Loss: 0.0114992

0,1
Train Metrics/loss,█▃▂▄▃▄▂▁▃▂▁▁▁▂▃
Validation Metrics/accuracy,▂▃▁▆█
Validation Metrics/loss,█▃▃▂▇▃▄▇▄▂▂▁▃▃▄▂▅▂▂▄▅▇▅▅▁▂▅▃▄▃▅▄▂▇▃▁▂▂▂▃
batch_size,▁
epoch,▁▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆████████
learning_rate,▁
trial_number,▁
value,▁

0,1
Train Metrics/loss,0.11366
Validation Metrics/accuracy,0.977
Validation Metrics/loss,0.0919
batch_size,256.0
epoch,4.0
learning_rate,0.0982
trial_number,2.0
value,0.0919


## Use API to interact with W&B

In [60]:
import pandas as pd
import wandb
api = wandb.Api()
runs = api.runs(f"{entity}/{wandb_kwargs['project']}")
temp_data = []
for r in runs:
  drop_keys = set(r.summary.keys()).intersection( r.config.keys())
  for d in drop_keys:
    del r.config[d]
  temp_dict = dict(**dict(r.summary), **r.config)
  temp_dict["run_id"] = r.id
  temp_dict["run_name"] = r.name
  temp_data.append(temp_dict)
df = pd.DataFrame(temp_data)
df.set_index("run_id", inplace = True)
df.sort_values("value")

Unnamed: 0_level_0,_step,_wandb,_timestamp,Train Metrics/loss,Validation Metrics/loss,Validation Metrics/accuracy,epoch,value,_runtime,batch_size,trial_number,learning_rate,epochs,classes,dataset,kernels,direction,architecture,run_name
run_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
vu0oscom,100,{'runtime': 39},1666634000.0,0.020526,0.069625,0.9755,4,0.069625,102.153047,128,0,0.045086,5,10,MNIST,"[16, 32]",[MINIMIZE],CNN,trial/0/fiery-river-1
4la8vri0,100,{'runtime': 38},1666634000.0,0.113664,0.091902,0.977,4,0.091902,39.258188,256,2,0.098197,5,10,MNIST,"[16, 32]",[MINIMIZE],CNN,trial/2/fluent-hill-3
subktc6y,100,{'runtime': 37},1666634000.0,0.050436,0.228639,0.9685,4,0.228639,38.056626,128,1,0.067952,5,10,MNIST,"[16, 32]",[MINIMIZE],CNN,trial/1/golden-lion-2


In [61]:
study.best_trial

FrozenTrial(number=0, values=[0.0696248859167099], datetime_start=datetime.datetime(2022, 10, 24, 17, 48, 44, 410084), datetime_complete=datetime.datetime(2022, 10, 24, 17, 49, 21, 650878), params={'learning_rate': 0.045086145844261565, 'batch_size': 128}, distributions={'learning_rate': FloatDistribution(high=0.1, log=False, low=0.001, step=None), 'batch_size': CategoricalDistribution(choices=(128, 256))}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=0, state=TrialState.COMPLETE, value=None)