<a href="https://colab.research.google.com/github/wandb/examples/blob/master/colabs/tensorflow/Hyperparameter_Optimization_in_TensorFlow_using_W&B_Sweeps.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><!--- @wandbcode{tf-sweeps} -->

<img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" />


# 🧹 Weights & Biases Sweep + ‍🌊 TensorFlow 2.x
Use Weights & Biases for machine learning experiment tracking, dataset versioning, and project collaboration.



<img src="http://wandb.me/mini-diagram" width="650" alt="Weights & Biases" />

Use Weights & Biases Sweeps to automate hyperparameter optimization and explore the space of possible models, complete with interactive dashboards like this:

![](https://i.imgur.com/AN0qnpC.png)


## 🤔 Why Should I Use Sweeps?

* **Quick setup**: With just a few lines of code you can run W&B sweeps.
* **Transparent**: We cite all the algorithms we're using, and [our code is open source](https://github.com/wandb/client/tree/master/wandb/sweeps).
* **Powerful**: Our sweeps are completely customizable and configurable. You can launch a sweep across dozens of machines, and it's just as easy as starting a sweep on your laptop.

**[Check out the official documentation $\rightarrow$](https://docs.wandb.com/sweeps)**


## What this notebook covers



* Simple steps to get started with W&B Sweeps using a custom training loop in TensorFlow.
* We will find the best hyperparameters for our classification task.

**Note**: Sections starting with _Step_ are all you need to run a hyperparameter sweep in existing code.
The rest of the code is there to set up a simple example.





# 🚀 Install, Import, and Log in

### Step 0️⃣: Install W&B

In [1]:
%%capture
!pip install wandb

In [2]:
!pip install gdown

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
import gdown
# When creating the share link, select "Anyone with the link", and put the file ID after uc?id= as shown below.
url = 'https://drive.google.com/uc?id=1Wfw19aYs6Gle-jlAi41Gk3pWjlzBN7-W'
output = 'all-data.mat'
gdown.download(url, output, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1Wfw19aYs6Gle-jlAi41Gk3pWjlzBN7-W
To: /content/all-data.mat
100%|██████████| 116M/116M [00:02<00:00, 44.8MB/s]


'all-data.mat'

In [4]:
from pandas import read_csv
from numpy import set_printoptions
from sklearn import datasets, linear_model
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
import random
import tensorflow as tf
import numpy as np
import pathlib

# GPU and logging configuration
import os
import scipy.io as scpy
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # Hide TensorFlow INFO, WARNING, and ERROR log messages
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = '0'  # '0' selects the first GPU; set to '-1' to run on the CPU

gpus = tf.config.experimental.list_physical_devices('GPU')
cpus = tf.config.experimental.list_physical_devices('CPU')

if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
elif cpus:
    try:
        # No GPU found: report the available CPU devices instead
        logical_cpus = tf.config.experimental.list_logical_devices('CPU')
        print(len(cpus), "Physical CPU,", len(logical_cpus), "Logical CPU")
    except RuntimeError as e:
        # Print the error instead of stopping the notebook
        print(e)


1 Physical GPUs, 1 Logical GPUs


### Step 1️⃣: Import W&B and Login

In [21]:
import tqdm
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import cifar10

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline


In [22]:
import wandb
from wandb.keras import WandbCallback

wandb.login()


True


> Side note: If this is your first time using W&B or you are not logged in, the link that appears after running `wandb.login()` takes you to the sign-up/login page. Signing up takes only a few clicks.

# 👩‍🍳 Prepare Dataset

In [7]:
from pandas import read_csv
from numpy import set_printoptions
from sklearn import datasets, linear_model
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
import random
import os
import scipy.io as scpy  # Reads MATLAB .mat files

data = scpy.loadmat("all-data.mat")
# Extract the training features and labels from the .mat dictionary.
x_data = data["XTrain"]
y_data = data["y_train"]
# Convert to float32 arrays. Subtracting 1 makes the labels zero-based, as
# SparseCategoricalCrossentropy expects (assuming the file stores them starting at 1).
x_data = np.array(x_data, dtype='float32')
y_data = np.array(y_data, dtype='float32') - 1
# The file's own test split is loaded too; it is only used by the
# commented-out concatenation below.
x_temp_data = data['XTest']
y_temp_data = data['y_test']
x_temp_data = np.array(x_temp_data, dtype='float32')
y_temp_data = np.array(y_temp_data, dtype='float32') - 1
# x_data=np.concatenate((x_data,x_temp_data),axis=0)
# y_data=np.concatenate((y_data,y_temp_data),axis=0)

# Verifying the shapes.
print(x_data.shape)
print(y_data.shape)

SEED = 99
os.environ['PYTHONHASHSEED']=str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
# Split 70/15/15 into train, validation, and test sets
x_train, x_val_to_use, y_train, y_val_to_use = train_test_split(x_data, y_data, test_size=0.3, random_state=SEED)
x_val, x_test, y_val, y_test = train_test_split(x_val_to_use, y_val_to_use, test_size=0.5, random_state=SEED)

print(f" {len(x_train), len(x_val), len(x_test)}")
print(f" {len(y_train), len(y_val), len(y_test)}")
# train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# test_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
# BATCH_SIZE = 64
# # SHUFFLE_BUFFER_SIZE = 100
# train_dataset = train_dataset.batch(BATCH_SIZE)
# test_dataset = test_dataset.batch(BATCH_SIZE)


(15000, 1000)
(15000, 1)
 (10500, 2250, 2250)
 (10500, 2250, 2250)
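The sizes printed above follow from the two chained `train_test_split` calls: 30% of the 15,000 samples are held out, and that held-out part is then halved into validation and test sets. The sketch below reproduces the arithmetic in plain Python. The helper name `split_sizes` is hypothetical, and the rounding rule (held-out size rounded up, remainder used for training) reflects scikit-learn's behaviour as I understand it.

```python
import math


def split_sizes(n_samples, holdout_fraction=0.3, test_fraction_of_holdout=0.5):
    """Return (train, val, test) sizes for two chained splits (illustrative helper)."""
    # First split: hold out a fraction of the data, rounding the held-out size up.
    n_holdout = math.ceil(holdout_fraction * n_samples)
    n_train = n_samples - n_holdout
    # Second split: divide the held-out part into validation and test sets.
    n_test = math.ceil(test_fraction_of_holdout * n_holdout)
    n_val = n_holdout - n_test
    return n_train, n_val, n_test


print(split_sizes(15000))  # (10500, 2250, 2250)
```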


# 🧠 Define the Model and Training Loop

## 🏗️ Build a Simple Classifier MLP

In [13]:
def Model():
    inputs = keras.Input(shape=(1000,), name="inputs")
    x1 = keras.layers.Dense(64, activation="relu")(inputs)
    x2 = keras.layers.Dense(32, activation="relu")(x1)
    outputs = keras.layers.Dense(3, activation="softmax", name="predictions")(x2)

    return keras.Model(inputs=inputs, outputs=outputs)

    
def train_step(x, y, model, optimizer, loss_fn, train_acc_metric):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)

    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))

    train_acc_metric.update_state(y, logits)

    return loss_value

    
def test_step(x, y, model, loss_fn, val_acc_metric):
    val_logits = model(x, training=False)  # Run in inference mode during validation
    # print("Val Logits shape model output = " + str(val_logits.shape))
    # print("Val Output shape model data   = "+ str(y.shape))
    loss_value = loss_fn(y, val_logits)
    val_acc_metric.update_state(y, val_logits)

    return loss_value
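To get a feel for the size of the MLP built by `Model()` above, the number of trainable parameters can be worked out by hand: each `Dense` layer has one weight per input-output pair plus one bias per output. This is a minimal sketch in plain Python; `dense_params` is a hypothetical helper, and the layer sizes are copied from the model definition.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# Input width followed by the width of each Dense layer in Model().
layer_sizes = [1000, 64, 32, 3]

# Pair each layer's input width with its output width and sum the counts.
total_params = sum(
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)

print(total_params)  # 66243
```

Calling `Model().summary()` in the notebook should report the same total.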

## 🔁 Write a Training Loop

### Step 3️⃣: Log metrics with `wandb.log`

In [14]:
def train(train_dataset,
          val_dataset, 
          model,
          optimizer,
          loss_fn,
          train_acc_metric,
          val_acc_metric,
          epochs=10, 
          log_step=200, 
          val_log_step=50):
  
    for epoch in range(epochs):
        print("\nStart of epoch %d" % (epoch,))

        train_loss = []   
        val_loss = []

        # Iterate over the batches of the dataset
        for step, (x_batch_train, y_batch_train) in tqdm.tqdm(enumerate(train_dataset), total=len(train_dataset)):
            loss_value = train_step(x_batch_train, y_batch_train, 
                                    model, optimizer, 
                                    loss_fn, train_acc_metric)
            train_loss.append(float(loss_value))

        # Run a validation loop at the end of each epoch
        for step, (x_batch_val, y_batch_val) in tqdm.tqdm(enumerate(val_dataset), total=len(val_dataset)):
            val_loss_value = test_step(x_batch_val, y_batch_val, 
                                       model, loss_fn, 
                                       val_acc_metric)
            val_loss.append(float(val_loss_value))
            
        # Display metrics at the end of each epoch
        train_acc = train_acc_metric.result()
        print("Training acc over epoch: %.4f" % (float(train_acc),))

        val_acc = val_acc_metric.result()
        print("Validation acc: %.4f" % (float(val_acc),))

        # Reset metrics at the end of each epoch
        train_acc_metric.reset_states()
        val_acc_metric.reset_states()

        # 3️⃣ log metrics using wandb.log
        wandb.log({'epochs': epoch,
                   'loss': np.mean(train_loss),
                   'acc': float(train_acc), 
                   'val_loss': np.mean(val_loss),
                   'val_acc':float(val_acc)})
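The dictionary passed to `wandb.log` at the end of each epoch is just a set of named scalars: the mean of the per-batch losses and the accumulated accuracies. The sketch below builds the same payload without TensorFlow or W&B, which can help when checking the aggregation logic on its own. The helper name `epoch_metrics` is hypothetical.

```python
from statistics import mean


def epoch_metrics(epoch, train_losses, val_losses, train_acc, val_acc):
    """Aggregate per-batch losses into the per-epoch payload (illustrative helper)."""
    return {
        "epochs": epoch,
        "loss": mean(train_losses),    # average training loss over all batches
        "acc": float(train_acc),
        "val_loss": mean(val_losses),  # average validation loss over all batches
        "val_acc": float(val_acc),
    }


payload = epoch_metrics(0, [0.5, 0.25], [0.25, 0.75], 0.9, 0.95)
print(payload["loss"], payload["val_loss"])  # 0.375 0.5
```

Because the sweep's metric is `val_loss`, that key has to appear in the logged dictionary under exactly that name.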

# Step 4️⃣: Configure the Sweep

This is where you will:
* Define the hyperparameters you're sweeping over
* Provide your hyperparameter optimization method. We have `random`, `grid` and `bayes` methods.
* Provide an objective and a `metric` if using `bayes`, for example to `minimize` the `val_loss`.
* Use `hyperband` for early termination of poorly-performing runs

#### [Check out more on Sweep Configs $\rightarrow$](https://docs.wandb.com/sweeps/configuration)

In [20]:
sweep_config = {
  'method': 'bayes', 
  'metric': {
      'name': 'val_loss',
      'goal': 'minimize'
  },
  'early_terminate':{
      'type': 'hyperband',
      'min_iter': 10
  },
  'parameters': {
      'batch_size': {
          'values': [8, 16, 32, 64, 128, 256, 512] 
      },
      'learning_rate':{
          'values': [0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, 0.00001]
      }
  }
}
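The search space above has 7 batch sizes and 7 learning rates, so there are 49 possible combinations. As a rough illustration of what the `grid` and `random` methods choose from, the sketch below enumerates the full grid and draws one random configuration in plain Python. It only shows the idea; it is not how W&B implements its search, and `grid_configs` and `random_config` are hypothetical helpers.

```python
import itertools
import random

# Values copied from the 'parameters' section of sweep_config.
parameters = {
    "batch_size": [8, 16, 32, 64, 128, 256, 512],
    "learning_rate": [0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, 0.00001],
}


def grid_configs(params):
    """Yield every combination of the listed values, as a grid search would visit them."""
    names = list(params)
    for combo in itertools.product(*(params[name] for name in names)):
        yield dict(zip(names, combo))


def random_config(params, rng):
    """Pick one value per hyperparameter uniformly at random."""
    return {name: rng.choice(values) for name, values in params.items()}


all_configs = list(grid_configs(parameters))
print(len(all_configs))  # 49

sample = random_config(parameters, random.Random(0))
print(sample in all_configs)  # True
```

With only 49 combinations an exhaustive grid is feasible here; `bayes` becomes more useful as the space grows.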


# Step 5️⃣: Wrap the Training Loop

You'll need a function, like `sweep_train` below,
that uses `wandb.config` to set the hyperparameters
before `train` gets called.

In [17]:
def sweep_train(config_defaults=None):
    # Set default values
    config_defaults = {
        "batch_size": 8,
        "learning_rate": 0.01
    }
    # Initialize a W&B run with the default config
    wandb.init(config=config_defaults)  # the Sweep overrides these defaults

    # Specify the other hyperparameters to the configuration, if any
    wandb.config.epochs = 10
    wandb.config.log_step = 20
    wandb.config.val_log_step = 50
    wandb.config.architecture_name = "Custom"
    wandb.config.dataset_name = "Custom-Acustic"

    # build input pipeline using tf.data
    train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    train_dataset = (train_dataset.shuffle(buffer_size=256)
                                  .batch(wandb.config.batch_size)
                                  .prefetch(buffer_size=tf.data.AUTOTUNE))

    val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
    val_dataset = (val_dataset.shuffle(buffer_size=256)
                                  .batch(wandb.config.batch_size)
                                  .prefetch(buffer_size=tf.data.AUTOTUNE))

    # initialize model
    model = Model()

    # Instantiate an optimizer to train the model.
    optimizer = keras.optimizers.Adam(learning_rate=wandb.config.learning_rate)
    # Instantiate a loss function.
    loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=False)

    # Prepare the metrics.
    train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
    val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

    train(train_dataset,
          val_dataset, 
          model,
          optimizer,
          loss_fn,
          train_acc_metric,
          val_acc_metric,
          epochs=wandb.config.epochs, 
          log_step=wandb.config.log_step, 
          val_log_step=wandb.config.val_log_step)
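The comment on `wandb.init` in `sweep_train` notes that the defaults are overwritten during a sweep. The effect can be pictured as a dictionary merge in which the values chosen by the sweep take precedence over the defaults. The sketch below illustrates only that precedence rule; it is not W&B's implementation, and `resolve_config` is a hypothetical helper.

```python
def resolve_config(defaults, sweep_values=None):
    """Merge defaults with sweep-provided values; the sweep wins on conflicts."""
    merged = dict(defaults)            # start from the defaults
    merged.update(sweep_values or {})  # the sweep's choices replace them
    return merged


defaults = {"batch_size": 8, "learning_rate": 0.01}

# Outside a sweep nothing is overridden, so the defaults apply.
print(resolve_config(defaults))

# Inside a sweep, the agent supplies values for the swept hyperparameters.
print(resolve_config(defaults, {"batch_size": 32, "learning_rate": 0.001}))
```

This is why `sweep_train` can also be called directly for a single run: without a sweep, the defaults are used as they are.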

# Step 6️⃣: Initialize Sweep and Run Agent 

In [18]:
sweep_id = wandb.sweep(sweep_config, project="sweeps-tensorflow")

Create sweep with ID: q16s9ooz
Sweep URL: https://wandb.ai/veysiadn/sweeps-tensorflow/sweeps/q16s9ooz


You can limit the total number of runs with the `count` parameter. The cell below passes `count=100`; use a smaller value for a quick run, or a larger one to explore more of the search space.

In [19]:
wandb.agent(sweep_id, function=sweep_train, count=100)

[34m[1mwandb[0m: Agent Starting Run: gm0ztm0x with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 96.52it/s]
100%|██████████| 71/71 [00:00<00:00, 170.77it/s]


Training acc over epoch: 0.9599
Validation acc: 0.9947

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 96.24it/s]
100%|██████████| 71/71 [00:00<00:00, 184.63it/s]

Training acc over epoch: 0.9939
Validation acc: 0.9973






0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.9939
epochs,1.0
loss,0.02139
val_acc,0.99733
val_loss,0.01227


[34m[1mwandb[0m: Agent Starting Run: gdy4polg with config:
[34m[1mwandb[0m: 	batch_size: 128
[34m[1mwandb[0m: 	learning_rate: 0.01



Start of epoch 0


100%|██████████| 83/83 [00:00<00:00, 89.90it/s]
100%|██████████| 18/18 [00:00<00:00, 152.48it/s]


Training acc over epoch: 0.9510
Validation acc: 0.9911

Start of epoch 1


100%|██████████| 83/83 [00:00<00:00, 90.56it/s]
100%|██████████| 18/18 [00:00<00:00, 160.80it/s]


Training acc over epoch: 0.9923
Validation acc: 0.9800



0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,█▁
val_loss,▁█

0,1
acc,0.99229
epochs,1.0
loss,0.02947
val_acc,0.98
val_loss,0.06798


[34m[1mwandb[0m: Agent Starting Run: b5vl5sfd with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 92.80it/s]
100%|██████████| 71/71 [00:00<00:00, 171.54it/s]


Training acc over epoch: 0.9578
Validation acc: 0.9916

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 94.20it/s]
100%|██████████| 71/71 [00:00<00:00, 167.75it/s]


Training acc over epoch: 0.9967
Validation acc: 0.9964



0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99667
epochs,1.0
loss,0.01348
val_acc,0.99644
val_loss,0.01735


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 79r3cj2b with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	learning_rate: 0.001



Start of epoch 0


100%|██████████| 165/165 [00:01<00:00, 90.60it/s]
100%|██████████| 36/36 [00:00<00:00, 174.25it/s]


Training acc over epoch: 0.9435
Validation acc: 0.9920

Start of epoch 1


100%|██████████| 165/165 [00:01<00:00, 91.23it/s]
100%|██████████| 36/36 [00:00<00:00, 165.51it/s]

Training acc over epoch: 0.9966
Validation acc: 0.9951






0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99657
epochs,1.0
loss,0.01747
val_acc,0.99511
val_loss,0.01847


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: jam6aj0b with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	learning_rate: 0.0005



Start of epoch 0


100%|██████████| 165/165 [00:01<00:00, 90.21it/s]
100%|██████████| 36/36 [00:00<00:00, 168.03it/s]


Training acc over epoch: 0.9196
Validation acc: 0.9893

Start of epoch 1


100%|██████████| 165/165 [00:01<00:00, 90.88it/s]
100%|██████████| 36/36 [00:00<00:00, 171.88it/s]


Training acc over epoch: 0.9951
Validation acc: 0.9938



0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99514
epochs,1.0
loss,0.03405
val_acc,0.99378
val_loss,0.02413


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 2d4uwg81 with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.0005



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 92.64it/s]
100%|██████████| 71/71 [00:00<00:00, 169.90it/s]


Training acc over epoch: 0.9495
Validation acc: 0.9938

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 92.58it/s]
100%|██████████| 71/71 [00:00<00:00, 167.90it/s]


Training acc over epoch: 0.9971
Validation acc: 0.9938


0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁▁
val_loss,█▁

0,1
acc,0.99714
epochs,1.0
loss,0.01783
val_acc,0.99378
val_loss,0.01952


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: eogr8j6z with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 90.97it/s]
100%|██████████| 71/71 [00:00<00:00, 168.14it/s]


Training acc over epoch: 0.9584
Validation acc: 0.9929

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 92.10it/s]
100%|██████████| 71/71 [00:00<00:00, 167.67it/s]


Training acc over epoch: 0.9966
Validation acc: 0.9951


0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99657
epochs,1.0
loss,0.01484
val_acc,0.99511
val_loss,0.01964


[34m[1mwandb[0m: Agent Starting Run: zxj71qqr with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 90.02it/s]
100%|██████████| 71/71 [00:00<00:00, 157.22it/s]


Training acc over epoch: 0.9612
Validation acc: 0.9938

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 90.07it/s]
100%|██████████| 71/71 [00:00<00:00, 163.84it/s]

Training acc over epoch: 0.9945
Validation acc: 0.9933





0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,█▁
val_loss,█▁

0,1
acc,0.99448
epochs,1.0
loss,0.01954
val_acc,0.99333
val_loss,0.02073


[34m[1mwandb[0m: Agent Starting Run: c5k2iu6p with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.0005



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 92.01it/s]
100%|██████████| 71/71 [00:00<00:00, 164.23it/s]


Training acc over epoch: 0.9306
Validation acc: 0.9924

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 92.55it/s]
100%|██████████| 71/71 [00:00<00:00, 169.68it/s]

Training acc over epoch: 0.9965
Validation acc: 0.9956





0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99648
epochs,1.0
loss,0.0194
val_acc,0.99556
val_loss,0.01715


[34m[1mwandb[0m: Agent Starting Run: po3jyvoy with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.0001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 92.62it/s]
100%|██████████| 71/71 [00:00<00:00, 166.31it/s]


Training acc over epoch: 0.8280
Validation acc: 0.9578

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 92.21it/s]
100%|██████████| 71/71 [00:00<00:00, 171.61it/s]

Training acc over epoch: 0.9773
Validation acc: 0.9849





0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.97733
epochs,1.0
loss,0.15174
val_acc,0.98489
val_loss,0.10797


[34m[1mwandb[0m: Agent Starting Run: en501hbe with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	learning_rate: 0.0001



Start of epoch 0


100%|██████████| 165/165 [00:01<00:00, 89.76it/s]
100%|██████████| 36/36 [00:00<00:00, 166.13it/s]


Training acc over epoch: 0.7459
Validation acc: 0.9262

Start of epoch 1


100%|██████████| 165/165 [00:01<00:00, 91.44it/s]
100%|██████████| 36/36 [00:00<00:00, 160.57it/s]

Training acc over epoch: 0.9616
Validation acc: 0.9738






0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.96162
epochs,1.0
loss,0.24186
val_acc,0.97378
val_loss,0.17289


[34m[1mwandb[0m: Agent Starting Run: 8ghj8rmv with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	learning_rate: 0.0005



Start of epoch 0


100%|██████████| 165/165 [00:01<00:00, 89.30it/s]
100%|██████████| 36/36 [00:00<00:00, 159.98it/s]


Training acc over epoch: 0.9171
Validation acc: 0.9889

Start of epoch 1


100%|██████████| 165/165 [00:01<00:00, 87.66it/s]
100%|██████████| 36/36 [00:00<00:00, 164.64it/s]

Training acc over epoch: 0.9955
Validation acc: 0.9960






0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99552
epochs,1.0
loss,0.03216
val_acc,0.996
val_loss,0.02225


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: gnw8apes with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.0001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 92.96it/s]
100%|██████████| 71/71 [00:00<00:00, 175.49it/s]


Training acc over epoch: 0.8042
Validation acc: 0.9511

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 94.26it/s]
100%|██████████| 71/71 [00:00<00:00, 173.51it/s]

Training acc over epoch: 0.9750
Validation acc: 0.9791






0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.97495
epochs,1.0
loss,0.14941
val_acc,0.97911
val_loss,0.10429


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: mwpy5q54 with config:
[34m[1mwandb[0m: 	batch_size: 256
[34m[1mwandb[0m: 	learning_rate: 0.0005



Start of epoch 0


100%|██████████| 42/42 [00:00<00:00, 84.08it/s]
100%|██████████| 9/9 [00:00<00:00, 152.59it/s]


Training acc over epoch: 0.7476
Validation acc: 0.9604

Start of epoch 1


100%|██████████| 42/42 [00:00<00:00, 87.80it/s]
100%|██████████| 9/9 [00:00<00:00, 135.89it/s]

Training acc over epoch: 0.9802
Validation acc: 0.9858





0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.98019
epochs,1.0
loss,0.15794
val_acc,0.98578
val_loss,0.10037


[34m[1mwandb[0m: Agent Starting Run: 4ztzkxuw with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.0001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 91.96it/s]
100%|██████████| 71/71 [00:00<00:00, 158.81it/s]


Training acc over epoch: 0.8003
Validation acc: 0.9573

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 93.42it/s]
100%|██████████| 71/71 [00:00<00:00, 173.93it/s]


Training acc over epoch: 0.9778
Validation acc: 0.9849


0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.97781
epochs,1.0
loss,0.159
val_acc,0.98489
val_loss,0.10589


[34m[1mwandb[0m: Agent Starting Run: f9ygitly with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	learning_rate: 0.0001



Start of epoch 0


100%|██████████| 329/329 [00:03<00:00, 92.44it/s]
100%|██████████| 71/71 [00:00<00:00, 168.76it/s]


Training acc over epoch: 0.8064
Validation acc: 0.9636

Start of epoch 1


100%|██████████| 329/329 [00:03<00:00, 92.30it/s]
100%|██████████| 71/71 [00:00<00:00, 162.07it/s]


Training acc over epoch: 0.9788
Validation acc: 0.9893


0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.97876
epochs,1.0
loss,0.13973
val_acc,0.98933
val_loss,0.09178


[34m[1mwandb[0m: Agent Starting Run: zjcajxm6 with config:
[34m[1mwandb[0m: 	batch_size: 256
[34m[1mwandb[0m: 	learning_rate: 0.005



Start of epoch 0


100%|██████████| 42/42 [00:00<00:00, 85.24it/s]
100%|██████████| 9/9 [00:00<00:00, 151.27it/s]


Training acc over epoch: 0.9420
Validation acc: 0.9880

Start of epoch 1


100%|██████████| 42/42 [00:00<00:00, 88.40it/s]
100%|██████████| 9/9 [00:00<00:00, 149.34it/s]

Training acc over epoch: 0.9942
Validation acc: 0.9898





0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99419
epochs,1.0
loss,0.0171
val_acc,0.98978
val_loss,0.02628


[34m[1mwandb[0m: Agent Starting Run: 7b3d1hps with config:
[34m[1mwandb[0m: 	batch_size: 256
[34m[1mwandb[0m: 	learning_rate: 0.01



Start of epoch 0


100%|██████████| 42/42 [00:00<00:00, 78.56it/s]
100%|██████████| 9/9 [00:00<00:00, 151.60it/s]


Training acc over epoch: 0.9313
Validation acc: 0.9871

Start of epoch 1


100%|██████████| 42/42 [00:00<00:00, 85.56it/s]
100%|██████████| 9/9 [00:00<00:00, 147.38it/s]

Training acc over epoch: 0.9927
Validation acc: 0.9929






0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.99267
epochs,1.0
loss,0.02358
val_acc,0.99289
val_loss,0.02507


[34m[1mwandb[0m: Agent Starting Run: 23wwn5nw with config:
[34m[1mwandb[0m: 	batch_size: 256
[34m[1mwandb[0m: 	learning_rate: 0.01



Start of epoch 0


100%|██████████| 42/42 [00:00<00:00, 81.18it/s]
100%|██████████| 9/9 [00:00<00:00, 149.78it/s]


Training acc over epoch: 0.9356
Validation acc: 0.9933

Start of epoch 1


100%|██████████| 42/42 [00:00<00:00, 84.97it/s]
100%|██████████| 9/9 [00:00<00:00, 148.05it/s]


Training acc over epoch: 0.9935
Validation acc: 0.9907


0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,█▁
val_loss,▁█

0,1
acc,0.99352
epochs,1.0
loss,0.01971
val_acc,0.99067
val_loss,0.0409


[34m[1mwandb[0m: Agent Starting Run: x0o6gj1z with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	learning_rate: 0.005



Start of epoch 0


100%|██████████| 165/165 [00:01<00:00, 90.58it/s]
100%|██████████| 36/36 [00:00<00:00, 176.88it/s]


Training acc over epoch: 0.9598
Validation acc: 0.9631

Start of epoch 1


100%|██████████| 165/165 [00:01<00:00, 88.81it/s]
100%|██████████| 36/36 [00:00<00:00, 162.19it/s]


Training acc over epoch: 0.9884
Validation acc: 0.9876


0,1
acc,▁█
epochs,▁█
loss,█▁
val_acc,▁█
val_loss,█▁

0,1
acc,0.98838
epochs,1.0
loss,0.04285
val_acc,0.98756
val_loss,0.04925
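The progress-bar totals in the logs above (for example `329/329` and `71/71` at batch size 32) are the number of batches per epoch. Since `.batch()` is called without dropping the final partial batch, that number is the dataset size divided by the batch size, rounded up. A minimal sketch of the arithmetic, using a hypothetical helper `steps_per_epoch` and the split sizes printed earlier:

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches when the final partial batch is kept."""
    return math.ceil(n_samples / batch_size)


N_TRAIN, N_VAL = 10500, 2250  # split sizes printed in the dataset section

for batch_size in (32, 64, 128, 256):
    print(batch_size,
          steps_per_epoch(N_TRAIN, batch_size),
          steps_per_epoch(N_VAL, batch_size))
# 32 329 71
# 64 165 36
# 128 83 18
# 256 42 9
```

Smaller batch sizes therefore mean more optimizer steps per epoch, which is one reason the runs with batch size 32 take longer per epoch than those with 256.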



# 👀 Visualize Results

Click on the **Sweep URL** link above to see your live results.


# 🎨 Example Gallery

See examples of projects tracked and visualized with W&B in our [Gallery →](https://app.wandb.ai/gallery)

# 📏 Best Practices
1. **Projects**: Log multiple runs to a project to compare them. `wandb.init(project="project-name")`
2. **Groups**: For multiple processes or cross-validation folds, log each process as a run and group them together. `wandb.init(group='experiment-1')`
3. **Tags**: Add tags to track your current baseline or production model.
4. **Notes**: Type notes in the table to track the changes between runs.
5. **Reports**: Take quick notes on progress to share with colleagues and make dashboards and snapshots of your ML projects.

# 🤓 Advanced Setup
1. [Environment variables](https://docs.wandb.com/library/environment-variables): Set API keys in environment variables so you can run training on a managed cluster.
2. [Offline mode](https://docs.wandb.com/library/technical-faq#can-i-run-wandb-offline): Use `dryrun` mode to train offline and sync results later.
3. [On-prem](https://docs.wandb.com/self-hosted): Install W&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
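For the environment-variable and offline options above, a small helper can collect the settings to export before launching training. This is a sketch under assumptions: `wandb_env` is a hypothetical helper, and `WANDB_API_KEY` and `WANDB_MODE` are the variable names as I understand them from the W&B documentation. Recent releases document the offline value as `offline`, with `dryrun` being the older name.

```python
import os


def wandb_env(api_key=None, offline=False):
    """Return the W&B environment variables to set (illustrative helper)."""
    env = {}
    if api_key:
        # Lets a job on a managed cluster authenticate without an interactive login.
        env["WANDB_API_KEY"] = api_key
    if offline:
        # Store run data locally so it can be synced later.
        env["WANDB_MODE"] = "offline"
    return env


# Example: switch the current process to offline mode before calling wandb.init().
os.environ.update(wandb_env(offline=True))
print(os.environ["WANDB_MODE"])  # offline
```

Runs recorded offline can later be uploaded with the `wandb sync` command.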