# Weights & Biases 101 🥾🏕️

This notebook is intended to show you how to track your machine learning experiments using [Weights & Biases](https://wandb.ai).

Weights & Biases has two major components: a Python client named `wandb` 🪄🐝 and a web application that lets you store, query, visualize, and share metadata from your machine learning experiments, such as loss curves, evaluation metrics, and model predictions. You can `.log` *just about* anything.

The client is open source and you can find the [source code on GitHub](http://github.com/wandb/wandb)! ⭐


## 🪄 Install the `wandb` library and log in


The first step on our journey is to install the client, which is as easy as:



In [None]:
!pip install c1_aiml_aem -qU

In [None]:
!pip install "numpy<1.26.4"

In [None]:
!pip install scikit-learn

## Log in to W&B
- You can log in explicitly with `wandb login` (shell) or `wandb.login()` (Python); see below.
- Alternatively, you can set environment variables. Several environment variables change the behavior of W&B logging; the most important are:
    - `WANDB_API_KEY` - your API key, found under "Profile" -> "Settings" in the W&B App
    - `WANDB_BASE_URL` - the URL of the W&B server


In [None]:
from c1_aiml_aem import wandb
import numpy as np
import random

In [None]:
# Replace this with your Capital One W&B instance URL
WANDB_HOST = "https://wandb.cloud.capitalone.com" #@param

# Equivalent to running "wandb login" in your shell
wandb.login(host=WANDB_HOST)

# Note that https://api.wandb.ai is the default and points to the publicly
# hosted app.
#
# Alternatively, you can configure this with environment variables:
# export WANDB_API_KEY="<your-api-key>"
# export WANDB_BASE_URL="<your-wandb-endpoint>"

Calling `wandb login` or `wandb.login` will write your API key to your `~/.netrc` file. __To authenticate the client in a headless job on the cloud, you will definitely want to use the `WANDB_API_KEY` environment variable__.
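
For a headless job, it can help to fail fast when the key is missing instead of discovering the problem partway through a run. The sketch below is one way to do that; `require_wandb_env` is a hypothetical helper (not part of the `wandb` client) and it only inspects the two environment variables named above.

```python
import os


def require_wandb_env(env=None):
    """Check that WANDB_API_KEY is set and return the configured base URL.

    Hypothetical helper for illustration; not part of the wandb client.
    Returns None for the base URL when WANDB_BASE_URL is unset.
    """
    env = os.environ if env is None else env
    api_key = env.get("WANDB_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "WANDB_API_KEY is not set; export it before starting the job."
        )
    # WANDB_BASE_URL is optional here: when it is unset, the client falls
    # back to its default endpoint.
    return env.get("WANDB_BASE_URL")
```

Calling `require_wandb_env()` at the top of a job script, before `wandb.init`, surfaces a missing key as an immediate, readable error.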

**Default Destination:** When a user signs up to the instance and joins a team, `wandb` automatically writes runs to that team. You can change this default in your settings:

*   Visit https://<host-url>/settings
*   Find the `Default Team` section
*   Set `Default location to create new projects` to the entity of your choice

In [None]:
import random
import math

WANDB_ENTITY = 'wb-new-user-training-20251008' #@param  # Point to a team you are a member of!
WANDB_PROJECT = "workshop_wandb_intro" #@param
YOUR_NAME = "uma" #@param  # Used for filtering and grouping so you can easily identify your runs in the project

# Logging Tables

In [9]:
import wandb

# Start a new run
with wandb.init(project="table-demo") as run:

    # Create a table object with two columns and two rows of data
    my_table = wandb.Table(
        columns=["a", "b"],
        data=[["a1", "b1"], ["a2", "b2"]],
    )

    # Log the table to W&B
    run.log({"Table Name": my_table})

### Logging a single image

In [None]:
import wandb
from PIL import Image
import numpy as np

# Initialize a new W&B run
run = wandb.init(project=WANDB_PROJECT, name="image_logging_example")

# Load or create an image (example using a random numpy array)
# randint's upper bound is exclusive, so 256 covers the full uint8 range
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)  # Random 100x100 RGB image
image_pil = Image.fromarray(image)

# Log the image to W&B
run.log({"sample_image": wandb.Image(image_pil)})

# Finish the run
run.finish()

### Log Dataframes of Media

You can also log `pandas.DataFrame` objects with `.log`! These are converted into a `wandb.Table` and displayed interactively inside W&B.

Note: One of the most powerful features of `wandb.Table` is that you can include any `wandb` type as a cell value! This includes images, plots, videos, audio... almost anything 🤩
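
The DataFrame-to-table conversion can also be made explicit with `wandb.Table(dataframe=...)`. The sketch below assumes `pandas` and `wandb` are installed; `build_records` and `log_dataframe` are hypothetical helpers, and the rows are made-up illustrative values rather than real results.

```python
def build_records():
    """Return plain-Python rows standing in for evaluation results.

    The values are invented purely for illustration.
    """
    return [
        {"id": 0, "label": "cat", "score": 0.91},
        {"id": 1, "label": "dog", "score": 0.78},
        {"id": 2, "label": "cat", "score": 0.64},
    ]


def log_dataframe(project):
    """Convert the records to a DataFrame and log them as a W&B Table."""
    # Imported lazily so build_records() works without these packages
    import pandas as pd
    import wandb

    df = pd.DataFrame(build_records())
    with wandb.init(project=project, name="dataframe_logging_example") as run:
        # Explicit conversion; each DataFrame column becomes a table column
        run.log({"predictions": wandb.Table(dataframe=df)})
```

Once `WANDB_PROJECT` is defined, `log_dataframe(WANDB_PROJECT)` should produce a `predictions` table with three rows in the run's workspace.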

Below, we use the Oxford-IIIT Pet Dataset, which covers 37 pet breeds and provides segmentation masks in its annotations, as the example for logging media.

In [None]:
!curl -SL -qq https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz > images.tar.gz
!curl -SL -qq https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz > annotations.tar.gz
!tar -xzf images.tar.gz
!tar -xzf annotations.tar.gz

In [None]:
import os
import wandb
import numpy as np
from PIL import Image
from pathlib import Path

# Function to load an image and mask
def load_image_and_mask(image_path, mask_path):
    image = np.array(Image.open(image_path))
    mask = np.array(Image.open(mask_path))
    return image, mask

# Function to create W&B mask overlay
def wb_mask(image, mask):
    return wandb.Image(image, masks={"predictions": {"mask_data": mask}}, caption="Segmentation Image")

def log_single_images(path_img, path_lbl):
    # Load one example image and its segmentation mask
    image_path = path_img / 'Abyssinian_1.jpg'
    mask_path = path_lbl / 'Abyssinian_1.png'
    image_np, mask_np = load_image_and_mask(image_path, mask_path)
    return image_np, mask_np

def wandb_table_multiple_imags(path_img, path_lbl, num_images):
    # Build a table holding each image alongside its mask overlay
    table = wandb.Table(columns=["ID", "Original Image", "Image with Mask"])

    # Use the first num_images .jpg files; sorting keeps the selection
    # reproducible, and the filter skips any non-JPEG files in the folder
    image_files = sorted(p.name for p in Path(path_img).glob('*.jpg'))[:num_images]
    for each in image_files:
        image_path = path_img / each
        mask_path = path_lbl / f'{Path(each).stem}.png'  # Masks share the image's file stem
        image_np, mask_np = load_image_and_mask(image_path, mask_path)

        # Create mask overlay using W&B
        mask_img = wb_mask(image_np, mask_np)

        # Add image path, original image, and image with mask to the table
        table.add_data(str(image_path), wandb.Image(image_np), mask_img)

    return table

In [None]:
# Logging a single image and a table of images with segmentation masks
run = wandb.init(
    entity=WANDB_ENTITY,
    project=WANDB_PROJECT,
    group=YOUR_NAME,
    name="logging_rich_media",
)

# Define paths
path_img = Path('images')
path_lbl = Path('annotations/trimaps')

image_np, mask_np = log_single_images(path_img, path_lbl)

# Log single image and segmentation mask
run.log({
    "input_image": wandb.Image(image_np, caption="Input Image"),
    "segmentation_mask": wb_mask(image_np, mask_np)
})

# Log a table of images and segmentation masks
image_tables = wandb_table_multiple_imags(path_img, path_lbl, 30)

# Log the table
run.log({"Segmentation Table": image_tables})

run.finish()

### Log Sequences of Media

If you periodically call `run.log` to log a number (for example, loss), Weights & Biases automatically renders a line plot showing how that value changes over time (a loss curve). You can also log media under the same key more than once over the course of an experiment, in which case Weights & Biases displays that media with a step slider so you can scrub through the experiment and see how it changed. This is particularly useful for seeing how model predictions and visualizations of model performance (e.g. a precision/recall curve) change over time. In the image example below, we log a `wandb.Image` repeatedly just to demonstrate how this works.
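
Before the image example, here is a minimal sketch of the scalar case described above. `simulated_loss` and `log_loss_curve` are hypothetical helpers, and the decaying curve is a toy stand-in for a real training metric.

```python
import math


def simulated_loss(step, total_steps=50):
    """Toy exponentially decaying loss; stands in for a real training metric."""
    return math.exp(-5.0 * step / total_steps)


def log_loss_curve(project, total_steps=50):
    """Log one scalar per step under the same key, `loss`."""
    # Imported lazily so simulated_loss() works without the client installed
    import wandb

    with wandb.init(project=project, name="scalar_logging_example") as run:
        for step in range(total_steps):
            # Repeated calls with the same key build up the line plot
            run.log({"loss": simulated_loss(step, total_steps)})
```

Running `log_loss_curve(WANDB_PROJECT)` in a cell should yield a single `loss` panel in the run's workspace.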

In [None]:
%%sh
curl https://parade.com/.image/t_share/MTkwNTgwOTUyNjU2Mzg5MjQ1/albert-einstein-quotes-jpg.jpg > image.jpg

In [None]:
from PIL import Image, ImageFilter
import pandas as pd

# Load image with Pillow and resize to a 512x512 square
im = Image.open("./image.jpg").resize((512, 512))
images = []
with wandb.init(project=WANDB_PROJECT) as run:
    for step in range(10):
        # Log the current image under the same key at every step
        images.append((step, wandb.Image(im)))
        run.log({"image": wandb.Image(im)})

        # Apply a small Gaussian blur before the next step
        im = im.filter(ImageFilter.GaussianBlur(radius=1.5))

    # Also log the images and their associated logging step to a W&B Table
    run.log({"images_df": pd.DataFrame(images, columns=["step", "images"])})
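
The same repeated-key pattern can be applied to audio. A minimal sketch, assuming `numpy` is installed and that your environment can encode arrays for `wandb.Audio` (the client relies on the `soundfile` package for array input); `sine_wave` and `log_audio_sequence` are hypothetical helpers and the tones are synthetic.

```python
import math


def sine_wave(frequency_hz, seconds=1.0, sample_rate=16000):
    """Return a list of float samples in [-1, 1] for a pure tone."""
    n_samples = int(seconds * sample_rate)
    return [
        math.sin(2.0 * math.pi * frequency_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def log_audio_sequence(project, steps=5, sample_rate=16000):
    """Log a tone under the same key at each step, raising the pitch each time."""
    # Imported lazily so sine_wave() works without these packages
    import numpy as np
    import wandb

    with wandb.init(project=project, name="audio_logging_example") as run:
        for step in range(steps):
            tone = np.array(sine_wave(220.0 * (step + 1), sample_rate=sample_rate))
            run.log({
                "audio": wandb.Audio(tone, sample_rate=sample_rate, caption=f"step {step}")
            })
```

Because every clip is logged under the key `audio`, the clips should appear in a single panel that you can step through, as with the images.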

# Custom Charts

Custom Charts let you build charts that are more complex than those offered by the default UI. Log arbitrary tables of data and visualize them exactly how you want. Control details of fonts, colors, and tooltips with the power of Vega. The sections below log a bar plot, confusion matrix, line series, PR curve, ROC curve, and scatter plot using the built-in `wandb.plot` presets.

Try even more examples in this [colab](https://tiny.cc/custom-charts).
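
To make the link between a logged table and a Vega spec concrete, the sketch below passes a table to `wandb.plot_table` with a preset name. The preset `"wandb/bar/v0"` and its `label`/`value`/`title` fields are assumptions based on the built-in bar chart, so check them against your client version. `class_accuracy_rows` and `log_bar_with_preset` are hypothetical helpers, and the accuracy values are made up.

```python
def class_accuracy_rows():
    """Return illustrative (made-up) per-class accuracy rows."""
    return [
        ["car", 0.82],
        ["bus", 0.67],
        ["road", 0.91],
        ["person", 0.74],
    ]


def log_bar_with_preset(project):
    """Log a table and render it through a named Vega preset."""
    # Imported lazily so class_accuracy_rows() works without the client
    import wandb

    table = wandb.Table(data=class_accuracy_rows(), columns=["class", "accuracy"])
    with wandb.init(project=project, name="custom_chart_example") as run:
        chart = wandb.plot_table(
            vega_spec_name="wandb/bar/v0",  # assumed name of the built-in bar preset
            data_table=table,
            # Map the preset's fields onto columns of the logged table
            fields={"label": "class", "value": "accuracy"},
            string_fields={"title": "Accuracy by class"},
        )
        run.log({"custom_bar": chart})
```

The `fields` mapping is the part that matters: it tells the spec which table columns to read, so the same table could be rendered by a different spec without being logged again.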

## Bar plot

In [6]:
import random
import wandb

# Generate random data for the table
data = [
    ["car", random.uniform(0, 1)],
    ["bus", random.uniform(0, 1)],
    ["road", random.uniform(0, 1)],
    ["person", random.uniform(0, 1)],
]

# Create a table with the data
table = wandb.Table(data=data, columns=["class", "accuracy"])

# Initialize a W&B run and log the bar plot
with wandb.init(project="bar_chart") as run:
    # Create a bar plot from the table
    bar_plot = wandb.plot.bar(
        table=table,
        label="class",
        value="accuracy",
        title="Object Classification Accuracy",
    )

    # Log the bar chart to W&B
    run.log({"bar_plot": bar_plot})

## Confusion matrix

In [7]:
import numpy as np
import wandb

# Define class names for wildlife
wildlife_class_names = ["Lion", "Tiger", "Elephant", "Zebra"]

# Generate random true labels (0 to 3 for 10 samples)
wildlife_y_true = np.random.randint(0, 4, size=10)

# Generate random probabilities for each class (10 samples x 4 classes)
wildlife_probs = np.random.rand(10, 4)
wildlife_probs = np.exp(wildlife_probs) / np.sum(
    np.exp(wildlife_probs),
    axis=1,
    keepdims=True,
)

# Initialize W&B run and log confusion matrix
with wandb.init(project="wildlife_classification") as run:
    confusion_matrix = wandb.plot.confusion_matrix(
        probs=wildlife_probs,
        y_true=wildlife_y_true,
        class_names=wildlife_class_names,
        title="Wildlife Classification Confusion Matrix",
    )
    run.log({"wildlife_confusion_matrix": confusion_matrix})

## Line Series

In [8]:
import wandb

# Initialize W&B run
with wandb.init(project="line_series_example") as run:
    # x values shared across all y series
    xs = list(range(10))

    # Multiple y series to plot
    ys = [
        list(range(10)),  # y = x
        [i**2 for i in range(10)],  # y = x^2
        [i**3 for i in range(10)],  # y = x^3
    ]

    # Generate and log the line series chart; keys label each series in the legend
    line_series_chart = wandb.plot.line_series(
        xs,
        ys,
        keys=["y = x", "y = x^2", "y = x^3"],
        title="Powers of x",
        xname="step",
    )
    run.log({"line-series-single-x": line_series_chart})

## PR Curve

In [3]:
import wandb

# y_true: 0 = not spam, 1 = spam
y_true = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
spam_p = [0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.45, 0.40, 0.30, 0.20]  # <-- overlap on purpose

# wandb.plot.pr_curve expects [p(not spam), p(spam)] for each row
y_probas = [[1 - p, p] for p in spam_p]
labels = ["not spam", "spam"]

with wandb.init(project="spam-detection") as run:
    pr = wandb.plot.pr_curve(
        y_true=y_true,
        y_probas=y_probas,
        labels=labels,
        title="Precision-Recall Curve for Spam Detection",
    )
    run.log({"pr-curve": pr})


## ROC Curve

In [4]:
import numpy as np
import wandb

# Simulate a medical diagnosis classification problem with three diseases
n_samples = 200
n_classes = 3

# True labels: assign "Diabetes", "Hypertension", or "Heart Disease" to
# each sample
disease_labels = ["Diabetes", "Hypertension", "Heart Disease"]
# 0: Diabetes, 1: Hypertension, 2: Heart Disease
y_true = np.random.choice([0, 1, 2], size=n_samples)

# Predicted probabilities: simulate predictions, ensuring they sum to 1
# for each sample
y_probas = np.random.dirichlet(np.ones(n_classes), size=n_samples)

# Specify classes to plot (plotting all three diseases)
classes_to_plot = [0, 1, 2]

# Initialize a W&B run and log a ROC curve plot for disease classification
with wandb.init(project="medical_diagnosis") as run:
    roc_plot = wandb.plot.roc_curve(
        y_true=y_true,
        y_probas=y_probas,
        labels=disease_labels,
        classes_to_plot=classes_to_plot,
        title="ROC Curve for Disease Classification",
    )
    run.log({"roc-curve": roc_plot})

## Scatterplot

In [5]:
import math
import random
import wandb

# Simulate temperature readings at increasing altitudes
data = [
    [i, random.uniform(-10, 20) - 0.005 * i + 5 * math.sin(i / 50)]
    for i in range(300)
]

# Create W&B table with altitude (m) and temperature (°C) columns
table = wandb.Table(data=data, columns=["altitude (m)", "temperature (°C)"])

# Initialize W&B run and log the scatter plot
with wandb.init(project="temperature-altitude-scatter") as run:
    # Create and log the scatter plot
    scatter_plot = wandb.plot.scatter(
        table=table,
        x="altitude (m)",
        y="temperature (°C)",
        title="Altitude vs Temperature",
    )
    run.log({"altitude-temperature-scatter": scatter_plot})