<a href="https://colab.research.google.com/github/xuwangfmc/dlbook/blob/main/wb_hydra/Artifacts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Artifacts
This tutorial shows how to use the Artifacts tool from Weights & Biases with PyTorch to version datasets and models.

The Artifacts page in Weights & Biases looks like this:
![tXCvaoq.png](https://s2.loli.net/2022/01/22/tyjYG7u6EIW5mVX.png)

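Artifacts record each version of every dataset and model, together with the runs that used it, so this page makes it easy to manage versions across the whole project.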

## Install and import the wandb library

In [1]:
# Compatible with wandb version 0.9.2+
!pip install wandb -qqq
!apt install tree

In [2]:
import os
import wandb

## Load the dataset and save it to Artifacts

In [3]:
import random
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.notebook import tqdm

# Ensure deterministic behavior
# (fixed integer seeds: hash() of a string changes between Python processes)
torch.backends.cudnn.deterministic = True
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

from collections import namedtuple

Dataset = namedtuple("Dataset", ["x", "y"])

def load():
    """Load the data"""
     
    train_dataset = torchvision.datasets.MNIST(root=".",
                          train=True, 
                          transform=None,
                          download=True)
    test_dataset = torchvision.datasets.MNIST(root=".",
                          train=False, 
                          transform=None,
                          download=True)
    x_train, y_train = [],[]
    x_test, y_test = [],[]
    for image, label in train_dataset:
      x_train.append(np.array(image))  # PIL image -> uint8 array of shape (28, 28)
      y_train.append(label)
    for image, label in test_dataset:
      x_test.append(np.array(image))
      y_test.append(label)
    training_set = Dataset(x_train, y_train)
    test_set = Dataset(x_test, y_test)
    datasets = [training_set, test_set]
    return datasets

def load_and_log():

    # start a run, with a type to label it and a project it can call home
    with wandb.init(project="artifacts-pytorch", job_type="load-data") as run:
        
        datasets= load()  # separate code for loading the datasets
        names = ["training", "test"]

        # create our Artifact
        raw_data = wandb.Artifact(
            "mnist-raw", type="dataset",
            description="Raw MNIST dataset, split into train/test",
            metadata={"source": "torchvision.datasets.MNIST",
                      "sizes": [len(dataset.x) for dataset in datasets]})

        for name, data in zip(names, datasets):
            # Store a new file in the artifact, and write something into its contents.
            with raw_data.new_file(name + ".npz", mode="wb") as file:
                np.savez(file, x=data.x, y=data.y)

        # Save the artifact to W&B.
        run.log_artifact(raw_data)
        
load_and_log()

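- 1) `wandb.init()` starts a new run, returns a run object, and creates a local directory. All logs and files are stored in that directory and are transferred asynchronously to the Weights & Biases server.

- 2) `wandb.Artifact` creates a new Artifact that holds dataset or model files so that they can be versioned. The `.new_file` method creates a new file inside the Artifact and saves it there.

- 3) `log_artifact` uploads the Artifact to the server.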
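
The files written through `new_file` are ordinary NumPy `.npz` archives, so their layout can be checked without any W&B machinery. The sketch below is an illustration only, not part of the original notebook: it writes a small stand-in array pair to a temporary file opened in binary mode (mirroring `mode="wb"`) and reads it back by key. The array contents and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for one MNIST split: 4 blank 28x28 images and their labels.
x = np.zeros((4, 28, 28), dtype=np.uint8)
y = np.array([3, 1, 4, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "training.npz")

    # Same pattern as the notebook's new_file call: a binary file handle passed to np.savez.
    with open(path, "wb") as file:
        np.savez(file, x=x, y=y)

    # np.load returns a dict-like object keyed by the keyword names given to np.savez.
    with np.load(path) as data:
        x_loaded, y_loaded = data["x"], data["y"]

print(x_loaded.shape, x_loaded.dtype, y_loaded.tolist())
```

The keys `x` and `y` are the same ones the notebook uses when it saves and later reloads each split.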

## Using the dataset

Download the dataset stored in Artifacts, preprocess it, and upload the preprocessed dataset back to Artifacts.
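
As a rough illustration of what the preprocessing in the following cell does to the arrays, the sketch below applies the same NumPy operations to a tiny stand-in batch. The batch size and pixel values are made up. One thing worth noticing is that dividing a `uint8` array by `255.0` yields `float64`, so each pixel grows from 1 byte to 8; casting to `float32` would be a possible space saving, which the notebook does not use.

```python
import numpy as np

# Hypothetical stand-in batch: 2 images of 28x28 uint8 pixels, with integer labels.
x_raw = np.full((2, 28, 28), 255, dtype=np.uint8)
y_raw = np.array([7, 2])

x = np.expand_dims(x_raw, 1) / 255.0   # add a channel axis, scale pixels to [0, 1]
y = np.expand_dims(y_raw, -1) / 1.0    # add a trailing axis, cast labels to float

print(x.shape, x.dtype, x.max())       # (2, 1, 28, 28) float64 1.0
print(y.shape, y.dtype)                # (2, 1) float64
print(x.nbytes // x_raw.nbytes)        # 8
```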

In [4]:
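def preprocess(dataset, img_transform=True):
  "Prepare the data"
  x, y = dataset.x, dataset.y
  if img_transform:
    x = np.expand_dims(x, 1)   # add a channel axis: (N, 28, 28) -> (N, 1, 28, 28)
    x = x / 255.0              # scale pixels to [0, 1] (result is float64)
    y = np.expand_dims(y, -1)  # labels: (N,) -> (N, 1)
    y = y / 1.0                # cast labels to float; converted back to long when training
    
  return Dataset(x, y)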

def preprocess_and_log(steps):

    with wandb.init(project="artifacts-pytorch", job_type="preprocess-data") as run:

        processed_data = wandb.Artifact(
            "mnist-preprocess", type="dataset",
            description="Preprocessed MNIST dataset",
            metadata=steps)
         
        # declare which artifact we'll be using
        raw_data_artifact = run.use_artifact('mnist-raw:latest')

        # if need be, download the artifact
        raw_dataset = raw_data_artifact.download()
        
        for split in ["training", "test"]:
            raw_split = read(raw_dataset, split)
            processed_dataset = preprocess(raw_split, **steps)

            with processed_data.new_file(split + ".npz", mode="wb") as file:
                np.savez(file, x=processed_dataset.x, y=processed_dataset.y)

        run.log_artifact(processed_data)


def read(data_dir, split):
    filename = split + ".npz"
    data = np.load(os.path.join(data_dir, filename))

    return Dataset(x=data["x"], y=data["y"])

steps = {"img_transform": True}

preprocess_and_log(steps)

## Define the model and save it to Artifacts

Design the ConvNet architecture, configure its basic parameters, and upload the resulting model file to Artifacts.

In [5]:
# Conventional and convolutional neural network

class ConvNet(nn.Module):
    def __init__(self, kernels, classes=10):
        super(ConvNet, self).__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(kernels[0], kernels[1], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * kernels[-1], classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
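
The `7 * 7 * kernels[-1]` input size of the final linear layer follows from the layer geometry. The sketch below is a back-of-the-envelope check using the standard output-size formula for convolution and pooling, `floor((n + 2p - k) / s) + 1`; the helper name `out_size` is hypothetical and not part of the notebook.

```python
def out_size(n, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


size = 28  # MNIST images are 28x28
for _ in range(2):  # layer1 and layer2 use the same kernel, stride and padding
    size = out_size(size, kernel=5, stride=1, padding=2)  # 5x5 conv with padding 2 keeps the size
    size = out_size(size, kernel=2, stride=2)             # 2x2 max-pool halves it

kernels = [16, 32]
flat_features = size * size * kernels[-1]
print(size, flat_features)  # 7 1568
```

With `kernels=[16, 32]`, the flattened feature vector passed to `self.fc` therefore has 1568 entries.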

In [6]:
def build_model_and_log(config):
    with wandb.init(project="artifacts-pytorch", job_type="initialize", config=config) as run:
        config = wandb.config
        
        model = ConvNet(config.kernels, config.classes).to(device)

        model_artifact = wandb.Artifact(
            "convnet", type="model",
            description="Convnet",
            metadata=dict(config))

        torch.save(model,'model.pkl')
        # another way to add a file to an Artifact

        model_artifact.add_file("model.pkl")

        wandb.save("model.pkl")

        run.log_artifact(model_artifact)

model_config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.005,
    dataset="MNIST",
    architecture="CNN")

build_model_and_log(model_config)


## Using the model from Artifacts

Define how the model is trained.

In [7]:
def train(model, loader, criterion, optimizer, config):
    # tell wandb to watch what the model gets up to: gradients, weights, and more!
    #wandb.watch(model, criterion, log="all", log_freq=10)

    # Run training and track with wandb
    total_batches = len(loader) * config.epochs
    example_ct = 0  # number of examples seen
    batch_ct = 0
    for epoch in tqdm(range(config.epochs)):
        for _, (images, labels) in enumerate(loader):

            loss = train_batch(images, labels, model, optimizer, criterion)
            example_ct +=  len(images)
            batch_ct += 1


def train_batch(images, labels, model, optimizer, criterion):
    images, labels = images.to(device), labels.to(device)
    
    # Forward pass ➡
    outputs = model(images)
    loss = criterion(outputs, labels)
    
    # Backward pass ⬅
    optimizer.zero_grad()
    loss.backward()

    # Step with optimizer
    optimizer.step()

    return loss

Define how the model is tested.

In [8]:
def test(model, test_loader):
    model.eval()

    # Run the model on some test examples
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print(f"Accuracy of the model on the {total} " +
              f"test images: {100 * correct / total}%")
        
        wandb.log({"test_accuracy": correct / total})
        return correct / total
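
`torch.max(outputs.data, 1)` returns both the maximum values and their indices along the class dimension; the indices are the predicted classes. The same accuracy computation can be sketched in NumPy on made-up logits (all values below are invented for illustration).

```python
import numpy as np

# Hypothetical logits for 4 examples over 3 classes, and their true labels.
outputs = np.array([[2.0, 0.1, 0.3],
                    [0.2, 1.5, 0.1],
                    [0.3, 0.2, 0.9],
                    [1.1, 0.4, 0.2]])
labels = np.array([0, 1, 1, 0])

predicted = outputs.argmax(axis=1)           # index of the largest logit in each row
correct = int((predicted == labels).sum())   # number of predictions matching the labels
accuracy = correct / labels.size

print(predicted.tolist(), accuracy)  # [0, 1, 2, 0] 0.75
```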

Define how the dataset is wrapped for loading.

In [9]:
class my_mnist(torch.utils.data.Dataset):
  def __init__(self, data):
    self.dataset = data
  def __getitem__(self, idx):
    img = torch.from_numpy(self.dataset.x[idx]).to(torch.float32)
    label = torch.from_numpy(self.dataset.y[idx]).to(torch.long)
    label = label.squeeze(dim=-1)

    return (img, label)
  def __len__(self):
    return self.dataset.x.shape[0]

Upload the trained model to Artifacts and report the model's test results.

In [10]:
def train_and_log(config):

    with wandb.init(project="artifacts-pytorch", job_type="train", config=config) as run:
        config = wandb.config

        data = run.use_artifact('mnist-preprocess:latest')
        data_dir = data.download()
        training_dataset = read(data_dir, "training")

        training_dataset = my_mnist(training_dataset)
        training_loader = torch.utils.data.DataLoader(dataset=training_dataset,
                                  batch_size=config.batch_size, 
                                  shuffle=True,
                                  pin_memory=True, num_workers=2)

        model_artifact = run.use_artifact("convnet:latest")
        model_dir = model_artifact.download()
        

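        model_path = os.path.join(model_dir, "model.pkl")

        # load the file downloaded from the artifact, not the local copy
        # (recent PyTorch releases may need weights_only=False to unpickle a full model)
        model = torch.load(model_path)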

        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(
        model.parameters(), lr=config.learning_rate)

        train(model, training_loader, criterion, optimizer, config)

        model_artifact = wandb.Artifact(
            "trained-model", type="model",
            description="trained model",
            metadata=dict(config))
        
        torch.save(model,'trained_model.pkl')
        # another way to add a file to an Artifact
        model_artifact.add_file("trained_model.pkl")
        wandb.save("trained_model.pkl")


        run.log_artifact(model_artifact)

    return model

    
def test_and_log(config):
    
    with wandb.init(project="artifacts-pytorch", job_type="report", config=config) as run:
        config = wandb.config
        data = run.use_artifact('mnist-preprocess:latest')
        data_dir = data.download()
        test_dataset = read(data_dir, "test")
        test_dataset = my_mnist(test_dataset)
        test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                  batch_size=config.batch_size, 
                                  shuffle=True,
                                  pin_memory=True, num_workers=2)

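        model_artifact = run.use_artifact("trained-model:latest")
        model_dir = model_artifact.download()
        # load the trained model from the downloaded artifact instead of relying on a global
        model = torch.load(os.path.join(model_dir, "trained_model.pkl"))

        test_accuracy = test(model, test_loader)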

        run.summary.update({"test_accuracy": test_accuracy})     



train_config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.005,
    dataset="MNIST",
    architecture="CNN")

test_config = {"batch_size": 128}
model = train_and_log(train_config)
test_and_log(test_config)

Accuracy of the model on the 10000 test images: 99.04%




Open the link printed in the output and select the Artifacts page to see the saved models and datasets.