## **Tutorial 1. Overall Procedure of OTO.**
## *One-Shot General DNN Training and Compression Framework*

* Note that this tutorial demonstrates the end-to-end functionality of OTO.

* The example DNN training is **far from complete**, so the achieved accuracy should be disregarded.
* More detailed and advanced tutorials will present more complete experiments.

### Step 1. Create OTO instance

In [1]:
import torch
from backends import DemoNet
from only_train_once import OTO

model = DemoNet()
dummy_input = torch.zeros(1, 3, 32, 32)
oto = OTO(model=model.cuda(), dummy_input=dummy_input.cuda())

#### (Optional) Visualize the dependency graph of the DNN for ZIG partitions

In [2]:
# A DemoNet.gv.pdf will be generated to display the dependency graph.
oto.visualize_zigs()

### Step 2. Dataset Preparation

In [3]:
from torchvision.datasets import CIFAR10
import torchvision.transforms as transforms

trainset = CIFAR10(root='cifar10', train=True, download=True, transform=transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.RandomCrop(32, 4),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]))
testset = CIFAR10(root='cifar10', train=False, download=True, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]))

trainloader =  torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=4)

Files already downloaded and verified
Files already downloaded and verified
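
`transforms.Normalize` applies `(x - mean) / std` to each channel after `ToTensor` has scaled pixel values to the range [0, 1]. The plain-Python sketch below reproduces that arithmetic for a single RGB pixel. `normalize_pixel` is a hypothetical helper written for illustration and is not part of torchvision. Note that the mean and std used in this tutorial are the commonly used ImageNet statistics, not values computed from CIFAR-10.

```python
# Per-channel statistics copied from the transforms above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(pixel, mean=MEAN, std=STD):
    """Apply (x - mean) / std to each channel of one RGB pixel (hypothetical helper)."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]


# A pixel equal to the per-channel mean maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))
# A pure white pixel maps to positive values in every channel.
print(normalize_pixel([1.0, 1.0, 1.0]))
```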


### Step 3. Setup DHSPG optimizer

In [4]:
optimizer = oto.dhspg(lr=0.1, target_group_sparsity=0.7)
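
`target_group_sparsity=0.7` asks the optimizer to drive roughly 70% of the parameter groups to zero. As a rough illustration, assuming group sparsity is measured as the fraction of groups whose entries are all zero, the plain-Python sketch below computes that quantity for a toy set of groups. `group_sparsity` is a hypothetical helper and says nothing about how `compute_group_sparsity_omega` is implemented inside OTO.

```python
def group_sparsity(groups, tol=0.0):
    """Return the fraction of groups whose entries are all zero (within tol).

    Hypothetical helper for illustration only.
    """
    if not groups:
        return 0.0
    zero_groups = sum(1 for g in groups if all(abs(w) <= tol for w in g))
    return zero_groups / len(groups)


# Five toy groups; three of them are entirely zero.
toy_groups = [
    [0.0, 0.0, 0.0],
    [0.3, -0.1, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.2, 0.5],
    [0.0, 0.0, 0.0],
]
print(group_sparsity(toy_groups))  # 3 of 5 groups are zero -> 0.6
```

A group containing some zeros but at least one nonzero entry does not count as sparse under this definition, since only a fully zero group could be removed as a unit.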

### Step 4. Train the model as normal

In [5]:
from tqdm import tqdm
from utils.utils import check_accuracy

max_epoch = 30
model.cuda()
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(max_epoch):
    f_avg_val = 0.0
    model.train()
    for X, y in trainloader:
        X = X.cuda()
        y = y.cuda()
        # Call the module itself rather than .forward() so that registered hooks run.
        y_pred = model(X)
        f = criterion(y_pred, y)
        optimizer.zero_grad()
        f.backward()
        # Accumulate a Python float so the running sum does not hold on to autograd history.
        f_avg_val += f.item()
        optimizer.step()
    group_sparsity, _ = optimizer.compute_group_sparsity_omega()
    accuracy1, accuracy5 = check_accuracy(model, testloader)
    # Average training loss over the batches of this epoch.
    f_avg_val = f_avg_val / len(trainloader)
    print("Epoch: {ep}, loss: {f:.2f}, group_sparsity: {gs:.2f}, acc1: {acc:.4f}".format(ep=epoch, f=f_avg_val, gs=group_sparsity, acc=accuracy1))

Epoch: 0, loss: 1.89, group_sparsity: 0.00, acc1: 0.2233
Epoch: 1, loss: 1.76, group_sparsity: 0.00, acc1: 0.2788
Epoch: 2, loss: 1.70, group_sparsity: 0.05, acc1: 0.3312
Epoch: 3, loss: 1.67, group_sparsity: 0.12, acc1: 0.2701
Epoch: 4, loss: 1.64, group_sparsity: 0.16, acc1: 0.2850
...
Epoch: 25, loss: 1.40, group_sparsity: 0.70, acc1: 0.4676
Epoch: 26, loss: 1.40, group_sparsity: 0.70, acc1: 0.3705
Epoch: 27, loss: 1.40, group_sparsity: 0.70, acc1: 0.4549
Epoch: 28, loss: 1.39, group_sparsity: 0.70, acc1: 0.3854
Epoch: 29, loss: 1.39, group_sparsity: 0.70, acc1: 0.4151
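
The loop prints top-1 accuracy, and `check_accuracy` also returns a top-5 value. The source does not show how `check_accuracy` is implemented, so the sketch below only illustrates the usual top-k definition, using plain Python lists of per-class scores. `topk_accuracy` is a hypothetical name introduced here.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration only.
    """
    if not labels:
        return 0.0
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)


# Four toy samples over three classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.1, 0.3],
]
labels = [1, 1, 0, 0]
print(topk_accuracy(scores, labels, k=1))  # 2 of 4 correct -> 0.5
print(topk_accuracy(scores, labels, k=2))  # 3 of 4 correct -> 0.75
```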


### Step 5. Get compressed model in ONNX format

In [6]:
# A DemoNet_compressed.onnx will be generated. 
oto.compress()

### (Optional) Compute FLOPs and number of parameters before and after OTO training

In [7]:
full_flops = oto.compute_flops()
compressed_flops = oto.compute_flops(compressed=True)
full_num_params = oto.compute_num_params()
compressed_num_params = oto.compute_num_params(compressed=True)

print("Full FLOPs (M): {f_flops:.2f}. Compressed FLOPs (M): {c_flops:.2f}. Reduction Ratio: {f_ratio:.4f}"
      .format(f_flops=full_flops, c_flops=compressed_flops, f_ratio=1 - compressed_flops/full_flops))
print("Full # Params: {f_params}. Compressed # Params: {c_params}. Reduction Ratio: {f_ratio:.4f}"
      .format(f_params=full_num_params, c_params=compressed_num_params, f_ratio=1 - compressed_num_params/full_num_params))

Full FLOPs (M): 605.78. Compressed FLOPs (M): 50.21. Reduction Ratio: 0.9171
Full # Params: 627338. Compressed # Params: 51840. Reduction Ratio: 0.9174
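
The reduction ratio printed above is `1 - compressed / full`. The short sketch below reruns that arithmetic on the numbers from the output so the reported ratios can be checked by hand. `reduction_ratio` is a hypothetical helper, not an OTO function.

```python
def reduction_ratio(full, compressed):
    """Return the fraction removed by compression: 1 - compressed / full."""
    if full <= 0:
        raise ValueError("full must be positive")
    return 1 - compressed / full


# Values copied from the printed output above.
print("{:.4f}".format(reduction_ratio(605.78, 50.21)))   # FLOPs (M)  -> 0.9171
print("{:.4f}".format(reduction_ratio(627338, 51840)))   # parameters -> 0.9174
```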


### (Optional) Check the compressed model accuracy

*Both the full and the compressed model should return exactly the same accuracy.*

In [8]:
from utils.utils import check_accuracy_onnx
testloader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=False, num_workers=4)

acc1_full, acc5_full = check_accuracy(model, testloader)
print("Full model: Acc 1: {acc1}, Acc 5: {acc5}".format(acc1=acc1_full, acc5=acc5_full))

acc1_compressed, acc5_compressed = check_accuracy_onnx(oto.compressed_model_path, testloader)
print("Compressed model: Acc 1: {acc1}, Acc 5: {acc5}".format(acc1=acc1_compressed, acc5=acc5_compressed))

Full model: Acc 1: 0.4151, Acc 5: 0.9089
Compressed model: Acc 1: 0.4151, Acc 5: 0.9089
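
The two printed results agree to four decimal places. That comparison can also be scripted; the sketch below checks two `(acc1, acc5)` pairs against each other within a small absolute tolerance. The tolerance is an assumption added here to allow for floating-point differences between runtimes, and `accuracies_match` is a hypothetical helper.

```python
import math


def accuracies_match(full, compressed, abs_tol=1e-4):
    """Return True if each pair of accuracies agrees within abs_tol (hypothetical helper)."""
    return all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
        for a, b in zip(full, compressed)
    )


# Values copied from the printed output above.
print(accuracies_match((0.4151, 0.9089), (0.4151, 0.9089)))  # True
# A made-up mismatch in top-1 accuracy is flagged.
print(accuracies_match((0.4151, 0.9089), (0.4020, 0.9089)))  # False
```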
