## Benchmark ImageNet-trained classifiers on `ImageNet-Hard`

<a target="_blank" href="https://colab.research.google.com/github/taesiri/ZoomIsAllYouNeed/blob/main/src/ImageNet_Hard/Benchmark-ImageNet-Hard.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

* [Website](https://taesiri.github.io/ZoomIsAllYouNeed/)
* [GitHub](https://github.com/taesiri/ZoomIsAllYouNeed)

In [3]:
!pip install transformers datasets timm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.2-py3-none-any.whl (6.8 MB)
Collecting datasets
  Downloading datasets-2.10.1-py3-none-any.whl (469 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.3-py3-none-any.whl (199 kB)
Collectin

In [4]:
import torch
import pandas as pd
import numpy as np
import torchvision.transforms as transforms
import torchvision.models as models
from tqdm import tqdm
from torch.utils.data import DataLoader
from datasets import load_dataset

In [5]:
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(device=0))
print(torch.__version__)

1
Tesla T4
1.13.1+cu116


## Transforms

In [6]:
standard_transform = transforms.Compose(
  [transforms.Resize(256), 
   transforms.CenterCrop(224), 
   transforms.ToTensor(), 
   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
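
As a rough illustration of what the last two steps of `standard_transform` compute, the NumPy sketch below reproduces the arithmetic of `ToTensor` and `Normalize` on a synthetic all-white image. The function name `to_tensor_and_normalize` and the test image are made up for this example, and resizing and cropping are not modeled.

```python
import numpy as np

# Per-channel statistics taken from the Normalize call above
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_tensor_and_normalize(image_hwc_uint8):
    """Hypothetical helper: mimic ToTensor + Normalize with NumPy."""
    # ToTensor: scale uint8 [0, 255] to float [0, 1] and move channels first
    x = image_hwc_uint8.astype(np.float32) / 255.0
    x = x.transpose(2, 0, 1)
    # Normalize: subtract the channel mean, divide by the channel std
    return (x - MEAN[:, None, None]) / STD[:, None, None]

# Synthetic 224x224 white RGB image (an assumption for the demo)
white = np.full((224, 224, 3), 255, dtype=np.uint8)
out = to_tensor_and_normalize(white)
print(out.shape)     # channels-first layout
print(out[:, 0, 0])  # normalized value of a white pixel, per channel
```

A white pixel ends up at `(1 - mean) / std` in each channel, which is why normalized inputs are not confined to `[0, 1]`.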

## Datasets


In [7]:
imagenet_hard_dataset = load_dataset('taesiri/imagenet-hard', split='validation')
imagenet_hard_dataset

Downloading and preparing dataset None/None to /root/.cache/huggingface/datasets/taesiri___parquet/taesiri--imagenet-hard-124be08d1e33678b/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...


Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/taesiri___parquet/taesiri--imagenet-hard-124be08d1e33678b/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.


Dataset({
    features: ['image', 'label'],
    num_rows: 23845
})

In [8]:
def preprocess_batch(batch):
    batch['image'] = [standard_transform(image.convert('RGB')) for image in batch['image']]
    return batch

imagenet_hard_dataset.set_transform(preprocess_batch)

In [9]:
# Indices of classes that should be masked out (set to False) in ImageNet's 1000 classes
false_mask = [ 12,  13,  24, 333, 339, 340, 352, 354, 386, 400, 404, 430, 444, 466, 510, 527, 630, 668, 746, 779, 802, 890, 916, 919, 954, 981, 984, 985]
mask = np.ones(1000, dtype=bool)
mask[false_mask] = False
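
Indexing the 1000 logits with `mask` leaves 972 columns, so a predicted column index is a position among the kept classes rather than an original ImageNet index. Below is a minimal sketch of mapping such a position back, in case that is ever needed. The names `kept_indices` and `to_imagenet_index` are hypothetical and not part of the notebook.

```python
import numpy as np

false_mask = [12, 13, 24, 333, 339, 340, 352, 354, 386, 400, 404, 430, 444,
              466, 510, 527, 630, 668, 746, 779, 802, 890, 916, 919, 954,
              981, 984, 985]
mask = np.ones(1000, dtype=bool)
mask[false_mask] = False

# Original ImageNet indices of the classes that survive the mask, in order
kept_indices = np.flatnonzero(mask)

def to_imagenet_index(masked_index):
    """Hypothetical helper: masked column position -> original class index."""
    return int(kept_indices[masked_index])

print(mask.sum())             # number of kept classes
print(to_imagenet_index(12))  # classes 12 and 13 are dropped, so this skips ahead
```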

In [10]:
# Helpers
concat = lambda x: np.concatenate(x, axis=0)  # join a list of per-batch arrays
to_np = lambda x: x.detach().cpu().numpy()    # tensor -> NumPy array on the CPU

## Benchmark

In [11]:
def run_benchmark_masked(model, bs=256):
  """Return the top-1 accuracy (%) of `model` on ImageNet-Hard over the unmasked classes."""
  model.cuda()
  model.eval()

  loader = DataLoader(imagenet_hard_dataset, batch_size=bs, num_workers=2)

  correct_ones = 0
  # inference_mode already disables gradient tracking, so no_grad is not needed
  with torch.inference_mode():
    for batch in tqdm(loader):
      images = batch['image'].cuda()
      target = batch['label'].cuda()

      # Drop the masked logits, then take the top-1 prediction among the rest
      pred = model(images)[:, mask].argmax(dim=1)
      correct_ones += pred.eq(target).sum().item()
  return 100 * correct_ones / len(imagenet_hard_dataset)
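
The core of the benchmark is a masked top-1 comparison: discard the masked logits, take the argmax over what remains, and compare it with the label. The toy NumPy sketch below shows that step on invented data (5 classes with one masked out). The function `masked_accuracy`, the logits, and the targets are all assumptions for illustration, not values from ImageNet-Hard.

```python
import numpy as np

def masked_accuracy(logits, targets, mask):
    """Top-1 accuracy (%) after removing the masked-out logit columns."""
    kept_logits = logits[:, mask]        # shape: (batch, number of kept classes)
    preds = kept_logits.argmax(axis=1)   # positions among the kept classes
    return 100.0 * float((preds == targets).mean())

# Toy setup: 5 classes, class 1 is masked out
toy_mask = np.ones(5, dtype=bool)
toy_mask[[1]] = False

toy_logits = np.array([
    [0.1, 9.0, 0.3, 0.2, 0.0],  # raw argmax is the masked class 1; best kept class is 2
    [0.0, 0.1, 0.2, 5.0, 0.3],  # best kept class is 3
])
# Targets are expressed as positions among the kept classes (0, 2, 3, 4 -> 0..3)
toy_targets = np.array([1, 0])

print(masked_accuracy(toy_logits, toy_targets, toy_mask))
```

In the first row the highest raw logit belongs to the masked class, so the prediction falls to the best remaining class, which matches the target. The second row is a miss.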

In [12]:
model_names = ['resnet50', 'resnet18', 'alexnet', 'vgg19', 'vit_b_32']

In [13]:
accuracy = {}

for name in model_names:
  # 'IMAGENET1K_V1' selects the same checkpoints that the deprecated pretrained=True loads
  model = models.__dict__[name](weights='IMAGENET1K_V1')
  accuracy[name] = run_benchmark_masked(model)
  print(f'{name} accuracy: {accuracy[name]}')

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 94/94 [05:46<00:00,  3.69s/it]
resnet50 accuracy: 11.52023485007339

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 94/94 [05:12<00:00,  3.33s/it]
resnet18 accuracy: 8.647515202348501

Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
100%|██████████| 94/94 [05:06<00:00,  3.27s/it]
alexnet accuracy: 5.6280142587544555

Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth
100%|██████████| 94/94 [05:52<00:00,  3.75s/it]
vgg19 accuracy: 9.00398406374502

Downloading: "https://download.pytorch.org/models/vit_b_32-d86f8d99.pth" to /root/.cache/torch/hub/checkpoints/vit_b_32-d86f8d99.pth
100%|██████████| 94/94 [05:28<00:00,  3.49s/it]
vit_b_32 accuracy: 16.682742713357097

In [14]:
pd.DataFrame(accuracy, index=['accuracy']).T.round(2)

| model | accuracy (%) |
|---|---|
| resnet50 | 11.52 |
| resnet18 | 8.65 |
| alexnet | 5.63 |
| vgg19 | 9.0 |
| vit_b_32 | 16.68 |
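
The reported percentages can be cross-checked against the dataset size and the batch count shown in the outputs above. Here is a small sketch in plain Python that uses only numbers printed by the notebook. The variable names are illustrative.

```python
import math

num_images = 23845  # rows in the validation split
batch_size = 256    # batch size used by the DataLoader

# Number of batches the progress bar should report
num_batches = math.ceil(num_images / batch_size)

# Unrounded accuracies as printed by the benchmark loop
reported = {
    'resnet50': 11.52023485007339,
    'resnet18': 8.647515202348501,
    'alexnet': 5.6280142587544555,
    'vgg19': 9.00398406374502,
    'vit_b_32': 16.682742713357097,
}

# Recover the number of correctly classified images per model
correct_counts = {name: round(acc * num_images / 100) for name, acc in reported.items()}

print(num_batches)
print(correct_counts)
```

The batch count agrees with the `94/94` shown by the progress bars, and each percentage corresponds to a whole number of correctly classified images.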
