
# Pretrained models

We provide a collection of models trained with semantic softmax on the ImageNet-21K-P dataset. All results are for an input resolution of 224.
To allow a proper comparison between the models, we also provide some throughput metrics.

| Backbone | ImageNet-21K-P semantic top-1 accuracy [%] | ImageNet-1K top-1 accuracy [%] | Maximal batch size | Maximal training speed (img/sec) | Maximal inference speed (img/sec) |
| --- | --- | --- | --- | --- | --- |
| MobilenetV3_large_100 | 73.1 | 78.0 | 488 | 1210 | 5980 |
| OFA_flops_595m_s | 75.0 | 81.0 | 288 | 500 | 3240 |
| ResNet50 | 75.6 | 82.0 | 320 | 720 | 2760 |
| Mixer-B-16 | 76.3 | 82.3 | 160 | 420 | 1420 |
| TResNet-M | 76.4 | 83.1 | 520 | 670 | 2970 |
| TResNet-L (V2) | 76.7 | 83.9 | 240 | 300 | 1460 |
| ViT-B-16 | 77.6 | 84.4 | 160 | 340 | 1140 |

To initialize the different models and properly load the weights, use this file.

Use the following model names (`--model_name`): `tresnet_m`, `tresnet_l`, `ofa_flops_595m_s`, `resnet50`, `vit_base_patch16_224`, `mobilenetv3_large_100`.

## Notes

- Maximal training and inference speeds were measured on an NVIDIA V100 GPU, at 90% of the maximal batch size.
- The ViT model benefits greatly from O2 mixed-precision training and inference. O1 mixed-precision speeds (`torch.autocast`) are lower.
- We are still optimizing the ViT hyperparameters for ImageNet-1K training; accuracy will likely improve in the future.
- Our ofa_flops_595m model is slightly different from the original model: we converted all hard-sigmoids to regular sigmoids, since they are faster on both CPU and GPU and give better scores. Hence we renamed the model to 'ofa_flops_595m_s'.
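For reference, O1-style mixed-precision inference with `torch.autocast` (mentioned in the notes above) can be sketched as follows. The tiny `torch.nn.Linear` stands in for a real backbone; the dtype and device choice are illustrative assumptions.

```python
import torch

# Stand-in for a real backbone; in practice this would be one of the models above.
model = torch.nn.Linear(8, 4).eval()
x = torch.randn(2, 8)

# O1-style autocast: eligible ops run in reduced precision inside the context.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
with torch.no_grad(), torch.autocast(device_type, dtype=torch.bfloat16):
    y = model(x)
```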