# Model Zoo

## Introduction

This file documents a collection of baselines trained with `pycls`, primarily for the Designing Network Design Spaces paper. All configurations for these baselines are located in the `configs/dds_baselines` directory. The tables below provide results and useful statistics about training and inference. Links to the pretrained models are provided as well. The following experimental and training settings are used for all of the training and inference runs.

## Experimental Settings

- All baselines were run on Big Basin servers with 8 NVIDIA Tesla V100 GPUs (16GB GPU memory).
- All baselines were run using PyTorch 1.6, CUDA 9.2, and cuDNN 7.6.
- Inference times are reported for 64 images on 1 GPU for all models.
- Training times are reported for 100 epochs on 8 GPUs with the batch size listed.
- The reported errors are averaged across 5 reruns for robust estimates.
- The provided checkpoints are from the runs with errors closest to the average.
- All models and results below are on the ImageNet-1k dataset.
- The model id column is provided for ease of reference.

## Training Settings

Our primary goal is to provide simple and strong baselines that are easy to reproduce. For all models, we use our basic training settings without any training enhancements (e.g., DropOut, DropConnect, AutoAugment, EMA, etc.) or testing enhancements (e.g., multi-crop, multi-scale, flipping, etc.); please see our Designing Network Design Spaces paper for more information.

- We use SGD with momentum of 0.9, a half-period cosine schedule, and train for 100 epochs.
- For ResNet/ResNeXt/RegNet, we use a reference learning rate of 0.1 and a weight decay of 5e-5 (see Figure 21).
- For EfficientNet, we use a reference learning rate of 0.2 and a weight decay of 1e-5 (see Figure 22).
- The actual learning rate for each model is computed as (batch-size / 128) * reference-lr (see the sketch after this list).
- For training, we use random aspect-ratio cropping, horizontal flipping, PCA color augmentation, and per-channel mean and SD normalization.
- At test time, we rescale images to (256 / 224) * train-res and take the center crop of size train-res.
- For ResNet/ResNeXt/RegNet, we use a training image size of 224x224.
- For EfficientNet, the training image size varies following the original paper.
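
To make the two formulas above concrete, here is a minimal sketch of the learning rate scaling rule and the test-time resize rule. The helper names are ours for illustration and are not part of the `pycls` code base.

```python
# Minimal sketch of the scaling rules stated above; the helper names are
# illustrative and not taken from pycls.

def actual_lr(batch_size, reference_lr):
    """Linear scaling rule: (batch-size / 128) * reference-lr."""
    return (batch_size / 128) * reference_lr

def test_resize(train_res):
    """Test-time resize: rescale to (256 / 224) * train-res, then center-crop train-res."""
    return int(round((256 / 224) * train_res))

# Example: a RegNet trained at 224x224 with batch size 1024 and reference LR 0.1.
print(actual_lr(1024, 0.1))  # 0.8
print(test_resize(224))      # 256 (then center crop back to 224)
```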

For 8-GPU training, we apply a 5-epoch gradual warmup, following the ImageNet in 1 Hour paper (see the sketch below). Note that the learning rate scaling rule described above is similar to the one from the ImageNet in 1 Hour paper, but the number of images per GPU varies among models. To understand how the configs are adjusted, please see the examples in the `configs/lr_scaling` directory.
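
As an illustration of how the warmup and the half-period cosine schedule combine, the sketch below computes a per-epoch learning rate. It reflects our reading of the settings above rather than the actual `pycls` scheduler, and the warmup factor of 0.1 is an assumption.

```python
import math

def lr_at_epoch(epoch, base_lr, max_epoch=100, warmup_epochs=5, warmup_factor=0.1):
    """Half-period cosine schedule with gradual warmup (illustration only).

    The warmup_factor value and the linear warmup interpolation are assumptions;
    the authoritative schedule is the one implemented in pycls.
    """
    # Half-period cosine decay from base_lr toward 0 over max_epoch epochs.
    lr = 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / max_epoch))
    # Gradual warmup: ramp linearly from warmup_factor * lr up to lr.
    if epoch < warmup_epochs:
        alpha = epoch / warmup_epochs
        lr *= warmup_factor * (1.0 - alpha) + alpha
    return lr

# Example: batch size 1024 with reference LR 0.1 gives an actual base LR of 0.8.
for epoch in (0, 5, 50, 99):
    print(epoch, round(lr_at_epoch(epoch, 0.8), 4))
```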

## Baselines

### RegNetX Models

| model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RegNetX-200MF | 0.2 | 2.7 | 2.2 | 1024 | 10 | 2.8 | 31.1 | 160905981 | model |
| RegNetX-400MF | 0.4 | 5.2 | 3.1 | 1024 | 15 | 3.9 | 27.3 | 160905967 | model |
| RegNetX-600MF | 0.6 | 6.2 | 4.0 | 1024 | 17 | 4.4 | 25.9 | 160906442 | model |
| RegNetX-800MF | 0.8 | 7.3 | 5.1 | 1024 | 21 | 5.7 | 24.8 | 160906036 | model |
| RegNetX-1.6GF | 1.6 | 9.2 | 7.9 | 1024 | 33 | 8.7 | 23.0 | 160990626 | model |
| RegNetX-3.2GF | 3.2 | 15.3 | 11.4 | 512 | 57 | 14.3 | 21.7 | 160906139 | model |
| RegNetX-4.0GF | 4.0 | 22.1 | 12.2 | 512 | 69 | 17.1 | 21.4 | 160906383 | model |
| RegNetX-6.4GF | 6.5 | 26.2 | 16.4 | 512 | 92 | 23.5 | 20.8 | 161116590 | model |
| RegNetX-8.0GF | 8.0 | 39.6 | 14.1 | 512 | 94 | 22.6 | 20.7 | 161107726 | model |
| RegNetX-12GF | 12.1 | 46.1 | 21.4 | 512 | 137 | 32.9 | 20.3 | 160906020 | model |
| RegNetX-16GF | 15.9 | 54.3 | 25.5 | 512 | 168 | 39.7 | 20.0 | 158460855 | model |
| RegNetX-32GF | 31.7 | 107.8 | 36.3 | 256 | 318 | 76.9 | 19.5 | 158188473 | model |

### RegNetY Models

| model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RegNetY-200MF | 0.2 | 3.2 | 2.2 | 1024 | 11 | 3.1 | 29.6 | 176245422 | model |
| RegNetY-400MF | 0.4 | 4.3 | 3.9 | 1024 | 19 | 5.1 | 25.9 | 160906449 | model |
| RegNetY-600MF | 0.6 | 6.1 | 4.3 | 1024 | 19 | 5.2 | 24.5 | 160981443 | model |
| RegNetY-800MF | 0.8 | 6.3 | 5.2 | 1024 | 22 | 6.0 | 23.7 | 160906567 | model |
| RegNetY-1.6GF | 1.6 | 11.2 | 8.0 | 1024 | 39 | 10.1 | 22.0 | 160906681 | model |
| RegNetY-3.2GF | 3.2 | 19.4 | 11.3 | 512 | 67 | 16.5 | 21.0 | 160906834 | model |
| RegNetY-4.0GF | 4.0 | 20.6 | 12.3 | 512 | 68 | 16.8 | 20.6 | 160906838 | model |
| RegNetY-6.4GF | 6.4 | 30.6 | 16.4 | 512 | 104 | 26.1 | 20.1 | 160907112 | model |
| RegNetY-8.0GF | 8.0 | 39.2 | 18.0 | 512 | 113 | 28.1 | 20.1 | 161160905 | model |
| RegNetY-12GF | 12.1 | 51.8 | 21.4 | 512 | 150 | 36.0 | 19.7 | 160907100 | model |
| RegNetY-16GF | 15.9 | 83.6 | 23.0 | 512 | 189 | 45.6 | 19.6 | 161303400 | model |
| RegNetY-32GF | 32.3 | 145.0 | 30.3 | 256 | 319 | 76.0 | 19.0 | 161277763 | model |

### ResNet Models

| model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 4.1 | 22.6 | 11.1 | 256 | 53 | 12.2 | 23.2 | 161235311 | model |
| ResNet-101 | 7.8 | 44.6 | 16.2 | 256 | 90 | 20.4 | 21.4 | 161167170 | model |
| ResNet-152 | 11.5 | 60.2 | 22.6 | 256 | 130 | 29.2 | 20.9 | 161167467 | model |

### ResNeXt Models

| model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNeXt-50 | 4.2 | 25.0 | 14.4 | 256 | 78 | 18.0 | 21.9 | 161167411 | model |
| ResNeXt-101 | 8.0 | 44.2 | 21.2 | 256 | 137 | 31.8 | 20.7 | 161167590 | model |
| ResNeXt-152 | 11.7 | 60.0 | 29.7 | 256 | 197 | 45.7 | 20.4 | 162471172 | model |

### EfficientNet Models

| model | flops (B) | params (M) | acts (M) | batch size | infer (ms) | train (hr) | error (top-1) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EfficientNet-B0 | 0.4 | 5.3 | 6.7 | 256 | 34 | 11.7 | 24.9 | 161305613 | model |
| EfficientNet-B1 | 0.7 | 7.8 | 10.9 | 256 | 52 | 15.6 | 24.1 | 161304979 | model |
| EfficientNet-B2 | 1.0 | 9.2 | 13.8 | 256 | 68 | 18.4 | 23.4 | 161305015 | model |
| EfficientNet-B3 | 1.8 | 12.0 | 23.8 | 256 | 114 | 32.1 | 22.5 | 161305060 | model |
| EfficientNet-B4 | 4.2 | 19.0 | 48.5 | 128 | 240 | 65.1 | 21.2 | 161305098 | model |
| EfficientNet-B5 | 9.9 | 30.0 | 98.9 | 64 | 504 | 135.1 | 21.5 | 161305138 | model |
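
The checkpoints in the download columns are standard PyTorch checkpoint files, so they can be inspected with plain `torch.load`. The sketch below uses a hypothetical local filename, and the `"model_state"` key reflects how `pycls` checkpoints are commonly structured; treat both as assumptions and inspect the printed keys if your file differs.

```python
import torch

# Hypothetical local path to a checkpoint downloaded from one of the tables above.
checkpoint_path = "RegNetY-4.0GF_checkpoint.pyth"

# pycls checkpoints are ordinary torch checkpoints; map to CPU for inspection.
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys())

# The "model_state" key is an assumption about the checkpoint layout; fall back
# to treating the whole object as a state dict if the key is absent.
state_dict = checkpoint.get("model_state", checkpoint)

# Build the model from the matching config in configs/dds_baselines (omitted),
# then load the weights:
# model.load_state_dict(state_dict)
```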