This repository contains the implementation of compressing DeiT-B into UP-DeiT-T.
python >= 3.7
torch >= 1.4.0
torchvision >= 0.5.0
We provide the detailed requirements in requirements.txt. You can run conda install --yes --file requirements.txt
to create the same running environment as ours.
We use standard ImageNet dataset. You can download it from http://image-net.org/.
To generate the proxy dataset, assume the ImageNet dataset is saved on /mnt/ramdisk/
, run:
python cp_image.py
Download the pretrained model DeiT-B into the pretrainmodel
floader, then run:
./run_importance.sh
Note: We have provided the importance of each channel of DeiT-B on the importance folder, including the proxy size of 2k and 5k.
After generating importance, run:
python sample_sub_model.py
To train a UP-DeiT-T
on ImageNet from scratch, run:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main_distill.py --dist-url 'tcp://127.0.0.1:12342' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 -j <worker-size> --learning-rate <learning-rate> --name <save-name> --wd <weight-decay> -b <batch-size> --alpha <cumix-ratio> --epochs <epoch-number> --resume <pretrainde-sub-model> <imagenet-path>