This is an official implementation of "A Unified Pruning Framework for Vision Transformers".
For UP-DeiT on the image classification task, please see get_started.md for detailed instructions.
ImageNet-1K Pretrained UP-DeiT Models
Model | Top-1 | #Param. | Throughput (img/s) |
---|---|---|---|
UP-DeiT-T | 75.94% | 5.7M | 1408.5 |
UP-DeiT-S | 81.56% | 22.1M | 603.1 |
ImageNet-1K Pretrained UP-PVTv2 Models
Model | Top-1 | #Param. | Throughput (img/s) |
---|---|---|---|
UP-PVTv2-B0 | 75.30% | 3.7M | 139.9 |
UP-PVTv2-B1 | 79.48% | 14.0M | 249.9 |
Note: Throughput is measured on a Titan XP GPU with a fixed mini-batch size of 32.
Note: UP-DeiT and UP-PVTv2 have the same architectures as the original DeiT and PVTv2, but achieve higher accuracy. See our paper for more results.
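Because UP-DeiT is architecturally identical to DeiT, a pruned checkpoint can be evaluated with any stock DeiT implementation. Below is a minimal sketch of the throughput protocol described above (fixed mini-batch of 32), not code from this repo; the timm model name `deit_tiny_patch16_224` and the checkpoint path `up_deit_tiny.pth` are placeholders for illustration.

```python
import time
import torch
import timm

# UP-DeiT-T shares DeiT-T's architecture, so timm's stock model can host the
# weights; "up_deit_tiny.pth" is a hypothetical checkpoint path.
model = timm.create_model("deit_tiny_patch16_224")
state = torch.load("up_deit_tiny.pth", map_location="cpu")
model.load_state_dict(state)
model.eval().cuda()

# Protocol from the note above: a fixed mini-batch of 32 images.
batch = torch.randn(32, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(batch)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(50):  # timed iterations
        model(batch)
    torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"throughput: {50 * 32 / elapsed:.1f} images/s")
```

Averaging over several timed iterations after a warm-up keeps lazy CUDA initialization and kernel autotuning out of the measurement.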
WikiText-103 Pretrained Neural Language Model with Adaptive Inputs
Our language model is based on Fairseq.
Model | Perplexity | #Param. |
---|---|---|
Original Model | 19.00 | 291M |
UP-Transformer | 19.88 | 95M |
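As a rough sketch (not code from this repo), the pruned language model can be loaded and scored through Fairseq's hub interface. The checkpoint directory, checkpoint file name, and binarized-data path below are assumptions; WikiText-103 must first be binarized with fairseq-preprocess.

```python
import torch
from fairseq.models.transformer_lm import TransformerLanguageModel

# Hypothetical paths: a directory holding the pruned checkpoint and the
# binarized WikiText-103 data produced by fairseq-preprocess.
lm = TransformerLanguageModel.from_pretrained(
    "checkpoints/up_transformer",
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="data-bin/wikitext-103",
)
lm.eval()

# positional_scores holds per-token log-probabilities, so perplexity
# is the exponential of their negated mean.
hypo = lm.score("the quick brown fox jumps over the lazy dog")
ppl = torch.exp(-hypo["positional_scores"].mean())
print(f"perplexity: {ppl.item():.2f}")
```

If you find our work useful, please consider citing: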
@article{yu2021unified,
title={A unified pruning framework for vision transformers},
author={Yu, Hao and Wu, Jianxin},
journal={arXiv preprint arXiv:2111.15127},
year={2021}
}
If you have any questions about our work, please do not hesitate to contact us via the emails provided in the paper.