
UP-ViT

This is the official implementation of "A Unified Pruning Framework for Vision Transformers".

Getting Started

For UP-DeiT on the image classification task, please see get_started.md for detailed instructions.

Main Results on ImageNet-1K with Pretrained Models

ImageNet-1K Pretrained UP-DeiT Models

| Model | Top-1 Acc. | #Param. | Throughput (images/s) |
|-----------|--------|-------|--------|
| UP-DeiT-T | 75.94% | 5.7M | 1408.5 |
| UP-DeiT-S | 81.56% | 22.1M | 603.1 |

ImageNet-1K Pretrained UP-PVTv2 Models

| Model | Top-1 Acc. | #Param. | Throughput (images/s) |
|-------------|--------|-------|-------|
| UP-PVTv2-B0 | 75.30% | 3.7M | 139.9 |
| UP-PVTv2-B1 | 79.48% | 14.0M | 249.9 |

Note: Throughput is measured on a Titan XP GPU with a fixed mini-batch size of 32, as sketched below.
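
The following is a minimal sketch of how such a throughput measurement can be reproduced in PyTorch. It is not the authors' benchmarking script; the use of timm's `deit_tiny_patch16_224`, the warm-up count, and the iteration count are our assumptions.

```python
import time

import torch
import timm  # assumption: timm's DeiT-tiny matches the benchmarked architecture

# Hypothetical benchmark: images/s at a fixed mini-batch size of 32,
# mirroring the measurement protocol described in the note above.
model = timm.create_model("deit_tiny_patch16_224").cuda().eval()
images = torch.randn(32, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):       # warm up so one-time CUDA initialization
        model(images)         # does not skew the timing
    torch.cuda.synchronize()

    n_iters = 50
    start = time.time()
    for _ in range(n_iters):
        model(images)
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    elapsed = time.time() - start

print(f"Throughput: {n_iters * 32 / elapsed:.1f} images/s")
```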

Note: UP-DeiT and UP-PVTv2 share the same architectures as the original DeiT and PVTv2, respectively, but achieve higher accuracy. See our paper for more results.
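
Because the pruned models keep the original architectures, a released UP-DeiT checkpoint should load into a standard DeiT module without code changes. A minimal sketch, assuming timm and a hypothetical checkpoint file `up_deit_tiny.pth` (the actual filename and checkpoint layout may differ):

```python
import torch
import timm

# UP-DeiT-T shares DeiT-T's architecture, so a plain DeiT-tiny module suffices.
model = timm.create_model("deit_tiny_patch16_224", num_classes=1000)

# "up_deit_tiny.pth" is a hypothetical filename; substitute the released file.
state_dict = torch.load("up_deit_tiny.pth", map_location="cpu")
# Some checkpoints nest the weights under a "model" key; unwrap if present.
if isinstance(state_dict, dict) and "model" in state_dict:
    state_dict = state_dict["model"]
model.load_state_dict(state_dict)
model.eval()
```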

Main Results on WikiText-103 with Pretrained Models

A neural language model with adaptive input representations, pretrained on WikiText-103. Our implementation is based on Fairseq.

| Model | Perplexity | #Param. |
|----------------|-------|------|
| Original Model | 19.00 | 291M |
| UP-Transformer | 19.88 | 95M |

Citation

```
@article{yu2021unified,
  title={A unified pruning framework for vision transformers},
  author={Yu, Hao and Wu, Jianxin},
  journal={arXiv preprint arXiv:2111.15127},
  year={2021}
}
```

Contacts

If you have any questions about our work, please do not hesitate to contact us via the email addresses provided in the paper.
