Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

This repository contains PyTorch evaluation code, training code and pretrained models for PVT (Pyramid Vision Transformer).

PVT is a pure transformer backbone that, like ResNet, can be easily plugged into most downstream task models.

With a comparable number of parameters, PVT-Small+RetinaNet achieves 40.4 AP on the COCO dataset, surpassing ResNet50+RetinaNet (36.3 AP) by 4.1 AP.

Figure 1: Performance of RetinaNet 1x with different backbones.

This repository is developed on top of pytorch-image-models and deit.

For details see Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

If you use this code for a paper, please cite:

@misc{wang2021pyramid,
      title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions}, 
      author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
      year={2021},
      eprint={2102.12122},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Done

PVT-Tiny/-Small
PVT + Object Detection

Todo List

ImageNet model weights
PVT + Semantic FPN configs & models
PVT + DETR/Sparse R-CNN config & models
PVT + Trans2Seg config & models

Usage

First, clone the repository locally:

git clone https://github.com/whai362/PVT.git

Then, install PyTorch 1.6.0+, torchvision 0.7.0+, and pytorch-image-models 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
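After installing, it can be useful to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes Python 3.8+ and that the packages are registered under the distribution names `torch`, `torchvision` and `timm`.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release of a version string as a 3-tuple of ints."""
    # Drop local build tags such as "+cu101", then keep the leading dotted digits.
    match = re.match(r"\d+(?:\.\d+)*", text.split("+")[0])
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions ("1.6" -> (1, 6, 0)) so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, comparing numeric components only."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Compare installed packages against the versions named in the install step."""
    # (distribution name, required version, True if the version is an exact pin)
    requirements = [
        ("torch", "1.6.0", False),
        ("torchvision", "0.7.0", False),
        ("timm", "0.3.2", True),
    ]
    report = {}
    for name, required, exact in requirements:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if exact:
            ok = parse_version(installed) == parse_version(required)
        else:
            ok = meets_minimum(installed, required)
        status = "ok" if ok else "expected " + required
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Running the script prints one line per package. A package reported as "not installed" may still be importable if it was installed under a different distribution name.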

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder: the training and validation data are expected to be in the train/ and val/ folders respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
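Before starting a long run, the layout above can be sanity-checked with a few lines of standard-library Python. This is only an illustrative sketch: `summarize_split`, `check_imagenet_layout` and the `IMAGE_SUFFIXES` set are hypothetical names chosen here, not part of this repository or of torchvision, and the check only looks one level deep inside each class folder.

```python
from pathlib import Path

# Assumed set of image extensions for this check; adjust to match your data.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Map each class folder name under `split_dir` to the number of images in it."""
    split_dir = Path(split_dir)
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split folder: {split_dir}")
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_imagenet_layout(root):
    """Return a list of human-readable problems found under `root` (empty if none)."""
    root = Path(root)
    splits = {name: summarize_split(root / name) for name in ("train", "val")}
    problems = []
    for name, counts in splits.items():
        if not counts:
            problems.append(f"{name}/ has no class folders")
        for cls, n_images in counts.items():
            if n_images == 0:
                problems.append(f"{name}/{cls}/ has no images")
    # Every class evaluated on should also have been seen during training.
    for cls in sorted(set(splits["val"]) - set(splits["train"])):
        problems.append(f"val/{cls}/ has no matching folder in train/")
    return problems


if __name__ == "__main__":
    for problem in check_imagenet_layout("/path/to/imagenet"):
        print(problem)
```

An empty result means each split has at least one class folder, every class folder holds at least one image, and every validation class also appears in the training split.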

Model Zoo

Object Detection

For detection configs & models, see here.

Method	Lr schd	box AP	mask AP	Config	Download
PVT-Tiny + RetinaNet (800x)	1x	36.7	-	config	Todo.
PVT-Small + RetinaNet (640x)	1x	38.7	-	config	model
PVT-Small + RetinaNet (800x)	1x	40.4	-	config	model
R50 + DETR	50ep	32.3	-	config	Todo.
PVT-Small + DETR	50ep	34.7	-	config	Todo.

Image Classification

We provide baseline PVT models pretrained on ImageNet 2012.

name	acc@1	#params (M)	url
PVT-Tiny	75.1	13.2	51 M, PyTorch<=1.5
PVT-Small	79.8	24.5	93 M, PyTorch<=1.5
PVT-Medium	81.2	44.2	Todo.
PVT-Large	81.7	61.4	Todo.

Evaluation

To evaluate a pre-trained PVT-Small on ImageNet val with a single GPU, run:

sh dist_train.sh pvt_small 1 /path/to/checkpoint_root --data-path /path/to/imagenet --resume /path/to/checkpoint_file --eval

This should give

* Acc@1 79.764 Acc@5 94.950 loss 0.885
Accuracy of the network on the 50000 test images: 79.8%
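For readers unfamiliar with the Acc@1 and Acc@5 figures reported above, the sketch below shows how top-k accuracy is conventionally computed: a sample counts as correct if its true label is among the k highest-scoring classes. It is written in plain Python for illustration and is not the code this repository uses; `topk_accuracy` and the toy scores are hypothetical.

```python
def topk_accuracy(scores, targets, ks=(1, 5)):
    """Return {k: percentage of samples whose target is among the k top-scoring classes}.

    `scores` is a list of per-class score lists (one per sample) and
    `targets` is the list of true class indices.
    """
    if not scores or len(scores) != len(targets):
        raise ValueError("scores and targets must be non-empty and of equal length")
    hits = {k: 0 for k in ks}
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        for k in ks:
            if target in ranked[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(targets) for k in ks}


if __name__ == "__main__":
    # Three samples, three classes (toy numbers, not model output).
    toy_scores = [
        [0.1, 0.7, 0.2],  # predicts class 1, target 1: top-1 hit
        [0.5, 0.3, 0.2],  # predicts class 0, target 1: top-2 hit only
        [0.2, 0.3, 0.5],  # predicts class 2, target 0: miss for k = 1 and k = 2
    ]
    result = topk_accuracy(toy_scores, [1, 1, 0], ks=(1, 2))
    print({k: round(v, 2) for k, v in result.items()})  # {1: 33.33, 2: 66.67}
```

Under this convention, the reported Acc@1 of 79.764 on 50000 validation images corresponds to 39882 images whose highest-scoring class was the correct one.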

Training

To train PVT-Small on ImageNet on a single node with 8 GPUs for 300 epochs, run:

sh dist_train.sh pvt_small 8 /path/to/checkpoint_root --data-path /path/to/imagenet

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
