This paper aims to develop a network that outperforms not only canonical transformers but also high-performance convolutional models. We propose a new transformer-based hybrid network that takes advantage of transformers to capture long-range dependencies and of CNNs to model local features. Furthermore, we scale it to obtain a family of models, called CMTs, which achieve much better accuracy and efficiency than previous convolution- and transformer-based models.
Paper: Jianyuan Guo, Kai Han, Han Wu, Chang Xu, Yehui Tang, Chunjing Xu, Yunhe Wang. CMT: Convolutional Neural Networks Meet Vision Transformers. Accepted at CVPR 2022.
A block of CMT is illustrated in fig/CMT.PNG.
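As described in the CMT paper, each block stacks a local perception unit (LPU: a 3×3 depthwise convolution with a residual connection), a lightweight multi-head self-attention, and an inverted residual FFN. Below is a minimal NumPy sketch of the LPU alone; the residual form `x + DWConv(x)` follows the paper, while the zero-padding detail is an assumption, and the actual implementation lives in src/cmt.py.

```python
import numpy as np

def lpu(x, w):
    """Local perception unit: x + DWConv3x3(x).

    x: feature map of shape (C, H, W)
    w: per-channel 3x3 depthwise kernels, shape (C, 3, 3)
    """
    C, H, W = x.shape
    # Zero-pad by 1 so the depthwise conv preserves spatial size (assumed).
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):          # depthwise: each channel has its own kernel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * w[c])
    return x + out              # residual connection
```

With an identity kernel (center weight 1), the depthwise conv returns the input unchanged, so the LPU output is exactly `2 * x` — a quick sanity check on the residual structure.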
Dataset used: [ImageNet2012]
- Dataset size: 224×224 color images in 1,000 classes
- Train: 1,281,167 images
- Test: 50,000 images
- Data format: JPEG
- Note: Data will be processed in dataset.py
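The exact pipeline is defined in src/dataset.py; as a hedged sketch, standard ImageNet evaluation preprocessing center-crops to 224×224 and normalizes with the usual ImageNet channel statistics. The mean/std constants below are the common ImageNet values and are an assumption about what dataset.py uses:

```python
import numpy as np

# Common ImageNet normalization constants (assumed; see src/dataset.py
# for the values actually used by this repo).
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def center_crop(img, size=224):
    """Crop the central size x size patch from an (H, W, C) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(img):
    """Center-crop a uint8 (H, W, 3) image and normalize per channel."""
    img = center_crop(img)
    return (img / 255.0 - MEAN) / STD
```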
- Hardware (Ascend/GPU)
- Prepare hardware environment with an Ascend or GPU processor.
- Framework
- For more information, please check the resources below:
CMT
├── eval.py # inference entry
├── fig
│ └── CMT.PNG # the illustration of CMT network
├── readme.md # Readme
└── src
├── dataset.py # dataset loader
└── cmt.py # CMT network
After installing MindSpore via the official website, you can start evaluation as follows:
# CMT infer example
GPU: python eval.py --model cmt --dataset_path [DATASET_PATH] --platform GPU --checkpoint_path [CHECKPOINT_PATH]
The checkpoint can be downloaded at https://download.mindspore.cn/model_zoo/.
result: {'acc': 0.832} ckpt= ./cmt_s_ms.ckpt
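The reported `acc` is top-1 accuracy on the ImageNet validation set. As a minimal sketch (the hypothetical `top1_accuracy` helper below is for illustration, not a function from this repo), it is simply the fraction of samples whose highest-scoring logit matches the label:

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of rows where argmax over classes equals the label.

    logits: (N, num_classes) array of model scores
    labels: (N,) array of integer class ids
    """
    preds = logits.argmax(axis=1)
    return float((preds == labels).mean())
```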
In dataset.py, we set the seed inside the "create_dataset" function. A random seed is also set in train.py.
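Fixing seeds typically looks like the sketch below; the `set_seed` helper is hypothetical (the actual seeding code lives in dataset.py and train.py), and MindSpore additionally provides `mindspore.set_seed` for framework-level randomness:

```python
import random

import numpy as np

def set_seed(seed):
    """Fix Python and NumPy RNG seeds for reproducibility (sketch).

    In a MindSpore script, mindspore.set_seed(seed) would also be
    called here to seed framework-level ops.
    """
    random.seed(seed)
    np.random.seed(seed)
```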
Please check the official homepage.