Compact Transformer #20133

Open · 2 tasks done
astariul opened this issue Nov 9, 2022 · 2 comments

Comments

astariul (Contributor) commented Nov 9, 2022

Model description

Escaping the Big Data Paradigm with Compact Transformers

Abstract:

With the rise of Transformers as the standard for language processing, and their advancements in computer vision, there has been a corresponding growth in parameter size and amounts of training data. Many have come to believe that, because of this, transformers are not suitable for small sets of data. This trend leads to concerns such as the limited availability of data in certain scientific domains and the exclusion of those with limited resources from research in the field. In this paper, we aim to present an approach for small-scale learning by introducing Compact Transformers. We show for the first time that, with the right size and convolutional tokenization, transformers can avoid overfitting and outperform state-of-the-art CNNs on small datasets. Our models are flexible in terms of model size, and can have as few as 0.28M parameters while achieving competitive results. Our best model reaches 98% accuracy when training from scratch on CIFAR-10 with only 3.7M parameters, a significant improvement in data efficiency over previous Transformer-based models: it is over 10x smaller than other transformers and 15% the size of ResNet50 while achieving similar performance. CCT also outperforms many modern CNN-based approaches, and even some recent NAS-based approaches. Additionally, we obtain a new SOTA result on Flowers-102 with 99.76% top-1 accuracy, and improve upon the existing baseline on ImageNet (82.71% accuracy with 29% as many parameters as ViT), as well as on NLP tasks. Our simple and compact design for transformers makes them more feasible to study for those with limited computing resources and/or dealing with small datasets, while extending existing research efforts in data-efficient transformers.
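
For anyone skimming the paper before picking this up: the key architectural idea is a convolutional tokenizer that replaces ViT's non-overlapping patch embedding, combined with sequence pooling instead of a class token. Below is a minimal PyTorch sketch of the tokenization step only; module names, kernel sizes, and dimensions are illustrative assumptions and are not taken from the SHI-Labs implementation.

```python
import torch
import torch.nn as nn

class ConvTokenizer(nn.Module):
    """Illustrative sketch of CCT-style convolutional tokenization (not the SHI-Labs code).

    A small conv + pooling stack turns an image into a sequence of tokens
    that a standard transformer encoder can consume.
    """

    def __init__(self, in_channels=3, embed_dim=256, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=kernel_size, stride=1,
                              padding=kernel_size // 2, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        # x: (batch, channels, height, width), e.g. CIFAR-10 images (B, 3, 32, 32)
        x = self.pool(torch.relu(self.conv(x)))   # (B, embed_dim, H/2, W/2)
        return x.flatten(2).transpose(1, 2)       # (B, num_tokens, embed_dim)


# Usage sketch: the resulting tokens can be fed to nn.TransformerEncoder;
# the paper then applies sequence pooling rather than a [CLS] token.
tokens = ConvTokenizer()(torch.randn(2, 3, 32, 32))
print(tokens.shape)  # torch.Size([2, 256, 256])
```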

Open source status

- [x] The model implementation is available
- [x] The model weights are available

Provide useful links for the implementation

Paper: https://arxiv.org/pdf/2104.05704.pdf
GitHub repository: https://github.com/SHI-Labs/Compact-Transformers

navinelahi commented

Are you willing to collaborate to make this available in HF transformers, @astariul? If so, please connect with me.

atharvakavitkar commented Apr 5, 2023

Hi @astariul and @navinelahi, are there any updates on this issue? May I start working on this?
