This repository is the official implementation of RLRR. In this study, we approach parameter-efficient fine-tuning from the perspective of Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods.
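As a conceptual illustration of that SVD view, here is a minimal PyTorch sketch (not the repository's actual RLRR module; the learnable `scale` parameter is an assumption for illustration only):

```python
import torch

# Factor a frozen pre-trained weight as W = U @ diag(S) @ Vh, then rescale
# its singular values with a small learnable parameter vector. The change
# relative to W acts as a residual on top of the frozen weight.
W = torch.randn(768, 768)                       # stand-in for a pre-trained ViT weight
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

scale = torch.nn.Parameter(torch.ones_like(S))  # learnable rescaling (illustrative assumption)
W_tuned = U @ torch.diag(S * scale) @ Vh        # rescaled reconstruction
residual = W_tuned - W                          # residual added on top of the frozen W
```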
To install requirements:
```bash
conda env create -n RLRR -f environment.yaml
```
Before running the code, please activate this conda environment.
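The environment name matches the `-n RLRR` flag used above:

```bash
conda activate RLRR
```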
- FGVC & VTAB-1k: you can follow VPT to download them.
Since the original VTAB dataset is processed with TensorFlow scripts and the processing of some datasets is tricky, we also upload the extracted VTAB-1k dataset to OneDrive for your convenience. You can download it from here and then use it with our vtab.py directly. (Note that the license of the VTAB dataset still applies.)
- Pre-trained ViT and Swin-B models on ImageNet-21K: you can download them manually from the official ViT and Swin Transformer releases.
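Alternatively, since the code is built upon timm, the backbones can be fetched programmatically. A hedged sketch; the exact model identifiers below are assumptions, so verify them with `timm.list_models()`:

```python
import timm

# Assumed model identifiers for ImageNet-21K pre-trained backbones;
# verify with timm.list_models('*in21k*') / timm.list_models('*in22k*').
vit = timm.create_model('vit_base_patch16_224.augreg_in21k', pretrained=True)
swin = timm.create_model('swin_base_patch4_window7_224.ms_in22k', pretrained=True)
```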
- Clone this repo:
```bash
git clone https://github.com/zstarN70/RLRR.git
cd RLRR
```
- To fine-tune a pre-trained ViT model on VTAB, run:
```bash
CUDA_VISIBLE_DEVICES=0 python train_vtab.py --dataset_name=kitti
```
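To sweep several VTAB-1k tasks sequentially, a simple shell loop works. The dataset identifiers other than `kitti` are assumptions here; check vtab.py for the exact names:

```bash
# Hypothetical sweep; dataset names besides `kitti` are assumptions.
for ds in kitti cifar100 dtd; do
    CUDA_VISIBLE_DEVICES=0 python train_vtab.py --dataset_name=${ds}
done
```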
- To fine-tune a pre-trained ViT model on FGVC, run:
```bash
CUDA_VISIBLE_DEVICES=0 python train_fgvc.py --dataset_name=<fgvc_dataset>
```
Replace `<fgvc_dataset>` with an FGVC benchmark name (the benchmark covers CUB-200-2011, NABirds, Oxford Flowers, Stanford Dogs, and Stanford Cars); check train_fgvc.py for the exact identifiers.
If you find this project helpful, please cite our paper:
```bibtex
@inproceedings{dong2024low,
  title={Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach},
  author={Dong, Wei and Zhang, Xing and Chen, Bihui and Yan, Dawei and Lin, Zhijun and Yan, Qingsen and Wang, Peng and Yang, Yang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16101--16110},
  year={2024}
}
```
The code is built upon timm. The VTAB-1k dataset processing refers to VPT, the VTAB GitHub repo, and NOAH.
If you have any questions, please contact me: zstar@xauat.edu.cn