
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach, CVPR 2024


Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach (RLRR)


This repository is the official implementation of RLRR. In this study, we approach the problem from the perspective of the Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods.
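As a rough illustration of this SVD perspective (a minimal sketch, not the exact RLRR parameterization; the rescaling vector `s` below is a hypothetical stand-in for learnable factors), one can decompose a frozen pre-trained weight matrix and modulate its singular values:

```python
import numpy as np

# Hypothetical frozen pre-trained weight matrix (e.g., one projection layer).
rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 8))

# SVD of the frozen weights: W0 = U @ diag(sigma) @ Vt.
U, sigma, Vt = np.linalg.svd(W0, full_matrices=False)

# Per-singular-value rescaling factors; in a fine-tuning setting these would
# be the (few) learnable parameters. Initialized to 1 here, so tuning starts
# from the pre-trained weights unchanged.
s = np.ones_like(sigma)
W_tuned = U @ np.diag(s * sigma) @ Vt

# With s = 1 the rescaled matrix reproduces W0 exactly.
assert np.allclose(W_tuned, W0)
```

This initialization-at-identity property is one reason residual-style designs are attractive: the tuned model starts exactly at the pre-trained solution.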

Usage


Environment

To install requirements:

conda env create -n RLRR -f environment.yaml

Before running the code, please activate this conda environment.
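For example:

```shell
# Create the environment from the provided spec, then activate it.
conda env create -n RLRR -f environment.yaml
conda activate RLRR
```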

Data Preparation

  • FGVC & vtab-1k

You can follow the instructions in VPT to download them.

Since the original VTAB dataset is processed with TensorFlow scripts and processing some of the datasets is tricky, we also provide the extracted vtab-1k dataset on OneDrive for your convenience. You can download it from here and then use it directly with our vtab.py. (Note that the license terms of the original VTAB dataset still apply.)

Pre-trained model preparation

  • Pre-trained ViT and Swin-B models on ImageNet-21K can be downloaded manually from ViT and Swin Transformer.

Train & Inference

  • Clone this repo:
git clone https://github.com/zstarN70/RLRR.git
cd RLRR
  • To fine-tune a pre-trained ViT model on VTAB-1k, run:
CUDA_VISIBLE_DEVICES=0 python train_vtab.py --dataset_name=kitti
  • To fine-tune a pre-trained ViT model on FGVC, run the following, replacing <fgvc_dataset> with one of the FGVC dataset names (see VPT for the list):
CUDA_VISIBLE_DEVICES=0 python train_fgvc.py --dataset_name=<fgvc_dataset>

Citation

If this project is helpful to you, please cite our paper:

@inproceedings{dong2024low,
  title={Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach},
  author={Dong, Wei and Zhang, Xing and Chen, Bihui and Yan, Dawei and Lin, Zhijun and Yan, Qingsen and Wang, Peng and Yang, Yang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16101--16110},
  year={2024}
}

Acknowledgement

The code is built upon timm. The processing of the vtab-1k dataset follows VPT, the VTAB GitHub repository, and NOAH.

Contact

If you have any questions, please contact me: zstar@xauat.edu.cn
