
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach (RLRR)


This repository is the official implementation of RLRR (CVPR 2024). In this study, we approach fine-tuning from the perspective of the Singular Value Decomposition (SVD) of pre-trained parameter matrices, which provides insight into the tuning dynamics of existing methods.
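To make the SVD view concrete, here is a minimal PyTorch sketch. It is illustrative only and is not the exact RLRR parameterization: it freezes a pre-trained weight matrix and learns a per-singular-value rescaling on top of it.

import torch

# Illustrative only -- not the exact RLRR parameterization.
W = torch.randn(768, 768)                    # stand-in for a frozen pre-trained weight
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

s = torch.nn.Parameter(torch.ones_like(S))   # learnable rescaling of the singular values
W_tuned = U @ torch.diag(S * s) @ Vh         # equals W when s == 1; W, U, S, Vh stay frozen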

Usage


Environment

To install requirements:

conda env create -n RLRR -f environment.yaml

Before running the code, please activate this conda environment.
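The environment name matches the -n flag used above:

conda activate RLRR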

Data Preparation

  • FGVC & vtab-1k

You can follow VPT to download them.

Since the original VTAB dataset is processed with TensorFlow scripts and the processing of some datasets is tricky, we also provide the extracted vtab-1k dataset on OneDrive for convenience. You can download it from here and then use it directly with our vtab.py (see the sketch below). Note that the original VTAB license applies to these data.
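As a rough sketch of how the extracted data might be consumed, assuming vtab.py exposes a dataset class (the class and argument names below are hypothetical; check vtab.py for the actual interface):

from torch.utils.data import DataLoader
from vtab import VTAB  # hypothetical class name -- see vtab.py for the real one

# Hypothetical arguments: root folder of the extracted archive and a dataset name.
train_set = VTAB(root='data/vtab-1k', dataset_name='kitti', split='train')
loader = DataLoader(train_set, batch_size=64, shuffle=True)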

Pre-trained Model Preparation

  • We use ViT and Swin-B models pre-trained on ImageNet-21K. You can download them manually from ViT and Swin Transformer (loading is sketched below).
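Since the code is built upon timm (see Acknowledgement), a pre-trained backbone can be instantiated roughly as follows; the model identifier is an assumption and may differ with the timm version pinned in environment.yaml:

import timm

# Model name is an assumption; newer timm versions use e.g.
# 'vit_base_patch16_224.augreg_in21k' instead.
model = timm.create_model('vit_base_patch16_224_in21k', pretrained=True)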

Train & Inference

  • Clone this repo:
git clone https://github.com/zstarN70/RLRR.git
cd RLRR
  • To fine-tune a pre-trained ViT model on VTAB, run:
CUDA_VISIBLE_DEVICES=0 python train_vtab.py --dataset_name=kitti
  • To fine-tune a pre-trained ViT model on FGVC, run the following, replacing <fgvc_dataset> with one of the FGVC datasets (e.g. CUB-200-2011, NABirds, Oxford Flowers, Stanford Dogs, or Stanford Cars; see train_fgvc.py for the exact identifiers):
CUDA_VISIBLE_DEVICES=0 python train_fgvc.py --dataset_name=<fgvc_dataset>

Citation

If this project is helpful to you, please cite our paper:


Acknowledgement

The code is built upon timm. The processing of the vtab-1k dataset follows VPT, the VTAB GitHub repo, and NOAH.

Contact

As I was preparing for my graduation, the code was put together in a hurry and many details were not checked in time. If you have any questions, please contact me: zstar@xauat.edu.cn
