This repository contains the code for our paper Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models. [Paper]
We tested our codebase with PyTorch 1.12.1 and CUDA 11.6. Please install the PyTorch and CUDA versions that match your hardware. Then install the remaining required packages by running pip install -r requirements.txt.
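Before extracting features, it can help to confirm which PyTorch and CUDA versions are actually installed. The snippet below is a minimal sketch, not part of this repository: the helper name describe_environment is hypothetical, and it only reports what it finds rather than enforcing the versions listed above.

```python
import importlib.util
import sys


def describe_environment():
    """Report the Python, PyTorch, and CUDA versions visible to this interpreter.

    Hypothetical helper for illustration; it is not shipped with this codebase.
    """
    info = {
        "python": sys.version.split()[0],
        "torch": None,            # stays None if PyTorch is not installed
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a CUDA device is usable right now
    }
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported versions differ from PyTorch 1.12.1 and CUDA 11.6, the code may still run, but that combination is the one we tested.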
Please follow the instructions in the DATASET.md file provided by Tip-Adapter to download all 11 datasets used in our experiments: https://github.com/gaopengcuhk/Tip-Adapter/blob/main/DATASET.md
You can extract the features by running CUDA_VISIBLE_DEVICES=0 python extract_features.py.
After it finishes, the image features for the train/val/test sets, as well as the positive/negative textual features, are saved in caches/[dataset_name].
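To check that feature extraction produced output for a given dataset, you can list the contents of its cache directory. This is a minimal sketch: the helper name list_cached_files is hypothetical, and it makes no assumption about the names or formats of the cached files, only that they live under caches/[dataset_name] as described above.

```python
from pathlib import Path


def list_cached_files(dataset_name, cache_root="caches"):
    """Return the sorted file names found in caches/<dataset_name>.

    Hypothetical helper for illustration. Returns an empty list when the
    directory does not exist, e.g. if extraction has not been run yet.
    """
    cache_dir = Path(cache_root) / dataset_name
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    files = list_cached_files("caltech101")
    if files:
        print("\n".join(files))
    else:
        print("No cached features found; run extract_features.py first.")
```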
To train and test the DualAdapter model, run CUDA_VISIBLE_DEVICES=0 python main.py --config configs/[dataset_name].yaml --shot [shot_number].
Here, dataset_name should be one of [caltech101, dtd, eurosat, fgvc, food101, imagenet, oxford_flowers, oxford_pets, stanford_cars, sun397, ucf101], and shot_number is chosen from 1/2/4/8/16.
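To run every dataset and shot setting in turn, the command above can be generated programmatically. The following is a minimal sketch, not a script shipped with this repository: the function name build_commands and the gpu parameter are hypothetical, while the dataset names, shot values, and command-line flags are taken from the description above.

```python
import itertools
import shlex

DATASETS = [
    "caltech101", "dtd", "eurosat", "fgvc", "food101", "imagenet",
    "oxford_flowers", "oxford_pets", "stanford_cars", "sun397", "ucf101",
]
SHOTS = [1, 2, 4, 8, 16]


def build_commands(datasets=DATASETS, shots=SHOTS, gpu="0"):
    """Build one shell command string per (dataset, shot) combination.

    Hypothetical helper for illustration; it only builds the strings and
    does not launch any training run.
    """
    commands = []
    for dataset, shot in itertools.product(datasets, shots):
        argv = [
            "python", "main.py",
            "--config", f"configs/{dataset}.yaml",
            "--shot", str(shot),
        ]
        commands.append(f"CUDA_VISIBLE_DEVICES={gpu} " + shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed or piped into a shell.
    for command in build_commands():
        print(command)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the output can be redirected to a file and run from the repository root once the features have been extracted.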
Our codebase is adapted from Tip-Adapter, CLIP, APE, and CuPL. We thank the authors for releasing their code!
If you have any questions, please contact us at cezhang@cs.cmu.edu.
If you find this code useful, please consider citing our work:
@article{zhang2024negative,
title={Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models},
author={Zhang, Ce and Stepputtis, Simon and Sycara, Katia and Xie, Yaqi},
journal={arXiv preprint arXiv:2403.12964},
year={2024}
}