
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

This repository provides the official PyTorch implementation of our CVPR 2024 paper:

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Authors: Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang

Overview

This repository contains the implementation of DMN for image classification with a pre-trained CLIP model. We consider four task settings:

  • Zero-shot classification in a test-time adaptation manner
  • Few-shot classification
  • Training-free few-shot classification
  • Out-of-distribution generalization

Figure: Results on the ImageNet dataset under different task settings.

Figure: The overall framework of our DMN.

Prerequisites

Hardware

This implementation targets a single-GPU configuration. All experiments can be reproduced on a GPU with more than 10 GB of memory (e.g., a 1080 Ti)!

Environment

The code is tested on PyTorch 1.13.1.
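
For example, a minimal environment setup might look like the following; the environment name is illustrative, and torchvision 0.14.1 is assumed here as the release that pairs with PyTorch 1.13.1:

conda create -n dmn python=3.9 -y   # environment name is illustrative
conda activate dmn
pip install torch==1.13.1 torchvision==0.14.1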

Datasets

We suggest downloading all datasets to a root directory (${data_root}) and renaming each dataset's directory as specified in ${ID_to_DIRNAME} in ./data/datautils.py. This allows you to evaluate multiple datasets within the same run.
If this is not feasible, you can evaluate each dataset separately and change ${data_root} accordingly in the bash script.
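
For illustration, the resulting layout under ${data_root} would look roughly like the sketch below; the sub-directory names shown are placeholders, and the authoritative names are those in ${ID_to_DIRNAME} in ./data/datautils.py:

${data_root}/
├── imagenet/       # placeholder name; use the name from ${ID_to_DIRNAME}
├── caltech-101/    # placeholder name
└── ...             # one sub-directory per dataset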

For zero/few-shot classification, we consider 11 datasets: ImageNet, Caltech101, OxfordPets, StanfordCars, Flowers102, Food101, FGVCAircraft, SUN397, DTD, EuroSAT, and UCF101.

For out-of-distribution generalization, we consider 4 datasets: ImageNet-A, ImageNet-V2, ImageNet-R, and ImageNet-Sketch.

Run DMN

We provide a simple bash script under ./scripts/run.sh. You can modify the paths and other arguments in the script, and then reproduce all results with:

bash ./scripts/run.sh

For simplicity, we use set_id to denote different datasets. A complete list of set_id values can be found in ${ID_to_DIRNAME} in ./data/datautils.py.
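
As a concrete sketch (the variable names are assumptions; check ./scripts/run.sh for the actual ones), evaluating a single dataset typically amounts to editing the data root and the set_id in the script before launching it:

data_root=/path/to/datasets   # root directory holding all datasets
set_id=ImageNet               # illustrative value; use a set_id from ${ID_to_DIRNAME}
bash ./scripts/run.sh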

Main Results

Zero-shot Classification

Few-shot Classification

Figure: Few-shot classification results on 11 datasets with a ViT-B/16 image encoder.

Out-of-Distribution Generalization

Method            ImageNet (IN)  IN-A   IN-V2  IN-R   IN-Sketch  Average  OOD Average
CLIP-RN50         58.16          21.83  51.41  56.15  33.37      44.18    40.69
Ensembled prompt  59.81          23.24  52.91  60.72  35.48      46.43    43.09
CoOp              63.33          23.06  55.40  56.60  34.67      46.61    42.43
CoCoOp            62.81          23.32  55.72  57.74  34.48      46.81    42.82
TPT               60.74          26.67  54.70  59.11  35.09      47.26    43.89
DMN-ZS            63.87          28.57  56.12  61.44  39.84      49.97    46.49
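
For reference, the Average column is the mean over all five datasets and OOD Average is the mean over the four out-of-distribution sets only, which can be verified from the table; e.g., for DMN-ZS:

Average     = (63.87 + 28.57 + 56.12 + 61.44 + 39.84) / 5 = 49.97
OOD Average = (28.57 + 56.12 + 61.44 + 39.84) / 4 = 46.49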

Citation

If you find our code useful or our work relevant, please consider citing:

@inproceedings{zhang2024dual,
  title={Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models},
  author={Zhang, Yabin and Zhu, Wenjie and Tang, Hui and Ma, Zhiyuan and Zhou, Kaiyang and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Acknowledgements

We thank the authors of CoOp/CoCoOp and TPT for their open-source implementations and instructions on data preparation.
