CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning [WACV 2024]

CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning
Zhaoheng Zheng, Haidong Zhu and Ram Nevatia

Official implementation of CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning.

Installation

Our model is built with Python 3.8 and PyTorch 1.13. To prepare the environment, please follow the instructions below.

  • Create a conda environment:
     conda create -n caila-release python=3.8.13 pip
  • Activate the environment:
     conda activate caila-release
  • Install the requirements:
     pip install -r requirements.txt
    
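Optionally, you can sanity-check the environment from Python before moving on (a convenience check, not part of the official setup; the expected version comes from the note above):

import torch

# Quick environment check: the repo targets Python 3.8 and PyTorch 1.13.
print(torch.__version__)           # expect something like 1.13.x
print(torch.cuda.is_available())   # True if a CUDA GPU is visible for training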

Datasets

For MIT-States, C-GQA and UT-Zappos, run the following script to download the datasets to a directory of your choice (DATA_ROOT in our example):

bash ./utils/download_data.sh DATA_ROOT

For VAW-CZSL, please follow the instructions in the official repo.

The DATA_ROOT folder should be organized as follows:

DATA_ROOT/
├── mit-states/
│   ├── images/
│   ├── compositional-split-natural/
├── cgqa/
│   ├── images/
│   ├── compositional-split-natural/
├── ut-zap50k/
│   ├── images/
│   ├── compositional-split-natural/
├── vaw-czsl/
│   ├── images/
│   ├── compositional-split-natural/

After preparing the data, set the DATA_FOLDER variable in flags.py to your data path.
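
For reference, the edit amounts to a single assignment in flags.py; the path below is a placeholder and the rest of the file is omitted:

# flags.py (excerpt) -- point DATA_FOLDER at the dataset root prepared above
DATA_FOLDER = "/absolute/path/to/DATA_ROOT"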

If you encounter any FileNotFoundError regarding the split files, please find them here: Link.
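
As a quick way to confirm the layout above before training, the following standalone snippet (a convenience check, not part of the repository) verifies that each dataset folder contains its split directory:

import os

DATA_ROOT = "/absolute/path/to/DATA_ROOT"  # same path as DATA_FOLDER in flags.py
for ds in ("mit-states", "cgqa", "ut-zap50k", "vaw-czsl"):
    split_dir = os.path.join(DATA_ROOT, ds, "compositional-split-natural")
    status = "ok" if os.path.isdir(split_dir) else "missing split files"
    print(f"{ds}: {status}")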

Model Zoo

| Dataset    | AUC (Base / Large) | Download     |
|------------|--------------------|--------------|
| MIT-States | 16.1 / 23.4        | Base / Large |
| C-GQA      | 10.4 / 14.8        | Base / Large |
| UT-Zappos  | 39.0 / 44.1        | Base / Large |
| VAW-CZSL*  | 17.1 / 19.0        | V / V+L      |

*For VAW-CZSL, we provide two variants of the Large model: one has adapters on the vision side only (V) and the other has adapters on both the vision and language sides (V+L). The V+L model requires more GPU memory.

Evaluation

To evaluate the model, put the downloaded checkpoint in a folder. We use mit-base as an example:

checkpoints/
├── mit-base/
│   ├── ckpt_best_auc.t7

Then, run the following command to evaluate the model:

python test.py --config configs/caila/mit.yml --logpath checkpoints/mit-base

Training

First, download the CLIP checkpoints ViT-B/32 and ViT-L/14 from HuggingFace and put them under clip_ckpts as follows:

clip_ckpts/
├── clip-vit-base-patch32.pth
├── clip-vit-large-patch14.pth
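
If you prefer fetching the weights through the transformers library instead of downloading the files manually, a sketch like the one below can export them under the expected filenames. It assumes the training code loads a plain state_dict saved with torch.save; if the repository expects a different format, follow its model-loading code instead:

import torch
from transformers import CLIPModel

# Hypothetical export helper: downloads the OpenAI CLIP weights from HuggingFace
# and saves them as state_dicts under clip_ckpts/ with the filenames shown above.
for name in ("clip-vit-base-patch32", "clip-vit-large-patch14"):
    model = CLIPModel.from_pretrained(f"openai/{name}")
    torch.save(model.state_dict(), f"clip_ckpts/{name}.pth")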

Then, run the following command to train the model:

torchrun --nproc_per_node=$N_GPU train.py --config CONFIG_FILE

where CONFIG_FILE is the path to a config file and $N_GPU is the number of GPUs to train with. We provide config files for all experiments in the configs folder. For example, to train the base model on MIT-States, run:

torchrun --nproc_per_node=$N_GPU train.py --config configs/caila/mit.yml

Citing CAILA

If you find CAILA useful in your research, please consider citing:

@InProceedings{Zheng_2024_WACV,
    author    = {Zheng, Zhaoheng and Zhu, Haidong and Nevatia, Ram},
    title     = {CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {1721-1731}
}
