DUET


In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; (3) propose a multi-task learning policy for considering multi-modal objectives.

  • Due to the page and format restrictions of AAAI publications, we have omitted some details and appendix content. For the complete version of the paper, including the prompt selection and experimental details, please refer to our arXiv version.
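
To make point (2) above concrete, the following is a minimal, self-contained sketch of an attribute-level contrastive (InfoNCE-style) loss over attribute-grounded features. It illustrates the general technique only and is not the code in model/model_proto.py; the tensor shapes, the temperature, and the choice of positives are assumptions.

import torch
import torch.nn.functional as F

def attribute_contrastive_loss(feats, attr_labels, temperature=0.1):
    """InfoNCE-style loss at the attribute level (illustrative sketch only).

    feats:       (N, D) attribute-grounded embeddings, one per (image, attribute) pair.
    attr_labels: (N,)   attribute index per embedding; embeddings sharing an index
                 are treated as positives, all others as negatives (an assumption).
    """
    feats = F.normalize(feats, dim=-1)                   # compare in cosine space
    sim = feats @ feats.t() / temperature                # (N, N) similarity logits
    n = feats.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, float("-inf"))      # never contrast a sample with itself

    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (attr_labels.unsqueeze(0) == attr_labels.unsqueeze(1)) & ~self_mask

    # Average log-likelihood of the positives for each anchor that has at least one positive.
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    return -(pos_log_prob[valid] / pos_counts[valid]).mean()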

🔔 News

🤖 Model Architecture

[Figure: overall DUET model architecture]

📚 Dataset Download

  • The cache data for CUB, AWA2, and SUN are available here (Baidu cloud, 19.89 GB, code: s07d).

📕 Code Path

Code Structures

There are four parts in the code.

  • model: the main files of the DUET network.
  • data: the data splits and attribute group files for each dataset.
  • cache: cached files such as attribute prompts and pre-extracted image pixels (a loading sketch follows the directory tree below).
  • script: the training scripts for DUET.
DUET
├── cache
│   ├── AWA2
│   │   ├── attributeindex2prompt.json
│   │   └── id2imagepixel.pkl
│   ├── CUB
│   │   ├── attributeindex2prompt.json
│   │   ├── id2imagepixel.pkl
│   │   └── mapping.json
│   └── SUN
│       ├── attributeindex2prompt.json
│       ├── id2imagepixel.pkl
│       └── mapping.json
├── data
│   ├── AWA2
│   │   ├── APN.mat
│   │   ├── TransE_65000.mat
│   │   ├── att_splits.mat
│   │   ├── attri_groups_9.json
│   │   ├── kge_CH_AH_CA_60000.mat
│   │   └── res101.mat
│   ├── CUB
│   │   ├── APN.mat
│   │   ├── att_splits.mat
│   │   ├── attri_groups_8.json
│   │   ├── attri_groups_8_layer.json
│   │   └── res101.mat
│   └── SUN
│       ├── APN.mat
│       ├── att_splits.mat
│       ├── attri_groups_4.json
│       └── res101.mat
├── log
│   ├── AWA2
│   ├── CUB
│   └── SUN
├── model
│   ├── log.py
│   ├── main.py
│   ├── main_utils.py
│   ├── model_proto.py
│   ├── modeling_lxmert.py
│   ├── opt.py
│   ├── swin_modeling_bert.py
│   ├── util.py
│   └── visual_utils.py
├── out
│   ├── AWA2
│   ├── CUB
│   └── SUN
└── script
    ├── AWA2
    │   └── AWA2_GZSL.sh
    ├── CUB
    │   └── CUB_GZSL.sh
    └── SUN
        └── SUN_GZSL.sh
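
For a quick sanity check of the data and cache files above, here is a minimal inspection sketch. It assumes scipy is installed for the .mat files and that id2imagepixel.pkl is a regular pickle; the key names follow the standard GZSL benchmark files and may differ in your copy.

import pickle
import scipy.io as sio  # assumed available; the .mat files follow the common GZSL benchmark format

dataset = "CUB"  # or "AWA2" / "SUN"

# Standard benchmark files under data/<dataset>/ (key names are the usual ones; verify locally).
res101 = sio.loadmat(f"data/{dataset}/res101.mat")        # image features and labels
splits = sio.loadmat(f"data/{dataset}/att_splits.mat")    # class attributes and split indices
print(res101["features"].shape, splits["att"].shape)

# Pre-extracted pixel cache used by DUET (its exact value structure is an assumption).
with open(f"cache/{dataset}/id2imagepixel.pkl", "rb") as f:
    id2pixel = pickle.load(f)
print(len(id2pixel), "cached images")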

🔬 Dependencies

  • Python 3
  • PyTorch >= 1.8.0
  • Transformers >= 4.11.3
  • NumPy
  • All experiments are performed with one RTX 3090Ti GPU.

🎯 Prerequisites

  • Dataset: please download the datasets, i.e., CUB, AWA2, and SUN, and set opt.image_root to the dataset root path on your machine.
    • ❗NOTE: For other required feature files like APN.mat and id2imagepixel.pkl, please refer to here.
  • Data split: please download the data folder and place it in ./data/.
  • attributeindex2prompt.json: generate this file and place it in ./cache/{dataset}/ (see the sketch after this list).
  • Vision encoder: download the pretrained vision Transformer used as the vision encoder.
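
As referenced above, here is a minimal sketch of how an attributeindex2prompt.json file could be produced. The attribute_names list, the prompt template, and the key format are all assumptions and should be matched to what the DUET data loader actually expects.

import json
import os

dataset = "CUB"  # one of AWA2 / CUB / SUN

# Hypothetical attribute names; in practice read them from the dataset's attribute list.
attribute_names = ["bill shape: curved", "wing color: blue", "tail pattern: striped"]

# Map each attribute index to a natural-language prompt for the pre-trained language model.
index2prompt = {str(i): f"the attribute is {name}" for i, name in enumerate(attribute_names)}

os.makedirs(f"cache/{dataset}", exist_ok=True)
with open(f"cache/{dataset}/attributeindex2prompt.json", "w", encoding="utf-8") as f:
    json.dump(index2prompt, f, ensure_ascii=False, indent=2)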

🚀 Train & Eval

The training script for AWA2_GZSL:

bash script/AWA2/AWA2_GZSL.sh
[--dataset {AWA2, SUN, CUB}] [--calibrated_stacking CALIBRATED_STACKING] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
[--classifier_lr LEARNING-RATE] [--xe XE] [--attri ATTRI] [--gzsl] [--patient PATIENT] [--model_name MODEL_NAME] [--mask_pro MASK-PRO] 
[--mask_loss_xishu MASK_LOSS_XISHU] [--xlayer_num XLAYER_NUM] [--construct_loss_weight CONSTRUCT_LOSS_WEIGHT] [--sc_loss SC_LOSS] [--mask_way MASK_WAY]
[--attribute_miss ATTRIBUTE_MISS]
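
Several of the flags above look like loss-weighting coefficients ("xishu" means coefficient), in line with the multi-task objective described earlier. Below is a rough sketch of how such a weighted objective is typically assembled; the individual loss terms and which flag scales which term are assumptions, not the actual training loop in model/main.py.

def total_loss(losses, opt):
    """Weighted sum of the per-task objectives (illustrative only).

    losses is assumed to be a dict of already-computed scalar tensors, e.g.
    {"xe": ..., "attri": ..., "contrastive": ..., "mask": ..., "construct": ...};
    the weight names mirror the script flags, but the pairing is an assumption.
    """
    return (opt.xe * losses["xe"]                               # classification cross-entropy
            + opt.attri * losses["attri"]                       # attribute regression
            + opt.sc_loss * losses["contrastive"]               # attribute-level contrastive term
            + opt.mask_loss_xishu * losses["mask"]              # masked-prompt / grounding term
            + opt.construct_loss_weight * losses["construct"])  # reconstruction term

The per-dataset weights actually used can be read directly from the corresponding .sh files under script/.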

📌 Note:

  • You can open the .sh files to modify the parameters.
  • If you have any questions, feel free to let us know by opening an issue.

🤝 Cite:

Please consider citing this paper if you use the code or data from our work. Thanks a lot :)

@inproceedings{DBLP:conf/aaai/ChenHCGZFPC23,
  author       = {Zhuo Chen and
                  Yufeng Huang and
                  Jiaoyan Chen and
                  Yuxia Geng and
                  Wen Zhang and
                  Yin Fang and
                  Jeff Z. Pan and
                  Huajun Chen},
  title        = {{DUET:} Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning},
  booktitle    = {{AAAI}},
  pages        = {405--413},
  publisher    = {{AAAI} Press},
  year         = {2023}
}
