This is the official implementation of *On the Difference of BERT-style and CLIP-style Text Encoders* (Findings of ACL 2023).

The code was developed with the following dependencies:
- Python==3.8
- torch==1.12.1
- torchvision==0.13.1
- torchmetrics==0.10.0
- torch-fidelity==0.3.0
- pytorch-lightning==1.7.7
- transformers==4.26.0.dev0
- datasets==2.8.1.dev0
- evaluate==0.4.0

Note that the `.dev0` versions of transformers and datasets are development builds installed from source rather than from PyPI.
To run the GLUE experiments with the BERT-style and the CLIP-style text encoder, respectively:

bash run_glue_bert.sh
bash run_glue_clip.sh
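As a rough illustration of what these two scripts compare, the sketch below wires a linear classification head onto a BERT-style and a CLIP-style text encoder with Hugging Face transformers. The checkpoints, the linear head, and the use of `pooler_output` are illustrative assumptions, not necessarily the scripts' exact setup.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer, CLIPTextModel, CLIPTokenizer

class EncoderClassifier(nn.Module):
    """A linear classification head on top of a text encoder (assumed setup)."""
    def __init__(self, encoder, hidden_size, num_labels=2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, **inputs):
        out = self.encoder(**inputs)
        # pooler_output is the [CLS] embedding for BERT and the
        # end-of-text token embedding for the CLIP text encoder.
        return self.head(out.pooler_output)

# BERT-style (768-d hidden size) vs. CLIP-style (512-d) text encoders.
bert_clf = EncoderClassifier(AutoModel.from_pretrained("bert-base-uncased"), 768)
clip_clf = EncoderClassifier(
    CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32"), 512
)

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

sentence = ["the movie was surprisingly good"]
bert_logits = bert_clf(**bert_tok(sentence, return_tensors="pt"))
clip_logits = clip_clf(**clip_tok(sentence, return_tensors="pt", padding=True))
# Both produce (1, num_labels) logits; only the encoder underneath differs.
```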
Download the data from this link and put it in the `data/cxc` folder. Then run:

python main.py
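CxC scores semantic similarity between captions. As a rough illustration of that kind of text-text scoring (not the actual logic of `main.py`), one can compare pooled CLIP text embeddings by cosine similarity; the checkpoint and the pooling choice are assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import CLIPTextModel, CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

sents = ["a man is riding a horse", "a person on horseback"]
with torch.no_grad():
    out = enc(**tok(sents, return_tensors="pt", padding=True))

# L2-normalize the pooled sentence embeddings and score by cosine similarity.
emb = F.normalize(out.pooler_output, dim=-1)
print(float(emb[0] @ emb[1]))
```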
Download the dataset from this link and put the files in the `data/celebahq` and `data/celebahq-caption` folders.
Install the bundled taming-transformers package:

cd taming-transformers
pip install -e .

Then train the model and generate images:

bash train.sh
bash generate.sh
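In a text-to-image pipeline like this one, the text encoder's role is to turn a caption into conditioning features for the generator. The sketch below shows only that encoding step; the checkpoint and the conditioning format are assumptions, and `train.sh` / `generate.sh` handle the actual generative model.

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

caption = "a smiling young woman with long blond hair"
with torch.no_grad():
    out = enc(**tok([caption], return_tensors="pt", padding=True))

# Per-token features of shape (1, seq_len, 512); a conditional generator can
# attend to these (or to the pooled vector) when synthesizing the image.
cond = out.last_hidden_state
```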
If you use or extend our work, please cite our Findings of ACL 2023 paper:
@inproceedings{chen-acl-2023-synesthesia,
title = "On the Difference of BERT-style and CLIP-style Text Encoders",
author = "Chen, Zhihong and
Chen, Guiming Hardy and
Diao, Shizhe and
Wan, Xiang and
Wang, Benyou",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
}