This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multimodal Contrastive Learning of Sentence Embeddings. If you find this repository useful, please consider citing our paper.
Contact: Miaoran Zhang (mzhang@lsv.uni-saarland.de)
Model | Avg. STS |
---|---|
mcse-flickr-bert-base-uncased [Google Drive] [Huggingface] | 77.70 |
mcse-flickr-roberta-base [Google Drive] [Huggingface] | 78.44 |
mcse-coco-bert-base-uncased [Google Drive] [Huggingface] | 77.08 |
mcse-coco-roberta-base [Google Drive] [Huggingface] | 78.17 |
Note: flickr indicates that the models are trained on wiki+flickr, and coco indicates that the models are trained on wiki+coco.
- Python 3.9.5
- PyTorch 1.7.1
- Install other packages:
pip install -r requirements.txt
Please organize the data directory as follows:
REPO ROOT
|
|--data
| |--wiki1m_for_simcse.txt
| |--flickr_random_captions.txt
| |--flickr_resnet.hdf5
| |--coco_random_captions.txt
| |--coco_resnet.hdf5
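Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_files` is hypothetical, and the file list is copied from the tree above. It assumes the script is run from the repository root; whether all five files are needed for a given training run is not stated here.

```python
from pathlib import Path

# File names copied from the directory tree above.
EXPECTED_FILES = [
    "wiki1m_for_simcse.txt",
    "flickr_random_captions.txt",
    "flickr_resnet.hdf5",
    "coco_random_captions.txt",
    "coco_resnet.hdf5",
]


def missing_data_files(repo_root="."):
    """Return the expected data files that are absent under <repo_root>/data.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    data_dir = Path(repo_root) / "data"
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files in data/:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```

An empty result means every file in the tree was found; otherwise the returned names show what still needs to be downloaded or generated by preprocessing.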
Wiki1M
wget https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt
Flickr30k & MS-COCO
You can either download the preprocessed data we used:
(annotation sources: flickr30k-entities and coco).
Or preprocess the data yourself (taking Flickr30k as an example):
- Download the flickr30k-entities.
- Request access to the flickr-images from here. Note that the use of the images must abide by the Flickr Terms of Use.
- Run script:
unzip ${path_to_flickr-entities}/annotations.zip
python preprocess/prepare_flickr.py \
    --flickr_entities_dir ${path_to_flickr-entities} \
    --flickr_images_dir ${path_to_flickr-images} \
    --output_dir data/ \
    --batch_size 32
- Prepare the SentEval datasets for evaluation:
cd SentEval/data/downstream/
bash download_dataset.sh
- Run scripts:
# For example (more examples are given in scripts/):
sh scripts/run_wiki_flickr.sh
Note: In the paper, we ran experiments with 5 seeds (0, 1, 2, 3, 4). You can find the detailed parameter settings in the Appendix.
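When repeating runs over the five seeds, one common way to summarize them is the mean and sample standard deviation of the per-seed scores. The sketch below assumes you have already collected one score per seed yourself; the helper `summarize` is hypothetical, and the numbers are placeholders for illustration, not results from the paper.

```python
from statistics import mean, stdev

SEEDS = [0, 1, 2, 3, 4]  # seeds listed in the note above


def summarize(scores_by_seed):
    """Return (mean, sample standard deviation) over the per-seed scores.

    Hypothetical helper; expects a dict mapping each seed to one score.
    """
    values = [scores_by_seed[seed] for seed in SEEDS]
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Placeholder values for illustration only; not results from the paper.
    placeholder = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}
    avg, std = summarize(placeholder)
    print(f"mean={avg:.2f} std={std:.2f}")
```

A missing seed raises a `KeyError`, which makes an incomplete set of runs easy to spot.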