This is the PyTorch code for applying LSEH to VSE networks, as described in the paper "Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval".
We recommend the following dependencies:
- Ubuntu (>= 18.04)
- Python 3.8
- PyTorch 1.7.1
- Punkt Sentence Tokenizer:
  import nltk
  nltk.download()
  > d punkt
We used Anaconda to manage the dependencies.
The Flickr30K and MS-COCO datasets can be downloaded from here or by following VSRN. The IAPR TC-12 dataset can be downloaded from here. For the VSEinfty_LSEH project, rename each dataset folder to 'precomp'; for the other projects, rename the dataset folders to 'f30k_precomp', 'coco_precomp', and 'iaprtc12_precomp' respectively.
To generate the semantically enhanced descriptions, set "$DATA_PATH" to the dataset folder containing "train_caps.txt" and run:
python SVDdescriptions.py --data_path $DATA_PATH
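SVDdescriptions.py produces the semantically enhanced caption descriptions that LSEH relies on. As a rough illustration of the underlying idea only (the tiny vocabulary, raw counts, and rank k=2 below are assumptions for this sketch, not the script's actual settings), a truncated SVD of a caption-by-term count matrix gives each caption a low-dimensional semantic vector:

```python
import numpy as np

# Toy captions standing in for lines of train_caps.txt (illustrative only).
captions = [
    "a dog runs on the grass",
    "a dog plays on green grass",
    "a man rides a red bike",
]
vocab = sorted({w for c in captions for w in c.split()})

# Caption-by-term count matrix.
X = np.zeros((len(captions), len(vocab)))
for i, c in enumerate(captions):
    for w in c.split():
        X[i, vocab.index(w)] += 1.0

# Truncated SVD: keep the top-k singular directions as each caption's
# semantic description (k=2 is an arbitrary choice for this sketch).
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
desc = U[:, :k] * S[:k]          # one k-dim semantic vector per caption

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# The two dog/grass captions end up much closer to each other
# than either is to the bike caption.
print(cos(desc[0], desc[1]), cos(desc[0], desc[2]))
```

In the real pipeline, the descriptions for all training captions are written to "train_svd.txt" and later consumed by the LSEH loss.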
Alternatively, download the pre-computed file directly from our data.
The result file "train_svd.txt" is written to the "output" folder in the root directory. Copy "train_svd.txt" into the dataset folder (e.g. f30k_precomp or coco_precomp).
To train with VSRN:
- For Flickr30K:
python train.py --data_path $DATA_PATH --data_name f30k_precomp --logger_name runs/flickr_VSRN --learning_rate 0.0004 --lr_update 5 --max_violation --max_len 60
- For MS-COCO:
python train.py --data_path $DATA_PATH --data_name coco_precomp --logger_name runs/coco_VSRN --learning_rate 0.0004 --lr_update 6 --max_violation --max_len 60
- For IAPR TC-12:
python IAPRTC12/trainIAPRTC12.py --data_path $DATA_PATH --data_name IAPRTC12_precomp --logger_name runs/IAPR_VSRN --learning_rate 0.0005 --lr_update 20 --max_violation --max_len 60
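In the commands above, --lr_update is the epoch interval at which the learning rate is decayed. A minimal sketch of the usual step-decay schedule (the 0.1 decay factor is an assumption; check train.py for the actual value):

```python
def decayed_lr(base_lr, epoch, lr_update, factor=0.1):
    """Step decay: scale base_lr by `factor` once every `lr_update` epochs."""
    return base_lr * (factor ** (epoch // lr_update))

# Flickr30K settings from the command above: lr 0.0004, decayed every 5 epochs.
for epoch in (0, 4, 5, 12):
    print(epoch, decayed_lr(0.0004, epoch, 5))
```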
Modify "$MODEL_PATH" and "$DATA_PATH" in evaluation.py (use evaluationIAPRTC12.py for IAPR TC-12), then run:
python evaluation.py  # or: python evaluationIAPRTC12.py
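The evaluation scripts report Recall@K for both image-to-text and text-to-image retrieval. A minimal sketch of that metric, assuming a query-by-candidate similarity matrix in which index i is the ground-truth match for query i:

```python
import numpy as np

def recall_at_k(scores, k):
    """Percentage of queries (rows) whose ground-truth item
    (same index) appears among the top-k ranked columns."""
    ranks = np.argsort(-scores, axis=1)          # best candidate first
    hits = [i in ranks[i, :k] for i in range(scores.shape[0])]
    return 100.0 * sum(hits) / len(hits)

# Toy 3x3 similarity matrix: only query 0 retrieves its match at rank 1.
scores = np.array([[0.9, 0.1, 0.2],
                   [0.3, 0.2, 0.8],
                   [0.1, 0.7, 0.6]])
print(recall_at_k(scores, 1), recall_at_k(scores, 3))
```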
To train with VSEinfty_LSEH:
- For Flickr30K:
python train.py --data_path $DATA_PATH --data_name f30k --logger_name runs/f30k --model_name runs/f30k --learning_rate 0.0008 --lr_update 10
- For MS-COCO:
python train.py --data_path $DATA_PATH --data_name coco --logger_name runs/coco --model_name runs/coco --learning_rate 0.0008 --lr_update 10
- For IAPR TC-12:
python IAPRTC12/trainIAPRTC12.py --data_path $DATA_PATH --data_name iaprtc12 --logger_name runs/IAPR --model_name runs/IAPR
Modify "$MODEL_PATH" and "$DATA_PATH" in eval.py (use evalIAPRTC12.py for IAPR TC-12), then run:
python eval.py  # or: python evalIAPRTC12.py
To train with SGRAF:
- For Flickr30K:
(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF
- For MS-COCO:
(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF
- For IAPR TC-12:
(For SGR) python IAPRTC12/trainIAPRTC12.py --data_name IAPRTC_precomp --module_name SGR
(For SAF) python IAPRTC12/trainIAPRTC12.py --data_name IAPRTC_precomp --module_name SAF
Modify "$MODEL_PATH" and "$DATA_PATH" in evaluation.py (use evaluationIAPRTC12.py for IAPR TC-12), then run:
python evaluation.py  # or: python evaluationIAPRTC12.py
To train with VSE++:
- For Flickr30K:
python train.py --data_path "$DATA_PATH" --data_name f30k_precomp --logger_name runs/f30k_vse++ --max_violation
- For MS-COCO:
python train.py --data_path "$DATA_PATH" --data_name coco_precomp --logger_name runs/coco_vse++ --max_violation
- For IAPR TC-12:
python IAPRTC12/trainIAPRTC12.py --data_path "$DATA_PATH" --data_name IAPRTC_precomp --logger_name runs/IAPRTC_vse++ --max_violation
Modify "$MODEL_PATH" and "$DATA_PATH" in evaluation.py (use evaluationIAPRTC12.py for IAPR TC-12), then run:
python evaluation.py  # or: python evaluationIAPRTC12.py
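The --max_violation flag in the commands above selects the max-of-hinges (LMH) triplet ranking loss, the baseline that LSEH's semantically enhanced version replaces. A minimal numpy sketch of that baseline objective (the 0.2 margin is the common VSE++ default, assumed here; the actual implementation lives in each project's model file):

```python
import numpy as np

def max_hinge_loss(im, s, margin=0.2):
    """Max-of-hinges triplet loss over a batch of L2-normalised
    image embeddings `im` and caption embeddings `s` (batch x dim),
    where row i of `im` matches row i of `s`."""
    scores = im @ s.T                       # pairwise cosine similarities
    diag = np.diag(scores)                  # matching-pair scores
    # hinge costs for caption retrieval (rows) and image retrieval (columns)
    cost_s = np.clip(margin + scores - diag[:, None], 0, None)
    cost_im = np.clip(margin + scores - diag[None, :], 0, None)
    np.fill_diagonal(cost_s, 0)
    np.fill_diagonal(cost_im, 0)
    # --max_violation: keep only the hardest negative per query
    return cost_s.max(axis=1).sum() + cost_im.max(axis=0).sum()

# Perfectly separated embeddings incur zero loss; a caption that also
# matches the wrong image becomes a hard negative and is penalised.
print(max_hinge_loss(np.eye(3), np.eye(3)))
```

As the paper's title suggests, LSEH enhances this objective by incorporating the semantic similarity between captions (computed from the SVD descriptions) into the treatment of hard negatives.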
To run the LMH baseline instead of LSEH, comment out "ContrastiveLossLSEH" and uncomment "ContrastiveLoss" in the model file of each project, or follow the official repositories as follows:
Projects using LMH follow VSRN, VSE∞, SGRAF, and VSE++.
If you find this code useful for your research, please consider citing:
@article{gong2022improving,
title={Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval},
author={Gong, Yan and Cosma, Georgina},
journal={Pattern Recognition},
pages={109272},
year={2022},
publisher={Elsevier}
}