
# Cross-Modal Retrieval

By Xiami2019

## Requirements

- Python 3.6
- PyTorch 1.1.0
- pytorch-transformers

## Dataset

Images and text come from IAPR TC-12 (https://www.imageclef.org/photodata), and labels come from SAIAPR TC-12 (https://www.imageclef.org/SIAPRdata). I build a mixed dataset for cross-modal retrieval by simply concatenating the labels, images, and text. The dataset contains 20,000 images, each with a corresponding description in both English and German, and each image-text pair carries multiple labels. Following the setting in the reference paper [1], I use 10,000 images as the training set; at test time, 2,000 images form the query set and the remaining 18,000 form the retrieval database (a sketch of this split follows).
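As an illustration, here is a minimal sketch of that split in Python. The shuffle, the fixed seed, and drawing the training set from the database side are my assumptions; the repository's actual data loader may partition differently.

```python
import random

# 20,000 image-text pairs in total, per the dataset description above.
ids = list(range(20000))
random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(ids)

query_ids = ids[:2000]            # 2,000 query pairs
database_ids = ids[2000:]         # remaining 18,000 pairs form the database
train_ids = database_ids[:10000]  # 10,000 training pairs, assumed drawn from the database
```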

## Model

The network consists of an image model and a text model: a pretrained ResNet-18 serves as the image model and a pretrained BERT-base as the text model. A sketch of the two-branch architecture is given below.
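This is a minimal sketch of such a two-branch network. The projection layers, the `tanh` activation, and the `bert-base-uncased` checkpoint are assumptions rather than details from this repository; the German captions would presumably need a multilingual checkpoint such as `bert-base-multilingual-cased`.

```python
import torch
import torch.nn as nn
from torchvision import models
from pytorch_transformers import BertModel

class CrossModalHashNet(nn.Module):
    def __init__(self, bits=32):
        super().__init__()
        # Image branch: pretrained ResNet-18 with its classifier replaced
        # by a projection into the shared hash-code space.
        self.image_model = models.resnet18(pretrained=True)
        self.image_model.fc = nn.Linear(512, bits)
        # Text branch: pretrained BERT-base plus a projection layer.
        self.text_model = BertModel.from_pretrained('bert-base-uncased')
        self.text_proj = nn.Linear(768, bits)

    def forward(self, images, input_ids, attention_mask=None):
        img_code = torch.tanh(self.image_model(images))
        # pytorch-transformers returns (sequence_output, pooled_output, ...).
        _, pooled = self.text_model(input_ids, attention_mask=attention_mask)[:2]
        txt_code = torch.tanh(self.text_proj(pooled))
        return img_code, txt_code
```

At test time the continuous codes would typically be binarized with `sign()` to obtain the 16/32/48/64-bit hashes evaluated below.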

## Objective

The final objective is a sum of four triplet losses.
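This README does not list the four losses individually. A common choice in cross-modal hashing, and the reading sketched here, is one triplet loss per retrieval direction (image→text, text→image) plus one intra-modal triplet per modality; which four pairings the code actually uses is an assumption.

```python
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)

def objective(img_a, img_p, img_n, txt_a, txt_p, txt_n):
    """img_*/txt_* are hash codes for anchor/positive/negative samples.
    The four pairings below are assumed, not confirmed by the repository."""
    return (triplet(img_a, txt_p, txt_n)    # image anchor vs. text pos/neg (I->T)
          + triplet(txt_a, img_p, img_n)    # text anchor vs. image pos/neg (T->I)
          + triplet(img_a, img_p, img_n)    # intra-modal image triplet
          + triplet(txt_a, txt_p, txt_n))   # intra-modal text triplet
```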

## Results

Due to time and hardware constraints, extensive experiments have not been run. The numbers below were obtained after about 30 epochs.
### Text → Images

| Language | 16 bits | 32 bits | 48 bits | 64 bits |
|----------|---------|---------|---------|---------|
| English  | To be added | 0.5005 | To be added | To be added |
| German   | To be added | 0.4947 | To be added | To be added |

### Images → Text

| Language | 16 bits | 32 bits | 48 bits | 64 bits |
|----------|---------|---------|---------|---------|
| English  | To be added | 0.4955 | To be added | To be added |
| German   | To be added | 0.4940 | To be added | To be added |
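The metric is not named above; assuming it is mean average precision (MAP) over a Hamming-distance ranking of the database, the standard protocol for hashing-based retrieval, evaluation could look like this (all names here are illustrative):

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """query_codes/db_codes: {-1, +1} binary codes, shape (n, bits).
    Labels are multi-hot vectors; a database item counts as relevant
    if it shares at least one label with the query."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        # Hamming distance via the inner product of +/-1 codes.
        dist = 0.5 * (q.shape[0] - db_codes @ q)
        order = np.argsort(dist)
        relevant = (db_labels[order] @ ql) > 0
        if relevant.sum() == 0:
            continue
        precision = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        aps.append((precision * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```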

## Reference

[1] Xi Zhang, Hanjiang Lai, Jiashi Feng. Attention-Aware Deep Adversarial Hashing for Cross-Modal Retrieval. ECCV 2018.
