
tojiboyevf/image_captioning


Image Caption Generator

Final project for the Deep Learning 2022 course at Skoltech

Team members:

  • Farid Davletshin
  • Fakhriddin Tojiboev
  • Albert Sayapin
  • Olga Gorbunova
  • Evgeniy Garsiya
  • Hai Le
  • Lina Bashaeva
  • Dmitriy Gilyov

Environment

We use the conda package manager to install the required Python packages. To make package version resolution faster and more reliable, we advise using Mambaforge (see its installation instructions), which works on top of conda. Once mamba is installed, run the following command from the root of the repository:

mamba env create -f environment.yml

This creates a new environment named img_caption with many of the required packages already installed. You can install additional packages by running:

mamba install <package name>

Then activate the environment and install PyTorch, torchvision, torchaudio and torchtext:

conda activate img_caption
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c pytorch torchtext
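
Once the commands above finish, a quick check like the one below can confirm that PyTorch is importable in the active environment and whether it sees a GPU. This is only a sketch and is not part of the repository; describe_torch_install is a hypothetical helper name.

```python
import importlib.util


def describe_torch_install():
    """Return a one-line summary of the PyTorch install in the active environment."""
    # find_spec avoids an ImportError when torch is missing.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    if torch.cuda.is_available():
        # Report the first visible CUDA device.
        return f"torch {torch.__version__}, CUDA device: {torch.cuda.get_device_name(0)}"
    return f"torch {torch.__version__}, CPU only (CUDA not available)"


print(describe_torch_install())
```

If the output reports "CPU only" on a machine with a GPU, the installed cudatoolkit version may not match the driver.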

To read and run the Jupyter notebooks, choose one of two options:

  1. [recommended] Use the notebook support built into modern IDEs, e.g. the Python and Jupyter extensions of VS Code.
  2. Install the Jupyter packages, either with mamba install jupyterlab or with mamba install jupyter notebook

Note: If you prefer to use conda, just replace mamba commands with conda, e.g. instead of mamba install use conda install.

General setup

  1. Clone this repository
$ git clone https://github.com/tojiboyevf/image_captioning.git
  2. Move to the project's directory and download the Flickr8k and COCO_2014 datasets and the GloVe embeddings
$ cd image_captioning
$ bash load_flickr8k.sh
$ bash load_glove.sh
$ bash load_coco.sh
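
For orientation, GloVe embeddings are distributed as plain-text files in which each line holds a token followed by its space-separated vector components. The sketch below shows one way to read such a file into a dictionary. It is not the repository's loader; load_glove is a hypothetical helper, and the file name in the usage comment is a placeholder, since the location written by load_glove.sh is not stated here.

```python
from pathlib import Path


def load_glove(path, vocab=None):
    """Parse a GloVe text file into {word: [float, ...]}.

    If vocab is given, only words contained in it are kept, which saves
    memory when the caption vocabulary is small.
    """
    vectors = {}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            # Skip empty lines and words outside the requested vocabulary.
            if not values or (vocab is not None and word not in vocab):
                continue
            vectors[word] = [float(v) for v in values]
    return vectors


# Usage (placeholder path, adjust to wherever the download script stores the file):
# embeddings = load_glove("path/to/glove.txt", vocab={"dog", "grass"})
```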

Quick start

To retrain our models or inspect the evaluation results, go to the examples folder.

Open any notebook from there and follow the instructions inside.

Evaluation results

Link to the report

Flickr8k

model                      split  bleu 1  bleu 2  bleu 3  bleu 4
vgg16 + lstm               train  55.53   34.94   21.94   14.02
vgg16 + lstm               val    55.14   34.42   21.36   13.47
vgg16 + lstm               test   55.41   34.34   21.13   13.29
vgg16 + transformer        train  53.13   33.63   21.01   13.21
vgg16 + transformer        val    52.79   33.07   20.13   12.31
vgg16 + transformer        test   52.76   33.04   20.27   12.38
densenet161 + lstm         train  55.05   31.18   17.79   10.84
densenet161 + lstm         val    55.18   31.23   17.75   10.78
densenet161 + lstm         test   55.27   30.76   17.11   10.23
densenet161 + transformer  train  69.55   49.93   35.55   25.03
densenet161 + transformer  val    65.71   44.46   29.94   20.13
densenet161 + transformer  test   65.98   44.79   30.04   19.75
DeiT + lstm                train  56.06   34.40   20.97   13.24
DeiT + lstm                val    53.23   30.86   17.62   10.91
DeiT + lstm                test   53.48   31.06   17.61   10.61
DeiT + transformer         train  70.43   53.22   42.16   35.15
DeiT + transformer         val    62.71   43.71   34.58   29.32
DeiT + transformer         test   62.57   44.09   35.11   29.80
inceptionV3 + transformer  train  61.44   41.09   27.52   18.29
inceptionV3 + transformer  val    60.37   39.84   26.26   17.25
inceptionV3 + transformer  test   60.19   39.19   25.70   16.70
resnet34 + transformer     train  67.23   48.05   34.08   23.84
resnet34 + transformer     val    63.33   42.58   28.69   19.22
resnet34 + transformer     test   63.70   42.92   29.19   19.51

COCO val2014

model                      bleu 1  bleu 2  bleu 3  bleu 4
vgg16 + lstm               46.71   23.75   12.25   8.39
vgg16 + transformer        50.24   27.14   16.10   8.80
densenet161 + lstm         49.33   23.25   11.70   9.46
densenet161 + transformer  55.38   30.71   17.09   9.79
DeiT + lstm                45.73   22.04   11.14   9.12
DeiT + transformer         53.09   29.76   16.92   9.95
inceptionV3 + transformer  49.14   26.49   14.21   8.11
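
For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU-n score of the kind reported in the tables above can be computed: clipped n-gram precisions for orders 1 to n are combined by a geometric mean and multiplied by a brevity penalty. It is a minimal standard-library illustration, not necessarily the evaluation code used in this project; the function names and the 0 to 100 scaling are assumptions made for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU-max_n on a 0..100 scale with uniform n-gram weights.

    candidates: list of token lists, one generated caption per image.
    references: list of lists of token lists (several captions per image).
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # candidate n-gram counts per order
    cand_len = ref_len = 0

    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Reference length closest to the candidate length (ties go to the shorter one).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            # Clip each n-gram count by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            matches[n - 1] += sum((cand_counts & max_ref).values())
            totals[n - 1] += sum(cand_counts.values())

    if cand_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Toy example: one generated caption scored against two reference captions.
generated = ["a dog runs on the grass".split()]
human = [["a dog runs across the grass".split(), "the dog is running on grass".split()]]
print(corpus_bleu(generated, human, max_n=1))
print(corpus_bleu(generated, human, max_n=2))
```

Higher orders are stricter because a single missing n-gram order drives the geometric mean to zero, which is one reason bleu 4 is much lower than bleu 1 for every model.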