
Image-Captioning

CNN-Encoder and RNN-Decoder (with Bahdanau attention) for image captioning (image-to-text) on the MS-COCO dataset.

Task Description

Given an image like the example below, our goal is to generate a caption such as "a surfer riding on a wave".

[Image: a man surfing on a wave]

To accomplish this, we use an attention-based model, which lets us see which parts of the image the model focuses on as it generates each word of the caption.

Prediction

The model architecture is similar to that of Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.

Main principle

The model consists of a CNN-Encoder and an RNN-Decoder. The CNN-Encoder extracts information from the input image to produce an intermediate representation H; the RNN-Decoder then decodes H step by step (using Bahdanau attention) to generate a text description of the image.


Input: image_features.shape (16, 64, 2048)
---------------Pass by cnn_encoder---------------
Output: image_features_encoder.shape (16, 64, 256)

Input: batch_words.shape (16, 1)
Input: rnn state shape (16, 512)
---------------Pass by rnn_decoder---------------
Output: out_batch_words.shape (16, 5031)
Output: out_state.shape (16, 512)
Output: attention_weights.shape (16, 64, 1)
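
The shape trace above maps directly onto the encoder / attention / decoder pieces. Below is a minimal TensorFlow 2 sketch of that architecture, in the spirit of the Show, Attend and Tell design and the reference notebook listed later; the class names (CNN_Encoder, BahdanauAttention, RNN_Decoder) and the sizes (embedding 256, GRU units 512, vocabulary 5031) follow the trace, not necessarily this repository's exact code.

import tensorflow as tf

class CNN_Encoder(tf.keras.Model):
    # Projects pre-extracted image features (batch, 64, 2048)
    # down to the attention dimension: (batch, 64, 256).
    def __init__(self, embedding_dim=256):
        super().__init__()
        self.fc = tf.keras.layers.Dense(embedding_dim)

    def call(self, x):
        return tf.nn.relu(self.fc(x))

class BahdanauAttention(tf.keras.Model):
    # Additive attention: score = V * tanh(W1 * features + W2 * hidden).
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, 64, 256); hidden: (batch, 512)
        hidden_with_time = tf.expand_dims(hidden, 1)
        score = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        attention_weights = tf.nn.softmax(score, axis=1)    # (batch, 64, 1)
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights

class RNN_Decoder(tf.keras.Model):
    # One decoding step: attend over the 64 image regions, then run one
    # GRU step on [context; previous-word embedding].
    def __init__(self, embedding_dim=256, units=512, vocab_size=5031):
        super().__init__()
        self.units = units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True,
                                       return_state=True)
        self.fc1 = tf.keras.layers.Dense(units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)
        self.attention = BahdanauAttention(units)

    def call(self, x, features, hidden):
        context_vector, attention_weights = self.attention(features, hidden)
        x = self.embedding(x)                               # (batch, 1, 256)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        output, state = self.gru(x)                         # state: (batch, 512)
        x = tf.reshape(self.fc1(output), (-1, self.units))
        return self.fc2(x), state, attention_weights        # (batch, vocab_size)

# Reproducing the shape trace above:
encoder = CNN_Encoder(256)
decoder = RNN_Decoder(256, 512, 5031)
features = encoder(tf.random.normal((16, 64, 2048)))        # (16, 64, 256)
words, state, attn = decoder(tf.zeros((16, 1), tf.int32),
                             features, tf.zeros((16, 512)))
print(words.shape, state.shape, attn.shape)                 # (16, 5031) (16, 512) (16, 64, 1)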

Code tested with

  • Python 3.6
  • TensorFlow 2

Usage

1. Prepare data

python data_utils.py

Manual download of data: if the code can't download the data automatically for network reasons, you can download it manually.

  1. Download the captions data from http://images.cocodataset.org/annotations/annotations_trainval2014.zip
  2. Unzip annotations_trainval2014.zip and move the annotations folder into the project directory
  3. Download the images data from http://images.cocodataset.org/zips/train2014.zip
  4. Unzip train2014.zip and move the train2014 folder into the project directory
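
For orientation, here is a hedged sketch of the feature-extraction step that produces the (64, 2048) inputs shown in the shape trace: each image goes through InceptionV3 without its classification head, and the resulting 8x8x2048 feature map is cached to disk so the CNN never has to run during training. cache_features and its .npy file layout are illustrative assumptions, not necessarily what data_utils.py does.

import numpy as np
import tensorflow as tf

# InceptionV3 without its classification head; for 299x299 inputs the last
# convolutional block yields an 8x8x2048 map, i.e. 64 regions of 2048 features.
image_model = tf.keras.applications.InceptionV3(include_top=False,
                                                weights='imagenet')
extractor = tf.keras.Model(image_model.input, image_model.layers[-1].output)

def load_image(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))                  # InceptionV3 input size
    return tf.keras.applications.inception_v3.preprocess_input(img)

def cache_features(image_paths):                            # hypothetical helper
    for path in image_paths:
        batch = tf.expand_dims(load_image(path), 0)         # (1, 299, 299, 3)
        features = extractor(batch)                         # (1, 8, 8, 2048)
        features = tf.reshape(features, (1, -1, 2048))      # (1, 64, 2048)
        np.save(path + '.npy', features[0].numpy())         # cache beside the image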

2. Train model

python train_image_caption_model.py
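
Conceptually, training uses teacher forcing: at each step the decoder is fed the ground-truth previous word rather than its own prediction, and the loss masks padding tokens. The sketch below assumes the encoder/decoder classes from the architecture sketch above; train_step, loss_function, and padding id 0 are illustrative assumptions, not this repository's exact API.

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True,
                                                            reduction='none')

def loss_function(real, pred):
    # Mask out padding positions (token id 0 assumed to be <pad>).
    mask = tf.cast(tf.not_equal(real, 0), pred.dtype)
    return tf.reduce_mean(loss_object(real, pred) * mask)

@tf.function
def train_step(img_features, target, encoder, decoder, start_id):
    batch_size, seq_len = target.shape
    hidden = tf.zeros((batch_size, decoder.units))
    dec_input = tf.fill((batch_size, 1), start_id)          # <start> for each example
    loss = 0.0
    with tf.GradientTape() as tape:
        features = encoder(img_features)                    # (batch, 64, 256)
        for t in range(1, seq_len):
            logits, hidden, _ = decoder(dec_input, features, hidden)
            loss += loss_function(target[:, t], logits)
            dec_input = tf.expand_dims(target[:, t], 1)     # teacher forcing
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss / float(seq_len)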

3. Run inference

python inference_image_caption.py
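
Inference replaces teacher forcing with greedy decoding: start from the <start> token and feed each argmax prediction back into the decoder until <end> or a length cap is reached. A minimal sketch, assuming a Keras-style tokenizer with word_index/index_word; greedy_caption is a hypothetical helper, and the attention weights it receives at each step are what the attention visualizations plot.

import tensorflow as tf

def greedy_caption(image_features, encoder, decoder, tokenizer, max_length=50):
    # image_features: one cached (64, 2048) feature map.
    features = encoder(tf.expand_dims(image_features, 0))   # (1, 64, 256)
    hidden = tf.zeros((1, decoder.units))
    dec_input = tf.constant([[tokenizer.word_index['<start>']]])
    words = []
    for _ in range(max_length):
        logits, hidden, attention_weights = decoder(dec_input, features, hidden)
        predicted_id = int(tf.argmax(logits[0]))
        word = tokenizer.index_word[predicted_id]
        if word == '<end>':
            break
        words.append(word)
        dec_input = tf.constant([[predicted_id]])           # feed prediction back
    return ' '.join(words)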

Experimental result

[Image: training loss curve]

[Image: example inference_image_caption outputs]

Reference Code

image_captioning.ipynb

This notebook is an end-to-end example. When you run the notebook, it downloads the MS-COCO dataset, preprocesses and caches a subset of images using Inception V3, trains an encoder-decoder model, and generates captions on new images using the trained model.

In this example, you will train a model on a relatively small amount of data: the first 30,000 captions, covering about 20,000 images (because there are multiple captions per image in the dataset).

Learn more

awesome-image-captioning: A curated list of image captioning and related-area resources.
