This is an image captioning app based on the DeepRNN source code (TensorFlow) and Flask. Flask serves a local API (http://127.0.0.1:5000) that runs DeepRNN inference.
This neural system for image captioning is roughly based on the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" by Xu et al. (ICML 2015). The input is an image, and the output is a sentence describing the content of the image. It uses a convolutional neural network to extract visual features from the image, and an LSTM recurrent neural network to decode these features into a sentence. A soft attention mechanism is incorporated to improve the quality of the caption. This project is implemented with the TensorFlow library, and allows end-to-end training of both the CNN and RNN parts.
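The soft attention step described above can be sketched in plain NumPy. This is an illustrative sketch only: the shapes and parameter names below are assumptions for exposition, not the project's actual variables, and the real model learns these weights jointly with the CNN and LSTM.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, w_a):
    """Compute a soft-attention context vector over image features.

    features: (L, D) -- L spatial locations, each a D-dim CNN feature
    hidden:   (H,)   -- current LSTM hidden state
    W_f: (D, K), W_h: (H, K), w_a: (K,) -- attention parameters
    (all names/shapes here are illustrative assumptions)
    """
    # Score each spatial location by combining its feature vector
    # with the current hidden state.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ w_a  # (L,)
    # Softmax over locations yields the attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The context vector is the attention-weighted sum of features.
    context = weights @ features                           # (D,)
    return context, weights

# Toy example: 196 locations (a 14x14 feature map), 512-dim features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((196, 512))
h = rng.standard_normal(256)
ctx, w = soft_attention(
    feats, h,
    rng.standard_normal((512, 128)) * 0.1,
    rng.standard_normal((256, 128)) * 0.1,
    rng.standard_normal(128) * 0.1,
)
```

At each decoding step the LSTM conditions on a fresh context vector, so the model can "look at" different image regions while emitting different words.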
## Requirements

- Linux (Python 2/3) or macOS (Python 3 only)
- Python 2.7 / 3.5
- nltk==3.3
- numpy==1.15.4
- scikit_image==0.14.0
- tqdm==4.26.0
- matplotlib==2.2.3
- tensorflow_gpu==1.12.0
- pandas==0.23.4
- opencv_python==4.1.0.25
- tensorflow==1.13.1
Install the requirements:

```shell
pip install -r requirements.txt
```
To speed pip up, you can point it at the Aliyun mirror. Create the config file:

```shell
cd ~
mkdir .pip
cd .pip
touch pip.conf
```

Then add the following to `pip.conf`:

```
[global]
index-url = https://mirrors.aliyun.com/pypi/simple/

[install]
trusted-host = mirrors.aliyun.com
```
Download the pretrained model file:

- Option 1: Box
- Option 2: Google Drive
- Option 3: BaiDuYun, extraction code: `nubk`

Put the `289999.npy` file into the `models` folder.
## Inference

- First, start the local API:
  ```shell
  python main.py
  ```
- Then start Jupyter:
  ```shell
  jupyter notebook
  ```
- Copy some images into the `test/images` folder for testing.
- Finally, open `run.ipynb`:
  - Run the first cell to start inference.
  - Run the second cell to visualize the result.
  - Alternatively, run `run.py` instead.
- The generated captions will be saved in the `test/results` folder.
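Since the caption server is plain HTTP, any client can talk to it, not just the notebook. A minimal stdlib sketch is below; note that the endpoint path (`/`) and the raw-bytes payload format are assumptions for illustration, so check `main.py` for the route the Flask app actually registers.

```python
import urllib.request

# Base URL of the local Flask API started by `python main.py`.
API_URL = "http://127.0.0.1:5000"

def caption_request(image_bytes, endpoint="/"):
    """Build an HTTP request carrying raw image bytes.

    NOTE: the endpoint path and payload format are assumptions;
    see main.py for the actual Flask route and expected input.
    """
    return urllib.request.Request(
        API_URL + endpoint,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
    )

# Usage (with the API running and an image in test/images), e.g.:
#   with open("test/images/example.jpg", "rb") as f:
#       resp = urllib.request.urlopen(caption_request(f.read()))
#   print(resp.read().decode())
```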
## References

- DeepRNN
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. ICML 2015.
- The original implementation in Theano
- An earlier implementation in TensorFlow
- Microsoft COCO dataset