Captioning Images with Diverse Objects

This repository contains pre-trained models and code accompanying the paper Captioning Images with Diverse Objects.

Novel Object Captioner (NOC)

While object recognition models can recognize thousands of categories of objects such as jackals and anteaters, description models cannot compose sentences to describe these objects correctly in context. Our novel object captioner model overcomes this problem by building visual description systems which can describe new objects without pairs of images and sentences about these objects.

Getting Started.

To get started, clone this repository, which contains the modified branch of Caffe that you need to compile:

git clone https://github.com/vsubhashini/noc.git

Compile Caffe

To compile Caffe, please refer to the Installation page.

Caption images using our pre-trained models.

Pre-trained models corresponding to the results reported in the paper can be downloaded here: Drive link, Dropbox link

Change directory and download the pre-trained models.

cd examples/noc
./download_models.sh

Run the captioner.

python noc_captioner.py -i images_list.txt
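The README does not describe the layout of images_list.txt. As a minimal sketch, assuming the file holds one image path per line (an assumption, not confirmed by the source), a list could be generated from a directory of images like this. The helper name `write_image_list` and the extension filter are hypothetical.

```python
import os

# Hypothetical helper: collect image paths under a directory and write them
# one per line. The one-path-per-line layout is an assumption, not taken
# from the repository.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def write_image_list(image_dir, list_path="images_list.txt"):
    """Write the sorted image paths found in image_dir to list_path."""
    paths = sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    with open(list_path, "w") as f:
        for path in paths:
            f.write(path + "\n")
    return paths
```

If the captioner expects a different layout, only the body of the write loop would need to change.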

Output with the default options:

Captioning 10 images...
Text output will be written to:
./results/output.imgnetcoco_3loss_voc72klabel_inglove_prelm75k_sgd_lr4e5_iter_80000.caffemodel.h5_
CNN ...
Computing features for images 0-9 of 10
Generated caption (length 11, log_p = -8.323791, log_p_word = -0.756708):
A man is sitting at a table with a cake.
Generated caption (length 12, log_p = -9.886197, log_p_word = -0.823850):
A group of people standing on a beach with a kite.
Generated caption (length 12, log_p = -13.384445, log_p_word = -1.115370):
A street sign on a city street with cars and cars.
Generated caption (length 12, log_p = -9.699789, log_p_word = -0.808316):
A dog laying on top of a white and black dog.
Generated caption (length 10, log_p = -5.238667, log_p_word = -0.523867):
A man riding skis down a snow covered slope.
Generated caption (length 10, log_p = -12.567964, log_p_word = -1.256796):
A truck with a large truck on the back.
Generated caption (length 12, log_p = -9.764039, log_p_word = -0.813670):
A man is holding a glass of wine in his hand.
Generated caption (length 12, log_p = -10.339204, log_p_word = -0.861600):
A man is standing in the dirt with a baseball bat.
Generated caption (length 10, log_p = -8.151620, log_p_word = -0.815162):
A woodpecker sitting on a tree in a park.
Generated caption (length 50, log_p = -41.878472, log_p_word = -0.837569):
A woman holding a giant flounder in the background ...
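In the sample output above, each `log_p_word` value matches `log_p` divided by the reported length (for example, -8.323791 / 11 ≈ -0.756708). The sketch below parses header lines of that form and checks the relationship. It assumes the log format is exactly as printed above; the function name `parse_captions` is hypothetical and not part of the repository.

```python
import re

# Matches header lines as they appear in the sample output above.
CAPTION_HEADER = re.compile(
    r"Generated caption \(length (\d+), log_p = (-?\d+\.\d+), "
    r"log_p_word = (-?\d+\.\d+)\):"
)


def parse_captions(log_text):
    """Return a list of (caption, length, log_p, log_p_word) tuples."""
    results = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        match = CAPTION_HEADER.match(line.strip())
        if match:
            length = int(match.group(1))
            log_p = float(match.group(2))
            log_p_word = float(match.group(3))
            # The caption text is on the line after its header.
            results.append((lines[i + 1].strip(), length, log_p, log_p_word))
    return results


sample = """Generated caption (length 11, log_p = -8.323791, log_p_word = -0.756708):
A man is sitting at a table with a cake.
Generated caption (length 10, log_p = -5.238667, log_p_word = -0.523867):
A man riding skis down a snow covered slope.
"""

for caption, length, log_p, log_p_word in parse_captions(sample):
    # In the sample, log_p_word equals log_p / length up to rounding.
    assert abs(log_p / length - log_p_word) < 1e-5
    print(f"{log_p_word:.6f}  {caption}")
```

Sorting the parsed tuples by `log_p_word` gives a length-normalized ranking, which can be easier to compare across captions of different lengths than the raw `log_p`.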

NOTES

NOTE1: The model is not trained on all COCO objects and is hence not competitive with other models trained on all MSCOCO training/val data.

NOTE2: The model is trained on ImageNet labels for some objects. Refer to the following section on training the model to learn more.

Training the model.

To train the model you need to download the MSCOCO image captioning dataset (the splits for training and held-out images are in data_utils/image_list/). We also use the ImageNet dataset (http://image-net.org/download). For the ImageNet experiments, some classes are outside the 1,000 classes chosen for the ILSVRC challenge. To see which images we used, refer to the image ids in data_utils/image_list/, which lists the ImageNet image filenames and the labels used for training.

For help with downloading the data, please refer to the Deep Compositional Captioning project.

Model Training scripts

  • The model prototxt is specified in 3loss_coco_fc7_voc72klabel.shared_glove72k.prototxt
  • The solver prototxt, including hyperparameters, is in solver_3loss_coco_fc7_voc72klabel.shared_glove72k.prototxt
  • The script to launch the training job is train_3loss_coco_fc7_voc72klabel.sh

Code to prepare training hdf5 data

The network has 3 components: the first takes just images with labels, the second takes input images and corresponding captions, and the third takes just text as input. The code in data_utils is provided as a reference to generate all 3 types of data.

  • data_utils/tripleloss_labels_coco_to_hdf5_data.py creates hdf5 data from images with labels (such as ImageNet images, or COCO images with multiple labels).
  • data_utils/text_labels_coco_to_hdf5_data.py creates hdf5 data from images with captions.
  • data_utils/tripleloss_text_coco_to_hdf5_data.py creates hdf5 from plain text data.
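To make the three input types concrete, here is a toy sketch of what one record of each kind might look like before being written to hdf5. It is illustrative only: the vocabulary, the multi-hot label encoding, the record fields, and the function names are all assumptions, and the scripts in data_utils define the actual format.

```python
# Illustrative only: toy vocabulary and encodings, not the repository's
# actual HDF5 layout. All names below are hypothetical.
VOCAB = ["<unk>", "a", "dog", "on", "grass", "the", "runs"]
WORD_TO_ID = {word: index for index, word in enumerate(VOCAB)}


def encode_labels(labels):
    """Multi-hot vector over the vocabulary for an image with labels."""
    vec = [0] * len(VOCAB)
    for label in labels:
        if label in WORD_TO_ID:
            vec[WORD_TO_ID[label]] = 1
    return vec


def encode_text(sentence):
    """Map whitespace-separated words to ids; unknown words map to <unk>."""
    return [WORD_TO_ID.get(word, 0) for word in sentence.lower().split()]


# 1. Image with labels (no sentence).
image_label_record = {
    "image": "img_001.jpg",
    "labels": encode_labels(["dog", "grass"]),
}

# 2. Image with a corresponding caption.
image_caption_record = {
    "image": "img_002.jpg",
    "caption": encode_text("A dog runs on the grass"),
}

# 3. Plain text with no image.
text_only_record = {"text": encode_text("The dog runs")}
```

The point of the sketch is only that the three streams carry different supervision (labels, paired captions, and unpaired text) while sharing one vocabulary.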

Reference

If you find this code helpful, please consider citing:

Captioning Images with Diverse Objects
S. Venugopalan, L. A. Hendricks, M. Rohrbach, R. Mooney, T. Darrell, K. Saenko
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017

@inproceedings{venugopalan17cvpr,
      title = {Captioning Images with Diverse Objects},
      author = {Venugopalan, Subhashini and Hendricks, Lisa Anne and Rohrbach,
      Marcus and Mooney, Raymond and Darrell, Trevor and Saenko, Kate},
      booktitle = {Proceedings of the IEEE Conference on Computer Vision and
      Pattern Recognition (CVPR)},
      year = {2017}
}

