To train the S2VT model, you will need to compile Caffe from my `recurrent` branch:
git clone https://github.com/vsubhashini/caffe.git
cd caffe
git checkout recurrent
To compile Caffe, please refer to the Installation page.
Get the preprocessed model and sample data:
./get_s2vt.sh
Run the captioner:
python s2vt_captioner.py -m s2vt_vgg_rgb
- Pre-process videos to get frame features. The code provided here does not process videos directly; VGG fc7 features need to be extracted for each frame. Pre-processed data (VGG features) for the MSVD corpus is available for download; the following script fetches the training, validation, and test data (~1.2GB):
./download_data.sh
Extracting features for your own videos/images
If you wish to process your own videos, you first need to extract VGG features. This can be done using the code in `extract_vgg_features.py`. A minimal sketch of the idea follows.
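For reference, here is a minimal sketch of per-frame fc7 extraction with Caffe's Python interface. It assumes a standard VGG-16 deploy prototxt and caffemodel; the file and directory names below are placeholders, and `extract_vgg_features.py` is the authoritative implementation.

```python
import glob
import numpy as np
import caffe

# Placeholder paths: substitute your VGG-16 deploy prototxt,
# caffemodel, and directory of extracted video frames.
DEPLOY = 'vgg16_deploy.prototxt'
WEIGHTS = 'vgg16.caffemodel'
FRAME_DIR = 'frames/video0001'

net = caffe.Net(DEPLOY, WEIGHTS, caffe.TEST)
net.blobs['data'].reshape(1, 3, 224, 224)  # one frame at a time

# Standard VGG preprocessing: HWC -> CHW, RGB -> BGR, 0-255 scale,
# per-channel mean subtraction.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_mean('data', np.array([103.939, 116.779, 123.68]))

features = []
for frame_path in sorted(glob.glob(FRAME_DIR + '/*.jpg')):
    img = caffe.io.load_image(frame_path)
    net.blobs['data'].data[...] = transformer.preprocess('data', img)
    net.forward()
    # fc7 is the 4096-d per-frame feature that S2VT consumes.
    features.append(net.blobs['fc7'].data[0].copy())

features = np.array(features)  # shape: (num_frames, 4096)
```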
- Convert features to hdf5. If your features are in text format, use `framefc7_stream_text_to_hdf5_data.py` to convert them to hdf5; if they are in a mat file, you may want to use `framefc7_stream_mat_text_to_hdf5_data.py`. A sketch of the conversion idea follows the command below.
python framefc7_stream_text_to_hdf5_data.py
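As a rough illustration of what the conversion does, the sketch below reads one frame per line (an id followed by comma-separated fc7 values) and writes the features to an hdf5 file. The input file name and the `frame_fc7` dataset name are placeholders; the exact layout that `s2vt.prototxt` expects is produced by the scripts above, so use them for real data.

```python
import h5py
import numpy as np

feats = []
with open('video_frames_fc7.txt') as f:  # hypothetical input file
    for line in f:
        parts = line.strip().split(',')
        feats.append([float(v) for v in parts[1:]])  # skip the id column

feats = np.asarray(feats, dtype=np.float32)  # (num_frames, 4096)
with h5py.File('video_frames_fc7.h5', 'w') as out:
    # 'frame_fc7' is a placeholder dataset name, not necessarily what
    # the s2vt prototxt reads; the provided scripts define the layout.
    out.create_dataset('frame_fc7', data=feats)
```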
- Point to the hdf5 training data. Modify `s2vt.prototxt` to point to your hdf5 training and validation data.
- Train the model. Use `s2vt_solver.prototxt` to train your model; a minimal sketch of launching training from Python appears after this list.
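Training can be launched with the standard `caffe train -solver s2vt_solver.prototxt` command, or from Python as in the minimal sketch below (which assumes an SGD solver; the pretrained-weights path is a placeholder):

```python
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
caffe.set_device(0)

# The solver file references the s2vt net definition, which in turn
# points at your hdf5 training/validation data.
solver = caffe.SGDSolver('s2vt_solver.prototxt')

# Optional warm start from pretrained weights (placeholder path).
# solver.net.copy_from('vgg16.caffemodel')

solver.solve()  # runs the full training schedule from the solver file
```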
Code to evaluate the predicted sentences (with example) can be found at https://github.com/vsubhashini/caption-eval.
If you find this code helpful, please consider citing:
Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko
The IEEE International Conference on Computer Vision (ICCV) 2015