Code for Two-Stream SR-CNNs for Action Recognition in Videos (2016)

BMVC2016 Two-Stream SR-CNNs for Action Recognition in Videos


Prerequisites

Caffe

  • Clone and build Caffe from here. This Caffe version is based on Limin Wang's fork [1] and contains the merge_batch and weighted_sum layers. In addition, it exposes some protected Caffe functions in the MATLAB interface to emulate iter_size in MATLAB.
  • Modify caffe_mex.m to point to the corresponding Caffe MATLAB interface directory.

Optical Flow

Bounding Boxes

  • We extracted bounding boxes of 118 objects in all video frames using Faster-RCNN [2] (retraining is required) and obtained filtered bounding boxes by taking temporal coherence and motion saliency into account.
  • The extracted and processed bounding boxes for UCF-101 can be downloaded here. Place the downloaded mat files under imdb/cache.
  • If you wish to extract the bounding boxes yourself, you need to be able to run Ren Shaoqing's Faster-RCNN (most of the code has been migrated into this repository with minor modifications and more comments).
    • First, generate raw object detections using faster_rcnn_{dataset}.m.
    • Then use action/prepare_rois_context.m to process the bounding boxes as described in the paper.

Test

Datasets

Create dataset.mat using imdb/get_{name}_dataset.m (the directories may need to be adjusted). An example of a generated ucf_dataset.mat

Models

  • models/srcnn/{stream} contains the model prototxt files.

  • Model weights can be downloaded from the following links:

    Stream person+scene (the final proposed model in the paper)
    spatial split1 split2 split3
    flow split1 split2 split3
  • The two-stream results reported in the paper are obtained by summing the spatial and temporal classification scores with weights 1 : 3.

  • Other models mentioned in the paper's experiments can be provided if there is sufficient demand.
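
The weighted score fusion mentioned above can be sketched as follows. This is a minimal illustration, not code from this repository: the score matrices are made-up placeholder values, the variable names are hypothetical, and it assumes that "1 : 3" means a weight of 1 for the spatial stream and 3 for the flow (temporal) stream.

```matlab
% Late fusion of per-video class scores from the two streams.
% Assumption: each matrix is num_videos x num_classes. The values below
% are placeholders, not outputs of test_spatial or test_flow.
spatial_scores = [0.6 0.3 0.1;
                  0.2 0.5 0.3];
flow_scores    = [0.2 0.7 0.1;
                  0.1 0.2 0.7];

% Assumption: "1 : 3" is read as spatial weight 1, flow weight 3.
w_spatial = 1;
w_flow    = 3;

% Weighted sum of the two score matrices.
fused_scores = w_spatial * spatial_scores + w_flow * flow_scores;

% The predicted class of each video is the column with the highest fused score.
[~, predicted] = max(fused_scores, [], 2);
disp(predicted');
```

With these placeholder values the spatial stream alone would pick classes 1 and 2, while the fused scores pick classes 2 and 3, which shows how the larger flow weight can change the final decision.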

Run

In MATLAB:

% test spatial
test_spatial('model_path', path_to_weights, 'split', 1)
% test flow
test_flow('model_path', path_to_weights, 'split', 1)

[1] Wang, L., Xiong, Y., Wang, Z., & Qiao, Y. (2015). Towards good practices for very deep two-stream ConvNets. arXiv preprint arXiv:1507.02159.

[2] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).


Citation

Please cite the following if you find the code useful.

@inproceedings{wang2016two,
  title={Two-Stream SR-CNNs for Action Recognition in Videos},
  author={Wang, Yifan and Song, Jie and Wang, Limin and Van Gool, Luc and Hilliges, Otmar},
  year={2016},
  organization={BMVC}
}

Contact

Yifan Wang: yifan.wang@student.ethz.ch