Explore Action Recognition
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
readme_imgs update README Dec 10, 2017
utils reorganized README Apr 3, 2017
.gitignore separate ui to another branch Apr 7, 2017
LICENSE Create LICENSE Sep 19, 2017
predict.py separate ui to another branch Apr 7, 2017
process_CNN.py reorganized README Apr 3, 2017


Action Recognition

Project Overview

  • This project explores prominent action recognition models with UCF-101 dataset

  • Perfomance of different models are compared and analysis of experiment results are provided

File Structure of the Repo

rnn_practice: Practices on RNN models and LSTMs with online tutorials and other useful resources

data: Training and testing data. (NOTE: please don't add large data files to this repo, add them to git.ignore)

models: Defining the architecture of models

utils: Utils scripts for dataset preparation, input pre-processing and other helper functions

train_CNN: Training CNN models. The program loads corresponding models, sets the training parameters and initializes network training

process_CNN: Processing video with CNN models. The CNN component is pre-trained and fixed during the training phase of LSTM cells. We can utilize the CNN model to pre-process frames of each video and store the intermediate results for feeding into LSTMs later. This procedure improves the training efficiency of the LRCN model significantly

train_RNN: Training the LRCN model

predict: Calculating the overall testing accuracy on the entire testing set

Models Description

  • Fine-tuned ResNet50 and trained solely with single-frame image data. Each frame of the video is considered as an image for training and testing, which generates a natural data augmentation. The ResNet50 is from keras repo, with weights pre-trained on Imagenet. ./models/finetuned_resnet.py

  • LRCN (CNN feature extractor, here we use the fine-tuned ResNet50 and LSTMs). The input of LRCN is a sequence of frames uniformly extracted from each video. The fine-tuned ResNet directly uses the result of [1] without extra training (C.F.Long-term recurrent convolutional network).

    Produce intermediate data using ./process_CNN.py and then train and predict with ./models/RNN.py

  • Simple CNN model trained with stacked optical flow data (generate one stacked optical flow from each of the video, and use the optical flow as the input of the network). ./models/temporal_CNN.py

  • Two-stream model, combines the models in [2] and [3] with an extra fusion layer that output the final result. [3] and [4] refer to this paper ./models/two_stream.py