Local Temporal Bilinear Pooling for Fine-grained Action Parsing

We propose a temporal local bilinear pooling method to replace max pooling in a temporal convolutional encoder-decoder network (see below), so as to capture higher-order statistics for our fine-grained tasks. Our bilinear pooling is learnable, decoupled, and has an analytical solution for halving the dimensionality. For more details, please refer to our paper

@InProceedings{Zhang_2019_CVPR,
author = {Zhang, Yan and Tang, Siyu and Muandet, Krikamol and Jarvers, Christian and Neumann, Heiko},
title = {Local Temporal Bilinear Pooling for Fine-Grained Action Parsing},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}

and a video demo, which is best viewed with VLC. We are still working on optimizing the code.
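As a rough illustration of the operation that replaces max pooling, the NumPy sketch below averages the outer products of frame features over each local temporal window and vectorizes the result. It is not the learnable, decoupled layer released in code/tf_models.py; the function name and the window/stride parameters are placeholders.

# Minimal NumPy sketch of (plain, non-learnable) local temporal bilinear pooling.
import numpy as np

def local_bilinear_pool(X, window=4, stride=2):
    """X: (T, D) frame-wise features. Returns (T', D*D) vectorized second-order
    statistics, computed over each temporal window instead of a max."""
    T, D = X.shape
    outputs = []
    for t in range(0, T - window + 1, stride):
        W = X[t:t + window]                # (window, D) local segment
        outer = W.T @ W / window           # (D, D) averaged outer products
        outputs.append(outer.reshape(-1))  # vectorize the second-order statistics
    return np.stack(outputs)

# toy usage: 16 frames of 8-dim features
pooled = local_bilinear_pool(np.random.randn(16, 8))
print(pooled.shape)  # (7, 64)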

news!!

We have added new implementations for low-rank random feature projection. This new bilinear pooling layer produces superior performance and runs faster. In particular, we have proposed the first RKHS theories for fusing two different input feature vectors via bilinear pooling. Details are in:

@article{zhang2019low,
  title={Low-rank Random Tensor for Bilinear Pooling},
  author={Zhang, Yan and Muandet, Krikamol and Ma, Qianli and Neumann, Heiko and Tang, Siyu},
  journal={arXiv preprint arXiv:1906.01004},
  year={2019}
}
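As a hedged sketch of the general idea, one common low-rank random-feature construction fuses two feature vectors by projecting each with a random matrix and taking the element-wise product, approximating the full outer product in far fewer dimensions. The names, scaling, and projection choice below are illustrative and not the exact construction from the paper.

# Hedged sketch of a low-rank random-feature approximation to bilinear pooling of two inputs.
import numpy as np

rng = np.random.default_rng(0)

def random_feature_bilinear(x, y, rank=256, rng=rng):
    """Approximate vec(x y^T) fusion in `rank` dimensions instead of len(x)*len(y)."""
    W1 = rng.standard_normal((rank, x.shape[0]))   # random projection for x
    W2 = rng.standard_normal((rank, y.shape[0]))   # random projection for y
    return (W1 @ x) * (W2 @ y) / np.sqrt(rank)     # element-wise product of projections

x, y = rng.standard_normal(128), rng.standard_normal(128)
z = random_feature_bilinear(x, y)
print(z.shape)  # (256,) versus 128*128 = 16384 for the full outer product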

getting started

  • The input to the network is the time sequence of frame-wise features.
  • The frontend file to run the code is code/TCN_main.py.
  • Put your features under features/{dataset name}/{feature name}/{Split_i}/{*.mat}.
  • Put your dataset splits under splits/{dataset name}/{Split_i}/{train, test}.txt; the entries in these txt files should match the *.mat filenames (see the loading sketch after this list).
  • The TensorFlow/Keras models are implemented in code/tf_models.py, which also contains other pooling methods that are implemented but not used.
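A minimal loading sketch under the layout above; the dataset, feature, and split names are placeholders, and the .mat keys ('X' for features, 'Y' for labels) follow the Data section further down.

# Hedged sketch of reading one split and its feature .mat files (placeholder names).
import os
import scipy.io as sio

dataset, feature, split = "50Salads", "SpatialCNN", "Split_1"   # placeholder names

# splits/{dataset}/{Split_i}/train.txt lists the sequences of this split
with open(os.path.join("splits", dataset, split, "train.txt")) as f:
    train_files = [line.strip() for line in f if line.strip()]

# features/{dataset}/{feature}/{Split_i}/{*.mat} holds the frame-wise features
for name in train_files:
    data = sio.loadmat(os.path.join("features", dataset, feature, split, name))
    X, Y = data["X"], data["Y"]   # frame-wise features and ground-truth labels
    print(name, X.shape, Y.shape)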

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

acknowledgement

Our implementation is based on the following framework. When using our GitHub code, please cite their work as well.

Temporal Convolutional Networks

This code implements the video- and sensor-based action segmentation models from Temporal Convolutional Networks for Action Segmentation and Detection by Colin Lea, Michael Flynn, Rene Vidal, Austin Reiter, and Greg Hager, arXiv 2016 (in review).

It was originally developed for use with the 50 Salads, GTEA, MERL Shopping, and JIGSAWS datasets. Recently we have also achieved high action segmentation performance on medical data, in robotics applications, and using accelerometer data from the UCI Smartphone dataset.

An abbreviated version of this work was described at the ECCV 2016 Workshop on BNMW.

Requirements: TensorFlow, Keras (1.1.2+)

Requirements (optional):

  • Numba: This makes the metrics much faster to compute but can be removed if necessary.
  • LCTM: Our older Conditional Random Field-based models.

Tested on Python 3.5. May work on Python 2.7 but is untested.

Contents (code folder)

  • TCN_main.py -- Main script for evaluation. I suggest working with this interactively in an IPython shell.
  • compare_predictions.py -- Script to output stats on each set of predictions.
  • datasets.py -- Adapters for processing specific datasets with a common interface.
  • metrics.py -- Functions for computing other performance metrics. These usually take the form score(P, Y, bg_class), where P are the predictions, Y are the ground-truth labels, and bg_class is the background class (a minimal sketch follows this list).
  • tf_models.py -- Models built with TensorFlow / Keras.
  • utils.py -- Utilities for manipulating data.
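As referenced in the metrics.py item above, here is a minimal sketch of the score(P, Y, bg_class) convention: a frame-wise accuracy that ignores background frames. The repository's actual metrics live in metrics.py; this standalone function is only illustrative.

# Hedged sketch of the score(P, Y, bg_class) convention: frame-wise accuracy.
import numpy as np

def accuracy(P, Y, bg_class=None):
    """P, Y: 1-D arrays of predicted / ground-truth labels per frame."""
    P, Y = np.asarray(P), np.asarray(Y)
    if bg_class is not None:
        keep = Y != bg_class          # drop background frames from the score
        P, Y = P[keep], Y[keep]
    return 100.0 * np.mean(P == Y)

print(accuracy([0, 1, 1, 2, 0], [0, 1, 2, 2, 0], bg_class=0))  # 66.67 over the 3 non-background frames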

Data

The features used for many of the datasets we use are linked below. The video features are the output of a Spatial CNN trained using image and motion information, as mentioned in the paper. To get features for the MERL dataset, talk to Bharat Singh at UMD.

Each set of features should be placed in the features folder (e.g., [TCN_directory]/features/GTEA/SpatialCNN/).

Each .mat file contains three or four types of data: 'Y' refers to the ground-truth action labels for each sequence, 'X' is the per-frame probability output from a Spatial CNN applied to each frame of video, 'A' is the 128-dim intermediate fully connected layer from the Spatial CNN applied at each frame, and, if available, 'S' is the sensor data (accelerometer signals in 50 Salads, robot kinematics in JIGSAWS).
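A small sketch of unpacking one such .mat file; the file path is a placeholder, and 'S' is read optionally since it is only present for the sensor-equipped datasets.

# Hedged sketch of unpacking one feature .mat file per the key layout above.
import scipy.io as sio

data = sio.loadmat("features/GTEA/SpatialCNN/some_sequence.mat")  # placeholder path
Y = data["Y"]        # ground-truth action label for each frame
X = data["X"]        # per-frame class probabilities from the Spatial CNN
A = data["A"]        # 128-dim intermediate fully connected activations
S = data.get("S")    # sensor data (accelerometer / kinematics), if available
print(X.shape, A.shape, Y.shape, None if S is None else S.shape)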

There is a set of corresponding splits for each dataset in [TCN_directory]/splits/[dataset]. These should be easy to use with the dataset loader included here.
