AlphAction

AlphAction aims to detect the actions of multiple persons in videos. It is the first open-source project that achieves 30+ mAP (32.4 mAP) with a single model on the AVA dataset.

This project is the official implementation of the paper Asynchronous Interaction Aggregation for Action Detection, authored by Jiajun Tang* and Jin Xia* (equal contribution), Xinzhi Mu, Bo Pang, and Cewu Lu (corresponding author).


[Demo GIFs: demo1, demo2, demo3]

Installation

First, install this project by following the instructions in INSTALL.md.
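
For quick reference, a typical setup might look like the sketch below. The environment name, Python version, and build command are assumptions on our part; INSTALL.md is the authoritative source for dependencies and exact commands.

git clone https://github.com/uname0x96/AlphAction.git   # this repository
cd AlphAction
conda create -n alphaction python=3.7   # assumed environment name and Python version
conda activate alphaction
# install PyTorch and the other dependencies listed in INSTALL.md, then build the CUDA extensions in place
python setup.py build develop           # assumed build command; see INSTALL.md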

Data Preparation

To do training or inference on the AVA dataset, please check DATA.md for data preparation instructions.
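
If the dataset lives outside the repository, the same symlink convention used below for outputs and model weights can be applied to the data as well. The data/AVA link name is our assumption; DATA.md defines the expected directory layout.

mkdir -p /path/to/AVA
ln -s /path/to/AVA data/AVA   # assumed link name; see DATA.md for the expected layout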

Model Zoo

config | backbone | structure | mAP | mAP in paper | model
------ | -------- | --------- | --- | ------------ | -----
resnet50_4x16f_parallel | ResNet-50 | Parallel | 29.0 | 28.9 | [link]
resnet50_4x16f_serial | ResNet-50 | Serial | 29.8 | 29.6 | [link]
resnet50_4x16f_denseserial | ResNet-50 | Dense Serial | 30.0 | 29.8 | [link]
resnet101_8x8f_denseserial | ResNet-101 | Dense Serial | 32.4 | 32.3 | [link]

Visual Demo

To run the demo program on a video or webcam, please check the demo folder. We select 15 common categories from the 80 action categories of AVA and provide a practical model which achieves high accuracy (about 70 mAP) on these categories.
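
As a rough illustration, running the demo on a video file might look like the command below. The script name and flag names are assumptions on our part; the documentation inside the demo folder describes the actual interface, including webcam mode.

cd demo
python demo.py --video-path path/to/input.mp4 --output-path path/to/output.mp4   # script and flag names are assumptions; see the demo folder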

Training and Inference

The hyper-parameters of each experiment are controlled by a .yaml config file located in the directory config_files. All of these configuration files assume that we are running on 8 GPUs. We need to create a symbolic link data/output pointing to the directory where the output (logs and checkpoints) will be saved. Besides, we recommend creating a symbolic link data/models to place model weights. These can be done with the following commands.

mkdir -p /path/to/output
ln -s /path/to/output data/output
mkdir -p /path/to/models
ln -s /path/to/models data/models

Training

The pre-trained model weights and the training code will be made publicly available later. 😉

Inference

First, you need to download the model weights from Model Zoo.
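
If you created the data/models link above, that directory is a natural place for the downloaded checkpoints; the source path below is just a placeholder.

mv /path/to/downloaded/weight.pth data/models/   # placeholder path; put the checkpoint wherever MODEL.WEIGHT will point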

To do inference on a single GPU, you only need to run the following command. It will load the model from the path specified in MODEL.WEIGHT. Note that VIDEOS_PER_BATCH is a global config; if you face an out-of-memory (OOM) error, you can overwrite it on the command line as in the command below.

python test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight" \
TEST.VIDEOS_PER_BATCH 4

We use the launch utility torch.distributed.launch to launch multiple processes for inference on multiple GPUs. GPU_NUM should be replaced by the number of GPUs to use. Hyper-parameters in the config file can still be overridden in the same way as in single-GPU inference.

python -m torch.distributed.launch --nproc_per_node=GPU_NUM \
test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight"
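
For example (our combination of the two commands above, not a command from the original instructions), on an 8-GPU machine you can pass the same per-batch override to the distributed launcher:

python -m torch.distributed.launch --nproc_per_node=8 \
test_net.py --config-file "path/to/config/file.yaml" \
MODEL.WEIGHT "path/to/model/weight" \
TEST.VIDEOS_PER_BATCH 4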

Acknowledgement

We gratefully acknowledge Huawei Corporation for providing the computing resources used in this project.

Citation

If this project helps you in your research or work, please cite the paper:

@article{tang2020asynchronous,
  title={Asynchronous Interaction Aggregation for Action Detection},
  author={Tang, Jiajun and Xia, Jin and Mu, Xinzhi and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2004.07485},
  year={2020}
}
