yangbang18/video-classification-3d-cnn

This repository is forked from kenshohara/video-classification-3d-cnn and is used to extract 3D features of videos for Non-Autoregressive-Video-Captioning.

Example code:

# extract snippets of 16 frames with an 8-frame overlap
python main.py \
--gpu 0 \
--model ./resnext-101-kinetics.pth \
--video_root ~/VC_data/MSRVTT/all_frames/ \
--feats_dir ~/VC_data/MSRVTT/feats/

or

# extract 60 snippets of 16 frames
python main.py \
--gpu 0 \
--model ./resnext-101-kinetics.pth \
--video_root ~/VC_data/MSRVTT/all_frames/ \
--feats_dir ~/VC_data/MSRVTT/feats/ \
--n_frames 60
  • --model is the path to a pretrained model (see the Preparation section below)
  • --video_root is the path containing one folder per video (named by its vid), each of which holds that video's extracted frames
  • --feats_dir is the path where extracted features are saved (stored in a .hdf5 file); see the loading sketch below
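
For downstream use, the saved features can be loaded with h5py. A minimal sketch, assuming one dataset per vid; the file name feats.hdf5 is a placeholder for whatever main.py writes into --feats_dir:

import h5py
import numpy as np

# "feats.hdf5" is a placeholder; --feats_dir only names the output
# directory, so check what main.py actually writes there.
with h5py.File('feats.hdf5', 'r') as f:
    for vid in list(f.keys())[:3]:      # first few video ids
        feats = np.asarray(f[vid])      # one row per 16-frame snippet
        print(vid, feats.shape)         # (num_snippets, feature_dim)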

=========== original README below =============

Video Classification Using 3D ResNet

This is PyTorch code for video (action) classification using a 3D ResNet trained by this code.
The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes.
This code takes videos as inputs and, in score mode, outputs class names and predicted class scores for every 16 frames.
In feature mode, it outputs 512-dimensional features (after global average pooling) for every 16 frames.
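
For intuition, the per-16-frame outputs correspond to consecutive windows over the frame sequence. A minimal sketch of that windowing (not the repository's actual code; stride=8 mirrors the overlapping example in the fork's section above):

def make_snippets(num_frames, length=16, stride=16):
    # Yield (start, end) frame indices of consecutive snippets.
    # stride=16 gives non-overlapping windows (one output per 16 frames);
    # stride=8 gives the 8-frame overlap used in the fork's example above.
    start = 0
    while start + length <= num_frames:
        yield (start, start + length)
        start += stride

print(list(make_snippets(48)))             # [(0, 16), (16, 32), (32, 48)]
print(list(make_snippets(48, stride=8)))   # [(0, 16), (8, 24), (16, 32), (24, 40), (32, 48)]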

A Torch (Lua) version of this code is available here.

Requirements

  • PyTorch
conda install pytorch torchvision cuda80 -c soumith
  • FFmpeg, FFprobe
wget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz
tar xvf ffmpeg-release-64bit-static.tar.xz
cd ./ffmpeg-3.3.3-64bit-static/; sudo cp ffmpeg ffprobe /usr/local/bin;
  • Python 3

Preparation

  • Download this code.
  • Download the pretrained model.
    • ResNeXt-101 achieved the best performance in our experiments. (See the paper for details.)

Usage

Assume input video files are located in ./videos.

To calculate class scores for every 16 frames, use --mode score.

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode score

To visualize the classification results, use generate_result_video/generate_result_video.py.

To calculate video features for every 16 frames, use --mode feature.

python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode feature
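
Either way, output.json can then be inspected in Python. A sketch only: the keys "video" and "clips" below are guesses at the schema, so adjust them to whatever main.py actually writes:

import json

with open('./output.json') as f:
    results = json.load(f)

# "video" and "clips" are assumed key names, not documented here.
for entry in results:
    print(entry.get('video'))
    for clip in entry.get('clips', [])[:2]:   # peek at the first clips
        print('  ', clip)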

Citation

If you use this code, please cite the following:

@article{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  journal={arXiv preprint},
  volume={arXiv:1711.09577},
  year={2017},
}
