Unsupervised Video Summarization via Multi-source Features

This is the official GitHub page for the paper:

Hussain Kanafani, Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth. 2021. Unsupervised Video Summarization via Multi-source Features. In Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR ’21), August 21–24, 2021, Taipei, Taiwan. ACM, New York, NY, USA, https://doi.org/10.1145/3460426.3463597

The paper is available at: https://doi.org/10.1145/3460426.3463597

Model architecture: Multi-Source Chunk and Stride Fusion (MCSF)


Get started (Requirements and Setup)

Requires Python 3.6.

cd MCSF
conda create -n mcsf python=3.6
conda activate mcsf  
pip install -r requirements.txt

Project Structure

Directory:
- /data
        - /plc_365 (Places365 features for SumMe and TVSum)
        - /splits (original and non-overlapping splits)
        - /SumMe (processed dataset as an h5 file)
        - /TVSum (processed dataset as an h5 file)
- /csnet (implementation of the CSNet method)
- /mcsf-places365-early-fusion (MCSF with early fusion)
- /mcsf-places365-late-fusion (MCSF with late fusion)
- /mcsf-places365-intermediate-fusion (MCSF with intermediate fusion)
- /src/evaluation (evaluation using F1-score)
- /src/visualization (visualization utilities)
- /sum-ind (implementation of the SUM-Ind method)


Datasets

Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available in the "data" folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao, and the h5 files were obtained from Kaiyang Zhou.

Download

wget https://zenodo.org/record/4884870/files/datasets.tar
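
Then unpack the downloaded archive (destination is up to you; the extracted files should end up matching the /data layout shown above):

tar -xf datasets.tar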

File Structure

The implemented models use the provided h5 files, which have the following structure:

/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of subsampled frames in original video
    /n_steps                  number of subsampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset
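
For a quick sanity check of a downloaded file, the entries above can be read with h5py. A minimal sketch (the file name is the one used in the SUM-Ind commands below; adjust the path to wherever you extracted the archive):

import h5py

with h5py.File("eccv16_dataset_summe_google_pool5.h5", "r") as f:
    for key in f.keys():                           # one group per video, e.g. "video_1"
        video = f[key]
        features = video["features"][...]          # (n_steps, 1024) GoogleNet features
        gtscore = video["gtscore"][...]            # (n_steps,) importance scores
        user_summary = video["user_summary"][...]  # (num_users, n_frames) binary vectors
        picks = video["picks"][...]                # subsampled frame positions
        print(key, features.shape, user_summary.shape)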

Original videos and annotations for each dataset are also available on the authors' project webpages:

TVSum dataset: https://github.com/yalesong/tvsum

SumMe dataset: https://gyglim.github.io/me/vsum/index.html#benchmark


MCSF Variations and CSNet

We used the SUM-GAN method as a starting point for our implementation.


How to train

Run main.py with the configuration specified in configs.py to train the model. configs.py defines the following argument parameters for training:

| Parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| mode | string | train | possible values: train, test |
| verbose | boolean | true | |
| video_type | string | summe | possible values: summe, tvsum |
| input_size | int | 1024 | |
| hidden_size | int | 500 | |
| split_index | int | 0 | |
| n_epochs | int | 20 | |
| m | int | 4 | number of divisions used for the chunk and stride network |
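
For example, assuming these parameters are exposed as command-line flags under exactly these names (configs.py is authoritative), training on TVSum split 0 would look like:

python main.py --mode train --video_type tvsum --split_index 0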


To train the model on a single split, run:

python main.py --split_index N

where N is the index of the split.
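
To train on all splits in sequence (a sketch assuming five splits with indices 0-4, the usual setup for the SumMe/TVSum benchmarks; check /data/splits for the actual count):

for i in 0 1 2 3 4; do python main.py --split_index $i; done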

How to evaluate

Using multiple human-generated summaries per video: CSNet and all MCSF models are evaluated by comparing, after each training epoch, the generated summary for each test video against the set of reference human summaries available for that video (see the /user_summary entry in the h5 file structure in the Datasets section above). To run the evaluation, execute the src/evaluation/evaluate.py script after specifying which config file to use: config_summe.yaml or config_tvsum.yaml
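
The comparison follows the keyframe F1 protocol common to these benchmarks: precision and recall of the machine summary against each user summary are combined into an F1-score, then aggregated over users (by convention, averaged for TVSum and maximized for SumMe). A minimal sketch of that computation, with illustrative names rather than the repository's actual API:

import numpy as np

def evaluate_summary(machine_summary, user_summary, eval_method="avg"):
    # machine_summary: 1D binary array with shape (n_frames,)
    # user_summary:    2D binary array with shape (num_users, n_frames),
    #                  as stored under /user_summary in the h5 files
    # eval_method:     "avg" (TVSum convention) or "max" (SumMe convention)
    machine_summary = machine_summary.astype(np.float32)
    f_scores = []
    for user in user_summary.astype(np.float32):
        overlap = float((machine_summary * user).sum())
        precision = overlap / (machine_summary.sum() + 1e-8)
        recall = overlap / (user.sum() + 1e-8)
        if precision + recall == 0:
            f_scores.append(0.0)
        else:
            f_scores.append(2 * precision * recall / (precision + recall))
    return float(np.mean(f_scores)) if eval_method == "avg" else float(np.max(f_scores))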


SUM-Ind

The training and test code is in main.py. To see the detailed arguments, run python main.py -h.


How to train

python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --verbose

How to test

python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results

Citation

@inproceedings{kanafani2021MCSF,
   title={Unsupervised Video Summarization via Multi-source Features},
   author={Kanafani, Hussain and Ghauri, Junaid Ahmed and Hakimov, Sherzod and Ewerth, Ralph},
   booktitle={Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21)},
   year={2021},
   doi={10.1145/3460426.3463597}
}
