Temporal segmentation of sign language videos

This repository provides code for the following two papers:

  • Sign Language Segmentation with Temporal Convolutional Networks (ICASSP 2021)
  • Sign Segmentation with Changepoint-Modulated Pseudo-Labelling (CVPRW 2021)

[Project page]

[demo visualization]

Contents

  • Setup
  • Data and models
  • Demo
  • Training
  • Citation
  • License
  • Acknowledgements

Setup

# Clone this repository
git clone git@github.com:RenzKa/sign-segmentation.git
cd sign-segmentation/
# Create signseg_env environment
conda env create -f environment.yml
conda activate signseg_env

Data and models

You can download our pretrained models (models.zip [302MB]) and data (data.zip [5.5GB]) used in the experiments here or by executing download/download_*.sh. The unzipped data/ and models/ folders should be placed in the root directory of the repository (for the demo, downloading the models folder is sufficient).

Data:

Please cite the original datasets when using the data: BSL Corpus | Phoenix14. We provide the pre-extracted features and metadata. See here for a detailed description of the data files.

  • Features: data/features/*/*/features.mat
  • Metadata: data/info/*/info.pkl
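
As a quick sanity check, the feature and metadata files can be inspected with standard Python tooling. A minimal sketch, assuming the files have been unzipped to data/ as described above (the key names inside the files vary, so list them before indexing):

```python
import glob
import pickle

from scipy.io import loadmat

# Locate the feature files without assuming the dataset/model subfolder names.
feature_files = glob.glob("data/features/*/*/features.mat")
print(feature_files)

# A .mat file loads as a dict of arrays; inspect the keys and shapes first.
# (If a file was saved in MATLAB v7.3 format, use h5py instead of loadmat.)
mat = loadmat(feature_files[0])
print({k: getattr(v, "shape", v) for k, v in mat.items() if not k.startswith("__")})

# The metadata pickle can be read with the standard library.
with open(glob.glob("data/info/*/info.pkl")[0], "rb") as f:
    info = pickle.load(f)
print(type(info))
```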

Models:

  • I3D weights, trained for sign classification: models/i3d/*.pth.tar
  • MS-TCN weights for the demo (see tables below for links to the other models): models/ms-tcn/*.model

The folder structure should be as below:

sign-segmentation/models/
  i3d/
    i3d_kinetics_bsl1k_bslcp.pth.tar
    i3d_kinetics_bslcp.pth.tar
    i3d_kinetics_phoenix_1297.pth.tar
  ms-tcn/
    mstcn_bslcp_i3d_bslcp.model
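
A quick way to verify that the checkpoints ended up in the expected locations (paths taken from the layout above):

```python
from pathlib import Path

# Checkpoints used by the demo; adjust the list if you downloaded other models.
expected = [
    "models/i3d/i3d_kinetics_bslcp.pth.tar",
    "models/ms-tcn/mstcn_bslcp_i3d_bslcp.model",
]
for path in expected:
    print(path, "found" if Path(path).is_file() else "MISSING")
```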

Demo

The demo folder contains a sample script to estimate the segments of a given sign language video. It is also possible to start from pre-extracted I3D features and only apply the MS-TCN model. Passing --generate_vtt produces a .vtt file which can be used with our modified version of the VIA annotation tool:

usage: demo.py [-h] [--starting_point {video,feature}]
               [--i3d_checkpoint_path I3D_CHECKPOINT_PATH]
               [--mstcn_checkpoint_path MSTCN_CHECKPOINT_PATH]
               [--video_path VIDEO_PATH] [--feature_path FEATURE_PATH]
               [--save_path SAVE_PATH] [--num_in_frames NUM_IN_FRAMES]
               [--stride STRIDE] [--batch_size BATCH_SIZE] [--fps FPS]
               [--num_classes NUM_CLASSES] [--slowdown_factor SLOWDOWN_FACTOR]
               [--save_features] [--save_segments] [--viz] [--generate_vtt]

Example usage:

# Print arguments
python demo/demo.py -h
# Save features and predictions and create a visualization of the results at full speed
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 1 --save_features --save_segments --viz
# Save only the predictions and create a visualization of the results slowed down by a factor of 6
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --save_segments --viz
# Create a visualization of the results slowed down by a factor of 6 and a .vtt file for the VIA tool
python demo/demo.py --video_path demo/sample_data/demo_video.mp4 --slowdown_factor 6 --viz --generate_vtt
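
When --generate_vtt is set, the predicted segments are written as WebVTT cues. For reference, a minimal sketch of what such a writer looks like (the segment representation and cue text below are assumptions, not the demo's exact output code):

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

def write_vtt(segments, fps, path):
    """Write (start_frame, end_frame) segments as WebVTT cues at the given fps."""
    with open(path, "w") as f:
        f.write("WEBVTT\n\n")
        for i, (start, end) in enumerate(segments, start=1):
            f.write(f"{i}\n")
            f.write(f"{to_timestamp(start / fps)} --> {to_timestamp(end / fps)}\n")
            f.write("SIGN\n\n")

# Hypothetical segments (in frames) for a video at 25 fps.
write_vtt([(0, 12), (13, 40)], fps=25, path="demo_segments.vtt")
```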

The demo will:

  1. use the models/i3d/i3d_kinetics_bslcp.pth.tar pretrained I3D model to extract features,
  2. use the models/ms-tcn/mstcn_bslcp_i3d_bslcp.model pretrained MS-TCN model to predict the segments from the features (a minimal sketch of the frame-to-segment step follows this list),
  3. save the results (depending on which flags are used).
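
The MS-TCN produces a label per feature frame; the segments in step 2 are simply runs of consecutive frames with the same predicted label. A minimal, self-contained sketch of that frame-to-segment step (illustrative only; the repo's own post-processing may differ):

```python
import torch

def frames_to_segments(frame_labels: torch.Tensor):
    """Group consecutive identical frame labels into (label, start, end) segments.

    frame_labels: 1-D tensor of per-frame class predictions, e.g. the argmax
    over the MS-TCN output of the final stage.
    """
    segments = []
    start = 0
    for t in range(1, len(frame_labels) + 1):
        if t == len(frame_labels) or frame_labels[t] != frame_labels[start]:
            segments.append((int(frame_labels[start]), start, t - 1))
            start = t
    return segments

# Example with two classes (e.g. 1 = sign, 0 = boundary):
print(frames_to_segments(torch.tensor([1, 1, 1, 0, 0, 1, 1])))
# -> [(1, 0, 2), (0, 3, 4), (1, 5, 6)]
```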

Training

Train ICASSP

Run the corresponding run file (*.sh) to train the MS-TCN with pre-extracted features on BSL Corpus. During training, a .log file for TensorBoard is generated. In addition, the metrics are saved in train_progress.txt.

  • Influence of I3D training (fully-supervised segmentation results on BSL Corpus)

| ID | Model | mF1B | mF1S | Links (for seed=0) |
|----|-------|------|------|--------------------|
| 1 | BSL Corpus | 68.68±0.6 | 47.71±0.8 | run, args, I3D model, MS-TCN model, logs |
| 2 | BSL1K -> BSL Corpus | 66.17±0.5 | 44.44±1.0 | run, args, I3D model, MS-TCN model, logs |
  • Fully-supervised segmentation results on PHOENIX14

| ID | I3D training data | MS-TCN training data | mF1B | mF1S | Links (for seed=0) |
|----|-------------------|----------------------|------|------|--------------------|
| 3 | BSL Corpus | PHOENIX14 | 65.06±0.5 | 44.42±2.0 | run, args, I3D model, MS-TCN model, logs |
| 4 | PHOENIX14 | PHOENIX14 | 71.50±0.2 | 52.78±1.6 | run, args, I3D model, MS-TCN model, logs |

Train CVPRW

Requirement: pre-extracted pseudo-labels, changepoints, or CMPL labels:

  1. Save the pre-trained model in models/ms-tcn/*.model.
  2. a) Extract pseudo-labels before extracting CMPL labels: Extract only PL | Extract CMPL | Extract PL and CMPL.
     b) Extract changepoints separately for training: Extract CP (specify the correct model path).
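
For context on the adaptation protocols below: plain pseudo-labelling runs the source-trained model on unlabelled target videos and reuses its confident per-frame predictions as training labels, while CMPL additionally adjusts those labels with changepoints detected in the feature stream. A minimal, generic sketch of the plain pseudo-labelling step (the threshold and shapes are illustrative, not the repo's settings):

```python
import torch

def make_pseudo_labels(probs: torch.Tensor, threshold: float = 0.9):
    """Turn per-frame class probabilities into pseudo-labels.

    probs: (num_frames, num_classes) softmax outputs of a source-trained model
    on unlabelled target data. Returns (labels, mask), where mask marks the
    frames confident enough to be reused for retraining.
    """
    confidence, labels = probs.max(dim=-1)
    mask = confidence >= threshold
    return labels, mask

# Example on random "predictions" for a two-class (sign / boundary) setup.
probs = torch.softmax(torch.randn(10, 2), dim=-1)
labels, mask = make_pseudo_labels(probs)
print(labels.tolist(), mask.tolist())
```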
  • Pseudo-labelling techniques on PHOENIX14

| ID | Method | Adaptation protocol | mF1B | mF1S | Links (for seed=0) |
|----|--------|---------------------|------|------|--------------------|
| 5 | Pseudo-labels | inductive | 47.94±1.0 | 32.45±0.3 | run, args, I3D model, MS-TCN model, logs |
| 6 | Changepoints | inductive | 48.51±0.4 | 34.45±1.4 | run, args, I3D model, MS-TCN model, logs |
| 7 | CMPL | inductive | 53.57±0.7 | 33.82±0.0 | run, args, I3D model, MS-TCN model, logs |
| 8 | Pseudo-labels | transductive | 47.62±0.4 | 32.11±0.9 | run, args, I3D model, MS-TCN model, logs |
| 9 | Changepoints | transductive | 48.29±0.1 | 35.31±1.4 | run, args, I3D model, MS-TCN model, logs |
| 10 | CMPL | transductive | 53.53±0.1 | 32.93±0.9 | run, args, I3D model, MS-TCN model, logs |

Citation

If you use this code and data, please cite the following:

@inproceedings{Renz2021signsegmentation_a,
    author       = "Katrin Renz and Nicolaj C. Stache and Samuel Albanie and G{\"u}l Varol",
    title        = "Sign Language Segmentation with Temporal Convolutional Networks",
    booktitle    = "ICASSP",
    year         = "2021",
}
@inproceedings{Renz2021signsegmentation_b,
    author       = "Katrin Renz and Nicolaj C. Stache and Neil Fox and G{\"u}l Varol and Samuel Albanie",
    title        = "Sign Segmentation with Changepoint-Modulated Pseudo-Labelling",
    booktitle    = "CVPRW",
    year         = "2021",
}

License

The license in this repository only covers the code. For data.zip and models.zip we refer to the terms and conditions of the original datasets.

Acknowledgements

The code builds on the github.com/yabufarha/ms-tcn repository. The demo reuses parts of github.com/gulvarol/bsl1k. We would like to thank C. Camgoz for help with the BSL Corpus data preparation.
