Mask-Augmentation for Motion-Aware Video Representation Learning


This is the codebase for the paper "MAC: Mask-Augmentation for Motion-Aware Video Representation Learning".

If you find our work useful in your research, please consider citing our paper:

  title={MAC: Mask-Augmentation for Motion-Aware Video Representation Learning},
  author={Akar, Arif and Senturk, Ufuk Umut and Ikizler-Cinbis, Nazli},

Getting Started


You can install some necessary libraries with conda.

# Step 0, create a new python environment
conda create -n mac python=3.7
conda activate mac

# Step 1, install mmcv with torch 1.8, cuda 11.1 
pip install mmcv-full==1.3.7 -f

# Step 2, install tensorboard
pip install tensorboard

Data Preparation

UCF-101

  1. Download the compressed file of raw videos from the official website.

    mkdir -p data/ucf101/
    wget -O data/ucf101/UCF101.rar --no-check-certificate 
  2. Unzip the compressed file.

    # install unrar if necessary 
    # sudo apt-get install unrar
    unrar e data/ucf101/UCF101.rar data/ucf101/UCF101_raw/
  3. Download the train/test split file.

    wget -O data/ucf101/ --no-check-certificate
    unzip data/ucf101/ -d data/ucf101/.
  4. Run the preprocessing script (extracting the raw frames takes about 1 hour).

    python scripts/ --raw_dir data/ucf101/UCF101_raw/ --ann_dir data/ucf101/ucfTrainTestlist/ --out_dir data/ucf101/
  5. (Optional) The generated annotation file is in .txt format. It can be converted to JSON with scripts/

  6. (Optional) Delete the raw videos to save disk space.

    rm data/ucf101/UCF101.rar
    rm -r data/ucf101/UCF101_raw/
HMDB-51

  1. Download the compressed file of raw videos from the official website.

    mkdir -p data/hmdb51/
    wget -O data/hmdb51/HMDB51.rar --no-check-certificate 
  2. Unzip the compressed file.

    unrar e data/hmdb51/HMDB51.rar data/hmdb51/HMDB51_raw/
    for file in data/hmdb51/HMDB51_raw/*.rar; do unrar e ${file} ${file%".rar"}/; done
  3. Download the train/test split file.

    wget -O data/hmdb51/test_train_splits.rar --no-check-certificate
    unrar e data/hmdb51/test_train_splits.rar data/hmdb51/test_train_splits/
  4. Run the preprocessing script (extracting the raw frames takes about 20 minutes).

    python scripts/ --raw_dir data/hmdb51/HMDB51_raw/ --ann_dir data/hmdb51/test_train_splits/ --out_dir data/hmdb51/
  5. (Optional) Delete the raw videos to save disk space.

    rm data/hmdb51/HMDB51.rar
    rm -r data/hmdb51/HMDB51_raw/

Kinetics-400/600

Since the Kinetics-400/600 datasets are relatively large, we do not provide download or preprocessing scripts for them here.

You can follow the same preparation procedure as for UCF-101: the raw frames are extracted from each video file and then stored in a compressed .zip file.

Alternatively, you can implement your own storage backend, similar to src/datasets/backends/ If you do not want to use the zip backend, you can use src/datasets/backends/ We also provide a backend for a mini K400 subset in src/datasets/backends/
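As a rough illustration of what a zip-based storage backend might look like, here is a minimal sketch built only on Python's standard `zipfile` module. The class name `ZipFrameBackend`, its methods, and the `img_00001.jpg` frame naming are assumptions made for this example; they are not taken from the repository's backend code, which may be organized differently.

```python
import zipfile


class ZipFrameBackend:
    """Hypothetical backend that reads encoded frames from a .zip archive."""

    def __init__(self, zip_path):
        self.zip_path = zip_path
        # Opened lazily so that each data-loading worker gets its own handle.
        self._zip = None

    def _archive(self):
        if self._zip is None:
            self._zip = zipfile.ZipFile(self.zip_path, "r")
        return self._zip

    def list_frames(self):
        """Return frame file names sorted by name (assumed temporal order)."""
        return sorted(self._archive().namelist())

    def read(self, name):
        """Return the raw bytes of one frame; decoding is left to the caller."""
        return self._archive().read(name)

    def close(self):
        if self._zip is not None:
            self._zip.close()
            self._zip = None


def write_demo_zip(zip_path, num_frames=3):
    """Create a tiny archive with placeholder frame bytes for the demo."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for i in range(1, num_frames + 1):
            # The frame naming scheme is an assumption for illustration only.
            zf.writestr(f"img_{i:05d}.jpg", f"frame-{i}".encode())
```

A different storage layout (for example, loose image files in a directory) could expose the same `list_frames` and `read` methods, so that the dataset code does not need to know where the frames live.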

CoT Pretraining

To run distributed pretraining of a CoT-2 model on the UCF-101 dataset:

python -m torch.distributed.launch --master_port=<PORT_NUMBER> --nproc_per_node=<GPU_NUMBER> \
    ./tools/ --validate --cfg ./configs/mac2_moco/r2plus1d_18_ucf101/ \
    --work_dir <WORKING_DIRECTORY> --data_dir <DATASET_DIRECTORY> --launcher pytorch

The checkpoints and logs are saved in work_dir. If you do not specify --work_dir as above, the work_dir defined in the configuration file is used. The --validate flag only enables validation on the self-supervised objective. Our models are trained for 300 epochs on UCF-101.
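The work_dir fallback can be sketched as follows, assuming the intended behavior is that a command-line value takes precedence over the one in the configuration file. The parser below only mirrors the flags that appear in the command above; the function names, the defaults, and the dict-style config are hypothetical and are not copied from the repository's tools.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags shown in the launch command (hypothetical parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfg", required=True, help="path to the config file")
    parser.add_argument("--work_dir", default=None, help="output directory")
    parser.add_argument("--data_dir", default=None, help="dataset root")
    parser.add_argument("--validate", action="store_true")
    # The default value here is an assumption made for this sketch.
    parser.add_argument("--launcher", default="none")
    return parser.parse_args(argv)


def resolve_work_dir(args, cfg):
    """Prefer the command-line value; otherwise use the one from the config."""
    if args.work_dir is not None:
        return args.work_dir
    return cfg["work_dir"]


# Stand-in for a loaded configuration file.
example_cfg = {"work_dir": "./output/from_config"}
print(resolve_work_dir(parse_args(["--cfg", "example.py"]), example_cfg))
```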

Action Recognition

After pretraining, you can use the CoT-pretrained model to initialize the action recognizer. The checkpoint path is set by the model.backbone.pretrained key in the configuration file. Our models are trained for 150 epochs on UCF-101.
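To make the key path concrete, the sketch below sets `model.backbone.pretrained` in a nested Python dict. Only the key path comes from the text above; the helper `set_by_dotted_key` and the surrounding config fragment are hypothetical, and the real configuration files may contain many more fields.

```python
import copy


def set_by_dotted_key(cfg, dotted_key, value):
    """Return a copy of a nested dict with the value at dotted_key replaced."""
    new_cfg = copy.deepcopy(cfg)
    node = new_cfg
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate dicts if the path does not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return new_cfg


# Hypothetical config fragment; a real config would hold more settings.
base_cfg = {"model": {"backbone": {"pretrained": None}}}
cfg = set_by_dotted_key(base_cfg, "model.backbone.pretrained", "<MODEL_PATH>")
```

Returning a copy leaves the base configuration untouched, which is convenient when the same base is reused for several checkpoints.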

Model Training (with validation):

python -m torch.distributed.launch --master_port=<PORT_NUMBER> --nproc_per_node=<GPU_NUMBER> \
    ./tools/ --validate --cfg ./configs/mac2_moco/r2plus1d_18_ucf101/ \
    --work_dir <WORKING_DIRECTORY> --data_dir <DATASET_DIRECTORY> --launcher pytorch

Model evaluation:

python ./tools/ --cfg ./configs/mac2_moco/r2plus1d_18_ucf101/ \
    --data_dir <DATASET_DIRECTORY> --progress

Video Retrieval

To run video retrieval on UCF-101 with a CoT-4 model, use the command below:

python ./tools/ --checkpoint <MODEL_PATH> --work_dir <WORKING_DIRECTORY> --data_dir <DATASET_DIRECTORY> \
    --cfg ./configs/mac4_moco/r2plus1d_18_kinetics/
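For readers unfamiliar with the task, video retrieval is commonly scored by nearest-neighbor search over extracted features: each query clip is matched against a gallery, and a query counts as a hit if any of its top-k neighbors shares its class. The sketch below illustrates that general idea in plain Python. It is an assumption about the usual protocol, not a description of this repository's retrieval script, and the function names and toy features are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k):
    """Fraction of queries whose top-k gallery neighbors include their class."""
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        # Rank gallery items from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine(feat, gallery_feats[i]),
            reverse=True,
        )
        if any(gallery_labels[i] == label for i in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Toy 2-D features standing in for clip embeddings.
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery_labels = [0, 1, 1]
query_feats = [[0.9, 0.1], [0.6, 0.5]]
query_labels = [0, 0]
```

With real embeddings, the gallery would typically hold training-set features and the queries test-set features, and a vectorized library would replace the explicit loops.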

Other Methods

We have implementations of CtP[1], BE[2], MemDPC[3], SpeedNet[4], VCOP[5].


This repository is based on CtP. We thank the authors for their contributions.


[1] Wang, G., Zhou, Y., Luo, C., Xie, W., Zeng, W., Xiong, Z.: Unsupervised visual representation learning by tracking patches in video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2563–2572 (2021)
[2] Wang, J., Gao, Y., Li, K., Lin, Y., Ma, A.J., Cheng, H., Peng, P., Huang, F., Ji, R., Sun, X.: Removing the background by adding the background: Towards background robust self-supervised video representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 11804–11813 (June 2021)
[3] Han, T., Xie, W., Zisserman, A.: Memory-augmented dense predictive coding for video representation learning. In: European Conference on Computer Vision (2020)
[4] Benaim, S., Ephrat, A., Lang, O., Mosseri, I., Freeman, W.T., Rubinstein, M., Irani, M., Dekel, T.: Speednet: Learning the speediness in videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9922–9931 (2020)
[5] Xu, D., Xiao, J., Zhao, Z., Shao, J., Xie, D., Zhuang, Y.: Self-supervised spatiotemporal learning via video clip order prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10334–10343 (2019)

