Official implementation of CycDA [arXiv]
Author HomePage
Although action recognition has achieved impressive results over recent years, both collection and annotation of video training data are still time-consuming and cost-intensive. Therefore, image-to-video adaptation has been proposed to exploit label-free web image sources for adaptation to unlabeled target videos. This poses two major challenges: (1) spatial domain shift between web images and video frames; (2) modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation. On the one hand, we leverage the joint spatial information in images and videos; on the other hand, we train an independent spatio-temporal model to bridge the modality gap. We alternate between spatial and spatio-temporal learning with knowledge transfer between the two in each cycle. We evaluate our approach on benchmark datasets for image-to-video as well as for mixed-source domain adaptation, achieving state-of-the-art results and demonstrating the benefits of our cyclic adaptation.
- Our experiments run on Python 3.6 and PyTorch 1.7. Other versions should work but are not tested.
- All dependencies can be installed using pip:

```bash
python -m pip install -r requirements.txt
```
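Optionally, a quick sanity check that the environment matches the tested versions:

```python
# Print interpreter and framework versions; the tested setup is
# Python 3.6 with PyTorch 1.7 (see above).
import sys
import torch

print(sys.version.split()[0])      # e.g. 3.6.x
print(torch.__version__)           # e.g. 1.7.x
print(torch.cuda.is_available())   # True if a GPU build is usable
```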
- Image Datasets
- Video Datasets
- Other required data for BU101→UCF101 can be downloaded here:
  - `UCF_mapping.txt`: mapping file of `class_id action_name`
  - `list_BU2UCF_img_new.txt`: list of (resized) image files, `img_path label`
  - `list_ucf_all_train_split1.txt`: list of training videos, `video_path label`
  - `list_ucf_all_val_split1.txt`: list of test videos, `video_path label`
  - After extracting frames of videos:
    - `list_frames_train_split1.txt`: list of frames of training videos, `frame_path label`
    - `list_frames_val_split1.txt`: list of frames of test videos, `frame_path label`
  - `ucf_all_vid_info.npy`: dictionary of `{videoname: (n_frames, label)}`
  - `InceptionI3d_pretrained/rgb_imagenet.pt`: I3D Inception v1 pretrained on the Kinetics dataset
- Data structure

```
DATA_PATH/
    UCF-HMDB_all/
        UCF/
            CLASS_01/
                VIDEO_0001.avi
                VIDEO_0002.avi
                ...
            CLASS_02/
                ...
    ucf_all_imgs/
        all/
            CLASS_01/
                VIDEO_0001/
                    img_00001.jpg
                    img_00002.jpg
                    ...
                VIDEO_0002/
                    ...
    BU101/
        img_00001.jpg
        img_00002.jpg
        ...
```
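The annotation files above are plain-text `path label` lists plus a pickled NumPy dictionary. A minimal sketch of reading them (file names taken from the list above; the loading code actually used in this repo may differ):

```python
import numpy as np

# Read a "path label" list file, e.g. list_ucf_all_train_split1.txt.
def read_list_file(path):
    samples = []
    with open(path) as f:
        for line in f:
            file_path, label = line.rsplit(maxsplit=1)
            samples.append((file_path, int(label)))
    return samples

# ucf_all_vid_info.npy stores a pickled dict {videoname: (n_frames, label)},
# so allow_pickle=True is required.
vid_info = np.load("ucf_all_vid_info.npy", allow_pickle=True).item()
for videoname, (n_frames, label) in vid_info.items():
    print(videoname, n_frames, label)
    break
```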
Here we release the code for training the separate stages on BU101→UCF101. Scripts for a complete cycle are still under construction.
- stage 1 - Class-agnostic spatial alignment

  train the image model, output frame-level pseudo labels

```bash
python bu2ucf_train_stage1.py
```
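For intuition, the frame-level pseudo labels output by stage 1 can be pictured as confidence-thresholded predictions of the trained image model on target frames. A hedged sketch (`image_model`, `frame_loader`, and the threshold are placeholders, not names from this repo):

```python
import torch

# Illustration only: assign a pseudo label to each target frame with the
# trained image model; keep it if the softmax confidence exceeds a threshold.
@torch.no_grad()
def compute_frame_pseudo_labels(image_model, frame_loader, thresh=0.8):
    image_model.eval()
    pseudo = {}  # frame_path -> (label, confidence)
    for frames, paths in frame_loader:
        probs = torch.softmax(image_model(frames), dim=1)
        conf, label = probs.max(dim=1)
        for p, c, l in zip(paths, conf.tolist(), label.tolist()):
            if c >= thresh:
                pseudo[p] = (l, c)
    return pseudo
```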
- stage 3 - Class-aware spatial alignment

  train the image model with video-level pseudo labels from stage 2, output frame-level pseudo labels

```bash
python bu2ucf_train_stage3.py --target_train_vid_ps_label ps_from_stage2.npy
```
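For intuition, here is a hedged sketch of how stage 3 could attach the video-level pseudo labels from stage 2 (the `.npy` file passed via `--target_train_vid_ps_label`) to individual frames for class-aware training; function and variable names are illustrative, not this repo's API:

```python
import numpy as np

# Illustration only: each frame inherits the pseudo label of its video,
# assuming the frame's parent folder is named after the video.
def frames_with_video_pseudo_labels(vid_ps_label_file, frame_list):
    vid_ps = np.load(vid_ps_label_file, allow_pickle=True).item()  # {videoname: label}
    labeled_frames = []
    for frame_path in frame_list:
        videoname = frame_path.split("/")[-2]  # parent folder names the video
        if videoname in vid_ps:
            labeled_frames.append((frame_path, vid_ps[videoname]))
    return labeled_frames
```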
- stage 2 & 4 - Spatio-temporal learning
  - video model training: train the video model with frame-level pseudo labels from the image model
    - specify in the config file (a sketch of these fields follows the training command below):
      - `data_dir`: main data directory
      - `pseudo_gt_dict`: frame-level pseudo labels from image model training
      - `pretrained_model_path`: path of the pretrained model
      - `work_main_dir`: directory for training results and logs
      - `ps_thresh`: confidence threshold p for frame-level thresholding
    - specify in `mmaction/utils/update_config.py`: path of `target_train_vid_info`
    - specify the `--gpu-ids` used for training, then run:
```bash
./tools/dist_train_da.sh configs/recognition/i3d/ucf101/i3d_incep_da_kinetics400_rgb_video_1x64_strid1_test3clip_128d_w_ps_img0.8_targetonly_split1.py 4 --gpu-ids 0 1 2 3 --validate
```
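A minimal sketch of the CycDA-specific fields in such an mmaction2-style Python config; all values below are placeholders, and the actual config files under `configs/recognition/i3d/ucf101/` contain many more settings:

```python
# Sketch of the CycDA-specific fields in the training config.
# All paths and values below are placeholders.
data_dir = '/path/to/DATA_PATH'                   # main data directory
pseudo_gt_dict = 'work_dirs/stage1_frame_ps.npy'  # frame-level pseudo labels from the image model
pretrained_model_path = 'InceptionI3d_pretrained/rgb_imagenet.pt'  # pretrained I3D weights
work_main_dir = 'work_dirs/stage2_video_model'    # directory for training results and logs
ps_thresh = 0.8                                   # confidence threshold p for frame-level thresholding
```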
  - pseudo label computation: compute video-level pseudo labels
    - specify the corresponding fields in the config file as above, then run:

```bash
model_path=your_model_path
ps_name=epoch_20_test1clip
python tools/test_da.py configs/recognition/i3d/ucf101/i3d_incep_da_kinetics400_rgb_video_1x64_strid1_test1clip_128d_split2_compute_ps.py $model_path --out $model_dir/$ps_name.json --eval top_k_accuracy
```
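Conceptually, the video-level pseudo label aggregates per-clip predictions over a video; a minimal sketch of that idea (the actual implementation lives in `tools/test_da.py` and may differ):

```python
import numpy as np

# Illustration only: derive a video-level pseudo label by averaging the
# per-clip class scores of a video and taking the argmax.
def video_pseudo_label(clip_scores):
    """clip_scores: array of shape (n_clips, n_classes) with softmax scores."""
    mean_scores = np.asarray(clip_scores).mean(axis=0)
    label = int(mean_scores.argmax())
    confidence = float(mean_scores.max())
    return label, confidence
```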
If you find our work useful, please consider citing our paper:
```bibtex
@inproceedings{lin2022cycda,
  title={CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video},
  author={Lin, Wei and Kukleva, Anna and Sun, Kunyang and Possegger, Horst and Kuehne, Hilde and Bischof, Horst},
  booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part III},
  pages={698--715},
  year={2022},
  organization={Springer}
}
```
Some code is adapted from mmaction2 and pytorch-i3d.