Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation
Code for "Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation" at CVPR 2026.
Tzu Ling Liu, Ian Stavness, Mrigank Rochan
Computer Science, University of Saskatchewan, Canada
Clone our repo recursively:
git clone --recursive https://github.com/ywa826/lmft.git
cd lmft
The --recursive option ensures that all submodules are cloned along with the main repository.
Create a conda environment to run LMFT. Our environment is provided in environment.yaml. You can create your own by running:
conda env create --name lmft --file environment.yaml
conda activate lmft
Our codebase integrates the fast decoding operations from AVION to ensure efficient video processing. We encourage users to compile the decord library from source, following the AVION setup documentation. To build the core library, run the following commands:
cd third_party/decord
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
After compilation, configure the Python environment by installing the bindings:
cd ../python
python3 setup.py install --user
If you run into any path or compilation issues, please check the detailed setup instructions. As a fallback, you can install the standard version using pip install decord. However, keep in mind that it may introduce severe data-loading bottlenecks during training.
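After installing the bindings (or the pip fallback), it can help to confirm which copy of decord the active environment will actually import. The snippet below is a minimal sketch using only the standard library; the helper name `locate_module` is hypothetical and not part of this repository.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if absent."""
    # find_spec inspects the import path without importing the module itself
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    path = locate_module("decord")
    if path is None:
        print("decord is not installed in this environment")
    else:
        print(f"decord will be imported from: {path}")
```

A path under your user site-packages suggests the `--user` install is the one being picked up; this check does not by itself distinguish a source build from the pip wheel.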
Please download the datasets from their original sources and update ./data/default.yaml with the correct paths to your .txt annotation files. Each line of an annotation file should have the form /path/to/video.mp4,<class_id>.
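To sanity-check an annotation file before training, the format above can be parsed with a few lines of Python. This is a rough sketch, not the repository's loader: the function names are hypothetical, and it assumes `<class_id>` is an integer.

```python
def parse_annotation_line(line):
    """Split '/path/to/video.mp4,<class_id>' into (path, class_id)."""
    # Split on the LAST comma so that commas inside the video path are tolerated
    path, class_id = line.strip().rsplit(",", 1)
    # Assumption: class ids are integers; int() raises ValueError otherwise
    return path, int(class_id)


def read_annotations(txt_path):
    """Read every non-empty line of an annotation file into a list of tuples."""
    entries = []
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entries.append(parse_annotation_line(line))
    return entries
```

A malformed line (missing comma or non-integer label) raises a `ValueError`, which surfaces formatting problems before they reach the dataloader.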
We use CLIP to pseudo-label the target data during adaptation. Please run ./data/pseudo_labelling.py to generate the pseudo labels.
Example command (UCF101):
python ./data/pseudo_labelling.py \
--metadata_path /yourpath/UCF101_train.txt \
--output_pseudo_path /yourpath/UCF101_train_pseudo_labels.txt \
--output_filtered_gt_path /yourpath/UCF101_train_filtered_gt.txt \
--dataset_name "hmdb_ucf"
For ease of use, we recommend setting the following environment variables to avoid having to specify the data path in the config file.
Example command (HMDB51 → UCF101):
export SOURCE_TRAIN_METADATA=/yourpath/HMDB51_train.txt
export TARGET_TRAIN_METADATA=/yourpath/UCF101_train.txt
export VAL_METADATA=/yourpath/UCF101_val.txt
export TEST_METADATA=/yourpath/UCF101_test.txt
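Before launching a run, you may want to verify that all four variables are set and point to existing files. The following is a minimal sketch under that assumption; the helper `missing_metadata` is hypothetical and not part of the codebase.

```python
import os

# The four variables exported in the example above
REQUIRED_VARS = (
    "SOURCE_TRAIN_METADATA",
    "TARGET_TRAIN_METADATA",
    "VAL_METADATA",
    "TEST_METADATA",
)


def missing_metadata(environ=os.environ):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    for name in REQUIRED_VARS:
        value = environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isfile(value):
            problems.append(f"{name} points to a missing file: {value}")
    return problems


if __name__ == "__main__":
    for problem in missing_metadata():
        print(problem)
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.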
We provide checkpoints for UCF-HMDBfull. These models are finetuned from the original VideoMAE-2 repo.
| Model | Dataset | Top-1 Accuracy | Pretrained Checkpoint |
|---|---|---|---|
| ViT-B/16 | HMDB51 → UCF101 | 98.6 | Download Link |
| ViT-B/16 | UCF101 → HMDB51 | 94.2 | Download Link |
To run standard evaluation on HMDB51 → UCF101, please use the following command:
# eval on HMDB51 → UCF101
python src/train.py experiment=h_u_val.yaml
Similarly, to train the model, use the following command:
# train on HMDB51 → UCF101
python src/train.py experiment=h_u.yaml
LMFT is turned on by default. You can turn it off by setting model.tokenizer_cfg.drop_policy='none', which instructs the framework to use standard tokenization instead.
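As an illustration, the sketch below assembles a training command with that setting appended. It assumes overrides are accepted on the command line in the same `key=value` form as `experiment=h_u.yaml`; that is suggested by the commands above but not confirmed here. The helper `build_train_command` is hypothetical.

```python
import shlex


def build_train_command(experiment, overrides=None):
    """Assemble the argument list for src/train.py with optional key=value overrides."""
    cmd = ["python", "src/train.py", f"experiment={experiment}"]
    # Assumption: each override is passed as a separate key=value argument
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


cmd = build_train_command("h_u.yaml", {"model.tokenizer_cfg.drop_policy": "none"})
# Prints a shell-ready command line; nothing is executed here
print(shlex.join(cmd))
```

Building the command as a list keeps each override a single argument, so it can be handed to `subprocess.run(cmd)` without shell quoting concerns.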
If you find our paper useful, please cite our work:
@inproceedings{liu2026learnable,
author = {Liu, Tzu Ling and Stavness, Ian and Rochan, Mrigank},
title = {Learnable Motion-Focused Tokenization for Effective and Efficient Video
Unsupervised Domain Adaptation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2026}
}
