ywa826/lmft

Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation

Code for "Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation" at CVPR 2026.

Tzu Ling Liu, Ian Stavness, Mrigank Rochan
Computer Science, University of Saskatchewan, Canada

Abstract

Video Unsupervised Domain Adaptation (VUDA) poses a significant challenge in action recognition, requiring the adaptation of a model from a labeled source domain to an unlabeled target domain. Despite recent advances, existing VUDA methods often fall short of fully supervised performance, a key reason being the prevalence of static and uninformative backgrounds that exacerbate domain shifts. Additionally, prior approaches largely overlook computational efficiency, limiting real-world adoption. To address these issues, we propose Learnable Motion-Focused Tokenization (LMFT) for VUDA. LMFT tokenizes video frames into patch tokens and learns to discard low-motion, redundant tokens, primarily corresponding to background regions, while retaining motion-rich, action-relevant tokens for adaptation. Extensive experiments on three standard VUDA benchmarks across 21 domain adaptation settings show that our VUDA framework with LMFT achieves state-of-the-art performance while significantly reducing computational overhead. LMFT thus enables VUDA that is both effective and computationally efficient.

Approach Overview

Installation

Clone our repo recursively:

git clone --recursive https://github.com/ywa826/lmft.git
cd lmft

The --recursive option ensures that all submodules are cloned along with the main repository.

Environment Setup

Create a conda environment to run LMFT. Our environment is provided in environment.yaml. You can create your own by running:

conda env create --name lmft --file environment.yaml
conda activate lmft

Compile Decord

Our codebase integrates the fast decoding operations from AVION for efficient video processing. We recommend compiling the decord library from source, following the AVION setup documentation. To build the core library, run the following commands:

cd third_party/decord
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

After compilation, configure the Python environment by installing the bindings:

cd ../python
python3 setup.py install --user

If you run into path or compilation issues, please check the detailed setup instructions. As a fallback, you can install the standard version with pip install decord, but it may introduce severe data-loading bottlenecks during training.

Datasets

Please download the datasets from their original sources and update ./data/default.yaml with the correct paths to your .txt annotation files. Each line of an annotation file should have the form /path/to/video.mp4,<class_id>.
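Before training, it can help to check that an annotation file follows this format. The snippet below is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that <class_id> is an integer and that each line holds exactly one path,class_id pair.

```python
from typing import List, Tuple


def parse_annotation_line(line: str) -> Tuple[str, int]:
    """Split one '/path/to/video.mp4,<class_id>' line into (path, class_id)."""
    # Split on the last comma so that commas inside the path are tolerated.
    path, sep, class_id = line.strip().rpartition(",")
    if not sep or not path:
        raise ValueError(f"expected 'path,class_id', got: {line!r}")
    # Assumption: class ids are integers; int() raises ValueError otherwise.
    return path, int(class_id)


def load_annotations(text: str) -> List[Tuple[str, int]]:
    """Parse every non-empty line of an annotation file's contents."""
    return [parse_annotation_line(l) for l in text.splitlines() if l.strip()]


if __name__ == "__main__":
    # Hypothetical sample content; replace with open(path).read() for a real file.
    sample = "/data/ucf101/v_Archery_g01_c01.mp4,0\n/data/ucf101/v_Golf_g02_c03.mp4,3\n"
    for path, class_id in load_annotations(sample):
        print(class_id, path)
```

A malformed line raises ValueError, so format problems surface before the dataloader sees them.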

Pseudo Labeling

We use CLIP to pseudo-label the target data during adaptation. Run ./data/pseudo_labelling.py to generate the pseudo labels.

Example command (UCF101):

python ./data/pseudo_labelling.py \
  --metadata_path /yourpath/UCF101_train.txt \
  --output_pseudo_path /yourpath/UCF101_train_pseudo_labels.txt \
  --output_filtered_gt_path /yourpath/UCF101_train_filtered_gt.txt \
  --dataset_name "hmdb_ucf"

Setup Datasets Path

For ease of use, we recommend setting the following environment variables to avoid having to specify the data path in the config file.

Example command (HMDB51 → UCF101):

export SOURCE_TRAIN_METADATA=/yourpath/HMDB51_train.txt
export TARGET_TRAIN_METADATA=/yourpath/UCF101_train.txt
export VAL_METADATA=/yourpath/UCF101_val.txt
export TEST_METADATA=/yourpath/UCF101_test.txt
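A quick check that all four variables are set and point to existing files can catch path mistakes before a run starts. This is a minimal sketch rather than code from the repository: the helper name check_metadata_env is hypothetical, and it assumes each variable should hold the path of a regular file.

```python
import os
from typing import Dict, Mapping

# The four variables exported above.
REQUIRED_VARS = (
    "SOURCE_TRAIN_METADATA",
    "TARGET_TRAIN_METADATA",
    "VAL_METADATA",
    "TEST_METADATA",
)


def check_metadata_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return {variable: problem} for each variable that is unset or has no file behind it."""
    problems = {}
    for name in REQUIRED_VARS:
        value = environ.get(name, "")
        if not value:
            problems[name] = "not set"
        elif not os.path.isfile(value):
            problems[name] = f"file not found: {value}"
    return problems


if __name__ == "__main__":
    # Prints nothing when every variable is set and its file exists.
    for name, problem in check_metadata_env(os.environ).items():
        print(f"{name}: {problem}")
```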

Pretrained Checkpoints

We provide checkpoints for UCF-HMDB_full. These models are fine-tuned from the original VideoMAE-2 repo.

| Model | Dataset | Top-1 Accuracy (%) | Pretrained Checkpoint |
| --- | --- | --- | --- |
| ViT-B/16 | HMDB51 → UCF101 | 98.6 | Download Link |
| ViT-B/16 | UCF101 → HMDB51 | 94.2 | Download Link |

Evaluation

To run standard evaluation on HMDB51 → UCF101, use the following command:

# eval on HMDB51 → UCF101
python src/train.py experiment=h_u_val.yaml 

Training

Similarly, to train the model, use the following command:

# train on HMDB51 → UCF101
python src/train.py experiment=h_u.yaml 

LMFT is turned on by default. You can turn it off by setting model.tokenizer_cfg.drop_policy='none', which instructs the framework to use standard tokenization instead.
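When launching several runs from a script, the override can be appended to the training command programmatically. The sketch below is illustrative only: build_train_command is a hypothetical helper, and it assumes src/train.py accepts additional key=value overrides on the command line in the same way it accepts experiment=h_u.yaml.

```python
import shlex
from typing import List, Mapping


def build_train_command(experiment: str, overrides: Mapping[str, str]) -> List[str]:
    """Return the argv for src/train.py with extra key=value overrides appended."""
    argv = ["python", "src/train.py", f"experiment={experiment}"]
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


if __name__ == "__main__":
    # Disable LMFT token dropping, as described above.
    cmd = build_train_command("h_u.yaml", {"model.tokenizer_cfg.drop_policy": "none"})
    # shlex.join quotes arguments where the shell needs it, so the line can be pasted into a terminal.
    print(shlex.join(cmd))
```

The returned list can also be passed directly to subprocess.run, which avoids shell quoting altogether.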

Citation

If you find our paper useful, please cite our work:

@inproceedings{liu2026learnable,
  author    = {Liu, Tzu Ling and Stavness, Ian and Rochan, Mrigank},
  title     = {Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}
