Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation

Code for the paper "Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation" (CVPR 2026).

Tzu Ling Liu, Ian Stavness, Mrigank Rochan
Computer Science, University of Saskatchewan, Canada

Abstract

Video Unsupervised Domain Adaptation (VUDA) poses a significant challenge in action recognition, requiring the adaptation of a model from a labeled source domain to an unlabeled target domain. Despite recent advances, existing VUDA methods often fall short of fully supervised performance, a key reason being the prevalence of static and uninformative backgrounds that exacerbate domain shifts. Additionally, prior approaches largely overlook computational efficiency, limiting real-world adoption. To address these issues, we propose Learnable Motion-Focused Tokenization (LMFT) for VUDA. LMFT tokenizes video frames into patch tokens and learns to discard low-motion, redundant tokens, primarily corresponding to background regions, while retaining motion-rich, action-relevant tokens for adaptation. Extensive experiments on three standard VUDA benchmarks across 21 domain adaptation settings show that our VUDA framework with LMFT achieves state-of-the-art performance while significantly reducing computational overhead. LMFT thus enables VUDA that is both effective and computationally efficient.

Approach Overview

Installation

Clone our repo recursively:

git clone --recursive https://github.com/ywa826/lmft.git
cd lmft

The --recursive flag ensures that all submodules (such as third_party/decord) are cloned along with the main repository.

Environment Setup

Create a conda environment to run LMFT. Our environment specification is provided in environment.yaml. Create and activate it by running:

conda env create --name lmft --file environment.yaml
conda activate lmft

Compile Decord

Our codebase integrates the fast decoding operations from AVION to ensure efficient video processing. We recommend compiling the decord library from source, following the AVION setup documentation. To build the core library, run the following commands:

cd third_party/decord
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make

After compilation, configure the Python environment by installing the bindings:

cd ../python
python3 setup.py install --user

If you run into path or compilation issues, please check the detailed setup instructions in the AVION documentation. As a fallback, you can install the standard prebuilt version with pip install decord. Keep in mind, however, that it may introduce severe data-loading bottlenecks during training.
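Whichever decord build you use, frames are read through its VideoReader class. The sketch below shows one way to decode a fixed number of frames per clip. The uniform segment sampler, the helper names uniform_segment_indices and load_clip, and the default of 16 frames are illustrative assumptions, not the repository's actual dataloader.

```python
from typing import List


def uniform_segment_indices(num_frames: int, num_segments: int) -> List[int]:
    """Pick the middle frame of each of `num_segments` equal-length segments.

    Hypothetical sampling scheme for illustration only; the repository's
    actual frame sampler may differ.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    # int() floors each segment midpoint; min() keeps the last index in range.
    return [min(int(seg_len * i + seg_len / 2), num_frames - 1) for i in range(num_segments)]


def load_clip(video_path: str, num_segments: int = 16):
    """Decode the sampled frames with decord (source-built or pip-installed)."""
    import decord  # imported lazily so the sampler above works without decord

    vr = decord.VideoReader(video_path, ctx=decord.cpu(0))
    indices = uniform_segment_indices(len(vr), num_segments)
    # get_batch returns a decord NDArray of shape (T, H, W, 3); convert to NumPy.
    return vr.get_batch(indices).asnumpy()
```

For example, uniform_segment_indices(32, 4) selects frames [4, 12, 20, 28]. Short videos simply repeat frames rather than failing.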

Datasets

Please download the datasets from their original sources and update ./data/default.yaml with the correct paths to your .txt annotation files. Each line of an annotation file should have the form /path/to/video.mp4,<class_id>.
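Before training, it can help to check that every annotation line matches the /path/to/video.mp4,<class_id> format. The sketch below is a hypothetical validator (parse_annotation_lines and parse_annotation_file are illustrative names, not part of the repository). It splits on the last comma so that paths containing commas still parse.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def parse_annotation_lines(lines: Iterable[str], source: str = "<annotations>") -> List[Tuple[str, int]]:
    """Parse '/path/to/video.mp4,<class_id>' lines into (path, class_id) pairs."""
    samples = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # rpartition splits on the LAST comma, so commas inside the path are kept.
        video_path, sep, class_id = line.rpartition(",")
        if not sep or not video_path:
            raise ValueError(f"{source}:{line_no}: expected '/path/to/video.mp4,<class_id>'")
        samples.append((video_path, int(class_id)))  # int() rejects non-integer labels
    return samples


def parse_annotation_file(txt_path: str) -> List[Tuple[str, int]]:
    """Read and validate one annotation .txt file."""
    return parse_annotation_lines(Path(txt_path).read_text().splitlines(), source=txt_path)
```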

Pseudo Labeling

We use CLIP to pseudo-label the target data during adaptation. Run ./data/pseudo_labelling.py to generate the pseudo labels.

Example command (UCF101):

  python ./data/pseudo_labelling.py \
  --metadata_path /yourpath/UCF101_train.txt \
  --output_pseudo_path /yourpath/UCF101_train_pseudo_labels.txt \
  --output_filtered_gt_path /yourpath/UCF101_train_filtered_gt.txt \
  --dataset_name "hmdb_ucf"
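To illustrate the general idea of CLIP zero-shot pseudo-labeling, the sketch below scores each frame embedding against per-class text embeddings, averages the softmax probabilities over frames, and keeps the label only if its confidence clears a threshold. The frame averaging, the logit scale of 100 (CLIP's usual maximum), the 0.5 threshold, and the function name zero_shot_pseudo_label are all assumptions; the embeddings would come from a CLIP image and text encoder, which is not shown. This is not the exact logic of pseudo_labelling.py.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_pseudo_label(
    frame_embeds: Sequence[Sequence[float]],
    class_text_embeds: Sequence[Sequence[float]],
    logit_scale: float = 100.0,  # assumption: CLIP-style temperature
    threshold: float = 0.5,      # assumption: hypothetical confidence cut-off
) -> Tuple[int, float, Optional[int]]:
    """Return (predicted class, confidence, pseudo label or None if filtered out)."""
    if not frame_embeds:
        raise ValueError("need at least one frame embedding")
    texts = [_normalize(t) for t in class_text_embeds]
    avg_probs = [0.0] * len(texts)
    for f in frame_embeds:
        f = _normalize(f)
        # Cosine similarity between this frame and every class prompt.
        logits = [logit_scale * sum(a * b for a, b in zip(f, t)) for t in texts]
        for c, p in enumerate(_softmax(logits)):
            avg_probs[c] += p / len(frame_embeds)
    pred = max(range(len(texts)), key=avg_probs.__getitem__)
    conf = avg_probs[pred]
    return pred, conf, (pred if conf >= threshold else None)
```

Filtering out low-confidence predictions trades target-domain coverage for cleaner pseudo labels.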

Setup Datasets Path

For convenience, we recommend setting the following environment variables so that you do not have to specify the data paths in the config file.

Example command (HMDB51 → UCF101):

export SOURCE_TRAIN_METADATA=/yourpath/HMDB51_train.txt
export TARGET_TRAIN_METADATA=/yourpath/UCF101_train.txt
export VAL_METADATA=/yourpath/UCF101_val.txt
export TEST_METADATA=/yourpath/UCF101_test.txt
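The sketch below shows one way such variables can take precedence over config defaults. The function resolve_metadata_paths and the fallback order (environment first, then config) are illustrative assumptions. If the configs are Hydra/OmegaConf files, as the experiment=... override syntax suggests, the same effect is usually achieved with OmegaConf's ${oc.env:VAR} resolver.

```python
import os
from typing import Dict, Mapping, Optional

# The four variables exported above.
METADATA_KEYS = ("SOURCE_TRAIN_METADATA", "TARGET_TRAIN_METADATA", "VAL_METADATA", "TEST_METADATA")


def resolve_metadata_paths(
    config_defaults: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Prefer environment variables; fall back to values from the config (hypothetical helper)."""
    env = os.environ if env is None else env
    resolved = {}
    for key in METADATA_KEYS:
        value = env.get(key) or config_defaults.get(key)
        if not value:
            raise KeyError(f"{key} is not set in the environment or the config")
        resolved[key] = value
    return resolved
```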

Pretrained Checkpoints

We provide checkpoints for UCF-HMDB_full. These models are fine-tuned from the pretrained weights released in the original VideoMAE-2 repository.

Model	Adaptation (Source → Target)	Top-1 Accuracy (%)	Pretrained Checkpoint
ViT-B/16	HMDB51 → UCF101	98.6	Download Link
ViT-B/16	UCF101 → HMDB51	94.2	Download Link

Evaluation

To run standard evaluation on HMDB51 → UCF101, use the following command:

# eval on HMDB51 → UCF101
python src/train.py experiment=h_u_val.yaml

Training

Similarly, to train the model, use the following command:

# train on HMDB51 → UCF101
python src/train.py experiment=h_u.yaml

LMFT is enabled by default. To disable it, set model.tokenizer_cfg.drop_policy='none'; the framework then falls back to standard tokenization.
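For intuition only, the sketch below scores each spatial patch by its mean absolute frame-to-frame difference and keeps the highest-motion patches, with a 'none' policy that keeps every token. This is a hand-crafted, non-learnable stand-in: LMFT learns which tokens to drop, and the real tokenizer operates on spatiotemporal ViT tokens. The policy name 'motion', the keep_ratio parameter, and both function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # H x W grayscale intensities


def patch_motion_scores(frames: Sequence[Frame], patch: int) -> List[float]:
    """Mean absolute temporal difference per spatial patch, averaged over frame pairs."""
    h, w = len(frames[0]), len(frames[0][0])
    gh, gw = h // patch, w // patch
    scores = [0.0] * (gh * gw)
    for prev, cur in zip(frames, frames[1:]):
        for gy in range(gh):
            for gx in range(gw):
                diff = 0.0
                for y in range(gy * patch, (gy + 1) * patch):
                    for x in range(gx * patch, (gx + 1) * patch):
                        diff += abs(cur[y][x] - prev[y][x])
                scores[gy * gw + gx] += diff / (patch * patch)
    num_pairs = max(len(frames) - 1, 1)
    return [s / num_pairs for s in scores]


def select_tokens(scores: Sequence[float], keep_ratio: float, drop_policy: str = "motion") -> List[int]:
    """Return indices of kept patch tokens, in their original order."""
    if drop_policy == "none":
        return list(range(len(scores)))  # standard tokenization: keep everything
    if drop_policy != "motion":
        raise ValueError(f"unknown drop_policy: {drop_policy!r}")
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # drop low-motion (often background) patches
```

Because the transformer only processes the kept tokens, dropping static background patches is also what reduces computation.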

Citation

If you find our paper useful, please cite our work:

@inproceedings{liu2026learnable,
  author    = {Liu, Tzu Ling and Stavness, Ian and Rochan, Mrigank},
  title     = {Learnable Motion-Focused Tokenization for Effective and Efficient Video Unsupervised Domain Adaptation},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}
