This study proposes using edge detection to suppress static bias in action recognition. However, introducing edge frames causes a domain shift between edge and RGB information, degrading recognition performance. To address this, we propose MoExDA (Moment Exchange Domain Adaptation), a lightweight domain adaptation method that performs moment exchange inside a Vision Transformer to bridge the gap between RGB and edge information and mitigate the performance degradation.
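The core idea of moment exchange can be illustrated with a minimal sketch. This is not the repository's implementation (see `./model/Moex_Video_Visiontransformer/moex_video_visiontransformer.py` for that); it is a toy example assuming one token is a list of channel values, using PONO-style per-token moments and the `edge_to_rgb` direction:

```python
import math

def moments(token, eps=1e-5):
    """PONO-style moments of one token: mean/std over its channels."""
    mu = sum(token) / len(token)
    var = sum((c - mu) ** 2 for c in token) / len(token)
    return mu, math.sqrt(var + eps)

def exchange_token(rgb_tok, edge_tok):
    """edge_to_rgb: normalize the RGB token, then re-scale it
    with the edge token's moments, bringing the two streams'
    first- and second-order statistics closer together."""
    mu_r, s_r = moments(rgb_tok)
    mu_e, s_e = moments(edge_tok)
    return [((c - mu_r) / s_r) * s_e + mu_e for c in rgb_tok]

rgb = [0.2, 1.0, -0.4, 0.6]   # toy 4-channel tokens
edge = [3.0, 3.5, 2.5, 3.0]
out = exchange_token(rgb, edge)
print(round(moments(out)[0], 4))  # 3.0 -- the RGB token now carries the edge token's mean
```

After the exchange, the RGB token's mean (and, up to `eps`, its std) matches the edge token's, which is the statistic-level bridging the method relies on.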
```shell
conda env create -f environment.yml
```

Use `make_shards.py` (inside the `make_shards` folder) to create dataset shards.
For more details, refer to tamaki-lab/webdataset-video.
```shell
python3 make_shards/make_shards.py \
    -s ./2024_sugimoto_edge/datasets/UCF_shards_train \
    -d /path/to/your/raw_UCF101_train \
    -p UCF101 \
    -w 32 \
    --max_size_gb 1
```

```shell
python3 make_shards/make_shards.py \
    -s ./2024_sugimoto_edge/datasets/UCF_shards_val \
    -d /path/to/your/raw_UCF101_val \
    -p UCF101 \
    -w 32 \
    --max_size_gb 1
```

```shell
python3 make_shards/make_shards.py \
    -s ./2024_sugimoto_edge/datasets/HMDB51_shards_train \
    -d /path/to/your/raw_HMDB51_train \
    -p HMDB51 \
    -w 32 \
    --max_size_gb 0.3
```

```shell
python3 make_shards/make_shards.py \
    -s ./2024_sugimoto_edge/datasets/HMDB51_shards_val \
    -d /path/to/your/raw_HMDB51_val \
    -p HMDB51 \
    -w 32 \
    --max_size_gb 0.1
```

We use a Kinetics subset (Kinetics50) containing only the classes that overlap with Mimetics.
```shell
python3 make_shards/make_mimetics_shards.py \
    -s ./2024_sugimoto_edge/datasets/Kinetics50_shards \
    -d /path/to/your/raw_Kinetics400_train \
    -p Kinetics50 \
    -w 32 \
    --max_size_gb 10
```

```shell
python3 make_shards/make_mimetics_shards.py \
    -s ./2024_sugimoto_edge/datasets/Kinetics50_shards \
    -d /path/to/your/raw_Kinetics400_val \
    -p Kinetics50 \
    -w 32 \
    --max_size_gb 1
```

After completion, organize the dataset into the following structure:
```
2024_sugimoto_edge/
├── datasets/
│   ├── UCF101_shards/
│   │   ├── train/
│   │   └── val/
│   ├── HMDB51_shards/
│   │   ├── train/
│   │   └── val/
│   └── Kinetics50_shards/
│       ├── train/
│       └── val/
```

Refer to `./model/Moex_Video_Visiontransformer/moex_video_visiontransformer.py` for the model implementation.
```
usage: python main_pl.py [-h]
                         [--use_moex]
                         [--moex_layers ML [ML ...]]
                         [--norm_type {in,pono}]
                         [--position_moex {BeforeMHA,AfterMHA,BeforeMLP,AfterMLP,AfterResidual}]
                         [--exchange_direction {edge_to_rgb,rgb_to_edge,bidirectional}]
                         [--stop_gradient {True,False}]
```
- PONO (Positional Normalization). To use PONO for moment calculation:

  ```
  -norm pono
  ```

  See the PONO paper for details.

- IN (Instance Normalization). To use IN for moment calculation:

  ```
  -norm in
  ```

  See the IN paper for details.
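The two options differ only in the axis over which moments are reduced. A minimal sketch, assuming a toy feature map stored as channels × (flattened) spatial positions; only the means are shown, and the standard deviations are computed over the same axes:

```python
def pono_moments(feat):
    """PONO: one mean per spatial position, reduced over channels.
    feat is a list of channels, each a list of spatial positions."""
    n_ch = len(feat)
    return [sum(ch[p] for ch in feat) / n_ch for p in range(len(feat[0]))]

def in_moments(feat):
    """IN: one mean per channel, reduced over spatial positions."""
    return [sum(ch) / len(ch) for ch in feat]

feat = [[1.0, 2.0, 3.0],   # channel 0 over 3 positions
        [3.0, 4.0, 5.0]]   # channel 1
print(pono_moments(feat))  # [2.0, 3.0, 4.0] -> one value per position
print(in_moments(feat))    # [2.0, 4.0]     -> one value per channel
```

So PONO keeps spatial (per-token) structure in the exchanged statistics, while IN keeps per-channel structure.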
Select which layers (out of 12 ViT layers) to insert MoExDA into:

```
-ml 0 1 2        # insert into layers 0-2
-ml 0 1 ... 11   # insert into all layers
```

Select where to insert within the TransformerBlock:

```
-pos_moex AfterMHA   # after the multi-head attention
-pos_moex AfterMLP   # after the MLP
```

Select the direction of moment exchange:

```
-ex_direction edge_to_rgb
-ex_direction rgb_to_edge
-ex_direction bidirectional
```

Select whether to stop gradients through the exchanged moments:

```
-stop_grad False   # without stop gradient
-stop_grad True    # with stop gradient
```

Configuration:
- Moment Calculation Method: PONO
- Exchange Direction: edge_to_rgb
- Position of Moment Exchange: AfterMHA
- Number of Layers: all layers (0-11)
- Stop Gradient: False
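The three `-ex_direction` choices above can be sketched as follows. This is a toy illustration, not the repository's code: tokens are plain lists of channel values, and in the real model `-stop_grad True` would additionally detach the donor stream's moments from the gradient graph:

```python
import math

def moments(tok, eps=1e-5):
    """Per-token (PONO-style) mean/std over channels."""
    mu = sum(tok) / len(tok)
    return mu, math.sqrt(sum((c - mu) ** 2 for c in tok) / len(tok) + eps)

def renorm(tok, src, dst):
    """Strip the token's own moments, apply the other stream's."""
    (mu_s, s_s), (mu_d, s_d) = src, dst
    return [((c - mu_s) / s_s) * s_d + mu_d for c in tok]

def moex(rgb, edge, direction="edge_to_rgb"):
    m_r, m_e = moments(rgb), moments(edge)   # take moments before any overwrite
    if direction in ("edge_to_rgb", "bidirectional"):
        rgb = renorm(rgb, m_r, m_e)          # RGB stream receives edge moments
    if direction in ("rgb_to_edge", "bidirectional"):
        edge = renorm(edge, m_e, m_r)        # edge stream receives RGB moments
    return rgb, edge

rgb, edge = [0.0, 1.0, 2.0], [10.0, 11.0, 12.0]
r2, e2 = moex(rgb, edge, "bidirectional")
print(round(sum(r2) / len(r2), 3), round(sum(e2) / len(e2), 3))  # 11.0 1.0
```

With `bidirectional`, the two streams swap their statistics; with a single direction, only the receiving stream is re-normalized and the donor passes through unchanged.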
```shell
python main_pl.py \
    -d Mimetics_wds \
    --shards_path ./datasets/Kinetics50_shards \
    -w 8 -b 2 -e 10 -lr 3e-4 \
    --optimizer SGD \
    -m Moexlayervit \
    --log_interval_steps 10 \
    --scheduler CosineAnnealingLR \
    --use_moex \
    -norm pono \
    -ex_direction edge_to_rgb \
    -pos_moex AfterMHA \
    -ml 0 1 2 3 4 5 6 7 8 9 10 11 \
    -stop_grad False \
    --use_pretrained
```
You can download the trained weights for this configuration from:
MoExDA_PONO_AfterMHA_All_layers_False.ckpt
Use the --use_pretrained option to load the downloaded .ckpt file.
We use a static bias evaluation dataset generated with the HAT toolkit. Download the dataset from princetonvisualai/HAT.
To enable static bias evaluation, simply add --use_hat to the example command.
```shell
python main_pl.py \
    -d Mimetics_wds \
    --shards_path ./datasets/Kinetics50_shards \
    -w 8 -b 2 -e 10 -lr 3e-4 \
    --optimizer SGD \
    -m Moexlayervit \
    --log_interval_steps 10 \
    --scheduler CosineAnnealingLR \
    --use_moex \
    -norm pono \
    -ex_direction edge_to_rgb \
    -pos_moex AfterMHA \
    -ml 0 1 2 3 4 5 6 7 8 9 10 11 \
    -stop_grad False \
    --use_pretrained \
    --use_hat
```
```bibtex
@inproceedings{sugimoto2025moexda,
  author    = {Takuya Sugimoto and Ning Ding and Toru Tamaki},
  title     = {MoExDA: Domain Adaptation for Edge-based Action Recognition},
  booktitle = {Proceedings of the 19th International Conference on Machine Vision Applications (MVA 2025)},
  year      = {2025},
  month     = jul,
  day       = {26--28},
  address   = {Kyoto, Japan},
  note      = {Oral presentation (O2-1-2)},
}
```
