This is the official repository of the paper "Temporal Cross-attention for Action Recognition", presented at the ACCV2022 Workshop on Vision Transformers: Theory and Applications (VTTA-ACCV2022).
@InProceedings{Hashiguchi_2022_ACCV,
    author    = {Hashiguchi, Ryota and Tamaki, Toru},
    title     = {Temporal Cross-attention for Action Recognition},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV) Workshops},
    month     = {December},
    year      = {2022},
    pages     = {276-288}
}
We thank the authors of TokenShift: MSCA is built upon TokenShift.
Download the ImageNet-22k pretrained weights from Base16.
Prepare the Kinetics-400 dataset organized in the following structure. It is almost the same as TokenShift, with slight modifications; see the config file.
k400
|_ frames331_train
| |_ [category name 0]
| | |_ [video name 0]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |
| | |_ [video name 1]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |_ ...
| |
| |_ [category name 1]
| | |_ [video name 0]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |
| | |_ [video name 1]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |_ ...
| |_ ...
|
|_ frames331_val
| |_ [category name 0]
| | |_ [video name 0]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |
| | |_ [video name 1]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |_ ...
| |
| |_ [category name 1]
| | |_ [video name 0]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |
| | |_ [video name 1]
| | | |_ img_00001.jpg
| | | |_ img_00002.jpg
| | | |_ ...
| | |_ ...
| |_ ...
|
|_ trainValTest
|_ train.txt
|_ val.txt
python main.py --tune_from pretrain/ViT-B_16_Img21.npz --cfg config/custom/kinetics400/k400_attentionshift_div4_8x32_base_224.yml