This is the official repository of the IV-ViT paper "Joint learning of images and videos with a single Vision Transformer".
@inproceedings{Shimizu_MVA2023_IV_VIT,
author = {Shuki Shimizu and Toru Tamaki},
title = {Joint learning of images and videos with a single Vision Transformer},
booktitle = {18th International Conference on Machine Vision and Applications,
{MVA} 2023, Hamamatsu, Japan, July 23-25, 2023},
pages = {1--6},
publisher = {{IEEE}},
year = {2023},
url = {https://doi.org/10.23919/MVA57639.2023.10215661},
doi = {10.23919/MVA57639.2023.10215661},
}
You can train our proposed IV-ViT in various settings with this code. You will need to make the following preparations:
- prepare datasets
- prepare pretrained weights
- prepare libraries
In this code, Tiny-ImageNet and CIFAR100 can be used as image datasets, and UCF101 and mini-Kinetics as video datasets.
You need to prepare the datasets under datasets/
with the following directory structure.
datasets/
├─Tiny-ImageNet/
│ └─tiny-imagenet/
│ ├─train/
│ │ ├─[category0]/
│ │ ├─[category1]/
│ │ ├─...
│ │
│ └─val/
│ ├─[category0]/
│ ├─[category1]/
│ ├─...
│
├─CIFAR100/
│ └─cifar-100-python/
│
├─UCF101/
│ └─ucfTrainTestlist/
│ ├─trainlist01.txt
│ └─testlist01.txt
│
└─Kinetics200/
├─train/
│ ├─[category0]/
│ ├─[category1]/
│ ├─...
│
└─val/
├─[category0]/
├─[category1]/
├─...
In this paper, we use multiple pretrained weights.
You will need to download the pretrained weights under pretrained_weight/
with the following directory structure.
pretrained_weight/
├─ImageNet21k/
│ └─video_model_imagenet21k_pretrained.pth
└─Kinetics400/
└─video_model_kinetics_pretrained.pth
You can download each pretrained weight with the following command.
sh download.sh
You can install all libraries required by this code with the following command.
pip install -r requirements.txt
You can do two things with this code: model training and hyperparameter search.
python main.py --mode train
python main.py --mode optuna
We use arguments to manage the experimental setup. The main arguments are described below (see args.py for details).
- i (int): number of training iterations.
- bsl (list[int]): batch size for each dataset; the order must match the datasets given by dn.
- dn (list[string]): datasets, chosen from [Tiny-ImageNet, CIFAR100, UCF101, Kinetics200].
- model (string): model, chosen from [IV-ViT, TokenShift, MSCA, polyvit].
- pretrain (string): pretrained weights, chosen from [Kinetics400, ImageNet-21k, ImageNet-1k, polyvit].
- use_comet (bool): logging to Comet is enabled when --use_comet is given.
- root_path (string): root path (required), e.g. ~/data_root/.
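The options above might be declared roughly as in the sketch below. This is an illustrative guess at the interface, not the actual contents of args.py; the authoritative definitions live there.

```python
import argparse

def build_parser():
    """Hypothetical sketch of the main CLI options (see args.py for the real ones)."""
    p = argparse.ArgumentParser()
    p.add_argument("-i", type=int, default=10000,
                   help="number of training iterations")
    p.add_argument("-bsl", type=int, nargs="+",
                   help="batch size per dataset, in the same order as -dn")
    p.add_argument("-dn", nargs="+",
                   choices=["Tiny-ImageNet", "CIFAR100", "UCF101", "Kinetics200"],
                   help="datasets to train on jointly")
    p.add_argument("--model",
                   choices=["IV-ViT", "TokenShift", "MSCA", "polyvit"],
                   default="IV-ViT")
    p.add_argument("--pretrain",
                   choices=["Kinetics400", "ImageNet-21k", "ImageNet-1k", "polyvit"])
    p.add_argument("--use_comet", action="store_true",
                   help="log to Comet when given")
    p.add_argument("--root_path", required=True)
    return p
```

Note that -bsl and -dn are paired positionally: the first batch size applies to the first dataset, and so on.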
For example, suppose you want to train with the following settings:
- iterations: 10000
- batch size
  - Tiny-ImageNet: 16
  - CIFAR100: 16
  - UCF101: 4
  - Kinetics200: 4
- model: IV-ViT
- pretrained weight: Kinetics400
- Comet: not used
- root path: ~/data_root/
Then, execute the following command (omit --use_comet, since Comet is not used in this example).
python main.py -i 10000 -bsl 16 16 4 4 -dn Tiny-ImageNet CIFAR100 UCF101 Kinetics200 --model IV-ViT --pretrain Kinetics400 --root_path ~/data_root/