tamaki-lab/IV-ViT

1. Joint learning of images and videos with a single Transformer

This is the official repository of the IV-ViT paper "Joint learning of images and videos with a single Vision Transformer".

@inproceedings{Shimizu_MVA2023_IV_VIT,
  author       = {Shuki Shimizu and Toru Tamaki},
  title        = {Joint learning of images and videos with a single Vision Transformer},
  booktitle    = {18th International Conference on Machine Vision and Applications,
                  {MVA} 2023, Hamamatsu, Japan, July 23-25, 2023},
  pages        = {1--6},
  publisher    = {{IEEE}},
  year         = {2023},
  url          = {https://doi.org/10.23919/MVA57639.2023.10215661},
  doi          = {10.23919/MVA57639.2023.10215661},
}

1.1. Preparing

You can train our proposed IV-ViT in various settings with this code. You will need to make the following preparations:

  1. prepare the datasets
  2. prepare the pretrained weights
  3. install the libraries

1.1.1. Datasets

In this code, Tiny-ImageNet and CIFAR100 can be used as image datasets, and UCF101 and mini-Kinetics as video datasets. You need to prepare the datasets under datasets/ with the following directory structure.

datasets/
  ├─Tiny-ImageNet/
  │   └─tiny-imagenet/
  │       ├─train/
  │       │   ├─[category0]/
  │       │   ├─[category1]/
  │       │   ├─...
  │       │
  │       └─val/
  │           ├─[category0]/
  │           ├─[category1]/
  │           ├─...
  │
  ├─CIFAR100/
  │   └─cifar-100-python/
  │
  ├─UCF101/
  │   └─ucfTrainTestlist/
  │       ├─trainlist01.txt
  │       └─testlist01.txt
  │
  └─Kinetics200/
      ├─train/
      │   ├─[category0]/
      │   ├─[category1]/
      │   ├─...
      │
      └─val/
          ├─[category0]/
          ├─[category1]/
          ├─...

1.1.2. Pretrained weights

In this paper, we use multiple sets of pretrained weights. Download them under pretrained_weight/ with the following directory structure.

pretrained_weight/
  ├─ImageNet21k/
  │   └─video_model_imagenet21k_pretrained.pth
  └─Kinetics400/
      └─video_model_kinetics_pretrained.pth

You can download each pretrained weight with the following command.

sh download.sh

1.1.3. Libraries

You can install all libraries required by this code with the following command.

pip install -r requirements.txt

1.2. Quick Start

You can do two things with this code: train models and search hyperparameters.

1.2.1. Training

python main.py --mode train

1.2.2. Searching hyperparameters

python main.py --mode optuna

1.3. Training (Detailed)

We use command-line arguments to manage the experimental setup. The main arguments are described below (see args.py for details).

  • i (int): number of training iterations.
  • bsl (list[int]): batch size for each dataset, in the same order as the dataset list (dn).
  • dn (list[string]): datasets to use, chosen from [Tiny-ImageNet, CIFAR100, UCF101, Kinetics200].
  • model (string): model to train, chosen from [IV-ViT, TokenShift, MSCA, polyvit].
  • pretrain (string): pretrained weights to load, chosen from [Kinetics400, ImageNet-21k, ImageNet-1k, polyvit].
  • use_comet (bool): whether to log the experiment with Comet.
  • root_path (string): root path; you must set this, e.g. ~/data_root/.
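
The options above could be declared roughly as follows. This is a hedged sketch of a parser matching the list, not the actual args.py; defaults and exact flag behavior are assumptions:

```python
import argparse

def build_parser():
    """Sketch of a parser for the main options listed above (not the real args.py)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["train", "optuna"], default="train",
                        help="train a model or search hyperparameters")
    parser.add_argument("-i", type=int, help="number of training iterations")
    parser.add_argument("-bsl", type=int, nargs="+",
                        help="batch size per dataset, same order as -dn")
    parser.add_argument("-dn", nargs="+",
                        choices=["Tiny-ImageNet", "CIFAR100", "UCF101", "Kinetics200"],
                        help="datasets to train on jointly")
    parser.add_argument("--model", default="IV-ViT",
                        choices=["IV-ViT", "TokenShift", "MSCA", "polyvit"])
    parser.add_argument("--pretrain",
                        choices=["Kinetics400", "ImageNet-21k", "ImageNet-1k", "polyvit"])
    parser.add_argument("--use_comet", action="store_true",
                        help="enable Comet logging")
    parser.add_argument("--root_path", help="root path, e.g. ~/data_root/")
    return parser
```

With a `store_true` flag like this, Comet logging is off unless `--use_comet` is passed.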

1.3.1. Example

For example, suppose you want to train with the following settings:

  • iterations: 10000
  • batch sizes
    • Tiny-ImageNet: 16
    • CIFAR100: 16
    • UCF101: 4
    • Kinetics200: 4
  • model: IV-ViT
  • pretrained weights: Kinetics400
  • Comet: not used
  • root path: ~/data_root/

Then execute the following command (omit --use_comet to keep Comet logging disabled):

python main.py -i 10000 -bsl 16 16 4 4 -dn Tiny-ImageNet CIFAR100 UCF101 Kinetics200 --model IV-ViT --pretrain Kinetics400 --root_path ~/data_root/
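
Note that the -bsl values are matched to the -dn datasets by position. A small sketch of that pairing (the variable names here are illustrative, not from this repository):

```python
# Pair each dataset with its batch size, preserving the -dn / -bsl order.
datasets = ["Tiny-ImageNet", "CIFAR100", "UCF101", "Kinetics200"]
batch_sizes = [16, 16, 4, 4]

per_dataset_bs = dict(zip(datasets, batch_sizes))
for name, bs in per_dataset_bs.items():
    print(f"{name}: batch size {bs}")
```

Reordering one list without the other would silently assign the wrong batch size to a dataset, so keep both in the same order.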
