I2Transformer: Intra- and Inter-relation Embedding Transformer for TV Show Captioning

This package contains the accompanying code for the following paper:

Tu, Yunbin, et al. "I2Transformer: Intra- and Inter-relation Embedding Transformer for TV Show Captioning," which appeared as a regular paper in IEEE Transactions on Image Processing (TIP).

We describe the training procedure as follows:

1. Prepare feature files

Download tvc_feature_release.tar.gz (23GB). After downloading the file, extract it to the data directory.

tar -xf path/to/tvc_feature_release.tar.gz -C data

You should be able to see video_feature under the data/tvc_feature_release directory. It contains video features (ResNet, I3D, ResNet+I3D). Please note that this code only uses the ResNet+I3D features.
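
If you want to sanity-check the download, the features are stored as HDF5 files and can be inspected with h5py. A minimal sketch (the file name and key layout below are assumptions, not taken from this repo):

from __future__ import print_function
import h5py

# Hypothetical file name; check data/tvc_feature_release/video_feature for the actual files.
with h5py.File("data/tvc_feature_release/video_feature/resnet_i3d.h5", "r") as f:
    vid_key = list(f.keys())[0]   # one entry per video/clip id
    feat = f[vid_key][:]          # typically a (num_frames, feature_dim) array
    print(vid_key, feat.shape)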

2. Install dependencies:

  • Ubuntu 16.04
  • Python 2.7
  • PyTorch 1.1.0
  • nltk
  • easydict
  • tqdm
  • h5py
  • tensorboardX
  • An NVIDIA RTX 2080Ti GPU (the card we tested on; a quick import check follows below)
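
A quick way to verify the environment is to import everything once; a small sketch, nothing repo-specific:

from __future__ import print_function
import torch, nltk, easydict, tqdm, h5py, tensorboardX

print("PyTorch:", torch.__version__)               # expect 1.1.0
print("CUDA available:", torch.cuda.is_available())
print("h5py:", h5py.__version__)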

3. Add project root to PYTHONPATH

source setup.sh

Note that you need to do this each time you start a new session.
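
setup.sh presumably just prepends the repo root to PYTHONPATH; if you forget to source it, the same effect can be achieved at the top of a script (a sketch, assuming it runs from the repo root):

import os
import sys

# Prepend the repo root so that the baselines/... modules resolve.
sys.path.insert(0, os.path.abspath("."))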

4. Clone the Microsoft COCO caption evaluation code (used to evaluate the generated captions) and place it under the dir 'I2Transformer/standalone_eval/':

git clone https://github.com/tylin/coco-caption.git
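
For reference, coco-caption exposes its scorers directly; a minimal sketch of scoring a caption against references (the ids and strings are made up, and it assumes the pycocoevalcap package from the cloned repo is importable):

from __future__ import print_function
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Both scorers take dicts mapping an id to a list of tokenized caption strings.
gts = {"clip1": ["a man walks into the room", "a man enters the room"]}  # references
res = {"clip1": ["a man walks into the room"]}                           # prediction

bleu, _ = Bleu(4).compute_score(gts, res)   # returns BLEU@1..4 as a list
cider, _ = Cider().compute_score(gts, res)
print("BLEU@4:", bleu[3], "CIDEr:", cider)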

5. Build Vocabulary

You can skip this step, since the vocabulary file is already provided.

bash baselines/multimodal_transformer/scripts/build_vocab.sh

Running this command builds the vocabulary file cache/tvc_word2idx.json from the TVC train set.
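
The resulting file is a plain word-to-index JSON mapping; a sketch of using it to encode a caption (the <unk> token name is an assumption):

from __future__ import print_function
import json

with open("cache/tvc_word2idx.json") as f:
    word2idx = json.load(f)

# Map a tokenized caption to indices, falling back to the unknown token.
tokens = "a man walks into the room".split()
unk = word2idx.get("<unk>", 0)   # token name and fallback index are assumptions
ids = [word2idx.get(w, unk) for w in tokens]
print(len(word2idx), ids)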

6. I2Transformer training

bash baselines/multimodal_transformer/scripts/train.sh video_sub resnet_i3d

This code loads all the data (~30GB) into RAM to speed up training; use --no_core_driver to disable this behavior.
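
The flag name suggests h5py's in-memory "core" driver, which reads the whole file into RAM on open; a sketch contrasting the two modes (the file name is hypothetical):

import h5py

# In-memory "core" driver: the entire file is read into RAM on open (fast lookups).
f_mem = h5py.File("data/features.h5", "r", driver="core")

# Default driver: data stays on disk and is read lazily (the --no_core_driver behavior).
f_disk = h5py.File("data/features.h5", "r")

f_mem.close()
f_disk.close()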

Training with the above config stops at around epoch 22 and takes about 7 hours on a single 2080Ti GPU. You should get ~47.2 CIDEr and ~11.4 BLEU@4 on the val set. The resulting model and config are saved in a directory of the form baselines/multimodal_transformer/results/video_sub-res-*
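
Since the directory name carries a generated suffix, here is a small sketch for locating the most recent run (the glob pattern comes from the note above):

from __future__ import print_function
import glob
import os

# Pick the most recently modified results directory matching the saved-model pattern.
runs = glob.glob("baselines/multimodal_transformer/results/video_sub-res-*")
latest = max(runs, key=os.path.getmtime)   # raises ValueError if no runs exist yet
print(latest)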

7. I2Transformer inference

After training, you can run inference with the saved model on the val or test_public set:

bash baselines/multimodal_transformer/scripts/translate.sh MODEL_DIR_NAME SPLIT_NAME

MODEL_DIR_NAME is the name of the directory containing the saved model, e.g., video_sub-res-*. SPLIT_NAME can be val or test_public.

8. Our results

The generated captions and evaluation scores on the val and test_public sets are provided in the dir 'our_results'.

Citing

If you find this work helpful for your research, please consider citing:

@article{tu2022i2transformer,
  title={I2Transformer: Intra- and Inter-relation Embedding Transformer for TV Show Captioning},
  author={Tu, Yunbin and Li, Liang and Su, Li and Gao, Shengxiang and Yan, Chenggang and Zha, Zheng-Jun and Yu, Zhengtao and Huang, Qingming},
  journal={IEEE Transactions on Image Processing},
  year={2022},
  publisher={IEEE}
}

Contact

My email is tuyunbin1995@foxmail.com

Any discussions and suggestions are welcome!

Acknowledgement

This work and code are inspired by TVCaption. Thanks for their solid work!
