
Transitional Adaptation of Pretrained Models for Visual Storytelling (TAPM)

  • Authors: Youngjae Yu*, Jiwan Chung*, Heeseung Yun, Jongseok Kim, Gunhee Kim
  • Paper: CVPR 2021

Introduction

PyTorch code for the CVPR 2021 paper "Transitional Adaptation of Pretrained Models for Visual Storytelling".

We propose an explicit visual adaptation step to harmonize the visual encoder with the pretrained language models. Our simple adaptation objective bridges the gap between the information captured by the visual encoder and the input expected by the language decoder.

(Figure: model architecture)

Requirements

  • Python 3.7
  • PyTorch 1.5

The other dependencies are specified in the requirements.txt file.

Installation

git clone $THIS_REPO
cd $THIS_REPO
pip install -r requirements_primary.txt
pip install -r requirements.txt

Then download the stanfordnlp English model:

python -c "import stanfordnlp; stanfordnlp.download('en_ewt')"

Data Preparation

Store the datasets in $THIS_REPO/data, e.g. data/LSMDC and data/VIST.

For detailed instructions on how to extract the relevant features, please refer to our guide on Dataset Preparation.

LSMDC 2019

Please follow the instructions on the Download page to download the dataset.

Text

From the downloaded files, extract the task1 folder and move it under the $THIS_REPO/data/LSMDC directory.

Features

The above link contains two sets of features: I3D and ResNet-152. Extract both and move them under the $THIS_REPO/data/LSMDC/features directory.

VIST

Please follow the instructions on the Download page to download the dataset.

Text

Download the Stories of Images-in-Sequence (SIS) set, extract it, and move the folder under the $THIS_REPO/data/VIST directory, e.g. data/VIST/sis.

Features

The above link contains the raw image files.

Images

Use ResNet-152 pretrained on ImageNet to extract features for each image. Store the features with numpy.save following the structure below (a sketch follows the listing).

resnet/
  train/
    {image_id}.npy
  test/
  val/
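
A minimal extraction sketch, assuming torchvision and one folder of .jpg images per split; the pooled 2048-d feature choice, the glob pattern, and the data/VIST/... paths are illustrative assumptions rather than the authors' exact pipeline.

import os
from pathlib import Path

import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# ResNet-152 pretrained on ImageNet; drop the classifier to keep the
# 2048-d globally pooled feature.
resnet = models.resnet152(pretrained=True).to(device).eval()
extractor = torch.nn.Sequential(*list(resnet.children())[:-1])

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_split(image_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for path in Path(image_dir).glob('*.jpg'):  # assumed image extension
        img = preprocess(Image.open(path).convert('RGB')).unsqueeze(0).to(device)
        with torch.no_grad():
            feat = extractor(img).squeeze().cpu().numpy()  # shape (2048,)
        np.save(os.path.join(out_dir, path.stem + '.npy'), feat)

for split in ('train', 'val', 'test'):
    extract_split('data/VIST/images/' + split, 'data/VIST/resnet/' + split)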

Box

Use a Faster R-CNN model to extract object classification logits. Store the features with numpy.save following the structure below (a sketch follows the listing).

rcnn/
  train/
    {image_id}.npy
  test/
  val/
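
A hedged sketch of one way to capture such logits, using torchvision's COCO-trained Faster R-CNN and a forward hook on its box predictor; the original features were likely produced by a different detector, so treat this purely as an illustration of the save format.

import numpy as np
import torch
import torchvision

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.to(device).eval()

captured = {}

def grab_logits(module, inputs, output):
    # output is (class_logits, box_regression) over the region proposals
    captured['logits'] = output[0].detach().cpu().numpy()

model.roi_heads.box_predictor.register_forward_hook(grab_logits)

def save_box_logits(image_tensor, out_path):
    # image_tensor: float tensor of shape (3, H, W) scaled to [0, 1]
    with torch.no_grad():
        model([image_tensor.to(device)])
    np.save(out_path, captured['logits'])  # (num_proposals, num_classes)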

ViLBERT

Use a ViLBERT model to extract the last hidden state vector. Store the features with pickle.dump following the structure below (a save-layout sketch follows the listing).

vilbert/
  train/
    {album_id}/
      {image_id}.pickle
  test/
  val/
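
The ViLBERT forward pass itself is beyond this sketch; the save_vilbert_feature helper below is hypothetical and shows only how the pickle.dump layout mirrors the structure above.

import os
import pickle

def save_vilbert_feature(split, album_id, image_id, hidden_state):
    # hidden_state: last hidden state vector from your ViLBERT extraction
    out_dir = os.path.join('data/VIST/vilbert', split, str(album_id))
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, str(image_id) + '.pickle'), 'wb') as f:
        pickle.dump(hidden_state, f)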

Train

LSMDC 2019

cd code
python cli.py train with model=no_gt_sos fix_gpt_epoch=5 feature_names="['video', 'images']"

VIST

python cli.py train with model=no_gt_sos fix_gpt_epoch=3 feature_names="['images', 'box']" use_vist=True

With additional ViLBERT features:

cd code
python cli.py train with model=no_gt_sos fix_gpt_epoch=3 feature_names="['images', 'box', 'vilbert']" use_vist=True

Run Scripts

python cli.py scripts with script=[SCRIPT_NAME] (additional args)

Please take a look at the config.py file for more options.
