Visual Spatial Description

  • The VSDv2 dataset is now available.

This repository contains code and data for our paper Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation.

Note: please go into VL-T5/ and follow the README there for pretrained models and feature extraction.

Setup

# Create python environment (optional)
conda create -n vsd python=3.7
source activate vsd

# Install python dependencies
pip install -r requirements.txt

# For captioning evaluation
python -c "import language_evaluation; language_evaluation.download('coco')"
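
The 'coco' download above fetches the resources used by the captioning metrics. Below is a minimal sketch of how the language_evaluation package is typically called; the example sentences are placeholders, not outputs from this repo:

import language_evaluation

# Placeholder prediction/reference pairs; in this repo they come from the generation scripts.
predictions = ["a dog is to the left of a bench"]
references = ["the dog sits to the left of the bench"]

# CocoEvaluator reports BLEU, METEOR, ROUGE_L, and CIDEr, which is what the 'coco' download enables.
evaluator = language_evaluation.CocoEvaluator()
print(evaluator.run_evaluation(predictions, references))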

Code structure

# Store images, features, and annotations
./datasets

# Image feature extraction
./feature_extraction

# Train VL-T5
./VL-T5/
    src/
        modeling_t5.py, modeling_bart.py                      <= VL-T5/VL-BART model classes
        caption_sp.py, vrd_caption.py                         <= fine-tuning
        param.py                                              <= (argparse) configuration
        tokenization.py                                       <= custom tokenizer
        utils.py, dist_utils.py                               <= utility functions
    snap/                                                     <= store weight checkpoints
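
snap/ is populated after training or after downloading the pretrained checkpoints below. A quick way to sanity-check a checkpoint file is a plain torch.load; the path here is a hypothetical example, as actual file names under snap/ depend on the experiment:

import torch

# Hypothetical checkpoint path; actual names under snap/ depend on the run.
ckpt = torch.load("VL-T5/snap/vsd_baseline/BEST.pth", map_location="cpu")

# Checkpoints are typically plain state dicts; list a few parameter names to confirm the load.
print(list(ckpt.keys())[:5])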

Pretrained Models

  • Pretrained VL-BART and VL-T5 checkpoints are provided by the original VL-T5 repository (see Acknowledgement).
  • Download snap/ from Google Drive
gdrive download 1_SBj4sZ0gUqfBon1gFBiNRAmfHv5w_ph --recursive
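
If the gdrive CLI is not available, gdown can usually fetch the same folder; this assumes the ID above is a Google Drive folder ID:

pip install gdown
gdown --folder https://drive.google.com/drive/folders/1_SBj4sZ0gUqfBon1gFBiNRAmfHv5w_ph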

Run

bash ./baseline.sh gpu_num
bash ./end2end.sh gpu_num
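
gpu_num is presumably the number of GPUs to use, as in the upstream VL-T5 scripts; for example, an end-to-end run on 4 GPUs (4 is only an illustrative value) would be:

bash ./end2end.sh 4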

Acknowledgement

This repo is adapted from VLT5.

Reference

Please cite our paper if you use our models or data in your project.

@inproceedings{zhao2022vsd,
  title     = {Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text
               Generation},
  author    = {Yu Zhao and
               Jianguo Wei and
               Zhichao Lin and
               Yueheng Sun and
               Meishan Zhang and
               Min Zhang},
  booktitle = {EMNLP},
  year      = {2022}
}
