Self-Supervised Cross-View Representation Reconstruction for Change Captioning

This package contains the accompanying code for the following paper:

Tu, Yunbin, et al. "Self-Supervised Cross-View Representation Reconstruction for Change Captioning," published as a regular paper at ICCV 2023.

The setup, training, and evaluation steps are described below.

Installation

Clone this repository
cd SCORER
Create a virtual environment with Python 3.8
Install the requirements (pip install -r requirements.txt)
Set up the COCO caption eval tools (github)
Hardware: an NVIDIA 3090 GPU or another GPU
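
Before moving on, it can help to confirm that the interpreter and key packages are in place. The following is a minimal sketch and not part of the repository: the function name check_environment is hypothetical, Python 3.8 is treated as a minimum version, and the package names (torch, h5py, visdom) are assumptions about what requirements.txt installs, so adjust them to match the actual file.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8), packages=("torch", "h5py", "visdom")):
    """Return a list of problems; an empty list means every check passed.

    The default package names are assumptions, not taken from requirements.txt.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d or newer is expected, found %d.%d"
            % (tuple(min_python) + tuple(sys.version_info[:2]))
        )
    for name in packages:
        # find_spec only locates a top-level package; it does not import it.
        if importlib.util.find_spec(name) is None:
            problems.append("package not installed: " + name)
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```

Run it inside the activated virtual environment; any reported line points at a step in the list above that may need repeating.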

Data

Download the data from Google Drive using the commands below:

python google_drive.py 1HJ3gWjaUJykEckyb2M0MB4HnrJSihjVe clevr_change.tar.gz
tar -xzvf clevr_change.tar.gz

Extracting this file creates a data directory and populates it with the CLEVR-Change dataset.
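
A quick way to confirm that the extraction worked is to check for the paths that the later commands in this README read from. This is only a sketch: the helper name missing_entries is hypothetical, and the assumption that the archive provides exactly these entries is inferred from the preprocessing commands rather than confirmed.

```python
from pathlib import Path

# Paths referenced by the preprocessing commands in this README.
# Assumption: the extracted archive is expected to provide all of them.
EXPECTED = (
    "images",
    "sc_images",
    "nsc_images",
    "change_captions.json",
    "no_change_captions.json",
    "splits.json",
)


def missing_entries(data_dir="./data", expected=EXPECTED):
    """Return the expected entries that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    print("all expected entries found" if not missing else "missing: " + ", ".join(missing))
```

An empty result suggests the data directory is ready for the preprocessing steps.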

Preprocess data

We provide the preprocessed data on Google Drive. To skip the preprocessing steps described below, download it with the following commands:

python google_drive.py 1FA9mYGIoQ_DvprP6rtdEve921UXewSGF ./data/clevr_change_features.tar.gz
cd data
tar -xzvf clevr_change_features.tar.gz

Otherwise, extract visual features using an ImageNet-pretrained ResNet-101:

# processing default images
python scripts/extract_features.py --input_image_dir ./data/images --output_dir ./data/features --batch_size 128

# processing semantically changed images
python scripts/extract_features.py --input_image_dir ./data/sc_images --output_dir ./data/sc_features --batch_size 128

# processing distractor images
python scripts/extract_features.py --input_image_dir ./data/nsc_images --output_dir ./data/nsc_features --batch_size 128
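
The three invocations above differ only in their input and output directories, so they can be generated in a loop. The sketch below builds the same argument lists and prints them as a dry run; the function name extraction_commands and the IMAGE_DIRS mapping are hypothetical helpers, while the script path and flags are copied from the commands above.

```python
import shlex
import sys

# Input image directory -> output feature directory, as used in the commands above.
IMAGE_DIRS = {
    "images": "features",
    "sc_images": "sc_features",
    "nsc_images": "nsc_features",
}


def extraction_commands(data_root="./data", batch_size=128):
    """Build one argument list per image directory (nothing is executed here)."""
    commands = []
    for src, dst in IMAGE_DIRS.items():
        commands.append([
            sys.executable, "scripts/extract_features.py",
            "--input_image_dir", f"{data_root}/{src}",
            "--output_dir", f"{data_root}/{dst}",
            "--batch_size", str(batch_size),
        ])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of launching them.
    for cmd in extraction_commands():
        print(shlex.join(cmd))
```

To actually run the extraction, each list could be passed to subprocess.run(cmd, check=True) from the repository root. A smaller batch_size may be needed on GPUs with less memory.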

Build vocab and label files using caption annotations:

python scripts/preprocess_captions_transformer.py --input_captions_json ./data/change_captions.json --input_neg_captions_json ./data/no_change_captions.json --input_image_dir ./data/images --split_json ./data/splits.json --output_vocab_json ./data/transformer_vocab.json --output_h5 ./data/transformer_labels.h5
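
To give a rough idea of what building a vocabulary and label sequences involves, here is a simplified sketch. It is not the repository's preprocess_captions_transformer.py: the caption layout (a mapping from image name to a list of caption strings), the special token names, the tokenizer, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical special tokens; the real script may use different names or ids.
SPECIAL_TOKENS = ("<NULL>", "<UNK>", "<START>", "<END>")


def tokenize(caption):
    """Lowercase, drop simple punctuation, and split on whitespace."""
    return caption.lower().replace(".", " ").replace(",", " ").split()


def build_vocab(captions_by_image, min_count=1):
    """Map each sufficiently frequent token to an integer id."""
    counts = Counter(
        tok
        for captions in captions_by_image.values()
        for caption in captions
        for tok in tokenize(caption)
    )
    vocab = {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS)}
    for tok in sorted(counts):  # sorted for a reproducible id assignment
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Turn a caption into token ids wrapped in <START> and <END>."""
    unk = vocab["<UNK>"]
    ids = [vocab["<START>"]]
    ids.extend(vocab.get(tok, unk) for tok in tokenize(caption))
    ids.append(vocab["<END>"])
    return ids


if __name__ == "__main__":
    toy = {"a.png": ["The cube moved."], "b.png": ["the sphere changed color"]}
    vocab = build_vocab(toy)
    print(encode("the cube vanished", vocab))
```

In the actual pipeline, the encoded captions are written to an HDF5 label file and the vocabulary to a JSON file, as the output flags in the command above indicate.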

Training

To train the proposed method, run the following commands:

# create a directory or a symlink to save the experiments logs/snapshots etc.
mkdir experiments
# OR
ln -s $PATH_TO_DIR$ experiments

# this will start the visdom server for logging
# start the server on a tmux session since the server needs to be up during training
python -m visdom.server

# start training
python train.py --cfg configs/dynamic/transformer.yaml

Testing/Inference

To run inference on the test dataset, run the following command:

python test.py --cfg configs/dynamic/transformer.yaml --snapshot 10000 --gpu 1

The command above loads the model snapshot saved at iteration 10000 and runs inference on the GPU with ID 1.

Evaluation

Caption evaluation

Run the following command to evaluate the generated captions:

# This will run evaluation on the results generated from the validation set and print the best results
python evaluate.py --results_dir ./experiments/SCORER+CBR/eval_sents --anno ./data/total_change_captions_reformat.json --type_file ./data/type_mapping.json

Once the best model is found on the validation set, run inference on the test set for that model using the command explained in the Testing/Inference section, and then evaluate on the test set:

python evaluate.py --results_dir ./experiments/SCORER+CBR/test_output/captions --anno ./data/total_change_captions_reformat.json --type_file ./data/type_mapping.json

The results are saved in ./experiments/SCORER+CBR/test_output/captions/eval_results.txt

If you find this work helpful for your research, please consider citing:

@inproceedings{tu2023self,
  title={Self-supervised Cross-view Representation Reconstruction for Change Captioning},
  author={Tu, Yunbin and Li, Liang and Su, Li and Zha, Zheng-Jun and Yan, Chenggang and Huang, Qingming},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={2805--2815},
  year={2023}
}

Contact

My email is tuyunbin1995@foxmail.com

Any discussions and suggestions are welcome!
