# Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
You can find the dataset on Hugging Face. `train.json` and `test.json` are the metadata of VLGuard, and the images are in `train.zip` and `test.zip`.
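
As a starting point, here is a minimal sketch for inspecting the downloaded metadata. The file paths are assumptions (adjust them to wherever you saved `train.json` and extracted `train.zip`), and the sketch deliberately prints the field names rather than assuming a particular JSON schema:

```python
# Minimal sketch: inspect the VLGuard metadata after downloading from
# Hugging Face. Paths below are assumptions about your local layout.
import json
from pathlib import Path

META_PATH = Path("train.json")  # VLGuard training metadata
IMAGE_DIR = Path("train")       # directory extracted from train.zip (assumed)

with META_PATH.open() as f:
    records = json.load(f)

print(f"{len(records)} training records")

# Print the field names of the first record to discover the schema,
# rather than hard-coding particular keys.
first = records[0]
print("fields:", sorted(first.keys()))
```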
To fine-tune LLaVA or MiniGPT-v2, first run `python convert_to_llava_format.py` to convert VLGuard to the LLaVA data format, then follow their fine-tuning scripts; a sketch of the conversion step follows below.
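
For intuition, here is a hedged sketch of what such a conversion produces. The target structure (a list of `{"id", "image", "conversations"}` records with alternating `"human"`/`"gpt"` turns and an `<image>` token in the first human turn) follows LLaVA's documented fine-tuning format; the source field names used here (instruction, response) are assumptions, so consult `convert_to_llava_format.py` for the actual schema:

```python
# Hedged sketch of converting one VLGuard instruction-response pair into
# LLaVA's fine-tuning format. Not the repo's actual script.
import json

def to_llava_record(example_id, image_path, instruction, response):
    """Wrap one instruction-response pair in LLaVA's conversation format."""
    return {
        "id": example_id,
        "image": image_path,
        "conversations": [
            # LLaVA expects the <image> placeholder in the first human turn.
            {"from": "human", "value": f"<image>\n{instruction}"},
            {"from": "gpt", "value": response},
        ],
    }

if __name__ == "__main__":
    record = to_llava_record(
        "vlguard_0",
        "train/example.jpg",                    # hypothetical image path
        "Is the activity in this image safe?",  # hypothetical instruction
        "No. The image shows ...",              # hypothetical response
    )
    print(json.dumps(record, indent=2))
```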
## Citation

If you find VLGuard useful, please cite:

@article{zong2023safety,
  title={Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models},
  author={Zong, Yongshuo and Bohdal, Ondrej and Yu, Tingyang and Yang, Yongxin and Hospedales, Timothy},
  journal={arXiv preprint arXiv:2402.02207},
  year={2024}
}