VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models
This is the official implementation of the ICLR 2024 paper "VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models".
We find that a commonality among various types of dirty samples is visual-linguistic inconsistency between images and their associated labels. To capture this semantic inconsistency between modalities, we propose the versatile data cleanser (VDC), which leverages the strong capabilities of multimodal large language models (MLLMs) in cross-modal alignment and reasoning. It consists of three consecutive modules: the visual question generation module, which generates insightful questions about the image; the visual question answering module, which acquires the semantics of the visual content by answering those questions with an MLLM; and the visual answer evaluation module, which evaluates the inconsistency between the answers and the label. Extensive experiments demonstrate its superior performance and its generalization to various categories and types of dirty samples.
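The three modules described above can be sketched as a small pipeline. This is a minimal illustration, not the code in this repository: the function names, the question templates, the `threshold` value, and the `fake_mllm` stand-in are all assumptions made for the example, and plain string matching is swapped in for the answer evaluation step, which the real pipeline delegates to a language model.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical type: a question paired with the answer we would expect
# if the image really matched its label.
QA = Tuple[str, str]


def generate_questions(label: str) -> List[QA]:
    """Toy stand-in for the visual question generation module."""
    return [
        (f"Is there a {label} in the image?", "yes"),
        (f"Is the main object of the image a {label}?", "yes"),
    ]


def is_consistent(answer: str, expected: str) -> bool:
    """Toy stand-in for the visual answer evaluation module (string match)."""
    return answer.strip().lower().startswith(expected.lower())


def is_dirty(
    image: Any,
    label: str,
    answer_fn: Callable[[Any, str], str],
    threshold: float = 0.5,  # assumed value, chosen only for illustration
) -> bool:
    """Flag a sample as dirty when too few answers agree with its label."""
    questions = generate_questions(label)
    # Visual question answering: answer_fn plays the role of the MLLM.
    votes = [is_consistent(answer_fn(image, q), exp) for q, exp in questions]
    match_rate = sum(votes) / len(votes)
    return match_rate < threshold


def fake_mllm(image: dict, question: str) -> str:
    """Fake MLLM: says yes only if the image's true content is asked about."""
    return "yes" if image["contains"] in question else "no"


cat_image = {"contains": "cat"}
print(is_dirty(cat_image, "cat", fake_mllm))  # label matches the content
print(is_dirty(cat_image, "dog", fake_mllm))  # label contradicts the content
```

In this sketch, swapping `fake_mllm` for a call into a real MLLM is the only change needed to move from the toy setting to actual images.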
git clone https://github.com/zihao-ai/vdc
cd vdc
pip install -r requirements.txt
cd LLMs/LAVIS
pip install -e .
Let's take CIFAR-10 as an example.
Download the poisoned dataset (download link) and put it in the data folder.
Unzip the dataset:
cd data
unzip cifar10_backdoor.zip
The generated questions are provided in the prompts folder.
First, download the pre-trained MLLM checkpoints by following the InstructBLIP documentation. You can also use other MLLMs, such as LLaVA, MiniGPT-4, GPT-4, Qwen, Otter, or LLaMA-Adapter.
Then you can run the following command to answer the questions:
python vqa_bd.py
Replace the API key in LLMs/llm_models/openai_api_pool.py with your own OpenAI API key.
Then you can run the following command to evaluate the answers:
python vae_bd.py
The indices of the selected clean samples will be saved in the results folder.
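Once the clean indices are available, building the cleaned training set amounts to keeping only those positions. The sketch below shows the idea on an in-memory list; the `select_clean` helper and the sample data are hypothetical, and the file format that this repository actually writes to the results folder is not assumed here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_clean(samples: Sequence[T], clean_indices: Sequence[int]) -> List[T]:
    """Keep only the samples whose position appears in clean_indices."""
    keep = sorted(set(clean_indices))  # drop duplicates, preserve dataset order
    return [samples[i] for i in keep]


# Hypothetical (image_id, label) pairs standing in for a poisoned dataset.
dataset = [("img0", "cat"), ("img1", "dog"), ("img2", "ship"), ("img3", "frog")]
clean_indices = [3, 0, 2]  # e.g. indices judged clean by the evaluation step

cleaned = select_clean(dataset, clean_indices)
print(cleaned)
```

With a PyTorch dataset, `torch.utils.data.Subset(dataset, indices)` provides the same kind of index-based selection without copying the data.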
Training the neural network on the original poisoned dataset:
python train/train_on_bd.py
Training the neural network on the cleaned dataset:
python train/train_on_cleaned_bd.py
If you find our work useful, please consider citing us!
@article{zhu2023vdc,
title={VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency},
author={Zhu, Zihao and Zhang, Mingda and Wei, Shaokui and Wu, Bingzhe and Wu, Baoyuan},
journal={arXiv preprint arXiv:2309.16211},
year={2023}
}