
VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models

Website | Paper | Slides | Poster

This is the official implementation of ICLR 2024 paper "VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models".

Overview

We find that a common trait of various dirty samples is visual-linguistic inconsistency between images and their associated labels. To capture this semantic inconsistency between modalities, we propose the Versatile Data Cleanser (VDC), which leverages the strong capabilities of multimodal large language models (MLLMs) in cross-modal alignment and reasoning. VDC consists of three consecutive modules: a visual question generation module that produces insightful questions about the image; a visual question answering module that acquires the semantics of the visual content by answering those questions with an MLLM; and a visual answer evaluation module that assesses the inconsistency. Extensive experiments demonstrate its superior performance and its generalization to various categories and types of dirty samples.
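The three-module pipeline above can be sketched in pure Python. This is a minimal illustration only: the function names are hypothetical, not the repository's API, and a generic `mllm_answer` callable stands in for a real MLLM.

```python
# Minimal sketch of the VDC pipeline: VQG -> VQA -> VAE.
# All names are illustrative; the repository's actual interface differs.

def generate_questions(label):
    """Visual question generation: ask about the labeled object."""
    return [f"Is there a {label} in the image?"]

def answer_questions(image, questions, mllm_answer):
    """Visual question answering: query the MLLM for each question."""
    return [mllm_answer(image, q) for q in questions]

def evaluate_answers(answers, expected="yes"):
    """Visual answer evaluation: fraction of answers consistent with the label."""
    matches = sum(a.strip().lower() == expected for a in answers)
    return matches / len(answers)

def is_clean(image, label, mllm_answer, threshold=0.5):
    """A sample is kept if its answers are sufficiently consistent with its label."""
    questions = generate_questions(label)
    answers = answer_questions(image, questions, mllm_answer)
    return evaluate_answers(answers) >= threshold
```

A sample whose image actually shows the labeled object yields consistent "yes" answers and is kept; a mislabeled or poisoned sample yields inconsistent answers and is flagged.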

Installation

git clone https://github.com/zihao-ai/vdc
cd vdc
pip install -r requirements.txt
cd LLMs/LAVIS
pip install -e .

Usage

Detect Poisoned Samples

Let's take CIFAR-10 as an example.

1. Data Preparation

Download the poisoned dataset (download link) and put it in the data folder.

Unzip the dataset:

cd data
unzip cifar10_backdoor.zip

2. Visual Question Generation

The generated questions have been provided in the prompts folder.
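The exact file format under prompts is the repository's own; purely as an illustration, label-conditioned questions for the ten CIFAR-10 classes might be built like this (hypothetical helper, not the repo's generator):

```python
# Hypothetical question builder for CIFAR-10 labels (illustration only;
# the repository ships precomputed questions in the prompts folder).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def questions_for_label(label):
    """Return simple existence/content questions about the labeled object."""
    return [
        f"Is there a {label} in the image?",
        "What is the main object in the image?",
    ]

# One question list per class label.
questions = {c: questions_for_label(c) for c in CIFAR10_CLASSES}
```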

3. Visual Question Answering

You should first download the pre-trained MLLM checkpoints following the InstructBLIP documentation. You can also choose other MLLMs, such as LLaVA, MiniGPT-4, GPT-4, Qwen, Otter, LLaMA-Adapter, etc.

Then you can run the following command to answer the questions:

python vqa_bd.py
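The script queries the chosen MLLM for every (image, question) pair. The loop structure can be sketched independently of any specific model by passing the model as a callable; this is a simplified illustration, not the script's real interface.

```python
def run_vqa(dataset, questions_by_label, mllm_answer):
    """Answer every question for every sample.

    dataset: iterable of (index, image, label) tuples.
    questions_by_label: {label: [question, ...]}.
    mllm_answer: callable(image, question) -> answer string
                 (e.g. a wrapper around InstructBLIP generation).
    Returns {index: [answer, ...]}.
    """
    answers = {}
    for idx, image, label in dataset:
        qs = questions_by_label[label]
        answers[idx] = [mllm_answer(image, q) for q in qs]
    return answers
```

Swapping the MLLM then only means swapping the `mllm_answer` callable; the per-sample loop is unchanged.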

4. Visual Answer Evaluation

Replace the API key in LLMs/llm_models/openai_api_pool.py with your own OpenAI API key.

Then you can run the following command to evaluate the answers:

python vae_bd.py

The indices of selected clean samples will be saved in the results folder.
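The evaluation step compares each answer against the answer implied by the sample's label and keeps samples whose answers are consistent. A hedged pure-Python sketch of the selection logic (the actual script judges consistency with an LLM through the OpenAI API; a simple string match stands in here):

```python
def select_clean_indices(answers, expected_by_index, threshold=0.5):
    """Keep sample indices whose answers agree with the label often enough.

    answers: {index: [answer, ...]} from the VQA step.
    expected_by_index: {index: expected answer string implied by the label}.
    Returns the sorted list of clean sample indices.
    """
    clean = []
    for idx, ans in answers.items():
        expected = expected_by_index[idx].strip().lower()
        agree = sum(a.strip().lower() == expected for a in ans)
        if agree / len(ans) >= threshold:
            clean.append(idx)
    return sorted(clean)
```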

5. Training the Neural Network

Training the neural network on the original poisoned dataset:

python train/train_on_bd.py

Training the neural network on the cleaned dataset:

python train/train_on_cleaned_bd.py
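The cleaned-data run differs from the first command only in which samples the model sees: training is restricted to the saved clean indices. The subsetting idea in plain Python (illustrative; the repository's scripts work with PyTorch datasets, where `torch.utils.data.Subset` plays this role):

```python
class Subset:
    """Wrap a dataset, exposing only the samples at the given indices.

    Mirrors the behavior of torch.utils.data.Subset, so training code
    that iterates over the dataset needs no other change.
    """

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.dataset[self.indices[i]]
```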

Citation

If you find our work useful, please consider citing us!

@article{zhu2023vdc,
  title={VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency},
  author={Zhu, Zihao and Zhang, Mingda and Wei, Shaokui and Wu, Bingzhe and Wu, Baoyuan},
  journal={arXiv preprint arXiv:2309.16211},
  year={2023}
}
