AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
The repository for AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness.
To set up the base environment, please refer to LLaVA.
Install other requirements by:
pip install -r requirements.txt
For the data used in our paper, please refer to MAMI, HarM, and FHM. To erase texts from images, please refer to OCR-SAM. The data directory should be organized as follows:
├── data
│   └── sampled_data
│       └── image
│           ├── ori
│           └── erased
├── results
└── scripts
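The layout above can be created with a few lines of Python (a convenience sketch; the directory names are taken from the tree above, and nothing in the repo requires you to create them this way):

```python
from pathlib import Path

# Create the expected data/results/scripts layout shown in the tree above.
# "ori" holds the original meme images; "erased" holds the text-erased
# versions produced with OCR-SAM.
for d in [
    "data/sampled_data/image/ori",
    "data/sampled_data/image/erased",
    "results",
    "scripts",
]:
    Path(d).mkdir(parents=True, exist_ok=True)
```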
Run harmfulness mining by:
cd scripts
python mining.py
First, generate the misbelief statements and reference answers by:
python gen_misb.py
Run model scoring by:
python scoring.py --exp_name exp_name --model_name model_name
Run iterative refinement by:
python refinement.py --exp_name exp_name --model_name model_name
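Both `scoring.py` and `refinement.py` accept `--exp_name` and `--model_name`. Their argument handling can be sketched as follows (a minimal illustration with assumed help strings, not the repo's actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Minimal sketch of the CLI shared by scoring.py and refinement.py:
    # both scripts take an experiment name and the model under evaluation.
    parser = argparse.ArgumentParser(description="AdamMeme evaluation step")
    parser.add_argument("--exp_name", required=True,
                        help="experiment name (used to tag result files)")
    parser.add_argument("--model_name", required=True,
                        help="multimodal LLM being probed")
    return parser

if __name__ == "__main__":
    # Example invocation with hypothetical values:
    args = build_parser().parse_args(
        ["--exp_name", "demo", "--model_name", "llava"]
    )
    print(args.exp_name, args.model_name)
```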