
Less is More


Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective


Selective EOS Supervision

Training

Follow the instructions of LLaVA to prepare the environment, the data (LLaVA-Instruction-150K), and the pretrained models (e.g., LLaVA-1.5-7b).

Train the model with Selective EOS Supervision. The default configuration trains the llava-1.5-7b model on Detail23k for one epoch.

```bash
cd LLaVA
bash scripts/v1_5/selective_eos_finetune.sh
```

The main modifications to the original LLaVA code for Selective EOS Supervision are detailed in ./docs/selective-eos-supervision.md.
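
For intuition, the following is a minimal sketch of the general mechanism, assuming Selective EOS Supervision amounts to keeping the language-modeling loss only at selected token positions; the actual selection rule for EOS tokens is described in the document above and in the paper, and `selective_lm_loss` / `keep_mask` are illustrative names rather than the repo's API.

```python
import torch
import torch.nn.functional as F


def selective_lm_loss(logits: torch.Tensor, labels: torch.Tensor,
                      keep_mask: torch.Tensor, ignore_index: int = -100):
    """logits: (B, T, V); labels: (B, T); keep_mask: (B, T) bool, True at
    positions whose loss should be kept."""
    # Standard causal-LM shift: position t predicts token t + 1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    shift_keep = keep_mask[:, 1:].contiguous()

    # Drop supervision at de-selected positions by marking them as ignore_index.
    shift_labels = shift_labels.masked_fill(~shift_keep, ignore_index)

    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
```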

Checkpoint

Our models finetuned with Selective EOS Supervision:

| Base Model | Finetuning Data | Checkpoint |
| --- | --- | --- |
| llava-1.5-7b | Detail23k | llava-v1.5-7b-selective-23k |
| llava-1.5-7b | LLaVA-Instruction-150K | llava-v1.5-7b-selective-150k |

Scoring EOS Supervision

Data Scoring

Due to some constraints related to DeepSpeed in the LLaVA codebase, we do not currently have an efficient way to score a dataset with a standalone script. Instead, our scoring piggybacks on the training process, i.e., for each training step:

  • Score the data in the minibatch and save the scores;
  • Cancel the loss backward pass so that no parameters are updated (this can be achieved by modifying the trainer code; see the sketch after this list).

The core code for data scoring is provided in ./LLaVA/llava/model/language_model/llava_llama_filter.py.
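
For intuition, here is a hypothetical sketch of that scoring-through-training trick for a HuggingFace-style trainer (LLaVA's trainer subclasses transformers.Trainer). The real scoring logic lives in llava_llama_filter.py; `ScoringTrainer`, `score_file`, and the use of the batch loss as the score are all illustrative assumptions.

```python
import json

from transformers import Trainer


class ScoringTrainer(Trainer):
    """Runs the usual forward pass, records a per-batch score, and zeroes the
    loss so that the subsequent backward pass updates nothing."""

    def __init__(self, *args, score_file="scores.jsonl", **kwargs):
        super().__init__(*args, **kwargs)
        self.score_file = score_file

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        loss = outputs.loss

        # 1) Score the minibatch and save the scores. The batch loss stands in
        #    here for whatever score llava_llama_filter.py actually computes.
        with open(self.score_file, "a") as f:
            f.write(json.dumps({"score": loss.item()}) + "\n")

        # 2) Cancel the effect of the backward pass: multiplying by zero keeps
        #    the graph valid for the trainer's loss.backward() call but yields
        #    zero gradients, so no parameters are updated.
        loss = loss * 0.0
        return (loss, outputs) if return_outputs else loss
```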

Filtered Data

Our data filtered with Scoring EOS Supervision:

| Base Data | Filtered Data |
| --- | --- |
| LLaVA-Instruction-150K | LLaVA-Instruction-150K-filtered |
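
As a rough illustration of how such a filtered subset can be derived from the saved scores, here is a hypothetical sketch; the file names, the one-score-per-line format, and the simple threshold criterion are assumptions, and the released filtered data above is produced by the actual pipeline.

```python
import json


def filter_dataset(data_path, score_path, out_path, threshold=0.5):
    # LLaVA instruction data is stored as a single JSON list of samples.
    with open(data_path) as f:
        samples = json.load(f)
    # One score per sample, in the same order (hypothetical format).
    with open(score_path) as f:
        scores = [json.loads(line)["score"] for line in f]

    kept = [s for s, score in zip(samples, scores) if score >= threshold]
    with open(out_path, "w") as f:
        json.dump(kept, f)
    print(f"Kept {len(kept)} of {len(samples)} samples.")


filter_dataset("llava_instruct_150k.json", "scores.jsonl",
               "llava_instruct_150k_filtered.json")
```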

Training

Instruction-tune the LLaVA-7b model on our filtered data with:

```bash
cd LLaVA
bash scripts/finetune_qlora_filtered.sh
```

CHAIR Evaluation

Data

The test set used in our paper for CHAIR evaluation is provided in ./CHAIR-eval/data/chair-500.jsonl. The data is randomly sampled from the MSCOCO validation set with a random seed of 0.

For the test set images, we provide a Python script that collects them from your original MSCOCO images via softlinks. Please specify the path to your own MSCOCO images. The script will create a folder ./CHAIR-eval/data/chair-500 for the CHAIR images.

```bash
python ./CHAIR-eval/prepare_data.py
```

The script also downloads the MSCOCO detection annotation files, which are used for CHAIR evaluation.
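
For reference, a minimal sketch of what the image-collection step does is shown below; the MSCOCO path, the `image` field name, and the flat output layout are assumptions, so use the provided prepare_data.py in practice.

```python
import json
import os

COCO_IMG_DIR = "/path/to/coco/val2014"        # your own MSCOCO image directory
OUT_DIR = "./CHAIR-eval/data/chair-500"

os.makedirs(OUT_DIR, exist_ok=True)
with open("./CHAIR-eval/data/chair-500.jsonl") as f:
    for line in f:
        image_name = json.loads(line)["image"]  # assumed field with the file name
        src = os.path.join(COCO_IMG_DIR, image_name)
        dst = os.path.join(OUT_DIR, image_name)
        if not os.path.exists(dst):
            os.symlink(src, dst)
```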

Evaluation

We provide a script for CHAIR inference and evaluation.
Specify your model in the script below and then run it:

```bash
bash ./CHAIR-eval/eval.sh
```

The first-time evaluation can be slow because of the ground-truth object set construction. Subsequent evaluations will be faster with the cache.
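
For reference, the underlying CHAIR metrics (Rohrbach et al., 2018) can be summarized as follows: CHAIR_i is the fraction of mentioned objects that are not in the image's ground-truth object set, and CHAIR_s is the fraction of captions containing at least one such object. A minimal sketch, assuming object mentions have already been extracted and mapped to MSCOCO categories (which the evaluation script handles):

```python
def chair_metrics(mentioned, ground_truth):
    """mentioned: list of sets of objects found in each generated caption;
    ground_truth: list of sets of annotated objects for the matching images."""
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for objs, gt in zip(mentioned, ground_truth):
        halluc = {o for o in objs if o not in gt}
        total_mentions += len(objs)
        hallucinated_mentions += len(halluc)
        hallucinated_captions += int(bool(halluc))

    chair_i = hallucinated_mentions / max(total_mentions, 1)   # instance level
    chair_s = hallucinated_captions / max(len(mentioned), 1)   # sentence level
    return chair_i, chair_s
```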

Citation

If you find this repo helpful, please consider citing our paper:

@misc{yue2024less,
      title={Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective}, 
      author={Zihao Yue and Liang Zhang and Qin Jin},
      year={2024},
      eprint={2402.14545},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgement

This repo is built on LLaVA (models) and OPERA (CHAIR evaluation). Many thanks for their efforts. The use of our code should also follow the original licenses.
