
Less is More


Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective


Selective EOS Supervision

Training

Follow the instructions of LLaVA to prepare the environment, the data (LLaVA-Instruction-150K), and the pretrained models (e.g., LLaVA-1.5-7b).

Train the model with Selective EOS Supervision. The default configuration trains the llava-1.5-7b model on Detail23k for one epoch.

```bash
cd LLaVA
bash scripts/v1_5/selective_eos_finetune.sh
```

The main modifications to the original LLaVA code for Selective EOS Supervision are detailed in ./docs/selective-eos-supervision.md.
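
For intuition, the following is a minimal sketch of the general mechanism, assuming Selective EOS Supervision amounts to keeping the language-modeling loss only at selected token positions; the actual selection rule for EOS tokens is described in the document above and in the paper, and `selective_lm_loss` / `keep_mask` are illustrative names rather than the repo's API.

```python
import torch
import torch.nn.functional as F


def selective_lm_loss(logits: torch.Tensor, labels: torch.Tensor,
                      keep_mask: torch.Tensor, ignore_index: int = -100):
    """logits: (B, T, V); labels: (B, T); keep_mask: (B, T) bool, True at
    positions whose loss should be kept."""
    # Standard causal-LM shift: position t predicts token t + 1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    shift_keep = keep_mask[:, 1:].contiguous()

    # Drop supervision at de-selected positions by marking them as ignore_index.
    shift_labels = shift_labels.masked_fill(~shift_keep, ignore_index)

    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
```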

Checkpoint

Our models finetuned with Selective EOS Supervision:

| Base Model | Finetuning Data | Checkpoint |
| --- | --- | --- |
| llava-1.5-7b | Detail23k | llava-v1.5-7b-selective-23k |
| llava-1.5-7b | LLaVA-Instruction-150K | llava-v1.5-7b-selective-150k |

Scoring EOS Supervision

Data Scoring

Due to some constraints related to DeepSpeed in the LLaVA codebase, we do not currently have an efficient way to score a dataset with a standalone script. Instead, our scoring piggybacks on the training process, i.e., for each training step:

  • Score the data in the minibatch and save the scores;
  • Cancel the loss backward pass so that no parameters are updated (this can be achieved by modifying the trainer code; see the sketch after this list).

The core code for data scoring is provided in ./LLaVA/llava/model/language_model/llava_llama_filter.py.
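
For intuition, here is a hypothetical sketch of that scoring-through-training trick for a HuggingFace-style trainer (LLaVA's trainer subclasses transformers.Trainer). The real scoring logic lives in llava_llama_filter.py; `ScoringTrainer`, `score_file`, and the use of the batch loss as the score are all illustrative assumptions.

```python
import json

from transformers import Trainer


class ScoringTrainer(Trainer):
    """Runs the usual forward pass, records a per-batch score, and zeroes the
    loss so that the subsequent backward pass updates nothing."""

    def __init__(self, *args, score_file="scores.jsonl", **kwargs):
        super().__init__(*args, **kwargs)
        self.score_file = score_file

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        loss = outputs.loss

        # 1) Score the minibatch and save the scores. The batch loss stands in
        #    here for whatever score llava_llama_filter.py actually computes.
        with open(self.score_file, "a") as f:
            f.write(json.dumps({"score": loss.item()}) + "\n")

        # 2) Cancel the effect of the backward pass: multiplying by zero keeps
        #    the graph valid for the trainer's loss.backward() call but yields
        #    zero gradients, so no parameters are updated.
        loss = loss * 0.0
        return (loss, outputs) if return_outputs else loss
```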

Filtered Data

Our data filtered with Scoring EOS Supervision:

| Base Data | Filtered Data |
| --- | --- |
| LLaVA-Instruction-150K | LLaVA-Instruction-150K-filtered |
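
As a rough illustration of how such a filtered subset can be derived from the saved scores, here is a hypothetical sketch; the file names, the one-score-per-line format, and the simple threshold criterion are assumptions, and the released filtered data above is produced by the actual pipeline.

```python
import json


def filter_dataset(data_path, score_path, out_path, threshold=0.5):
    # LLaVA instruction data is stored as a single JSON list of samples.
    with open(data_path) as f:
        samples = json.load(f)
    # One score per sample, in the same order (hypothetical format).
    with open(score_path) as f:
        scores = [json.loads(line)["score"] for line in f]

    kept = [s for s, score in zip(samples, scores) if score >= threshold]
    with open(out_path, "w") as f:
        json.dump(kept, f)
    print(f"Kept {len(kept)} of {len(samples)} samples.")


filter_dataset("llava_instruct_150k.json", "scores.jsonl",
               "llava_instruct_150k_filtered.json")
```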

Training

Instruction-tune the LLaVA-7b model on our filtered data with:

```bash
cd LLaVA
bash scripts/finetune_qlora_filtered.sh
```

CHAIR Evaluation

Data

The test set used in our paper for CHAIR evaluation is provided in ./CHAIR-eval/data/chair-500.jsonl. The data is randomly sampled from the MSCOCO validation set with a random seed of 0.

For the test set images, we provide a Python script that collects them from your original MSCOCO images via softlinks. Please specify the path to your own MSCOCO images. The script will create a folder ./CHAIR-eval/data/chair-500 for the CHAIR images.

```bash
python ./CHAIR-eval/prepare_data.py
```

The script also downloads the MSCOCO detection annotation files, which are used for CHAIR evaluation.
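
For reference, a minimal sketch of what the image-collection step does is shown below; the MSCOCO path, the `image` field name, and the flat output layout are assumptions, so use the provided prepare_data.py in practice.

```python
import json
import os

COCO_IMG_DIR = "/path/to/coco/val2014"        # your own MSCOCO image directory
OUT_DIR = "./CHAIR-eval/data/chair-500"

os.makedirs(OUT_DIR, exist_ok=True)
with open("./CHAIR-eval/data/chair-500.jsonl") as f:
    for line in f:
        image_name = json.loads(line)["image"]  # assumed field with the file name
        src = os.path.join(COCO_IMG_DIR, image_name)
        dst = os.path.join(OUT_DIR, image_name)
        if not os.path.exists(dst):
            os.symlink(src, dst)
```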

Evaluation

We provide a script for CHAIR inference and evaluation.
Specify your model in the script below and then run it:

```bash
bash ./CHAIR-eval/eval.sh
```

The first-time evaluation can be slow because of the ground-truth object set construction. Subsequent evaluations will be faster with the cache.
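
For reference, the underlying CHAIR metrics (Rohrbach et al., 2018) can be summarized as follows: CHAIR_i is the fraction of mentioned objects that are not in the image's ground-truth object set, and CHAIR_s is the fraction of captions containing at least one such object. A minimal sketch, assuming object mentions have already been extracted and mapped to MSCOCO categories (which the evaluation script handles):

```python
def chair_metrics(mentioned, ground_truth):
    """mentioned: list of sets of objects found in each generated caption;
    ground_truth: list of sets of annotated objects for the matching images."""
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0
    for objs, gt in zip(mentioned, ground_truth):
        halluc = {o for o in objs if o not in gt}
        total_mentions += len(objs)
        hallucinated_mentions += len(halluc)
        hallucinated_captions += int(bool(halluc))

    chair_i = hallucinated_mentions / max(total_mentions, 1)   # instance level
    chair_s = hallucinated_captions / max(len(mentioned), 1)   # sentence level
    return chair_i, chair_s
```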

Citation

If you find this repo helpful, please consider citing our paper:

@misc{yue2024less,
      title={Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective}, 
      author={Zihao Yue and Liang Zhang and Qin Jin},
      year={2024},
      eprint={2402.14545},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgement

This repo is built on LLaVA (models) and OPERA (CHAIR evaluation). Many thanks for their efforts. The use of our code should also follow the original licenses.
