From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
The overview of our Adaptive Data Refinement Framework (ADR). (a) In the Analyzing stage, we first extract tokens, objects, co-occurrences, and interrogations from the training instances, then construct the corresponding distributions using a reverse-indexed mapping. (b) In the Data Rebalancing stage, we analyze the optimization direction and adaptively rebalance the redundant data based on the entity distributions identified in the Analyzing stage. (c) Finally, in the Data Synthesis stage, we utilize DDPM and the latent representations of scarce image instances to synthesize the underrepresented data.
git clone https://github.com/ssmisya/VLMLT.git
cd VLMLT
pip install -e .
During the Analyzing stage, we first extract tokens, objects, co-occurrences, and interrogations from the training instances; here, the Analyzing stage mainly refers to concept extraction. The entity distribution construction code is in VLMLT/robustlmm/analysis/longTail/language_level. You can refer to VLMLT/robustlmm/analysis/longTail/language_level/examples to see how to extract concepts from your own dataset.
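Once the concepts are extracted, the long-tail shape of the data is easy to inspect. As a minimal, hypothetical sketch (not one of the repo's scripts), assuming the extraction produced a JSONL file token_concepts.jsonl in which each line carries a tokens array, you can view the head of the frequency distribution with jq:

# Hypothetical file and field names -- adjust to your extraction output.
jq -r '.tokens[]' token_concepts.jsonl | sort | uniq -c | sort -rn | head -n 20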
During the DR (Data Rebalancing) stage, we build the reverse-indexing mapping and rebalance the redundant data based on the entity distributions identified in the Analyzing stage. The code is in VLMLT/robustlmm/data_adjustment/dr_algo. You can refer to VLMLT/robustlmm/data_adjustment/dr_algo/examples to see how to rebalance your own dataset.
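Conceptually, reverse indexing inverts the per-instance concept lists into a concept-to-instance map, so each concept can be traced back to the instances that carry it. A rough jq sketch of the idea, not the repo's implementation, assuming each JSONL line has a new_idx id and a (hypothetical) tokens array:

# Sketch only: build { token: [instance ids...] } from per-instance token lists.
jq -s 'map(.new_idx as $id | .tokens[] | {tok: ., id: $id})
       | group_by(.tok)
       | map({(.[0].tok): map(.id)}) | add' token_concepts.jsonl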
First, build the reverse-indexing mapping:
cd VLMLT/robustlmm/data_adjustment/dr_algo
python reverse_indexing.py \
--input_path $token_file \
--output_path ${reverse_index_prefix}/${function_type}_reverse_index.jsonl \
--function $function_type \
--id_key new_idx
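The shell variables above are placeholders. A hypothetical assignment (the paths and the function value below are assumptions; check reverse_indexing.py for the accepted --function values):

# Hypothetical values -- substitute your own paths.
token_file=./outputs/token_concepts.jsonl        # concept file from the Analyzing stage
reverse_index_prefix=./outputs/reverse_index
function_type=token                              # one analysis dimension (assumed name)
mkdir -p ${reverse_index_prefix}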
Then run the data rebalancing step:
python ./reform_data_one_scope.py \
--input_dataset_file <origin-dataset-file> \
--output_file /path/to/llava_meta_toc_p${pass_num}_a${alpha}.json \
--mode "compose_alpha" \
--compose_list ${compose_list} \
--pass_num $pass_num \
--alpha ${alpha} \
--target_model llava_pt
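As before, the variables are placeholders. A hypothetical configuration (the values and the compose_list format are assumptions; consult the paper and reform_data_one_scope.py for the supported options):

# Hypothetical settings -- check reform_data_one_scope.py for supported values.
pass_num=2                     # number of rebalancing passes (illustrative)
alpha=0.5                      # rebalancing strength (illustrative)
compose_list="token,object"    # distributions to compose (format is an assumption)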
During the Data Synthesis stage, we supplement the underrepresented data by utilizing DDPM and the latent representations of scarce image instances. We mainly use the scripts under VLMLT/robustlmm/model_inference to conduct the data synthesis. You can follow the instructions given in the paper and use the appropriate model inference scripts to synthesize the data.
@inproceedings{song2025head,
title={From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration},
author={Song, Mingyang and Qu, Xiaoye and Zhou, Jiawei and Cheng, Yu},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={9434--9444},
year={2025}
}