From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
The overview of our Adaptive Data Refinement Framework (ADR). (a) In the Analyzing stage, we first extract tokens, objects, co-occurrences, and interrogations from the training instances, then construct the corresponding distributions using a reverse-indexed mapping. (b) In the Data Rebalancing stage, we analyze the optimization direction and adaptively rebalance the redundant data based on the entity distributions identified in the Analyzing stage. (c) Finally, in the Data Synthesis stage, we utilize DDPM and the latent representations of scarce image instances to synthesize the underrepresented data.
git clone https://github.com/ssmisya/VLMLT.git
cd VLMLT
pip install -e .
During the Analyzing stage, we first extract tokens, objects, co-occurrences, and interrogations from the training instances; here, the Analyzing stage mainly refers to concept extraction. The entity distribution construction code is in VLMLT/robustlmm/analysis/longTail/language_level. You can refer to VLMLT/robustlmm/analysis/longTail/language_level/examples to see how to extract concepts from your own dataset.
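Once the concepts are extracted, the long-tail shape of the data is easy to inspect. As a minimal, hypothetical sketch (not one of the repo's scripts), assuming the extraction produced a JSONL file token_concepts.jsonl in which each line carries a tokens array, you can view the head of the frequency distribution with jq:

# Hypothetical file and field names -- adjust to your extraction output.
jq -r '.tokens[]' token_concepts.jsonl | sort | uniq -c | sort -rn | head -n 20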
During the DR (Data Rebalancing) stage, we build the reverse-indexing mapping and rebalance the redundant data based on the entity distributions identified in the Analyzing stage. The code is in VLMLT/robustlmm/data_adjustment/dr_algo. You can refer to VLMLT/robustlmm/data_adjustment/dr_algo/examples to see how to rebalance your own dataset.
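Conceptually, reverse indexing inverts the per-instance concept lists into a concept-to-instance map, so each concept can be traced back to the instances that carry it. A rough jq sketch of the idea, not the repo's implementation, assuming each JSONL line has a new_idx id and a (hypothetical) tokens array:

# Sketch only: build { token: [instance ids...] } from per-instance token lists.
jq -s 'map(.new_idx as $id | .tokens[] | {tok: ., id: $id})
       | group_by(.tok)
       | map({(.[0].tok): map(.id)}) | add' token_concepts.jsonl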
First, build the reverse-indexing mapping:
cd VLMLT/robustlmm/data_adjustment/dr_algo
python reverse_indexing.py \
--input_path $token_file \
--output_path ${reverse_index_prefix}/${function_type}_reverse_index.jsonl \
--function $function_type \
--id_key new_idx
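The shell variables above are placeholders. A hypothetical assignment (the paths and the function value below are assumptions; check reverse_indexing.py for the accepted --function values):

# Hypothetical values -- substitute your own paths.
token_file=./outputs/token_concepts.jsonl        # concept file from the Analyzing stage
reverse_index_prefix=./outputs/reverse_index
function_type=token                              # one analysis dimension (assumed name)
mkdir -p ${reverse_index_prefix}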
Then run the data rebalancing step:
python ./reform_data_one_scope.py \
--input_dataset_file <origin-dataset-file> \
--output_file /path/to/llava_meta_toc_p${pass_num}_a${alpha}.json \
--mode "compose_alpha" \
--compose_list ${compose_list} \
--pass_num $pass_num \
--alpha ${alpha} \
--target_model llava_pt
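As before, the variables are placeholders. A hypothetical configuration (the values and the compose_list format are assumptions; consult the paper and reform_data_one_scope.py for the supported options):

# Hypothetical settings -- check reform_data_one_scope.py for supported values.
pass_num=2                     # number of rebalancing passes (illustrative)
alpha=0.5                      # rebalancing strength (illustrative)
compose_list="token,object"    # distributions to compose (format is an assumption)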
During the Data Synthesis stage, we supplement the underrepresented data by utilizing DDPM and the latent representations of scarce image instances. We mainly use the scripts under VLMLT/robustlmm/model_inference to conduct the data synthesis. You can follow the instructions given in the paper and use the appropriate model inference scripts to synthesize the data.
@inproceedings{song2025head,
title={From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration},
author={Song, Mingyang and Qu, Xiaoye and Zhou, Jiawei and Cheng, Yu},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={9434--9444},
year={2025}
}