
[CVPR 2025] VLMLT

Paper | GitHub | Hugging Face Collection

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

Overview of our Adaptive Data Refinement Framework (ADR). (a) In the Analyzing Stage, we first extract tokens, objects, co-occurrences, and interrogations from the training instances, then construct the corresponding distributions using a reverse-indexed mapping. (b) In the Data Rebalancing Stage, we analyze the optimization direction and adaptively rebalance the redundant data based on the entity distribution identified in the Analyzing Stage. (c) Finally, in the Data Synthesis Stage, we utilize DDPM and the latent representations of scarce image instances to synthesize the underrepresented data.

Quick Start

Install Dependencies

git clone https://github.com/ssmisya/VLMLT.git
cd VLMLT
pip install -e .

Analyzing Stage

During the analyzing stage, we first extract tokens, objects, co-occurrences, and interrogations from the training instances. Here, the analyzing stage mainly refers to concept extraction. The entity distribution construction code is in VLMLT/robustlmm/analysis/longTail/language_level. You can refer to VLMLT/robustlmm/analysis/longTail/language_level/examples to see how to extract concepts from your own dataset; a minimal sketch of the idea follows.
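
The sketch below is an illustration of the Analyzing Stage idea, not the repo's actual code: it counts how often each extracted concept appears across training instances, which exposes the long-tailed entity distribution. The JSONL layout and the "concepts" field are hypothetical.

# Minimal sketch: build a concept frequency distribution from an annotation file.
# The JSONL schema ("id", "concepts") is an assumption for illustration.
import json
from collections import Counter

def build_concept_distribution(annotation_file):
    counter = Counter()
    with open(annotation_file) as f:
        for line in f:
            instance = json.loads(line)          # e.g. {"id": 0, "concepts": ["dog", "frisbee"]}
            counter.update(instance["concepts"])
    # Sorting by frequency separates head (frequent) from tail (scarce) concepts.
    return counter.most_common()

if __name__ == "__main__":
    for concept, freq in build_concept_distribution("tokens.jsonl")[:10]:
        print(concept, freq)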

Data Rebalancing Stage

During the Data Rebalancing (DR) stage, we build the reverse-index mapping and rebalance the redundant data based on the entity distribution identified in the analyzing stage. The code is in VLMLT/robustlmm/data_adjustment/dr_algo. You can refer to VLMLT/robustlmm/data_adjustment/dr_algo/examples to see how to rebalance your own dataset.

First, build the reverse indexing mapping:

cd VLMLT/robustlmm/data_adjustment/dr_algo
python reverse_indexing.py \
--input_path  $token_file \
--output_path ${reverse_index_prefix}/${function_type}_reverse_index.jsonl \
--function $function_type \
--id_key new_idx
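
Conceptually, the reverse index maps each concept to the ids of the instances that contain it. The sketch below illustrates that idea only; it is not the logic of reverse_indexing.py. The input/output schemas and the "concepts" field are assumptions, and only the "new_idx" id key is taken from the flags above.

# Illustrative reverse-index construction: concept -> list of instance ids.
import json
from collections import defaultdict

def build_reverse_index(token_file, id_key="new_idx"):
    reverse_index = defaultdict(list)            # concept -> instance ids
    with open(token_file) as f:
        for line in f:
            instance = json.loads(line)
            for concept in instance["concepts"]:
                reverse_index[concept].append(instance[id_key])
    return reverse_index

def save_reverse_index(reverse_index, output_path):
    # One JSON object per concept, written as JSONL (assumed output format).
    with open(output_path, "w") as f:
        for concept, ids in reverse_index.items():
            f.write(json.dumps({"concept": concept, "ids": ids}) + "\n")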

Then run the data rebalancing:

python ./reform_data_one_scope.py \
--input_dataset_file <origin-dataset-file> \
--output_file  /path/to/llava_meta_toc_p${pass_num}_a${alpha}.json \
--mode "compose_alpha" \
--compose_list ${compose_list} \
--pass_num $pass_num \
--alpha ${alpha} \
--target_model llava_pt
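
The intuition behind this pass, sketched below under stated assumptions: instances whose concepts all fall in the head of the distribution are redundant and are kept only with some probability, while instances touching a tail concept are always kept. This is not the repo's compose_alpha algorithm; the head_fraction cut-off and the instance schema are assumptions, and alpha here simply plays the role of a keep probability.

# Simplified rebalancing sketch (illustration only, not the compose_alpha algorithm).
import random

def rebalance(instances, reverse_index, alpha=0.5, head_fraction=0.2, seed=0):
    random.seed(seed)
    # Concepts ranked by how many instances they cover; the top slice is the "head".
    ranked = sorted(reverse_index, key=lambda c: len(reverse_index[c]), reverse=True)
    head = set(ranked[: int(len(ranked) * head_fraction)])
    kept = []
    for inst in instances:                        # inst["concepts"] assumed, as above
        if all(c in head for c in inst["concepts"]):
            if random.random() < alpha:           # subsample redundant (all-head) instances
                kept.append(inst)
        else:
            kept.append(inst)                     # anything touching the tail is always kept
    return kept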

Data Synthesis Stage

During the Data Synthesis stage, we supplement the underrepresented data by utilizing DDPM and the latent representations of scarce image instances. We mainly use the scripts under VLMLT/robustlmm/model_inference to conduct the data synthesis. You can follow the instructions given in the paper and use the appropriate model inference scripts to synthesize the data; an illustrative sketch follows.
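
As a stand-in illustration (the actual synthesis follows the paper and the scripts under VLMLT/robustlmm/model_inference), the sketch below uses a Hugging Face diffusers img2img pipeline to generate variations of a scarce ("tail") image. The model id, prompt, and sampling parameters are assumptions, not the repo's configuration.

# Illustrative only: generate variations of a scarce image with an img2img pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16   # assumed model id
).to("cuda")

tail_image = Image.open("scarce_instance.jpg").convert("RGB").resize((512, 512))
outputs = pipe(
    prompt="a photo of the same scene",   # hypothetical prompt tied to the tail concept
    image=tail_image,                     # this image seeds the diffusion latents
    strength=0.6,                         # how far the samples may diverge from the original
    guidance_scale=7.5,
    num_images_per_prompt=4,
)
for i, img in enumerate(outputs.images):
    img.save(f"synthesized_{i}.png")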

Citations

@inproceedings{song2025head,
  title={From head to tail: Towards balanced representation in large vision-language models through adaptive data calibration},
  author={Song, Mingyang and Qu, Xiaoye and Zhou, Jiawei and Cheng, Yu},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={9434--9444},
  year={2025}
}

License: Apache-2.0
