
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Health-X Lab | IMPACT Lab

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

paper Overview Datasets Models BibTeX

Overview

main figure

Abstract: Recent advancements in vision-language models (VLMs), such as CLIP, have demonstrated substantial success in self-supervised representation learning for vision tasks. However, effectively adapting VLMs to downstream applications remains challenging, as their accuracy often depends on time-intensive and expertise-demanding prompt engineering, while full model fine-tuning is costly. This is particularly true for biomedical images, which, unlike natural images, typically suffer from limited annotated datasets, unintuitive image contrasts, and nuanced visual features. Recent prompt learning techniques, such as Context Optimization (CoOp), intend to tackle these issues, but still fall short in generalizability. Meanwhile, explorations in prompt learning for biomedical image analysis are still highly limited. In this work, we propose BiomedCoOp, a novel prompt learning framework that enables efficient adaptation of BiomedCLIP for accurate and highly generalizable few-shot biomedical image classification. Our approach achieves effective prompt context learning by leveraging semantic consistency with average prompt ensembles from Large Language Models (LLMs) and knowledge distillation with a statistics-based prompt selection strategy. We conducted comprehensive validation of our proposed framework on 11 medical datasets across 9 modalities and 10 organs against existing state-of-the-art methods, demonstrating significant improvements in both accuracy and generalizability.

Method

  1. Semantic Consistency with LLM-Enhanced Prompt Ensembles: Enhance context vector learning using prompt ensembles derived from GPT-4, combined with a knowledge distillation strategy to enforce semantic consistency.
  2. Outlier Pruning for Robust Generalization: Employ a statistics-based pruning strategy to filter outlier prompts from LLMs, mitigating over-specialization and preserving essential biomedical patterns (a brief code sketch of these first two components follows this list).
  3. First Adoption of BiomedCLIP for Prompt Learning: Leverage BiomedCLIP for prompt learning for the first time, demonstrating superior performance over the general-knowledge CLIP model on clinical tasks.
  4. Extensive Multi-Modal Evaluation: Evaluate across 11 biomedical image classification datasets, 9 modalities, and 10 organs, showcasing BiomedCoOp's superior generalizability and robustness in few-shot and base-to-novel benchmarks.
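
The sketch below is a minimal, unofficial PyTorch illustration of points 1 and 2: a statistics-based pruning step that discards outlier prompts from an LLM-generated ensemble, followed by a semantic-consistency (distillation) term that pulls the learned prompt's class embeddings toward the ensemble average. The function names, z-score criterion, and cosine form of the loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def prune_outlier_prompts(prompt_feats: torch.Tensor, z_thresh: float = 2.0) -> torch.Tensor:
    """Drop LLM-generated prompts whose embedding lies far from the ensemble mean.

    prompt_feats: (P, D) L2-normalized text embeddings of P LLM prompts for one class.
    The z-score criterion and threshold are illustrative choices, not the paper's exact rule.
    """
    dists = (prompt_feats - prompt_feats.mean(dim=0, keepdim=True)).norm(dim=-1)  # (P,)
    z_scores = (dists - dists.mean()) / (dists.std() + 1e-6)
    return prompt_feats[z_scores <= z_thresh]


def semantic_consistency_loss(learned_feats: torch.Tensor, llm_feats: torch.Tensor) -> torch.Tensor:
    """Pull the learned prompt's class embeddings toward the pruned ensemble averages.

    learned_feats: (C, D) class embeddings produced from the learnable context tokens.
    llm_feats:     (C, P, D) text embeddings of P LLM prompts for each of C classes.
    """
    targets = torch.stack(
        [prune_outlier_prompts(llm_feats[c]).mean(dim=0) for c in range(llm_feats.shape[0])]
    )
    targets = F.normalize(targets, dim=-1)
    learned = F.normalize(learned_feats, dim=-1)
    return (1.0 - (learned * targets).sum(dim=-1)).mean()  # mean cosine distance


# Toy usage with random features (512-d, matching BiomedCLIP's embedding size):
llm_feats = F.normalize(torch.randn(5, 20, 512), dim=-1)      # 5 classes, 20 LLM prompts each
learned_feats = F.normalize(torch.randn(5, 512), dim=-1)
print(semantic_consistency_loss(learned_feats, llm_feats))
```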

☑️ Supported Methods

| Method | Paper | Configs | Training Scripts | Trainers |
|---|---|---|---|---|
| BiomedCoOp | CVPR 2025 | link | link | link |
| CLIP | ICML 2021 | link | link | link |
| CoOp | IJCV 2022 | link | link | link |
| CoCoOp | CVPR 2022 | link | link | link |
| KgCoOp | CVPR 2023 | link | link | link |
| ProGrad | ICCV 2023 | link | link | link |
| CLIP-Adapter | IJCV 2024 | link | link | link |
| Tip-Adapter | ECCV 2022 | link | link | link |
| LP | ICML 2021 | link | link | link |
| LP++ | CVPR 2024 | link | link | link |

Results

The results below report classification accuracy for the few-shot setting ($K$ labeled samples per class) and for base and novel classes, across 11 biomedical recognition datasets, averaged over 3 seeds.

Few-shot Evaluation

| Method | $K=1$ | $K=2$ | $K=4$ | $K=8$ | $K=16$ |
|---|---|---|---|---|---|
| CLIP-Adapter | 44.66 | 43.91 | 44.36 | 45.42 | 46.69 |
| Tip-Adapter | 49.19 | 52.36 | 57.33 | 61.98 | 67.15 |
| Tip-Adapter-F | 51.17 | 52.74 | 61.23 | 65.91 | 70.91 |
| Standard LP | 47.25 | 54.21 | 61.00 | 65.85 | 69.40 |
| LP++ | 47.24 | 53.18 | 59.02 | 63.69 | 68.35 |
| CoOp | 50.16 | 54.18 | 59.75 | 65.84 | 69.62 |
| CoCoOp | 48.49 | 51.28 | 54.69 | 61.08 | 65.09 |
| KgCoOp | 50.85 | 53.18 | 57.82 | 62.08 | 62.84 |
| ProGrad | 51.88 | 54.71 | 60.42 | 65.61 | 67.13 |
| BiomedCoOp | 57.03 | 59.13 | 63.95 | 68.32 | 72.42 |

Base-to-Novel Generalization

| Name | Base Acc. | Novel Acc. | HM |
|---|---|---|---|
| BiomedCLIP | 47.84 | 65.42 | 53.81 |
| CoOp | 73.85 | 64.75 | 67.23 |
| CoCoOp | 72.26 | 67.03 | 67.22 |
| KgCoOp | 68.36 | 64.08 | 64.61 |
| ProGrad | 71.67 | 66.93 | 67.43 |
| BiomedCoOp (ours) | 76.26 | 73.92 | 75.07 |
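
HM is the harmonic mean of base and novel accuracy, the standard summary metric for base-to-novel generalization; for a single dataset it is computed as

$$\mathrm{HM} = \frac{2 \cdot \mathrm{Acc}_{\mathrm{base}} \cdot \mathrm{Acc}_{\mathrm{novel}}}{\mathrm{Acc}_{\mathrm{base}} + \mathrm{Acc}_{\mathrm{novel}}}$$

Since the entries above are averaged across the 11 datasets, the HM column need not equal the harmonic mean of the two averages shown in each row.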

Model Checkpoints and Logs

| Name | Few-Shot | Base-to-Novel |
|---|---|---|
| BiomedCoOp | link | link |

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.
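
As a quick sanity check that the environment can reach the BiomedCLIP backbone, the snippet below follows the public BiomedCLIP model card on the Hugging Face Hub. It assumes open_clip_torch is installed per INSTALL.md and is not a step from this repository's instructions.

```python
# Quick check (not part of INSTALL.md) that the BiomedCLIP backbone loads; the hub ID
# below is the public Microsoft checkpoint and open_clip_torch is assumed installed.
import torch
import open_clip

hub_id = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = open_clip.create_model_from_pretrained(hub_id)
tokenizer = open_clip.get_tokenizer(hub_id)

model.eval()
with torch.no_grad():
    tokens = tokenizer(["a chest X-ray showing pneumonia", "a normal chest X-ray"])
    text_features = model.encode_text(tokens)
print(text_features.shape)  # expected: torch.Size([2, 512])
```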

Data preparation

Please follow the instructions at DATASETS.md to prepare all datasets.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluating, and reproducing the results using our pre-trained models.


Citation

If you use our work, please consider citing:

@inproceedings{koleilat2025biomedcoop,
  title={{BiomedCoOp}: Learning to Prompt for Biomedical Vision-Language Models},
  author={Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={14766--14776},
  year={2025}
}

Acknowledgements

Our code builds upon the CoOp, MaPLe, and LP++ repositories. We are grateful to the authors for making their code publicly available. If you use our model or code, we kindly request that you also consider citing these foundational works.
