Awesome-Foundation-Models-for-Advancing-Healthcare


[NEWS.20240405] The related survey paper has been released.

[NOTE] If you have any questions, please don't hesitate to contact us.

Foundation models, which are pre-trained on broad data and can adapt to a wide range of tasks, are advancing healthcare. They promote the development of healthcare artificial intelligence (AI) models, resolving the tension between narrowly specialized AI models and diverse healthcare practice. Many more healthcare scenarios will benefit from the development of healthcare foundation models (HFMs), which promise more advanced and intelligent healthcare services.
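
To make the pre-train-then-adapt paradigm above concrete, below is a minimal sketch of adapting a broadly pre-trained biomedical language model to a downstream clinical classification task. It assumes the Hugging Face transformers and PyTorch packages and the public dmis-lab/biobert-v1.1 BioBERT checkpoint; the two example sentences and their labels are purely illustrative and are not from the paper.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a broadly pre-trained biomedical LM and attach a fresh 2-class head.
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModelForSequenceClassification.from_pretrained(
    "dmis-lab/biobert-v1.1", num_labels=2
)

# Hypothetical downstream examples: 1 = suspicious finding, 0 = benign.
texts = [
    "The nodule shows irregular margins and rapid interval growth.",
    "Follow-up imaging shows no change in the stable benign lesion.",
]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One adaptation (fine-tuning) step: the forward pass computes the
# cross-entropy loss when labels are supplied.
model.train()
output = model(**batch, labels=labels)
output.loss.backward()
optimizer.step()
print(f"fine-tuning loss after one step: {output.loss.item():.4f}")

The methods collected below adapt such checkpoints in different ways (full fine-tuning, parameter-efficient tuning, or prompting), but this loop is the common core.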

This repository is a collection of AWESOME things about foundation models in healthcare, including language foundation models (LFMs), vision foundation models (VFMs), bioinformatics foundation models (BFMs), and multimodal foundation models (MFMs). Feel free to star and fork.

This repository provides improvement suggestions for current healthcare foundation models, based on the following paper:

Foundation Model for Advancing Healthcare: Challenges, Opportunities and Future Directions [Chinese translation]
Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, Hao Chen
SMART Lab, The Hong Kong University of Science and Technology

If you find this repository useful, please cite our paper:

@misc{he2024foundation,
      title={Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions}, 
      author={Yuting He and Fuxiang Huang and Xinrui Jiang and Yuxiang Nie and Minghao Wang and Jiguang Wang and Hao Chen},
      year={2024},
      eprint={2404.03264},
      archivePrefix={arXiv},
      primaryClass={cs.CY}
}

Contents

Related survey

2024

  • [arXiv] Foundation models for biomedical image segmentation: A survey. [Paper]
  • [arXiv] Progress and opportunities of foundation models in bioinformatics. [Paper]
  • [arXiv] Large language models in bioinformatics: applications and perspectives. [Paper]
  • [arXiv] Data-centric foundation models in computational healthcare: A survey. [Paper]
  • [arXiv] Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review. [Paper]

2023

  • [ACM Computing Surveys] Pre-trained language models in biomedical domain: A systematic survey. [Paper]
  • [Nature Medicine] Large language models in medicine. [Paper]
  • [arXiv] A survey of large language models in medicine: Progress, application, and challenge. [Paper]
  • [arXiv] A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. [Paper]
  • [arXiv] Large language models illuminate a progressive pathway to artificial healthcare assistant: A review. [Paper]
  • [arXiv] Foundational models in medical imaging: A comprehensive survey and future vision. [Paper]
  • [arXiv] CLIP in medical imaging: A comprehensive survey. [Paper]
  • [arXiv] Medical vision language pretraining: A survey. [Paper]
  • [MIR] Pre-training in medical data: A survey. [Paper]
  • [J-BHI] Large AI models in health informatics: Applications, challenges, and the future. [Paper]
  • [MedComm–Future Medicine] Accelerating the integration of ChatGPT and other large-scale AI models into biomedical research and healthcare. [Paper]
  • [Nature] Foundation models for generalist medical artificial intelligence. [Paper]
  • [MedIA] On the challenges and perspectives of foundation models for medical image analysis. [Paper]

Methods

LFM methods

2024

  • [AAAI] Zhongjing: Enhancing the Chinese medical capabilities of large language models through expert feedback and real-world multi-turn dialogue. [Paper] [Code]
  • [arXiv] Me LLaMA: Foundation large language models for medical applications [Paper] [Code]
  • [arXiv] BioMistral: A collection of open-source pretrained large language models for medical domains [Paper] [Code]
  • [arXiv] BiMediX: Bilingual medical mixture of experts LLM [Paper] [Code]
  • [arXiv] OncoGPT: A medical conversational model tailored with oncology domain expertise on a large language model Meta-AI (LLaMA) [Paper] [Code]
  • [arXiv] JMLR: Joint medical LLM and retrieval training for enhancing reasoning and professional question answering capability [Paper]

2023

  • [Bioinformatics] MedCPT: A method for zero-shot biomedical information retrieval using contrastive learning with PubMedBERT. [Paper] [Code]
  • [arXiv] PMC-LLaMA: Towards building open-source language models for medicine. [Paper] [Code]
  • [arXiv] MEDITRON-70B: Scaling medical pretraining for large language models. [Paper] [Code]
  • [arXiv] Qilin-Med: Multi-stage knowledge injection advanced medical large language model. [Paper] [Code]
  • [arXiv] HuatuoGPT-II: One-stage training for medical adaptation of LLMs. [Paper] [Code]
  • [NPJ Digit. Med.] A study of generative large language model for medical research and healthcare. [Paper] [Code]
  • [arXiv] From beginner to expert: Modeling medical knowledge into general LLMs. [Paper]
  • [arXiv] HuaTuo: Tuning LLaMA model with Chinese medical knowledge. [Paper] [Code]
  • [arXiv] ChatDoctor: A medical chat model fine-tuned on a large language model Meta-AI (LLaMA) using medical domain knowledge. [Paper] [Code]
  • [arXiv] MedAlpaca: An open-source collection of medical conversational AI models and training data. [Paper] [Code]
  • [arXiv] AlpaCare: Instruction-tuned large language models for medical application. [Paper] [Code]
  • [arXiv] HuatuoGPT: Towards taming language model to be a doctor. [Paper] [Code]
  • [arXiv] DoctorGLM: Fine-tuning your Chinese doctor is not a herculean task. [Paper] [Code]
  • [arXiv] BianQue: Balancing the questioning and suggestion ability of health LLMs with multi-turn health conversations polished by ChatGPT. [Paper] [Code]
  • [arXiv] Taiyi: A bilingual fine-tuned large language model for diverse biomedical tasks. [Paper] [Code]
  • [GitHub] Visual Med-Alpaca: A parameter-efficient biomedical LLM with visual capabilities. [Code]
  • [arXiv] OphGLM: Training an ophthalmology large language-and-vision assistant based on instructions and dialogue. [Paper] [Code]
  • [arXiv] ChatCAD: Interactive computer-aided diagnosis on medical image using large language models. [Paper] [Code]
  • [arXiv] ChatCAD+: Towards a universal and reliable interactive CAD using LLMs. [Paper] [Code]
  • [arXiv] DeID-GPT: Zero-shot medical text de-identification by GPT-4. [Paper] [Code]
  • [arXiv] Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. [Paper] [Code]
  • [arXiv] MedAgents: Large language models as collaborators for zero-shot medical reasoning. [Paper] [Code]
  • [AIME] Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes. [Paper] [Code]
  • [arXiv] Clinical decision transformer: Intended treatment recommendation through goal prompting. [Paper] [Code]
  • [Nature] Large language models encode clinical knowledge [Paper]
  • [arXiv] Towards expert-level medical question answering with large language models [Paper]
  • [arXiv] GPT-doctor: Customizing large language models for medical consultation [Paper]
  • [arXiv] ClinicalGPT: Large language models finetuned with diverse medical data and comprehensive evaluation [Paper]
  • [arXiv] Leveraging a medical knowledge graph into large language models for diagnosis prediction [Paper]

2022

  • [NPJ Digit. Med.] A large language model for electronic health records. [Paper] [Code]
  • [AMIA Annu. Symp. Proc.] Healthprompt: A zero-shot learning paradigm for clinical natural language processing. [Paper]
  • [BioNLP] Position-based prompting for health outcome generation [Paper]
  • [BioNLP] BioBART: Pretraining and evaluation of a biomedical generative language model. [Paper] [Code]

2021

  • [ACM Trans. Comput. Healthc.] Domain-specific language model pretraining for biomedical natural language processing. [Paper] [Code]

2020

  • [JMIR Med. Info.] Modified bidirectional encoder representations from transformers extractive summarization model for hospital information systems based on character-level tokens (AlphaBERT): development and performance evaluation. [Paper] [Code]
  • [Scientific Reports] BEHRT: Transformer for electronic health records. [Paper] [Code]

2019

  • [NPJ Digit. Med.] ClinicalBERT: A hybrid learning model for natural language inference in healthcare using BERT. [Paper] [Code]
  • [Bioinformatics] BioBERT: A pre-trained biomedical language representation model for biomedical text mining. [Paper] [Code]

VFM methods

2024

  • [arXiv] USFM: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. [paper]
  • [CVPR] VoCo: A simple-yet-effective volume contrastive learning framework for 3D medical image analysis. [paper] [Code]
  • [NeurIPS] LVM-Med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching. [paper] [Code]
  • [Nature Medicine] Towards a general-purpose foundation model for computational pathology. [paper] [Code]
  • [arXiv] RudolfV: A foundation model by pathologists for pathologists. [paper] [Code]
  • [Nature Communications] Segment anything in medical images. [paper] [Code]
  • [ICASSP] SAM-OCTA: A fine-tuning strategy for applying foundation model to OCTA image segmentation tasks. [paper] [Code]
  • [WACV] AFTer-SAM: Adapting SAM with axial fusion transformer for medical imaging segmentation. [paper]
  • [MIDL] AdaptiveSAM: Towards efficient tuning of SAM for surgical scene segmentation. [paper] [Code]
  • [arXiv] SegmentAnyBone: A universal model that segments any bone at any location on MRI [paper] [Code]
  • [SSRN] SwinSAM: Fine-grained polyp segmentation in colonoscopy images via segment anything model integrated with a Swin transformer decoder. [paper]
  • [AAAI] SurgicalSAM: Efficient class promptable surgical instrument segmentation [paper] [Code]
  • [Medical Image Analysis] Prompt tuning for parameter-efficient medical image segmentation. [paper] [Code]

2023

  • [ICCV] UniverSeg: Universal medical image segmentation. [paper] [Code]
  • [arXiv] STU-Net: Scalable and transferable medical image segmentation models empowered by large-scale supervised pre-training. [paper] [Code]
  • [arXiv] SAM-Med3D. [paper] [Code]
  • [Nature] A foundation model for generalizable disease detection from retinal images. [paper]
  • [arXiv] VisionFM: a multi-modal multi-task vision foundation model for generalist ophthalmic Artificial Intelligence. [paper]
  • [arXiv] Segvol: Universal and interactive volumetric medical image segmentation. [paper] [Code]
  • [MICCAI] Deblurring masked autoencoder is better recipe for ultrasound image recognition. [paper] [Code]
  • [arXiv] MIS-FM: 3D medical image segmentation using foundation models pretrained on a large-scale unannotated dataset. [paper] [Code]
  • [MICCAI] Foundation model for endoscopy video analysis via large-scale self-supervised pre-train. [paper] [Code]
  • [arXiv] BROW: Better features for whole slide image based on self-distillation. [paper]
  • [arXiv] Computational pathology at health system scale: Self-supervised foundation models from three billion images. [paper]
  • [CVPR] Geometric visual similarity learning in 3D medical image self-supervised pre-training. [paper] [Code]
  • [arXiv] Virchow: A million-slide digital pathology foundation model. [paper] [Code]
  • [arXiv] MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation. [paper] [Code]
  • [ICCV] Comprehensive multimodal segmentation in medical imaging: combining YOLOv8 with SAM and HQ-SAM models. [paper]
  • [arXiv] 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable medical image segmentation. [paper] [Code]
  • [arXiv] Part to whole: Collaborative prompting for surgical instrument segmentation. [paper] [Code]
  • [arXiv] Towards general purpose vision foundation models for medical image analysis: An experimental study of DINOv2 on radiology benchmarks. [paper] [Code]
  • [arXiv] SkinSAM: Empowering skin cancer segmentation with segment anything model. [paper]
  • [arXiv] Polyp-SAM: Transfer SAM for polyp segmentation. [paper] [Code]
  • [arXiv] Customized segment anything model for medical image segmentation. [paper] [Code]
  • [arXiv] Ladder fine-tuning approach for SAM integrating complementary network. [paper] [Code]
  • [arXiv] Cheap lunch for medical image segmentation by fine-tuning SAM on few exemplars. [paper]
  • [arXiv] SemiSAM: Exploring SAM for enhancing semi-supervised medical image segmentation with extremely limited annotations. [paper]
  • [IWMLMI] Mammo-SAM: Adapting foundation segment anything model for automatic breast mass segmentation in whole mammograms. [paper]
  • [arXiv] Promise: Prompt-driven 3D medical image segmentation using pretrained image foundation models. [paper] [Code]
  • [arXiv] Medical SAM adapter: Adapting segment anything model for medical image segmentation. [paper] [Code]
  • [arXiv] SAM-Med2D. [paper] [Code]
  • [arXiv] MediViSTA-SAM: Zero-shot medical video analysis with spatio-temporal SAM adaptation. [paper] [Code]
  • [arXiv] SAMUS: Adapting segment anything model for clinically-friendly and generalizable ultrasound image segmentation. [paper]
  • [MICCAI] Input augmentation with SAM: Boosting medical image segmentation with segmentation foundation model. [paper] [Code]
  • [arXiv] AutoSAM: Adapting SAM to medical images by overloading the prompt encoder. [paper]
  • [arXiv] DeSAM: Decoupling segment anything model for generalizable medical image segmentation [paper] [Code]
  • [bioRxiv] A foundation model for cell segmentation. [paper] [Code]
  • [MICCAI] SAM-U: Multi-box prompts triggered uncertainty estimation for reliable SAM in medical image. [paper]
  • [MICCAI] SAM-Path: A segment anything model for semantic segmentation in digital pathology. [paper]
  • [arXiv] All-in-SAM: From weak annotation to pixel-wise nuclei segmentation with prompt-based finetuning. [paper]
  • [arXiv] Polyp-SAM++: Can a text-guided SAM perform better for polyp segmentation? [paper] [Code]
  • [arXiv] Segment anything model with uncertainty rectification for auto-prompting medical image segmentation. [paper]
  • [arXiv] MedLSAM: Localize and segment anything model for 3D medical images. [paper] [Code]
  • [arXiv] nnSAM: Plug-and-play segment anything model improves nnUNet performance. [paper] [Code]
  • [arXiv] EviPrompt: A training-free evidential prompt generation method for segment anything model in medical images. [paper]
  • [arXiv] One-shot localization and segmentation of medical images with foundation models. [paper]
  • [arXiv] SAMM (Segment Any Medical Model): A 3D Slicer integration to SAM. [paper] [Code]
  • [arXiv] Task-driven prompt evolution for foundation models. [paper]

2022

  • [Machine Learning with Applications] Self-supervised contrastive learning for digital histopathology. [paper] [Code]
  • [Medical Image Analysis] Transformer-based unsupervised contrastive learning for histopathological image classification. [paper] [Code]
  • [arXiv] Self-supervised learning from 100 million medical images. [paper]
  • [CVPR] Self-supervised pre-training of Swin transformers for 3D medical image analysis. [paper] [Code]

2021

  • [Medical Image Analysis] Models Genesis. [paper] [Code]
  • [MIDL] MoCo pretraining improves representation and transferability of chest X-ray models. [paper] [Code]
  • [IEEE Transactions on Medical Imaging] Transferable visual words: Exploiting the semantics of anatomical patterns for self-supervised learning. [paper]

2020

  • [MICCAI] Comparing to learn: Surpassing ImageNet pretraining on radiographs by comparing image representations. [paper] [Code]

2019

  • [arXiv] Med3D: Transfer learning for 3D medical image analysis. [paper] [Code]
  • [MICCAI] Models Genesis: Generic autodidactic models for 3D medical image analysis. [paper] [Code]

BFM methods

2024

  • [Nucleic Acids Research] Multiple sequence alignment-based RNA language model and its application to structural inference. [Paper], [Code]
  • [Nature Methods] scGPT: toward building a foundation model for single-cell multi-omics using generative AI. [Paper], [Code]
  • [Nature Machine Intelligence] A 5’ UTR language model for decoding untranslated regions of mRNA and function predictions. [Paper], [Code]
  • [ICLR] CellPLM: Pre-training of Cell Language Model Beyond Single Cells. [Paper], [Code]

2023

  • [arXiv] DNAGPT: A generalized pre-trained tool for versatile DNA sequence analysis tasks. [Paper], [Code]
  • [arXiv] HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution. [Paper], [Code]
  • [Nature Biotechnology] Large language models generate functional protein sequences across diverse families. [Paper], [Code]
  • [Cell Systems] ProGen2: Exploring the boundaries of protein language models. [Paper], [Code]
  • [Nature] Transfer learning enables predictions in network biology. [Paper], [Code]
  • [arXiv] DNABERT-2: Efficient foundation model and benchmark for multi-species genome. [Paper], [Code]
  • [bioRxiv] The nucleotide transformer: Building and evaluating robust foundation models for human genomics. [Paper], [Code]
  • [bioRxiv] GENA-LM: A family of open-source foundational models for long DNA sequences. [Paper], [Code]
  • [bioRxiv] Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction. [Paper], [Code]
  • [bioRxiv] Deciphering 3’ UTR mediated gene regulation using interpretable deep representation learning. [Paper], [Code]
  • [Science] Evolutionary-scale prediction of atomic-level protein structure with a language model. [Paper], [Code]
  • [bioRxiv] Universal cell embeddings: A foundation model for cell biology. [Paper], [Code]
  • [bioRxiv] Large scale foundation model on single-cell transcriptomics. [Paper], [Code]
  • [arXiv] Large-scale cell representation learning via divide-and-conquer contrastive learning. [Paper], [Code]
  • [bioRxiv] CodonBERT: Large language models for mRNA design and optimization. [Paper], [Code]
  • [bioRxiv] xTrimoPGLM: Unified 100B-scale pre-trained transformer for deciphering the language of protein. [Paper]
  • [bioRxiv] GenePT: A simple but effective foundation model for genes and cells built from ChatGPT. [Paper], [Code]
  • [bioRxiv] scELMo: Embeddings from language models are good learners for single-cell data analysis. [Paper], [Code]
  • [bioRxiv] Evaluating the Utilities of Foundation Models in Single-cell Data Analysis. [Paper], [Code]
  • [bioRxiv] GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model. [Paper], [Code]

2022

  • [Nature Machine Intelligence] scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. [Paper], [Code]
  • [bioRxiv] Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. [Paper], [Code]
  • [NAR Genomics & Bioinformatics] Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning. [Paper], [Code]
  • [Nature Biotechnology] Single-sequence protein structure prediction using language models and deep learning. [Paper], [Code]

2021

  • [Bioinformatics] DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. [Paper], [Code]
  • [IEEE TPAMI] ProtTrans: Toward understanding the language of life through self-supervised learning. [Paper], [Code]
  • [ICML] MSA Transformer. [Paper], [Code]
  • [PNAS] Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. [Paper], [Code]
  • [Nature] Highly accurate protein structure prediction with AlphaFold. [Paper], [Code]
  • [arXiv] Multi-modal self-supervised pre-training for regulatory genome across cell types. [Paper], [Code]

MFM methods

2024

  • [ICASSP] ETP: Learning transferable ECG representations via ECG-text pretraining. [Paper]
  • [NeurIPS] Med-UniC: Unifying cross-lingual medical vision-language pre-training by diminishing bias. [Paper] [Code]
  • [NeurIPS] Quilt-1M: One million image-text pairs for histopathology. [Paper] [Code]
  • [Nature Medicine] A visual-language foundation model for computational pathology. [Paper]
  • [NeurIPS] LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. [Paper] [Code]
  • [AAAI] PathAsst: Generative foundation AI assistant for pathology. [Paper] [Code]
  • [WACV] I-AI: A controllable & interpretable AI system for decoding radiologists’ intense focus for accurate CXR diagnoses. [Paper] [Code]
  • [arXiv] M3D: Advancing 3D medical image analysis with multi-modal large language models. [Paper] [Code]

2023

  • [ICLR] Advancing radiograph representation learning with masked record modeling. [Paper] [Code]
  • [arXiv] BiomedGPT: A unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal Tasks. [Paper] [Code]
  • [arXiv] BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. [Paper] [Code]
  • [arXiv] Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data. [Paper] [Code]
  • [CVPR] Visual language pretrained multiple instance zero-shot transfer for histopathology images. [Paper] [Code]
  • [ICCV] MedKLIP: Medical knowledge enhanced language-image pre-training. [Paper] [Code]
  • [arXiv] UniBrain: Universal brain MRI diagnosis with hierarchical knowledge-enhanced pre-training. [Paper] [Code]
  • [EACL] PubMedCLIP: How much does CLIP benefit visual question answering in the medical domain? [Paper] [Code]
  • [MICCAI] M-FLAG: Medical vision-language pre-training with frozen language models and latent space geometry optimization. [Paper] [Code]
  • [arXiv] IMITATE: Clinical prior guided hierarchical vision-language pre-training. [Paper]
  • [arXiv] CXR-CLIP: Toward large scale chest X-ray language-image pre-training. [Paper] [Code]
  • [BIBM] UMCL: Unified medical image-text-label contrastive learning with continuous prompt. [Paper]
  • [Nature Communications] Knowledge-enhanced visual-language pre-training on chest radiology images. [Paper]
  • [Nature Machine Intelligence] Multi-modal molecule structure–text model for text-based retrieval and editing. [Paper] [Code]
  • [MICCAI] CLIP-Lung: Textual knowledge-guided lung nodule malignancy prediction. [Paper]
  • [MICCAI] PMC-CLIP: Contrastive language-image pre-training using biomedical documents. [Paper] [Code]
  • [arXiv] Enhancing representation in radiography-reports foundation model: A granular alignment algorithm using masked contrastive learning. [Paper] [Code]
  • [ICCV] PRIOR: Prototype representation joint learning from medical images and reports. [Paper] [Code]
  • [MICCAI] Masked vision and language pre-training with unimodal and multimodal contrastive losses for medical visual question answering. [Paper] [Code]
  • [arXiv] T3D: Towards 3D medical image understanding through vision-language pre-training. [Paper]
  • [MICCAI] Gene-induced multimodal pre-training for imageomic classification. [Paper] [Code]
  • [arXiv] A text-guided protein design framework. [Paper] [Code]
  • [Nature Medicine] A visual-language foundation model for pathology image analysis using medical Twitter. [Paper] [Code]
  • [arXiv] Towards generalist biomedical AI. [Paper] [Code]
  • [ML4H] Med-Flamingo: A multimodal medical few-shot learner. [Paper] [Code]
  • [MLMIW] Exploring the transfer learning capabilities of CLIP on domain generalization for diabetic retinopathy. [Paper] [Code]
  • [MICCAI] Open-ended medical visual question answering through prefix tuning of language models. [Paper] [Code]
  • [arXiv] Qilin-Med-VL: Towards Chinese large vision-language model for general healthcare. [Paper] [Code]
  • [arXiv] A foundational multimodal vision language AI assistant for human pathology. [Paper]
  • [arXiv] Effectively fine-tune to improve large multimodal models for radiology report generation. [Paper]
  • [MLMIW] Multi-modal adapter for medical vision-and-language learning. [Paper]
  • [arXiv] Text-guided foundation model adaptation for pathological image classification. [Paper] [Code]
  • [arXiv] XrayGPT: Chest radiographs summarization using medical vision-language models. [Paper] [Code]
  • [MICCAI] Xplainer: From X-Ray observations to explainable zero-shot diagnosis. [Paper] [Code]
  • [MICCAI] Multiple prompt fusion for zero-shot lesion detection using vision-language models. [Paper]

2022

  • [JMLR] Contrastive learning of medical visual representations from paired images and text. [Paper] [Code]
  • [ECCV] Joint learning of localized representations from medical images and reports. [Paper]
  • [NeurIPS] Multi-granularity cross-modal alignment for generalized medical visual representation learning. [Paper] [Code]
  • [AAAI] Clinical-BERT: Vision-language pre-training for radiograph diagnosis and reports generation. [Paper]
  • [MICCAI] Multi-modal masked autoencoders for medical vision-and-language pre-training. [Paper] [Code]
  • [JBHI] Multi-modal understanding and generation for medical images and text via vision-language pre-training. [Paper] [Code]
  • [ACM MM] Align, reason and learn: Enhancing medical vision-and-language pre-training with knowledge. [Paper] [Code]
  • [ECCV] Making the most of text semantics to improve biomedical vision–language processing. [Paper]
  • [Nature Biomedical Engineering] Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. [Paper] [Code]
  • [arXiv] RoentGen: Vision-language foundation model for chest X-ray generation. [Paper]
  • [arXiv] Adapting pretrained vision-language foundational models to medical imaging domains. [Paper]
  • [arXiv] Medical image understanding with pretrained vision language models: A comprehensive study. [Paper]
  • [EMNLP] MedCLIP: Contrastive learning from unpaired medical images and text. [Paper] [Code]
  • [MICCAI] Breaking with fixed set pathology recognition through report-guided contrastive training. [Paper]

2021

  • [arXiv] MMBERT: Multimodal BERT pretraining for improved medical VQA. [Paper] [Code]
  • [ICCV] GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition. [Paper] [Code]

Datasets

LFM datasets

Dataset Name Text Types Scale Task Link
PubMed Literature 18B tokens Language modeling *
MedC-I Literature 79.2B tokens Dialogue *
Guidelines Literature 47K instances Language modeling *
PMC-Patients Literature 167K instances Information retrieval *
MIMIC-III Health records 122K instances Language modeling *
MIMIC-IV Health records 299K instances Language modeling *
eICU-CRDv2.0 Health records 200K instances Language modeling *
EHRs Health records 82B tokens Named entity recognition, Relation extraction, Semantic textual similarity, Natural language inference, Dialogue -
MD-HER Health records 96K instances Dialogue, Question answering -
IMCS-21 Dialogue 4K instances Dialogue *
Huatuo-26M Dialogue 26M instances Question answering *
MedInstruct-52k Dialogue 52K instances Dialogue *
MASH-QA Dialogue 35K instances Dialogue *
MedQuAD Dialogue 47K instances Dialogue *
MedDG Dialogue 17K instances Dialogue *
CMExam Dialogue 68K instances Dialogue *
cMedQA2 Dialogue 108K instances Dialogue *
CMtMedQA Dialogue 70K instances Dialogue *
CliCR Dialogue 100K instances Dialogue *
webMedQA Dialogue 63K instances Dialogue *
ChiMed Dialogue 1.59B tokens Dialogue *
MedDialog Dialogue 20K instances Dialogue *
CMD Dialogue 882K instances Dialogue *
BianqueCorpus Dialogue 2.4M instances Dialogue *
MedQA Dialogue 4K instances Dialogue *
HealthcareMagic Dialogue 100K instances Dialogue *
iCliniq Dialogue 10K instances Dialogue *
CMeKG-8K Dialogue 8K instances Dialogue *
Hybrid SFT Dialogue 226K instances Dialogue *
VariousMedQA Dialogue 54K instances Dialogue *
Medical Meadow Dialogue 160K instances Dialogue *
MultiMedQA Dialogue 193K instances Dialogue -
BiMed1.3M Dialogue 250K instances Dialogue *
OncoGPT Dialogue 180K instances Dialogue *

VFM datasets

Dataset Name Modality Scale Task Link
LIMUC Endoscopy 1,043 videos (11,276 frames) Detection *
SUN Endoscopy 1,018 videos (158,690 frames) Detection *
Kvasir-Capsule Endoscopy 117 videos (4,741,504 frames) Detection *
EndoSLAM Endoscopy 1,020 videos (158,690 frames) Detection, Registration *
LDPolypVideo Endoscopy 263 videos (895,284 frames) Detection *
HyperKvasir Endoscopy 374 videos (1,059,519 frames) Detection *
CholecT45 Endoscopy 45 videos (90,489 frames) Segmentation, Detection *
DeepLesion CT slices (2D) 32,735 images Segmentation, Registration *
LIDC-IDRI 3D CT 1,018 volumes Segmentation *
TotalSegmentator 3D CT 1,204 volumes Segmentation *
TotalSegmentatorv2 3D CT 1,228 volumes Segmentation *
AutoPET 3D CT, 3D PET 1,214 PET-CT pairs Segmentation *
ULS 3D CT 38,842 volumes Segmentation *
FLARE 2022 3D CT 2,300 volumes Segmentation *
FLARE 2023 3D CT 4,500 volumes Segmentation *
AbdomenCT-1K 3D CT 1,112 volumes Segmentation *
CTSpine1K 3D CT 1,005 volumes Segmentation *
CTPelvic1K 3D CT 1,184 volumes Segmentation *
MSD 3D CT, 3D MRI 1,411 CT, 1,222 MRI Segmentation *
BraTS21 3D MRI 2,040 volumes Segmentation *
BraTS2023-MEN 3D MRI 1,650 volumes Segmentation *
ADNI 3D MRI - Clinical study *
PPMI 3D MRI - Clinical study *
ATLAS v2.0 3D MRI 1,271 volumes Segmentation *
PI-CAI 3D MRI 1,500 volumes Segmentation *
MRNet 3D MRI 1,370 volumes Segmentation *
Retinal OCT-C8 2D OCT 24,000 images Classification *
Ultrasound Nerve Segmentation US 11,143 images Segmentation *
Fetal Planes US 12,400 images Classification *
EchoNet-LVH US 12,000 videos Detection, Clinical study *
EchoNet-Dynamic US 10,030 videos Function assessment *
AIROGS CFP 113,893 images Classification *
ISIC 2020 Dermoscopy 33,126 images Classification *
LC25000 Pathology 25,000 images Classification *
DeepLIIF Pathology 1,667 WSIs Classification *
PAIP Pathology 2,457 WSIs Segmentation *
TissueNet Pathology 1,016 WSIs Classification *
NLST 3D CT, Pathology 26,254 CT, 451 WSIs Clinical study *
CRC Pathology 100K images Classification *
MURA X-ray 40,895 images Detection *
ChestX-ray14 X-ray 112,120 images Detection *
SNOW Synthetic pathology 20K image tiles Segmentation *

BFM datasets

Dataset Name Modality Scale Task Link
CellxGene Corpus scRNA-seq over 72M scRNA-seq data Single cell omics study *
NCBI GenBank DNA 3.7B sequences Genomics study *
SCP scRNA-seq over 40M scRNA-seq data Single cell omics study *
Gencode DNA - Genomics study *
10x Genomics scRNA-seq, DNA - Single cell omics and genomics study *
ABC Atlas scRNA-seq over 15M scRNA-seq data Single cell omics study *
Human Cell Atlas scRNA-seq over 50M scRNA-seq data Single cell omics study *
UCSC Genome Browser DNA - Genomics study *
CPTAC DNA, RNA, protein - Genomics and proteomics study *
Ensembl Project Protein - Proteomics study *
RNAcentral database RNA 36M sequences Transcriptomics study *
AlphaFold DB Protein 214M structures Proteomics study *
PDBe Protein - Proteomics study *
UniProt Protein over 250M sequences Proteomics study *
LINCS L1000 Small molecules 1,000 genes with 41k small molecules Disease research, drug response *
GDSC Small molecules 1,000 cancer cells with 400 compounds Disease research, drug response *
CCLE - - Bioinformatics study *

MFM datasets

Dataset Name Modalities Scale Task Link
MIMIC-CXR X-ray, Medical report 377K images, 227K texts Vision-Language Learning *
PadChest X-ray, Medical report 160K images, 109K texts Vision-Language Learning *
CheXpert X-ray, Medical report 224K images, 224K texts Vision-Language Learning *
ImageCLEF2018 Multimodal, Captions 232K images, 232K texts Image captioning *
OpenPath Pathology, Tweets 208K images, 208K texts Vision-Language learning *
PathVQA Pathology, QA 4K images, 32K QA pairs VQA *
Quilt-1M Pathology Images, Mixed-source text 1M images, 1M texts Vision-Language learning *
PatchGastricADC22 Pathology, Captions 991 WSIs, 991 texts Image captioning *
PTB-XL ECG, Medical report 21K records, 21K texts Vision-Language learning *
ROCO Multimodal, Captions 87K images, 87K texts Vision-Language learning *
MedICaT Multimodal, Captions 217K images, 217K texts Vision-Language learning *
PMC-OA Multimodal, Captions 1.6M images, 1.6M texts Vision-Language learning *
ChiMed-VL Multimodal, Medical report 580K images, 580K texts Vision-Language learning *
PMC-VQA Multimodal, QA 149K images, 227K QA pairs VQA *
SwissProtCLAP Protein Sequence, Text 441K protein sequences, 441K texts Protein-Language learning *
Duke Breast Cancer MRI Genomic, MRI images, Clinical data 922 patients Multimodal learning *
I-SPY2 MRI images, Clinical data 719 patients Multimodal learning *

Large-scale comprehensive databases

Database Description Link
CGGA Chinese Glioma Genome Atlas (CGGA) database contains clinical and sequencing data of over 2,000 brain tumor samples from Chinese cohorts. *
UK Biobank UK Biobank is a large-scale biomedical database and research resource containing de-identified genetic, lifestyle and health information and biological samples from half a million UK participants. *
TCGA The Cancer Genome Atlas (TCGA) program molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types, generating over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. *
TCIA The Cancer Imaging Archive (TCIA) is a service which de-identifies and hosts a large publicly available archive of medical images of cancer. *

Other resources

Lectures and tutorials

Blogs

Related awesome repositories
