[Under Review] Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Reports Leveraging Large Language and Vision Models


We propose to leverage free-text reports with large language models (LLMs) and colonoscopy image representations to provide pixel-level annotations of polyps, thereby tackling the data annotation challenge in colonoscopy.

You may also refer to the companion repository contributed by our co-authors.

News

  • Feb. 29th, 2024: EndoKED is under review.
  • If you find this work helpful, please give us a 🌟 to receive updates.

Overview

Overview of the EndoKED design and its applications to polyp diagnosis. (a) Intrinsic supervision is extracted from raw colonoscopy reports by leveraging large language and vision models. The report-level lesion label is first extracted from the free-text description by a large language model. A multiple instance learning (MIL) technique then propagates the report-level label to the image level. The region-level bounding box is obtained from the class activation map (CAM), and a large vision model takes the region-level boxes as prompts to generate pixel-level lesion segmentation. (b) The image classification model for optical biopsy is developed in a data-efficient way: pre-training on multi-centre colonoscopy reports and fine-tuning with limited pathology annotation.

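The report-to-image label propagation in (a) is the MIL step. As a minimal illustrative sketch (attention-based MIL pooling in the style of Ilse et al.; the actual EndoKED MIL head and its parameters may differ), in plain numpy:

```python
import numpy as np

def attention_mil_pool(bag, w_attn, w_cls):
    """Attention-based MIL pooling: per-image scores are softmax-normalised
    into attention weights; the weighted mean of the bag gives a report-level
    representation, scored by a linear classifier. The per-image attention
    weights can then act as image-level pseudo labels."""
    scores = bag @ w_attn                  # (n_images,) per-image attention scores
    a = np.exp(scores - scores.max())
    a /= a.sum()                           # attention weights, sum to 1
    z = a @ bag                            # (feat_dim,) bag-level representation
    return z @ w_cls, a                    # report-level logit, per-image weights

# One report ("bag") with 8 frames of dummy 512-d features.
rng = np.random.default_rng(0)
bag = rng.standard_normal((8, 512))
logit, weights = attention_mil_pool(bag, rng.standard_normal(512), rng.standard_normal(512))
```

Frames with high attention weights are the ones the bag-level (report-level) label is attributed to.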

Dependencies

To clone all files:

git clone https://github.com/zwyang6/ENDOKED.git

To install Python dependencies:

pip install -r requirements.txt

Datasets

Training Dataset

  1. Updating soon.

Evaluation Dataset

EndoKED is evaluated on five public out-of-domain datasets, i.e., CVC-ClinicDB, Kvasir-SEG, ETIS, CVC-ColonDB, and CVC-300. Following common experimental setups, the training sets of CVC-ClinicDB and Kvasir-SEG are not used during training, and we evaluate our model only on the testing sets for a fair comparison. Detailed descriptions of the datasets are reported in the table below.

The five public datasets are publicly available at https://pan.baidu.com/s/1A4e7kmvAShaz3BCitpunFA?pwd=s5t5.

| Dataset | Year | Resolution | Training | Testing | Total |
|---|---|---|---|---|---|
| CVC-ClinicDB | 2015 | 384×384 | 550 | 62 | 612 |
| Kvasir-SEG | 2020 | 332×487 ~ 1920×1072 | 900 | 100 | 1000 |
| ETIS | 2014 | 1225×966 | N/A | 196 | 196 |
| CVC-ColonDB | 2016 | 574×500 | N/A | 380 | 380 |
| CVC-300 | 2017 | 574×500 | N/A | 60 | 60 |

Semantic Results

The results of EndoKED-SEG on the five public datasets are reported in the table below.

| Models | Kvasir | ClinicDB | ColonDB | CVC-300 | ETIS |
|---|---|---|---|---|---|
| U-Net | 0.818 | 0.823 | 0.504 | 0.710 | 0.398 |
| U-Net++ | 0.821 | 0.794 | 0.482 | 0.707 | 0.401 |
| C2FNet | 0.886 | 0.919 | 0.724 | 0.874 | 0.699 |
| DCRNet | 0.886 | 0.896 | 0.704 | 0.856 | 0.556 |
| LDNet | 0.887 | 0.881 | 0.740 | 0.869 | 0.645 |
| Polyp-PVT | 0.917 | 0.948 | 0.808 | 0.900 | 0.787 |
| EndoKED-SEG | 0.908 | 0.920 | 0.809 | 0.893 | 0.818 |
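The scores above are segmentation overlap metrics; assuming mean Dice (the standard metric on these polyp benchmarks), a minimal sketch of the per-image computation:

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice coefficient between two binary masks: 2|P ∩ G| / (|P| + |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Toy masks: the prediction covers half of a 2x4 ground-truth region.
pred = np.zeros((4, 4)); pred[:2, :2] = 1
gt = np.zeros((4, 4)); gt[:2, :] = 1
score = dice(pred, gt)   # 2*4 / (4 + 8) = 0.667
```

The mean over a dataset's test images gives the per-column numbers in the table.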

Training of EndoKED

1. EndoKED-MIL

python ./EndoKED_MIL/train_Endo_BagDistillation_SharedEnc_Similarity_StuFilter.py

2. EndoKED-WSSS

  • 2.1 Data processing
    bash ./EndoKED_WSSS/launch/1_data_processing.sh
    
  • 2.2 Generating Class Activation Maps (CAMs)
    bash ./EndoKED_WSSS/launch/run_ALL.sh
    
  • 2.3 Refine CAMs to Pseudo Labels
    bash ./EndoKED_WSSS/launch/3_refine_CAM_2_Pseudo.sh
    

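Step 2.3 refines CAMs into the box prompts consumed by the large vision model. A minimal sketch of the CAM-to-box conversion (the 0.5 threshold and function name are illustrative assumptions; the repo's scripts may differ):

```python
import numpy as np

def cam_to_box(cam, thr=0.5):
    """Threshold a class activation map at a fraction of its peak and return
    the tight bounding box (x0, y0, x1, y1) around activated pixels,
    usable as a box prompt for a large vision model such as SAM."""
    if cam.max() <= 0:
        return None                        # no activation -> no lesion box
    mask = cam >= thr * cam.max()
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy CAM with a single activated 10x10 region.
cam = np.zeros((64, 64)); cam[10:20, 30:40] = 1.0
box = cam_to_box(cam)                      # (30, 10, 39, 19)
```

The resulting boxes are then passed as prompts to the vision model, whose output masks serve as pixel-level pseudo labels.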
3. EndoKED-SEG

  • 3.1 Train EndoKED-SEG
    bash ./EndoKED_SEG/train.sh
    
  • 3.2 Refine Preds to Pseudo Labels
    bash ./EndoKED_WSSS/launch/5_refine_Preds_2_Pseudo.sh
    
  • Repeat Steps 3.1-3.2 to iteratively optimize EndoKED-SEG

Evaluation of EndoKED

1. EndoKED-MIL

Updating soon.

2. EndoKED-SEG

python ./EndoKED_WSSS/eval_tools/a1_eval_pseuo_labels_from_SAM_byPreds_fromDecoder.py

Model logs and checkpoints

We provide the logs and checkpoints for EndoKED-SEG, which can be downloaded from https://pan.baidu.com/s/1HaxIZf281lWFpk2USXs6OQ (extraction code: a9d4) or from Google Drive: https://drive.google.com/drive/folders/1QPGI7T9fa2ogC6_ZB9TChJg2DHIwCvub?usp=drive_link.

Acknowledgement

We borrow Polyp-PVT as our segmentation model. Segment Anything and its pre-trained weights are leveraged to refine the pseudo labels, and ToCo inspired our generation of CAMs. Many thanks for their brilliant works!

Citation

Updating soon.

If you have any questions, please feel free to contact us.
