Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Reports Leveraging Large Language and Vision Models.
We propose to leverage free-text colonoscopy reports using large language models (LLMs) together with colonoscopy image representations to provide pixel-level annotations of polyps, thereby tackling the data annotation challenge in colonoscopy.
You may also refer to the repository contributed by our co-authors.
Feb. 29th, 2024: EndoKED is under review. If you find this work helpful, please give us a 🌟 to receive updates.
Overview of the EndoKED design and applications to polyp diagnosis. (a) The intrinsic supervision from raw colonoscopy reports is extracted leveraging large language and vision models. The report-level lesion label is first extracted from the free-text description by a large language model. A multiple instance learning (MIL) technique then propagates the report-level label to the image level. The region-level bounding box is obtained from the class activation map (CAM). A large vision model takes the region-level boxes as prompts and generates pixel-level lesion segmentation. (b) The image classification model for optical biopsy is developed in a data-efficient way: pre-training on multi-centre colonoscopy reports and fine-tuning with limited pathology annotations.
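To make the label-propagation step concrete, below is a minimal sketch of attention-based MIL pooling that turns a report-level (bag) label into image-level (instance) pseudo labels. This is an illustrative assumption based on standard attention MIL, not the actual EndoKED-MIL implementation; all class and variable names (e.g., `MILAttentionPool`, `feat_dim`) are hypothetical.

```python
# Minimal sketch of attention-based MIL pooling: a colonoscopy report is a bag,
# its frames are instances, and only the report-level (bag) label is supervised.
# Illustrative assumption, not the EndoKED-MIL code; names are hypothetical.
import torch
import torch.nn as nn

class MILAttentionPool(nn.Module):
    def __init__(self, feat_dim: int = 512, hidden_dim: int = 128):
        super().__init__()
        # Attention head scores one weight per instance (image) in the bag (report).
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, 1)  # bag-level polyp vs. no-polyp logit

    def forward(self, instance_feats: torch.Tensor):
        # instance_feats: (num_images, feat_dim) features from one report
        attn = torch.softmax(self.attention(instance_feats), dim=0)   # (N, 1)
        bag_feat = (attn * instance_feats).sum(dim=0, keepdim=True)   # (1, feat_dim)
        bag_logit = self.classifier(bag_feat)                         # bag prediction
        return bag_logit, attn.squeeze(-1)

# Usage: supervise bag_logit with the report-level label extracted by the LLM,
# then threshold the attention weights to obtain image-level pseudo labels.
feats = torch.randn(12, 512)                 # 12 frames from one report (dummy features)
model = MILAttentionPool()
bag_logit, attn_weights = model(feats)
image_level_pseudo = (attn_weights > attn_weights.mean()).float()
```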
To clone all files:
git clone https://github.com/zwyang6/ENDOKED.git
To install Python dependencies:
pip install -r requirements.txt
- Updating soon.
EndoKED is evaluated on five public out-of-domain datasets, i.e., CVC-ClinicDB, Kvasir-SEG, ETIS, CVC-ColonDB, and CVC-300. Following common experimental setups, the training sets of CVC-ClinicDB and Kvasir-SEG are not used during training, and we evaluate our model only on the testing sets for a fair comparison. Detailed descriptions of the datasets are reported in the table below.
The five public datasets are publicly available at https://pan.baidu.com/s/1A4e7kmvAShaz3BCitpunFA?pwd=s5t5.
Dataset | Year | Resolution | Training | Testing | Total |
---|---|---|---|---|---|
CVC-ClinicDB | 2015 | 384x384 | 550 | 62 | 612 |
Kvasir-SEG | 2020 | 332x487~1920x1072 | 900 | 100 | 1000 |
ETIS | 2014 | 1225x966 | N/A | 196 | 196 |
CVC-ColonDB | 2016 | 574x500 | N/A | 380 | 380 |
CVC-300 | 2017 | 574x500 | N/A | 60 | 60 |
The results on five public datasets for EndoKED-SEG are reported in the following Table.
Models | Kvasir | ClinicDB | ColonDB | CVC-300 | ETIS |
---|---|---|---|---|---|
U-Net | 0.818 | 0.823 | 0.504 | 0.710 | 0.398 |
UNet++ | 0.821 | 0.794 | 0.482 | 0.707 | 0.401 |
C2FNet | 0.886 | 0.919 | 0.724 | 0.874 | 0.699 |
DCRNet | 0.886 | 0.896 | 0.704 | 0.856 | 0.556 |
LDNet | 0.887 | 0.881 | 0.740 | 0.869 | 0.645 |
Polyp-PVT | 0.917 | 0.948 | 0.808 | 0.900 | 0.787 |
EndoKED-SEG | 0.908 | 0.920 | 0.809 | 0.893 | 0.818 |
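The scores above appear to be Dice-style overlap values (mDice is the usual metric on these benchmarks; we note this as an assumption rather than a fact confirmed by the table itself). For reference, a minimal sketch of the metric on binary masks:

```python
# Minimal sketch of a Dice score for binary polyp masks.
# Assumes the table above reports mean Dice (mDice); function and variable
# names here are illustrative, not taken from the repo.
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2*|P ∩ G| / (|P| + |G|) for binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

# Example: mean Dice over a small batch of dummy masks.
preds = [np.random.rand(352, 352) > 0.5 for _ in range(4)]
gts = [np.random.rand(352, 352) > 0.5 for _ in range(4)]
mdice = np.mean([dice_score(p, g) for p, g in zip(preds, gts)])
print(f"mDice: {mdice:.3f}")
```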
1. EndoKED-MIL
python ./EndoKED_MIL/train_Endo_BagDistillation_SharedEnc_Similarity_StuFilter.py
2. EndoKED-WSSS
- Prepare the data:
bash ./EndoKED_WSSS/launch/1_data_processing.sh
- Train the weakly-supervised pipeline:
bash ./EndoKED_WSSS/launch/run_ALL.sh
- Refine the CAMs into pixel-level pseudo labels (see the sketch after this list):
bash ./EndoKED_WSSS/launch/3_refine_CAM_2_Pseudo.sh
3. EndoKED-SEG
- Train the segmentation model:
bash ./EndoKED_SEG/train.sh
- Refine the predictions into pseudo labels:
bash ./EndoKED_WSSS/launch/5_refine_Preds_2_Pseudo.sh
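For intuition on what the refinement scripts do conceptually, here is a minimal sketch of the CAM → bounding box → SAM-prompted pseudo mask step described in the overview. The checkpoint path, threshold, and helper names are placeholders/assumptions; the actual logic lives in the `refine_*_2_Pseudo.sh` scripts and their Python entry points.

```python
# Minimal sketch of the region-to-pixel step: threshold a CAM into a bounding box,
# then prompt Segment Anything with that box to obtain a pixel-level pseudo mask.
# Checkpoint path, threshold, and helper names below are illustrative placeholders.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def cam_to_box(cam: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Turn a normalized CAM (H, W) into an XYXY box over high-activation pixels."""
    ys, xs = np.where(cam >= thresh * cam.max())
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

# Load SAM (checkpoint path is a placeholder) and prompt it with the CAM box.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)

image = np.zeros((352, 352, 3), dtype=np.uint8)   # dummy RGB colonoscopy frame
cam = np.random.rand(352, 352)                     # dummy class activation map

predictor.set_image(image)
box = cam_to_box(cam)
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
pseudo_mask = masks[0]   # pixel-level pseudo label used to train EndoKED-SEG
```

In the pipeline above, the resulting `pseudo_mask` plays the role of the pixel-level pseudo label on which the EndoKED-SEG model is subsequently trained.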
1. EndoKED-MIL
Updating soon.
2. EndoKED-SEG
python ./EndoKED_WSSS/eval_tools/a1_eval_pseuo_labels_from_SAM_byPreds_fromDecoder.py
We provide the logs and checkpoints for EndoKED-SEG, which can be downloaded from https://pan.baidu.com/s/1HaxIZf281lWFpk2USXs6OQ (extraction code: a9d4) or from Google Drive: https://drive.google.com/drive/folders/1QPGI7T9fa2ogC6_ZB9TChJg2DHIwCvub?usp=drive_link.
We borrow Polyp-PVT as our segmentation model. Segment Anything and its pre-trained weights are leveraged to refine the pseudo labels. ToCo inspired our generation of CAMs. Many thanks for their brilliant works!
Updating soon.
If you have any questions, please feel free to contact us.