RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning

This repository contains the implementation of RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning. Before running the code, install the prerequisite libraries and follow the instructions below to replicate the experiments.

Update

[2024/01/13] Checkpoints (Stage 1 and Stage 2) for the MIMIC-ABN dataset are available at Google Drive
[2024/01/12] Checkpoints (Stage 1 and Stage 2) for the MIMIC-CXR dataset are available at Google Drive

Overview

Automating radiology report generation can significantly alleviate radiologists' workloads. Previous research has primarily focused on realizing highly concise observations while neglecting the precise attributes that determine the severity of diseases (e.g., small pleural effusion). Since incorrect attributes will lead to imprecise radiology reports, strengthening the generation process with precise attribute modeling becomes necessary. Additionally, the temporal information contained in the historical records, which is crucial in evaluating a patient's current condition (e.g., heart size is unchanged), has also been largely disregarded. To address these issues, we propose Recap, which generates precise and accurate radiology reports via dynamic disease progression reasoning. Specifically, Recap first predicts the observations and progressions (i.e., spatiotemporal information) given two consecutive radiographs. It then combines the historical records, spatiotemporal information, and radiographs for report generation, where a disease progression graph and dynamic progression reasoning mechanism are devised to accurately select the attributes of each observation and progression. Extensive experiments on two publicly available datasets demonstrate the effectiveness of our model.

Requirements

torch==1.9.1
transformers==4.24.0
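
Before training, it can help to confirm that the active environment matches the pins above. The following is a minimal sketch using only the Python standard library; the helper name `find_mismatches` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Pins copied from the Requirements list above.
REQUIRED = {"torch": "1.9.1", "transformers": "4.24.0"}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not met.

    `found` is None when the package is not installed.
    (Hypothetical helper, not part of the Recap code base.)
    """
    mismatches = {}
    for name, expected in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "1.9.1+cu111" still count as 1.9.1.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means both pinned versions are installed.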

Data Preparation and Preprocessing

Please download the two datasets: MIMIC-ABN and MIMIC-CXR, and put the annotation files into the data folder.

For observation preprocessing, we use CheXbert to extract relevant observation information. Please follow the instructions to extract the observation tags.
For progression preprocessing, we adopt Chest ImaGenome to extract relevant progression information.
For entity preprocessing, we use RadGraph to extract relevant entities.
For CE evaluation, please clone CheXbert into the folder and download the checkpoint chexbert.pth into CheXbert:

git clone https://github.com/stanfordmlgroup/CheXbert.git

We share the encrypted reports for the MIMIC-CXR and MIMIC-ABN datasets. To decrypt the reports, you will need to download the mimic-cxr-2.0.0-split.csv.gz from here.

Step 1: MIMIC-ABN Data-split Recovery

We recover the data split of MIMIC-ABN according to the study_id provided by the MIMIC-CXR dataset. We provide example code for reference. Please run the following command for preprocessing, changing the data locations accordingly:

python src_preprocessing/run_abn_preprocess.py \
      --mimic_cxr_annotation data/mimic_cxr_annotation.json \
      --mimic_abn_annotation data/mimic_abn_annotation.json \
      --image_path data/mimic_cxr/images/ \
      --output_path data/mimic_abn_annotation_processed.json
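
The idea behind the split recovery can be sketched as follows. This is an illustration only, not the logic of run_abn_preprocess.py. It assumes that the official split file mimic-cxr-2.0.0-split.csv.gz has `study_id` and `split` columns, and that each MIMIC-ABN record is a dict with a `study_id` key; the record layout and both function names are hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def load_study_splits(split_csv_gz):
    """Map study_id -> split, read from the MIMIC-CXR split file.

    Assumption: the CSV has "study_id" and "split" columns.
    """
    study_to_split = {}
    with gzip.open(split_csv_gz, "rt", newline="") as f:
        for row in csv.DictReader(f):
            study_to_split[row["study_id"]] = row["split"]
    return study_to_split


def recover_splits(abn_records, study_to_split):
    """Group MIMIC-ABN records by the split of their MIMIC-CXR study.

    Assumption: each record is a dict with a "study_id" key (hypothetical
    layout). Records whose study is absent from the split file are dropped.
    """
    recovered = defaultdict(list)
    for record in abn_records:
        split = study_to_split.get(str(record["study_id"]))
        if split is not None:
            recovered[split].append(record)
    return dict(recovered)
```

Keying on study_id rather than on image identifiers keeps all images of one study in the same split.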

Trained Model Weights

Model weights trained on the two datasets are available at:

MIMIC-ABN: Google Drive
MIMIC-CXR: Google Drive

Training and Testing Models

Recap is a two-stage framework, as shown in the figure above. Here are snippets for training and testing Recap.

Stage 1: Observation and Progression Prediction

chmod +x script_stage1/run_mimic_abn.sh
./script_stage1/run_mimic_abn.sh 1

Stage 2: SpatioTemporal-aware Report Generation

chmod +x script_stage2/run_mimic_abn.sh
./script_stage2/run_mimic_abn.sh 1
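
Because Stage 2 builds on the observations and progressions predicted in Stage 1, the two scripts are run in order. A minimal Python driver for that is sketched below. It assumes the scripts are bash scripts launched from the repository root, and it passes the trailing `1` through unchanged, exactly as in the commands above. The names `build_commands` and `run_stages` are hypothetical.

```python
import subprocess

# Script paths copied from the commands above.
STAGE_SCRIPTS = [
    "script_stage1/run_mimic_abn.sh",
    "script_stage2/run_mimic_abn.sh",
]


def build_commands(arg="1", scripts=STAGE_SCRIPTS):
    """Return one argv list per stage, in execution order.

    Assumption: the scripts are bash scripts, so `bash <script> <arg>`
    is equivalent to `chmod +x` followed by `./<script> <arg>`.
    """
    return [["bash", script, arg] for script in scripts]


def run_stages(arg="1", runner=subprocess.run):
    """Run Stage 1, then Stage 2; check=True raises on the first failure."""
    for command in build_commands(arg):
        runner(command, check=True)
```

Calling `run_stages("1")` from the repository root runs both stages and stops if Stage 1 exits with a non-zero status.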

Citation

If you use Recap, please cite our paper:

@inproceedings{hou-etal-2023-recap,
    title = "{RECAP}: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning",
    author = "Hou, Wenjun and Cheng, Yi and Xu, Kaishuai and Li, Wenjie and Liu, Jiang",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.140",
    doi = "10.18653/v1/2023.findings-emnlp.140",
    pages = "2134--2147",
}
