This is the official repository for "Revisiting Computer-Aided Tuberculosis Diagnosis".
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually. Although early diagnosis and treatment can greatly improve the chances of survival, it remains a major challenge, especially in developing countries. Recently, computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data. To address this, we establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas. This dataset enables the training of sophisticated detectors for high-quality CTD. Furthermore, we leverage the bilateral symmetry property of CXR images to propose a strong baseline, SymFormer, for simultaneous CXR image classification and TB infection area detection. To promote future research on CTD, we build a benchmark by introducing evaluation metrics, evaluating baseline models reformed from existing detectors, and running an online challenge.
This work extends the preliminary CVPR 2020 version ("Rethinking Computer-aided Tuberculosis Diagnosis", CVPR 2020, Oral) by proposing a novel SymFormer framework for CTD and validating its effectiveness with extensive experiments.
[PDF] [Project Page] [Dataset on Google Drive] [Dataset on Baidu Yunpan] [Online Challenge] [中译版]
- torch==1.9.0
- torchvision==0.10.0
- mmcv==1.3.12
Run `pip install -v -e .` to install this repository.
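Before training, it can help to confirm that the environment matches the pinned versions above. The following is a minimal sketch rather than part of this repository: the helper names are hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, and `mmcv` (some installations register `mmcv-full` instead, in which case that key needs changing).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the requirements list above.
PINNED = {"torch": "1.9.0", "torchvision": "0.10.0", "mmcv": "1.3.12"}


def find_mismatches(pinned, installed):
    """Return {package: (wanted, found)} for every package that does not match.

    Local build suffixes such as "+cu111" are ignored when comparing.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed distribution versions; missing packages map to None."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    found = installed_versions(PINNED)
    for name, (wanted, got) in find_mismatches(PINNED, found).items():
        print(f"{name}: expected {wanted}, found {got}")
```

An empty output means every pinned package was found at the expected version.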
Summary of publicly available TB datasets. The size of our dataset is about 17 times that of the previous largest dataset (Shenzhen, 662 samples).
Datasets | Pub. Year | #Classes | Annotations | #Samples |
---|---|---|---|---|
MC | 2014 | 2 | Image-level | 138 |
Shenzhen | 2014 | 2 | Image-level | 662 |
DA | 2014 | 2 | Image-level | 156 |
DB | 2014 | 2 | Image-level | 150 |
TBX11K (Ours) | 2020 & 2023 | 4 | Bounding box | 11,200 |
Split for the TBX11K dataset. `Active & Latent TB` refers to CXR images with both active and latent TB; `Active TB` refers to CXR images with only active TB; `Latent TB` refers to CXR images with only latent TB; `Uncertain TB` refers to TB CXR images whose type of TB infection cannot be determined under current medical conditions.
Classes | | Train | Val | Test | Total |
---|---|---|---|---|---|
Non-TB | Healthy | 3,000 | 800 | 1,200 | 5,000 |
Non-TB | Sick & Non-TB | 3,000 | 800 | 1,200 | 5,000 |
TB | Active TB | 473 | 157 | 294 | 924 |
TB | Latent TB | 104 | 36 | 72 | 212 |
TB | Active & Latent TB | 23 | 7 | 24 | 54 |
TB | Uncertain TB | 0 | 0 | 10 | 10 |
Total | | 6,600 | 1,800 | 2,800 | 11,200 |
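As a quick sanity check when preparing data loaders, the split can be written down as a small Python mapping and its totals recomputed. This is only an illustrative sketch: the `SPLIT` dictionary and helper names are hypothetical and are not read from the dataset files; the counts are copied from the table above.

```python
# (train, val, test) image counts copied from the split table above.
SPLIT = {
    "Healthy": (3000, 800, 1200),
    "Sick & Non-TB": (3000, 800, 1200),
    "Active TB": (473, 157, 294),
    "Latent TB": (104, 36, 72),
    "Active & Latent TB": (23, 7, 24),
    "Uncertain TB": (0, 0, 10),
}


def column_totals(split):
    """Sum each of the train/val/test columns over all classes."""
    return tuple(sum(column) for column in zip(*split.values()))


def row_totals(split):
    """Sum train + val + test for each class."""
    return {name: sum(counts) for name, counts in split.items()}


train_total, val_total, test_total = column_totals(SPLIT)
print(train_total, val_total, test_total)  # prints: 6600 1800 2800
```

Note that `Uncertain TB` images appear only in the test split, so a model never sees that category during training.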
Methods | Backbones | Accuracy | AUC (TB) | Sensitivity | Specificity | AP | AR | Result |
---|---|---|---|---|---|---|---|---|
Deformable DETR | ResNet-50 w/ FPN | 91.3 | 97.6 | 89.2 | 95.3 | 89.8 | 91.0 | [JSON] [TXT] |
SymFormer w/ Deformable DETR | ResNet-50 w/ FPN | 94.3 | 98.5 | 87.3 | 97.3 | 93.2 | 93.2 | [JSON] [TXT] |
SymFormer w/ RetinaNet | ResNet-50 w/ FPN | 94.5 | 98.9 | 91.0 | 96.8 | 93.3 | 94.0 | [JSON] [TXT] |
SymFormer w/ RetinaNet | P2T-Small w/ FPN | 94.6 | 99.1 | 92.1 | 96.7 | 93.4 | 94.2 | [JSON] [TXT] |
TP: True Positives; TN: True Negatives; FP: False Positives; FN: False Negatives. `#Total` denotes the total number of test CXR images. We test FPS on a single TITAN XP GPU. For the ground truths, the ratio of positives (TP + FN) is 19.6%, and the ratio of negatives (TN + FP) is 80.4%.
Methods | Backbones | #FLOPs | #Params | FPS | | TP/#Total | TN/#Total | FP/#Total | FN/#Total |
---|---|---|---|---|---|---|---|---|---|
Deformable DETR | ResNet-50 w/ FPN | 54.07 | 52.67 | 23.0 | 85.6 | 17.5 | 76.6 | 3.8 | 2.1 |
SymFormer w/ Deformable DETR | ResNet-50 w/ FPN | 54.08 | 52.69 | 22.5 | 87.9 | 17.1 | 78.2 | 2.2 | 2.5 |
SymFormer w/ RetinaNet | ResNet-50 w/ FPN | 59.14 | 50.03 | 24.3 | 89.0 | 17.8 | 77.8 | 2.6 | 1.8 |
SymFormer w/ RetinaNet | P2T-Small w/ FPN | 55.46 | 45.10 | 17.9 | 89.6 | 18.1 | 77.7 | 2.7 | 1.5 |
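The four confusion-matrix ratios (the last four values in each row, in the order TP, TN, FP, FN) can be related to sensitivity and specificity. The sketch below assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); the challenge page and the paper remain the authoritative source for the exact metrics. Because the ratios are rounded to one decimal place, the recomputed values may differ slightly from the reported ones.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that are rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Ratios (in percent of #Total) from the Deformable DETR row above.
tp, tn, fp, fn = 17.5, 76.6, 3.8, 2.1

print(f"positives:   {tp + fn:.1f}%")   # matches the stated 19.6%
print(f"negatives:   {tn + fp:.1f}%")   # matches the stated 80.4%
print(f"sensitivity: {100 * sensitivity(tp, fn):.1f}")
print(f"specificity: {100 * specificity(tn, fp):.1f}")
```

The functions accept either raw counts or ratios, since the common denominator `#Total` cancels out.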
Methods | Test Data | Backbones | Category-agnostic TB AP50bb | Category-agnostic TB APbb | Active TB AP50bb | Active TB APbb | Latent TB AP50bb | Latent TB APbb |
---|---|---|---|---|---|---|---|---|
Deformable DETR | ALL | ResNet-50 w/ FPN | 51.7 | 22.0 | 48.9 | 21.2 | 7.1 | 1.9 |
SymFormer w/ Deformable DETR | ALL | ResNet-50 w/ FPN | 57.0 | 23.3 | 52.1 | 22.7 | 7.1 | 2.0 |
SymFormer w/ RetinaNet | ALL | ResNet-50 w/ FPN | 68.0 | 29.5 | 62.0 | 27.3 | 13.3 | 4.4 |
SymFormer w/ RetinaNet | ALL | P2T-Small w/ FPN | 70.4 | 30.0 | 63.6 | 26.9 | 11.4 | 4.3 |
Deformable DETR | Only TB | ResNet-50 w/ FPN | 57.4 | 24.2 | 54.5 | 23.5 | 7.6 | 2.3 |
SymFormer w/ Deformable DETR | Only TB | ResNet-50 w/ FPN | 60.8 | 24.5 | 55.2 | 23.8 | 9.2 | 2.6 |
SymFormer w/ RetinaNet | Only TB | ResNet-50 w/ FPN | 73.4 | 31.5 | 67.1 | 29.2 | 14.7 | 4.8 |
SymFormer w/ RetinaNet | Only TB | P2T-Small w/ FPN | 75.7 | 32.1 | 68.9 | 28.9 | 13.0 | 4.7 |
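The AP50bb columns are presumably COCO-style bounding-box average precision at an IoU threshold of 0.5, with APbb averaged over several IoU thresholds; this reading is an assumption, and the paper defines the metrics precisely. Under that assumption, the matching criterion behind AP50bb can be sketched as follows. The function names are hypothetical, and boxes are assumed to be in `(x_min, y_min, width, height)` form.

```python
def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, iou_threshold=0.5):
    """A detection counts as a hit when its IoU with a ground-truth box
    reaches the threshold (0.5 for an AP50-style metric)."""
    return box_iou(pred_box, gt_box) >= iou_threshold


gt = (0, 0, 10, 10)
print(is_match((1, 1, 10, 10), gt))  # IoU = 81 / 119, so this is a match
print(is_match((5, 0, 10, 10), gt))  # IoU = 1 / 3, so this is not
```

A full AP computation additionally ranks detections by score and integrates the precision-recall curve; tools such as `pycocotools` handle that part.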
Visualization of the learned deep features from CXR images using SymFormer w/ RetinaNet. We randomly select CXR images from the TBX11K test set. In each example, the infection areas of active TB, latent TB, and uncertain TB are indicated by boxes colored in green, red, and blue, respectively. The ground-truth boxes are displayed with thick lines, while the detected boxes are shown with thin lines.
Here, we show the training/testing commands by using P2T-Small as the backbone network and RetinaNet as the base detector.
Download the ImageNet-pretrained model first: P2T-Small.
Use the following commands to train SymFormer:
# step I: train detection
CUDA_VISIBLE_DEVICES=0 python tools/train.py \
configs/symformer/symformer_retinanet_p2t_fpn_2x_TBX11K.py \
--work-dir work_dirs/symformer_retinanet_p2t/ \
--no-validate
# step II: train classification
CUDA_VISIBLE_DEVICES=0 python tools/train.py \
configs/symformer/symformer_retinanet_p2t_cls_fpn_1x_TBX11K.py \
--work-dir work_dirs/symformer_retinanet_p2t_cls/ \
--no-validate
Use the following commands to generate results for the TBX11K test set:
CUDA_VISIBLE_DEVICES=0 python -W ignore tools/test.py \
configs/symformer/symformer_retinanet_p2t_cls_fpn_1x_TBX11K.py \
work_dirs/symformer_retinanet_p2t_cls/latest.pth \
--out work_dirs/symformer_retinanet_p2t_cls/result/result.pkl \
--format-only --cls-filter True \
--options "jsonfile_prefix=work_dirs/symformer_retinanet_p2t_cls/result/bbox_result" \
--txt work_dirs/symformer_retinanet_p2t_cls/result/cls_result.txt
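To inspect the detection output before submitting, the JSON file can be loaded in Python. The sketch below rests on two assumptions that should be checked against the actual output: that the detections are written next to the given `jsonfile_prefix` (MMDetection 2.x normally names this file `<prefix>.bbox.json`), and that the file is a COCO-style list of records with `image_id`, `bbox`, `score`, and `category_id` keys. The helper name is hypothetical.

```python
import json
from collections import defaultdict


def load_detections(path, min_score=0.0):
    """Group COCO-style detection records by image_id.

    Only records whose score is at least `min_score` are kept.
    """
    with open(path) as f:
        records = json.load(f)
    by_image = defaultdict(list)
    for record in records:
        if record["score"] >= min_score:
            by_image[record["image_id"]].append(record)
    return dict(by_image)


if __name__ == "__main__":
    # Assumed output location; adjust to the file the test command produced.
    path = "work_dirs/symformer_retinanet_p2t_cls/result/bbox_result.bbox.json"
    try:
        detections = load_detections(path, min_score=0.3)
        print(f"{len(detections)} images have at least one detection")
    except FileNotFoundError:
        print(f"no result file found at {path}")
```

The 0.3 score threshold is arbitrary and only serves to show the filtering.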
We only release the ground truths for the training and validation sets of our TBX11K dataset. The test set is retained as an online challenge for TB X-ray classification and TB infection area detection. To participate in this challenge, you need to create an account on CodaLab and register for the TBX11K Tuberculosis Classification and Detection Challenge. Please refer to this webpage or our paper for the evaluation metrics. Then, open the "Participate" tab and read the submission guidelines carefully before uploading your submission. Once uploaded, your submissions will be evaluated automatically.
If you are using the code/model/data provided here in a publication, please consider citing our papers:
@article{liu2023revisiting,
title={Revisiting Computer-Aided Tuberculosis Diagnosis},
author={Liu, Yun and Wu, Yu-Huan and Zhang, Shi-Chen and Liu, Li and Wu, Min and Cheng, Ming-Ming},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023}
}
@inproceedings{liu2020rethinking,
title={Rethinking Computer-aided Tuberculosis Diagnosis},
author={Liu, Yun and Wu, Yu-Huan and Ban, Yunfeng and Wang, Huifang and Cheng, Ming-Ming},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2646--2655},
year={2020}
}
The training/testing commands in this repository use P2T-Small as the backbone network and RetinaNet as the base detector, which are described in the following papers:
@article{wu2022p2t,
title={P2T: Pyramid Pooling Transformer for Scene Understanding},
author={Wu, Yu-Huan and Liu, Yun and Zhan, Xin and Cheng, Ming-Ming},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={45},
number={11},
pages={12760--12771},
year={2023},
publisher={IEEE}
}
@inproceedings{lin2017focal,
title={Focal Loss for Dense Object Detection},
author={Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
booktitle={IEEE International Conference on Computer Vision},
pages={2980--2988},
year={2017}
}