
TFDet: Target-aware Fusion for RGB-T Pedestrian Detection

This is the official repository for our paper "TFDet: Target-aware Fusion for RGB-T Pedestrian Detection" (arXiv:2305.16580).

Our main contributions are summarized as follows:

  • We comprehensively analyze the adverse impact of false positives on detection performance and identify the noisy feature map as a key factor contributing to these false positives.
  • To address the noisy feature map issue, we propose a target-aware fusion strategy that effectively fuses complementary features from both modalities, highlighting feature representations in pedestrian areas while suppressing those in background areas.
  • Experiments show that our TFDet generates discriminative feature maps, significantly reducing false positives. TFDet achieves state-of-the-art performance on two challenging multispectral pedestrian detection benchmarks, KAIST and LLVIP. It is also computationally efficient, with an inference time comparable to previous state-of-the-art approaches.
  • Notably, TFDet performs especially well under challenging night scenes.

Dataset and Models

  • Datasets and model checkpoints can be downloaded from this cloud link (extraction code: tfde).
  • Because the KAIST dataset has been updated by several previous works, such as Hwang et al., Li et al., Liu et al., and Zhang et al., we upload this updated version so that it can be used directly with our code.
  • The LLVIP dataset can be downloaded from its official repository.

In the cloud link, files are organized as follows:

TFDet:
├─datasets
│  ├─kaist
│  │  └─zx-sanitized-kaist-keepPerson-fillNonPerson
│  │      ├─annotations.zip
│  │      ├─coco_format.zip
│  │      ├─images
│  │      │  ├─test.zip
│  │      │  ├─train_lwir.zip
│  │      │  └─train_visible.zip
│  │      ├─test.avi
│  │      └─train.avi
│  └─LLVIP		# LLVIP should be downloaded here
│      ├─LLVIP
│      │  ├─coco_format	# For mmdetection
│      │  ├─lwir
│      │  └─visible
│      └─yolov5_format	# For yolov5
│          ├─images
│          │  ├─lwir
│          │  │  ├─test
│          │  │  └─train
│          │  └─visible
│          │      ├─test
│          │      └─train
│          └─labels
│              ├─lwir
│              │  ├─test
│              │  └─train
│              └─visible
│                  ├─test
│                  └─train
├─mmdetection
│  ├─runs
│  │  └─FasterRCNN_vgg16_channelRelation_dscSEFusion_similarityMax_1
│  │      ├─epoch_
│  │      │  ├─epoch_3-test-all.txt
│  │      │  ├─epoch_3-test-day.txt
│  │      │  └─epoch_3-test-night.txt
│  │      ├─epoch_3.pkl
│  │      └─epoch_3.pth
│  └─runs_llvip
│      └─FasterRCNN_r50wMask_ROIFocalLoss5_CIOU20_cosineSE_dcnGWConvGlobalCC_1024x1280
│          ├─20230825_171907.log
│          ├─20230825_171907.log.json
│          └─epoch_7.pth
└─yolov5-master
    └─runs
        └─train
            └─modifiedDCN_MaskSup_negCorr_1024
                └─weights
                    └─best.pt

KAIST

Environmental Requirements

  • We use Faster R-CNN, implemented with the MMDetection toolbox, to detect pedestrians on the KAIST dataset. Please follow the MMDetection documentation to set up the environment.

In our environment, we use:

python==3.7.13
torch==1.10.1+cu111
torchvision==0.11.2+cu111
mmcv==1.6.0
mmdet==2.24.1
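
To quickly check that an installed environment matches these versions, a small sanity-check snippet (ours, not part of the repository) can be run:

# Sanity check for the KAIST (MMDetection) environment -- not part of the repository.
import torch, torchvision, mmcv, mmdet

print('torch      :', torch.__version__)        # expect 1.10.1+cu111
print('torchvision:', torchvision.__version__)  # expect 0.11.2+cu111
print('mmcv       :', mmcv.__version__)         # expect 1.6.0
print('mmdet      :', mmdet.__version__)        # expect 2.24.1
print('CUDA available:', torch.cuda.is_available())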

Dataset

  • Please download the KAIST dataset and checkpoint from the cloud link above, and save them following the structure shown in the Dataset and Models section.
  • If you only intend to assess the inference results, you just need to download images/test.zip, annotations.zip, and coco_format.zip.

Inference

Note: the data_root and img_prefix entries in the configuration files should be modified to match your local dataset path. Please refer to the MMDetection documentation for more details.
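
As an illustration, in a standard MMDetection 2.x configuration these fields look roughly like the sketch below; the exact key layout and file names in configs/faster_rcnn/faster_rcnn_vgg16_fpn_sanitized-kaist_v5.py may differ, so treat the paths as placeholders.

# Illustrative only: point these fields at your local copy of the dataset.
# The actual keys and file names in the TFDet config may differ.
data_root = '/path/to/datasets/kaist/zx-sanitized-kaist-keepPerson-fillNonPerson/'
data = dict(
    test=dict(
        ann_file=data_root + 'coco_format/test.json',  # hypothetical annotation file name
        img_prefix=data_root + 'images/test/'))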

Since the KAIST dataset is evaluated with the log-average miss rate metric, inference involves three files: mmdetection/tools/test.py, mmdetection/myCodesZoo/cvtpkl2txt_person.py, and KAISTdevkit-matlab-wrapper/demo_test.m.

First, run tools/test.py to generate the detection results:

cd mmdetection

# generate detection result in pkl format
python tools/test.py configs/faster_rcnn/faster_rcnn_vgg16_fpn_sanitized-kaist_v5.py runs/FasterRCNN_vgg16_channelRelation_dscSEFusion_similarityMax_1/epoch_3.pth --work-dir runs/FasterRCNN_vgg16_channelRelation_dscSEFusion_similarityMax_1 --gpu-id 7 --eval bbox --out runs/FasterRCNN_vgg16_channelRelation_dscSEFusion_similarityMax_1/epoch_3.pkl

A few minutes later, you will obtain a pkl file named runs/FasterRCNN_vgg16_channelRelation_dscSEFusion_similarityMax_1/epoch_3.pkl. Then, run myCodesZoo/cvtpkl2txt_person.py to parse the pkl file:

cd myCodesZoo

# convert detections to txt format
python cvtpkl2txt_person.py
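
As an aside, if you want to inspect the raw detections yourself: the pkl written by tools/test.py is a per-image list of per-class arrays, where each row is [x1, y1, x2, y2, score]. The snippet below is only a rough sketch of this kind of conversion; the actual output layout expected by the evaluation scripts is defined in cvtpkl2txt_person.py, so treat the txt format here as illustrative.

# Rough sketch of parsing the MMDetection result pkl -- the real conversion lives in
# myCodesZoo/cvtpkl2txt_person.py, and the evaluation scripts expect its output format.
import mmcv

results = mmcv.load('runs/FasterRCNN_vgg16_channelRelation_dscSEFusion_similarityMax_1/epoch_3.pkl')
with open('detections.txt', 'w') as f:            # illustrative output file
    for img_idx, per_image in enumerate(results):
        person_dets = per_image[0]                # class 0 (person); rows are [x1, y1, x2, y2, score]
        for x1, y1, x2, y2, score in person_dets:
            # one detection per line: image index, box as (x, y, w, h), confidence
            f.write(f'{img_idx},{x1:.2f},{y1:.2f},{x2 - x1:.2f},{y2 - y1:.2f},{score:.4f}\n')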

Then, you will obtain a folder runs/FasterRCNN_vgg16_channelRelation_dscSEFusion_similarityMax_1/epoch_, which includes epoch_3-test-all.txt, epoch_3-test-day.txt, and epoch_3-test-night.txt. Using the Python code provided by MLPD to compute the log-average miss rate, we obtain state-of-the-art performance: MR_all: 4.37, MR_day: 5.08, and MR_night: 3.36, significantly better than previous state-of-the-art approaches. Nevertheless, for a fair comparison with other approaches, we use the commonly used MATLAB code to evaluate the log-average miss rate:

cd ../../KAISTdevkit-matlab-wrapper

run demo_test.m

Finally, you will get the result:

Methods MR-All($\downarrow$) MR-Day($\downarrow$) MR-Night($\downarrow$) MR-Near($\downarrow$) MR-Medium($\downarrow$)
MSR (AAAI 2022) 11.39 15.28 6.48 - -
AR-CNN (ICCV 2019) 9.34 9.94 8.38 0.00 16.08
MBNet (ECCV 2020) 8.13 8.28 7.86 0.00 16.07
DCMNet (ACM MM 2022) 5.84 6.48 4.60 0.02 16.07
ProbEn3 (ECCV 2022) 5.14 6.04 3.59 0.00 9.59
TFDet (Ours) 4.47 5.22 3.36 0.00 9.29
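
For reference, the log-average miss rate (MR) reported above averages the miss rate at nine FPPI values evenly spaced in log space over [10^-2, 10^0], using a geometric mean. The snippet below is a minimal numerical sketch of this definition, assuming you already have a miss-rate-vs-FPPI curve; it is not the MLPD or MATLAB evaluation code.

# Minimal sketch of the log-average miss rate (MR^-2); not the MLPD or MATLAB evaluation code.
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """fppi and miss_rate: arrays describing the miss-rate-vs-FPPI curve, fppi sorted ascending."""
    refs = np.logspace(-2.0, 0.0, 9)              # nine points log-spaced in [1e-2, 1e0]
    sampled = []
    for ref in refs:
        idx = np.where(fppi <= ref)[0]
        # step-wise curve: take the miss rate at the largest FPPI <= ref, worst case if none
        sampled.append(miss_rate[idx[-1]] if len(idx) else 1.0)
    return np.exp(np.mean(np.log(np.maximum(sampled, 1e-10))))  # geometric mean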

LLVIP

Environmental Requirements

We use the official MMDetection and YOLOv5 repositories in our experiments. Please configure your environment following their official documentation. For MMDetection, we use the same environment as in the KAIST experiments. For YOLOv5, the main dependencies in our environment are:

python==3.7.16
torch==1.12.1
torchvision==0.13.1

Dataset

  • Please download the LLVIP dataset from this link.

  • For MMDetection, you should convert the annotations into COCO format following this document.

  • For YOLOv5, you should convert the annotations into YOLO format following this document (a rough sketch of such a conversion is given after this list).
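
LLVIP annotations are Pascal-VOC-style XML files with a single person class. The following is only a rough sketch of what such a VOC-to-YOLO conversion can look like; it is not the conversion script referenced above, and the directory names are placeholders you should adapt to the layout shown in the Dataset and Models section.

# Rough sketch of converting Pascal-VOC-style XML (as used by LLVIP) into YOLO txt labels.
# Not the official conversion script; paths and class mapping are illustrative.
import glob, os
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_dir, out_dir, class_map={'person': 0}):
    os.makedirs(out_dir, exist_ok=True)
    for xml_path in glob.glob(os.path.join(xml_dir, '*.xml')):
        root = ET.parse(xml_path).getroot()
        w = float(root.find('size/width').text)
        h = float(root.find('size/height').text)
        lines = []
        for obj in root.findall('object'):
            cls = class_map[obj.find('name').text]
            box = obj.find('bndbox')
            x1, y1 = float(box.find('xmin').text), float(box.find('ymin').text)
            x2, y2 = float(box.find('xmax').text), float(box.find('ymax').text)
            # YOLO format: class x_center y_center width height, normalized to [0, 1]
            lines.append(f'{cls} {(x1 + x2) / 2 / w:.6f} {(y1 + y2) / 2 / h:.6f} '
                         f'{(x2 - x1) / w:.6f} {(y2 - y1) / h:.6f}')
        stem = os.path.splitext(os.path.basename(xml_path))[0]
        with open(os.path.join(out_dir, stem + '.txt'), 'w') as f:
            f.write('\n'.join(lines) + '\n')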

Inference

For MMDetection, we evaluate TFDet at two resolutions. Since the LLVIP dataset has a large number of multispectral images, we use the distributed inference code.

# 640 x 512 resolution
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8 PORT=29500 bash tools/zx_dist_test_llvip_640x512.sh 8 --eval bbox
# 1280 x 1024 resolution
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8 PORT=29500 bash tools/zx_dist_test_llvip_1024x1280.sh 8 --eval bbox

Then, you will get:

Methods Resolution AP.50($\uparrow$) AP($\uparrow$)
DetFusion (ACM MM 2022) 640 x 512 80.7 -
ProbEn (ECCV 2022) 640 x 512 93.4 51.5
TFDet (Ours) 640 x 512 95.7 56.1
DCMNet (ACM MM 2022) 1280 x 1024 - 58.4
TFDet (Ours) 1280 x 1024 96.0 59.4

For YOLOv5,

python val.py --device 0 --data LLVIP.yaml --weights runs/train/modifiedDCN_MaskSup_negCorr_1024/weights/best.pt --batch-size 32 --img 1024 --conf-thres 0.008 --iou-thres 0.4 --exist-ok

Finally, you will get:

Methods AP.50($\uparrow$) AP.75($\uparrow$) AP($\uparrow$)
RGB (ICCV 2021) 90.8 56.4 52.7
Thermal (ICCV 2021) 96.5 76.4 67.0
TFDet (Ours) 97.9 83.4 71.1

Citation

If you find our TFDet useful, please cite our paper:

@article{tfdet,
  title={TFDet: Target-aware Fusion for RGB-T Pedestrian Detection},
  author={Zhang, Xue and Zhang, Xiaohan and Sheng, Zehua and Shen, Hui-Liang},
  journal={arXiv preprint arXiv:2305.16580},
  year={2023}
}