
Zzh-tju/ZoneEval


Zone Evaluation: Revealing Spatial Bias in Object Detection

This repository contains the source code of our paper. We provide zone evaluation for MMDetection v2.25.3, YOLOv5, and YOLOv8.

Here is a detailed step-by-step tutorial.

@article{zheng2023ZoneEval,
  title={Zone Evaluation: Revealing Spatial Bias in Object Detection},
  author={Zheng, Zhaohui and Chen, Yuming and Hou, Qibin and Li, Xiang and Wang, Ping and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2310.13215},
  year={2023}
}

Introduction

A fundamental limitation of object detectors is that they suffer from "spatial bias": in particular, they perform less satisfactorily when detecting objects near image borders. For a long time, there has been no effective way to measure and identify spatial bias, and little is known about where it comes from and how severe it is. To this end, we present a new zone evaluation protocol that generalizes the traditional evaluation by measuring detection performance over zones, yielding a series of Zone Precisions (ZPs). For the first time, we provide numerical results showing that object detectors perform quite unevenly across zones. Surprisingly, a detector's performance in the 96% border zone of the image does not reach its AP value (Average Precision, commonly regarded as the average detection performance over the entire image zone). To better understand spatial bias, we conduct a series of heuristic experiments. Our investigation rules out two intuitive conjectures: object scale and the absolute position of objects barely influence spatial bias. Instead, we find that the key lies in human-imperceptible divergences in data patterns between objects in different zones, which eventually produce a visible performance gap between zones. Based on these findings, we discuss a future direction for object detection, the spatial disequilibrium problem, which aims at a balanced detection ability over the entire image zone. By broadly evaluating 10 popular object detectors on 5 detection datasets, we shed light on the spatial bias of object detectors. We hope this work draws more attention to detection robustness.
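This README does not spell out how the zones are laid out. The following is a minimal sketch, assuming (not stated here) that zones are concentric rectangular rings: R_i is the image rectangle inset by r_i = i/(2n) of the width and height on every side, zone z^{i,j} = R_i \ R_j, and each box is assigned to a zone by its center. The helper names are hypothetical. Under these assumptions and n = 5, z^{0,4} covers exactly 96% of the image area, consistent with the "96% border zone" above, while z^{0,5} is the whole image.

```python
import math
from fractions import Fraction

N_ZONES = 5  # n = 5 ring zones: ZP^{0,1}, ..., ZP^{4,5}


def ratio(i, n=N_ZONES):
    """Assumed inset ratio r_i = i / (2n) of rectangle R_i."""
    return Fraction(i, 2 * n)


def zone_area_fraction(i, j, n=N_ZONES):
    """Fraction of the image area covered by zone z^{i,j} = R_i minus R_j."""
    side_i = 1 - 2 * ratio(i, n)  # R_i spans [r_i, 1 - r_i] along each axis
    side_j = 1 - 2 * ratio(j, n)
    return side_i ** 2 - side_j ** 2


def zone_index(cx, cy, width, height, n=N_ZONES):
    """Index k of the ring zone z^{k,k+1} that contains the box center (cx, cy)."""
    u, v = cx / width, cy / height
    depth = min(u, v, 1 - u, 1 - v)  # normalized distance to the nearest border
    k = math.floor(depth * 2 * n)    # r_k <= depth < r_{k+1}
    return min(max(k, 0), n - 1)     # the exact image center falls in the last ring


if __name__ == "__main__":
    print(zone_area_fraction(0, 4))            # 24/25, i.e. 96% of the image
    print(zone_index(5, 300, 640, 480))        # center hugging the left border: zone 0
    print(zone_index(320, 240, 640, 480))      # image center: zone 4
```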

Installation

conda create --name ZoneEval python=3.8 -y

conda activate ZoneEval

conda install pytorch=1.12 cudatoolkit=11.3 torchvision=0.13.0 -c pytorch

pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/index.html

git clone https://github.com/Zzh-tju/ZoneEval.git

cd ZoneEval/pycocotools

pip install -e .

cd ..

cd mmdetection

pip install -v -e .
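After installation, one quick way to confirm that the pinned versions were picked up is to query the installed package metadata. The sketch below uses only the standard library's importlib.metadata; the helper names are hypothetical, the version prefixes are copied from the commands above, and the mmdet entry assumes the bundled MMDetection copy reports version 2.25.3, the release this repository targets.

```python
from importlib import metadata

# Expected version prefixes, taken from the installation commands.
EXPECTED = {
    "torch": "1.12",
    "torchvision": "0.13",
    "mmcv-full": "1.6.0",
    "mmdet": "2.25.3",  # assumption: version reported by the bundled mmdetection
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    # As a precaution against literal name matching, also try the underscore
    # spelling used in wheel metadata directories (e.g. mmcv_full).
    for name in (package, package.replace("-", "_")):
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_environment(expected=EXPECTED):
    """Map each package to (installed version or None, matches expected prefix)."""
    report = {}
    for package, prefix in expected.items():
        installed = installed_version(package)
        report[package] = (installed, installed is not None and installed.startswith(prefix))
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_environment().items():
        print(f"{package:12s} {installed or 'not installed':16s} {'ok' if ok else 'check'}")
```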

Dataset Preparations

Please refer to Dataset Preparations for preparing PASCAL VOC 07+12, Face Mask, Fruit, Helmet, and MS COCO datasets.

Evaluation

Turn on zone evaluation

Zone evaluation is switched on through the zone_eval option of test_cfg in the config file:

model = dict(
    test_cfg=dict(zone_eval=True))   # set to False to evaluate in the conventional way

Evaluation command (the third argument of dist_test.sh is the number of GPUs)

# for PASCAL VOC and the 3 application datasets (Face Mask, Fruit, Helmet)

./tools/dist_test.sh configs/sela/your_config_file.py your_model.pth 2 --eval mAP

# for MS COCO

./tools/dist_test.sh configs/sela/your_config_file.py your_model.pth 2 --eval bbox
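The two commands differ only in the --eval metric. A hypothetical helper that assembles the argument list, making the GPU-count argument explicit, might look like this; the dataset keys are invented labels, and the metric mapping follows the comments above.

```python
import shlex

# --eval metric per dataset, following the two commands above.
# The keys are illustrative labels, not names used by the repository.
EVAL_METRIC = {
    "voc": "mAP",
    "facemask": "mAP",
    "fruit": "mAP",
    "helmet": "mAP",
    "coco": "bbox",
}


def build_test_command(config, checkpoint, dataset, num_gpus=2):
    """Return the dist_test.sh argument list for one evaluation run."""
    if dataset not in EVAL_METRIC:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(EVAL_METRIC)}")
    return ["./tools/dist_test.sh", config, checkpoint, str(num_gpus),
            "--eval", EVAL_METRIC[dataset]]


if __name__ == "__main__":
    cmd = build_test_command("configs/sela/your_config_file.py", "your_model.pth", "coco")
    print(shlex.join(cmd))
    # To launch it from Python instead: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps paths with spaces intact when the command is passed to subprocess.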

Zone evaluation results for various object detectors are listed below. The pretrained weights can be downloaded from MMDetection or from the detectors' official websites.

| Detector | Network & TS | $\text{ZP}^{0,5}$ | Var | $\text{ZP}^{0,1}$ | $\text{ZP}^{1,2}$ | $\text{ZP}^{2,3}$ | $\text{ZP}^{3,4}$ | $\text{ZP}^{4,5}$ | FPS |
|---|---|---|---|---|---|---|---|---|---|
| RetinaNet | R50_1x | 36.5 | 14.8 | 27.3 | 33.3 | 35.5 | 34.5 | 39.2 | 35.4 |
| RetinaNet | R50_2x | 37.4 | 16.9 | 27.6 | 34.6 | 35.8 | 35.1 | 40.4 | 35.4 |
| Faster R-CNN | R50_1x | 37.4 | 11.8 | 29.3 | 34.2 | 36.1 | 35.0 | 39.9 | 37.5 |
| YOLOF | R50_1x | 37.5 | 12.8 | 28.4 | 35.2 | 36.6 | 35.3 | 39.2 | 61.6 |
| Sparse R-CNN | R50_1x | 37.9 | 22.8 | 27.8 | 34.7 | 37.1 | 37.1 | 42.6 | 37.8 |
| YOLOv5-s | | 37.4 | 10.5 | 28.8 | 34.9 | 36.9 | 35.1 | 38.4 | 140.0 |
| RepPoints | R50_1x | 38.1 | 12.9 | 29.2 | 34.7 | 36.7 | 35.6 | 40.3 | 27.4 |
| FCOS | R50_1x | 38.7 | 14.7 | 29.5 | 35.3 | 38.0 | 36.7 | 41.1 | 37.3 |
| DETR | R50_150e | 40.1 | 26.9 | 29.8 | 36.2 | 39.8 | 39.1 | 45.7 | 49.9 |
| RetinaNet | PVT-s_1x | 40.4 | 19.7 | 30.8 | 36.9 | 39.0 | 37.4 | 44.6 | 20.0 |
| Cascade R-CNN | R50_1x | 40.3 | 18.7 | 30.9 | 36.6 | 39.2 | 38.6 | 44.2 | 30.7 |
| GFocal | R50_1x | 40.1 | 16.9 | 31.1 | 37.5 | 39.4 | 38.5 | 43.8 | 37.2 |
| YOLOv8-s | | 44.9 | 24.4 | 33.4 | 42.2 | 44.3 | 43.2 | 48.5 | 128.5 |
| Cascade Mask R-CNN | R101_3x | 45.4 | 22.4 | 34.7 | 41.6 | 44.3 | 44.4 | 49.1 | 18.7 |
| Sparse R-CNN | R50_3x | 45.0 | 21.6 | 35.8 | 41.9 | 43.4 | 44.0 | 50.3 | 32.1 |
| YOLOv5-m | | 45.2 | 12.9 | 36.0 | 42.3 | 44.5 | 43.2 | 46.7 | 104.6 |
| Mask R-CNN | Swin-T_3x | 46.0 | 15.4 | 36.8 | 41.7 | 44.1 | 43.5 | 49.0 | 24.3 |
| Mask R-CNN | ConvNeXt-T_3x | 46.2 | 17.6 | 36.7 | 41.9 | 44.5 | 43.6 | 49.7 | 22.6 |
| Cascade Mask R-CNN | X101-32x8d_3x | 46.1 | 21.1 | 36.1 | 42.0 | 44.8 | 45.9 | 49.9 | 13.5 |
| VFNet | R101_2x | 46.2 | 15.6 | 36.7 | 43.0 | 45.0 | 44.5 | 48.8 | 25.9 |
| Deformable DETR | R50_50e | 46.1 | 23.2 | 36.3 | 42.6 | 45.6 | 45.1 | 51.2 | 25.9 |
| Sparse R-CNN | R101_3x | 46.2 | 21.1 | 36.9 | 42.9 | 44.9 | 44.7 | 51.3 | 25.2 |
| GFocal | X101-32x4d_2x | 46.1 | 15.7 | 37.0 | 43.5 | 45.0 | 44.4 | 49.3 | 25.2 |

Note:

  • 'TS': Training Schedule.
  • $\text{ZP}^{0,5}$: the traditional Average Precision (AP), computed over the entire image zone.
  • 'Var': the variance of the 5 ZPs ($\text{ZP}^{0,1}$, $\text{ZP}^{1,2}$, ..., $\text{ZP}^{4,5}$).
  • 'FPS' is measured on a single RTX 3090 GPU with a class score threshold of 0.05 and an NMS IoU threshold of 0.6. The test resolution is 640 for YOLOv5 and YOLOv8 and [1333, 800] for the others.
  • To evaluate DETR-series detectors, modify the simple_test() function in mmdet/models/detectors/single_stage.py so that img_metas is also passed to the head:
        # outs = self.bbox_head(feat)           # default call
        outs = self.bbox_head(feat, img_metas)  # required for DETR-series detectors
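The notes do not say whether 'Var' is the population or the sample variance. Population variance (dividing by 5) agrees with the table far better than sample variance, which would give about 18.7 for the RetinaNet R50_1x row, so the sketch below assumes it. The function name is hypothetical.

```python
from statistics import pvariance


def zone_variance(zps):
    """Population variance of the ring-zone precisions ZP^{0,1} ... ZP^{4,5} (assumed definition)."""
    return pvariance(zps)


if __name__ == "__main__":
    retinanet_r50_1x = [27.3, 33.3, 35.5, 34.5, 39.2]  # ZP^{0,1} ... ZP^{4,5} from the table
    print(round(zone_variance(retinanet_r50_1x), 2))   # 14.98
```

From the rounded ZPs this gives about 15.0, close to the reported 14.8; the remaining gap is presumably due to the table's values being rounded to one decimal.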

Currently, we do not support zone evaluation for instance segmentation models.