RoI Trans

Learning RoI Transformer for Oriented Object Detection in Aerial Images

Abstract

Object detection in aerial images is an active yet challenging task in computer vision because of the bird’s-eye view perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and objects. This leads to a common misalignment between the final object classification confidence and localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. Simply applying the RoI Transformer to Light-Head R-CNN achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a negligible reduction in detection speed. Our RoI Transformer exceeds the deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.
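The mismatch between horizontal RoIs and oriented objects mentioned in the abstract can be illustrated with a small geometric sketch. It is not the RoI Transformer itself; it only assumes an oriented box is parameterized as (cx, cy, w, h, theta) with theta in radians, and the helper names `obb_to_corners`, `enclosing_hbb`, and `fill_ratio` are hypothetical.

```python
import math

def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box; theta is in radians."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset by theta, then translate to the box center.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

def enclosing_hbb(corners):
    """Axis-aligned box (xmin, ymin, xmax, ymax) that encloses the corners."""
    xs, ys = zip(*corners)
    return min(xs), min(ys), max(xs), max(ys)

def fill_ratio(w, h, theta):
    """Fraction of the enclosing horizontal box covered by the oriented box."""
    xmin, ymin, xmax, ymax = enclosing_hbb(obb_to_corners(0.0, 0.0, w, h, theta))
    return (w * h) / ((xmax - xmin) * (ymax - ymin))

# An elongated 10 x 1 object (for example a ship), axis-aligned vs. rotated 45 degrees.
print(round(fill_ratio(10.0, 1.0, 0.0), 3))
print(round(fill_ratio(10.0, 1.0, math.pi / 4), 3))
```

For the rotated case the object covers only about 17% of its enclosing horizontal box, so most of a horizontal RoI would contain background or neighboring objects.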

Results and models

DOTA1.0

| Backbone | mAP | Angle | lr schd | Mem (GB) | Inf Time (fps) | Aug | Batch Size | Configs | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet50 (1024,1024,200) | 73.40 | le90 | 1x | 8.46 | 16.5 | - | 2 | rotated-faster-rcnn-le90_r50_fpn_1x_dota | model \| log |
| ResNet50 (1024,1024,200) | 75.75 | le90 | 1x | 7.56 | 19.3 | - | 2 | roi-trans-le90_r50_fpn_amp-1x_dota | model \| log |
| ResNet50 (1024,1024,200) | 76.08 | le90 | 1x | 8.67 | 14.4 | - | 2 | roi-trans-le90_r50_fpn_1x_dota | model \| log |
| Swin-tiny (1024,1024,200) | 77.51 | le90 | 1x | | 10.9 | - | 2 | roi-trans-le90_swin-tiny_fpn_1x_dota | model \| log |
| ResNet50 (1024,1024,500) | 79.66 | le90 | 1x | | 14.4 | MS+RR | 2 | roi_trans_r50_fpn_1x_dota_ms_rr_le90 | model \| log |
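Every model in the table uses the le90 angle setting. As a rough illustration, and assuming le90 denotes the long-edge convention (width is the longer side and the angle lies in [-90°, 90°)), a box given in an arbitrary (w, h, theta) form could be normalized as sketched below. The function name `to_le90` is hypothetical and the angle is taken in radians.

```python
import math

def to_le90(w, h, theta):
    """Normalize (w, h, theta) so that w >= h and theta is in [-pi/2, pi/2)."""
    if w < h:
        # Swapping the sides describes the same rectangle rotated by 90 degrees.
        w, h = h, w
        theta += math.pi / 2
    # A rectangle is unchanged by a 180 degree turn, so wrap theta with period pi.
    theta = (theta + math.pi / 2) % math.pi - math.pi / 2
    return w, h, theta

print(to_le90(1.0, 2.0, 0.0))
print(to_le90(3.0, 1.0, math.pi))
```

The first call swaps the sides and wraps the angle to -pi/2; the second keeps the sides and wraps the angle to 0.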

Notes:

  • MS means multi-scale image split.
  • RR means random rotation.
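The image split behind MS can be sketched as a sliding-window tiling applied to several rescaled copies of an image. The sketch assumes that a triple such as (1024,1024,200) in the table means a 1024 x 1024 patch with an overlap of 200 pixels; the function names and the scale factors are hypothetical and are not taken from the configs.

```python
def window_starts(length, patch=1024, overlap=200):
    """Start offsets of sliding windows that cover [0, length) along one axis."""
    if length <= patch:
        return [0]
    stride = patch - overlap
    starts = list(range(0, length - patch, stride))
    # Add a final window flush with the image border so no pixels are dropped.
    starts.append(length - patch)
    return starts

def split_windows(width, height, patch=1024, overlap=200):
    """(x0, y0, x1, y1) patch boxes covering a width x height image.

    For images smaller than the patch, the single box extends past the
    image and would need padding in practice.
    """
    return [(x, y, x + patch, y + patch)
            for y in window_starts(height, patch, overlap)
            for x in window_starts(width, patch, overlap)]

# Multi-scale split: rescale the image size first, then tile each rescaled copy.
scales = (0.5, 1.0, 1.5)  # hypothetical scale factors
for s in scales:
    w, h = int(4000 * s), int(3000 * s)
    print(s, len(split_windows(w, h)))
```

A larger overlap, such as the 500 used in the MS+RR row, shortens the stride and so produces more patches per image.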

Citation

@InProceedings{ding2018learning,
	author = {Ding, Jian and Xue, Nan and Long, Yang and Xia, Gui-Song and Lu, Qikai},
	title = {Learning RoI Transformer for Oriented Object Detection in Aerial Images},
	booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
	pages = {2849--2858},
	year = {2019}
}