RoI Trans

Learning RoI Transformer for Oriented Object Detection in Aerial Images

Abstract

Object detection in aerial images is an active yet challenging task in computer vision because of the bird’s-eye view perspective, the highly complex backgrounds, and the varied appearances of objects. Especially when detecting densely packed objects in aerial images, methods that rely on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and objects. This leads to a common misalignment between the final object classification confidence and localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. Simply applying the RoI Transformer to light-head RCNN achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a negligible reduction in detection speed. Our RoI Transformer exceeds the deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.
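To make the mismatch between horizontal RoIs and oriented objects concrete, the sketch below compares an oriented box with its enclosing horizontal box. It is only an illustration, not the paper's implementation: the `(cx, cy, w, h, theta)` parameterization and the helper names `obb_to_corners` and `enclosing_hbb` are assumptions made for this example.

```python
import math


def obb_to_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box as (x, y) tuples.

    Assumed convention: theta is in radians and measures the rotation
    of the box's width axis away from the image x-axis.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Half-extent vectors along the box's own width and height axes.
    wx, wy = 0.5 * w * cos_t, 0.5 * w * sin_t
    hx, hy = -0.5 * h * sin_t, 0.5 * h * cos_t
    return [
        (cx - wx - hx, cy - wy - hy),
        (cx + wx - hx, cy + wy - hy),
        (cx + wx + hx, cy + wy + hy),
        (cx - wx + hx, cy - wy + hy),
    ]


def enclosing_hbb(corners):
    """Return the axis-aligned box (x1, y1, x2, y2) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# A long, thin object (100 x 20) rotated by 45 degrees.
corners = obb_to_corners(0.0, 0.0, 100.0, 20.0, math.pi / 4)
x1, y1, x2, y2 = enclosing_hbb(corners)
obb_area = 100.0 * 20.0
hbb_area = (x2 - x1) * (y2 - y1)
fill_ratio = obb_area / hbb_area  # share of the horizontal box covered by the object
```

In this example the object fills less than a third of its horizontal box, so a horizontal RoI would mostly pool background or neighbouring objects. That is the kind of mismatch the abstract describes for densely packed scenes.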

Results and models

DOTA1.0

Backbone	mAP	Angle	lr schd	Mem (GB)	Inf Time (fps)	Aug	Batch Size	Configs	Download
ResNet50 (1024,1024,200)	73.40	le90	1x	8.46	16.5	-	2	rotated-faster-rcnn-le90_r50_fpn_1x_dota	model | log
ResNet50 (1024,1024,200)	75.75	le90	1x	7.56	19.3	-	2	roi-trans-le90_r50_fpn_amp-1x_dota	model | log
ResNet50 (1024,1024,200)	76.08	le90	1x	8.67	14.4	-	2	roi-trans-le90_r50_fpn_1x_dota	model | log
Swin-tiny (1024,1024,200)	77.51	le90	1x		10.9	-	2	roi-trans-le90_swin-tiny_fpn_1x_dota	model | log
ResNet50 (1024,1024,500)	79.66	le90	1x		14.4	MS+RR	2	roi_trans_r50_fpn_1x_dota_ms_rr_le90	model | log

Notes:

MS means multi-scale image splitting.
RR means random rotation.
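As a rough illustration of what random rotation does to an oriented annotation, the sketch below rotates one box about a pivot point and then renormalizes it. It assumes that `le90` denotes a long-edge convention in which the width is the longer side and the angle lies in [-pi/2, pi/2). The function names are hypothetical and this is not the augmentation code used to train the models above.

```python
import math


def norm_angle_le90(theta):
    """Wrap an angle in radians into [-pi/2, pi/2) (assumed le90 range)."""
    return (theta + math.pi / 2) % math.pi - math.pi / 2


def to_le90(cx, cy, w, h, theta):
    """Make the width the long edge, then wrap the angle."""
    if w < h:
        # Swapping the sides is equivalent to turning the box by 90 degrees.
        w, h = h, w
        theta += math.pi / 2
    return cx, cy, w, h, norm_angle_le90(theta)


def rotate_obb(box, angle, center):
    """Rotate an oriented box (cx, cy, w, h, theta) by `angle` radians
    about `center`, returning the box in the assumed le90 form.

    The rotation follows the x-to-y direction; in image coordinates
    (y pointing down) this appears clockwise.
    """
    cx, cy, w, h, theta = box
    ox, oy = center
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    dx, dy = cx - ox, cy - oy
    # Rotate the box centre about the pivot; the size is unchanged.
    new_cx = ox + dx * cos_a - dy * sin_a
    new_cy = oy + dx * sin_a + dy * cos_a
    return to_le90(new_cx, new_cy, w, h, theta + angle)


# Sample a random angle, as a random-rotation augmentation might do.
sampled = rotate_obb((10.0, 0.0, 100.0, 20.0, 0.0),
                     math.radians(__import__("random").uniform(-180, 180)),
                     (0.0, 0.0))
```

Only the centre and the angle change under rotation. The renormalization step matters because a box rotated past the end of the angle range must be mapped back to an equivalent representation the detector was trained on.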

Citation

@InProceedings{ding2018learning,
	author = {Ding, Jian and Xue, Nan and Long, Yang and Xia, Gui-Song and Lu, Qikai},
	title = {Learning RoI Transformer for Oriented Object Detection in Aerial Images},
	booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
	pages = {2849--2858},
	year = {2019}
}
