# Faster R-CNN / Mask R-CNN on COCO
This example provides a minimal (2k lines) and faithful implementation of the following object detection / instance segmentation papers:
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Feature Pyramid Networks for Object Detection
- Mask R-CNN
- Cascade R-CNN: Delving into High Quality Object Detection
with the support of:
- Multi-GPU / distributed training, multi-GPU evaluation
- Cross-GPU BatchNorm (aka Sync-BN, from MegDet: A Large Mini-Batch Object Detector)
- Group Normalization
- Training from scratch (from Rethinking ImageNet Pre-training)
This is likely the best-performing open source TensorFlow reimplementation of the above papers.
- Python 3.3+; OpenCV
- TensorFlow ≥ 1.6
- pycocotools: `pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'`
- Pre-trained ImageNet ResNet model from tensorpack model zoo
- COCO data. It needs to have the following directory structure:

```
COCO/DIR/
  annotations/
    instances_train201?.json
    instances_val201?.json
  train201?/
    COCO_train201?_*.jpg
  val201?/
    COCO_val201?_*.jpg
```
You can use either the 2014 version or the 2017 version of the dataset.
To use the common "trainval35k + minival" split for the 2014 dataset, just download the annotation files into `annotations/` as well.
Note that `train2017` == `trainval35k` == `train2014` + `val2014` - `minival2014`, and `val2017` == `minival2014`.
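As a quick sanity check before training, the expected layout above can be verified with a few lines of Python. This is a hypothetical helper for illustration, not part of this repo; the function name and defaults are made up:

```python
import os


def check_coco_dir(basedir, year="2017"):
    """Hypothetical helper: verify the COCO directory layout described above.

    Returns the list of missing paths; an empty list means the layout
    looks correct. Names follow the README's pattern, not any repo API.
    """
    expected = [
        os.path.join(basedir, "annotations", "instances_train%s.json" % year),
        os.path.join(basedir, "annotations", "instances_val%s.json" % year),
        os.path.join(basedir, "train%s" % year),
        os.path.join(basedir, "val%s" % year),
    ]
    return [p for p in expected if not os.path.exists(p)]
```

For example, `check_coco_dir("/path/to/COCO/DIR")` returning a non-empty list tells you which annotation files or image directories still need to be downloaded.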
To train on a single machine:

```
./train.py --config \
    MODE_MASK=True MODE_FPN=True \
    DATA.BASEDIR=/path/to/COCO/DIR \
    BACKBONE.WEIGHTS=/path/to/ImageNet-R50-AlignPadding.npz
```
To run distributed training, set `TRAINER=horovod` and refer to the HorovodTrainer docs.
Options can be changed by either the command line or the `config.py` file (recommended).
Some reasonable configurations are listed in the table below.
To predict on an image (needs DISPLAY to show the outputs):

```
./train.py --predict input.jpg --load /path/to/Trained-Model-Checkpoint \
    --config SAME-AS-TRAINING
```
To evaluate the performance of a model on COCO:

```
./train.py --evaluate output.json --load /path/to/Trained-Model-Checkpoint \
    --config SAME-AS-TRAINING
```
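Assuming `output.json` follows the standard COCO results format (a JSON list of `{"image_id", "category_id", "bbox": [x, y, w, h], "score"}` records, which is what COCO evaluation consumes), it can be inspected with plain Python. This is an illustrative sketch, not a function from this repo:

```python
import json
from collections import Counter


def summarize_results(path, score_thresh=0.5):
    """Illustrative sketch: count detections per category in a
    COCO-format results file, keeping only those whose confidence
    score is at least `score_thresh`."""
    with open(path) as f:
        dets = json.load(f)
    return Counter(d["category_id"] for d in dets if d["score"] >= score_thresh)
```

For example, `summarize_results("output.json")` gives a quick per-category detection count before running the full COCO evaluation.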
Several trained models can be downloaded from the table below. Evaluation and prediction need to be run with the same configs used in training.
These models are trained on trainval35k and evaluated on minival2014 using mAP@IoU=0.50:0.95. All models are fine-tuned from ImageNet pre-trained R50/R101 models in tensorpack model zoo, unless otherwise noted. All models are trained with 8 NVIDIA V100s, unless otherwise noted.
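The mAP@IoU=0.50:0.95 metric averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of the IoU computation and that threshold grid (for illustration only, not code from this repo):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def matched_fraction(box_pred, box_gt):
    """For one prediction/ground-truth pair, the fraction of COCO IoU
    thresholds at which the prediction counts as a match."""
    v = iou(box_pred, box_gt)
    return sum(v >= t for t in THRESHOLDS) / len(THRESHOLDS)
```

Full COCO AP additionally involves ranking detections by score and matching greedily per image; this sketch only shows the IoU/threshold part of the metric.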
Performance in Detectron can be roughly reproduced.
| Backbone | mAP (box;mask) | Detectron mAP <sup>1</sup> | Time (on 8 V100s) | Configurations (click to expand) |
| --- | --- | --- | --- | --- |
| | | | | standard (this is the default) |
| | 47.4;40.5 <sup>4</sup> | | 45h (on 48 V100s) | |
1: Numbers taken from Detectron Model Zoo. We compare models that have identical training & inference cost between the two implementations. However their numbers can be different due to many small implementation details. For example, our FPN models are sometimes slightly worse in box AP, which is probably due to batch size.
4: This entry does not use ImageNet pre-training. Detectron numbers are taken from Fig. 5 in Rethinking ImageNet Pre-training. Note that our training strategy is slightly different: we enable cascade throughout the entire training.
`NOTES.md` has some notes about implementation details & speed.