From 5245161c96cc057dc7a883ef4283ed7fab735bcf Mon Sep 17 00:00:00 2001 From: vivek rathod Date: Fri, 10 Jul 2020 10:03:31 -0700 Subject: [PATCH] Merged commit includes the following changes: (#8830) 320622111 by rathodv: Internal Change. -- PiperOrigin-RevId: 320622111 Co-authored-by: TF Object Detection Team --- research/object_detection/README.md | 448 +++++------------- ...ourglass104_1024x1024_coco17_tpu-32.config | 129 +++++ ...t_hourglass104_512x512_coco17_tpu-8.config | 143 ++++++ ...snet101_v1_fpn_512x512_coco17_tpu-8.config | 141 ++++++ ...resnet101_v1_1024x1024_coco17_tpu-8.config | 166 +++++++ ...n_resnet101_v1_640x640_coco17_tpu-8.config | 145 ++++++ ..._resnet101_v1_800x1333_coco17_gpu-8.config | 154 ++++++ ...resnet152_v1_1024x1024_coco17_tpu-8.config | 166 +++++++ ...n_resnet152_v1_640x640_coco17_tpu-8.config | 145 ++++++ ..._resnet152_v1_800x1333_coco17_gpu-8.config | 154 ++++++ ..._resnet50_v1_1024x1024_coco17_tpu-8.config | 166 +++++++ ...nn_resnet50_v1_640x640_coco17_tpu-8.config | 145 ++++++ ...n_resnet50_v1_800x1333_coco17_gpu-8.config | 154 ++++++ ...on_resnet_v2_1024x1024_coco17_gpu-8.config | 160 +++++++ ...fficientdet_d0_512x512_coco17_tpu-8.config | 199 ++++++++ ...fficientdet_d1_640x640_coco17_tpu-8.config | 199 ++++++++ ...fficientdet_d2_768x768_coco17_tpu-8.config | 199 ++++++++ ...ficientdet_d3_896x896_coco17_tpu-32.config | 199 ++++++++ ...cientdet_d4_1024x1024_coco17_tpu-32.config | 199 ++++++++ ...cientdet_d5_1280x1280_coco17_tpu-32.config | 199 ++++++++ ...cientdet_d6_1408x1408_coco17_tpu-32.config | 201 ++++++++ ...cientdet_d7_1536x1536_coco17_tpu-32.config | 201 ++++++++ ...bilenet_v1_fpn_640x640_coco17_tpu-8.config | 197 ++++++++ ...d_mobilenet_v2_320x320_coco17_tpu-8.config | 197 ++++++++ ...net_v2_fpnlite_320x320_coco17_tpu-8.config | 201 ++++++++ ...net_v2_fpnlite_640x640_coco17_tpu-8.config | 201 ++++++++ ...et101_v1_fpn_1024x1024_coco17_tpu-8.config | 197 ++++++++ ...snet101_v1_fpn_640x640_coco17_tpu-8.config | 197 ++++++++ ...et152_v1_fpn_1024x1024_coco17_tpu-8.config | 197 ++++++++ ...snet152_v1_fpn_640x640_coco17_tpu-8.config | 197 ++++++++ ...net50_v1_fpn_1024x1024_coco17_tpu-8.config | 197 ++++++++ ...esnet50_v1_fpn_640x640_coco17_tpu-8.config | 197 ++++++++ .../g3doc/challenge_evaluation.md | 6 +- .../g3doc/configuring_jobs.md | 22 +- .../object_detection/g3doc/context_rcnn.md | 2 + .../g3doc/defining_your_own_model.md | 8 +- .../g3doc/evaluation_protocols.md | 2 +- .../g3doc/exporting_models.md | 4 +- research/object_detection/g3doc/faq.md | 2 +- .../object_detection/g3doc/installation.md | 184 ------- .../g3doc/instance_segmentation.md | 2 +- .../g3doc/oid_inference_and_evaluation.md | 4 +- .../g3doc/preparing_inputs.md | 2 +- .../object_detection/g3doc/release_notes.md | 339 +++++++++++++ .../object_detection/g3doc/running_locally.md | 66 --- .../g3doc/running_notebook.md | 3 + .../g3doc/running_on_cloud.md | 170 ------- .../g3doc/running_on_mobile_tensorflowlite.md | 2 + .../object_detection/g3doc/running_pets.md | 22 +- research/object_detection/g3doc/tf1.md | 92 ++++ ...tion_model_zoo.md => tf1_detection_zoo.md} | 9 +- .../g3doc/tf1_training_and_evaluation.md | 237 +++++++++ research/object_detection/g3doc/tf2.md | 82 ++++ .../g3doc/tf2_classification_zoo.md | 25 + .../g3doc/tf2_detection_zoo.md | 65 +++ .../g3doc/tf2_training_and_evaluation.md | 285 +++++++++++ .../g3doc/tpu_compatibility.md | 6 +- .../object_detection/g3doc/tpu_exporters.md | 2 + .../g3doc/using_your_own_dataset.md | 2 +- 59 files changed, 6823 insertions(+), 812 
deletions(-) create mode 100644 research/object_detection/configs/tf2/center_net_hourglass104_1024x1024_coco17_tpu-32.config create mode 100644 research/object_detection/configs/tf2/center_net_hourglass104_512x512_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/center_net_resnet101_v1_fpn_512x512_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8.config create mode 100644 research/object_detection/configs/tf2/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d0_512x512_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d1_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d2_768x768_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d3_896x896_coco17_tpu-32.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d4_1024x1024_coco17_tpu-32.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d5_1280x1280_coco17_tpu-32.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d6_1408x1408_coco17_tpu-32.config create mode 100644 research/object_detection/configs/tf2/ssd_efficientdet_d7_1536x1536_coco17_tpu-32.config create mode 100644 research/object_detection/configs/tf2/ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_640x640_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8.config create mode 100644 research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config delete mode 100644 research/object_detection/g3doc/installation.md create mode 100644 research/object_detection/g3doc/release_notes.md delete mode 100644 
research/object_detection/g3doc/running_locally.md delete mode 100644 research/object_detection/g3doc/running_on_cloud.md create mode 100644 research/object_detection/g3doc/tf1.md rename research/object_detection/g3doc/{detection_model_zoo.md => tf1_detection_zoo.md} (97%) create mode 100644 research/object_detection/g3doc/tf1_training_and_evaluation.md create mode 100644 research/object_detection/g3doc/tf2.md create mode 100644 research/object_detection/g3doc/tf2_classification_zoo.md create mode 100644 research/object_detection/g3doc/tf2_detection_zoo.md create mode 100644 research/object_detection/g3doc/tf2_training_and_evaluation.md diff --git a/research/object_detection/README.md b/research/object_detection/README.md index c88e88c4703..32011c96ff9 100644 --- a/research/object_detection/README.md +++ b/research/object_detection/README.md @@ -1,7 +1,7 @@ -![TensorFlow Requirement: 1.15](https://img.shields.io/badge/TensorFlow%20Requirement-1.15-brightgreen) -![TensorFlow 2 Not Supported](https://img.shields.io/badge/TensorFlow%202%20Not%20Supported-%E2%9C%95-red.svg) - -# Tensorflow Object Detection API +# TensorFlow Object Detection API +[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0) +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) Creating accurate machine learning models capable of localizing and identifying multiple objects in a single image remains a core challenge in computer vision. @@ -11,7 +11,7 @@ models. At Google we’ve certainly found this codebase to be useful for our computer vision needs, and we hope that you will as well.

Contributions to the codebase are welcome and we would love to hear back from
-you if you find this API useful. Finally if you use the Tensorflow Object
+you if you find this API useful. Finally, if you use the TensorFlow Object
 Detection API for a research publication, please consider citing:
 ```
@@ -26,91 +26,91 @@ Song Y, Guadarrama S, Murphy K, CVPR 2017

-## Maintainers +## Support for TensorFlow 2 and 1 +The TensorFlow Object Detection API supports both TensorFlow 2 (TF2) and +TensorFlow 1 (TF1). A majority of the modules in the library are both TF1 and +TF2 compatible. In cases where they are not, we provide two versions. -Name | GitHub --------------- | --------------------------------------------- -Jonathan Huang | [jch1](https://github.com/jch1) -Vivek Rathod | [tombstone](https://github.com/tombstone) -Ronny Votel | [ronnyvotel](https://github.com/ronnyvotel) -Derek Chow | [derekjchow](https://github.com/derekjchow) -Chen Sun | [jesu9](https://github.com/jesu9) -Menglong Zhu | [dreamdragon](https://github.com/dreamdragon) -Alireza Fathi | [afathi3](https://github.com/afathi3) -Zhichao Lu | [pkulzc](https://github.com/pkulzc) - -## Table of contents - -Setup: - -* Installation
- -Quick Start: - -* - Quick Start: Jupyter notebook for off-the-shelf inference
-* Quick Start: Training a pet detector
- -Customizing a Pipeline: - -* - Configuring an object detection pipeline
-* Preparing inputs
- -Running: - -* Running locally
-* Running on the cloud
- -Extras: - -* Tensorflow detection model zoo
-* - Exporting a trained model for inference
-* - Exporting a trained model for TPU inference
-* - Defining your own model architecture
-* - Bringing in your own dataset
-* - Supported object detection evaluation protocols
-* - Inference and evaluation on the Open Images dataset
-* - Run an instance segmentation model
-* - Run the evaluation for the Open Images Challenge 2018/2019
-* - TPU compatible detection pipelines
-* - Running object detection on mobile devices with TensorFlow Lite
-* - Context R-CNN documentation for data preparation, training, and export
+Although we will continue to maintain the TF1 models and provide support, we
+encourage users to try the Object Detection API with TF2 for the following
+reasons:
-## Getting Help
+* We provide new architectures supported in TF2 only and we will continue to
+  develop in TF2 going forward.
-To get help with issues you may encounter using the Tensorflow Object Detection
-API, create a new question on [StackOverflow](https://stackoverflow.com/) with
-the tags "tensorflow" and "object-detection".
+* The popular models we ported from TF1 to TF2 achieve the same performance.
-Please report bugs (actually broken code, not usage questions) to the
-tensorflow/models GitHub
-[issue tracker](https://github.com/tensorflow/models/issues), prefixing the
-issue name with "object_detection".
+* A single training and evaluation binary now supports both GPU and TPU
+  distribution strategies, making it possible to train models with synchronous
+  SGD by default.
+
+* Eager execution with new binaries makes debugging easy!
+
+Finally, if you are an existing user of the Object Detection API, we have
+retained the same config language you are familiar with and ensured that the
+TF2 training/eval binary takes the same arguments as our TF1 binaries (a short
+sketch follows below).
+
+Note: The models we provide in [TF2 Zoo](g3doc/tf2_detection_zoo.md) and
+[TF1 Zoo](g3doc/tf1_detection_zoo.md) are specific to the TensorFlow major
+version and are not interoperable.
-Please check [FAQ](g3doc/faq.md) for frequently asked questions before reporting
-an issue.
+Please select one of the two links below for TensorFlow version-specific
+documentation of the Object Detection API:
-## Release information
-### June 17th, 2020
+
+
+[![Object Detection API TensorFlow 2](https://img.shields.io/badge/Object%20Detection%20API-TensorFlow%202-orange)](g3doc/tf2.md) \
+[![Object Detection API TensorFlow 1](https://img.shields.io/badge/Object%20Detection%20API-TensorFlow%201-orange)](g3doc/tf1.md)
+
+
+
+## What's New
+
+### TensorFlow 2 Support
+
+We are happy to announce that the TF OD API officially supports TF2! Our release
+includes:
+
+* New binaries for train/eval/export that are designed to run in eager mode.
+* A suite of TF2 compatible (Keras-based) models; this includes migrations of
+  our most popular TF1.x models (e.g., SSD with MobileNet, RetinaNet,
+  Faster R-CNN, Mask R-CNN), as well as a few new architectures for which we
+  will only maintain TF2 implementations:
+
+    1. CenterNet - a simple and effective anchor-free architecture based on
+       the recent [Objects as Points](https://arxiv.org/abs/1904.07850) paper by
+       Zhou et al.
+    2. [EfficientDet](https://arxiv.org/abs/1911.09070) - a recent family of
+       SOTA models discovered with the help of Neural Architecture Search.
+
+* COCO pre-trained weights for all of the models provided as TF2 style
+  object-based checkpoints.
+* Access to [Distribution Strategies](https://www.tensorflow.org/guide/distributed_training)
+  for distributed training --- our models are designed to be trainable using sync
+  multi-GPU and TPU platforms.
+* Colabs demo’ing eager mode training and inference.
+
+See our release blogpost [here](https://blog.tensorflow.org/2020/07/tensorflow-2-meets-object-detection-api.html).
+If you are an existing user of the TF OD API using TF 1.x, don’t worry, we’ve
+got you covered.
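The sketch below illustrates the workflow described above; it is a minimal
illustration, not part of this commit. It loads one of the TF2 pipeline configs
added under `configs/tf2/` in this change, overrides a few fields using the
config language shared by TF1 and TF2, and picks a `tf.distribute` strategy the
way the new training binary does. It assumes the `config_util` helpers from
`object_detection/utils` and the TF2 entry point `model_main_tf2.py` with its
`--pipeline_config_path`/`--model_dir` flags; all paths are placeholders.

```python
import tensorflow as tf
from object_detection.utils import config_util

# Load one of the released TF2 pipeline configs (path is a placeholder).
configs = config_util.get_configs_from_pipeline_file(
    'configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config')

# TF1 and TF2 share the same proto-based config language, so the usual
# fields can be overridden in place before training.
configs['train_config'].fine_tune_checkpoint = 'PATH_TO_BE_CONFIGURED/ckpt-0'
configs['train_input_config'].label_map_path = (
    'PATH_TO_BE_CONFIGURED/label_map.pbtxt')

# Re-serialize the edited config and write pipeline.config to the model dir.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, 'PATH_TO_BE_CONFIGURED/model_dir')

# Pick a distribution strategy the way the new train/eval binary does:
# sync multi-GPU via MirroredStrategy, otherwise fall back to a single device.
if tf.config.list_physical_devices('GPU'):
  strategy = tf.distribute.MirroredStrategy()
else:
  strategy = tf.distribute.OneDeviceStrategy('/cpu:0')
print('Training would run under:', strategy)
```

Once the edited config is saved, training and evaluation are launched with the
version-specific binaries, which take the same arguments in TF1 and TF2 as
noted above.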
+ +**Thanks to contributors**: Akhil Chinnakotla, Allen Lavoie, Anirudh Vegesana, +Anjali Sridhar, Austin Myers, Dan Kondratyuk, David Ross, Derek Chow, Jaeyoun +Kim, Jing Li, Jonathan Huang, Jordi Pont-Tuset, Karmel Allison, Kathy Ruan, +Kaushik Shivakumar, Lu He, Mingxing Tan, Pengchong Jin, Ronny Votel, Sara Beery, +Sergi Caelles Prat, Shan Yang, Sudheendra Vijayanarasimhan, Tina Tian, Tomer +Kaftan, Vighnesh Birodkar, Vishnu Banna, Vivek Rathod, Yanhui Liang, Yiming Shi, +Yixin Shi, Yu-hui Chen, Zhichao Lu. + +### Context R-CNN We have released [Context R-CNN](https://arxiv.org/abs/1912.03538), a model that uses attention to incorporate contextual information images (e.g. from temporally nearby frames taken by a static camera) in order to improve accuracy. Importantly, these contextual images need not be labeled. -* When applied to a challenging wildlife detection dataset ([Snapshot Serengeti](http://lila.science/datasets/snapshot-serengeti)), +* When applied to a challenging wildlife detection dataset + ([Snapshot Serengeti](http://lila.science/datasets/snapshot-serengeti)), Context R-CNN with context from up to a month of images outperforms a single-frame baseline by 17.9% mAP, and outperforms S3D (a 3d convolution based baseline) by 11.2% mAP. @@ -118,282 +118,48 @@ Importantly, these contextual images need not be labeled. novel camera deployment to improve performance at that camera, boosting model generalizeability. -Read about Context R-CNN on the Google AI blog [here](https://ai.googleblog.com/2020/06/leveraging-temporal-context-for-object.html). +Read about Context R-CNN on the Google AI blog +[here](https://ai.googleblog.com/2020/06/leveraging-temporal-context-for-object.html). We have provided code for generating data with associated context -[here](g3doc/context_rcnn.md), and a sample config for a Context R-CNN -model [here](samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config). +[here](g3doc/context_rcnn.md), and a sample config for a Context R-CNN model +[here](samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config). Snapshot Serengeti-trained Faster R-CNN and Context R-CNN models can be found in -the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#snapshot-serengeti-camera-trap-trained-models). +the +[model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#snapshot-serengeti-camera-trap-trained-models). A colab demonstrating Context R-CNN is provided [here](colab_tutorials/context_rcnn_tutorial.ipynb). Thanks to contributors: Sara Beery, Jonathan Huang, Guanhang Wu, Vivek -Rathod, Ronny Votel, Zhichao Lu, David Ross, Pietro Perona, Tanya Birch, and -the Wildlife Insights AI Team. - -### May 19th, 2020 - -We have released [MobileDets](https://arxiv.org/abs/2004.14525), a set of -high-performance models for mobile CPUs, DSPs and EdgeTPUs. - -* MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile - CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by - 1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs and 3.4 mAP on DSPs while - running equally fast. MobileDets also offer up to 2x speedup over MnasFPN on - EdgeTPUs and DSPs. - -For each of the three hardware platforms we have released model definition, -model checkpoints trained on the COCO14 dataset and converted TFLite models in -fp32 and/or uint8. 
- -Thanks to contributors: Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin -Akin, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen, -Quoc Le, Zhichao Lu. - -### May 7th, 2020 - -We have released a mobile model with the -[MnasFPN head](https://arxiv.org/abs/1912.01106). - -* MnasFPN with MobileNet-V2 backbone is the most accurate (26.6 mAP at 183ms - on Pixel 1) mobile detection model we have released to date. With - depth-multiplier, MnasFPN with MobileNet-V2 backbone is 1.8 mAP higher than - MobileNet-V3-Large with SSDLite (23.8 mAP vs 22.0 mAP) at similar latency - (120ms) on Pixel 1. - -We have released model definition, model checkpoints trained on the COCO14 -dataset and a converted TFLite model. - -Thanks to contributors: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi -Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc Le, Zhichao Lu, Jonathan Huang, Hao -Xu. - -### Nov 13th, 2019 - -We have released MobileNetEdgeTPU SSDLite model. - -* SSDLite with MobileNetEdgeTPU backbone, which achieves 10% mAP higher than - MobileNetV2 SSDLite (24.3 mAP vs 22 mAP) on a Google Pixel4 at comparable - latency (6.6ms vs 6.8ms). - -Along with the model definition, we are also releasing model checkpoints trained -on the COCO dataset. - -Thanks to contributors: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu, -Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le - -### Oct 15th, 2019 - -We have released two MobileNet V3 SSDLite models (presented in -[Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)). - -* SSDLite with MobileNet-V3-Large backbone, which is 27% faster than Mobilenet - V2 SSDLite (119ms vs 162ms) on a Google Pixel phone CPU at the same mAP. -* SSDLite with MobileNet-V3-Small backbone, which is 37% faster than MnasNet - SSDLite reduced with depth-multiplier (43ms vs 68ms) at the same mAP. - -Along with the model definition, we are also releasing model checkpoints trained -on the COCO dataset. - -Thanks to contributors: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang - -### July 1st, 2019 - -We have released an updated set of utils and an updated -[tutorial](g3doc/challenge_evaluation.md) for all three tracks of the -[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)! - -The Instance Segmentation metric for -[Open Images V5](https://storage.googleapis.com/openimages/web/index.html) and -[Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html) -is part of this release. Check out -[the metric description](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval) -on the Open Images website. - -Thanks to contributors: Alina Kuznetsova, Rodrigo Benenson - -### Feb 11, 2019 - -We have released detection models trained on the Open Images Dataset V4 in our -detection model zoo, including +Rathod, Ronny Votel, Zhichao Lu, David Ross, Pietro Perona, Tanya Birch, and the +Wildlife Insights AI Team. -* Faster R-CNN detector with Inception Resnet V2 feature extractor -* SSD detector with MobileNet V2 feature extractor -* SSD detector with ResNet 101 FPN feature extractor (aka RetinaNet-101) +## Release Notes +See [notes](g3doc/release_notes.md) for all past releases. 
-Thanks to contributors: Alina Kuznetsova, Yinxiao Li - -### Sep 17, 2018 - -We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature -extractors trained on the -[iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes). -The models are trained on the training split of the iNaturalist data for 4M -iterations, they achieve 55% and 58% mean AP@.5 over 2854 classes respectively. -For more details please refer to this [paper](https://arxiv.org/abs/1707.06642). - -Thanks to contributors: Chen Sun - -### July 13, 2018 - -There are many new updates in this release, extending the functionality and -capability of the API: - -* Moving from slim-based training to - [Estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)-based - training. -* Support for [RetinaNet](https://arxiv.org/abs/1708.02002), and a - [MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) - adaptation of RetinaNet. -* A novel SSD-based architecture called the - [Pooling Pyramid Network](https://arxiv.org/abs/1807.03284) (PPN). -* Releasing several [TPU](https://cloud.google.com/tpu/)-compatible models. - These can be found in the `samples/configs/` directory with a comment in the - pipeline configuration files indicating TPU compatibility. -* Support for quantized training. -* Updated documentation for new binaries, Cloud training, and - [Tensorflow Lite](https://www.tensorflow.org/mobile/tflite/). - -See also our -[expanded announcement blogpost](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html) -and accompanying tutorial at the -[TensorFlow blog](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193). - -Thanks to contributors: Sara Robinson, Aakanksha Chowdhery, Derek Chow, -Pengchong Jin, Jonathan Huang, Vivek Rathod, Zhichao Lu, Ronny Votel - -### June 25, 2018 - -Additional evaluation tools for the -[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) -are out. Check out our short tutorial on data preparation and running evaluation -[here](g3doc/challenge_evaluation.md)! - -Thanks to contributors: Alina Kuznetsova - -### June 5, 2018 - -We have released the implementation of evaluation metrics for both tracks of the -[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) -as a part of the Object Detection API - see the -[evaluation protocols](g3doc/evaluation_protocols.md) for more details. -Additionally, we have released a tool for hierarchical labels expansion for the -Open Images Challenge: check out -[oid_hierarchical_labels_expansion.py](dataset_tools/oid_hierarchical_labels_expansion.py). - -Thanks to contributors: Alina Kuznetsova, Vittorio Ferrari, Jasper -Uijlings - -### April 30, 2018 - -We have released a Faster R-CNN detector with ResNet-101 feature extractor -trained on [AVA](https://research.google.com/ava/) v2.1. Compared with other -commonly used object detectors, it changes the action classification loss -function to per-class Sigmoid loss to handle boxes with multiple labels. The -model is trained on the training split of AVA v2.1 for 1.5M iterations, it -achieves mean AP of 11.25% over 60 classes on the validation split of AVA v2.1. -For more details please refer to this [paper](https://arxiv.org/abs/1705.08421). 
- -Thanks to contributors: Chen Sun, David Ross - -### April 2, 2018 - -Supercharge your mobile phones with the next generation mobile object detector! -We are adding support for MobileNet V2 with SSDLite presented in -[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). -This model is 35% faster than Mobilenet V1 SSD on a Google Pixel phone CPU -(200ms vs. 270ms) at the same accuracy. Along with the model definition, we are -also releasing a model checkpoint trained on the COCO dataset. - -Thanks to contributors: Menglong Zhu, Mark Sandler, Zhichao Lu, Vivek -Rathod, Jonathan Huang - -### February 9, 2018 - -We now support instance segmentation!! In this API update we support a number of -instance segmentation models similar to those discussed in the -[Mask R-CNN paper](https://arxiv.org/abs/1703.06870). For further details refer -to [our slides](http://presentations.cocodataset.org/Places17-GMRI.pdf) from the -2017 Coco + Places Workshop. Refer to the section on -[Running an Instance Segmentation Model](g3doc/instance_segmentation.md) for -instructions on how to configure a model that predicts masks in addition to -object bounding boxes. - -Thanks to contributors: Alireza Fathi, Zhichao Lu, Vivek Rathod, Ronny -Votel, Jonathan Huang - -### November 17, 2017 - -As a part of the Open Images V3 release we have released: - -* An implementation of the Open Images evaluation metric and the - [protocol](g3doc/evaluation_protocols.md#open-images). -* Additional tools to separate inference of detection and evaluation (see - [this tutorial](g3doc/oid_inference_and_evaluation.md)). -* A new detection model trained on the Open Images V2 data release (see - [Open Images model](g3doc/detection_model_zoo.md#open-images-models)). - -See more information on the -[Open Images website](https://github.com/openimages/dataset)! - -Thanks to contributors: Stefan Popov, Alina Kuznetsova - -### November 6, 2017 - -We have re-released faster versions of our (pre-trained) models in the -model zoo. In addition to what was -available before, we are also adding Faster R-CNN models trained on COCO with -Inception V2 and Resnet-50 feature extractors, as well as a Faster R-CNN with -Resnet-101 model trained on the KITTI dataset. - -Thanks to contributors: Jonathan Huang, Vivek Rathod, Derek Chow, Tal -Remez, Chen Sun. - -### October 31, 2017 - -We have released a new state-of-the-art model for object detection using the -Faster-RCNN with the -[NASNet-A image featurization](https://arxiv.org/abs/1707.07012). This model -achieves mAP of 43.1% on the test-dev validation dataset for COCO, improving on -the best available model in the zoo by 6% in terms of absolute mAP. - -Thanks to contributors: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, -Quoc Le - -### August 11, 2017 +## Getting Help -We have released an update to the -[Android Detect demo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android) -which will now run models trained using the Tensorflow Object Detection API on -an Android device. By default, it currently runs a frozen SSD w/Mobilenet -detector trained on COCO, but we encourage you to try out other detection -models! +To get help with issues you may encounter using the TensorFlow Object Detection +API, create a new question on [StackOverflow](https://stackoverflow.com/) with +the tags "tensorflow" and "object-detection". 
-Thanks to contributors: Jonathan Huang, Andrew Harp +Please report bugs (actually broken code, not usage questions) to the +tensorflow/models GitHub +[issue tracker](https://github.com/tensorflow/models/issues), prefixing the +issue name with "object_detection". -### June 15, 2017 +Please check the [FAQ](g3doc/faq.md) for frequently asked questions before +reporting an issue. -In addition to our base Tensorflow detection model definitions, this release -includes: +## Maintainers -* A selection of trainable detection models, including: - * Single Shot Multibox Detector (SSD) with MobileNet, - * SSD with Inception V2, - * Region-Based Fully Convolutional Networks (R-FCN) with Resnet 101, - * Faster RCNN with Resnet 101, - * Faster RCNN with Inception Resnet v2 -* Frozen weights (trained on the COCO dataset) for each of the above models to - be used for out-of-the-box inference purposes. -* A [Jupyter notebook](colab_tutorials/object_detection_tutorial.ipynb) for - performing out-of-the-box inference with one of our released models -* Convenient [local training](g3doc/running_locally.md) scripts as well as - distributed training and evaluation pipelines via - [Google Cloud](g3doc/running_on_cloud.md). - -Thanks to contributors: Jonathan Huang, Vivek Rathod, Derek Chow, Chen -Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, -Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav -Kovalevskyi, Kevin Murphy +* Jonathan Huang ([@GitHub jch1](https://github.com/jch1)) +* Vivek Rathod ([@GitHub tombstone](https://github.com/tombstone)) +* Vighnesh Birodkar ([@GitHub vighneshbirodkar](https://github.com/vighneshbirodkar)) +* Austin Myers ([@GitHub austin-myers](https://github.com/austin-myers)) +* Zhichao Lu ([@GitHub pkulzc](https://github.com/pkulzc)) +* Ronny Votel ([@GitHub ronnyvotel](https://github.com/ronnyvotel)) +* Yu-hui Chen ([@GitHub yuhuichen1015](https://github.com/yuhuichen1015)) +* Derek Chow ([@GitHub derekjchow](https://github.com/derekjchow)) diff --git a/research/object_detection/configs/tf2/center_net_hourglass104_1024x1024_coco17_tpu-32.config b/research/object_detection/configs/tf2/center_net_hourglass104_1024x1024_coco17_tpu-32.config new file mode 100644 index 00000000000..c0a90ef44c9 --- /dev/null +++ b/research/object_detection/configs/tf2/center_net_hourglass104_1024x1024_coco17_tpu-32.config @@ -0,0 +1,129 @@ +# CenterNet meta-architecture from the "Objects as Points" [2] paper with the +# hourglass[1] backbone. 
+# [1]: https://arxiv.org/abs/1603.06937 +# [2]: https://arxiv.org/abs/1904.07850 +# Trained on COCO, initialized from Extremenet Detection checkpoint +# Train on TPU-32 v3 +# +# Achieves 44.6 mAP on COCO17 Val + + +model { + center_net { + num_classes: 90 + feature_extractor { + type: "hourglass_104" + bgr_ordering: true + channel_means: [104.01362025, 114.03422265, 119.9165958 ] + channel_stds: [73.6027665 , 69.89082075, 70.9150767 ] + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 1024 + max_dimension: 1024 + pad_to_max_dimension: true + } + } + object_detection_task { + task_loss_weight: 1.0 + offset_loss_weight: 1.0 + scale_loss_weight: 0.1 + localization_loss { + l1_localization_loss { + } + } + } + object_center_params { + object_center_loss_weight: 1.0 + min_box_overlap_iou: 0.7 + max_box_predictions: 100 + classification_loss { + penalty_reduced_logistic_focal_loss { + alpha: 2.0 + beta: 4.0 + } + } + } + } +} + +train_config: { + + batch_size: 128 + num_steps: 50000 + + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_adjust_brightness { + } + } + + data_augmentation_options { + random_square_crop_by_scale { + scale_min: 0.6 + scale_max: 1.3 + } + } + + optimizer { + adam_optimizer: { + epsilon: 1e-7 # Match tf.keras.optimizers.Adam's default. + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 1e-3 + total_steps: 50000 + warmup_learning_rate: 2.5e-4 + warmup_steps: 5000 + } + } + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-1" + fine_tune_checkpoint_type: "detection" +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/center_net_hourglass104_512x512_coco17_tpu-8.config b/research/object_detection/configs/tf2/center_net_hourglass104_512x512_coco17_tpu-8.config new file mode 100644 index 00000000000..9e38d98939b --- /dev/null +++ b/research/object_detection/configs/tf2/center_net_hourglass104_512x512_coco17_tpu-8.config @@ -0,0 +1,143 @@ +# CenterNet meta-architecture from the "Objects as Points" [2] paper with the +# hourglass[1] backbone. 
+# [1]: https://arxiv.org/abs/1603.06937 +# [2]: https://arxiv.org/abs/1904.07850 +# Trained on COCO, initialized from Extremenet Detection checkpoint +# Train on TPU-8 +# +# Achieves 41.9 mAP on COCO17 Val + +model { + center_net { + num_classes: 90 + feature_extractor { + type: "hourglass_104" + bgr_ordering: true + channel_means: [104.01362025, 114.03422265, 119.9165958 ] + channel_stds: [73.6027665 , 69.89082075, 70.9150767 ] + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 512 + max_dimension: 512 + pad_to_max_dimension: true + } + } + object_detection_task { + task_loss_weight: 1.0 + offset_loss_weight: 1.0 + scale_loss_weight: 0.1 + localization_loss { + l1_localization_loss { + } + } + } + object_center_params { + object_center_loss_weight: 1.0 + min_box_overlap_iou: 0.7 + max_box_predictions: 100 + classification_loss { + penalty_reduced_logistic_focal_loss { + alpha: 2.0 + beta: 4.0 + } + } + } + } +} + +train_config: { + + batch_size: 128 + num_steps: 140000 + + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_crop_image { + min_aspect_ratio: 0.5 + max_aspect_ratio: 1.7 + random_coef: 0.25 + } + } + + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_adjust_brightness { + } + } + + data_augmentation_options { + random_absolute_pad_image { + max_height_padding: 200 + max_width_padding: 200 + pad_color: [0, 0, 0] + } + } + + optimizer { + adam_optimizer: { + epsilon: 1e-7 # Match tf.keras.optimizers.Adam's default. + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 1e-3 + schedule { + step: 90000 + learning_rate: 1e-4 + } + schedule { + step: 120000 + learning_rate: 1e-5 + } + } + } + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-1" + fine_tune_checkpoint_type: "detection" +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/center_net_resnet101_v1_fpn_512x512_coco17_tpu-8.config b/research/object_detection/configs/tf2/center_net_resnet101_v1_fpn_512x512_coco17_tpu-8.config new file mode 100644 index 00000000000..2bb7f07ce5e --- /dev/null +++ b/research/object_detection/configs/tf2/center_net_resnet101_v1_fpn_512x512_coco17_tpu-8.config @@ -0,0 +1,141 @@ +# CenterNet meta-architecture from the "Objects as Points" [1] paper +# with the ResNet-v1-101 FPN backbone. 
+# [1]: https://arxiv.org/abs/1904.07850 + +# Train on TPU-8 +# +# Achieves 34.18 mAP on COCO17 Val + + +model { + center_net { + num_classes: 90 + feature_extractor { + type: "resnet_v2_101" + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 512 + max_dimension: 512 + pad_to_max_dimension: true + } + } + object_detection_task { + task_loss_weight: 1.0 + offset_loss_weight: 1.0 + scale_loss_weight: 0.1 + localization_loss { + l1_localization_loss { + } + } + } + object_center_params { + object_center_loss_weight: 1.0 + min_box_overlap_iou: 0.7 + max_box_predictions: 100 + classification_loss { + penalty_reduced_logistic_focal_loss { + alpha: 2.0 + beta: 4.0 + } + } + } + } +} + +train_config: { + + batch_size: 128 + num_steps: 140000 + + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_crop_image { + min_aspect_ratio: 0.5 + max_aspect_ratio: 1.7 + random_coef: 0.25 + } + } + + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_adjust_brightness { + } + } + + data_augmentation_options { + random_absolute_pad_image { + max_height_padding: 200 + max_width_padding: 200 + pad_color: [0, 0, 0] + } + } + + optimizer { + adam_optimizer: { + epsilon: 1e-7 # Match tf.keras.optimizers.Adam's default. + learning_rate: { + manual_step_learning_rate { + initial_learning_rate: 1e-3 + schedule { + step: 90000 + learning_rate: 1e-4 + } + schedule { + step: 120000 + learning_rate: 1e-5 + } + } + } + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/weights-1" + fine_tune_checkpoint_type: "classification" +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} + diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config new file mode 100644 index 00000000000..c38f6b9e214 --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config @@ -0,0 +1,166 @@ +# Faster R-CNN with Resnet-101 (v1), +# w/high res inputs, long training schedule +# Trained on COCO, initialized from Imagenet classification checkpoint +# +# Train on TPU-8 +# +# Achieves 37.1 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + fixed_shape_resizer { + width: 1024 + height: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet101_keras' + batch_norm_trainable: true + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + 
truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + share_box_across_classes: true + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + use_static_shapes: true + use_matmul_crop_and_resize: true + clip_anchors_to_image: true + use_static_balanced_label_sampler: true + use_matmul_gather_in_matcher: true + } +} + +train_config: { + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 100000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 100000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet101.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_square_crop_by_scale { + scale_min: 0.6 + scale_max: 1.3 + } + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + use_bfloat16: true # works only on TPUs +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..af07c7df627 --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.config @@ -0,0 +1,145 @@ +# Faster R-CNN with Resnet-50 (v1) +# Trained on COCO, initialized from Imagenet classification checkpoint +# +# Train on TPU-8 +# +# Achieves 31.8 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 640 + max_dimension: 640 + pad_to_max_dimension: true + } + } + feature_extractor { + type: 'faster_rcnn_resnet101_keras' + batch_norm_trainable: true + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 
0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + share_box_across_classes: true + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + use_static_shapes: true + use_matmul_crop_and_resize: true + clip_anchors_to_image: true + use_static_balanced_label_sampler: true + use_matmul_gather_in_matcher: true + } +} + +train_config: { + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 25000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 25000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet101.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + use_bfloat16: true # works only on TPUs +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8.config new file mode 100644 index 00000000000..8eb4da02f59 --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8.config @@ -0,0 +1,154 @@ +# Faster R-CNN with Resnet-101 (v1), +# Initialized from Imagenet classification checkpoint +# +# Train on GPU-8 +# +# Achieves 36.6 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 800 + max_dimension: 1333 + pad_to_max_dimension: true + } + } + feature_extractor { + type: 'faster_rcnn_resnet101_keras' + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + 
first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 16 + num_steps: 200000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 0.01 + total_steps: 200000 + warmup_learning_rate: 0.0 + warmup_steps: 5000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet101.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_square_crop_by_scale { + scale_min: 0.6 + scale_max: 1.3 + } + } +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8.config new file mode 100644 index 00000000000..034667ffe38 --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8.config @@ -0,0 +1,166 @@ +# Faster R-CNN with Resnet-152 (v1) +# w/high res inputs, long training schedule +# Trained on COCO, initialized from Imagenet classification checkpoint +# +# Train on TPU-8 +# +# Achieves 37.6 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + fixed_shape_resizer { + width: 1024 + height: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet152_keras' + batch_norm_trainable: true + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + 
initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + share_box_across_classes: true + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + use_static_shapes: true + use_matmul_crop_and_resize: true + clip_anchors_to_image: true + use_static_balanced_label_sampler: true + use_matmul_gather_in_matcher: true + } +} + +train_config: { + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 100000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 100000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet152.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_square_crop_by_scale { + scale_min: 0.6 + scale_max: 1.3 + } + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + use_bfloat16: true # works only on TPUs +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..525c4ac456a --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.config @@ -0,0 +1,145 @@ +# Faster R-CNN with Resnet-152 (v1) +# Trained on COCO, initialized from Imagenet classification checkpoint +# +# Train on TPU-8 +# +# Achieves 32.4 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 640 + max_dimension: 640 + pad_to_max_dimension: true + } + } + feature_extractor { + type: 'faster_rcnn_resnet152_keras' + batch_norm_trainable: true + } + first_stage_anchor_generator { + grid_anchor_generator { + 
scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + share_box_across_classes: true + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + use_static_shapes: true + use_matmul_crop_and_resize: true + clip_anchors_to_image: true + use_static_balanced_label_sampler: true + use_matmul_gather_in_matcher: true + } +} + +train_config: { + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 25000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 25000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet152.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + use_bfloat16: true # works only on TPUs +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8.config new file mode 100644 index 00000000000..8d1879f7b9b --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8.config @@ -0,0 +1,154 @@ +# Faster R-CNN with Resnet-152 (v1), +# Initialized from Imagenet classification checkpoint +# +# Train on GPU-8 +# +# Achieves 37.3 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 800 + max_dimension: 1333 + pad_to_max_dimension: true + } + } + feature_extractor { + type: 'faster_rcnn_resnet152_keras' + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + 
first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 16 + num_steps: 200000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 0.01 + total_steps: 200000 + warmup_learning_rate: 0.0 + warmup_steps: 5000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet152.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_square_crop_by_scale { + scale_min: 0.6 + scale_max: 1.3 + } + } +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config new file mode 100644 index 00000000000..b6e590ee717 --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config @@ -0,0 +1,166 @@ +# Faster R-CNN with Resnet-50 (v1), +# w/high res inputs, long training schedule +# Trained on COCO, initialized from Imagenet classification checkpoint +# +# Train on TPU-8 +# +# Achieves 31.0 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + fixed_shape_resizer { + width: 1024 + height: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_resnet50_keras' + batch_norm_trainable: true + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + 
initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + share_box_across_classes: true + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + use_static_shapes: true + use_matmul_crop_and_resize: true + clip_anchors_to_image: true + use_static_balanced_label_sampler: true + use_matmul_gather_in_matcher: true + } +} + +train_config: { + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 100000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 100000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet50.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_square_crop_by_scale { + scale_min: 0.6 + scale_max: 1.3 + } + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + use_bfloat16: true # works only on TPUs +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..c8601c6fed1 --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config @@ -0,0 +1,145 @@ +# Faster R-CNN with Resnet-50 (v1) with 640x640 input resolution +# Trained on COCO, initialized from Imagenet classification checkpoint +# +# Train on TPU-8 +# +# Achieves 29.3 mAP on COCO17 Val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 640 + max_dimension: 640 + pad_to_max_dimension: true + } + } + feature_extractor { + type: 'faster_rcnn_resnet50_keras' + batch_norm_trainable: true + } + first_stage_anchor_generator { + 
grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + share_box_across_classes: true + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 300 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + use_static_shapes: true + use_matmul_crop_and_resize: true + clip_anchors_to_image: true + use_static_balanced_label_sampler: true + use_matmul_gather_in_matcher: true + } +} + +train_config: { + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 25000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 25000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet50.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false + use_bfloat16: true # works only on TPUs +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8.config b/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8.config new file mode 100644 index 00000000000..264be5f0b79 --- /dev/null +++ b/research/object_detection/configs/tf2/faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8.config @@ -0,0 +1,154 @@ +# Faster R-CNN with Resnet-50 (v1), +# Initialized from Imagenet classification checkpoint +# +# Train on GPU-8 +# +# Achieves 31.4 mAP on COCO17 val + +model { + faster_rcnn { + num_classes: 90 + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 800 + max_dimension: 1333 + pad_to_max_dimension: true + } + } + feature_extractor { + type: 'faster_rcnn_resnet50_keras' + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 
16 + } + } + first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 14 + maxpool_kernel_size: 2 + maxpool_stride: 2 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + } +} + +train_config: { + batch_size: 16 + num_steps: 200000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 0.01 + total_steps: 200000 + warmup_learning_rate: 0.0 + warmup_steps: 5000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet50.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } + + data_augmentation_options { + random_adjust_hue { + } + } + + data_augmentation_options { + random_adjust_contrast { + } + } + + data_augmentation_options { + random_adjust_saturation { + } + } + + data_augmentation_options { + random_square_crop_by_scale { + scale_min: 0.6 + scale_max: 1.3 + } + } +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.config b/research/object_detection/configs/tf2/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.config new file mode 100644 index 00000000000..974c1d1710b --- /dev/null +++ b/research/object_detection/configs/tf2/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.config @@ -0,0 +1,160 @@ +# Mask R-CNN with Inception Resnet v2 (no atrous) +# Sync-trained on COCO (with 8 GPUs) with batch size 16 (1024x1024 resolution) +# Initialized from Imagenet classification checkpoint +# +# Train on GPU-8 +# +# Achieves 40.4 box mAP and 35.5 mask mAP on COCO17 val + +model { + faster_rcnn { + number_of_stages: 3 + num_classes: 90 + image_resizer { + fixed_shape_resizer { + height: 1024 + width: 1024 + } + } + feature_extractor { + type: 'faster_rcnn_inception_resnet_v2_keras' + } + first_stage_anchor_generator { + grid_anchor_generator { + scales: [0.25, 0.5, 1.0, 2.0] + aspect_ratios: [0.5, 1.0, 2.0] + height_stride: 16 + width_stride: 16 + } + } + 
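# This Mask R-CNN variant runs three stages (number_of_stages: 3 above); the mask head (33x33 masks, predict_instance_masks) is configured inside the second-stage box predictor below. +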
first_stage_box_predictor_conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + first_stage_nms_score_threshold: 0.0 + first_stage_nms_iou_threshold: 0.7 + first_stage_max_proposals: 300 + first_stage_localization_loss_weight: 2.0 + first_stage_objectness_loss_weight: 1.0 + initial_crop_size: 17 + maxpool_kernel_size: 1 + maxpool_stride: 1 + second_stage_box_predictor { + mask_rcnn_box_predictor { + use_dropout: false + dropout_keep_probability: 1.0 + fc_hyperparams { + op: FC + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + variance_scaling_initializer { + factor: 1.0 + uniform: true + mode: FAN_AVG + } + } + } + mask_height: 33 + mask_width: 33 + mask_prediction_conv_depth: 0 + mask_prediction_num_conv_layers: 4 + conv_hyperparams { + op: CONV + regularizer { + l2_regularizer { + weight: 0.0 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.01 + } + } + } + predict_instance_masks: true + } + } + second_stage_post_processing { + batch_non_max_suppression { + score_threshold: 0.0 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SOFTMAX + } + second_stage_localization_loss_weight: 2.0 + second_stage_classification_loss_weight: 1.0 + second_stage_mask_prediction_loss_weight: 4.0 + resize_masks: false + } +} + +train_config: { + batch_size: 16 + num_steps: 200000 + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 0.008 + total_steps: 200000 + warmup_learning_rate: 0.0 + warmup_steps: 5000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + gradient_clipping_by_norm: 10.0 + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/inception_resnet_v2.ckpt-1" + fine_tune_checkpoint_type: "classification" + data_augmentation_options { + random_horizontal_flip { + } + } +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } + load_instance_masks: true + mask_type: PNG_MASKS +} + +eval_config: { + metrics_set: "coco_detection_metrics" + metrics_set: "coco_mask_metrics" + eval_instance_masks: true + use_moving_averages: false + batch_size: 1 + include_metrics_per_category: true +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } + load_instance_masks: true + mask_type: PNG_MASKS +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d0_512x512_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_efficientdet_d0_512x512_coco17_tpu-8.config new file mode 100644 index 00000000000..ffcd461f77f --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d0_512x512_coco17_tpu-8.config @@ -0,0 +1,199 @@ + # SSD with EfficientNet-b0 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d0). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b0 checkpoint. 
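+# As configured below, d0 uses 512x512 inputs, a 64-filter BiFPN repeated 3 times, +# and 3 conv layers before the shared box predictor.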
+# +# Train on TPU-8 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 512 + max_dimension: 512 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 64 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 3 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b0_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 3 + num_filters: 64 + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 512 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: 
"PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d1_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_efficientdet_d1_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..5eacfeda854 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d1_640x640_coco17_tpu-8.config @@ -0,0 +1,199 @@ + # SSD with EfficientNet-b1 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d1). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b1 checkpoint. +# +# Train on TPU-8 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 640 + max_dimension: 640 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 88 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 3 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b1_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 4 + num_filters: 88 + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 640 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + 
learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d2_768x768_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_efficientdet_d2_768x768_coco17_tpu-8.config new file mode 100644 index 00000000000..d2ca75d468c --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d2_768x768_coco17_tpu-8.config @@ -0,0 +1,199 @@ + # SSD with EfficientNet-b2 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d2). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b2 checkpoint. +# +# Train on TPU-8 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 768 + max_dimension: 768 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 112 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 3 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b2_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 5 + num_filters: 112 + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 
0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 768 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d3_896x896_coco17_tpu-32.config b/research/object_detection/configs/tf2/ssd_efficientdet_d3_896x896_coco17_tpu-32.config new file mode 100644 index 00000000000..b072d13a89f --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d3_896x896_coco17_tpu-32.config @@ -0,0 +1,199 @@ + # SSD with EfficientNet-b3 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d3). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b3 checkpoint. 
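+# Relative to d0-d2, this point widens the BiFPN to 160 filters (6 iterations) and +# adds a fourth conv layer before the predictor, at 896x896 input resolution.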
+# +# Train on TPU-32 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 896 + max_dimension: 896 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 160 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b3_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 6 + num_filters: 160 + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 896 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: 
"PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d4_1024x1024_coco17_tpu-32.config b/research/object_detection/configs/tf2/ssd_efficientdet_d4_1024x1024_coco17_tpu-32.config new file mode 100644 index 00000000000..b13b2d46974 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d4_1024x1024_coco17_tpu-32.config @@ -0,0 +1,199 @@ + # SSD with EfficientNet-b4 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d4). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b4 checkpoint. +# +# Train on TPU-32 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 1024 + max_dimension: 1024 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 224 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b4_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 7 + num_filters: 224 + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 1024 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + 
cosine_decay_learning_rate { + learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d5_1280x1280_coco17_tpu-32.config b/research/object_detection/configs/tf2/ssd_efficientdet_d5_1280x1280_coco17_tpu-32.config new file mode 100644 index 00000000000..bcb33d50300 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d5_1280x1280_coco17_tpu-32.config @@ -0,0 +1,199 @@ + # SSD with EfficientNet-b5 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d5). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b5 checkpoint. +# +# Train on TPU-32 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 1280 + max_dimension: 1280 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 288 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b5_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 7 + num_filters: 288 + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { 
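+ # The near-zero score_threshold keeps virtually all boxes; filtering is left to NMS at IoU 0.5 and the 100-detection caps below, with sigmoid class scores.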
+ score_threshold: 1e-8 + iou_threshold: 0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 1280 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d6_1408x1408_coco17_tpu-32.config b/research/object_detection/configs/tf2/ssd_efficientdet_d6_1408x1408_coco17_tpu-32.config new file mode 100644 index 00000000000..1f24607431c --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d6_1408x1408_coco17_tpu-32.config @@ -0,0 +1,201 @@ + # SSD with EfficientNet-b6 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d6). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b6 checkpoint. 
+# +# Train on TPU-32 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 1408 + max_dimension: 1408 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 384 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 5 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b6_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 8 + num_filters: 384 + # Use unweighted sum for stability. + combine_method: 'sum' + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 1408 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { 
+ input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_efficientdet_d7_1536x1536_coco17_tpu-32.config b/research/object_detection/configs/tf2/ssd_efficientdet_d7_1536x1536_coco17_tpu-32.config new file mode 100644 index 00000000000..81954aa8bdd --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_efficientdet_d7_1536x1536_coco17_tpu-32.config @@ -0,0 +1,201 @@ + # SSD with EfficientNet-b6 + BiFPN feature extractor, +# shared box predictor and focal loss (a.k.a EfficientDet-d7). +# See EfficientDet, Tan et al, https://arxiv.org/abs/1911.09070 +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from an EfficientNet-b6 checkpoint. +# +# Train on TPU-32 + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + add_background_class: false + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 3 + } + } + image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 1536 + max_dimension: 1536 + pad_to_max_dimension: true + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 384 + class_prediction_bias_init: -4.6 + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true + decay: 0.99 + epsilon: 0.001 + } + } + num_layers_before_predictor: 5 + kernel_size: 3 + use_depthwise: true + } + } + feature_extractor { + type: 'ssd_efficientnet-b6_bifpn_keras' + bifpn { + min_level: 3 + max_level: 7 + num_iterations: 8 + num_filters: 384 + # Use unweighted sum for stability. 
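+ # The d0-d5 configs above leave combine_method unset; only d6 and d7 pin it to a plain sum.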
+ combine_method: 'sum' + } + conv_hyperparams { + force_use_bias: true + activation: SWISH + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.99, + epsilon: 0.001, + } + } + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 1.5 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.5 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/ckpt-0" + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 300000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_scale_crop_and_pad_to_square { + output_size: 1536 + scale_min: 0.1 + scale_max: 2.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: 8e-2 + total_steps: 300000 + warmup_learning_rate: .001 + warmup_steps: 2500 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..3cfe304f171 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Mobilenet v1 FPN feature extractor, shared box predictor and focal +# loss (a.k.a Retinanet). 
+# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 29.1 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 640 + width: 640 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 256 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_mobilenet_v1_fpn_keras' + fpn { + min_level: 3 + max_level: 7 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/mobilenet_v1.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 25000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 25000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false + batch_size: 1; +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: 
false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config new file mode 100644 index 00000000000..dc3a4a7f3e7 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Mobilenet v2 +# Trained on COCO17, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 22.2 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + ssd_anchor_generator { + num_layers: 6 + min_scale: 0.2 + max_scale: 0.95 + aspect_ratios: 1.0 + aspect_ratios: 2.0 + aspect_ratios: 0.5 + aspect_ratios: 3.0 + aspect_ratios: 0.3333 + } + } + image_resizer { + fixed_shape_resizer { + height: 300 + width: 300 + } + } + box_predictor { + convolutional_box_predictor { + min_depth: 0 + max_depth: 0 + num_layers_before_predictor: 0 + use_dropout: false + dropout_keep_probability: 0.8 + kernel_size: 1 + box_code_size: 4 + apply_sigmoid_to_scores: false + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + train: true, + scale: true, + center: true, + decay: 0.97, + epsilon: 0.001, + } + } + } + } + feature_extractor { + type: 'ssd_mobilenet_v2_keras' + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + train: true, + scale: true, + center: true, + decay: 0.97, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.75, + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + delta: 1.0 + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/mobilenet_v2.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 512 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 50000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + ssd_random_crop { + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .8 + total_steps: 50000 + warmup_learning_rate: 0.13333 + warmup_steps: 2000 + } + } + 
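# The cosine schedule above ramps linearly from 0.13333 to the 0.8 base rate over the first 2000 steps, then decays toward zero by step 50000; the relatively high base rate pairs with the large (512) batch size. +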
momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config new file mode 100644 index 00000000000..656e324c5d9 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config @@ -0,0 +1,201 @@ +# SSD with Mobilenet v2 FPN-lite (go/fpn-lite) feature extractor, shared box +# predictor and focal loss (a mobile version of Retinanet). +# Retinanet: see Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 22.2 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 320 + width: 320 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 128 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + share_prediction_tower: true + use_depthwise: true + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_mobilenet_v2_fpn_keras' + use_depthwise: true + fpn { + min_level: 3 + max_level: 7 + additional_layer_depth: 128 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + 
} + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/mobilenet_v2.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 50000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .08 + total_steps: 50000 + warmup_learning_rate: .026666 + warmup_steps: 1000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} + diff --git a/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..5e4bca1688c --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.config @@ -0,0 +1,201 @@ +# SSD with Mobilenet v2 FPN-lite (go/fpn-lite) feature extractor, shared box +# predictor and focal loss (a mobile version of Retinanet). 
+# Retinanet: see Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 28.2 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 640 + width: 640 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 128 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + share_prediction_tower: true + use_depthwise: true + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_mobilenet_v2_fpn_keras' + use_depthwise: true + fpn { + min_level: 3 + max_level: 7 + additional_layer_depth: 128 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.00004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/mobilenet_v2.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 128 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + num_steps: 50000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .08 + total_steps: 50000 + warmup_learning_rate: .026666 + warmup_steps: 1000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: 
false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} + diff --git a/research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8.config new file mode 100644 index 00000000000..015617ba444 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Resnet 101 v1 FPN feature extractor, shared box predictor and focal +# loss (a.k.a Retinanet). +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 39.5 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 1024 + width: 1024 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 256 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_resnet101_v1_fpn_keras' + fpn { + min_level: 3 + max_level: 7 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet101.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 100000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 
0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 100000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..37e9b9b632c --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Resnet 101 v1 FPN feature extractor, shared box predictor and focal +# loss (a.k.a Retinanet). +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 35.4 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 640 + width: 640 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 256 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_resnet101_v1_fpn_keras' + fpn { + min_level: 3 + max_level: 7 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + 
max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet101.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 25000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 25000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8.config new file mode 100644 index 00000000000..9dbc06e3d72 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Resnet 152 v1 FPN feature extractor, shared box predictor and focal +# loss (a.k.a Retinanet). 
+# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 39.6 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 1024 + width: 1024 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 256 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_resnet152_v1_fpn_keras' + fpn { + min_level: 3 + max_level: 7 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet152.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 100000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 100000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + 
shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..aa99f0a115e --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_resnet152_v1_fpn_640x640_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Resnet 152 v1 FPN feature extractor, shared box predictor and focal +# loss (a.k.a Retinanet). +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 35.6 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 640 + width: 640 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 256 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_resnet152_v1_fpn_keras' + fpn { + min_level: 3 + max_level: 7 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet152.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 25000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + 
learning_rate_base: .04 + total_steps: 25000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8.config new file mode 100644 index 00000000000..e1575a00299 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal +# loss (a.k.a Retinanet). +# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 38.3 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 1024 + width: 1024 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 256 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_resnet50_v1_fpn_keras' + fpn { + min_level: 3 + max_level: 7 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + 
fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet50.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 100000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 100000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config b/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config new file mode 100644 index 00000000000..7164144b730 --- /dev/null +++ b/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config @@ -0,0 +1,197 @@ +# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal +# loss (a.k.a Retinanet). 
+# See Lin et al, https://arxiv.org/abs/1708.02002 +# Trained on COCO, initialized from Imagenet classification checkpoint +# Train on TPU-8 +# +# Achieves 34.3 mAP on COCO17 Val + +model { + ssd { + inplace_batchnorm_update: true + freeze_batchnorm: false + num_classes: 90 + box_coder { + faster_rcnn_box_coder { + y_scale: 10.0 + x_scale: 10.0 + height_scale: 5.0 + width_scale: 5.0 + } + } + matcher { + argmax_matcher { + matched_threshold: 0.5 + unmatched_threshold: 0.5 + ignore_thresholds: false + negatives_lower_than_unmatched: true + force_match_for_each_row: true + use_matmul_gather: true + } + } + similarity_calculator { + iou_similarity { + } + } + encode_background_as_zeros: true + anchor_generator { + multiscale_anchor_generator { + min_level: 3 + max_level: 7 + anchor_scale: 4.0 + aspect_ratios: [1.0, 2.0, 0.5] + scales_per_octave: 2 + } + } + image_resizer { + fixed_shape_resizer { + height: 640 + width: 640 + } + } + box_predictor { + weight_shared_convolutional_box_predictor { + depth: 256 + class_prediction_bias_init: -4.6 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + random_normal_initializer { + stddev: 0.01 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + num_layers_before_predictor: 4 + kernel_size: 3 + } + } + feature_extractor { + type: 'ssd_resnet50_v1_fpn_keras' + fpn { + min_level: 3 + max_level: 7 + } + min_depth: 16 + depth_multiplier: 1.0 + conv_hyperparams { + activation: RELU_6, + regularizer { + l2_regularizer { + weight: 0.0004 + } + } + initializer { + truncated_normal_initializer { + stddev: 0.03 + mean: 0.0 + } + } + batch_norm { + scale: true, + decay: 0.997, + epsilon: 0.001, + } + } + override_base_feature_extractor_hyperparams: true + } + loss { + classification_loss { + weighted_sigmoid_focal { + alpha: 0.25 + gamma: 2.0 + } + } + localization_loss { + weighted_smooth_l1 { + } + } + classification_weight: 1.0 + localization_weight: 1.0 + } + normalize_loss_by_num_matches: true + normalize_loc_loss_by_codesize: true + post_processing { + batch_non_max_suppression { + score_threshold: 1e-8 + iou_threshold: 0.6 + max_detections_per_class: 100 + max_total_detections: 100 + } + score_converter: SIGMOID + } + } +} + +train_config: { + fine_tune_checkpoint_version: V2 + fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/resnet50.ckpt-1" + fine_tune_checkpoint_type: "classification" + batch_size: 64 + sync_replicas: true + startup_delay_steps: 0 + replicas_to_aggregate: 8 + use_bfloat16: true + num_steps: 25000 + data_augmentation_options { + random_horizontal_flip { + } + } + data_augmentation_options { + random_crop_image { + min_object_covered: 0.0 + min_aspect_ratio: 0.75 + max_aspect_ratio: 3.0 + min_area: 0.75 + max_area: 1.0 + overlap_thresh: 0.0 + } + } + optimizer { + momentum_optimizer: { + learning_rate: { + cosine_decay_learning_rate { + learning_rate_base: .04 + total_steps: 25000 + warmup_learning_rate: .013333 + warmup_steps: 2000 + } + } + momentum_optimizer_value: 0.9 + } + use_moving_average: false + } + max_number_of_boxes: 100 + unpad_groundtruth_tensors: false +} + +train_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" + } +} + +eval_config: { + metrics_set: "coco_detection_metrics" + use_moving_averages: false +} + +eval_input_reader: { + label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt" + shuffle: 
false + num_epochs: 1 + tf_record_input_reader { + input_path: "PATH_TO_BE_CONFIGURED/val2017-?????-of-00032.tfrecord" + } +} diff --git a/research/object_detection/g3doc/challenge_evaluation.md b/research/object_detection/g3doc/challenge_evaluation.md index d8ea21017d1..15f032d4e8a 100644 --- a/research/object_detection/g3doc/challenge_evaluation.md +++ b/research/object_detection/g3doc/challenge_evaluation.md @@ -47,7 +47,7 @@ python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \ --annotation_type=2 ``` -1. If you are not using Tensorflow, you can run evaluation directly using your +1. If you are not using TensorFlow, you can run evaluation directly using your algorithm's output and generated ground-truth files. {value=4} After step 3 you produced the ground-truth files suitable for running 'OID @@ -73,7 +73,7 @@ For the Object Detection Track, the participants will be ranked on: - "OpenImagesDetectionChallenge_Precision/mAP@0.5IOU" -To use evaluation within Tensorflow training, use metric name +To use evaluation within TensorFlow training, use metric name `oid_challenge_detection_metrics` in the evaluation config. ## Instance Segmentation Track @@ -130,7 +130,7 @@ python object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \ --annotation_type=1 ``` -1. If you are not using Tensorflow, you can run evaluation directly using your +1. If you are not using TensorFlow, you can run evaluation directly using your algorithm's output and generated ground-truth files. {value=4} ``` diff --git a/research/object_detection/g3doc/configuring_jobs.md b/research/object_detection/g3doc/configuring_jobs.md index c088169bc99..59925f293b5 100644 --- a/research/object_detection/g3doc/configuring_jobs.md +++ b/research/object_detection/g3doc/configuring_jobs.md @@ -2,7 +2,7 @@ ## Overview -The Tensorflow Object Detection API uses protobuf files to configure the +The TensorFlow Object Detection API uses protobuf files to configure the training and evaluation process. The schema for the training pipeline can be found in object_detection/protos/pipeline.proto. At a high level, the config file is split into 5 parts: @@ -60,7 +60,7 @@ to a value suited for the dataset the user is training on. ## Defining Inputs -The Tensorflow Object Detection API accepts inputs in the TFRecord file format. +The TensorFlow Object Detection API accepts inputs in the TFRecord file format. Users must specify the locations of both the training and evaluation files. Additionally, users should also specify a label map, which define the mapping between a class id and class name. The label map should be identical between @@ -126,24 +126,6 @@ data_augmentation_options { } ``` -### Model Parameter Initialization - -While optional, it is highly recommended that users utilize other object -detection checkpoints. Training an object detector from scratch can take days. -To speed up the training process, it is recommended that users re-use the -feature extractor parameters from a pre-existing image classification or -object detection checkpoint. `train_config` provides two fields to specify -pre-existing checkpoints: `fine_tune_checkpoint` and -`from_detection_checkpoint`. `fine_tune_checkpoint` should provide a path to -the pre-existing checkpoint -(ie:"/usr/home/username/checkpoint/model.ckpt-#####"). -`from_detection_checkpoint` is a boolean value. If false, it assumes the -checkpoint was from an object classification checkpoint. 
Note that starting -from a detection checkpoint will usually result in a faster training job than -a classification checkpoint. - -The list of provided checkpoints can be found [here](detection_model_zoo.md). - ### Input Preprocessing The `data_augmentation_options` in `train_config` can be used to specify diff --git a/research/object_detection/g3doc/context_rcnn.md b/research/object_detection/g3doc/context_rcnn.md index 8d132b15b28..14a42d89afe 100644 --- a/research/object_detection/g3doc/context_rcnn.md +++ b/research/object_detection/g3doc/context_rcnn.md @@ -1,5 +1,7 @@ # Context R-CNN +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) + Context R-CNN is an object detection model that uses contextual features to improve object detection. See https://arxiv.org/abs/1912.03538 for more details. diff --git a/research/object_detection/g3doc/defining_your_own_model.md b/research/object_detection/g3doc/defining_your_own_model.md index 865f6af169b..dabc0649f6e 100644 --- a/research/object_detection/g3doc/defining_your_own_model.md +++ b/research/object_detection/g3doc/defining_your_own_model.md @@ -2,14 +2,14 @@ In this section, we discuss some of the abstractions that we use for defining detection models. If you would like to define a new model -architecture for detection and use it in the Tensorflow Detection API, +architecture for detection and use it in the TensorFlow Detection API, then this section should also serve as a high level guide to the files that you will need to edit to get your new model working. ## DetectionModels (`object_detection/core/model.py`) In order to be trained, evaluated, and exported for serving using our -provided binaries, all models under the Tensorflow Object Detection API must +provided binaries, all models under the TensorFlow Object Detection API must implement the `DetectionModel` interface (see the full definition in `object_detection/core/model.py`). In particular, each of these models are responsible for implementing 5 functions: @@ -20,7 +20,7 @@ each of these models are responsible for implementing 5 functions: postprocess functions. * `postprocess`: Convert predicted output tensors to final detections. * `loss`: Compute scalar loss tensors with respect to provided groundtruth. -* `restore`: Load a checkpoint into the Tensorflow graph. +* `restore`: Load a checkpoint into the TensorFlow graph. Given a `DetectionModel` at training time, we pass each image batch through the following sequence of functions to compute a loss which can be optimized via @@ -87,7 +87,7 @@ functions: * `_extract_box_classifier_features`: Extract second stage Box Classifier features. * `restore_from_classification_checkpoint_fn`: Load a checkpoint into the - Tensorflow graph. + TensorFlow graph. See the `object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py` definition as one example. 
Some remarks: diff --git a/research/object_detection/g3doc/evaluation_protocols.md b/research/object_detection/g3doc/evaluation_protocols.md index e431fa7233e..d5a070f6bc0 100644 --- a/research/object_detection/g3doc/evaluation_protocols.md +++ b/research/object_detection/g3doc/evaluation_protocols.md @@ -1,6 +1,6 @@ # Supported object detection evaluation protocols -The Tensorflow Object Detection API currently supports three evaluation protocols, +The TensorFlow Object Detection API currently supports three evaluation protocols, that can be configured in `EvalConfig` by setting `metrics_set` to the corresponding value. diff --git a/research/object_detection/g3doc/exporting_models.md b/research/object_detection/g3doc/exporting_models.md index c64408302e9..701acf3c430 100644 --- a/research/object_detection/g3doc/exporting_models.md +++ b/research/object_detection/g3doc/exporting_models.md @@ -1,6 +1,8 @@ # Exporting a trained model for inference -After your model has been trained, you should export it to a Tensorflow +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) + +After your model has been trained, you should export it to a TensorFlow graph proto. A checkpoint will typically consist of three files: * model.ckpt-${CHECKPOINT_NUMBER}.data-00000-of-00001 diff --git a/research/object_detection/g3doc/faq.md b/research/object_detection/g3doc/faq.md index c0ca503fc6e..f2a6e30ccf7 100644 --- a/research/object_detection/g3doc/faq.md +++ b/research/object_detection/g3doc/faq.md @@ -22,6 +22,6 @@ A: Similar to BackupHandler, syncing your fork to HEAD should make it work. ## Q: Why can't I get the inference time as reported in model zoo? A: The inference time reported in model zoo is mean time of testing hundreds of images with an internal machine. As mentioned in -[Tensorflow detection model zoo](detection_model_zoo.md), this speed depends +[TensorFlow detection model zoo](tf1_detection_zoo.md), this speed depends highly on one's specific hardware configuration and should be treated more as relative timing. diff --git a/research/object_detection/g3doc/installation.md b/research/object_detection/g3doc/installation.md deleted file mode 100644 index 05c891802af..00000000000 --- a/research/object_detection/g3doc/installation.md +++ /dev/null @@ -1,184 +0,0 @@ -# Installation - -## Dependencies - -Tensorflow Object Detection API depends on the following libraries: - -* Protobuf 3.0.0 -* Python-tk -* Pillow 1.0 -* lxml -* tf-slim (https://github.com/google-research/tf-slim) -* slim (which is included in the "tensorflow/models/research/" checkout) -* Jupyter notebook -* Matplotlib -* Tensorflow (1.15.0) -* Cython -* contextlib2 -* cocoapi - -For detailed steps to install Tensorflow, follow the [Tensorflow installation -instructions](https://www.tensorflow.org/install/). 
A typical user can install -Tensorflow using one of the following commands: - -``` bash -# For CPU -pip install tensorflow -# For GPU -pip install tensorflow-gpu -``` - -The remaining libraries can be installed on Ubuntu 16.04 using via apt-get: - -```bash -sudo apt-get install protobuf-compiler python-pil python-lxml python-tk -pip install --user Cython -pip install --user contextlib2 -pip install --user jupyter -pip install --user matplotlib -pip install --user tf_slim -``` - -Alternatively, users can install dependencies using pip: - -```bash -pip install --user Cython -pip install --user contextlib2 -pip install --user pillow -pip install --user lxml -pip install --user jupyter -pip install --user matplotlib -pip install --user tf_slim -``` - - -**Note**: sometimes "sudo apt-get install protobuf-compiler" will install -Protobuf 3+ versions for you and some users have issues when using 3.5. -If that is your case, try the [manual](#Manual-protobuf-compiler-installation-and-usage) installation. - -## Download the tensorflow/models repository - -```bash -git clone https://github.com/tensorflow/models.git -``` - -To use this library, you need to download this repository, whenever it says -`` it will be referring to the folder that you downloaded -this repository into. - -## COCO API installation - -Download the -[cocoapi](https://github.com/cocodataset/cocoapi) and -copy the pycocotools subfolder to the tensorflow/models/research directory if -you are interested in using COCO evaluation metrics. The default metrics are -based on those used in Pascal VOC evaluation. To use the COCO object detection -metrics add `metrics_set: "coco_detection_metrics"` to the `eval_config` message -in the config file. To use the COCO instance segmentation metrics add -`metrics_set: "coco_mask_metrics"` to the `eval_config` message in the config -file. - -```bash -git clone https://github.com/cocodataset/cocoapi.git -cd cocoapi/PythonAPI -make -cp -r pycocotools /models/research/ -``` - -Alternatively, users can install `pycocotools` using pip: - -```bash -pip install --user pycocotools -``` - -## Protobuf Compilation - -The Tensorflow Object Detection API uses Protobufs to configure model and -training parameters. Before the framework can be used, the Protobuf libraries -must be compiled. This should be done by running the following command from -the [tensorflow/models/research/ -](https://github.com/tensorflow/models/tree/master/research/) -directory: - - -``` bash -# From tensorflow/models/research/ -protoc object_detection/protos/*.proto --python_out=. -``` - -**Note**: If you're getting errors while compiling, you might be using an incompatible protobuf compiler. If that's the case, use the following manual installation - -## Manual protobuf-compiler installation and usage - -**If you are on linux:** - -Download and install the 3.0 release of protoc, then unzip the file. - -```bash -# From tensorflow/models/research/ -wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip -unzip protobuf.zip -``` - -Run the compilation process again, but use the downloaded version of protoc - -```bash -# From tensorflow/models/research/ -./bin/protoc object_detection/protos/*.proto --python_out=. 
-``` - -**If you are on MacOS:** - -If you have homebrew, download and install the protobuf with -```brew install protobuf``` - -Alternately, run: -```PROTOC_ZIP=protoc-3.3.0-osx-x86_64.zip -curl -OL https://github.com/google/protobuf/releases/download/v3.3.0/$PROTOC_ZIP -sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc -rm -f $PROTOC_ZIP -``` - -Run the compilation process again: - -``` bash -# From tensorflow/models/research/ -protoc object_detection/protos/*.proto --python_out=. -``` - -## Add Libraries to PYTHONPATH - -When running locally, the tensorflow/models/research/ and slim directories -should be appended to PYTHONPATH. This can be done by running the following from -tensorflow/models/research/: - - -``` bash -# From tensorflow/models/research/ -export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim -``` - -Note: This command needs to run from every new terminal you start. If you wish -to avoid running this manually, you can add it as a new line to the end of your -~/.bashrc file, replacing \`pwd\` with the absolute path of -tensorflow/models/research on your system. After updating ~/.bashrc file you -can run the following command: - -Note: Some of the functions defined in tensorflow/models/research/slim has been -moved to [tf-slim](https://github.com/google-research/tf-slim), so installing -tf_slim is required now. - -``` bash -source ~/.bashrc -``` - -# Testing the Installation - -You can test that you have correctly installed the Tensorflow Object Detection\ -API by running the following command: - - -```bash -# If using Tensorflow 1.X: -python object_detection/builders/model_builder_tf1_test.py -``` diff --git a/research/object_detection/g3doc/instance_segmentation.md b/research/object_detection/g3doc/instance_segmentation.md index 8ebf7d8c3d7..f9b4856c90f 100644 --- a/research/object_detection/g3doc/instance_segmentation.md +++ b/research/object_detection/g3doc/instance_segmentation.md @@ -67,7 +67,7 @@ your own models: 1. mask_rcnn_resnet50_atrous_coco 1. mask_rcnn_inception_v2_coco -For more details see the [detection model zoo](detection_model_zoo.md). +For more details see the [detection model zoo](tf1_detection_zoo.md). ### Updating a Faster R-CNN config file diff --git a/research/object_detection/g3doc/oid_inference_and_evaluation.md b/research/object_detection/g3doc/oid_inference_and_evaluation.md index 4babf10a276..d54ad23940b 100644 --- a/research/object_detection/g3doc/oid_inference_and_evaluation.md +++ b/research/object_detection/g3doc/oid_inference_and_evaluation.md @@ -113,10 +113,10 @@ computations on subsets of the validation and test sets. ## Inferring detections Inference requires a trained object detection model. In this tutorial we will -use a model from the [detections model zoo](detection_model_zoo.md), which can +use a model from the [detections model zoo](tf1_detection_zoo.md), which can be downloaded and unpacked by running the commands below. More information about the model, such as its architecture and how it was trained, is available in the -[model zoo page](detection_model_zoo.md). +[model zoo page](tf1_detection_zoo.md). ```bash # From tensorflow/models/research/oid diff --git a/research/object_detection/g3doc/preparing_inputs.md b/research/object_detection/g3doc/preparing_inputs.md index 8e690e8c345..7e8df08502b 100644 --- a/research/object_detection/g3doc/preparing_inputs.md +++ b/research/object_detection/g3doc/preparing_inputs.md @@ -1,6 +1,6 @@ # Preparing Inputs -Tensorflow Object Detection API reads data using the TFRecord file format. 
Two +TensorFlow Object Detection API reads data using the TFRecord file format. Two sample scripts (`create_pascal_tf_record.py` and `create_pet_tf_record.py`) are provided to convert from the PASCAL VOC dataset and Oxford-IIIT Pet dataset to TFRecords. diff --git a/research/object_detection/g3doc/release_notes.md b/research/object_detection/g3doc/release_notes.md new file mode 100644 index 00000000000..f69727d5d85 --- /dev/null +++ b/research/object_detection/g3doc/release_notes.md @@ -0,0 +1,339 @@ +# Release Notes + +### July 10th, 2020 + +We are happy to announce that the TF OD API officially supports TF2! Our release +includes: + +* New binaries for train/eval/export that are designed to run in eager mode. +* A suite of TF2 compatible (Keras-based) models; this includes migrations of + our most popular TF1.x models (e.g., SSD with MobileNet, RetinaNet, Faster + R-CNN, Mask R-CNN), as well as a few new architectures for which we will + only maintain TF2 implementations: + + 1. CenterNet - a simple and effective anchor-free architecture based on the + recent [Objects as Points](https://arxiv.org/abs/1904.07850) paper by + Zhou et al. + 2. [EfficientDet](https://arxiv.org/abs/1911.09070) - a recent family of + SOTA models discovered with the help of Neural Architecture Search. + +* COCO pre-trained weights for all of the models provided as TF2 style + object-based checkpoints. + +* Access to + [Distribution Strategies](https://www.tensorflow.org/guide/distributed_training) + for distributed training --- our models are designed to be trainable using + sync multi-GPU and TPU platforms. + +* Colabs demo’ing eager mode training and inference. + +See our release blogpost +[here](https://blog.tensorflow.org/2020/07/tensorflow-2-meets-object-detection-api.html). +If you are an existing user of the TF OD API using TF 1.x, don’t worry, we’ve +got you covered. + +**Thanks to contributors**: Akhil Chinnakotla, Allen Lavoie, Anirudh Vegesana, +Anjali Sridhar, Austin Myers, Dan Kondratyuk, David Ross, Derek Chow, Jaeyoun +Kim, Jing Li, Jonathan Huang, Jordi Pont-Tuset, Karmel Allison, Kathy Ruan, +Kaushik Shivakumar, Lu He, Mingxing Tan, Pengchong Jin, Ronny Votel, Sara Beery, +Sergi Caelles Prat, Shan Yang, Sudheendra Vijayanarasimhan, Tina Tian, Tomer +Kaftan, Vighnesh Birodkar, Vishnu Banna, Vivek Rathod, Yanhui Liang, Yiming Shi, +Yixin Shi, Yu-hui Chen, Zhichao Lu. + +### June 17th, 2020 + +We have released [Context R-CNN](https://arxiv.org/abs/1912.03538), a model that +uses attention to incorporate contextual information from images (e.g. from +temporally nearby frames taken by a static camera) in order to improve accuracy. +Importantly, these contextual images need not be labeled. + +* When applied to a challenging wildlife detection dataset + ([Snapshot Serengeti](http://lila.science/datasets/snapshot-serengeti)), + Context R-CNN with context from up to a month of images outperforms a + single-frame baseline by 17.9% mAP, and outperforms S3D (a 3d convolution + based baseline) by 11.2% mAP. +* Context R-CNN leverages temporal context from the unlabeled frames of a + novel camera deployment to improve performance at that camera, boosting + model generalizability. + +Read about Context R-CNN on the Google AI blog +[here](https://ai.googleblog.com/2020/06/leveraging-temporal-context-for-object.html).
+ +We have provided code for generating data with associated context +[here](context_rcnn.md), and a sample config for a Context R-CNN model +[here](../samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config). + +Snapshot Serengeti-trained Faster R-CNN and Context R-CNN models can be found in +the +[model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#snapshot-serengeti-camera-trap-trained-models). + +A colab demonstrating Context R-CNN is provided +[here](../colab_tutorials/context_rcnn_tutorial.ipynb). + +Thanks to contributors: Sara Beery, Jonathan Huang, Guanhang Wu, Vivek +Rathod, Ronny Votel, Zhichao Lu, David Ross, Pietro Perona, Tanya Birch, and the +Wildlife Insights AI Team. + +### May 19th, 2020 + +We have released [MobileDets](https://arxiv.org/abs/2004.14525), a set of +high-performance models for mobile CPUs, DSPs and EdgeTPUs. + +* MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile + CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by + 1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs and 3.4 mAP on DSPs while + running equally fast. MobileDets also offer up to 2x speedup over MnasFPN on + EdgeTPUs and DSPs. + +For each of the three hardware platforms we have released model definition, +model checkpoints trained on the COCO14 dataset and converted TFLite models in +fp32 and/or uint8. + +Thanks to contributors: Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin +Akin, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen, +Quoc Le, Zhichao Lu. + +### May 7th, 2020 + +We have released a mobile model with the +[MnasFPN head](https://arxiv.org/abs/1912.01106). + +* MnasFPN with MobileNet-V2 backbone is the most accurate (26.6 mAP at 183ms + on Pixel 1) mobile detection model we have released to date. With + depth-multiplier, MnasFPN with MobileNet-V2 backbone is 1.8 mAP higher than + MobileNet-V3-Large with SSDLite (23.8 mAP vs 22.0 mAP) at similar latency + (120ms) on Pixel 1. + +We have released model definition, model checkpoints trained on the COCO14 +dataset and a converted TFLite model. + +Thanks to contributors: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi +Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc Le, Zhichao Lu, Jonathan Huang, Hao +Xu. + +### Nov 13th, 2019 + +We have released MobileNetEdgeTPU SSDLite model. + +* SSDLite with MobileNetEdgeTPU backbone, which achieves 10% mAP higher than + MobileNetV2 SSDLite (24.3 mAP vs 22 mAP) on a Google Pixel4 at comparable + latency (6.6ms vs 6.8ms). + +Along with the model definition, we are also releasing model checkpoints trained +on the COCO dataset. + +Thanks to contributors: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu, +Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le + +### Oct 15th, 2019 + +We have released two MobileNet V3 SSDLite models (presented in +[Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)). + +* SSDLite with MobileNet-V3-Large backbone, which is 27% faster than Mobilenet + V2 SSDLite (119ms vs 162ms) on a Google Pixel phone CPU at the same mAP. +* SSDLite with MobileNet-V3-Small backbone, which is 37% faster than MnasNet + SSDLite reduced with depth-multiplier (43ms vs 68ms) at the same mAP. + +Along with the model definition, we are also releasing model checkpoints trained +on the COCO dataset. 
+ +Thanks to contributors: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang + +### July 1st, 2019 + +We have released an updated set of utils and an updated +[tutorial](challenge_evaluation.md) for all three tracks of the +[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)! + +The Instance Segmentation metric for +[Open Images V5](https://storage.googleapis.com/openimages/web/index.html) and +[Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html) +is part of this release. Check out +[the metric description](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval) +on the Open Images website. + +Thanks to contributors: Alina Kuznetsova, Rodrigo Benenson + +### Feb 11, 2019 + +We have released detection models trained on the Open Images Dataset V4 in our +detection model zoo, including + +* Faster R-CNN detector with Inception Resnet V2 feature extractor +* SSD detector with MobileNet V2 feature extractor +* SSD detector with ResNet 101 FPN feature extractor (aka RetinaNet-101) + +Thanks to contributors: Alina Kuznetsova, Yinxiao Li + +### Sep 17, 2018 + +We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature +extractors trained on the +[iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes). +The models are trained on the training split of the iNaturalist data for 4M +iterations, they achieve 55% and 58% mean AP@.5 over 2854 classes respectively. +For more details please refer to this [paper](https://arxiv.org/abs/1707.06642). + +Thanks to contributors: Chen Sun + +### July 13, 2018 + +There are many new updates in this release, extending the functionality and +capability of the API: + +* Moving from slim-based training to + [Estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)-based + training. +* Support for [RetinaNet](https://arxiv.org/abs/1708.02002), and a + [MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) + adaptation of RetinaNet. +* A novel SSD-based architecture called the + [Pooling Pyramid Network](https://arxiv.org/abs/1807.03284) (PPN). +* Releasing several [TPU](https://cloud.google.com/tpu/)-compatible models. + These can be found in the `samples/configs/` directory with a comment in the + pipeline configuration files indicating TPU compatibility. +* Support for quantized training. +* Updated documentation for new binaries, Cloud training, and + [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/). + +See also our +[expanded announcement blogpost](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html) +and accompanying tutorial at the +[TensorFlow blog](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193). + +Thanks to contributors: Sara Robinson, Aakanksha Chowdhery, Derek Chow, +Pengchong Jin, Jonathan Huang, Vivek Rathod, Zhichao Lu, Ronny Votel + +### June 25, 2018 + +Additional evaluation tools for the +[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) +are out. Check out our short tutorial on data preparation and running evaluation +[here](challenge_evaluation.md)! 
+ +Thanks to contributors: Alina Kuznetsova + +### June 5, 2018 + +We have released the implementation of evaluation metrics for both tracks of the +[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) +as a part of the Object Detection API - see the +[evaluation protocols](evaluation_protocols.md) for more details. Additionally, +we have released a tool for hierarchical labels expansion for the Open Images +Challenge: check out +[oid_hierarchical_labels_expansion.py](../dataset_tools/oid_hierarchical_labels_expansion.py). + +Thanks to contributors: Alina Kuznetsova, Vittorio Ferrari, Jasper +Uijlings + +### April 30, 2018 + +We have released a Faster R-CNN detector with ResNet-101 feature extractor +trained on [AVA](https://research.google.com/ava/) v2.1. Compared with other +commonly used object detectors, it changes the action classification loss +function to per-class Sigmoid loss to handle boxes with multiple labels. The +model is trained on the training split of AVA v2.1 for 1.5M iterations, it +achieves mean AP of 11.25% over 60 classes on the validation split of AVA v2.1. +For more details please refer to this [paper](https://arxiv.org/abs/1705.08421). + +Thanks to contributors: Chen Sun, David Ross + +### April 2, 2018 + +Supercharge your mobile phones with the next generation mobile object detector! +We are adding support for MobileNet V2 with SSDLite presented in +[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). +This model is 35% faster than Mobilenet V1 SSD on a Google Pixel phone CPU +(200ms vs. 270ms) at the same accuracy. Along with the model definition, we are +also releasing a model checkpoint trained on the COCO dataset. + +Thanks to contributors: Menglong Zhu, Mark Sandler, Zhichao Lu, Vivek +Rathod, Jonathan Huang + +### February 9, 2018 + +We now support instance segmentation!! In this API update we support a number of +instance segmentation models similar to those discussed in the +[Mask R-CNN paper](https://arxiv.org/abs/1703.06870). For further details refer +to [our slides](http://presentations.cocodataset.org/Places17-GMRI.pdf) from the +2017 Coco + Places Workshop. Refer to the section on +[Running an Instance Segmentation Model](instance_segmentation.md) for +instructions on how to configure a model that predicts masks in addition to +object bounding boxes. + +Thanks to contributors: Alireza Fathi, Zhichao Lu, Vivek Rathod, Ronny +Votel, Jonathan Huang + +### November 17, 2017 + +As a part of the Open Images V3 release we have released: + +* An implementation of the Open Images evaluation metric and the + [protocol](evaluation_protocols.md#open-images). +* Additional tools to separate inference of detection and evaluation (see + [this tutorial](oid_inference_and_evaluation.md)). +* A new detection model trained on the Open Images V2 data release (see + [Open Images model](tf1_detection_zoo.md#open-images-models)). + +See more information on the +[Open Images website](https://github.com/openimages/dataset)! + +Thanks to contributors: Stefan Popov, Alina Kuznetsova + +### November 6, 2017 + +We have re-released faster versions of our (pre-trained) models in the +model zoo. In addition to what was available +before, we are also adding Faster R-CNN models trained on COCO with Inception V2 +and Resnet-50 feature extractors, as well as a Faster R-CNN with Resnet-101 +model trained on the KITTI dataset. 
+ +Thanks to contributors: Jonathan Huang, Vivek Rathod, Derek Chow, Tal +Remez, Chen Sun. + +### October 31, 2017 + +We have released a new state-of-the-art model for object detection using the +Faster-RCNN with the +[NASNet-A image featurization](https://arxiv.org/abs/1707.07012). This model +achieves mAP of 43.1% on the test-dev validation dataset for COCO, improving on +the best available model in the zoo by 6% in terms of absolute mAP. + +Thanks to contributors: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, +Quoc Le + +### August 11, 2017 + +We have released an update to the +[Android Detect demo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android) +which will now run models trained using the TensorFlow Object Detection API on +an Android device. By default, it currently runs a frozen SSD w/Mobilenet +detector trained on COCO, but we encourage you to try out other detection +models! + +Thanks to contributors: Jonathan Huang, Andrew Harp + +### June 15, 2017 + +In addition to our base TensorFlow detection model definitions, this release +includes: + +* A selection of trainable detection models, including: + * Single Shot Multibox Detector (SSD) with MobileNet, + * SSD with Inception V2, + * Region-Based Fully Convolutional Networks (R-FCN) with Resnet 101, + * Faster RCNN with Resnet 101, + * Faster RCNN with Inception Resnet v2 +* Frozen weights (trained on the COCO dataset) for each of the above models to + be used for out-of-the-box inference purposes. +* A [Jupyter notebook](../colab_tutorials/object_detection_tutorial.ipynb) for + performing out-of-the-box inference with one of our released models +* Convenient training and evaluation + [instructions](tf1_training_and_evaluation.md) for local runs and Google + Cloud. + +Thanks to contributors: Jonathan Huang, Vivek Rathod, Derek Chow, Chen +Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, +Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav +Kovalevskyi, Kevin Murphy diff --git a/research/object_detection/g3doc/running_locally.md b/research/object_detection/g3doc/running_locally.md deleted file mode 100644 index 5e72ea43378..00000000000 --- a/research/object_detection/g3doc/running_locally.md +++ /dev/null @@ -1,66 +0,0 @@ -# Running Locally - -This page walks through the steps required to train an object detection model -on a local machine. It assumes the reader has completed the -following prerequisites: - -1. The Tensorflow Object Detection API has been installed as documented in the -[installation instructions](installation.md). This includes installing library -dependencies, compiling the configuration protobufs and setting up the Python -environment. -2. A valid data set has been created. See [this page](preparing_inputs.md) for -instructions on how to generate a dataset for the PASCAL VOC challenge or the -Oxford-IIIT Pet dataset. -3. A Object Detection pipeline configuration has been written. See -[this page](configuring_jobs.md) for details on how to write a pipeline configuration. 
- -## Recommended Directory Structure for Training and Evaluation - -``` -+data - -label_map file - -train TFRecord file - -eval TFRecord file -+models - + model - -pipeline config file - +train - +eval -``` - -## Running the Training Job - -A local training job can be run with the following command: - -```bash -# From the tensorflow/models/research/ directory -PIPELINE_CONFIG_PATH={path to pipeline config file} -MODEL_DIR={path to model directory} -NUM_TRAIN_STEPS=50000 -SAMPLE_1_OF_N_EVAL_EXAMPLES=1 -python object_detection/model_main.py \ - --pipeline_config_path=${PIPELINE_CONFIG_PATH} \ - --model_dir=${MODEL_DIR} \ - --num_train_steps=${NUM_TRAIN_STEPS} \ - --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \ - --alsologtostderr -``` - -where `${PIPELINE_CONFIG_PATH}` points to the pipeline config and -`${MODEL_DIR}` points to the directory in which training checkpoints -and events will be written to. Note that this binary will interleave both -training and evaluation. - -## Running Tensorboard - -Progress for training and eval jobs can be inspected using Tensorboard. If -using the recommended directory structure, Tensorboard can be run using the -following command: - -```bash -tensorboard --logdir=${MODEL_DIR} -``` - -where `${MODEL_DIR}` points to the directory that contains the -train and eval directories. Please note it may take Tensorboard a couple minutes -to populate with data. diff --git a/research/object_detection/g3doc/running_notebook.md b/research/object_detection/g3doc/running_notebook.md index c2b8ad18762..b92aec33aa1 100644 --- a/research/object_detection/g3doc/running_notebook.md +++ b/research/object_detection/g3doc/running_notebook.md @@ -1,5 +1,8 @@ # Quick Start: Jupyter notebook for off-the-shelf inference +[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0) +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) + If you'd like to hit the ground running and run detection on a few example images right out of the box, we recommend trying out the Jupyter notebook demo. To run the Jupyter notebook, run the following command from diff --git a/research/object_detection/g3doc/running_on_cloud.md b/research/object_detection/g3doc/running_on_cloud.md deleted file mode 100644 index 5ee5d87a223..00000000000 --- a/research/object_detection/g3doc/running_on_cloud.md +++ /dev/null @@ -1,170 +0,0 @@ -# Running on Google Cloud ML Engine - -The Tensorflow Object Detection API supports distributed training on Google -Cloud ML Engine. This section documents instructions on how to train and -evaluate your model using Cloud ML. The reader should complete the following -prerequistes: - -1. The reader has created and configured a project on Google Cloud Platform. -See [the Cloud ML quick start guide](https://cloud.google.com/ml-engine/docs/quickstarts/command-line). -2. The reader has installed the Tensorflow Object Detection API as documented -in the [installation instructions](installation.md). -3. The reader has a valid data set and stored it in a Google Cloud Storage -bucket. See [this page](preparing_inputs.md) for instructions on how to generate -a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet dataset. -4. The reader has configured a valid Object Detection pipeline, and stored it -in a Google Cloud Storage bucket. 
See [this page](configuring_jobs.md) for -details on how to write a pipeline configuration. - -Additionally, it is recommended users test their job by running training and -evaluation jobs for a few iterations -[locally on their own machines](running_locally.md). - -## Packaging - -In order to run the Tensorflow Object Detection API on Cloud ML, it must be -packaged (along with it's TF-Slim dependency and the -[pycocotools](https://github.com/cocodataset/cocoapi/tree/master/PythonAPI/pycocotools) -library). The required packages can be created with the following command - -``` bash -# From tensorflow/models/research/ -bash object_detection/dataset_tools/create_pycocotools_package.sh /tmp/pycocotools -python setup.py sdist -(cd slim && python setup.py sdist) -``` - -This will create python packages dist/object_detection-0.1.tar.gz, -slim/dist/slim-0.1.tar.gz, and /tmp/pycocotools/pycocotools-2.0.tar.gz. - -## Running a Multiworker (GPU) Training Job on CMLE - -Google Cloud ML requires a YAML configuration file for a multiworker training -job using GPUs. A sample YAML file is given below: - -``` -trainingInput: - runtimeVersion: "1.12" - scaleTier: CUSTOM - masterType: standard_gpu - workerCount: 9 - workerType: standard_gpu - parameterServerCount: 3 - parameterServerType: standard - - -``` - -Please keep the following guidelines in mind when writing the YAML -configuration: - -* A job with n workers will have n + 1 training machines (n workers + 1 master). -* The number of parameters servers used should be an odd number to prevent - a parameter server from storing only weight variables or only bias variables - (due to round robin parameter scheduling). -* The learning rate in the training config should be decreased when using a - larger number of workers. Some experimentation is required to find the - optimal learning rate. - -The YAML file should be saved on the local machine (not on GCP). Once it has -been written, a user can start a training job on Cloud ML Engine using the -following command: - -```bash -# From tensorflow/models/research/ -gcloud ml-engine jobs submit training object_detection_`date +%m_%d_%Y_%H_%M_%S` \ - --runtime-version 1.12 \ - --job-dir=gs://${MODEL_DIR} \ - --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \ - --module-name object_detection.model_main \ - --region us-central1 \ - --config ${PATH_TO_LOCAL_YAML_FILE} \ - -- \ - --model_dir=gs://${MODEL_DIR} \ - --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} -``` - -Where `${PATH_TO_LOCAL_YAML_FILE}` is the local path to the YAML configuration, -`gs://${MODEL_DIR}` specifies the directory on Google Cloud Storage where the -training checkpoints and events will be written to and -`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on -Google Cloud Storage. - -Users can monitor the progress of their training job on the [ML Engine -Dashboard](https://console.cloud.google.com/mlengine/jobs). - -Note: This sample is supported for use with 1.12 runtime version. 
- -## Running a TPU Training Job on CMLE - -Launching a training job with a TPU compatible pipeline config requires using a -similar command: - -```bash -gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%m_%d_%Y_%H_%M_%S` \ ---job-dir=gs://${MODEL_DIR} \ ---packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \ ---module-name object_detection.model_tpu_main \ ---runtime-version 1.12 \ ---scale-tier BASIC_TPU \ ---region us-central1 \ --- \ ---tpu_zone us-central1 \ ---model_dir=gs://${MODEL_DIR} \ ---pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} -``` - -In contrast with the GPU training command, there is no need to specify a YAML -file and we point to the *object_detection.model_tpu_main* binary instead of -*object_detection.model_main*. We must also now set `scale-tier` to be -`BASIC_TPU` and provide a `tpu_zone`. Finally as before `pipeline_config_path` -points to a points to the pipeline configuration stored on Google Cloud Storage -(but is now must be a TPU compatible model). - -## Running an Evaluation Job on CMLE - -Note: You only need to do this when using TPU for training as it does not -interleave evaluation during training as in the case of Multiworker GPU -training. - -Evaluation jobs run on a single machine, so it is not necessary to write a YAML -configuration for evaluation. Run the following command to start the evaluation -job: - -```bash -gcloud ml-engine jobs submit training object_detection_eval_`date +%m_%d_%Y_%H_%M_%S` \ - --runtime-version 1.12 \ - --job-dir=gs://${MODEL_DIR} \ - --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \ - --module-name object_detection.model_main \ - --region us-central1 \ - --scale-tier BASIC_GPU \ - -- \ - --model_dir=gs://${MODEL_DIR} \ - --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} \ - --checkpoint_dir=gs://${MODEL_DIR} -``` - -Where `gs://${MODEL_DIR}` points to the directory on Google Cloud Storage where -training checkpoints are saved (same as the training job), as well as -to where evaluation events will be saved on Google Cloud Storage and -`gs://${PIPELINE_CONFIG_PATH}` points to where the pipeline configuration is -stored on Google Cloud Storage. - -Typically one starts an evaluation job concurrently with the training job. -Note that we do not support running evaluation on TPU, so the above command -line for launching evaluation jobs is the same whether you are training -on GPU or TPU. - -## Running Tensorboard - -You can run Tensorboard locally on your own machine to view progress of your -training and eval jobs on Google Cloud ML. Run the following command to start -Tensorboard: - -``` bash -tensorboard --logdir=gs://${YOUR_CLOUD_BUCKET} -``` - -Note it may Tensorboard a few minutes to populate with results. 
- diff --git a/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md b/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md index db166bcd394..379652e34cb 100644 --- a/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md +++ b/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md @@ -1,5 +1,7 @@ # Running on mobile with TensorFlow Lite +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) + In this section, we will show you how to use [TensorFlow Lite](https://www.tensorflow.org/mobile/tflite/) to get a smaller model and allow you take advantage of ops that have been optimized for mobile devices. diff --git a/research/object_detection/g3doc/running_pets.md b/research/object_detection/g3doc/running_pets.md index bb62db5612a..7d6b7bfa7c0 100644 --- a/research/object_detection/g3doc/running_pets.md +++ b/research/object_detection/g3doc/running_pets.md @@ -1,6 +1,8 @@ # Quick Start: Distributed Training on the Oxford-IIIT Pets Dataset on Google Cloud -This page is a walkthrough for training an object detector using the Tensorflow +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) + +This page is a walkthrough for training an object detector using the TensorFlow Object Detection API. In this tutorial, we'll be training on the Oxford-IIIT Pets dataset to build a system to detect various breeds of cats and dogs. The output of the detector will look like the following: @@ -40,10 +42,10 @@ export YOUR_GCS_BUCKET=${YOUR_GCS_BUCKET} It is also possible to run locally by following [the running locally instructions](running_locally.md). -## Installing Tensorflow and the Tensorflow Object Detection API +## Installing TensorFlow and the TensorFlow Object Detection API Please run through the [installation instructions](installation.md) to install -Tensorflow and all it dependencies. Ensure the Protobuf libraries are +TensorFlow and all it dependencies. Ensure the Protobuf libraries are compiled and the library directories are added to `PYTHONPATH`. ## Getting the Oxford-IIIT Pets Dataset and Uploading it to Google Cloud Storage @@ -77,7 +79,7 @@ should appear as follows: ... other files and directories ``` -The Tensorflow Object Detection API expects data to be in the TFRecord format, +The TensorFlow Object Detection API expects data to be in the TFRecord format, so we'll now run the `create_pet_tf_record` script to convert from the raw Oxford-IIIT Pet dataset into TFRecords. Run the following commands from the `tensorflow/models/research/` directory: @@ -134,7 +136,7 @@ in the following step. ## Configuring the Object Detection Pipeline -In the Tensorflow Object Detection API, the model parameters, training +In the TensorFlow Object Detection API, the model parameters, training parameters and eval parameters are all defined by a config file. More details can be found [here](configuring_jobs.md). For this tutorial, we will use some predefined templates provided with the source code. In the @@ -188,10 +190,10 @@ browser](https://console.cloud.google.com/storage/browser). Before we can start a job on Google Cloud ML Engine, we must: -1. Package the Tensorflow Object Detection code. +1. Package the TensorFlow Object Detection code. 2. Write a cluster configuration for our Google Cloud ML job. 
-To package the Tensorflow Object Detection code, run the following commands from +To package the TensorFlow Object Detection code, run the following commands from the `tensorflow/models/research/` directory: ```bash @@ -248,7 +250,7 @@ web browser. You should see something similar to the following: ![](img/tensorboard.png) -Make sure your Tensorboard version is the same minor version as your Tensorflow (1.x) +Make sure your Tensorboard version is the same minor version as your TensorFlow (1.x) You will also want to click on the images tab to see example detections made by the model while it trains. After about an hour and a half of training, you can @@ -265,9 +267,9 @@ the training jobs are configured to go for much longer than is necessary for convergence. To save money, we recommend killing your jobs once you've seen that they've converged. -## Exporting the Tensorflow Graph +## Exporting the TensorFlow Graph -After your model has been trained, you should export it to a Tensorflow graph +After your model has been trained, you should export it to a TensorFlow graph proto. First, you need to identify a candidate checkpoint to export. You can search your bucket using the [Google Cloud Storage Browser](https://console.cloud.google.com/storage/browser). The file should be diff --git a/research/object_detection/g3doc/tf1.md b/research/object_detection/g3doc/tf1.md new file mode 100644 index 00000000000..f973ef38c3a --- /dev/null +++ b/research/object_detection/g3doc/tf1.md @@ -0,0 +1,92 @@ +# Object Detection API with TensorFlow 1 + +## Requirements + +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) +[![Protobuf Compiler >= 3.0](https://img.shields.io/badge/ProtoBuf%20Compiler-%3E3.0-brightgreen)](https://grpc.io/docs/protoc-installation/#install-using-a-package-manager) + +## Installation + +You can install the TensorFlow Object Detection API either with Python Package +Installer (pip) or Docker. For local runs we recommend using Docker and for +Google Cloud runs we recommend using pip. + +Clone the TensorFlow Models repository and proceed to one of the installation +options. + +```bash +git clone https://github.com/tensorflow/models.git +``` + +### Docker Installation + +```bash +# From the root of the git repository +docker build -f research/object_detection/dockerfiles/tf1/Dockerfile -t od . +docker run -it od +``` + +### Python Package Installation + +```bash +cd models/research +# Compile protos. +protoc object_detection/protos/*.proto --python_out=. +# Install TensorFlow Object Detection API. +cp object_detection/packages/tf1/setup.py . +python -m pip install . +``` + +```bash +# Test the installation. +python object_detection/builders/model_builder_tf1_test.py +``` + +## Quick Start + +### Colabs + +* [Jupyter notebook for off-the-shelf inference](../colab_tutorials/object_detection_tutorial.ipynb) +* [Training a pet detector](running_pets.md) + +### Training and Evaluation + +To train and evaluate your models either locally or on Google Cloud see +[instructions](tf1_training_and_evaluation.md). + +## Model Zoo + +We provide a large collection of models that are trained on several datasets in +the [Model Zoo](tf1_detection_zoo.md). + +## Guides + +* + Configuring an object detection pipeline
+* [Preparing inputs](preparing_inputs.md)
+* [Defining your own model architecture](defining_your_own_model.md)
+* [Bringing in your own dataset](using_your_own_dataset.md)
+* [Supported object detection evaluation protocols](evaluation_protocols.md)
+* [TPU compatible detection pipelines](tpu_compatibility.md)
+
+## Extras:
+
+* [Exporting a trained model for inference](exporting_models.md)
+* [Exporting a trained model for TPU inference](tpu_exporters.md)
+* [Inference and evaluation on the Open Images dataset](oid_inference_and_evaluation.md)
+* [Run an instance segmentation model](instance_segmentation.md)
+* [Run the evaluation for the Open Images Challenge 2018/2019](challenge_evaluation.md)
+* [Running object detection on mobile devices with TensorFlow Lite](running_on_mobile_tensorflowlite.md)
+* [Context R-CNN documentation for data preparation, training, and export](context_rcnn.md)
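+
+As a quick reference for the "Exporting a trained model for inference" entry
+above, the following is a minimal sketch of exporting a trained TF1 model as a
+frozen inference graph. The config path, checkpoint prefix and output directory
+are placeholders; substitute your own values and see the export guide above for
+full details.
+
+```bash
+# Sketch only: replace the config path, checkpoint prefix and output
+# directory with your own before running.
+python object_detection/export_inference_graph.py \
+    --input_type image_tensor \
+    --pipeline_config_path path/to/pipeline.config \
+    --trained_checkpoint_prefix path/to/model.ckpt-50000 \
+    --output_directory path/to/exported_model_directory
+```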
diff --git a/research/object_detection/g3doc/detection_model_zoo.md b/research/object_detection/g3doc/tf1_detection_zoo.md similarity index 97% rename from research/object_detection/g3doc/detection_model_zoo.md rename to research/object_detection/g3doc/tf1_detection_zoo.md index cb515b813ba..15416bb7aec 100644 --- a/research/object_detection/g3doc/detection_model_zoo.md +++ b/research/object_detection/g3doc/tf1_detection_zoo.md @@ -1,4 +1,7 @@ -# Tensorflow detection model zoo +# TensorFlow 1 Detection Model Zoo + +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) We provide a collection of detection models pre-trained on the [COCO dataset](http://cocodataset.org), the @@ -64,9 +67,9 @@ Some remarks on frozen inference graphs: metrics. * Our frozen inference graphs are generated using the [v1.12.0](https://github.com/tensorflow/tensorflow/tree/v1.12.0) release - version of Tensorflow and we do not guarantee that these will work with + version of TensorFlow and we do not guarantee that these will work with other versions; this being said, each frozen inference graph can be - regenerated using your current version of Tensorflow by re-running the + regenerated using your current version of TensorFlow by re-running the [exporter](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md), pointing it at the model directory as well as the corresponding config file in diff --git a/research/object_detection/g3doc/tf1_training_and_evaluation.md b/research/object_detection/g3doc/tf1_training_and_evaluation.md new file mode 100644 index 00000000000..76c601f1897 --- /dev/null +++ b/research/object_detection/g3doc/tf1_training_and_evaluation.md @@ -0,0 +1,237 @@ +# Training and Evaluation with TensorFlow 1 + +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) +[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0) + +This page walks through the steps required to train an object detection model. +It assumes the reader has completed the following prerequisites: + +1. The TensorFlow Object Detection API has been installed as documented in the + [installation instructions](tf1.md#installation). +2. A valid data set has been created. See [this page](preparing_inputs.md) for + instructions on how to generate a dataset for the PASCAL VOC challenge or + the Oxford-IIIT Pet dataset. + +## Recommended Directory Structure for Training and Evaluation + +```bash +. +├── data/ +│   ├── eval-00000-of-00001.tfrecord +│   ├── label_map.txt +│   ├── train-00000-of-00002.tfrecord +│   └── train-00001-of-00002.tfrecord +└── models/ + └── my_model_dir/ + ├── eval/ # Created by evaluation job. + ├── my_model.config + └── train/ # + └── model_ckpt-100-data@1 # Created by training job. + └── model_ckpt-100-index # + └── checkpoint # +``` + +## Writing a model configuration + +Please refer to sample [TF1 configs](../samples/configs) and +[configuring jobs](configuring_jobs.md) to create a model config. + +### Model Parameter Initialization + +While optional, it is highly recommended that users utilize classification or +object detection checkpoints. 
Training an object detector from scratch can take
+days. To speed up the training process, it is recommended that users re-use the
+feature extractor parameters from a pre-existing image classification or object
+detection checkpoint. The `train_config` section in the config provides two
+fields to specify pre-existing checkpoints:
+
+*   `fine_tune_checkpoint`: a path prefix to the pre-existing checkpoint
+    (e.g. "/usr/home/username/checkpoint/model.ckpt-#####").
+
+*   `fine_tune_checkpoint_type`: with value `classification` or `detection`
+    depending on the type of the checkpoint.
+
+A list of detection checkpoints can be found [here](tf1_detection_zoo.md).
+
+## Local
+
+### Training
+
+A local training job can be run with the following command:
+
+```bash
+# From the tensorflow/models/research/ directory
+PIPELINE_CONFIG_PATH={path to pipeline config file}
+MODEL_DIR={path to model directory}
+NUM_TRAIN_STEPS=50000
+SAMPLE_1_OF_N_EVAL_EXAMPLES=1
+python object_detection/model_main.py \
+    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
+    --model_dir=${MODEL_DIR} \
+    --num_train_steps=${NUM_TRAIN_STEPS} \
+    --sample_1_of_n_eval_examples=${SAMPLE_1_OF_N_EVAL_EXAMPLES} \
+    --alsologtostderr
+```
+
+where `${PIPELINE_CONFIG_PATH}` points to the pipeline config and `${MODEL_DIR}`
+points to the directory in which training checkpoints and events will be
+written. Note that this binary will interleave both training and evaluation.
+
+## Google Cloud AI Platform
+
+The TensorFlow Object Detection API supports training on Google Cloud AI
+Platform. This section documents instructions on how to train and evaluate your
+model using Cloud AI Platform. The reader should complete the following
+prerequisites:
+
+1.  The reader has created and configured a project on Google Cloud AI Platform.
+    See the
+    [Using GPUs](https://cloud.google.com/ai-platform/training/docs/using-gpus)
+    and
+    [Using TPUs](https://cloud.google.com/ai-platform/training/docs/using-tpus)
+    guides.
+2.  The reader has a valid data set and stored it in a Google Cloud Storage
+    bucket. See [this page](preparing_inputs.md) for instructions on how to
+    generate a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet
+    dataset.
+
+Additionally, it is recommended that users test their job by running training
+and evaluation jobs for a few iterations [locally on their own machines](#local).
+
+### Training with multiple workers with a single GPU
+
+Google Cloud ML requires a YAML configuration file for a multiworker training
+job using GPUs. A sample YAML file is given below:
+
+```
+trainingInput:
+  runtimeVersion: "1.15"
+  scaleTier: CUSTOM
+  masterType: standard_gpu
+  workerCount: 9
+  workerType: standard_gpu
+  parameterServerCount: 3
+  parameterServerType: standard
+```
+
+Please keep the following guidelines in mind when writing the YAML
+configuration:
+
+*   A job with n workers will have n + 1 training machines (n workers + 1
+    master).
+*   The number of parameter servers used should be an odd number to prevent a
+    parameter server from storing only weight variables or only bias variables
+    (due to round robin parameter scheduling).
+*   The learning rate in the training config should be decreased when using a
+    larger number of workers. Some experimentation is required to find the
+    optimal learning rate.
+
+The YAML file should be saved on the local machine (not on GCP).
Once it has
+been written, a user can start a training job on Cloud ML Engine using the
+following command:
+
+```bash
+# From the tensorflow/models/research/ directory
+cp object_detection/packages/tf1/setup.py .
+gcloud ml-engine jobs submit training object_detection_`date +%m_%d_%Y_%H_%M_%S` \
+    --runtime-version 1.15 \
+    --python-version 3.6 \
+    --job-dir=gs://${MODEL_DIR} \
+    --package-path ./object_detection \
+    --module-name object_detection.model_main \
+    --region us-central1 \
+    --config ${PATH_TO_LOCAL_YAML_FILE} \
+    -- \
+    --model_dir=gs://${MODEL_DIR} \
+    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
+```
+
+Where `${PATH_TO_LOCAL_YAML_FILE}` is the local path to the YAML configuration,
+`gs://${MODEL_DIR}` specifies the directory on Google Cloud Storage where the
+training checkpoints and events will be written to and
+`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
+Google Cloud Storage.
+
+Users can monitor the progress of their training job on the
+[ML Engine Dashboard](https://console.cloud.google.com/ai-platform/jobs).
+
+### Training with TPU
+
+Launching a training job with a TPU compatible pipeline config requires using a
+similar command:
+
+```bash
+# From the tensorflow/models/research/ directory
+cp object_detection/packages/tf1/setup.py .
+gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%m_%d_%Y_%H_%M_%S` \
+    --job-dir=gs://${MODEL_DIR} \
+    --package-path ./object_detection \
+    --module-name object_detection.model_tpu_main \
+    --runtime-version 1.15 \
+    --python-version 3.6 \
+    --scale-tier BASIC_TPU \
+    --region us-central1 \
+    -- \
+    --tpu_zone us-central1 \
+    --model_dir=gs://${MODEL_DIR} \
+    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
+```
+
+In contrast with the GPU training command, there is no need to specify a YAML
+file, and we point to the *object_detection.model_tpu_main* binary instead of
+*object_detection.model_main*. We must also now set `scale-tier` to
+`BASIC_TPU` and provide a `tpu_zone`. Finally, as before, `pipeline_config_path`
+points to the pipeline configuration stored on Google Cloud Storage (but it
+must now be a TPU compatible config).
+
+### Evaluation with GPU
+
+Note: You only need to do this when using TPU for training, as TPU training
+does not interleave evaluation, unlike multiworker GPU training.
+
+Evaluation jobs run on a single machine, so it is not necessary to write a YAML
+configuration for evaluation. Run the following command to start the evaluation
+job:
+
+```bash
+# From the tensorflow/models/research/ directory
+cp object_detection/packages/tf1/setup.py .
+gcloud ml-engine jobs submit training object_detection_eval_`date +%m_%d_%Y_%H_%M_%S` \
+    --runtime-version 1.15 \
+    --python-version 3.6 \
+    --job-dir=gs://${MODEL_DIR} \
+    --package-path ./object_detection \
+    --module-name object_detection.model_main \
+    --region us-central1 \
+    --scale-tier BASIC_GPU \
+    -- \
+    --model_dir=gs://${MODEL_DIR} \
+    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} \
+    --checkpoint_dir=gs://${MODEL_DIR}
+```
+
+Where `gs://${MODEL_DIR}` points to the directory on Google Cloud Storage where
+training checkpoints are saved (same as the training job), as well as to where
+evaluation events will be saved on Google Cloud Storage and
+`gs://${PIPELINE_CONFIG_PATH}` points to where the pipeline configuration is
+stored on Google Cloud Storage.
+
+Typically one starts an evaluation job concurrently with the training job.
Note +that we do not support running evaluation on TPU, so the above command line for +launching evaluation jobs is the same whether you are training on GPU or TPU. + +## Running Tensorboard + +Progress for training and eval jobs can be inspected using Tensorboard. If using +the recommended directory structure, Tensorboard can be run using the following +command: + +```bash +tensorboard --logdir=${MODEL_DIR} +``` + +where `${MODEL_DIR}` points to the directory that contains the train and eval +directories. Please note it may take Tensorboard a couple minutes to populate +with data. diff --git a/research/object_detection/g3doc/tf2.md b/research/object_detection/g3doc/tf2.md new file mode 100644 index 00000000000..3860c7ea2f3 --- /dev/null +++ b/research/object_detection/g3doc/tf2.md @@ -0,0 +1,82 @@ +# Object Detection API with TensorFlow 2 + +## Requirements + +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) +[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0) +[![Protobuf Compiler >= 3.0](https://img.shields.io/badge/ProtoBuf%20Compiler-%3E3.0-brightgreen)](https://grpc.io/docs/protoc-installation/#install-using-a-package-manager) + +## Installation + +You can install the TensorFlow Object Detection API either with Python Package +Installer (pip) or Docker. For local runs we recommend using Docker and for +Google Cloud runs we recommend using pip. + +Clone the TensorFlow Models repository and proceed to one of the installation +options. + +```bash +git clone https://github.com/tensorflow/models.git +``` + +### Docker Installation + +```bash +# From the root of the git repository +docker build -f research/object_detection/dockerfiles/tf2/Dockerfile -t od . +docker run -it od +``` + +### Python Package Installation + +```bash +cd models/research +# Compile protos. +protoc object_detection/protos/*.proto --python_out=. +# Install TensorFlow Object Detection API. +cp object_detection/packages/tf2/setup.py . +python -m pip install . +``` + +```bash +# Test the installation. +python object_detection/builders/model_builder_tf2_test.py +``` + +## Quick Start + +### Colabs + + + +* Training - + [Fine-tune a pre-trained detector in eager mode on custom data](../colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb) + +* Inference - + [Run inference with models from the zoo](../colab_tutorials/inference_tf2_colab.ipynb) + + + +## Training and Evaluation + +To train and evaluate your models either locally or on Google Cloud see +[instructions](tf2_training_and_evaluation.md). + +## Model Zoo + +We provide a large collection of models that are trained on COCO 2017 in the +[Model Zoo](tf2_detection_zoo.md). + +## Guides + +* + Configuring an object detection pipeline
+* [Preparing inputs](preparing_inputs.md)
+* [Defining your own model architecture](defining_your_own_model.md)
+* [Bringing in your own dataset](using_your_own_dataset.md)
+* [Supported object detection evaluation protocols](evaluation_protocols.md)
+* [TPU compatible detection pipelines](tpu_compatibility.md)
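+
+As a complement to the Model Zoo section above, the following is a minimal
+sketch of fetching one of the released COCO 2017 pre-trained models for local
+experimentation. The URL is the EfficientDet D0 entry from the TF2 Detection
+Zoo page; any other model listed there can be substituted.
+
+```bash
+# Download and unpack a model listed in the TF2 Detection Zoo.
+wget http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d0_coco17_tpu-32.tar.gz
+tar -xzf efficientdet_d0_coco17_tpu-32.tar.gz
+# The extracted directory contains a pipeline.config, a checkpoint/ directory
+# and a saved_model/ that can be used for inference or fine-tuning.
+```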
diff --git a/research/object_detection/g3doc/tf2_classification_zoo.md b/research/object_detection/g3doc/tf2_classification_zoo.md new file mode 100644 index 00000000000..23c629ac0e9 --- /dev/null +++ b/research/object_detection/g3doc/tf2_classification_zoo.md @@ -0,0 +1,25 @@ +# TensorFlow 2 Classification Model Zoo + +[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0) +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) + +We provide a collection of classification models pre-trained on the +[Imagenet](http://www.image-net.org). These can be used to initilize detection +model parameters. + +Model name | +---------- | +[EfficientNet B0](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b0.tar.gz) | +[EfficientNet B1](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b1.tar.gz) | +[EfficientNet B2](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b2.tar.gz) | +[EfficientNet B3](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b3.tar.gz) | +[EfficientNet B4](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b4.tar.gz) | +[EfficientNet B5](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b5.tar.gz) | +[EfficientNet B6](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b6.tar.gz) | +[EfficientNet B7](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/efficientnet_b7.tar.gz) | +[Resnet V1 50](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/resnet50_v1.tar.gz) | +[Resnet V1 101](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/resnet101_v1.tar.gz) | +[Resnet V1 152](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/resnet152_v1.tar.gz) | +[Inception Resnet V2](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/inception_resnet_v2.tar.gz) | +[MobileNet V1](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/mobilnet_v1.tar.gz) | +[MobileNet V2](http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/mobilnet_v2.tar.gz) | diff --git a/research/object_detection/g3doc/tf2_detection_zoo.md b/research/object_detection/g3doc/tf2_detection_zoo.md new file mode 100644 index 00000000000..16c38ec58c9 --- /dev/null +++ b/research/object_detection/g3doc/tf2_detection_zoo.md @@ -0,0 +1,65 @@ +# TensorFlow 2 Detection Model Zoo + +[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0) +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) + + + +We provide a collection of detection models pre-trained on the +[COCO 2017 dataset](http://cocodataset.org). These models can be useful for +out-of-the-box inference if you are interested in categories already in those +datasets. You can try it in our inference +[colab](../colab_tutorials/inference_tf2_colab.ipynb) + +They are also useful for initializing your models when training on novel +datasets. 
You can try this out on our few-shot training +[colab](../colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb). + + + +Finally, if you would like to train these models from scratch, you can find the +model configs in this [directory](../configs/tf2) (also in the linked +`tar.gz`s). + +Model name | Speed (ms) | COCO mAP | Outputs +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------: | :----------: | :-----: +[CenterNet HourGlass104 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_hg104_512x512_coco17_tpu-8.tar.gz) | 70 | 41.6 | Boxes +[CenterNet HourGlass104 Keypoints 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_hg104_512x512_kpts_coco17_tpu-32.tar.gz) | 76 | 40.0/61.4 | Boxes/Keypoints +[CenterNet HourGlass104 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_hg104_1024x1024_coco17_tpu-32.tar.gz) | 197 | 43.5 | Boxes +[CenterNet HourGlass104 Keypoints 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_hg104_1024x1024_kpts_coco17_tpu-32.tar.gz) | 211 | 42.8/64.5 | Boxes/Keypoints +[CenterNet Resnet50 V1 FPN 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz) | 27 | 31.2 | Boxes +[CenterNet Resnet50 V1 FPN Keypoints 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_resnet50_v1_fpn_512x512_kpts_coco17_tpu-8.tar.gz) | 30 | 29.3/50.7 | Boxes/Keypoints +[CenterNet Resnet101 V1 FPN 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_resnet101_v1_fpn_512x512_coco17_tpu-8.tar.gz) | 34 | 34.2 | Boxes +[CenterNet Resnet50 V2 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_resnet50_v2_512x512_coco17_tpu-8.tar.gz) | 27 | 29.5 | Boxes +[CenterNet Resnet50 V2 Keypoints 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/centernet_resnet50_v2_512x512_kpts_coco17_tpu-8.tar.gz) | 30 | 27.6/48.2 | Boxes/Keypoints +[EfficientDet D0 512x512](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d0_coco17_tpu-32.tar.gz) | 39 | 33.6 | Boxes +[EfficientDet D1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d1_coco17_tpu-32.tar.gz) | 54 | 38.4 | Boxes +[EfficientDet D2 768x768](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d2_coco17_tpu-32.tar.gz) | 67 | 41.8 | Boxes +[EfficientDet D3 896x896](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d3_coco17_tpu-32.tar.gz) | 95 | 45.4 | Boxes +[EfficientDet D4 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d4_coco17_tpu-32.tar.gz) | 133 | 48.5 | Boxes +[EfficientDet D5 1280x1280](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d5_coco17_tpu-32.tar.gz) | 222 | 49.7 | Boxes +[EfficientDet D6 1280x1280](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d6_coco17_tpu-32.tar.gz) | 268 | 50.5 | Boxes +[EfficientDet D7 1536x1536](http://download.tensorflow.org/models/object_detection/tf2/20200710/efficientdet_d7_coco17_tpu-32.tar.gz) | 325 | 51.2 | Boxes +[SSD MobileNet v2 
320x320](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz) |19 | 20.2 | Boxes +[SSD MobileNet V1 FPN 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 48 | 29.1 | Boxes +[SSD MobileNet V2 FPNLite 320x320](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz) | 22 | 22.2 | Boxes +[SSD MobileNet V2 FPNLite 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz) | 39 | 28.2 | Boxes +[SSD ResNet50 V1 FPN 640x640 (RetinaNet50)](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 46 | 34.3 | Boxes +[SSD ResNet50 V1 FPN 1024x1024 (RetinaNet50)](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 87 | 38.3 | Boxes +[SSD ResNet101 V1 FPN 640x640 (RetinaNet101)](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 57 | 35.6 | Boxes +[SSD ResNet101 V1 FPN 1024x1024 (RetinaNet101)](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 104 | 39.5 | Boxes +[SSD ResNet152 V1 FPN 640x640 (RetinaNet152)](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_resnet152_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 80 | 35.4 | Boxes +[SSD ResNet152 V1 FPN 1024x1024 (RetinaNet152)](http://download.tensorflow.org/models/object_detection/tf2/20200710/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 111 | 39.6 | Boxes +[Faster R-CNN ResNet50 V1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.tar.gz) | 53 | 29.3 | Boxes +[Faster R-CNN ResNet50 V1 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.tar.gz) | 65 | 31.0 | Boxes +[Faster R-CNN ResNet50 V1 800x1333](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8.tar.gz) | 65 | 31.6 | Boxes +[Faster R-CNN ResNet101 V1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz) | 55 | 31.8 | Boxes +[Faster R-CNN ResNet101 V1 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.tar.gz) | 72 | 37.1 | Boxes +[Faster R-CNN ResNet101 V1 800x1333](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet101_v1_800x1333_coco17_gpu-8.tar.gz) | 77 | 36.6 | Boxes +[Faster R-CNN ResNet152 V1 640x640](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.tar.gz) | 64 | 32.4 | Boxes +[Faster R-CNN ResNet152 V1 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet152_v1_1024x1024_coco17_tpu-8.tar.gz) | 85 | 37.6 | Boxes +[Faster R-CNN ResNet152 V1 800x1333](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_resnet152_v1_800x1333_coco17_gpu-8.tar.gz) | 101 | 37.4 | Boxes +[Faster R-CNN Inception ResNet V2 
640x640](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_inception_resnet_v2_640x640_coco17_tpu-8.tar.gz) | 206 | 37.7 | Boxes +[Faster R-CNN Inception ResNet V2 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu-8.tar.gz) | 236 | 38.7 | Boxes +[Mask R-CNN Inception ResNet V2 1024x1024](http://download.tensorflow.org/models/object_detection/tf2/20200710/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8.tar.gz) | 301 | 39.0/34.6 | Boxes/Masks +[ExtremeNet](http://download.tensorflow.org/models/object_detection/tf2/20200710/extremenet.tar.gz) | -- | -- | Boxes diff --git a/research/object_detection/g3doc/tf2_training_and_evaluation.md b/research/object_detection/g3doc/tf2_training_and_evaluation.md new file mode 100644 index 00000000000..8d05a04f8db --- /dev/null +++ b/research/object_detection/g3doc/tf2_training_and_evaluation.md @@ -0,0 +1,285 @@ +# Training and Evaluation with TensorFlow 2 + +[![TensorFlow 2.2](https://img.shields.io/badge/TensorFlow-2.2-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0) +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB)](https://www.python.org/downloads/release/python-360/) + +This page walks through the steps required to train an object detection model. +It assumes the reader has completed the following prerequisites: + +1. The TensorFlow Object Detection API has been installed as documented in the + [installation instructions](tf2.md#installation). +2. A valid data set has been created. See [this page](preparing_inputs.md) for + instructions on how to generate a dataset for the PASCAL VOC challenge or + the Oxford-IIIT Pet dataset. + +## Recommended Directory Structure for Training and Evaluation + +```bash +. +├── data/ +│   ├── eval-00000-of-00001.tfrecord +│   ├── label_map.txt +│   ├── train-00000-of-00002.tfrecord +│   └── train-00001-of-00002.tfrecord +└── models/ + └── my_model_dir/ + ├── eval/ # Created by evaluation job. + ├── my_model.config + └── model_ckpt-100-data@1 # + └── model_ckpt-100-index # Created by training job. + └── checkpoint # +``` + +## Writing a model configuration + +Please refer to sample [TF2 configs](../configs/tf2) and +[configuring jobs](configuring_jobs.md) to create a model config. + +### Model Parameter Initialization + +While optional, it is highly recommended that users utilize classification or +object detection checkpoints. Training an object detector from scratch can take +days. To speed up the training process, it is recommended that users re-use the +feature extractor parameters from a pre-existing image classification or object +detection checkpoint. The `train_config` section in the config provides two +fields to specify pre-existing checkpoints: + +* `fine_tune_checkpoint`: a path prefix to the pre-existing checkpoint + (ie:"/usr/home/username/checkpoint/model.ckpt-#####"). + +* `fine_tune_checkpoint_type`: with value `classification` or `detection` + depending on the type. + +A list of classification checkpoints can be found +[here](tf2_classification_zoo.md) + +A list of detection checkpoints can be found [here](tf2_detection_zoo.md). 
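+
+As a minimal sketch, the two fields described above might look as follows
+inside the `train_config` block of a pipeline config; the checkpoint path is a
+hypothetical location of a checkpoint downloaded from the detection zoo.
+
+```
+train_config {
+  # Hypothetical path prefix of a TF2 object-based checkpoint from the zoo.
+  fine_tune_checkpoint: "path/to/downloaded_model/checkpoint/ckpt-0"
+  fine_tune_checkpoint_type: "detection"
+  # ... remaining train_config fields (batch size, optimizer, etc.) unchanged.
+}
+```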
+
+## Local
+
+### Training
+
+A local training job can be run with the following command:
+
+```bash
+# From the tensorflow/models/research/ directory
+PIPELINE_CONFIG_PATH={path to pipeline config file}
+MODEL_DIR={path to model directory}
+python object_detection/model_main_tf2.py \
+    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
+    --model_dir=${MODEL_DIR} \
+    --alsologtostderr
+```
+
+where `${PIPELINE_CONFIG_PATH}` points to the pipeline config and `${MODEL_DIR}`
+points to the directory in which training checkpoints and events will be
+written.
+
+### Evaluation
+
+A local evaluation job can be run with the following command:
+
+```bash
+# From the tensorflow/models/research/ directory
+PIPELINE_CONFIG_PATH={path to pipeline config file}
+MODEL_DIR={path to model directory}
+CHECKPOINT_DIR=${MODEL_DIR}
+python object_detection/model_main_tf2.py \
+    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
+    --model_dir=${MODEL_DIR} \
+    --checkpoint_dir=${CHECKPOINT_DIR} \
+    --alsologtostderr
+```
+
+where `${CHECKPOINT_DIR}` points to the directory with checkpoints produced by
+the training job. Evaluation events are written to `${MODEL_DIR}/eval`.
+
+## Google Cloud VM
+
+The TensorFlow Object Detection API supports training on Google Cloud with Deep
+Learning GPU VMs and TPU VMs. This section documents instructions on how to
+train and evaluate your model on them. The reader should complete the following
+prerequisites:
+
+1.  The reader has created and configured a GPU VM or TPU VM on Google Cloud
+    with TensorFlow >= 2.2.0. See the
+    [TPU quickstart](https://cloud.google.com/tpu/docs/quickstart) and
+    [GPU quickstart](https://cloud.google.com/ai-platform/deep-learning-vm/docs/tensorflow_start_instance#with-one-or-more-gpus)
+    guides.
+
+2.  The reader has installed the TensorFlow Object Detection API as documented
+    in the [installation instructions](tf2.md#installation) on the VM.
+
+3.  The reader has a valid data set and stored it in a Google Cloud Storage
+    bucket or locally on the VM. See [this page](preparing_inputs.md) for
+    instructions on how to generate a dataset for the PASCAL VOC challenge or
+    the Oxford-IIIT Pet dataset.
+
+Additionally, it is recommended that users test their job by running training
+and evaluation jobs for a few iterations [locally on their own machines](#local).
+
+### Training
+
+Training on GPU or TPU VMs is similar to local training. It can be launched
+using the following command:
+
+```bash
+# From the tensorflow/models/research/ directory
+# --use_tpu and --tpu_name are only required for TPU training.
+USE_TPU=true
+TPU_NAME="MY_TPU_NAME"
+PIPELINE_CONFIG_PATH={path to pipeline config file}
+MODEL_DIR={path to model directory}
+python object_detection/model_main_tf2.py \
+    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
+    --model_dir=${MODEL_DIR} \
+    --use_tpu=${USE_TPU} \
+    --tpu_name=${TPU_NAME} \
+    --alsologtostderr
+```
+
+where `${PIPELINE_CONFIG_PATH}` points to the pipeline config and `${MODEL_DIR}`
+points to the root directory for the files produced. Training checkpoints and
+events are written to `${MODEL_DIR}`. Note that the paths can be either local
+paths or paths to a GCS bucket.
+
+### Evaluation
+
+Evaluation is only supported on GPU.
+Similar to local evaluation, it can be launched using the following command:
+
+```bash
+# From the tensorflow/models/research/ directory
+PIPELINE_CONFIG_PATH={path to pipeline config file}
+MODEL_DIR={path to model directory}
+CHECKPOINT_DIR=${MODEL_DIR}
+python object_detection/model_main_tf2.py \
+    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
+    --model_dir=${MODEL_DIR} \
+    --checkpoint_dir=${CHECKPOINT_DIR} \
+    --alsologtostderr
+```
+
+where `${CHECKPOINT_DIR}` points to the directory with checkpoints produced by
+the training job. Evaluation events are written to `${MODEL_DIR}/eval`. Note
+that the paths can either be local paths or paths to a GCS bucket.
+
+## Google Cloud AI Platform
+
+The TensorFlow Object Detection API also supports training on Google Cloud AI
+Platform. This section documents instructions on how to train and evaluate your
+model using Cloud ML. The reader should complete the following prerequisites:
+
+1. The reader has created and configured a project on Google Cloud AI Platform.
+   See the
+   [Using GPUs](https://cloud.google.com/ai-platform/training/docs/using-gpus)
+   and
+   [Using TPUs](https://cloud.google.com/ai-platform/training/docs/using-tpus)
+   guides.
+2. The reader has a valid data set and stored it in a Google Cloud Storage
+   bucket. See [this page](preparing_inputs.md) for instructions on how to
+   generate a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet
+   dataset.
+
+Additionally, it is recommended that users test their job by running training
+and evaluation jobs for a few iterations
+[locally on their own machines](#local).
+
+### Training with multiple GPUs
+
+A user can start a training job on Cloud AI Platform using the following
+command:
+
+```bash
+# From the tensorflow/models/research/ directory
+cp object_detection/packages/tf2/setup.py .
+gcloud ai-platform jobs submit training object_detection_`date +%m_%d_%Y_%H_%M_%S` \
+    --runtime-version 2.1 \
+    --python-version 3.6 \
+    --job-dir=gs://${MODEL_DIR} \
+    --package-path ./object_detection \
+    --module-name object_detection.model_main_tf2 \
+    --region us-central1 \
+    --master-machine-type n1-highcpu-16 \
+    --master-accelerator count=8,type=nvidia-tesla-v100 \
+    -- \
+    --model_dir=gs://${MODEL_DIR} \
+    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
+```
+
+where `gs://${MODEL_DIR}` specifies the directory on Google Cloud Storage where
+the training checkpoints and events will be written, and
+`gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
+Google Cloud Storage.
+
+Users can monitor the progress of their training job on the
+[ML Engine Dashboard](https://console.cloud.google.com/ai-platform/jobs).
+
+### Training with TPU
+
+Launching a training job with a TPU compatible pipeline config requires using a
+similar command:
+
+```bash
+# From the tensorflow/models/research/ directory
+cp object_detection/packages/tf2/setup.py .
+gcloud ai-platform jobs submit training `whoami`_object_detection_`date +%m_%d_%Y_%H_%M_%S` \
+    --job-dir=gs://${MODEL_DIR} \
+    --package-path ./object_detection \
+    --module-name object_detection.model_main_tf2 \
+    --runtime-version 2.1 \
+    --python-version 3.6 \
+    --scale-tier BASIC_TPU \
+    --region us-central1 \
+    -- \
+    --use_tpu true \
+    --model_dir=gs://${MODEL_DIR} \
+    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
+```
+
+As before, `pipeline_config_path` points to the pipeline configuration stored on
+Google Cloud Storage (but it must now be a TPU-compatible model config).
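+
+In addition to the dashboard, the status of a submitted job (GPU or TPU) can be
+checked from the command line. A minimal sketch, assuming the job name reported
+when the `gcloud ai-platform jobs submit training` command above was run:
+
+```bash
+# JOB_NAME is a placeholder; use the name reported when the job was submitted.
+JOB_NAME={name of the submitted job}
+gcloud ai-platform jobs describe ${JOB_NAME}     # show the current job state
+gcloud ai-platform jobs stream-logs ${JOB_NAME}  # follow the job logs
+```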
+
+### Evaluating with GPU
+
+Evaluation jobs run on a single machine. Run the following command to start the
+evaluation job:
+
+```bash
+# From the tensorflow/models/research/ directory
+cp object_detection/packages/tf2/setup.py .
+gcloud ai-platform jobs submit training object_detection_eval_`date +%m_%d_%Y_%H_%M_%S` \
+    --runtime-version 2.1 \
+    --python-version 3.6 \
+    --job-dir=gs://${MODEL_DIR} \
+    --package-path ./object_detection \
+    --module-name object_detection.model_main_tf2 \
+    --region us-central1 \
+    --scale-tier BASIC_GPU \
+    -- \
+    --model_dir=gs://${MODEL_DIR} \
+    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} \
+    --checkpoint_dir=gs://${MODEL_DIR}
+```
+
+where `gs://${MODEL_DIR}` points to the directory on Google Cloud Storage where
+training checkpoints are saved and `gs://${PIPELINE_CONFIG_PATH}` points to
+where the model configuration file is stored on Google Cloud Storage.
+Evaluation events are written to `gs://${MODEL_DIR}/eval`.
+
+Typically one starts an evaluation job concurrently with the training job. Note
+that we do not support running evaluation on TPU.
+
+## Running TensorBoard
+
+Progress for training and eval jobs can be inspected using TensorBoard. If
+using the recommended directory structure, TensorBoard can be run using the
+following command:
+
+```bash
+tensorboard --logdir=${MODEL_DIR}
+```
+
+where `${MODEL_DIR}` points to the directory that contains the train and eval
+directories. Please note it may take TensorBoard a couple of minutes to
+populate with data.
diff --git a/research/object_detection/g3doc/tpu_compatibility.md b/research/object_detection/g3doc/tpu_compatibility.md
index 0eb0c7a20ee..411f1c55cf5 100644
--- a/research/object_detection/g3doc/tpu_compatibility.md
+++ b/research/object_detection/g3doc/tpu_compatibility.md
@@ -2,7 +2,7 @@
 
 [TOC]
 
-The Tensorflow Object Detection API supports TPU training for some models. To
+The TensorFlow Object Detection API supports TPU training for some models. To
 make models TPU compatible you need to make a few tweaks to the model config as
 mentioned below. We also provide several sample configs that you can use as a
 template.
@@ -11,7 +11,7 @@ template.
 
 ### Static shaped tensors
 
-TPU training currently requires all tensors in the Tensorflow Graph to have
+TPU training currently requires all tensors in the TensorFlow Graph to have
 static shapes. However, most of the sample configs in Object Detection API have
 a few different tensors that are dynamically shaped. Fortunately, we provide
 simple alternatives in the model configuration that modifies these tensors to
@@ -62,7 +62,7 @@ have static shape:
 ### TPU friendly ops
 
 Although TPU supports a vast number of tensorflow ops, a few used in the
-Tensorflow Object Detection API are unsupported. We list such ops below and
+TensorFlow Object Detection API are unsupported. We list such ops below and
 recommend compatible substitutes.
 
 * **Anchor sampling** - Typically we use hard example mining in standard SSD
diff --git a/research/object_detection/g3doc/tpu_exporters.md b/research/object_detection/g3doc/tpu_exporters.md
index 0368359067e..4cc3395aea6 100644
--- a/research/object_detection/g3doc/tpu_exporters.md
+++ b/research/object_detection/g3doc/tpu_exporters.md
@@ -1,5 +1,7 @@
 # Object Detection TPU Inference Exporter
 
+[![TensorFlow 1.15](https://img.shields.io/badge/TensorFlow-1.15-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
+
 This package contains SavedModel Exporter for TPU Inference of object
 detection models.
 
diff --git a/research/object_detection/g3doc/using_your_own_dataset.md b/research/object_detection/g3doc/using_your_own_dataset.md
index 23222f26e26..6192af2dda1 100644
--- a/research/object_detection/g3doc/using_your_own_dataset.md
+++ b/research/object_detection/g3doc/using_your_own_dataset.md
@@ -2,7 +2,7 @@
 
 [TOC]
 
-To use your own dataset in Tensorflow Object Detection API, you must convert it
+To use your own dataset in TensorFlow Object Detection API, you must convert it
 into the
 [TFRecord file format](https://www.tensorflow.org/api_guides/python/python_io#tfrecords_format_details).
 This document outlines how to write a script to generate the TFRecord file.