
# MobileNetV2

This folder contains the building code for MobileNetV2, based on [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).

## Performance

### Latency

This is the timing of MobileNetV1 vs MobileNetV2 using TF-Lite on the large core of a Pixel 1 phone.

![MobileNetV1 vs MobileNetV2 latency on Pixel 1](mnet_v1_vs_v2_pixel1_latency.png)

### MACs

MACs, also sometimes known as MADDs, are the number of multiply-accumulate operations needed to compute an inference on a single image; they are a common metric of model efficiency.
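
As a back-of-the-envelope illustration (a sketch, not code from this repository), the MAC count of one standard convolution is the number of output positions times the cost of each output value; the shapes below are those of MobileNetV2's first layer at a 224x224 input.

```python
def conv2d_macs(out_h, out_w, k_h, k_w, c_in, c_out):
    """MACs for one standard 2-D convolution: each of the out_h * out_w
    output positions computes c_out dot products over a k_h x k_w x c_in
    receptive field."""
    return out_h * out_w * k_h * k_w * c_in * c_out

# MobileNetV2's first layer at 224x224 input: a 3x3 stride-2 convolution
# from 3 to 32 channels, producing a 112x112 feature map.
print(conv2d_macs(112, 112, 3, 3, 3, 32) / 1e6)  # ~10.8 million MACs
```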

Below is a graph comparing V2 with a few selected networks. The size of each blob represents the number of parameters. Note that no size numbers have been published for ShuffleNet; we estimate them to be comparable to the MobileNetV2 numbers.

![MAdds vs. top-1 accuracy](madds_top1_accuracy.png)

## Pretrained models

### ImageNet Checkpoints

| Classification Checkpoint | MACs (M) | Parameters (M) | Top 1 Accuracy | Top 5 Accuracy | Mobile CPU (ms), Pixel 1 |
|:--------------------------|---------:|---------------:|---------------:|---------------:|-------------------------:|
| mobilenet_v2_1.4_224  | 582 | 6.06 | 75.0 | 92.5 | 138.0 |
| mobilenet_v2_1.3_224  | 509 | 5.34 | 74.4 | 92.1 | 123.0 |
| mobilenet_v2_1.0_224  | 300 | 3.47 | 71.8 | 91.0 | 73.8 |
| mobilenet_v2_1.0_192  | 221 | 3.47 | 70.7 | 90.1 | 55.1 |
| mobilenet_v2_1.0_160  | 154 | 3.47 | 68.8 | 89.0 | 40.2 |
| mobilenet_v2_1.0_128  | 99  | 3.47 | 65.3 | 86.9 | 27.6 |
| mobilenet_v2_1.0_96   | 56  | 3.47 | 60.3 | 83.2 | 17.6 |
| mobilenet_v2_0.75_224 | 209 | 2.61 | 69.8 | 89.6 | 55.8 |
| mobilenet_v2_0.75_192 | 153 | 2.61 | 68.7 | 88.9 | 41.6 |
| mobilenet_v2_0.75_160 | 107 | 2.61 | 66.4 | 87.3 | 30.4 |
| mobilenet_v2_0.75_128 | 69  | 2.61 | 63.2 | 85.3 | 21.9 |
| mobilenet_v2_0.75_96  | 39  | 2.61 | 58.8 | 81.6 | 14.2 |
| mobilenet_v2_0.5_224  | 97  | 1.95 | 65.4 | 86.4 | 28.7 |
| mobilenet_v2_0.5_192  | 71  | 1.95 | 63.9 | 85.4 | 21.1 |
| mobilenet_v2_0.5_160  | 50  | 1.95 | 61.0 | 83.2 | 14.9 |
| mobilenet_v2_0.5_128  | 32  | 1.95 | 57.7 | 80.8 | 9.9 |
| mobilenet_v2_0.5_96   | 18  | 1.95 | 51.2 | 75.8 | 6.4 |
| mobilenet_v2_0.35_224 | 59  | 1.66 | 60.3 | 82.9 | 19.7 |
| mobilenet_v2_0.35_192 | 43  | 1.66 | 58.2 | 81.2 | 14.6 |
| mobilenet_v2_0.35_160 | 30  | 1.66 | 55.7 | 79.1 | 10.5 |
| mobilenet_v2_0.35_128 | 20  | 1.66 | 50.8 | 75.0 | 6.9 |
| mobilenet_v2_0.35_96  | 11  | 1.66 | 45.5 | 70.4 | 4.5 |

## Training

The numbers above can be reproduced using slim's `train_image_classifier`. Below is the set of parameters that achieves 72.0% top-1 accuracy for full-size MobileNetV2 after about 700K steps when trained on 8 GPUs. If trained on a single GPU, full convergence takes about 5.5M steps. Also note that both the learning rate and `num_epochs_per_decay` need to be adjusted according to the number of GPUs in use, due to slim's internal averaging.

```
--model_name="mobilenet_v2"
--learning_rate=0.045 * NUM_GPUS        # slim internally averages clones, so we compensate
--preprocessing_name="inception_v2"
--label_smoothing=0.1
--moving_average_decay=0.9999
--batch_size=96
--num_clones=NUM_GPUS                   # use any value from 1 to 8, depending on your hardware setup
--learning_rate_decay_factor=0.98
--num_epochs_per_decay=2.5 / NUM_GPUS   # train_image_classifier counts epochs per clone
```
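
The division by `NUM_GPUS` in the last flag comes from slim counting epochs per clone: `train_image_classifier` derives the decay interval roughly as `num_samples / batch_size * num_epochs_per_decay` with a per-clone `batch_size` (an assumption about the exact formula), so the scaling keeps the decay cadence fixed in real epochs. A quick sanity check:

```python
# Hypothetical check that 2.5 / NUM_GPUS keeps the decay cadence at
# 2.5 real epochs; assumes slim's decay_steps formula uses the
# per-clone batch size.
IMAGENET_TRAIN_SIZE = 1281167  # ImageNet-1k training images
batch_size, num_gpus = 96, 8

steps_per_real_epoch = IMAGENET_TRAIN_SIZE / (batch_size * num_gpus)
decay_steps = IMAGENET_TRAIN_SIZE / batch_size * (2.5 / num_gpus)
print(decay_steps / steps_per_real_epoch)  # 2.5 real epochs between decays
```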

## Example

See this ipython notebook or open and run the network directly in Colaboratory.
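
If you just want a quick sanity check outside the notebook, below is a minimal TF 1.x inference sketch against slim's `nets.mobilenet.mobilenet_v2` module. The checkpoint path and the random input batch are placeholders; the EMA restore mirrors how the published checkpoints were trained.

```python
import numpy as np
import tensorflow as tf  # TF 1.x API
from nets.mobilenet import mobilenet_v2

images = tf.placeholder(tf.float32, (None, 224, 224, 3))
# Build the network in inference mode.
with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope(is_training=False)):
    logits, endpoints = mobilenet_v2.mobilenet(images)

# The released checkpoints hold exponential-moving-average weights,
# so restore the EMA shadow variables rather than the raw variables.
ema = tf.train.ExponentialMovingAverage(0.999)
saver = tf.train.Saver(ema.variables_to_restore())

with tf.Session() as sess:
    saver.restore(sess, '/path/to/mobilenet_v2_1.0_224.ckpt')  # placeholder path
    # Inputs are expected in [-1, 1] (inception-style preprocessing).
    batch = np.random.uniform(-1, 1, size=(1, 224, 224, 3)).astype(np.float32)
    probs = sess.run(endpoints['Predictions'], feed_dict={images: batch})
    print(probs.shape)  # (1, 1001): 1000 ImageNet classes plus background
```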