Description
System information
- What is the top-level directory of the model you are using:
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 1.15.0
- Bazel version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
- Exact command to reproduce:
Describe the problem
I am trying to create a quantized model of the SSDLite MobileNet V3 small variant found in the Object Detection Model Zoo.
Following the "Running on Mobile" documentation link, I am able to generate a frozen graph with the post-processing ops, and then use that frozen graph to generate a float32 TFLite model with:
import tensorflow as tf

input_arrays = ["normalized_input_image_tensor"]
output_arrays = ["TFLite_Detection_PostProcess",
                 "TFLite_Detection_PostProcess:1",
                 "TFLite_Detection_PostProcess:2",
                 "TFLite_Detection_PostProcess:3"]
input_shapes = {"normalized_input_image_tensor": [1, 320, 320, 3]}

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays, input_shapes)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
tflite_model = converter.convert()
with open("ssdlite_mobilenet_v3_small_f32.tflite", "wb") as f:
    f.write(tflite_model)
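For what it's worth, the converted float32 model can be sanity-checked with the TFLite Interpreter before attempting quantization. Here is a minimal, self-contained sketch; it converts a trivial stand-in graph so it runs without the SSD frozen graph (with the real model you would pass model_path="ssdlite_mobilenet_v3_small_f32.tflite" instead of model_content):

```python
import numpy as np
import tensorflow as tf

# Trivial stand-in graph (adds 1 to its input), NOT the SSD model.
@tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
def model_fn(x):
    return x + 1.0

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model_fn.get_concrete_function()])
tflite_model = converter.convert()

# Run one inference through the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.zeros((1, 4), np.float32))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
print(result)  # [[1. 1. 1. 1.]]
```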
But I need a quantized version of the model. According to the same link, I can generate a quantized model with the command below (note the documented command uses 300x300 input shapes, while this model expects 320x320):
bazel run --config=opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,300,300,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_values=128 \
--change_concat_input_ranges=false \
--allow_custom_ops
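As I understand the toco flags, --mean_values/--std_values define how a real input value is recovered from a quantized uint8 one: real = (q - mean) / std. A quick numpy check (my own sketch, not from the docs) shows that mean=128, std=128 maps the uint8 range onto roughly [-1, 1), matching the (x - 127.5) / 127.5 float preprocessing I use below:

```python
import numpy as np

# With --mean_values=128 and --std_values=128, a uint8 input q is
# interpreted as the real value (q - 128) / 128, so pixel values
# 0..255 map onto roughly [-1, 1).
q = np.arange(256, dtype=np.uint8)
real = (q.astype(np.float32) - 128.0) / 128.0
print(real.min(), real.max())  # -1.0 0.9921875
```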
But I understand this will only work for models trained with quantization-aware training, i.e. with this option in the pipeline.config:
graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}
But the SSD MobileNet V3 model provided in the zoo doesn't have that section in its pipeline.config file, which I assume means it is a float model. Therefore, I am following the instructions in the TFLite docs here to perform post-training integer quantization. I switched to a TF 2.x (tf-nightly) build and tried the following:
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays, input_shapes)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.representative_dataset = calibration_gen
converter.allow_custom_ops = True
converter.experimental_new_converter = True
tflite_model = converter.convert()
with open("/content/ssd_mobilenet_v3_small_int8.tflite", "wb") as f:
    f.write(tflite_model)
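One thing I have not tried yet is forcing a fully-integer interface on the converter. As a hedged sketch (using a tiny stand-in conv graph rather than the SSD model, so it is self-contained and runnable), these are the TF 2.x converter attributes that request int8-only kernels and uint8 input/output. Note the custom TFLite_Detection_PostProcess op would likely not satisfy TFLITE_BUILTINS_INT8 on the real model, so this is illustrative only:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in graph (one conv + relu), NOT the SSD model.
kernel = tf.constant(np.random.randn(3, 3, 3, 4).astype(np.float32))

@tf.function(input_signature=[tf.TensorSpec([1, 320, 320, 3], tf.float32)])
def model_fn(x):
    return tf.nn.relu(tf.nn.conv2d(x, kernel, strides=1, padding="SAME"))

def representative_dataset():
    for _ in range(5):
        yield [np.random.uniform(-1, 1, (1, 320, 320, 3)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model_fn.get_concrete_function()])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to int8 kernels and request integer input/output tensors;
# without these lines the converted model keeps float32 interfaces.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
print(interpreter.get_input_details()[0]["dtype"])
```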
The representative dataset I have provided is built from 20 (320x320) images:
import os
import imageio
import numpy as np

images = []
with os.scandir(path) as it:
    for entry in it:
        if entry.is_file():
            input_data = imageio.imread(entry.path)
            # Normalize uint8 pixels to [-1, 1].
            input_data = (input_data - 127.5) / 127.5
            input_data = input_data.astype("float32")
            images.append(input_data)

def calibration_gen():
    for image in images:
        # Add the batch dimension: (320, 320, 3) -> (1, 320, 320, 3)
        image = np.expand_dims(image, 0)
        yield [image]
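As a self-check that the generator yields what the converter expects (a list containing one float32 array with the full input shape, batch dimension included), here is a stand-alone sketch of the same pipeline with random data in place of the imageio-loaded files:

```python
import numpy as np

# Random uint8 "pixels" stand in for the imageio-loaded images.
raw = [np.random.randint(0, 256, (320, 320, 3)).astype(np.float32)
       for _ in range(20)]
images = [((r - 127.5) / 127.5).astype("float32") for r in raw]

def calibration_gen():
    for image in images:
        yield [np.expand_dims(image, 0)]

sample = next(calibration_gen())
print(sample[0].shape, sample[0].dtype)  # (1, 320, 320, 3) float32
```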
The quantized model is generated without any errors. But running it on an Android device on CPU produces many wrong detections (everything labeled "donut"), and it crashes the app when using the NNAPI delegate with the error below:
E/ExecutionBuilder: NN_RET_CHECK failed (frameworks/ml/nn/common/operations/Broadcast.cpp:370): output.scale > input1.scale * input2.scale (output.scale = 0.00784314, input1.scale * input2.scale = 0.00876141)
E/Utils: Validation failed for operation MUL
E/ValidateHal: Operand 53 with lifetime TEMPORARY_VARIABLE is not being written to.
E/ModelBuilder: ANeuralNetworksModel_finish called on invalid model
E/ExecutionBuilder: NN_RET_CHECK failed (frameworks/ml/nn/common/operations/Broadcast.cpp:370): output.scale > input1.scale * input2.scale (output.scale = 0.00784314, input1.scale * input2.scale = 0.0126087)
E/Utils: Validation failed for operation MUL
E/ValidateHal: Operand 50 with lifetime TEMPORARY_VARIABLE is not being written to.
E/ModelBuilder: ANeuralNetworksModel_finish called on invalid model
E/AndroidRuntime: FATAL EXCEPTION: inference
Process: com.test.app, PID: 6127
java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: NN API returned error ANEURALNETWORKS_BAD_DATA at line 681 while adding operation.
NN API returned error ANEURALNETWORKS_BAD_DATA at line 3620 while finalizing the model.
NN API returned error ANEURALNETWORKS_BAD_DATA at line 681 while adding operation.
NN API returned error ANEURALNETWORKS_BAD_DATA at line 3620 while finalizing the model.
Node number 182 (TfLiteNnapiDelegate) failed to prepare.
    Restored previous execution plan after delegate application failure.
at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegate(Native Method)
at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegates(NativeInterpreterWrapper.java:336)
at org.tensorflow.lite.NativeInterpreterWrapper.init(NativeInterpreterWrapper.java:82)
at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:63)
at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:234)
Has anyone successfully quantized this model without retraining it with quantization-aware training? If so, how did you convert it? Was it also by passing a representative_dataset, and did I create my representative dataset correctly?
Source code / logs
- Provided above