Description
System information
- What is the top-level directory of the model you are using:
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 1.15.0
- Bazel version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
- Exact command to reproduce:
Describe the problem
I am trying to create a quantized model of the SSDLite MobileNet V3 small variant found in the Object Detection Model Zoo.
Following the "Running on Mobile" documentation link, I am able to generate a frozen graph with the post-processing ops, and then use that frozen graph to generate a float32 TFLite model with:
import tensorflow as tf

input_arrays = ["normalized_input_image_tensor"]
output_arrays = ["TFLite_Detection_PostProcess",
                 "TFLite_Detection_PostProcess:1",
                 "TFLite_Detection_PostProcess:2",
                 "TFLite_Detection_PostProcess:3"]
input_shapes = {"normalized_input_image_tensor": [1, 320, 320, 3]}

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays, input_shapes)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
tflite_model = converter.convert()
with open("ssdlite_mobilenet_v3_small_f32.tflite", "wb") as f:
    f.write(tflite_model)
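For what it's worth, the converted float32 model can be sanity-checked with the TFLite Interpreter before attempting quantization. Here is a minimal, self-contained sketch; it converts a trivial stand-in graph so it runs without the SSD frozen graph (with the real model you would pass model_path="ssdlite_mobilenet_v3_small_f32.tflite" instead of model_content):

```python
import numpy as np
import tensorflow as tf

# Trivial stand-in graph (adds 1 to its input), NOT the SSD model.
@tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
def model_fn(x):
    return x + 1.0

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model_fn.get_concrete_function()])
tflite_model = converter.convert()

# Run one inference through the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.zeros((1, 4), np.float32))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
print(result)  # [[1. 1. 1. 1.]]
```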
But I need a quantized version of the model. According to the same link, I can generate a quantized model with the command below (note the documented command uses 300x300 input shapes, while this model expects 320x320):
bazel run --config=opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,300,300,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_values=128 \
--change_concat_input_ranges=false \
--allow_custom_ops
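As I understand the toco flags, --mean_values/--std_values define how a real input value is recovered from a quantized uint8 one: real = (q - mean) / std. A quick numpy check (my own sketch, not from the docs) shows that mean=128, std=128 maps the uint8 range onto roughly [-1, 1), matching the (x - 127.5) / 127.5 float preprocessing I use below:

```python
import numpy as np

# With --mean_values=128 and --std_values=128, a uint8 input q is
# interpreted as the real value (q - 128) / 128, so pixel values
# 0..255 map onto roughly [-1, 1).
q = np.arange(256, dtype=np.uint8)
real = (q.astype(np.float32) - 128.0) / 128.0
print(real.min(), real.max())  # -1.0 0.9921875
```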
But I understand this will only work for models trained with quantization-aware training, i.e. with this option in the pipeline.config:
graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}
But the SSD MobileNet V3 model provided in the zoo doesn't have that section in its pipeline.config file, which I assume means it is a float model. Therefore, I am following the instructions in the TFLite docs here to perform post-training integer quantization. I switched to a TF 2.x (tf-nightly) build and tried the following:
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays, input_shapes)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.representative_dataset = calibration_gen
converter.allow_custom_ops = True
converter.experimental_new_converter = True
tflite_model = converter.convert()
with open("/content/ssd_mobilenet_v3_small_int8.tflite", "wb") as f:
    f.write(tflite_model)
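One thing I have not tried yet is forcing a fully-integer interface on the converter. As a hedged sketch (using a tiny stand-in conv graph rather than the SSD model, so it is self-contained and runnable), these are the TF 2.x converter attributes that request int8-only kernels and uint8 input/output. Note the custom TFLite_Detection_PostProcess op would likely not satisfy TFLITE_BUILTINS_INT8 on the real model, so this is illustrative only:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in graph (one conv + relu), NOT the SSD model.
kernel = tf.constant(np.random.randn(3, 3, 3, 4).astype(np.float32))

@tf.function(input_signature=[tf.TensorSpec([1, 320, 320, 3], tf.float32)])
def model_fn(x):
    return tf.nn.relu(tf.nn.conv2d(x, kernel, strides=1, padding="SAME"))

def representative_dataset():
    for _ in range(5):
        yield [np.random.uniform(-1, 1, (1, 320, 320, 3)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model_fn.get_concrete_function()])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to int8 kernels and request integer input/output tensors;
# without these lines the converted model keeps float32 interfaces.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
print(interpreter.get_input_details()[0]["dtype"])
```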
The representative dataset I have provided is built from 20 (320x320) images:
import os
import imageio
import numpy as np

images = []
with os.scandir(path) as it:
    for entry in it:
        if entry.is_file():
            input_data = imageio.imread(entry.path)
            # Normalize uint8 pixels to [-1, 1].
            input_data = (input_data - 127.5) / 127.5
            input_data = input_data.astype("float32")
            images.append(input_data)

def calibration_gen():
    for image in images:
        # Add the batch dimension: (320, 320, 3) -> (1, 320, 320, 3)
        image = np.expand_dims(image, 0)
        yield [image]
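As a self-check that the generator yields what the converter expects (a list containing one float32 array with the full input shape, batch dimension included), here is a stand-alone sketch of the same pipeline with random data in place of the imageio-loaded files:

```python
import numpy as np

# Random uint8 "pixels" stand in for the imageio-loaded images.
raw = [np.random.randint(0, 256, (320, 320, 3)).astype(np.float32)
       for _ in range(20)]
images = [((r - 127.5) / 127.5).astype("float32") for r in raw]

def calibration_gen():
    for image in images:
        yield [np.expand_dims(image, 0)]

sample = next(calibration_gen())
print(sample[0].shape, sample[0].dtype)  # (1, 320, 320, 3) float32
```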
The quantized model is generated without any errors. But running it on an Android device on CPU produces many wrong detections (everything labeled "donut"), and it crashes the app when using the NNAPI delegate with the error below:
E/ExecutionBuilder: NN_RET_CHECK failed (frameworks/ml/nn/common/operations/Broadcast.cpp:370): output.scale > input1.scale * input2.scale (output.scale = 0.00784314, input1.scale * input2.scale = 0.00876141)
E/Utils: Validation failed for operation MUL
E/ValidateHal: Operand 53 with lifetime TEMPORARY_VARIABLE is not being written to.
E/ModelBuilder: ANeuralNetworksModel_finish called on invalid model
E/ExecutionBuilder: NN_RET_CHECK failed (frameworks/ml/nn/common/operations/Broadcast.cpp:370): output.scale > input1.scale * input2.scale (output.scale = 0.00784314, input1.scale * input2.scale = 0.0126087)
E/Utils: Validation failed for operation MUL
E/ValidateHal: Operand 50 with lifetime TEMPORARY_VARIABLE is not being written to.
E/ModelBuilder: ANeuralNetworksModel_finish called on invalid model
E/AndroidRuntime: FATAL EXCEPTION: inference
Process: com.test.app, PID: 6127
java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: NN API returned error ANEURALNETWORKS_BAD_DATA at line 681 while adding operation.
NN API returned error ANEURALNETWORKS_BAD_DATA at line 3620 while finalizing the model.
NN API returned error ANEURALNETWORKS_BAD_DATA at line 681 while adding operation.
NN API returned error ANEURALNETWORKS_BAD_DATA at line 3620 while finalizing the model.
Node number 182 (TfLiteNnapiDelegate) failed to prepare.
    Restored previous execution plan after delegate application failure.
at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegate(Native Method)
at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegates(NativeInterpreterWrapper.java:336)
at org.tensorflow.lite.NativeInterpreterWrapper.init(NativeInterpreterWrapper.java:82)
at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:63)
at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:234)
Has anyone successfully quantized this model without retraining it with quantization-aware training? If so, how did you convert it? Was it also by passing a representative_dataset, and did I create my representative dataset correctly?
Source code / logs
- Provided above