
Internal compiler error #31368

Closed · DocDriven opened this issue Aug 6, 2019 · 20 comments

Labels: comp:lite (TF Lite related issues), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 1.14 (for issues seen with TF 1.14), type:bug (Bug)

@DocDriven

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: -
  • TensorFlow installed from (source or binary): Docker image latest-gpu-py3
  • TensorFlow version (use command below): 1.14.0
  • Python version: 3.6
  • Bazel version (if compiling from source): -
  • GCC/Compiler version (if compiling from source): -
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: RTX 2080 Ti / 12 GB

I have created a fully-quantized TF Lite model from a saved model, but when I try to compile it with the edgetpu_compiler, I get an error:

user@ubuntu:~/tf/tensorflow1_14$ edgetpu_compiler saved_converted_linearmodel_tpu_1.14.0.tflite 
Edge TPU Compiler version 2.0.258810407
INFO: Initialized TensorFlow Lite runtime.

Internal compiler error. Aborting!

The error message is unfortunately not very helpful. The non-compiled version is loadable and produces correct results.

I have attached the model that I am trying to compile, as well as its visualization (via visualize.py).

litemodel.tar.gz
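For reference, the conversion followed roughly the standard full-integer post-training quantization flow. Below is a minimal sketch of that flow; the saved-model path, input shape, and calibration data are placeholders (the real model is in the attached archive), and the converter attributes are the TF 1.x-style ones used later in this thread:

import numpy as np
import tensorflow as tf  # TF 1.x

# Placeholder path; the actual SavedModel is not included here.
saved_model_dir = './saved_linearmodel'

def representative_dataset_gen():
    # Yield a few calibration samples matching the model's input shape
    # (the shape here is a placeholder).
    for _ in range(100):
        yield [np.random.rand(1, 10).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open('saved_converted_linearmodel_tpu_1.14.0.tflite', 'wb') as f:
    f.write(converter.convert())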

@oanush oanush self-assigned this Aug 7, 2019
@oanush oanush added comp:lite TF Lite related issues TF 1.14 for issues seen with TF 1.14 type:bug Bug labels Aug 7, 2019
@oanush oanush assigned suharshs and unassigned oanush Aug 7, 2019
@cuongdv1 commented Sep 4, 2019

@DocDriven I have the same issue.
Have you solved it? Could you share your solution?

@DocDriven (Author)

@cuongdv1 Unfortunately, I haven't figured it out yet. As far as I know, the source code for the compiler is not open source, so I couldn't debug it. The best bet is to wait for a new release of the compiler and try again.

@jbrownkramer

I'll add that I get this error when I try to compile an object detection tflite model produced by Google Cloud AutoML, also using Edge TPU Compiler version 2.0.258810407.

@DocDriven (Author) commented Sep 11, 2019

Are there any updates on this topic? I have come across this problem multiple times now, even with networks that ship with Keras (e.g. VGG16). The test code is below.

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import mobilenet, resnet50, inception_v3, vgg16
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.resnet50 import decode_predictions


### Load and test model

imagenet_dir = './tiny-imagenet-200/test/images'

vgg16_model = vgg16.VGG16(weights='imagenet')
print(vgg16_model.summary())

filename = './bird.jpg'
original = load_img(filename, target_size=(224, 224))
numpy_image = img_to_array(original)
image_batch = np.expand_dims(numpy_image, axis=0)
processed_image = vgg16.preprocess_input(image_batch.copy())

predictions = vgg16_model.predict(processed_image)
label = decode_predictions(predictions)
print(label)

keras_file = 'vgg16.h5'
keras.models.save_model(vgg16_model, keras_file)


### TF lite conversion

def representative_dataset_gen():
    # Feed 500 preprocessed Tiny ImageNet images for calibration.
    for image in os.listdir(imagenet_dir)[:500]:
        original = load_img(os.path.join(imagenet_dir, image), target_size=(224, 224))
        numpy_image = img_to_array(original)
        image_batch = np.expand_dims(numpy_image, axis=0)
        processed_image = vgg16.preprocess_input(image_batch.copy())
        print(processed_image.shape)
        print(type(processed_image))
        yield [processed_image]

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file(keras_file)
converter.representative_dataset = representative_dataset_gen
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
open("vgg16_fiq.tflite", "wb").write(tflite_model)


### Test tflite model

# Load the file written above (the path must match the converted model).
interpreter = tf.lite.Interpreter(model_path="vgg16_fiq.tflite")
interpreter.allocate_tensors()

input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]
print('Input detail: ', input_detail)
print('Output detail: ', output_detail)

interpreter.set_tensor(input_detail['index'], processed_image)
interpreter.invoke()
pred_litemodel = interpreter.get_tensor(output_detail['index'])
label_lite = decode_predictions(pred_litemodel)
print(label_lite)

I used the Tiny ImageNet dataset for the post-training quantization. Also, my test picture is the one from the Coral demo, which I have attached; it should output magpie (bird.jpg).

I can produce a tflite file with this code, but the TPU compiler throws "Internal compiler error" again. Can you please confirm whether this is reproducible?
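For completeness, the failing compile step is the same as in the original post (the file name here is the one written by the script above):

edgetpu_compiler vgg16_fiq.tflite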

@Lap1n commented Sep 25, 2019

I have the same error using the tensorflow 2.0 nightly and the tensorflow 1.x nightly. Any update on this? Since the error is very generic, it is very hard to debug...

@DocDriven (Author)

@Lap1n
At least for the VGG16 model in my previous post, I was able to compile it with the new compiler version 2.0.267685300. Unfortunately, this did not resolve the original problem for me. Tested it with the TF 1.15 nightly docker image.

@ynorz commented Nov 7, 2019

I had the same error using the MobileNet v2 model in Keras with the tiny-imagenet-200 dataset. The TPU compiler version was 2.0.267685300. The quantized tflite file was produced successfully, but it could not be compiled.

@bhavitvyamalik

@ynorz can you show me the code you used for converting and quantizing your model?

@oanush oanush added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 7, 2019
@ynorz commented Nov 7, 2019

> @ynorz can you show me the code you used for converting and quantizing your model?

import pathlib

import numpy as np
import tensorflow as tf

def get_label(file_path):
    # convert the path to a list of path components
    parts = tf.strings.split(file_path, '/')
    # the third-to-last component is the class directory
    return parts[-3] == CLASS_NAMES

def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # use `convert_image_dtype` to convert to floats in the [0,1] range
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size
    return tf.image.resize(img, [224, 224])

def process_path(file_path):
    label = get_label(file_path)
    # load the raw data from the file as a string
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return label, img

data_dir = '/my_data_dir'
data_dir = pathlib.Path(data_dir)
list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*/*'))
image_count = len(list(data_dir.glob('*/*/*.JPEG')))
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*')])
labeled_ds = list_ds.map(process_path, num_parallel_calls=100)

tf.compat.v1.enable_eager_execution()

# `model` is the Keras MobileNet v2 model mentioned above;
# TFLITE_MODEL is the output file path.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_data_gen():
    for _, image in labeled_ds.take(100):
        image = tf.expand_dims(image, 0)
        yield [image]

converter.representative_dataset = tf.lite.RepresentativeDataset(representative_data_gen)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converted_tflite_model = converter.convert()
open(TFLITE_MODEL, "wb").write(converted_tflite_model)

@ynorz commented Nov 8, 2019

@bhavitvyamalik
The tflite model I got from this code could run on the CPU, but compiling it would trigger the compiler error.

@bhavitvyamalik

There are two possible reasons why your model is giving an internal compiler error.

  1. The most basic one is that your model contains operators that are not supported by the Edge TPU. I remember my custom model giving the same error until it contained only basic operators that are supported by the Edge TPU. So make sure the model you are converting is supported. If it is, move on to the next point.

  2. A versioning error. I tried converting my code in Tensorflow 2.0, which didn't run on the Edge TPU even after compiling. Make sure you are using Tensorflow 1.15.0 for converting and quantizing. This was my code for the same:

def representative_data_gen():
  # mnist_data is a list of images that were resized while being appended
  for input_value in mnist_data[:100]:
    data = np.array([input_value])
    yield [data]

opt = tf.lite.Optimize.DEFAULT
ops = tf.lite.OpsSet.TFLITE_BUILTINS_INT8
dtype = tf.uint8

converter = tf.lite.TFLiteConverter.from_keras_model_file(model_path)

converter.optimizations = [opt]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [ops]
converter.inference_input_type = dtype
converter.inference_output_type = dtype

tflite_quant_model = converter.convert()
open("model_quantised.tflite", "wb").write(tflite_quant_model)

@ynorz commented Nov 21, 2019

https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/post_training_integer_quant.ipynb

I tried to do the post-training integer quantization on MNIST following the official guide above. The guide only runs on tensorflow 1.15.0, which, to some extent, proves your point that tensorflow 1.15 works better. However, I still get the internal compiler error with compiler version 2.0.267685300.

@bhavitvyamalik

If you are getting an internal compiler error, then some of your operations are not supported by the Edge TPU during compilation. The model can still run on the CPU of your Edge TPU device, but that will increase inference time to a large extent.

Most importantly, you can compile only these models successfully for the Edge TPU:

  • Mobilenet_v1
  • Mobilenet_v2
  • Inception_v3
  • ResNet50

If you use any other model, it might not work properly. Try using one of these models, followed by quantization using the code I posted earlier (a sketch of producing the input .h5 file follows below). It should work flawlessly.
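For instance, a minimal sketch of producing the model_path file that the snippet above expects (the file name is a placeholder; assumes tf.keras.applications in TF 1.15):

import tensorflow as tf  # TF 1.15

# Build a stock, ImageNet-pretrained MobileNetV2 and save it as an
# HDF5 file; the .h5 suffix selects the HDF5 format in TF 1.x.
model = tf.keras.applications.MobileNetV2(weights='imagenet')
model.save('mobilenet_v2.h5')

# 'mobilenet_v2.h5' can then be passed as model_path to
# tf.lite.TFLiteConverter.from_keras_model_file(model_path).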

@yanghaoyue001

Figured out a solution that sounds stupid: I moved the 'models' folder with the .tflite file to '/home/username/edgetpu', and then the compiler works with the same compile command provided on the official website. This 'edgetpu' folder was created by the beginner object-detection retraining example (using the American bulldog and Abyssinian dataset) provided on the official website.

My setup: custom dataset, mobilenet_v1 or mobilenet_v2 downloaded from the Coral website, Coral accelerator.

@jdcast commented Feb 9, 2020

> [quoting @yanghaoyue001's workaround above]

Running this same example, I initially received the error below when compiling with edgetpu_compiler output_tflite_graph.tflite:

Edge TPU Compiler version 2.0.291256449

Model compiled successfully in 276 ms.

Input model: output_tflite_graph.tflite
Input size: 5.34MiB
Output model: output_tflite_graph_edgetpu.tflite
Output size: 5.75MiB
On-chip memory available for caching model parameters: 7.62MiB
On-chip memory used for caching model parameters: 5.66MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 64
Operation log: output_tflite_graph_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 63
Number of operations that will run on CPU: 1
See the operation log file for individual operation details.
Error opening file for writing: output_tflite_graph_edgetpu.tflite

Internal compiler error. Aborting!

But I was able to get around it by running with sudo, which gives the following output:

Edge TPU Compiler version 2.0.291256449

Model compiled successfully in 341 ms.

Input model: output_tflite_graph.tflite
Input size: 5.34MiB
Output model: output_tflite_graph_edgetpu.tflite
Output size: 5.75MiB
On-chip memory available for caching model parameters: 7.62MiB
On-chip memory used for caching model parameters: 5.66MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 64
Operation log: output_tflite_graph_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 63
Number of operations that will run on CPU: 1
See the operation log file for individual operation details.

Note: I didn't have to move the files around as mentioned in the previous post.
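The "Error opening file for writing" line in the first log suggests this was a plain file-permission problem in the working directory. An alternative to sudo (a sketch, assuming the compiler's -o/--out_dir option for choosing the output directory) would be:

# write the compiled model to a directory the current user owns
edgetpu_compiler -o ~/edgetpu_out output_tflite_graph.tflite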

@anilsathyan7 commented Feb 19, 2020

I'm having the same issue with a tflite model that contains a transpose convolution. Tensorflow 1.x does not seem to support transpose convolution, and with the latest tf2.0-nightly the quantized tflite model gives the error 'Internal compiler error. Aborting!'. It would be helpful if the compiler printed the exact cause of the failure, i.e. whether some operator or operator version is not supported. It seems to work up to some convolutional layer (602) and produces the compiler error once that layer is included.

pnet_test.tflite.zip

@tekotan commented May 5, 2020

> [quoting @anilsathyan7's comment above]

I am also having a similar issue: I have a custom model which uses transpose convolution that I want to compile for the Edge TPU. Was there any solution?

@danieldanuega

> [quoting @anilsathyan7's comment above]

Have you tried edgetpu_compiler -s your_tflite_graph.tflite?
It prints out the operation log.

@tensorflowbutler (Member)

Hi there,

We are checking to see if you still need help on this, as you are using an older version of TensorFlow which is officially considered end of life. We recommend that you upgrade to the latest 2.x version and let us know if the issue still persists in newer versions. Please open a new issue for any help you need against 2.x, and we will get you the right help.

This issue will be closed automatically 7 days from now. If you still need help with this issue, please provide us with more information.
