
TF Lite conversion of minimal graph with tf.matmul fails on Linux but works on MacOS #27640

Open
sklampfl opened this issue Apr 8, 2019 · 11 comments

Comments

@sklampfl commented Apr 8, 2019

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    16.04.6 LTS (GNU/Linux 4.15.0-47-generic x86_64) and macOS Mojave Version 10.14.4
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): pip install tf-nightly
  • TensorFlow version (use command below): 1.14.1-dev20190408
  • Python version: 2.7.16 (Mac) and 2.7.12 (Linux)
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the current behavior
The example code below creates a minimal TensorFlow graph that computes a tf.matmul between two input matrices and exports the graph to TensorFlow Lite from the current session via the Python API. It invokes the TF Lite Interpreter on example input and compares the output to the result of session.run.

The code works on Mac, but fails on Linux during TF Lite conversion (see logs below).

Describe the expected behavior
It should work (or at least behave the same) on both operating systems.

I know that the TF Lite Operator Compatibility states:

tf.matmul - as long as the second argument is constant and transposition is not used

That condition is not met here, but it is still curious that the conversion works on Mac.
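For reference, the batched-matmul semantics this graph relies on can be reproduced in NumPy alone (a sketch independent of TensorFlow, using the same shapes and seed as the script below):

```python
import numpy as np

# Reference semantics of the graph: on 3-D inputs, matmul is batched
# over the leading dimension, so (2, 3, 4) x (2, 4, 5) -> (2, 3, 5).
np.random.seed(1234)
a = np.random.rand(2, 3, 4).astype(np.float32)
b = np.random.rand(2, 4, 5).astype(np.float32)

c = np.matmul(a, b)  # c[i] = a[i] @ b[i]
print(c.shape)       # → (2, 3, 5)
```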

Code to reproduce the issue

import numpy
import tensorflow as tf

def export_tflite_from_session(session, input_nodes, output_nodes, tflite_filename):
    print("Converting to tflite...")
    converter = tf.lite.TFLiteConverter.from_session(session, input_nodes, output_nodes)
    tflite_model = converter.convert()
    with open(tflite_filename, "wb") as f:
        f.write(tflite_model)
    print("Converted %s." % tflite_filename)

def test_tflite_model(tflite_filename, examples):
    print("Loading TFLite interpreter for %s..." % tflite_filename)
    interpreter = tf.lite.Interpreter(model_path=tflite_filename)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    print("input details: %s" % input_details)
    print("output details: %s" % output_details)

    for i, input_tensor in enumerate(input_details):
        interpreter.set_tensor(input_tensor['index'], examples[i])
    interpreter.invoke()
    model_output = []
    for i, output_tensor in enumerate(output_details):
        model_output.append(interpreter.get_tensor(output_tensor['index']))
    return model_output

def main():
    tflite_filename = "model.tflite"
    shape_a = (2, 3, 4)
    shape_b = (2, 4, 5)

    a = tf.placeholder(dtype=tf.float32, shape=shape_a, name="A")
    b = tf.placeholder(dtype=tf.float32, shape=shape_b, name="B")
    c = tf.matmul(a, b, name="output")

    numpy.random.seed(1234)
    a_ = numpy.random.rand(*shape_a).astype(numpy.float32)
    b_ = numpy.random.rand(*shape_b).astype(numpy.float32)
    with tf.Session() as session:
        session_output = session.run(c, feed_dict={a: a_, b: b_})
        export_tflite_from_session(session, [a, b], [c], tflite_filename)

    tflite_output = test_tflite_model(tflite_filename, [a_, b_])
    tflite_output = tflite_output[0]

    print("Input example:")
    print(a_)
    print(a_.shape)
    print(b_)
    print(b_.shape)
    print("Session output:")
    print(session_output)
    print(session_output.shape)
    print("TFLite output:")
    print(tflite_output)
    print(tflite_output.shape)
    print(numpy.allclose(session_output, tflite_output))

if __name__ == '__main__':
    main()

Other info / logs
Output on Mac:

2019-04-08 14:46:05.835019: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Converting to tflite...
2019-04-08 14:46:05.837757: I tensorflow/core/grappler/devices.cc:53] Number of eligible GPUs (core count >= 8): 0 (Note: TensorFlow was not compiled with CUDA support)
2019-04-08 14:46:05.837803: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-04-08 14:46:05.839940: I tensorflow/core/grappler/devices.cc:53] Number of eligible GPUs (core count >= 8): 0 (Note: TensorFlow was not compiled with CUDA support)
2019-04-08 14:46:05.839979: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
Converted model.tflite.
Loading TFLite interpreter for model.tflite...
INFO: Initialized TensorFlow Lite runtime.
input details: [{'index': 0, 'shape': array([2, 3, 4], dtype=int32), 'quantization': (0.0, 0L), 'name': 'A', 'dtype': <type 'numpy.float32'>}, {'index': 1, 'shape': array([2, 4, 5], dtype=int32), 'quantization': (0.0, 0L), 'name': 'B', 'dtype': <type 'numpy.float32'>}]
output details: [{'index': 2, 'shape': array([2, 3, 5], dtype=int32), 'quantization': (0.0, 0L), 'name': 'output', 'dtype': <type 'numpy.float32'>}]
Input example:
[[[0.19151945 0.62210876 0.43772775 0.7853586 ]
  [0.77997583 0.2725926  0.27646425 0.8018722 ]
  [0.95813936 0.87593263 0.35781726 0.5009951 ]]

 [[0.6834629  0.71270204 0.37025076 0.5611962 ]
  [0.50308317 0.01376845 0.7728266  0.8826412 ]
  [0.364886   0.6153962  0.07538124 0.368824  ]]]
(2, 3, 4)
[[[0.9331401  0.65137815 0.39720258 0.78873014 0.31683612]
  [0.56809866 0.8691274  0.4361734  0.8021476  0.14376682]
  [0.70426095 0.7045813  0.21879211 0.92486763 0.44214076]
  [0.90931594 0.05980922 0.18428709 0.04735528 0.6748809 ]]

 [[0.59462476 0.5333102  0.04332406 0.5614331  0.32966843]
  [0.5029668  0.11189432 0.6071937  0.5659447  0.00676406]
  [0.6174417  0.9121229  0.7905241  0.99208146 0.95880175]
  [0.7919641  0.28525096 0.62491673 0.4780938  0.19567518]]]
(2, 4, 5)
Session output:
[[[1.5545473  1.0208298  0.58792216 1.0921113  0.87367964]
  [1.8065444  0.98772776 0.636969   1.1275158  0.94971865]
  [2.0992541  1.6674836  0.93324846 1.812999   0.9258208 ]]

 [[1.437925   0.942041   1.1057518  1.422692   0.69494617]
  [1.4822663  1.226527   1.1926711  1.4789319  1.0796423 ]
  [0.86513305 0.43742114 0.679548   0.8042561  0.26889932]]]
(2, 3, 5)
TFLite output:
[[[1.5545473  1.0208298  0.58792216 1.0921113  0.87367964]
  [1.8065444  0.98772776 0.636969   1.1275158  0.94971865]
  [2.0992541  1.6674836  0.93324846 1.812999   0.9258208 ]]

 [[1.437925   0.942041   1.1057518  1.422692   0.69494617]
  [1.4822663  1.226527   1.1926711  1.4789319  1.0796423 ]
  [0.86513305 0.43742114 0.679548   0.8042561  0.26889932]]]
(2, 3, 5)
True

Output on Linux:

2019-04-08 14:47:09.730317: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-08 14:47:09.734305: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-04-08 14:47:10.718760: E tensorflow/stream_executor/cuda/cuda_driver.cc:320] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-04-08 14:47:10.718805: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:166] retrieving CUDA diagnostic information for host: everest6
2019-04-08 14:47:10.718811: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:173] hostname: everest6
2019-04-08 14:47:10.718867: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:197] libcuda reported version is: 410.104.0
2019-04-08 14:47:10.718890: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:201] kernel reported version is: 410.104.0
2019-04-08 14:47:10.718896: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:308] kernel version seems to match DSO: 410.104.0
2019-04-08 14:47:10.737178: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4200000000 Hz
2019-04-08 14:47:10.737608: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x52d4340 executing computations on platform Host. Devices:
2019-04-08 14:47:10.737622: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-08 14:47:10.738962: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1288] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
Converting to tflite...
2019-04-08 14:47:10.739692: I tensorflow/core/grappler/devices.cc:50] Number of eligible GPUs (core count >= 8): 0
2019-04-08 14:47:10.739747: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-04-08 14:47:10.741001: I tensorflow/core/grappler/devices.cc:50] Number of eligible GPUs (core count >= 8): 0
2019-04-08 14:47:10.741033: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
Traceback (most recent call last):
  File "minimal_tflite_test.py", line 67, in <module>
    main()
  File "minimal_tflite_test.py", line 47, in main
    export_tflite_from_session(session, [a, b], [c], tflite_filename)
  File "minimal_tflite_test.py", line 9, in export_tflite_from_session
    tflite_model = converter.convert()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/lite/python/lite.py", line 742, in convert
    **converter_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/lite/python/convert.py", line 410, in toco_convert_impl
    input_data.SerializeToString())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/lite/python/convert.py", line 176, in toco_convert_protos
    "TOCO failed. See console for info.\n%s\n%s\n" % (stdout, stderr))
tensorflow.lite.python.convert.ConverterError: TOCO failed. See console for info.
2019-04-08 14:47:11.490702: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before Removing unused ops: 1 operators, 3 arrays (0 quantized)
2019-04-08 14:47:11.490766: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before general graph transformations: 1 operators, 3 arrays (0 quantized)
2019-04-08 14:47:11.490876: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] After general graph transformations pass 1: 11 operators, 25 arrays (0 quantized)
2019-04-08 14:47:11.490918: F tensorflow/lite/toco/graph_transformations/propagate_fixed_sizes.cc:1800] Check failed: axis >= 0 (-323499096 vs. 0)
Aborted (core dumped)

The error occurs during TF Lite conversion. The erroneous axis value (-323499096) differs on every run of the script; when it happens to be positive, the error is instead:

2019-04-08 15:14:43.877396: F tensorflow/lite/toco/graph_transformations/propagate_fixed_sizes.cc:1801] Check failed: axis < input_shape.dimensions_count() (539352088 vs. 3)
@amitsrivastava78 (Contributor) commented Apr 11, 2019

@sklampfl, I ran your code and it works fine on my Ubuntu 18.04.2 LTS with Python 3.6.7 and TensorFlow 1.13.1. To me this looks like a configuration issue.

Regards
Amit

@sklampfl (Author) commented Apr 11, 2019

@amitsrivastava78 Thank you for checking this out! I also have the same problem with Python 3 (3.5.2) and TensorFlow 1.13.1 (with or without GPU). Do you have a specific configuration issue in mind?

Since the error occurs at the C++ level, I suspect an issue with system libraries rather than Python dependencies.

@sklampfl (Author) commented Apr 11, 2019

I quickly created a small Google Cloud instance with Ubuntu 18.04.2 LTS, Python 3.6.7, and tensorflow 1.13.1, and ran into the same issue there.

@amitsrivastava78 do you have any specific libraries installed?

@amitsrivastava78 (Contributor) commented Apr 11, 2019

@sklampfl, apart from this I just have CUDA 9.2 installed on my system, which I don't think should be the issue. Try the g++ version below:
g++ (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

Regards
Amit

@sklampfl (Author) commented Apr 15, 2019

I don't think g++ is relevant either, since I installed TensorFlow via pip, but for the sake of completeness this is the output of g++ --version on both systems:
MacOS: Apple LLVM version 10.0.1 (clang-1001.0.46.3) Target: x86_64-apple-darwin18.5.0
Linux: g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
On the cloud instance, no g++ was installed.

@amitsrivastava78 (Contributor) commented Apr 16, 2019

@sklampfl, we can try one more thing: let's build TensorFlow from source (and install it), then run the convert_test.py cases to check whether you still face the same problem. If those pass, try running the example code you posted. I am pasting my configuration file (.tf_configure.bazelrc) below; you already know my Python and TensorFlow configuration:

build --action_env PYTHON_BIN_PATH="/usr/bin/python"
build --action_env PYTHON_LIB_PATH="/usr/lib/python3/dist-packages"
build --python_path="/usr/bin/python"
build:xla --define with_xla_support=true
build --action_env TF_NEED_OPENCL_SYCL="0"
build --action_env TF_NEED_ROCM="0"
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda"
build --action_env TF_CUDA_VERSION="9.2"
build --action_env CUDNN_INSTALL_PATH="/usr/lib/x86_64-linux-gnu"
build --action_env TF_CUDNN_VERSION="7"
build --action_env TF_NCCL_VERSION=""
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="6.1"
build --action_env LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64:/usr/local/cuda-9.2/extras/CUPTI/lib64:"
build --action_env TF_CUDA_CLANG="0"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
build --config=cuda
test --config=cuda
build:opt --copt=-march=native
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
build:v2 --define=tf_api_version=2
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_tag_filters=-benchmark-test,-no_oss,-oss_serial
test --build_tag_filters=-benchmark-test,-no_oss
test --test_tag_filters=-gpu
test --build_tag_filters=-gpu
build --action_env TF_CONFIGURE_IOS="0"

Regards
Amit

@sklampfl (Author) commented Apr 18, 2019

@amitsrivastava78 Thank you for investing your time in this.

I managed to get it to work on a fresh Ubuntu 18.04 instance, with Python 3.6.7 and TensorFlow compiled from source (both r1.13 and master, with gcc 7.3.0).

It still does not work on my original Ubuntu 16.04 machine, even if I compile TensorFlow from source (Python 2 or Python 3). I have not yet figured out which configuration breaks it (e.g. gcc version? other libraries? exact Python version?).

I can work with Ubuntu 18.04 for now, but I think it is still a bug that some TFLite conversions only work under certain conditions.

PS: Thank you for pointing out the convert_test.py script. These test cases are always successful for me, regardless of whether the minimal test I posted fails or not.

@amitsrivastava78 (Contributor) commented Apr 18, 2019

@sklampfl , Great to hear that things have worked out well on the newer version of Ubuntu for you.

Cheers!

Regards
Amit

@ymodak ymodak added type:bug and removed type:support labels May 2, 2019

@ymodak ymodak assigned haozha111 and unassigned ymodak May 2, 2019

@Oktai15 commented Jul 26, 2019

@amitsrivastava78 @haozha111 I have the same problem on my machine: Debian GNU/Linux 9 (stretch), gcc 4.9.0 / gcc 6.3.0, with TensorFlow 2.0.

@Oktai15 commented Jul 26, 2019

Small code for reproducing:

import numpy as np
import tensorflow as tf

root = tf.train.Checkpoint()
root.f = tf.function(lambda x, y: tf.matmul(x, y))

new_input_data = np.random.randn(2, 100, 100, 3).astype(np.float32)
new_w = np.random.randn(2, 1, 3, 3).astype(np.float32)

input_data = tf.convert_to_tensor(new_input_data)
input_w = tf.convert_to_tensor(new_w)

concrete_func = root.f.get_concrete_function(input_data, input_w)

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
tflite_model = converter.convert()

Log:

ConverterError: TOCO failed. See console for info.
2019-07-26 11:45:58.103201: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before Removing unused ops: 2 operators, 4 arrays (0 quantized)
2019-07-26 11:45:58.103298: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] Before general graph transformations: 2 operators, 4 arrays (0 quantized)
2019-07-26 11:45:58.120208: I tensorflow/lite/toco/graph_transformations/graph_transformations.cc:39] After general graph transformations pass 1: 410 operators, 823 arrays (0 quantized)
2019-07-26 11:45:58.122977: F tensorflow/lite/toco/graph_transformations/propagate_fixed_sizes.cc:1812] Check failed: axis >= 0 (-1015774856 vs. 0)
Fatal Python error: Aborted

Do we have any workaround now?
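One direction worth trying (untested against this particular converter bug, so consider it a sketch) is to avoid the batched MatMul op entirely by emitting per-batch 2-D matmuls and stacking the results, which is closer to the form the op-compatibility notes describe. The NumPy snippet below only demonstrates that this decomposition is numerically equivalent to the batched form; in TF graph code the analogous rewrite would use tf.unstack/tf.stack around 2-D tf.matmul calls.

```python
import numpy as np

# Sketch: a batched matmul over a leading batch dimension can be
# decomposed into explicit per-batch 2-D matmuls. Shown in NumPy to
# verify the two forms are numerically equivalent.
np.random.seed(0)
a = np.random.randn(2, 3, 4).astype(np.float32)
b = np.random.randn(2, 4, 5).astype(np.float32)

batched = np.matmul(a, b)  # one batched op: batched[i] = a[i] @ b[i]
per_batch = np.stack([a[i] @ b[i] for i in range(a.shape[0])])

print(np.allclose(batched, per_batch))  # → True
```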

@paulbauriegel commented Jul 31, 2019

@Oktai15 I'm running Ubuntu 19.04, and building from source did solve the problem on my machine.
