
C++ API producing incorrect model metaparams #34277

Closed
DocDriven opened this issue Nov 14, 2019 · 29 comments
Labels: comp:lite (TF Lite related issues), stale, stat:awaiting response (awaiting response from author), TF 2.0, type:bug

@DocDriven

DocDriven commented Nov 14, 2019

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below): 2.0 and 1.15
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the current behavior

An autoencoder model consisting only of standard keras Dense layers is converted into a tflite model. This model can be loaded and inspected with the Python API. The output there is consistent with the output from the visualize.py script.

Input detail:  {'name': 'input_1', 'index': 1, 'shape': array([ 1, 90], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}
Output detail:  {'name': 'Identity', 'index': 0, 'shape': array([ 1, 90], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}

When loading the very same model with the C++ API, I get ridiculously large values for the number of inputs/outputs/nodes.

The C++ functions that were used to inspect the model are:

std::unique_ptr<tflite::Interpreter> interpreter = BuildInterpreter(*model);

LOG(INFO) << "tensors size: " << interpreter->tensors_size() << std::endl;
LOG(INFO) << "nodes size: " << interpreter->nodes_size() << std::endl;
LOG(INFO) << "inputs: " << interpreter->inputs().size() << std::endl;
LOG(INFO) << "input(0) name: " << interpreter->GetInputName(0) << std::endl;
LOG(INFO) << "outputs: " << interpreter->outputs().size() << std::endl;
LOG(INFO) << "output(0) name: " << interpreter->GetOutputName(0) << std::endl;

int t_size = interpreter->tensors_size();
for (int i = 0; i < t_size; i++) {
  LOG(INFO) << i << ": " << interpreter->tensor(i)->name << ", " 
            << interpreter->tensor(i)->bytes << ", "
            << interpreter->tensor(i)->type << ", "
            << interpreter->tensor(i)->params.scale << ", "
            << interpreter->tensor(i)->params.zero_point << std::endl;
}
std::cout << "End of test" << std::endl;

This produces the following output:

tensors size: 21
nodes size: 11936128518282651046
inputs: 25344
input(0) name: Identity
outputs: 18446744073709501604
output(0) name: Identity
0: Identity, 360, 1, 0, 0
1: input_1, 360, 1, 0, 0
2: model/dense/MatMul/ReadVariableOp/transpose, 3600, 9, 0.00187181, 0
3: model/dense/MatMul_bias, 160, 1, 0, 0
4: model/dense/Relu, 160, 1, 0, 0
5: model/dense_1/MatMul/ReadVariableOp/transpose, 1600, 1, 0, 0
6: model/dense_1/MatMul_bias, 40, 1, 0, 0
7: model/dense_1/Relu, 40, 1, 0, 0
8: model/dense_2/MatMul/ReadVariableOp/transpose, 1600, 1, 0, 0
9: model/dense_2/MatMul_bias, 160, 1, 0, 0
10: model/dense_2/Relu, 160, 1, 0, 0
11: model/dense_3/MatMul/ReadVariableOp/transpose, 3600, 9, 0.00208381, 0
12: model/dense_3/MatMul_bias, 360, 1, 0, 0
13: End of test

The code to create the tflite model to inspect can be found in my repo (https://github.com/DocDriven/tflite-cpp-api-tests). All relevant files are named simple_ae.*.

I suspect the C++ API is broken at some point, as the models themselves seem to be fine. The results are the same for TF 1.x and TF 2.0. Trying different models yields the exact same ridiculous values, independent of their size.

@oanush oanush self-assigned this Nov 15, 2019
@oanush oanush added comp:lite TF Lite related issues TF 2.0 Issues relating to TensorFlow 2.0 type:bug Bug labels Nov 15, 2019
@oanush oanush assigned liyunlu0618 and unassigned oanush Nov 15, 2019
@oanush oanush added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 15, 2019
@DocDriven
Author

@liyunlu0618 : Upon further investigation of this problem, I noticed that the number of detected nodes, inputs, and outputs is independent of the model architecture. Varying the input/output size and/or the number of layers does not affect the output of interpreter->nodes_size(), interpreter->inputs().size(), and interpreter->outputs().size(). The number of tensors does change, but the hidden-tensors phenomenon remains.

@liyunlu0618 liyunlu0618 assigned jdduke and unassigned liyunlu0618 Nov 16, 2019
@jdduke
Member

jdduke commented Nov 16, 2019

This is expected behavior: the tensors() array returned in C++ is the full list of all tensors. You can use the inputs() indices to inspect/use/query just the inputs, and likewise with outputs().
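
For reference, a minimal sketch of inspecting only the declared inputs/outputs through those indices (the model-loading boilerplate here is assumed, not the reporter's exact setup):

#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

void InspectIo(const char* path) {
  auto model = tflite::FlatBufferModel::BuildFromFile(path);
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // inputs()/outputs() hold indices into the full tensors() array.
  for (int i : interpreter->inputs()) {
    const TfLiteTensor* t = interpreter->tensor(i);
    printf("input  %d: %s, %zu bytes\n", i, t->name, t->bytes);
  }
  for (int i : interpreter->outputs()) {
    const TfLiteTensor* t = interpreter->tensor(i);
    printf("output %d: %s, %zu bytes\n", i, t->name, t->bytes);
  }
}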

@jdduke jdduke closed this as completed Nov 16, 2019

@DocDriven
Author

@jdduke
Correct me if I am wrong, but I am already using the inputs()/outputs() methods, and they do not report a reasonable size (1.8e19 outputs and 2.5e4 inputs). It is hard to believe that this is expected behavior for a model with 90 floats as inputs/outputs.
Please reopen the issue.

@jdduke
Member

jdduke commented Nov 19, 2019

Ah, sorry, I didn't notice the output values from your first post. Can I ask how you're building the C++ API? Are you building it as a shared library and then using it in your own app? Does your model work with our minimal C++ example?

@DocDriven
Author

No worries. I am using the devel-py3 docker image to generate the libtensorflowlite.so library. I built it with the extended runtime because not all of the ops I use are supported by TFLite. I had an issue regarding the build here: #33980

I built your minimal.cc example and executed it with my model. Note that I did not provide inputs or read outputs. Upon inspection, this looks reasonable to me:

INFO: Created TensorFlow Lite delegate for select TF ops.
2019-11-20 11:15:09.648738: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
=== Pre-invoke Interpreter State ===
Interpreter has 37 tensors and 14 nodes
Inputs: 16
Outputs: 13

Tensor   0 dense/BiasAdd        kTfLiteFloat32  kTfLiteArenaRw        160 bytes ( 0.0 MB)  1 40
Tensor   1 dense/LeakyRelu      kTfLiteFloat32  kTfLiteArenaRw        160 bytes ( 0.0 MB)  1 40
Tensor   2 dense/MatMul_bias    kTfLiteFloat32   kTfLiteMmapRo        160 bytes ( 0.0 MB)  40
Tensor   3 dense/kernel/transpose kTfLiteInt8   kTfLiteMmapRo       3600 bytes ( 0.0 MB)  40 90
Tensor   4 dense_1/BiasAdd      kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor   5 dense_1/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo         40 bytes ( 0.0 MB)  10
Tensor   6 dense_1/kernel/transpose kTfLiteFloat32   kTfLiteMmapRo       1600 bytes ( 0.0 MB)  10 40
Tensor   7 dense_2/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo         40 bytes ( 0.0 MB)  10
Tensor   8 dense_2/kernel/transpose kTfLiteFloat32   kTfLiteMmapRo       1600 bytes ( 0.0 MB)  10 40
Tensor   9 dense_3/BiasAdd      kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB) 
Tensor  10 dense_3/LeakyRelu    kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB) 
Tensor  11 dense_3/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo        160 bytes ( 0.0 MB)  40
Tensor  12 dense_3/kernel/transpose kTfLiteFloat32   kTfLiteMmapRo       1600 bytes ( 0.0 MB)  40 10
Tensor  13 dense_4/BiasAdd      kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB) 
Tensor  14 dense_4/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo        360 bytes ( 0.0 MB)  90
Tensor  15 dense_4/kernel/transpose kTfLiteInt8   kTfLiteMmapRo       3600 bytes ( 0.0 MB)  90 40
Tensor  16 input_1              kTfLiteFloat32  kTfLiteArenaRw        360 bytes ( 0.0 MB)  1 90
Tensor  17 lambda/Exp           kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor  18 lambda/add           kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB) 
Tensor  19 lambda/mul           kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor  20 lambda/mul_1         kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB) 
Tensor  21 lambda/random_normal kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB) 
Tensor  22 lambda/random_normal/RandomStandardNormal kTfLiteFloat32  kTfLiteDynamic          4 bytes ( 0.0 MB) 
Tensor  23 lambda/random_normal/mean kTfLiteFloat32   kTfLiteMmapRo          4 bytes ( 0.0 MB) 
Tensor  24 lambda/random_normal/mul kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB) 
Tensor  25 lambda/random_normal/shape kTfLiteInt32   kTfLiteMmapRo          4 bytes ( 0.0 MB)  1
Tensor  26 lambda/random_normal/stddev kTfLiteFloat32   kTfLiteMmapRo          4 bytes ( 0.0 MB) 
Tensor  27 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  28 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  29 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  30 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  31 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  32 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  33 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  34 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  35 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  36 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)

Node   0 Operator Custom Name FlexRandomStandardNormal
  Inputs: 25
  Outputs: 22
Node   1 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 16 3 2
  Outputs: 0
Node   2 Operator Builtin Code  18 MUL
  Inputs: 22 26
  Outputs: 24
Node   3 Operator Builtin Code   0 ADD
  Inputs: 24 23
  Outputs: 21
Node   4 Operator Builtin Code  98 LEAKY_RELU
  Inputs: 0
  Outputs: 1
Node   5 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 1 6 5
  Outputs: 4
Node   6 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 1 8 7
  Outputs: 19
Node   7 Operator Builtin Code  47 EXP
  Inputs: 19
  Outputs: 17
Node   8 Operator Builtin Code  18 MUL
  Inputs: 17 21
  Outputs: 20
Node   9 Operator Builtin Code   0 ADD
  Inputs: 4 20
  Outputs: 18
Node  10 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 18 12 11
  Outputs: 9
Node  11 Operator Builtin Code  98 LEAKY_RELU
  Inputs: 9
  Outputs: 10
Node  12 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 10 15 14
  Outputs: 13
Node  13 Operator Custom Name TfLiteFlexDelegate
  Inputs: 25
  Outputs: 22


=== Post-invoke Interpreter State ===
Interpreter has 37 tensors and 14 nodes
Inputs: 16
Outputs: 13

Tensor   0 dense/BiasAdd        kTfLiteFloat32  kTfLiteArenaRw        160 bytes ( 0.0 MB)  1 40
Tensor   1 dense/LeakyRelu      kTfLiteFloat32  kTfLiteArenaRw        160 bytes ( 0.0 MB)  1 40
Tensor   2 dense/MatMul_bias    kTfLiteFloat32   kTfLiteMmapRo        160 bytes ( 0.0 MB)  40
Tensor   3 dense/kernel/transpose kTfLiteInt8   kTfLiteMmapRo       3600 bytes ( 0.0 MB)  40 90
Tensor   4 dense_1/BiasAdd      kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor   5 dense_1/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo         40 bytes ( 0.0 MB)  10
Tensor   6 dense_1/kernel/transpose kTfLiteFloat32   kTfLiteMmapRo       1600 bytes ( 0.0 MB)  10 40
Tensor   7 dense_2/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo         40 bytes ( 0.0 MB)  10
Tensor   8 dense_2/kernel/transpose kTfLiteFloat32   kTfLiteMmapRo       1600 bytes ( 0.0 MB)  10 40
Tensor   9 dense_3/BiasAdd      kTfLiteFloat32  kTfLiteArenaRw        160 bytes ( 0.0 MB)  1 40
Tensor  10 dense_3/LeakyRelu    kTfLiteFloat32  kTfLiteArenaRw        160 bytes ( 0.0 MB)  1 40
Tensor  11 dense_3/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo        160 bytes ( 0.0 MB)  40
Tensor  12 dense_3/kernel/transpose kTfLiteFloat32   kTfLiteMmapRo       1600 bytes ( 0.0 MB)  40 10
Tensor  13 dense_4/BiasAdd      kTfLiteFloat32  kTfLiteArenaRw        360 bytes ( 0.0 MB)  1 90
Tensor  14 dense_4/MatMul_bias  kTfLiteFloat32   kTfLiteMmapRo        360 bytes ( 0.0 MB)  90
Tensor  15 dense_4/kernel/transpose kTfLiteInt8   kTfLiteMmapRo       3600 bytes ( 0.0 MB)  90 40
Tensor  16 input_1              kTfLiteFloat32  kTfLiteArenaRw        360 bytes ( 0.0 MB)  1 90
Tensor  17 lambda/Exp           kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor  18 lambda/add           kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor  19 lambda/mul           kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor  20 lambda/mul_1         kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 10
Tensor  21 lambda/random_normal kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  10
Tensor  22 lambda/random_normal/RandomStandardNormal kTfLiteFloat32  kTfLiteDynamic         40 bytes ( 0.0 MB)  10
Tensor  23 lambda/random_normal/mean kTfLiteFloat32   kTfLiteMmapRo          4 bytes ( 0.0 MB) 
Tensor  24 lambda/random_normal/mul kTfLiteFloat32  kTfLiteArenaRw         40 bytes ( 0.0 MB)  10
Tensor  25 lambda/random_normal/shape kTfLiteInt32   kTfLiteMmapRo          4 bytes ( 0.0 MB)  1
Tensor  26 lambda/random_normal/stddev kTfLiteFloat32   kTfLiteMmapRo          4 bytes ( 0.0 MB) 
Tensor  27 (null)               kTfLiteInt8  kTfLiteArenaRw         90 bytes ( 0.0 MB)  1 90
Tensor  28 (null)               kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB)  1
Tensor  29 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  30 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  31 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  32 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  33 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  34 (null)               kTfLiteNoType  kTfLiteMemNone          0 bytes ( 0.0 MB)  (null)
Tensor  35 (null)               kTfLiteInt8  kTfLiteArenaRw         40 bytes ( 0.0 MB)  1 40
Tensor  36 (null)               kTfLiteFloat32  kTfLiteArenaRw          4 bytes ( 0.0 MB)  1

Node   0 Operator Custom Name FlexRandomStandardNormal
  Inputs: 25
  Outputs: 22
Node   1 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 16 3 2
  Outputs: 0
Node   2 Operator Builtin Code  18 MUL
  Inputs: 22 26
  Outputs: 24
Node   3 Operator Builtin Code   0 ADD
  Inputs: 24 23
  Outputs: 21
Node   4 Operator Builtin Code  98 LEAKY_RELU
  Inputs: 0
  Outputs: 1
Node   5 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 1 6 5
  Outputs: 4
Node   6 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 1 8 7
  Outputs: 19
Node   7 Operator Builtin Code  47 EXP
  Inputs: 19
  Outputs: 17
Node   8 Operator Builtin Code  18 MUL
  Inputs: 17 21
  Outputs: 20
Node   9 Operator Builtin Code   0 ADD
  Inputs: 4 20
  Outputs: 18
Node  10 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 18 12 11
  Outputs: 9
Node  11 Operator Builtin Code  98 LEAKY_RELU
  Inputs: 9
  Outputs: 10
Node  12 Operator Builtin Code   9 FULLY_CONNECTED
  Inputs: 10 15 14
  Outputs: 13
Node  13 Operator Custom Name TfLiteFlexDelegate
  Inputs: 25
  Outputs: 22

If I can provide any more information, do not hesitate to leave a message. I would like to get this fixed as soon as possible.

@jdduke
Member

jdduke commented Nov 20, 2019

I suspect the issue here may just be due to poor C++ ABI stability when using different compile flags or options across shared library boundaries, particularly if you're using a very different build pipeline for your client app.

I'd be curious whether you see the same issues when using the C API exposed by this shared library, which is a slightly lower-level API (here is an example). We're seeing more and more issues from users trying to use the C++ API across a shared library boundary, and I think we'll probably focus on prioritizing the stable C ABI/API and perhaps offer a lightweight C++ wrapper on top of that for convenience. We definitely appreciate the feedback and want to get this resolved. I'll reopen this issue for tracking purposes.
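
For reference, a minimal sketch of that lower-level C API flow (the include path matches the experimental C target; the file path argument is a placeholder):

#include <cstdio>

#include "tensorflow/lite/experimental/c/c_api.h"

void InspectWithCApi(const char* path) {
  TfLiteModel* model = TfLiteModelCreateFromFile(path);
  TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
  TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

  // Allocate tensors, then query the declared inputs/outputs only.
  TfLiteInterpreterAllocateTensors(interpreter);
  printf("inputs: %d, outputs: %d\n",
         TfLiteInterpreterGetInputTensorCount(interpreter),
         TfLiteInterpreterGetOutputTensorCount(interpreter));

  TfLiteTensor* input = TfLiteInterpreterGetInputTensor(interpreter, 0);
  printf("input(0): %s, %zu bytes\n", TfLiteTensorName(input),
         TfLiteTensorByteSize(input));

  TfLiteInterpreterDelete(interpreter);
  TfLiteInterpreterOptionsDelete(options);
  TfLiteModelDelete(model);
}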

@jdduke jdduke reopened this Nov 20, 2019
@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Nov 21, 2019
@DocDriven
Author

DocDriven commented Nov 25, 2019

@jdduke
I have tried to use the C API as you suggested. I am going to post all the steps I took and ask you to confirm that they are correct.

1.) I tried to build the extended runtime via
bazel build --config=monolithic --define=with_select_tf_ops=true -c opt //tensorflow/lite/experimental/c:libtensorflowlite_c.so
This worked, but the build only took around 8 seconds, which is way too short for the extended runtime. It later turned out that it indeed was NOT built.

2.) I had to add another two libraries to get my project to compile. The first was absl, which I took from here (https://chromium.googlesource.com/external/github.com/abseil/abseil-cpp/+/refs/heads/master/). The other was googletest, which I had to build with bazel first. I used the following command from its root dir:
bazel build //:gtest
This produced both a static and a dynamic library. I did not notice any difference for my app, regardless of which one I used.

3.) I copy-pasted the test code you provided; all I changed was the path to my tflite model. It is the same model that generated the output above. The build was successful.

However, I get the following unsatisfying output when running the executable:

[==========] Running 10 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CAPI
[ RUN      ] CAPI.Version
[       OK ] CAPI.Version (0 ms)
[----------] 1 test from CAPI (0 ms total)

[----------] 9 tests from CApiSimple
[ RUN      ] CApiSimple.Smoke
ERROR: Regular TensorFlow ops are not supported by this interpreter. Make sure you apply/link the Flex delegate before inference.
ERROR: Node number 0 (FlexRandomStandardNormal) failed to prepare.

./capi_test.c:30: Failure
Expected equality of these values:
  TfLiteInterpreterAllocateTensors(interpreter)
    Which is: 1
  kTfLiteOk
    Which is: 0
[  FAILED  ] CApiSimple.Smoke (0 ms)
[ RUN      ] CApiSimple.QuantizationParams
ERROR: Regular TensorFlow ops are not supported by this interpreter. Make sure you apply/link the Flex delegate before inference.
ERROR: Node number 0 (FlexRandomStandardNormal) failed to prepare.

./capi_test.c:100: Failure
Expected equality of these values:
  TfLiteInterpreterAllocateTensors(interpreter)
    Which is: 1
  kTfLiteOk
    Which is: 0
[  FAILED  ] CApiSimple.QuantizationParams (0 ms)
[ RUN      ] CApiSimple.Delegate
ERROR: Regular TensorFlow ops are not supported by this interpreter. Make sure you apply/link the Flex delegate before inference.
ERROR: Node number 0 (FlexRandomStandardNormal) failed to prepare.

./capi_test.c:163: Failure
Value of: delegate_prepared
  Actual: false
Expected: true
Speicherzugriffsfehler (Speicherabzug geschrieben)

The last line is German (my mother tongue); it is the German equivalent of "Segmentation fault (core dumped)", i.e. the program tried to access memory it is not allowed to access. Also, the extended runtime apparently was not built.

Can you deduce from this log where the problem might be coming from? Thanks!

@jdduke
Member

jdduke commented Nov 25, 2019

This worked, but the build only took around 8 seconds. This is way too short to build the extended runtime. Later, it turned out that it indeed was NOT built.

We actually deprecated that build flag a while back. Let me see about re-introducing it in a more targeted fashion for the C/C++ shared libraries.

@DocDriven
Author

Is there any possibility of building the C library with native TensorFlow operators enabled? Or do I have to roll back to an image that still has the flag?

@jdduke
Member

jdduke commented Nov 26, 2019

If you want to try locally, you can roll back e57a567#diff-866c5e896c5bfd544d4e642ed2e3d2bd, and try again with your build command.

@DocDriven
Author

DocDriven commented Nov 26, 2019

I was able to build it by rolling back to the parent commit of e57a567. I slightly modified the expected outputs here and there, but failed to work out the correct values for some of them.

The main thing that bothers me is the retrieval of the outputs. For example, take line 67, which should return the byte size of the output tensor. I expect this to be 90 * sizeof(float) = 360 bytes, but I get 0 bytes.
When inspecting the model with visualize.py, I see that the detected output vector dense_4/BiasAdd has shape None. So maybe something goes wrong during the conversion?

HTML-Visualization
vae.tar.gz

Also, the executable dies at the same point (with a little more information this time):

[==========] Running 10 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CAPI
[ RUN      ] CAPI.Version
[       OK ] CAPI.Version (0 ms)
[----------] 1 test from CAPI (0 ms total)

[----------] 9 tests from CApiSimple
[ RUN      ] CApiSimple.Smoke
INFO: Created TensorFlow Lite delegate for select TF ops.
2019-11-26 14:28:56.579157: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
./capi_test.c:66: Failure
Expected equality of these values:
  TfLiteTensorDim(output_tensor, 0)
    Which is: 0
  2
./capi_test.c:67: Failure
Expected equality of these values:
  TfLiteTensorByteSize(output_tensor)
    Which is: 0
  sizeof(float) * 90
    Which is: 360
./capi_test.c:68: Failure
Expected: (TfLiteTensorData(output_tensor)) != (nullptr), actual: NULL vs (nullptr)
./capi_test.c:79: Failure
Expected equality of these values:
  TfLiteTensorCopyToBuffer(output_tensor, output.data(), output.size() * sizeof(float))
    Which is: 1
  kTfLiteOk
    Which is: 0
[  FAILED  ] CApiSimple.Smoke (11 ms)
[ RUN      ] CApiSimple.QuantizationParams
./capi_test.c:110: Failure
Expected equality of these values:
  input_params.scale
    Which is: 0
  0.003922f
    Which is: 0.003922
./capi_test.c:116: Failure
Expected equality of these values:
  TfLiteTensorCopyFromBuffer(input_tensor, input.data(), input.size() * sizeof(float))
    Which is: 1
  kTfLiteOk
    Which is: 0
[  FAILED  ] CApiSimple.QuantizationParams (1 ms)
[ RUN      ] CApiSimple.Delegate
ERROR: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
./capi_test.c:163: Failure
Value of: delegate_prepared
  Actual: false
Expected: true
Speicherzugriffsfehler (Speicherabzug geschrieben)

gdb provides a little more information:

Thread 1 "capi_test" received signal SIGSEGV, Segmentation fault.
0x00007ffff1a58151 in TfLiteInterpreterInvoke ()
   from /home/docdriven/projects/custom_op_test/src/libtensorflowlite_c.so

@jdduke
Member

jdduke commented Nov 26, 2019

The delegate test failure is more or less expected, though it shouldn't seg fault. In your test, did you explicitly call TfLiteInterpreterAllocateTensors before checking the output tensor data? The output from the minimal example above looks correct, with the output tensor having dimension [1, 90] and non-null bytes allocated.

@DocDriven
Author

I do not know if this happens under the hood, but my test is the exact file that you provided. The whole test case is:

TEST(CApiSimple, Smoke) {
  TfLiteModel* model =
      TfLiteModelCreateFromFile("vae_.tflite");
  ASSERT_NE(model, nullptr);

  TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
  ASSERT_NE(options, nullptr);
  TfLiteInterpreterOptionsSetNumThreads(options, 2);

  TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
  ASSERT_NE(interpreter, nullptr);

  // The options/model can be deleted immediately after interpreter creation.
  TfLiteInterpreterOptionsDelete(options);
  TfLiteModelDelete(model);

  ASSERT_EQ(TfLiteInterpreterAllocateTensors(interpreter), kTfLiteOk);
  ASSERT_EQ(TfLiteInterpreterGetInputTensorCount(interpreter), 1);
  ASSERT_EQ(TfLiteInterpreterGetOutputTensorCount(interpreter), 1);

  std::array<int, 1> input_dims = {2};
  ASSERT_EQ(TfLiteInterpreterResizeInputTensor(
                interpreter, 0, input_dims.data(), input_dims.size()),
            kTfLiteOk);
  ASSERT_EQ(TfLiteInterpreterAllocateTensors(interpreter), kTfLiteOk);

  TfLiteTensor* input_tensor = TfLiteInterpreterGetInputTensor(interpreter, 0);
  ASSERT_NE(input_tensor, nullptr);
  EXPECT_EQ(TfLiteTensorType(input_tensor), kTfLiteFloat32);
  EXPECT_EQ(TfLiteTensorNumDims(input_tensor), 1);
  EXPECT_EQ(TfLiteTensorDim(input_tensor, 0), 2);
  EXPECT_EQ(TfLiteTensorByteSize(input_tensor), sizeof(float) * 2);
  EXPECT_NE(TfLiteTensorData(input_tensor), nullptr);
  EXPECT_STREQ(TfLiteTensorName(input_tensor), "input_1");

  TfLiteQuantizationParams input_params =
      TfLiteTensorQuantizationParams(input_tensor);
  EXPECT_EQ(input_params.scale, 0.f);
  EXPECT_EQ(input_params.zero_point, 0);

  std::array<float, 2> input = {1.f, 3.f};
  ASSERT_EQ(TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                                       input.size() * sizeof(float)),
            kTfLiteOk);

  ASSERT_EQ(TfLiteInterpreterInvoke(interpreter), kTfLiteOk);

  const TfLiteTensor* output_tensor =
      TfLiteInterpreterGetOutputTensor(interpreter, 0);
  ASSERT_NE(output_tensor, nullptr);
  EXPECT_EQ(TfLiteTensorType(output_tensor), kTfLiteFloat32);
  EXPECT_EQ(TfLiteTensorNumDims(output_tensor), 2);
  EXPECT_EQ(TfLiteTensorDim(output_tensor, 0), 2);
  EXPECT_EQ(TfLiteTensorByteSize(output_tensor), sizeof(float) * 90);
  EXPECT_NE(TfLiteTensorData(output_tensor), nullptr);
  EXPECT_STREQ(TfLiteTensorName(output_tensor), "dense_4/BiasAdd");

  TfLiteQuantizationParams output_params =
      TfLiteTensorQuantizationParams(output_tensor);
  EXPECT_EQ(output_params.scale, 0.f);
  EXPECT_EQ(output_params.zero_point, 0);

  std::array<float, 90> output;
  ASSERT_EQ(TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                                     output.size() * sizeof(float)),
            kTfLiteOk);
  EXPECT_EQ(output[0], 3.f);
  EXPECT_EQ(output[1], 9.f);

  TfLiteInterpreterDelete(interpreter);
}

So TfLiteInterpreterAllocateTensors gets called twice. Is this correct, and when exactly is allocation necessary?

@jdduke
Member

jdduke commented Nov 26, 2019

That looks correct. Whenever an input tensor (or tensors) is resized, an explicit allocation is required. I'll try to repro locally.
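
A minimal sketch of that rule with the C API (the [1, 90] shape and names are placeholders, not taken from the thread's test code): any resize must be followed by another allocation before copying data or invoking.

#include "tensorflow/lite/experimental/c/c_api.h"

TfLiteStatus ResizeAndRun(TfLiteInterpreter* interpreter, const float* data) {
  int dims[2] = {1, 90};
  if (TfLiteInterpreterResizeInputTensor(interpreter, 0, dims, 2) != kTfLiteOk)
    return kTfLiteError;
  // Re-allocate after the resize, before touching any tensor data.
  if (TfLiteInterpreterAllocateTensors(interpreter) != kTfLiteOk)
    return kTfLiteError;
  TfLiteTensor* input = TfLiteInterpreterGetInputTensor(interpreter, 0);
  if (TfLiteTensorCopyFromBuffer(input, data, 90 * sizeof(float)) != kTfLiteOk)
    return kTfLiteError;
  return TfLiteInterpreterInvoke(interpreter);
}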

@jdduke
Member

jdduke commented Nov 26, 2019

Ah, right, so that resize call is kind of nonsensical for this graph. If you remove

 std::array<int, 1> input_dims = {2};
  ASSERT_EQ(TfLiteInterpreterResizeInputTensor(
                interpreter, 0, input_dims.data(), input_dims.size()),
            kTfLiteOk);
  ASSERT_EQ(TfLiteInterpreterAllocateTensors(interpreter), kTfLiteOk);

from the test, it should proceed. Note that some of the other expectations are bogus for that vae.tflite model, but you should at least get meaningful output shapes. This test passes:

TEST(CApiSimple, Smoke) {
  TfLiteModel* model =
      TfLiteModelCreateFromFile("vae.tflite");
  ASSERT_NE(model, nullptr);

  TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
  ASSERT_NE(options, nullptr);
  TfLiteInterpreterOptionsSetNumThreads(options, 2);

  TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
  std::unique_ptr<TfLiteInterpreter, void (*)(TfLiteInterpreter*)>
      interpreter_holder(interpreter, &TfLiteInterpreterDelete);
  ASSERT_NE(interpreter, nullptr);

  // The options/model can be deleted immediately after interpreter creation.
  TfLiteInterpreterOptionsDelete(options);
  TfLiteModelDelete(model);

  ASSERT_EQ(TfLiteInterpreterAllocateTensors(interpreter), kTfLiteOk);
  ASSERT_EQ(TfLiteInterpreterGetInputTensorCount(interpreter), 1);
  ASSERT_EQ(TfLiteInterpreterGetOutputTensorCount(interpreter), 1);

  /*
  std::array<int, 1> input_dims = {2};
  ASSERT_EQ(TfLiteInterpreterResizeInputTensor(
                interpreter, 0, input_dims.data(), input_dims.size()),
            kTfLiteOk);
  ASSERT_EQ(TfLiteInterpreterAllocateTensors(interpreter), kTfLiteOk);
  */

  TfLiteTensor* input_tensor = TfLiteInterpreterGetInputTensor(interpreter, 0);
  ASSERT_NE(input_tensor, nullptr);
  EXPECT_EQ(TfLiteTensorType(input_tensor), kTfLiteFloat32);
  EXPECT_EQ(TfLiteTensorNumDims(input_tensor), 2);
  EXPECT_EQ(TfLiteTensorDim(input_tensor, 0), 1);
  EXPECT_EQ(TfLiteTensorDim(input_tensor, 1), 90);
  EXPECT_EQ(TfLiteTensorByteSize(input_tensor), sizeof(float) * 90);
  EXPECT_NE(TfLiteTensorData(input_tensor), nullptr);
  EXPECT_STREQ(TfLiteTensorName(input_tensor), "input_1");

  TfLiteQuantizationParams input_params =
      TfLiteTensorQuantizationParams(input_tensor);
  EXPECT_EQ(input_params.scale, 0.f);
  EXPECT_EQ(input_params.zero_point, 0);

  std::array<float, 90> input = {1.f};
  ASSERT_EQ(TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                                       input.size() * sizeof(float)),
            kTfLiteOk);

  ASSERT_EQ(TfLiteInterpreterInvoke(interpreter), kTfLiteOk);

  const TfLiteTensor* output_tensor =
      TfLiteInterpreterGetOutputTensor(interpreter, 0);
  ASSERT_NE(output_tensor, nullptr);
  EXPECT_EQ(TfLiteTensorType(output_tensor), kTfLiteFloat32);
  EXPECT_EQ(TfLiteTensorNumDims(output_tensor), 2);
  EXPECT_EQ(TfLiteTensorDim(output_tensor, 0), 1);
  EXPECT_EQ(TfLiteTensorDim(output_tensor, 1), 90);
  EXPECT_EQ(TfLiteTensorByteSize(output_tensor), sizeof(float) * 90);
  EXPECT_NE(TfLiteTensorData(output_tensor), nullptr);
  EXPECT_STREQ(TfLiteTensorName(output_tensor), "dense_4/BiasAdd");

  TfLiteQuantizationParams output_params =
      TfLiteTensorQuantizationParams(output_tensor);
  EXPECT_EQ(output_params.scale, 0.f);
  EXPECT_EQ(output_params.zero_point, 0);

  std::array<float, 90> output;
  ASSERT_EQ(TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                                     output.size() * sizeof(float)),
            kTfLiteOk);
}

@jdduke
Member

jdduke commented Nov 26, 2019

Just one other question: do you really need the random normal generation in your inference graph? We've seen that these random ops are sometimes used (and useful) during training, but they then get carried forward into the inference graph without adding as much value.

@DocDriven
Author

I am tasked with the training and deployment of a variational autoencoder. The model tries to learn the parameters of a distribution as its encoded representation. To generate samples for decoding, samples have to be drawn from said distribution. This is emulated by utilizing the so-called reparameterization trick, which requires generating random numbers from a standard normal distribution N(0, 1).

As far as I can tell, using random numbers is inevitable.

BUT: If you know of another way to do this, I would happily drop the random part.
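
For context, a minimal sketch of that reparameterization trick (illustrative host-side C++; the mu/log_var naming is an assumption, not the model's actual graph): sample eps from N(0, 1) and compute z = mu + exp(0.5 * log_var) * eps. This corresponds to the Exp/Mul/Add nodes visible in the interpreter dump above.

#include <cmath>
#include <random>
#include <vector>

// z = mu + sigma * eps with eps ~ N(0, 1), so the randomness is isolated
// from the learned parameters mu and log_var.
std::vector<float> Reparameterize(const std::vector<float>& mu,
                                  const std::vector<float>& log_var) {
  static std::mt19937 rng{std::random_device{}()};
  std::normal_distribution<float> std_normal(0.0f, 1.0f);  // eps ~ N(0, 1)
  std::vector<float> z(mu.size());
  for (size_t i = 0; i < mu.size(); ++i) {
    const float sigma = std::exp(0.5f * log_var[i]);  // log-variance -> stddev
    z[i] = mu[i] + sigma * std_normal(rng);
  }
  return z;
}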

@jdduke
Member

jdduke commented Nov 27, 2019

I see, good to know. I think we've seen enough use-cases for the RandomStandardNormal op that it's probably time to make it a proper builtin. In the meantime, can I ask what platform you'll be deploying to?

@DocDriven
Author

Your code snippet works for me, so thanks for that. I was also able to find out what causes the SegFault. In the Delegate test, I first added the interpreter holder from your example, but the SegFault still occurred. It turns out that when I also add ASSERT_NE(interpreter, nullptr) after it, the test no longer crashes, even though that exact assertion fails.

TEST(CApiSimple, Delegate) {
  TfLiteModel* model =
      TfLiteModelCreateFromFile("vae_.tflite");

  // Create and install a delegate instance.
  bool delegate_prepared = false;
  TfLiteDelegate delegate = TfLiteDelegateCreate();
  delegate.data_ = &delegate_prepared;
  delegate.Prepare = [](TfLiteContext* context, TfLiteDelegate* delegate) {
    *static_cast<bool*>(delegate->data_) = true;
    return kTfLiteOk;
  };
  TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
  TfLiteInterpreterOptionsAddDelegate(options, &delegate);
  TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
  std::unique_ptr<TfLiteInterpreter, void (*)(TfLiteInterpreter*)>
      interpreter_holder(interpreter, &TfLiteInterpreterDelete);
  ASSERT_NE(interpreter, nullptr);

  // The delegate should have been applied.
  EXPECT_TRUE(delegate_prepared);

  // Subsequent execution should behave properly (the delegate is a no-op).
  TfLiteInterpreterOptionsDelete(options);
  TfLiteModelDelete(model);

  ASSERT_EQ(TfLiteInterpreterAllocateTensors(interpreter), kTfLiteOk);
  EXPECT_EQ(TfLiteInterpreterInvoke(interpreter), kTfLiteOk);
}

I get the following output now:

[==========] Running 10 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CAPI
[ RUN      ] CAPI.Version
[       OK ] CAPI.Version (0 ms)
[----------] 1 test from CAPI (0 ms total)

[----------] 9 tests from CApiSimple
[ RUN      ] CApiSimple.Smoke
INFO: Created TensorFlow Lite delegate for select TF ops.
2019-11-27 10:57:43.757941: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
[       OK ] CApiSimple.Smoke (12 ms)
[ RUN      ] CApiSimple.QuantizationParams
[       OK ] CApiSimple.QuantizationParams (1 ms)
[ RUN      ] CApiSimple.Delegate
ERROR: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
./capi_test.c:153: Failure
Expected: (interpreter) != (nullptr), actual: NULL vs (nullptr)
[  FAILED  ] CApiSimple.Delegate (1 ms)
[ RUN      ] CApiSimple.DelegateFails
ERROR: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
[       OK ] CApiSimple.DelegateFails (1 ms)
[ RUN      ] CApiSimple.ErrorReporter
Invoke called on model that is not ready.[       OK ] CApiSimple.ErrorReporter (5 ms)
[ RUN      ] CApiSimple.ValidModel
[       OK ] CApiSimple.ValidModel (0 ms)
[ RUN      ] CApiSimple.ValidModelFromFile
[       OK ] CApiSimple.ValidModelFromFile (0 ms)
[ RUN      ] CApiSimple.InvalidModel
ERROR: The model is not a valid Flatbuffer buffer
[       OK ] CApiSimple.InvalidModel (0 ms)
[ RUN      ] CApiSimple.InvalidModelFromFile
ERROR: Could not open 'x.tflite'.
ERROR: The model is not a valid Flatbuffer file
[       OK ] CApiSimple.InvalidModelFromFile (0 ms)
[----------] 9 tests from CApiSimple (20 ms total)

[----------] Global test environment tear-down
[==========] 10 tests from 2 test suites ran. (20 ms total)
[  PASSED  ] 9 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] CApiSimple.Delegate

 1 FAILED TEST

I am not sure how this impacts the usage of my model, but I will report back if I encounter any anomalies.

Also, I'm planning on deploying this on ARM64 and x86_64 systems, which both run some kind of UNIX-like OS. I am glad to hear that you are considering implementing this. I created a feature request a while back that addresses this: #33341

If I can provide any help for this task, do not hesitate to message me.

@DocDriven
Author

DocDriven commented Nov 29, 2019

@jdduke
Speaking of building this library for the ARM64 architecture (for testing this issue on the other platform), do you - by any chance - know which flag I have to use? I am struggling a bit to find documentation on the available flags. I have found some flags under tensorflow/lite/build_def.bzl that look like this:

def tflite_copts():
    """Defines compile time flags."""
    copts = [
        "-DFARMHASH_NO_CXX_STRING",
    ] + select({
        str(Label("//tensorflow:android_arm64")): [
            "-O3",
        ],
        str(Label("//tensorflow:android_arm")): [
            "-mfpu=neon",
            "-O3",
        ],
        str(Label("//tensorflow:ios_x86_64")): [
            "-msse4.1",
        ],
        str(Label("//tensorflow:windows")): [
            "/DTFL_COMPILE_LIBRARY",
            "/wd4018",  # -Wno-sign-compare
        ],
        "//conditions:default": [
            "-Wno-sign-compare",
        ],
    })

    return copts

Unfortunately, I do not know if this is the correct approach, nor do I know whether ARM is supported. It states android_arm64 is supported, but I doubt this is what I am looking for.

EDIT:
I have found several flags for ARM under tensorflow/lite/kernels/BUILD which I think are also used by the experimental C build. The flags do not work, however; e.g., when invoking the bazel build command with --config=arm64, I get a message like this:

INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=146
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --announce_rc --define=grpc_no_ares=true --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include
INFO: Reading rc options for 'build' from /tensorflow_src/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/local/bin/python --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/local/bin/python --config=xla --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:xla in file /tensorflow_src/.tf_configure.bazelrc: --define with_xla_support=true
ERROR: Config value arm64 is not defined in any .rc file

Any help is appreciated. Thanks!

@jaeyoo
Member

jaeyoo commented Dec 2, 2019

@DocDriven Hi, I am Jae from the TFLite team and I want to help you. To follow this long thread, I want to list the open issues. Please feel free to correct me if I am wrong.

  • compilation issue with the C API (nonsensical node/output counts of ~1e19, etc.) -> fixed
  • tf.Multinomial / tf.RandomStandardNormal need to be supported on the TFLite side -> I will work on it.
  • build issue on arm64.
    I have not yet reproduced it, but could you test with --config=android_arm64?

@DocDriven
Author

DocDriven commented Dec 2, 2019

Hi @jaeyoo, your list is correct. Concerning issue 2, this would eliminate my need for a working library built with with_select_tf_ops enabled. Alternatively, you could consider re-introducing the corresponding flag for this mode of operation.

Also, I tested the flag you suggested. I am on commit e57a567~1 and had to downgrade my bazel installation to 0.26.1. Without --config=android_arm64 it builds successfully; with the flag, I get an error (even on master with a devel image I pulled today):

root@534edaa88c8c:/tensorflow_src# bazel build --config=monolithic --config=android_arm64 --define=with_select_tf_ops=true -c opt //tensorflow/lite/experimental/c:libtensorflowlite_c.so
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=204
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --announce_rc --define=grpc_no_ares=true --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include
INFO: Reading rc options for 'build' from /tensorflow_src/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/local/bin/python --action_env PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages --python_path=/usr/local/bin/python --config=xla --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:xla in file /tensorflow_src/.tf_configure.bazelrc: --define with_xla_support=true
INFO: Found applicable config definition build:monolithic in file /tensorflow_src/.bazelrc: --define framework_shared_object=false
INFO: Found applicable config definition build:android_arm64 in file /tensorflow_src/.bazelrc: --config=android --cpu=arm64-v8a --fat_apk_cpu=arm64-v8a
INFO: Found applicable config definition build:android in file /tensorflow_src/.bazelrc: --crosstool_top=//external:android/crosstool --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
INFO: Build options --cpu, --crosstool_top, --fat_apk_cpu, and 1 more have changed, discarding analysis cache.
ERROR: /root/.cache/bazel/_bazel_root/43801f1e35f242fb634ebbc6079cf6c5/external/local_config_cc/BUILD:46:1: in cc_toolchain_suite rule @local_config_cc//:toolchain: cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'arm64-v8a'
ERROR: Analysis of target '//tensorflow/lite/experimental/c:libtensorflowlite_c.so' failed; build aborted: Analysis of target '@local_config_cc//:toolchain' failed; build aborted
INFO: Elapsed time: 0.983s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded, 2773 targets configured)
    currently loading: tensorflow/cc/saved_model

It seems to be a valid option, but it is missing the toolchain.

@jdduke
Member

jdduke commented Dec 2, 2019

android_arm64 should only be used if you're actually building for Android (and you'll need to run the configure script from your root checkout to point bazel to your Android SDK/NDK). If that's what you want to test on, I think you just need to run that configure script.

We haven't done much validation of cross-compilation with bazel for generic aarch64 internally, but I can file an internal ticket for tracking. There's a similar thread on #34520 for validating generic arm64 builds, which might be useful.

@DocDriven
Author

I am not building for Android, just a custom board with an ARM64 processor.

I was trying to follow a tutorial referenced in the thread you mentioned (https://github.com/xifengcun/tensorflow-aarch64-crossbuild). I decided not to use the chroot/debootstrap approach and applied the steps to the build container within Docker.

It seems that the tutorial was intended for an older release of bazel, as some attributes within tensorflow_src/tools/aarch64_compiler/BUILD seem to be outdated. I tried the command

bazel build --config=monolithic --cpu=aarch64 --define=with_select_tf_ops=true -c opt //tensorflow/lite/experimental/c:libtensorflowlite_c.so --host_crosstool_top=@bazel_tools//tools/cpp:toolchain --crosstool_top=//tools/aarch64_compiler:toolchain --verbose_failures

and got the output

INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=204
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --announce_rc --define=grpc_no_ares=true --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include
INFO: Reading rc options for 'build' from /tensorflow_src/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/local/bin/python --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/local/bin/python --config=xla --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:xla in file /tensorflow_src/.tf_configure.bazelrc: --define with_xla_support=true
INFO: Found applicable config definition build:monolithic in file /tensorflow_src/.bazelrc: --define framework_shared_object=false
ERROR: /tensorflow_src/tools/aarch64_compiler/BUILD:15:1: //tools/aarch64_compiler:gcc-linux-aarch64: no such attribute 'dynamic_runtime_libs' in 'cc_toolchain' rule
ERROR: /tensorflow_src/tools/aarch64_compiler/BUILD:15:1: //tools/aarch64_compiler:gcc-linux-aarch64: no such attribute 'static_runtime_libs' in 'cc_toolchain' rule
ERROR: /tensorflow_src/tools/aarch64_compiler/BUILD:15:1: //tools/aarch64_compiler:gcc-linux-aarch64: missing value for mandatory attribute 'toolchain_config' in 'cc_toolchain' rule
ERROR: /tensorflow_src/tools/aarch64_compiler/BUILD:15:1: Target '//tools/aarch64_compiler:empty' contains an error and its package is in error and referenced by '//tools/aarch64_compiler:gcc-linux-aarch64'
ERROR: /tensorflow_src/tools/aarch64_compiler/BUILD:3:1: Target '//tools/aarch64_compiler:gcc-linux-aarch64' contains an error and its package is in error and referenced by '//tools/aarch64_compiler:toolchain'
ERROR: /tensorflow_src/tensorflow/lite/experimental/c/BUILD:20:1: every rule of type cc_binary implicitly depends upon the target '//tools/aarch64_compiler:toolchain', but this target could not be found because of: Target '//tools/aarch64_compiler:toolchain' contains an error and its package is in error
ERROR: Analysis of target '//tensorflow/lite/experimental/c:libtensorflowlite_c.so' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.249s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded, 4 targets configured)
    currently loading: tensorflow/lite ... (2 packages)

So I simply removed the presumably deprecated(?) attributes dynamic_runtime_libs and static_runtime_libs, and added the mandatory attribute toolchain_config = ':empty'.

What I ended up with are the following errors:

INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=204
INFO: Reading rc options for 'build' from /tensorflow_src/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone --strategy=Genrule=standalone -c opt --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --announce_rc --define=grpc_no_ares=true --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include
INFO: Reading rc options for 'build' from /tensorflow_src/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/local/bin/python --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/local/bin/python --config=xla --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:xla in file /tensorflow_src/.tf_configure.bazelrc: --define with_xla_support=true
INFO: Found applicable config definition build:monolithic in file /tensorflow_src/.bazelrc: --define framework_shared_object=false
ERROR: /tensorflow_src/tools/aarch64_compiler/BUILD:25:22: in toolchain_config attribute of cc_toolchain rule //tools/aarch64_compiler:gcc-linux-aarch64: '//tools/aarch64_compiler:empty' does not have mandatory providers: 'CcToolchainConfigInfo'
ERROR: Analysis of target '//tensorflow/lite/experimental/c:libtensorflowlite_c.so' failed; build aborted: Analysis of target '//tools/aarch64_compiler:gcc-linux-aarch64' failed; build aborted
INFO: Elapsed time: 0.126s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 3 targets configured)
    currently loading: tensorflow/lite/kernels

I am afraid that I do not have a clue how to deal with the mandatory provider: 'CcToolchainConfigInfo' error. Maybe somebody could take a look at it?

I think it should be possible to make this work inside a Docker container; it would also be easier to ship than a chroot environment.

@jaeyoo jaeyoo self-assigned this Dec 9, 2019
@geetachavan1 geetachavan1 added this to In progress in TensorFlow 2.2.0 Feb 6, 2020
tensorflow-copybara pushed a commit that referenced this issue May 30, 2020
This remains useful for testing and development. Restore the ability
to inject support for TF ops in TFLite using `--define=with_select_tf_ops=true`.

See also issue #34277.

PiperOrigin-RevId: 313873470
Change-Id: I6b68cd863efc17f5ae0667c0d2c9d68958d6e4ad
tensorflow-copybara pushed a commit that referenced this issue May 30, 2020
This remains useful for testing and development. Restore the ability
to inject support for TF ops in TFLite using `--define=with_select_tf_ops=true`.

See also issue #34277.

PiperOrigin-RevId: 313887137
Change-Id: Ia7c737b76705d5718895311c9694ffd91164040b
@saikumarchalla

@DocDriven Could you please check with the latest TF version and let us know. Thanks!

@saikumarchalla saikumarchalla added the stat:awaiting response Status - Awaiting response from author label Jun 2, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Jun 9, 2021
@jaeyoo jaeyoo removed their assignment Jun 10, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.

TensorFlow 2.2.0 automation moved this from In progress to Done Jun 17, 2021
