
tflite runs much slower than tfmobile ... #21787

Closed
jiarenyf opened this issue Aug 22, 2018 · 23 comments

@jiarenyf jiarenyf commented Aug 22, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 14.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: Xiaomi 8
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.10
  • Python version:
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 9.0 / 7.1
  • GPU model and memory:
  • Exact command to reproduce:

Describe the problem

I tested the performance of tf-mobile, tf-lite, tf-mobile-int8, and tf-lite-int8 on Android, and I found that tf-lite is much slower than tf-mobile.

  1. I use freeze_graph to generate an A.pb file from the checkpoint, for testing tf-mobile performance.

  2. I use toco_convert to convert the A.pb file to an A.tflite file, for testing tf-lite performance.

  3. I use transform_graph to get a quantized AQ.pb file from the A.pb file, for testing tf-mobile int8 performance.

  4. I train a model with the same architecture after adding the line tf.contrib.quantize.create_training_graph(), and get the checkpoint file. Then I replace that line with tf.contrib.quantize.create_eval_graph() to generate the A.pbtxt file, and use the checkpoint file and the A.pbtxt file to get A8.pb with fake-quantization nodes. Finally, I use toco_convert to get the A8.tflite file (see the pipeline sketch after the results below).

  5. I test the performance of these 4 files on Android; each model runs inference several times on the same image, and the results are listed below:

tf-mobile: 357 ms per image
tf-mobile int8: 356 ms per image
tf-lite: 844 ms per image
tf-lite int8: 571 ms per image
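
For reference, here is a minimal sketch of the fake-quantization + TOCO pipeline from step 4, assuming the TF 1.x contrib APIs. The layer sizes and names are illustrative stand-ins for the real model, and a real int8 conversion also needs TOCO quantization flags that this sketch omits:

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    images = tf.placeholder(tf.float32, [1, 48, 480, 1],
                            name="images_placeholder")
    # Stand-in for the real CONV+BN+RELU stack and reshape/fc/reshape tail.
    net = tf.layers.conv2d(images, 64, 3, padding="same",
                           activation=tf.nn.relu)
    out = tf.reshape(tf.layers.dense(tf.reshape(net, [-1, 64]), 10),
                     [1, -1, 10], name="output")

    # Before training, insert fake-quant nodes instead:
    # tf.contrib.quantize.create_training_graph(input_graph=g)
    # For export, rewrite the graph for inference:
    tf.contrib.quantize.create_eval_graph(input_graph=g)

    with tf.Session(graph=g) as sess:
        sess.run(tf.global_variables_initializer())  # or Saver().restore(...)
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, g.as_graph_def(), [out.op.name])

# Float conversion; quantized conversion also needs inference_type and
# input-stats flags, omitted in this sketch.
tflite_model = tf.contrib.lite.toco_convert(frozen, [images], [out])
open("A8.tflite", "wb").write(tflite_model)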

I wonder why tf-lite is much slower than tf-mobile.

PS: the model architecture only contains CONV+BN+RELU, RESHAPE, and FULLY-CONNECTED ops.

The feature map from the CONV+BN+RELU stack has shape [B,T,C]; I reshape it to [-1,C] and feed it to the fc layer, then reshape the output of shape [B*T,K] back to [B,T,K], which is the final result I expect (sketched below).
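
A minimal sketch of that shape flow, with illustrative sizes (the real C and K come from the model):

import tensorflow as tf

B, T, C, K = 1, 120, 256, 37                      # example sizes only
features = tf.placeholder(tf.float32, [B, T, C])  # CONV+BN+RELU output

flat = tf.reshape(features, [-1, C])              # [B*T, C]
logits_flat = tf.layers.dense(flat, K)            # fully-connected: [B*T, K]
logits = tf.reshape(logits_flat, [B, T, K])       # final [B, T, K] result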

I wonder whether it is the reshape op that brings the worse performance?

Thank you very much ...

@jiarenyf jiarenyf commented Aug 22, 2018

@aselle Could you please take a look?

@shashishekhar shashishekhar self-assigned this Aug 22, 2018
@shashishekhar shashishekhar commented Aug 22, 2018

@jiarenyf Which version of TFLite are you using? Can you use the benchmark tool to profile? If you are using an older version of TFLite, there was a severe regression in Conv (issue #15554) that has since been fixed. TFLite performance on most models is better than tf-mobile's.
Also, can you include a model that shows the bug? I can help debug the performance issues.
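
For a quick sanity check outside the Android app, something like the following should work with the TF 1.x Python interpreter (tf.contrib.lite); the model path, run count, and the assumption of a single float input are illustrative:

import time
import numpy as np
import tensorflow as tf

interpreter = tf.contrib.lite.Interpreter(model_path="A.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outp = interpreter.get_output_details()[0]

# Random input of the model's expected shape, e.g. [1, 48, 480, 1].
dummy = np.random.random_sample(inp["shape"]).astype(np.float32)

runs = 50
start = time.time()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
    _ = interpreter.get_tensor(outp["index"])
print("average latency: %.1f ms" % ((time.time() - start) / runs * 1000))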

@jiarenyf jiarenyf commented Aug 23, 2018

@shashishekhar

I use org.tensorflow:tensorflow-lite:1.10.0

I can share the 4 files I mentioned above; all are generated with randomly initialized variables.

https://share.weiyun.com/5FXn1pe

Thank you.

@shashishekhar shashishekhar commented Aug 27, 2018

@jiarenyf: Thanks. I ran a few preliminary benchmarks using the benchmark tool on a Pixel 2. For TFLite, cnn+ctc8.tflite was:
num_threads = 1: average 258.86 ms
num_threads = 4: average 105 ms

I compiled for arm64. How are you benchmarking? Are you building arm64 for TFLite?

Looking at the floating-point numbers, it looks like the fully-connected op takes a significant amount of time; I doubt the reshape op is slower.
One other thing to note: the TensorFlow mobile benchmark uses multiple threads by default, whereas TFLite uses a single thread. Make sure you are explicitly passing the number of threads to both benchmarks.

@jiarenyf jiarenyf commented Aug 30, 2018

@shashishekhar Thanks.

I used the benchmark tool to test the performance of tf-mobile and tf-lite on desktop. The results are recorded in benchmark_on_mobile.txt and benchmark_on_lite.txt.

../../benchmark_model \
  --graph='./cnn+ctc.pb' \
  --input_layer='images_placeholder' \
  --input_layer_shape='1,48,480,1' \
  --input_layer_type='float' \
  --output_layer='Reshape_1' \
  --num_threads=4

benchmark_on_mobile.txt

../../benchmark_model_tflite \
  --graph='./cnn+ctc.tflite' \
  --input_layer='images_placeholder' \
  --input_layer_shape='1,48,480,1' \
  --num_threads=4

benchmark_on_lite.txt

I wonder:

  1. In benchmark_on_lite.txt there are 8 CONV_2D ops + 1 DEPTHWISE_CONV_2D, while in benchmark_on_mobile.txt there are 9 Conv2D ops. I used the pb file to generate the tflite file, so why is the network architecture different?

  2. In benchmark_on_mobile.txt, Conv2D takes 305 ms and MatMul takes 36 ms, while in benchmark_on_lite.txt, Conv2D takes 377 ms and FULLY_CONNECTED takes 382 ms. Why is the time of the fc layer so different?

All the files are provided here: *.txt, *.pb, *.tflite

Thank you.

@shashishekhar shashishekhar commented Aug 31, 2018

@jiarenyf: Thank you for the report; I am looking into the differences.
Note: on desktop, TFLite doesn't have optimized kernels, but on mobile there are NEON-optimized kernels.
Also, since the graph is converted, the operations are not entirely the same.

I am looking into why there is a performance difference here and whether TFLite is accidentally taking a slow path.

@jiarenyf jiarenyf commented Sep 13, 2018

@shashishekhar Do you have any idea about the performance difference between tf-mobile and tf-lite?

@shashishekhar shashishekhar commented Sep 14, 2018

@jiarenyf: Sorry, I didn't get time to investigate this; I will try to update the bug sometime in the coming weeks.

@tensorflow tensorflow deleted a comment from theoamaya Sep 14, 2018
@jazzystring1 jazzystring1 commented Oct 14, 2018

It also happens to me. Using a float model, the inference time in tf-mobile is 50-60 ms, while in TensorFlow Lite it reaches 80-110 ms.

@jdduke jdduke commented Nov 16, 2018

What is the exact command you're using to build benchmark_model_tflite? Be sure to include -c opt --config=android_arm64 in your build command.

@andrehentz andrehentz commented Nov 26, 2018

I'm able to reproduce the issue. We will track this internally and report back after we investigate further.

@jiarenyf jiarenyf commented Dec 17, 2018

@andrehentz

Hey, do you have any progress on the issue?

@jdduke jdduke commented Dec 18, 2018

We have an internal fix that dramatically improves performance (~2-3x); however, we're still running it through accuracy/validation testing. It's unlikely the fix will land in the next week or two (due to the holidays), but expect something concrete in early January.

@jazzystring1 jazzystring1 commented Dec 18, 2018

@jdduke Sounds good! Looking forward to the fix :)

@filipvg68 filipvg68 commented Jan 6, 2019

@jdduke Any idea on the timing of the fix? Thanks!

@defaultUser3214 defaultUser3214 commented Jan 22, 2019

@jdduke Is the fix already available in the tflite-nightly:0.0 version?

@jdduke jdduke commented Jan 23, 2019

It's not quite there; expect an update in the next week or two. Thanks for your patience.

@jdduke jdduke commented Feb 7, 2019

We're in the process of upstreaming the fix to Eigen, stay tuned.

@doudasek doudasek commented Feb 14, 2019

@jdduke Thanks for the fix; waiting for it. I am on the 'edge' of releasing an app, and faster inference would be really helpful, but I cannot wait a week or more, so I need to decide. I just wanted to ask whether you have any further information on when the new release will be public? Thank you very much!

@lenaevans lenaevans commented Feb 26, 2019

@jdduke Are there any updates on the fix? Thank you!

@jdduke jdduke commented Mar 8, 2019

Apologies for the delay; I've been out for the past several weeks. Commit 161c500 should improve the performance of TFLite's fully_connected operator, bringing it in line with TensorFlow (mobile).

@stakemura stakemura commented Mar 15, 2019

As for TRANSPOSE_CONV ops, TFLite is unfortunately still much slower than TFMobile.
Could you check my benchmark report in #26736?

@jdduke jdduke commented Mar 19, 2019

Thanks for the report; we'll take a look at that issue.
