
TFLite iOS benchmark app doesn't produce consistent result while using GPU delegate #36474

Closed
Richard-Yang-Bose opened this issue Feb 4, 2020 · 5 comments
Assignees
Labels
comp:lite TF Lite related issues TF 1.14 for issues seen with TF 1.14 type:support Support issues

Comments


Richard-Yang-Bose commented Feb 4, 2020

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.14.6
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: iPhone XR (tested on iOS 13.3, and 12.3.1), iPhone Xs (13.1.2)
  • TensorFlow installed from (source or binary): installed from source.
  • TensorFlow version (use command below): 1.14.0
  • Python version: 3.6
  • Bazel version (if compiling from source): 1.2.1
  • GCC/Compiler version (if compiling from source): 4.2.1

Describe the current behavior
TFLite iOS benchmark app doesn't produce consistent result while using GPU delegate.

Testing with the Mobilenet_1.0_224 (float) model in the iOS benchmark app with the default parameters, I'm able to obtain performance similar to the published benchmark here (around 14.x ms).

However, after adding the GPU delegate parameters according to the instructions ("use_gpu" : "1" and "gpu_wait_type" : "aggressive" were added to benchmark_params.json), the benchmark app still reports almost the same performance, instead of the roughly 4x speedup shown in the benchmark.

Describe the expected behavior

GPU delegate performance similar to the results provided on the iOS benchmark page.

Code to reproduce the issue
https://github.com/tensorflow/tensorflow/blob/9901f967b11763726ae380273a24ee9b4fdae7f0/tensorflow/lite/tools/benchmark/ios/TFLiteBenchmark/TFLiteBenchmark/benchmark_data/benchmark_params.json
with
"use_gpu" : "1", "gpu_wait_type" : "aggressive" added.

Other info / logs

Min num runs: [20]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [2]
Benchmark name: [mobile_net_benchmark]
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/private/var/containers/Bundle/Application/93C7DE45-ADBA-4E5B-B64B-3F789A357080/TFLiteBenchmark.app/mobilenet_v1_1.0_224.tflite]
Input layers: [input]
Input shapes: [1,224,224,3]
Input value ranges: []
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [0]
Max profiling buffer entries: [1024]
Loaded model /private/var/containers/Bundle/Application/93C7DE45-ADBA-4E5B-B64B-3F789A357080/TFLiteBenchmark.app/mobilenet_v1_1.0_224.tflite
2020-02-04 16:04:27.856557-0500 TFLiteBenchmark[1156:379854] Initialized TensorFlow Lite runtime.
The input model file size (MB): 16.9008
Initialized session in 15.791ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=31 first=79523 curr=14493 min=14015 max=79523 avg=16387.8 std=11545

Running benchmark for at least 20 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=68 first=14721 curr=14936 min=14055 max=15838 avg=14710.9 std=308

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=0 overall=0
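
As an aside, the latency values in the `count=... avg=...` summary lines above are reported in microseconds, so avg=14710.9 corresponds to roughly 14.7 ms. A small helper (hypothetical, for illustration only) to parse such a line into numbers:

```python
import re

def parse_benchmark_line(line):
    """Parse a TFLite benchmark summary line (e.g. 'count=68 ... avg=14710.9')
    into a dict of floats. The timing values are in microseconds."""
    fields = dict(re.findall(r"(\w+)=([\d.]+)", line))
    return {k: float(v) for k, v in fields.items()}

line = "count=68 first=14721 curr=14936 min=14055 max=15838 avg=14710.9 std=308"
stats = parse_benchmark_line(line)
print(stats["avg"] / 1000)  # average latency in milliseconds -> 14.7109
```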
@ravikyram ravikyram added TF 1.14 for issues seen with TF 1.14 comp:lite TF Lite related issues type:support Support issues labels Feb 5, 2020
@ravikyram ravikyram assigned ymodak and unassigned ravikyram Feb 5, 2020
@srjoglekar246 srjoglekar246 assigned yyoon and unassigned ymodak Feb 5, 2020
yyoon (Contributor) commented Feb 10, 2020

Thanks for flagging this and providing your console logs.
It was clear from your logs that the use_gpu and gpu_wait_type parameters weren't being read correctly, and I think I found the cause. Should be fixed soon.

yyoon (Contributor) commented Feb 10, 2020

This should be fixed now. Let me know if you're still seeing the problem.

Richard-Yang-Bose (Author) commented

Looking good now, thanks for the work!

I'm still not getting the published benchmark performance (around 8 ms for Mobilenet_1.0_224 instead of 3.4 ms), but there's already a significant speedup compared to the CPU.

yyoon (Contributor) commented Feb 11, 2020

@Richard-Yang-Bose That's a somewhat known issue, which happens when the low-power CPU cores are used for the CPU part of the computation (note: the iPhone Xs CPU has asymmetric cores). When the high-power cores are used, you'll see the 3.4 ms result. Try setting the num_threads value to 2 and running the benchmarks multiple times to see this.

It's unfortunate that there doesn't seem to be any good way to force the high-power cores to be used with TFLite. We might end up updating our benchmark results page to reflect this.
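
Following the suggestion above, the relevant entries in benchmark_params.json would look something like this (a sketch; the string-quoted values mirror how the other parameters in that file are written):

```json
{
  "num_threads": "2",
  "use_gpu": "1",
  "gpu_wait_type": "aggressive"
}
```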

RichardYang40148 commented

Gotcha, thanks for the clarification! This is really helpful!
