
TFLite iOS benchmark app doesn't produce consistent result while using GPU delegate #36474

Closed
Richard-Yang-Bose opened this issue Feb 4, 2020 · 5 comments
Assignees
Labels
comp:lite TF Lite related issues TF 1.14 for issues seen with TF 1.14 type:support Support issues

Comments


Richard-Yang-Bose commented Feb 4, 2020

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.14.6
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: iPhone XR (tested on iOS 13.3, and 12.3.1), iPhone Xs (13.1.2)
  • TensorFlow installed from (source or binary): installed from source.
  • TensorFlow version (use command below): 1.14.0
  • Python version: 3.6
  • Bazel version (if compiling from source): 1.2.1
  • GCC/Compiler version (if compiling from source): 4.2.1

Describe the current behavior
TFLite iOS benchmark app doesn't produce consistent result while using GPU delegate.

Testing with the Mobilenet_1.0_224 (float) model in the iOS benchmark app with the default parameters, I'm able to obtain performance similar to the published benchmark here (around 14.x ms).

However, after adding the GPU delegate parameters according to the instructions ("use_gpu" : "1" and "gpu_wait_type" : "aggressive" were added to benchmark_params.json), the benchmark app still reports almost the same performance, instead of the roughly 4x speedup shown in the benchmark.

Describe the expected behavior

GPU delegate performance similar to the results provided on the iOS benchmark page.

Code to reproduce the issue
https://github.com/tensorflow/tensorflow/blob/9901f967b11763726ae380273a24ee9b4fdae7f0/tensorflow/lite/tools/benchmark/ios/TFLiteBenchmark/TFLiteBenchmark/benchmark_data/benchmark_params.json
with
"use_gpu" : "1", "gpu_wait_type" : "aggressive" added.

Other info / logs

Min num runs: [20]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [2]
Benchmark name: [mobile_net_benchmark]
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/private/var/containers/Bundle/Application/93C7DE45-ADBA-4E5B-B64B-3F789A357080/TFLiteBenchmark.app/mobilenet_v1_1.0_224.tflite]
Input layers: [input]
Input shapes: [1,224,224,3]
Input value ranges: []
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [0]
Max profiling buffer entries: [1024]
Loaded model /private/var/containers/Bundle/Application/93C7DE45-ADBA-4E5B-B64B-3F789A357080/TFLiteBenchmark.app/mobilenet_v1_1.0_224.tflite
2020-02-04 16:04:27.856557-0500 TFLiteBenchmark[1156:379854] Initialized TensorFlow Lite runtime.
The input model file size (MB): 16.9008
Initialized session in 15.791ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=31 first=79523 curr=14493 min=14015 max=79523 avg=16387.8 std=11545

Running benchmark for at least 20 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=68 first=14721 curr=14936 min=14055 max=15838 avg=14710.9 std=308

Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=0 overall=0
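
As an aside, the latency values in the `count=... avg=...` summary lines above are reported in microseconds, so avg=14710.9 corresponds to roughly 14.7 ms. A small helper (hypothetical, for illustration only) to parse such a line into numbers:

```python
import re

def parse_benchmark_line(line):
    """Parse a TFLite benchmark summary line (e.g. 'count=68 ... avg=14710.9')
    into a dict of floats. The timing values are in microseconds."""
    fields = dict(re.findall(r"(\w+)=([\d.]+)", line))
    return {k: float(v) for k, v in fields.items()}

line = "count=68 first=14721 curr=14936 min=14055 max=15838 avg=14710.9 std=308"
stats = parse_benchmark_line(line)
print(stats["avg"] / 1000)  # average latency in milliseconds -> 14.7109
```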
@ravikyram ravikyram added TF 1.14 for issues seen with TF 1.14 comp:lite TF Lite related issues type:support Support issues labels Feb 5, 2020
@ravikyram ravikyram assigned ymodak and unassigned ravikyram Feb 5, 2020
@srjoglekar246 srjoglekar246 assigned yyoon and unassigned ymodak Feb 5, 2020
yyoon (Contributor) commented Feb 10, 2020

Thanks for flagging this and providing your console logs.
It was clear from your logs that the use_gpu and gpu_wait_type parameters weren't being read correctly, and I think I found the cause. Should be fixed soon.

yyoon (Contributor) commented Feb 10, 2020

This should be fixed now. Let me know if you're still seeing the problem.

Richard-Yang-Bose (Author) commented

Looking good now, thanks for the work!

I'm still not getting the published benchmark performance (around 8 ms for Mobilenet_1.0_224 instead of 3.4 ms), but there's already a significant speedup compared to the CPU.

yyoon (Contributor) commented Feb 11, 2020

@Richard-Yang-Bose That's a somewhat known issue, which happens when the low-power CPU cores are used for the CPU part of the computation (note: the iPhone Xs CPU has asymmetric cores). When the high-power cores are used, you'll see the 3.4 ms result. Try setting the num_threads value to 2 and running the benchmarks multiple times to see this.

It's unfortunate that there doesn't seem to be any good way to force the high-power cores to be used with TFLite. We might end up updating our benchmark results page to reflect this.
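
Following the suggestion above, the relevant entries in benchmark_params.json would look something like this (a sketch; the string-quoted values mirror how the other parameters in that file are written):

```json
{
  "num_threads": "2",
  "use_gpu": "1",
  "gpu_wait_type": "aggressive"
}
```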

RichardYang40148 commented

Gotcha, thanks for the clarification! This is really helpful!
