System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.14.6
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: iPhone XR (tested on iOS 13.3, and 12.3.1), iPhone Xs (13.1.2)
TensorFlow installed from (source or binary): installed from source.
TensorFlow version (use command below): 1.14.0
Python version: 3.6
Bazel version (if compiling from source): 1.2.1
GCC/Compiler version (if compiling from source): 4.2.1
Describe the current behavior
The TFLite iOS benchmark app doesn't produce consistent results when using the GPU delegate.
Testing the Mobilenet_1.0_224 (float) model in the iOS benchmark app with the default parameters, I'm able to obtain performance similar to the published benchmark (around 14.x ms).
However, after adding the GPU delegate parameters according to the instructions ("use_gpu" : "1" and "gpu_wait_type" : "aggressive" were added to benchmark_params.json), the benchmark app still reports almost the same performance, instead of the roughly 4x speedup shown in the benchmark.
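For reference, this is roughly what my benchmark_params.json looks like. The use_gpu and gpu_wait_type entries are the ones from the instructions; the other key names are my best guess based on the parameter dump in the logs below and may not match the exact schema in your checkout:

```json
{
  "benchmark_name" : "mobile_net_benchmark",
  "num_threads" : "2",
  "graph" : "mobilenet_v1_1.0_224.tflite",
  "input_layer" : "input",
  "input_layer_shape" : "1,224,224,3",
  "use_gpu" : "1",
  "gpu_wait_type" : "aggressive"
}
```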
Describe the expected behavior
GPU-delegate performance similar to the results provided on the iOS benchmark page.
Min num runs: [20]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [2]
Benchmark name: [mobile_net_benchmark]
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/private/var/containers/Bundle/Application/93C7DE45-ADBA-4E5B-B64B-3F789A357080/TFLiteBenchmark.app/mobilenet_v1_1.0_224.tflite]
Input layers: [input]
Input shapes: [1,224,224,3]
Input value ranges: []
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [0]
Max profiling buffer entries: [1024]
Loaded model /private/var/containers/Bundle/Application/93C7DE45-ADBA-4E5B-B64B-3F789A357080/TFLiteBenchmark.app/mobilenet_v1_1.0_224.tflite
2020-02-04 16:04:27.856557-0500 TFLiteBenchmark[1156:379854] Initialized TensorFlow Lite runtime.
The input model file size (MB): 16.9008
Initialized session in 15.791ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=31 first=79523 curr=14493 min=14015 max=79523 avg=16387.8 std=11545
Running benchmark for at least 20 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=68 first=14721 curr=14936 min=14055 max=15838 avg=14710.9 std=308
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=0 overall=0
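The latency values in the `count=... avg=...` lines above are reported in microseconds, so avg=14710.9 corresponds to the ~14.7 ms CPU result mentioned earlier. A small sketch (not part of the benchmark tool) to parse such a line and convert to milliseconds:

```python
import re

def parse_stats(line):
    """Parse a benchmark stats line like
    'count=68 first=14721 ... avg=14710.9 std=308' into a dict of floats.
    Values are in microseconds."""
    return {k: float(v) for k, v in re.findall(r"(\w+)=([\d.]+)", line)}

line = "count=68 first=14721 curr=14936 min=14055 max=15838 avg=14710.9 std=308"
stats = parse_stats(line)
print(f"avg latency: {stats['avg'] / 1000.0:.2f} ms")  # -> avg latency: 14.71 ms
```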
Thanks for flagging this and providing your console logs.
It was clear from your logs that the use_gpu and gpu_wait_type parameters weren't being correctly read, and I think I found what's causing this. Should be fixed soon.
I'm still not getting the published benchmark performance (around 8 ms for Mobilenet_1.0_224 instead of 3.4 ms), but there's already a significant speedup compared to the CPU.
@Richard-Yang-Bose That's a somewhat known issue, which happens when the low-power CPU cores (note: the iPhone Xs CPU has asymmetric cores) are used for the CPU part of the computation. When the high-power cores are used, you'll see the 3.4 ms result. Try setting the num_threads value to 2 and running the benchmark multiple times to observe this.
It's unfortunate that there doesn't seem to be any good way to force the high-power cores to be used with TFLite. We might end up updating our benchmark results page to reflect this.
Code to reproduce the issue
https://github.com/tensorflow/tensorflow/blob/9901f967b11763726ae380273a24ee9b4fdae7f0/tensorflow/lite/tools/benchmark/ios/TFLiteBenchmark/TFLiteBenchmark/benchmark_data/benchmark_params.json
with "use_gpu" : "1", "gpu_wait_type" : "aggressive" added.
Other info / logs
See the console output above.