ModifyGraphWithDelegate(delegate) running very slow (ubuntu with USB accelerator) #47383
Comments
@JohannSchumann |
Hi,
here are the details (sorry, I forgot to add them):
1) a small C++ file, minimal.cc, which opens the mobilenet model and calls
ModifyGraphWithDelegate(): see below
2) compilation via make (commands for each platform are below)
3) execution:
time ./min_example
4) results: Wall-clock:
Coral DevBoard: 0.053s
ubuntu notebook: 3.25 s (!) with USB Accelerator
5) Versions:
coralboard:
uname -a: Linux tuned-apple 4.14.98-imx #1 SMP PREEMPT Fri Nov 8
23:28:21 UTC 2019 aarch64 GNU/Linux
dpkg -l: ii libedgetpu1-std:arm64 14.1 arm64 Support
library for Edge TPU
tf: source tree from github. Release 2.1.0 (see RELEASE.md in top dir)
Notebook: Ubuntu 16.04, i7, 12 GB
uname -a: Linux johann-Aspire-R5-571TG 4.15.0-133-generic
#137~16.04.1-Ubuntu SMP Fri Jan 15 02:55:18 UTC 2021 x86_64 x86_64 x86_64
GNU/Linux
dpkg -l: ii libedgetpu1-max:amd64 15.0 amd64 Support library
for Edge TPU
tf: source tree from github. Release 2.5.0 (see RELEASE.md in top dir)
usb-devices: (for that USB accelerator):
T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=02 Dev#= 15 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=18d1 ProdID=9302 Rev=01.00
C: #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=498mA
I: If#= 0 Alt= 0 #EPs= 6 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
The run times are the same with the -std libedgetpu on the Ubuntu notebook.
Compilation on coralboard:
g++ -o "min_example" minimal.cc
/home/mendel/TENSORFLOW/tensorflow/tensorflow/lite/tools/make/gen/aarch64_armv8-a/lib/libtensorflow-lite.a
-std=c++11 -O3 -Wall -I/usr/lib/aarch64-linux-gnu/glib-2.0/include
-I/home/mendel/TENSORFLOW/tensorflow
-I/home/mendel/TENSORFLOW/tensorflow/tensorflow/lite/tools/make/downloads/flatbuffers/include
-lrt -ldl -lpthread -ledgetpu
Compilation on notebook:
g++ -o "min_example" minimal.cc
/home/johann/TOOLS/TENSORFLOW/tensorflow_src/tensorflow/lite/tools/make/gen/linux_x86_64/lib/libtensorflow-lite.a
-std=c++11 -O3 -Wall -I/home/johann/TOOLS/TENSORFLOW/tensorflow_src
-I/home/johann/TOOLS/TENSORFLOW/tensorflow_src/tensorflow/lite/tools/make/downloads/flatbuffers/include
-lrt -ldl -lpthread -ledgetpu
//--------------------------- small C++ example code -----------------------
#include <sys/stat.h>
#include <iostream>
#include <memory>
#include <vector>
#include <fstream>
#include <string>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include "edgetpu_c.h"
#include "tensorflow/lite/builtin_op_data.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#define TFLITE_MINIMAL_CHECK(x) \
if (!(x)) { \
fprintf(stderr, "Error at %s:%d\n", __FILE__, __LINE__); \
exit(EXIT_FAILURE); \
}
int main(int argc, char* argv[]) {
std::unique_ptr<tflite::FlatBufferModel> model;
std::unique_ptr<tflite::Interpreter> interpreter_;
model =
tflite::FlatBufferModel::BuildFromFile("mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite");
TFLITE_MINIMAL_CHECK(model != nullptr);
tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder(*model, resolver)(&interpreter_);
size_t num_devices;
std::unique_ptr<edgetpu_device, decltype(&edgetpu_free_devices)> devices(
edgetpu_list_devices(&num_devices), &edgetpu_free_devices);
TFLITE_MINIMAL_CHECK(num_devices);
printf("Wrapper: found %d devices\n",(int)num_devices);
// edgetpu_verbosity(10);
const auto& device = devices.get()[0];
auto* delegate =
edgetpu_create_delegate(device.type, device.path, nullptr, 0);
// interpreter_->ModifyGraphWithDelegate({delegate, edgetpu_free_delegate});
interpreter_->ModifyGraphWithDelegate(delegate);
printf("ModifyGraph done\n");
return 0;
}
…On Thu, Feb 25, 2021 at 5:54 AM Saduf2019 ***@***.*** wrote:
@JohannSchumann <https://github.com/JohannSchumann>
We see that the issue template has not been filled, could you please do so
as it helps us analyse the issue [tf version, steps followed before you ran
into this error or stand alone code to reproduce the issue faced]
|
@JohannSchumann did you use a USB 3.0 port to connect the accelerator? |
Hi,
yes, it is connected to USB 3.0; see below.
lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
|__ Port 1: Dev 2, If 0, Class=Application Specific Interface, Driver=,
5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
|__ Port 2: Dev 17, If 0, Class=Human Interface Device, Driver=usbhid,
1.5M
dmesg | grep usb
[343085.437163] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[343085.458086] usb 2-1: New USB device found, idVendor=1a6e, idProduct=089a
[343085.458091] usb 2-1: New USB device strings: Mfr=0, Product=0,
SerialNumber=0
...
|
Is the source at #47383 (comment) what you're using? What I found was the following page, which uses an external context; it might be worth a try. Also, I wonder if you see the same symptom with the Python example. |
Hi,
Is the source at #47383 (comment) what you're using?
The exact code that I am running is listed in this comment.
What I found was the following page, which uses an external context; it might
be worth a try.
https://coral.ai/docs/edgetpu/tflite-cpp/#set-up-the-tf-lite-interpreter-with-libedgetpu
I followed that, but it didn't change anything.
Also I wonder if you see the same symptom with Python example.
Yes. The long run-time shows up in "make interpreter". After that, the
run-times seem to be OK, so the problem seems to be in the START-UP of the
accelerator.
For the parrot example (
https://coral.ai/docs/accelerator/get-started/#3-run-a-model-on-the-edge-tpu
):
python startup: 0.27s (wall time)
Startup + make_interpreter: 2.9s <--- *this is the very long time*
full example: 2.966s (inference times: 11ms, 2.4ms, 2.4ms...) <----
these times seem to be OK
[this is not the first run after plug-in, where the firmware seems to be
downloaded]
For the C++ example, I ran with verbosity = 10. The log-file (see below)
stops noticeably at:
after: [I :1386] Open device and check if DFU is needed
before [I :1013] OpenDevice: [/sys/bus/usb/devices/2-2]
and
after: I :287] Close: performing graceful reset
before: I :320] Close: final clean up completed
Is it waiting on some timeout there?
Thank you!
=====================LOG file (abbreviated) [C++ example]
===========================
$ time min_example
Wrapper: found 1 devices
I :453] No matching device is already opened for shared ownership.
I :31] Failed to open /sys/class/apex: No such file or directory
I :944] EnumerateDevices: vendor:0x1a6e, product:0x89a
I :979] EnumerateDevices: checking bus[2] port[2]
...
I :998] EnumerateDevices: found [/sys/bus/usb/devices/2-2]
...
I :225] Enumerate: adding path [/sys/bus/usb/devices/2-2]
I :104] USB always DFU: False (default)
I :145] USB bulk-in queue capacity: 8
I :65] Performance expectation: Max (default)
I :1386] Open device and check if DFU is needed
<<<<<<< pause in execution ~1/2 second or longer
I :1013] OpenDevice: [/sys/bus/usb/devices/2-2]
I :1050] OpenDevice: checking bus[2] port[2]
I :1081] OpenDevice: device opened 0x1cc2070
I :182] LocalUsbDevice
I :36] UsbStandardCommands
I :37] UsbDfuCommands
I :43] GetDeviceDescriptor
I :397] GetDescriptor
I :78] Vender ID: 0x18d1
I :79] Product ID: 0x9302
I :1413] Device is already in application mode, skipping DFU
I :1425] Resetting device
I :241] Close: closing device 0x1cc2070
I :214] DoCancelAllTransfers: cancelling 0 async transfers
I :222] DoCancelAllTransfers: waiting for all async transfers to complete
I :232] DoCancelAllTransfers: all async transfers have completed
I :274] Close: releasing 0 transfer buffers
I :287] Close: performing graceful reset
<<<<<<< pause in execution ~1/2 second or longer
I :320] Close: final clean up completed
I :1366] Opening device expecting application mode
I :1013] OpenDevice: [/sys/bus/usb/devices/2-2]
I :1050] OpenDevice: checking bus[2] port[2]
I :1081] OpenDevice: device opened 0x1cc1ec0
I :182] LocalUsbDevice
I :36] UsbStandardCommands
I :47] UsbMlCommands
I :40] ~UsbDfuCommands
I :39] ~UsbStandardCommands
I :194] ~LocalUsbDevice
I :241] Close: closing device (nil)
I :350] ClaimInterface
I :81] ReadRegister32 offset 0x1a30c
I :512] SendControlCommandWithDataIn
I :519] SYNC CTRL WITH DATA IN begin
I :536] SYNC CTRL WITH DATA IN end
I :111] ReadRegister32 [0x1A30C] == 0xF0059
I :154] WriteRegister32 [0x1A30C] := 0xF0059
I :473] SendControlCommandWithDataOut
I :783] AsyncInterruptInTransfer
...
I :796] ASYNC IN 3 begin
I :1262] WorkerThreadFunc Installing bulk-in reader. buffer index [0]
I :748] AsyncBulkInTransfer
I :761] ASYNC IN 1 begin
I :1262] WorkerThreadFunc Installing bulk-in reader. buffer index [1]
I :748] AsyncBulkInTransfer
I :761] ASYNC IN 1 begin
real 0m2.917s
user 0m0.030s
sys 0m0.045s
|
You'd better file a bug on the Coral page; this is not something the TF team can answer. |
Hi,
I wrote a minimal C++ program that loads mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite
and does some inference.
When running on an Ubuntu 16.04 notebook (i7, SSD, 12 GB) with the USB accelerator,
interpreter_->ModifyGraphWithDelegate
runs substantially (50x) slower than the same code (see below) on the dev-board.
Dev-board:
Code line: interpreter_->ModifyGraphWithDelegate({delegate, edgetpu_free_delegate});
libedgetpu version: libedgetpu1-std:arm64 14.1
run time: 0.07sec (incl. loading network etc)
Ubuntu 16.04,
Code line: interpreter_->ModifyGraphWithDelegate(delegate);
[the two-delegate-argument version above does not compile there]
libedgetpu: libedgetpu1-max/coral-edgetpu-stable,now 15.0 amd64 [installed]
TENSORFLOW: tensorflow_src cloned today and built tflite library
runtime: 3.2sec (!)
After that, inference runs only about half as fast as on the dev-board.
(slower USB speed, or a problem with the delegate?)
Code snippet:
std::unique_ptr<tflite::FlatBufferModel> model;
model = tflite::FlatBufferModel::BuildFromFile(model_path.c_str());
tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder(*model, resolver)(&interpreter_);
size_t num_devices;
std::unique_ptr<edgetpu_device, decltype(&edgetpu_free_devices)> devices(
edgetpu_list_devices(&num_devices), &edgetpu_free_devices);
TFLITE_MINIMAL_CHECK(num_devices);
const auto& device = devices.get()[0];
auto* delegate = edgetpu_create_delegate(device.type, device.path, nullptr, 0);
ON DEVBOARD: interpreter_->ModifyGraphWithDelegate({delegate, edgetpu_free_delegate});
ON UBUNTU/USB: interpreter_->ModifyGraphWithDelegate(delegate);
Thank you!
-johann