libusb1 slightly slower than pyusb for large data. #21
I confirm: it looks like I picked the slowest way to initialise a char array with ctypes (variant
I pushed a tentative patch doing this (and many more conversions to
I will pull the branch and test. Is there anything else I can do to assist?
Ran it. Execution time dropped to 1.93 seconds. That is a drop of around a second and a half! Great job. The time is 1.93 s whether I send a bytes object or a bytearray, which was surprising because it looked like passing bytes to bulkWrite caused a copy. I did notice that the data returned from bulkRead() is now a bytearray. This caused some issues in my code because a library I use only takes bytes/strings (https://github.com/ilanschnell/bitarray). The easy fix is to copy it into a bytes object. This got me thinking about what data type should be returned from a read. Both bytearray and bytes have benefits, but I cannot quite decide which should be returned. I am leaning towards the immutable string/bytes object, since that data will likely be processed, but I may be biased toward my use case. I may just add bytearray support to that library I am using.
With bytes, I would have one more type to care about: bytearray is needed when reading, as libusb will mutate the buffer. So I went with bytearray everywhere. With the current code, any value passed is ultimately converted into a bytearray (which may involve a memory copy), then into a ctypes pointer. In any case, the argument should be either some type of binary array (bytes, bytearray, str with python2), or an integer (only when reading, of course) or a long with python2. The current code should already contain fewer memory copies just because of this conversion, so you should benefit from it even without changes to caller code. But pypy does not like it, as per the latest travis-ci results; I have to fix this.
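The zero-copy distinction discussed here can be illustrated with plain ctypes. This is a standalone sketch, not python-libusb1's actual code: `from_buffer` wraps a writable bytearray in place (so C-side writes mutate the caller's object), while an immutable bytes object has to be copied first.

```python
import ctypes

payload = bytearray(b"hello usb")
n = len(payload)

# from_buffer() wraps the bytearray's memory directly: no copy is made,
# so writing through the ctypes array mutates the original object.
cbuf = (ctypes.c_char * n).from_buffer(payload)
cbuf[0] = b"H"
print(payload)  # bytearray(b'Hello usb') - mutation is visible

# An immutable bytes object cannot be wrapped writably...
immutable = b"hello usb"
try:
    (ctypes.c_char * n).from_buffer(immutable)
except TypeError:
    pass  # ctypes refuses: the underlying buffer is read-only

# ...so it must be copied into fresh memory first.
copied = (ctypes.c_char * n).from_buffer_copy(immutable)
copied[0] = b"H"
print(immutable)  # b'hello usb' - the original is untouched
```

This is why converting everything to bytearray up front pays at most one copy, after which the pointer can be handed to libusb with no further copies.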
Since bytearray is used internally because you modify the buffer, that is the answer :). As I said above, I am already benefiting from the speed increase.
Ah, indeed, if the caller is incompatible with bytearray, this breaks compatibility. Mmh. On one hand, I hate breaking backward compatibility. On the other hand, zero-copy is something precious: higher levels may make copies where acceptable, but if the lower level makes copies, the caller can't do much about it. I'll need to think more about it. About improved performance, would you know where the next hot spot would be? Still somewhere in libusb (be it in the python wrapper or in libusb itself), or elsewhere? I'm especially curious about the (lack of) improvement in Always happy to get feedback and bug reports on python-libusb1. About the Debian package request, I just bumped the RFP/ITP, as I did not expect it to take this long. Maybe my module is not as easy to package as I thought.
I am personally fine with the backwards-compatibility 'break', since this only happens when using classes that fail to handle bytes and bytearray the same. I will work around my case by fixing the bitarray library to accept both bytes and bytearray. On performance: were you hoping the _bulkTransfer time would be lower? A drop to 1.9 is already a great benefit to me, and I would feel petulant asking for more. (If you are interested, my tool currently takes 4.1 seconds to run, so a little less than half of that is spent in usb1.) But I am always for faster execution times! When I get up in the morning, I can run your pprofiler (it seems better than kernprof) on my tool and profile any methods you would like. I believe the targets are currently bulkWrite/bulkRead and _bulkTransfer. Let me know if I should profile anything else, and I will happily provide the report. Perhaps the family of create_XXXXX_buffer functions? I can also try transferring this data to the device in a C program to see the theoretical limit of libusb, and we can work from there.
My interest is that "absolute" performance is hard to put in perspective. If python-libusb1 now takes a negligible amount of time compared to the time actually spent in libusb1, then there is little to gain by optimising further. On a higher level, if you are not doing a one-shot transfer but something recurring, you may want to try the async API: set up a handful of transfer objects and define a function to call on completion, which will process received data (ideally not spending too long, maybe piping it to another process) or push the next data chunk when sending. This way, you can keep the USB bus busy even while the CPU is busy resubmitting transfers. This is what I do in my USB protocol analyser driver, and I can receive 43 MB/s with the bus and/or the device being the bottleneck (USB 2.0 device).
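The resubmit-on-completion pattern described above can be sketched without hardware. This is a pure-Python simulation, not python-libusb1's real async API (which uses transfer objects and an event loop around handleEvents()); the names `submit`, `on_complete`, and `pending` are illustrative stand-ins.

```python
from collections import deque

CHUNK = 4096
data = bytearray(range(256)) * 100   # 25600-byte payload to send
offset = 0                           # next byte of data to queue
sent = bytearray()                   # what the simulated "device" received
pending = deque()                    # stands in for submitted transfers

def submit(transfer):
    # Refill this transfer with the next chunk and "submit" it.
    global offset
    chunk = data[offset:offset + CHUNK]
    offset += len(chunk)
    if chunk:
        pending.append((transfer, chunk))

def on_complete(transfer, chunk):
    # Completion callback: record the chunk, then immediately resubmit,
    # so the bus never sits idle waiting for the caller to notice.
    sent.extend(chunk)
    submit(transfer)

# Set up a handful of transfer objects and prime them.
for transfer_id in range(4):
    submit(transfer_id)

# Event loop standing in for the real library's event handling.
while pending:
    transfer, chunk = pending.popleft()
    on_complete(transfer, chunk)

print(len(sent))  # 25600
```

Keeping several transfers in flight is what hides the per-transfer resubmission latency: while one chunk is on the wire, the callback is already queueing the next.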
Testing now. Your current patch transfers my data in 1.93 seconds.
Looks like your patch moved nearly all the execution time into the libusb1.0 library. I did a test in native C, and the same bulkWrite took 1.58 s to transfer. Most of the code is not worth showing, but here is the part that does the transfer.
This spat out the timing above, so there is a difference of 0.35 seconds.
Great news! Thanks for checking.
I expect python-libusb1 to be close to this now: mutable buffers that ctypes can directly reference without copying, with the pointer then passed to C. So I expect no memory copy left inside it - counting
I find it surprising that when I pass in bytes, the exec time of create_initialised_buffer is so short. I expected it to have a large copy time since you now want a bytearray. As a bytearray, the average transfer time is 1.924 s, and with bytes the average is 1.928 s. Does that small of a difference even make sense, lol. This issue looks solved to me. When you get the patch working with pypy, I will close it. Thanks for the quick responses. It has been a pleasure working with you on this.
Added badges to README. Execution time down to 5.5 seconds. Supports new patch for libusb1 that addresses vpelletier/python-libusb1#21 which shaves another second and a half off of execution time. With libusb1 patch, execution time down to 4.1 seconds.
FWIW, the pypy breakage only affects the ancient version used on Travis, but tests pass on 5.6.0 for example (I think bytearray support is fairly recent there, at least in ctypes):
(Side note: that deprecation warning comes from a test knowingly exercising the deprecated code path, which was accidentally broken at some point.) So I released 1.6 with this change, along with moving code around to eventually reduce the top-level module namespace footprint while allowing the unit tests to ship along with the code (this coming from a discussion with the Debian dev working on the Debian packaging, which should let the package enter Debian before the upcoming freeze).
Thanks for the work. I would love to see a link to that conversation about including tests in the distribution (if it is public) since I was having a discussion about that with users of a library I inherited. |
No link sadly, as the discussion happened in meatspace. A summary: the initial idea was, at the very least, to run tests when building the package. As a consequence (not sure whether it was intended or not), the test ended up installed in python's top-level module namespace in an earlier package version. I noticed it, and was shown that many 3rd-party python modules (at least the ones installed on my machine) contain some form of test ("tests" folder, test*.py, ...). It could help when users report issues on a specific install, as it makes it possible to tell them "post the output of
Thanks. I will look for further work and see what I can learn for my project. As always, pleasure working with you. |
I am using libusb1 to communicate with a device and am looking for ways to make the transfer faster.
The bulk transfer is 5959114 bytes to USB bulk endpoint 2. Currently, the bulkWrite call of usb1 accounts for the largest share of my project's run time. I tested sending the same data with pyusb and found pyusb usually faster than libusb1. I hope my statistics and use case can help improve the performance of this great repository. Following are the statistics I came up with, and how to reproduce them.
I hooked kernprof into my project and profiled usb1's bulkWrite function. Here are the results. I noticed the ctypes conversion takes almost as much time as the transfer itself. Maybe there is a way to do this with zero copy in C?
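The cost of that conversion can be measured in isolation with a small micro-benchmark. This is a standalone ctypes sketch, not python-libusb1's actual internals; it contrasts a conversion that copies the whole ~6 MB payload with one that merely wraps existing memory.

```python
import ctypes
import timeit

SIZE = 5959114               # same size as the bulk transfer in question
blob_bytes = bytes(SIZE)     # immutable source: must be copied
blob_array = bytearray(SIZE) # mutable source: can be wrapped in place

def with_copy():
    # create_string_buffer() allocates fresh memory and copies all ~6 MB.
    return ctypes.create_string_buffer(blob_bytes)

def zero_copy():
    # from_buffer() just wraps the bytearray's existing memory: O(1).
    return (ctypes.c_char * SIZE).from_buffer(blob_array)

t_copy = timeit.timeit(with_copy, number=20)
t_wrap = timeit.timeit(zero_copy, number=20)
print(f"copying: {t_copy:.4f}s  zero-copy: {t_wrap:.4f}s")
```

On any machine the zero-copy wrap should be orders of magnitude cheaper, which is why eliminating the copy in the hot path matters at this transfer size.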
Example Code
Code with usb1
timeit line:
Code with pyusb
timeit line:
I can do a comparison with code in C directly calling libusb 1.0 if the benchmarks are useful.
Here is my code that I ran during the profiling: https://github.com/diamondman/proteusisc/blob/059d94e3625331c2786bd5a592388bd9f4caa893/proteusisc/drivers/xilinxPC1driver.py#L327-L333 (Please forgive the messiness; it is still alpha.)