Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZODB.utils.p64 and u64 could be faster and provide better error reports #216

Closed
jamadden opened this issue Aug 22, 2018 · 8 comments
Closed
Assignees

Comments

@jamadden
Copy link
Member

They currently use struct.[un]pack(">Q", x) which parses the format string each time. Using a pre-allocated Struct object can be quite a bit faster

# Python 2.7
$ python -m perf timeit -s 'from struct import pack' 'pack(">Q", 1234567)'
.....................
Mean +- std dev: 151 ns +- 5 ns
$ python -m perf timeit -s 'from struct import Struct; pack = Struct(">Q").pack' 'pack(1234567)'
.....................
Mean +- std dev: 107 ns +- 6 ns
# Python 3.7
$ python -m perf timeit -s 'from struct import pack' 'pack(">Q", 1234567)'
.....................
Mean +- std dev: 119 ns +- 6 ns
$ python -m perf timeit -s 'from struct import Struct; pack = Struct(">Q").pack' 'pack(1234567)'
.....................
Mean +- std dev: 89.7 ns +- 3.4 ns

Additionally, when there is an error in packing, the resulting struct.error just says something like error: integer out of range for 'Q' format code, without saying what the faulting value is. It would be nice if that information was available. We've run into scenarios where RelStorage over a MySQL connection can start throwing that exception for no apparent reason from storage.new_oid() and knowing the value could help debugging. I think just __traceback_info__ would be enough, and probably cheaper than a try/except that raises a different exception.

@jmuchemb
Copy link
Member

I did it for NEO, but without debugging information: Nexedi/neoppod@ef38744

For u64, having __traceback_info__ is quite cheap because we can't avoid defining a function (it can even be free by doing lambda __traceback_info__: ... but you may find that too ugly).

For p64, setting __traceback_info__ would be relatively costly.

@jamadden jamadden self-assigned this Aug 22, 2018
@jamadden
Copy link
Member Author

Thanks for the pointers. I'll do some benchmarking of the various options. I'll be happy if we at least don't lose any speed 😄

@jamadden
Copy link
Member Author

Here are the results of the benchmarking script where I measured u64(p64(123456789):

Benchmark Python 2.7 Python 3.7
current: u64(p64()) 493 ns 547 ns
struct method direct: u64(p64()) 175 ns 254 ns
struct pack direct unpack indirect: u64(p64()) 275 ns 352 ns
struct pack tbinfo unpack indirect: u64(p64()) 370 ns 431 ns
struct pack catch unpack indirect: u64(p64()) 375 ns 432 ns

The first line is our current implementation. That's the number to beat.

The second line is the absolute fastest we can get, directly calling the methods of Struct to pack and unpack. It's also unrealistic because it doesn't index into the results of unpacking like we need to do. It serves as a floor.

The third line (still using the cached methods of Struct) makes unpacking realistic by using a wrapper function to return the correct index, while still just using the pack method directly. That's the fastest we could realistically be.

The fourth line adds a wrapper function for packing that sets __traceback_info__, while the final line uses a wrapper function that has try/except struct.error.

It looks like function call overhead is approximately 100ns. Once you have that, either setting __traceback_info__ or using try/except is a tiny extra overhead. In particular, the exception version is 25-30% faster than what we're currently doing while offering the best debugging. I'd be happy with that.

@jamadden
Copy link
Member Author

("Inlining" the pack/unpack methods into the function through the old keyword argument trick, instead of accessing them as globals, did not improve the speed. Indeed, it slowed it down slightly.)

@jmuchemb
Copy link
Member

jmuchemb commented Aug 22, 2018

Oh. I was about to comment about this and I'm surprised you get slower results. I should test again on my side.

@jamadden
Copy link
Member Author

I was surprised too, but the effect is consistent across multiple separate test runs.

Python 2.7:

.....................
struct pack catch unpack indirect: u64(p64()): Mean +- std dev: 376 ns +- 7 ns
.....................
struct pack catch unpack indirect kwarg: u64(p64()): Mean +- std dev: 390 ns +- 21 ns

Note the higher stdandard deviation of the kwarg approach, which, on the low end, would make it competitive with the non-kwarg approach.

In Python 3.7, I understand accessing module globals is faster now, so there's little effect (although the standard deviation is smaller):

.....................
struct pack catch unpack indirect: u64(p64()): Mean +- std dev: 472 ns +- 28 ns
.....................
struct pack catch unpack indirect kwarg: u64(p64()): Mean +- std dev: 474 ns +- 14 ns

@jmuchemb
Copy link
Member

In NEO, I instead used a nonlocal variable. But I redid perf tests with Python 2 and I see no difference between that and a global.

So fine, let's consider the most readable code.

@jamadden
Copy link
Member Author

If I update the benchmarks to themself have a local reference to the functions they're calling, instead of calling them through a global, as in this gist, I can see a tiny improvement in the kwarg version on 2.7 compared to not. But (1) that's not how they're called in real life, and (2) the improvement is within the stddev.

(FWIW PyPy 6 2.7 clocks in at around 1.5 ns for each of the benchmarks, roughly 100 times faster than CPython.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants