This repository has been archived by the owner on Feb 26, 2020. It is now read-only.
The hardware division units are non-pipelined and have horrific latencies:
https://gmplib.org/~tege/x86-timing.pdf
A few quick tests using libdivide's benchmark (http://libdivide.com/) show that it is about twice as fast at 64-bit division as the native implementation.
That is after modifying CFLAGS to use `-O2 -m32` on my Haswell E3-1276 v3 and killing the SSE2 code (hence why the vector column is zero). The system and scl_us columns are the ones that are relevant to us, and there libdivide is about twice as fast.
We should look into switching to libdivide for 64-bit division on 32-bit platforms. If there is also a way to get the compiler to use our own function for 64-bit division on 64-bit platforms, we would likely be better off switching there as well. libdivide is under the zlib license, which allows us to use it.
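For anyone unfamiliar with how a library like libdivide avoids the hardware divider, here is a minimal sketch of the general divide-by-invariant technique (the multiply-and-shift scheme described by Granlund and Montgomery). It is not libdivide's code or API: the names `fast_u32`, `fast_u32_gen` and `fast_u32_do` are made up for illustration, and it handles 32-bit unsigned operands only to keep it short. A 64-bit version would need the high half of a 64x64-bit product instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration; not part of libdivide. */
struct fast_u32 {
    uint32_t magic; /* precomputed multiplier m' */
    uint8_t sh1;    /* first shift: min(l, 1) */
    uint8_t sh2;    /* second shift: max(l - 1, 0) */
};

/* Precompute the constants for a fixed divisor d (d must be nonzero).
 * This still performs one real division, paid once per divisor. */
static struct fast_u32 fast_u32_gen(uint32_t d)
{
    struct fast_u32 f;
    unsigned l = 0;

    assert(d != 0);

    /* l = ceil(log2(d)): the smallest l with 2^l >= d. */
    while (l < 32 && (UINT64_C(1) << l) < d)
        l++;

    /* m' = floor(2^32 * (2^l - d) / d) + 1, which always fits in 32 bits. */
    f.magic = (uint32_t)((((UINT64_C(1) << l) - d) << 32) / d + 1);
    f.sh1 = (uint8_t)(l < 1 ? l : 1);
    f.sh2 = (uint8_t)(l > 1 ? l - 1 : 0);
    return f;
}

/* Compute n / d using one widening multiply, an add, a subtract and two shifts. */
static uint32_t fast_u32_do(uint32_t n, const struct fast_u32 *f)
{
    /* High 32 bits of the 64-bit product magic * n. */
    uint32_t t = (uint32_t)(((uint64_t)f->magic * n) >> 32);

    /* t <= n, so neither n - t nor the sum can wrap around. */
    return (t + ((n - t) >> f->sh1)) >> f->sh2;
}
```

The benefit only shows up when the same divisor is reused many times, since `fast_u32_gen` itself divides once. Callers would generate the constants when the divisor is set and call `fast_u32_do` on the hot path.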
This is not a high-priority idea from a performance standpoint (how often do we do division?), but it is something I want to put out there for anyone new to contributing who is interested in working on it.
A couple of related questions of possible interest are "does division being non-pipelined have an effect on another thread in Intel's SMT implementation?" and "is hardware division preemptible?". These questions might be worth asking the realtime Linux developers.