Detect AVX2 support at runtime #67
Comments
The only use of AVX2 intrinsics AFAIK is within c-blosc. @FrancescAlted could you confirm that c-blosc does not perform runtime dispatching based on hardware capabilities? If so, is this feasible? |
When we investigated this issue ( zarr-developers/zarr-python#136 ) last time (admittedly about a year ago), we narrowed it down to an AVX2 instruction, vinserti128, popping up in __pyx_pw_4zarr_5blosc_19compress (https://github.com/zarr-developers/zarr/blob/v2.1.4/zarr/blosc.c#L2803), which was used by all compression code paths (except Zlib). @FrancescAlted had previously looked and found that there was no vinserti128 in Blosc. This means it had to have been in the Zarr Cython-generated C code. We decided the solution was to allow one to disable AVX2 instructions at compile time. This works, but comes with the caveat that we cannot use AVX2 instructions at run time should they be available.
Now, I have not investigated the analogous case since the Zarr/Numcodecs split, but I suspect the issue still exists. I can try to generate a new reproducer using newer versions of Zarr and Numcodecs, which should help us understand where this problem occurs now. Looking back at the C code now, I would suspect this line (https://github.com/zarr-developers/zarr/blob/v2.1.4/zarr/blosc.c#L2812) to have caused the issue. Fixing this sort of issue may require some trickery on the building end of things. |
My apologies, I had forgotten this.
|
I confirm that C-Blosc does perform runtime dispatching based on hardware capabilities. To better assess whether the different acceleration paths are available to Blosc, I have just implemented the possibility to print the CPU capabilities that will be used, via the BLOSC_PRINT_SHUFFLE_ACCEL environment variable. And yes, it should be possible to activate the AVX2 path only on processors having this capability. |
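As a rough illustration of how the BLOSC_PRINT_SHUFFLE_ACCEL variable mentioned above might be used from Python, the sketch below runs a snippet in a child process with the variable set and returns whatever the child printed. The helper name `run_with_shuffle_report` is hypothetical, and it is an assumption that any non-empty value enables the report and that the Blosc build in use includes this feature.

```python
import os
import subprocess
import sys


def run_with_shuffle_report(snippet):
    """Run a Python snippet in a child process with BLOSC_PRINT_SHUFFLE_ACCEL set.

    Returns the child's standard output as text. Whether Blosc prints
    anything depends on the c-blosc version in use (assumption).
    """
    env = dict(os.environ)
    # Assumption: any non-empty value switches the report on.
    env["BLOSC_PRINT_SHUFFLE_ACCEL"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Possible usage, assuming numcodecs is installed and bundles a new enough c-blosc:
# print(run_with_shuffle_report(
#     "from numcodecs import Blosc; Blosc(shuffle=Blosc.SHUFFLE).encode(b'x' * 1000)"
# ))
```

Running the snippet in a child process keeps the environment change out of the current interpreter, and also means a crash in the child does not take down the calling session.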
Thanks Francesc.
I am way out of my depth here, but I don't believe there are any AVX2 intrinsic function calls in the Cython-generated C code, and so if there are some AVX2 instructions in the compiled code I guess this must be an optimisation the compiler has figured out by itself. So if we want to compile c-blosc with the potential to use AVX2 when available, but also have the compiled code safe to run on hardware without AVX2, it sounds like we need to be able to tell the compiler something like "if you see an AVX2 intrinsic function call in the source code then go ahead and compile AVX2 instructions, but otherwise do not insert any AVX2 instructions by yourself". I wonder if this could be achieved with gcc via the -o flag, although I don't know if there would be other performance considerations.
|
Yeah, I am not an expert either. My hunch is that the […] At any rate, I am pinging the guy who did most of the SSE2/AVX2 runtime detection in Blosc some years ago. @juliantaylor any hints on this would be highly appreciated. Thanks in advance! |
Correct, -mavx2 allows the compiler to place AVX2 code wherever it likes. This piece of code looks like it compiles in AVX2 unconditionally, though I am not familiar with this Cython feature, so it might just be an annotation that is not used during compilation. If the code that profits from AVX2 is inside non-public Cython code called from Python, it should be pretty easy to compile it twice and wrap the appropriate call in Python depending on the runtime environment. |
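The "compile it twice and pick one from Python" idea above could look roughly like the following sketch. It is not how numcodecs is actually built: the module names `_blosc_avx2` and `_blosc_generic` are hypothetical, and reading `/proc/cpuinfo` is a Linux-only assumption (other platforms would need a different check).

```python
def cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by Linux, or an empty set."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                # On x86 Linux the line looks like "flags\t\t: fpu vme ... avx2 ..."
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # file missing or unreadable: treat as "no known flags"
    return set()


def pick_backend(flags, avx2_name="_blosc_avx2", generic_name="_blosc_generic"):
    """Choose which of two (hypothetical) compiled modules to import."""
    return avx2_name if "avx2" in flags else generic_name


# Possible usage, assuming both extension modules were built and shipped:
# import importlib
# backend = importlib.import_module(pick_backend(cpu_flags()))
```

Falling back to the generic build whenever the flags cannot be read errs on the side of avoiding an illegal instruction, at the cost of some speed.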
I'm experiencing this issue on some of my machines. The kernel thinks the illegal instruction is in the blosc library which seems to be provided by numcodecs
|
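For anyone trying to confirm the same symptom, one hedged way to check whether a given call dies with an illegal instruction is to run it in a child process and inspect the exit status. The helper name `dies_with_sigill` is hypothetical, and the sketch assumes a POSIX system, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def dies_with_sigill(snippet):
    """Return True if running the snippet in a child Python ends in SIGILL (POSIX only)."""
    result = subprocess.run([sys.executable, "-c", snippet], capture_output=True)
    # On POSIX, a child killed by signal N has returncode == -N.
    return result.returncode == -signal.SIGILL


# Possible usage, assuming numcodecs is installed:
# dies_with_sigill("from numcodecs import Blosc; Blosc().encode(b'x' * 1000)")
```

Because the call runs in a separate process, the check itself survives even when the compiled code crashes.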
Currently users have to decide at compile time whether to build a binary that uses AVX2 instructions or not. If they build with AVX2 and end up deploying to a machine that lacks AVX2 support, the process will crash due to the illegal instruction. Users can build without AVX2 and the result will work fine regardless of whether the target infrastructure has AVX2 support, but the compression algorithms here may run slower than if they were built with AVX2 support. Admittedly, avoiding a crash is much more important than avoiding degraded performance.
However, in the ideal case, we could build numcodecs with and without AVX2 support and then merely detect at runtime whether AVX2 instructions are available, and thus choose the appropriate code path without crashing in either case. This will take a bit of work to understand where AVX2 instructions are being introduced and how to avoid them, though some of that was already done in the first referenced issue below.
xref: zarr-developers/zarr-python#136
xref: #24
xref: #26
xref: #27