configure must not pass -msse2 by default on x86 #486

Closed
manxorist opened this issue Oct 16, 2022 · 9 comments · Fixed by #491


@manxorist
Contributor

Passing -msse2 allows the compiler to use any SSE2 instruction everywhere, not only in the explicit SSE2 intrinsics.
This very obviously breaks on x86 CPUs that do not support SSE2.
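
For illustration, a minimal sketch (not FLAC source code): with something like gcc -m32 -msse2 -O2 or -O3, the compiler may emit SSE2 instructions for the plain C loop below when it auto-vectorizes it, even though the source contains no intrinsics; on a CPU without SSE2 the resulting binary then misbehaves.

    /* Illustration only, not code from libFLAC: with -msse2 the compiler is
     * allowed to emit SSE2 instructions (e.g. paddd on xmm registers) for
     * this ordinary C loop when it auto-vectorizes it, even though the
     * source contains no intrinsics. */
    void add_arrays(int *dst, const int *a, const int *b, unsigned n)
    {
        for (unsigned i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }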

I figured out that I can pass --disable-sse, but I would prefer a default configuration that actually works.

It took quite a while to figure out what was actually happening here, as my CPU (AMD Duron 1800, Athlon-XP core) does not die with an Illegal Instruction but instead just produces garbled values.

Very few x86 CPUs (some Pentium 4, Pentium M, the original Core (not Core 2), some Atom, VIA C7, Transmeta Efficeon) support SSE2 without also supporting 64-bit, which means almost all SSE2-capable CPUs will be running a 64-bit system anyway, so the harm done by not passing -msse2 by default on x86 should not be that big.

@ktmf01
Collaborator

ktmf01 commented Oct 16, 2022

Passing -msse2 allows the compiler to use any SSE2 instruction everywhere, not only in the explicit SSE2 intrinsics. This very obviously breaks on x86 CPUs that do not support SSE2.

This behaviour has been in FLAC for 9 years already. It was considered appropriate at the time, and you'll find almost no software today that does not require it. Windows 7 and later require it, for example.

Dropping -msse2 will harm performance on modern systems.

Very few x86 CPUs (some Pentium 4, Pentium M, the original Core (not Core 2), some Atom, VIA C7, Transmeta Efficeon) support SSE2 without also supporting 64-bit, which means almost all SSE2-capable CPUs will be running a 64-bit system anyway, so the harm done by not passing -msse2 by default on x86 should not be that big.

The problem here is that there is still a lot of 32-bit software around. Foobar2000, for example, is only now migrating to 64-bit, and a 32-bit version will be around for a while. Such 32-bit software will use a 32-bit version of libFLAC. So, this is still very much relevant.

@manxorist
Contributor Author

Well, I disagree.

This behaviour has been in FLAC for 9 years already. It was considered appropriate at the time, and you'll find almost no software today that does not require it. Windows 7 and later require it, for example.

9 years ago, this decision was very wrong, so I somewhat question how it could have been considered "appropriate" back then. Windows Vista was very much supported and used, and did not require SSE2 at all. Neither did Windows 7 (back then); only in the last few months of support did Microsoft add an SSE2 requirement for some Spectre and Meltdown patches.

Debian does not require SSE2, and is a supported OS on non-SSE2 x86 systems right now.

You can easily detect whether any user-specified or system-default CFLAGS or implicit compiler switches target SSE2: GCC and Clang will #define __SSE2__, and MSVC (which is not very relevant with Autoconf here) will #define _M_IX86_FP 2. If you need to know that in the build system, you can implement a configure check for exactly that.
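
A minimal sketch of such a check (hypothetical, not the actual configure test; it could be dropped into AC_COMPILE_IFELSE or a similar mechanism): compilation succeeds only when the compiler already targets SSE2, so the build system would know not to add -msse2 itself.

    /* Hypothetical probe for a configure check: compilation succeeds only
     * when the compiler already targets SSE2 (GCC/Clang define __SSE2__,
     * MSVC sets _M_IX86_FP to 2 with /arch:SSE2, and 64-bit MSVC targets
     * imply SSE2). */
    #if defined(__SSE2__) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2) || defined(_M_X64)
    int main(void) { return 0; }
    #else
    #error "compiler does not target SSE2 by default"
    #endif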

If some application chooses to target SSE2, and builds its own dependencies, it's its responsibility to build them appropriately.

If a distribution does target SSE2, it can and will build its compiler to target SSE2 by default.

So, which use case does adding -msse2 and thus overriding the system's and user's choices actually solve?

Why does FLAC need to invent its own way to select the target architecture?

In any case, I do not have a strong opinion on the issue, so if you continue to disagree and do not want to discuss any further, please feel free to close as wontfix.

@ktmf01
Collaborator

ktmf01 commented Oct 16, 2022

Debian does not require SSE2, and is a supported OS on non-SSE2 x86 systems right now.

But does it have a modern, secure web browser that runs on non-SSE2 x86? As far as I know, Firefox dropped support for non-SSE2 CPUs quite a few years ago. See also https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/1698501 where, as you can see, even a bug report for such an important package gets very little attention.

Anyway, Debian passes --disable-sse when building FLAC, so the FLAC that Debian packages runs just fine on those CPUs.

MSVC

Nice example actually, MSVC does exactly the same for all projects. See https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?view=msvc-170

/arch:SSE2
Enables the use of SSE2 instructions. This option is the default instruction set on x86 platforms if no /arch option is specified.

So, which use case does adding -msse2 and thus overriding the system's and user's choices actually solve?

Few users/packagers can afford to do thorough profiling of every (data-intensive) package to check what helps. Therefore, FLAC sets -msse2 by default. This is documented for autotools and CMake. This favors users on 99.9% of systems; users of the 0.1% of non-SSE2 systems will need to do a bit more research, yes.

But I agree this is not a nice solution.

Perhaps it is wise to limit this addition of -msse2 to Windows compiles. Linux users will probably mostly use 32-bit OSes on hardware that is not 64-bit capable, and as far as I know the use of 32-bit software on 64-bit OSes is limited on non-Windows systems. Also, there are no non-SSE2 Windows systems that are still supported, while there are non-SSE2 Linux distros.

But then again, distros might already be aware that they can pass --disable-sse.

@ktmf01
Collaborator

ktmf01 commented Oct 16, 2022

Funny thing actually, issue 486 is an issue specific to x86. :-)

@manxorist
Contributor Author

Debian does not require SSE2, and is a supported OS on non-SSE2 x86 systems right now.

But does it have a modern, secure web browser that runs on non-SSE2 x86?

How is that relevant? I am not running X11 on that system. If anything, it just shows similarly broken behaviour in other applications for equally little reason: if you do have a pure portable C implementation of the algorithms (which you absolutely do have in portable code anyway for less common platforms), there is little reason to arbitrarily limit the default target.
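
To make that concrete, here is a minimal sketch of the usual pattern (names are made up; this is not libFLAC's actual code): the portable C path is always built, the SSE2 path is only built where the compiler may emit SSE2, and the choice between them is made at runtime via CPU detection.

    /* Sketch of portable fallback plus runtime dispatch; HAVE_SSE2_OBJECTS,
     * the function names and the cpu_has_sse2 flag are hypothetical. */
    void restore_signal_portable(const int *residual, int *out, unsigned n)
    {
        for (unsigned i = 0; i < n; i++)
            out[i] = residual[i];           /* placeholder for the real algorithm */
    }

    #ifdef HAVE_SSE2_OBJECTS                /* SSE2 object files were built */
    void restore_signal_sse2(const int *residual, int *out, unsigned n);
    #endif

    void restore_signal(const int *residual, int *out, unsigned n, int cpu_has_sse2)
    {
    #ifdef HAVE_SSE2_OBJECTS
        if (cpu_has_sse2) {                 /* detected at runtime, e.g. via CPUID */
            restore_signal_sse2(residual, out, n);
            return;
        }
    #endif
        restore_signal_portable(residual, out, n);
    }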

Anyway, Debian passes --disable-sse when building FLAC, so the FLAC that Debian packages runs just fine on those CPUs.

Yes, the Debian package works around the wrong default.

MSVC

Nice example actually, MSVC does exactly the same for all projects.

Yes, and they did that in 2012, when they still supported non-SSE2 systems, which was also just stupid. Nowadays it's fine, as MSVC no longer targets any Windows version that does not require SSE2.

So, which use case does adding -msse2 and thus overriding the system's and user's choices actually solve?

Few users/packagers can afford to do thorough profiling of every (data-intensive) package to check what helps.

The situation is more dangerous than that: very few packagers run the build on a non-SSE2 system, so the test suite will succeed, and the problem will only be caught later by some users. Now, given that FLAC has done this for 9 years already, we can assume that for the major distributions that still support old systems, this very likely has already been caught. It is still extra work for packagers that could be avoided by choosing safe defaults.

The symptoms of the wrong default on my CPU are actually very annoying: no process died with Illegal Instruction; instead, flac just failed to properly parse a WAV input header and claimed it had 0 channels. make check died when running MD5 tests, and at different stages depending on the compiler version and options I used. It took me a whole day to figure out what was happening.
On a more meta level: ./configure && make check should generally not fail, but it does for FLAC on non-SSE2 x86.

Perhaps it is wise to limit this addition of -msse2 to Windows compiles.

If you only care about Windows API implementations supported by Microsoft itself, that's fair (ignoring Wine and ReactOS). It would work for me, and probably cover most of the very few use cases you care about. However, modern Windows systems are already covered regardless of what the FLAC build system chooses, because MSVC already defaults to that anyway. So it's only MinGW/MSYS2, and they build their compiler to still target older CPUs by default. Do you question their default choices?

Also, there are no non-SSE2 Windows systems that are still supported, while there are non-SSE2 Linux distro's.

Not exactly true, as Wine is still supported on non-SSE2 systems. @OpenMPT actually has users on 32-bit non-SSE2 Linux systems running Wine, and we do support that (probably as long as we still support VS2017, maybe even longer). We did try to require SSE2 in 2020, but users complained and we re-added support for non-SSE2 systems. We use our own build system for FLAC though.

But then again, distro's might already be aware they can pass --disable-sse

Still, why does FLAC need to invent its own way to specify the target architecture? This information is already present in CFLAGS and/or implicit compiler switches.

@ktmf01
Collaborator

ktmf01 commented Oct 17, 2022

Still, why does FLAC need to invent its own way to specify the target architecture?

I would say for performance reasons. I need to check how much of a performance hit dropping -msse2 causes, though.

@manxorist
Contributor Author

manxorist commented Oct 17, 2022

CPU: AMD A10-5700 (Piledriver core, AVX)
Compiler: GCC 12.1.0
System: MSYS2 MinGW32 on Windows 10 Pro 64-bit

build settings (current git master):

  • stock: ./configure --enable-static --disable-shared
  • disable-sse: ./configure --disable-sse --enable-static --disable-shared

test commands:

  • 16bit -2: time ./src/flac/flac -c -2 /tmp/16.wav > /dev/null
  • 16bit -8: time ./src/flac/flac -c -8 /tmp/16.wav > /dev/null
  • 16bit -8p: time ./src/flac/flac -c -8p /tmp/16.wav > /dev/null
  • 24bit -2: time ./src/flac/flac -c -2 /tmp/24.wav > /dev/null
  • 24bit -8: time ./src/flac/flac -c -8 /tmp/24.wav > /dev/null

Reported time is real time, best of 3 runs.

version       16bit -2        16bit -8        16bit -8p        24bit -2        24bit -8
stock         0m0.631s        0m2.733s        0m11.207s        0m0.932s        0m3.392s
disable-sse   0m0.647s (-3%)  0m2.897s (-6%)  0m11.931s (-6%)  0m0.932s (+0%)  0m3.482s (-3%)

So, on "modern" (this CPU 10 is years old by now, and not even supported by Windows 11 any more) systems, the impact is IMHO negligible.

I do not have an SSE2-only CPU (where I would expect more impact) available right now to easily run tests on, though.
However, given that we concluded in #452 that performance regressions on only slightly older x86 CPUs are generally acceptable, I really do not see the point of -msse2 by default. 32-bit x86 running on actually modern systems is not impacted much.

@ktmf01
Collaborator

ktmf01 commented Oct 17, 2022

Actually, I was more concerned about decoding speed. Turns out that isn't affected much either.

Here are results for a 32-bit compile with MinGW, GCC 12.2, running on Windows 10 on an Intel i5 7200U (Kaby Lake-R) processor, processing CDDA. Presets -0 through -8 are tested, with 0 at the top right and 8 at the bottom left.

very large set of tracks.pdf
very large set of tracks - logarithmic.pdf

Decoding results are inconclusive. It seems presets 0 through 6 benefit from the smaller binary size, while 7 and 8 benefit from SSE2. Differences are generally less than 1%, though. For encoding, the difference seems uniform across all presets, about 5%.

I would still like to keep -msse2 on by default for MinGW compiles, though.

@manxorist
Contributor Author

I would still like to keep -msse2 on by default for MinGW compiles, though.

That would be fine for me. Also for macOS (if 32-bit x86 still makes any sense there at all; I am not sure).

ktmf01 added a commit to ktmf01/flac that referenced this issue Oct 18, 2022
Remove default addition of msse2 on x86. After profiling it was
found this helps little on modern systems. See
xiph#486

The mention of Asm optimizations at the end of configuration was
wrong in many ways: it was 'yes' on platforms for which there are
no optimizations, and wasn't set to 'no' in case intrinsics headers
aren't available.
@ktmf01 ktmf01 linked a pull request Oct 19, 2022 that will close this issue
ktmf01 added a commit that referenced this issue Oct 20, 2022
Remove default addition of msse2 on x86. After profiling it was
found this helps little on modern systems. See
#486

The mention of Asm optimizations at the end of configuration was
wrong in many ways: it was 'yes' on platforms for which there are
no optimizations, and wasn't set to 'no' in case intrinsics headers
aren't available.