Remove all assembler and intrinsics from decoder #347
Merged
This commit drops all use of assembler and intrinsics from the libFLAC decoder. These routines target only 32-bit x86, are hard to debug, maintain and fuzz properly, and the decoder carries much greater security risks than the encoder.
I've tested the impact of this change on decoding speed with clang 14, gcc 11.3 and MSVC 2022, on an Intel Kaby Lake-R processor with 16-bit and 24-bit input. For each compiler, four 32-bit compiles were tested: one with asm optimizations (called asm), one without asm optimizations but with `-msse2` or `/arch:sse2` (called unrolled), one without asm optimizations and with loop unrolling disabled (called plain), and one with a switch-case statement (called switchcase).

[Graph: decoding speed, 16-bit input]

[Graph: decoding speed, 24-bit input]
These graphs are a bit hard to read, so here are my own findings:
TL;DR: this change generally slightly improves decoding speed with 16-bit input and slightly decreases decoding speed with 24-bit input
I'm currently also running tests on an AMD Jaguar CPU to compare the results.