libFLAC: Add a workaround for a bug in MSVC2015 update2
MSVC2015 update2 compiles the C code:

    abs_residual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);

into this:

    movq QWORD PTR [rsi], xmm2

while it should be:

    movd eax, xmm2
    mov QWORD PTR [rsi], rax

With this patch, MSVC emits:

    movq QWORD PTR [rsi], xmm2
    mov DWORD PTR [rsi+4], r9d

so the price of this workaround is 1 extra write instruction per partition.

Patch-from: lvqcl <lvqcl.mail@gmail.com>
Showing 3 changed files with 12 additions and 0 deletions.
94a6124
For those interested, this is still a problem with the current MS compiler (_MSC_FULL_VER reports 191225835). I suggest removing at least the _MSC_FULL_VER check if you want an x64 release build to compress down more than 92%.
Also, the Connect site link goes nowhere. For more background, see
https://www.mail-archive.com/flac-dev@xiph.org/msg04176.html
Update: the bug in the x64 release build is a new one, with the same effect.
Here is the failing version (from stream_encoder_intrin_sse2.c; don't forget to turn off LTCG, i.e. /GL, for this file so that the /FAcs listing is generated):
`; 96 : __m128i mm_res = local_abs_epi32(_mm_cvtsi32_si128(residual[residual_sample]));
00120 66 0f 6e 01 movd xmm0, DWORD PTR [rcx]
00124 48 8d 49 04 lea rcx, QWORD PTR [rcx+4]
[snip]
; 102 : abs_residual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);
00165 66 0f 7e c1 movd ecx, xmm0
00169 48 89 0e mov QWORD PTR [rsi], rcx
`
The problem is, the high 32 bits of rcx still have the result from a previous lea (at 00124), so same problem, different reason.
Here is the working one:
`; 102 : abs_residual_partition_sums[partition] = (FLAC__uint32)_mm_cvtsi128_si32(mm_sum);
00165 66 0f 7e c1 movd ecx, xmm0
00169 48 89 0e mov QWORD PTR [rsi], rcx
; 107 : abs_residual_partition_sums[partition] &= 0xFFFFFFFF;
0016c 44 89 4e 04 mov DWORD PTR [rsi+4], r9d
`
where r9d has 0, clearing out the high 32 bits.
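For anyone who wants to try the idea outside libFLAC, here is a minimal sketch of the store plus the mask from line 107, applied unconditionally (no `_MSC_FULL_VER` check). `store_partition_sum` and its parameters are hypothetical names of mine, the absolute-value step is a stand-in for `local_abs_epi32`, and the scalar fallback is only there so the snippet builds without SSE2. It is not the actual libFLAC code.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical helper for illustration only; not the libFLAC function.
 * Sums |residual[i]| over one partition in 32 bits and stores the total
 * into a 64-bit slot, then masks the slot as the workaround does. */
static void store_partition_sum(uint64_t *abs_residual_partition_sums,
                                unsigned partition,
                                const int32_t *residual,
                                unsigned count)
{
    unsigned i;
#ifdef SKETCH_HAVE_SSE2
    __m128i mm_sum = _mm_setzero_si128();
    for (i = 0; i < count; i++) {
        __m128i mm_res = _mm_cvtsi32_si128(residual[i]);
        /* 32-bit absolute value: (x ^ sign) - sign (stand-in for local_abs_epi32) */
        __m128i mm_sign = _mm_srai_epi32(mm_res, 31);
        mm_res = _mm_sub_epi32(_mm_xor_si128(mm_res, mm_sign), mm_sign);
        mm_sum = _mm_add_epi32(mm_sum, mm_res);
    }
    /* The store that the thread reports as miscompiled */
    abs_residual_partition_sums[partition] = (uint32_t)_mm_cvtsi128_si32(mm_sum);
#else
    uint32_t sum = 0;
    for (i = 0; i < count; i++) {
        uint32_t v = (uint32_t)residual[i];
        sum += (residual[i] < 0) ? (0u - v) : v;
    }
    abs_residual_partition_sums[partition] = sum;
#endif
    /* Workaround: explicitly clear the high 32 bits of the 64-bit slot */
    abs_residual_partition_sums[partition] &= 0xFFFFFFFF;
}
```

With a correct compiler the mask changes nothing, since the value stored is already a zero-extended 32-bit quantity. Its only purpose is to force a write that clears the upper half of the slot on the affected builds.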
94a6124
So how about a PR with an updated version check and a corrected URL?