
tsan also needs sanitizer nerf for crc64 #122

Closed
nate-thirdwave opened this issue May 30, 2024 · 3 comments

@nate-thirdwave

The no_sanitize_address nerf for crc64 also needs no_sanitize_thread since the tsan checker also looks for heap-use-after-free.

--- src/liblzma/check/crc64_fast.c	2023-11-01 12:19:29.000000000 +0000
+++ src/liblzma/check/crc64_fast.c-new	2024-05-30 16:56:41.935335535 +0000
@@ -209,11 +209,14 @@
 // The intrinsics use 16-byte-aligned reads from buf, thus they may read
 // up to 15 bytes before or after the buffer (depending on the alignment
 // of the buf argument). The values of the extra bytes are ignored.
-// This unavoidably trips -fsanitize=address so address sanitizier has
-// to be disabled for this function.
+// This unavoidably trips -fsanitize=address and -fsanitize=thread
+// so the sanitizers have to be disabled for this function.
 #if lzma_has_attribute(__no_sanitize_address__)
 __attribute__((__no_sanitize_address__))
 #endif
+#if lzma_has_attribute(__no_sanitize_thread__)
+__attribute__((__no_sanitize_thread__))
+#endif
 static uint64_t
 crc64_clmul(const uint8_t *buf, size_t size, uint64_t crc)
 {
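
For context, the lzma_has_attribute() guard used in the patch is the usual __has_attribute wrapper; a minimal sketch of how such a macro is typically defined (xz's actual definition lives in its common headers and may differ):

// Fall back to 0 on compilers that lack __has_attribute, so the
// attribute is simply omitted there.
#ifdef __has_attribute
#	define lzma_has_attribute(attr) __has_attribute(attr)
#else
#	define lzma_has_attribute(attr) 0
#endif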
@nate-thirdwave
Author

Observing a tsan fault depends on the state of your heap: some other previously freed memory block needs to be in the vestigial space before or after the buffer.
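
A hypothetical repro sketch of that condition (illustrative only; build with -fsanitize=thread and link against liblzma, and note that heap layout is allocator-dependent, so the freed block landing next to buf is not guaranteed, nor is the CLMUL path being selected on every machine):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <lzma.h>

int
main(void)
{
	uint8_t *buf = malloc(60);      // size not a multiple of 16
	uint8_t *neighbor = malloc(16); // may end up right after buf
	free(neighbor);                 // freed memory now borders buf

	memset(buf, 0, 60);
	// The CLMUL path rounds its loads to 16-byte alignment, so it
	// can read up to 15 bytes outside [buf, buf + 60).
	lzma_crc64(buf, 60, 0);
	free(buf);
	return 0;
}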

@Larhzu
Member

Larhzu commented Jun 3, 2024

Perhaps with Clang it could be __attribute__((__no_sanitize__("*"))) or some such, since the code can clearly trip about every sanitizer. It's funny, because in assembly the same code would be fine. In any case, the current code will be replaced with an implementation that won't trip sanitizers and thus won't require these attributes.
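
A minimal sketch of that idea, using a hypothetical CRC_NO_SANITIZE macro (Clang's __no_sanitize__ attribute takes a list of sanitizer names, while GCC provides the individual __no_sanitize_address__ and __no_sanitize_thread__ attributes):

#if defined(__clang__)
#	define CRC_NO_SANITIZE \
		__attribute__((__no_sanitize__("address", "thread", "memory")))
#elif defined(__GNUC__)
#	define CRC_NO_SANITIZE \
		__attribute__((__no_sanitize_address__)) \
		__attribute__((__no_sanitize_thread__))
#else
#	define CRC_NO_SANITIZE
#endif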

Larhzu added a commit that referenced this issue Jun 11, 2024
It's not enough to silence the address sanitizer. Also memory and
thread sanitizers would need to be silenced. They, at least currently,
aren't smart enough to see that the extra bytes are discarded from
the xmm registers by later instructions.

Valgrind is smarter, possibly because this kind of code isn't weird
to write in assembly. Agner Fog's optimizing_assembly.pdf even mentions
this idea of doing an aligned read and then discarding the extra
bytes. The sanitizers don't instrument assembly code but Valgrind
checks all code.

It's better to change the implementation to avoid the sanitization
attributes which also look scary in the code. (Somehow they can look
more scary than __asm__, which is implicitly unsanitized.)

See also:
#112
#122
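
A sketch of the aligned-read pattern the commit message describes (illustrative only, not xz's code; the aligned load cannot cross a page boundary, so it cannot fault, yet up to 15 of the loaded bytes lie outside the buffer):

#include <emmintrin.h> // SSE2
#include <stdint.h>

static __m128i
aligned_head_load(const uint8_t *buf)
{
	// Round buf down to the previous 16-byte boundary and load.
	const __m128i *p = (const __m128i *)((uintptr_t)buf & ~(uintptr_t)15);
	// Valgrind tracks the extra bytes through the xmm register and
	// sees them discarded later; the sanitizers flag the load itself.
	return _mm_load_si128(p);
}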
Larhzu added a commit that referenced this issue Jun 11, 2024
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.

It is unknown if the MSVC workaround on 32-bit x86 is still needed.
I omitted it from this commit.

Fixes: #112
Fixes: #122
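
The four-way folding mentioned above keeps more of the CPU's vector pipeline busy. A structural sketch with illustrative names, using plain XOR in place of the real carryless multiplications (whose fold constants are specific to the CRC-64 polynomial):

#include <emmintrin.h> // SSE2
#include <stddef.h>
#include <stdint.h>

static __m128i
xor_fold4(const uint8_t *buf, size_t size)
{
	__m128i a = _mm_setzero_si128();
	__m128i b = a, c = a, d = a;
	const __m128i *p = (const __m128i *)buf;

	// Consume 64 bytes per iteration: four independent 16-byte lanes.
	for (; size >= 64; size -= 64, p += 4) {
		a = _mm_xor_si128(a, _mm_loadu_si128(p + 0));
		b = _mm_xor_si128(b, _mm_loadu_si128(p + 1));
		c = _mm_xor_si128(c, _mm_loadu_si128(p + 2));
		d = _mm_xor_si128(d, _mm_loadu_si128(p + 3));
	}

	// The real code folds the lanes together with
	// _mm_clmulepi64_si128 instead of XOR.
	return _mm_xor_si128(_mm_xor_si128(a, b), _mm_xor_si128(c, d));
}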
@Larhzu
Member

Larhzu commented Jun 11, 2024

See #127.

Larhzu added a commit that referenced this issue Jun 16, 2024
It's faster with both tiny and large buffers and doesn't require
disabling any sanitizers. With large buffers the extra speed is
from folding four 16-byte chunks in parallel.

The 32-bit x86 with MSVC reportedly still needs a workaround.
Now the simpler "__asm mov ebx, ebx" trick is enough but it
needs to be in lzma_crc64() instead of crc64_arch_optimized().
Thanks to Iouri Kharon for testing and the fix.

Thanks to Sam James for general feedback.

Fixes: #112
Fixes: #122
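
A sketch of the workaround placement the commit message describes (the stub and the exact preprocessor guard are assumptions; the real lzma_crc64() also does runtime CPU detection):

#include <stddef.h>
#include <stdint.h>

// Stub standing in for the real SIMD implementation.
static uint64_t
crc64_arch_optimized(const uint8_t *buf, size_t size, uint64_t crc)
{
	(void)buf;
	(void)size;
	return crc;
}

uint64_t
lzma_crc64(const uint8_t *buf, size_t size, uint64_t crc)
{
#if defined(_MSC_VER) && defined(_M_IX86)
	// Harmless instruction that inhibits the 32-bit MSVC miscompile;
	// per the commit message it must be here, not in
	// crc64_arch_optimized().
	__asm mov ebx, ebx
#endif
	return crc64_arch_optimized(buf, size, crc);
}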
Larhzu closed this as completed in 54eaea5 on Jun 17, 2024