Threading primitives are not aligned properly #194

Open
luke-jr opened this issue Oct 30, 2021 · 37 comments
Labels
stale No response received after potential fixes.

Comments

@luke-jr
Contributor

luke-jr commented Oct 30, 2021

> /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.54.0/build_libc-0_2_95_H19_run
Process was terminated with signal 7
FAILING COMMAND:  /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.54.0/build_libc-0_2_95_H19_run
make: *** [minicargo.mk:106: output-1.54.0/libtest.rlib] Error 1
Program received signal SIGBUS, Bus error.
0x00007ffff7eaea9c in __pthread_rwlock_rdlock_full64 (abstime=0x0, clockid=0, 
    rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>) at pthread_rwlock_common.c:353
353     pthread_rwlock_common.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7eaea9c in __pthread_rwlock_rdlock_full64 (abstime=0x0, clockid=0, 
    rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>) at pthread_rwlock_common.c:353
#1  __GI___pthread_rwlock_rdlock (rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>)
    at pthread_rwlock_rdlock.c:27
#2  0x000000010006a140 in ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g (
    arg0=0x1000ef928 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g>) at output-1.54.0/libstd.rlib.c:199887
#3  0x0000000100086b1c in ZRG4ch3std50_0_03sys4unix2os6getenv0g (arg0=...) at output-1.54.0/libstd.rlib.c:104244
#4  0x0000000100086cc0 in ZRG2ch3std50_0_03env7_var_os0g (arg0=...) at output-1.54.0/libstd.rlib.c:88484
#5  0x000000010000e4ac in ZRG2ch3std50_0_03env6var_os1gBsCy (arg0=...)
    at output-1.54.0/build_libc-0_2_95_H19_run.c:4149
#6  0x000000010000bf20 in ZRG1c019rustc_minor_nightly0g () at output-1.54.0/build_libc-0_2_95_H19_run.c:3029
#7  0x000000010000a7fc in ZRG1c04main0g () at output-1.54.0/build_libc-0_2_95_H19_run.c:2624
#8  0x00000001000157ac in ZRQG2ch3std50_0_02rth7closure12lang_start_01gT03ch4core50_0_03ops8function2Fn1gT04call0g (arg0=0x7fffffffe690, arg1=...) at output-1.54.0/build_libc-0_2_95_H19_run.c:8031
#9  0x0000000100062ee0 in ZRG3ch3std50_0_09panickingh0117do_call2gBsD3ch4core50_0_03ops8function2Fn1gT01Cf22cb46marker4Sync0g2cb05panic13RefUnwindSafe0gCf (arg0=0x7fffffffe320 "\220\346\377\377\377\177")
    at output-1.54.0/libstd.rlib.c:97274
#10 0x000000010006d09c in ZRG2ch3std50_0_09panicking3try2gCfBsD3ch4core50_0_03ops8function2Fn1gT01Cf22cb36marker4Sync0g2cb05panic13RefUnwindSafe0g (arg0=...) at output-1.54.0/libstd.rlib.c:98407
#11 0x0000000100080a10 in ZRG2ch3std50_0_02rt19lang_start_internal0g (arg0=..., arg1=<optimized out>, 
    arg2=<optimized out>) at output-1.54.0/libstd.rlib.c:98907
#12 0x000000010000e5a8 in ZRG2ch3std50_0_02rt10lang_start0g (arg0=0x10000a5d4 <ZRG1c04main0g>, arg1=1, 
    arg2=0x7fffffffeb18) at output-1.54.0/build_libc-0_2_95_H19_run.c:4169
#13 0x0000000100015d10 in main (argc=1, argv=0x7fffffffeb18) at output-1.54.0/build_libc-0_2_95_H19_run.c:8239
@thepowersgang
Owner

This is probably PPC specific (as x86-64 works).

I have no idea what could trigger a bus error; the argument appears correct.

@thepowersgang
Owner

A bit of googling says that it is probably bad alignment on the lock (SIGBUS can come from misaligned accesses).
Likely the alignment config is wrong, OR something is doing an atomic operation with bad alignment (not captured in the normal type alignment logic).

@luke-jr
Contributor Author

luke-jr commented Oct 31, 2021

Forcing -O0 in codegen_c.cpp "fixes" this issue. Re-applying -O0 -fsection-anchors reproduces it. Simply appending -fno-section-anchors by itself reproduces the bug as well (EDIT: this is a typo - I forgot to run this test before posting - will edit again with correct result).

Since this is part of -O1 (and even -Og!) it is expected to never break any well-defined code, which suggests either mrustc is depending on undefined behaviour somewhere, or GCC [11.2.0 in this case] has a bug (IMO much less likely).

Considering that this is inside ZRG2ch3std50_0_02rt19lang_start_internal0g, is it possible the rwlock hasn't been properly initialised yet?

(Alignments appear correct, and I would expect the problem to exist with -O0 if it were an alignment issue)

@bjorn3
Contributor

bjorn3 commented Oct 31, 2021

@luke-jr
Contributor Author

luke-jr commented Oct 31, 2021

But when is that assignment made?

@bjorn3
Contributor

bjorn3 commented Oct 31, 2021

ENV_LOCK is a static, so the result of RwLock::new() ends up in the .data section: https://github.com/rust-lang/rust/blob/1.54.0/library/std/src/sys/unix/os.rs#L490 Note that StaticRwLock is a wrapper around the RwLock I pointed to above: https://github.com/rust-lang/rust/blob/1.54.0/library/std/src/sys_common/rwlock.rs#L9

@luke-jr
Contributor Author

luke-jr commented Oct 31, 2021

pthread_rwlock_t should be aligned for both unsigned int and long int, yet __GI___pthread_rwlock_rdlock is being called with rwlock=0x1000ef929, which is aligned for neither. The alignments defined in mrustc are:

const TargetArch ARCH_POWERPC64LE = {
    "powerpc64",
    64, false,
    { /*atomic(u8)=*/true, true, true, true,  true },
    TargetArch::Alignments(2, 4, 8, 16, 4, 8, 8)
};

So why isn't it being aligned?

FWIW, the actual result of merely adding -fno-section-anchors to the compile options was that it now fails in __GI___pthread_mutex_lock instead, suggesting that mutexes are similarly misaligned... :/

@bjorn3
Contributor

bjorn3 commented Oct 31, 2021

X86 doesn't care about alignment as much as other architectures, so it wouldn't be too surprising to me if this alignment issue exists on all platforms.

@luke-jr
Contributor Author

luke-jr commented Oct 31, 2021

Seems like -O0 working was just a coincidence :(

@thepowersgang
Owner

thepowersgang commented Oct 31, 2021

Check the libstd.rlib.c file for usages of ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g
Looking closer at the above backtrace, it's passed correctly to ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g but not to __GI___pthread_rwlock_rdlock - so look at the generated source for ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g

@luke-jr
Contributor Author

luke-jr commented Nov 1, 2021

Not sure how I got the original backtrace - having trouble getting one with the arguments not optimised out at this point (since -O0 makes reproducing it hard). :/

Currently looking at

#0  0x00007ffff7b8ae68 in __GI___pthread_mutex_lock (
    mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at ../nptl/pthread_mutex_lock.c:80
#1  0x00007ffff7ddcc18 in pthread_mutex_lock (mutex=<optimized out>) at forward.c:117
#2  0x0000000100144f2c in ZRIG4ch3std100_0_0_H3003sys4unix5mutex5Mutex0g4lock0g (
    arg0=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at output-1.39.0/libstd.rlib.c:174145
#3  0x0000000100154594 in ZRG4ch3std100_0_0_H3003sys4unix2os6getenv0g (arg0=...)
    at output-1.39.0/libstd.rlib.c:93748
#4  0x0000000100154a38 in ZRG2ch3std100_0_0_H3003env7_var_os0g (arg0=...) at output-1.39.0/libstd.rlib.c:81068
#5  0x0000000100010ff8 in ZRG2ch3std100_0_0_H3003env6var_os1gBsCy (arg0=...)
    at output-1.39.0/build_libc-0_2_62_Hd_run.c:5212
#6  0x000000010001d4e4 in ZRG1c019rustc_minor_version0g () at output-1.39.0/build_libc-0_2_62_Hd_run.c:3257
#7  0x000000010001e2c4 in ZRG1c04main0g () at output-1.39.0/build_libc-0_2_62_Hd_run.c:2951
#8  0x0000000100010288 in ZRQG2ch3std100_0_0_H3002rth7closure12lang_start_01gT03ch4core50_0_03ops8function2Fn1gT04call0g (arg0=0x7fffffffe6b0, arg1=...) at output-1.39.0/build_libc-0_2_62_Hd_run.c:10139
#9  0x00000001000dbe04 in ZRG3ch3std100_0_0_H3009panickingh0127do_call2gG2cb02rth7closure21lang_start_internal_10gCf (arg0=0x7fffffffe450 "\260\346\377\377\377\177") at output-1.39.0/libstd.rlib.c:87492
#10 0x00000001000aea20 in __rust_maybe_catch_panic (
    arg0=0x1000dbd34 <ZRG3ch3std100_0_0_H3009panickingh0127do_call2gG2cb02rth7closure21lang_start_internal_10gCf>, arg1=0x7fffffffe450 "\260\346\377\377\377\177", arg2=0x7fffffffe348, arg3=0x7fffffffe350)
    at output-1.39.0/libpanic_abort.rlib.c:138
#11 0x0000000100148cb0 in ZRG2ch3std100_0_0_H3009panicking3try2gCfG2cb02rth7closure21lang_start_internal_10g (
    arg0=...) at output-1.39.0/libstd.rlib.c:88298
#12 0x000000010015623c in ZRG2ch3std100_0_0_H3002rt19lang_start_internal0g (arg0=..., arg1=1, 
    arg2=0x7fffffffeb38) at output-1.39.0/libstd.rlib.c:88945
#13 0x00000001000110f4 in ZRG2ch3std100_0_0_H3002rt10lang_start0g (arg0=0x10001e1d8 <ZRG1c04main0g>, arg1=1, 
    arg2=0x7fffffffeb38) at output-1.39.0/build_libc-0_2_62_Hd_run.c:5232
#14 0x000000010001f438 in main (argc=1, argv=0x7fffffffeb38) at output-1.39.0/build_libc-0_2_62_Hd_run.c:10463

The very first mutex-related call has the misaligned address:

Breakpoint 1, ZRIG4ch3std100_0_0_H3003sys4unix5mutex5Mutex0g4lock0g (
    arg0=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at output-1.39.0/libstd.rlib.c:174116
174116  {
(gdb) 
Continuing.

Breakpoint 2, 0x00007ffff7ddcbd8 in pthread_mutex_lock (
    mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at forward.c:117
117     forward.c: No such file or directory.
(gdb) 
Continuing.

Breakpoint 2, 0x00007ffff7b8ae00 in __GI___pthread_mutex_lock (
    mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at ../nptl/pthread_mutex_lock.c:64
64      ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) 
Continuing.

Program received signal SIGBUS, Bus error.

luke-jr changed the title from "make -f minicargo.mk LIBS fails: Bus error" to "Threading primitives are not aligned properly" Nov 4, 2021
@thepowersgang
Owner

Testing locally shows that they are aligned properly (well, an assert shows that they're correct).
This might be a quirk of the PPC target, or just of the compiler version?
Can you confirm that the lock type is annotated with the align attribute?

@luke-jr
Contributor Author

luke-jr commented Nov 6, 2021

printf("%d\n", (int)alignof(pthread_mutex_t));

tells me 8

I would expect if this were a system problem, the issue would affect a lot more than just mrustc...?

@thepowersgang
Owner

thepowersgang commented Nov 6, 2021

I meant the mrustc-emitted type (within libstd.rlib.c). From my local checks, it's correctly aligned to 8 bytes (EDIT: Was checking 1.29, not 1.39 - 1.39 is wrong)

@luke-jr
Contributor Author

luke-jr commented Nov 6, 2021

I don't know what I am looking for then

rdrpenguin04 pushed a commit to LightningCreations/mrustc that referenced this issue Nov 7, 2021
@thepowersgang
Owner

The above commit included an assertion that the alignment is correct (at least, the C matches the value expected by mrustc).
Look for the definition of s_ZRG4ch4libc90_2_62_Hd4unix10linux_like5linux15pthread_mutex_t0g in libstd.rlib.c

Locally (x86-64 linux), this has an alignment of 1, ... which seems wrong - might be a failure in handling repr(align(N))

@luke-jr
Contributor Author

luke-jr commented Nov 7, 2021

// struct ::"libc-0_2_62_Hd"::unix::linux_like::linux::pthread_mutex_t
struct s_ZRG4ch4libc90_2_62_Hd4unix10linux_like5linux15pthread_mutex_t0g  {
        t_ZRTA40Ca _0; // [u8; 40]
} ;

@thepowersgang
Owner

Confirmed - repr(align(N)) isn't handled properly (it's just being ignored in HIR lowering).
I'm working on a fix (slowly), but it's breaking parts of the MSVC build.

thepowersgang added a commit that referenced this issue Nov 10, 2021
… pointer if alignment more than 1 pointer) (ref #194)
@thepowersgang
Owner

Potential fix in the above commit, waiting for it to regression test on linux

@thepowersgang
Owner

Seems good @luke-jr Mind checking with your PPC64 branch?

rdrpenguin04 pushed a commit to LightningCreations/mrustc that referenced this issue Nov 14, 2021
@thepowersgang
Owner

@luke-jr Reminder: Can you confirm that the above fix works for you?

@thepowersgang
Owner

@luke-jr Still looking for confirmation. Alignment should be properly supported now, but I'd like to confirm before closing the issue.

@luke-jr
Contributor Author

luke-jr commented Jan 23, 2022

Neither ffb0961 (w/ patches) nor current master works for me, apparently due to issues unrelated to alignment (but it has been far too long to confirm whether the alignment-related issue is fixed).

@thepowersgang
Owner

16d1d29 was the commit that originally addressed this issue.
However, surprising that current master fails.

@luke-jr
Contributor Author

luke-jr commented Jan 23, 2022

Yes, I was including patching 16d1d29 into ffb0961 of course. :)

master is generating code that looks like (IIRC) (int128_t)-ll, somehow missing the numeric constant

@luke-jr
Contributor Author

luke-jr commented Jan 23, 2022

specifically in output-1.39.0/libcore.rlib.c

@luke-jr
Contributor Author

luke-jr commented Jan 23, 2022

aed9b36 is the first bad commit (for the -ll thing)

@luke-jr
Contributor Author

luke-jr commented Jan 23, 2022

Trying 6f42b74, output/cargo fails with

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
Process was terminated with signal 6
FAILING COMMAND:  /var/tmp/portage/dev-lang/rust-1.40.0_p20220113/work/rustc-1.40.0-src/mrustc/output/cargo-build/build_miniz-sys-0_1_11_run
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59

Looking at the mentioned assert line, it sounds like it may still be an alignment-related issue :/

@thepowersgang
Owner

Can you identify which structure is badly aligned?

@bjorn3
Contributor

bjorn3 commented Jan 27, 2022

The panic happens at https://github.com/rust-lang/hashbrown/blob/e7cd4a57a2690f199a527434f635206363ad661f/src/raw/mod.rs#L1086 which asserts that a "control bytes" pointer is aligned to the "group width". If on x86 this may be __m128i not being aligned correctly: https://github.com/rust-lang/hashbrown/blob/e7cd4a57a2690f199a527434f635206363ad661f/src/raw/sse2.rs#L34 It needs to be aligned to at least 128 bits. If on another target it may be u32 not being aligned to 4 bytes or u64 to 8 bytes.

@thepowersgang
Owner

The sse2 file shouldn't be used (target_feature = "sse2" shouldn't pass), so that can't be it.

@thepowersgang
Owner

@luke-jr Can you confirm that the issue still exists?
If it does, are you able to identify the cause (i.e. which structure is unaligned, or some other error)?

I have a very fuzzy memory of seeing a similar error when a constant expression wasn't properly converted to a static.

@luke-jr
Contributor Author

luke-jr commented Feb 22, 2022

As of f08a7cb trying to build 1.39.0:

...
(17/112) BUILDING cc v1.0.35
> /mnt/hd2019c-nobackup/dev/rust/mrustc/bin/mrustc rustc-1.39.0-src/vendor/cc/src/lib.rs -o output-1.39.0/rustc-build/libcc-1_0_35.rlib --crate-name cc --crate-type rlib -C emit-depfile=output-1.39.0/rustc-build/libcc-1_0_35.rlib.d --crate-tag 1_0_35 -g --cfg debug_assertions -O -L output-1.39.0 -L output-1.39.0/rustc-build
> /mnt/hd2019c-nobackup/dev/rust/mrustc/bin/mrustc rustc-1.39.0-src/src/librustc_llvm/build.rs --crate-name build --crate-type bin -o output-1.39.0/rustc-build/build_rustc_llvm_run -L output-1.39.0/rustc-build -g -L output-1.39.0 --extern build_helper=output-1.39.0/rustc-build/libbuild_helper-0_1_0.rlib --extern cc=output-1.39.0/rustc-build/libcc-1_0_35.rlib --edition 2018
> /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
Process was terminated with signal 6
FAILING COMMAND:  /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run
Calling /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run failed (see /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm.txt_failed.txt for stdout)
BUILD FAILED
make: *** [minicargo.mk:223: output-1.39.0/rustc] Error 1

I don't know how I would diagnose it further?

@thepowersgang
Owner

A backtrace from that panic would help, and maybe some reading of the generated code to see where that ctrl value is coming from (and thus why it's not properly aligned).

@thepowersgang
Owner

Also: what platform are you on? I can't seem to reproduce this failure on mint 20.3 (gcc 9.3.0-17ubuntu1~20.04)

thepowersgang added the stale (No response received after potential fixes.) label Jun 25, 2022
@mrvdb

mrvdb commented Aug 20, 2022

I'm running into this issue as well (powerpc64le using guix to build rust). Is anyone still working on this or is a workaround known? How can I help?

@thepowersgang
Owner

@mrvdb
It should have been fixed with the improvements to alignment handling... but if it's still crashing, you can help by identifying the misaligned type - and the correct alignment for it.
