New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ld64 + thin-lto wrongfully merges static data with non-static data #58

Closed
glandium opened this Issue Jul 29, 2018 · 5 comments

Comments

Projects
None yet
2 participants
@glandium
Contributor

glandium commented Jul 29, 2018

Consider the following testcase:

$ cat <<EOF > a.c
static const char foo[] = "foo";

char get_foo(int i) {
    return foo[i];
}
EOF
$ cat <<EOF > b.c
const char foo[] = "bar";

char get_foo2(int i) {
  return foo[i];
}
EOF
$ /builds/worker/workspace/build/src/clang/bin/clang -target x86_64-apple-darwin11 -B /builds/worker/workspace/build/src/cctools/bin -isysroot /builds/worker/workspace/build/src/MacOSX10.11.sdk -flto=thin -o libfoo.dylib -shared -fPIC [ab].c
$ /builds/worker/workspace/build/src/clang/bin/llvm-objdump -s libfoo.dylib

libfoo.dylib:   file format Mach-O 64-bit x86-64

Contents of section __text:
 0f70 554889e5 488d052d 00000089 7dfc4863  UH..H..-....}.Hc
 0f80 4dfc0fbe 04085dc3 90909090 90909090  M.....].........
 0f90 554889e5 488d050d 00000089 7dfc4863  UH..H.......}.Hc
 0fa0 4dfc0fbe 04085dc3                    M.....].
Contents of section __const:
 0fa8 62617200                             bar.
Contents of section __unwind_info:
 0fac 01000000 1c000000 00000000 1c000000  ................
 0fbc 00000000 1c000000 02000000 700f0000  ............p...
 0fcc 34000000 34000000 a90f0000 00000000  4...4...........
 0fdc 34000000 03000000 0c000100 10000100  4...............
 0fec 00000000 00000001                    ........

See how there is no data section containing the "foo" string, because it was "merged" with the "bar" string.

Interestingly, this does not happen with full lto:

$ /builds/worker/workspace/build/src/clang/bin/clang -target x86_64-apple-darwin11 -B /builds/worker/workspace/build/src/cctools/bin -isysroot /builds/worker/workspace/build/src/MacOSX10.11.sdk -flto -o libfoo.dylib -shared -fPIC [ab].c
$ /builds/worker/workspace/build/src/clang/bin/llvm-objdump -s libfoo.dylib

libfoo.dylib:   file format Mach-O 64-bit x86-64

Contents of section __text:
 0f70 554889e5 488d0531 00000089 7dfc4863  UH..H..1....}.Hc
 0f80 4dfc0fbe 04085dc3 0f1f8400 00000000  M.....].........
 0f90 554889e5 488d050d 00000089 7dfc4863  UH..H.......}.Hc
 0fa0 4dfc0fbe 04085dc3                    M.....].
Contents of section __const:
 0fa8 62617200                             bar.
Contents of section __cstring:
 0fac 666f6f00                             foo.
Contents of section __unwind_info:
 0fb0 01000000 1c000000 00000000 1c000000  ................
 0fc0 00000000 1c000000 02000000 700f0000  ............p...
 0fd0 34000000 34000000 a90f0000 00000000  4...4...........
 0fe0 34000000 03000000 0c000100 10000100  4...............
 0ff0 00000000 00000001                    ........
@glandium

This comment has been minimized.

Show comment
Hide comment
@glandium

glandium Jul 29, 2018

Contributor

Enabling the logging code in lto_file.cpp says:

bitcode file: /tmp/a-88db1a.o
	0x000019A0 _get_foo
bitcode file: /tmp/b-cbe283.o
	0x000019A0 _get_foo2
	0x00001980 _foo
AtomSyncer, mach-o atom 0x55643d5d3900 synced to lto atom 0x7f6bbc01f190 (name=_get_foo)
  adjusting fixups in atom: _get_foo
    updating direct reference to 0x55643d5d3950 to be ref to 0x7f6bb400e9f8: _foo
AtomSyncer, mach-o atom 0x55643d5d3950 synced to lto atom 0x7f6bb400e9f8 (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x55643d5d39a0 is totally new (name=CIE)
  adjusting fixups in atom: CIE
AtomSyncer, mach-o atom 0x55643d5c4800 synced to lto atom 0x7f6bb400e9c0 (name=_get_foo2)
  adjusting fixups in atom: _get_foo2
    adding by-name symbol _foo
AtomSyncer, mach-o atom 0x55643d5c4850 synced to lto atom 0x7f6bb400e9f8 (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x55643d5c48a0 is totally new (name=CIE)
  adjusting fixups in atom: CIE

With full lto:

bitcode file: /tmp/a-9cfb57.o
	0x000019A0 _get_foo
bitcode file: /tmp/b-aeee6f.o
	0x000019A0 _get_foo2
	0x00001980 _foo
AtomSyncer, mach-o atom 0x5648887a01c0 synced to lto atom 0x7fef1401f190 (name=_get_foo)
  adjusting fixups in atom: _get_foo
AtomSyncer, mach-o atom 0x5648887a0210 synced to lto atom 0x7fef0c00e9c0 (name=_get_foo2)
  adjusting fixups in atom: _get_foo2
    adding by-name symbol _foo
AtomSyncer, mach-o atom 0x5648887a0260 is totally new (name=_foo.2)
  adjusting fixups in atom: _foo.2
AtomSyncer, mach-o atom 0x5648887a02b0 synced to lto atom 0x7fef0c00e9f8 (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x5648887a0300 is totally new (name=CIE)
  adjusting fixups in atom: CIE
Contributor

glandium commented Jul 29, 2018

Enabling the logging code in lto_file.cpp says:

bitcode file: /tmp/a-88db1a.o
	0x000019A0 _get_foo
bitcode file: /tmp/b-cbe283.o
	0x000019A0 _get_foo2
	0x00001980 _foo
AtomSyncer, mach-o atom 0x55643d5d3900 synced to lto atom 0x7f6bbc01f190 (name=_get_foo)
  adjusting fixups in atom: _get_foo
    updating direct reference to 0x55643d5d3950 to be ref to 0x7f6bb400e9f8: _foo
AtomSyncer, mach-o atom 0x55643d5d3950 synced to lto atom 0x7f6bb400e9f8 (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x55643d5d39a0 is totally new (name=CIE)
  adjusting fixups in atom: CIE
AtomSyncer, mach-o atom 0x55643d5c4800 synced to lto atom 0x7f6bb400e9c0 (name=_get_foo2)
  adjusting fixups in atom: _get_foo2
    adding by-name symbol _foo
AtomSyncer, mach-o atom 0x55643d5c4850 synced to lto atom 0x7f6bb400e9f8 (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x55643d5c48a0 is totally new (name=CIE)
  adjusting fixups in atom: CIE

With full lto:

bitcode file: /tmp/a-9cfb57.o
	0x000019A0 _get_foo
bitcode file: /tmp/b-aeee6f.o
	0x000019A0 _get_foo2
	0x00001980 _foo
AtomSyncer, mach-o atom 0x5648887a01c0 synced to lto atom 0x7fef1401f190 (name=_get_foo)
  adjusting fixups in atom: _get_foo
AtomSyncer, mach-o atom 0x5648887a0210 synced to lto atom 0x7fef0c00e9c0 (name=_get_foo2)
  adjusting fixups in atom: _get_foo2
    adding by-name symbol _foo
AtomSyncer, mach-o atom 0x5648887a0260 is totally new (name=_foo.2)
  adjusting fixups in atom: _foo.2
AtomSyncer, mach-o atom 0x5648887a02b0 synced to lto atom 0x7fef0c00e9f8 (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x5648887a0300 is totally new (name=CIE)
  adjusting fixups in atom: CIE
@glandium

This comment has been minimized.

Show comment
Hide comment
@glandium

glandium Jul 29, 2018

Contributor

And making foo static in b.c:

bitcode file: /tmp/a-70bbe5.o
	0x000019A0 _get_foo
bitcode file: /tmp/b-4778ce.o
	0x000019A0 _get_foo2
AtomSyncer, mach-o atom 0x557507baceb0 synced to lto atom 0x7f3ca401f190 (name=_get_foo)
  adjusting fixups in atom: _get_foo
AtomSyncer, mach-o atom 0x557507bacf00 is totally new (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x557507bacf50 is totally new (name=CIE)
  adjusting fixups in atom: CIE
AtomSyncer, mach-o atom 0x557507b9f580 synced to lto atom 0x7f3c9c00e9c0 (name=_get_foo2)
  adjusting fixups in atom: _get_foo2
AtomSyncer, mach-o atom 0x557507b9f5d0 is totally new (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x557507b9f620 is totally new (name=CIE)
  adjusting fixups in atom: CIE
Contributor

glandium commented Jul 29, 2018

And making foo static in b.c:

bitcode file: /tmp/a-70bbe5.o
	0x000019A0 _get_foo
bitcode file: /tmp/b-4778ce.o
	0x000019A0 _get_foo2
AtomSyncer, mach-o atom 0x557507baceb0 synced to lto atom 0x7f3ca401f190 (name=_get_foo)
  adjusting fixups in atom: _get_foo
AtomSyncer, mach-o atom 0x557507bacf00 is totally new (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x557507bacf50 is totally new (name=CIE)
  adjusting fixups in atom: CIE
AtomSyncer, mach-o atom 0x557507b9f580 synced to lto atom 0x7f3c9c00e9c0 (name=_get_foo2)
  adjusting fixups in atom: _get_foo2
AtomSyncer, mach-o atom 0x557507b9f5d0 is totally new (name=_foo)
  adjusting fixups in atom: _foo
AtomSyncer, mach-o atom 0x557507b9f620 is totally new (name=CIE)
  adjusting fixups in atom: CIE
@glandium

This comment has been minimized.

Show comment
Hide comment
@glandium

glandium Jul 29, 2018

Contributor

FYI, I'm testing a possible fix.

Contributor

glandium commented Jul 29, 2018

FYI, I'm testing a possible fix.

@glandium

This comment has been minimized.

Show comment
Hide comment
@glandium

glandium Jul 31, 2018

Contributor

FYI, I'm testing a possible fix.

This didn't work out. Some interesting fact: it's reproducible with Xcode 8, but not Xcode 9 (ld64-351.8, unfortunately not on https://opensource.apple.com/source/ld64/). I tried contacting ld64's listed author.

Contributor

glandium commented Jul 31, 2018

FYI, I'm testing a possible fix.

This didn't work out. Some interesting fact: it's reproducible with Xcode 8, but not Xcode 9 (ld64-351.8, unfortunately not on https://opensource.apple.com/source/ld64/). I tried contacting ld64's listed author.

@tpoechtrager

This comment has been minimized.

Show comment
Hide comment
@tpoechtrager

tpoechtrager Jul 31, 2018

Owner

I tried contacting ld64's listed author.

Thank you. I can't do anything in this case. :(

I am not sure if Apple will continue releasing the source code of LD64. It's been a long time since the last release (> 1.5 years, I think).

Owner

tpoechtrager commented Jul 31, 2018

I tried contacting ld64's listed author.

Thank you. I can't do anything in this case. :(

I am not sure if Apple will continue releasing the source code of LD64. It's been a long time since the last release (> 1.5 years, I think).

glandium added a commit to glandium/cctools-port that referenced this issue Aug 22, 2018

During ThinLTO, avoid merging static data with non-static data with t…
…he same name

When doing ThinLTO, the linker updates LTO proxy atoms to point to the
right linker atoms. When there are atoms with different scope with the
same name (in the case where e.g. one translation unit has a static
instance of data and another has a global instance of data with the same
name), we may end up making all translation units using a symbol with
that name link against the one global symbol with that name.
Presumably, the same problem can happen with static/global functions
with the same name too.

When the data/code under those names happen to be identical, that's
fine, but when they're not, bad things happen.

So when dealing with LTO proxy atoms for the translation unit scope,
don't look up in the list of atoms from the linker (for global symbols).

Fixes #58

@tpoechtrager tpoechtrager closed this in #59 Aug 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment