[RFC] Use version-script instead of symbol visibility to control export-list of shared libraries #6722

Open
jim19930609 opened this issue Nov 24, 2022 · 4 comments
Labels
doc Documentation related issues & PRs

Comments

@jim19930609
Contributor

jim19930609 commented Nov 24, 2022

Subtask of #6793

A common problem with shared libraries is symbol collision caused by unexpected symbol exports.

For example, libtaichi_c_api.so accidentally exports the global variable llvm::AllSubCommands from the statically linked libLLVM.a library, which conflicts with the symbol of the same name exported from libLLVM.so:

image

To avoid symbol collisions, we want all symbols to have the "LOCAL" bind type and to export only those that appear in the C-API header files. There are two distinct techniques to achieve that, namely symbol visibility and version script.

Symbol Visibility

Brief introduction to symbol visibility

Symbol visibility specifies how a symbol is resolved by the dynamic linker when linking against a shared library (https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Code-Gen-Options.html#Code-Gen-Options). To resolve symbol collisions, we should at least set visibility=protected, which prohibits symbol preemption. To also prevent symbols from being used directly in user code, we should set visibility=hidden.
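
A minimal sketch of the three visibility levels mentioned above, assuming GCC or Clang on an ELF platform. The function names are hypothetical and do not come from Taichi:

```cpp
// Sketch only: GCC/Clang visibility attributes on an ELF target.
// All function names here are hypothetical.

// default: exported; references may be preempted by a same-named symbol
// from another loaded module.
__attribute__((visibility("default"))) int api_entry() { return 1; }

// protected: exported, but references from inside the defining library
// always bind to this definition (no preemption).
__attribute__((visibility("protected"))) int api_entry_no_preempt() { return 2; }

// hidden: not exported from the shared library; only code linked into the
// same library can use it.
__attribute__((visibility("hidden"))) int internal_helper() { return 3; }
```

Compiling with -fvisibility=hidden makes hidden the default for a compilation unit, so only the explicitly annotated symbols remain exported.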

Drawbacks of symbol visibility

Symbol visibility is set through a compile-time option rather than a link-time option. The compiler tags the symbols in each compilation unit (.cpp), and the linker then determines their binding when producing the shared library.

readelf -s c_api.cpp.o:

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     ...
     4: 0000000000000010     6 FUNC    GLOBAL HIDDEN     2 _Z26ti_initialize_llvm_ru

readelf -s libc_api.so:

Symbol table '.symtab' contains 48 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   ....
    41: 0000000000001110     6 FUNC    LOCAL  DEFAULT   11 _Z26ti_initialize_llvm_ru

( Notice that a GLOBAL symbol with visibility=hidden is automatically turned into a LOCAL symbol when the shared library is linked. )

Guarding the export-list of a shared library through symbol visibility relies on strict control over each individual symbol, which is why people suggest setting all symbols to hidden by default and explicitly marking the symbols to export as default (https://gcc.gnu.org/wiki/Visibility). However, this practice still has the following problems:

  1. If we have multiple shared libraries to release (libtaichi_c_api.so, libtaichi_python.so, libtaichi_export_core.so), then they all have to compromise on the least strict visibility (default).
  2. Third-party libraries may not have symbol visibility set correctly, and there's nothing you can do about it.
  3. Symbols exported by third-party libraries are always re-exported, recursively.

Version script

In contrast, version-script (https://nxmnpg.lemoda.net/1/ld.lld) is a link-time option and serves as a gatekeeper that guarantees only symbols satisfying certain rules are exported from a shared library, which is a more exhaustive way to keep unexpected symbols from being exported.
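
As a rough sketch of what such a gatekeeper could look like: the file name export.lds, the ti_* pattern, the function names and the build command below are assumptions for illustration, not the script Taichi actually uses. The version script is shown in the leading comment, followed by the library source it would apply to:

```cpp
// Hypothetical version script, saved as export.lds:
//
//   {
//     global:
//       ti_*;
//     local:
//       *;
//   };
//
// Hypothetical link step (GNU ld or lld):
//   g++ -shared -fPIC c_api.cpp -Wl,--version-script=export.lds -o libc_api.so

// Default visibility and external linkage: without the script this symbol
// would be exported; the "local: *" rule is meant to turn it into LOCAL.
int internal_answer() { return 42; }

// extern "C" keeps the name unmangled so that the ti_* glob can match it.
extern "C" int ti_answer() { return internal_answer(); }
```

Because the rule is applied at link time, it also covers symbols pulled in from static archives that were compiled elsewhere with visibility=default.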

@jim19930609 jim19930609 added the doc Documentation related issues & PRs label Nov 24, 2022
@jim19930609 jim19930609 changed the title [RFC] Use version script instead of visibility to control export-list for shared libraries [RFC] Use version-script instead of symbol visibility to control export-list of shared libraries Nov 24, 2022
@k-ye
Member

k-ye commented Nov 24, 2022

Thanks for the write up!

While I acknowledge all the problems listed here, I wonder if that's a symptom of Taichi itself not having a good control over header-inclusion rules. That is, even if some third party libs don't care about symbol visibility at all, if we are careful enough such that these third party libs are never used in public headers, we should still be fine?

Guarding the export-list of a shared library through symbol visibility relies on strict control over each individual symbol, which is why people suggest setting all symbols to hidden by default and explicitly mark symbols to export as default (https://gcc.gnu.org/wiki/Visibility).

+1. This is the most standard way to control visibility, to my knowledge.

(We probably discussed this before) Note that I'm not suggesting we should not consider version-script. I wonder if we can get to a point where public headers are super clean, should we be able to go back to symbol visibility?


I also wonder if C++20 modules would magically make all these problems disappear..

@jim19930609
Contributor Author

jim19930609 commented Nov 24, 2022

While I acknowledge all the problems listed here, I wonder if that's a symptom of Taichi itself not having a good control over header-inclusion rules.
if we are careful enough such that these third party libs are never used in public headers, we should still be fine
I wonder if we can get to a point where public headers are super clean, should we be able to go back to symbol visibility?

IMO, the nature of visibility makes it hard to exhaustively eliminate all unnecessary symbols. The fatal problem is that visibility is a compile-time option, so you have no control over code compiled outside of your repo (third-party libraries). In addition, being a compile-time option means that it operates on each compilation unit (.cpp file), and symbols get leaked at the linking stage, no matter how clean the header files are.

The following example demonstrates how symbols from a third-party library can get leaked into the shared library:

Suppose the C-API releases only one interface named ti_initialize_program(), the implementation of which uses LLVMProgram::program_init() in taichi_core.a, whereas LLVMProgram::program_init() in turn uses ThirdParty::init_third_party() and a global variable ThirdParty p_ in a third-party library third_party.a:

Both C-API and taichi_core.a are compiled with visibility=hidden, whereas third_party.a is compiled with visibility=default.

image

Note that C-API neither links directly against third_party.a nor uses any symbol from it.

Example code

c_api.h

#pragma once
void ti_initialize_program();

c_api.cpp

#include "c_api.h"
#include "taichi_core/core.h"

void ti_initialize_program() {
    LLVMProgram* prog = new LLVMProgram();
    prog->program_init();
}

taichi_core/core.h

#pragma once
struct LLVMProgram {
    int y = 10;
    void program_init();
};

taichi_core/core.cpp

#include "taichi_core/core.h"
#include "third_party/third_party.h"

void LLVMProgram::program_init() {
    p_.init_third_party();
    y = p_.x;
}

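third_party/third_party.h

#pragma once
struct ThirdParty {
    int x = 100;
    void init_third_party();
};

// Global instance used by taichi_core/core.cpp; defined in third_party.cpp.
extern ThirdParty p_;

third_party/third_party.cpp

#include "third_party/third_party.h"

// Definition of the global variable declared in third_party.h.
ThirdParty p_;

void ThirdParty::init_third_party() {
    x = 200;
}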

Results

readelf -s c_api.cpp.o

image

ti_initialize_program() is hidden.

readelf -s taichi_core/core.cpp.o

image

LLVMProgram::program_init() is hidden

readelf -s third_party/third_party.cpp.o

image

ThirdParty::init_third_party() and ThirdParty p_ are default, because they're compiled as part of the third-party library and there's nothing you can do about it.

readelf -s libtaichi_c_api.so

Just because we linked with third_party.a, the two default symbols ThirdParty::init_third_party() and ThirdParty p_ are leaked into libtaichi_c_api.so:

image

ExampleCode.zip

@jim19930609
Contributor Author

jim19930609 commented Nov 24, 2022

So visibility by itself cannot exclude all unexpected symbols, and we have to use either version-script or link options such as exclude-libs to enforce that. Among those link options, version-script has better cross-platform support, whereas exclude-libs and similar options are only supported on certain platforms.
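
Whichever link option is chosen, the resulting export-list can be checked after the build. Below is a minimal sketch of such a check, assuming the text comes from `nm -D --defined-only <library>` (lines of the form "address type name") and that every intended export starts with a given prefix such as "ti_". The function name unexpected_exports is hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the names of defined dynamic symbols that do not start with
// `prefix`. `nm_output` is expected to hold lines of the form
// "<address> <type> <name>", as printed by `nm -D --defined-only`.
std::vector<std::string> unexpected_exports(const std::string& nm_output,
                                            const std::string& prefix) {
    std::vector<std::string> leaked;
    std::istringstream lines(nm_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string address, type, name;
        // Skip lines that do not have all three fields.
        if (!(fields >> address >> type >> name)) continue;
        // Keep every symbol whose name does not begin with the prefix.
        if (name.compare(0, prefix.size(), prefix) != 0) {
            leaked.push_back(name);
        }
    }
    return leaked;
}
```

A build step could feed it the nm output for libtaichi_c_api.so and fail when the returned list is not empty.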

@k-ye
Member

k-ye commented Nov 24, 2022

Thank you so much for the explanation!

jim19930609 added a commit that referenced this issue Dec 1, 2022
Issue: fix #5872, RFC #6722

*Note: This PR will also remove GGUI symbols from libtaichi_c_api.so on
MacOS. Do not have this PR merged until we replaced all the GGUI
renderers in taichi-aot-demos.

**After this PR, libtaichi_c_api.so exports the following symbols:**
Linux:
[c_api_linux_exports.txt](https://github.com/taichi-dev/taichi/files/10102729/c_api_linux_exports.txt)
Windows:
[c_api_windows_exports.txt](https://github.com/taichi-dev/taichi/files/9624666/c_api_windows_exports.txt)
MacOS:
[c_api_mac_exports.txt](https://github.com/taichi-dev/taichi/files/9624830/c_api_mac_exports.txt)
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023