Optimizations

unhappy-ending edited this page Aug 6, 2023 · 25 revisions

CFLAGS

CFLAGS control the behavior of the compiler during compilation of C code. Everything set in CFLAGS will be carried over to CXXFLAGS.
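A minimal sketch of that carry-over, assuming make.conf-style shell syntax. The -O2 -pipe base flags are placeholders for illustration, not recommendations taken from this page.

```shell
# Placeholder base flags (an assumption); substitute the flags chosen from this page.
CFLAGS="-O2 -pipe"
# Reuse CFLAGS so every C flag also applies to C++ code.
CXXFLAGS="${CFLAGS}"
printf 'CXXFLAGS=%s\n' "${CXXFLAGS}"
```

C++-only flags can then be appended after ${CFLAGS} on the CXXFLAGS line.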

-fcf-protection=none

Disable control-flow protection. -fcf-protection instruments code for hardware control-flow enforcement (Intel CET: indirect branch tracking and shadow stacks), which is meant to stop attackers from subverting the program's control flow. The instrumentation causes a slight performance overhead and increases code size.

-fdata-sections

Place each data item in its own section so the linker can mark unreferenced sections as not needed. Unneeded data sections can then be removed with -Wl,--gc-sections, which can reduce code size considerably.

-fdirect-access-external-data

WARNING: Clang specific flag! Don't use it with GCC!

Use direct access relocations instead of the GOT to reference external data symbols. This is similar to how static code behaves. Gentoo compiles shared PIC/PIE code by default, and this flag helps minimize GOT indirection as much as possible.
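One way to honor the warning is to add the flag only when the compiler name looks like Clang. The sketch below assumes a hypothetical helper, cflags_for, and a placeholder -O2 base; neither comes from this page.

```shell
# Hypothetical helper: print the CFLAGS to use for a given compiler name.
cflags_for() {
    flags="-O2"   # placeholder base flags (an assumption)
    case "$1" in
        # Only Clang gets the Clang-specific flag.
        *clang*) flags="$flags -fdirect-access-external-data" ;;
    esac
    printf '%s\n' "$flags"
}

cflags_for clang
cflags_for gcc
```

The same guard works for the other Clang-only flags on this page.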

-ffast-math

Break strict IEEE/ISO floating-point rules in favor of ricing. It allows the compiler to use math shortcuts at the expense of accuracy, so code performs fewer calculations and runs faster. Disable it for packages that require accurate math, such as many of the packages in the sci-libs category.

Yes, -Ofast implies -O3 -ffast-math, but you get more flexibility by using -ffast-math since you can define it with any optimization level, such as -O2.
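Because -ffast-math is a separate flag here, it can be dropped for a single package without touching the optimization level. A minimal sketch, assuming a hypothetical filter_flag helper and placeholder flags:

```shell
# Hypothetical helper: remove one flag ($1) from a space-separated flag string ($2).
filter_flag() {
    unwanted=$1; shift
    out=""
    # Unquoted $* splits the flag string into individual words.
    for flag in $*; do
        [ "$flag" = "$unwanted" ] || out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
}

# Placeholder global flags (an assumption for illustration).
CFLAGS="-O2 -ffast-math -flto"
# Flags for a package that needs accurate math.
CFLAGS=$(filter_flag -ffast-math "$CFLAGS")
printf 'CFLAGS=%s\n' "$CFLAGS"
```

With -Ofast this would not be possible, since the fast-math behavior is bundled into the optimization level itself.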

-fforce-emit-vtables

WARNING: Clang specific flag! Don't use it with GCC!

Emit more virtual tables to improve devirtualization.

-ffp-contract=fast

Form fused floating-point operations, such as fused multiply-add (FMA). Clang's default is on for most code, fast for CUDA code, and fast-honor-pragmas for HIP code. This flag changes Clang's -ffp-contract behavior to fast, which is the default behavior in GCC. The option is also turned on by -ffast-math, but setting it explicitly keeps it enabled for packages where -ffast-math causes failures and has to be removed.

-ffunction-sections

Place each function in its own section so the linker can mark unreferenced sections as not needed. Unneeded function sections can then be removed with -Wl,--gc-sections, which can reduce code size considerably.
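The section flags only pay off when the linker is also told to collect the garbage. A minimal sketch of the pairing, assuming make.conf-style shell syntax and showing only the flags involved:

```shell
# Compiler side: give each function and data item its own section.
CFLAGS="-ffunction-sections -fdata-sections"
# Linker side: discard the sections nothing references.
LDFLAGS="-Wl,--gc-sections"
printf 'CFLAGS=%s\nLDFLAGS=%s\n' "$CFLAGS" "$LDFLAGS"
```

Without the LDFLAGS half, the extra sections are created but never removed.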

-flto

Optimize code at link-time rather than compile time. The benefit is the compiler can see everything at once and then make the best optimizations for the whole program. Theoretically, this builds smaller, more optimized programs but that is not always the case. Some programs end up slower or incorrectly compiled with LTO.

There are two methods for LTO: full and thin. Thin uses less memory and runs the LTO phase in parallel, so compilation is faster. The downside is reduced whole-program visibility, and flags like -fno-semantic-interposition are more prone to failure with -flto=thin. Full mode is slower since it isn't parallel, and it requires more memory, but it has better visibility because the code isn't cut into pieces for parallel threads and merged back together. For that reason it works better with -fno-semantic-interposition, and -fvirtual-function-elimination can only be used with full mode. If you can manage the extra memory and don't mind slightly slower compilation, choose full for the extra visibility.

In order to take full advantage of devirtualization, it's recommended to use link-time optimization.

-fno-common

Variables without initializers won't have common linkage. Common linkage implies a speed and size penalty and is deprecated. -fno-common is already the default since GCC 10 and Clang 11, so it's harmless to use and is defined here just in case.

-fno-plt

Use GOT indirection instead of PLT to make external function calls. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations.

-fno-sanitize=all

Sanitizers cause unwanted memory and CPU overhead. It's not possible to turn all sanitizers on at once, but they can be disabled all at once with this mighty flag.

-fno-semantic-interposition

For shared code, ELF allows the dynamic linker to interpose symbols. This means the compiler cannot perform inter-procedural propagation, inlining, and other optimizations on symbols exported from the DSO. This flag tells the compiler to assume no interposition happens, which returns some of the performance stolen by PIC/PIE.

-fno-stack-protector

Stack smashing protection inserts checks that detect stack buffer overflows at run time. The extra checks cause extra overhead, so off with their heads.

-fvirtual-function-elimination

WARNING: Clang specific flag! Don't use it with GCC!

Remove virtual functions that are never called from vtables (dead virtual function elimination). It can only be used with full LTO because it needs to see every call to llvm.type.checked.load in the linkage unit, which ThinLTO doesn't currently support.

This requires -fwhole-program-vtables to function, which also requires -flto.

-fwhole-program-vtables

WARNING: Clang specific flag! Don't use it with GCC!

Enable whole-program vtable optimizations for classes with hidden LTO visibility.

This flag requires -flto.
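The requirements between the vtable flags and LTO can be checked before a build. The sketch below uses two hypothetical helpers, has_flag and check_vtable_flags, and treats a plain -flto as full LTO, which is an assumption about Clang's default.

```shell
# Hypothetical helper: succeed if flag $1 appears in the flag string $2.
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical check of the requirements stated on this page.
check_vtable_flags() {
    # -fvirtual-function-elimination needs -fwhole-program-vtables and full LTO.
    if has_flag -fvirtual-function-elimination "$1"; then
        has_flag -fwhole-program-vtables "$1" || return 1
        # Assumption: plain -flto means full LTO.
        has_flag -flto "$1" || has_flag -flto=full "$1" || return 1
    fi
    # -fwhole-program-vtables needs some form of LTO.
    if has_flag -fwhole-program-vtables "$1"; then
        has_flag -flto "$1" || has_flag -flto=full "$1" ||
            has_flag -flto=thin "$1" || return 1
    fi
    return 0
}

check_vtable_flags "-O2 -flto=full -fwhole-program-vtables -fvirtual-function-elimination" \
    && echo "flags are consistent"
check_vtable_flags "-O2 -flto=thin -fwhole-program-vtables -fvirtual-function-elimination" \
    || echo "-fvirtual-function-elimination needs full LTO"
```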

CPPFLAGS

-U_FORTIFY_SOURCE

FORTIFY_SOURCE provides lightweight compile-time and run-time protection for some memory and string functions. It's supposed to have little to no run-time overhead and can be enabled for all applications and libraries in an OS. -D_FORTIFY_SOURCE is defined by default. If the extra overhead is undesirable, use this flag to undefine it at the cost of some security.

CXXFLAGS

-fstrict-vtable-pointers

WARNING: Clang specific flag! Don't use it with GCC!

Optimize based on the strict rules for overwriting polymorphic C++ objects.

-fvisibility-inlines-hidden

Give inline functions hidden visibility at compile time. When paired with -flto, hidden inlines become visible during link-time for better optimization.

LDFLAGS

-Wl,-Bsymbolic-functions

Bind references to default-visibility function symbols locally within shared code. If this causes issues, use -Wl,-Bsymbolic-non-weak-functions instead.

-Wl,-O2

WARNING: LLD specific flag! Don't use it with BFD!

Optimize the output file size. In LLD, level 0 disables string merging, level 1 (the default) enables it, and level 2 adds string tail merging. zlib only comes into play when debug sections are compressed: level 1 uses the fastest compression and level 2 uses high compression equal to zlib level 6.

-Wl,-z,now

Changes the default linker behavior from lazy to eager binding. This makes the code resolve all symbols at load.

-Wl,-z,relro

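Force relocation read-only (RELRO): sections that the dynamic linker has to relocate are marked read-only once relocation is done. Define it here in case some builds try to override it.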

-Wl,--as-needed

Only set DT_NEEDED for shared libraries that are actually needed. If a library isn't needed at link-time, the linker skips it, saving code size and unnecessary library loading at run time.

-Wl,--gc-sections

Collect garbage during link-time, removing unused sections that can bloat the code. This helps keep code size smaller and more memory efficient.

-Wl,--icf=all

WARNING: LLD specific flag! Don't use it with BFD!

Fold identical code during link-time. This helps keep code size smaller and more memory efficient. There are three levels: none, safe, and all. If all causes failures, try safe, and then try disabling the flag.
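The fallback order can be sketched as a loop over the three levels. Here try_build is a hypothetical stand-in for a real build that pretends --icf=all fails, so the fallback path is visible; pick_icf is also a hypothetical name.

```shell
# Stand-in for a real build (an assumption): pretend --icf=all breaks the
# package and every other level works. Replace with an actual build step.
try_build() {
    case "$1" in
        *--icf=all*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try the most aggressive level first, then back off to safe, then none.
pick_icf() {
    for level in all safe none; do
        if try_build "-Wl,--icf=$level"; then
            printf '%s\n' "-Wl,--icf=$level"
            return 0
        fi
    done
    return 1
}

LDFLAGS=$(pick_icf)
printf 'LDFLAGS=%s\n' "$LDFLAGS"
```

Ending at none has the same effect as disabling the flag, since no folding is done at that level.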

-Wl,--lto-O3

WARNING: LLD specific flag! Don't use it with BFD!

Sets the LTO optimization pipeline level during link-time. There are 4 levels: 0, 1, 2, and 3. Level 3 is the maximum; you can't rice beyond it. Higher levels add more passes and make some passes more aggressive.