
zamazan4ik/awesome-pgo

awesome-pgo

Various materials about Profile-Guided Optimization (PGO) and related techniques such as AutoFDO, BOLT, etc.

!!!ARTICLE!!!

There is an (unfinished) article that covers PGO, PLO, etc. in detail - link. It will most likely answer (almost) all your questions about PGO and PLO.

Theory (a little bit)

You may also come across PDO (Profile Directed Optimization), FDO (Feedback Driven Optimization), FBO (Feedback Based Optimization), PDF (Profile Directed Feedback), PBO (Profile Based Optimization) - do not worry, these are all just different names for PGO.

Link-Time Optimization (LTO) also deserves a mention: PGO is usually applied on top of LTO, since LTO is typically easier to enable and already brings significant performance and/or binary size improvements. PGO does not replace LTO but complements it. More information about LTO can be found in lto.md.

PGO Showcases

Here I collect links to the articles/benchmarks/etc. with PGO on multiple projects (with numbers!).

Browsers

Compilers and interpreters

Developer tooling

Operating systems

Virtual machines

Databases

Logging

Proxy

Other

Projects with PGO already integrated into their build scripts

Below you can find some examples of where and how PGO is integrated into different projects.

Project-specific documentation about PGO

Here we collect projects where PGO is described as an optimization option in the documentation:

PGO support in programming languages and compilers

Possibly other compilers support PGO too. If you know any, please let me know.

PGO support in build systems

Here we collect and track PGO integrations into build systems:

Sampling PGO (AutoFDO) support

Here we collect information about sampling-based PGO support across different compilers.

  • C and C++:
    • GCC: supports
    • Clang: supports
  • Rust:

Are we PGO yet?

See the "are_we_pgo_yet.md" file in the repo for the PGO status of a particular project.

BOLT showcases

Here I collect results of applying LLVM BOLT to various projects (with numbers).

Projects with BOLT already integrated into their build scripts

Are we BOLT yet?

Just a list of BOLT-related issues in different projects. So you can estimate the BOLT state in your favorite open-source product.

LTO, PGO, BOLT, etc. and binaries provided by someone else

It is hard to tell whether your binary is already LTO/PGO-optimized. It depends on multiple factors like upstream support for LTO/PGO, the maintainers' willingness to enable these optimizations, etc. Usually, the most obvious way to check is to ask the binary author (the person who built the binary): "Is the binary LTO/PGO optimized?" That could be your colleague (if you build programs on your own), the owners of your CI build scripts, the maintainers of your favorite OS/repository (if you use binaries provided by repositories), or the software developers (if you use "official" binaries downloaded from a site). Do not hesitate to ask!

PGO adoption across projects

PGO is usually not enabled by upstream developers because they lack a sample workload or the resources for a multi-stage build. So please ask maintainers explicitly to add PGO support.

PGO adoption across Linux distros

Even if PGO is supported by a project, it does not mean that your favorite Linux distro builds this project with PGO enabled. There are many reasons for this: maintainer burden (because we are humans (yet)), build machine burden (in general you need to compile twice), reproducibility issues (a profile is an additional input to the build process, and you need to make it reproducible), a maintainer simply not knowing about PGO, etc.

So here I will try to collect information about the PGO status across the Linux distros for the projects that support PGO upstream. If you cannot find your distro - don't worry! Just check it somehow (probably in some chats, the distro's build system, etc.) and report it here (e.g. via Issues) - I will add it to the list.

  • GCC:
    • Note: PGO for GCC usually is not enabled for all architectures since it requires too much from the build systems
    • Debian: yes
    • Ubuntu: same as Debian
    • RedHat: Yes. And that is the reason why PGO is enabled for GCC in all RedHat-based distros.
    • Fedora: yes
    • Rocky Linux: yes
    • Alma Linux: yes
    • NixOS: no
    • OpenSUSE: yes, see line 2414
  • Clang:
    • Binaries from LLVM are already PGO-optimized (according to the note about using the "stage2" build, which is the PGO-optimized build)
    • RedHat (CentOS Stream): no
    • Fedora: no
    • AlmaLinux: no
    • Rocky Linux: no
    • NixOS: no
    • Arch Linux: sent an email to the Clang maintainer in Arch Linux - no response yet
  • Rustc:
  • CPython:
    • Fedora: yes. Also, check this discussion. I guess other RedHat-based distros build this package the same way (I have not checked all of them, but Rocky Linux does).

BOLT adoption across Linux distros

Here we track LLVM BOLT enablement across various projects in various OS-specific build scripts:

  • Clang:
  • GCC: TODO
  • Rustc:
    • Fedora: no
    • RedHat: no
  • CPython: TODO
  • Pyston: TODO

Meta-issues about PGO and LLVM BOLT usage in different OSs and package managers:

Other optimization techniques like BOLT

BOLT and others certainly are not enabled by default anywhere right now. So if you see a performance improvement from it - contact the upstream.

Beyond PGO (could be covered here later as well)

Traps

The biggest problem is "How to collect a good profile?". There are multiple ways to do this:

  • Prepare a reference workload. It could be quite difficult to create and maintain, since over time it can drift further and further away from your actual workload. However, for some programs like compilers the workload is usually predictable (compiling programs), so this approach is good enough in that case. For other cases like databases, the workload could depend heavily on the actual input from your users, and users can change their queries at any time. So be careful.
  • Collect a profile from your actual production. It could be difficult to do with the usual PGO since it requires instrumentation, and instrumented binaries could be too slow. If that is your case, you could try AutoFDO since it has a low overhead thanks to the underlying sampling done by perf. But it also has its own limitations (usually Linux-only, less efficient than the usual PGO, could be more buggy). E.g. Google uses AutoFDO for profiling all their services and has a lot of automation around sampling profiles at their scale, storing them, integrating them into CI pipelines, etc. But all this tooling is closed-source, so you need to implement it from scratch.
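
The sampling-based route could look roughly like the sketch below. It assumes Clang, Linux perf on hardware with branch sampling, and the create_llvm_prof tool from the AutoFDO project. The function name autofdo_commands and all file names are hypothetical, and the exact perf events and flags may differ on your setup. The code only builds the command lines and does not execute anything.

```python
from __future__ import annotations


def autofdo_commands(source: str, binary: str) -> list[list[str]]:
    """Return command lines for a sampling-based (AutoFDO-style) Clang build."""
    perf_data = f"{binary}.perf.data"  # hypothetical file name
    profile = f"{binary}.afdo"         # hypothetical file name
    return [
        # 1. Regular optimized build with line tables, so samples map to source.
        ["clang", "-O2", "-gline-tables-only", source, "-o", binary],
        # 2. Sample taken branches while the binary handles a real workload.
        ["perf", "record", "-b", "-o", perf_data, "--", f"./{binary}"],
        # 3. Convert the perf samples into a profile that Clang can read.
        ["create_llvm_prof", f"--binary=./{binary}",
         f"--profile={perf_data}", f"--out={profile}"],
        # 4. Rebuild with the sampled profile.
        ["clang", "-O2", "-gline-tables-only",
         f"-fprofile-sample-use={profile}", source, "-o", binary],
    ]


if __name__ == "__main__":
    for command in autofdo_commands("app.c", "app"):
        print(" ".join(command))
```

Note that no instrumented binary is involved: the binary sampled in step 2 is the same optimized build you would ship anyway.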

In my opinion, you should usually start with simple PGO via Instrumentation mode, especially if you rarely upgrade your binaries. And only if Instrumentation starts to hurt you - start thinking about AutoFDO.
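
For reference, a two-stage instrumentation build could be sketched as follows. This is a minimal example that assumes Clang and llvm-profdata are installed. The function name instrumentation_pgo_commands, the ".instrumented" suffix, and the file names are hypothetical. As written, the code only assembles the commands.

```python
from __future__ import annotations


def instrumentation_pgo_commands(
    source: str, binary: str, profile_dir: str, workload_args: list[str]
) -> list[list[str]]:
    """Return command lines for a two-stage instrumentation PGO build with Clang."""
    instrumented = f"{binary}.instrumented"  # hypothetical naming scheme
    merged = f"{binary}.profdata"            # kept outside profile_dir on purpose
    return [
        # Stage 1: build a binary that writes raw profiles into profile_dir.
        ["clang", "-O2", f"-fprofile-generate={profile_dir}", source, "-o", instrumented],
        # Run the instrumented binary on the sample workload.
        [f"./{instrumented}", *workload_args],
        # Merge the raw profiles into a single file usable by the compiler.
        ["llvm-profdata", "merge", f"-output={merged}", profile_dir],
        # Stage 2: rebuild with the merged profile.
        ["clang", "-O2", f"-fprofile-use={merged}", source, "-o", binary],
    ]


if __name__ == "__main__":
    for command in instrumentation_pgo_commands("app.c", "app", "profiles", ["--bench"]):
        print(" ".join(command))
```

Each command could be executed in order with subprocess.run(command, check=True), so that a failure at any stage stops the pipeline before a stale profile is used.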

Another issue could be reproducibility. Since you are injecting some information from runtime (execution counters based on your sample workload), you get more variables that could influence your binary. In this case, you need to store your sample workload somewhere in VCS, probably along with the profiles collected from this workload, etc.
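
One possible way to make these extra build inputs traceable is to commit a small manifest with their checksums next to the workload and profile. The sketch below is only an illustration: the helper names sha256_of and profile_manifest, and the manifest layout, are hypothetical and not part of any PGO tooling.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large profiles do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_manifest(inputs: dict[str, Path]) -> str:
    """Return a deterministic JSON manifest mapping input names to SHA-256 digests."""
    entries = {name: sha256_of(path) for name, path in inputs.items()}
    # sort_keys keeps the output stable regardless of dictionary order.
    return json.dumps(entries, indent=2, sort_keys=True) + "\n"
```

A build script could then compare the manifest stored in VCS with a freshly computed one and fail when the workload or the profile has changed unexpectedly. The checksums only detect changes; they do not make profile collection itself deterministic.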

Other pitfalls include the following things:

  • PGO
    • Requires multiple builds (at least two stages, in Context-Sensitive LLVM PGO (CSPGO) - three stages)
    • Instrumented binaries are slow, so they can rarely be used in production -> you need to prepare a "sample" workload
    • For services sometimes PGO reports are not flushed to the disk properly, so you need to do it manually like here
    • Reproducibility issues - could be important for some use cases even more than performance
    • Bugs. E.g. LLVM issues when PGO is combined with LTO - GitHub issue
  • AutoFDO
    • Huge memory consumption during profile conversion: GitHub issue
    • Supports only perf, so it cannot be used with profilers from other platforms like Windows/macOS (support for other profilers could be implemented manually)
    • "Support" from Google is at least questionable: no regular releases, compilation issues
  • Bolt
    • Huge memory usage during build: GitHub issue
    • For better results, you need hardware/software with LBR/BRS support
    • There are a lot of bugs - be careful
  • Propeller:
    • Too Google-oriented - could be hard to use outside of Google
    • Relies on the latest compiler developments and is also unstable

Useful links

Communities

Here is an incomplete list of communities where you are more likely to find PGO-related advice:

  • Gentoo (chats, forums)
  • ClearLinux (chats, forums)

Related projects

Where PGO did not help (according to my tests)

  • Catboost - I think this is due to the highly math-oriented nature of the project. I tested the fit and calc modes (training and evaluation, respectively) on the epsilon dataset. In the calc mode, PGO for some reason made things even worse. Maybe PGO could help in other modes, but I didn't test it (yet).

Contribute

If you have an example where PGO shines (or where it doesn't) - please open an issue and/or PR to the repo. It's important to collect as many PGO showcases as possible!
