Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT #51

Open
zamazan4ik opened this issue Oct 19, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@zamazan4ik

Hi!

Recently I evaluated Profile-Guided Optimization (PGO) on multiple projects. The results are here. For example, PGO helps optimize Envoyproxy, HAProxy (link), httpd (link), and Nginx (link, although more mature tests should be performed for it). According to these tests, PGO can improve performance in many other cases as well. That's why I think trying to optimize the CPU-intensive parts of Angie with PGO could be a good idea.

I can suggest the following action points:

  • Perform PGO benchmarks on Angie. If they show improvements, add a note about the possible performance gains from building Angie with PGO.
  • Provide an easier way (e.g. a build option) to build with PGO in the build scripts. This can help end users and maintainers, since they will be able to optimize Angie for their own workloads.
  • Optimize the pre-built binaries with PGO.
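
As a rough illustration of what such a build option could wrap, here is a minimal sketch of the usual two-phase PGO build with GCC. It assumes an nginx-style `./configure` script that accepts `--with-cc-opt` and `--with-ld-opt`. The function name, phase names, and profile directory are hypothetical. Only `-fprofile-generate` and `-fprofile-use` are real GCC flags.

```python
import shlex


def pgo_configure_args(phase, profile_dir):
    """Return a configure command line for one phase of a GCC PGO build.

    phase: "instrument" (build a binary that records profiles) or
           "optimize"   (rebuild using the recorded profiles).
    profile_dir: hypothetical directory where GCC stores the profile data.
    """
    if phase == "instrument":
        flag = "-fprofile-generate=" + profile_dir
    elif phase == "optimize":
        flag = "-fprofile-use=" + profile_dir
    else:
        raise ValueError("unknown PGO phase: " + repr(phase))
    # Assumption: the configure script forwards these options to the
    # compiler and linker, as nginx-style configure scripts do.
    return ["./configure", "--with-cc-opt=" + flag, "--with-ld-opt=" + flag]


if __name__ == "__main__":
    for phase in ("instrument", "optimize"):
        args = pgo_configure_args(phase, "/tmp/angie-pgo")
        print(" ".join(shlex.quote(a) for a in args))
```

Between the two phases, the instrumented binary has to be run under a workload that resembles production traffic, otherwise the recorded profile will not reflect real usage.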

Testing Post-Link Optimization techniques (like LLVM BOLT) might be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.

Here are some examples of how PGO optimization is integrated in other projects:

VBart added the enhancement label on Oct 20, 2023
@VBart
Contributor

VBart commented Oct 20, 2023

Hi,

First of all, thanks for sharing these results.

Angie shares the same architecture and workload cases with nginx. I have previously run various PGO benchmarks with nginx and found that the possible performance benefits just weren't worth the effort. In real-world scenarios with real configurations, PGO made less than a 1% difference at peak load (most of the time the performance gain wasn't even statistically distinguishable from noise). The same is true for attempts to tune compilation flags (like using -march=native and -O3 or even -Ofast), so we use the system default flags for our builds.

The reason you see some performance gain is the testing methodology: you use a very synthetic micro-benchmark and profile the build for exactly that scenario. This is very far from the real use cases of most of our users.

Angie isn't a good candidate for this kind of optimization because it's not CPU-bound. In real-world scenarios, worker processes spend most of their time waiting on syscalls in the kernel. The only CPU-intensive tasks (like compression, image processing, and cryptography) are handled by external libraries. The code you are trying to optimize accounts for just a few percent of typical request processing time. So making a few percent of the processing a few percent faster yields an overall benefit of less than one percent. You can gain more by tuning some configuration directives.
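
The arithmetic behind that estimate is Amdahl's law. Below is a minimal sketch with hypothetical numbers: the comment only says "a few percent", so the 5% and 10% figures and the function names are assumptions chosen for illustration, not measurements.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of the total time
    becomes `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def time_saved_percent(fraction, local_speedup):
    """Share of total request processing time saved, in percent."""
    return (1.0 - 1.0 / overall_speedup(fraction, local_speedup)) * 100.0


if __name__ == "__main__":
    # Hypothetical case: 5% of the time is spent in the optimized code,
    # and PGO makes that code 5% faster (speedup factor 1.05).
    print(round(time_saved_percent(0.05, 1.05), 3))  # about 0.238 (% saved)
    # A more generous hypothetical case: 10% of the time, 10% faster.
    print(round(time_saved_percent(0.10, 1.10), 3))  # about 0.909 (% saved)
```

Under both assumed inputs, the total saving stays below one percent, which matches the estimate above.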

By contrast, some architectural changes or syscall optimizations can improve performance severalfold. Here's an example of an optimization I did in the past: https://www.nginx.com/blog/thread-pools-boost-performance-9x/

Sure, tuning the compilation process looks like relatively "low-hanging fruit" here, but in practice it doesn't deliver real gains.

@Seirdy

Seirdy commented Nov 9, 2023

A build manifest of some sort (CI manifest, Docker container, etc.) that builds Nginx and all its dependencies from source (perhaps with static linking) would be a useful place to implement PGO.

IME with BoringSSL at least, cryptography doesn't actually benefit tremendously from PGO. Perhaps that's because the cryptographic primitives are mostly written in generated ASM these days instead of optimizable C?

I'd be interested in if/how PCRE2, libcrypt, and zlib (zlib-ng?) benefit though!
