Selectively enable opt-level 1 #8141
Conversation
This PR compiles all non-workspace dependencies, as well as `turbo-tasks-memory` (which is particularly sensitive) with basic optimizations. Most crates in the workspace still use opt-level 0 locally. While not as good as applying opt-level 1 everywhere, this significantly reduces execution times versus opt-level 0, while making cold builds about 50-60% slower. Warm build times are largely unaffected. The debugging (gdb/lldb) experience may also be slightly worsened by the optimizations.
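The change amounts to a small set of profile overrides in the workspace `Cargo.toml`. A simplified sketch is below; the exact entries (in particular the `build-override` opt-level, which is assumed here) may differ from the PR:

```toml
# Optimize all non-workspace dependencies; mostly affects cold builds.
[profile.dev.package."*"]
opt-level = 1

# turbo-tasks-memory is particularly sensitive to optimization, so opt it in
# explicitly even though it is a workspace crate.
[profile.dev.package.turbo-tasks-memory]
opt-level = 1

# Build scripts and proc-macros (opt-level 3 here is an assumption, consistent
# with the later discussion about `syn` losing its opt-level 3 setting).
[profile.dev.build-override]
opt-level = 3
```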
```diff
@@ -77,8 +77,13 @@ turbopack-wasi = [
[workspace.lints.clippy]
too_many_arguments = "allow"

[profile.dev.package.turbo-tasks-macros]
```
This config was redundant with the `[profile.dev.build-override]` below. I measured no meaningful change to build times with/without it.
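For context, a `build-override` section applies to everything compiled for the build process itself, which includes all proc-macro crates such as `turbo-tasks-macros`; that is why a separate per-package entry adds nothing. A sketch of the idea (the opt-level value here is an assumption, not quoted from the PR):

```toml
# Applies to build scripts and proc-macro crates (and their dependencies),
# so it already covers turbo-tasks-macros without naming it.
[profile.dev.build-override]
opt-level = 3
```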
```toml
# Set the options for dependencies (not crates in the workspace), this mostly impacts cold builds
[profile.dev.package."*"]
opt-level = 1

# Set the settings for build scripts and proc-macros.
[profile.dev.build-override]
```
Follow-up: One problem is that, due to precedence ordering issues in cargo, dependencies like `syn` will now be compiled with `opt-level = 1` instead of `opt-level = 3`. I need to perform further benchmarking to determine whether this is significant and worth working around.
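A sketch of the precedence issue being described (this is my reading of cargo's override rules and is worth verifying against the Cargo book): when a crate matches both a package override and `build-override`, the package override wins, so a shared dependency like `syn` ends up at the wildcard's opt-level:

```toml
# `syn` is a dependency of proc-macro crates, so build-override would normally
# cover it at opt-level 3. But it also matches the wildcard package override
# below, which takes precedence, so it is built with opt-level 1 instead.
[profile.dev.package."*"]
opt-level = 1

[profile.dev.build-override]
opt-level = 3
```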
I guess this will mostly affect macro eval? turbo-tasks has a few thousand macro invocations alone, but I would hope that the aggregate wins from not optimizing deps would be greater than the perf loss.
Also, TY for adding comments. Our build configs are under-documented.
I am on board with this pending verification of the `syn` costs. I don't expect it to be an issue, but verifying can't hurt.
I modified the additions to `Cargo.toml`:

```toml
[profile.dev.package.syn]
opt-level = 3
[profile.dev.package.quote]
opt-level = 3
[profile.dev.package.proc-macro2]
opt-level = 3
[profile.dev.package.serde_derive]
opt-level = 3
[profile.dev.package.serde_derive_internals]
opt-level = 3
[profile.dev.package.wasm-bindgen-backend]
opt-level = 3
[profile.dev.package.bindgen]
opt-level = 3
[profile.dev.package.tokio-macros]
opt-level = 3
[profile.dev.package.rkyv_derive]
opt-level = 3
```

I repeated the "Warm time to build single binary" benchmark, which uses many of these proc macros, but doesn't include the time spent optimizing them.
Sounds good to me. Our sccache setup should hopefully mitigate cold builds being slower
**What about `cargo check`/`cargo clippy`/rust-analyzer?** No expected change, as (outside of proc macros) these don't perform LLVM code generation.

**Why selectively, and not everywhere?** While applying opt-level 1 everywhere can give us about 3x faster execution, the selective approach still gives us most of the runtime performance benefits, while avoiding most of the compilation cost (especially for warm builds). I believe we should still optimize more for build times than execution times. I benchmarked applying opt-level 1 to all crates here: https://docs.google.com/document/d/1iaREbzYpDmBt54fT2egzptTfx0OYsTIJ633gRqddzDY/edit?usp=sharing
**Why not just a few hot dependencies?** I tried profiling the debug build and only optimizing the hot crates, but I wasn't able to get meaningful improvements in my testing.
Benchmarking Notes
These benchmarks were run using `mold` as the linker, as GNU `ld` is incredibly slow (and often causes OOMs with 16GB of RAM). We're already using mold in the private nextpack meta-repository. I'll follow up with another PR to use mold or lld by default.

Build Time Benchmarks
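The PR doesn't show the linker configuration itself; for reference, one common way to switch dev builds to `mold` is via `.cargo/config.toml`. A sketch, where the target triple and the use of `clang` as the linker driver are assumptions for illustration:

```toml
# Hypothetical .cargo/config.toml sketch: link with mold instead of GNU ld.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```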
There's a significant regression to cold builds, but there's no meaningful regression for warm builds.
Cold time to build tests (2 runs):
Before:
After:
Warm time to build tests (2 runs):
Modify a string in an error message inside of `crates/turbopack-ecmascript/src/minify.rs`. This guarantees forced recompilation of all dependent crates without meaningfully changing any behavior. Then run:

Before:
After:
Warm time to build single binary (2 runs):
This is less dependent on linking than the tests, which generate many binary targets.
Modify a string in an error message inside of `crates/turbopack-ecmascript/src/minify.rs`. This guarantees forced recompilation of all dependent crates without meaningfully changing any behavior. Then run:

Before:
After:
Cold time to build a single turborepo binary:
Before:
After:
Execution Time Benchmarks
turbopack-cli's `bench_startup`
Before:
After:
Test Execution (excluding build, 2 runs)
With a completely warm build cache (such that nothing needs to build), run:
Before:
After: