
Selectively enable opt-level 1 #8141

Merged · 1 commit · May 15, 2024
Conversation

@bgw (Member) commented May 14, 2024:

This PR compiles all non-workspace dependencies, as well as turbo-tasks-memory (which is particularly sensitive) with basic optimizations. Most crates in the workspace still use opt-level 0 locally.

While not as good as applying opt-level 1 everywhere, this significantly reduces execution times relative to opt-level 0, at the cost of making cold builds about 50-60% slower. Warm build times are largely unaffected.

The debugging (gdb/lldb) experience may also be slightly worsened by the optimizations.
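Concretely, this kind of selective configuration is expressed with Cargo profile overrides in the workspace Cargo.toml. A minimal sketch (the actual diff is shown further down in the review thread):

```toml
# Workspace Cargo.toml (sketch): workspace crates keep the dev-profile
# default (opt-level 0); everything else gets basic optimizations.

# Optimize all non-workspace dependencies.
[profile.dev.package."*"]
opt-level = 1

# Also optimize this especially performance-sensitive workspace crate.
[profile.dev.package.turbo-tasks-memory]
opt-level = 1
```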

What about cargo check/cargo clippy/rust-analyzer? No expected change, as (outside of proc macros) these don't perform LLVM code generation.

Why selectively, and not everywhere? While applying this everywhere can give us about 3x faster execution, the selective approach still captures most of the runtime performance benefits while avoiding most of the compilation cost (especially for warm builds). I believe we should still optimize more for build times than execution times. I benchmarked applying opt-level 1 to all crates here: https://docs.google.com/document/d/1iaREbzYpDmBt54fT2egzptTfx0OYsTIJ633gRqddzDY/edit?usp=sharing

Why not just a few hot dependencies? I tried profiling the debug build and only optimizing the hot crates, but I wasn't able to get meaningful improvements in my testing.

Benchmarking Notes

  • System configuration is here: https://github.com/bgw/benchmark-scripts . This is a downclocked machine with most CPU cores disabled to get low-noise measurements. Treat these results as relative to each other, not as absolute values.
  • Build benchmarks are run with mold, as GNU ld is incredibly slow (and often causes OOMs with 16GB of RAM). We're already using mold in the private nextpack meta-repository. I'll follow up with another PR to use mold or lld by default.
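
For reference, the same linker selection can be made persistent in `.cargo/config.toml` instead of passing RUSTFLAGS on every invocation. A sketch, assuming an x86_64 Linux target:

```toml
# .cargo/config.toml (sketch): use mold for all builds on this target,
# equivalent to RUSTFLAGS=-Clink-arg=-fuse-ld=mold in the commands below.
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```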

Build Time Benchmarks

There's a significant regression to cold builds, but there's no meaningful regression for warm builds.

Cold time to build tests (2 runs):

rm -rf target/ && time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo nextest run -- dummy_filter_build_only_dont_run_any_tests

Before:

real    9m29.839s
real    9m27.522s

After:

real    15m28.105s
real    15m28.577s

Warm time to build tests (2 runs):

Modify a string in an error message inside crates/turbopack-ecmascript/src/minify.rs. This forces recompilation of all dependent crates without meaningfully changing any behavior. Then run:

time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo nextest run -- dummy_filter_build_only_dont_run_any_tests

Before:

real    1m33.497s
real    1m36.134s

After:

real    1m41.232s
real    1m40.153s

Warm time to build single binary (2 runs):

This is less dependent on linking than the tests, which generate many binary targets.

Modify a string in an error message inside crates/turbopack-ecmascript/src/minify.rs. This forces recompilation of all dependent crates without meaningfully changing any behavior. Then run:

time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo build -p turbopack-cli

Before:

real    0m37.565s
real    0m37.058s

After:

real    0m36.450s
real    0m36.849s

Cold time to build a single turborepo binary:

rm -rf target/ && time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo build -p turbo

Before:

real    3m43.488s

After:

real    4m54.416s

Execution Time Benchmarks

turbopack-cli's bench_startup

cargo bench --profile dev -p turbopack-cli

Before:

bench_startup/Turbopack CSR/1000 modules
                        time:   [20.744 s 20.869 s 20.995 s]

After:

bench_startup/Turbopack CSR/1000 modules
                        time:   [7.8037 s 7.8505 s 7.9030 s]

Test Execution (excluding build, 2 runs)

With a completely warm build cache (such that nothing needs to build), run:

time RUSTFLAGS=-Clink-arg=-fuse-ld=mold cargo nextest run -E 'not test(node_file_trace)'

Before:

real    2m51.767s
real    2m51.482s

After:

real    1m17.286s
real    1m12.520s




A contributor commented:

🟢 Turbopack Benchmark CI successful 🟢

Thanks


github-actions bot commented May 14, 2024

⚠️ CI failed ⚠️

The following steps have failed in CI:

  • Turbopack Rust tests (mac/win, non-blocking)

See workflow summary for details

@@ -77,8 +77,13 @@ turbopack-wasi = [
[workspace.lints.clippy]
too_many_arguments = "allow"

[profile.dev.package.turbo-tasks-macros]
@bgw (Member Author) commented:

This config was redundant with the

[profile.dev.build-override]

below. I measured no meaningful change to build times with/without it.


# Set the options for dependencies (not crates in the workspace), this mostly impacts cold builds
[profile.dev.package."*"]
opt-level = 1

# Set the settings for build scripts and proc-macros.
[profile.dev.build-override]
@bgw (Member Author) commented May 14, 2024:

Follow-up: One problem is that due to precedence ordering issues in cargo, dependencies like syn will now be compiled with opt-level = 1 instead of opt-level = 3. I need to perform further benchmarking to determine if this is significant and worth working around.

rust-lang/cargo#9351
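
A sketch of the interaction, assuming the existing [profile.dev.build-override] section sets opt-level = 3:

```toml
# Sketch of the precedence issue (rust-lang/cargo#9351): syn is both a
# proc-macro dependency (covered by build-override) and an ordinary
# package (covered by the "*" glob). Cargo resolves the conflict in
# favor of the package setting, so syn is built at opt-level 1
# instead of 3.
[profile.dev.build-override]
opt-level = 3

[profile.dev.package."*"]
opt-level = 1
```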

A contributor commented:

I guess this will mostly affect macro eval? turbo tasks has a few thousand alone but I would hope that the aggregate wins from not optimizing deps would be greater than the perf loss.

also TY for adding comments. our build configs are under documented

@bgw bgw marked this pull request as ready for review May 14, 2024 23:18
@bgw bgw requested review from a team as code owners May 14, 2024 23:18
@bgw bgw requested review from mehulkar and codybrouwers May 14, 2024 23:18
@mehulkar mehulkar requested review from chris-olszewski and NicholasLYang and removed request for mehulkar May 15, 2024 03:08
@arlyon (Contributor) left a comment:

I am on board with this pending verification of syn costs. I don't expect it to be an issue but can't hurt

@bgw (Member Author) commented May 15, 2024:

I modified the Cargo.toml to explicitly include opt-level = 3 for all the "big" proc-macro related dependencies I could find that would've been overridden by [profile.dev.package."*"].

Additions to `Cargo.toml`
[profile.dev.package.syn]
opt-level = 3

[profile.dev.package.quote]
opt-level = 3

[profile.dev.package.proc-macro2]
opt-level = 3

[profile.dev.package.serde_derive]
opt-level = 3

[profile.dev.package.serde_derive_internals]
opt-level = 3

[profile.dev.package.wasm-bindgen-backend]
opt-level = 3

[profile.dev.package.bindgen]
opt-level = 3

[profile.dev.package.tokio-macros]
opt-level = 3

[profile.dev.package.rkyv_derive]
opt-level = 3

I repeated the "Warm time to build single binary" benchmark, which uses many of these proc macros, but doesn't include the time spent optimizing them at opt-level = 3. This is the benchmark I'd expect to see the most benefit from improved proc macro execution. There was no meaningful change:

real    0m37.078s

@NicholasLYang (Contributor) left a comment:

Sounds good to me. Our sccache setup should hopefully mitigate cold builds being slower

@bgw bgw merged commit 46b888b into main May 15, 2024
50 of 52 checks passed
@bgw bgw deleted the bgw/dev-opt-level branch May 15, 2024 22:00