Improve Rust compile times #6065

str4d · 2022-07-08T16:03:36Z

Currently we spend quite a bit of time recompiling Rust when making incremental changes. It would be nice to figure out how to improve this. Ideas for improvement:

We seem to have a noticeable performance cost in compiling the Rust library and the binaries separately.
We currently use codegen-units = 1.

The text was updated successfully, but these errors were encountered:

str4d · 2022-07-08T16:04:55Z

Benchmarked codegen-units on Ryzen 9 5950X. Note that the -j30 parameter affects the C++ side, not the Rust side (which just uses all 32 threads).

`./zcutil/clean.sh && ./zcutil/build.sh -j30`

With `codegen-units = 1`

real    3m6.597s
user    6m24.224s
sys     0m43.280s

real    3m6.023s
user    6m23.447s
sys     0m43.794s

Without `codegen-units = 1`

real    2m29.704s
user    5m58.134s
sys     0m44.960s

real    2m21.490s
user    5m59.819s
sys     0m45.727s

`touch src/rust/src/rustzcash.rs && make -j30`

With `codegen-units = 1`

real    1m27.347s
user    1m26.618s
sys     0m0.920s

real    1m28.987s
user    1m31.265s
sys     0m4.161s

Without `codegen-units = 1`

real    0m59.504s
user    1m19.499s
sys     0m1.066s

real    0m59.122s
user    1m21.626s
sys     0m4.131s

So it looks like using the default codegen-units saves us around 30 seconds of build time, which is around 22% of clean build time and 33% of incremental rebuild time.

str4d · 2022-08-29T11:16:01Z

Ran the same scripts on the same CPU again, but on current master:

`./zcutil/clean.sh && time ./zcutil/build.sh -j30`

Note that this does not include the build time of vendored dependencies (otherwise you'd see a disparity between the two measurements).

With `codegen-units = 1`

real    3m3.069s
user    6m18.913s
sys     0m44.811s

real    2m58.695s
user    6m15.580s
sys     0m44.638s

Without `codegen-units = 1`

real    2m19.684s
user    6m2.841s
sys     0m46.162s

real    2m18.342s
user    6m3.672s
sys     0m45.786s

This is slightly faster than before; it might be due to the LLVM 14 or Rust 1.63 updates.

Without `codegen-units = 1`, and merging `--lib` and `--bins` building

real    2m10.844s
user    5m43.436s
sys     0m44.201s

real    2m10.715s
user    5m47.419s
sys     0m44.728s

So it looks like using the default codegen-units currently saves us around 40 seconds of build time, and building the binaries and library at the same time saves a further 10 seconds. The latter is because the binaries can be built in parallel; currently that is prevented due to locks on the Rust build folder. The rationale for separating them was to avoid any cost of the binary builds when not needed, but clearly they are not the main cost, because the library build cost itself is so high:

This enables `cargo` to parallelize the library and binary builds internally, reducing the Rust build time by the build time of the binaries (because they are overall faster than the library build). Part of zcash#6065.

str4d · 2022-08-29T11:51:31Z

Hmm, I had zcash-inspect disabled because of a local linker issue. I re-enabled it because parallelising the library and binaries also fixed my linker issue, and it revealed a significantly different performance profile (I presume since zcash-inspect was merged):

Avenues for improvement:

Figure out what is going on with the orchard crate's codegen.
Move more of the librustzcash code to use the cxx crate, and see what effect that has.
Move more of the librustzcash code out to external Rust crates.
- Some of these might make sense to live in this repo, but librustzcash is intended to only be FFI code, so any business logic should likely move into https://github.com/zcash/librustzcash.

str4d · 2023-02-06T17:51:24Z

I forgot to mention that I obtained the above figures by adding the --timings flag to RUST_BUILD_OPTS in src/Makefile.am.

str4d · 2023-02-06T18:42:59Z

I tried using rustc's self-profiling to figure out where the time was being spent:

diff --git a/src/Makefile.am b/src/Makefile.am
index 0e21d2d44c..daecbe9832 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -139,7 +139,7 @@ LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SHANI)
 endif
 
 cargo-build: $(CARGO_CONFIGURED) $(LIBSECP256K1)
-       $(rust_verbose)$(RUST_ENV_VARS) $(CARGO) build $(RUST_BUILD_OPTS) $(cargo_verbose)
+       $(rust_verbose)$(RUST_ENV_VARS) RUSTC_BOOTSTRAP=1 $(CARGO) rustc $(RUST_BUILD_OPTS) $(cargo_verbose) --lib -- -Z self-profile -Z self-profile-events=default,args
 
 cargo-build-lib: cargo-build

Here's what I get on my Ryzen 9 5950X (also with a zcashd reindex running in the background during its "read all the blocks" phase):

Oof, that's a lot of time doing linking! And about half of it is LTO:

zcash/Cargo.toml

Lines 111 to 112 in feec543

    
           [profile.release] 
        
           lto = true

It turns out that lto = true is equivalent to lto = 'fat'. Here's what happens when I switch to lto = 'thin' (which should give close-to-equivalent performance improvements):

It uses a bunch more threads (the timeline scrolls off down to a thread number of 196), and saves over 20s of compile time.

Some of the durations for comparison (in seconds):

Phase	`lto = true`	`lto = 'thin'`
`LLVM_passes`	83.40	59.94
`codegen_module_perform_lto`	37.56	17.40
`codegen_module_optimize`	44.23	42.30

This should achieve similar performance gains to "fat" LTO (which we were previously using) while taking substantially less time to run (over 20s saved on a Ryzen 9 5950X). Part of zcash#6065.

str4d · 2023-02-06T19:46:49Z

I tried combining #6407 with removing codegen-units (on my Ryzen 9 5950X):

`./zcutil/clean.sh && time ./zcutil/build.sh -j30`

master

real    2m52.682s
user    6m24.228s
sys     0m42.750s

#6407

real    2m28.203s
user    7m5.917s
sys     0m44.407s

#6407 without `codegen-units = 1`

real    1m45.812s
user    8m38.458s
sys     0m50.999s

`touch src/rust/src/rustzcash.rs && time make -j30`:

master

real    1m28.580s
user    1m27.796s
sys     0m0.899s

#6407

real    1m8.626s
user    1m51.819s
sys     0m5.348s

#6407 without `codegen-units = 1`

real    0m17.458s
user    2m57.936s
sys     0m5.158s

So the combination of ThinLTO and default codegen-units basically almost completely eliminates the slowness of recompiles, and has an equivalent effect on clean builds. We should definitely consider the latter. @ebfull what was the original motivation for codegen-units = 1?

daira · 2023-02-06T20:17:06Z

We should check that this doesn't break determinism of the gitian builds.

This moves us back to using the default value of 16 for the release profile (which does not enable incremental builds). Part of zcash#6065.

str4d · 2023-03-02T20:54:48Z

@ebfull what was the original motivation for codegen-units = 1?

@ebfull told me that the original motivation here was performance; he seems to recall that the compiler at the time could produce faster code if all codegen was in a single unit. We should check there are no significant performance regressions caused by #6411 (or if there are, decide if they are worth the compile time improvements).

str4d added I-performance Problems and improvements with respect to performance A-build Area: Build system L-rust Involves Rust code. labels Jul 8, 2022

str4d mentioned this issue Aug 29, 2022

build: Build Rust library and binaries at the same time #6141

Merged

str4d mentioned this issue Feb 6, 2023

rust: Compile with ThinLTO #6407

Merged

str4d added a commit to str4d/zcash that referenced this issue Feb 6, 2023

rust: Unset codegen-units = 1

b4dc2cb

This moves us back to using the default value of 16 for the release profile (which does not enable incremental builds). Part of zcash#6065.

str4d mentioned this issue Feb 6, 2023

rust: Unset "codegen-units = 1" #6411

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Rust compile times #6065

Improve Rust compile times #6065

str4d commented Jul 8, 2022

str4d commented Jul 8, 2022 •

edited

str4d commented Aug 29, 2022

str4d commented Aug 29, 2022

str4d commented Feb 6, 2023

str4d commented Feb 6, 2023

str4d commented Feb 6, 2023

daira commented Feb 6, 2023

str4d commented Mar 2, 2023

Improve Rust compile times #6065

Improve Rust compile times #6065

Comments

str4d commented Jul 8, 2022

str4d commented Jul 8, 2022 • edited

./zcutil/clean.sh && ./zcutil/build.sh -j30

With codegen-units = 1

Without codegen-units = 1

touch src/rust/src/rustzcash.rs && make -j30

With codegen-units = 1

Without codegen-units = 1

str4d commented Aug 29, 2022

./zcutil/clean.sh && time ./zcutil/build.sh -j30

With codegen-units = 1

Without codegen-units = 1

Without codegen-units = 1, and merging --lib and --bins building

str4d commented Aug 29, 2022

str4d commented Feb 6, 2023

str4d commented Feb 6, 2023

str4d commented Feb 6, 2023

./zcutil/clean.sh && time ./zcutil/build.sh -j30

master

#6407

#6407 without codegen-units = 1

touch src/rust/src/rustzcash.rs && time make -j30:

master

#6407

#6407 without codegen-units = 1

daira commented Feb 6, 2023

str4d commented Mar 2, 2023

str4d commented Jul 8, 2022 •

edited

`./zcutil/clean.sh && ./zcutil/build.sh -j30`

With `codegen-units = 1`

Without `codegen-units = 1`

`touch src/rust/src/rustzcash.rs && make -j30`

With `codegen-units = 1`

Without `codegen-units = 1`

`./zcutil/clean.sh && time ./zcutil/build.sh -j30`

With `codegen-units = 1`

Without `codegen-units = 1`

Without `codegen-units = 1`, and merging `--lib` and `--bins` building

`./zcutil/clean.sh && time ./zcutil/build.sh -j30`

#6407 without `codegen-units = 1`

`touch src/rust/src/rustzcash.rs && time make -j30`:

#6407 without `codegen-units = 1`