automatically shrink aggregation lists #8657

Merged
merged 2 commits into main from sokra/shrink-memory on Jul 4, 2024
Conversation

@sokra sokra (Member) commented Jul 3, 2024

Description

When unloading tasks, edges from the aggregation structure are removed, but their memory is not reclaimed as Vecs don't shrink automatically.

This adds a shrink_amortized method which shrinks the Vec/HashMap when capacity is 2 times larger than length.

This also calls shrink_to_fit when running GC on a task.
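As an illustration, here is a minimal sketch of this kind of amortized shrinking on a plain `Vec` (the actual change implements it on `AutoMap`; the helper below and its name are assumptions made for illustration only):

```rust
/// Hypothetical stand-alone version of the amortized shrink described above:
/// release spare capacity only when it exceeds 2x the live length, so
/// repeated insert/remove cycles don't thrash the allocator.
fn shrink_amortized_vec<T>(list: &mut Vec<T>) {
    if list.capacity() > list.len() * 2 {
        list.shrink_to_fit();
    }
}

fn main() {
    let mut edges: Vec<u32> = (0..1024).collect();
    edges.truncate(10); // simulate removing most aggregation edges
    println!("before: len={} capacity={}", edges.len(), edges.capacity());
    shrink_amortized_vec(&mut edges);
    println!("after:  len={} capacity={}", edges.len(), edges.capacity());
}
```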

Testing Instructions

vercel bot commented Jul 3, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| rust-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 4, 2024 1:14pm |

9 Skipped Deployments

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| examples-basic-web | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-designsystem-docs | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-gatsby-web | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-kitchensink-blog | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-native-web | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-svelte-web | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-tailwind-web | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-vite-web | ⬜️ Ignored (Inspect) | Visit Preview | | Jul 4, 2024 1:14pm |
| examples-nonmonorepo | ⬜️ Skipped (Inspect) | | | Jul 4, 2024 1:14pm |

@sokra sokra requested review from bgw and arlyon July 3, 2024 09:47
github-actions bot (Contributor) commented Jul 3, 2024

🟢 Turbopack Benchmark CI successful 🟢

Thanks

github-actions bot (Contributor) commented Jul 3, 2024

⚠️ This change may fail to build next-swc.

Logs

```
packages/next-swc/crates/next-core/src/next_shared/transforms/relay.rs:15:46: error[E0061]: this function takes 2 arguments but 1 argument was supplied
error: could not compile `next-core` (lib) due to 1 previous error
```

See job summary for details

github-actions bot (Contributor) commented Jul 3, 2024

⚠️ CI failed ⚠️

The following steps have failed in CI:

  • Turbopack Rust tests (mac/win, non-blocking)
  • Turbopack Rust benchmark tests (mac/win, non-blocking)

See workflow summary for details

```rust
pub fn shrink_amortized(&mut self) {
    match self {
        AutoMap::List(list) => {
            if list.capacity() > list.len() * 2 {
```
Because HashMap and Vec grow by powers of two, this could lead to situations where we're frequently growing and shrinking the same list every time something is added and removed.

It's worse for memory, but better for time complexity, to grow and shrink at different thresholds. E.g. if `list.capacity() > list.len() * 4`, then shrink to `list.len() * 2`.

> Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to provide hysteresis (provide a stable band to avoid repeatedly growing and shrinking) and support mixed sequences of insertions and removals with amortized constant cost.
>
> — https://en.wikipedia.org/wiki/Dynamic_array#Geometric_expansion_and_amortized_cost

If you think it's worth it, we could intercept inserts here and manipulate the Vec/HashMap to achieve a growth factor other than 2. Many languages use 1.5; Python uses the very low value of 1.125 (!).

It may also be worth adding an arbitrary minimum size below which it's not worth shrinking the Vec (e.g. 4, 8, or 10), to avoid excessive resizes of very small arrays.
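For illustration only, a minimal sketch of that suggestion on a plain `Vec` (the 4x/2x thresholds follow the comment above; the `MIN_CAPACITY` floor and the helper name are assumptions, not code from this PR):

```rust
/// Hypothetical hysteresis-based shrink, sketching the suggestion above:
/// only shrink when capacity exceeds 4x the length, and keep 2x headroom
/// so the next insert does not immediately force a regrow.
const MIN_CAPACITY: usize = 8; // arbitrary floor chosen for this sketch

fn shrink_with_hysteresis<T>(list: &mut Vec<T>) {
    if list.capacity() > MIN_CAPACITY && list.capacity() > list.len() * 4 {
        // `shrink_to` keeps at least the requested capacity, so the list
        // retains some spare room instead of being trimmed to exact length.
        list.shrink_to((list.len() * 2).max(MIN_CAPACITY));
    }
}
```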

@bgw bgw (Member) left a comment

LGTM, as long as the shrink threshold is adjusted to be different than the growth threshold.

@vercel vercel bot temporarily deployed to Preview – examples-nonmonorepo July 4, 2024 13:12 Inactive
@sokra sokra merged commit 0df64d5 into main Jul 4, 2024
49 of 54 checks passed
@sokra sokra deleted the sokra/shrink-memory branch July 4, 2024 13:22
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 25, 2024
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Jul 29, 2024
ForsakenHarmony pushed a commit to vercel/next.js that referenced this pull request Aug 1, 2024
bgw added a commit to vercel/next.js that referenced this pull request Nov 1, 2024
… when constructing a cell (#72113)

Cell contents are immutable once constructed, so there's no chance that they'll grow in size again. Common collections can be shrunk to avoid storing empty spare capacity in this case (note: if they're already correctly sized, `shrink_to_fit` bails out early).
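As a hedged illustration of that idea (the real change operates on Turbopack's cell types such as `SharedReference`; the `Cell` struct and `cell_from_vec` helper below are stand-ins invented for this sketch):

```rust
use std::sync::Arc;

/// Stand-in for an immutable cell; not the actual Turbopack type.
struct Cell<T> {
    content: Arc<T>,
}

fn cell_from_vec<T>(mut values: Vec<T>) -> Cell<Vec<T>> {
    // Once stored in the cell the contents never grow again, so any spare
    // capacity would be wasted for the cell's whole lifetime. If the Vec is
    // already exact-sized, shrink_to_fit returns without reallocating.
    values.shrink_to_fit();
    Cell {
        content: Arc::new(values),
    }
}
```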

**Result:** This gives a ~1.4% decrease in top-line peak memory consumption, for a theoretical CPU/Wall time cost that's too small to measure.

**Inspiration:** The inspiration for this was vercel/turborepo#2873, which decreased task storage (not the top-line memory usage?) by ~14%, vercel/turborepo#8657, and other similar optimization PRs.

## Additional Opportunities

- There may be more places where cells are constructed (e.g. persistent storage deserialization) in which a cell's `SharedReference` is built without being captured by this change.
  - Depending on the library used, deserialization may already construct exact-sized collections.
  - As an additional example, manually constructing a `ReadRef` and converting it into a cell skips this optimization because `ReadRef::cell` internally uses the type-erased shared-reference `raw_cell` API which is incompatible with this optimization. We could special-case that in the `ReadRef::new_owned` constructor (not in `ReadRef::new_arc` though), but nobody should be manually constructing `ReadRef`s.

- We still don't use `shrink_to_fit` on `RcStr` types. Some of these are in-place extended (when they have a refcount of 1) with `RcStr::map`, so we probably don't want to be too aggressive about this to avoid `O(n^2)` time complexity blowups.

## Memory Benchmark Setup

```bash
cd ~/next.js
cargo run -p next-build-test --release -- generate ~/shadcn-ui/apps/www/ > ~/shadcn-ui/apps/www/project_options.json
pnpm pack-next --project ~/shadcn-ui/apps/www/
```

```bash
cd ~/shadcn-ui
pnpm i
cd ~/shadcn-ui/apps/www/
heaptrack ~/next.js/target/release/next-build-test run sequential 1 1 '/sink'
heaptrack --analyze ~/shadcn-ui/apps/www/heaptrack.next-build-test.3604648.zst
```

### Memory Before (canary branch)

First Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

Second Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

### Memory After (this PR)

First Run:

```
peak heap memory consumption: 3.18G
peak RSS (including heaptrack overhead): 4.74G
```

Second Run:

```
peak heap memory consumption: 3.19G
peak RSS (including heaptrack overhead): 4.73G
```

This is about a 1.4% decrease in top-line memory consumption.

## Wall Time with `hyperfine` (Counter-Metric)

This is theoretically a time-memory tradeoff, as we'll spend some time `memcpy`ing things into smaller allocations, though in some cases reducing memory usage can improve cache locality, so it's not always obvious.

```
hyperfine --warmup 3 -r 30 --time-unit millisecond '~/next.js/target/release/next-build-test run sequential 1 1 /sink'
```

This benchmark is slow and takes about 30 minutes to run.

Before:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56387.5 ms ± 212.6 ms    [User: 107807.5 ms, System: 9509.8 ms]
  Range (min … max):   55934.4 ms … 56872.9 ms    30 runs
```

After:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56020.9 ms ± 235.4 ms    [User: 107483.8 ms, System: 9371.8 ms]
  Range (min … max):   55478.2 ms … 56563.6 ms    30 runs
```

This is a ~0.65% *reduction* in wall time. This is small enough (<2 standard deviations) to likely just be noise.

## Wall Time with `turbopack-bench` (Counter-Metric)

```
cargo bench -p turbopack-bench -p turbopack-cli -- "hmr_to_eval/Turbopack CSR"
```

Gives:

```
bench_hmr_to_eval/Turbopack CSR/1000 modules
                        time:   [15.123 ms 15.208 ms 15.343 ms]
                        change: [-0.8471% +0.4882% +1.9719%] (p = 0.55 > 0.05)
                        No change in performance detected.
```

Using https://github.com/bgw/benchmark-scripts/

In practice, it's not really possible to measure changes in wall time <1%, so this is within "noise" territory (as noted in the criterion output).

Closes PACK-3361
stipsan pushed a commit to sanity-io/next.js that referenced this pull request Nov 6, 2024
… when constructing a cell (vercel#72113)
