automatically shrink aggregation lists #8657
Conversation
pub fn shrink_amortized(&mut self) {
    match self {
        AutoMap::List(list) => {
            if list.capacity() > list.len() * 2 {
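The diff above is truncated. As a self-contained illustration, here is a minimal sketch of what such a method can look like under the 2x threshold; the `AutoMap` definition and its `Map` variant below are assumptions for the example, not the actual turbo-tasks type:

```rust
use std::collections::HashMap;

// Minimal stand-in for AutoMap; variant and field types are assumptions
// for illustration, not the actual turbo-tasks definition.
enum AutoMap<K, V> {
    List(Vec<(K, V)>),
    Map(HashMap<K, V>),
}

impl<K, V> AutoMap<K, V> {
    /// Release spare capacity when it exceeds twice the current length,
    /// mirroring the threshold shown in the truncated diff above.
    pub fn shrink_amortized(&mut self) {
        match self {
            AutoMap::List(list) => {
                if list.capacity() > list.len() * 2 {
                    list.shrink_to_fit();
                }
            }
            AutoMap::Map(map) => {
                if map.capacity() > map.len() * 2 {
                    map.shrink_to_fit();
                }
            }
        }
    }
}
```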
Because HashMap and Vec grow by powers of two, this could lead to situations where we're frequently growing and shrinking the same list every time something is added and removed.
It's worse for memory, but better for time complexity, to grow and shrink at different thresholds. E.g. if `list.capacity() > list.len() * 4`, then shrink to `list.len() * 2`.
> Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to provide hysteresis (provide a stable band to avoid repeatedly growing and shrinking) and support mixed sequences of insertions and removals with amortized constant cost.
>
> -- https://en.wikipedia.org/wiki/Dynamic_array#Geometric_expansion_and_amortized_cost
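A minimal sketch of that hysteresis, assuming plain `Vec::shrink_to` (which never shrinks below the length) is used as the shrink primitive:

```rust
/// Shrink with hysteresis: only act once capacity exceeds 4x the length,
/// and leave 2x headroom so the next few inserts don't immediately regrow.
fn shrink_with_hysteresis<T>(list: &mut Vec<T>) {
    if list.capacity() > list.len() * 4 {
        // `shrink_to` keeps the capacity at least as large as the length,
        // so this is safe even for very small lists.
        list.shrink_to(list.len() * 2);
    }
}
```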
If you think it's worth it, we could intercept inserts here and manipulate the Vec/HashMap to achieve a growth factor other than 2. Many languages use 1.5. Python uses the very low value of 1.125 (!).
It may also be worth adding an arbitrary minimum size below which it's not worth shrinking the Vec (e.g. 4, 8, or 10), to avoid excessive resizes of very small arrays.
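A rough sketch of both ideas combined, with an assumed minimum capacity of 8; `reserve_exact` requests only the extra slots we ask for, although the allocator may still round the result up:

```rust
const MIN_CAPACITY: usize = 8; // arbitrary floor; 4, 8, or 10 were suggested above

/// Shrink with a floor: never bother shrinking very small lists, and keep
/// 2x headroom when we do shrink.
fn shrink_amortized_with_floor<T>(list: &mut Vec<T>) {
    if list.capacity() > MIN_CAPACITY && list.capacity() > list.len() * 4 {
        list.shrink_to((list.len() * 2).max(MIN_CAPACITY));
    }
}

/// Intercept inserts to approximate a 1.5x growth factor instead of
/// Vec's default doubling.
fn push_with_growth_factor<T>(list: &mut Vec<T>, value: T) {
    if list.len() == list.capacity() {
        let extra = (list.len() / 2).max(1);
        // `reserve_exact` asks for only `extra` additional slots; the
        // allocator may still hand back slightly more.
        list.reserve_exact(extra);
    }
    list.push(value);
}
```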
LGTM, as long as the shrink threshold is adjusted to be different than the growth threshold.
… when constructing a cell (#72113)

Cell contents are immutable once constructed, so there's no chance that they'll grow in size again. Common collections can be shrunk to avoid storing empty spare capacity in this case (note: if they're already correctly sized, `shrink_to_fit` bails out early).

**Result:** This gives a ~1.4% decrease in top-line peak memory consumption, for a theoretical CPU/Wall time cost that's too small to measure.

**Inspiration:** The inspiration for this was vercel/turborepo#2873, which decreased task storage (not the top-line memory usage?) by ~14%, vercel/turborepo#8657, and other similar optimization PRs.

## Additional Opportunities

- There may be more places where cells are constructed (e.g. persistent storage deserialization) where a cell's `SharedReference` is constructed that is not currently captured by this.
- Depending on the library used, deserialization may already construct exact-sized collections.
- As an additional example, manually constructing a `ReadRef` and converting it into a cell skips this optimization because `ReadRef::cell` internally uses the type-erased shared-reference `raw_cell` API, which is incompatible with this optimization. We could special-case that in the `ReadRef::new_owned` constructor (not in `ReadRef::new_arc` though), but nobody should be manually constructing `ReadRef`s.
- We still don't use `shrink_to_fit` on `RcStr` types. Some of these are in-place extended (when they have a refcount of 1) with `RcStr::map`, so we probably don't want to be too aggressive about this to avoid `O(n^2)` time complexity blowups.

## Memory Benchmark Setup

```bash
cd ~/next.js
cargo run -p next-build-test --release -- generate ~/shadcn-ui/apps/www/ > ~/shadcn-ui/apps/www/project_options.json
pnpm pack-next --project ~/shadcn-ui/apps/www/
```

```bash
cd ~/shadcn-ui
pnpm i
cd ~/shadcn-ui/apps/www/
heaptrack ~/next.js/target/release/next-build-test run sequential 1 1 '/sink'
heaptrack --analyze ~/shadcn-ui/apps/www/heaptrack.next-build-test.3604648.zst
```

### Memory Before (canary branch)

First Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

Second Run:

```
peak heap memory consumption: 3.23G
peak RSS (including heaptrack overhead): 4.75G
```

### Memory After (this PR)

First Run:

```
peak heap memory consumption: 3.18G
peak RSS (including heaptrack overhead): 4.74G
```

Second Run:

```
peak heap memory consumption: 3.19G
peak RSS (including heaptrack overhead): 4.73G
```

This is about a 1.4% decrease in top-line memory consumption.

## Wall Time with `hyperfine` (Counter-Metric)

This is theoretically a time-memory tradeoff, as we'll spend some time `memcpy`ing things into smaller allocations, though in some cases reducing memory usage can improve cache locality, so it's not always obvious.

```
hyperfine --warmup 3 -r 30 --time-unit millisecond '~/next.js/target/release/next-build-test run sequential 1 1 /sink'
```

This benchmark is slow and takes about 30 minutes to run.

Before:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56387.5 ms ± 212.6 ms    [User: 107807.5 ms, System: 9509.8 ms]
  Range (min … max):   55934.4 ms … 56872.9 ms    30 runs
```

After:

```
Benchmark 1: ~/next.js/target/release/next-build-test run sequential 1 1 /sink
  Time (mean ± σ):     56020.9 ms ± 235.4 ms    [User: 107483.8 ms, System: 9371.8 ms]
  Range (min … max):   55478.2 ms … 56563.6 ms    30 runs
```

This is a ~0.65% *reduction* in wall time. This is small enough (<2 standard deviations) to likely just be noise.

## Wall Time with `turbopack-bench` (Counter-Metric)

```
cargo bench -p turbopack-bench -p turbopack-cli -- "hmr_to_eval/Turbopack CSR"
```

Gives:

```
bench_hmr_to_eval/Turbopack CSR/1000 modules
    time:   [15.123 ms 15.208 ms 15.343 ms]
    change: [-0.8471% +0.4882% +1.9719%] (p = 0.55 > 0.05)
    No change in performance detected.
```

Using https://github.com/bgw/benchmark-scripts/

In practice, it's not really possible to measure changes in wall time <1%, so this is within "noise" territory (as noted in the criterion output).

Closes PACK-3361
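The core idea of the commit quoted above, as a standalone sketch; the `ShrinkToFit` trait and `construct_cell` function are illustrative stand-ins, not the actual turbo-tasks API. Once a value is frozen into a cell it never grows again, so any spare capacity can be released up front:

```rust
/// Illustrative trait for "release spare capacity before freezing a value".
/// The real change hooks into cell construction inside turbo-tasks instead.
trait ShrinkToFit {
    fn shrink_to_fit(&mut self);
}

impl<T> ShrinkToFit for Vec<T> {
    fn shrink_to_fit(&mut self) {
        // No-op if capacity already equals length.
        Vec::shrink_to_fit(self);
    }
}

impl ShrinkToFit for String {
    fn shrink_to_fit(&mut self) {
        String::shrink_to_fit(self);
    }
}

/// Hypothetical construction path: shrink, then treat the value as immutable.
fn construct_cell<T: ShrinkToFit>(mut value: T) -> T {
    value.shrink_to_fit();
    value
}
```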
### Description

When unloading tasks, edges from the aggregation structure are removed, but their memory is not reclaimed because Vecs don't shrink automatically.

This adds a `shrink_amortized` method which shrinks the Vec/HashMap when its capacity is more than twice its length. It also calls `shrink_to_fit` when running GC on a task.

### Testing Instructions