Turbopack Persistence: Improve heuristic for compacted database access #89497
Tests Passed
Merging this PR will not alter performance
Stats from current PR
✅ No significant changes detected
📦 Bundle Sizes
⚡ Turbopack
Client Main Bundles: **437 kB** → **437 kB**
| File | Canary | PR | Change |
|---|---|---|---|
| middleware-b..fest.js gzip | 756 B | 759 B | ✓ |
| Total | 756 B | 759 B |
Build Details
Build Manifests
| File | Canary | PR | Change |
|---|---|---|---|
| _buildManifest.js gzip | 451 B | 452 B | ✓ |
| Total | 451 B | 452 B |
📦 Webpack
Client
Main Bundles
| File | Canary | PR | Change |
|---|---|---|---|
| 5528-HASH.js gzip | 5.47 kB | N/A | - |
| 6280-HASH.js gzip | 57 kB | N/A | - |
| 6335.HASH.js gzip | 169 B | N/A | - |
| 912-HASH.js gzip | 4.53 kB | N/A | - |
| e8aec2e4-HASH.js gzip | 62.5 kB | N/A | - |
| framework-HASH.js gzip | 59.7 kB | 59.7 kB | ✓ |
| main-app-HASH.js gzip | 255 B | 253 B | ✓ |
| main-HASH.js gzip | 39.1 kB | 39.1 kB | ✓ |
| webpack-HASH.js gzip | 1.68 kB | 1.68 kB | ✓ |
| 262-HASH.js gzip | N/A | 4.53 kB | - |
| 2889.HASH.js gzip | N/A | 169 B | - |
| 5602-HASH.js gzip | N/A | 5.49 kB | - |
| 6948ada0-HASH.js gzip | N/A | 62.5 kB | - |
| 9544-HASH.js gzip | N/A | 57.6 kB | - |
| Total | 230 kB | 231 kB |
Polyfills
| File | Canary | PR | Change |
|---|---|---|---|
| polyfills-HASH.js gzip | 39.4 kB | 39.4 kB | ✓ |
| Total | 39.4 kB | 39.4 kB | ✓ |
Pages
| File | Canary | PR | Change |
|---|---|---|---|
| _app-HASH.js gzip | 194 B | 194 B | ✓ |
| _error-HASH.js gzip | 183 B | 180 B | 🟢 3 B (-2%) |
| css-HASH.js gzip | 331 B | 330 B | ✓ |
| dynamic-HASH.js gzip | 1.81 kB | 1.81 kB | ✓ |
| edge-ssr-HASH.js gzip | 256 B | 256 B | ✓ |
| head-HASH.js gzip | 351 B | 352 B | ✓ |
| hooks-HASH.js gzip | 384 B | 383 B | ✓ |
| image-HASH.js gzip | 580 B | 581 B | ✓ |
| index-HASH.js gzip | 260 B | 260 B | ✓ |
| link-HASH.js gzip | 2.49 kB | 2.49 kB | ✓ |
| routerDirect..HASH.js gzip | 320 B | 319 B | ✓ |
| script-HASH.js gzip | 386 B | 386 B | ✓ |
| withRouter-HASH.js gzip | 315 B | 315 B | ✓ |
| 1afbb74e6ecf..834.css gzip | 106 B | 106 B | ✓ |
| Total | 7.97 kB | 7.97 kB | ✅ -1 B |
Server
Edge SSR
| File | Canary | PR | Change |
|---|---|---|---|
| edge-ssr.js gzip | 126 kB | 126 kB | ✓ |
| page.js gzip | 249 kB | 249 kB | ✓ |
| Total | 375 kB | 376 kB |
Middleware
| File | Canary | PR | Change |
|---|---|---|---|
| middleware-b..fest.js gzip | 615 B | 614 B | ✓ |
| middleware-r..fest.js gzip | 156 B | 155 B | ✓ |
| middleware.js gzip | 33.1 kB | 33.2 kB | ✓ |
| edge-runtime..pack.js gzip | 842 B | 842 B | ✓ |
| Total | 34.7 kB | 34.8 kB |
Build Details
Build Manifests
| File | Canary | PR | Change |
|---|---|---|---|
| _buildManifest.js gzip | 733 B | 735 B | ✓ |
| Total | 733 B | 735 B |
Build Cache
| File | Canary | PR | Change |
|---|---|---|---|
| 0.pack gzip | 3.84 MB | 3.85 MB | 🔴 +8.26 kB (+0%) |
| index.pack gzip | 103 kB | 103 kB | ✓ |
| index.pack.old gzip | 104 kB | 103 kB | 🟢 1.23 kB (-1%) |
| Total | 4.05 MB | 4.05 MB |
🔄 Shared (bundler-independent)
Runtimes
| File | Canary | PR | Change |
|---|---|---|---|
| app-page-exp...dev.js gzip | 315 kB | 315 kB | ✓ |
| app-page-exp..prod.js gzip | 167 kB | 167 kB | ✓ |
| app-page-tur...dev.js gzip | 315 kB | 315 kB | ✓ |
| app-page-tur..prod.js gzip | 167 kB | 167 kB | ✓ |
| app-page-tur...dev.js gzip | 312 kB | 312 kB | ✓ |
| app-page-tur..prod.js gzip | 166 kB | 166 kB | ✓ |
| app-page.run...dev.js gzip | 312 kB | 312 kB | ✓ |
| app-page.run..prod.js gzip | 166 kB | 166 kB | ✓ |
| app-route-ex...dev.js gzip | 70.5 kB | 70.5 kB | ✓ |
| app-route-ex..prod.js gzip | 49 kB | 49 kB | ✓ |
| app-route-tu...dev.js gzip | 70.5 kB | 70.5 kB | ✓ |
| app-route-tu..prod.js gzip | 49 kB | 49 kB | ✓ |
| app-route-tu...dev.js gzip | 70.1 kB | 70.1 kB | ✓ |
| app-route-tu..prod.js gzip | 48.8 kB | 48.8 kB | ✓ |
| app-route.ru...dev.js gzip | 70.1 kB | 70.1 kB | ✓ |
| app-route.ru..prod.js gzip | 48.7 kB | 48.7 kB | ✓ |
| dist_client_...dev.js gzip | 324 B | 324 B | ✓ |
| dist_client_...dev.js gzip | 326 B | 326 B | ✓ |
| dist_client_...dev.js gzip | 318 B | 318 B | ✓ |
| dist_client_...dev.js gzip | 317 B | 317 B | ✓ |
| pages-api-tu...dev.js gzip | 43.2 kB | 43.2 kB | ✓ |
| pages-api-tu..prod.js gzip | 32.9 kB | 32.9 kB | ✓ |
| pages-api.ru...dev.js gzip | 43.2 kB | 43.2 kB | ✓ |
| pages-api.ru..prod.js gzip | 32.8 kB | 32.8 kB | ✓ |
| pages-turbo....dev.js gzip | 52.5 kB | 52.5 kB | ✓ |
| pages-turbo...prod.js gzip | 39.4 kB | 39.4 kB | ✓ |
| pages.runtim...dev.js gzip | 52.5 kB | 52.5 kB | ✓ |
| pages.runtim..prod.js gzip | 39.4 kB | 39.4 kB | ✓ |
| server.runti..prod.js gzip | 62.7 kB | 62.7 kB | ✓ |
| Total | 2.8 MB | 2.8 MB | ✓ |
I don't understand the PR description. This adjusts the mix of medium and small values, not the size of SST files? Or if it does, is that an indirect effect of changing the block overheads?
lukesandberg
left a comment
This will greatly increase the number of small value blocks (and medium blocks, and blocks overall).
Each block has a 4-byte overhead from the SST file-level 'directory', plus of course the 4-byte decompressed size header. I'm guessing this will greatly increase the number of blocks.
I see the benefit of medium values during compaction, but couldn't we leverage the same optimization for small values (remapping block indices when merging)?
Also, the statement in the PR description, "Small values benefit from better compression by being merged together in blocks, avoiding the need for a compression dictionary.", is only partially true: we only use compression dictionaries for key blocks, not value blocks.
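To put the per-block overhead from the comment above into numbers, here is a minimal sketch. It assumes only the two 4-byte costs named in the comment; the constant and function names are hypothetical and not taken from the turbo-persistence crate.

```rust
/// Bytes each block costs in the SST file-level block directory
/// (figure taken from the review comment; the name is hypothetical).
const BLOCK_DIRECTORY_ENTRY_BYTES: u64 = 4;
/// Bytes of the decompressed-size header in front of each block
/// (figure taken from the review comment; the name is hypothetical).
const DECOMPRESSED_SIZE_HEADER_BYTES: u64 = 4;

/// Hypothetical helper: number of blocks needed to hold `total_bytes`
/// of value data when every block holds at most `block_size` bytes.
fn blocks_needed(total_bytes: u64, block_size: u64) -> u64 {
    total_bytes.div_ceil(block_size)
}

/// Hypothetical helper: fixed overhead in bytes for `block_count` blocks,
/// counting only the directory entry and the size header.
fn fixed_block_overhead(block_count: u64) -> u64 {
    block_count * (BLOCK_DIRECTORY_ENTRY_BYTES + DECOMPRESSED_SIZE_HEADER_BYTES)
}
```

Under these assumptions, 64 MiB of small values need 1024 blocks of 64 KiB (8 KiB of fixed overhead) but 16384 blocks of 4 KiB (128 KiB of fixed overhead). The byte cost stays small relative to the data, while the block count grows 16-fold.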
…ck count Track value block count during collection and compaction to prevent exceeding the u16 block index limit in SST files. Adds a ValueBlockCountTracker that monitors medium values (1 block each) and small value block packing, triggering is_full when approaching the limit.
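The commit message above describes the tracker only in prose. A rough sketch of how such a tracker could work follows. This is not the crate's actual `ValueBlockCountTracker`: the fields, method names, and the choice to pass the limit and the block threshold as constructor parameters are all assumptions made for illustration.

```rust
/// Hypothetical sketch of a value block counter; not the crate's real type.
#[derive(Debug)]
struct ValueBlockCountTracker {
    /// Block count at which the file is considered full (assumed parameter).
    max_blocks: usize,
    /// Byte size at which a small value block is closed (assumed parameter).
    small_block_threshold: usize,
    /// Blocks that are already complete (medium blocks and closed small blocks).
    finished_blocks: usize,
    /// Bytes accumulated in the small value block that is still open.
    pending_small_bytes: usize,
}

impl ValueBlockCountTracker {
    fn new(max_blocks: usize, small_block_threshold: usize) -> Self {
        Self {
            max_blocks,
            small_block_threshold,
            finished_blocks: 0,
            pending_small_bytes: 0,
        }
    }

    /// A medium value always occupies one dedicated block.
    fn add_medium(&mut self) {
        self.finished_blocks += 1;
    }

    /// A small value is packed into the open block, which is closed
    /// once it reaches the threshold.
    fn add_small(&mut self, len: usize) {
        self.pending_small_bytes += len;
        if self.pending_small_bytes >= self.small_block_threshold {
            self.finished_blocks += 1;
            self.pending_small_bytes = 0;
        }
    }

    /// Finished blocks plus the open small value block, if it holds any data.
    fn block_count(&self) -> usize {
        self.finished_blocks + usize::from(self.pending_small_bytes > 0)
    }

    /// True once the tracked block count has reached the limit.
    fn is_full(&self) -> bool {
        self.block_count() >= self.max_blocks
    }
}
```

A writer would check `is_full()` after each added entry and start a new SST file when it returns true, which is why hitting the limit produces smaller files.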
Small values vs medium values is basically a trade-off.
Access cost
The idea of my change was to make accessing small values cheaper. Before, every access to a small value needed to decompress 64 KiB before it could read the value. Now it only needs to decompress 4 KiB. Since decompression dominated the access time, this is a big improvement.
Storage overhead
This change also causes more blocks. The storage overhead of every small value increased from
So the storage cost difference is very small according to this formula. It's a bit inefficient to use small values that are larger than 1/4 of the small value block size. But we have to take into account that the 8 bytes in the key block are compressed (tho luke figured out that key blocks are usually not compressed anyway). The 8 bytes in the block table are not compressed. Maybe we should compress the block table and decompress it into memory when we open the SST file?
But a problem we are running into with this increased block count is that we hit the u16::MAX block limit. I added some code to prevent that, but it makes SST files smaller when hitting the limit. That's probably not what we want here...
Compression Size
By decreasing the block size (for both small and medium values), we reduce our compression ratio. Compression is more efficient when more data is compressed. For smaller blocks it's recommended to use a compression dictionary. When the compression size is >4kB, a compression dictionary isn't really needed anymore, as there is enough stuff in the block itself to compress well. My change caused some (medium value) blocks to be between 1kB and 4kB, which isn't optimal. I think we could increase the MAX_SMALL_VALUE_SIZE to 4kB to address that.
Summary
Due to hitting the block count limit and the better compression ratio, I think it makes sense to increase the
This would be the updated table:
…IN_SMALL_VALUE_BLOCK_SIZE of 8kB Increase MAX_SMALL_VALUE_SIZE from 1kB to 4kB so more values are packed into shared blocks instead of getting dedicated medium value blocks. This significantly reduces block count, avoiding the u16::MAX block limit. Rename MAX_SMALL_VALUE_BLOCK_SIZE to MIN_SMALL_VALUE_BLOCK_SIZE (8kB) and change semantics from "don't exceed this size" to "emit block once this size is reached". This halves the block count compared to the previous 4kB setting while keeping access cost acceptable (~8-12kB decompression per lookup). Update README with value type trade-off documentation.
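The "emit block once this size is reached" semantics can be sketched as follows. The two constants use the values from the commit message above. The function and its input (a list of value lengths rather than real data) are hypothetical simplifications, not the crate's builder code.

```rust
/// Largest value stored as a small value (4 kB, per the commit message).
const MAX_SMALL_VALUE_SIZE: usize = 4 * 1024;
/// A small value block is emitted once it reaches this size (8 kB).
const MIN_SMALL_VALUE_BLOCK_SIZE: usize = 8 * 1024;

/// Hypothetical helper: packs small values, given by their lengths, into
/// shared blocks and returns the uncompressed size of every emitted block.
fn pack_small_values(value_lens: &[usize]) -> Vec<usize> {
    let mut blocks = Vec::new();
    let mut current = 0usize;
    for &len in value_lens {
        assert!(len <= MAX_SMALL_VALUE_SIZE, "not a small value");
        current += len;
        // Emit as soon as the minimum is reached, instead of refusing
        // to grow past a maximum.
        if current >= MIN_SMALL_VALUE_BLOCK_SIZE {
            blocks.push(current);
            current = 0;
        }
    }
    // The trailing block may be smaller than the minimum.
    if current > 0 {
        blocks.push(current);
    }
    blocks
}
```

Because a block is still open at up to 8191 bytes and one more value adds at most 4096 bytes, every block except the last ends up between 8 kB and just under 12 kB in this sketch, which matches the range given in the commit message.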
…e size boundaries Update outdated comments referencing old MAX_SMALL_VALUE_SIZE (1024) to reflect the new value (4096). Expand batch_get_different_sizes test to cover all value types: empty, inline, small, medium, larger, and blob.
What?
Adjusts block sizing constants and heuristics in turbo-persistence to improve the balance between small and medium values, reduce block count, and improve access performance.
Changes
- MAX_SMALL_VALUE_SIZE: 64 KiB → 4 KiB. Values up to 4 KiB are now stored as small values (packed into shared blocks). Values larger than 4 KiB become medium values with dedicated blocks that can be copied without decompression during compaction.
- MAX_SMALL_VALUE_BLOCK_SIZE → MIN_SMALL_VALUE_BLOCK_SIZE: Renamed and changed from a maximum (64 KiB) to a minimum (8 KiB). Small value blocks are now emitted once they accumulate at least 8 KiB, resulting in actual block sizes of 8–12 KiB.
- KEY_BLOCK_ENTRY_META_OVERHEAD: Updated from 8 to 20 to reflect the actual worst-case overhead per entry in a key block (type, position, hash, block index, size, position in block).
- Block count overflow protection: Added ValueBlockCountTracker to prevent exceeding the u16 block index limit (MAX_VALUE_BLOCK_COUNT = u16::MAX / 2), which accounts for the 50/50 merge-and-split during compaction.
- README: Updated value type documentation with size boundaries and added a trade-off table covering compression, compaction, access cost, and storage overhead.
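The small/medium boundary from the list above can be illustrated with a short sketch. It covers only that one boundary: the other value types mentioned in this PR (inline, blob, and so on) are left out because their thresholds are not stated here. The enum and function names are hypothetical.

```rust
/// Values up to this size are small values (4 KiB, per the list above).
const MAX_SMALL_VALUE_SIZE: usize = 4 * 1024;
/// Block index limit given in the list above.
const MAX_VALUE_BLOCK_COUNT: usize = (u16::MAX / 2) as usize;

/// Hypothetical two-way classification; the real crate has more value types.
#[derive(Debug, PartialEq, Eq)]
enum ValueKind {
    /// Packed together with other values into a shared block.
    Small,
    /// Stored in a dedicated block of its own.
    Medium,
}

/// Hypothetical helper: decides between small and medium by length alone.
fn classify(value_len: usize) -> ValueKind {
    if value_len <= MAX_SMALL_VALUE_SIZE {
        ValueKind::Small
    } else {
        ValueKind::Medium
    }
}
```

With these constants, a file made up only of medium values could hold at most 32767 of them before the block count limit is reached.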
Value type trade-offs