
Conversation

Member

@sokra sokra commented Feb 4, 2026

What?

Adjusts block sizing constants and heuristics in turbo-persistence to improve the balance between small and medium values, reduce block count, and improve access performance.

Changes

  1. MAX_SMALL_VALUE_SIZE: 64 KiB → 4 KiB. Values up to 4 KiB are now stored as small values (packed into shared blocks). Values larger than 4 KiB become medium values with dedicated blocks that can be copied without decompression during compaction.

  2. MAX_SMALL_VALUE_BLOCK_SIZE → MIN_SMALL_VALUE_BLOCK_SIZE: Renamed and changed from a maximum (64 KiB) to a minimum (8 KiB). Small value blocks are now emitted once they accumulate at least 8 KiB, resulting in actual block sizes of 8–12 KiB.

  3. KEY_BLOCK_ENTRY_META_OVERHEAD: Updated from 8 to 20 to reflect the actual worst-case overhead per entry in a key block (type, position, hash, block index, size, position in block).

  4. Block count overflow protection: Added ValueBlockCountTracker to prevent exceeding the u16 block index limit (MAX_VALUE_BLOCK_COUNT = u16::MAX / 2), which accounts for the 50/50 merge-and-split during compaction (a sketch follows after this list).

  5. README: Updated value type documentation with size boundaries and added a trade-off table covering compression, compaction, access cost, and storage overhead.
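
A minimal sketch of how the new constants and the block-count guard could fit together. The constant names, the `MAX_VALUE_BLOCK_COUNT = u16::MAX / 2` bound, and the `is_full` name come from this PR; the struct fields and method signatures below are illustrative assumptions, not the actual turbo-persistence implementation:

```rust
// Sketch only; the real constants and tracker live in turbo-persistence.
pub const MAX_SMALL_VALUE_SIZE: usize = 4 * 1024; // values <= 4 KiB are "small"
pub const MIN_SMALL_VALUE_BLOCK_SIZE: usize = 8 * 1024; // emit a small value block at >= 8 KiB
pub const KEY_BLOCK_ENTRY_META_OVERHEAD: usize = 20; // worst-case per-entry key block overhead

/// Keep block indices within u16 range, halved so that a 50/50 merge-and-split
/// during compaction cannot overflow the index space.
pub const MAX_VALUE_BLOCK_COUNT: u32 = u16::MAX as u32 / 2;

/// Assumed shape of the tracker; only the idea (count blocks, report fullness)
/// is taken from the PR description.
pub struct ValueBlockCountTracker {
    blocks: u32,
    pending_small_bytes: usize,
}

impl ValueBlockCountTracker {
    pub fn new() -> Self {
        Self { blocks: 0, pending_small_bytes: 0 }
    }

    /// Every medium value occupies one dedicated value block.
    pub fn add_medium_value(&mut self) {
        self.blocks += 1;
    }

    /// Small values are packed into shared blocks; count one block whenever the
    /// pending bytes reach the minimum block size (remainder handling omitted).
    pub fn add_small_value(&mut self, size: usize) {
        self.pending_small_bytes += size;
        if self.pending_small_bytes >= MIN_SMALL_VALUE_BLOCK_SIZE {
            self.blocks += 1;
            self.pending_small_bytes = 0;
        }
    }

    /// Signal that the current SST file should be finished before the
    /// u16 block index limit is exceeded.
    pub fn is_full(&self) -> bool {
        self.blocks >= MAX_VALUE_BLOCK_COUNT
    }
}
```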

Value type trade-offs

|                       | Inline        | Small            | Medium                | Blob                             |
| --------------------- | ------------- | ---------------- | --------------------- | -------------------------------- |
| Size                  | ≤ 8 B         | 9 B .. 4 kB      | 4 kB .. 64 MB         | > 64 MB                          |
| Compression unit size | ≤ 16 kB       | 8 kB .. 12 kB    | 4 kB .. 64 MB         | > 64 MB                          |
| Access cost           | none          | decompress ~8 kB | decompress value size | open file, decompress value size |
| Compaction            | re-compressed | re-compressed    | copied compressed     | pointer copied                   |
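
For illustration, a hedged sketch of how a value's size maps onto these categories, using only the boundaries from the table above; the enum and helper are hypothetical and not part of the crate's API:

```rust
/// Hypothetical classification mirroring the size boundaries in the table.
enum ValueKind {
    Inline, // <= 8 B, stored directly in the key block
    Small,  // 9 B .. 4 kB, packed into shared small value blocks
    Medium, // 4 kB .. 64 MB, one dedicated block per value
    Blob,   // > 64 MB, stored in a separate file
}

fn classify(size: usize) -> ValueKind {
    const KIB: usize = 1024;
    const MIB: usize = 1024 * KIB;
    match size {
        0..=8 => ValueKind::Inline,
        s if s <= 4 * KIB => ValueKind::Small,
        s if s <= 64 * MIB => ValueKind::Medium,
        _ => ValueKind::Blob,
    }
}

fn main() {
    // e.g. a 100 kB value would get its own medium value block
    assert!(matches!(classify(100 * 1024), ValueKind::Medium));
}
```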

@nextjs-bot nextjs-bot added created-by: Turbopack team PRs by the Turbopack team. Turbopack Related to Turbopack with Next.js. labels Feb 4, 2026
Member Author

sokra commented Feb 4, 2026

Collaborator

nextjs-bot commented Feb 4, 2026

Tests Passed


codspeed-hq bot commented Feb 4, 2026

Merging this PR will not alter performance

✅ 17 untouched benchmarks
⏩ 3 skipped benchmarks¹


Comparing sokra/db-compaction-heuristic (5da224d) with canary (c76b0fe)

Open in CodSpeed

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Collaborator

nextjs-bot commented Feb 4, 2026

Stats from current PR

✅ No significant changes detected

📊 All Metrics
📖 Metrics Glossary

Dev Server Metrics:

  • Listen = TCP port starts accepting connections
  • First Request = HTTP server returns successful response
  • Cold = Fresh build (no cache)
  • Warm = With cached build artifacts

Build Metrics:

  • Fresh = Clean build (no .next directory)
  • Cached = With existing .next directory

Change Thresholds:

  • Time: Changes < 50ms AND < 10%, OR < 2% are insignificant
  • Size: Changes < 1KB AND < 1% are insignificant
  • All other changes are flagged to catch regressions

⚡ Dev Server

| Metric               | Canary | PR     | Change | Trend |
| -------------------- | ------ | ------ | ------ | ----- |
| Cold (Listen)        | 559ms  | 559ms  |        | ▅█▅▁█ |
| Cold (Ready in log)  | 544ms  | 534ms  |        | ▇▇▇▆▇ |
| Cold (First Request) | 1.036s | 1.027s |        | ▇██▇▇ |
| Warm (Listen)        | 558ms  | 559ms  |        | ▁█▁▁▁ |
| Warm (Ready in log)  | 542ms  | 543ms  |        | ▃▇▄▁▆ |
| Warm (First Request) | 415ms  | 413ms  |        | ▂▂█▁▂ |

📦 Dev Server (Webpack) (Legacy)

| Metric               | Canary | PR     | Change | Trend |
| -------------------- | ------ | ------ | ------ | ----- |
| Cold (Listen)        | 456ms  | 456ms  |        | ▁▁▁█▁ |
| Cold (Ready in log)  | 435ms  | 435ms  |        | ▃▁▂█▂ |
| Cold (First Request) | 1.834s | 1.816s |        | ▁▁▁█▁ |
| Warm (Listen)        | 455ms  | 456ms  |        | ▁▁▁█▁ |
| Warm (Ready in log)  | 436ms  | 436ms  |        | ▂▁▂█▁ |
| Warm (First Request) | 1.857s | 1.842s |        | ▁▁▁█▁ |

⚡ Production Builds

| Metric       | Canary | PR     | Change | Trend |
| ------------ | ------ | ------ | ------ | ----- |
| Fresh Build  | 4.910s | 4.938s |        | ▄▃▄▂▂ |
| Cached Build | 4.912s | 4.814s |        | ▂▁▂▁▁ |

📦 Production Builds (Webpack) (Legacy)

| Metric            | Canary  | PR      | Change | Trend |
| ----------------- | ------- | ------- | ------ | ----- |
| Fresh Build       | 13.773s | 13.772s |        | ▁▁▁█▁ |
| Cached Build      | 13.868s | 13.894s |        | ▁▁▁█▁ |
| node_modules Size | 467 MB  | 467 MB  |        | ▁▁▁▁▁ |

📦 Bundle Sizes


⚡ Turbopack

Client

Main Bundles: **437 kB** → **437 kB** ⚠️ +11 B

81 files with content-based hashes (individual files not comparable between builds)

Server

Middleware

|                            | Canary | PR    | Change  |
| -------------------------- | ------ | ----- | ------- |
| middleware-b..fest.js gzip | 756 B  | 759 B |         |
| Total                      | 756 B  | 759 B | ⚠️ +3 B |

Build Details
Build Manifests

|                        | Canary | PR    | Change  |
| ---------------------- | ------ | ----- | ------- |
| _buildManifest.js gzip | 451 B  | 452 B |         |
| Total                  | 451 B  | 452 B | ⚠️ +1 B |

📦 Webpack

Client

Main Bundles

|                        | Canary  | PR      | Change    |
| ---------------------- | ------- | ------- | --------- |
| 5528-HASH.js gzip      | 5.47 kB | N/A     | -         |
| 6280-HASH.js gzip      | 57 kB   | N/A     | -         |
| 6335.HASH.js gzip      | 169 B   | N/A     | -         |
| 912-HASH.js gzip       | 4.53 kB | N/A     | -         |
| e8aec2e4-HASH.js gzip  | 62.5 kB | N/A     | -         |
| framework-HASH.js gzip | 59.7 kB | 59.7 kB |           |
| main-app-HASH.js gzip  | 255 B   | 253 B   |           |
| main-HASH.js gzip      | 39.1 kB | 39.1 kB |           |
| webpack-HASH.js gzip   | 1.68 kB | 1.68 kB |           |
| 262-HASH.js gzip       | N/A     | 4.53 kB | -         |
| 2889.HASH.js gzip      | N/A     | 169 B   | -         |
| 5602-HASH.js gzip      | N/A     | 5.49 kB | -         |
| 6948ada0-HASH.js gzip  | N/A     | 62.5 kB | -         |
| 9544-HASH.js gzip      | N/A     | 57.6 kB | -         |
| Total                  | 230 kB  | 231 kB  | ⚠️ +612 B |

Polyfills

|                        | Canary  | PR      | Change |
| ---------------------- | ------- | ------- | ------ |
| polyfills-HASH.js gzip | 39.4 kB | 39.4 kB |        |
| Total                  | 39.4 kB | 39.4 kB |        |

Pages

|                            | Canary  | PR      | Change       |
| -------------------------- | ------- | ------- | ------------ |
| _app-HASH.js gzip          | 194 B   | 194 B   |              |
| _error-HASH.js gzip        | 183 B   | 180 B   | 🟢 3 B (-2%) |
| css-HASH.js gzip           | 331 B   | 330 B   |              |
| dynamic-HASH.js gzip       | 1.81 kB | 1.81 kB |              |
| edge-ssr-HASH.js gzip      | 256 B   | 256 B   |              |
| head-HASH.js gzip          | 351 B   | 352 B   |              |
| hooks-HASH.js gzip         | 384 B   | 383 B   |              |
| image-HASH.js gzip         | 580 B   | 581 B   |              |
| index-HASH.js gzip         | 260 B   | 260 B   |              |
| link-HASH.js gzip          | 2.49 kB | 2.49 kB |              |
| routerDirect..HASH.js gzip | 320 B   | 319 B   |              |
| script-HASH.js gzip        | 386 B   | 386 B   |              |
| withRouter-HASH.js gzip    | 315 B   | 315 B   |              |
| 1afbb74e6ecf..834.css gzip | 106 B   | 106 B   |              |
| Total                      | 7.97 kB | 7.97 kB | ✅ -1 B      |

Server

Edge SSR

|                  | Canary | PR     | Change    |
| ---------------- | ------ | ------ | --------- |
| edge-ssr.js gzip | 126 kB | 126 kB |           |
| page.js gzip     | 249 kB | 249 kB |           |
| Total            | 375 kB | 376 kB | ⚠️ +446 B |

Middleware

|                            | Canary  | PR      | Change    |
| -------------------------- | ------- | ------- | --------- |
| middleware-b..fest.js gzip | 615 B   | 614 B   |           |
| middleware-r..fest.js gzip | 156 B   | 155 B   |           |
| middleware.js gzip         | 33.1 kB | 33.2 kB |           |
| edge-runtime..pack.js gzip | 842 B   | 842 B   |           |
| Total                      | 34.7 kB | 34.8 kB | ⚠️ +142 B |

Build Details
Build Manifests

|                        | Canary | PR    | Change  |
| ---------------------- | ------ | ----- | ------- |
| _buildManifest.js gzip | 733 B  | 735 B |         |
| Total                  | 733 B  | 735 B | ⚠️ +2 B |

Build Cache

|                     | Canary  | PR      | Change            |
| ------------------- | ------- | ------- | ----------------- |
| 0.pack gzip         | 3.84 MB | 3.85 MB | 🔴 +8.26 kB (+0%) |
| index.pack gzip     | 103 kB  | 103 kB  |                   |
| index.pack.old gzip | 104 kB  | 103 kB  | 🟢 1.23 kB (-1%)  |
| Total               | 4.05 MB | 4.05 MB | ⚠️ +7.54 kB       |

🔄 Shared (bundler-independent)

Runtimes

|                            | Canary  | PR      | Change |
| -------------------------- | ------- | ------- | ------ |
| app-page-exp...dev.js gzip | 315 kB  | 315 kB  |        |
| app-page-exp..prod.js gzip | 167 kB  | 167 kB  |        |
| app-page-tur...dev.js gzip | 315 kB  | 315 kB  |        |
| app-page-tur..prod.js gzip | 167 kB  | 167 kB  |        |
| app-page-tur...dev.js gzip | 312 kB  | 312 kB  |        |
| app-page-tur..prod.js gzip | 166 kB  | 166 kB  |        |
| app-page.run...dev.js gzip | 312 kB  | 312 kB  |        |
| app-page.run..prod.js gzip | 166 kB  | 166 kB  |        |
| app-route-ex...dev.js gzip | 70.5 kB | 70.5 kB |        |
| app-route-ex..prod.js gzip | 49 kB   | 49 kB   |        |
| app-route-tu...dev.js gzip | 70.5 kB | 70.5 kB |        |
| app-route-tu..prod.js gzip | 49 kB   | 49 kB   |        |
| app-route-tu...dev.js gzip | 70.1 kB | 70.1 kB |        |
| app-route-tu..prod.js gzip | 48.8 kB | 48.8 kB |        |
| app-route.ru...dev.js gzip | 70.1 kB | 70.1 kB |        |
| app-route.ru..prod.js gzip | 48.7 kB | 48.7 kB |        |
| dist_client_...dev.js gzip | 324 B   | 324 B   |        |
| dist_client_...dev.js gzip | 326 B   | 326 B   |        |
| dist_client_...dev.js gzip | 318 B   | 318 B   |        |
| dist_client_...dev.js gzip | 317 B   | 317 B   |        |
| pages-api-tu...dev.js gzip | 43.2 kB | 43.2 kB |        |
| pages-api-tu..prod.js gzip | 32.9 kB | 32.9 kB |        |
| pages-api.ru...dev.js gzip | 43.2 kB | 43.2 kB |        |
| pages-api.ru..prod.js gzip | 32.8 kB | 32.8 kB |        |
| pages-turbo....dev.js gzip | 52.5 kB | 52.5 kB |        |
| pages-turbo...prod.js gzip | 39.4 kB | 39.4 kB |        |
| pages.runtim...dev.js gzip | 52.5 kB | 52.5 kB |        |
| pages.runtim..prod.js gzip | 39.4 kB | 39.4 kB |        |
| server.runti..prod.js gzip | 62.7 kB | 62.7 kB |        |
| Total                      | 2.8 MB  | 2.8 MB  |        |

@sokra sokra force-pushed the sokra/db-compaction-heuristic branch 2 times, most recently from 9fc1be9 to 7f428b3 Compare February 5, 2026 18:01
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch 2 times, most recently from 265b4f7 to e44a55f Compare February 5, 2026 22:27
@sokra sokra changed the base branch from sokra/db-bench to graphite-base/89497 February 5, 2026 23:44
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from e44a55f to cf0ebe2 Compare February 5, 2026 23:44
@sokra sokra changed the base branch from graphite-base/89497 to sokra/remove-amqf-cache February 5, 2026 23:44
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from cf0ebe2 to dbc4dee Compare February 6, 2026 00:01
@sokra sokra force-pushed the sokra/remove-amqf-cache branch 2 times, most recently from bf8ee6c to 3a076c1 Compare February 6, 2026 09:50
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from dbc4dee to 008301a Compare February 6, 2026 09:50
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch 2 times, most recently from 22404fd to d6784c9 Compare February 6, 2026 10:41
@sokra sokra force-pushed the sokra/remove-amqf-cache branch 2 times, most recently from c69fc0d to 7703abd Compare February 6, 2026 15:34
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from d6784c9 to 5821f7d Compare February 6, 2026 15:34
Contributor

I don't understand the PR description. This adjusts the mix of medium and small values, not the size of SST files? Or, if it does, is that an indirect effect of changing the block overheads?

@sokra sokra force-pushed the sokra/remove-amqf-cache branch 2 times, most recently from 6004949 to 7d03ae9 Compare February 8, 2026 08:10
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from 5821f7d to 4c141f6 Compare February 8, 2026 08:10
@sokra sokra marked this pull request as ready for review February 8, 2026 08:19
@sokra sokra changed the base branch from sokra/remove-amqf-cache to graphite-base/89497 February 8, 2026 08:44
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from 4c141f6 to 13e142c Compare February 8, 2026 08:45
@sokra sokra force-pushed the graphite-base/89497 branch from 7d03ae9 to fc13ca9 Compare February 8, 2026 08:45
@graphite-app graphite-app bot changed the base branch from graphite-base/89497 to canary February 8, 2026 08:46
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from 13e142c to 26dd63f Compare February 8, 2026 08:46
Contributor

@lukesandberg lukesandberg left a comment


This will greatly increase the number of small value blocks (and medium blocks, and blocks overall).

Each block has a 4 byte overhead from the SST file level 'directory', plus of course the 4 byte decompressed-size header. I'm guessing this will greatly increase the number of blocks.

I see the benefit of medium values during compaction, but couldn't we leverage the same optimization for small values (remapping block indices when merging)?

Also, the statement in the PR description ("Small values benefit from better compression by being merged together in blocks, avoiding the need for a compression dictionary.") is only partially true: we only use compression dictionaries for key blocks, not value blocks.

@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from 26dd63f to 324e005 Compare February 9, 2026 10:42
…ck count

Track value block count during collection and compaction to prevent
exceeding the u16 block index limit in SST files. Adds a
ValueBlockCountTracker that monitors medium values (1 block each) and
small value block packing, triggering is_full when approaching the limit.
Member Author

sokra commented Feb 10, 2026

Small values vs medium values is basically a trade-off.

| What?            | Inlined                      | Small value                                        | Medium value                                | Blob value                                |
| ---------------- | ---------------------------- | -------------------------------------------------- | ------------------------------------------- | ----------------------------------------- |
| Compression size | <= 16 kB (MAX_KEY_BLOCK_SIZE) | <= 4 kB, before 64 kB (MAX_SMALL_VALUE_BLOCK_SIZE) | > 1 kB, before 64 kB (MAX_SMALL_VALUE_SIZE) | > 64 MB (MAX_MEDIUM_VALUE_SIZE)           |
| Compaction       | re-compressed                | re-compressed                                      | copied compressed                           | pointer copied                            |
| Access cost      | no extra overhead            | uncompress 4 kB (before 64 kB)                     | uncompress value size                       | open separate file, uncompress value size |
| Storage overhead | 0                            | 8 + value size / 4 kB * 8                          | 2 + 8                                       | 4 + 4                                     |

Access cost

The idea of my change was to make accessing small values cheaper: before, every access to a small value needed to decompress 64 KiB before it could read the value. Now it only needs to decompress 4 KiB. Since decompression dominated the access time, this is a big improvement.

Storage overhead

This change also results in more blocks. The storage overhead of every small value increased from 8 + value size / 64 kB * 8 to 8 + value size / 4 kB * 8 (8 bytes in the key block plus 8 bytes in the block table, where the block table entry is amortized over 4 KiB of small values). The worst case (a maximum-size small value of 1024 bytes) would be 8 + 2 = 10 bytes of overhead, which is about 1% of the value. Smaller values have a bigger percentage of overhead, but that was true before:

| size (bytes) | old overhead (bytes) | new overhead (bytes) |
| -----------: | -------------------- | -------------------- |
| 1            | 0 (inlined)          | 0 (inlined)          |
| 2            | 0 (inlined)          | 0 (inlined)          |
| 4            | 0 (inlined)          | 0 (inlined)          |
| 8            | 0 (inlined)          | 0 (inlined)          |
| 16           | 8.0020 (50.01%)      | 8.0313 (50.20%)      |
| 32           | 8.0039 (25.01%)      | 8.0625 (25.20%)      |
| 64           | 8.0078 (12.51%)      | 8.1250 (12.70%)      |
| 128          | 8.0156 (6.26%)       | 8.2500 (6.45%)       |
| 256          | 8.0313 (3.14%)       | 8.5000 (3.32%)       |
| 512          | 8.0625 (1.57%)       | 9.0000 (1.76%)       |
| 1024         | 8.1250 (0.79%)       | 10.0000 (0.98%)      |
| 2048         | 8.2500 (0.40%)       | 10.0000 (0.49%)      |
| 4096         | 8.5000 (0.21%)       | 10.0000 (0.24%)      |
| 8192         | 9.0000 (0.11%)       | 10.0000 (0.12%)      |
| 16384        | 10.0000 (0.06%)      | 10.0000 (0.06%)      |
| 32768        | 12.0000 (0.04%)      | 10.0000 (0.03%)      |
| 65536        | 16.0000 (0.02%)      | 10.0000 (0.02%)      |
| 131072       | 10.0000 (0.01%)      | 10.0000 (0.01%)      |

So the storage cost difference is very small according to this formula. It's a bit inefficient to use small values that are larger than 1/4 of the small value block size.
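
As a sanity check of that formula, here is a small illustrative helper that reproduces a few of the small-value rows above (sizes above MAX_SMALL_VALUE_SIZE become medium values with a flat 2 + 8 = 10 byte overhead instead, so they are not covered by this function; nothing here is real crate API):

```rust
/// Estimated storage overhead in bytes for a small value of `size` bytes:
/// 8 bytes of key block metadata plus an 8-byte block table entry amortized
/// over `block_size` bytes of packed small values.
fn small_value_overhead(size: f64, block_size: f64) -> f64 {
    8.0 + size / block_size * 8.0
}

fn main() {
    for size in [16.0, 512.0, 1024.0] {
        let old = small_value_overhead(size, 64.0 * 1024.0); // old 64 kB small value blocks
        let new = small_value_overhead(size, 4.0 * 1024.0); // 4 kB small value blocks
        println!(
            "{:>6}: old {:>7.4} ({:.2}%)   new {:>7.4} ({:.2}%)",
            size,
            old,
            old / size * 100.0,
            new,
            new / size * 100.0
        );
    }
}
```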

But we have to take into account that the 8 bytes in the key block are compressed (though Luke figured out that key blocks are usually not compressed anyway), while the 8 bytes in the block table are not compressed. Maybe we should compress the block table and decompress it into memory when we open the SST file?

A problem we are running into with the increased block count is the u16::MAX block index limit. I added some code to prevent exceeding it, but that code makes SST files smaller when hitting the limit, which is probably not what we want here...

Compression Size

By decreasing the block size (for both small and medium values), we reduce our compression ratio: compression is more efficient when more data is compressed at once. For smaller blocks it's recommended to use a compression dictionary. Once the compressed unit is larger than 4 kB, a compression dictionary isn't really needed anymore, as there is enough data in the block itself to compress well.

My change caused some (medium value) blocks to be between 1kB and 4kB, which isn't optimal.

I think we could increase MAX_SMALL_VALUE_SIZE to 4kB to address that.
MAX_SMALL_VALUE_BLOCK_SIZE must be at least 2x that to avoid very small small value blocks due to fragmentation, but I think we could rename it to MIN_SMALL_VALUE_BLOCK_SIZE and only emit a small value block once that size is reached.
That would make every small value block at least that size. In rare cases the block size could reach up to 2 * MIN_SMALL_VALUE_BLOCK_SIZE.
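
A hedged sketch of that semantic change; the struct and method names are hypothetical, and only the two flush conditions illustrate the old "don't exceed this size" vs. the proposed "emit once this size is reached" behavior:

```rust
// Hypothetical writer state; the real code in turbo-persistence will differ.
struct SmallValueBlockWriter {
    buffered_bytes: usize,
    max_block_size: usize, // old tunable: upper bound on a block
    min_block_size: usize, // proposed tunable: lower bound before emitting
}

impl SmallValueBlockWriter {
    /// Old semantics: flush before adding a value that would push the block
    /// over the maximum, so emitted blocks can end up far smaller than the limit.
    fn should_flush_before(&self, next_value_len: usize) -> bool {
        self.buffered_bytes + next_value_len > self.max_block_size
    }

    /// Proposed semantics: flush only once the buffer has accumulated at least
    /// the minimum, so every emitted block is at least min_block_size and at
    /// most min_block_size plus one maximum-size small value.
    fn should_flush_after(&self) -> bool {
        self.buffered_bytes >= self.min_block_size
    }
}

fn main() {
    let w = SmallValueBlockWriter {
        buffered_bytes: 9 * 1024,
        max_block_size: 4 * 1024,
        min_block_size: 8 * 1024,
    };
    assert!(w.should_flush_before(1));
    assert!(w.should_flush_after());
}
```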

Summary

Given that we are hitting the block count limit and that larger blocks compress better, I think it makes sense to increase MIN_SMALL_VALUE_BLOCK_SIZE to 8kB. That cuts the block count in half but doubles the decompression cost per access, which sounds acceptable.

This would be the updated table:

| size (bytes) | old overhead (bytes) | new overhead (bytes) |
| -----------: | -------------------- | -------------------- |
| 1            | 0 (inlined)          | 0 (inlined)          |
| 2            | 0 (inlined)          | 0 (inlined)          |
| 4            | 0 (inlined)          | 0 (inlined)          |
| 8            | 0 (inlined)          | 0 (inlined)          |
| 16           | 8.0020 (50.01%)      | 8.0156 (50.10%)      |
| 32           | 8.0039 (25.01%)      | 8.0313 (25.10%)      |
| 64           | 8.0078 (12.51%)      | 8.0625 (12.60%)      |
| 128          | 8.0156 (6.26%)       | 8.1250 (6.35%)       |
| 256          | 8.0313 (3.14%)       | 8.2500 (3.22%)       |
| 512          | 8.0625 (1.57%)       | 8.5000 (1.66%)       |
| 1024         | 8.1250 (0.79%)       | 9.0000 (0.88%)       |
| 2048         | 8.2500 (0.40%)       | 10.0000 (0.49%)      |
| 4096         | 8.5000 (0.21%)       | 10.0000 (0.24%)      |
| 8192         | 9.0000 (0.11%)       | 10.0000 (0.12%)      |
| 16384        | 10.0000 (0.06%)      | 10.0000 (0.06%)      |
| 32768        | 12.0000 (0.04%)      | 10.0000 (0.03%)      |
| 65536        | 16.0000 (0.02%)      | 10.0000 (0.02%)      |

…IN_SMALL_VALUE_BLOCK_SIZE of 8kB

Increase MAX_SMALL_VALUE_SIZE from 1kB to 4kB so more values are packed
into shared blocks instead of getting dedicated medium value blocks. This
significantly reduces block count, avoiding the u16::MAX block limit.

Rename MAX_SMALL_VALUE_BLOCK_SIZE to MIN_SMALL_VALUE_BLOCK_SIZE (8kB)
and change semantics from "don't exceed this size" to "emit block once
this size is reached". This halves the block count compared to the
previous 4kB setting while keeping access cost acceptable (~8-12kB
decompression per lookup).

Update README with value type trade-off documentation.
…e size boundaries

Update outdated comments referencing old MAX_SMALL_VALUE_SIZE (1024)
to reflect the new value (4096). Expand batch_get_different_sizes test
to cover all value types: empty, inline, small, medium, larger, and blob.
@sokra sokra force-pushed the sokra/db-compaction-heuristic branch from 324e005 to 5da224d Compare February 10, 2026 10:08
@sokra sokra requested review from bgw and lukesandberg February 10, 2026 10:16
@sokra sokra merged commit 384cb2d into canary Feb 11, 2026
165 checks passed
@sokra sokra deleted the sokra/db-compaction-heuristic branch February 11, 2026 07:25