[claude] bench: measure FlatBuffers verification cost for Layout and Array#8014
Open
joseph-isaacs wants to merge 2 commits into
Open
[claude] bench: measure FlatBuffers verification cost for Layout and Array#8014joseph-isaacs wants to merge 2 commits into
joseph-isaacs wants to merge 2 commits into
Conversation
Adds a divan benchmark in vortex-layout/benches/flatbuffer_verify.rs
that compares root::<T> (checked), root_with_opts::<T>, and
root_unchecked::<T> for representative Layout and Array shapes.
Results on this machine (medians, release):
Layout (per file open):
1 x 8 (2.0 KB) -> 726 ns checked, 2.6 ns unchecked
16 x 32 (34 KB) -> 33.3 us checked
128 x 32 (295 KB) -> 277 us checked
1024 x 32 (2.5 MB)-> 2.23 ms checked
Array (per SerializedArray decode and per buffer_lengths() call):
8 fields (2.0 KB) -> 636 ns checked
100 fields(6.0 KB) -> 5.8 us checked
1000 fields(44 KB) -> 56 us checked
Key findings:
- root_unchecked is ~3 ns regardless of size; the verifier IS the cost.
- root_with_opts is the same cost as root - the Vortex VerifierOptions
knob is a DoS bound, not a perf knob.
- Verifier walks roughly O(table count + buffer size), ~100 ns/KB.
Motivates dropping the redundant root::<Array> re-verification inside
SerializedArray::buffer_lengths() (vortex-array/src/serde.rs:514), which
runs on an already-validated buffer.
Signed-off-by: Claude <noreply@anthropic.com>
…uffer_lengths `SerializedArray::buffer_lengths()` was calling `root::<fba::Array>()` on every invocation, which re-runs the full FlatBuffer verifier across the array tree. The buffer is already verified once at construction time by `validate_array_tree` (line 527), and the rest of `SerializedArray` already exploits this invariant - see `from_flatbuffer_and_segment_with_overrides` at line 614, which uses `fba::root_as_array_unchecked` with the same safety justification. Switch `buffer_lengths` to `fba::root_as_array_unchecked` for consistency. Measured on the same workload with the new buffer_lengths bench: n_fields fixed legacy(root::<Array>) speedup 1 26 ns 237 ns 9x 8 35 ns 772 ns 22x 32 59 ns 2.56 us 43x 100 140 ns 7.75 us 55x 1000 1.13 us 73.7 us 65x Post-fix cost grows with n_fields purely from the Vec<usize> allocation + iteration to extract buffer descriptor lengths; the verifier overhead is gone. Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| 🆕 | Simulation | buffer_lengths_fixed[1000] |
N/A | 16.2 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[1] |
N/A | 1.7 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[100] |
N/A | 3 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[32] |
N/A | 2.1 µs | N/A |
| 🆕 | Simulation | buffer_lengths_fixed[8] |
N/A | 1.8 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[1000] |
N/A | 336.3 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[100] |
N/A | 39 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[1] |
N/A | 6.4 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[32] |
N/A | 16.6 µs | N/A |
| 🆕 | Simulation | buffer_lengths_legacy_root[8] |
N/A | 8.8 µs | N/A |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] |
273.3 µs | 308 µs | -11.28% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
224.8 µs | 187.7 µs | +19.75% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/flatbuffers-memory-safety-XKbWQ (172e0c7) with develop (7b47788)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds a divan benchmark in vortex-layout/benches/flatbuffer_verify.rs
that compares root:: (checked), root_with_opts::, and
root_unchecked:: for representative Layout and Array shapes.
Results on this machine (medians, release):
Layout (per file open):
1 x 8 (2.0 KB) -> 726 ns checked, 2.6 ns unchecked
16 x 32 (34 KB) -> 33.3 us checked
128 x 32 (295 KB) -> 277 us checked
1024 x 32 (2.5 MB)-> 2.23 ms checked
Array (per SerializedArray decode and per buffer_lengths() call):
8 fields (2.0 KB) -> 636 ns checked
100 fields(6.0 KB) -> 5.8 us checked
1000 fields(44 KB) -> 56 us checked
Key findings:
knob is a DoS bound, not a perf knob.
Motivates dropping the redundant root:: re-verification inside
SerializedArray::buffer_lengths() (vortex-array/src/serde.rs:514), which
runs on an already-validated buffer.
Signed-off-by: Claude noreply@anthropic.com