optimization: Batch DB and git operations in post-run pipeline #38
Merged
TrevorBasinger merged 7 commits into main from …, Mar 17, 2026
Reduces per-file post-processing overhead from ~8.5ms to ~2.8ms by:
- Batch git ls-files: a single subprocess call instead of one per file in classify_all (files.py)
- Batch artifact registration: bulk hash lookup and bulk insert instead of per-file ORM queries (artifact.py, job_recording.py)
- Batch job edge creation: add_inputs_batch/add_outputs_batch with a single flush instead of a per-file insert+flush (job.py)
- Batch hash retrieval: get_hashes_batch eliminates N+1 queries in get_inputs/get_outputs (artifact.py, job.py)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
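The batched git ls-files step in the first commit can be sketched as follows. This is a minimal illustration, not the code in files.py: the helper names (parse_ls_files, tracked_files, classify_paths) and the two labels are hypothetical, and it assumes every path is already relative to the repository root, which is how git ls-files reports paths when run from the root.

```python
import subprocess
from pathlib import Path


def parse_ls_files(output: bytes) -> set[str]:
    """Split NUL-delimited `git ls-files -z` output into repo-relative paths."""
    text = output.decode("utf-8", errors="surrogateescape")
    return {p for p in text.split("\0") if p}


def tracked_files(repo_root: Path) -> set[str]:
    """Run git once for the whole repository instead of once per file."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],  # -z: NUL-separated, no path quoting
        cwd=repo_root,
        capture_output=True,
        check=True,
    )
    return parse_ls_files(result.stdout)


def classify_paths(paths: list[str], tracked: set[str]) -> dict[str, str]:
    """Hypothetical classifier: O(1) set lookups against the precomputed set."""
    return {p: ("tracked" if p in tracked else "untracked") for p in paths}
```

Call tracked_files once per run and pass the resulting set to classify_paths for every traced file; each lookup is then a set membership test rather than a subprocess spawn.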
Replace per-file has_input_path/has_output_path calls with single IN-clause queries, fix redundant setdefault in get_hashes_batch, and document register_batch's reduced signature vs register(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
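Replacing per-file has_input_path calls with IN-clause queries, and inserting edges with a single flush, might look like the sketch below. It uses the standard-library sqlite3 module and a hypothetical job_inputs(job_id, path) table as a stand-in for the project's ORM models; the function names are illustrative, not the PR's API. Paths are chunked because SQLite caps the number of bound parameters per statement (999 by default before SQLite 3.32.0).

```python
import sqlite3
from typing import Iterator

# One parameter is used for job_id, so 900 paths keeps us under 999.
MAX_PARAMS = 900


def _chunks(items: list[str], size: int) -> Iterator[list[str]]:
    for i in range(0, len(items), size):
        yield items[i:i + size]


def existing_input_paths(conn: sqlite3.Connection, job_id: int, paths: list[str]) -> set[str]:
    """One IN-clause query per chunk instead of one has_input_path() call per file."""
    found: set[str] = set()
    for chunk in _chunks(paths, MAX_PARAMS):
        placeholders = ",".join("?" * len(chunk))  # "?,?,?" for three paths
        rows = conn.execute(
            f"SELECT path FROM job_inputs WHERE job_id = ? AND path IN ({placeholders})",
            [job_id, *chunk],
        )
        found.update(row[0] for row in rows)
    return found


def insert_input_edges(conn: sqlite3.Connection, job_id: int, paths: list[str]) -> int:
    """Insert only edges that do not exist yet, commit once, return rows added."""
    unique = list(dict.fromkeys(paths))  # de-duplicate while keeping order
    existing = existing_input_paths(conn, job_id, unique)
    new_rows = [(job_id, p) for p in unique if p not in existing]
    conn.executemany("INSERT INTO job_inputs (job_id, path) VALUES (?, ?)", new_rows)
    conn.commit()  # a single commit plays the role of the single flush
    return len(new_rows)
```

The query count drops from one per file to one per chunk of 900 paths, and the write path issues one executemany followed by one commit.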
When ROAR_TIMING=1 is set, prints a JSON timing summary to stderr with tracer, post-run, provenance, and record phase durations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
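A minimal sketch of an opt-in timing summary in the spirit of the ROAR_TIMING commit. The PhaseTimer class and the phase names are hypothetical; only the environment variable, the JSON-to-stderr output, and the idea of per-phase durations come from the commit message.

```python
import json
import os
import sys
import time
from contextlib import contextmanager
from typing import Iterator, Mapping, Optional, TextIO


class PhaseTimer:
    """Accumulate wall-clock durations in milliseconds per named phase."""

    def __init__(self) -> None:
        self.durations_ms: dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            # Repeated entries for the same phase are summed.
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def report(self, stream: Optional[TextIO] = None,
               env: Optional[Mapping[str, str]] = None) -> bool:
        """Print one JSON line when ROAR_TIMING=1; return True if it printed."""
        env = os.environ if env is None else env
        if env.get("ROAR_TIMING") != "1":
            return False
        summary = {name: round(ms, 3) for name, ms in self.durations_ms.items()}
        print(json.dumps(summary), file=stream if stream is not None else sys.stderr)
        return True
```

Wrap each stage in a with timer.phase("post_run") block and call timer.report() once at exit; with ROAR_TIMING unset, nothing is printed.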
TrevorBasinger approved these changes on Mar 17, 2026
Summary
Reduces per-file post-run overhead from ~10 ms/file to ~2-3 ms/file by eliminating N+1 patterns in artifact registration and file classification.
Reduces backend time from ~8 ms/file to ~2.5 ms/file.
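The artifact-registration change (bulk hash lookup, then bulk insert) follows a common pattern, sketched here under assumptions: a hypothetical artifacts(id, hash) table with a UNIQUE hash column, plain sqlite3 instead of the project's ORM, and an illustrative register_hashes_batch function rather than the actual register_batch signature. Newly inserted rows are looked up again to obtain their ids, because sqlite3 leaves lastrowid unchanged after executemany.

```python
import sqlite3


def register_hashes_batch(conn: sqlite3.Connection, hashes: list[str],
                          chunk_size: int = 900) -> dict[str, int]:
    """Map each content hash to an artifact id without per-file queries."""
    unique = list(dict.fromkeys(hashes))  # de-duplicate, keep order
    ids: dict[str, int] = {}

    def lookup(batch: list[str]) -> None:
        # One IN-clause SELECT per chunk (stays under SQLite's parameter limit).
        placeholders = ",".join("?" * len(batch))
        for artifact_id, digest in conn.execute(
            f"SELECT id, hash FROM artifacts WHERE hash IN ({placeholders})", batch
        ):
            ids[digest] = artifact_id

    # 1) Bulk lookup of hashes that are already registered.
    for i in range(0, len(unique), chunk_size):
        lookup(unique[i:i + chunk_size])

    # 2) Bulk insert of the rest, then fetch their new ids the same way.
    missing = [h for h in unique if h not in ids]
    conn.executemany("INSERT INTO artifacts (hash) VALUES (?)", [(h,) for h in missing])
    for i in range(0, len(missing), chunk_size):
        lookup(missing[i:i + chunk_size])

    conn.commit()
    return ids
```

Compared with a per-file loop that does one lookup and one insert per file, this issues at most two SELECTs per 900 hashes plus a single executemany.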
Test plan
- … (test_job_recording, test_job_recording_dedup, test_file_filter, test_file_classifier_perf, test_byte_range_registration)
- … bm_trace_1) confirm per-file costs with tracer/post-run breakdown
- roar show @N displays correct inputs/outputs (exercises get_hashes_batch path)

🤖 Generated with https://claude.com/claude-code