feat(diff): A/B-labeled change format + root-input comparison #104
Merged
Two changes to `roar diff` output:

1. Param, code, and env diffs now render `A: x B: y` instead of `x -> y`. The arrow left the A/B mapping implicit (you had to infer it from the header) and implied a before->after direction that doesn't apply: A and B are siblings, not a timeline.
2. `roar diff` now compares root input artifacts, the raw data each lineage started from, matching `roar inputs` semantics (a producer with no inputs, e.g. a download, is itself a source). When they match, UNCHANGED gains an `inputs (same artifacts)` line; when they differ, an `INPUTS` section lists the divergence and supersedes the per-step DATA category so the same fact isn't reported twice.

Adds `collect_root_inputs` plus `root_inputs_match` / `root_inputs_a` / `root_inputs_b` on `GraphDiffComputation` and `DiffResult`; JSON output includes the new fields. Tests cover the format, the root-input helper (including the inputless-producer-is-root case), the comparison, and renderer behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
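For readers who want the root-input rule spelled out, here is a minimal sketch of how it could work. It is not the code from this PR: the `Step` class, the `collect_root_inputs_sketch` and `compare_root_inputs` names, and the artifact ids are all hypothetical, and treating the outputs of an inputless step as the root artifacts is an assumption about what "is itself a source" means. Only the three result field names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for one lineage step; not roar's real model."""
    name: str
    inputs: frozenset = frozenset()
    outputs: frozenset = frozenset()


def collect_root_inputs_sketch(steps):
    """Return the artifact ids a lineage started from (illustrative only)."""
    produced = set()
    for step in steps:
        produced |= step.outputs

    roots = set()
    for step in steps:
        if not step.inputs:
            # Assumption: an inputless producer (e.g. a download) is a source,
            # so its outputs count as root inputs.
            roots |= step.outputs
        else:
            # Consumed here but never produced inside this lineage: raw data.
            roots |= step.inputs - produced
    return roots


def compare_root_inputs(steps_a, steps_b):
    """Build the three fields named in the description from two lineages."""
    roots_a = collect_root_inputs_sketch(steps_a)
    roots_b = collect_root_inputs_sketch(steps_b)
    return {
        "root_inputs_match": roots_a == roots_b,
        "root_inputs_a": sorted(roots_a),
        "root_inputs_b": sorted(roots_b),
    }


# Two sibling lineages that differ only in the downloaded dataset.
lineage_a = [
    Step("download", outputs=frozenset({"mnist-v1"})),
    Step("train", inputs=frozenset({"mnist-v1"}), outputs=frozenset({"model-a"})),
]
lineage_b = [
    Step("download", outputs=frozenset({"mnist-v2"})),
    Step("train", inputs=frozenset({"mnist-v2"}), outputs=frozenset({"model-b"})),
]
result = compare_root_inputs(lineage_a, lineage_b)
```

In this sketch the intermediate artifact consumed by `train` is not counted again, because it was produced inside the lineage. That is the property that lets a single `INPUTS` section replace repeated per-step data differences.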
TrevorBasinger
approved these changes
May 14, 2026
Summary
Two changes to `roar diff` output, both verified against a real MNIST pipeline:

1. Param, code, and env diffs now render `A: x B: y` instead of `x -> y`. The arrow left the A/B mapping implicit (inferred from the header) and implied a before→after direction that doesn't apply: A and B are siblings being compared, not a timeline.
2. `roar diff` now compares the raw artifacts each lineage started from, matching `roar inputs` semantics (a producer with no inputs, such as a download / `roar get`, is itself a source). When root inputs match, `UNCHANGED` gains an `inputs (same artifacts)` line. When they differ, an `INPUTS` section lists the divergence and supersedes the per-step `DATA` category so the same fact isn't reported in both places.

New `collect_root_inputs` helper + `root_inputs_match` / `root_inputs_a` / `root_inputs_b` on `GraphDiffComputation` and `DiffResult`; JSON output includes the new fields.

Test plan
- `tests/unit/test_diff_engine.py` + `test_diff_renderer.py`: 14 new tests (A/B format, `collect_root_inputs` incl. the inputless-producer-is-root case, root-input comparison, renderer `INPUTS` / `UNCHANGED` / `DATA`-suppression, JSON fields).
- Verified against a real MNIST `extract → train` pipeline.

Out of scope (pre-existing, observed during testing)
- A differing `--output` filename shows up as a `PARAMS` diff and can score as `ROOT CAUSE` over a real code change (`PARAM_CHANGED` 0.9 > `CODE_CHANGED` 0.85).
- A `wget` step whose URL differs past the truncation point renders "command differs" with identical-looking A/B.

Both are scoring/truncation papercuts, separate from this change.
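The two out-of-scope papercuts can be illustrated with a short sketch. It assumes the root cause is simply the highest-scoring finding and that long commands are cut to a fixed display width. Neither rule is confirmed by this PR. Only the two scores (0.9 and 0.85) come from the text above. The function names, the 40-character width, and the example URLs are hypothetical.

```python
# Scores quoted in the description above; the selection rule is an assumption.
CATEGORY_SCORES = {"PARAM_CHANGED": 0.9, "CODE_CHANGED": 0.85}


def pick_root_cause(findings):
    """Pick the finding whose category has the highest score (assumed rule)."""
    return max(findings, key=lambda finding: CATEGORY_SCORES[finding[0]])


def truncate(text, width=40):
    """Hypothetical display truncation; the real width is not stated."""
    if len(text) <= width:
        return text
    return text[: width - 3] + "..."


# Papercut 1: an incidental filename difference outranks a real code change.
findings = [
    ("CODE_CHANGED", "train.py differs"),
    ("PARAM_CHANGED", "--output filename differs"),
]
root_cause = pick_root_cause(findings)

# Papercut 2: URLs that differ only past the cut look identical once shortened.
url_a = "https://example.com/datasets/mnist/train-images-v1.gz"
url_b = "https://example.com/datasets/mnist/train-images-v2.gz"
shown_a = truncate(url_a)
shown_b = truncate(url_b)
```

Under these assumptions `root_cause` is the `PARAM_CHANGED` finding, and `shown_a` equals `shown_b` even though the full URLs differ. That matches the "command differs" line with identical-looking A/B values.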
🤖 Generated with Claude Code