
feat(diff): A/B-labeled change format + root-input comparison #104

Merged
TrevorBasinger merged 1 commit into main from cg/diff-ab-format-root-inputs on May 14, 2026

@christophergeyer
Member

Summary

Two changes to `roar diff` output, both verified against a real MNIST pipeline:

  • A/B-labeled change format. Param, code, and env diffs render `A: x   B: y` instead of `x -> y`. The arrow left the A/B mapping implicit (you had to infer it from the header) and implied a before→after direction that doesn't apply: A and B are siblings being compared, not a timeline.
  • Root-input comparison. `roar diff` now compares the raw artifacts each lineage started from, matching `roar inputs` semantics (a producer with no inputs, such as a download or `roar get`, is itself a source). When root inputs match, UNCHANGED gains an `inputs (same artifacts)` line. When they differ, an INPUTS section lists the divergence and supersedes the per-step DATA category so the same fact isn't reported in both places.
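To make the A/B format concrete, here is a minimal sketch of rendering a param diff with explicit labels. The helper name `render_param_diff`, the two-space gap after the key, and the `<unset>` placeholder for a param present on only one side are all hypothetical; only the `A: x   B: y` shape comes from this PR.

```python
def render_param_diff(params_a: dict[str, object], params_b: dict[str, object]) -> list[str]:
    """Render changed params with explicit A/B labels (hypothetical helper)."""
    lines: list[str] = []
    # Walk the union of keys so a param set on only one side still shows up.
    for key in sorted(params_a.keys() | params_b.keys()):
        a = params_a.get(key, "<unset>")  # placeholder is an assumption
        b = params_b.get(key, "<unset>")
        if a != b:
            # The old style would have been f"{key}: {a} -> {b}".
            lines.append(f"{key}  A: {a}   B: {b}")
    return lines


print(render_param_diff({"lr": 0.01, "epochs": 5}, {"lr": 0.001, "epochs": 5}))
# ['lr  A: 0.01   B: 0.001']
```

Labeling each side by lineage, rather than ordering them with an arrow, keeps the output symmetric: swapping A and B swaps the labels without implying that one run "came after" the other.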

New `collect_root_inputs` helper, plus `root_inputs_match` / `root_inputs_a` / `root_inputs_b` on `GraphDiffComputation` and `DiffResult`; JSON output includes the new fields.
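The root-input rule can be sketched as an upstream walk over the lineage graph. This is not roar's `collect_root_inputs`: the function name `find_root_inputs` and the graph representation (`producers` mapping artifact → producing step, `step_inputs` mapping step → artifacts it read) are assumptions made for illustration. It encodes the two root cases described above: an artifact with no recorded producer, and the output of a producer that read nothing (for example a download).

```python
from collections.abc import Mapping, Sequence


def find_root_inputs(
    artifact: str,
    producers: Mapping[str, str],
    step_inputs: Mapping[str, Sequence[str]],
) -> frozenset[str]:
    """Return the raw artifacts that `artifact`'s lineage started from.

    Hypothetical data model: `producers` maps artifact id -> id of the step
    that wrote it; `step_inputs` maps step id -> artifact ids that step read.
    """
    roots: set[str] = set()
    seen: set[str] = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        step = producers.get(current)
        if step is None:
            # No recorded producer: a raw input file is a root.
            roots.add(current)
        elif not step_inputs.get(step):
            # The producer read nothing (e.g. a download): its output is itself a source.
            roots.add(current)
        else:
            # Otherwise keep walking upstream through the producer's inputs.
            stack.extend(step_inputs[step])
    return frozenset(roots)


producers = {"mnist.npz": "get", "features.npy": "extract", "model.pt": "train"}
step_inputs = {"get": [], "extract": ["mnist.npz"], "train": ["features.npy", "config.yaml"]}

roots_a = find_root_inputs("model.pt", producers, step_inputs)
print(sorted(roots_a))  # ['config.yaml', 'mnist.npz']

# Lineage B reads a different config; the lineages match on inputs only if the root sets are equal.
roots_b = find_root_inputs("model.pt", producers, {**step_inputs, "train": ["features.npy", "config_v2.yaml"]})
print(roots_a == roots_b)  # False
```

Returning a `frozenset` makes the match check a plain equality and makes the result independent of traversal order.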

Test plan

  • `tests/unit/test_diff_engine.py` and `test_diff_renderer.py`: 14 new tests covering the A/B format, `collect_root_inputs` (including the inputless-producer-is-root case), root-input comparison, renderer INPUTS/UNCHANGED/DATA suppression, and the JSON fields.
  • Full unit suite: 882 passed, 1 skipped.
  • Manual: exercised all five diff dimensions (params, code, inputs-differ, pipeline-structure, identical) on a real MNIST extract → train pipeline.

Out of scope (pre-existing, observed during testing)

  • The `--output` filename shows up as a PARAMS diff and can be scored as the ROOT CAUSE ahead of a real code change (`PARAM_CHANGED` 0.9 > `CODE_CHANGED` 0.85).
  • A `wget` step whose URL differs only past the truncation point renders "command differs" with identical-looking A and B values.

Both are scoring/truncation papercuts, separate from this change.
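A short sketch of why the first papercut happens, assuming root-cause selection simply takes the highest-scoring change. The two weights are quoted from the note above; the `pick_root_cause` function and the `(kind, detail)` tuple shape are hypothetical.

```python
# Weights quoted in the PR; the max-score selection rule is an assumption.
CHANGE_SCORES = {"PARAM_CHANGED": 0.9, "CODE_CHANGED": 0.85}


def pick_root_cause(changes: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the (kind, detail) change with the highest score (hypothetical rule)."""
    return max(changes, key=lambda change: CHANGE_SCORES[change[0]])


changes = [("CODE_CHANGED", "train.py"), ("PARAM_CHANGED", "--output")]
print(pick_root_cause(changes))  # ('PARAM_CHANGED', '--output')
```

Under that assumption, any param diff, even a cosmetic output path, outranks a code change, so the fix belongs in scoring rather than in the diff computation this PR touches.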

🤖 Generated with Claude Code

Two changes to `roar diff` output:

1. Param, code, and env diffs now render `A: x   B: y` instead of
   `x -> y`. The arrow left the A/B mapping implicit (you had to infer
   it from the header) and implied a before->after direction that
   doesn't apply — A and B are siblings, not a timeline.

2. `roar diff` now compares root input artifacts — the raw data each
   lineage started from, matching `roar inputs` semantics (a producer
   with no inputs, e.g. a download, is itself a source). When they
   match, UNCHANGED gains an `inputs (same artifacts)` line; when they
   differ, an `INPUTS` section lists the divergence and supersedes the
   per-step DATA category so the same fact isn't reported twice.
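The section-selection rule in item 2 can be sketched as follows. The function `build_sections`, its arguments, and the `A only:` / `B only:` line labels are hypothetical, and rendering DATA normally when root inputs match is an assumption (the change only states that INPUTS supersedes DATA when they differ).

```python
def build_sections(
    roots_a: frozenset[str],
    roots_b: frozenset[str],
    data_lines: list[str],
    unchanged: list[str],
) -> dict[str, list[str]]:
    """Decide which report sections to emit (hypothetical layout)."""
    sections: dict[str, list[str]] = {}
    unchanged = list(unchanged)  # avoid mutating the caller's list
    if roots_a == roots_b:
        # Same starting data: say so explicitly; DATA is assumed to render as before.
        unchanged.append("inputs (same artifacts)")
        if data_lines:
            sections["DATA"] = data_lines
    else:
        # Different starting data: INPUTS reports it and DATA is suppressed,
        # so the same divergence is not listed twice.
        sections["INPUTS"] = [f"A only: {x}" for x in sorted(roots_a - roots_b)] + [
            f"B only: {x}" for x in sorted(roots_b - roots_a)
        ]
    if unchanged:
        sections["UNCHANGED"] = unchanged
    return sections
```

Suppressing DATA in the differing case follows from the lineage structure: if the roots differ, every downstream artifact is expected to differ too, so per-step DATA entries would only restate the INPUTS divergence.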

Adds `collect_root_inputs` plus `root_inputs_match` / `root_inputs_a` /
`root_inputs_b` on `GraphDiffComputation` and `DiffResult`; JSON output
includes the new fields. Tests cover the format, the root-input helper
(including the inputless-producer-is-root case), the comparison, and
renderer behavior.
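A sketch of how the new fields might appear in JSON output. The field names come from this change; the stand-in class `RootInputFields`, the field types (a bool plus lists of artifact ids), and the `to_json` helper are assumptions, not roar's `DiffResult`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RootInputFields:
    """Hypothetical stand-in for the new DiffResult fields."""

    root_inputs_match: bool
    root_inputs_a: list[str] = field(default_factory=list)
    root_inputs_b: list[str] = field(default_factory=list)


def to_json(result: RootInputFields) -> str:
    # asdict() recursively converts the dataclass into plain dicts and lists.
    return json.dumps(asdict(result), indent=2, sort_keys=True)


print(to_json(RootInputFields(False, ["mnist.npz"], ["mnist_v2.npz"])))
```

Storing the root sets as sorted lists keeps them JSON-serializable, since the `json` module cannot encode a `set` directly, and gives stable output across runs.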

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TrevorBasinger merged commit 30b24eb into main on May 14, 2026
12 checks passed
