Skip to content

Conversation

@Robdel12
Copy link
Contributor

@Robdel12 Robdel12 commented Dec 4, 2025

What's this all about?

This PR upgrades honeydiff to v0.5.0 which brings a major improvement to how we detect color differences! We're moving from the old YIQ-based comparison to CIEDE2000 (ΔE00) - the CIE's official standard for perceptual color difference.

The TL;DR: colors that look different to humans will now be detected as different, and colors that look the same won't trigger false positives. It's a much smarter approach.

Why ditch YIQ?

The old YIQ-based threshold was... kind of broken. Here's the deal:

The YIQ formula normalized values by dividing by 35,215, which produced tiny numbers. A 10-unit channel difference resulted in a value of 0.000455 - way below the 0.1 default threshold. This meant:

  • Threshold of 0.1 → allowed almost anything through (not great for catching real differences)
  • Threshold of 0.001 → caught differences but also flagged 7-pixel run-to-run variance as failures

There was no sweet spot. You were either too strict or too lenient.

CIEDE2000 fixes this with an intuitive scale where 1.0 = the Just Noticeable Difference (JND). It's the industry standard for visual regression testing and gives us predictable, meaningful threshold values.

Breaking Changes

Heads up - this is a breaking change for the threshold values:

What Before After
Unit YIQ normalized (0.0-1.0) CIEDE2000 Delta E
Default 0.1 2.0
Max allowed 1.0 No limit

New Threshold Scale

Here's how to think about the new values:

  • 0.0 = exact pixel matching (strictest)
  • 1.0 = Just Noticeable Difference (JND) - very strict
  • 2.0 = recommended default (ignores sub-pixel rendering variance)
  • 3.0+ = more permissive

Changes

  • Bump @vizzly-testing/honeydiff to ^0.5.0
  • Swap colorThresholdthreshold in tdd-service.js
  • Update all default thresholds from 0.12.0
  • Remove the upper bound validation (no more <= 1 check)
  • Update config schema, loader, and all validation logic
  • Update tests to match the new validation behavior

Test plan

  • All 678 tests passing
  • npm run build succeeds
  • vizzly doctor shows valid threshold
  • Manual testing with threshold values > 1.0

Replaces YIQ-based color comparison with CIEDE2000 (ΔE00), the CIE standard
for perceptual color difference.

## Breaking Changes

- `threshold` now uses CIEDE2000 Delta E units instead of YIQ normalized values
- Default threshold changed from 0.1 to 2.0
- Threshold validation now allows values > 1.0 (was previously capped at 1.0)

## Threshold Scale

- 0.0 = exact pixel matching
- 1.0 = Just Noticeable Difference (JND)
- 2.0 = recommended default (ignores sub-pixel rendering variance)
- 3.0+ = permissive

## Changes

- Update @vizzly-testing/honeydiff to ^0.5.0
- Change `colorThreshold` option to `threshold` in tdd-service.js
- Update all default thresholds from 0.1 to 2.0
- Remove upper bound validation (threshold <= 1)
- Update config schema, loader, and validation
- Update tests to reflect new validation behavior
@claude
Copy link

claude bot commented Dec 4, 2025

PR Review: Upgrade to honeydiff v0.5.0 with CIEDE2000

Summary

This PR upgrades the visual comparison engine from YIQ-based color difference to CIEDE2000 (ΔE00), the industry-standard perceptual color difference metric. The implementation is solid overall, but there are critical documentation gaps that need addressing before merge.


🔴 Critical Issues

1. Incomplete Documentation Update

The code changes are thorough, but multiple documentation files still reference the old threshold values:

Files needing updates:

  • docs/doctor-command.md:15 - Still says "between 0 and 1"
  • docs/tdd-mode.md:227,239 - Default still listed as 0.1
  • docs/api-reference.md:531,664 - References old scale and default
  • README.md:282,362 - Old default and range
  • SDK-API.md:427,470 - Example configs use 0.1
  • claude-plugin/skills/debug-visual-regression/SKILL.md:121,173 - Examples use 0.1

Why this matters: Users reading docs will set threshold: 0.1, which is 50x more strict than intended with CIEDE2000, causing massive test failures.

Recommendation: Update all documentation to:

  • Change default from 0.12.0
  • Remove "0-1" range constraints
  • Add brief explanation of CIEDE2000 scale where threshold is documented

2. TypeScript Type Definition Files Not Updated

Files with old values:

  • test-d/config.test-d.ts:36,53,88 - Type tests still use threshold: 0.1 and 0.05
  • test-d/main.test-d.ts - May also need checking

Why this matters: These files serve as examples for TypeScript users and validate type correctness. Using old values creates confusion and makes type tests less realistic.

Recommendation: Update type test examples to use realistic CIEDE2000 values (2.0, 3.0, etc.)


3. Test Fixtures Still Use Old Values

Files found:

  • tests/reporter/fixtures/failed-state.json (2 instances)
  • tests/reporter/fixtures/passed-state.json (2 instances)
  • tests/reporter/test-helper.js:113
  • Multiple test files use 0.1 in their mocks

Why this matters: While tests pass (they're mocks), using old values:

  1. Makes tests less realistic
  2. Could hide bugs if code expects values in CIEDE2000 scale
  3. Confuses developers reading tests as examples

Recommendation: Update test fixtures and mocks to use 2.0 as the default. Keep some tests with edge cases (0.0, 1.0, 5.0) to validate the new range.


✅ What's Done Well

Code Quality

  • Consistent parameter rename: colorThresholdthreshold properly updated in tdd-service.js:1229
  • Validation logic updated: Removed upper bound check across all validation functions
  • Config schema: Zod schema properly updated with .min(0) (no max constraint)
  • Default values: All defaults changed to 2.0 across the codebase
  • Error messages: Properly updated to reference CIEDE2000

Test Coverage

  • ✅ Tests updated to validate new behavior (threshold > 1 now passes)
  • ✅ Validation tests properly check non-negative constraint
  • ✅ All 678 tests passing according to PR description

User Experience

  • ✅ Clear breaking change documentation in PR body
  • ✅ Helpful threshold scale reference table
  • ✅ Good inline comments explaining CIEDE2000 scale

🟡 Minor Observations

1. Migration Guide for Users

Consider adding a migration note to help users convert their existing configs. Something like:

## Migrating from v0.4.x

If you have a custom threshold configured:
- Very strict (was 0.01-0.05): Try 1.0-1.5 (JND range)
- Default-ish (was 0.1-0.2): Use 2.0-3.0 (recommended)
- Permissive (was 0.3-0.5): Use 4.0-6.0
- Very permissive (was 0.6-1.0): Use 8.0-15.0

Test and adjust based on your needs!

2. Comment Clarity

In src/server/handlers/tdd-handler.js:262, the comment references the old value:

// This happens when test helper passes { properties: {...}, threshold: 2.0 }

Consider updating to use 2.0 for consistency.

3. Settings UI Validation

src/reporter/src/components/views/settings-view.jsx - Verify the UI form has proper validation for the new range (though the fallback to 2.0 is good).


🔒 Security & Performance

  • ✅ No security concerns identified
  • ✅ honeydiff is a trusted first-party package
  • ✅ No performance regressions expected (CIEDE2000 is the new default in honeydiff)
  • ✅ No breaking API changes to external consumers (threshold is still a number)

📋 Action Items

Before Merge:

  1. ⚠️ Update all documentation files (7+ files listed above)
  2. ⚠️ Update TypeScript type test examples
  3. 💡 Consider updating test fixtures to use realistic values (optional but recommended)
  4. 💡 Add migration guide to PR or CHANGELOG (optional but helpful)

After Review:

  • All tests passing ✅
  • Build succeeds ✅
  • Manual testing complete ✅

Verdict

Conditional Approve ⚠️

The code implementation is excellent and thorough, but the documentation gaps are critical and will cause user confusion. Once the documentation is updated to reflect the new threshold scale, this is ready to merge.

Great work on adopting CIEDE2000! This is a meaningful improvement to visual comparison accuracy. 🎨


Review completed following CLAUDE.md conventions. No AI attribution in commits per project guidelines.

Updates all docs, type tests, and fixtures to use the new CIEDE2000
threshold scale (default 2.0) instead of the old YIQ scale (0.0-1.0).

Files updated:
- docs/doctor-command.md
- docs/tdd-mode.md
- docs/api-reference.md
- README.md
- SDK-API.md
- claude-plugin/skills/debug-visual-regression/SKILL.md
- test-d/config.test-d.ts
- tests/reporter/fixtures/*.json
- tests/reporter/test-helper.js
@Robdel12 Robdel12 merged commit 058e321 into main Dec 4, 2025
17 checks passed
@Robdel12 Robdel12 deleted the feat/ciede2000-threshold branch December 4, 2025 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants