✨ Upgrade to honeydiff v0.5.0 with CIEDE2000 color difference #105

Robdel12 · 2025-12-04T06:58:54Z

What's this all about?

This PR upgrades honeydiff to v0.5.0 which brings a major improvement to how we detect color differences! We're moving from the old YIQ-based comparison to CIEDE2000 (ΔE00) - the CIE's official standard for perceptual color difference.

The TL;DR: colors that look different to humans will now be detected as different, and colors that look the same won't trigger false positives. It's a much smarter approach.

Why ditch YIQ?

The old YIQ-based threshold was... kind of broken. Here's the deal:

The YIQ formula normalized values by dividing by 35,215, which produced tiny numbers. A 10-unit channel difference resulted in a value of 0.000455 - way below the 0.1 default threshold. This meant:

Threshold of 0.1 → allowed almost anything through (not great for catching real differences)
Threshold of 0.001 → caught differences but also flagged 7-pixel run-to-run variance as failures

There was no sweet spot. You were either too strict or too lenient.

CIEDE2000 fixes this with an intuitive scale where 1.0 = the Just Noticeable Difference (JND). It's the industry standard for visual regression testing and gives us predictable, meaningful threshold values.

Breaking Changes

Heads up - this is a breaking change for the threshold values:

What	Before	After
Unit	YIQ normalized (0.0-1.0)	CIEDE2000 Delta E
Default	0.1	2.0
Max allowed	1.0	No limit

New Threshold Scale

Here's how to think about the new values:

0.0 = exact pixel matching (strictest)
1.0 = Just Noticeable Difference (JND) - very strict
2.0 = recommended default (ignores sub-pixel rendering variance)
3.0+ = more permissive

Changes

Bump @vizzly-testing/honeydiff to ^0.5.0
Swap colorThreshold → threshold in tdd-service.js
Update all default thresholds from 0.1 → 2.0
Remove the upper bound validation (no more <= 1 check)
Update config schema, loader, and all validation logic
Update tests to match the new validation behavior

Test plan

All 678 tests passing
npm run build succeeds
vizzly doctor shows valid threshold
Manual testing with threshold values > 1.0

Replaces YIQ-based color comparison with CIEDE2000 (ΔE00), the CIE standard for perceptual color difference. ## Breaking Changes - `threshold` now uses CIEDE2000 Delta E units instead of YIQ normalized values - Default threshold changed from 0.1 to 2.0 - Threshold validation now allows values > 1.0 (was previously capped at 1.0) ## Threshold Scale - 0.0 = exact pixel matching - 1.0 = Just Noticeable Difference (JND) - 2.0 = recommended default (ignores sub-pixel rendering variance) - 3.0+ = permissive ## Changes - Update @vizzly-testing/honeydiff to ^0.5.0 - Change `colorThreshold` option to `threshold` in tdd-service.js - Update all default thresholds from 0.1 to 2.0 - Remove upper bound validation (threshold <= 1) - Update config schema, loader, and validation - Update tests to reflect new validation behavior

claude · 2025-12-04T07:00:51Z

PR Review: Upgrade to honeydiff v0.5.0 with CIEDE2000

Summary

This PR upgrades the visual comparison engine from YIQ-based color difference to CIEDE2000 (ΔE00), the industry-standard perceptual color difference metric. The implementation is solid overall, but there are critical documentation gaps that need addressing before merge.

🔴 Critical Issues

1. Incomplete Documentation Update

The code changes are thorough, but multiple documentation files still reference the old threshold values:

Files needing updates:

docs/doctor-command.md:15 - Still says "between 0 and 1"
docs/tdd-mode.md:227,239 - Default still listed as 0.1
docs/api-reference.md:531,664 - References old scale and default
README.md:282,362 - Old default and range
SDK-API.md:427,470 - Example configs use 0.1
claude-plugin/skills/debug-visual-regression/SKILL.md:121,173 - Examples use 0.1

Why this matters: Users reading docs will set threshold: 0.1, which is 50x more strict than intended with CIEDE2000, causing massive test failures.

Recommendation: Update all documentation to:

Change default from 0.1 → 2.0
Remove "0-1" range constraints
Add brief explanation of CIEDE2000 scale where threshold is documented

2. TypeScript Type Definition Files Not Updated

Files with old values:

test-d/config.test-d.ts:36,53,88 - Type tests still use threshold: 0.1 and 0.05
test-d/main.test-d.ts - May also need checking

Why this matters: These files serve as examples for TypeScript users and validate type correctness. Using old values creates confusion and makes type tests less realistic.

Recommendation: Update type test examples to use realistic CIEDE2000 values (2.0, 3.0, etc.)

3. Test Fixtures Still Use Old Values

Files found:

tests/reporter/fixtures/failed-state.json (2 instances)
tests/reporter/fixtures/passed-state.json (2 instances)
tests/reporter/test-helper.js:113
Multiple test files use 0.1 in their mocks

Why this matters: While tests pass (they're mocks), using old values:

Makes tests less realistic
Could hide bugs if code expects values in CIEDE2000 scale
Confuses developers reading tests as examples

Recommendation: Update test fixtures and mocks to use 2.0 as the default. Keep some tests with edge cases (0.0, 1.0, 5.0) to validate the new range.

✅ What's Done Well

Code Quality

✅ Consistent parameter rename: colorThreshold → threshold properly updated in tdd-service.js:1229
✅ Validation logic updated: Removed upper bound check across all validation functions
✅ Config schema: Zod schema properly updated with .min(0) (no max constraint)
✅ Default values: All defaults changed to 2.0 across the codebase
✅ Error messages: Properly updated to reference CIEDE2000

Test Coverage

✅ Tests updated to validate new behavior (threshold > 1 now passes)
✅ Validation tests properly check non-negative constraint
✅ All 678 tests passing according to PR description

User Experience

✅ Clear breaking change documentation in PR body
✅ Helpful threshold scale reference table
✅ Good inline comments explaining CIEDE2000 scale

🟡 Minor Observations

1. Migration Guide for Users

Consider adding a migration note to help users convert their existing configs. Something like:

## Migrating from v0.4.x

If you have a custom threshold configured:
- Very strict (was 0.01-0.05): Try 1.0-1.5 (JND range)
- Default-ish (was 0.1-0.2): Use 2.0-3.0 (recommended)
- Permissive (was 0.3-0.5): Use 4.0-6.0
- Very permissive (was 0.6-1.0): Use 8.0-15.0

Test and adjust based on your needs!

2. Comment Clarity

In src/server/handlers/tdd-handler.js:262, the comment references the old value:

// This happens when test helper passes { properties: {...}, threshold: 2.0 }

Consider updating to use 2.0 for consistency.

3. Settings UI Validation

src/reporter/src/components/views/settings-view.jsx - Verify the UI form has proper validation for the new range (though the fallback to 2.0 is good).

🔒 Security & Performance

✅ No security concerns identified
✅ honeydiff is a trusted first-party package
✅ No performance regressions expected (CIEDE2000 is the new default in honeydiff)
✅ No breaking API changes to external consumers (threshold is still a number)

📋 Action Items

Before Merge:

⚠️ Update all documentation files (7+ files listed above)
⚠️ Update TypeScript type test examples
💡 Consider updating test fixtures to use realistic values (optional but recommended)
💡 Add migration guide to PR or CHANGELOG (optional but helpful)

After Review:

All tests passing ✅
Build succeeds ✅
Manual testing complete ✅

Verdict

Conditional Approve ⚠️

The code implementation is excellent and thorough, but the documentation gaps are critical and will cause user confusion. Once the documentation is updated to reflect the new threshold scale, this is ready to merge.

Great work on adopting CIEDE2000! This is a meaningful improvement to visual comparison accuracy. 🎨

Review completed following CLAUDE.md conventions. No AI attribution in commits per project guidelines.

Updates all docs, type tests, and fixtures to use the new CIEDE2000 threshold scale (default 2.0) instead of the old YIQ scale (0.0-1.0). Files updated: - docs/doctor-command.md - docs/tdd-mode.md - docs/api-reference.md - README.md - SDK-API.md - claude-plugin/skills/debug-visual-regression/SKILL.md - test-d/config.test-d.ts - tests/reporter/fixtures/*.json - tests/reporter/test-helper.js

Robdel12 merged commit 058e321 into main Dec 4, 2025
17 checks passed

Robdel12 deleted the feat/ciede2000-threshold branch December 4, 2025 07:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

✨ Upgrade to honeydiff v0.5.0 with CIEDE2000 color difference #105

✨ Upgrade to honeydiff v0.5.0 with CIEDE2000 color difference #105

Uh oh!

Robdel12 commented Dec 4, 2025 •

edited

Loading

Uh oh!

claude bot commented Dec 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

✨ Upgrade to honeydiff v0.5.0 with CIEDE2000 color difference #105

✨ Upgrade to honeydiff v0.5.0 with CIEDE2000 color difference #105

Uh oh!

Conversation

Robdel12 commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's this all about?

Why ditch YIQ?

Breaking Changes

New Threshold Scale

Changes

Test plan

Uh oh!

claude bot commented Dec 4, 2025

PR Review: Upgrade to honeydiff v0.5.0 with CIEDE2000

Summary

🔴 Critical Issues

1. Incomplete Documentation Update

2. TypeScript Type Definition Files Not Updated

3. Test Fixtures Still Use Old Values

✅ What's Done Well

Code Quality

Test Coverage

User Experience

🟡 Minor Observations

1. Migration Guide for Users

2. Comment Clarity

3. Settings UI Validation

🔒 Security & Performance

📋 Action Items

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Robdel12 commented Dec 4, 2025 •

edited

Loading