✨ Upgrade to honeydiff v0.5.0 with CIEDE2000 color difference #105
Replaces YIQ-based color comparison with CIEDE2000 (ΔE00), the CIE standard for perceptual color difference.

## Breaking Changes

- `threshold` now uses CIEDE2000 Delta E units instead of YIQ normalized values
- Default threshold changed from 0.1 to 2.0
- Threshold validation now allows values > 1.0 (was previously capped at 1.0)

## Threshold Scale

- 0.0 = exact pixel matching
- 1.0 = Just Noticeable Difference (JND)
- 2.0 = recommended default (ignores sub-pixel rendering variance)
- 3.0+ = permissive

## Changes

- Update @vizzly-testing/honeydiff to ^0.5.0
- Change `colorThreshold` option to `threshold` in tdd-service.js
- Update all default thresholds from 0.1 to 2.0
- Remove upper bound validation (threshold <= 1)
- Update config schema, loader, and validation
- Update tests to reflect new validation behavior
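The validation change listed above (dropping the `<= 1` cap while keeping the default at 2.0) could look roughly like the following. This is a minimal sketch, not the code from this PR: `normalizeThreshold` and `DEFAULT_THRESHOLD` are hypothetical names, and the real schema and loader may reject or coerce values differently.

```javascript
// Hypothetical helper illustrating the new validation rule.
// Assumption: a missing threshold falls back to the new default of 2.0.
const DEFAULT_THRESHOLD = 2.0;

function normalizeThreshold(value) {
  if (value === undefined) {
    return DEFAULT_THRESHOLD;
  }
  // Reject non-numbers, NaN, Infinity, and negative values.
  if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
    throw new RangeError(`threshold must be a non-negative number, got ${value}`);
  }
  // No upper bound: Delta E values above 1.0 are now valid.
  return value;
}
```

Under this sketch, `normalizeThreshold(5)` is accepted, whereas a YIQ-era check would have rejected it for exceeding 1.0.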
PR Review: Upgrade to honeydiff v0.5.0 with CIEDE2000
Summary
This PR upgrades the visual comparison engine from YIQ-based color difference to CIEDE2000 (ΔE00), the industry-standard perceptual color difference metric. The implementation is solid overall, but there are critical documentation gaps that need addressing before merge.
🔴 Critical Issues
1. Incomplete Documentation Update
The code changes are thorough, but multiple documentation files still reference the old threshold values:
Files needing updates:
Why this matters: Users reading docs will set
Recommendation: Update all documentation to:
2. TypeScript Type Definition Files Not Updated
Files with old values:
Why this matters: These files serve as examples for TypeScript users and validate type correctness. Using old values creates confusion and makes type tests less realistic.
Recommendation: Update type test examples to use realistic CIEDE2000 values (2.0, 3.0, etc.)
3. Test Fixtures Still Use Old Values
Files found:
Why this matters: While tests pass (they're mocks), using old values:
Recommendation: Update test fixtures and mocks to use 2.0 as the default. Keep some tests with edge cases (0.0, 1.0, 5.0) to validate the new range.
✅ What's Done Well
Code Quality
Test Coverage
User Experience
🟡 Minor Observations
1. Migration Guide for Users
Consider adding a migration note to help users convert their existing configs. Something like:
## Migrating from v0.4.x
If you have a custom threshold configured:
- Very strict (was 0.01-0.05): Try 1.0-1.5 (JND range)
- Default-ish (was 0.1-0.2): Use 2.0-3.0 (recommended)
- Permissive (was 0.3-0.5): Use 4.0-6.0
- Very permissive (was 0.6-1.0): Use 8.0-15.0
Test and adjust based on your needs!
2. Comment Clarity
In // This happens when test helper passes { properties: {...}, threshold: 2.0 }
Consider updating to use 2.0 for consistency.
3. Settings UI Validation
🔒 Security & Performance
📋 Action Items
Before Merge:
After Review:
Verdict
Conditional Approve
The code implementation is excellent and thorough, but the documentation gaps are critical and will cause user confusion. Once the documentation is updated to reflect the new threshold scale, this is ready to merge. Great work on adopting CIEDE2000! This is a meaningful improvement to visual comparison accuracy. 🎨
Review completed following CLAUDE.md conventions. No AI attribution in commits per project guidelines.
Updates all docs, type tests, and fixtures to use the new CIEDE2000 threshold scale (default 2.0) instead of the old YIQ scale (0.0-1.0).

Files updated:

- docs/doctor-command.md
- docs/tdd-mode.md
- docs/api-reference.md
- README.md
- SDK-API.md
- claude-plugin/skills/debug-visual-regression/SKILL.md
- test-d/config.test-d.ts
- tests/reporter/fixtures/*.json
- tests/reporter/test-helper.js
What's this all about?
This PR upgrades honeydiff to v0.5.0 which brings a major improvement to how we detect color differences! We're moving from the old YIQ-based comparison to CIEDE2000 (ΔE00) - the CIE's official standard for perceptual color difference.
The TL;DR: colors that look different to humans will now be detected as different, and colors that look the same won't trigger false positives. It's a much smarter approach.
Why ditch YIQ?
The old YIQ-based threshold was... kind of broken. Here's the deal:
The YIQ formula normalized values by dividing by 35,215, which produced tiny numbers. A 10-unit channel difference resulted in a value of 0.000455 - way below the 0.1 default threshold. This meant:
There was no sweet spot. You were either too strict or too lenient.
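The arithmetic behind the 0.000455 figure can be reproduced with a short sketch. It rests on two assumptions not stated in this PR: that the old comparison used the YIQ weights common to pixelmatch-style diff tools, and that the 10-unit difference sat in the red channel only. `yiqDelta` and `MAX_YIQ_DELTA` are illustrative names, not honeydiff's API.

```javascript
// Assumption: pixelmatch-style YIQ weights; 35215 is the largest possible delta.
const MAX_YIQ_DELTA = 35215;

// Squared, weighted YIQ distance between two opaque RGB pixels (0-255 channels).
function yiqDelta([r1, g1, b1], [r2, g2, b2]) {
  const dr = r1 - r2;
  const dg = g1 - g2;
  const db = b1 - b2;
  const y = dr * 0.29889531 + dg * 0.58662247 + db * 0.11448223;
  const i = dr * 0.59597799 - dg * 0.2741761 - db * 0.32180189;
  const q = dr * 0.21147017 - dg * 0.52261711 + db * 0.31114694;
  return 0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q;
}

// Two pixels that differ by 10 units in the red channel only.
const normalized = yiqDelta([110, 100, 100], [100, 100, 100]) / MAX_YIQ_DELTA;
console.log(normalized.toFixed(6)); // prints 0.000455
```

Against a 0.1 threshold, this difference is roughly 220 times too small to register, which matches the "way below" description above.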
CIEDE2000 fixes this with an intuitive scale where 1.0 = the Just Noticeable Difference (JND). It's the industry standard for visual regression testing and gives us predictable, meaningful threshold values.
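For readers who want to see what ΔE00 actually computes, here is a compact sketch of the published CIEDE2000 formula with all weighting factors set to 1. It is not honeydiff's implementation; it takes CIELAB inputs as `[L, a, b]` arrays and leaves out the RGB-to-Lab conversion a real pixel comparison would need. The function name `ciede2000` is illustrative.

```javascript
// Sketch of CIEDE2000 (kL = kC = kH = 1) for two CIELAB colors [L, a, b].
function ciede2000([L1, a1, b1], [L2, a2, b2]) {
  const rad = Math.PI / 180;
  const pow7 = (x) => x ** 7;

  // Step 1: adjust a* to a' and derive chroma C' and hue h' (degrees).
  const cBar = (Math.hypot(a1, b1) + Math.hypot(a2, b2)) / 2;
  const g = 0.5 * (1 - Math.sqrt(pow7(cBar) / (pow7(cBar) + pow7(25))));
  const a1p = (1 + g) * a1;
  const a2p = (1 + g) * a2;
  const c1p = Math.hypot(a1p, b1);
  const c2p = Math.hypot(a2p, b2);
  const hue = (b, a) => (a === 0 && b === 0 ? 0 : (Math.atan2(b, a) / rad + 360) % 360);
  const h1p = hue(b1, a1p);
  const h2p = hue(b2, a2p);

  // Step 2: lightness, chroma, and hue differences.
  const dL = L2 - L1;
  const dC = c2p - c1p;
  let dh = 0;
  if (c1p * c2p !== 0) {
    dh = h2p - h1p;
    if (dh > 180) dh -= 360;
    else if (dh < -180) dh += 360;
  }
  const dH = 2 * Math.sqrt(c1p * c2p) * Math.sin((dh / 2) * rad);

  // Step 3: means, weighting functions, and the rotation term.
  const lBar = (L1 + L2) / 2;
  const cBarP = (c1p + c2p) / 2;
  let hBar = h1p + h2p;
  if (c1p * c2p !== 0) {
    if (Math.abs(h1p - h2p) <= 180) hBar = (h1p + h2p) / 2;
    else if (h1p + h2p < 360) hBar = (h1p + h2p + 360) / 2;
    else hBar = (h1p + h2p - 360) / 2;
  }
  const t = 1 - 0.17 * Math.cos((hBar - 30) * rad) + 0.24 * Math.cos(2 * hBar * rad)
    + 0.32 * Math.cos((3 * hBar + 6) * rad) - 0.2 * Math.cos((4 * hBar - 63) * rad);
  const dTheta = 30 * Math.exp(-(((hBar - 275) / 25) ** 2));
  const rC = 2 * Math.sqrt(pow7(cBarP) / (pow7(cBarP) + pow7(25)));
  const sL = 1 + (0.015 * (lBar - 50) ** 2) / Math.sqrt(20 + (lBar - 50) ** 2);
  const sC = 1 + 0.045 * cBarP;
  const sH = 1 + 0.015 * cBarP * t;
  const rT = -Math.sin(2 * dTheta * rad) * rC;

  return Math.sqrt(
    (dL / sL) ** 2 + (dC / sC) ** 2 + (dH / sH) ** 2 + rT * (dC / sC) * (dH / sH)
  );
}
```

As a sanity check, the first pair in the widely used Sharma et al. test data, `[50, 2.6772, -79.7751]` versus `[50, 0, -82.7485]`, should come out at about 2.04, which lands almost exactly on the new default threshold.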
Breaking Changes
Heads up - this is a breaking change for the threshold values:
New Threshold Scale
Here's how to think about the new values:
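One way to read the scale from the commit message (0.0 exact, 1.0 JND, 2.0 recommended default, 3.0+ permissive) is as a set of bands. The sketch below is only an illustration: `describeThreshold` is a hypothetical helper, and how values between the named points are labeled is an assumption rather than anything defined by honeydiff.

```javascript
// Hypothetical helper mapping a CIEDE2000 threshold to a human-readable band.
// Anchor points (0, 1, 2, 3) come from the PR; the in-between banding is assumed.
function describeThreshold(threshold) {
  if (threshold === 0) return "exact pixel matching";
  if (threshold <= 1) return "at or below the Just Noticeable Difference";
  if (threshold < 3) return "recommended range (default is 2.0)";
  return "permissive";
}

for (const value of [0, 1.0, 2.0, 3.0]) {
  console.log(value.toFixed(1), "=>", describeThreshold(value));
}
```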
Changes
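- Update `@vizzly-testing/honeydiff` to `^0.5.0`
- Change `colorThreshold` → `threshold` in tdd-service.js
- Update default thresholds `0.1` → `2.0`
- Remove upper bound validation (`<= 1` check)
Test plan
- `npm run build` succeeds
- `vizzly doctor` shows valid threshold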