Fix discovery crash from quirk removing ZCL attributes #788

Draft
TheJulianJES wants to merge 6 commits into zigpy:dev from TheJulianJES:tjj/fix-quirks-missing-attr-definitions

@TheJulianJES TheJulianJES commented Jun 10, 2026

Contributor

DRAFT.

Proposed change

This fixes an issue where a (custom) quirk can remove standard ZCL attributes. Most entity platforms already have guards checking if the attribute even exists, but some do not. This adds them.

Additional information

I'm not sure if this is something we should add – quirks shouldn't misbehave like this. Or are there valid use-cases for deleting ZCL attributes...? But currently, ZHA startup breaks completely when using these custom quirks.

Should address:

This "regression" was introduced with:

AI summary

Issue and fix summary

Issue

Some custom v1 quirks fully replace a standard cluster's attribute definitions, e.g. the widely used ts0601_trv_moes.py from jacekk015/zha_quirks does:

class MoesWindowDetection(LocalDataCluster, OnOff):
    attributes = LocalDataCluster.attributes.copy()  # empty dict
    attributes.update({
        0x6000: ("window_detection_temperature", t.int16s),
        0x6001: ("window_detection_timeout_minutes", t.uint8_t),
    })

The resulting OnOff cluster has no on_off/start_up_on_off attribute definitions. Cluster.is_attribute_unsupported() (and find_attribute()) raise KeyError for attribute names without a definition.
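The failure mode can be reproduced without zigpy. The sketch below uses a hypothetical `ToyCluster` stand-in (not zigpy's real `Cluster` class) and assumes only what is stated above: name-based lookups go through the attribute definitions and raise `KeyError` when a name has no definition.

```python
class ToyCluster:
    """Hypothetical stand-in for a ZCL cluster; not zigpy's real Cluster class."""

    def __init__(self, attributes: dict[int, tuple[str, type]]) -> None:
        # Map attribute name -> attribute ID, derived from the definitions.
        self.attributes_by_name = {
            name: attr_id for attr_id, (name, _) in attributes.items()
        }
        self.unsupported: set[int] = set()

    def is_attribute_unsupported(self, name: str) -> bool:
        # Raises KeyError when `name` has no definition, as described above.
        return self.attributes_by_name[name] in self.unsupported


# A cluster whose definitions were fully replaced by a quirk: no "on_off".
replaced = ToyCluster(
    {
        0x6000: ("window_detection_temperature", int),
        0x6001: ("window_detection_timeout_minutes", int),
    }
)

try:
    replaced.is_attribute_unsupported("on_off")
    error = None
except KeyError as exc:
    error = exc
```

Here `error` ends up holding the `KeyError` for `"on_off"`, mirroring the exception that escaped from discovery.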

Since #657, Switch._is_supported() calls cluster.is_attribute_unsupported("on_off") during entity discovery. The KeyError propagated through Device._add_pending_entities() → Gateway.load_devices() → HA's async_setup_entry, so one broken custom quirk prevented the entire ZHA integration from starting (ConfigEntryNotReady retry loop). Previously, the cluster-handler-based code tolerated these clusters.

Affected code paths

  • Switch._is_supported() was the only _is_supported implementation missing the attributes_by_name guard that all other platforms already had (the crash from the linked issue).
  • WindowCoveringInversionSwitch._is_supported() had the guard, but evaluated is_attribute_unsupported() first.
  • configure_cluster_configs() called find_attribute() unguarded on reporting attributes aggregated from entities not yet filtered by is_supported(), failing on device join/reconfigure.
  • AggregatedClusterPoller.async_update() called is_attribute_unsupported() unguarded on sibling entities' cluster-config attributes during polling.
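The check-order problem noted for `WindowCoveringInversionSwitch` comes down to short-circuit evaluation. The following is a minimal sketch with hypothetical names (`ToyCluster`, `supported_guard_last`, `supported_guard_first`), not the actual ZHA code: a guard only helps if it is evaluated before the call that can raise.

```python
class ToyCluster:
    """Hypothetical stand-in; only mimics the KeyError-on-unknown-name behaviour."""

    def __init__(self, names):
        self.attributes_by_name = {name: index for index, name in enumerate(names)}

    def is_attribute_unsupported(self, name):
        self.attributes_by_name[name]  # raises KeyError for undefined names
        return False


def supported_guard_last(cluster, name):
    # Problematic order: the lookup runs first, so the guard is never reached.
    return (
        not cluster.is_attribute_unsupported(name)
        and name in cluster.attributes_by_name
    )


def supported_guard_first(cluster, name):
    # `and` short-circuits: the lookup only runs when a definition exists.
    return (
        name in cluster.attributes_by_name
        and not cluster.is_attribute_unsupported(name)
    )


quirked = ToyCluster(["window_detection_temperature"])
guard_first_result = supported_guard_first(quirked, "on_off")
try:
    supported_guard_last(quirked, "on_off")
    guard_last_raised = False
except KeyError:
    guard_last_raised = True
```

With the guard first, the missing attribute simply yields `False`; with the guard last, the same input raises before the guard is consulted.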

Fix

Check attr_name in cluster.attributes_by_name before calling is_attribute_unsupported()/find_attribute() at the four sites above, treating attributes without definitions as unsupported (entity not created / reporting skipped with a debug log).
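For the reporting and polling paths, the same guard amounts to filtering out names without a definition and logging the skip at debug level. This is a rough sketch only: `reportable_attributes` is a hypothetical helper, and the plain dict stands in for `cluster.attributes_by_name`; the real change lives inside the functions listed above.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def reportable_attributes(attributes_by_name, requested):
    """Return the requested names that have a definition, skipping the rest."""
    kept = []
    for name in requested:
        if name not in attributes_by_name:
            # Treat attributes without a definition as unsupported.
            _LOGGER.debug("Skipping %r: no attribute definition on cluster", name)
            continue
        kept.append(name)
    return kept


# Stand-in for the name -> ID mapping of a quirk-replaced OnOff cluster.
quirked_names = {
    "window_detection_temperature": 0x6000,
    "window_detection_timeout_minutes": 0x6001,
}
result = reportable_attributes(
    quirked_names, ["on_off", "window_detection_temperature"]
)
```

Filtering before any lookup keeps one missing definition from aborting configuration of the remaining attributes.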

A regression test joins a device with a quirks v2 quirk (local registry) that replaces the OnOff cluster with one whose AttributeDefs does not inherit the standard definitions, reproducing the exact KeyError: 'on_off' from the issue on unfixed code, and asserts device initialization succeeds with no switch entity created.

Custom v1 quirks that fully override a standard cluster's `attributes`
dict (e.g. `attributes = LocalDataCluster.attributes.copy()` on an
`OnOff` cluster) produce clusters without standard attribute
definitions. `Cluster.is_attribute_unsupported()` raises `KeyError` for
unknown attribute names, which propagated out of
`Switch._is_supported()` and failed the whole gateway initialization.

Check `attributes_by_name` first, like all other `_is_supported`
implementations already do. Also fix the check order in
`WindowCoveringInversionSwitch._is_supported`, where the existing guard
ran after `is_attribute_unsupported()`.

`configure_cluster_configs` aggregates configs from discovered entities
before they are filtered by `is_supported()`, so a quirk-replaced
cluster missing standard attribute definitions made
`find_attribute()` raise `KeyError` during device configuration.
Skip such attributes with a debug log, matching how the attribute read
path already tolerates them.

A sibling entity's `_server_cluster_config` can list attributes beyond
its own `_attribute_name`, which may not exist on a quirk-replaced
cluster, making `is_attribute_unsupported()` raise `KeyError` during
polling. Check `attributes_by_name` first.

Avoids registering a v1 quirk in the global `DEVICE_REGISTRY`.
An `AttributeDefs` class inheriting `BaseAttributeDefs` instead of
`OnOff.AttributeDefs` replaces the standard attribute definitions the
same way the legacy `attributes` dict override does.

@TheJulianJES TheJulianJES changed the title Fix quirk removing ZCL attributes crashing discovery Fix discovery crash from quirk removing ZCL attributes Jun 10, 2026

codecov Bot commented Jun 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.41%. Comparing base (3a884dc) to head (449041c).
⚠️ Report is 7 commits behind head on dev.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #788   +/-   ##
=======================================
  Coverage   97.41%   97.41%           
=======================================
  Files          50       50           
  Lines       10419    10423    +4     
=======================================
+ Hits        10150    10154    +4     
  Misses        269      269           
