Support zigpy device re-interviews #755
Co-authored-by: TheJulianJES <TheJulianJES@users.noreply.github.com> # Conflicts: # tests/test_device.py # zha/zigbee/device.py
These happened because of a zigpy bug that's now fixed, but maybe we should be safe here? Will re-check later.
# Conflicts: # tests/test_device.py # zha/zigbee/device.py
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## dev #755 +/- ##
==========================================
+ Coverage 97.63% 97.64% +0.01%
==========================================
Files 62 62
Lines 10818 10873 +55
==========================================
+ Hits 10562 10617 +55
Misses 256 256
Pull request overview
Adds ZHA support for zigpy “device re-interview” device-object swaps by rebuilding the ZHA Device, refreshing group entity subscriptions, and extending test coverage to cover OTA-triggered and user-triggered reinterview flows.
Changes:
- Add gateway handling for zigpy `device_reinterviewed` events and an HA-triggered `async_reinterview_device()` entrypoint.
- Add ZHA `Device` teardown/rebuild helpers and emit reconfigure-done consistently.
- Refresh group member-entity subscriptions after rebuild and extend tests to validate swap/rebuild behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `zha/zigbee/group.py` | Rebuilds member entity subscriptions to follow recreated entity objects after a reinterview swap. |
| `zha/zigbee/device.py` | Introduces teardown + rebuild from a new zigpy device and moves “reconfigure done” emission into a helper. |
| `zha/application/gateway.py` | Adds reinterview event handling, rebuild pipeline, and HA-triggered reinterview method. |
| `tests/test_update.py` | Updates OTA tests to avoid unintended reinterview side effects; adds post-OTA reinterview swap assertions. |
| `tests/test_device.py` | Adds tests covering gateway reinterview with swap, without swap, and zigpy-triggered OTA reinterview event path. |
    try:
        await zha_device.async_configure()
        await zha_device.async_initialize()
    except Exception:  # noqa: BLE001
        _LOGGER.warning(
            "Failed to configure/initialize device %s after reinterview",
            new_zigpy_device.ieee,
            exc_info=True,
        )

    # Refresh group subscriptions so groups re-subscribe to the new
    # platform entities. Without this, group state updates break
    # because the old entity subscriptions are dead.
    for group in self._groups.values():
        group.update_entity_subscriptions()

    self.emit(
        ZHA_GW_MSG_DEVICE_FULL_INIT,
        DeviceFullInitEvent(
            device_info=ExtendedDeviceInfoWithPairingStatus(
                pairing_status=DevicePairingStatus.CONFIGURED,
                **zha_device.extended_device_info.__dict__,
            ),
        ),
    )
`_async_device_reinterviewed()` emits `ZHA_GW_MSG_DEVICE_FULL_INIT` with `pairing_status=CONFIGURED` even when `async_configure()` / `async_initialize()` raised (the exception is logged but execution continues). This can cause consumers to treat a failed reinterview as successfully configured. Consider only emitting `CONFIGURED` on success, and emitting a different status (or skipping the full-init event) when the configure/initialize step fails.
I think we need to do this for the frontend to not get stuck(?)
The Device already holds the zigpy device reference, so the second lookup against application_controller.devices was redundant and the None-guard was hiding a state-divergence bug rather than handling a real race.
`consider_unavailable_time` and `_available` are computed from `is_mains_powered` and `is_active_coordinator → is_coordinator`, both of which are `@cached_property`s listed in `_ZIGPY_CACHED_PROPERTIES`. The clear loop ran *after* those reads, so on re-interview the post-swap values were derived from pre-swap cached state.
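The ordering problem described in this commit can be sketched in isolation. The classes below are hypothetical stand-ins, not the real ZHA `Device`, and the timeout values are illustrative placeholders. The sketch relies only on the documented behaviour that `functools.cached_property` stores its value in the instance `__dict__`, so the cache has to be cleared before anything derived from it is recomputed.

```python
from functools import cached_property


class FakeZigpyDevice:
    """Hypothetical stand-in for a zigpy device; only carries a power flag."""

    def __init__(self, mains_powered: bool) -> None:
        self.mains_powered = mains_powered


class DeviceSketch:
    """Illustrative wrapper, not the real ZHA Device."""

    # Names of cached properties that depend on the wrapped device.
    _CACHED = ("is_mains_powered",)

    def __init__(self, zigpy_device: FakeZigpyDevice) -> None:
        self.init_from(zigpy_device)

    @cached_property
    def is_mains_powered(self) -> bool:
        return self._zigpy_device.mains_powered

    def init_from(self, zigpy_device: FakeZigpyDevice) -> None:
        self._zigpy_device = zigpy_device
        # Invalidate first: cached_property keeps its value in __dict__.
        for name in self._CACHED:
            self.__dict__.pop(name, None)
        # Only now derive values that read the cached properties.
        # The two timeouts are placeholder numbers for this sketch.
        self.consider_unavailable_time = 7200 if self.is_mains_powered else 21600
```

If the clear loop ran after the last line of `init_from`, a second call with a battery-powered device would still compute the timeout from the first device's cached `is_mains_powered`.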
# Conflicts: # zha/application/gateway.py
Mirrors the fix landed in zigpy#749 for `device_initialized`. The previous `pop(..., None)` lambda silently drops the replacement task's entry when a cancelled task's done-callback runs, leaving the in-flight task untracked.
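A minimal sketch of the task-tracking race described here, using plain `asyncio` and hypothetical names (`track`, `replacement_survives`); it is not the gateway's actual code. With `check_identity=False` the done-callback behaves like an unconditional `pop(key, None)`; with `check_identity=True` it removes the entry only if the finished task is still the one stored under that key.

```python
import asyncio


def track(tasks: dict, key: str, task: asyncio.Task, *, check_identity: bool = True) -> None:
    """Store `task` under `key` and untrack it once it finishes."""
    tasks[key] = task

    def _untrack(done: asyncio.Task) -> None:
        if not check_identity:
            # Unconditional pop: evicts whatever is stored under the key.
            tasks.pop(key, None)
        elif tasks.get(key) is done:
            # Identity check: a cancelled predecessor cannot evict its replacement.
            del tasks[key]

    task.add_done_callback(_untrack)


async def replacement_survives(check_identity: bool) -> bool:
    tasks: dict[str, asyncio.Task] = {}
    old = asyncio.create_task(asyncio.sleep(10))
    track(tasks, "ieee", old, check_identity=check_identity)
    old.cancel()
    new = asyncio.create_task(asyncio.sleep(10))
    track(tasks, "ieee", new, check_identity=check_identity)
    # Wait for the cancelled task; its done-callbacks run before we resume.
    await asyncio.gather(old, return_exceptions=True)
    survived = tasks.get("ieee") is new
    new.cancel()
    await asyncio.gather(new, return_exceptions=True)
    return survived
```

Calling `asyncio.run(replacement_survives(False))` reproduces the untracked in-flight task, while the identity-checked variant keeps the replacement registered.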
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
When zigpy swaps the underlying device, `_async_device_reinterviewed` runs `async_configure()` which emits its own reconfigure-done signal. Emitting again from `async_reinterview_device` produced a duplicate signal that fired before the rebuild finished, so consumers saw "configuration done" mid-flight. Detect the swap by comparing against `application_controller.devices` (which zigpy updates synchronously inside `reinterview()`) and only emit on the no-swap path.
If `reinterview()` raised (timeout, ZigbeeException, etc.) the function exited without emitting reconfigure_done, leaving the HA frontend stuck on "reconfiguring" indefinitely. Wrap the call in `try/finally` so the no-swap branch fires the completion signal in every exit path.
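The two commits above (swap detection plus `try/finally`) can be sketched together. `FakeDevice`, `reinterview_device`, and `signals_after` are hypothetical names for illustration, and the sketch assumes, as the commit message states, that the device registry is updated before `reinterview()` returns.

```python
import asyncio


class FakeDevice:
    """Hypothetical stand-in that registers itself and counts completion signals."""

    def __init__(self, registry: dict, ieee: str, *, fail: bool = False, swap: bool = False) -> None:
        self.registry, self.ieee = registry, ieee
        self.fail, self.swap = fail, swap
        self.done_signals = 0
        registry[ieee] = self

    async def reinterview(self) -> None:
        if self.fail:
            raise TimeoutError("interview timed out")
        if self.swap:
            # Mimic the registry being updated before reinterview() returns.
            FakeDevice(self.registry, self.ieee)

    def emit_reconfigure_done(self) -> None:
        self.done_signals += 1


async def reinterview_device(registry: dict, ieee: str) -> None:
    device = registry[ieee]
    try:
        await device.reinterview()
    finally:
        # Runs on success and on failure. Only signal on the no-swap path;
        # after a swap, the rebuild pipeline is assumed to signal on its own.
        if registry.get(ieee) is device:
            device.emit_reconfigure_done()


def signals_after(**kwargs: bool) -> int:
    """Run one reinterview and return how many completion signals were emitted."""
    device = FakeDevice({}, "00:11:22:33:44:55:66:77", **kwargs)
    try:
        asyncio.run(reinterview_device(device.registry, device.ieee))
    except TimeoutError:
        pass  # the exception still propagates; the signal was emitted first
    return device.done_signals
```

In this sketch, a plain reinterview and a failed one each emit exactly one signal from the old device, and a swap emits none from it.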
When `async_configure()` or `async_initialize()` raised after a swap, the gateway logged the failure but still emitted `DeviceFullInitEvent(pairing_status=CONFIGURED)`, misreporting a half-rebuilt device as fully configured. Worse, `async_configure()` hadn't reached its own `emit_reconfigure_done()`, so the HA reconfigure dialog would hang waiting for `zha_channel_cfg_done`. Now: emit `reconfigure_done` explicitly on failure to unstick the frontend, skip the `FullInit(CONFIGURED)` emit, and keep the group subscription refresh so groups drop stale references regardless.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Previously _remove_entity ran `del self._platform_entities[key]` only after `await entity.on_remove()` returned cleanly. If on_remove raised, _async_teardown caught the exception and continued — but the dict still held the stale entity. On reinterview, _add_pending_entities then saw the same key already present and silently dropped the rediscovered replacement, leaving a permanent zombie that shadowed the new entity. Move the dict cleanup (and removal-event emit) into a `finally` so the device's tracking is always consistent. The exception still propagates to the caller for logging.
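The `finally`-based cleanup can be illustrated with a small, self-contained sketch. `FakeEntity` and `EntityRegistry` are hypothetical names, and the `removed` list stands in for the removal event; the real `_remove_entity` may differ in detail.

```python
import asyncio


class FakeEntity:
    """Hypothetical entity whose cleanup may fail."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail

    async def on_remove(self) -> None:
        if self.fail:
            raise RuntimeError("cleanup failed")


class EntityRegistry:
    """Illustrative tracking dict, standing in for the device's entity map."""

    def __init__(self) -> None:
        self.entities: dict[str, FakeEntity] = {}
        self.removed: list[str] = []  # stands in for emitted removal events

    async def remove(self, key: str) -> None:
        entity = self.entities[key]
        try:
            await entity.on_remove()
        finally:
            # Always drop the entry, even if on_remove() raised, so a later
            # rediscovery under the same key is not shadowed by a stale object.
            del self.entities[key]
            self.removed.append(key)
```

The exception from `on_remove()` still reaches the caller, but the registry no longer holds the stale entry when it does.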
`async_configure()` emits `reconfigure_done` internally on success. If `async_initialize()` then raised, the previous `not configured_ok` fallback would emit a second `reconfigure_done`. Track configure success separately so the fallback only fires when configure itself failed.
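One way to picture the separate success tracking is the sketch below. The names are hypothetical, and it assumes, as stated above, that `async_configure()` emits the completion signal itself on success. The fallback therefore fires only when configure failed, never when initialize failed afterwards.

```python
import asyncio


class FakeDevice:
    """Hypothetical device that counts reconfigure-done signals."""

    def __init__(self, fail_configure: bool = False, fail_initialize: bool = False) -> None:
        self.fail_configure = fail_configure
        self.fail_initialize = fail_initialize
        self.done_signals = 0

    def emit_reconfigure_done(self) -> None:
        self.done_signals += 1

    async def async_configure(self) -> None:
        if self.fail_configure:
            raise RuntimeError("configure failed")
        self.emit_reconfigure_done()  # emitted internally on success

    async def async_initialize(self) -> None:
        if self.fail_initialize:
            raise RuntimeError("initialize failed")


async def configure_and_initialize(device: FakeDevice) -> bool:
    """Return True only if both steps succeeded; signal completion exactly once."""
    configured = False
    try:
        await device.async_configure()
        configured = True
        await device.async_initialize()
    except Exception:
        if not configured:
            # Configure never reached its own emit, so signal here instead.
            device.emit_reconfigure_done()
        return False
    return True


def run_case(**kwargs: bool) -> tuple[bool, int]:
    device = FakeDevice(**kwargs)
    ok = asyncio.run(configure_and_initialize(device))
    return ok, device.done_signals
```

All three cases (success, configure failure, initialize failure) end with exactly one signal in this sketch; only the first returns `True`.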
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
    return

    if zha_device.is_active_coordinator:
        _LOGGER.debug("Skipping reinterview for active coordinator %s", ieee)
`async_reinterview_device()` returns early for the active coordinator without calling `emit_reconfigure_done()`. Since this method is documented as the entry point for HA's reconfigure flow, skipping without the completion signal can leave the reconfigure UI waiting indefinitely. Consider calling `zha_device.emit_reconfigure_done()` before returning (or explicitly documenting/ensuring the coordinator can never reach this path).
Suggested change:

    - _LOGGER.debug("Skipping reinterview for active coordinator %s", ieee)
    + _LOGGER.debug("Skipping reinterview for active coordinator %s", ieee)
    + zha_device.emit_reconfigure_done()
We don't show "reconfigure" in the frontend for coordinators, so I think we can skip this.
Generally, this should all be cleaned up in the future a bit, once the frontend actually properly supports re-interviewing.
Noticed no issues during testing. For now, re-interviewing cannot be triggered manually from HA Core; it only happens when a device is updated.
AI summary
Support device re-interview (HA reconfigure + post-OTA). Note the description is not fully up-to-date with the latest commits.
When zigpy re-interviews a device, the underlying `zigpy.device.Device` object is replaced with a new one (different endpoints/clusters/quirks). Previously ZHA had no way to swap its `Device`'s zigpy reference, so reconfigures and post-OTA re-interviews left ZHA holding stale endpoints, cluster handlers, and platform entities.

Gateway
- `Gateway.async_reinterview_device(ieee)`: entry point for HA's "reconfigure device" action. Calls `zigpy_device.reinterview()`, then emits `reconfigure_done` so the HA frontend stops showing "reconfiguring". If zigpy swaps the device, the `device_reinterviewed` listener handles the rebuild; if not, this is a no-op.
- `Gateway.device_reinterviewed(device)`: listener for zigpy's `device_reinterviewed` event (fired after OTA or successful reinterview). Cancels any in-flight init task for the device and spawns `_async_device_reinterviewed`.
- `Gateway._async_device_reinterviewed(new_zigpy_device)`: rebuilds the ZHA device via `async_rebuild_from_zigpy_device`, runs `async_configure` + `async_initialize`, refreshes group subscriptions, and emits `DeviceFullInitEvent` so HA re-pairs entities.

Device
- `Device._init_from_zigpy_device(zigpy_device)`: extracted from `__init__`, sets up zigpy reference, quirk metadata, ZDO handler, and endpoints. Clears `_on_remove_callbacks`, `_endpoints`, `_pending_entities`, and invalidates cached properties listed in `_ZIGPY_CACHED_PROPERTIES` (name, manufacturer, model, signature, etc.) so they recompute against the new zigpy device.
- `Device.async_rebuild_from_zigpy_device(zigpy_device)`: tears down handlers/entities (emitting removal events so HA cleans up stale entities) and re-runs `_init_from_zigpy_device` against the new zigpy device.
- `Device._async_teardown(emit_entity_events)`: extracted from `on_remove`. Re-interview path passes `True` to fire `DeviceEntityRemovedEvent` for each entity; shutdown path passes `False` to avoid noise. `on_remove` now delegates to `_async_teardown(emit_entity_events=False)`.
- `Device.emit_reconfigure_done()`: extracted so the gateway can signal completion when an interview failed without a swap (zigpy keeps the old device, but HA still needs to know the operation finished).

Groups
- `Group.update_entity_subscriptions` rewritten to unconditionally tear down member-entity subscriptions and re-subscribe to current members. The old logic kept subscriptions whose `unique_id` was still present, which broke after re-interview because the unique_id matched but the underlying entity object had been replaced, so group state updates silently stopped flowing.

Tests
- `test_gateway_reconfigure_with_swap`: full reinterview path with a manufacturer/model swap; verifies entities, group subscriptions, and cached properties refresh.
- `test_gateway_reconfigure_no_swap`: reinterview where zigpy keeps the old device; verifies no rebuild happens.
- `test_gateway_device_reinterviewed_ota_path`: entry via the `device_reinterviewed` listener (OTA path) rather than `async_reinterview_device`.
- `test_add_entity_duplicate` / `test_remove_entity_nonexistent`: guard the new `_add_entity` / `_remove_entity` invariants.
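
The group-subscription change described in the summary can be sketched roughly as follows. `FakeEntity` and `GroupSketch` are hypothetical, and the subscribe/unsubscribe shape is an assumption for illustration rather than the real ZHA event API. The point is that matching on `unique_id` is not enough once the entity object behind that id has been recreated.

```python
from typing import Callable


class FakeEntity:
    """Hypothetical platform entity with a minimal listener list."""

    def __init__(self, unique_id: str) -> None:
        self.unique_id = unique_id
        self._listeners: list[Callable[[], None]] = []

    def subscribe(self, listener: Callable[[], None]) -> Callable[[], None]:
        self._listeners.append(listener)
        # Return an unsubscribe callable for this exact registration.
        return lambda: self._listeners.remove(listener)

    def notify(self) -> None:
        for listener in list(self._listeners):
            listener()


class GroupSketch:
    """Illustrative group that counts member state updates."""

    def __init__(self) -> None:
        self.updates = 0
        self._unsubs: list[Callable[[], None]] = []

    def _on_member_update(self) -> None:
        self.updates += 1

    def update_entity_subscriptions(self, members: list[FakeEntity]) -> None:
        # Drop every existing subscription, then subscribe to the current
        # objects, regardless of whether a unique_id was already known.
        for unsub in self._unsubs:
            unsub()
        self._unsubs = [m.subscribe(self._on_member_update) for m in members]
```

After a rebuild, a new `FakeEntity("light-1")` replaces the old one; refreshing unconditionally means updates from the new object reach the group and the old object is no longer listened to.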