Support zigpy device re-interviews #755

Merged
TheJulianJES merged 39 commits into zigpy:dev from TheJulianJES:tjj/reinterviewing
Apr 29, 2026

@TheJulianJES TheJulianJES commented Apr 29, 2026

AI summary

Support device re-interview (HA reconfigure + post-OTA). Note the description is not fully up-to-date with the latest commits.

When zigpy re-interviews a device, the underlying zigpy.device.Device object is replaced with a new one (different endpoints/clusters/quirks). Previously ZHA had no way to swap its Device's zigpy reference, so reconfigures and post-OTA re-interviews left ZHA holding stale endpoints, cluster handlers, and platform entities.
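The stale-reference problem described above can be illustrated with a minimal sketch. `FakeZigpyDevice`, `Wrapper`, and the `registry` dict are hypothetical stand-ins, not zigpy or ZHA classes; the only point is that a wrapper keeps pointing at the old object until it is explicitly rebuilt.

```python
from dataclasses import dataclass, field


@dataclass
class FakeZigpyDevice:
    """Hypothetical, heavily simplified stand-in for zigpy.device.Device."""

    model: str
    endpoints: dict = field(default_factory=dict)


class Wrapper:
    """Hypothetical stand-in for the ZHA Device wrapper."""

    def __init__(self, zigpy_device):
        self.rebuild(zigpy_device)

    def rebuild(self, zigpy_device):
        # Re-derive everything that depends on the zigpy object.
        self.zigpy_device = zigpy_device
        self.endpoint_ids = sorted(zigpy_device.endpoints)


registry = {"00:11": FakeZigpyDevice("v1", {1: "on_off"})}
wrapper = Wrapper(registry["00:11"])

# A re-interview replaces the registry entry with a brand-new object.
registry["00:11"] = FakeZigpyDevice("v2", {1: "on_off", 2: "level"})
stale = wrapper.zigpy_device is not registry["00:11"]

# Swapping the reference and re-deriving state brings the wrapper up to date.
wrapper.rebuild(registry["00:11"])
```

Before `rebuild` is called, the wrapper still reports only endpoint 1, which is the kind of divergence this PR sets out to remove.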

Gateway

  • Gateway.async_reinterview_device(ieee) — entry point for HA's "reconfigure device" action. Calls zigpy_device.reinterview(), then emits reconfigure_done so the HA frontend stops showing "reconfiguring". If zigpy swaps the device, the device_reinterviewed listener handles the rebuild; if not, this is a no-op.
  • Gateway.device_reinterviewed(device) — listener for zigpy's device_reinterviewed event (fired after OTA or successful reinterview). Cancels any in-flight init task for the device and spawns _async_device_reinterviewed.
  • Gateway._async_device_reinterviewed(new_zigpy_device) — rebuilds the ZHA device via async_rebuild_from_zigpy_device, runs async_configure + async_initialize, refreshes group subscriptions, and emits DeviceFullInitEvent so HA re-pairs entities.

Device

  • Device._init_from_zigpy_device(zigpy_device) — extracted from __init__, sets up zigpy reference, quirk metadata, ZDO handler, and endpoints. Clears _on_remove_callbacks, _endpoints, _pending_entities, and invalidates cached properties listed in _ZIGPY_CACHED_PROPERTIES (name, manufacturer, model, signature, etc.) so they recompute against the new zigpy device.
  • Device.async_rebuild_from_zigpy_device(zigpy_device) — tears down handlers/entities (emitting removal events so HA cleans up stale entities) and re-runs _init_from_zigpy_device against the new zigpy device.
  • Device._async_teardown(emit_entity_events) — extracted from on_remove. Re-interview path passes True to fire DeviceEntityRemovedEvent for each entity; shutdown path passes False to avoid noise. on_remove now delegates to _async_teardown(emit_entity_events=False).
  • Device.emit_reconfigure_done() — extracted so the gateway can signal completion when an interview failed without a swap (zigpy keeps the old device, but HA still needs to know the operation finished).
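As a rough sketch of the teardown split described in the Device bullets, assuming a simplified device that only tracks entities and a list of emitted events (`FakeDevice`, `FakeEntity`, and the `"entity_removed"` tuple are illustrative names, not ZHA's):

```python
import asyncio


class FakeEntity:
    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.removed = False

    async def on_remove(self):
        self.removed = True


class FakeDevice:
    def __init__(self, entity_ids):
        self.entities = {uid: FakeEntity(uid) for uid in entity_ids}
        self.events = []

    async def _async_teardown(self, *, emit_entity_events):
        # Shared teardown; the flag decides whether consumers are notified.
        for uid, entity in list(self.entities.items()):
            await entity.on_remove()
            del self.entities[uid]
            if emit_entity_events:
                self.events.append(("entity_removed", uid))

    async def on_remove(self):
        # Shutdown path: stay quiet.
        await self._async_teardown(emit_entity_events=False)

    async def async_rebuild(self, new_entity_ids):
        # Re-interview path: announce removals, then rediscover entities.
        await self._async_teardown(emit_entity_events=True)
        self.entities = {uid: FakeEntity(uid) for uid in new_entity_ids}


async def demo():
    rebuilt = FakeDevice(["switch-1"])
    await rebuilt.async_rebuild(["switch-1", "light-2"])
    shut_down = FakeDevice(["switch-1"])
    await shut_down.on_remove()
    return rebuilt, shut_down


rebuilt, shut_down = asyncio.run(demo())
```

Both paths run the same cleanup; only the rebuild path records removal events, mirroring the `emit_entity_events=True` / `False` distinction in the description.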

Groups

  • Group.update_entity_subscriptions rewritten to unconditionally tear down member-entity subscriptions and re-subscribe to current members. The old logic kept subscriptions whose unique_id was still present, which broke after re-interview because the unique_id matched but the underlying entity object had been replaced — group state updates silently stopped flowing.
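The failure mode in the Groups bullet (same `unique_id`, different object) can be reproduced with a small sketch. `FakeEntity` and `FakeGroup` are hypothetical; the subscribe/unsubscribe shape is an assumption chosen for illustration and is not ZHA's event API.

```python
class FakeEntity:
    def __init__(self, unique_id):
        self.unique_id = unique_id
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)
        # Return an unsubscribe handle, a common pattern for listeners.
        return lambda: self._listeners.remove(callback)

    def publish(self, state):
        for callback in list(self._listeners):
            callback(state)


class FakeGroup:
    def __init__(self, get_members):
        self._get_members = get_members
        self._unsubscribes = []
        self.seen_states = []

    def update_entity_subscriptions(self):
        # Unconditionally drop every old subscription ...
        for unsubscribe in self._unsubscribes:
            unsubscribe()
        # ... and subscribe to whatever objects are members right now.
        self._unsubscribes = [
            member.subscribe(self.seen_states.append)
            for member in self._get_members()
        ]


members = [FakeEntity("light-1")]
group = FakeGroup(lambda: members)
group.update_entity_subscriptions()

# Re-interview: same unique_id, but a brand-new entity object.
old_entity = members[0]
members[0] = FakeEntity("light-1")
group.update_entity_subscriptions()

members[0].publish("on")   # reaches the group
old_entity.publish("off")  # no longer subscribed, so it is ignored
```

A version that skipped re-subscribing whenever the `unique_id` was already known would have kept listening to `old_entity` and missed the `"on"` update.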

Tests

  • test_gateway_reconfigure_with_swap — full reinterview path with a manufacturer/model swap; verifies entities, group subscriptions, and cached properties refresh.
  • test_gateway_reconfigure_no_swap — reinterview where zigpy keeps the old device; verifies no rebuild happens.
  • test_gateway_device_reinterviewed_ota_path — entry via the device_reinterviewed listener (OTA path) rather than async_reinterview_device.
  • test_add_entity_duplicate / test_remove_entity_nonexistent — guard the new _add_entity / _remove_entity invariants.

TheJulianJES and others added 29 commits March 16, 2026 01:06
Co-authored-by: TheJulianJES <TheJulianJES@users.noreply.github.com>

# Conflicts:
#	tests/test_device.py
#	zha/zigbee/device.py
These happened because of a zigpy bug that's now fixed, but maybe we should be safe here? Will re-check later.
# Conflicts:
#	tests/test_device.py
#	zha/zigbee/device.py
Copilot AI review requested due to automatic review settings April 29, 2026 07:56

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.64%. Comparing base (3e6a038) to head (c4dd7d4).
⚠️ Report is 1 commit behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #755      +/-   ##
==========================================
+ Coverage   97.63%   97.64%   +0.01%     
==========================================
  Files          62       62              
  Lines       10818    10873      +55     
==========================================
+ Hits        10562    10617      +55     
  Misses        256      256              

Copilot AI left a comment

Pull request overview

Adds ZHA support for zigpy “device re-interview” device-object swaps by rebuilding the ZHA Device, refreshing group entity subscriptions, and extending test coverage to cover OTA-triggered and user-triggered reinterview flows.

Changes:

  • Add gateway handling for zigpy device_reinterviewed events and an HA-triggered async_reinterview_device() entrypoint.
  • Add ZHA Device teardown/rebuild helpers and emit reconfigure-done consistently.
  • Refresh group member-entity subscriptions after rebuild and extend tests to validate swap/rebuild behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| zha/zigbee/group.py | Rebuilds member entity subscriptions to follow recreated entity objects after a reinterview swap. |
| zha/zigbee/device.py | Introduces teardown + rebuild from a new zigpy device and moves “reconfigure done” emission into a helper. |
| zha/application/gateway.py | Adds reinterview event handling, rebuild pipeline, and HA-triggered reinterview method. |
| tests/test_update.py | Updates OTA tests to avoid unintended reinterview side effects; adds post-OTA reinterview swap assertions. |
| tests/test_device.py | Adds tests covering gateway reinterview with swap, without swap, and zigpy-triggered OTA reinterview event path. |

Comment thread zha/application/gateway.py Outdated
Comment thread zha/application/gateway.py Outdated
Comment on lines +487 to +511
try:
    await zha_device.async_configure()
    await zha_device.async_initialize()
except Exception:  # noqa: BLE001
    _LOGGER.warning(
        "Failed to configure/initialize device %s after reinterview",
        new_zigpy_device.ieee,
        exc_info=True,
    )

# Refresh group subscriptions so groups re-subscribe to the new
# platform entities. Without this, group state updates break
# because the old entity subscriptions are dead.
for group in self._groups.values():
    group.update_entity_subscriptions()

self.emit(
    ZHA_GW_MSG_DEVICE_FULL_INIT,
    DeviceFullInitEvent(
        device_info=ExtendedDeviceInfoWithPairingStatus(
            pairing_status=DevicePairingStatus.CONFIGURED,
            **zha_device.extended_device_info.__dict__,
        ),
    ),
)

Copilot AI Apr 29, 2026

_async_device_reinterviewed() emits ZHA_GW_MSG_DEVICE_FULL_INIT with pairing_status=CONFIGURED even when async_configure() / async_initialize() raised (the exception is logged but execution continues). This can cause consumers to treat a failed reinterview as successfully configured. Consider only emitting CONFIGURED on success, and emitting a different status (or skipping the full-init event) when the configure/initialize step fails.

Contributor Author

I think we need to do this for the frontend to not get stuck(?)

Comment thread zha/zigbee/device.py Outdated
The Device already holds the zigpy device reference, so the second
lookup against application_controller.devices was redundant and the
None-guard was hiding a state-divergence bug rather than handling a
real race.

`consider_unavailable_time` and `_available` are computed from
`is_mains_powered` and `is_active_coordinator → is_coordinator`, both of
which are `@cached_property`s listed in `_ZIGPY_CACHED_PROPERTIES`. The
clear loop ran *after* those reads, so on re-interview the post-swap
values were derived from pre-swap cached state.
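The ordering issue in this commit message can be sketched with `functools.cached_property`, which stores its result in the instance `__dict__`, so popping the key invalidates it. `FakeZigpyDevice`, the `_CACHED` tuple, and the timeout values below are illustrative assumptions, not ZHA's actual definitions.

```python
from functools import cached_property


class FakeZigpyDevice:
    def __init__(self, mains_powered):
        self.mains_powered = mains_powered


class Device:
    # Hypothetical analogue of _ZIGPY_CACHED_PROPERTIES.
    _CACHED = ("is_mains_powered",)

    def __init__(self, zigpy_device):
        self._init_from_zigpy_device(zigpy_device)

    @cached_property
    def is_mains_powered(self):
        return self._zigpy_device.mains_powered

    def _init_from_zigpy_device(self, zigpy_device):
        self._zigpy_device = zigpy_device
        # Invalidate first: drop cached values from the instance __dict__.
        for name in self._CACHED:
            self.__dict__.pop(name, None)
        # Only now read anything derived from the cached properties.
        # The two timeouts are made-up numbers for the demo.
        self.consider_unavailable_time = 7200 if self.is_mains_powered else 21600


device = Device(FakeZigpyDevice(mains_powered=True))
before = device.consider_unavailable_time

# Re-interview swaps in a battery-powered device.
device._init_from_zigpy_device(FakeZigpyDevice(mains_powered=False))
after = device.consider_unavailable_time
```

If the invalidation loop ran after the `consider_unavailable_time` assignment instead, `after` would still reflect the mains-powered value from the old device.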
# Conflicts:
#	zha/application/gateway.py

Mirrors the fix landed in zigpy#749 for `device_initialized`. The previous
`pop(..., None)` lambda silently drops the replacement task's entry
when a cancelled task's done-callback runs, leaving the in-flight task
untracked.

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


Comment thread zha/application/gateway.py Outdated
Comment thread zha/application/gateway.py Outdated
When zigpy swaps the underlying device, `_async_device_reinterviewed`
runs `async_configure()` which emits its own reconfigure-done signal.
Emitting again from `async_reinterview_device` produced a duplicate
signal that fired before the rebuild finished, so consumers saw
"configuration done" mid-flight.  Detect the swap by comparing against
`application_controller.devices` (which zigpy updates synchronously
inside `reinterview()`) and only emit on the no-swap path.

If `reinterview()` raised (timeout, ZigbeeException, etc.) the function
exited without emitting reconfigure_done, leaving the HA frontend stuck
on "reconfiguring" indefinitely.  Wrap the call in `try/finally` so the
no-swap branch fires the completion signal in every exit path.
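The two commit messages above (swap detection and the `try/finally` wrapper) can be combined into one hedged sketch. `reinterview_and_signal`, `FakeZhaDevice`, and the plain `devices` dict are hypothetical; the dict stands in for `application_controller.devices`, which the commit message says is updated synchronously inside `reinterview()`.

```python
import asyncio


class FakeZhaDevice:
    def __init__(self):
        self.done_signals = 0

    def emit_reconfigure_done(self):
        self.done_signals += 1


async def reinterview_and_signal(devices, ieee, zha_device, reinterview):
    before = devices[ieee]
    try:
        await reinterview()
    finally:
        # No swap (including the failure case): nobody else will signal
        # completion, so do it here. On a swap, the rebuild path is
        # expected to emit the signal once it has finished.
        if devices.get(ieee) is before:
            zha_device.emit_reconfigure_done()


async def demo():
    results = {}
    devices = {"a": object()}

    async def no_swap():
        pass

    async def swap():
        devices["a"] = object()

    async def fail():
        raise TimeoutError("interview timed out")

    dev = FakeZhaDevice()
    await reinterview_and_signal(devices, "a", dev, no_swap)
    results["no_swap"] = dev.done_signals

    dev = FakeZhaDevice()
    await reinterview_and_signal(devices, "a", dev, swap)
    results["swap"] = dev.done_signals

    dev = FakeZhaDevice()
    try:
        await reinterview_and_signal(devices, "a", dev, fail)
    except TimeoutError:
        results["failure"] = dev.done_signals
    return results


results = asyncio.run(demo())
```

In this sketch the exception still propagates to the caller; the `finally` block only guarantees that the completion signal is not lost on the way out.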
When `async_configure()` or `async_initialize()` raised after a swap,
the gateway logged the failure but still emitted
`DeviceFullInitEvent(pairing_status=CONFIGURED)`, misreporting a
half-rebuilt device as fully configured.  Worse, `async_configure()`
hadn't reached its own `emit_reconfigure_done()`, so the HA reconfigure
dialog would hang waiting for `zha_channel_cfg_done`.

Now: emit `reconfigure_done` explicitly on failure to unstick the
frontend, skip the `FullInit(CONFIGURED)` emit, and keep the group
subscription refresh so groups drop stale references regardless.

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


Comment thread zha/zigbee/device.py
Comment thread zha/application/gateway.py Outdated
Previously _remove_entity ran `del self._platform_entities[key]` only
after `await entity.on_remove()` returned cleanly.  If on_remove raised,
_async_teardown caught the exception and continued — but the dict still
held the stale entity.  On reinterview, _add_pending_entities then saw
the same key already present and silently dropped the rediscovered
replacement, leaving a permanent zombie that shadowed the new entity.

Move the dict cleanup (and removal-event emit) into a `finally` so the
device's tracking is always consistent.  The exception still propagates
to the caller for logging.
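A small sketch of the `finally`-based cleanup described in this commit message, using hypothetical `FakeEntity` and `FakeDevice` classes: the tracking dict and the removal event are handled even when `on_remove` raises, and the exception still reaches the caller.

```python
import asyncio


class FakeEntity:
    def __init__(self, fail=False):
        self.fail = fail

    async def on_remove(self):
        if self.fail:
            raise RuntimeError("cleanup failed")


class FakeDevice:
    def __init__(self):
        self.platform_entities = {}
        self.events = []

    async def _remove_entity(self, key):
        entity = self.platform_entities[key]
        try:
            await entity.on_remove()
        finally:
            # Always drop the entry and announce the removal, so a later
            # rediscovery is not shadowed by a stale object.
            del self.platform_entities[key]
            self.events.append(("removed", key))


async def demo():
    device = FakeDevice()
    device.platform_entities["light-1"] = FakeEntity(fail=True)
    raised = False
    try:
        await device._remove_entity("light-1")
    except RuntimeError:
        raised = True
    return device, raised


device, raised = asyncio.run(demo())
```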
`async_configure()` emits `reconfigure_done` internally on success.  If
`async_initialize()` then raised, the previous `not configured_ok`
fallback would emit a second `reconfigure_done`.  Track configure
success separately so the fallback only fires when configure itself
failed.
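The configure/initialize failure handling from the commit messages above can be sketched as follows. Everything here is a hypothetical model: `FakeDevice.async_configure` emits the completion signal on success (as the commit message says the real method does), and `finish_reinterview` returns a log of what it would have emitted rather than calling ZHA's event machinery.

```python
import asyncio


class FakeDevice:
    def __init__(self, *, fail_configure=False, fail_initialize=False):
        self.fail_configure = fail_configure
        self.fail_initialize = fail_initialize
        self.done_signals = 0

    def emit_reconfigure_done(self):
        self.done_signals += 1

    async def async_configure(self):
        if self.fail_configure:
            raise RuntimeError("configure failed")
        self.emit_reconfigure_done()  # emitted internally on success

    async def async_initialize(self):
        if self.fail_initialize:
            raise RuntimeError("initialize failed")


async def finish_reinterview(device):
    log = []
    configure_ok = False
    initialize_ok = False
    try:
        await device.async_configure()
        configure_ok = True
        await device.async_initialize()
        initialize_ok = True
    except Exception:
        # Only fall back when configure itself never emitted the signal.
        if not configure_ok:
            device.emit_reconfigure_done()
    log.append("groups_refreshed")  # always refresh, even after a failure
    if initialize_ok:
        log.append("full_init")  # only report CONFIGURED on full success
    return log


async def demo():
    cases = {
        "success": {},
        "configure_fails": {"fail_configure": True},
        "initialize_fails": {"fail_initialize": True},
    }
    outcomes = {}
    for name, kwargs in cases.items():
        device = FakeDevice(**kwargs)
        log = await finish_reinterview(device)
        outcomes[name] = (device.done_signals, log)
    return outcomes


outcomes = asyncio.run(demo())
```

Each case ends with exactly one completion signal, and the full-init entry appears only when both steps succeed.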

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


    return

if zha_device.is_active_coordinator:
    _LOGGER.debug("Skipping reinterview for active coordinator %s", ieee)

Copilot AI Apr 29, 2026

async_reinterview_device() returns early for the active coordinator without calling emit_reconfigure_done(). Since this method is documented as the entry point for HA's reconfigure flow, skipping without the completion signal can leave the reconfigure UI waiting indefinitely. Consider calling zha_device.emit_reconfigure_done() before returning (or explicitly documenting/ensuring the coordinator can never reach this path).

Suggested change
- _LOGGER.debug("Skipping reinterview for active coordinator %s", ieee)
+ _LOGGER.debug("Skipping reinterview for active coordinator %s", ieee)
+ zha_device.emit_reconfigure_done()

Contributor Author

We don't show "reconfigure" in the frontend for coordinators, so I think we can skip this.
Generally, this should all be cleaned up in the future a bit, once the frontend actually properly supports re-interviewing.

@TheJulianJES
Contributor Author

Noticed no issues during testing. For now, re-interviewing cannot be triggered manually from HA Core; it only happens when a device is updated.

@TheJulianJES TheJulianJES merged commit f443146 into zigpy:dev Apr 29, 2026
15 checks passed

3 participants