Retry after route rediscovery on NWK_NO_ROUTE in send_packet #273
When a request fails with NWK_NO_ROUTE, send_packet rediscovered the route but then re-raised, relying on zigpy's application-level retry loop to actually re-send. zigpy 1.5.0 (#1824) moved retries out of ControllerApplication.request, so a directly-issued request was no longer retried and route rediscovery never took effect. Retry the send once within send_packet's existing recovery loop after rediscovering the route, matching the MAC_TRANSACTION_EXPIRED recovery path which already does this. This makes send_packet self-recovering regardless of caller.
Pull request overview
This PR updates send_packet recovery so NWK_NO_ROUTE failures trigger route rediscovery and an immediate resend inside zigpy-znp, rather than relying on zigpy application-level retries.
Changes:
- Renames the retry loop variable so the attempt number can be used.
- Updates comments to include route rediscovery as a reason for the second send attempt.
- Adds a NWK_NO_ROUTE recovery path that rediscovers the route and retries once.
The new NWK_NO_ROUTE retry path could combine with the existing MAC_TRANSACTION_EXPIRED recovery to consume both send attempts via `continue` without ever raising: a first-attempt NWK_NO_ROUTE followed by a first-time MAC_TRANSACTION_EXPIRED on the final attempt left the loop with succeeded=False and no exception, so an undelivered packet was reported as a successful send. Add a for/else safety net that re-raises the last error whenever the loop completes without a successful break. Add a regression test for the NWK_NO_ROUTE -> MAC_TRANSACTION_EXPIRED fall-through.
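The `for`/`else` safety net described in this commit can be sketched in isolation. This is a minimal illustration rather than the zigpy-znp code: `send_with_recovery`, the string outcomes, and this `DeliveryError` stand-in are hypothetical, and every failure is treated as recoverable so that the fall-through is easy to see.

```python
class DeliveryError(Exception):
    """Hypothetical stand-in for the error surfaced to the caller."""


def send_with_recovery(outcomes):
    """Replay one outcome per attempt through a two-attempt loop.

    `outcomes` is a list of two strings: "OK" for a delivered packet,
    anything else for a failure status (illustrative only).
    """
    last_error = None

    for retry_attempt in range(2):
        outcome = outcomes[retry_attempt]

        if outcome == "OK":
            break  # delivered: the else clause below is skipped

        # Remember the failure, then "recover" and try again.
        last_error = DeliveryError(outcome)
        continue
    else:
        # Runs only when the loop finished without `break`, i.e. every
        # attempt was consumed by `continue` and nothing was delivered.
        raise last_error

    return retry_attempt
```

Without the `else` clause, two consecutive recoverable failures would fall out of the loop and the function would return as if the send had succeeded.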
The scenario is a single deterministic config, so exact counts are stronger than `>= 1`: they pin route discovery to twice (NWK path plus MAC recovery) and verify the association is removed exactly once and re-added exactly once, catching an unbalanced remove/add that `>= 1` would miss.
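A small sketch of why exact counts are the stronger assertion, using `unittest.mock`. The `unbalanced_recovery` function and the mock names are hypothetical and exist only to simulate a remove/add imbalance; they are not taken from the test suite.

```python
from unittest.mock import Mock


def unbalanced_recovery(assoc_remove, assoc_add):
    """Deliberately buggy recovery: removes twice but re-adds only once."""
    assoc_remove()
    assoc_remove()
    assoc_add()


assoc_remove = Mock()
assoc_add = Mock()
unbalanced_recovery(assoc_remove, assoc_add)

# A loose lower bound accepts the imbalance...
loose_check_passes = assoc_remove.call_count >= 1 and assoc_add.call_count >= 1

# ...while exact counts reject it.
exact_check_passes = assoc_remove.call_count == 1 and assoc_add.call_count == 1
```

Here `loose_check_passes` is true and `exact_check_passes` is false, which is the kind of imbalance the exact assertions are meant to catch.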
Keep the comment focused on what the code does, not on zigpy's historical retry behavior.
I wonder if attaching through something like |
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff            @@
##              dev     #273   +/-   ##
=======================================
  Coverage   98.49%   98.49%
=======================================
  Files          43       43
  Lines        3579     3583    +4
=======================================
+ Hits         3525     3529    +4
  Misses         54       54
We could also think about intentionally splitting up |
Yeah, I agree. This solution isn't great. Though I'd consider temporarily releasing a version with this, so we can unbreak HA before a b1 releases with the ZHA bump. Zigpy does pass |
Should we also bump the zigpy requirement to 1.5.0 or 1.5.1 in another PR before releasing a new version? Technically, the fixes all still work with older versions; only the adjusted tests require it.
Sure, it won't hurt.
When a request fails with `NWK_NO_ROUTE`, `send_packet` rediscovered the route but then re-raised, relying on zigpy's application-level retry loop to actually re-send. zigpy 1.5.0 (zigpy/zigpy#1824) moved retries out of `ControllerApplication.request`, so a directly-issued request was no longer retried and route rediscovery never took effect.
Retry the send once within `send_packet`'s existing recovery loop after rediscovering the route, matching the `MAC_TRANSACTION_EXPIRED` recovery path which already does this. This makes `send_packet` self-recovering regardless of caller.
The new `NWK_NO_ROUTE` retry path could combine with the existing `MAC_TRANSACTION_EXPIRED` recovery to consume both send attempts via `continue` without ever raising: a first-attempt `NWK_NO_ROUTE` followed by a first-time `MAC_TRANSACTION_EXPIRED` on the final attempt left the loop with `succeeded=False` and no exception, so an undelivered packet was reported as a successful send. The second commit adds a `for`/`else` safety net that re-raises the last error whenever the recovery loop completes without a successful send, along with a regression test for that fall-through (thanks to the Copilot review for catching this).
Requires #272 and #274 for tests to pass.
Behavioral analysis: how the two recovery paths interact across versions
`send_packet`'s recovery loop is `for retry_attempt in range(2)`, a 2-attempt budget. The real change is who consumes that budget:
| | `MAC_TRANSACTION_EXPIRED` path | `NWK_NO_ROUTE` path |
|---|---|---|
| Before the PR | `_discover_route` + `continue` (consumes an attempt) | `_discover_route` + `raise` (consumes 0 internal attempts; relied on zigpy's app-level retry loop to re-run `send_packet`) |
| After the PR | [...] `continue` | `_discover_route` + `continue` (now also consumes an attempt), guarded to `retry_attempt < 1` |
Before the PR, only MAC recovery used the internal budget; NWK recovery bailed out and leaned on zigpy's external `ControllerApplication.request` retry loop (default 3 attempts). zigpy 1.5.0 deleted that external loop, so both recovery paths now share the single 2-attempt internal budget.
Empirical results (post-PR), for the two cross-order cases:
- MAC → NWK: [...] `continue`); attempt 1 hits `NWK_NO_ROUTE` but is the final attempt, so the `retry_attempt < 1` guard suppresses route rediscovery and it raises. Correct `DeliveryError`; association re-added.
- NWK → MAC: [...] `continue`; attempt 1 runs full MAC recovery + `continue`; loop exhausts → `for`/`else` raises. Correct `DeliveryError`; association re-added. (This is the case that silently succeeded before the safety net.)
Both orderings terminate correctly and the association table stays balanced (`remove == add`).
Genuine differences vs. before the PR (all benign):
- `NWK_NO_ROUTE` on the final internal attempt no longer triggers route rediscovery (the `retry_attempt < 1` guard). Negligible: route discovery only helps a subsequent send, and the next `Device.request`-level retry re-enters `send_packet` at attempt 0 and rediscovers anyway.
- `NWK → MAC` now does an assoc-remove/re-add inside a call that can't retry afterward (slightly wasteful churn). Harmless: the re-add happens in `finally`, so the table stays consistent.
Real traffic is not worse off: normal unicast goes through `Device.request`, which in zigpy 1.5.x still retries (3 attempts), each invoking `send_packet`'s 2-attempt loop, so total send attempts are higher, not lower, and a pure `NWK_NO_ROUTE` now self-recovers within a single `send_packet` call. Pure `MAC_TRANSACTION_EXPIRED` recovery is unchanged. The only callers without the external retry are direct `app.request` users (broadcasts, some internal flows), which generally don't carry a single target device/association and so don't hit this recovery.
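The two cross-order cases can be replayed with a toy model of the recovery loop. This is a sketch under stated assumptions, not the zigpy-znp implementation: statuses are plain strings, `simulate_send_packet` and `run` are hypothetical helpers, and the MAC recovery is reduced to "remove the association once, rediscover the route, retry, re-add in `finally`" as the analysis above describes it.

```python
from collections import Counter


class DeliveryError(Exception):
    """Stand-in for the error surfaced to the caller (illustrative)."""


def simulate_send_packet(statuses, counts):
    """Replay one status per attempt through a toy two-attempt loop."""
    tried_assoc_remove = False
    last_error = None

    try:
        for retry_attempt in range(2):
            status = statuses[retry_attempt]
            if status == "SUCCESS":
                break
            last_error = DeliveryError(status)

            # MAC recovery: attempted at most once per call.
            if status == "MAC_TRANSACTION_EXPIRED" and not tried_assoc_remove:
                tried_assoc_remove = True
                counts["assoc_remove"] += 1
                counts["discover_route"] += 1
                continue

            # NWK recovery: only worthwhile if another attempt remains.
            if status == "NWK_NO_ROUTE" and retry_attempt < 1:
                counts["discover_route"] += 1
                continue

            raise last_error
        else:
            # Safety net: both attempts were consumed by `continue`.
            raise last_error
    finally:
        # Restore the association if it was removed.
        if tried_assoc_remove:
            counts["assoc_add"] += 1


def run(statuses):
    """Return (raised, counts) for one simulated send_packet call."""
    counts = Counter()
    try:
        simulate_send_packet(statuses, counts)
        raised = False
    except DeliveryError:
        raised = True
    return raised, dict(counts)


mac_then_nwk = run(["MAC_TRANSACTION_EXPIRED", "NWK_NO_ROUTE"])
nwk_then_mac = run(["NWK_NO_ROUTE", "MAC_TRANSACTION_EXPIRED"])
pure_nwk = run(["NWK_NO_ROUTE", "SUCCESS"])
```

In this model both orderings raise with one association removal and one re-add; NWK → MAC performs two route discoveries and MAC → NWK performs one. A pure `NWK_NO_ROUTE` followed by a successful send returns normally after a single rediscovery.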