Reset EZSP stack on error frames or missing heartbeats #147
I can throw this on my old problematic system early next week and do some testing for you. |
@walthowd I appreciate the feedback. Keep in mind this PR addresses one area only (resetting the NCP on an error frame), so it does not include any other workarounds related to pending sequence handling. |
and please also keep in mind that HA dev tree depends on |
46f32c7 to f64cd7c
please merge after PR #150 which should make code much cleaner |
@Adminiuga Running on my old system today, I pulled the USB adapter out at 12:09:53 and it was successfully reset and ZHA commands continued to work after it was reset. I'll leave it running longer today and see if I can get a real UART error https://www.dropbox.com/s/ysiaa1sipy44xsd/home-assistant-serial-reset.txt?dl=0 |
That's really good feedback and offers some more visibility into what was happening (but not why)
For whatever reason it looks like the serial port is disconnected, like when the device is unplugged. You get that weird
PS: this does not mean that exactly the same issue was happening to everyone else with similar symptoms. |
@walthowd do you mind running the mashup branch of PRs 150, 149 and 148? It has all three PRs mashed together and shouldn't leak any TSNs. |
177edea to 63615f8
@Adminiuga I pulled the mashup branch and ran the system for another few hours. There were no naturally occurring errors, but I pulled the USB again at 19:26:21 and saw the same behaviour as before. https://www.dropbox.com/s/vu6kdqud9yv07wk/home-assistant-serial-reset.log.gz?dl=0 It looks like the mashup branch does not have the extra logging to see if any sequence IDs are leaking? |
Nah, I don't have any extra debug logging in this branch, as I'm hoping to get it included upstream. I'd be very surprised if it leaks any TSNs, as everything is cleaned up by the "context manager", exception or no exception. This line pushes new "pending" request to
|
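The cleanup guarantee described above can be illustrated with a minimal sketch. This is not the code from the PR: `PendingRequests` and its `new()` method are hypothetical names, and the registry is assumed to be a plain dict keyed by sequence number. The point is only that a `finally` clause inside a context manager removes the entry whether the body returns normally or raises.

```python
import contextlib


class PendingRequests:
    """Hypothetical registry of in-flight requests, keyed by sequence number."""

    def __init__(self):
        self._pending = {}

    @contextlib.contextmanager
    def new(self, seq, request):
        # Refuse to overwrite an entry that is still in flight.
        if seq in self._pending:
            raise ValueError("sequence %s is already pending" % seq)
        self._pending[seq] = request
        try:
            yield request
        finally:
            # Runs on normal exit and on any exception, so nothing leaks.
            self._pending.pop(seq, None)

    def __len__(self):
        return len(self._pending)
```

With this shape, a request that times out or fails inside the `with` block still releases its sequence number.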
Make ControllerApplication.initialize() a retryable method. Add bellows specific exceptions. Handle EzspError exception in requests.
Refactor ControllerApplication reset flow. Reset ControllerApplication on watchdog failures. Allow UART reconnects. Initiate ControllerApplication reset on UART connection loss.
Serial connection loss restarts reset task. Error frames do not pre-empt a running reset task.
If we receive a Reset ACK frame to a reset we've not initiated, then request ControllerApplication reset.
New ControllerError exception. Execute broadcast() and/or request() only if ControllerApplication is running.
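The pre-emption rule in these commit messages can be sketched as follows. This is an illustration under assumptions, not the bellows implementation: `ResetController`, `on_error_frame()` and `on_connection_lost()` are hypothetical names, and the reset itself is replaced by a short sleep.

```python
import asyncio


class ResetController:
    """Hypothetical reset bookkeeping; not the bellows API."""

    def __init__(self):
        self._reset_task = None
        self.resets = 0

    async def _reset(self):
        await asyncio.sleep(0.01)  # placeholder for the real re-initialization
        self.resets += 1

    def _reset_running(self):
        return self._reset_task is not None and not self._reset_task.done()

    def on_error_frame(self):
        # An error frame must not pre-empt a reset that is already running.
        if self._reset_running():
            return
        self._reset_task = asyncio.create_task(self._reset())

    def on_connection_lost(self):
        # A lost serial link invalidates any reset in progress: restart it.
        if self._reset_running():
            self._reset_task.cancel()
        self._reset_task = asyncio.create_task(self._reset())
```

Both handlers must be called from inside a running event loop, since `asyncio.create_task()` requires one.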
3c65b1b to 3d08520
@damarco any chance we can get this in a release in time for .91? |
give me two days, need to do some more testing, but so far I was satisfied with the results on Pi2 |
8e2606b to ddbfa2e
I think this is the version I want to go with. Will be running it for a few days and remove WIP tag once the dust settles. |
@damarco @dmulcahey I'm ready with this one. |
Properly handle replies arriving past EZSP cmd timeout.
Stop watchdog on controller shutdown.
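One way to handle a reply that arrives after its command has timed out is sketched below. The names (`CommandTracker`, `command()`, `frame_received()`) are hypothetical and the frame format is reduced to a sequence number plus payload; the actual change in this PR may differ. The idea is that the timed-out entry is removed from the table, and a late frame for an unknown sequence is logged and dropped instead of raising.

```python
import asyncio
import logging

LOGGER = logging.getLogger(__name__)


class CommandTracker:
    """Hypothetical table of commands awaiting a reply, keyed by sequence."""

    def __init__(self):
        self._awaiting = {}
        self.late_replies = 0

    async def command(self, seq, timeout):
        fut = asyncio.get_running_loop().create_future()
        self._awaiting[seq] = fut
        try:
            # wait_for cancels the future and raises TimeoutError on expiry.
            return await asyncio.wait_for(fut, timeout)
        finally:
            self._awaiting.pop(seq, None)

    def frame_received(self, seq, payload):
        fut = self._awaiting.get(seq)
        if fut is None or fut.done():
            # Reply arrived past the timeout: count it, log it, drop it.
            self.late_replies += 1
            LOGGER.debug("Dropping late reply for sequence %s", seq)
            return
        fut.set_result(payload)
```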
ddbfa2e to a4c89db
Thanks! |
I believe I have an issue similar to #124 (zigbee devices initially work, then stop after some time). Is there a way I can test this new code on my system? I can provide feedback on the effect. |
@rjgrandy what hardware and environment are you running on? I hope this will make it into 0.92. |
@Adminiuga I am using a HUSBZB with a RPI3B+, I have everything running using docker. I get a number of the below errors:
Then I start getting the below message over and over:
|
@rjgrandy looks like it would be in HA 0.91 next Wed. It is in beta now. |
@Adminiuga Ok awesome, if I install one of the betas now, I can check to see if it helps my issue? |
yes
|
@Adminiuga I pulled beta2. So far my zigbee devices do seem to remain controllable; however, it is writing a lot more to the log than before (I do have everything zigbee-related set to debug in the logger). The zigbee devices appear to react slowly, and it looks like it is constantly resetting EZSP and ASH.
|
Just to follow up on my last comment, I installed beta3 this morning, disabled all logger debug output for zigbee related things, deleted the home-assistant_v2.db (it was almost 3GB). After this, so far the zigbee devices seem pretty stable and responsive. Also I like the new ZHA "Add Devices" screen :) |
This is one of the fixes targeted to address #124
In my tests it successfully reconnected when the USB stick was unplugged and plugged back in. I was also able to reproduce the Error frame with the "Maximum NAK exceeded" condition by artificially suppressing ACKs to the NCP, and the stack recovered from it successfully.
With that said, I'm looking for volunteers to test drive it and additional feedback.
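For testers who want a mental model of the "missing heartbeats" half of the title, here is a rough sketch of a watchdog loop. It is an assumption-laden illustration, not the code in this PR: `Watchdog`, `ping`, `on_failure`, `interval` and `max_failures` are all hypothetical names and parameters.

```python
import asyncio


class Watchdog:
    """Hypothetical heartbeat watchdog; requests a reset after repeated misses."""

    def __init__(self, ping, on_failure, interval=1.0, max_failures=3):
        self._ping = ping              # coroutine function probing the NCP
        self._on_failure = on_failure  # callback that requests a reset
        self._interval = interval
        self._max_failures = max_failures
        self._task = None

    def start(self):
        self._task = asyncio.create_task(self._run())

    async def stop(self):
        # Called on shutdown so the loop does not outlive the controller.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    async def _run(self):
        failures = 0
        while True:
            await asyncio.sleep(self._interval)
            try:
                await asyncio.wait_for(self._ping(), self._interval)
                failures = 0
            except (asyncio.TimeoutError, OSError):
                failures += 1
                if failures >= self._max_failures:
                    self._on_failure()
                    return
```

The loop exits after requesting a reset, on the assumption that whoever performs the reset starts a fresh watchdog afterwards.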