Application should re-establish connection to uart after error #37
Comments
I noticed I could control the Zigbee switch but not get its status. In conclusion, I think the Pine64 gives slightly better reliability, but it still has some issues.
I updated to the no-removal branch. I'm thinking this may work :-)
2017-08-04 12:26:33 DEBUG (MainThread) [bellows.uart] Data frame: b'1451b1ed502a15b459944a2dab55922f63f3e30d12316e8e00396788fd673fa7e3437e'
Burst of error frames:
2017-08-04 13:28:59 DEBUG (MainThread) [bellows.uart] Data frame: b'2c59b1ed502a15b259944a2dae55923b61f7720b12316c22fcc69e03fcd66d77ebcde6837e'
Do you always get these errors when it stops working? I'm guessing not, based on some of the other information you provided.
Good progress with the no-removal branch: I no longer lose access to all devices. I again suspect the smart socket acting as a repeater, since the sensor furthest away is the one with the issue; I also noticed a lot of lag on that motion sensor. I have restarted from scratch, deleting zigbee.db, and now only have the two motion sensors, to see how things work over the next 12 hours. Does bellows support a way to remove devices from zigbee.db? Do you recommend a particular set of log settings for debugging Zigbee? I have attached the setup log and a normal startup log (normal.log.txt).
@rcloran Apparently I was wrong: the frame errors do seem to appear just as it stops responding.
2017-08-07 14:09:16 DEBUG (MainThread) [bellows.uart] Data frame: b'5e67b5ca4c903eb659fb47257f437e'
These error frames are for error 0x51, described as "Error: Exceeded maximum ACK timeout count" in the documentation (https://www.silabs.com/documents/public/user-guides/UG101.pdf). This is an unrecoverable error. bellows writes the ACK as soon as it reads the frame, so a late ACK means bellows was not getting a chance to run. Given that all the timestamps in the logs you've provided fall within the same second, one of your hass components is probably blocking the event loop. bellows could possibly improve in a couple of ways here: re-establishing the connection to the NCP (network co-processor), and/or running the UART communication in a separate thread. I'm unlikely to work on either of these in the foreseeable future. I suggest you hunt down the component that's blocking the event loop and get that fixed.
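To illustrate how one blocking component can starve the UART reader, here is a minimal sketch in plain asyncio (not bellows or Home Assistant code). `uart_reader`, `blocking_component` and `offloaded_component` are hypothetical stand-ins, and the 0.02 s / 0.3 s timings are arbitrary; the real ASH ACK timeout values are specified in UG101 and are not reproduced here.

```python
import asyncio
import time


async def uart_reader(lateness, period=0.02, ticks=20):
    """Hypothetical stand-in for a UART reader that must wake up promptly to ACK.

    Records how late each wake-up is relative to when it was due.
    """
    loop = asyncio.get_running_loop()
    for _ in range(ticks):
        due = loop.time() + period
        await asyncio.sleep(period)
        lateness.append(loop.time() - due)


async def blocking_component():
    # Misbehaving component: a synchronous call freezes the whole event loop.
    await asyncio.sleep(0.05)
    time.sleep(0.3)


async def offloaded_component():
    # Same work, moved to the default thread pool so the loop keeps running.
    await asyncio.sleep(0.05)
    await asyncio.get_running_loop().run_in_executor(None, time.sleep, 0.3)


async def worst_lateness(component):
    lateness = []
    await asyncio.gather(uart_reader(lateness), component())
    return max(lateness)


if __name__ == "__main__":
    for comp in (blocking_component, offloaded_component):
        delay = asyncio.run(worst_lateness(comp))
        print(f"{comp.__name__}: worst reader wake-up delay {delay:.3f} s")
```

With the blocking variant, the reader's worst wake-up delay is roughly the length of the blocking call; with the offloaded variant it stays in the low milliseconds. That gap is the window in which an NCP would be waiting for an ACK.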
@rcloran Thanks for explaining the cause and possible fixes. I have very few components loaded (frontend, updater, zwave, recorder, history, logger, sun, ios).
Hello, I believe I am also experiencing this error, as I find myself needing to restart Home Assistant every few hours to regain control of my Zigbee devices. Is there anything I can do to find the cause of this error?
I'm running into the same issue: devices stop responding after a few hours. I captured a few logs, and I always see the error frame b'c20251a8bd7e' before it stops responding.
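The frames quoted in this thread can be decoded by their ASH control byte. The sketch below is a hypothetical helper (not part of bellows) based on the frame layout described in UG101: DATA frames carry frmNum, reTx and ackNum in the control byte, ACK/NAK start with bits 100/101, and RST, RSTACK and ERROR use 0xC0, 0xC1 and 0xC2. It assumes the hex string is exactly what the bellows debug log prints: already unstuffed, with the 2-byte CRC and the trailing 0x7E flag still attached. Only the 0x51 error code discussed above is named; look up any other codes in UG101.

```python
import binascii
from dataclasses import dataclass

FLAG = 0x7E
# Only the code discussed in this thread; see UG101 for the full table.
ERROR_CODES = {0x51: "Exceeded maximum ACK timeout count"}


@dataclass
class AshFrame:
    kind: str
    control: int
    crc_ok: bool
    details: dict


def classify(hex_frame: str) -> AshFrame:
    """Decode an ASH frame as printed in the log (assumed already unstuffed)."""
    raw = bytes.fromhex(hex_frame)
    if len(raw) < 4 or raw[-1] != FLAG:
        raise ValueError("expected <control><data...><crc16><0x7E>")
    body, crc = raw[:-3], int.from_bytes(raw[-3:-1], "big")
    # ASH uses CRC-CCITT (polynomial 0x1021) with initial value 0xFFFF.
    crc_ok = binascii.crc_hqx(body, 0xFFFF) == crc
    c = body[0]
    if c in (0xC1, 0xC2) and len(body) < 3:
        raise ValueError("RSTACK/ERROR frame is too short")
    if c & 0x80 == 0:
        kind = "DATA"
        details = {"frm_num": (c >> 4) & 0x07, "re_tx": bool(c & 0x08), "ack_num": c & 0x07}
    elif c & 0xE0 == 0x80:
        kind, details = "ACK", {"n_rdy": bool(c & 0x08), "ack_num": c & 0x07}
    elif c & 0xE0 == 0xA0:
        kind, details = "NAK", {"n_rdy": bool(c & 0x08), "ack_num": c & 0x07}
    elif c == 0xC0:
        kind, details = "RST", {}
    elif c == 0xC1:
        kind, details = "RSTACK", {"version": body[1], "reset_code": body[2]}
    elif c == 0xC2:
        code = body[2]
        kind = "ERROR"
        details = {"version": body[1], "code": code,
                   "meaning": ERROR_CODES.get(code, "see UG101")}
    else:
        kind, details = "UNKNOWN", {}
    return AshFrame(kind, c, crc_ok, details)


if __name__ == "__main__":
    for frame in ("c20251a8bd7e", "5e67b5ca4c903eb659fb47257f437e"):
        print(frame, classify(frame))
```

Decoded this way, b'c20251a8bd7e' is an ERROR frame (version 2, code 0x51) whose CRC checks out. The data frames logged at 13:28:59 and 14:09:16 (control bytes 0x2c and 0x5e) have the reTx bit set, which ASH uses to mark a retransmitted frame.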
I am also now having this issue after moving from 0.57.2 to 0.61.1. I am not running in a Docker container, though. I mostly see these warnings every 1-5 minutes, along with the occasional "timer got out of sync resetting", although those seem to come in bunches and then I don't see them for a while.
2018-01-30 13:25:36 WARNING (MainThread) [bellows.zigbee.application] Unexpected message send notification
I have the same issue with the current hass version. It happens randomly every few hours. Do you know how we can debug which component is causing this? Thanks.
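One generic way to find a component that blocks the event loop is asyncio's debug mode, which logs a warning for every callback or task step that holds the loop longer than `loop.slow_callback_duration` (0.1 s by default). Setting the environment variable `PYTHONASYNCIODEBUG=1` also enables it. The sketch below is plain asyncio: `suspect_component` is a hypothetical stand-in, and whether or how Home Assistant exposes this switch is not covered here.

```python
import asyncio
import logging
import time


class SlowCallbackCollector(logging.Handler):
    """Collects the warnings asyncio's debug mode emits for slow callbacks."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def suspect_component():
    # Hypothetical component doing synchronous I/O directly on the event loop.
    time.sleep(0.2)


async def main():
    # In debug mode, asyncio warns about any step that runs longer than this.
    asyncio.get_running_loop().slow_callback_duration = 0.1
    await suspect_component()


def find_slow_callbacks():
    collector = SlowCallbackCollector()
    asyncio_logger = logging.getLogger("asyncio")
    asyncio_logger.addHandler(collector)
    asyncio_logger.setLevel(logging.WARNING)
    try:
        asyncio.run(main(), debug=True)
    finally:
        asyncio_logger.removeHandler(collector)
    return collector.messages


if __name__ == "__main__":
    # One line per slow step, naming the task and how long it held the loop.
    for message in find_slow_callbacks():
        print(message)
```

The warning names the task or callback that was running, which narrows down which integration made the blocking call.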
I have this issue as well. It seems to have started around HA 0.71, when power monitoring and battery level reporting were added (home-assistant/core#14561). I just updated to HA 0.78 with bellows 0.7 and zigpy 0.2, and that seems to have had no effect. It always starts with a Here are two recent logs: I originally posted in this issue, but I see now that this one is my real issue. @rcloran I'm not sure this is an issue with another component. It seems to be getting worse as I add Zigbee devices to my network. I recently removed almost all my components, and that seems to have made no difference. All my Zigbee devices are Sengled bulbs or Centralite outlets, but I have about 5 of each now, and the problem is making HA almost unusable; the Zigbee network crashes at least once an hour. I have a script that checks for the I'm running Home Assistant on a Raspberry Pi 3B+ (via Hassbian) and am using the HUSBZB-1 stick. I'm curious whether anyone else is experiencing this with different hardware. If this is caused by a component, is there any way we can attempt to debug which component it is? Is there some way we can restart just zigpy/bellows without restarting Home Assistant (that would help me, at least)? Any other ideas for how we can go about debugging this in general?
@StephenWetzel When did you get the HUSBZB-1 stick? People were reporting issues similar to yours with sticks obtained in Apr-Jun 2018, IIRC, which were solved after the sticks were replaced. Something about improperly loaded firmware.
@Adminiuga Thanks for the feedback. I got my stick in November 2017, and it was fine from then until around June, when HA 0.71 was released. So I don't think it's that, but perhaps. I wonder if there's any way to check or update the firmware on it. I was looking into Yoda-x's versions; that's probably my next step. If that doesn't work, I suppose I may try running HA on my desktop. I wonder if there's any way for me to restart the device itself without restarting HA, although I suppose bellows would still have to support recovering from that.
@StephenWetzel I got mine at the end of Jan '18, so I don't think it is the version. I'm not sure how to check for a firmware update; I couldn't find anything on the vendor site. There's really no way to restart the device, and really the problem is in zha/bellows: my guess is that bellows should reinitialize the UART and reset the NCP once it starts receiving 'error' frames. But it is also interesting that the reports started around version 0.71, unless I was reading your reports in the forum :)
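The recovery described above (close the UART, reopen it and reset the NCP whenever an ERROR frame arrives) can be sketched as a small supervisor loop. Everything here is hypothetical: `NcpLink`, `supervise` and `FakeLink` are illustrative names, not the bellows API, and a real implementation would also need to re-run whatever startup sequence the application normally performs after a reset.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Protocol

LOGGER = logging.getLogger(__name__)


class NcpLink(Protocol):
    """Hypothetical UART link to the NCP; NOT the bellows API."""

    async def reset(self) -> None: ...

    async def wait_for_error(self) -> int: ...

    def close(self) -> None: ...


async def supervise(
    connect: Callable[[], Awaitable[NcpLink]],
    max_sessions: int,
    backoff: float = 1.0,
) -> List[int]:
    """Reopen the UART and reset the NCP each time it reports a fatal error.

    Returns the error code that ended each session.
    """
    errors: List[int] = []
    for session in range(max_sessions):
        if session:
            # Wait a little longer after each failure before reopening the port.
            await asyncio.sleep(backoff * session)
        link = await connect()
        try:
            await link.reset()  # bring the NCP back to a known state
            code = await link.wait_for_error()  # returns once an ERROR frame arrives
        finally:
            link.close()  # always release the serial port
        LOGGER.warning("NCP session %d ended with error 0x%02x", session, code)
        errors.append(code)
    return errors


class FakeLink:
    """Test double that 'fails' with error 0x51 right after each reset."""

    def __init__(self) -> None:
        self.was_reset = False
        self.closed = False

    async def reset(self) -> None:
        self.was_reset = True

    async def wait_for_error(self) -> int:
        await asyncio.sleep(0)
        return 0x51

    def close(self) -> None:
        self.closed = True


async def demo(max_sessions: int = 3):
    links: List[FakeLink] = []

    async def connect() -> FakeLink:
        links.append(FakeLink())
        return links[-1]

    errors = await supervise(connect, max_sessions, backoff=0.0)
    return errors, links


if __name__ == "__main__":
    errors, links = asyncio.run(demo())
    print([hex(code) for code in errors])
```

The `try`/`finally` ensures the serial port is released even if the reset itself fails, so the next session can reopen it.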
IMHO, the problem is the Pi and Python with asyncio. The NCP has strict timing requirements: it expects an ACK for every sent packet within a defined latency. asyncio is not made for low latency; it is made for massively parallel tasks where latency is not an issue. Also, hass uses only one CPU, even though it is multi-threaded. On a faster CPU this seems to be no issue: I came from an Intel NUC, and the problem started right after switching to a 3B+. Even though the CPU is idle most of the time, the UART threw this ACK-retransmit error right after a restart or sometime during the night, mostly together with a DB error or a timeout on a query to a router. Maybe it's the daily DB cleanup; I don't know. I get this error more often when I enable device discovery for routers.
There are also some tuning options, such as dedicating one CPU exclusively to hass, so that it has no context switches to other CPUs and other processes can't steal precious CPU cycles.
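On Linux, the pinning half of that suggestion can be done from Python with `os.sched_setaffinity`, or from the shell with util-linux `taskset`. Pinning alone does not make the CPU exclusive: keeping other processes off it needs something like the `isolcpus` kernel boot parameter or cpusets. The helper name `pin_to_single_cpu` is hypothetical, and whether this actually helps the ACK timing on a Pi is untested here.

```python
import os


def pin_to_single_cpu(cpu=None):
    """Restrict the calling process to one CPU (Linux only).

    With cpu=None, the highest-numbered CPU the process may already use is chosen.
    Returns the resulting affinity set.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not supported on this platform")
    # pid 0 means "the caller"; on Linux this is strictly the calling thread,
    # and threads started afterwards inherit the mask, so call it early.
    allowed = os.sched_getaffinity(0)
    target = max(allowed) if cpu is None else cpu
    if target not in allowed:
        raise ValueError(f"CPU {target} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("now pinned to CPU(s):", sorted(pin_to_single_cpu()))
```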
@StephenWetzel I am experiencing exactly the same problem: I get an error frame with the Centralite 3210-L, and afterwards the zha component stops working until a reboot. Would this benefit from running as its own standalone process? I've considered switching to the zigbee2mqtt route because I can see the benefit of isolating the Zigbee stack so that it has its own dedicated resources. I haven't made the switch because I am already invested in the Linear HUSBZB-1. I'm not really familiar with the codebase, but potentially there could be a standalone Python MQTT client that acts as a proxy between bellows and Home Assistant, so that the Zigbee stack isn't subject to the Home Assistant event loop.
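A lighter-weight form of that isolation, and the "separate thread" option rcloran mentioned earlier, is to give the UART code its own event loop in a background thread. The sketch below is hypothetical (`UartLoopThread` and `ack_worker` are illustrative names, not bellows code). It shows that the worker keeps its timing while the main loop is stuck in a blocking call. Threads still share the GIL, so CPU-bound pure-Python work in the main thread competes with it (CPython switches threads every 5 ms by default; see `sys.getswitchinterval()`); a separate process, as proposed above, avoids sharing the interpreter at all.

```python
import asyncio
import threading
import time


class UartLoopThread:
    """Runs a dedicated asyncio event loop in a background thread.

    Coroutines submitted here keep running even while the main loop is
    stalled by a blocking call.
    """

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self.loop.run_forever, name="uart-loop", daemon=True
        )

    def start(self) -> None:
        self._thread.start()

    def submit(self, coro):
        # Thread-safe hand-off; returns a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def ack_worker(lateness, period=0.02, ticks=20):
    """Hypothetical stand-in for the UART reader: records how late each wake-up is."""
    loop = asyncio.get_running_loop()
    for _ in range(ticks):
        due = loop.time() + period
        await asyncio.sleep(period)
        lateness.append(loop.time() - due)


async def main() -> float:
    uart = UartLoopThread()
    uart.start()
    lateness = []
    future = uart.submit(ack_worker(lateness))
    time.sleep(0.3)  # the main loop is blocked by a misbehaving component
    await asyncio.wrap_future(future)  # wait without blocking the main loop
    uart.stop()
    return max(lateness)


if __name__ == "__main__":
    print(f"worst wake-up delay on the UART thread: {asyncio.run(main()):.3f} s")
```

Results produced on the UART thread would be handed back to the main loop with `loop.call_soon_threadsafe` or `asyncio.run_coroutine_threadsafe` in the other direction.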
Fixed in #147
When running HA in a Docker container on a Synology NAS, bellows stops showing any sign of life in the log after 1-2 hours.
Restarting HA makes the Zigbee devices and bellows work for another 1-2 hours.
The setup has two zigbee motion sensors and a zigbee switch (for repeating), and a zwave switch.
To rule out Docker/NAS issues, I tried running natively on a Pine64 ARM SBC (HA 0.50.2).
bellows still mostly stops working after 1-2 hours.
This morning, one motion sensor and the switch were still alive after 8 hours, but the other motion sensor was dead.
Could you please advise how I can help debug this issue?
Is there any way for bellows to give more info, or some kind of heartbeat?
This is a continuation of: https://community.home-assistant.io/t/0-49-zigbee-stopped-working-after-adding-smart-socket-packet-routing-issue/22453