InvalidCommandResponse: BUFFER_FULL error and light state not updating in Home Assistant #42
Comments
Are you triggering a lot of individual lights simultaneously? If you enable debug logging for ZHA (it'll be quite verbose), do you see any warnings like
As a potential stopgap solution, you may want to try to decrease the request concurrency from 16 to 8:
zha:
  zigpy_config:
    znp_config:
      # default is "auto", which is 16 for the CC2652R
      max_concurrent_requests: 8
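To illustrate what a concurrency cap of this kind does, here is a minimal sketch using an `asyncio.Semaphore`. It is not the zigpy-znp implementation; `send_all` and `fake_light_command` are hypothetical names, and the sleep only stands in for a radio round trip.

```python
import asyncio


async def send_all(requests, max_concurrent_requests=8):
    """Run every request, but never more than the limit at the same time."""
    limiter = asyncio.Semaphore(max_concurrent_requests)
    in_flight = 0
    peak = 0  # highest number of simultaneously running requests observed

    async def send(request):
        nonlocal in_flight, peak
        async with limiter:  # waits here while the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await request()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(send(r) for r in requests))
    return results, peak


async def fake_light_command():
    # Hypothetical stand-in for a radio request waiting on a device.
    await asyncio.sleep(0.01)
    return "ok"


# Twelve lamps, as in the group discussed in this thread.
results, peak = asyncio.run(
    send_all([fake_light_command] * 12, max_concurrent_requests=8)
)
print(len(results), "requests done, peak concurrency:", peak)
```

With a lower limit, fewer requests are outstanding at once, so a slow device holds up fewer buffer slots at the cost of the group taking longer to update.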
I have 12 lamps, all part of the same lamp group, and it is this one group that I am turning on or off. I tried collecting a debug log with the mentioned error, but have had no success so far. However, even without the error showing up in the logs, I was seeing individual lamps not getting updated. I did find some unhandled commands, though not very many:
is the full log, where I turned the lights on, changed the brightness and turned them off. In particular, during the turning-off step at 14:28:49 some lights were not updated correctly. I really tried triggering the error again with debugging on, but for some reason it's not happening any more.
Looking through your log, all of the errors are being caused by the following two lights (
2020-10-22 14:28:51 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0x2B90), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=136, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x88\x00\x00\x00'),
2020-10-22 14:28:51 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.MAC_NO_ACK: 233>, Endpoint=1, TSN=136),
2020-10-22 14:28:54 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0xA4F0), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=149, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x95\x00\x00\x00'),
2020-10-22 14:28:55 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.MAC_NO_ACK: 233>, Endpoint=1, TSN=149),
2020-10-22 14:28:12 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.NWK_NO_ROUTE: 205>, Endpoint=1, TSN=98),
2020-10-22 14:28:05 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0x2B90), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=98, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x62\x00\x00\x00'),
2020-10-22 14:28:57 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.NWK_NO_ROUTE: 205>, Endpoint=1, TSN=118),
2020-10-22 14:28:49 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0xA4F0), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=6, TSN=118, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x76\x00\x00\x00'),
Here is the average link quality for the received packets from each of your devices (this quantity is directly proportional to the RSSI):
Have you tried updating the firmware of your bulbs? If they're one of the supported brands, it'll happen automatically over-the-air after you enable the correct settings and notify the devices: https://old.reddit.com/r/homeassistant/comments/fak430/how_to_update_your_ikea_or_ledevance_firmware/ |
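One way to attribute failed deliveries to specific devices, as in the log excerpt above, is to pair each `AF.DataConfirm` callback with the request that carries the same TSN. The following is a minimal sketch, assuming the debug log format shown in this thread and assuming that successful confirms are logged as `Status.SUCCESS`. The name `failures_by_address` is made up for illustration and is not part of zigpy-znp.

```python
import re
from collections import Counter

# Patterns written against the log lines quoted in this thread.
REQUEST = re.compile(
    r"Sending request: AF\.DataRequestExt\.Req\(.*?address=(0x[0-9A-Fa-f]+)\).*?TSN=(\d+)"
)
CONFIRM = re.compile(
    r"AF\.DataConfirm\.Callback\(Status=<Status\.(\w+): \d+>.*?TSN=(\d+)\)"
)


def failures_by_address(log_text):
    """Count non-success confirms per (destination address, status)."""
    pending = {}  # TSN -> address of the most recent request with that TSN
    failures = Counter()
    for line in log_text.splitlines():
        if m := REQUEST.search(line):
            pending[int(m.group(2))] = m.group(1)
        elif m := CONFIRM.search(line):
            status, tsn = m.group(1), int(m.group(2))
            address = pending.pop(tsn, None)
            if address is not None and status != "SUCCESS":
                failures[(address, status)] += 1
    return failures


# Four lines copied from the log excerpt in this thread, in chronological order.
LOG = r"""
2020-10-22 14:28:51 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0x2B90), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=136, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x88\x00\x00\x00'),
2020-10-22 14:28:51 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.MAC_NO_ACK: 233>, Endpoint=1, TSN=136),
2020-10-22 14:28:54 DEBUG (MainThread) [zigpy_znp.api] Sending request: AF.DataRequestExt.Req(DstAddrModeAddress=AddrModeAddress(mode=<AddrMode.NWK: 2>, address=0xA4F0), DstEndpoint=3, DstPanId=0x0000, SrcEndpoint=1, ClusterId=8, TSN=149, Options=<TransmitOptions.RouteDiscovery|APSAck: 48>, Radius=30, Data=b'\x00\x95\x00\x00\x00'),
2020-10-22 14:28:55 DEBUG (MainThread) [zigpy_znp.api] Received command: AF.DataConfirm.Callback(Status=<Status.MAC_NO_ACK: 233>, Endpoint=1, TSN=149),
"""

failures = failures_by_address(LOG)
for (address, status), count in sorted(failures.items()):
    print(address, status, count)
```

Because TSNs are reused over time, the sketch only remembers the latest request per TSN and expects the lines in chronological order.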
Yes, but I think this is only true for the particular event I sent you. The lights are all relatively close. The furthest away is maybe 6m from the Zigbee stick and there is no solid wall in between. It's all in one big room, lights and Zigbee stick. The stick is also on a 1m USB extension cable and there is not much 2.4 GHz WiFi interfering. The lights that report incorrectly keep changing: sometimes it's this one and sometimes that one. However, there are a few bulbs that always work, so silicon lottery could be part of it.
They're Lightify bulbs. I can check if there is new firmware available, but I have updated them in the past. However, I think the Lightify bulbs are of minor quality. Could you point me to the tool you used to get the LQI values? What makes me doubt that it is all about link quality is that manual calls to the on_off property of the OnOff cluster update the light states 100% of the time.
I managed to catch the BUFFER_FULL error while debugging was running:
For easy monitoring of LQ and the network, it's easiest to install zha-map and zigzag. PS: Take a look with a WiFi scanner app so you can see if you have interference from WiFi networks nearby and, if needed, change the WiFi / Zigbee channel.
Thanks, I took two images of my network. Is each lamp really a sibling of every other lamp? I always thought Zigbee tries to build a star topology and only routes through other devices if needed. Everybody connecting to everybody does not seem like a good idea to me. At least that is how I understood: https://www.zigbee2mqtt.io/information/zigbee_network.html#zigbee-network
To me it looks like your mesh does not like talking to other routers and only uses the NCP for communicating :-((( My test network, with Xiaomi sensors and IKEA outlets as base routers (preparing for Christmas lights), builds parent links between all possible routers and uses 2 hops to reach the last router (Vorzimmer). One thing: I am using an EZSP (IKEA module) as the NCP for the moment and have the CC-2531 inactive because it is too weak. (The latest with the EZSP is that it does not like direct children (end devices) and kicks them off the NCP, so they connect through routers. I like that, but it is a bug in the firmware.) One more thing: I had 2 "HOMA" dimmers / LED drivers ("Chinese ZB3" = old Zigbee PRO) but moved them to deCONZ because they were very bad routers and only made things worse (no parents and only red lines for LQ). They are based on TI's CC2530 and have bad antennas / RF parts. You can also try installing the zha-network-visualization-card; it writes out more info such as LQ and device info, but it is too large for the screen and not so easy to install.
You have at least 3 router devices acting as parents to the NCP, but all the others are only siblings (children) connected as end devices, I think (= I don't know for sure). All my IKEA outlets are real Zigbee 3 and the bulbs are LL but the new version (Zigbee PRO), so they connect as parents. I'm interested in what @Adminiuga thinks of this scenario. Edit: One of my IKEA bulbs (Opal 1000lm = new LL) has no parent link to the NCP but does have children, so it is a router in the mesh. Edit 2: My 2 Philips SML001 (BW Wohnzimmer X) are connected to LL routers, so perhaps they don't like IKEA's ZB3 routers.
Is the LQI quantity a reliable number? Given that all devices are supposed to have a bad connection, I rather think it is the CC2652 that is causing the issue.
In zha map the lqi is reported as seen by devices. Often the lqi interpretation is up to the vendor, but usually same vendor reports consistently across their devices |
I think (without knowing) it's normally relevant, but it depends on the chip manufacturer how it is implemented in the device. TI has a "normal" version, but the CC-253X, as in my HOMAs and most OSRAM devices, is famous for low LQI and not working so well in real life. Silabs (EZSP) uses a non-standard method, and the value is recalculated in ZHA for presentation. I think EZSP values are normally too high, but it normally works well. I have my IKEA GW 20 cm from my WiFi router and it works well, even with a long distance to the devices.
Just to check, I have also added a Philips LWB004 light bulb and positioned it around 1m away from the stick. In the beginning it was showing an LQI of around 30, but now it has 141, so maybe it really is the bulbs themselves.
Interesting to see that the other routers are attaching to it and getting a better LQI through it than from the coordinator.
What CC2652R coordinator hardware are you using? |
If lights in the same room do not have a high (>100) LQI, there is a small possibility that you have a defective ZZH stick (I'm unsure of the specifics but I think there was a bad batch?). You may want to get in touch with @omerk via email to verify if this is the case. |
I have a second one :) so I'll give that one a try first. But my statement that the furthest away is 6m was an underestimate. If I look more closely it's rather 12m, but all within one room.
@deisi you can clone your network onto the second stick with an NVRAM backup/restore: https://github.com/zha-ng/zigpy-znp#nvram-backup-and-restore
Afterwards make sure to clear the first stick and not run them concurrently (to migrate back you'll need to perform the same procedure but with the paths swapped):
$ python -m zigpy_znp.tools.nvram_reset /dev/serial/by-id/old-radio
ah thx |
It'll take a bit for the network to stabilize (I believe because the existing routes are not preserved) so I'd power cycle the lights afterwards or just let it sit for a bit before testing. Otherwise you'll get a few routing errors. |
@puddly but you have seen that I got a log file, with debugging on, of the buffer overflow error that started this whole thread? Of course I'm very grateful for all your help with my connection issues, but in the end this is not a support forum, so I hope the log helps with debugging the bug.
The transmit buffer getting full in the CC2652R's firmware isn't really a bug with zigpy-znp, it's expected behavior if you trigger a lot of lights concurrently and they don't respond fast enough due to TX issues for the buffer to clear out completed requests. For comparison, I'm able to rapidly toggle close to 30 lights individually on my network once a second and do not receive a single error. Have you tried decreasing the request concurrency?
zha:
  zigpy_config:
    znp_config:
      # default is "auto", which is 16 for the CC2652R
      max_concurrent_requests: 8  # maybe even try 4?
Retrying requests will essentially have the same effect but I think I can include an internal retry for the
Ah, so the two issues are indeed connected. I have not tested your suggestion yet, as I was (A) trying to reproduce the error and (B) trying to test the other suggestions given here first, but I have not forgotten about it. In my desperate attempt to improve link quality, I flashed a CC2531 with the router firmware and added it to the network. If I position it very close to the coordinator, it has an LQ of > 120. If I move it to the other end of the room, roughly 10m away, it has an LQ of 3. So maybe we have very dense air in the room here ^^. Is LQ sensitive to interference? Ah, and changing the zig-a-zig-ah stick didn't change much.
You can read about how the LQI is calculated here: http://software-dl.ti.com/simplelink/esd/plugins/simplelink_zigbee_sdk_plugin/1.60.00.14/docs/zigbee_user_guide/html/zigbee/developing_zigbee_applications/z_stack_developers_guide/z-stack-overview.html#id10. It's the RSSI remapped to 0-255, where 0 is the minimum and 255 is the maximum observed value by the radio. If you got both zzh sticks at the same time it could be that they're from the same batch? Using the LAUNCHXL-CC26X2R1 with a trace antenna (whose LQI values are about 30% less on average than the zzh's when I compared the two) I get an LQI of 36 between the coordinator and an outdoor bulb about 15m away, where signals have to pass through an interior wall, closet doors, an outdoor wall, and a metal light fixture. If you're getting an LQI of 3 in the same room, I think something is going on with your hardware. |
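As a rough illustration of "RSSI remapped to 0-255", here is a minimal sketch of a clamped linear mapping. The `min_dbm` and `max_dbm` defaults are assumptions chosen for the example, not values taken from the TI documentation, and `rssi_to_lqi` is a hypothetical helper.

```python
def rssi_to_lqi(rssi_dbm, min_dbm=-90.0, max_dbm=-10.0):
    """Linearly remap an RSSI reading onto 0-255, clamping at both ends.

    min_dbm and max_dbm are assumed bounds for illustration only.
    """
    if max_dbm <= min_dbm:
        raise ValueError("max_dbm must be greater than min_dbm")
    clamped = min(max(rssi_dbm, min_dbm), max_dbm)
    return round(255 * (clamped - min_dbm) / (max_dbm - min_dbm))


for rssi in (-100, -90, -70, -40, -10):
    print(rssi, "dBm ->", rssi_to_lqi(rssi))
```

Under a mapping like this, an LQI close to 0 means the received signal is near the weakest level the radio distinguishes, which is why a value of 3 within one room looks suspicious.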
Retrying when encountering If you want to test these changes out, you can install the latest commit from the |
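The idea of retrying a request when the transmit buffer is full can be sketched as follows. This is not the zigpy-znp implementation: `BufferFull`, `send_with_retry`, `flaky_send` and the delay values are all hypothetical and only show the general pattern.

```python
import asyncio


class BufferFull(Exception):
    """Hypothetical stand-in for the radio reporting a full transmit buffer."""


async def send_with_retry(send, attempts=3, delay=0.05):
    """Call send(); on BufferFull, pause and try again up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return await send()
        except BufferFull:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            # Linearly growing pause gives the buffer time to drain.
            await asyncio.sleep(delay * attempt)


calls = 0


async def flaky_send():
    # Fails twice with a full buffer, then succeeds.
    global calls
    calls += 1
    if calls < 3:
        raise BufferFull()
    return "sent"


result = asyncio.run(send_with_retry(flaky_send))
print(result, "after", calls, "calls")
```

A retry like this spreads the same requests over a longer time, which is why it has a similar effect to lowering the concurrency limit.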
Hi! TL;DR: at the end of the post. Here is some general information: Besides that, I have a zig-a-zig-ah! stick lying around (because I just had to buy one). I started using Home Assistant about a year ago. Back then I had a Tradfri Gateway and a Philips Hue gateway, which I merged with a CC2531 + z2m (because sometimes lights did not respond). It worked fine at first, but the larger the network got, the more frames were lost. I then tried ZHA instead of z2m but found the same behaviour. I then purchased a ConBee II and retried. Still, I had issues with frames being dropped and lights being reported falsely. I wrote to deCONZ support and they urged me to switch to Phoscon, which I did. I must say I was quite surprised how well things were going. I seldom have connectivity issues, so everything is pretty much fine. And now comes yesterday. I am a fan of "keeping stuff simple", which in this case meant getting rid of a (theoretically) unnecessary container and piece of software: replace Phoscon with ZHA. A report posted at jcallaghan/home-assistant-config#167 gave me hope everything would work out fine this time. For the sake of easy rollback I used the zzh! stick and added the ZHA integration. After a few failed attempts (it looks like it had remembered the network I set up months ago) I was able to restart with a fresh network (Phoscon on channel 15, zzh! on channel 25). I kicked out 6 lamps and 2 remotes in the office and joined them to the new network. Unfortunately I quickly saw the same behaviour I had seen earlier. Lights often did not respond, and states were often not reported. I tried 3 different antennas (from small to huge) but was not able to make things better. I also saw the BUFFER_FULL messages and tried out your development branch. It did in fact fix the BUFFER_FULL message, but the lights behaved the same. To rule out the possibility that it is faulty hardware, I then dumped all of it and reconfigured ZHA with my ConBee II.
Lights rejoined and popped up in HA, but the experience sadly was the same: Lights were not being set and states not being reported correctly. So finally, I had to give up and go back to Phoscon again. Ok, sorry for the long post. I hope to help you guys with your problem, I believe we might have the same issue. TL;DR And thanks for all work you are putting into this. |
@dumpfheimer Thanks for the info and for testing out the latest codebase. Can you enable debug logging and upload your complete |
Hi! I just sent you an email with the logs from my last experiment. I think there is a tweak that might be escalating this behaviour: Make a group of lights A, B and C. Is there a possibility to disable this "internal state check"? BR PS: I hope you can make sense of the log; I am unfortunately lacking the time to reproduce everything at the moment.
I believe this issue has been fixed or significantly mitigated in the last few releases. |
Hey, I'm finding the following in my Home Assistant log:
I'm using HomeAssistant: 0.116.4
With the CC2652R from https://electrolama.com/projects/zig-a-zig-ah/ with the 20200925 firmware from https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_3.x.0/bin
Also, I'm having some issues with incorrect device states: I can sometimes turn on my light bulbs, but HA is not getting informed. A manual Get call to the on_off attribute of the OnOff cluster, however, updates the state. I'm not sure whether this is an HA, a zigpy or a zigpy-znp issue.