Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network Breaks Nightly Until Zwave-JS Restart Since Friday #3829

Closed
3 of 11 tasks
kars85 opened this issue Dec 1, 2021 · 21 comments
Closed
3 of 11 tasks

Network Breaks Nightly Until Zwave-JS Restart Since Friday #3829

kars85 opened this issue Dec 1, 2021 · 21 comments
Projects

Comments

@kars85
Copy link

kars85 commented Dec 1, 2021

Is your problem within Home Assistant (Core or Z-Wave JS Integration)?

YES, BUT a Home Assistant developer has told me to come here

Is your problem within ZWaveJS2MQTT?

YES, BUT a ZWaveJS2MQTT developer has told me to come here

Checklist

Describe the bug

What causes the bug?
Every evening (sunset - 4AM approx), automations fail to execute and Zwave devices become unavailable

What do you observe?
Unable to control/manage nodes. Early testing appears that if I restart the zwavejs2mqtt container, nodes become available again, but network healing is unstable.
What did you expect to happen?
Stable behavior. When restarting the container and initiating a heal, the heal rate is ~50%.

I am attaching log archives going back to Friday when the behavior began (can't remember if Silly was enabled that far back, but I turned Silly on shortly thereafter over the weekend. Todays 2021-12-01.log**.old** is before I restarted the container this morning and began troubleshooting, so it should contain information on the network "crash".

The 2021-12-01.log is a fresh log starting with a container startup, network graph view, followed by a network heal (~50% success on the heal), then me shutting down the container.

I've also attached my Zooz controller's NVM backup along with my homeid.json

Device information

All devices

How are you using node-zwave-js?

  • zwavejs2mqtt Docker image (latest)
  • zwavejs2mqtt Docker image (dev)
  • zwavejs2mqtt Docker manually built (please specify branches)
  • ioBroker.zwave2 adapter (please specify version)
  • HomeAssistant zwave_js integration (please specify version)
  • pkg
  • node-red-contrib-zwave-js (please specify version, double click node to find out)
  • Manually built from GitHub (please specify branch)
  • Other (please describe)

Which branches or versions?

zwavejs2mqtt: 6.0.2
zwave-js: 8.8.2
home id: 3745883352
home hex: 0xdf45a4d8

Did you change anything?

no

If yes, what did you change?

No response

Did this work before?

Yes (please describe)

If yes, where did it work?

Prior to Friday all automations and control worked as expected.

Attach Driver Logfile

zwavejs_2021-11-30.log.gz
zwavejs_2021-11-29.log.gz
zwavejs_2021-11-28.log.gz
zwavejs_2021-11-27.log.gz
zwavejs_2021-12-01.log
zwavejs_2021-12-01.old.log
df45a4d8.json.txt
NVM_2021-12-01.bin.txt

@zwave-js-bot zwave-js-bot added this to Needs triage in Triage Dec 1, 2021
@AlCalzone
Copy link
Member

AlCalzone commented Dec 1, 2021

Are you able to test with an older version if the issue persists there (before node-zwave-js 8.8.0 / before zwavejs2mqtt 6.0.0)?

@kars85
Copy link
Author

kars85 commented Dec 1, 2021

Sure - what image tag would you like me to revert to and I'll exclude zwavejs2mqtt from watchtower. Do you perceive any issues with my existing setup by downgrading temporarily?

@Nixon506E
Copy link

I am also having this issue, tried a rollback to zwavejs2mqtt 5.12.0 that did not seem to help but it was working on that version before the upgrade.

I also am able to receive notifications from the zwave devices just not able to send changes to them.

@kars85
Copy link
Author

kars85 commented Dec 1, 2021

@Nixon506E - that was the zwavejs2mqtt version I was going to drop down to as well...are you having nearly the same time frame of drop outs? Where, shortly after going through network heals, physically power cycling nodes (breakers/air gaps) things come back, only for them to go dead after a handful of hours?

@AlCalzone - It is very hard for me to isolate this down, and I apologize for the inundation of logs/files. If there is anything I can do on my side, just let me know. I will still try dropping down to 5.12.0 and letting it run for the specific period of time I've consistently seen issues.

@Nixon506E
Copy link

They are not being marked as dead for me but rather they just stop responding. I do not need to reset the devices, only the driver but the rollback did not help and after a few hours it does happen every time.

@kars85
Copy link
Author

kars85 commented Dec 1, 2021

They are not being marked as dead for me but rather they just stop responding. I do not need to reset the devices, only the driver but the rollback did not help and after a few hours it does happen every time.

Got it - just so I'm not getting lost in the translation, when you mention "reset the driver", are you restarting the zwavejs2mqtt container, or something else?

I just changed my docker compose to stay at 5.12.0 (but it looks like it's still running zwave-js 8.8.2 underneath)and am running it now with a fresh log despite your issues persisting, and will know by the end of the night if issues are able to reproduced. I have lighting and a space heater automation that make it super easy to know if my network when kaboom.

@Nixon506E
Copy link

Yes, restarting zwavejs

@Eriner
Copy link

Eriner commented Dec 1, 2021

I have other issues..., but I think I am impacted by this as well. During my testing, occasionally the network will "lock up" and I'll have to do one of the following:

  • unplug the stick, plug it back in
  • open the settings menu and click "save" in zwavejs2mqtt (changing settings not required)
  • restart the zwavejs2mqtt docker container

What doesn't work is:

  • soft reset
  • manually healing a node
  • manually healing the controller

As I mentioned in #3810, my network "kind of" works. In the afternoon/evening after messing with the network (restarting things as described above), my automations will turn on the evening lights. However, since switching to the new stick my lights do not turn off by morning and the network has entered a "borked" state.

I am unsure if my issue is just a symptom of an issue unique to me, or if I'm sharing the same underlying issue as @kars85.

Edit: I previously suspected my issue was merely connectivity, but as described in my other issue, I'm having problems with this 700 series stick (and all powered devices but one on S2) that I never had with my aeotec stick in an identical environment.

@kars85
Copy link
Author

kars85 commented Dec 1, 2021

@Eriner - my response to your issue indicating my stick was showing neighbors (yours wasn't, as I understood?) was shortly after doing myriad of things to get random nodes reconnected like you. Not all of my nodes were responsive, but most. I guess I can't be 100% certain my stick shows neighbors when the network goes haywire, but another data point to check if downgrading to 5.12.0 is an issue.

I can say that I have had weeks worth of success at a time with the Zooz stick, so hopefully all is not lost. To be continued...

@wigster
Copy link

wigster commented Dec 2, 2021

I have noticed strange behaviour in the last couple of days also and am beginning to think this is related.

In particular, my Eurotronic spirit TRVs have stopped responding to temperature increases that I send to them in the morning. The display changes from the default 22 to e.g. the new value 28, but the radiator does not come on and touching the manual controls moves the temp around the default 22 to 22.5 or 21.5. From the point of the view of the zwavejs2mqtt interface everything looks fine.

I have an Aoetec ZUSB 5 Stick

I haven't really been able to troubleshoot more than this.

zwavejs2mqtt: 6.0.2
zwave-js: 8.8.2
zwavejs2mqtt plugin 0.31
HA 2021.11.5

@AlCalzone
Copy link
Member

@wigster I need driver logs please, loglevel debug or silly and attach it here as a file.
Make sure to do something that lets me see the problem in the log.

@kars85
Copy link
Author

kars85 commented Dec 2, 2021

Downgrading to zwavejs2mqtt 5.12.0 has resolved my issues over the past 36 hours. I would be more than happy to be the guinea pig/constant.

@AlCalzone
Copy link
Member

Having logs of the failing behavior would be immensely helpful. 🙏🏻

@mostlychris
Copy link

I'll just throw my name in the hat that I am also experiencing this same issue. I can get messages from zwave devices, but I can't send anything. Restarting the zwavejs2mqtt addon solves the issue. I'm running a Aeon Labs Z‐Stick Gen5 USB Controller | ZW090. Would be happy to supply logs if someone tells me where to find them. I'm running HA supervised on an Odroid N2+ (HA Blue).

@jwestyp
Copy link

jwestyp commented Dec 3, 2021

I have looked at logs myself and haven't seen anything that stands out showing the error. I have been rebooting every evening after my nightly automations don't happen, but I am still able to see state changes.

@AlCalzone
Copy link
Member

Guys please... No stories, post logs 🙏

@kars85
Copy link
Author

kars85 commented Dec 4, 2021

I am running 8.8.3 since last night, and since then, my network has slowly died to the point of being non-functional. Resetting logging to all nodes indicates that the driver has crashed and is a reboot loop. (Not attaching because this is all the logs capture before looping.

I will restart the driver and capture logging for the rest of the day and re-upload later.

2021-12-04T14:23:34.364Z DRIVER   version 8.8.3
2021-12-04T14:23:34.364Z DRIVER
2021-12-04T14:23:34.364Z DRIVER   starting driver...
2021-12-04T14:23:34.365Z DRIVER   opening serial port /dev/zwave
2021-12-04T14:23:34.366Z DRIVER   serial port opened
2021-12-04T14:23:34.366Z SERIAL » [NAK]                                                                   (0x15)
2021-12-04T14:23:34.367Z DRIVER   loading configuration...
2021-12-04T14:23:34.368Z CONFIG   Using external configuration dir /usr/src/app/store/.config-db
2021-12-04T14:23:34.368Z CONFIG   version 8.8.3
2021-12-04T14:23:34.571Z CONFIG   Priority device configuration directory /usr/src/app/store/config not found
2021-12-04T14:23:34.580Z DRIVER   beginning interview...
2021-12-04T14:23:34.580Z DRIVER   added request handler for AddNodeToNetwork (0x4a)...
                                  1 registered
2021-12-04T14:23:34.580Z DRIVER   added request handler for RemoveNodeFromNetwork (0x4b)...
                                  1 registered
2021-12-04T14:23:34.580Z DRIVER   added request handler for ReplaceFailedNode (0x63)...
                                  1 registered
2021-12-04T14:23:34.580Z CNTRLR   querying controller IDs...
2021-12-04T14:23:34.581Z SERIAL » 0x01030020dc                                                         (5 bytes)
2021-12-04T14:23:34.581Z DRIVER » [REQ] [GetControllerId]
2021-12-04T14:23:35.583Z CNTRLR   Failed to execute controller command after 1/3 attempts. Scheduling next try i
                                  n 100 ms.
2021-12-04T14:23:35.684Z SERIAL » 0x01030020dc                                                         (5 bytes)
2021-12-04T14:23:35.684Z DRIVER » [REQ] [GetControllerId]
2021-12-04T14:23:36.687Z CNTRLR   Failed to execute controller command after 2/3 attempts. Scheduling next try i
                                  n 1100 ms.
2021-12-04T14:23:37.790Z SERIAL » 0x01030020dc                                                         (5 bytes)
2021-12-04T14:23:37.790Z DRIVER » [REQ] [GetControllerId]
**2021-12-04T14:23:38.794Z DRIVER   Failed to initialize the driver: ZWaveError: Timeout while waiting for an ACK
                                  from the controller (ZW0200)
                                      at Driver.sendMessage (/usr/src/app/node_modules/zwave-js/src/lib/driver/D
                                  river.ts:3261:23)
                                      at ZWaveController.identify (/usr/src/app/node_modules/zwave-js/src/lib/co
                                  ntroller/Controller.ts:668:33)
                                      at Driver.initializeControllerAndNodes (/usr/src/app/node_modules/zwave-js
                                  /src/lib/driver/Driver.ts:938:26)
                                      at Immediate.<anonymous> (/usr/src/app/node_modules/zwave-js/src/lib/drive
                                  r/Driver.ts:818:16)**
2021-12-04T14:23:38.795Z DRIVER   destroying driver instance...

EDIT: Rebooting the container/driver has no effect. Zwave-js just loops all the same.

@AlCalzone
Copy link
Member

Timeout while waiting for an ACK from the controller

The stick is stuck and doesn't respond.

@kars85
Copy link
Author

kars85 commented Dec 4, 2021

Timeout while waiting for an ACK from the controller

The stick is stuck and doesn't respond.

Yep - even rebooting my docker host wasn't sufficient (you'd think that would re-initialize the USB controller/Zooz stick. Had to physically pull the stick and reinsert after the reboot. My Zooz stick came right back up.

Silly logging has commenced and will post later tonight if it still is of help here.

Although I'm wondering, are we spinning our wheels until we get a firmware update for the 700 series controller? I have no problem waiting for the NVM migration utility and going back to my Aeotec Z-Stick Gen 5 temporarily.

@AlCalzone
Copy link
Member

are we spinning our wheels until we get a firmware update for the 700 series controller

For some of the issues that have been popping up recently: yes!

For others that were related to the send queue changes, the logs were helpful, but I'm not expecting any new information from the 8.8.3 ones.

@AlCalzone
Copy link
Member

The driver related issues have been fixed on master. We'll track the firmware issue in #3810

Triage automation moved this from Needs triage to Closed Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Triage
Closed
Development

No branches or pull requests

7 participants