Commands fail after temporary loss of RS485 connectivity #357
Comments
I have had this problem with socat in the past. Unfortunately, it is related to the adapter not opening with the proper settings from socat. Here is my pm2 startup. Notice I have added all the baud rate settings, but it doesn't always restart appropriately. The funny part is that I created a small bit of code that opens the port locally at 9600 8N1 and then closes it, and it works every time. The solution below is still being tested and I have a completely different adapter. In your Python program, are you connecting via the Python serial interface? If you are, then perhaps sharing your connection code may shed some light on how to get the remote adapter to open with the correct settings.
Scratch the previous post. I do not see an outbound message when you issue the command. There is something else afoot here, indicating that we are not clearing the error after reconnecting to the port. njsPC will try to reconnect for a period and then stop so it doesn't become a nuisance. However, it should resume as soon as a reconnect is successful.
Pull njsPC. Once the connection was closed, it was being marked as read-only. Now, once the port raises the ready event, it marks itself as read/write again.
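For anyone following along, here is a minimal sketch of the idea behind that fix. It is written in Python for illustration only and is not njsPC's actual code; the `PortState` class and its method names are hypothetical. The point is that the writable flag has to be restored on every ready event, not only on the first connect.

```python
class PortState:
    """Hypothetical model of a network RS485 port's writable state."""

    def __init__(self):
        self.is_open = False
        self.read_only = True  # nothing may be written until the port is ready

    def on_ready(self):
        # Runs on every successful (re)connect, not just the initial one.
        self.is_open = True
        self.read_only = False

    def on_close(self):
        # A closed port can no longer accept outbound commands.
        self.is_open = False
        self.read_only = True

    def can_write(self):
        return self.is_open and not self.read_only
```

If `on_ready` skipped the `read_only = False` line, the port would still receive status data after a reconnect but every command would be refused, which matches the symptom reported here.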
Thanks for looking into this. I will do the pull and test tomorrow morning.
Looks like the change didn't fix the problem. I ran a 'git pull' in both the nodejs-poolController and the nodejs-poolController-dashPanel folders and restarted both programs. Once again, after an interruption of the connection to the adapter, all attempts to change the state of a feature fail. The poolController program successfully identifies the loss of connection and does retries until the connection is reported to be working. However, commands sent after the connection is re-established don't work. Restarting the program fixes the problem, so whatever the program does to create the initial connection works. Here's a copy of the console display:
*** 12:53:00 Lights Circuit Commanded ON - Successful ***
*** 12:53:30 Network connection blocked ***
*** 12:54:06 Lights Circuit Commanded OFF - Failed ***
*** 12:55:00 Network connection restored ***
*** 12:55:31 Lights Circuit Commanded OFF - Failed ***
*** 12:58:00 Lights Circuit Commanded OFF - Failed ***
I reset the Ready To Send flag for the port, so pull njsPC and try again. Unfortunately, I don't have this adapter, and a remote connection appears to reconnect differently when using a remote Pi dongle and socat.
That fixed it! Thank you.
I have a couple of extra Elfin RS485 adapters, one that connects to a wired LAN and another that connects via WiFi. If you have interest in either of them, I'd be pleased to send it to you as a 'thank you' for all of your help with this. Just let me know which type and how to get it to you.
You would be shocked by the sheer number of adapters, boards, components, and miscellaneous electronics that I have lying around. Just glad we fixed it, and I really don't want a new thing to play with. The issue was that the pool was simply filled up with failed commands. I'm going to close this for now, and if you need anything else just submit another report.
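To illustrate how a pool that fills up with failed commands can block every later command, here is a rough Python sketch. It is an assumption-laden illustration, not njsPC's implementation: the `CommandPool` class, its capacity, and the `on_reconnect` hook are all hypothetical.

```python
class CommandPool:
    """Hypothetical fixed-size pool of commands awaiting a response."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pending = []  # commands sent but not yet acknowledged

    def submit(self, command):
        # Once the pool is full, new commands are refused.
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(command)
        return True

    def complete(self, command):
        # A response arrived, so the slot can be reused.
        self.pending.remove(command)

    def on_reconnect(self):
        # Commands sent on the dead connection will never be answered;
        # drop them so they stop occupying slots.
        dropped = len(self.pending)
        self.pending.clear()
        return dropped
```

Without the cleanup in `on_reconnect`, commands that failed during the outage stay in the pool forever and new ones are refused, even though the connection itself has recovered.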
OK. Thanks again!
From: rstrouse ***@***.***>
Sent: Tuesday, October 19, 2021 5:20 PM
To: tagyoureit/nodejs-poolController ***@***.***>
Cc: jwtaylor310 ***@***.***>; Author ***@***.***>
Subject: Re: [tagyoureit/nodejs-poolController] Commands fail after temporary loss of RS485 connectivity (Issue #357)
I'm also getting this problem intermittently with an Elfin EE11. However, the indicator light doesn't turn yellow but stays green. When I go into dashPanel, open the RS485 port tab in configuration, and expand the port, it shows the status as "closed" but data is still being received (see screenshot below). If I restart njsPC it works again and shows the status as "open". Is there a reason why njsPC is showing the status as closed when it really is open and receiving data? Is it possible to provide a retry connection option when the port is showing closed in dashPanel? I tried to disable and then re-enable the port but it made no difference.
If you simply save the port it will try to re-open the connection. I will have a look at the socket code to see if we can be more aggressive with half-open sockets. Please supply a replay at the time you are experiencing the failure. I assume that you are receiving messages but it cannot send.
Saving the port didn't reopen the connection. It does appear to be a half-open socket, as I can see temps, pump status, etc. in real time, but when I select the pool light I get an error (see extract from consoleLog below). The last 5 lines in the log repeat each minute.
Well look at that. The EE11 is sending an ECONNRESET immediately after connecting. This comes from the other half of the socket. Do you by chance happen to have the MQTT packets enabled as well? An ECONNRESET can happen because the sockets on the other side are busy with another process. I can't imagine that there is a whole lot of processing power on the dongle.
Interestingly I am also seeing an error during packet capture that seems to indicate the logs directory does not exist.
Ah, I think you're onto something here. I had previously made a post on TFP about the device getting bogged down. In fact, when I try to use the admin web interface it now only partially loads. I don't have MQTT packets enabled. I've now changed Max Accept from 5 to 3 to see if that makes any difference, although I assume only a single connection would be made?
It may be holding onto half of the socket if it disconnects even if that socket has no activity. Then it simply rejects a request for the connection. njsPC does send the proper handshake for dropping the connection though. Perhaps you can increase the Inactivity Timeout in dashPanel to see if it is just getting tardy.
Thanks. When you say "it" I assume you mean the Elfin? I'm not clear about increasing the Inactivity Timeout, is that the inactivityRetry option in njsPC?
Can njsPC also set the indicator light to yellow when |
In the configuration in dashPanel, simply increase the inactivity timeout value. Sorry, njsPC cannot control the functions of the adapter. I am guessing that the light indicates the RS485 link and not the socket.
Hmmm. That sounds like a great idea. Let me see what I can do.
If you pull njsPC and dashPanel, it will now show an icon in the header when there are system messages. If you click on the icon, it will show which equipment it is having issues with.
Thanks! I've pulled the latest changes and will report back when I see the icon appear.
Love it!!! I had thought about a central place for all notifications in the past but it slipped my mind. Good to see you start down this path.
OK, working as described. The icon now appears when socat is unable to reconnect. I've changed the default timeout from 0 to 300 seconds for the TCP server on the EE11 to see if that helps. Thanks for your help troubleshooting. I think you can now close this issue, as the underlying problem rests with the Elfin.
You should increase the Inactivity Timeout above as well, just in case the Elfin is just tardy. IntelliCenter sends data at least every 5 seconds, but if the process on the TCP server of the Elfin is tardy then it could assume the connection is dead.
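As a rough picture of what an inactivity timeout does, here is a small Python sketch. It is not the njsPC code; the `InactivityWatchdog` class is hypothetical and the 30-second default is only an example value. The connection is treated as dead when no data has arrived for longer than the timeout, so a timeout close to the 5-second send interval leaves little slack for a slow adapter.

```python
import time


class InactivityWatchdog:
    """Hypothetical helper that flags a connection with no recent inbound data."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_rx = clock()

    def on_data(self):
        # Call whenever bytes arrive from the adapter.
        self._last_rx = self._clock()

    def is_stale(self):
        # True once the silence has lasted longer than the timeout.
        return self._clock() - self._last_rx > self.timeout_s
```

A caller would poll `is_stale()` periodically and drop and reopen the socket when it returns true.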
Thanks, I've increased it to 30 secs now.
Let me know how all of this is working. I have a suspicion that the Elfin doesn't properly respond to the FIN packets for the socket. It simply waits to see if there is any activity on the socket and drops it over time. This means that it never cleans up old dead sockets, and I can imagine that it simply has a limit to the number of active sockets. So it keeps sending on the inactive socket and doesn't release it for the new socket. In the last set of updates I forced a FIN to the socket on drop.
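For readers unfamiliar with half-closed sockets, this short Python sketch shows the mechanics on a local socket pair. It is only a stand-in: `socket.socketpair()` is not a TCP connection to an Elfin, and `demonstrate_half_close` is a name made up for this example. On a real TCP socket, `shutdown(SHUT_WR)` is what sends the FIN.

```python
import socket


def demonstrate_half_close():
    """Close one direction of a connected pair and show the other still works."""
    a, b = socket.socketpair()
    try:
        a.shutdown(socket.SHUT_WR)  # a will send no more data (a FIN on TCP)
        eof = b.recv(16)            # b sees end-of-stream as an empty read
        b.sendall(b"status")        # b can still send in the other direction
        data = a.recv(16)           # a still receives what b sends
        return eof, data
    finally:
        a.close()
        b.close()
```

A peer that ignores the empty read and keeps the socket around ends up in the state described above: it keeps sending on a connection the other side has already given up on.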
It's been running fine since I last commented, the RS485 port stays open. Hoping that changes to timeouts for client and server made a difference. The admin web interface still only partially loads which was previously mentioned in a post on TFP. However I can still restart the Elfin device remotely with the following that calls the admin web server.
Cool. Serial communications are by nature processor bound since they require a timing thread. I would guess that there is only one thread on that little device with very limited resources. I might expect that reading the port buffer takes priority over all else. Setting the TCP server timeout to 300 seconds probably triggers the device to destroy dead sockets periodically, which also frees up resources. Increasing the njsPC timeout to 30 seconds also probably helps if the response is tardy. I also added a FIN broadcast even if the port has been dead after the timeout. Maybe that is in play to close the half-open side on the device. It was half open because it was still sending bytes on the socket even though it couldn't receive any. I'm going to close this for now. If you have any further issues, open a new issue and reference this one.
Describe the bug
After a temporary (1 minute or more) loss of RS485 connectivity, commands sent from dashPanel fail. Restarting the poolController program fixes the problem.
To Reproduce
Steps to reproduce the behavior:
Disable RS485 connectivity for approximately 1 minute. For testing, I am able to force the error by power cycling the network switch that connects the Pi running the programs and the Elfin RS485 adapter.
Expected behavior
Clicking on a Feature "button" should cause the feature to turn on/off. After the RS485 disruption, doing this causes the indicator 'light' to turn yellow for a few seconds and then revert to its previous state. The poolController console log shows the command was sent but no action results.
Packet Capture
replay.zip
Additional context
I have been experiencing intermittent command failures after the system has been running for several days. The only solution I have found has been restarting the poolController. I recently found that I can force the error by power cycling the network switch that the RS485 adapter uses to communicate with the Pi that is running the pool programs. Once the switch is power cycled, Feature on/off commands no longer work. I have an earlier Python program that communicates through the same network switch and it is able to successfully send commands after the power cycle. The poolController console display shows an ECONNRESET followed by an apparently successful reconnect, which does not fix the problem. While the system is in the fail mode it still displays the pool status correctly, showing updates resulting from commands entered on the EasyTouch wireless or on the OCP.
Here is an annotated copy of the poolController console display showing the problem. Manual annotations are marked with asterisks (*** ... ***):
Init state for Pool Controller
[10/17/2021, 8:19:42 AM] info: The current git branch output is master
[10/17/2021, 8:19:42 AM] info: The current git commit output is ab95b22
[10/17/2021, 8:19:42 AM] info: Starting up SSDP server
[10/17/2021, 8:19:42 AM] info: Checking njsPC versions...
[10/17/2021, 8:19:42 AM] info: Starting Pool System easytouch
[10/17/2021, 8:19:42 AM] info: Server is now listening on 0.0.0.0:4200
[10/17/2021, 8:19:42 AM] info: Net connect (socat) connected to: 172.16.31.242:8899
[10/17/2021, 8:19:42 AM] info: Net connect (socat) Connection connected
[10/17/2021, 8:19:42 AM] info: Net connect (socat) ready and communicating: 172.16.31.242:8899
[10/17/2021, 8:19:43 AM] info: Auto-backup initialized Last Backup: 2021-10-16T23:23:21.532-0400 Next Backup: 2021-10-17T11:23:21.532-0400
[10/17/2021, 8:19:43 AM] info: New socket client connected lhRWFQBiD5jEGWbUAAAB -- 192.168.80.55
[10/17/2021, 8:19:43 AM] info: [8:19:43 AM] 192.168.80.55 GET /state/all?null {}
[10/17/2021, 8:19:43 AM] info: New socket client connected wqfub2-xKf-sfIXVAAAD -- 192.168.80.55
[10/17/2021, 8:19:43 AM] info: [8:19:43 AM] 192.168.80.55 GET /state/all?null {}
Pentair EasyTouch System Detected!
[10/17/2021, 8:19:44 AM] info: Found Controller Board EasyTouch2 8
[10/17/2021, 8:19:44 AM] info: Requesting easytouch configuration
[10/17/2021, 8:19:45 AM] info: New socket client connected kOpFvp-zS5Qo2qJMAAAF -- 172.16.31.151
[10/17/2021, 8:19:45 AM] info: [8:19:45 AM] 172.16.31.151 GET /state/all?null {}
sendRS485PortStats set to false
[10/17/2021, 8:19:54 AM] info: New socket client connected 0ec3sapjgsHuuQKxAAAH -- 192.168.80.55
[10/17/2021, 8:19:54 AM] info: [8:19:54 AM] 192.168.80.55 GET /state/all?null {}
RS485 Stats:{ "bytesReceived": 1465 "success": 48, "failed": 1, "bytesSent": 468, "collisions": 0, "failureRate": 2.04% }
RS485 Stats:{ "bytesReceived": 1692 "success": 56, "failed": 2, "bytesSent": 540, "collisions": 0, "failureRate": 3.45% }
[10/17/2021, 8:20:02 AM] info: EasyTouch system config complete.
[10/17/2021, 8:20:02 AM] info: Initializing Nixie Controller
[10/17/2021, 8:20:02 AM] info: Nixie Controller Initialized
[10/17/2021, 8:20:02 AM] info: [8:20:02 AM] 192.168.80.55 GET /state/all?null {}
[10/17/2021, 8:20:02 AM] info: New socket client connected utMk3_yfS5f4U9V1AAAJ -- 192.168.80.55
[10/17/2021, 8:20:02 AM] info: [8:20:02 AM] 192.168.80.55 GET /state/all?null {}
[10/17/2021, 8:20:03 AM] info: Last auto-backup 2021-10-16T23:23:21.532-0400 Next auto - backup 2021-10-17T11:23:21.532-0400
*** 08:21:00 Network Switch Disabled ***
*** 08:23:00 Network Switch Enabled ***
*** 08:24:52 “HP ENABLED” circuit commanded on ***
[10/17/2021, 8:24:52 AM] info: [8:24:52 AM] 192.168.80.55 PUT /state/circuit/setState {"id":9,"state":true}
*** FAILED ***
[10/17/2021, 8:24:52 AM] error: Net connect (socat) Connection: Error: read ECONNRESET. Retry in 10 seconds
[10/17/2021, 8:24:52 AM] info: Net connect (socat) closed due to error: 172.16.31.242:8899
[10/17/2021, 8:25:02 AM] info: Net connect (socat) connected to: 172.16.31.242:8899
[10/17/2021, 8:25:02 AM] info: Net connect (socat) Connection connected
[10/17/2021, 8:25:02 AM] info: Net connect (socat) ready and communicating: 172.16.31.242:8899
*** 08:27:00 “HP ENABLED” circuit commanded on ***
[10/17/2021, 8:27:00 AM] info: [8:27:00 AM] 192.168.80.55 PUT /state/circuit/setState {"id":9,"state":true}
*** FAILED ***
*** 08:29:00 “HP ENABLED” circuit commanded on ***
[10/17/2021, 8:29:00 AM] info: [8:29:00 AM] 192.168.80.55 PUT /state/circuit/setState {"id":9,"state":true}
*** FAILED ***
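A side note on the RS485 Stats lines in the log above: the failureRate values appear consistent with failed / (success + failed). The Python sketch below reproduces both figures under that assumption; the `failure_rate` function is hypothetical and is not taken from njsPC.

```python
def failure_rate(success, failed):
    """Return failed messages as a percentage of all messages, two decimals."""
    total = success + failed
    if total == 0:
        return "0.00%"
    return f"{failed / total * 100:.2f}%"


print(failure_rate(48, 1))  # 2.04%
print(failure_rate(56, 2))  # 3.45%
```

Both values match the log, so the low failure rate before the switch was disabled suggests the RS485 link itself was healthy at startup.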