On a random/unpredictable basis, OpenDTU slows down or stops working altogether (live view, MQTT, etc.) #836
Comments
Hi, I have also had this "DTU command failed" message from the beginning. |
I was not able to identify a rational, repeatable pattern behind the issue, and it is also hard for me to catch the moment it starts: it has happened under a shining sun as well as in the middle of the night. Today is a "lucky" day for my setup and this issue: a new "slowdown & stop" happened again just a few minutes ago, and today the "2 - DTU command failed" event wins hands down, but as I said, sometimes it happens with few or no errors in the event log.
|
This error message most likely appears under the following conditions:
|
There is just one (Open)DTU here and I have never played with the limit or any other "set" command, so the second option seems the more reasonable one, but it sounds to me like an effect rather than the source of the problem. Unfortunately I have no way to physically connect the ESP board to the PC while keeping it linked to the inverter on the roof, but if you suggest something to investigate I'll be happy to do my best. A quick list of what I have already explored without success:
|
I am running multiple DTUs with the default settings for polling time and MQTT publish time without any problems. Maximum RF power requires additional capacitors and a suitable power supply; otherwise it will most likely cause power issues. You can look at Info --> System --> Uptime to see whether the device reboots very often |
Since the last update of OpenDTU I also get these "DTU command failed" messages. In previous versions I never got it. But up to now the OpenDTU seems to work. I'm not sure which version was the last without errors. |
Since the update to v23.4.24 my OpenDTU has also been rebooting a lot (35 times in the 8 hours since the update). |
What is your poll interval? Can you post an output of the serial console at the time of reboot? (Web Console is not enough, must really be the serial console) This would help to determine the issue. |
My poll interval is 1 second. I've always used this interval. By the way: all reboots happened during the night when the inverters were off. There have been no further reboots since the first inverter came online. |
You don't have to... A 1 second interval currently leads to a full command queue at night because all the commands run into a timeout. Change it to 5 seconds and everything will work. I am aware of this issue and will fix it soon. |
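To illustrate why a short poll interval can fill the command queue once every command runs into a timeout, here is a minimal simulation sketch in Python. It is not OpenDTU code: the function name, the four commands per poll, and the 500 ms timeout / 100 ms answer times are made-up assumptions for illustration only.

```python
def simulate_backlog(poll_interval_ms, commands_per_poll, service_ms,
                     duration_ms, step_ms=100):
    """Return how many commands are still queued after duration_ms.

    Hypothetical model: every poll_interval_ms a poll enqueues
    commands_per_poll commands, and a single radio handles one command
    at a time, each occupying it for service_ms (a quick answer by day,
    a full timeout at night). poll_interval_ms must be a multiple of
    step_ms.
    """
    queue = 0          # commands waiting or currently in progress
    in_flight = False  # True while the radio is busy with one command
    busy_until = 0     # time at which the current command finishes
    for now in range(0, duration_ms, step_ms):
        if now % poll_interval_ms == 0:
            queue += commands_per_poll
        if in_flight and now >= busy_until:
            in_flight = False
            queue -= 1
        if not in_flight and queue > 0:
            in_flight = True
            busy_until = now + service_ms
    return queue


# Assumed numbers: 4 commands per poll, 500 ms timeout, 100 ms answer.
night_1s = simulate_backlog(1000, 4, 500, 60_000)  # backlog keeps growing
night_5s = simulate_backlog(5000, 4, 500, 60_000)  # queue drains between polls
day_1s = simulate_backlog(1000, 4, 100, 60_000)    # fast answers, no backlog
print(night_1s, night_5s, day_1s)
```

Under these assumptions the backlog only grows when the time needed per poll (commands times service time) exceeds the poll interval, which matches the advice to use 5 seconds while the inverters are offline.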
OK, since which version does the issue occur, because I've always used a 1 second interval without any issues? |
OK, I added "uptime" to the "monitor" (via MQTT) and kept the RF power at the "High (-6 dB)" level. |
Could you please double-check with version v23.4.25? |
Of course, I'll update the DTU this evening when the last inverter is offline. I'll give you feedback as soon as possible. Thanks for the quick fixes! |
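Once uptime is being monitored, reboots can be spotted by looking for readings that go backwards. A minimal Python sketch, assuming the uptime values (in seconds) have already been collected in chronological order; the function name and the sample values are hypothetical and not part of OpenDTU:

```python
def count_reboots(uptime_samples):
    """Count reboots in chronologically ordered uptime readings (seconds).

    A reboot is assumed whenever a reading is lower than the one before it.
    """
    reboots = 0
    for previous, current in zip(uptime_samples, uptime_samples[1:]):
        if current < previous:
            reboots += 1
    return reboots


# Made-up samples: the uptime drops twice, so two reboots are counted.
samples = [10, 70, 130, 5, 65, 3, 63]
print(count_reboots(samples))
```

Comparing the timestamps of such drops with the "DTU command failed" entries would show whether the errors coincide with reboots or occur while the device stays up.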
ok, I'll try it. Thanks for the support :-) |
Last night I had no command failed messages and no reboots with v23.4.25. Edit: Tested with 1 second poll interval. |
I had the same issue, but it is fixed now with your last update: no more "DTU command failed". Thanks. |
Still here, same story ... 14 hours without a glitch, then plenty of "2 - DTU command failed" events, slow web responses, slow MQTT and so on and, as "usual", no reboot indicated by the uptime, which is in line with the last "manual" action. This time a manual "restart" was needed to get back to good working condition. I'm open to any suggestions. P.S. I'm asking myself whether I'm alone in this not so comfortable situation. |
Hi, |
No further problems with v23.4.25 |
Lucky man :-) ... unfortunately I'm not in the same state, and today (and last night) looks like a happy day for the problem ... a good start with the first "stop" around midnight (so no long event list from the DTU), fixed only by a manual reset, then at morning startup several "2 - DTU Command Failed" events that stopped spontaneously just before the inverter reached the "production" state. @tbnobody a (maybe dumb) question ... as written in the first post, I'm building the FW myself (no custom options, just syncing with this repo and compiling locally in VSC) ... may I switch to your pre-compiled version (last chance to try) simply by OTA, or would it be better to start with a full clear/reset in this case? |
Would you send me your config.json and your pin_mapping.json? Please edit both and delete the WiFi credentials and, if applicable, the inverter serial numbers. It no longer occurs for me at the moment, but I would simply like to try to reproduce it with your config. |
Sorry, it is not clear to me whether the request was for me or not; anyway, the config.json and pin_mapping.json are in the archive. |
The pre-built binaries should be exactly the same as the generic environment which you build locally. A full reset will not change anything except that the config partition will also be erased. If you plan to restore your config afterwards, it doesn't matter which you do; the result will be the same. |
Did you send Limit Commands to your inverter(s)? If yes, how often do you send commands to the inverter? |
Hi, no, there are no Limit Commands. |
Today I suddenly had the "DTU command failed" messages again (during the day). I'm currently on v23.4.28. |
Can you check the uptime? Does it correspond to the "DTU command failed" messages? |
As I wrote there was no reboot. The uptime is running since I did the last firmware update some days ago. The error occurred today. |
I also had a restart/crash today after the DTU became extremely slow. Before that it had run for several days with relatively few problems, with DTU errors only very rarely. Just now I opened the DTU web page (extremely slow). Pop, a restart, and immediately DTU errors in the log across several inverters ... so the problem does not seem to be completely gone yet :/. In addition, the DTU is very slow after the restart; pages take a few seconds to open. Interestingly, it only affects the inverters that are switched on. The other two (currently off, since I don't need power from the battery) are still behaving: |
I disconnected the DTU from power (I had to), because nothing worked at all anymore. Since then it has been running smoothly again. |
I don't want to speak too soon, but the last 2 days went without (big) issues and with only a few sparse events in the event list ... FW is still on v23.4.25 and the only change was that the MQTT publish time is now double the DTU poll interval (MQTT from 5 sec to 10 sec, DTU poll kept at 5 sec). |
Hi, for me this happens only on some days, and then not as often in a row as reported before. I did not notice DTU degradation (but I am only monitoring that it is pingable, which is not enough to rule it out). Since becoming aware of this issue, I have checked the occurrences more closely whenever I notice them, and I found that it happens in at least 2 situations:
Unfortunately the errors do not get recorded. Perhaps one could add to the mqtt topics counters per error number? E.g.
(those are the errors, that I noticed up to now from web ui) |
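The idea of per-error counters published over MQTT could look roughly like the following Python sketch. It only builds topic/payload pairs and does not talk to a broker; the class name and the `<base>/errors/<code>` topic layout are hypothetical and not an existing OpenDTU topic scheme.

```python
from collections import Counter


class ErrorCounters:
    """Count event-log error codes and format them as MQTT messages."""

    def __init__(self, base_topic):
        self.base_topic = base_topic
        self.counts = Counter()

    def record(self, code):
        """Register one occurrence of the given error code."""
        self.counts[code] += 1

    def messages(self):
        """Return (topic, payload) pairs, one per error code seen so far."""
        return [
            (f"{self.base_topic}/errors/{code}", str(count))
            for code, count in sorted(self.counts.items())
        ]


# Made-up example: two "2" events and one "36" event.
counters = ErrorCounters("solar/dtu")
for code in (2, 2, 36):
    counters.record(code)
for topic, payload in counters.messages():
    print(topic, payload)
```

With counters like these, a home automation system could record when each error number increases, even if nobody is watching the web UI at that moment.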
A few minutes ago I had the issue again. |
After about 3 days without problems, the problem reappeared yesterday in the late afternoon and continued, with ups and downs, until this morning. In short, my feeling is that the trigger in this case was again related to the WiFi coverage so, even if the hard link between "DTU error <> WiFi" is still a mystery to me, I made a little hack to the original code to disable the ESP32 power saving function. The "hacked" FW (latest source v23.4.28) is running now, and it started with the strongest and most reliable WiFi link so far (the AP report also confirms a higher available bitrate). Let's see what happens. |
Still here, and no: completely disabling power saving on the ESP32 did give the strongest and most reliable WiFi link, but it did not solve the issue.
Yes, here too the MQTT server log shows various quick connects/disconnects, more or less in line with the timing of the several "2 - DTU Command Failed" events. |
IMHO there should be no MQTT reconnecting at all. I guess the symptoms arise from that. Maybe OpenDTU gets blocked when the TCP connection to the MQTT server breaks, maybe some publish call hangs until the TCP timeout finally terminates it, or the MQTT server process is in trouble and does not consume incoming traffic (high system load, swapping, ...). |
Hi all, I have some good news to share. Although I wasn't able to identify the source of the hard link between the WiFi side and the DTU side (the long list of "2 - DTU Command Failed" in the DTU event list), I may have been able to address the source of the WiFi issue, at least in my setup. By default, the ESP32 tries to negotiate the higher bandwidth allowed by the 2.4 GHz WiFi specification (HT40). This may sound good, but in real-life scenarios (e.g. less than optimal line of sight, neighbors' networks, other WiFi clients) it often backfires and results in a problematic or poorly performing link. Dealing with this can be simple if your access point/router allows you to set HT20 instead of HT40. In my setup, extending the WiFi up to the roof using a "WiFi extender" was the simplest way, but unfortunately it did not allow me to change the bandwidth options (admittedly, this is probably not at the top of the feature list for a commercial product). A little hack to the OpenDTU source forcing "HT20" has resulted in a stable connection, with no long lists of "2 - DTU Command Failed" in the event list and 0 connect/disconnect ping-pong in the MQTT server log. Even after restoring the (standard) power saving setup, everything has been running rock solid for the past two days. Please see the attached screenshot of today's events (yesterday was more or less the same), which shows no "2" errors and only a few sparse "Unknown" errors (with codes 12, 36, or 46). I'm not celebrating too soon, but considering the real bandwidth needed by this kind of device, and by IoT devices in general, @tbnobody let me suggest evaluating the option of forcing the "HT20" config as the default in the OpenDTU code. Thank you for your attention. |
These are interesting findings, but I don't think there are any problems with the WLAN connection in my case. My DTU is located 30 cm from the access point. The problem also occurs quite differently for me. Sometimes I have the errors several times within a few days, then nothing happens for days. I could try to artificially degrade the WLAN connection or check whether the access point logs any changes in the connection when the issue occurs. |
This time the error pattern was somewhat different. Since yesterday evening, the OpenDTU has suddenly been sending faulty MQTT packets without interruption. The inverter topics suddenly all contained incorrect characters, which led to problems in my home automation system. I noticed the problem about 12 hours later. I was able to open the OpenDTU web interface, but the page with the live data did not load. Only the loading animation was visible. All the other pages worked without any problems. Since a few versions, the DTU has been running very unreliably. |
I had the same issue the previous day. An ESP32 was connected to the AP but the web interface was very slow and the MQTT connection kept toggling its connection state. Interesting remark: it was a Sunton display with OpenHasp firmware.... After several resets everything was OK.... |
Yesterday morning a single error #2 showed up in the log. Interestingly,
it occurred before inverter start, at 5:31, ~2 minutes after openDTU
started acquisition again.
The MQTT server does not show a reconnect.
System load monitoring does not show any significant load at that time.
I currently have no access to my WLAN AP log to look up whether the openDTU
client reconnected or whether any other events occurred at that time.
Whatever this means.
|
Hi all, I'm still here ... after a few "quiet days", the issues have resurfaced this morning, along with the long list of "2" events. I'm quite certain that, at least in my case, the trigger is the unreliable WiFi infrastructure ( repeater ) but, despite its inconsistencies, I'm keeping it as is to facilitate further investigations. Let me share some findings:
I hope these pieces of information can be helpful in some way. |
Since my previous feedback, around 2 weeks ago, I have been working on the WiFi link to try to improve it and to find further confirmation of the hard link between the event "2" flood and the WiFi. So, I moved the openDTU device inside and close to the (poor) AP. This did the trick, and there have been no more "2" floods in the event log. Sometimes there are sparse events like 12, 36, or 46, but hardly any "2" and no more disconnections in the MQTT server log. An additional unexpected benefit of moving from the roof to inside was a stronger signal for the "DTU", allowing me to lower the transmitting power from "Max" to "Low" (even the "minimum" power setting works without issues, but I prefer to stay on the safe side). It might be interesting to note that around the same time, I implemented a sort of "delay counter" in the code and discovered that the ESP32 occasionally gets heavily occupied somewhere and fails to properly execute the "network loop" code for about 3 or 4 seconds. I hope this is useful in some way. |
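The "delay counter" idea can be sketched as follows in Python. This is not the code that was actually added to the firmware (which is not shown in this thread); the class name, the 1 second threshold, and the sample tick times are assumptions for illustration. The monitor is fed a timestamp on every pass of the main loop and records how long the loop went without running.

```python
class LoopDelayMonitor:
    """Track the gaps between successive passes of a main loop."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s  # gaps longer than this count as stalls
        self.last_tick = None
        self.max_gap_s = 0.0
        self.stalls = 0

    def tick(self, now_s):
        """Call once per loop pass with the current time in seconds."""
        if self.last_tick is not None:
            gap = now_s - self.last_tick
            self.max_gap_s = max(self.max_gap_s, gap)
            if gap > self.threshold_s:
                self.stalls += 1
        self.last_tick = now_s


# Made-up tick times: the loop runs every 0.5 s, then stalls for 3.5 s.
monitor = LoopDelayMonitor(threshold_s=1.0)
for t in (0.0, 0.5, 1.0, 4.5, 5.0):
    monitor.tick(t)
print(monitor.stalls, monitor.max_gap_s)
```

In a real loop the timestamp would come from a monotonic clock (for example `time.monotonic()` in Python), and the stall count and maximum gap could be published alongside the other monitoring values.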
Please note that we have renamed the Event Log Alarm Code @iomax so IMHO this may be something like the command sent to the inverter which contains the Timestamp has been delayed too much and does not match the internal timestamp of the inverter. |
What happened?
From time to time, on a really hard-to-predict basis, sometimes after days, sometimes several times in an hour, OpenDTU slows down, both in its web responses and in its MQTT publishing.
Triggering a reset, if I am able to reach the "reset" page, seems to restore good working condition, but I cannot know whether that will last for days, hours, or minutes.
Most of the time, but not always, the event log reports several "2 - DTU command failed" and/or "36 - Unknown" events, but unfortunately it is not clear to me whether that is the cause or just the effect.
I tried slowing down the polling interval (DTU as well as MQTT), avoiding opening the live view page, and lowering/raising the NRF24 transmitting power, but nothing seems to make any change, positive or negative, in these cases.
My feeling is that a cloudy sky, with unstable power production, could "help" the issue to start, but it has also happened at night, so that explanation is hard to support.
Attached is the event log screenshot, and below are a few rows that I was able to get from the Virtual Debug Console during the last occurrence of the issue
To Reproduce Bug
It happens on a random basis and I'm unable to identify a path to reproduce it
Expected Behavior
Work without random stop/reset ? :-)
Install Method
Self-Compiled
What git-hash/version of OpenDTU?
c078e88
Relevant log/trace output
Anything else?
OpenDTU on AZDelivery ESP32 NodeMcu WiFi CP2102 + Hoymiles HM-350.
It may be useful to add that it has been the same since the first setup about 2 months ago, across the various FW releases since then