Devices stop responding to WebUI requests after period of inactivity #862

Closed
law1964 opened this issue May 13, 2018 · 20 comments
@law1964

law1964 commented May 13, 2018

I have three devices flashed with Espurna 1.12.6. All three of them stopped responding to WebUI requests overnight, and this has happened two nights in a row. The devices still answer pings to their IP addresses and still control the lights when I push the physical button. However, I have no access via the WebUI and I cannot control them using either Alexa or OpenHAB.

The issue is cleared when I power cycle each device and I am able to access the webUI.

Anyone else experiencing this problem?

@law1964
Author

law1964 commented May 14, 2018

This has happened again with the same symptoms. The three devices have stopped responding to WebUI requests.

Is there any way to make the debug log persist across power cycles? I need a better way to troubleshoot this, because I cannot telnet into the devices.

@mcspr
Collaborator

mcspr commented May 14, 2018

Do you compile or use the release version?

There is a debug panel in the WebUI that emulates a telnet connection. You can enter 'crash' to view the latest crash log, if there is one. It does not keep the whole log.

@law1964
Author

law1964 commented May 14, 2018

I compile and use OTA with ArduinoIDE to flash my devices. I need to do the two step flash because the compiled image is too large to do a one-shot.

Thank you for the crash command. My sonoff basic has the following crash log:

[572397] [WEBSOCKET] #5 connected, ip: 192.168.20.21, url: /ws
[578436] [WEBSOCKET] Requested action: dbgcmd
[578440] [DEBUG] Latest crash was at 48011 ms after boot
[578442] [DEBUG] Reason of restart: 2
[578443] [DEBUG] Exception cause: 29
[578444] [DEBUG] epc1=0x40216726 epc2=0x00000000 epc3=0x00000000
[578447] [DEBUG] excvaddr=0x00000000 depc=0x00000000
[578452] [DEBUG] >>>stack>>>
[DEBUG] 3ffffd20: 00000010 00000010 00000000 4010053d
[DEBUG] 3ffffd30: 00000014 000088d0 0000111a 00000023
[DEBUG] 3ffffd40: 3fff1040 000012ad 000012ad 3ffffdec
[DEBUG] 3ffffd50: 00000194 3fff8524 00000000 40216808
[DEBUG] 3ffffd60: 00000010 00000010 00000000 4010053d
[DEBUG] 3ffffd70: 3fff9574 00000012 3ffffdf0 3ffebcf0
[DEBUG] 3ffffd80: 3ffffde0 3fff8524 3fff8524 401004d8
[DEBUG] 3ffffd90: 3ffe8b58 00000000 3ffffde0 3ffebcf0
[DEBUG] 3ffffda0: 3ffec21b 3fff8524 00000000 40214f58
[DEBUG] 3ffffdb0: 00000194 3ffffdec 3ffffde0 4022558e
[DEBUG] 3ffffdc0: 3ffe8b58 3fff8524 3fff8524 40214f9b
[DEBUG] 3ffffdd0: 3ffe8b58 3fff8524 3ffe8b58 4020287f
[DEBUG] 3ffffde0:

My sonoff RF Bridge has this one:

[694246] [WEBSOCKET] #10 connected, ip: 192.168.20.21, url: /ws
[704037] [WEBSOCKET] Requested action: dbgcmd
[704044] [DEBUG] Latest crash was at 122134 ms after boot
[704045] [DEBUG] Reason of restart: 2
[704046] [DEBUG] Exception cause: 29
[704047] [DEBUG] epc1=0x4022630f epc2=0x00000000 epc3=0x00000000
[704048] [DEBUG] excvaddr=0x00000000 depc=0x00000000
[704049] [DEBUG] >>>stack>>>
[DEBUG] 3ffffdd0: 00000001 3fff9004 3ffffe08 40226cfc
[DEBUG] 3ffffde0: 00000001 3fff9004 3fff9038 40216260
[DEBUG] 3ffffdf0: 3fffbd8c 0000000f 0000000f 3fffb544
[DEBUG] 3ffffe00: 0000000f 0000000f 3fffbbac 0000003f
[DEBUG] 3ffffe10: 00000034 3fffbf44 0000009f 00000099
[DEBUG] 3ffffe20: 3fffbda4 0000038a 0000038a 00000000
[DEBUG] 3ffffe30: 3ffed946 3ffed9e2
[900560] [MAIN] Uptime: 15900 seconds
[900563] [MAIN] Free heap: 17520 bytes
[900564] [MAIN] Power: 3499 mV
[900566] [MAIN] Time: 2018-05-14 20:04:21

I'm not sure how to interpret them, but it looks like the crashes occurred after I power cycled the devices to get access to them again. I still have access to these devices after the crashes.
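
For reading the two numeric codes in these logs, a small lookup can help. The sketch below assumes that "Reason of restart" follows the `rst_reason` enum of the ESP8266 SDK and that "Exception cause" follows the Xtensa EXCCAUSE numbering. The function names `restartReasonName` and `exceptionCauseName` are hypothetical and not part of Espurna, and only a few causes are listed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: name for the "Reason of restart" value,
// assuming it matches the ESP8266 SDK rst_reason enum.
std::string restartReasonName(std::uint32_t reason) {
    switch (reason) {
        case 0: return "REASON_DEFAULT_RST (normal power-on)";
        case 1: return "REASON_WDT_RST (hardware watchdog)";
        case 2: return "REASON_EXCEPTION_RST (exception)";
        case 3: return "REASON_SOFT_WDT_RST (software watchdog)";
        case 4: return "REASON_SOFT_RESTART (software restart)";
        case 5: return "REASON_DEEP_SLEEP_AWAKE (wake from deep sleep)";
        case 6: return "REASON_EXT_SYS_RST (external reset)";
        default: return "unknown";
    }
}

// Hypothetical helper: name for the "Exception cause" value,
// assuming the Xtensa EXCCAUSE numbering (partial list).
std::string exceptionCauseName(std::uint32_t cause) {
    switch (cause) {
        case 0: return "IllegalInstruction";
        case 3: return "LoadStoreError";
        case 6: return "IntegerDivideByZero";
        case 9: return "LoadStoreAlignment";
        case 28: return "LoadProhibited";
        case 29: return "StoreProhibited";
        default: return "other (see the Xtensa EXCCAUSE table)";
    }
}
```

Under those assumptions, `restartReasonName(2)` and `exceptionCauseName(29)` describe both crashes as an exception reset caused by a prohibited store. Together with `excvaddr=0x00000000` that would be consistent with a write through a null pointer, although only the decoded stack can show where it happened.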

@law1964 law1964 closed this as completed May 14, 2018
@law1964 law1964 reopened this May 14, 2018
@mcspr
Collaborator

mcspr commented May 14, 2018

Sorry, I should've also sent the tool to actually read these: https://github.com/me-no-dev/EspExceptionDecoder

And check that you are using lwip 1.4 in Tools / lwip variant

@law1964
Author

law1964 commented May 15, 2018

@mcspr Thanks for the EspExceptionDecoder. I installed it in the tools directory for ArduinoIDE and selected it from the tools menu. It is asking me for a .elf file. What's a .elf file? I don't suppose this has anything to do with the fact that I am using ArduinoIDE on Windows. The decoder is written in Java and Java is device independent. I'll have to do some reading and determine how to get the decoder up and running because I'll probably need it in the future.

Anyhow, the three Espurna devices are still up and running and are accepting WebUI requests. I am hoping that the issue I had two days in a row was an aberration.

@mcspr
Collaborator

mcspr commented May 15, 2018

Java is only the runtime of the Arduino IDE. The firmware itself is built by an external tool, the Xtensa toolchain. The sketch has to be compiled first; the .elf is the resulting firmware, which is then packed and flashed to the device. The decoder will use the last built .elf by default.

The gist of it is: you compile the firmware, flash it, and then use the cached .elf as a reference to decode the addresses in the stack dump when it crashes. The IDE caches it when building. On Windows it should be at %TEMP%/arduino_build_*/*.elf
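
As a rough illustration of what has to be picked out of a stack dump before it can be decoded, the sketch below extracts the words that look like code addresses from one `[DEBUG]` stack line. It assumes that code addresses on the ESP8266 fall between 0x40000000 and 0x402FFFFF (ROM, instruction RAM and memory-mapped flash), while values such as 3fff.... are data. `looksLikeCodeAddress` and `candidateCodeAddresses` are hypothetical names, and this is not necessarily how EspExceptionDecoder is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumed ESP8266 code range: ROM, IRAM and flash-mapped code.
bool looksLikeCodeAddress(std::uint32_t value) {
    return value >= 0x40000000u && value < 0x40300000u;
}

// Hypothetical helper: scan one stack-dump line and return the
// 8-digit hex words that fall inside the assumed code range.
std::vector<std::uint32_t> candidateCodeAddresses(const std::string& line) {
    std::vector<std::uint32_t> found;
    std::istringstream words(line);
    std::string word;
    while (words >> word) {
        // Skip tokens such as "[DEBUG]" or "3ffffd20:" that are not plain 8-digit hex.
        if (word.size() != 8 ||
            word.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos) {
            continue;
        }
        auto value = static_cast<std::uint32_t>(std::stoul(word, nullptr, 16));
        if (looksLikeCodeAddress(value)) {
            found.push_back(value);
        }
    }
    return found;
}
```

For the line `[DEBUG] 3ffffd50: 00000194 3fff8524 00000000 40216808` this returns the single value 0x40216808. Addresses collected this way only become meaningful once they are resolved against the .elf of the exact build that crashed.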

@wildwiz
Contributor

wildwiz commented May 15, 2018

Might this issue be related to #381?

@law1964
Author

law1964 commented May 15, 2018

Perhaps.

I've noticed that if I reboot my PC and I try to connect by entering the device's IP address on the web browser, I would get a blank screen on the browser. If I reload, I get Espurna's status screen.

I'm not certain that the timed out session would explain why I would also lose control of the switch from Alexa and OpenHAB. These may be two different issues that occurred at the same time.

@xoseperez
Owner

Might be the same as in #540 or #614

@icevoodoo

@xoseperez: Please take a look at the page "ESP8266 hangs/stops responding when sent commands from different devices". Could this be a temporary fix for our problem?

In my case, in order to prevent issues (and stop unwanted data flow), I am simply stopping WebServer after expected connection comes - then I am starting it once ready for the next one.

// we got a new connection
server.stop();

// let's do the job here
delay(30000); // does your server really have to reply to each and every connection?
server.begin(); // ready to resume

@mcspr
Collaborator

mcspr commented May 16, 2018

@wildwiz I think that is just an issue with the JS client logic: there should be a websock.onerror / onclose handler to recover a lost connection.
@icevoodoo Not really, that just disables the author's web server for a while.

And I'd still like to know which Core / lwip versions you are using, @law1964.

@law1964
Author

law1964 commented May 16, 2018

This would be a satisfactory workaround, in my case, if use of WebServer is the only cause of the hangup issue. I would limit control of the devices to Alexa and OpenHAB commands. Only when I need additional configuration of ESPURNA settings would I need to use the WebUI.

Does anyone know whether the use of Alexa/OpenHAB with the ESP8266 devices contributes to the hanging issue?

As of now, it has been 47.7 hours since I have needed to reboot the devices. :)

@law1964
Author

law1964 commented May 16, 2018

I'm using lwIP variant 1.4 Higher Bandwidth on ArduinoIDE 1.8.5. I am compiling 1.12.6 for the generic ESP8266/8285 modules, whichever is appropriate for the Sonoff device. For Sonoff Basic, I use 8266 and for the RF Bridge and Sonoff 4CH Pro, I use the 8285.

@law1964
Author

law1964 commented May 17, 2018

I lost WebUI access to all 3 devices overnight (after about 2.5 days of normal operation). I can still ping them. I tried the arp nping method that others have used to recover WebUI access, but it did not work for me.

I have now power cycled the devices and they are currently working.

@law1964
Author

law1964 commented May 17, 2018

@xoseperez How far along are you with implementing scheduled reboots (#803)? This would only be a workaround for the issue, but it would help considerably.
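
Until something like #803 exists, a periodic restart could be approximated in a custom build. The sketch below only shows the timing check. It assumes a 32-bit millisecond counter such as Arduino's `millis()`, and `restartDue` is a hypothetical helper, not an Espurna function. Unsigned subtraction keeps the comparison correct when the counter wraps around after roughly 49.7 days.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true once at least intervalMs have elapsed since startMs.
// Unsigned arithmetic wraps modulo 2^32, so the result stays correct across rollover.
constexpr bool restartDue(std::uint32_t nowMs, std::uint32_t startMs, std::uint32_t intervalMs) {
    return static_cast<std::uint32_t>(nowMs - startMs) >= intervalMs;
}
```

In an Arduino loop this might be used as `if (restartDue(millis(), bootMs, 24UL * 60UL * 60UL * 1000UL)) ESP.restart();`, where `bootMs` is a hypothetical variable captured in `setup()`. This masks the symptom rather than fixing the cause.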

@law1964
Author

law1964 commented May 18, 2018

I don't know whether this is related or not, but I tried recompiling the 1.12.6 code for Itead-Sonoff-Basic using the PlatformIO IDE (instead of ArduinoIDE) and I received the errors below for the ESP Async WebServer library. The problems uncovered were:

Compiling .pioenvs\itead-sonoff-basic\lib77c\ESP Async WebServer\WebResponses.cpp.o
.piolibdeps\ESP Async WebServer\src\WebHandlers.cpp: In member function 'AsyncStaticWebHandler& AsyncStaticWebHandler::setLastModified(tm*)':
.piolibdeps\ESP Async WebServer\src\WebHandlers.cpp:67:64: error: 'strftime' was not declared in this scope
strftime (result,30,"%a, %d %b %Y %H:%M:%S %Z", last_modified);
^
.piolibdeps\ESP Async WebServer\src\WebHandlers.cpp: In member function 'AsyncStaticWebHandler& AsyncStaticWebHandler::setLastModified(time_t)':
.piolibdeps\ESP Async WebServer\src\WebHandlers.cpp:73:60: error: 'gmtime' was not declared in this scope
return setLastModified((struct tm *)gmtime(&last_modified));
^
.piolibdeps\ESP Async WebServer\src\WebHandlers.cpp: In member function 'AsyncStaticWebHandler& AsyncStaticWebHandler::setLastModified()':
.piolibdeps\ESP Async WebServer\src\WebHandlers.cpp:78:25: error: 'time' was not declared in this scope
if(time(&last_modified) == 0) //time is not yet set
^
*** [.pioenvs\itead-sonoff-basic\lib77c\ESP Async WebServer\WebHandlers.cpp.o] Error 1

@xoseperez
Owner

@law1964 This issue is due to the Time.h library. It has already been reported and there is a workaround (see #6 or #445). Basically, don't use the original Time library; use my fork, which fixes it.

@law1964
Author

law1964 commented May 20, 2018

Thank you, @xoseperez. This information gave me enough direction to fix my build problem on PlatformIO. It wasn't the Time.h library, but some of the other libraries that were causing the problem. When I initially installed PlatformIO, I had started to install some of the libraries manually, as I had done for the ArduinoIDE setup. I stopped installing libraries manually when I realized that the platformio.ini file specifies the required library dependencies and that the build process downloads and installs them automatically, if needed. After I uninstalled the redundant/conflicting libraries from Global Storage, the build completed without error.

I have installed the new image to my RF Bridge and will monitor how stable it is with respect to the WebUI access issue.

@stale

stale bot commented Aug 13, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 13, 2018
@stale

stale bot commented Aug 20, 2018

This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem.

@stale stale bot closed this as completed Aug 20, 2018