Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on polling web api every 5 seconds #611

Closed
helgeerbe opened this issue Feb 20, 2023 · 11 comments
Closed

Crash on polling web api every 5 seconds #611

helgeerbe opened this issue Feb 20, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@helgeerbe
Copy link
Contributor

helgeerbe commented Feb 20, 2023

What happened?

I use telegram to poll the prometheus api every 5 seconds. After a while OpenDTU reboots.

To Reproduce Bug

I use telegram to poll prometheus api every 5 seconds

Expected Behavior

Don't crash.

Install Method

Self-Compiled

What git-hash/version of OpenDTU?

09942e8

Relevant log/trace output

14:23:23.000 > abort() was called at PC 0x40198987 on core 1
14:23:23.005 > 
14:23:23.005 > 
14:23:23.005 > Backtrace: 0x4008391d:0x3ffddf80 0x4008e0f1:0x3ffddfa0 0x400939f1:0x3ffddfc0 0x40198987:0x3ffde040 0x401989ce:0x3ffde060 0x401984c7:0x3ffde080 0x401985ee:0x3ffde0a0 0x40197ea5:0x3ffde0c0 0x4010ec95:0x3ffde0e0 0x40100f6d:0x3ffde100 0x400fe9ad:0x3ffde140 0x400f1cc7:0x3ffde160 0x401b5321:0x3ffde1f0 0x40101ca1:0x3ffde210 0x400ff209:0x3ffde260 0x400ff539:0x3ffde2b0 0x400ff749:0x3ffde310 0x4019ad41:0x3ffde330 0x4019adb9:0x3ffde360 0x4019b572:0x3ffde380
14:23:23.603 > 
14:23:23.603 >   #0  0x4008391d:0x3ffddf80 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:402
14:23:23.603 >   #1  0x4008e0f1:0x3ffddfa0 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:128
14:23:23.603 >   #2  0x400939f1:0x3ffddfc0 in abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/abort.c:46
14:23:23.603 >   #3  0x40198987:0x3ffde040 in __cxxabiv1::__terminate(void (*)()) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
14:23:23.603 >   #4  0x401989ce:0x3ffde060 in std::terminate() at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
14:23:23.603 >   #5  0x401984c7:0x3ffde080 in __cxa_throw at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
14:23:23.603 >   #6  0x401985ee:0x3ffde0a0 in operator new(unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
14:23:23.603 >   #7  0x40197ea5:0x3ffde0c0 in operator new[](unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_opv.cc:32
14:23:23.603 >   #8  0x4010ec95:0x3ffde0e0 in cbuf::cbuf(unsigned int) at /Users/herbe/.platformio/packages/framework-arduinoespressif32/cores/esp32/cbuf.cpp:24
14:23:23.603 >   #9  0x40100f6d:0x3ffde100 in AsyncResponseStream::AsyncResponseStream(String const&, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebResponses.cpp:680
14:23:23.603 >   #10 0x400fe9ad:0x3ffde140 in AsyncWebServerRequest::beginResponseStream(String const&, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:777
14:23:23.603 >   #11 0x400f1cc7:0x3ffde160 in WebApiPrometheusClass::onPrometheusMetricsGet(AsyncWebServerRequest*) at src/WebApi_prometheus.cpp:26
14:23:23.603 >   #12 0x401b5321:0x3ffde1f0 in void std::__invoke_impl<void, void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*>(std::__invoke_memfun_deref, void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/invoke.h:73
14:23:23.603 >       (inlined by) std::__invoke_result<void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*>::type std::__invoke<void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*>(void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/invoke.h:95
14:23:23.603 >       (inlined by) void std::_Bind<void (WebApiPrometheusClass::*(WebApiPrometheusClass*, std::_Placeholder<1>))(AsyncWebServerRequest*)>::__call<void, AsyncWebServerRequest*&&, 0u, 1u>(std::tuple<AsyncWebServerRequest*&&>&&, std::_Index_tuple<0u, 1u>) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/functional:400
14:23:23.603 >       (inlined by) void std::_Bind<void (WebApiPrometheusClass::*(WebApiPrometheusClass*, std::_Placeholder<1>))(AsyncWebServerRequest*)>::operator()<AsyncWebServerRequest*, void>(AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/functional:484
14:23:23.603 >       (inlined by) std::_Function_handler<void (AsyncWebServerRequest*), std::_Bind<void (WebApiPrometheusClass::*(WebApiPrometheusClass*, std::_Placeholder<1>))(AsyncWebServerRequest*)> >::_M_invoke(std::_Any_data const&, AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:297
14:23:23.603 >   #13 0x40101ca1:0x3ffde210 in std::function<void (AsyncWebServerRequest*)>::operator()(AsyncWebServerRequest*) const at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:687
14:23:23.603 >       (inlined by) AsyncCallbackWebHandler::handleRequest(AsyncWebServerRequest*) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebHandlerImpl.h:132
14:23:23.603 >   #14 0x400ff209:0x3ffde260 in AsyncWebServerRequest::_parseLine() at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:603 (discriminator 1)
14:23:23.603 >   #15 0x400ff539:0x3ffde2b0 in AsyncWebServerRequest::_onData(void*, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:124
14:23:23.603 >   #16 0x400ff749:0x3ffde310 in std::_Function_handler<void (void*, AsyncClient*, void*, unsigned int), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*, void*, unsigned int)#6}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, std::_Any_data const&, unsigned int&&) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:72
14:23:23.603 >       (inlined by) _M_invoke at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:297
14:23:23.603 >   #17 0x4019ad41:0x3ffde330 in std::function<void (void*, AsyncClient*, void*, unsigned int)>::operator()(void*, AsyncClient*, void*, unsigned int) const at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:687
14:23:23.603 >       (inlined by) AsyncClient::_recv(tcp_pcb*, pbuf*, signed char) at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:915
14:23:23.603 >   #18 0x4019adb9:0x3ffde360 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, signed char) at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:1191
14:23:23.603 >   #19 0x4019b572:0x3ffde380 in _async_service_task(void*) at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:159
14:23:23.603 >       (inlined by) _async_service_task at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:194
14:23:23.603 > 
14:23:23.603 > 
14:23:23.603 > 

or

14:37:26.140 > Success
14:37:26.929 > 
14:37:26.930 > abort() was called at PC 0x40198987 on core 1
14:37:26.934 > 
14:37:26.934 > 
14:37:26.934 > Backtrace: 0x4008391d:0x3ffddfa0 0x4008e0f1:0x3ffddfc0 0x400939f1:0x3ffddfe0 0x40198987:0x3ffde060 0x401989ce:0x3ffde080 0x401984c7:0x3ffde0a0 0x401985ee:0x3ffde0c0 0x40197ea5:0x3ffde0e0 0x4010ec95:0x3ffde100 0x40100f6d:0x3ffde120 0x400fe9ad:0x3ffde160 0x400f1cc7:0x3ffde180 0x401b5321:0x3ffde210 0x40101ca1:0x3ffde230 0x400ff209:0x3ffde280 0x400ff539:0x3ffde2d0 0x400ff749:0x3ffde330 0x4019ad41:0x3ffde350 0x4019adb9:0x3ffde380 0x4019b572:0x3ffde3a0
14:37:27.447 > 
14:37:27.447 >   #0  0x4008391d:0x3ffddfa0 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:402
14:37:27.447 >   #1  0x4008e0f1:0x3ffddfc0 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:128
14:37:27.447 >   #2  0x400939f1:0x3ffddfe0 in abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/abort.c:46
14:37:27.447 >   #3  0x40198987:0x3ffde060 in __cxxabiv1::__terminate(void (*)()) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
14:37:27.447 >   #4  0x401989ce:0x3ffde080 in std::terminate() at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
14:37:27.447 >   #5  0x401984c7:0x3ffde0a0 in __cxa_throw at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
14:37:27.447 >   #6  0x401985ee:0x3ffde0c0 in operator new(unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
14:37:27.447 >   #7  0x40197ea5:0x3ffde0e0 in operator new[](unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_opv.cc:32
14:37:27.447 >   #8  0x4010ec95:0x3ffde100 in cbuf::cbuf(unsigned int) at /Users/herbe/.platformio/packages/framework-arduinoespressif32/cores/esp32/cbuf.cpp:24
14:37:27.447 >   #9  0x40100f6d:0x3ffde120 in AsyncResponseStream::AsyncResponseStream(String const&, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebResponses.cpp:680
14:37:27.447 >   #10 0x400fe9ad:0x3ffde160 in AsyncWebServerRequest::beginResponseStream(String const&, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:777
14:37:27.447 >   #11 0x400f1cc7:0x3ffde180 in WebApiPrometheusClass::onPrometheusMetricsGet(AsyncWebServerRequest*) at src/WebApi_prometheus.cpp:26
14:37:27.447 >   #12 0x401b5321:0x3ffde210 in void std::__invoke_impl<void, void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*>(std::__invoke_memfun_deref, void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/invoke.h:73
14:37:27.447 >       (inlined by) std::__invoke_result<void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*>::type std::__invoke<void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*>(void (WebApiPrometheusClass::*&)(AsyncWebServerRequest*), WebApiPrometheusClass*&, AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/invoke.h:95
14:37:27.447 >       (inlined by) void std::_Bind<void (WebApiPrometheusClass::*(WebApiPrometheusClass*, std::_Placeholder<1>))(AsyncWebServerRequest*)>::__call<void, AsyncWebServerRequest*&&, 0u, 1u>(std::tuple<AsyncWebServerRequest*&&>&&, std::_Index_tuple<0u, 1u>) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/functional:400
14:37:27.447 >       (inlined by) void std::_Bind<void (WebApiPrometheusClass::*(WebApiPrometheusClass*, std::_Placeholder<1>))(AsyncWebServerRequest*)>::operator()<AsyncWebServerRequest*, void>(AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/functional:484
14:37:27.447 >       (inlined by) std::_Function_handler<void (AsyncWebServerRequest*), std::_Bind<void (WebApiPrometheusClass::*(WebApiPrometheusClass*, std::_Placeholder<1>))(AsyncWebServerRequest*)> >::_M_invoke(std::_Any_data const&, AsyncWebServerRequest*&&) at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:297
14:37:27.447 >   #13 0x40101ca1:0x3ffde230 in std::function<void (AsyncWebServerRequest*)>::operator()(AsyncWebServerRequest*) const at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:687
14:37:27.447 >       (inlined by) AsyncCallbackWebHandler::handleRequest(AsyncWebServerRequest*) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebHandlerImpl.h:132
14:37:27.447 >   #14 0x400ff209:0x3ffde280 in AsyncWebServerRequest::_parseLine() at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:603 (discriminator 1)
14:37:27.447 >   #15 0x400ff539:0x3ffde2d0 in AsyncWebServerRequest::_onData(void*, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:124
14:37:27.447 >   #16 0x400ff749:0x3ffde330 in std::_Function_handler<void (void*, AsyncClient*, void*, unsigned int), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*, void*, unsigned int)#6}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&, std::_Any_data const&, unsigned int&&) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:72
14:37:27.447 >       (inlined by) _M_invoke at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:297
14:37:27.447 >   #17 0x4019ad41:0x3ffde350 in std::function<void (void*, AsyncClient*, void*, unsigned int)>::operator()(void*, AsyncClient*, void*, unsigned int) const at /Users/herbe/.platformio/packages/toolchain-xtensa-esp32@8.4.0+2021r2-patch5/xtensa-esp32-elf/include/c++/8.4.0/bits/std_function.h:687
14:37:27.447 >       (inlined by) AsyncClient::_recv(tcp_pcb*, pbuf*, signed char) at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:915
14:37:27.447 >   #18 0x4019adb9:0x3ffde380 in AsyncClient::_s_recv(void*, tcp_pcb*, pbuf*, signed char) at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:1191
14:37:27.447 >   #19 0x4019b572:0x3ffde3a0 in _async_service_task(void*) at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:159
14:37:27.447 >       (inlined by) _async_service_task at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:194
14:37:27.447 > 
14:37:27.448 > 

Anything else?

I don't use the original code. But due to the crashes, I installed the original OpenDTU and got crashes, without knowing why.

Logs are from my extension to OpenDTU

@helgeerbe helgeerbe added the bug Something isn't working label Feb 20, 2023
@helgeerbe helgeerbe changed the title Crash on polling prometheus api every 5 seconds Crash on polling wea api every 5 seconds Feb 20, 2023
@helgeerbe helgeerbe changed the title Crash on polling wea api every 5 seconds Crash on polling web api every 5 seconds Feb 20, 2023
@helgeerbe
Copy link
Contributor Author

OK, still have crashes, but more seldom. On liveview it takes a while until system reboots.
Unfortunately I can't generate further log, because the OpenDTU is not near a computer.

I will setup a spare esp32 with original OpenDTU firmware and will stress the webapi through live view. Maybe I can catch some good logs

@helgeerbe
Copy link
Contributor Author

Used original OpenDTU. Running my config without connected nRF24L01+ module.

Did a permanent reload of liveview (manual clicking reload)

20:16:27.002 > assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)

20:16:27.002 > 

20:16:27.002 > 

20:16:27.002 > Backtrace: 0x40083919:0x3ffb34c0 0x4008e0cd:0x3ffb34e0 0x40093ad1:0x3ffb3500 0x40128f46:0x3ffb3630 0x40128ff4:0x3ffb3650 0x40191efe:0x3ffb3670 0x40125864:0x3ffb3690

20:16:27.002 > 
20:16:27.002 >   #0  0x40083919:0x3ffb34c0 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:402
20:16:27.002 >   #1  0x4008e0cd:0x3ffb34e0 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:128
20:16:27.002 >   #2  0x40093ad1:0x3ffb3500 in __assert_func at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/assert.c:85
20:16:27.002 >   #3  0x40128f46:0x3ffb3630 in tcp_update_rcv_ann_wnd at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp.c:951

20:16:27.002 >       (inlined by) tcp_update_rcv_ann_wnd at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp.c:931
20:16:27.002 >   #4  0x40128ff4:0x3ffb3650 in tcp_recved at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp.c:991
20:16:27.002 >   #5  0x40191efe:0x3ffb3670 in _tcp_recved_api(tcpip_api_call_data*) at .pio/libdeps/d1 mini esp32/AsyncTCP/src/AsyncTCP.cpp:419
20:16:27.002 >   #6  0x40125864:0x3ffb3690 in tcpip_thread_handle_msg at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:172

20:16:27.002 >       (inlined by) tcpip_thread at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:154
20:16:27.002 > 

@tbnobody
Copy link
Owner

tbnobody commented Feb 20, 2023

You are somehow crashing the TCP Stack (AsyncTCP). A such frequent reload interval was never be intended. Just don't do it. Your other Traces look like out of memory exceptions. This is due to the fact that the rendering of these outputs take a lot of memory and you maybe send a request before the previous one is finished completly

@helgeerbe
Copy link
Contributor Author

helgeerbe commented Feb 20, 2023

Thanks for the hint.
I logged the heapsize with prometheus and see no variation over time.

Pometheus is polled every 5 seconds and opened liveview in parallel (rate at 7 seconds). This is enough to crash the system. I will set prometheus pollrate to every 20 seconds.

@tbnobody
Copy link
Owner

14:37:27.447 > #6 0x401985ee:0x3ffde0c0 in operator new(unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54

It seems to crash in the new operator. Maybe not out of heap but due to heap fragmentation it's not possible to allocate one block of memory.

@helgeerbe
Copy link
Contributor Author

I make use of maps in my VeDirect interface. Maybe static array would be better.

@helgeerbe
Copy link
Contributor Author

I reimplemented my VeDirect frame handler. Instead of maps I use static struct and fixed sized char arrays. Memory consumption is indeed better now.

But this doesn't solve the problem with the promotheus api. I tested it over a couple of hours. Without accessing the prometheus api, everything is stable. When I poll on the prometheus api every single minute (via telegraf) I get a bad_alloc exception (within minutes or hours). Maybe a race condition with other api calls (Web or MQTT).

10:56:47.712 >   #6  0x40198352:0x3ffde280 in operator new(unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
10:56:47.712 >   #7  0x40197c09:0x3ffde2a0 in operator new[](unsigned int) at /Users/brnomac003/.gitlab-runner/builds/qR2TxTby/0/idf/crosstool-NG/.build/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/new_opv.cc:32
10:56:47.712 >   #8  0x4010ea35:0x3ffde2c0 in cbuf::cbuf(unsigned int) at /Users/herbe/.platformio/packages/framework-arduinoespressif32/cores/esp32/cbuf.cpp:24
10:56:47.712 >   #9  0x40100915:0x3ffde2e0 in AsyncResponseStream::AsyncResponseStream(String const&, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebResponses.cpp:680
10:56:47.712 >   #10 0x400fe355:0x3ffde320 in AsyncWebServerRequest::beginResponseStream(String const&, unsigned int) at .pio/libdeps/d1 mini esp32/ESP Async WebServer/src/WebRequest.cpp:777
10:56:47.712 >   #11 0x400f19cf:0x3ffde340 in WebApiPrometheusClass::onPrometheusMetricsGet(AsyncWebServerRequest*) at src/WebApi_prometheus.cpp:26

All my crashes point to the prometheus api. Using google I found that request->beginResponseStream has some issues with memory and low free heap size conditions. I don't know if these are temporarily issues.

Actual I try this. Still waiting for the exception:

void WebApiPrometheusClass::onPrometheusMetricsGet(AsyncWebServerRequest* request)
{   
    try {
        auto stream = request->beginResponseStream("text/plain; charset=utf-8", 40960);
  
        ...
        ...
        stream->addHeader(F("Cache-Control"), F("no-cache"));
        request->send(stream);
    }
    catch (std::bad_alloc& bad_alloc) {
        MessageOutput.printf("Call to /api/prometheus/metrics temporarely out of resources. Reason: \"%s\".", bad_alloc.what());
        
        auto response = request->beginResponse(429, "text/plain", "Too Many Requests");
        response->addHeader("Retry-After", "60");
        request->send(response);
    }
}

If this will solve the issue I will prepare a PR.

@helgeerbe
Copy link
Contributor Author

Just saw in the logs twice

16:27:25.059 > Call to /api/prometheus/metrics temporarely out of resources. Reason: "std::bad_alloc".

Next call to prometheus api is successful again. So it seems really temporary issues.

Same (bad_alloc) happened once for the liveview api request. I catch now the exception and close the socket. Wait now, that this will happen again.

I'm just wondering, if I'm the only one who is polling permanently the prometheus api, and hast opened the liveview in parallel?

@helgeerbe
Copy link
Contributor Author

helgeerbe commented Feb 23, 2023

Call to prometheus api and liveview api request 40kb each. Asyncwebserver seems to allocate further memory. With a bad timing no heap is available.

Solution is:

  • send http 429 - Too Many Request for prometheus call or
  • just not sending liveview data in the actual loop call.

@tbnobody
Copy link
Owner

An exception handler for out of memory situations is now implemented. At least it will not crash anymore

Copy link

github-actions bot commented Apr 6, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants