Crash on polling web api every 5 seconds #611
Comments
OK, I still have crashes, but less often now. On the live view it takes a while until the system reboots. I will set up a spare ESP32 with the original OpenDTU firmware and stress the web API through the live view. Maybe I can catch some good logs.
Used the original OpenDTU, running my config without a connected nRF24L01+ module. Did a permanent reload of the live view (manually clicking reload).
You are somehow crashing the TCP stack (AsyncTCP). Such a frequent reload interval was never intended, so just don't do it. Your other traces look like out-of-memory exceptions. Rendering these outputs takes a lot of memory, and you may be sending a request before the previous one has completed.
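A minimal, hypothetical illustration of avoiding the overlap described above (the names are illustrative, not OpenDTU's actual API): allow only one metrics response to be rendered at a time, and answer overlapping requests with HTTP 429 so a second large buffer is never allocated concurrently.

```cpp
#include <atomic>

// Hypothetical single-flight guard: true while a metrics response
// is still being rendered.
std::atomic<bool> metricsBusy{false};

// Returns the HTTP status the handler should use: 200 to proceed,
// 429 when a previous request is still being rendered.
int beginMetricsRequest() {
    bool expected = false;
    // Atomically flip false -> true; fails if a render is in flight.
    if (!metricsBusy.compare_exchange_strong(expected, true)) {
        return 429;  // Too Many Requests: previous render still running
    }
    return 200;
}

// Must be called once the response has been handed to the client.
void endMetricsRequest() { metricsBusy.store(false); }
```

A polling client such as Telegraf handles the 429 as a failed scrape and simply retries on its next interval, which is far better than a device reboot.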
Thanks for the hint. Prometheus is polled every 5 seconds, and I had the live view open in parallel (refresh rate of 7 seconds). This is enough to crash the system. I will set the Prometheus poll rate to every 20 seconds.
It seems to crash in the new operator. Maybe it is not actually out of heap; due to heap fragmentation it may not be possible to allocate one contiguous block of memory.
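The fragmentation effect can be shown with a toy first-fit allocator over a fixed arena (a sketch for illustration only, not how the ESP32 heap is implemented): half the arena can be free while no two-slot allocation succeeds, because the free slots are not contiguous.

```cpp
#include <array>
#include <cstddef>

// Toy first-fit allocator over a fixed arena of 16 one-byte slots.
struct ToyHeap {
    std::array<bool, 16> used{};  // false = free

    // Returns the start index of a contiguous free run of n slots, or -1.
    int alloc(std::size_t n) {
        std::size_t run = 0;
        for (std::size_t i = 0; i < used.size(); ++i) {
            run = used[i] ? 0 : run + 1;
            if (run == n) {
                std::size_t start = i + 1 - n;
                for (std::size_t j = start; j <= i; ++j) used[j] = true;
                return static_cast<int>(start);
            }
        }
        return -1;  // fragmented (or genuinely out of memory)
    }

    void free_at(std::size_t i) { used[i] = false; }

    std::size_t total_free() const {
        std::size_t f = 0;
        for (bool u : used) f += !u;
        return f;
    }
};
```

Filling the arena with one-slot allocations and then freeing every other slot leaves 8 of 16 slots free, yet `alloc(2)` fails: exactly the situation where total free heap looks healthy but `new` still throws.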
I make use of maps in my VeDirect interface. Maybe a static array would be better.
I reimplemented my VeDirect frame handler. Instead of maps I now use a static struct and fixed-size char arrays. Memory consumption is indeed better, but this doesn't solve the problem with the Prometheus API. I tested it over a couple of hours: without accessing the Prometheus API everything is stable, but when I poll the Prometheus API once per minute (via Telegraf) I get a bad_alloc exception within minutes or hours. Maybe it is a race condition with other API calls (web or MQTT).
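A sketch of the kind of refactoring described above. The field names here are purely illustrative assumptions, not the actual VeDirect interface of this fork; the point is that a statically sized struct with fixed char arrays makes zero heap allocations per frame, unlike a map of strings.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative frame struct with fixed-size buffers: no per-frame
// heap allocation, so no contribution to heap fragmentation.
struct VeDirectFrame {
    char productId[9];     // e.g. "0xA057" (field names are assumptions)
    char serialNr[14];
    float batteryVoltage;
    float panelPower;
};

// Copy a text value into a fixed buffer, truncating if necessary
// and always NUL-terminating.
void setField(char* dst, std::size_t dstSize, const char* src) {
    std::strncpy(dst, src, dstSize - 1);
    dst[dstSize - 1] = '\0';
}
```

The trade-off is that overlong values are silently truncated to the buffer size, which is usually acceptable for fixed-format telemetry fields.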
All my crashes point to the Prometheus API. Searching around suggested catching the allocation failure, so I am currently trying this and still waiting for the exception:

```cpp
void WebApiPrometheusClass::onPrometheusMetricsGet(AsyncWebServerRequest* request)
{
    try {
        auto stream = request->beginResponseStream("text/plain; charset=utf-8", 40960);
        ...
        ...
        stream->addHeader(F("Cache-Control"), F("no-cache"));
        request->send(stream);
    } catch (std::bad_alloc& bad_alloc) {
        MessageOutput.printf("Call to /api/prometheus/metrics temporarily out of resources. Reason: \"%s\".", bad_alloc.what());
        auto response = request->beginResponse(429, "text/plain", "Too Many Requests");
        response->addHeader("Retry-After", "60");
        request->send(response);
    }
}
```

If this solves the issue I will prepare a PR.
Just saw it in the logs twice.
The next call to the Prometheus API succeeds again, so it really does seem to be a temporary issue. The same bad_alloc happened once for the live view API request. I now catch the exception and close the socket, and am waiting for it to happen again. I'm just wondering whether I'm the only one who permanently polls the Prometheus API and has the live view open in parallel?
Calls to the Prometheus API and the live view API request about 40 kB each. AsyncWebServer seems to allocate further memory on top of that, so with bad timing no heap is available. The solution is:
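One conceivable mitigation (a hypothetical sketch, not the fix that was actually merged) is to check before rendering whether the largest contiguous free block can hold the ~40 kB response plus AsyncWebServer's overhead. On the ESP32 the real value would come from ESP-IDF's `heap_caps_get_largest_free_block()`; here it is passed in as a parameter so the logic can be shown portably.

```cpp
#include <cstddef>

constexpr std::size_t kResponseSize = 40960;  // response stream buffer
constexpr std::size_t kSafetyMargin = 8192;   // assumed headroom for AsyncWebServer

// Returns the HTTP status to use: 200 if the request may proceed,
// 503 if the largest free block is too small to render safely.
int checkHeapBeforeRender(std::size_t largestFreeBlock) {
    if (largestFreeBlock < kResponseSize + kSafetyMargin) {
        return 503;  // Service Unavailable: try again later
    }
    return 200;
}
```

Unlike catching `std::bad_alloc` after the fact, this rejects the request before any large buffer is created, so the device never gets close to the failure point.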
An exception handler for out-of-memory situations is now implemented. At least it will not crash anymore.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.
What happened?
I use Telegraf to poll the Prometheus API every 5 seconds. After a while OpenDTU reboots.
To Reproduce Bug
Poll the Prometheus API every 5 seconds with Telegraf.
Expected Behavior
Don't crash.
Install Method
Self-Compiled
What git-hash/version of OpenDTU?
09942e8
Relevant log/trace output
Anything else?
I don't use the original code, but due to the crashes I installed the original OpenDTU and got crashes there as well, without knowing why.
The logs are from my extension of OpenDTU.