Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

was: enable WebSocket destroy on exit in was_init #361

Merged
merged 1 commit into from
Jan 17, 2024

Conversation

stintel
Copy link
Collaborator

@stintel stintel commented Jan 17, 2024

The WAS WebSocket client sometimes fails to reconnect after receiving a WEBSOCKET_EVENT_CLOSED event due to it not being able to create the WebSocket client task:

I (13:15:28.958) WILLOW/WAS: WebSocket closed
I (13:15:28.959) WILLOW/WAS: initializing WebSocket client (ws://was.dev.willow:8502/ws)
E (13:15:28.961) websocket_client: Error create websocket task
E (13:15:28.967) WILLOW/WAS: failed to start WebSocket client: ESP_FAIL

When this happens, Willow stops trying to reconnect to WAS, and the user has to reset or power-cycle their device. As this is terrible UX, this should be avoided.

The root cause is not freeing resources of the WebSocket client before we reinitialize it in init_was. This eventually causes us to run out of IRAM, causing the WebSocket client task creation to fail.

To fix this, enable WebSocket destroy on exit in was_init, rather than in was_deinit, so it is always enabled. This is needed to avoid leaking resources due to not calling esp_websocket_client_destroy before calling esp_websocket_client_init. We don't want to call
esp_websocket_client_destroy, as that has proven to be very prone to crashes in the past.

Similarly, we don't want to call esp_websocket_client_start when the WAS WebSocket client receives a WEBSOCKET_EVENT_CLOSED event. This would avoid the resource leak, but testing has shown this is also very prone to crashes.

See the following commits for extra information:
985375c ("was: drop esp_websocket_client_destroy()") 1f3d298 ("was: drop esp_websocket_client_destroy() completely")

The WAS WebSocket client sometimes fails to reconnect after receiving a
WEBSOCKET_EVENT_CLOSED event due to it not being able to create the
WebSocket client task:

  I (13:15:28.958) WILLOW/WAS: WebSocket closed
  I (13:15:28.959) WILLOW/WAS: initializing WebSocket client (ws://was.dev.willow:8502/ws)
  E (13:15:28.961) websocket_client: Error create websocket task
  E (13:15:28.967) WILLOW/WAS: failed to start WebSocket client: ESP_FAIL

When this happens, Willow stops trying to reconnect to WAS, and the user
has to reset or power-cycle their device. As this is terrible UX, this
should be avoided.

The root cause is not freeing resources of the WebSocket client before
we reinitialize it in init_was. This eventually causes us to run out of
IRAM, causing the WebSocket client task creation to fail.

To fix this, enable WebSocket destroy on exit in was_init, rather than
in was_deinit, so it is always enabled. This is needed to avoid leaking
resources due to not calling esp_websocket_client_destroy before calling
esp_websocket_client_init. We don't want to call
esp_websocket_client_destroy, as that has proven to be very prone to
crashes in the past.

Similarly, we don't want to call esp_websocket_client_start when the WAS
WebSocket client receives a WEBSOCKET_EVENT_CLOSED event. This would
avoid the resource leak, but testing has shown this is also very prone
to crashes.

See the following commits for extra information:
985375c ("was: drop esp_websocket_client_destroy()")
1f3d298 ("was: drop esp_websocket_client_destroy() completely")
@kristiankielhofner kristiankielhofner merged commit 203942d into main Jan 17, 2024
15 checks passed
@kristiankielhofner kristiankielhofner deleted the fix/was_close_ws branch January 17, 2024 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants