
🐞 [Bug]: the RMB peer losing connections periodically #923

Closed
muhamadazmy opened this issue Mar 7, 2024 · 3 comments
Labels
type_bug Something isn't working

@muhamadazmy
Member

What happened?

The RMB peer in the SDK loses its connection periodically. It reconnects, but we need to figure out what causes these disconnects (possibly a misconfigured connection timeout or something similar).

Which network(s) did you face the problem on?

Dev

Twin ID/s

No response

Node ID/s

No response

Farm ID/s

No response

Contract ID/s

No response

Relevant log output

debug failed to read message error="read tcp 192.168.123.44:44778->176.9.62.68:443: use of closed network connection"
muhamadazmy added the type_bug label Mar 7, 2024
Eslam-Nawara added this to the 1.0.0 milestone Mar 7, 2024
@Eslam-Nawara
Contributor

After investigating, I found that RMB sends a ping over the connection every pingInterval = 20 seconds and waits up to pongWait = 40 seconds for a pong in response.

If no pong arrives within that window, it returns a "connection stalling" error, which cancels the local context and closes the websocket connection. The reader function then fails to read the next message with the error reported in this issue, and the connection is restarted:

failed to read message error="read tcp 192.168.123.44:44778->176.9.62.68:443: use of closed network connection"

This is the scenario I hit while investigating the error message, but I'm still not sure why the pong is not sent back in this case, so I'm still investigating.

@Eslam-Nawara
Contributor

Work Completed:

  • The problem: if a process receives a steady stream of requests, so that a new message is always arriving for 40 seconds, it never gets the chance to send a ping in the first place. It still expects a pong within every pongWait, however, so it reports a failure and restarts the connection.
  • I updated the process to refresh the latest pong time whenever a new message is received, since receiving any message shows that the connection is not stalling. With this change, the connection no longer restarts periodically.

@Omarabdul3ziz
Copy link
Contributor

Verified by sending multiple requests to a node over the same connection: the connection stayed open until all the requests had finished, which took about 2.5 minutes.

Projects
Status: Done

4 participants