Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reusing too old connection (ECONNRESET) #172

Closed
holgerpieta opened this issue Apr 25, 2021 · 13 comments
Closed

Reusing too old connection (ECONNRESET) #172

holgerpieta opened this issue Apr 25, 2021 · 13 comments

Comments

@holgerpieta
Copy link

A while ago I noticed that messages were randomly not send out to Telegram, the error message is ECONNRESET.

I traced the problem with tcpdump and found the following sequence of events:

  1. Message shall be send.
  2. TCP connection is established.
  3. Message is send.
  4. Connection is not closed (probably to be reused).
  5. Nothing happens for minutes to hours.
  6. Next message shall be send.
  7. Bot tries to reuse connection from step 2.
  8. Telegram server thinks WTF? and resets the connection. Result is ECONNRESET.
  9. Sending of message failed.

So I guess the problem is that telegrambot tries to reuse a connection that may be hours old and which has been forgotten by the server for ages.
Either this needs to be changed to always (or at least if the old connection is too old) making a new connection or alternatively, sending must be retried at least one time after ECONNRESET.

@windkh
Copy link
Owner

windkh commented Apr 25, 2021

Hm that sounds reaaonable. Do you think that I should reconnect from time to time?

@holgerpieta
Copy link
Author

I just had a look at the code and realized that connection management is not actually done here but in the telegram-bot-api you're using. So we shouldn't actually try to solve it here, but instead I'll open an issue at the upstream repository.

If you want to work around it (which we should avoid as long as there is a chance it's fixed upstream), you may remember when the last message was sent and if it's been too long when the next message comes in, do a reconnect before sending.

@windkh
Copy link
Owner

windkh commented Apr 25, 2021

I fully agree. Do you have any idea how long „too long“ could be? Is it minutes, hours or days?

@holgerpieta
Copy link
Author

That's hard to say, because it depends on how long the Telegram servers will remember the connection. The default idle timeout for TCP connections is 300 seconds.

@windkh
Copy link
Owner

windkh commented Apr 25, 2021

Yes but 300seconds is way too low. Btw did you check if the header of a somgle message contains some sort of token or authentication id? Maybe that one needs to be renewed?
I will search in the API documentation if there is any hint about connnection lifetime

@holgerpieta
Copy link
Author

The payload is encrypted (TLS), so I do not know if there is some internal Telegram connection ID.
However there is the TCP connection sequencing which identifies which package belongs to which active connection.
From what I can see, there are up to three connection active at the same time. Probably one polling connection for each of the two bots I'm running and one if messages are being sent to the bots.
Usually something will happen a couple of times per second, matching the default polling frequency of 300 ms.

But: Whenever the connection reset happens, all three connections have stopped transmitting data for something like 10 minutes or longer, sometimes hours. So I guess something somewhere went wrong and the polling stopped. Then, after a while, the Telegram server closes the connection, because they idled for too long.
If the telegram-bot-api then tries to send a message using the hours old connection, the server doesn't know where to put that and sends the reset.
But that means that the bot wouldn't be receiving messages for a while, anyway.

So, if you want to improve things on your side, you need to find some way to test if polling is still working and if it didn't work for too long (e.g. 300 s), trigger a reconnect. I guess the sending message problem would be fixed by that then, too.

@holgerpieta
Copy link
Author

There is actually something about taking care about polling errors:
https://github.com/yagop/node-telegram-bot-api/blob/master/doc/usage.md#polling-errors

But I think you're doing that already, from what I can see. But I cannot see the appropriate warning in the logs, so I guess something it's quite right there.

@windkh
Copy link
Owner

windkh commented Apr 25, 2021

If the connection is reset, what is written to the log? Can you paste it here?

@holgerpieta
Copy link
Author

Yes I can. Unfortunately the answer is: ""
There is nothing in the logs, both when the polling stops as well as when sending a message fails, even though I have the Verbose Logging box ticked.
Though I have set Send errors to second output and that's the result when sending fails:

{ payload:
   { chatId: 'REDACTED',
     type: 'message',
     content: 'REDACTED',
     options:
      { chat_id: 'REDACTED',
        text: 'REDACTED' } },
  _msgid: 'f62df514.3b79e8',
  error:
   'Caught exception in sender node:\r\nError: EFATAL: Error: read ECONNRESET\r\nwhen processing message: \r\n{"payload":{"chatId":"REDACTED","type":"message","content":"REDACTED","options":{"chat_id":"REDACTED","text":"REDACTED"}},"_msgid":"f62df514.3b79e8"}'}

Not particularly helpful either.

@windkh
Copy link
Owner

windkh commented Apr 26, 2021

Ah ok, then I guess it would be best to intercept that error in one of the sender nodes ... the only problem would be that the message is lost as we detected the problem after the attempt was made. Maybe it would be best to find out if the connection still works before sending something...

@holgerpieta
Copy link
Author

I think the best approach would be to find out why we don't notice the lost connection seconds after if happens and instead just sit there disconnected until randomly a message arrives to be send. Probably if we reconnect when that happens, the problems of sending messages will go away by itself.
Looking at the callback function for polling errors, I think there actually are some bugs in there (e.g. I think error.message doesn't exist). but even if this is the case, we should notice that the function has been called either by having some warnings in the logs or some errors if we run into a bug. But there is nothing. So in the moment it's my guess that the telegram-bot-api somehow misses the lost connection and never actually triggers the polling_error message.
Which means that we probably couldn't do much here and better try to get the API fixed.

@windkh
Copy link
Owner

windkh commented Jul 31, 2021

Version 10.0.3 will recreate the polling class. I hope that this helps to reestablish the connection somehow. Please test it and let me know. You may reopen the issue if the problem still exists.

@holgerpieta
Copy link
Author

Hello again.
I cannot reopen the issue because you (and not me) closed it.
Anyway, unfortunately 10.0.4 doesn't fix it.
I think the underlying problem is still that telegram-bot-api doesn't emit the polling_error event in the first place. So we can try to fix as much as we want in the handler of that event, but it still won't work because it's never going to be called.
We have to get telegram-bot-api to fix the problem on their side.
Alternatively, here on this side the only thing we can do is test the connection (frequently) and reconnect if it doesn't work. I think we should be able to find a message we can send to the bot that has no visible impact on the user. Like getting the member list of the chat or something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants