await producer.connect() hangs indefinitely without retrying on receiving an ECONNRESET from kafka #909
But now you might call resolve() twice: once immediately, with requests still pending, and then later again when the request queue is actually empty (which is what the …). Given that you debugged this, though, it would be nice to see the full stack trace here, and I would really be interested in seeing which requests are in-flight/pending.

From reading through the connection logic again, an error in connecting (or a timeout later) should lead to a call of …:

```diff
diff --git a/src/network/connection.js b/src/network/connection.js
index eed871e..c1394ee 100644
--- a/src/network/connection.js
+++ b/src/network/connection.js
@@ -155,8 +155,8 @@ module.exports = class Connection {
         })
         this.logError(error.message, { stack: e.stack })
-        await this.disconnect()
         this.rejectRequests(error)
+        await this.disconnect()
         reject(error)
       }
@@ -167,8 +167,8 @@ module.exports = class Connection {
         })
         this.logError(error.message)
-        await this.disconnect()
         this.rejectRequests(error)
+        await this.disconnect()
         reject(error)
       }
```

(Untested; I'm pretty sure we also need to touch the connection state and set it to "disconnecting" to avoid new requests getting queued.)
I was not suggesting my hack was a fix :) I was just testing things out to understand how it works. In regards to in-flight requests, I had only called the … In regards to a full stack trace, I had the logs on debug but only got what I showed you. I can reproduce it again and put some extra logging in your package if you want?
I will look at this now; the solution is not to immediately resolve, since this would release the client before all connections are depleted.
@hughlivingstone I think the client is working as expected; it is retrying up to the number configured on the cluster, by default 5 times, and then giving up. I used the server script you provided. Here I set the logger to debug mode to see what's happening: you can see that the client retried the connection 5 times, gradually increasing the retry time between attempts.
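For reference, the retry behavior described above is configurable on the client. A minimal sketch, assuming kafkajs's documented retry options (the `clientId` and broker address are placeholders, and the values shown are the documented defaults):

```javascript
const { Kafka, logLevel } = require('kafkajs')

// Sketch of configuring the connection retry behavior discussed above.
// With these (default) settings the client retries 5 times with an
// exponential, slightly randomized backoff before giving up.
const kafka = new Kafka({
  clientId: 'repro-client',     // hypothetical name
  brokers: ['localhost:9092'],
  logLevel: logLevel.DEBUG,     // show each retry attempt in the logs
  retry: {
    retries: 5,                 // give up after 5 attempts (default)
    initialRetryTime: 300,      // ms before the first retry (default)
    factor: 0.2,                // randomization factor (default)
    multiplier: 2,              // exponential backoff multiplier (default)
  },
})
```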
@hughlivingstone I can see that this is an integration test. Any chance the environment is mocking something or changing the behavior of something we use underneath? I've seen some libraries modify the net code or even mock event emitters, etc.
I believe it could be related to #918 |
Given that #944 ended up looking quite like my comment in #909 (comment) … maybe that issue can now also be closed?
Describe the bug
I have found that my producer is not retrying a broker connection after receiving an ECONNRESET error from kafka. The scenario is that I am trying to connect a producer, but the kafka server resets the TCP connection (ECONNRESET).
This causes the `await producer.connect()` call to hang indefinitely. I tracked the hang to the following place in kafkajs:

kafkajs/src/network/requestQueue/index.js, lines 222 to 237 in 02bc8a3

The promise's `resolve()` is wrapped inside a function within the emitter, so it is not being executed; I think it would execute if there were an event listener listening. I changed the code to resolve like the following, and the producer kept trying to reconnect as expected.
In 1.12.0 the producer kept trying to connect as I expected.
To Reproduce
This happened in my CI environment, where kafka responded with the ECONNRESET error. However, I was able to reproduce a similar situation by starting a node server that mimics a kafka broker resetting the connection, and then trying to connect the producer. The code below causes the server to reset the TCP connection.
Expected behavior
The producer should continue trying to reconnect, as it did in 1.12.0, instead of hanging on the `await producer.connect()` call.
Observed behavior
The producer stopped trying to reconnect as soon as it received the ECONNRESET error from kafka. Prior to that it was trying to reconnect.
Environment: