I have Python workers in Docker Image A (kafka-python). Four workers connect to Docker Image B (kafka-server), which runs the Kafka broker. If Docker Image B (kafka-server) goes down, the workers in Docker Image A loop indefinitely on DNS lookups until Docker Image B (kafka-server) comes back online.
Here's part of the log:
2023-02-17 15:48:32,489 [WARNING] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/client_async.py:331 - Node 1 connection failed -- refreshing metadata
2023-02-17 15:48:33,430 [WARNING] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:1527 - DNS lookup failed for kafka-server:19092, exception was [Errno -2] Name or service not known. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
2023-02-17 15:48:33,430 [ERROR] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:315 - DNS lookup failed for kafka-server:19092 (AddressFamily.AF_UNSPEC)
2023-02-17 15:48:34,323 [WARNING] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:1527 - DNS lookup failed for kafka-server:19092, exception was [Errno -2] Name or service not known. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
2023-02-17 15:48:34,323 [ERROR] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:315 - DNS lookup failed for kafka-server:19092 (AddressFamily.AF_UNSPEC)
2023-02-17 15:48:35,110 [WARNING] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:1527 - DNS lookup failed for kafka-server:19092, exception was [Errno -2] Name or service not known. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
2023-02-17 15:48:35,110 [ERROR] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:315 - DNS lookup failed for kafka-server:19092 (AddressFamily.AF_UNSPEC)
2023-02-17 15:48:35,955 [WARNING] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:1527 - DNS lookup failed for kafka-server:19092, exception was [Errno -2] Name or service not known. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
2023-02-17 15:48:35,955 [ERROR] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:315 - DNS lookup failed for kafka-server:19092 (AddressFamily.AF_UNSPEC)
2023-02-17 15:48:36,795 [WARNING] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:1527 - DNS lookup failed for kafka-server:19092, exception was [Errno -2] Name or service not known. Is your advertised.listeners (called advertised.host.name before Kafka 9) correct and resolvable?
2023-02-17 15:48:36,795 [ERROR] elasticsearch_worker_0 /usr/local/lib/python3.8/site-packages/kafka/conn.py:315 - DNS lookup failed for kafka-server:19092 (AddressFamily.AF_UNSPEC)
When Docker Image B (kafka-server) comes back online, the workers reconnect. But because of timeouts, only one worker connects, which causes kafka-server to start the topic with 1 partition instead of the expected 4.
It would be nice if the workers actually gave up trying to connect and returned execution to the main loop, so I could handle the event when Docker Image B (kafka-server) goes offline.
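To illustrate the behavior I'm asking for, here is a minimal application-side sketch that bounds the DNS wait and hands control back to the caller. It uses only the Python standard library. `wait_for_broker_dns` and its parameters are hypothetical names of mine, not part of kafka-python, and the backoff values are arbitrary assumptions.

```python
import socket
import time


def wait_for_broker_dns(host, port, max_attempts=5, base_delay=0.5,
                        max_delay=8.0, resolve=socket.getaddrinfo,
                        sleep=time.sleep):
    """Hypothetical helper (not a kafka-python API).

    Try to resolve host:port up to max_attempts times, doubling the
    delay between attempts (capped at max_delay). Return True as soon
    as resolution succeeds, or False once the attempts are exhausted,
    so the caller's main loop regains control.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            resolve(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
            return True
        except socket.gaierror:
            # Same failure class as "[Errno -2] Name or service not known".
            if attempt == max_attempts:
                break
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

A worker's main loop could call `wait_for_broker_dns("kafka-server", 19092)` before creating or re-creating its consumer and treat a `False` result as the "broker offline" event. This only guards the application side; it does not change how kafka-python retries internally once a client exists.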
What I've been seeing is that when kafka-server comes back online, 1 worker reconnects, 2 connect but are not assigned a partition, and 1 gets a wakeup socket error: https://github.com/dpkp/kafka-python/blob/4d598055dab7da99e41bfcceffa8462b32931cdd/kafka/client_async.py#L937
Also, a side note: this line should return a value but is just a bare return. https://github.com/dpkp/kafka-python/blob/4d598055dab7da99e41bfcceffa8462b32931cdd/kafka/conn.py#L323
I'm sure I'm missing some details, but at least this will get a conversation started about what I'm observing.