
[BUG] too many connections #125

Closed
rudy2steiner opened this issue Jan 10, 2019 · 10 comments


@rudy2steiner

Describe the bug

I tried to run a benchmark test of Confluo as a pub/sub system, with the default conf:
100 producers, plenty of memory (more than 100 GB), a single partition, and a 5-minute duration.
The server crashed after two minutes, and I noticed two strange things, as follows:

  1. ERROR: signal 11, followed by ERROR: Could not start server listening on 0.0.0.0:60088: pthread_create failed:

ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
2019-01-10 20:00:01 ERROR: Could not start server listening on 0.0.0.0:60088: pthread_create failed
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11

  2. Too many connections, far more than 100 from a single IP:

PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20957>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20958>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20959>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20960>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20961>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20962>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20963>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20964>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20965>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20966>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20967>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20968>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20969>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20970>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20971>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20972>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20973>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20974>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20975>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20976>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
--------------------
[root@A02-R05-I143-108-BM9PLP2 confluo]# cat log/confluo.stderr |grep '10.190.90.32'|wc -l
2607

How to reproduce the bug?

https://github.com/rudy2steiner/confluo/blob/benchmark/javaclient/src/main/java/confluo/streaming/ConfluoProducer.java

Expected behavior

I thought the RPC client holds a single long-lived connection to the Confluo server, reused by the same producer until it finishes, so the number of connections should be equal to (or close to) the number of producers.
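
For illustration, here is a minimal sketch in plain Java sockets (not the actual Confluo RPC API) contrasting the expected one-connection-per-producer pattern with a per-send pattern that would explain thousands of connections from a single IP:

```java
// Minimal sketch in plain Java sockets (not the actual Confluo RPC API),
// contrasting the expected one-connection-per-producer pattern with the
// per-send pattern that would explain thousands of connections from one IP.
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class ConnectionReuseSketch {

    // Anti-pattern: opening a fresh socket for every message leaks
    // thousands of short-lived connections, as seen in the log above.
    static void sendWithNewConnection(String host, int port, byte[] msg) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write(msg);
        }
    }

    // Expected pattern: one long-lived socket per producer, reused for
    // every message, so total connections == number of producers.
    static void runProducer(String host, int port, byte[] msg, int messages) throws IOException {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            for (int i = 0; i < messages; i++) {
                out.write(msg);
            }
            out.flush();
        }
    }
}
```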

Platform Details

I run the Confluo Java client on macOS.

Can anyone help me?

rudy2steiner changed the title from "[BUG] two many connection s" to "[BUG] too many connections" on Jan 10, 2019
@anuragkh
Contributor

Hi @rudy2steiner. Thanks for your interest in Confluo!

Can you provide more concrete steps to reproduce the bug? E.g., what exact steps did you take to run Confluo and your client program to trigger the issue you outlined above?

@rudy2steiner
Author

Sure, I will provide concrete steps to reproduce the bug @anuragkh

@anuragkh
Contributor

@rudy2steiner Checking back on this.

@rudy2steiner
Author

OK, I will finish this in the next few days.

@rudy2steiner
Author

rudy2steiner commented Jan 17, 2019

The "too many connections" issue is not a bug; it was caused by an incorrect test.
In recent weeks I have been trying to reproduce the experiment mentioned in https://ucbrise.github.io/confluo/pub_sub/, with Confluo as an in-memory pub/sub system.
A Java client, including a ConfluoProducer (https://github.com/rudy2steiner/confluo/blob/benchmark/javaclient/src/main/java/confluo/streaming/ConfluoProducer.java) and a Consumer, has been implemented; it is easy to configure the concurrency, enable batching, etc.

The settings used in my experiment are shown below; it ran on a 16-core (Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz), 32-thread, 256 GB server (a sketch of the driver loop follows the list):
  1. concurrency: 1024; make sure the number of producers does not exceed the concurrency, otherwise an exception will be thrown
  2. 1 KB message body
  3. default settings for everything else
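
For illustration, a hypothetical sketch of the driver loop's shape (the real implementation is the ConfluoProducer linked above; `send` is a placeholder, not a real API):

```java
// Hypothetical sketch of the benchmark driver shape (the real implementation
// is the ConfluoProducer linked above): N producer threads each append 1 KB
// messages for a fixed duration, and qps = total messages / elapsed seconds.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class QpsBenchmarkSketch {
    public static void main(String[] args) throws InterruptedException {
        final int producers = 28;            // the best-performing concurrency below
        final long durationMs = 180_000L;    // 3-minute produce duration
        final byte[] body = new byte[1024];  // 1 KB message body
        final LongAdder total = new LongAdder();

        ExecutorService pool = Executors.newFixedThreadPool(producers);
        final long deadline = System.currentTimeMillis() + durationMs;
        for (int i = 0; i < producers; i++) {
            pool.execute(() -> {
                while (System.currentTimeMillis() < deadline) {
                    send(body);              // stand-in for one append on the producer's connection
                    total.increment();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMs + 10_000L, TimeUnit.MILLISECONDS);
        System.out.printf("%d producer finished, total msg:%d, elapsed:%d ms, qps:%d/s%n",
                producers, total.sum(), durationMs, total.sum() * 1000 / durationMs);
    }

    static void send(byte[] msg) { /* placeholder for the real append call */ }
}
```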
I got some preliminary results (produce duration: 3 minutes; qps is total msg / elapsed seconds, e.g. 5781958 / 180 ≈ 32121/s), as follows:
  • 1 producer finished, total msg:5781958, elapsed:180000 ms, qps:32121/s
  • 16 producers finished, total msg:52561580, elapsed:180000 ms, qps:292008/s
  • 28 producers finished, total msg:86468192, elapsed:180000 ms, qps:480378/s
    28 producers finished, total msg:25054401, elapsed:180000 ms, qps:139191/s (continuing to produce in the next 3 minutes)
  • 32 producers finished, total msg:51053605, elapsed:180000 ms, qps:283631/s
I notice that 28 producers (concurrency) achieve the best QPS; the top command output is shown below. Since we had already produced 86468192 * 1 KB ≈ 86 GB of messages into memory, performance dropped to 139191/s (?) when we continued to produce (a new produce task, same as before):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26571 root 20 0 96.744g 0.091t 2648 S 1940 37.0 51:27.72 confluod
26584 root 20 0 6701316 391428 10912 S 872.9 0.1 33:21.51 java

If we continue to increase the number of producers up to 32, the QPS only equals that of 16 producers, because the CPU is overloaded.

We can run more test cases based on the benchmark branch (https://github.com/rudy2steiner/confluo/blob/benchmark).

@rudy2steiner
Author

@anuragkh hi there

@anuragkh
Contributor

Hey @rudy2steiner, thanks for sharing your findings. I suspect you will be able to achieve higher qps using batching.

Would you be interested in adding your Producer/Consumer implementation to Confluo by submitting a PR? You would need to clean up the implementation and add documentation, but I would be happy to review your code if you submit a PR.
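
For illustration, a minimal sketch of client-side batching (hypothetical names; the actual Confluo batch API may differ):

```java
// Minimal sketch of client-side batching (hypothetical names; the actual
// Confluo batch API may differ): buffer records and flush them in one bulk
// send, amortizing the per-RPC round trip that limits single-record qps.
import java.util.ArrayList;
import java.util.List;

public class BatchingProducerSketch {
    private final List<byte[]> buffer = new ArrayList<>();
    private final int batchSize;

    public BatchingProducerSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Instead of one RPC per record, accumulate until the batch is full.
    public void append(byte[] record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        sendBatch(buffer);  // one bulk RPC instead of buffer.size() round trips
        buffer.clear();
    }

    private void sendBatch(List<byte[]> records) { /* placeholder bulk send */ }
}
```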

anuragkh reopened this Jan 19, 2019
@rudy2steiner
Author

It would be my pleasure to submit a PR; I will try.

@anuragkh
Contributor

Hi @rudy2steiner, things should improve with #136. Let me know if you are able to confirm this!

@anuragkh
Contributor

Closing this due to lack of activity. #136 improves how Confluo handles multiple client connections, and should resolve this issue.
