
[BUG] too many connections #125

Closed
rudy2steiner opened this issue Jan 10, 2019 · 10 comments


@rudy2steiner

Describe the bug

I tried to run a benchmark test of Confluo as a pub/sub system, with the default conf:
100 producers, plenty of memory (more than 100 GB), a single partition, and a 5-minute duration.
The server crashed after two minutes, and I noticed two strange things, as follows:

  1. ERROR: signal 11, followed by ERROR: Could not start server listening on 0.0.0.0:60088: pthread_create failed:

ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
2019-01-10 20:00:01 ERROR: Could not start server listening on 0.0.0.0:60088: pthread_create failed
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11
ERROR: signal 11

  2. Too many connections, far more than 100 from a single IP:

PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20957>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20958>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20959>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20960>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20961>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20962>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20963>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20964>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20965>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20966>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20967>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20968>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20969>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20970>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20971>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20972>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20973>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20974>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20975>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
SocketInfo: <Host: 10.190.90.32 Port: 20976>
PeerHost: 10.190.90.32
PeerAddress: 10.190.90.32
--------------------
[root@A02-R05-I143-108-BM9PLP2 confluo]# cat log/confluo.stderr |grep '10.190.90.32'|wc -l
2607

How to reproduce the bug?

https://github.com/rudy2steiner/confluo/blob/benchmark/javaclient/src/main/java/confluo/streaming/ConfluoProducer.java

Expected behavior

I thought the RPC client holds a single long-lived connection to the Confluo server, reused by the same producer until it finishes, so the number of connections should be equal to (or close to) the number of producers.
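
For illustration, here is a minimal sketch in plain Java sockets (not the actual Confluo RPC API) contrasting the expected one-connection-per-producer pattern with a per-send pattern that would explain thousands of connections from a single IP:

```java
// Minimal sketch in plain Java sockets (not the actual Confluo RPC API),
// contrasting the expected one-connection-per-producer pattern with the
// per-send pattern that would explain thousands of connections from one IP.
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class ConnectionReuseSketch {

    // Anti-pattern: opening a fresh socket for every message leaks
    // thousands of short-lived connections, as seen in the log above.
    static void sendWithNewConnection(String host, int port, byte[] msg) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write(msg);
        }
    }

    // Expected pattern: one long-lived socket per producer, reused for
    // every message, so total connections == number of producers.
    static void runProducer(String host, int port, byte[] msg, int messages) throws IOException {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            for (int i = 0; i < messages; i++) {
                out.write(msg);
            }
            out.flush();
        }
    }
}
```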

Platform Details

I run the Confluo Java client on macOS.

Can anyone help me?

rudy2steiner changed the title from "[BUG] two many connection s" to "[BUG] too many connections" on Jan 10, 2019
@anuragkh
Contributor

Hi @rudy2steiner. Thanks for your interest in Confluo!

Can you provide more concrete steps to reproduce the bug? E.g., what exact steps did you take to run Confluo and your client program to trigger the issue you outlined above?

@rudy2steiner
Author

Sure, I will provide concrete steps to reproduce the bug @anuragkh

@anuragkh
Contributor

@rudy2steiner Checking back on this.

@rudy2steiner
Author

OK, I will finish this in the next few days.

@rudy2steiner
Author

rudy2steiner commented Jan 17, 2019

The "too many connections" issue is not a bug; it was caused by an incorrect test.
In recent weeks I have been trying to reproduce the experiment mentioned in https://ucbrise.github.io/confluo/pub_sub/, with Confluo as an in-memory pub/sub system.
A Java client, including a ConfluoProducer (https://github.com/rudy2steiner/confluo/blob/benchmark/javaclient/src/main/java/confluo/streaming/ConfluoProducer.java) and a Consumer, has been implemented; it is easy to configure the concurrency, enable batching, etc.

The settings used in my experiment are shown below; it ran on a 16-core (Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz), 32-thread, 256 GB server (a sketch of the driver loop follows the list):
  1. concurrency: 1024; make sure the number of producers does not exceed the concurrency, otherwise an exception will be thrown
  2. 1 KB message body
  3. default settings for everything else
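
For illustration, a hypothetical sketch of the driver loop's shape (the real implementation is the ConfluoProducer linked above; `send` is a placeholder, not a real API):

```java
// Hypothetical sketch of the benchmark driver shape (the real implementation
// is the ConfluoProducer linked above): N producer threads each append 1 KB
// messages for a fixed duration, and qps = total messages / elapsed seconds.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class QpsBenchmarkSketch {
    public static void main(String[] args) throws InterruptedException {
        final int producers = 28;            // the best-performing concurrency below
        final long durationMs = 180_000L;    // 3-minute produce duration
        final byte[] body = new byte[1024];  // 1 KB message body
        final LongAdder total = new LongAdder();

        ExecutorService pool = Executors.newFixedThreadPool(producers);
        final long deadline = System.currentTimeMillis() + durationMs;
        for (int i = 0; i < producers; i++) {
            pool.execute(() -> {
                while (System.currentTimeMillis() < deadline) {
                    send(body);              // stand-in for one append on the producer's connection
                    total.increment();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMs + 10_000L, TimeUnit.MILLISECONDS);
        System.out.printf("%d producer finished, total msg:%d, elapsed:%d ms, qps:%d/s%n",
                producers, total.sum(), durationMs, total.sum() * 1000 / durationMs);
    }

    static void send(byte[] msg) { /* placeholder for the real append call */ }
}
```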
I got some preliminary results (produce duration: 3 minutes; qps is total msg / elapsed seconds, e.g. 5781958 / 180 ≈ 32121/s), as follows:
  • 1 producer finished, total msg:5781958, elapsed:180000 ms, qps:32121/s
  • 16 producers finished, total msg:52561580, elapsed:180000 ms, qps:292008/s
  • 28 producers finished, total msg:86468192, elapsed:180000 ms, qps:480378/s
    28 producers finished, total msg:25054401, elapsed:180000 ms, qps:139191/s (continuing to produce in the next 3 minutes)
  • 32 producers finished, total msg:51053605, elapsed:180000 ms, qps:283631/s
I notice that 28 producers (concurrency) achieve the best QPS; the top command output is shown below. Since we had already produced 86468192 * 1 KB ≈ 86 GB of messages into memory, performance dropped to 139191/s (?) when we continued to produce (a new produce task, same as before):

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26571 root 20 0 96.744g 0.091t 2648 S 1940 37.0 51:27.72 confluod
26584 root 20 0 6701316 391428 10912 S 872.9 0.1 33:21.51 java

If we continue to increase the number of producers up to 32, the QPS only equals that of 16 producers, because the CPU is overloaded.

We can run more test cases based on the benchmark branch (https://github.com/rudy2steiner/confluo/blob/benchmark).

@rudy2steiner
Author

@anuragkh hi there

@anuragkh
Contributor

Hey @rudy2steiner, thanks for sharing your findings. I suspect you will be able to achieve higher qps using batching.

Would you be interested in adding your Producer/Consumer implementation to Confluo by submitting a PR? You would need to clean up the implementation and add documentation, but I would be happy to review your code if you submit a PR.
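
For illustration, a minimal sketch of client-side batching (hypothetical names; the actual Confluo batch API may differ):

```java
// Minimal sketch of client-side batching (hypothetical names; the actual
// Confluo batch API may differ): buffer records and flush them in one bulk
// send, amortizing the per-RPC round trip that limits single-record qps.
import java.util.ArrayList;
import java.util.List;

public class BatchingProducerSketch {
    private final List<byte[]> buffer = new ArrayList<>();
    private final int batchSize;

    public BatchingProducerSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Instead of one RPC per record, accumulate until the batch is full.
    public void append(byte[] record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        sendBatch(buffer);  // one bulk RPC instead of buffer.size() round trips
        buffer.clear();
    }

    private void sendBatch(List<byte[]> records) { /* placeholder bulk send */ }
}
```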

anuragkh reopened this Jan 19, 2019
@rudy2steiner
Author

It would be my pleasure to submit a PR; I will try.

@anuragkh
Contributor

Hi @rudy2steiner, things should improve with #136. Let me know if you are able to confirm this!

@anuragkh
Contributor

Closing this due to lack of activity. #136 improves how Confluo handles multiple client connections, and should resolve this issue.
