
System Hangs #36

Closed
sm2017 opened this issue Jan 4, 2018 · 9 comments

Comments

@sm2017

sm2017 commented Jan 4, 2018

I have 8 machines, and on each machine I run 10 gateway processes and 10 business worker processes.

The load is heavy, and I have a bad problem:

On each machine, CPU usage is about the same across all gateway processes. When the gateways on a single machine reach 100% CPU, the whole system (all users on all servers) hangs and stops working, even though the gateways and business workers on the other machines are at only 10-30% CPU.

Currently I'm using stream_select. My guess is that when a business worker sends a message to a gateway that is at 100% CPU, PHP blocks on that line and only moves past it once the timeout expires. Is that right?

Where is the problem, and how can I solve it?

@walkor
Owner

walkor commented Jan 8, 2018

You can use the command strace -ttp pid to find out what the problem is.

@sm2017
Author

sm2017 commented Jan 8, 2018

@walkor Under heavy load, how can strace be usable?

@sm2017
Author

sm2017 commented Jan 8, 2018

In a business worker at 100% CPU, strace -ttp pidOfBW logs the following messages and then stops logging until CPU usage goes down:

09:07:53.573285 mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5736e00000
09:07:53.573466 madvise(0x7f5736e00000, 2097152, MADV_HUGEPAGE) = 0
09:08:00.839684 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
09:08:00.839772 rt_sigreturn({mask=[]}) = 0

When CPU usage is above 90%, I see these messages logged at a very high rate:

09:08:43.833425 poll([{fd=213, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000) = 1 ([{fd=213, revents=POLLOUT}])
09:08:43.834045 getsockopt(213, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
09:08:43.834220 fcntl(213, F_SETFL, O_RDWR) = 0
09:08:43.834330 sendto(213, "\0\0\0m\312\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0{\"se"..., 142, 0, NULL, 0) = 142
09:08:43.834512 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 214
09:08:43.834594 fcntl(214, F_GETFL)     = 0x2 (flags O_RDWR)
09:08:43.834688 fcntl(214, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:08:43.834812 connect(214, {sa_family=AF_INET, sin_port=htons(2305), sin_addr=inet_addr("10.10.10.11")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:08:43.835136 poll([{fd=214, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000) = 1 ([{fd=214, revents=POLLOUT}])
09:08:43.835422 getsockopt(214, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
09:08:43.835494 fcntl(214, F_SETFL, O_RDWR) = 0
09:08:43.835541 sendto(214, "\0\0\0m\312\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0{\"se"..., 142, 0, NULL, 0) = 142
09:08:43.835620 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 215
09:08:43.835683 fcntl(215, F_GETFL)     = 0x2 (flags O_RDWR)
09:08:43.835855 fcntl(215, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:08:43.835956 connect(215, {sa_family=AF_INET, sin_port=htons(2306), sin_addr=inet_addr("10.10.10.11")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:08:43.836049 poll([{fd=215, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000) = 1 ([{fd=215, revents=POLLOUT}])
09:08:43.836344 getsockopt(215, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
09:08:43.836415 fcntl(215, F_SETFL, O_RDWR) = 0
09:08:43.836482 sendto(215, "\0\0\0m\312\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0{\"se"..., 142, 0, NULL, 0) = 142
09:08:43.836586 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 216
09:08:43.836794 fcntl(216, F_GETFL)     = 0x2 (flags O_RDWR)
09:08:43.836851 fcntl(216, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:08:43.836917 connect(216, {sa_family=AF_INET, sin_port=htons(2307), sin_addr=inet_addr("10.10.10.11")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:08:43.837016 poll([{fd=216, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000) = 1 ([{fd=216, revents=POLLOUT}])
09:08:43.837564 getsockopt(216, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
09:08:43.837621 fcntl(216, F_SETFL, O_RDWR) = 0
09:08:43.837714 sendto(216, "\0\0\0m\312\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0{\"se"..., 142, 0, NULL, 0) = 142
09:08:43.837888 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 217
09:08:43.837968 fcntl(217, F_GETFL)     = 0x2 (flags O_RDWR)
09:08:43.838023 fcntl(217, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:08:43.838066 connect(217, {sa_family=AF_INET, sin_port=htons(2309), sin_addr=inet_addr("10.10.10.11")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:08:43.838181 poll([{fd=217, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000) = 1 ([{fd=217, revents=POLLOUT}])
09:08:43.838547 getsockopt(217, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
09:08:43.838615 fcntl(217, F_SETFL, O_RDWR) = 0
09:08:43.838679 sendto(217, "\0\0\0m\312\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0{\"se"..., 142, 0, NULL, 0) = 142
09:08:43.838802 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 218
09:08:43.838886 fcntl(218, F_GETFL)     = 0x2 (flags O_RDWR)
09:08:43.838950 fcntl(218, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:08:43.838998 connect(218, {sa_family=AF_INET, sin_port=htons(2308), sin_addr=inet_addr("10.10.10.11")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:08:43.839080 poll([{fd=218, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000) = 1 ([{fd=218, revents=POLLOUT}])
09:08:43.839395 getsockopt(218, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
09:08:43.839497 fcntl(218, F_SETFL, O_RDWR) = 0
09:08:43.839589 sendto(218, "\0\0\0m\312\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0{\"se"..., 142, 0, NULL, 0) = 142
09:08:43.839759 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 219
09:08:43.839838 fcntl(219, F_GETFL)     = 0x2 (flags O_RDWR)
09:08:43.839933 fcntl(219, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:08:43.839997 connect(219, {sa_family=AF_INET, sin_port=htons(2310), sin_addr=inet_addr("10.10.10.11")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:08:43.840060 poll([{fd=219, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000) = 1 ([{fd=219, revents=POLLOUT}])
09:08:43.840359 getsockopt(219, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
09:08:43.840425 fcntl(219, F_SETFL, O_RDWR) = 0
09:08:43.840482 sendto(219, "\0\0\0m\312\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0{\"se"..., 142, 0, NULL, 0) = 142
09:08:43.840573 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 220
09:08:43.840640 fcntl(220, F_GETFL)     = 0x2 (flags O_RDWR)
09:08:43.840706 fcntl(220, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:08:43.840800 connect(220, {sa_family=AF_INET, sin_port=htons(2311), sin_addr=inet_addr("10.10.10.11")}, 16) = -1 EINPROGRESS (Operation now in progress)
09:08:43.840904 poll([{fd=220, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 3000^Cstrace: Process 18682 detached
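
Each group of lines in the second excerpt is one short-lived TCP connection to a gateway port: socket(), switch to O_NONBLOCK, connect() returning EINPROGRESS, poll() with a 3000 ms timeout, getsockopt(SO_ERROR) to check the result, switch back to blocking mode, then a single 142-byte sendto(). The Python sketch below reproduces that sequence so it is easier to read. It is an illustration written for this thread (Linux, since it uses select.poll), not GatewayWorker's PHP code, and connect_and_send is a hypothetical name. It also bears on the timeout question earlier in the thread: in this sketch the timeout only bounds the connect phase, not the send.

```python
import errno
import select
import socket


def connect_and_send(host: str, port: int, payload: bytes, timeout_ms: int = 3000) -> int:
    """Open a short-lived TCP connection, send one frame, close it.

    Follows the syscall order seen in the strace excerpt; the 3000 ms
    default mirrors the timeout passed to poll() there.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)      # socket(AF_INET, SOCK_STREAM, ...)
    try:
        sock.setblocking(False)                                    # fcntl(F_SETFL, O_RDWR|O_NONBLOCK)
        err = sock.connect_ex((host, port))                        # connect() = -1 EINPROGRESS
        if err not in (0, errno.EINPROGRESS):
            raise OSError(err, f"connect to {host}:{port} failed")
        poller = select.poll()
        poller.register(sock, select.POLLIN | select.POLLOUT | select.POLLERR | select.POLLHUP)
        if not poller.poll(timeout_ms):                            # empty result: poll() timed out
            raise TimeoutError(f"connect to {host}:{port} timed out after {timeout_ms} ms")
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)  # getsockopt(SO_ERROR)
        if err:
            raise OSError(err, f"connect to {host}:{port} failed")
        sock.setblocking(True)                                     # fcntl(F_SETFL, O_RDWR): blocking again
        # One send() like the sendto() in the trace. The socket is now blocking
        # with no timeout, so this call is not bounded by timeout_ms.
        return sock.send(payload)
    finally:
        sock.close()
```

Repeated once per gateway port for every interface call, this pattern costs several syscalls and a TCP handshake per gateway, which matches the very high logging rate described above.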

@walkor
Owner

walkor commented Jan 10, 2018

It looks like a large number of interface calls caused the high load, but I can't tell you which interface.
Reducing the call frequency will reduce the load.
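
One way to check this reading, and to make strace usable under heavy load, is to save the trace to a file (for example strace -tt -p pidOfBW -o bw.trace) and count the connect() calls afterwards instead of watching the terminal. The helper below is a hypothetical sketch, not part of strace or GatewayWorker: it tallies IPv4 connect() calls per destination in strace -tt output and returns the time span they cover, which gives a rough connection rate.

```python
import re
from collections import Counter

# Matches strace -tt lines such as:
# 09:08:43.834812 connect(214, {sa_family=AF_INET, sin_port=htons(2305), sin_addr=inet_addr("10.10.10.11")}, 16) = ...
CONNECT_RE = re.compile(
    r'^(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?) connect\(\d+, \{sa_family=AF_INET, '
    r'sin_port=htons\((\d+)\), sin_addr=inet_addr\("([\d.]+)"\)\}'
)


def count_connects(lines):
    """Tally connect() calls per (ip, port).

    Returns (Counter, seconds between the first and last matching connect()).
    Assumes the trace does not cross midnight.
    """
    counts = Counter()
    first = last = None
    for line in lines:
        m = CONNECT_RE.match(line)
        if m is None:
            continue  # poll(), sendto(), fcntl(), ... lines are ignored
        hours, minutes, seconds, port, ip = m.groups()
        ts = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        if first is None:
            first = ts
        last = ts
        counts[(ip, int(port))] += 1
    span = (last - first) if first is not None else 0.0
    return counts, span
```

For example, count_connects(open("bw.trace")) returns how many connections the worker opened to each gateway endpoint; dividing the total by the span (when it is non-zero) approximates connections per second.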

@sm2017
Author

sm2017 commented Jan 10, 2018

What exactly are interface calls?

@walkor
Owner

walkor commented Jan 10, 2018

For example, Gateway::sendToAll, Gateway::getAllClientSession, Gateway::...., etc.
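
To make the cost of such calls concrete: the second strace excerpt shows one business worker contacting gateway ports 2305 to 2311 one after another, at roughly 1 ms per gateway while the worker is near 100% CPU. The back-of-the-envelope model below assumes (an assumption, not something stated in this thread) that every broadcast-style call contacts each of the 8 × 10 = 80 gateway processes once, serially. The function names are hypothetical.

```python
def fanout_connections(machines: int, gateways_per_machine: int) -> int:
    """Connections opened by one call that contacts every gateway process once."""
    return machines * gateways_per_machine


def serial_call_seconds(per_gateway_seconds: list[float]) -> float:
    """Duration of one call that contacts gateways one after another (no overlap)."""
    return sum(per_gateway_seconds)


def max_calls_per_second(per_gateway_seconds: list[float]) -> float:
    """Upper bound on such calls per second for a single worker process."""
    return 1.0 / serial_call_seconds(per_gateway_seconds)


gateways = fanout_connections(8, 10)            # 80 gateway processes in this setup
healthy = [0.001] * gateways                    # ~1 ms per gateway, as in the excerpt
one_slow_machine = [0.001] * 70 + [0.5] * 10    # hypothetical: 10 gateways take 0.5 s each
print(gateways, serial_call_seconds(healthy), max_calls_per_second(healthy))
print(serial_call_seconds(one_slow_machine))
```

Under these assumptions a single call costs about 80 ms, so one worker can make at most about 12 such calls per second, and any gateway that answers slowly adds its full delay to every call. That is consistent with, though it does not prove, the whole cluster slowing down when one machine saturates.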

@sm2017
Author

sm2017 commented Jan 10, 2018

@walkor How can I reduce interface calls? My load is really high.
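
One generic way to call these interfaces less often, offered here as an assumption rather than as the maintainers' advice, is to coalesce many small broadcasts into one call per short interval. In the sketch, send_to_all is a hypothetical stand-in for whatever broadcast call the application makes (for example Gateway::sendToAll), BroadcastBatcher is a hypothetical name, and clients are assumed to accept a JSON array of messages instead of one message per frame.

```python
import json
import time
from typing import Callable


class BroadcastBatcher:
    """Coalesce many small broadcasts into one send_to_all() call per interval.

    send_to_all is a hypothetical stand-in for the real broadcast call. Also call
    flush() from a periodic timer so the tail of a burst is not held back.
    """

    def __init__(self, send_to_all: Callable[[str], None], interval_s: float = 0.05,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send_to_all
        self._interval = interval_s
        self._clock = clock
        self._pending: list[dict] = []
        self._last_flush = clock()

    def push(self, message: dict) -> None:
        """Queue a message; flush if the interval since the last flush has elapsed."""
        self._pending.append(message)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        """Send all queued messages as one JSON array (no call if nothing is queued)."""
        if self._pending:
            self._send(json.dumps(self._pending))
            self._pending = []
        self._last_flush = self._clock()
```

The trade-off is latency: a message can wait up to one interval before it is sent, in exchange for one broadcast call per interval instead of one per message.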

@farwish

farwish commented Jan 15, 2018

Maybe an endless loop in the code logic is using up all of the CPU. You can analyse this situation with gdb -p processId.

@sm2017
Author

sm2017 commented Jan 15, 2018

@farwish I know it is not an infinite loop, because CPU usage goes down after a while.

@walkor walkor closed this as completed Aug 18, 2018
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants