
Problem: segv on multi-broker (again) #47

Closed
hintjens opened this issue May 2, 2014 · 11 comments
hintjens (Member) commented May 2, 2014

open read pipe
open corresponding write pipe
close read pipe
close write pipe
open read pipe (same pipe name as initial open)

segv

hintjens (Member, Author) commented May 2, 2014

I can't reproduce this. Can you get the animation output? Thanks.

rpedde (Contributor) commented May 2, 2014

The exercise test looked like this:

read_pipe = zbroker.Zpipe('local|write_test')
write_pipe = zbroker.Zpipe('local2|>write_test')
read_pipe.close()
write_pipe.close()

read_pipe = zbroker.Zpipe('local|write_test')
write_pipe = zbroker.Zpipe('local2|>write_test')
read_pipe.close()
write_pipe.close()

Animation output:

14-05-02 10:17:23 I: joining cluster as 5907A26F11B11E8FD32288C1CEBD4774
14-05-02 10:17:23 N: starting zpipes_server service
14-05-02 10:17:23 N: binding zpipes service to 'ipc://@/zpipes/local'
14-05-02 10:17:23 I: ZPIPES server appeared at 2D3AD12281261F8F04045407688B9B47
14-05-02 10:17:27    366: start:
14-05-02 10:17:27    366:     INPUT
14-05-02 10:17:27    366:         $ lookup or create pipe
14-05-02 10:17:27    366:         $ open pipe reader
14-05-02 10:17:27    366:         > before reading
14-05-02 10:17:27    366: before reading:
14-05-02 10:17:27    366:     ok
14-05-02 10:17:27    366:         $ send INPUT_OK
14-05-02 10:17:27    366:         > reading
14-05-02 10:17:27    366: reading:
14-05-02 10:17:27    366:     CLOSE
14-05-02 10:17:27    366:         $ close pipe reader
14-05-02 10:17:27    366:         $ send CLOSE_OK
14-05-02 10:17:27    366:         > start
14-05-02 10:17:27    367: start:
14-05-02 10:17:27    367:     INPUT
14-05-02 10:17:27    367:         $ lookup or create pipe
14-05-02 10:17:27    367:         $ open pipe reader

Program received signal SIGSEGV, Segmentation fault.

backtrace:

0x00007ffff7bc5dcc in s_client_execute (self=0xffffffffffffffff, event=10)
    at zpipes_server_engine.h:521
521         self->next_event = event;
(gdb) bt
#0  0x00007ffff7bc5dcc in s_client_execute (self=0xffffffffffffffff, event=10)
    at zpipes_server_engine.h:521
#1  0x00007ffff7bc532c in engine_send_event (client=0xffffffffffffffff, 
    event=have_reader_event) at zpipes_server_engine.h:340
#2  0x00007ffff7bcb353 in pipe_attach_local_reader (self=0x7ffff0008600, 
    reader=0x7ffff0008910) at zpipes_server.c:220
#3  0x00007ffff7bcc44a in open_pipe_reader (self=0x7ffff0008910) at zpipes_server.c:481
#4  0x00007ffff7bc60ff in s_client_execute (self=0x7ffff0008910, event=2)
    at zpipes_server_engine.h:562
#5  0x00007ffff7bca95e in s_server_client_message (loop=0x7ffff00018b0, 
    item=0x7ffff0007810, argument=0x60ecb0) at zpipes_server_engine.h:1447
#6  0x00007ffff776dc6c in zloop_start (self=0x7ffff00018b0) at zloop.c:463
#7  0x00007ffff7bcaac7 in s_server_task (args=0x0, ctx=0x608ec0, pipe=0x6091f0)
    at zpipes_server_engine.h:1465
#8  0x00007ffff777915f in s_thread_shim (args=0x608e90) at zthread.c:81

This looks like a race, as it isn't 100% reliable. On my hardware it segvs 3 out of 4 times and works fine the other.

rpedde (Contributor) commented May 2, 2014

And sorry, yes, this is multi-broker. It works fine with a single broker.

hintjens (Member, Author) commented May 2, 2014

Is there any chance you're still running the version prior to the fix for #46?

rpedde (Contributor) commented May 2, 2014

yup, sure am.

hintjens (Member, Author) commented May 2, 2014

Then it'll crash in exactly the same way :-)

Pretty much any cluster test will crash that code, since it was blithely sending events to an invalid pointer (-1). It escaped my initial testing just by luck.

hintjens closed this as completed May 2, 2014
rpedde (Contributor) commented May 2, 2014

OK, that's why I linked it to the other one; it looked like an identical backtrace.

rpedde (Contributor) commented May 2, 2014

Are you going to PR the fix?

hintjens (Member, Author) commented May 2, 2014

Just for interest, how many brokers are you testing in a cluster now?


hintjens (Member, Author) commented May 2, 2014

> Are you going to PR the fix?

Oh, sorry... I didn't realize I'd forgotten to do that.

rpedde (Contributor) commented May 2, 2014

Right now in my test cluster I'm only running 5 brokers, and my litmus test is a topology with 1000 mappers feeding one reducer. My larger cluster will be 20 brokers, and with that I hope to run a topology with as many as 70K edges.
