distributed nodes that don't see each other #19

Closed
Licenser opened this Issue Jun 8, 2012 · 3 comments


Licenser commented Jun 8, 2012

I keep running into problems with distributed gproc when I start two nodes that can't see each other at the time gproc starts but are later supposed to join together, with discovery set to all. I have a feeling this is a known issue, but I figured it wouldn't hurt to document it.

Example timeline:
n1 - boot
n1 - start gproc
n2 - boot
n2 - start gproc
n1 - net_adm:ping(n2)
-> the nodes do not join together properly.

This is essentially a netsplit issue, and it probably can't be resolved entirely, since there is no guarantee that conflicts between the two registries can be merged automatically. What would be really cool, though, is some kind of callback saying: hey, we have a (re)join from a split with sides 1 and 2, so that the application could work out the conflicts where possible.
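To make the callback idea a bit more concrete, here is a rough sketch built only on OTP's `net_kernel:monitor_nodes/1`, not on anything in gproc. The module name `join_watcher`, the `Callback` fun and the `{join, Node}` / `{rejoin, Node}` / `{down, Node}` event shapes are all made up for illustration. It only reports that a node has appeared or reappeared; it does not attempt to merge the two registries.

```erlang
-module(join_watcher).
-export([start/1, classify/2]).

%% Spawn a watcher that calls Callback(Event) for every node event.
%% Callback is a hypothetical user-supplied fun; it is not a gproc hook.
start(Callback) when is_function(Callback, 1) ->
    spawn(fun() ->
        %% Subscribe this process to {nodeup, Node} / {nodedown, Node} messages.
        ok = net_kernel:monitor_nodes(true),
        loop(Callback, [])
    end).

%% Seen is the list of nodes that have been connected at least once.
loop(Callback, Seen) ->
    receive
        {nodeup, Node} ->
            Callback(classify(Node, Seen)),
            loop(Callback, lists:usort([Node | Seen]));
        {nodedown, Node} ->
            Callback({down, Node}),
            loop(Callback, Seen)
    end.

%% A node that was seen before is treated as a rejoin after a split;
%% otherwise it is a first-time join.
classify(Node, Seen) ->
    case lists:member(Node, Seen) of
        true -> {rejoin, Node};
        false -> {join, Node}
    end.
```

Something like `join_watcher:start(fun(E) -> io:format("~p~n", [E]) end)` on n1 before the `net_adm:ping(n2)` step would then at least show the moment the two sides meet, which is where any application-level conflict handling would have to hook in.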

Owner

uwiger commented Jun 8, 2012

This is a problem with gen_leader and, consequently, with gproc.

I believe some of the gen_leader versions out there, not least those by vagabond and garrett-smith, can handle netsplits fairly well, but you'd need to check with them directly for details. Gproc would need some callback in order to resync, and some changes to its internal data structures to know what to do in case of conflicts.

Owner

uwiger commented Jun 25, 2012

I have added gproc:bcast() and gproc:wide_await() to make it a bit easier to have multiple instances of local gproc services cooperating in a loose way. There may be other similar functions that could be added. Beyond that, I have no immediate plans to work on the global gproc part right now (not from lack of interest - I simply don't have the time).
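A minimal sketch of how these two calls might be combined, assuming the `gproc:bcast/2` and `gproc:wide_await/3` signatures from the gproc documentation. The module name `loose_gproc` and the key `{n, l, my_service}` are hypothetical, and what happens on timeout should be checked against the gproc docs rather than taken from this example.

```erlang
-module(loose_gproc).
-export([announce/1, wait_for/2]).

%% Hypothetical local (scope l) name; each node registers its own instance.
-define(KEY, {n, l, my_service}).

%% Send Msg to the process registered under ?KEY on the connected nodes.
announce(Msg) ->
    gproc:bcast(?KEY, Msg).

%% Block until ?KEY is registered on any of Nodes, then return its pid.
wait_for(Nodes, Timeout) ->
    {Pid, _Value} = gproc:wide_await(Nodes, ?KEY, Timeout),
    Pid.
```

Because the key is local, every node keeps its own registry and no leader election is involved, which is why this sidesteps the join problem rather than solving it.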

Thanks mate, those are great improvements. I'll see if I can work with them to make my service more stable :) And no worries, I know the time problem. Sadly, I don't think I'm up to helping out here yet :(

uwiger closed this May 24, 2013
