Improve PeX stabilization time and partition-resistance. #7

lthibault · 2021-08-26T13:42:18Z

Given PeX's design goals, I'm inclined to err on the side of partition-resistance when choosing policies and parameters. With regards to view selection, we know that a pure rand policy implies the following trade-off:

Advantage: Partitions are slower to "forget" records from other partitions. This is helpful for repairing partitions and for reconnecting orphaned nodes. I think it is a significant advantage to be able to do this without a bootstrap service.
Disadvantage: The cluster will be slower to converge on a uniform distribution of records. I believe the rate of convergence under rand is linear, vs exponential with tail. This is not awful, but it puts constraints on max cluster size.

After reading [0] and [1], I think we might be able to do better (Kermarrec to the rescue 😝 )!

A central observation of these papers is that the view size does not impact the rate at which information propagates in an overlay, and therefore has no effect on convergence time. However, the fan-out factor f is highly determining of convergence rate. A value of f > log(n) where n is maximum size of the cluster is sufficient to deliver gossip to every peer with a very high probability¹.

We currently set a value of f=1, as described in Jelasity et al., so it seems plausible that we should be able to increase this value and combine it with a pure rand policy². In theory, this should give us the best of both worlds (though clearly we should measure this).

Practically speaking, it seems like this should be a simple matter of performing peer exchange with f peers during each gossip round.

@aratz-lasa Thoughts?

Footnotes

I'm unclear on the appropriate base for the logarithm. Is it log10? The natural log? Log of f? Do you know?
Do we still need to track hop if we're only using rand? 🤔

References

[0] Lightweight probabilistic broadcast
[1] Epidemic information dissemination in distributed systems

The text was updated successfully, but these errors were encountered:

lthibault · 2021-12-16T16:23:42Z

Closing, as this has been addressed by the updates to the PeX protocol.

See specs/pex.

lthibault added enhancement New feature or request question Further information is requested labels Aug 26, 2021

lthibault mentioned this issue Aug 26, 2021

Evaluate performance of PeX in a test setting #8

Closed

6 tasks

lthibault closed this as completed Dec 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve PeX stabilization time and partition-resistance. #7

Improve PeX stabilization time and partition-resistance. #7

lthibault commented Aug 26, 2021

lthibault commented Dec 16, 2021

Improve PeX stabilization time and partition-resistance. #7

Improve PeX stabilization time and partition-resistance. #7

Comments

lthibault commented Aug 26, 2021

Footnotes

References

lthibault commented Dec 16, 2021