Improve PeX stabilization time and partition-resistance. #7

Closed
lthibault opened this issue Aug 26, 2021 · 1 comment
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@lthibault
Contributor

Given PeX's design goals, I'm inclined to err on the side of partition-resistance when choosing policies and parameters. With regard to view selection, we know that a pure rand policy implies the following trade-off:

  • Advantage: Partitions are slower to "forget" records from other partitions. This is helpful for repairing partitions and for reconnecting orphaned nodes. I think it is a significant advantage to be able to do this without a bootstrap service.
  • Disadvantage: The cluster will be slower to converge on a uniform distribution of records. I believe the rate of convergence under rand is linear, vs exponential with tail. This is not awful, but it puts constraints on max cluster size.

After reading [0] and [1], I think we might be able to do better (Kermarrec to the rescue 😝 )!

A central observation of these papers is that the view size does not affect the rate at which information propagates in an overlay, and therefore has no effect on convergence time. The fan-out factor f, however, strongly determines the convergence rate. A value of f > log(n), where n is the maximum size of the cluster, is sufficient to deliver gossip to every peer with very high probability.¹
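To get a feel for how strongly f affects dissemination, here is a minimal simulation sketch. It assumes an "infect-and-die" push model (each peer forwards a message once, to f peers chosen uniformly at random), which is a simplification and not PeX itself. The function name `simulate_coverage` is hypothetical, and the use of the natural logarithm for the threshold is an assumption, since the base is an open question in this issue.

```python
import math
import random


def simulate_coverage(n, fanout, seed=0):
    """Return the fraction of n peers reached by one message.

    Assumed model: infect-and-die push gossip. Each peer forwards the
    message exactly once, to `fanout` distinct peers other than itself,
    chosen uniformly at random. Requires fanout <= n - 1.
    """
    rng = random.Random(seed)
    informed = {0}   # peer 0 originates the message
    frontier = [0]   # peers that still have to forward it
    while frontier:
        next_frontier = []
        for node in frontier:
            # Sample from n - 1 slots and shift past `node` to exclude self.
            for t in rng.sample(range(n - 1), fanout):
                peer = t if t < node else t + 1
                if peer not in informed:
                    informed.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
    return len(informed) / n


if __name__ == "__main__":
    n = 1000
    threshold = math.ceil(math.log(n))  # natural log: an assumption
    for f in (1, 2, threshold):
        print(f"f={f}: coverage={simulate_coverage(n, f):.3f}")
```

In this model, f=1 produces a single forwarding chain that stops at the first already-informed peer, so coverage stays small, while a fan-out near ln(n) reaches nearly every peer. Whether the same holds for PeX's view exchange would still need to be measured.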

We currently set a value of f=1, as described in Jelasity et al., so it seems plausible that we should be able to increase this value and combine it with a pure rand policy². In theory, this should give us the best of both worlds (though clearly we should measure this).

Practically speaking, it seems like this should be a simple matter of performing peer exchange with f peers during each gossip round.
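A rough sketch of what such a round might look like, assuming views are plain lists of peer IDs and that a pure rand policy means "merge both views, then keep a uniform random subset". The names `merge_rand` and `gossip_round` are hypothetical and do not come from the PeX codebase. Hop counts are deliberately left out, since whether they are still needed is an open question here.

```python
import random


def merge_rand(view, received, self_id, max_size, rng):
    """Assumed pure 'rand' selection: merge, dedupe, drop self, sample."""
    candidates = sorted((set(view) | set(received)) - {self_id})
    if len(candidates) <= max_size:
        return candidates
    return rng.sample(candidates, max_size)


def gossip_round(views, node, fanout, max_size, rng):
    """Exchange views between `node` and up to `fanout` distinct peers.

    `views` maps each peer ID to its current view (a list of peer IDs).
    Both sides of every exchange update their view in place.
    """
    # Pick the exchange partners up front, from the view at round start.
    peers = rng.sample(views[node], min(fanout, len(views[node])))
    for peer in peers:
        local, remote = list(views[node]), list(views[peer])
        # Each side also learns about the peer it just talked to.
        views[node] = merge_rand(local, remote + [peer], node, max_size, rng)
        views[peer] = merge_rand(remote, local + [node], peer, max_size, rng)
    return peers


if __name__ == "__main__":
    rng = random.Random(42)
    views = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    contacted = gossip_round(views, node=0, fanout=2, max_size=3, rng=rng)
    print("contacted:", contacted)
    print("views:", views)
```

With fanout=1 this reduces to a single exchange per round, which is the current behaviour described above. Larger values multiply per-round traffic by f, so the fan-out would presumably need to be weighed against bandwidth.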

@aratz-lasa Thoughts?

Footnotes

  1. I'm unclear on the appropriate base for the logarithm. Is it log10? The natural log? Log of f? Do you know?
  2. Do we still need to track hop if we're only using rand? 🤔

References

[0] Lightweight probabilistic broadcast
[1] Epidemic information dissemination in distributed systems

@lthibault added the enhancement and question labels on Aug 26, 2021
@lthibault
Contributor Author

Closing, as this has been addressed by the updates to the PeX protocol.

See specs/pex.
