GossipSub Improvements blog post #130

Merged

kaiserd merged 7 commits into develop from rlogGossipSembr

Nov 6, 2023

Contributor

ufarooqstatus commented Sep 27, 2023 •

edited

Research blog post on P2P network scaling with GossipSub as a reference protocol


          sembr update

916d6b2

kaiserd reviewed


Contributor

kaiserd left a comment

Thank you for this post!
I will provide the rest of this feedback tomorrow.

(Many of the suggestions are sembr-related; you can simply accept them to integrate them. Do not forget to "git pull" after accepting them, to update your local copy.)

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved


rlog/2023-09-27-gossipimprovements.mdx Outdated

+              Most of the research on P2P networks provides simulation results assuming nodes with similar capabilities.
+              The aspect of dissimilar capabilities and resource-constrained nodes is less explored.
+              It is discussed in GOAL1 that overlay mesh results in better performance if $D_{avg} = \ln(N) + C$.

Contributor

kaiserd Sep 27, 2023

What about gossipsub peer scoring in that context?

Contributor Author

ufarooqstatus Oct 1, 2023

Peer scoring is a function of a node's neighbors, not of the node itself. With some manipulation, it may help control D for a node (as a response), but it would at least require node-specific thresholds, etc.

rlog/2023-09-27-gossipimprovements.mdx Outdated

+              At the same time, connecting high-bandwidth nodes through a low-bandwidth node undermines the network's performance.
+              Ideally, every node should contribute proportionally to the available resources. A better solution involves a two-phased operation:
+              1. Every node computes its available bandwidth and selects a node degree $D_i$ proportional to its available bandwidth. Different bandwidth estimation approaches are suggested in the literature [4,5].

Contributor

kaiserd Sep 27, 2023

Every node computes its available bandwidth and selects a node degree $D_i$ proportional to its available bandwidth

Is there a source where this has been proposed?

Contributor Author

ufarooqstatus Sep 27, 2023

in [3]

Contributor Author

ufarooqstatus Oct 9, 2023

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved

kaiserd mentioned this pull request

GossipSub Improvements log post #129

Closed

ufarooqstatus changed the title ~~sembr update~~ GossipSub Improvements blog post


          Apply suggestions from code review

df5719f

Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com>

kaiserd reviewed


Contributor

kaiserd left a comment

Here is the rest of my feedback :)

rlog/2023-09-27-gossipimprovements.mdx Outdated

+              However, problems like peer churn and in-network adversaries can be best alleviated through balanced redundant coverage.
+              # References
+              [1] D. Vyzovitis, Y. Napora, D. McCormick, D. Dias, and Y. Psaras, “Gossipsub: Attack-resilient message propagation in the filecoin and eth2.0 networks,” arXiv preprint arXiv:2007.02754, 2020.

Contributor

kaiserd Sep 29, 2023

Hyperlink in the article would be good :), as well as a hyperlink in the reference list. (This post is not meant for print.)

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved

rlog/2023-09-27-gossipimprovements.mdx Outdated

+              To further conform to the suggested mesh-degree average $D_{avg}$, every node tries achieving this number within its neighborhood, resulting in an overall similar mesh-degree average.
+. From the available local view, every node tries connecting peers with the lowest latency until $D$ connections are made.
+              We suggest referring to the peering solution discussed in GOAL5 to avoid partitioning.

Contributor

kaiserd Sep 29, 2023

Is this also from [3]? We should add a ref here, too.

Contributor

kaiserd Sep 29, 2023

Readers might ask: Will this lead to starvation of high latency nodes? Will high latency nodes still receive the data?
E.g. spread over ocean cables etc...

Contributor Author

ufarooqstatus Sep 29, 2023

This raises a very interesting discussion. I believe this will create partitions (clusters), where the nearest peers communicate with each other (at very low latency). That is why I suggested in GOAL5 that high-capacity peers (from each cluster) must create an additional overlay.
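To make the selection rule in the quoted step concrete, here is a minimal sketch of picking the $D$ lowest-latency peers from a node's local view. The function name, the peer IDs, and the latency values are hypothetical and are not taken from the blog post or from any GossipSub implementation.

```python
from typing import Dict, List


def select_lowest_latency_peers(latencies_ms: Dict[str, float], d: int) -> List[str]:
    """Return up to d peer IDs with the lowest measured latency."""
    if d < 0:
        raise ValueError("d must be non-negative")
    # Sort by latency; break ties by peer ID so the result is deterministic.
    ranked = sorted(latencies_ms, key=lambda peer: (latencies_ms[peer], peer))
    return ranked[:d]


# Hypothetical local view: peer ID -> measured round-trip latency in ms.
local_view = {"peerA": 12.0, "peerB": 180.0, "peerC": 25.0, "peerD": 95.0, "peerE": 8.0}
mesh = select_lowest_latency_peers(local_view, d=3)
print(mesh)  # ['peerE', 'peerA', 'peerC']
```

In this example the two highest-latency peers (`peerB`, `peerD`) are never selected, which is the clustering and starvation concern discussed in this thread.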

rlog/2023-09-27-gossipimprovements.mdx Outdated

+              ## GOAL3: Bandwidth Maximization
+              Redundant message transmissions are essential for handling adversaries/node failure. However, these transmissions result in traffic bursts, cramming many overlay links.
+              This not only adds to the network-wide message dissemination latency but a significant share of the network's bandwidth is wasted on (usually) unnecessary transmissions.
+              It is essential to explore solutions that can minimize the impact of redundant transmissions while assuring resilience against node failures.

Contributor

kaiserd Sep 29, 2023

minimize the impact of redundant transmissions

Is this about minimizing the impact of each individual redundant transmission,
or is it about minimizing the number of redundant transmissions?
(I assume the latter; I'd make this explicit.)

rlog/2023-09-27-gossipimprovements.mdx Outdated

+              ## GOAL5: Scalability
+              P2P networks are inherently scalable because every incoming node brings in bandwidth and compute resources.
+              Under such arrangements, it is desirable to achieve IP-like scalability, as seen in Bittorrent.

Contributor

kaiserd Sep 29, 2023

What do you mean by IP-like scalability?

Contributor Author

ufarooqstatus Oct 23, 2023

If the average message arrival rate is L bytes/sec, any node joining the network must have approximately L x D bandwidth available. As long as this holds, we can keep adding new nodes. However, this increases network-wide message dissemination latency. Keeping the latency constant requires linking D to the network size!
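The joining condition described in this reply can be sketched as a simple check. The function names and the example figures (a 50 kB/s arrival rate and D = 8) are assumptions chosen for illustration only.

```python
def required_bandwidth(arrival_rate_bytes_per_s: float, mesh_degree: int) -> float:
    """Approximate bandwidth a node needs: L x D, as stated in the comment above."""
    return arrival_rate_bytes_per_s * mesh_degree


def can_join(available_bytes_per_s: float, arrival_rate_bytes_per_s: float, mesh_degree: int) -> bool:
    """True if the node's available bandwidth covers roughly L x D."""
    return available_bytes_per_s >= required_bandwidth(arrival_rate_bytes_per_s, mesh_degree)


# Assumed example values: L = 50 kB/s, D = 8.
print(required_bandwidth(50_000, 8))  # 400000
print(can_join(500_000, 50_000, 8))   # True
print(can_join(300_000, 50_000, 8))   # False
```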

Contributor

kaiserd Oct 30, 2023

This should be in the blog post. It is more expressive than IP-like scalability

Contributor Author

ufarooqstatus Oct 31, 2023

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved

rlog/2023-09-27-gossipimprovements.mdx Outdated

+              Most efforts for bringing scalability to the P2P systems have focused on curtailing redundant transmissions and flat overlay adjustments. Hierarchical overlay designs, on the other hand, are less explored.
+              Placing a logical structure in unstructured P2P systems can help scale the P2P networks.
+              We suggest using a hierarchical overlay inspired by the approaches [14-16]. An abstract operation of the suggested overlay design is provided below:

Contributor

kaiserd Sep 29, 2023

Suggested change

      
            We suggest using a hierarchical overlay inspired by the approaches [14-16]. An abstract operation of the suggested overlay design is provided below:
          
            [14-16] propose using a hierarchical overlay inspired by the approaches .
          
            An abstract operation of the suggested overlay design is provided below:

Contributor

kaiserd Sep 29, 2023

We should not say that we suggest using a hierarchical approach. In fact, we would like to avoid that if possible.
Let's simply list that as one possibility.

Contributor Author

ufarooqstatus Oct 17, 2023

rlog/2023-09-27-gossipimprovements.mdx Outdated


		We suggest using a hierarchical overlay inspired by the approaches [14-16]. An abstract operation of the suggested overlay design is provided below:

		1. We cluster nodes based on locality, assuming that such peers will have lower intra-cluster latency and higher bandwidth.

Contributor

kaiserd Sep 29, 2023

We should not use "we" here, but a more neutral tone. This is one possibility, not (necessarily) what we propose.
We would need more insights to suggest this.

Contributor

kaiserd Sep 29, 2023

Also applies to the rest of the "we" in the following.

Contributor Author

ufarooqstatus Oct 31, 2023

rlog/2023-09-27-gossipimprovements.mdx Outdated

+. Virtual nodes form a fully connected mesh to construct a hierarchical overlay.
+              Each virtual node is essentially a collection of super nodes; a link to any of the constituent super nodes represents a link to the virtual node.
+. We suggest using GossipSub for intra-cluster message dissemination and FloodSub for inter-cluster message dissemination.

Contributor

kaiserd Sep 29, 2023

Here, too, something like:

"A possible idea is.."

We do not suggest this.

Contributor Author

ufarooqstatus Oct 17, 2023


          Apply suggestions from code review

9cc8d99

Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com>

Menduist reviewed


rlog/2023-09-27-gossipimprovements.mdx

+              Publishing through a D-regular overlay triggers approximately $N \times D$ transmissions.
+              Reducing $D$ reduces the redundant transmissions but compromises reachability and latency.
+              Sharing metadata through a K-regular overlay (where $K > D$) allows nodes to pull missing messages.
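As a rough illustration of the $N \times D$ estimate in the quoted lines, the sketch below compares it with the $N - 1$ deliveries that would be enough for every other node to receive a message exactly once. The helper names and the values of $N$ and $D$ are assumptions, and the count uses the post's approximation rather than an exact model of GossipSub forwarding.

```python
def approx_transmissions(n: int, d: int) -> int:
    """Approximate transmissions per published message in a D-regular overlay (N x D)."""
    return n * d


def redundancy_factor(n: int, d: int) -> float:
    """Ratio of approximate transmissions to the N - 1 deliveries strictly required."""
    if n < 2:
        raise ValueError("need at least two nodes")
    return approx_transmissions(n, d) / (n - 1)


# Assumed example: 1000 nodes, mesh degree 8.
print(approx_transmissions(1000, 8))         # 8000
print(round(redundancy_factor(1000, 8), 2))  # 8.01
```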

Menduist Oct 2, 2023 •

edited

AFAIK, the K overlay is gossiping (since it happens at fixed intervals), the D overlay is flooding (or maybe another name, since we don't flood to everyone, not sure)

(so the description on line 32 isn't accurate, since it calls D the gossip)

Contributor Author

ufarooqstatus Oct 9, 2023

Menduist reviewed


rlog/2023-09-27-gossipimprovements.mdx Outdated

+              GossipSub uses eager push (through overlay mesh) and lazy push (through IWANT messages).
+              The mesh degree $D_{Low} \leq D \leq D_{High}$ is crucial in deciding message dissemination latency.
+              A smaller value for $D$ results in higher latency due to increased rounds, whereas a higher $D$ reduces latency on the cost of increased redundancy.

Menduist Oct 2, 2023

whereas a higher $D$ reduces latency on the cost of increased redundancy.

Should we say "at the cost of increased bandwidth usage" to be more explicit?
Also, this is only true up to the point where congestion slows down propagation (but maybe we don't have to say that here).

Contributor Author

ufarooqstatus Oct 9, 2023

Menduist reviewed


rlog/2023-09-27-gossipimprovements.mdx Outdated

+              The mesh degree $D_{Low} \leq D \leq D_{High}$ is crucial in deciding message dissemination latency.
+              A smaller value for $D$ results in higher latency due to increased rounds, whereas a higher $D$ reduces latency on the cost of increased redundancy.
+              It is suggested that the average mesh degree should be $D_{avg} = \ln(N) + C$ for an optimal operation, where $N$ is the network size and C is a small constant.

Menduist Oct 2, 2023

Where is that coming from?

Contributor Author

ufarooqstatus Oct 3, 2023

from [3]

Menduist Oct 3, 2023 •

edited

This is only true if you want to keep the latency stable as the network grows, but that will cost more bandwidth.

Most networks using GossipSub won't make that trade-off, and will use a fixed D (which means constant bandwidth, increasing latency as network grows)
Also, methods used to automatically estimate the network sizes will most likely be susceptible to attacks

Speaking about attacks, sybil resistance is a big part of choosing a D, since they are directly connected
Let's say you have D = 8, and then 75% of the network does an attack
Your effective D will become 8*0.25=2, until the network recovers (recovering speed depends on the scoring system)

All of that to say, choosing a D is a complex subject, and I don't think we can sum it up in one sentence like that
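The two quantities discussed in this thread can be put side by side in a short sketch: the suggested average degree $D_{avg} = \ln(N) + C$ and the effective degree that remains when a fraction of mesh peers is adversarial. The function names are hypothetical, and $C = 1$ is an assumed value (the post only says $C$ is a small constant).

```python
import math


def suggested_avg_degree(n: int, c: float) -> float:
    """D_avg = ln(N) + C, the rule quoted from the post."""
    return math.log(n) + c


def effective_degree(d: int, adversarial_fraction: float) -> float:
    """Expected number of honest mesh peers if a fraction of peers is adversarial."""
    if not 0.0 <= adversarial_fraction <= 1.0:
        raise ValueError("adversarial_fraction must be in [0, 1]")
    return d * (1.0 - adversarial_fraction)


# The example from the comment above: D = 8 and 75% of the network attacking.
print(effective_degree(8, 0.75))  # 2.0

# D_avg grows slowly with N (C = 1 is an assumption).
for n in (1_000, 10_000, 1_000_000):
    print(n, round(suggested_avg_degree(n, 1.0), 2))
```

The loop shows why a fixed $D$ is a plausible choice in practice: even a thousandfold increase in $N$ changes the suggested degree by only a few links.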

Contributor Author

ufarooqstatus Oct 9, 2023

Menduist reviewed


rlog/2023-09-27-gossipimprovements.mdx Outdated

		A better solution involves a two-phased operation:


		1. Every node computes its available bandwidth and selects a node degree $D_i$ proportional to its available bandwidth [3].

Menduist Oct 2, 2023

That would only be applicable in networks where participants have "bandwidth to spare"

Generally, home users don't want to spend all of their bandwidth to participate in some network (though maybe we could look at something like µTP)

And datacenter users tend to pay for their bandwidth, so even though they have a lot at hand, they might not want to "waste it"

However, I can see a benefit to lowering D locally if the node doesn't have enough bandwidth, as this would slow down the network either way.
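One way to read the last point is that a node keeps the default degree when it can afford it and lowers $D$ locally only when its bandwidth budget is too small. The sketch below follows that reading. The function name, the default degree of 8, the floor of 2, and the example rates are all assumptions for illustration, not values from the post or from any implementation.

```python
def local_mesh_degree(
    bandwidth_budget_bytes_per_s: int,
    arrival_rate_bytes_per_s: int,
    d_default: int = 8,
    d_min: int = 2,
) -> int:
    """Pick a local degree D_i that never exceeds d_default.

    The node lowers D_i when its budget cannot sustain roughly L x D,
    but does not raise it when bandwidth is plentiful.
    d_min is an assumed floor that keeps the node connected.
    """
    if arrival_rate_bytes_per_s <= 0:
        raise ValueError("arrival rate must be positive")
    affordable = bandwidth_budget_bytes_per_s // arrival_rate_bytes_per_s
    return max(d_min, min(d_default, affordable))


# Assumed arrival rate L = 50 kB/s.
print(local_mesh_degree(1_000_000, 50_000))  # 8 (spare bandwidth is not spent)
print(local_mesh_degree(200_000, 50_000))    # 4 (degree lowered locally)
print(local_mesh_degree(50_000, 50_000))     # 2 (assumed floor applies)
```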

Contributor

kaiserd Oct 15, 2023

@ufarooqstatus did you add this? This should be reflected in the blog post.

Contributor Author

ufarooqstatus Oct 17, 2023

Menduist reviewed


rlog/2023-09-27-gossipimprovements.mdx

+              Suppose that we have three equal-length messages $x1, x2, x3$. Assuming an XOR coding function,
+              we know two trivial properties: $x1 \oplus x2 \oplus x2 = x1$ and $\vert x1 \vert = \vert x1 \oplus x2 \oplus x2 \vert$.
+              This implies that instead of sending messages individually, we can encode and transmit composite message(s) to the network.
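The two XOR properties in the quoted lines can be checked directly on byte strings. This is a minimal sketch: the helper `xor_bytes` and the sample messages are hypothetical, and it does not address how a receiver would learn which messages were combined.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length messages."""
    if len(a) != len(b):
        raise ValueError("messages must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical equal-length messages.
x1 = b"block-01"
x2 = b"block-02"

composite = xor_bytes(x1, x2)         # x1 XOR x2
recovered = xor_bytes(composite, x2)  # (x1 XOR x2) XOR x2 = x1

print(recovered == x1)                # True
print(len(composite) == len(x1))      # True: the composite is no longer than x1
```

A node that already holds `x2` can therefore recover `x1` from the single composite message.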

Menduist Oct 2, 2023

This also implies that the messages are coming from the same node, which is only one specific use case

Contributor Author

ufarooqstatus Oct 9, 2023

I think a protocol can be devised (maybe by adding peer-IDs and message hash/ID-s) to indicate combined messages.

Contributor

kaiserd Oct 16, 2023

Is this improvement worth the complexity it adds? Maybe just add this question to make clear that this is only an option.

Contributor Author

ufarooqstatus Oct 17, 2023

Menduist reviewed


rlog/2023-09-27-gossipimprovements.mdx

		We can parallelize message transmission by partitioning large messages into smaller chunks, letting intermediate peers relay chunks as soon as they receive them.


		## GOAL5: Scalability

Menduist Oct 2, 2023 •

edited

What do you want to scale here?
Number of participants? I'd argue that GossipSub already scales that with O(log(n)), so it's already quite scalable

Sure, we could aim for O(log log n), but with the current scalability properties, going from 10k nodes to 100 million nodes would roughly double the latency, which already seems quite reasonable
(100 million being apparently the number of BitTorrent users, seems like a high target already)

Scaling message sizes? That's interesting, and that's already described in goal 4
Scaling message count? That's interesting, and probably the next step of GS research, but that's already described in goal 3

Contributor Author

ufarooqstatus Oct 23, 2023

If the average message arrival rate is L bytes/sec, any node joining the network must have approximately L x D bandwidth available. As long as this holds, we can keep adding new nodes. However, this increases network-wide message dissemination latency (at least log_D (N) hops are needed for network-wide message dissemination). Per-hop latency can vary from a few milliseconds to a few hundred milliseconds. Connecting nearby nodes lowers the average per-hop latency, and in many applications there is an upper bound on network-wide message dissemination time.
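The hop-count argument in this thread can be worked through numerically. The sketch below counts the minimum number of hops needed for a fan-out of $D$ to reach $N$ nodes (the $\log_D(N)$ bound mentioned above) and multiplies it by a per-hop latency. The function names, $D = 8$, and the 100 ms per-hop latency are assumptions for illustration.

```python
import math


def min_hops(n: int, d: int) -> int:
    """Smallest h with d**h >= n, i.e. the log_D(N) lower bound rounded up."""
    if n < 1 or d < 2:
        raise ValueError("need n >= 1 and d >= 2")
    hops, reach = 0, 1
    while reach < n:
        reach *= d
        hops += 1
    return hops


def dissemination_latency_ms(n: int, d: int, per_hop_ms: float) -> float:
    """Lower-bound estimate: hops times an assumed average per-hop latency."""
    return min_hops(n, d) * per_hop_ms


# Assumed D = 8 and 100 ms per hop.
print(min_hops(10_000, 8))       # 5
print(min_hops(100_000_000, 8))  # 9
print(dissemination_latency_ms(10_000, 8, 100.0))       # 500.0
print(dissemination_latency_ms(100_000_000, 8, 100.0))  # 900.0

# Without rounding, log(1e8) / log(1e4) = 2, independent of D.
print(math.log(100_000_000) / math.log(10_000))
```

Under these assumptions, growing from 10k to 100 million nodes roughly doubles the hop count, while halving the average per-hop latency (for example by connecting nearby nodes) would offset that increase.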

Contributor

kaiserd Oct 30, 2023 •

edited

Please add this to the blog post.

Contributor Author

ufarooqstatus Oct 31, 2023

ufarooqstatus and others added 4 commits

October 9, 2023 13:29


          PR-130 review changes

0d3dacb


          revision on feedbackv2

ba67fe6


          scalability elaborated

b0cf5b8


          minor formatting and adjustment before publishing

27a8b9d

Signed-off-by: ksr <kaiserd@users.noreply.github.com>

kaiserd merged commit 8a95b0c into develop

kaiserd deleted the rlogGossipSembr branch

November 6, 2023 08:50
