Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GossipSub Improvements blog post #130

Merged
merged 7 commits into from
Nov 6, 2023
Merged

GossipSub Improvements blog post #130

merged 7 commits into from
Nov 6, 2023

Conversation

ufarooqstatus
Copy link
Contributor

@ufarooqstatus ufarooqstatus commented Sep 27, 2023

Research blog post on P2P network scaling with GossipSub as a reference protocol

Copy link
Contributor

@kaiserd kaiserd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this post!
I will provide the rest of this feedback tomorrow.

(Many of the suggestions are sembr related, you can just accept them to integrate these. Do not forget to "git pull" after accepting these, to update your local copy.)

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
Most of the research on P2P networks provides simulation results assuming nodes with similar capabilities.
The aspect of dissimilar capabilities and resource-constrained nodes is less explored.

It is discussed in GOAL1 that overlay mesh results in better performance if $D_{avg} = \ln(N) + C$.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about gossipsub peer scoring in that context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peer scoring is a function of neighbors, not the node itself. It may help control D for a node (as a response) with some manipulation. It will at least require node-specific thresholds etc

At the same time, connecting high-bandwidth nodes through a low-bandwidth node undermines the network's performance.
Ideally, every node should contribute proportionally to the available resources. A better solution involves a two-phased operation:

1. Every node computes its available bandwidth and selects a node degree $D_i$ proportional to its available bandwidth. Different bandwidth estimation approaches are suggested in literature [4,5].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every node computes its available bandwidth and selects a node degree $D_i$ proportional to its available bandwidth

Is there a source where this has been proposed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in [3]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
@ufarooqstatus ufarooqstatus changed the title sembr update GossipSub Improvements blog post Sep 27, 2023
Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com>
Copy link
Contributor

@kaiserd kaiserd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the rest of my feedback :)

However, problems like peer churn and in-network adversaries can be best alleviated through balanced redundant coverage.

# References
[1] D. Vyzovitis, Y. Napora, D. McCormick, D. Dias, and Y. Psaras, “Gossipsub: Attack-resilient message propagation in the filecoin and eth2. 0 networks,” arXiv preprint arXiv:2007.02754, 2020.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hyperlink in the article would be good :), as well as a hyperlink in the reference list. (This post is not meant for print.)

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
To further conform to the suggested mesh-degree average $D_{avg}$, every node tries achieving this number within its neighborhood, resulting in an overall similar mesh-degree average.

2. From the available local view, every node tries connecting peers with the lowest latency until $D$ connections are made.
We suggest referring to the peering solution discussed in GOAL5 to avoid partitioning.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also from [3]? We should add a ref here, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readers might ask: Will this lead to starvation of high latency nodes? Will high latency nodes still receive the data?
E.g. spread over ocean cables etc...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generates a very interesting discussion. I believe this will create partitions (clusters), where the nearest pears communicate with each other (at very low latency). That is why, I suggested in Goal 5 that high-capacity peers (from each cluster) must create an additional overlay.

## GOAL3: Bandwidth Maximization
Redundant message transmissions are essential for handling adversaries/node failure. However, these transmissions result in traffic bursts, cramming many overlay links.
This not only adds to the network-wide message dissemination latency but a significant share of the network's bandwidth is wasted on (usually) unnecessary transmissions.
It is essential to explore solutions that can minimize the impact of redundant transmissions while assuring resilience against node failures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimize the impact of redundant transmissions

Is this about minimizing the impact of each individual redundant transmissions,
or is it about minimizing the number of redundant transmissions?
(I assume the latter, I'd make this explicit.)


## GOAL5: Scalability
P2P networks are inherently scalable because every incoming node brings in bandwidth and compute resources.
Under such arrangements, it is desirable to achieve IP-like scalability, as seen in Bittorrent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by IP-like scalability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the average message arrival rate is L bytes/sec, any node joining the network must have approximately L x D bandwidth available. As far as this holds, we can keep on adding new nodes. However, this increases network-wide message dissemination latency. Keeping the latency constant requires linking D to the network size!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the blog post. It is more expressive than IP-like scalability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rlog/2023-09-27-gossipimprovements.mdx Outdated Show resolved Hide resolved
Most efforts for bringing scalability to the P2P systems have focused on curtailing redundant transmissions and flat overlay adjustments. Hierarchical overlay designs, on the other hand, are less explored.
Placing a logical structure in unstructured P2P systems can help scale the P2P networks.

We suggest using a hierarchical overlay inspired by the approaches [14-16]. An abstract operation of the suggested overlay design is provided below:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We suggest using a hierarchical overlay inspired by the approaches [14-16]. An abstract operation of the suggested overlay design is provided below:
[14-16] propose using a hierarchical overlay inspired by the approaches .
An abstract operation of the suggested overlay design is provided below:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should no say that we suggest using a hierarchical approach. In fact, we would like to avoid that if possible.
Let's simply list that as one possibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


We suggest using a hierarchical overlay inspired by the approaches [14-16]. An abstract operation of the suggested overlay design is provided below:

1. We cluster nodes based on locality, assuming that such peers will have lower intra-cluster latency and higher bandwidth.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use "we" here, but a more neutral tone. This is one possibilty. Not what we propose (necessarily).
We would need more insights to suggest this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also applies to the rest of the "we" in the following.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3. Virtual nodes form a fully connected mesh to construct a hierarchical overlay.
Each virtual node is essentially a collection of super nodes; a link to any of the constituent super nodes represents a link to the virtual node.

4. We suggest using GossipSub for intra-cluster message dissemination and FloodSub for inter-cluster message dissemination.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, too, something like:

"A possible idea is.."

We do not suggest this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com>
Publishing through a D-regular overlay triggers approximately $N \times D$ transmissions.
Reducing $D$ reduces the redundant transmissions but compromises reachability and latency.

Sharing metadata through a K-regular overlay (where $K > D$) allows nodes to pull missing messages.
Copy link

@Menduist Menduist Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, the K overlay is gossiping (since it happens at fixed intervals), the D overlay is flooding (or maybe another name, since we don't flood to everyone, not sure)

(so the description on line 32 isn't accurate, since it calls D the gossip)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GossipSub uses eager push (through overlay mesh) and lazy push (through IWANT messages).

The mesh degree $D_{Low} \leq D \leq D_{High}$ is crucial in deciding message dissemination latency.
A smaller value for $D$ results in higher latency due to increased rounds, whereas a higher $D$ reduces latency on the cost of increased redundancy.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whereas a higher $D$ reduces latency on the cost of increased redundancy.

should we say at the cost of increased bandwidth usage to be more explicit?
Also, this is only true up to a point where the congestion slows down propagation (but maybe we don't have to say that here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


The mesh degree $D_{Low} \leq D \leq D_{High}$ is crucial in deciding message dissemination latency.
A smaller value for $D$ results in higher latency due to increased rounds, whereas a higher $D$ reduces latency on the cost of increased redundancy.
It is suggested that the average mesh degree should be $D_{avg} = \ln(N) + C$ for an optimal operation, where $N$ is the network size and C is a small constant.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is that coming from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from [3]

Copy link

@Menduist Menduist Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only true if you want to keep the latency stable as the network grows, but that will cost more bandwidth

Most networks using GossipSub won't make that trade-off, and will use a fixed D (which means constant bandwidth, increasing latency as network grows)
Also, methods used to automatically estimate the network sizes will most likely be susceptible to attacks

Speaking about attacks, sybil resistance is a big part of choosing a D, since they are directly connected
Let's say you have D = 8, and then 75% of the network does an attack
Your effective D will become 8*0.25=2, until the network recovers (recovering speed depends on the scoring system)

All of that to say, choosing a D is a complex subject, and I don't think we can sum it up in one sentence like that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better solution involves a two-phased operation:


1. Every node computes its available bandwidth and selects a node degree $D_i$ proportional to its available bandwidth [3].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would only be applicable in networks where participants have "bandwidth to spare"

Generally, home users don't want to spend all of their bandwidth to participate in some network (though maybe we could look at something like µTP)

And datacenter users tend to pay for their bandwidth, so even though they have a lot at hand, they might not want to "waste it"

However, I can see a benefit to lowering D locally if the node doesn't has enough bandwidth, as this would slow down the network either way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ufarooqstatus did you add this? This should be reflected in the blog post.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose that we have three equal-length messages $x1, x2, x3$. Assuming an XOR coding function,
we know two trivial properties: $x1 \oplus x2 \oplus x2 = x1$ and $\vert x1 \vert = \vert x1 \oplus x2 \oplus x2 \vert$.

This implies that instead of sending messages individually, we can encode and transmit composite message(s) to the network.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also implies that the messages are coming from the same node, which is only one specific use case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a protocol can be devised (maybe by adding peer-IDs and message hash/ID-s) to indicate combined messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this improvment worth the complexity it adds? Maybe just add this question to mark this is just an option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can parallelize message transmission by partitioning large messages into smaller chunks, letting intermediate peers relay chunks as soon as they receive them.


## GOAL5: Scalability
Copy link

@Menduist Menduist Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you want to scale here?
Number of participants? I'd argue that GossipSub already scales that with O(log(n)), so it's already quite scalable

Sure, we could aim for O(log log n), but with the current scalability properties, going from 10k nodes to 100 million node would ~ double the latency, which seems already quite reasonable
(100 million being apparently the number of BitTorrent users, seems like a high target already)

Scaling message sizes? That's interesting, and that's already described in goal 4
Scaling message count? That's interesting, and probably the next step of GS research, but that's already described in goal 3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the average message arrival rate is L bytes/sec, any node joining the network must have approximately L x D bandwidth available. As far as this holds, we can keep on adding new nodes. However, this increases network-wide message dissemination latency (at least log_D (N) hops needed for network-wide message dissemination). Now, per hop latency can vary between few milliseconds to a few hundred miliseconds. connecting nearby nodes means lowering average per hop latency. And there is usually an upper bound on network-wide message dissemination time in many applications.

Copy link
Contributor

@kaiserd kaiserd Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this to the blog post.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaiserd kaiserd merged commit 8a95b0c into develop Nov 6, 2023
@kaiserd kaiserd deleted the rlogGossipSembr branch November 6, 2023 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants