TWN Connectivity #74

Open
SionoiS opened this issue Dec 8, 2023 · 1 comment

SionoiS commented Dec 8, 2023

Intro

In what context do we need to find nodes on specific shards and with a specific feature set?

Let's imagine a future Waku, used by many apps, each with many users. In this context, light nodes still don't contribute and can be seen as clients only. The servers would be Waku nodes, some supporting all protocols in the family, others dedicated to Filter or Store only. This modularity makes it difficult to predict the architecture of apps built on top of Waku. I will detail 2 possible scenarios, one with general service providers, the other without. Keep in mind that a mix of both is the most probable outcome.

Apps sub-network

For this kind of app, every node would share the same exclusive peer list and also expect clients (and light nodes) to bootstrap from them. These apps would self-segregate from other apps for efficiency. In other words, the nodes serving the same app would always connect to each other for Filter, Light Push, Store and Sync, and would not take part in discovery, to reduce overhead. Some nodes must connect to outside nodes for Relay and discovery, but those would be special.

We should also consider the more niche kind of apps that aim for full decentralization: apps relying on edge computing, stewarded by decentralized and anonymous organisations. The form these apps would take is even more amorphous. Any combination of nodes and protocols in the family could work thanks to Waku's modularity.

In this context, nodes would not need to "find" other nodes; most interconnections would be predetermined by which community a node is part of.

A market of general service providers

Imagine AWS, Google or Infura but for Waku. A service provider would have a website to manage client payments and authentication. In this case, all the information needed to use the service would already be available. A more interesting case would be anonymous service providers. Smart contracts would replace payment systems, and ZK access tokens would gate the service, or maybe RLN could be used. The only missing piece would be a good way to search for them. As the network grows, it gets harder and harder to find a suitable service peer.

Paths to improvements

Some numbers:

  • 4 Protocols (Store, Filter, Sync, Light Push) = 24 permutations
  • (TWN) On any of the 8 shards = 24^8 permutations
  • Each shard can support ~10K nodes maximum
  • Can we assume that each service provider has a valid RLN membership?
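
To get a feel for the size of the search space, the sketch below counts feature sets under two readings of the numbers above. The figure of 24 matches 4! (orderings of the four protocols). If a node's feature set is instead treated as an unordered subset of protocols, there are 2^4 = 16 of them. Which model fits best is an open assumption, and the protocol labels are illustrative only.

```python
from itertools import combinations
from math import factorial

PROTOCOLS = ("store", "filter", "sync", "lightpush")
SHARDS = 8

# Reading 1: orderings of the four protocols (matches the 24 quoted above).
orderings = factorial(len(PROTOCOLS))

# Reading 2 (assumption): a feature set is an unordered subset of protocols,
# including the empty set (a node offering none of the four services).
subsets = [
    frozenset(combo)
    for size in range(len(PROTOCOLS) + 1)
    for combo in combinations(PROTOCOLS, size)
]

# If each shard can independently carry any feature set, the counts compound.
per_network_orderings = orderings ** SHARDS
per_network_subsets = len(subsets) ** SHARDS

print(orderings, per_network_orderings)
print(len(subsets), per_network_subsets)
```

Either way, the space of exact (feature set, shard) configurations is far larger than the ~10K nodes a shard can hold, so discovery has to match on individual features rather than on whole configurations.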

In addition to the information below, see this Vac blog post about the limits of our current discovery mechanism and what could be done.

Misc. notes

  • Our Filter and Light Push protocols require using multiple service nodes for redundancy. Knowing this, connecting nodes that support these protocols together can improve discovery speed.
  • Overlay networks based on virtual coordinates inherently abstract away real-world distances and can return service nodes too far away to be useful.
  • The number of potential attacks, both known and unknown, increases in tandem with the complexity of the system.

Discv5 Service Discovery

DISCv5: robust service discovery details how the advertisement system works and provides various analyses. The specification is actionable but not implemented anywhere.

Advertisers place ads randomly along the way towards the topic via the use of a topic table. This results in ad density increasing closer to the topic. For searchers, the chance of finding an ad by walking towards the topic increases as the number of peers placing ads increases.

Nodes don't accept all ads; a ticket system is used to prevent attacks and maintain fairness between different topics, as some will be more popular than others.

What should the ad be? I see two possibilities here. The first would be to advertise both shard and protocol together. This method increases the number of unique ads to advertise and reduces the number of "tickets" per ad. This could lead to weakness against various attacks because of the lower number of "tickets", but it reduces the number of queries required.

Another way would be to advertise protocols and shards separately. This would increase the number of "tickets" per ad and reduce the number of unique ads (1 per protocol + 1 per shard), but it would require 2 queries instead of one, plus cross-referencing the results to find matching peers.
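
The trade-off between the two options can be made concrete with a small sketch. It assumes 4 protocols and 8 shards as in the numbers above. The topic strings and the `find_peers` helper are hypothetical and not part of any discv5 specification or implementation.

```python
PROTOCOLS = ["store", "filter", "sync", "lightpush"]
SHARDS = list(range(8))

# Option 1: one ad topic per (protocol, shard) pair.
combined_topics = [f"{p}/{s}" for p in PROTOCOLS for s in SHARDS]

# Option 2: one ad topic per protocol plus one ad topic per shard.
separate_topics = [f"protocol/{p}" for p in PROTOCOLS] + [f"shard/{s}" for s in SHARDS]


def find_peers(ads_by_topic, protocol, shard):
    """Option 2 lookup: run two queries, then cross-reference the results."""
    by_protocol = ads_by_topic.get(f"protocol/{protocol}", set())
    by_shard = ads_by_topic.get(f"shard/{shard}", set())
    return by_protocol & by_shard


# Hypothetical ads a searcher has already collected.
ads = {
    "protocol/store": {"peerA", "peerB"},
    "protocol/filter": {"peerC"},
    "shard/3": {"peerB", "peerC"},
}

print(len(combined_topics), len(separate_topics))
print(find_peers(ads, "store", 3))
```

With separate ads there are 12 topics instead of 32, at the cost of a second query and a set intersection on the searcher's side.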

This system would be useful if we want to track more features in the future, e.g. content topics, new protocols, etc.

Sub-DHTs

IPFS Composable DHT would allow apps to share a base DHT, and to specify and discover other peers based on the features they support. The delivery date is unknown but work is ongoing; it is not implemented anywhere yet. We could also implement this concept without waiting for IPFS: it consists mostly of biasing our peer selection in favor of peers with similar features in our routing table. This solution has not been studied for possible attacks, but since it consists of random walks, we can expect it to be resilient.
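
A minimal sketch of what "biasing our peer selection" could look like is given below. It is not taken from the IPFS Composable DHT work. The Jaccard similarity measure, the floor weight and the function names are all assumptions chosen for illustration.

```python
import random


def feature_similarity(mine, theirs):
    """Jaccard similarity between two feature sets (0.0 to 1.0)."""
    if not mine and not theirs:
        return 1.0
    return len(mine & theirs) / len(mine | theirs)


def pick_peers(my_features, candidates, k, rng=random):
    """Pick up to k distinct peers, weighting each by feature similarity.

    `candidates` maps a peer id to that peer's feature set. A small floor
    weight (assumed value) keeps dissimilar peers reachable, so the node
    stays connected to the shared base DHT.
    """
    pool = dict(candidates)
    chosen = []
    while pool and len(chosen) < k:
        ids = list(pool)
        weights = [0.05 + feature_similarity(my_features, pool[i]) for i in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen


me = {"store", "shard:1"}
known = {
    "peerA": {"store", "shard:1"},
    "peerB": {"filter", "shard:5"},
    "peerC": {"store", "shard:2"},
}
print(pick_peers(me, known, k=2))
```

Because selection is random rather than deterministic, the output varies between runs; peers with overlapping features are simply chosen more often.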

Race 2 queries on sub-DHTs, one for the (Waku) protocol, the other for the shard. Finding the correct sub-DHT might be fast depending on the peers already known, but then finding a peer that fulfills the second parameter would be a random walk. Random walks are more resilient to attacks than storing values with close peers (in hash space), and in this case the concentration of suitable peers is much higher (25% and 12.5%), which speeds up discovery. As soon as one peer matches the query, that peer has a high probability of already being connected to other peers with a similar feature set. We can expect finding one or many peers sharing a feature set to be equally fast.
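
As a rough estimate of what those concentrations mean for a random walk, the sketch below treats each probed peer as an independent draw that matches with the given probability. That independence is an assumption (a real walk only approximates it), and the function names are illustrative.

```python
import math


def expected_probes(match_fraction):
    """Mean number of peers examined until the first match.

    Models the walk as a geometric distribution with success
    probability `match_fraction`.
    """
    if not 0 < match_fraction <= 1:
        raise ValueError("match_fraction must be in (0, 1]")
    return 1 / match_fraction


def probes_for_confidence(match_fraction, confidence):
    """Smallest n such that P(at least one match in n probes) >= confidence."""
    if not 0 < match_fraction <= 1:
        raise ValueError("match_fraction must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    if match_fraction == 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - match_fraction))


for fraction in (0.25, 0.125):
    print(fraction, expected_probes(fraction), probes_for_confidence(fraction, 0.95))
```

Under these assumptions a walk needs on average 4 probes at 25% and 8 probes at 12.5%, and 11 or 23 probes respectively to succeed with 95% probability.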

If service providers are required to register an RLN membership, we may be able to limit Sybil attacks in our hypothetical DHT.

Meridian

Meridian describes itself as "a lightweight framework for performing network positioning". It is an overlay network structured around latency, in contrast to Kademlia, which is based on XOR distance between peers. No DHT is built on top of this overlay network; its sole purpose is to answer queries about a node's position in "internet" space. Why is this useful? By itself, it cannot be used to find specific peers, but it can be a solid foundation. By combining gossip-based discovery, one routing table per feature and service clustering, finding specific service nodes can be done efficiently. Because it is so lightweight, bandwidth cost and state can be increased without becoming prohibitively expensive for node operators. Although it does not solve the problem in a generally applicable way, it might be good enough for us.

Meridian is designed to find the closest peer possible which could reduce the latency of all our protocols.
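
The "one routing table per feature" idea, combined with latency-based selection, could be sketched as follows. This is not Meridian itself (there are no latency rings or positioning queries here), only an illustration of per-feature tables ordered by measured latency. The class and method names are hypothetical.

```python
from collections import defaultdict


class FeatureTables:
    """One routing table per feature, queried by measured latency."""

    def __init__(self):
        # feature -> {peer_id: latency_ms}
        self._tables = defaultdict(dict)

    def observe(self, peer_id, features, latency_ms):
        """Record (or refresh) a peer learned through gossip."""
        for feature in features:
            self._tables[feature][peer_id] = latency_ms

    def closest(self, required_features, limit=1):
        """Return up to `limit` peers offering every required feature, nearest first."""
        tables = [self._tables.get(f, {}) for f in required_features]
        if not tables:
            return []
        common = set(tables[0]).intersection(*tables[1:])
        return sorted(common, key=lambda peer: tables[0][peer])[:limit]


tables = FeatureTables()
tables.observe("peerA", {"store", "shard:1"}, latency_ms=80)
tables.observe("peerB", {"store", "shard:1"}, latency_ms=20)
tables.observe("peerC", {"store"}, latency_ms=5)
print(tables.closest(["store", "shard:1"]))
```

A query for several features intersects the relevant tables first and only then sorts by latency, so a very close peer that lacks one of the features is never returned.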

Provider DB

An alternative could be to maintain a DB of all providers (a prolly tree based index maybe?) so that every node can keep its own curated "provider list" but sync with others for updates. The process would be to just ask peers randomly until a suitable service provider is found. Since you cannot control what peers store in their provider DB, it's hard to estimate the performance of a query. On the other hand, the system is harder to attack since there's no structure like in a DHT. There is a risk of centralization, but with easy replication and sync it is minimal. There is also a question about incentives: why would nodes store provider information?
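
The "ask peers randomly" lookup could look roughly like the sketch below. The in-memory dictionaries stand in for network requests, and the data layout, the `max_queries` cap and the function name are assumptions made for illustration.

```python
import random


def find_provider(peers, wanted, max_queries=20, rng=random):
    """Ask randomly chosen peers for a provider offering all `wanted` features.

    `peers` maps a peer id to that peer's local provider list, itself a
    mapping of provider id -> feature set. Returns (provider_id, queries_used),
    or (None, queries_used) if no suitable provider was found.
    """
    wanted = set(wanted)
    order = list(peers)
    rng.shuffle(order)
    queries = 0
    for peer_id in order[:max_queries]:
        queries += 1
        for provider_id, features in peers[peer_id].items():
            if wanted <= set(features):
                return provider_id, queries
    return None, queries


network = {
    "peer1": {},
    "peer2": {"providerX": {"store", "shard:2"}},
    "peer3": {"providerY": {"filter", "shard:2"}},
}
print(find_provider(network, {"store", "shard:2"}))
```

The number of queries depends entirely on which peers happen to hold a matching entry, which is the unpredictability in query performance described above.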

SionoiS self-assigned this Dec 8, 2023

jm-clius commented Jan 3, 2024

Apps would self segregate from other apps for efficiency

True, at least in terms of the service nodes they use, and we specifically want to cater for this use case by making it easy for third-party service providers to be contracted by apps for their exclusive service provisioning. However, I think the advantage of decentralized services will be such that many apps will use them, provided:

  1. it's easy
  2. it works reliably
  3. it's cheap to use

(1) we can achieve with proper service discovery (discv5 topics?) and good defaults (filter being provided by default is a good start). (2) is a factor of proper SDKs and best practice documentation (e.g. subscribing for redundant services). (3) we are working on, but presumably the market will decide what is reasonable here.
