feat: SDK for redundant usage of filter/lightpush #1463

Closed
3 of 4 tasks
fryorcraken opened this issue Aug 8, 2023 · 14 comments

@fryorcraken
Collaborator

fryorcraken commented Aug 8, 2023

Planned start date:
Due date:

Summary

Implement a scoring or other mechanism to enable js-waku nodes to:

  1. Rely on random internet peers with minimal degradation of the experience
  2. Subsequently, save peers in local storage and use them upon start-up

Implementing (2) without (1) would mean that, upon start-up, a node would connect to previously found peers instead of bootstrap (Waku fleet) peers. Such peers may not be reliable, which could fully degrade the experience.
A js-waku node needs to determine whether it can avoid using bootstrap peers.

Also note:

  1. Bootstrap peers should still be used for the store service until we have a distributed service
  2. Peers passed as a static list should be considered bootstrap peers

Acceptance Criteria

  • A js-waku node can use services (filter, light push) from several remote nodes at the same time: feat: lightpush & filter send requests to multiple peers #1779
  • Some (scoring) mechanism to enable a local peer to determine whether a remote peer is reliable enough to be used for filter and light push services
  • Save peers in local storage and use them upon next restart for filter and light push, if they are deemed reliable, to increase decentralization and reduce load of bootstrap fleet.

Notes

To ensure API consumers do not receive duplicate messages when several nodes are used for filter, caching of message identifiers (MUID) will be necessary.
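
A minimal sketch of such a de-duplication cache, assuming each message exposes a stable identifier string. `SeenMessageCache`, `deliverOnce` and the size bound are hypothetical names for illustration, not part of the js-waku API:

```typescript
// Hypothetical helper for illustration; not part of the js-waku API.
class SeenMessageCache {
  private readonly seen = new Set<string>();
  private readonly maxSize: number;

  constructor(maxSize: number = 1000) {
    this.maxSize = maxSize;
  }

  /** Returns true the first time an ID is seen and false for duplicates. */
  markIfNew(messageId: string): boolean {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    if (this.seen.size > this.maxSize) {
      // A Set iterates in insertion order, so the first value is the oldest.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}

/** Wraps a callback so it fires once per message ID, whichever peer delivers first. */
function deliverOnce<T>(
  cache: SeenMessageCache,
  getId: (msg: T) => string,
  onMessage: (msg: T) => void
): (msg: T) => void {
  return (msg) => {
    if (cache.markIfNew(getId(msg))) onMessage(msg);
  };
}
```

The same wrapped callback would be registered on every filter subscription, so the consumer sees each message once. The cache is bounded so that it cannot grow without limit; the bound of 1000 is an arbitrary placeholder.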

Tasks

RAID (Risks, Assumptions, Issues and Dependencies)

  • Depends on @waku-org/research to help/deliver the scoring/other logic.
@fryorcraken fryorcraken added track:restricted-run Restricted run track (Secure Messaging/Waku Product), e.g. filter, WebRTC milestone Tracks a subteam milestone E:2023-peer-mgmt labels Aug 8, 2023
@fryorcraken
Collaborator Author

Some ideas for the logic: #914 (comment)

@fryorcraken fryorcraken changed the title [Milestone] Peer Management: Scoring and Persistence [Milestone] Peer Management: Scoring, Redundancy and Persistence Aug 8, 2023
@weboko
Collaborator

weboko commented Aug 10, 2023

@danisharora099 to look for a way to gauge how reliable a peer is (scoring) using the existing nwaku API (possibly libp2p's protocol)

@fryorcraken
Collaborator Author

fryorcraken commented Aug 15, 2023

@danisharora099 Shall we add a latency check as part of this milestone, where we select the peers with the lowest latency?
Maybe we could even have logic that pings every new peer found via PX and, if a faster peer is found, starts using it (in addition to other peers).

Maybe latency can be part of some scoring mechanism? Not sure.

@jm-clius

Great initiative to look at some of these questions, especially as it relates to filter usage!
Filter relies in many ways on the same building blocks as relay for its reliability, but in a modular, "pick your own tradeoffs" way:

  • redundancy (for relay in full message connections, for filter in subscriptions)
  • randomness (selecting random peers for connection/subscription, preferably with some peer cycling)
  • periodically checking that you received all messages against a cache (this doesn't really exist yet for filter, but you could imagine using occasional store queries to achieve something similar)

As such it will be helpful to provide a configurable "reliability" SDK on top of filter for projects without the scope to build these features from the ground up with filter.

  • A js-waku node can use services (filter, light push) from several remote nodes at the same time.

Indeed. For now I'd suggest just selecting random nodes in the network as filter/lightpush peers, with some redundancy factor built in.
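
As a rough illustration of random selection with a redundancy factor, here is a sketch that draws `redundancy` distinct peers from a candidate list using a partial Fisher-Yates shuffle. The function name and the injectable `random` parameter are assumptions made for this example, not js-waku API:

```typescript
// Hypothetical helper; peers can be any value (peer ID strings, objects, ...).
function pickRandomPeers<T>(
  peers: readonly T[],
  redundancy: number,
  random: () => number = Math.random
): T[] {
  const pool = peers.slice();
  const count = Math.max(0, Math.min(Math.floor(redundancy), pool.length));
  // Partial Fisher-Yates shuffle: only the first `count` slots need to be randomized.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i] as T;
    pool[i] = pool[j] as T;
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}
```

Calling `pickRandomPeers(discoveredPeers, 3)` would return up to three distinct peers to use for filter or light push; if fewer candidates are known, all of them are returned.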

  • Some (scoring) mechanism to enable a local peer to determine whether a remote peer is reliable enough to be used for filter and light push services

I wouldn't necessarily bring scoring into this. Relay/gossipsub, for example, simply choose to eventually disconnect from peers that provide less value than others (peer scoring may be too long-lived and complex if there's simply a temporary connectivity issue). You could for example have n filter subscriptions and periodically review if some peers have "missed" more messages than others and cycle those.
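
One way to picture that periodic review, as a sketch only: record which message IDs each subscribed peer delivered during a review window, treat the union of all IDs as the expected set, and cycle any peer that missed more than a threshold. `countMissed`, `peersToCycle` and `maxMissed` are hypothetical names:

```typescript
// Hypothetical review helpers; keys are peer IDs, values are the message IDs
// each peer delivered during the current review window.
function countMissed(
  receivedByPeer: ReadonlyMap<string, ReadonlySet<string>>
): Map<string, number> {
  const allIds = new Set<string>();
  receivedByPeer.forEach((ids) => {
    ids.forEach((id) => {
      allIds.add(id);
    });
  });
  const missed = new Map<string, number>();
  receivedByPeer.forEach((ids, peerId) => {
    missed.set(peerId, allIds.size - ids.size);
  });
  return missed;
}

/** Peers that missed more than `maxMissed` messages and should be replaced. */
function peersToCycle(
  receivedByPeer: ReadonlyMap<string, ReadonlySet<string>>,
  maxMissed: number
): string[] {
  const result: string[] = [];
  countMissed(receivedByPeer).forEach((missed, peerId) => {
    if (missed > maxMissed) result.push(peerId);
  });
  return result;
}
```

Because the window is reset on every review, a peer with a temporary connectivity issue is only penalised for that window, which avoids the long-lived state that scoring would introduce.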

  • Save peers in local storage and use them upon next restart for filter and light push, if they are deemed reliable, to increase decentralization and reduce load of bootstrap fleet.

I wouldn't imagine that the DNS lookups, followed by initial peer-exchange should take very long. It's probably a good idea to cache some peers, but I would try to flush out that cache as soon as possible after a startup and replace each of these subscriptions with a new one to a random node. This is to prevent a node from always using the same peers and so being vulnerable to bias.

Note that @siphiuel has been doing similar work on filter for status-go, so definitely worth getting his input here. :)

@fryorcraken fryorcraken changed the title [Milestone] Peer Management: Scoring, Redundancy and Persistence [Epic] Peer Management: Scoring, Redundancy and Persistence Aug 24, 2023
@fryorcraken fryorcraken added epic Tracks a yearly team epic (only for waku-org/pm repo) and removed milestone Tracks a subteam milestone labels Aug 24, 2023
@fryorcraken fryorcraken changed the title [Epic] Peer Management: Scoring, Redundancy and Persistence feat: SDK for using filter/lightpush Sep 8, 2023
@fryorcraken fryorcraken added E:2.1: Production testing of existing protocols See https://github.com/waku-org/pm/issues/49 for details and removed E:2023-peer-mgmt labels Sep 8, 2023
@fryorcraken fryorcraken changed the title feat: SDK for using filter/lightpush feat: SDK for redundant usage of filter/lightpush Sep 21, 2023
@danisharora099 danisharora099 self-assigned this Oct 10, 2023
@danisharora099 danisharora099 removed their assignment Oct 11, 2023
@danisharora099 danisharora099 self-assigned this Oct 13, 2023
@danisharora099
Collaborator

danisharora099 commented Oct 17, 2023

@jm-clius agree with your overall idea, thanks for the comment!

re:

randomness (selecting random peers for connection/subscription, preferably with some peer cycling)

we decided to use the peer with the lowest ping for this, with the aim of getting the fastest responses to protocol requests, so I'm not sure how useful randomness is in the context of js-waku.
Perhaps the strategy for js-waku can be to increase the score of the node with the lowest ping. cc @fryorcraken

@fryorcraken
Collaborator Author

I'd suggest following @jm-clius's recommendation here and not introducing scoring.
I think prioritizing nodes with lowest latency first makes sense.
Then, if nodes are unreliable, we can disconnect and use another node.
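
A small sketch of that approach, assuming latencies have already been measured (for example with a ping) and that unreliable peers are tracked in a set. The type and function names are hypothetical:

```typescript
// Hypothetical shape: a peer ID with its measured round-trip time.
interface PeerLatency {
  peerId: string;
  latencyMs: number;
}

/** Peer IDs ordered from lowest to highest latency. */
function orderByLatency(peers: readonly PeerLatency[]): string[] {
  return peers
    .slice()
    .sort((a, b) => a.latencyMs - b.latencyMs)
    .map((p) => p.peerId);
}

/** The fastest peer that has not been marked unreliable, if any. */
function pickPeer(
  ordered: readonly string[],
  unreliable: ReadonlySet<string>
): string | undefined {
  for (const peerId of ordered) {
    if (!unreliable.has(peerId)) return peerId;
  }
  return undefined;
}
```

When a peer proves unreliable, adding it to the `unreliable` set and calling `pickPeer` again falls through to the next-fastest peer, with no score to maintain.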

@danisharora099
Collaborator

danisharora099 commented Oct 20, 2023

attributes that could contribute to defining "reliability":

  • remote peer should have relay enabled
  • latency
  • number of times a remote peer has dropped a connection with us
  • peers discovered through peer-exchange
    • this also includes deprioritizing local storage peers in favour of peer-exchange peers

rough implementation (needs improvement):
whenever a protocol request is initiated:

  1. get all the peers connected
  2. check that they support relay (prioritize these peers, for the remaining "seats" use other peers)
  3. sort them by their latencies & reliability gauged by their # of disconnections
  4. use the top N peers to send the protocol request
  5. observe these N peers:
    • if any of them proves to be "unreliable", i.e. is unable to process (?) our request or sends a faulty response, deprioritize it and cycle in a new peer
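
Steps 1 to 4 above could be sketched roughly as follows. The `PeerInfo` fields are hypothetical, and the tie-breaking order (relay support, then fewest disconnections, then lowest latency) is an assumption, since the comment does not say how latency and disconnections should be weighed against each other:

```typescript
// Hypothetical per-peer record; not a js-waku or libp2p type.
interface PeerInfo {
  peerId: string;
  supportsRelay: boolean;
  latencyMs: number;
  disconnections: number;
}

/** Returns up to `n` peers, best first. */
function selectTopPeers(peers: readonly PeerInfo[], n: number): PeerInfo[] {
  return peers
    .slice()
    .sort((a, b) => {
      // Relay-capable peers come first; others fill the remaining "seats".
      if (a.supportsRelay !== b.supportsRelay) return a.supportsRelay ? -1 : 1;
      // Assumed ordering: fewer dropped connections beats lower latency.
      if (a.disconnections !== b.disconnections) {
        return a.disconnections - b.disconnections;
      }
      return a.latencyMs - b.latencyMs;
    })
    .slice(0, Math.max(0, n));
}
```

Step 5 would then amount to dropping an unreliable peer from the candidate list (or raising its `disconnections` count) and calling `selectTopPeers` again.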

cc @waku-org/research @fryorcraken

@fryorcraken
Collaborator Author

attributes that could contribute to defining "reliability":

  • remote peer should have relay enabled
  • latency
  • number of times a remote peer has dropped a connection with us
  • peers discovered through peer-exchange
    • this also includes deprioritizing local storage peers in favour of peer-exchange peers

IMO the most important criteria are missing from the list:

  • pushes the same or more messages than other peers on a filter subscription
  • does not return an error on filter requests such as ping
  • does not return an error on light push requests
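
These three criteria could be checked with something like the sketch below. The `PeerObservation` fields are hypothetical counters that the application would have to maintain itself, and the rule is deliberately strict (any error, or any shortfall against the best peer, flags the peer):

```typescript
// Hypothetical counters collected per peer over a review window.
interface PeerObservation {
  peerId: string;
  filterMessagesReceived: number;
  filterPingErrors: number;
  lightPushErrors: number;
}

/** IDs of peers that fail at least one of the three criteria. */
function unreliablePeers(observations: readonly PeerObservation[]): string[] {
  const best = observations.reduce(
    (max, o) => Math.max(max, o.filterMessagesReceived),
    0
  );
  return observations
    .filter(
      (o) =>
        o.filterMessagesReceived < best || // pushed fewer messages than another peer
        o.filterPingErrors > 0 || // returned an error on a filter request
        o.lightPushErrors > 0 // returned an error on a light push request
    )
    .map((o) => o.peerId);
}
```

In practice some tolerance would probably be needed, since a single dropped message or transient error should not necessarily evict a peer.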

@fryorcraken fryorcraken removed track:restricted-run Restricted run track (Secure Messaging/Waku Product), e.g. filter, WebRTC epic Tracks a yearly team epic (only for waku-org/pm repo) labels Oct 25, 2023
@fryorcraken fryorcraken mentioned this issue Oct 27, 2023
3 tasks
@danisharora099
Collaborator

danisharora099 commented Jan 10, 2024

action plan:

  1. if cache does not exist on startup:
    • DNS lookup, Peer Exchange & connect to fastest peers
    • cache peers in local storage
    • periodically "review" peers for reliability & if a node is "unreliable", swap it out for a random node
      • update cache if necessary
  2. if cache exists on startup:
    • connect to the cached peers
    • once connections are established, flush out the cache & switch to the new "fastest peers"
    • periodically "review" peers for reliability & if a node is "unreliable", swap it out for a random node
      • update cache if necessary
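
The cache handling in this plan might look roughly like the sketch below. `KeyValueStore` mirrors a small subset of the browser `localStorage` interface so the example also runs outside a browser; the cache key, `MemoryStore`, `planStartup` and `replaceCache` are hypothetical names, and the discovery and connection steps are left out:

```typescript
// Subset of the Web Storage interface; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for localStorage, for tests and non-browser runs.
class MemoryStore implements KeyValueStore {
  private readonly items = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.items.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

const PEER_CACHE_KEY = "waku:cached-peers"; // hypothetical key name

function loadCachedPeers(store: KeyValueStore): string[] {
  const raw = store.getItem(PEER_CACHE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    return parsed.filter((p): p is string => typeof p === "string");
  } catch {
    return []; // a corrupt cache is treated as no cache
  }
}

/** Decides which branch of the plan applies on startup. */
function planStartup(
  store: KeyValueStore
): { source: "cache" | "discovery"; peers: string[] } {
  const cached = loadCachedPeers(store);
  return cached.length > 0
    ? { source: "cache", peers: cached }
    : { source: "discovery", peers: [] };
}

/** Flushes the old cache by overwriting it with the current fastest peers. */
function replaceCache(store: KeyValueStore, fastestPeers: readonly string[]): void {
  store.setItem(PEER_CACHE_KEY, JSON.stringify(fastestPeers));
}
```

In a browser, `planStartup(window.localStorage)` would tell the node whether to dial cached peers first or go straight to DNS discovery and Peer Exchange, and `replaceCache` would be called once the new fastest peers are connected and again after each review that swaps a peer.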

PRs:

The scope of unreliability can be tackled as a followup PR

cc @jm-clius @waku-org/js-waku-developers please let me know if you have thoughts

@fryorcraken
Collaborator Author

  2. if cache exists on startup:
    • connect to the cached peers
    • once connections are established, flush out the cache & switch to the new "fastest peers"

What peers? Do you mean you do DNS discovery and peer exchange?

    • periodically "review" peers for reliability & if a node is "unreliable", swap it out for a random node
      • update cache if necessary

@danisharora099
Collaborator

What peers? do you mean you do DNS discovery and peer exchange?

By "cache exists on startup" I mean the nodes that we were previously able to connect to healthily and that are stored in our local storage. We connect to them, run PX on them, find new peers, and eventually remove the cached peers and add the newly found ones, so that we don't keep reusing the same peers.

@danisharora099
Collaborator

danisharora099 commented Mar 6, 2024

remainder:

@weboko
Collaborator

weboko commented May 14, 2024

As the last work item in this issue is linked to the Reliability milestone, I am closing this and decoupling peer scoring into #2017

@danisharora099
Collaborator

Another work item from this issue: #2018
