[Milestone] Composing Waku Protocols to Improve Reliability #114

Closed
7 tasks
chair28980 opened this issue Jan 31, 2024 · 21 comments

@chair28980
Contributor

chair28980 commented Jan 31, 2024

Milestone: https://github.com/waku-org/pm/milestone/7

NOTE: the work originally defined in this issue has been split between the following deliverables:

  1. [Deliverable] Reliability Protocol for Relay #184
  2. [Deliverable] Reliability Protocol for Resource-Restricted Clients #186

Epics

NOTE: Epics updated to fit within the above 2 Deliverables.

Summary

Deliverables:

Provide a recommendation in the form of a library that composes the Waku protocols to minimize message loss at the protocol level. This applies to both relay and non-relay nodes. The aim is to provide opinionated libraries that use Waku in a generalized manner. The library may not fit every use case and can always be bypassed or forked.

Finally, the expectation is not 100% reliability, simply increased reliability for common scenarios. Application-level solutions for stronger guarantees are covered as part of Minimum Viable Data Synchronization.

This is part of the general SDK strategy and expected to be implemented in each language:

  • JavaScript for Browser using js-waku
  • Rust/Golang/JavaScript for Node using nwaku bindings.

Note that these SDKs need to be reliable enough for any developer to build a PoC application and reach a fair level of reliability.
However, it is assumed that when a developer wants to take an application to MVP level or further, a deeper understanding of Waku is needed to fine-tune Waku usage for the application.

Functionalities/User Stories

First attempt to define a functional scope for this milestone:

A. relay node

  1. As a developer, I want a simple API to process incoming messages on one or some content topics.
  2. As a developer, I do not care about shards or pubsub topics; I am only using autosharding
  3. As a developer, I do not want to process the same message twice (e.g. relay amplification should be abstracted away)
  4. As a developer, I want a simple API to send messages with a set content topic
  5. As a developer, when sending messages I want fair reliability on whether the messages were propagated (e.g. using store sync)
  6. As a developer, I want some fair protocol level safeguard to ensure that missed messages are detected (e.g. using store sync)

B. non-relay node

  1. As a developer, I want a simple API to register hooks to process messages on one or several content topics
  2. As a developer, I do not care about shards or pubsub topics; I am only using autosharding
  3. As a developer, I want fair reliability when receiving messages (e.g. redundant filter nodes + filter ping + store sync)
  4. As a developer, I do not want the same incoming message to be processed several times (e.g. filter redundancy should be abstracted)
  5. As a developer, I want fair reliability when sending messages.
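The autosharding stories (A.2, B.2) imply the library derives the pubsub topic from the content topic, so the developer never handles shards. A minimal sketch of that mapping, assuming the four-segment content-topic format and a rule of hashing application + version with SHA-256 and reducing modulo the shard count. `parseContentTopic`, `pubsubTopicFor`, and the exact byte layout are assumptions to verify against the relay sharding spec; this is not the library's API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: components of /{application}/{version}/{topic-name}/{encoding}.
// The optional generation prefix is not handled in this sketch.
interface ParsedContentTopic {
  application: string;
  version: string;
  topicName: string;
  encoding: string;
}

function parseContentTopic(contentTopic: string): ParsedContentTopic {
  const parts = contentTopic.split("/");
  // The leading "/" yields an empty first element, so a valid topic has 5 parts.
  if (parts.length !== 5 || parts[0] !== "" || parts.slice(1).some((p) => p.length === 0)) {
    throw new Error(`invalid content topic: ${contentTopic}`);
  }
  const [, application, version, topicName, encoding] = parts;
  return { application, version, topicName, encoding };
}

// Assumed rule: sha256(application + version), last 8 bytes read as a
// big-endian uint64, modulo the number of shards in the cluster.
function pubsubTopicFor(contentTopic: string, clusterId: number, numShards: number): string {
  const { application, version } = parseContentTopic(contentTopic);
  const digest = createHash("sha256").update(application + version).digest();
  const shard = Number(digest.readBigUInt64BE(24) % BigInt(numShards));
  return `/waku/2/rs/${clusterId}/${shard}`;
}
```

Because only application and version feed the hash, all topics of one application version land on the same shard, which is what lets the developer ignore pubsub topics entirely.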

Implementation Notes

Relay client

  • Simple API to use Waku relay.
    • Minimize API
    • To receive messages, several APIs could be provided (callback-based, event-based, await/next-based), but this is not necessary at first
    • When sending messages, could query store (by message hash) on a regular basis to confirm messages were captured by a store node
    • When detecting a disconnection, or regularly, could use a store node to retrieve missed messages. Missed messages are then bubbled up to the API (no difference for the dev from messages received live on relay)
    • Hash-based seen cache to ensure no duplicate messages are bubbled up to the user
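The seen cache and store-based recovery fit together: messages arriving live over relay and messages fetched later from a store node pass through the same hash-based filter. A minimal sketch assuming a simplified message shape; `WakuMessageLike`, `messageHash`, `SeenCache`, and `deliverOnce` are hypothetical names, and the hash below is a stand-in, not the exact deterministic hash defined by the Waku message spec.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape for this sketch only.
interface WakuMessageLike {
  pubsubTopic: string;
  contentTopic: string;
  payload: Uint8Array;
  timestamp: bigint; // nanoseconds since epoch
}

// Simplified stand-in for a deterministic message hash (NOT the exact spec encoding).
function messageHash(msg: WakuMessageLike): string {
  const ts = Buffer.alloc(8);
  ts.writeBigUInt64BE(msg.timestamp);
  return createHash("sha256")
    .update(msg.pubsubTopic)
    .update(msg.payload)
    .update(msg.contentTopic)
    .update(ts)
    .digest("hex");
}

// Bounded seen cache; a Map iterates in insertion order, so the first key is the oldest.
class SeenCache {
  private readonly seen = new Map<string, true>();
  constructor(private readonly capacity: number) {}

  // Returns true the first time a hash is observed, false for a duplicate.
  markIfNew(hash: string): boolean {
    if (this.seen.has(hash)) return false;
    this.seen.set(hash, true);
    if (this.seen.size > this.capacity) {
      const oldest = this.seen.keys().next().value as string;
      this.seen.delete(oldest);
    }
    return true;
  }
}

// Wraps the user callback so each distinct message is delivered once, whether
// it arrived live over relay or was retrieved later from a store node.
function deliverOnce(cache: SeenCache, onMessage: (m: WakuMessageLike) => void): (m: WakuMessageLike) => void {
  return (m) => {
    if (cache.markIfNew(messageHash(m))) onMessage(m);
  };
}
```

The capacity bound matters: it should cover at least the window a store query can re-deliver, otherwise an evicted hash lets a recovered duplicate through.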

Non-relay client

Receiving messages:

  • simple API that hides the connection/subscription steps to filter nodes
  • automatically connect and subscribe to 2-3 filter nodes
  • seen cache to ensure duplicate messages are not bubbled up
  • start with an event-based or callback API; other API types can be added later. Silence Labs requested an await/next API
  • when subscribing, start running filter ping
  • do simple disconnection management:
    • if filter ping fails on node 1 but not on node 2, disconnect from node 1 and replace it
    • if filter ping fails on all nodes (disconnected), monitor ipfs ping for network liveness
    • once reconnected, proceed with store queries
  • Some simple QoS
    • if node 1 forwards messages but node 2 does not, replace node 2
    • Can also run a store message hash query every 5 minutes or so to confirm no messages were missed
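The disconnection-management rules above can be expressed as one maintenance pass run after each round of filter pings. The sketch assumes a hypothetical `FilterPeer` interface with a `ping()` method; it is not a js-waku type, and re-subscribing on the replacement peers is left out.

```typescript
// Hypothetical abstraction over a filter service node; not a js-waku type.
interface FilterPeer {
  id: string;
  ping(): Promise<boolean>; // true if the filter ping succeeded
}

type PoolOutcome =
  | { kind: "healthy"; active: FilterPeer[] }
  | { kind: "offline"; active: FilterPeer[] }; // every ping failed

// Pings every active peer and replaces the ones that failed with unused
// candidates, as long as at least one peer still answers. If none answer,
// the caller should assume it is offline, watch network liveness, and run a
// store query once it reconnects.
async function maintainFilterPool(active: FilterPeer[], candidates: FilterPeer[]): Promise<PoolOutcome> {
  const results = await Promise.all(
    active.map(async (peer) => ({ peer, ok: await peer.ping().catch(() => false) })),
  );
  const alive = results.filter((r) => r.ok).map((r) => r.peer);
  if (alive.length === 0) return { kind: "offline", active };

  const inUse = new Set(active.map((p) => p.id));
  const spares = candidates.filter((c) => !inUse.has(c.id));
  const missing = active.length - alive.length;
  // New subscriptions on the replacement peers would be created here (omitted).
  return { kind: "healthy", active: [...alive, ...spares.slice(0, missing)] };
}
```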

Sending messages:

  • Connect to 2-3 light push nodes when sending messages
  • error management:
    • if 1 node rejects messages but the 2 others do not, then replace that node
    • if all nodes reject messages, fall back on ipfs ping to measure liveness; once online, resend the messages
  • QoS: perform store hash msg query on a regular basis to confirm messages sent are captured by store.
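The sending rules (fan out to 2-3 light push nodes, replace a node that rejects while the others accept, treat a rejection by all nodes as a liveness problem) could look roughly like this. `LightPushPeer`, `sendRedundantly`, and `rotatePeers` are hypothetical names; a real implementation would wrap the light push protocol client.

```typescript
// Hypothetical light push peer; `push` resolves to true when the peer accepts the message.
interface LightPushPeer {
  id: string;
  push(payload: Uint8Array): Promise<boolean>;
}

interface SendReport {
  accepted: string[]; // ids of peers that accepted
  rejected: string[]; // ids of peers that rejected or errored
  allRejected: boolean; // caller should check liveness and resend later
}

// Sends the same message to every peer in parallel.
async function sendRedundantly(peers: LightPushPeer[], payload: Uint8Array): Promise<SendReport> {
  const settled = await Promise.allSettled(peers.map((p) => p.push(payload)));
  const accepted: string[] = [];
  const rejected: string[] = [];
  settled.forEach((res, i) => {
    const ok = res.status === "fulfilled" && res.value;
    (ok ? accepted : rejected).push(peers[i].id);
  });
  return { accepted, rejected, allRejected: accepted.length === 0 };
}

// Replaces peers that rejected while others accepted; keeps the set unchanged
// when every peer rejected.
function rotatePeers(peers: LightPushPeer[], report: SendReport, spares: LightPushPeer[]): LightPushPeer[] {
  if (report.allRejected) return peers;
  const bad = new Set(report.rejected);
  const kept = peers.filter((p) => !bad.has(p.id));
  const fresh = spares.filter((s) => !peers.some((p) => p.id === s.id)).slice(0, bad.size);
  return [...kept, ...fresh];
}
```

Leaving the peer set alone when every node rejects is a deliberate choice in this sketch: the local node is then the more likely culprit, so the caller checks liveness and resends instead of churning peers.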

js-waku vs NodeJS bindings.

The APIs produced for js-waku and the NodeJS bindings should be aligned as much as possible, even though js-waku is a non-relay node and the NodeJS bindings run a relay node.
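One way to keep the two APIs aligned is a single interface that both the non-relay (filter + light push + store) and relay implementations satisfy. The sketch below is an assumption, not an agreed API: `ReliableNode`, `DecodedMessage`, and the in-memory `LoopbackNode` are hypothetical names used only to show the call pattern.

```typescript
// Hypothetical shared API surface for js-waku (non-relay) and the NodeJS bindings (relay).
interface DecodedMessage {
  contentTopic: string;
  payload: Uint8Array;
}

interface ReliableNode {
  // Resolves to an unsubscribe function.
  subscribe(contentTopics: string[], onMessage: (m: DecodedMessage) => void): Promise<() => void>;
  send(contentTopic: string, payload: Uint8Array): Promise<void>;
}

// In-memory loopback used only to demonstrate the interface; a real
// implementation would sit on relay, or on filter + light push + store.
class LoopbackNode implements ReliableNode {
  private readonly handlers = new Map<number, { topics: Set<string>; onMessage: (m: DecodedMessage) => void }>();
  private nextId = 0;

  async subscribe(contentTopics: string[], onMessage: (m: DecodedMessage) => void): Promise<() => void> {
    const id = this.nextId++;
    this.handlers.set(id, { topics: new Set(contentTopics), onMessage });
    return () => {
      this.handlers.delete(id);
    };
  }

  async send(contentTopic: string, payload: Uint8Array): Promise<void> {
    // Deliver to every subscription that includes this content topic.
    this.handlers.forEach(({ topics, onMessage }) => {
      if (topics.has(contentTopic)) onMessage({ contentTopic, payload });
    });
  }
}
```

Application code written against `ReliableNode` would not change when switching between a browser (non-relay) and a Node.js (relay) backend, which is the point of aligning the APIs.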

Dependencies

To deliver the reliability part, this depends on the Store v3 protocol to get access to message hashes.
On the research side, a couple of weeks of work remain (as of 19 Feb), and iteration on nwaku is needed.
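With message hashes available from Store v3, the "confirm sent messages were captured by store" QoS check reduces to tracking hashes of sent messages and periodically asking a store node which ones it holds. A minimal sketch: `QueryStoreByHashes` stands in for whatever hash lookup the Store v3 client ends up providing and is hypothetical, as is `SendConfirmationTracker`.

```typescript
// Hypothetical: asks a store node which of the given message hashes it holds.
type QueryStoreByHashes = (hashes: string[]) => Promise<Set<string>>;

class SendConfirmationTracker {
  private readonly pending = new Map<string, number>(); // message hash -> send time (ms)

  constructor(private readonly queryStore: QueryStoreByHashes, private readonly resendAfterMs: number) {}

  recordSent(hash: string, nowMs: number): void {
    this.pending.set(hash, nowMs);
  }

  get pendingCount(): number {
    return this.pending.size;
  }

  // Drops hashes the store confirms; returns unconfirmed hashes older than
  // resendAfterMs so the caller can resend those messages.
  async check(nowMs: number): Promise<string[]> {
    if (this.pending.size === 0) return [];
    const found = await this.queryStore(Array.from(this.pending.keys()));
    const overdue: string[] = [];
    for (const [hash, sentAt] of Array.from(this.pending.entries())) {
      if (found.has(hash)) this.pending.delete(hash);
      else if (nowMs - sentAt > this.resendAfterMs) overdue.push(hash);
    }
    return overdue;
  }
}
```

Calling `check` on a timer (for example every few minutes, as suggested for the non-relay client) keeps the number of store queries low while still catching messages that never propagated.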

@chair28980 chair28980 added the Deliverable label Jan 31, 2024
@chair28980 chair28980 modified the milestone: Composing Waku Protocols to Improve Reliability Jan 31, 2024
@vpavlin
Member

vpavlin commented Feb 5, 2024

I think the user stories sound good and pretty comprehensive.

So far, my main issues have stemmed from a few things:

  1. Unreliable Filter - seemed like the filter nodes just stopped propagating messages to my app (even with ping on), so I had to resubscribe - but js-waku kept trying to resubscribe to the same node, which for some reason would not work. So having multiple filter nodes and js-waku (or the library) keeping track of this would save a lot of trouble

  2. Unreliable Lightpush - after a while I started getting "Remote peer fault" errors and the only thing I could do was retry. It would be great if the library could just pick a different Lightpush node when sending fails multiple times

  3. As my filter subscription was failing, I always missed a few messages, so I kept track of the last received message and tried to query store for anything from that timestamp till now.

I implemented all of these in https://github.com/vpavlin/waku-dispatcher/blob/main/src/dispatcher.ts, so it would be great if we can have them in an official library
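The workaround in item 3 (remember the last received message, then query store from that timestamp until now) boils down to computing a time window. A sketch with hypothetical names; the overlap and maximum look-back are parameters the caller picks, not values taken from js-waku.

```typescript
// Hypothetical helper for "query store from the last received message until now".
// All times are milliseconds since epoch.
interface TimeRange {
  startMs: number;
  endMs: number;
}

function recoveryRange(
  lastReceivedMs: number | undefined,
  nowMs: number,
  overlapMs: number,
  maxLookbackMs: number,
): TimeRange {
  // Never look back further than maxLookbackMs; store nodes typically retain limited history.
  const earliest = nowMs - maxLookbackMs;
  // Start a little before the last received message to tolerate clock skew;
  // messages fetched twice are expected to be dropped by a seen cache.
  const startMs = lastReceivedMs === undefined ? earliest : Math.max(earliest, lastReceivedMs - overlapMs);
  return { startMs, endMs: nowMs };
}
```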

@weboko

weboko commented Feb 5, 2024

At this point non-relay node is more relevant to js-waku.

Given this formulation, I think our team can do the following:

  • start by implementing some e2e tests to cover such user flows (essentially turning the requirements from text into tests);
  • similarly to feat: Store reliability js-waku#1685, we should do the same for LightPush and Filter (the description will be almost the same: have a couple of nodes per protocol to send/receive data, check responses, etc.);

If needed, we can refer to what @vpavlin provided (and what we chatted about a month ago): https://github.com/vpavlin/waku-dispatcher/blob/main/src/dispatcher.ts

@fryorcraken
Contributor

Current proposal is to limit the bindings to relay node: #121
So indeed, non-relay would apply to js-waku and relay to nwaku.

@fryorcraken
Contributor

@vpavlin I believe the problems you highlighted would be covered by the proposed user stories.

@fryorcraken
Copy link
Contributor

We may also want to add something about connection feedback. I saw some recent improvements in js-waku here: waku-org/js-waku#1666

@vpavlin
Member

vpavlin commented Feb 6, 2024

Yes, I think the user stories are good

@fryorcraken
Contributor

should mention RLN

@weboko

weboko commented Feb 8, 2024

Considering the previous comments and this comment, I would roughly define the js-waku work streams as follows:

  • make protocols more RFC-oriented + unit tests:
    • refine Filter;
    • refine LightPush;
    • refine Store;
  • write interop tests to cover non-relay client user stories;
  • protocol reliability + exposing additional configs to enable/disable behavior + unit tests;
  • protocol abstractions:
    • Filter subscribe should be as simple as passing a decoder + contentTopic;
    • ensure LightPush has .send API;
    • potential simplification of Store protocol

We can de-scope some of these streams in the interest of time.

@waku-org/js-waku-developers , @danisharora099 please, add anything I missed

@chaitanyaprem

Current proposal is to limit the bindings to relay node: #121 So indeed, non-relay would apply to js-waku and relay to nwaku.

non-relay would be required for nwaku as well, because status desktop (light mode) and status mobile use lightpush and Filter.

@chaitanyaprem

  1. Unreliable Filter - seemed like the filter nodes just stopped propagating messages to my app (even with ping on), so I had to resubscribe - but js-waku kept trying to resubscribe to the same node, which for some reason would not work. So having multiple filter nodes and js-waku (or the library) keeping track of this would save a lot of trouble

Interesting, we have not seen this so far in Status testing.
If a node is subscribed to filter, then messages would always be pushed to it.
Rather, the issues we have seen were due to other causes of message loss.
Wondering if this could be due to browser env where connections are unstable?

@fryorcraken
Contributor

non-relay would be required for nwaku as well, because status desktop (light mode) and status mobile use lightpush and Filter.

I don't expect Status to use the output of this milestone. Status already composes the protocols to enable reliability.

Wondering if this could be due to browser env where connections are unstable?
Yes, it seems to be related to WebSockets.

@fryorcraken
Contributor

A potential follow-up would be to add a health indicator similar to waku-org/go-waku#1021

However, keeping this out of scope to expedite milestone delivery.

@fryorcraken
Contributor

LGTM. Sign-off to happen at Waku PM call

@fryorcraken
Contributor

Questions:

  • Should we only support autosharding?
  • Do we need to better define the reliability to be provided here?
    • Composing the protocols enables better reliability heuristics, and the protocols are meant to be composed, but doing so is technically advanced.
  • How to store messages across different applications and platforms?
  • Store sync protocol, this can help with reliability. Yes, we would want to use the sync protocol for any form of cache but as part of this milestone.

@fryorcraken
Contributor

Signed-off EU-AS 19 Feb

@fryorcraken
Contributor

I suggest only supporting autosharding as part of this milestone.

@chair28980

This comment was marked as resolved.

@fryorcraken
Contributor

Add to scope: waku-org/js-waku#1834 - feat: create helper for running docker locally

Did you mean to update #137 ?

@fryorcraken
Contributor

We decided not to go with a health status to minimize milestone scope: waku-org/go-waku#1021

However, this is something we can do down the road in a follow-up milestone.

@weboko

weboko commented Feb 26, 2024

We decided not to go with a health status to minimize milestone scope: waku-org/go-waku#1021

However, this is something we can do down the road in a follow-up milestone.

Tracking this under the following issue for js-waku: waku-org/js-waku#1865

@chair28980
Contributor Author

Work split between two new deliverables based on updated 2024 Roadmap. Closing in favor of #184 and #186.

@chair28980 chair28980 closed this as not planned Jun 4, 2024
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

6 participants