
feat(lightpush): peer management for protocols #2003

Merged -- 20 commits merged into master from feat/peer-reconnection on Jun 19, 2024

Conversation

@danisharora099 (Collaborator) commented May 8, 2024

Problem

For Filter and LightPush, getPeers() is called every time a new LightPush request or a new subscription is created, which gives us a new set of peers to use for each request/subscription.

Further, we wish to discard a peer if it fails and renew it with another peer. This is not trivial to do if we rely on getting a new set of peers each time.

Solution

This PR tackles it for LightPush:

  • A list of peers of size numPeersToUse is constantly maintained for LightPush.
  • If a node fails for whatever reason, that peer is disconnected and a new peer is introduced to the list.
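The approach above can be sketched as a small fixed-size peer pool. This is an illustrative sketch only: the class and method names (`PeerPool`, `maintainPeers`, `renewPeer`) and the `getConnectedPeers` callback are hypothetical stand-ins, not the actual js-waku implementation.

```typescript
type PeerId = string;

// Hypothetical sketch of a maintained peer pool; in js-waku the role of
// getConnectedPeers would be played by the ConnectionManager (assumption).
class PeerPool {
  private peers: PeerId[] = [];

  constructor(
    private readonly numPeersToUse: number,
    private readonly getConnectedPeers: () => PeerId[]
  ) {}

  // Top up the pool until it holds numPeersToUse peers.
  maintainPeers(): void {
    const candidates = this.getConnectedPeers().filter(
      (p) => !this.peers.includes(p)
    );
    while (this.peers.length < this.numPeersToUse && candidates.length > 0) {
      this.peers.push(candidates.shift()!);
    }
  }

  // Drop a failed peer and immediately try to replace it. The failed peer
  // is assumed to have been disconnected, so getConnectedPeers no longer
  // returns it.
  renewPeer(failed: PeerId): void {
    this.peers = this.peers.filter((p) => p !== failed);
    this.maintainPeers();
  }

  get current(): readonly PeerId[] {
    return this.peers;
  }
}
```

The key property is that requests read from a stable pool rather than calling getPeers() anew each time, so a failed peer can be replaced without disturbing the rest of the pool.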

Notes

Contribution checklist:

  • covered by unit tests;
  • covered by e2e test;
  • add ! in title if breaks public API;

github-actions bot commented May 8, 2024

size-limit report 📦

| Path | Size | Loading time (3g) | Running time (snapdragon) | Total time |
| --- | --- | --- | --- | --- |
| Waku node | 183.21 KB (+0.87% 🔺) | 3.7 s (+0.87% 🔺) | 13.1 s (-21.67% 🔽) | 16.8 s |
| Waku Simple Light Node | 183.19 KB (+0.9% 🔺) | 3.7 s (+0.9% 🔺) | 21 s (+17.38% 🔺) | 24.7 s |
| ECIES encryption | 23.12 KB (0%) | 463 ms (0%) | 3.4 s (-40.64% 🔽) | 3.8 s |
| Symmetric encryption | 22.58 KB (0%) | 452 ms (0%) | 4 s (-31.2% 🔽) | 4.5 s |
| DNS discovery | 72.49 KB (0%) | 1.5 s (0%) | 7.7 s (-24.07% 🔽) | 9.1 s |
| Peer Exchange discovery | 74.15 KB (0%) | 1.5 s (0%) | 10.7 s (+37.17% 🔺) | 12.1 s |
| Local Peer Cache Discovery | 67.68 KB (0%) | 1.4 s (0%) | 11.5 s (-5.72% 🔽) | 12.9 s |
| Privacy preserving protocols | 38.87 KB (0%) | 778 ms (0%) | 8.5 s (-0.85% 🔽) | 9.2 s |
| Waku Filter | 112.58 KB (+0.6% 🔺) | 2.3 s (+0.6% 🔺) | 15.3 s (+17.96% 🔺) | 17.6 s |
| Waku LightPush | 111.1 KB (+0.65% 🔺) | 2.3 s (+0.65% 🔺) | 14.9 s (-15.58% 🔽) | 17.1 s |
| History retrieval protocols | 111.67 KB (+0.69% 🔺) | 2.3 s (+0.69% 🔺) | 24.2 s (+113.55% 🔺) | 26.4 s |
| Deterministic Message Hashing | 7.29 KB (0%) | 146 ms (0%) | 1.1 s (-37.26% 🔽) | 1.2 s |

@danisharora099 danisharora099 force-pushed the feat/peer-reconnection branch 2 times, most recently from 3b96c37 to b947aa9 on May 14, 2024 11:50
@danisharora099 danisharora099 marked this pull request as ready for review May 14, 2024 12:21
@danisharora099 danisharora099 requested a review from a team as a code owner May 14, 2024 12:21
packages/sdk/src/waku.ts (outdated, resolved)
```ts
@@ -315,7 +321,8 @@ export class StoreSDK extends BaseProtocolSDK implements IStoreSDK {
}

export function wakuStore(
  connectionManager: ConnectionManager,
```
Collaborator:
nit: let's include it in init and maybe make it of a type like ProtocolCreateOptions & ProtocolServices

Collaborator Author:

I like the idea. This is currently non-trivial because there are no types/interfaces for ConnectionManager in @waku/interfaces -- this becomes a nice follow-up for #1969.

```diff
@@ -43,15 +50,19 @@ class LightPushSDK extends BaseProtocolSDK implements ILightPushSDK {
   };
 }

-const peers = await this.protocol.getPeers();
-if (!peers.length) {
+const peersFound = await this.hasPeers();
```
Collaborator:

It seems to me that in this case we should get peers as fast as possible and only maintain the validity and quality of the pool in the background.

If we make it part of send or subscribe (or for Store), it increases the delay for the application built on top.

Instead, I think we should be lazy in this approach and make the consumer wait only if the service doesn't work (an error being thrown).

An edge case is when there is only one peer in the pool and that peer is bad. I believe this is a rare case that we should optimize for only after seeing it happen often for end users.
So we should measure it now: add a log line for later use in telemetry etc.; also a point for the dogfooding app we are preparing.

Collaborator Author:

I agree with your approach in theory, and that is how I implemented it.

However, this hasPeers() check is useful because we have a recurring interval that maintains peers, and we need to ensure that maintainPeers() was called at least once.

Looking at the function:

```ts
protected hasPeers = async (): Promise<boolean> => {
  let success = await this.maintainPeers();
  let attempts = 0;
  while (!success || this.peers.length === 0) {
    attempts++;
    success = await this.maintainPeers();
    if (attempts > 3) {
      if (this.peers.length === 0) {
        this.log.error("Failed to find peers to send message to");
        return false;
      } else {
        this.log.warn(
          `Found only ${this.peers.length} peers, expected ${this.numPeersToUse}`
        );
        return true;
      }
    }
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  return true;
};
```

It will return true as soon as peers.length > 0.

Collaborator:

> and thus we need to ensure that maintainPeers() was called at least once

It seems it is initiated from the constructor, so there is no need to ensure that anymore, assuming the constructor completed successfully.

Let's not have blocking routines for ad-hoc operations; if we cannot send a message now, we should throw (maybe later we can implement re-sending of messages, but that is out of scope).

Collaborator Author:

I understand your POV and agree with it.

However, this is non-trivial: if we don't provide this reliability abstraction as part of the SDK's send() functionality, library consumers will have to implement it as a wrapper themselves.

For example: with the default interval of 30s, users would have to wait up to 30 seconds after node initialisation for newly connected peers to be registered.

Collaborator Author:

I have an idea:

Instead of relying solely on an interval, let's trigger maintainPeers() on a new peer:connect event.
This will significantly improve the efficiency of the routine.
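A minimal sketch of this idea: trigger the maintenance routine whenever a peer connects, instead of only on the interval. The `peer:connect` event name matches libp2p's, but the emitter below is a self-contained stand-in, not the real libp2p instance.

```typescript
type Listener = () => void;

// Tiny stand-in for the libp2p event bus, for illustration only.
class TinyEmitter {
  private listeners = new Map<string, Listener[]>();

  addEventListener(event: string, fn: Listener): void {
    const fns = this.listeners.get(event) ?? [];
    fns.push(fn);
    this.listeners.set(event, fns);
  }

  dispatch(event: string): void {
    for (const fn of this.listeners.get(event) ?? []) fn();
  }
}

let maintainCalls = 0;
const maintainPeers = () => {
  maintainCalls++; // in js-waku this would top up the peer pool
};

const libp2pEvents = new TinyEmitter();
// React to a new connection immediately, rather than waiting up to a
// full interval period for the next maintenance run.
libp2pEvents.addEventListener("peer:connect", maintainPeers);
```

The event does not replace the interval; it only shortens the delay between a peer connecting and the pool noticing it.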

@danisharora099 (Collaborator Author) commented Jun 3, 2024

Hm, that definitely improves how quickly maintenance updates.

I tested it out.

However, because it is non-blocking, send() will fail far more often when users try to send a lightpush request right after starting their node.

This is most reproducible in our tests, where we call lightpush.send() right after starting a node and connecting to peers (as will also happen in most user applications).

IMO it is a good default/reliability abstraction to automatically wait for peers IF none are available; if peers are available, well and good, and we don't wait.

Since this is specifically an SDK-level feature, I think it is a nice-to-have.
We can expose it as a user-facing autoRetry argument defaulting to true. @weboko
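The proposed behaviour could look roughly like this. The function shape, the `autoRetry` flag, and the helper parameters are hypothetical, not the shipped js-waku API:

```typescript
// Sketch: block waiting for peers only when the pool is empty; if peers
// are available, send immediately. With autoRetry disabled, an empty
// pool throws instead of waiting. All names here are illustrative.
async function sendWithAutoRetry(
  getPeerCount: () => number,
  waitForPeers: () => Promise<void>,
  doSend: () => Promise<void>,
  autoRetry = true
): Promise<void> {
  if (getPeerCount() === 0) {
    if (!autoRetry) {
      throw new Error("No peers available to send to");
    }
    // Only block when the pool is empty, e.g. right after node startup.
    await waitForPeers();
  }
  await doSend();
}
```

This matches the trade-off described above: the common case (peers already in the pool) pays no extra latency, and the startup case waits instead of failing, unless the consumer opts out.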

Collaborator Author:

Implemented some improvements to your comment here: 8e89b24

Wdyt?

Collaborator:

I like the idea of autoRetry, and we can enrich it within #2032 (feel free to take it when free, and let's discuss the exact design; we can change it from what I described there).

I also think it is a good approach to use peer:identify or some other event (maybe connected works for that purpose too).

But I would still not make the consumer wait for some other operation during .send unless it is explicitly configured.

```ts
private async findAdditionalPeers(numPeers: number): Promise<Peer[]> {
  this.log.info(`Finding ${numPeers} additional peers`);
  try {
    let newPeers = await this.core.getPeers({
```
@weboko (Collaborator) commented May 16, 2024

In general, getPeers from BaseProtocolCore will only return peers we are already connected to.
Is that sufficient? It seems that in some cases we need to initiate new connections, if the existing ones are not enough or not good.

Or do we rely on the connection manager to automatically add one after a connection is dropped?

Collaborator Author:

ConnectionManager keeps trying to connect to as many peers as we discover.

The upper limit on max connections is 100: https://github.com/libp2p/js-libp2p/blob/169c9d85e7c9cd65be964b5d08bd618d950f70ee/doc/LIMITS.md?plain=1#L39

This is more than enough. We can assume that when a connection is dropped, we can find new peers (if they were discovered), or we will connect to them as soon as we discover them.

There does not seem to be an apparent action js-waku can take to connect to new peers, other than what's already happening through discoveries.

Collaborator:

Actually, it seems to be 300.

I think we can document it somewhere, at least in a code comment, mentioning that the assumption is that the connection manager populates connected peers.

```ts
 * This is useful to make the process of finding and adding new peers more dynamic and efficient,
 * instead of relying on the interval only.
 */
this.core.addLibp2pEventListener("peer:identify", (evt) => {
```
Collaborator:

I am honestly a bit lost here, help please :)

  • when "peer:identify" fires, maintainPeers will be called;
  • but we already do this in startMaintainPeersInterval, and again later on the interval;

Why do we still need the event?

@danisharora099 (Collaborator Author) commented Jun 5, 2024

Though we discussed this in detail in the sync session earlier today, I'll recap here to document it:

Intervals run as a maintenance cycle for peers, fetching the state of active connections from the ConnectionManager. The ConnectionManager is responsible for actively connecting to nodes as they are discovered; the PeerManager is responsible for finding the relevant peers based on the multicodec and the configured pubsub topic for the protocol. Since the PeerManager, which runs these intervals, does not (and shouldn't) have this information, it effectively polls the ConnectionManager for it.

peer:identify is used to shorten the gap between the connection of a new node and the next interval run. It decreases the time it takes for the protocol to notice that a new peer is available for it.
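The polling half of this design can be sketched as follows. The function name mirrors startMaintainPeersInterval as mentioned in the thread, but the body is illustrative, not the actual js-waku code:

```typescript
// Sketch: run maintenance once at startup, then keep polling on the
// interval as a safety net. The peer:identify event (wired up elsewhere)
// only shortens the delay; the interval guarantees eventual progress.
function startMaintainPeersInterval(
  maintainPeers: () => void,
  intervalMs: number
): () => void {
  // Populate the pool immediately so callers don't have to wait a full
  // interval period after node initialisation.
  maintainPeers();
  const id = setInterval(maintainPeers, intervalMs);
  // Return a cleanup function to stop the cycle, e.g. on node shutdown.
  return () => clearInterval(id);
}
```

Running the routine once immediately and again on every tick is what makes the hasPeers() check above resolve quickly after startup.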

@weboko (Collaborator) left a comment

LGTM! We derived follow-ups; before merging, please make sure it works e2e within some of our examples.

@danisharora099 danisharora099 merged commit 93e78c3 into master Jun 19, 2024
11 checks passed
@danisharora099 danisharora099 deleted the feat/peer-reconnection branch June 19, 2024 05:52
@weboko weboko mentioned this pull request Jun 19, 2024