feat: update various protocols to autoshard #1857

SionoiS · 2023-07-21T12:31:55Z

Description

Updating LIGHTPUSH & FILTER protocols to handle autosharding.

Changes

LIGHTPUSH takes optional pubsub topics and computes shards.
FILTER takes optional pubsub topics and computes shards.

Tracking #1846

SionoiS · 2023-07-21T14:26:13Z

~~I'm not sure how to change FILTER to handle multiple shards. The code ATM assumes one pubsub and multiple content topics.~~

SionoiS · 2023-08-01T16:01:05Z

~~I added some tests and fixed a bug.~~

waku/v2/node/waku_node.nim

SionoiS · 2023-08-02T13:14:05Z

Added a proc for optional autosharding, changed the cluster index and count for gen0 and added back topics in config with a deprecation note.

SionoiS · 2023-08-02T20:13:35Z

@jakubgs When we merge this to master the config will be back to normal. --topic will be back as deprecated with --pubsub-topic and --content-topic present.

jm-clius

Made a comment below re filter, but it applies to all the protocols: I think we don't want the underlying protocols to change, just provide applications with some "middleware" logic when they interact with the node that populates shards in API calls to the underlying protocols. IMO protocols should remain completely unaware of autosharding, which occurs at a higher layer.

waku/v2/waku_filter/rpc.nim

SionoiS · 2023-08-03T19:44:34Z

> Made a comment below re filter, but it applies to all the protocols: I think we don't want the underlying protocols to change, just provide applications with some "middleware" logic when they interact with the node that populates shards in API calls to the underlying protocols. IMO protocols should remain completely unaware of autosharding, which occurs at a higher layer.

~~The protocols are not the code in nwaku. They are in the RFCs and for FILTER for example the pubsub topic is optional. They were not following the protocol, now they do.~~

I totally missed this part: "If the request contains filter criteria, it MUST contain a pubsub_topic and the content_topics set MUST NOT be empty."

As for where we decide to compute the shard, my idea was that light nodes should have the least amount of computing to do.

I'll make the changes.

waku/v2/waku_filter_v2/client.nim

waku/v2/waku_lightpush/client.nim

waku/v2/waku_store/client.nim

SionoiS · 2023-08-04T20:05:04Z

It's still a bit weird how you have to send multiple requests when the content topics don't use the same shard.

alrevuelta

Left some comments, all of them related to pubsub vs content topic. Let me know what you think or if I'm missing something :)

apps/wakunode2/external_config.nim

examples/v2/filter_subscriber.nim

waku/v2/waku_filter_v2/client.nim

alrevuelta · 2023-08-08T08:29:47Z

@SionoiS

> It's still a bit weird how you have to send multiple requests when the content topics don't use the same shard.

If we remove the pubsub topic from the request, then you don't need to send multiple requests? The full node will "infer" it.

jm-clius · 2023-08-08T09:04:12Z

> It's still a bit weird how you have to send multiple requests when the content topics don't use the same shard.

True. If we don't change the underlying light protocols themselves, the "middleware" on the light client that populates the pubsub topic in requests might have to create multiple requests instead of one. To me this still seems safer as an initial step than removing the pubsub topic from the protocol itself and having an implicit assumption that the service node the client is contacting is on the same version/generation of autosharding as the client (in other words, on the protocol level we remain explicit about shards/pubsub topics). Happy to have people disagree with me here, as I do think we eventually want to simplify how the protocols are used too. This just seems to me the simplest/safest increment to get there.
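To make the "multiple requests instead of one" point concrete, here is a minimal sketch of such middleware in Python rather than nwaku's Nim. Everything in it is an assumption for illustration: `CLUSTER_ID`, `SHARD_COUNT`, the helper names and the request shape are hypothetical, and `shard_for` uses a stand-in hash, not the actual autosharding algorithm.

```python
import hashlib
from collections import defaultdict

CLUSTER_ID = 1    # hypothetical cluster index
SHARD_COUNT = 8   # hypothetical number of shards in the cluster


def shard_for(content_topic: str) -> int:
    """Stand-in mapping from content topic to shard index.

    This hashes the whole content topic; it is NOT the real
    autosharding algorithm, only a deterministic placeholder.
    """
    digest = hashlib.sha256(content_topic.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % SHARD_COUNT


def pubsub_topic_for(content_topic: str) -> str:
    """Derive the static-sharding style pubsub topic for a content topic."""
    return f"/waku/2/rs/{CLUSTER_ID}/{shard_for(content_topic)}"


def build_subscribe_requests(content_topics):
    """Group content topics by derived pubsub topic.

    The wire request keeps exactly one explicit pubsub topic, so the
    middleware emits one request per distinct shard.
    """
    grouped = defaultdict(list)
    for topic in content_topics:
        grouped[pubsub_topic_for(topic)].append(topic)
    return [
        {"pubsub_topic": pubsub, "content_topics": topics}
        for pubsub, topics in grouped.items()
    ]
```

Content topics that land on the same shard share one request; the protocol itself stays unaware that the pubsub topic was computed rather than supplied by the application.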

SionoiS · 2023-08-08T11:56:17Z

> implicit assumption that the service node the client is contacting is on the same version/generation of autosharding as the client

The generation to use is in the content topic, so there are no assumptions. Unsupported generation = bad request.

As for the version, any change to the autosharding algorithm is a breaking change. Unsupported version = bad request.
This argument is weaker, since the version is not in the protocol.
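A minimal sketch of the "unsupported generation = bad request" check, in Python for illustration only. It assumes the content topic carries the generation as an optional leading segment (`/{generation}/{application}/{version}/{name}/{encoding}`) and that a topic without that segment defaults to generation 0; both the default and the names `SUPPORTED_GENERATIONS`, `BadRequest` and `parse_generation` are assumptions, not nwaku code.

```python
SUPPORTED_GENERATIONS = {0}  # assumption: this node only knows gen0


class BadRequest(Exception):
    """Raised when a request cannot be served as specified."""


def parse_generation(content_topic: str) -> int:
    """Return the autosharding generation named by a content topic."""
    parts = content_topic.split("/")
    # "/0/app/1/name/proto" splits into ["", "0", "app", "1", "name", "proto"]
    if len(parts) == 6 and parts[0] == "":
        prefix = parts[1]
        if not (prefix.isascii() and prefix.isdigit()):
            raise BadRequest(f"generation is not a number: {prefix!r}")
        generation = int(prefix)
    elif len(parts) == 5 and parts[0] == "":
        generation = 0  # assumed default when the prefix is omitted
    else:
        raise BadRequest(f"malformed content topic: {content_topic!r}")

    if generation not in SUPPORTED_GENERATIONS:
        raise BadRequest(f"unsupported generation: {generation}")
    return generation
```

The algorithm version has no equivalent check here because, as noted above, it is not carried in the request.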

jm-clius

Thanks for moving sharding logic to the clients to keep the wire protocol the same. However, I'm still confused why the autosharding logic needs to be in the protocol clients. This effectively changes the protocol behaviour too (e.g. a filter subscribe now spinning off multiple subscribes). I would say the only thing that needs to be changed is where the application interacts with the public API, in other words: either in the JSON-RPC/REST APIs (e.g. here) or, if we want to be a bit more universal, in the Nim API for the node, e.g. here. If we follow the latter route, we'll still have a problem with store query, but the way to fix it IMO, would be to provide an API query* call where HistoryQuery creation is moved into the API call, while query criteria (content topics, optional pubsub, etc.) are provided separately as arguments.
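The suggested `query*` shape can be sketched as follows, in Python rather than Nim. The field names on `HistoryQuery`, the `page_size` default and the `send` callable (a stand-in for whatever hands the request to the store client) are hypothetical; the point is only that the caller passes criteria and the API call builds the query object.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class HistoryQuery:
    """Illustrative query object; field names are assumptions."""
    content_topics: List[str]
    pubsub_topic: Optional[str] = None  # None means "not specified"
    page_size: int = 20


def query(
    send: Callable[[HistoryQuery], object],
    content_topics: Sequence[str],
    pubsub_topic: Optional[str] = None,
    page_size: int = 20,
):
    """Build the HistoryQuery inside the API call and dispatch it.

    Callers provide criteria as plain arguments, so any shard logic
    can live here without touching the store client or wire format.
    """
    request = HistoryQuery(
        content_topics=list(content_topics),
        pubsub_topic=pubsub_topic,
        page_size=page_size,
    )
    return send(request)
```

An application that does not care about shards simply omits `pubsub_topic`, and the query goes out with that field unset.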

SionoiS · 2023-08-10T14:58:51Z

> Thanks for moving sharding logic to the clients to keep the wire protocol the same. However, I'm still confused why the autosharding logic needs to be in the protocol clients. This effectively changes the protocol behaviour too (e.g. a filter subscribe now spinning off multiple subscribes). I would say the only thing that needs to be changed is where the application interacts with the public API, in other words: either in the JSON-RPC/REST APIs (e.g. here) or, if we want to be a bit more universal, in the Nim API for the node, e.g. here. If we follow the latter route, we'll still have a problem with store query, but the way to fix it IMO, would be to provide an API query* call where HistoryQuery creation is moved into the API call, while query criteria (content topics, optional pubsub, etc.) are provided separately as arguments.

I got confused by node again. The JSON-RPC handlers call node.lightpushPublish(), which then calls client.publish(); I thought they called it directly.

I am not a fan of adding the "sharding middleware" to the API handlers and I also don't like node.

I guess I'll put it in node for now...

SionoiS · 2023-08-10T16:07:49Z

Is there a difference between omitting the pubsub topic in a STORE request and sending one request per shard?

I see three different intents when making a request:

1. I don't care about pubsub topics.
2. I want autosharding.
3. I want messages for one pubsub topic.

1 and 2 are functionally the same but maybe we want to differentiate?

Also, like I said, there's no easy way to merge results when sending multiple requests. How would you page?

Maybe we could error when requesting multiple shards at the same time?

WDYT? @jm-clius @alrevuelta

jm-clius · 2023-08-10T16:18:59Z

> I am not a fan of adding the "sharding middleware" to the API handlers and I also don't like node.

I understand, though I think both options are better than adding it into the protocols themselves. Another option may be just to create a new API for applications using autosharding? Up to you, especially since we can increment here.

> Maybe we could error when requesting multiple shards at the same time?

Mmm. Yes, I think Store behaviour here is different from Filter, in that it's possible to query with an open (i.e. unpopulated) pubsub topic, which would mean "don't care". But in the case of autosharding you assume your content topics are unique (i.e. they don't span shards), so there's no reason to change anything for autosharding (the pubsub topic can just remain empty in the HistoryQuery). Where things will get interesting for store and other protocols is in how we select the store peer that is able to service our request: we'd need to know that it's a store node for the shard we're querying on.
