
bug: Automatic sharding with store not working #2616

Closed
AlejandroCabeza opened this issue Apr 22, 2024 · 18 comments
Labels
bug Something isn't working

Comments

@AlejandroCabeza (Contributor)

AlejandroCabeza commented Apr 22, 2024

Problem

The short and full versions of a content topic under automatic sharding don't retrieve the same messages. Given two sets of messages, one for /toychat/2/huilong/proto and another for /0/toychat/2/huilong/proto, a HistoryQuery for each content topic retrieves only its respective set of messages, not both.

Impact

...

To reproduce

  1. Define two equivalent content topics: one full version (e.g. /0/toychat/2/huilong/proto) and one short version (/toychat/2/huilong/proto).
  2. Create two sets of messages.
  3. Create an ArchiveDriver and insert the two sets of messages; each of them assigned to one of the content topics.
  4. Create a server and a client; mount store and the archive on the server, and the store client on the client.
  5. Create two HistoryQuery instances. Both will make a request to the server, one using the full content topic and the other the short content topic.
  6. Verify the results. The short content topic query will only retrieve the messages assigned to the short content topic, and the full content topic query will only retrieve the messages assigned to the full content topic.
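The mismatch in the last step boils down to raw-string comparison in the archive. A minimal Python sketch (with a hypothetical parse_content_topic helper, not nwaku code) of the spec's /<generation>/<application>/<version>/<topic-name>/<encoding> layout, with generation defaulting to 0 when the prefix is omitted, shows why the two spellings differ as strings yet denote the same topic:

```python
# Sketch: the two spellings differ as raw strings (what the DB compares today)
# but parse to the same structured topic. parse_content_topic is hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentTopic:
    generation: int  # defaults to 0 when the prefix is omitted
    application: str
    version: str
    name: str
    encoding: str

def parse_content_topic(raw: str) -> ContentTopic:
    parts = raw.strip("/").split("/")
    if len(parts) == 5:  # full form: /<gen>/<app>/<version>/<name>/<encoding>
        gen, app, ver, name, enc = parts
        return ContentTopic(int(gen), app, ver, name, enc)
    if len(parts) == 4:  # short form: generation prefix omitted, implied 0
        app, ver, name, enc = parts
        return ContentTopic(0, app, ver, name, enc)
    raise ValueError(f"malformed content topic: {raw}")

short = "/toychat/2/huilong/proto"
full = "/0/toychat/2/huilong/proto"

assert short != full  # raw strings differ, so a string-equality DB filter misses one set
assert parse_content_topic(short) == parse_content_topic(full)  # same topic once parsed
```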

You can also check tests/sharding branch. Open tests/node/test_wakunode_sharding.nim and check out the store (automatic sharding filtering) case. Keep in mind they are declared as xasyncTest, so you'll need to update them to asyncTest for them not to be ignored by the test runner.

Expected behavior

Given the two content topics are equivalent, both sets of messages should be retrieved by both queries.

nwaku version/commit hash

wakunode2: v0.27.0-rc.0-11-ge61e4f
branch: tests/sharding
commit: 82978c5 (might change in the future)
pr: #2603

@AlejandroCabeza AlejandroCabeza added the bug Something isn't working label Apr 22, 2024
@gabrielmer (Contributor)

@SionoiS WDYT? Are those content topics actually supposed to be equivalent?

@SionoiS (Contributor)

SionoiS commented Apr 23, 2024

@SionoiS WDYT? Are those content topics actually supposed to be equivalent?

AFAIK we store content topics as strings, so they are different at the DB level...

🤔 I'm not sure what to do exactly. I agree that they should be equivalent. I guess we could format content topics before storing them in DB?

@gabrielmer (Contributor)

Yes, at the DB level they're different strings; there's nothing to do about it there.
So formatting either happens, or should happen, at a higher level. But maybe this isn't something to test at the ArchiveDriver, which is too low-level.

CC @Ivansete-status

@SionoiS (Contributor)

SionoiS commented Apr 23, 2024

I feel we should not use strings internally at all. We already have `type NsPubsubTopic* = object` and `type NsContentTopic* = object` that we could use.

@kaichaosun (Contributor)

Could someone elaborate what this 0 means, and what's the value it brings?

The generation number monotonically increases and indirectly refers to the total number of shards of the Waku Network.

From spec it still confuses me.

And I have been sending messages via /relay/v1/auto/messages, and the content topic is not prefixed with any 0 (generation) in the database messages table; it stores the one the app used.

@SionoiS (Contributor)

SionoiS commented Apr 25, 2024

Could someone elaborate what this 0 means, and what's the value it brings?

The generation number monotonically increases and indirectly refers to the total number of shards of the Waku Network.

From spec it still confuses me.

Since autosharding maps infinitely many content topics to a finite number of shards, we need to know the number of shards and the algorithm used for autosharding; that is what this number represents. In TWN gen 0, hash mod 8 (shards) is used for autosharding, but we could define a TWN gen 1 in the future with a different algorithm.

And I have been sending messages via /relay/v1/auto/messages, the content topic is not prefixed with any 0 (generation) in database messages table, it stores the one app used.

0 is the first and default prefix and can be omitted; it is implicit in all content topics ATM.
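The "hash mod 8" idea above can be sketched in a few lines. This is a simplified Python illustration of the concept only, not the exact hash construction nwaku uses; the field selection and sha256 choice here are assumptions for the sketch:

```python
# Illustration: autosharding hashes topic fields and reduces mod the shard
# count. NOT the exact TWN algorithm; shows why the generation prefix, which
# is dropped before hashing, does not affect the resulting shard.
import hashlib

GEN0_SHARDS = 8  # TWN generation 0 defines 8 shards

def shard_for(content_topic: str) -> int:
    parts = content_topic.strip("/").split("/")
    if len(parts) == 5:   # full form carries an explicit generation prefix
        parts = parts[1:]  # drop it: gen 0 is implicit in the short form
    application, version = parts[0], parts[1]
    digest = hashlib.sha256((application + version).encode()).digest()
    return int.from_bytes(digest[:8], "big") % GEN0_SHARDS

# Both spellings land on the same shard, because only the application and
# version fields feed the hash.
assert shard_for("/toychat/2/huilong/proto") == shard_for("/0/toychat/2/huilong/proto")
```

A hypothetical gen 1 would change GEN0_SHARDS or the hashing rule, which is why the generation number "indirectly refers to the total number of shards".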

@gabrielmer (Contributor)

Revisiting this, I personally think that we should store in the DB the full-length content topic defined in the spec.

And if we receive the short version of the content topic, transform it at the application level to the long version, which is what gets stored in the DB.

If that's the case, then when interacting directly with the DB in tests, we would only need to use the full version.

Does it make sense? cc @jm-clius
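The normalization proposed here can be sketched as follows. This is a hypothetical Python helper illustrating the idea (nwaku would do this in Nim at the application layer), assuming the only difference between the forms is the implicit /0 generation prefix:

```python
# Sketch of the proposed normalization: expand a short-form topic to its full
# form before it reaches the DB, so every stored row and every query uses the
# canonical spelling. to_full_form is hypothetical, not an nwaku API.
def to_full_form(content_topic: str) -> str:
    parts = content_topic.strip("/").split("/")
    if len(parts) == 4:              # short form: prepend the implicit gen 0
        return "/0/" + "/".join(parts)
    return "/" + "/".join(parts)     # already full form: return unchanged

assert to_full_form("/toychat/2/huilong/proto") == "/0/toychat/2/huilong/proto"
assert to_full_form("/0/toychat/2/huilong/proto") == "/0/toychat/2/huilong/proto"
```

With this in place, a plain string-equality filter at the DB level would match both spellings, since only the canonical one is ever stored or queried.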

@jm-clius (Contributor)

At this stage we don't know yet if there will be a gen 1. I wouldn't pad our existing content topics just in case we have this generational use in future.
In fact, autosharding should preferably not have a major effect on what we do in the lower layer of the protocols in general. For now, I'm happy with the content topics being differentiated in the DB (i.e. requiring separate filters to query), as long as both short and long forms map to the same pubsub topic. We may clarify the specification to indicate:

  • without a prefix a content topic will be assumed gen 0
  • there is no filter equivalence between content topics that use the explicit vs implicit default gen 0

Actually, the entire generational concept is more to indicate how we could expand the number of shards in the future, and it may be marked as not implemented until we define subsequent generations.

@gabrielmer (Contributor)

Sounds good, agree with that approach :))

If that's the case, then can we close this one? or is there anything missing? @AlejandroCabeza

We do have to check that content topics in short and long forms map to the same pubsub topic, but I think that would be a separate issue from this one.

@gabrielmer (Contributor)

Closing this issue as per the above discussion. Feel free to reopen in case something is missing :))

@AlejandroCabeza (Contributor, Author)

AlejandroCabeza commented Jun 10, 2024

If I understood correctly then these two, toychat/2/huilong/proto and /0/toychat/2/huilong/proto, are not to be treated as equivalent?

@gabrielmer (Contributor)

If I understood correctly then these two, toychat/2/huilong/proto and /0/toychat/2/huilong/proto, are not to be treated as equivalent?

My understanding is that they're not equivalent at the DB level, but when using autosharding's API they should behave as equals.

@AlejandroCabeza (Contributor, Author)

If I understood correctly then these two, toychat/2/huilong/proto and /0/toychat/2/huilong/proto, are not to be treated as equivalent?

My understanding is that they're not equivalent at the DB level, but when using autosharding's API they should behave as equals.

Right, then has somebody implemented that and enabled the test?

@gabrielmer (Contributor)

Right, then has somebody implemented that and enabled the test?

I see that there are already tests activated for it and working, such as:

asyncTest "relay (automatic sharding filtering)":
  # Given a connected server and client subscribed to the same content topic (with two different formats)
  let
    contentTopicShort = "/toychat/2/huilong/proto"
    contentTopicFull = "/0/toychat/2/huilong/proto"
    pubsubTopic = "/waku/2/rs/0/58355"
    serverHandler = server.subscribeToContentTopicWithHandler(contentTopicShort)
    clientHandler = client.subscribeToContentTopicWithHandler(contentTopicFull)

  await sleepAsync(FUTURE_TIMEOUT)
  await client.connectToNodes(@[server.switch.peerInfo.toRemotePeerInfo()])

  # When the client publishes a message
  discard await client.publish(
    some(pubsubTopic),
    WakuMessage(payload: "message1".toBytes(), contentTopic: contentTopicShort),
  )
  let
    serverResult1 = await serverHandler.waitForResult(FUTURE_TIMEOUT)
    clientResult1 = await clientHandler.waitForResult(FUTURE_TIMEOUT)

  # Then the server and client receive the message
  assertResultOk(serverResult1)
  assertResultOk(clientResult1)

  # When the server publishes a message
  serverHandler.reset()
  clientHandler.reset()
  discard await server.publish(
    some(pubsubTopic),
    WakuMessage(payload: "message2".toBytes(), contentTopic: contentTopicFull),
  )
  let
    serverResult2 = await serverHandler.waitForResult(FUTURE_TIMEOUT)
    clientResult2 = await clientHandler.waitForResult(FUTURE_TIMEOUT)

  # Then the server and client receive the message
  assertResultOk(serverResult2)
  assertResultOk(clientResult2)

Which if I understand correctly, implies that both content topics map to the same shard. Maybe I'm missing something?

@AlejandroCabeza (Contributor, Author)

Right, then has somebody implemented that and enabled the test?

I see that there's already tests activated for it and working such as

asyncTest "relay (automatic sharding filtering)":
  # Given a connected server and client subscribed to the same content topic (with two different formats)
  let
    contentTopicShort = "/toychat/2/huilong/proto"
    contentTopicFull = "/0/toychat/2/huilong/proto"
    pubsubTopic = "/waku/2/rs/0/58355"
    serverHandler = server.subscribeToContentTopicWithHandler(contentTopicShort)
    clientHandler = client.subscribeToContentTopicWithHandler(contentTopicFull)

  await sleepAsync(FUTURE_TIMEOUT)
  await client.connectToNodes(@[server.switch.peerInfo.toRemotePeerInfo()])

  # When the client publishes a message
  discard await client.publish(
    some(pubsubTopic),
    WakuMessage(payload: "message1".toBytes(), contentTopic: contentTopicShort),
  )
  let
    serverResult1 = await serverHandler.waitForResult(FUTURE_TIMEOUT)
    clientResult1 = await clientHandler.waitForResult(FUTURE_TIMEOUT)

  # Then the server and client receive the message
  assertResultOk(serverResult1)
  assertResultOk(clientResult1)

  # When the server publishes a message
  serverHandler.reset()
  clientHandler.reset()
  discard await server.publish(
    some(pubsubTopic),
    WakuMessage(payload: "message2".toBytes(), contentTopic: contentTopicFull),
  )
  let
    serverResult2 = await serverHandler.waitForResult(FUTURE_TIMEOUT)
    clientResult2 = await clientHandler.waitForResult(FUTURE_TIMEOUT)

  # Then the server and client receive the message
  assertResultOk(serverResult2)
  assertResultOk(clientResult2)

Which if I understand correctly, implies that both content topics map to the same shard. Maybe I'm missing something?

That's a relay-related test. This issue is store-related.

@gabrielmer (Contributor)

That's a relay-related test. This issue is store-related.

Yes, but isn't the mapping from content topic to shard the same for all protocols? Same autosharding algorithm.

I understood this was the last question remaining, whether autosharding maps both to the same shard and if so we're fine for now (as we decided to not see them as equivalent at the DB level).
The test case I attached, even if it's for relay, tests/proves this is the case.

So currently

  1. We know that the mapping works as intended
  2. If it stops working as intended, a test will fail and we will know about it

@AlejandroCabeza (Contributor, Author)

That's a relay-related test. This issue is store-related.

Yes, but isn't the mapping from content topic to shard the same for all protocols? Same autosharding algorithm.

I understood this was the last question remaining, whether autosharding maps both to the same shard and if so we're fine for now (as we decided to not see them as equivalent at the DB level). The test case I attached, even if it's for relay, tests/proves this is the case.

So currently

1. We know that the mapping works as intended

2. If it stops working as intended, a test will fail and we will know about it

I can't remember, but there must be a reason I posted an issue specifically referencing an issue with lightpush-store, with the test case skipped, while there's a couple of other tests referencing other protocols that weren't skipped. There could be something within store that is messing with the topics format.
I just ran the mentioned test and it still fails. I'll point out, though, that the test uses HistoryQuery, which is now located in waku_store_legacy.

@gabrielmer (Contributor)

I can't remember, but there must be a reason I posted an issue specifically referencing an issue with lightpush-store, with the test case skipped, while there's a couple of other tests referencing other protocols that weren't skipped. There could be something within store that is messing with the topics format. I just ran the mentioned test and it still fails. I'll point out, though, that the test uses HistoryQuery, which is now located in waku_store_legacy.

I think that the issue here is that the test is inserting and querying at the DB level, which is too low a level. At that level, as per the above discussion, it's OK to see both content topics as different.

As long as autosharding maps them to the same shard (which is not specific to any protocol, and we saw there are already tests that ensure it), then everything seems to be alright :)
