bug: can't discover peers when using DNS Discovery URL + shards #2162

Closed
richard-ramos opened this issue Oct 26, 2023 · 6 comments · Fixed by #2267
Labels
bug Something isn't working

@richard-ramos

The nodes returned by DNS Discovery for shards.test do not carry any information about their shards:

enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im

Nodes returned by this URL
1 - enr:-Ne4QJKpiQqwYpo0p1yDW6opKFYzh801nhSzX65S_x892UXABVYzFBrdFwCPiWwXlKqVz5sXkTzYtUuX1wg2sW5DZnwBgmlkgnY0gmlwhCIfDu-KbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQJm8YcPIYhI5rvlLJJRlpebApk6w4uOLdFgAeHN2wO9N4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
2 - enr:-Ne4QIvHiMe1Gf7h22jygL1kPFVAcQ0RkDYNk1PNA52KUKElBSPuPy-HSD1pRX-rCx2A2Qqh0GtkzFUyL8NQEiL15P0BgmlkgnY0gmlwhAjaF0yKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQM_sJtGT5gonA4UUzhn2d7LQY9ztY8loLAaSk1HKVruYIN0Y3CCdl-DdWRwgiMohXdha3UyDQ
3 - enr:-Ne4QHOpWLyVVZMzJwXcc00CNp16vB5x2WFy6WQAEKyaOf_UMWKvz2a0HN9QCoSyBYmudBKspqYa_U6tJ64B0TqLzy0BgmlkgnY0gmlwhAjarmyKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQNeQXcyqdYwEjflVdLKYAusuZJ93fpGiFwqK1jU9ISQC4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
4 - enr:-M24QJDZfhB_wN_PHOAQuzgnta20xKUsZl5kdhBeQJM16gdldCJNAKQp6dgbwo-MTRJxYVNCr85cHRAJxtNLR4vTbP0BgmlkgnY0gmlwhKdjEy-KbXVsdGlhZGRyc68ALTYoYm9vdC0wMS5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAt60bRUEoHNuLlnsM12sU2PIQwBwfLIJ8a_ZPEY2-Rnkg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
5 - enr:-M24QAsRRxoLDnnXFGnbHGUKjtqgXOVxb2Cian1vegc1rtY0Yk5wXDF7NeBzPl7frvyxo3Vt-xSL0vUa2jazchNIS_oBgmlkgnY0gmlwhLKAj_GKbXVsdGlhZGRyc68ALTYoYm9vdC0wMi5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAtsXOrELG9R5LlIbF6bqeLC0tg7bmNzQ0JkSmEO3zxqzg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
6 - enr:-Ne4QINS7SZiUk9oN3mcLpOrdQrFWS-AUDjyq5F9__8iTUT_H8ExnAj5qDWmG4qbLaz4NKvDtmIU3Ycu9sP_Ixk6hn4BgmlkgnY0gmlwhCKHDVeKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQLGOqANDRbJFI6KVhTfYMDmT9c2UOKzebVV1eQr3EzqQ4N0Y3CCdl-DdWRwgiMohXdha3UyDQ

These nodes are currently subscribed to shards:

  • /waku/2/rs/16/32
  • /waku/2/rs/16/64
  • /waku/2/rs/16/128
  • /waku/2/rs/16/256
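
For readers unfamiliar with these topic names, here is a minimal Go sketch of how a static sharding pubsub topic of the form /waku/2/rs/<cluster>/<shard> splits into a cluster ID (16 here) and a shard index. The StaticShard type and parseStaticShard function are hypothetical names made up for this illustration; they are not the go-waku or nwaku API, and the 16-bit field widths are an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// StaticShard is a hypothetical type for this sketch, not a go-waku type.
type StaticShard struct {
	Cluster uint16 // assumed to fit in 16 bits
	Shard   uint16 // assumed to fit in 16 bits
}

// parseStaticShard splits a topic of the form /waku/2/rs/<cluster>/<shard>.
func parseStaticShard(topic string) (StaticShard, error) {
	parts := strings.Split(topic, "/")
	// The leading slash yields an empty first element:
	// ["", "waku", "2", "rs", <cluster>, <shard>]
	if len(parts) != 6 || parts[0] != "" || parts[1] != "waku" || parts[2] != "2" || parts[3] != "rs" {
		return StaticShard{}, fmt.Errorf("not a static sharding topic: %q", topic)
	}
	cluster, err := strconv.ParseUint(parts[4], 10, 16)
	if err != nil {
		return StaticShard{}, fmt.Errorf("invalid cluster id in %q: %w", topic, err)
	}
	shard, err := strconv.ParseUint(parts[5], 10, 16)
	if err != nil {
		return StaticShard{}, fmt.Errorf("invalid shard index in %q: %w", topic, err)
	}
	return StaticShard{Cluster: uint16(cluster), Shard: uint16(shard)}, nil
}

func main() {
	topics := []string{"/waku/2/rs/16/32", "/waku/2/rs/16/64", "/waku/2/rs/16/128", "/waku/2/rs/16/256"}
	for _, t := range topics {
		s, err := parseStaticShard(t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> cluster %d, shard %d\n", t, s.Cluster, s.Shard)
	}
}
```

All four topics above belong to cluster 16, so a peer has to advertise at least one of shards 32, 64, 128 or 256 for that cluster in its ENR to be considered a match.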

However, if you inspect these ENRs in https://enr-viewer.com/, you'll see that they lack the rs and rsv attributes. Also, if you compare each node's current ENR (obtained through the node's RPC server) with the one returned by the DNS Discovery URL, you'll see that the latter is outdated. That makes sense: the DNS Discovery URL was created manually when the fleet was set up, while shard subscriptions happen dynamically.

This is problematic: if I have a node subscribed to shards 32, 64, 128 and/or 256 and I use the DNS discovery URL, I won't be able to find new peers. Discv5 populates its routing table with the shardless ENRs retrieved from the URL, and those records are then filtered out. Discv5 (at least in the go-waku implementation; the nwaku implementation looks similar) does not ask those nodes for their current ENR, so the stale records are all that the filtering logic in this predicate gets to check:

proc shardingPredicate*(record: Record): Option[WakuDiscv5Predicate] =
  ## Filter peers based on relay sharding information
  let typeRecordRes = record.toTyped()
  let typedRecord =
    if typeRecordRes.isErr():
      debug "peer filtering failed", reason= $typeRecordRes.error
      return none(WakuDiscv5Predicate)
    else: typeRecordRes.get()
  let nodeShardOp = typedRecord.relaySharding()
  let nodeShard =
    if nodeShardOp.isNone():
      debug "no relay sharding information, peer filtering disabled"
      return none(WakuDiscv5Predicate)
    else: nodeShardOp.get()
  debug "peer filtering updated"
  let predicate = proc(record: waku_enr.Record): bool =
    nodeShard.shardIds.anyIt(record.containsShard(nodeShard.clusterId, it))
  return some(predicate)

As a result, the nodes are not able to discover new peers using the DNS discovery URL.
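
To make the failure mode concrete, here is a small Go sketch that mimics the shape of the predicate above. It is an illustration only: Record, containsShard, shardingPredicate and keepIf are hypothetical stand-ins written for this comment, not the nwaku or go-waku types, and a real ENR encodes shard data in its rs/rsv fields rather than in a struct like this.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for an ENR with only the fields needed here.
type Record struct {
	Name      string
	HasShards bool // false when the ENR carries neither rs nor rsv
	Cluster   uint16
	Shards    []uint16
}

// containsShard reports whether the record advertises the given cluster and shard.
func (r Record) containsShard(cluster, shard uint16) bool {
	if !r.HasShards || r.Cluster != cluster {
		return false
	}
	for _, s := range r.Shards {
		if s == shard {
			return true
		}
	}
	return false
}

// shardingPredicate returns nil when the local record has no shard
// information (filtering disabled). Otherwise it returns a predicate that
// accepts a remote record only if it shares at least one shard with us.
func shardingPredicate(local Record) func(Record) bool {
	if !local.HasShards {
		return nil
	}
	return func(remote Record) bool {
		for _, s := range local.Shards {
			if remote.containsShard(local.Cluster, s) {
				return true
			}
		}
		return false
	}
}

// keepIf keeps the records accepted by pred; a nil pred keeps everything.
func keepIf(records []Record, pred func(Record) bool) []Record {
	if pred == nil {
		return records
	}
	var kept []Record
	for _, r := range records {
		if pred(r) {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	local := Record{Name: "local", HasShards: true, Cluster: 16, Shards: []uint16{128}}
	bootnodes := []Record{
		{Name: "stale-dns-record"}, // ENR as published in the DNS tree: no rs/rsv
		{Name: "fresh-record", HasShards: true, Cluster: 16, Shards: []uint16{32, 64, 128, 256}},
	}
	for _, r := range keepIf(bootnodes, shardingPredicate(local)) {
		fmt.Println("kept:", r.Name)
	}
}
```

In this sketch the stale record is dropped even though the node behind it is in fact subscribed to shard 128, which is the situation described in this issue. A local node with no shard information gets a nil predicate and keeps every record, matching the "peer filtering disabled" branch.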

This is particularly problematic in the case of Status, since the DNS discovery URL is hardcoded in the node configuration.

@richard-ramos

richard-ramos commented Oct 26, 2023

I tried the following in nwaku:

Node1:
./build/wakunode2 --discv5-discovery --dns-discovery --dns-discovery-url=enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im --pubsub-topic=/waku/2/rs/16/128 --tcp-port=55511

Node2:
./build/wakunode2 --discv5-discovery --dns-discovery --dns-discovery-url=enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im --pubsub-topic=/waku/2/rs/16/128 --tcp-port=55522

Note that the following log line is printed when using this configuration:

WRN 2023-10-26 11:01:43.551-04:00 No discv5 bootstrap nodes share this node configured shards topics="wakunode app" tid=2758528 file=waku_discv5.nim:96

I can confirm that the nodes are not being discovered. However, if I remove the --pubsub-topic flag, I can see the peers being discovered.

@richard-ramos

I created the following tool with go-waku to discover peers via discv5: https://github.com/waku-org/test-discv5/tree/master

You can use any of these commands to execute it:

go run main.go --dns-disc-url=enrtree://AMOJVZX4V6EXP7NTJPMAYJYST2QP6AJXYW76IU6VGJS7UVSNDYZG4@boot.test.shards.nodes.status.im

or

go run main.go --bootnodes=comma_separated_list_of_enrs

This will continuously try to discover peers and print each peer's ENR, IP, port, multiaddresses, and the rs and rsv fields if available. Anything that is not available is not printed. In the results below you can see that none of the nodes has an rs or rsv field:

Bootnodes:
1 - enr:-Ne4QJKpiQqwYpo0p1yDW6opKFYzh801nhSzX65S_x892UXABVYzFBrdFwCPiWwXlKqVz5sXkTzYtUuX1wg2sW5DZnwBgmlkgnY0gmlwhCIfDu-KbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQJm8YcPIYhI5rvlLJJRlpebApk6w4uOLdFgAeHN2wO9N4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
2 - enr:-Ne4QIvHiMe1Gf7h22jygL1kPFVAcQ0RkDYNk1PNA52KUKElBSPuPy-HSD1pRX-rCx2A2Qqh0GtkzFUyL8NQEiL15P0BgmlkgnY0gmlwhAjaF0yKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQM_sJtGT5gonA4UUzhn2d7LQY9ztY8loLAaSk1HKVruYIN0Y3CCdl-DdWRwgiMohXdha3UyDQ
3 - enr:-Ne4QHOpWLyVVZMzJwXcc00CNp16vB5x2WFy6WQAEKyaOf_UMWKvz2a0HN9QCoSyBYmudBKspqYa_U6tJ64B0TqLzy0BgmlkgnY0gmlwhAjarmyKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQNeQXcyqdYwEjflVdLKYAusuZJ93fpGiFwqK1jU9ISQC4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
4 - enr:-M24QJDZfhB_wN_PHOAQuzgnta20xKUsZl5kdhBeQJM16gdldCJNAKQp6dgbwo-MTRJxYVNCr85cHRAJxtNLR4vTbP0BgmlkgnY0gmlwhKdjEy-KbXVsdGlhZGRyc68ALTYoYm9vdC0wMS5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAt60bRUEoHNuLlnsM12sU2PIQwBwfLIJ8a_ZPEY2-Rnkg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
5 - enr:-M24QAsRRxoLDnnXFGnbHGUKjtqgXOVxb2Cian1vegc1rtY0Yk5wXDF7NeBzPl7frvyxo3Vt-xSL0vUa2jazchNIS_oBgmlkgnY0gmlwhLKAj_GKbXVsdGlhZGRyc68ALTYoYm9vdC0wMi5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAtsXOrELG9R5LlIbF6bqeLC0tg7bmNzQ0JkSmEO3zxqzg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
6 - enr:-Ne4QINS7SZiUk9oN3mcLpOrdQrFWS-AUDjyq5F9__8iTUT_H8ExnAj5qDWmG4qbLaz4NKvDtmIU3Ycu9sP_Ixk6hn4BgmlkgnY0gmlwhCKHDVeKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQLGOqANDRbJFI6KVhTfYMDmT9c2UOKzebVV1eQr3EzqQ4N0Y3CCdl-DdWRwgiMohXdha3UyDQ

Your node:
enr:-.......................

Discovered peers:
===============================================================================
1 - NEW - enr:-Ne4QINS7SZiUk9oN3mcLpOrdQrFWS-AUDjyq5F9__8iTUT_H8ExnAj5qDWmG4qbLaz4NKvDtmIU3Ycu9sP_Ixk6hn4BgmlkgnY0gmlwhCKHDVeKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQLGOqANDRbJFI6KVhTfYMDmT9c2UOKzebVV1eQr3EzqQ4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ
multiaddr [/ip4/34.135.13.87/tcp/30303/p2p/16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ /dns4/boot-01.gc-us-central1-a.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ]
ip 34.135.13.87:30303

2 - NEW - enr:-M24QJDZfhB_wN_PHOAQuzgnta20xKUsZl5kdhBeQJM16gdldCJNAKQp6dgbwo-MTRJxYVNCr85cHRAJxtNLR4vTbP0BgmlkgnY0gmlwhKdjEy-KbXVsdGlhZGRyc68ALTYoYm9vdC0wMS5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAt60bRUEoHNuLlnsM12sU2PIQwBwfLIJ8a_ZPEY2-Rnkg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
peerID 16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31
multiaddr [/ip4/167.99.19.47/tcp/30303/p2p/16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31 /dns4/boot-01.do-ams3.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31]
ip 167.99.19.47:30303

3 - NEW - enr:-Ne4QHOpWLyVVZMzJwXcc00CNp16vB5x2WFy6WQAEKyaOf_UMWKvz2a0HN9QCoSyBYmudBKspqYa_U6tJ64B0TqLzy0BgmlkgnY0gmlwhAjarmyKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQNeQXcyqdYwEjflVdLKYAusuZJ93fpGiFwqK1jU9ISQC4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAmJzva9cFZdiLEeaXC4rLTZGH8DmrTetPfpmngrcaaNhUN
multiaddr [/ip4/8.218.174.108/tcp/30303/p2p/16Uiu2HAmJzva9cFZdiLEeaXC4rLTZGH8DmrTetPfpmngrcaaNhUN /dns4/boot-02.ac-cn-hongkong-c.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmJzva9cFZdiLEeaXC4rLTZGH8DmrTetPfpmngrcaaNhUN]
ip 8.218.174.108:30303

4 - NEW - enr:-Ne4QJKpiQqwYpo0p1yDW6opKFYzh801nhSzX65S_x892UXABVYzFBrdFwCPiWwXlKqVz5sXkTzYtUuX1wg2sW5DZnwBgmlkgnY0gmlwhCIfDu-KbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDIuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQJm8YcPIYhI5rvlLJJRlpebApk6w4uOLdFgAeHN2wO9N4N0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAm2MXB1WzsGKnYrcX8GRSvunQ1riJmPzVZuvUphM1YE4pn
multiaddr [/ip4/34.31.14.239/tcp/30303/p2p/16Uiu2HAm2MXB1WzsGKnYrcX8GRSvunQ1riJmPzVZuvUphM1YE4pn /dns4/boot-02.gc-us-central1-a.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAm2MXB1WzsGKnYrcX8GRSvunQ1riJmPzVZuvUphM1YE4pn]
ip 34.31.14.239:30303

5 - NEW - enr:-M24QAsRRxoLDnnXFGnbHGUKjtqgXOVxb2Cian1vegc1rtY0Yk5wXDF7NeBzPl7frvyxo3Vt-xSL0vUa2jazchNIS_oBgmlkgnY0gmlwhLKAj_GKbXVsdGlhZGRyc68ALTYoYm9vdC0wMi5kby1hbXMzLnNoYXJkcy50ZXN0LnN0YXR1c2ltLm5ldAZ2X4lzZWNwMjU2azGhAtsXOrELG9R5LlIbF6bqeLC0tg7bmNzQ0JkSmEO3zxqzg3RjcIJ2X4N1ZHCCIyiFd2FrdTIN
peerID 16Uiu2HAmAAuoviraBqSBcR5eC346RK46SruiPKdFQBvWrFjXEkLr
multiaddr [/ip4/178.128.143.241/tcp/30303/p2p/16Uiu2HAmAAuoviraBqSBcR5eC346RK46SruiPKdFQBvWrFjXEkLr /dns4/boot-02.do-ams3.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmAAuoviraBqSBcR5eC346RK46SruiPKdFQBvWrFjXEkLr]
ip 178.128.143.241:30303

6 - NEW - enr:-Ne4QIvHiMe1Gf7h22jygL1kPFVAcQ0RkDYNk1PNA52KUKElBSPuPy-HSD1pRX-rCx2A2Qqh0GtkzFUyL8NQEiL15P0BgmlkgnY0gmlwhAjaF0yKbXVsdGlhZGRyc7g4ADY2MWJvb3QtMDEuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXNpbS5uZXQGdl-Jc2VjcDI1NmsxoQM_sJtGT5gonA4UUzhn2d7LQY9ztY8loLAaSk1HKVruYIN0Y3CCdl-DdWRwgiMohXdha3UyDQ
peerID 16Uiu2HAmGwcE8v7gmJNEWFtZtojYpPMTHy2jBLL6xRk33qgDxFWX
multiaddr [/ip4/8.218.23.76/tcp/30303/p2p/16Uiu2HAmGwcE8v7gmJNEWFtZtojYpPMTHy2jBLL6xRk33qgDxFWX /dns4/boot-01.ac-cn-hongkong-c.shards.test.statusim.net/tcp/30303/p2p/16Uiu2HAmGwcE8v7gmJNEWFtZtojYpPMTHy2jBLL6xRk33qgDxFWX]
ip 8.218.23.76:30303

@richard-ramos

Doing more experiments with nwaku: if I remove the following lines of code

if shardPredOp.isSome():
  bootstrapRecords.keepIf(shardPredOp.get())

then, even though the nodes get added as bootnodes, this part of the code still filters them out, because the shard information in their ENRs is outdated: https://github.com/waku-org/nwaku/blob/master/waku/waku_discv5.nim#L248-L251

@SionoiS

SionoiS commented Oct 27, 2023

Looks like a Status problem, not really a Waku one. Bootnodes without common shards should be filtered out. The same goes for nodes found through discv5.

  • Could Status use double indirection in this case? Name -> DNS Disco. url -> ENRs
  • Build special purpose nodes that are only for bootstrapping and have them support all shards.
  • The Web3 way would be to use the Ethereum Name Service for bootnodes.

@jm-clius

I don't see this as a priority for nwaku, as it is mostly relevant to Community clients relying on bootstrap nodes with ever-changing shard subscriptions. The go-waku fix will suffice for Status Communities. @chair28980 should we remove the epic label?

@chair28980 chair28980 removed the E:Targeted Status Communities dogfooding See https://github.com/waku-org/pm/issues/97 for details label Nov 11, 2023
@richard-ramos

While @chaitanyaprem and I were dogfooding the shards.test fleet, we noticed that only a single store node was being discovered, which was strange because this fleet has 6 store nodes according to https://fleets.status.im/.

After investigating why this is happening, I found out that the store nodes are actually not discovered at all. In status-go we have something called the mailserver cycle, which automatically chooses a store node based on ping reply time, connects to it, and retrieves message history.

The reason the store nodes are not being discovered via discv5 is the DNS discovery URL configured for them: https://github.com/status-im/infra-shards/blob/710444384b18f78e94eef62d8ac91b1322f6d333/ansible/group_vars/store.yml#L45C107-L46C25

After inspecting the nodes returned by this DNS discovery URL with https://github.com/waku-org/test-discv5, I saw that their ENRs do not contain the shards we're interested in (for the reasons explained in this issue). We fixed this in go-waku: https://github.com/waku-org/go-waku/blob/d7249fc123d3a27e3eb60b85d58d4d7a51df64a1/waku/v2/discv5/discover.go#L403-L405 and we can discover other go-waku peers, but in nwaku the fix is not present:

let predicate = proc(record: waku_enr.Record): bool =
  record.getCapabilities().len > 0 and #RFC 31 requirement
  nodeShard.shardIds.anyIt(record.containsShard(nodeShard.clusterId, it)) #RFC 64 guideline

I think this means that the store nodes and boot nodes are not connected to each other. To avoid this situation, I think we should port this go-waku change to nwaku: https://github.com/waku-org/go-waku/blob/d7249fc123d3a27e3eb60b85d58d4d7a51df64a1/waku/v2/discv5/discover.go#L403-L405
