bug: discv5 not returning enough peers #2810
Comments
Thanks for submitting this @chaitanyaprem ! Answering some points:
Taking the following setup as a reference: Status-Desktop 2.28.1 ==> status-go 0.177.0 ==> go-waku commit dd81e1d469716328f05c05f0526de2adbcd9a4ea. On the other hand, ...which can lead to a wrong result in case of a mismatch: https://github.com/status-im/nim-eth/blob/d66a29db7ca4372dba116928f979e92cb7f7661f/eth/p2p/discoveryv5/encoding.nim#L350 So yes, it seems that might cause an issue.
We don't handle any cache in discv5, AFAIK. (cc @gabrielmer @richard-ramos)
nwaku in different nim.cfg files overrides that value to …
So this means that at least the discv5 protocol-id matches in both the Go and Nim implementations of Waku.
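To illustrate why the protocol-id matters: in discv5 v5.1 the protocol-id bytes sit in the (masked) packet header, and a receiver whose own protocol-id constant differs silently drops the packet. Below is a deliberately simplified sketch of that check — the AES-CTR header masking from the real wire format is omitted, and the alternate ID string `d5waku` is made up:

```go
package main

import (
	"bytes"
	"fmt"
)

// Simplified sketch of the discv5 protocol-id check. In the real v5.1
// wire format (see nim-eth's encoding.nim) the protocol-id bytes sit in
// the AES-CTR-masked packet header; after unmasking, a receiver whose
// own protocol-id constant differs silently drops the packet.

func encodeHeader(protocolID string, payload []byte) []byte {
	return append([]byte(protocolID), payload...)
}

func decodeHeader(expectedID string, packet []byte) ([]byte, error) {
	id := []byte(expectedID)
	if len(packet) < len(id) || !bytes.Equal(packet[:len(id)], id) {
		return nil, fmt.Errorf("protocol-id mismatch: packet dropped")
	}
	return packet[len(id):], nil
}

func main() {
	pkt := encodeHeader("discv5", []byte("FINDNODE"))
	if _, err := decodeHeader("discv5", pkt); err == nil {
		fmt.Println("same protocol-id: accepted")
	}
	// "d5waku" is a hypothetical alternate protocol-id, not a real constant.
	if _, err := decodeHeader("d5waku", pkt); err != nil {
		fmt.Println("different protocol-id: dropped")
	}
}
```

The key consequence is that a mismatch produces no error on either side — packets are just dropped, so the nodes never appear in each other's discovery results.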
Looking at https://github.com/status-im/nim-eth/blob/d66a29db7ca4372dba116928f979e92cb7f7661f/eth/p2p/discoveryv5/encoding.nim#L34-L35 and the whole file in general, I'm not finding another place that allows setting the value by passing a parameter. Reading the file and how all the logic is built by using …
From using https://github.com/waku-org/test-discv5 I can see that I found a lot of TWN nodes, or nodes with no …
@gabrielmer I was wondering if we can add an admin REST API to query and print the discv5 and px cache. Also, I am wondering if some helpful discv5 metrics could be added, such as new peers discovered (which can be plotted over time) and peers that are reachable, etc.
Yes! I think we can add an admin endpoint with any info that might help. For peer exchange, I understand that the cache you're referring to is the following:
For discv5 however, I'm not sure what you're referring to. We don't keep any internal cache; we just get random peers from discv5 and add them to the peer manager. We can either return the peers we have saved that came from discv5 as a source, or get into discv5's inner workings, such as its routing table (or some cache you might be referring to), and return them. Could you please point out more specifically which data structure you're referring to as the discv5 cache?
Yes! If you can please specify the metrics you find useful and their exact definitions, we can open an issue and get into it asap :))
discv5 uses a DHT internally, so there must be a local cache of nodes discovered so far. Maybe you can check the discv5 code to see if there is such a cache.
The DST team indicated a few items that may be useful; posting the Discord link.
Mmm, I might be mistaken, but I think what you might be referring to is the routing table.
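For reference, the "local cache" discv5 does keep is exactly that Kademlia routing table. A minimal sketch follows — the names and the 64-bit IDs are simplifications for illustration, not the nim-eth API: nodes live in k-buckets indexed by the XOR log-distance from the local node ID, each bucket capped at k entries.

```go
package main

import (
	"fmt"
	"math/bits"
)

// NodeID is narrowed to 64 bits for the sketch; real discv5 IDs are 256-bit.
type NodeID uint64

// RoutingTable is an illustrative stand-in for discv5's internal table:
// buckets of known nodes, indexed by log2 of the XOR distance from self.
type RoutingTable struct {
	self    NodeID
	buckets map[int][]NodeID
	k       int // max nodes per bucket
}

// logDistance returns the Kademlia log-distance between two IDs
// (0 means the IDs are identical).
func logDistance(a, b NodeID) int {
	return bits.Len64(uint64(a ^ b))
}

// Add inserts a node into its bucket, ignoring self and full buckets
// (the real implementation keeps replacement caches instead of dropping).
func (rt *RoutingTable) Add(id NodeID) {
	d := logDistance(rt.self, id)
	if d == 0 || len(rt.buckets[d]) >= rt.k {
		return
	}
	rt.buckets[d] = append(rt.buckets[d], id)
}

func main() {
	rt := &RoutingTable{self: 0b1000, buckets: map[int][]NodeID{}, k: 16}
	for _, id := range []NodeID{0b1001, 0b1010, 0b0001} {
		rt.Add(id)
	}
	// Each sample node lands in a different bucket (distances 1, 2, 4).
	fmt.Println(len(rt.buckets[1]), len(rt.buckets[2]), len(rt.buckets[4]))
}
```

This is the structure an admin endpoint would have to read if the thread's "discv5 cache" were interpreted as the routing table rather than the peer manager.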
In a nwaku pm session, we commented that @SionoiS could help us to understand better how Discv5 should work and whether we are using it properly or not |
A couple of points I'd like to make:
- The Status fleet uses cluster 16; what is its ratio to the total number of peers in the DHT? Total network size is around ~1k nodes, AFAIK.
- This problem is exactly why Waku uses a discovery network separate from Ethereum's. Discv5 is not built to find specific peers.
- Furthermore, the random walk is not even totally random; I suspect that some "paths" are more likely than others, which results in the same peers being returned even more often.

I'm afraid none of those problems are bugs.
Would it make sense for the status fleet to use a different protocol ID than …
No need, just don't connect to outside nodes. It may happen that nodes connect to the outside and "pollute" the status fleet discovery network with other nodes; in that case, yes, another protocol ID would prevent that.
Hm, I think this is already the case. We did see some nwaku nodes from other fleets (like waku.test).
I investigated a bit; here are my findings:
Strategies we could try by taking full control, but without modifying discv5, in the case of nwaku:
@chaitanyaprem - in a nwaku pm session we agreed to close this because we consider that discv5 is working correctly. @SionoiS analyzed discv5 and didn't see issues there.
Sure, thanks.
Problem
Many of the peers returned by discv5 have an `rs` value of 0, which indicates they are not part of the status cluster or TWN.
Impact
This doesn't give Status Desktop enough of a chance to form a healthy mesh, as most instances are only connected to fleet nodes. The peer count never goes beyond 15-20, which I think are all fleet peers.
More discussion regarding this: https://discord.com/channels/1110799176264056863/1239858809359306762/1250836260109422652
@richard-ramos wrote a discv5 test tool that constantly queries fleet nodes and prints the number of unique peers reported and their ENR, along with the `rs` and `rsv` values.
To reproduce
Run the tool https://github.com/waku-org/test-discv5
docker build -t discv5:latest .
docker run --rm discv5:latest --bootnodes="enr:-QEKuECA0zhRJej2eaOoOPddNcYr7-5NdRwuoLCe2EE4wfEYkAZhFotg6Kkr8K15pMAGyUyt0smHkZCjLeld0BUzogNtAYJpZIJ2NIJpcISnYxMvim11bHRpYWRkcnO4WgAqNiVib290LTAxLmRvLWFtczMuc2hhcmRzLnRlc3Quc3RhdHVzLmltBnZfACw2JWJvb3QtMDEuZG8tYW1zMy5zaGFyZHMudGVzdC5zdGF0dXMuaW0GAbveA4Jyc40AEAUAAQAgAEAAgAEAiXNlY3AyNTZrMaEC3rRtFQSgc24uWewzXaxTY8hDAHB8sgnxr9k8Rjb5GeSDdGNwgnZfg3VkcIIjKIV3YWt1Mg0,enr:-QEcuEAX6Qk-vVAoJLxR4A_4UVogGhvQrqKW4DFKlf8MA1PmCjgowL-LBtSC9BLjXbb8gf42FdDHGtSjEvvWKD10erxqAYJpZIJ2NIJpcIQI2hdMim11bHRpYWRkcnO4bAAzNi5ib290LTAxLmFjLWNuLWhvbmdrb25nLWMuc2hhcmRzLnRlc3Quc3RhdHVzLmltBnZfADU2LmJvb3QtMDEuYWMtY24taG9uZ2tvbmctYy5zaGFyZHMudGVzdC5zdGF0dXMuaW0GAbveA4Jyc40AEAUAAQAgAEAAgAEAiXNlY3AyNTZrMaEDP7CbRk-YKJwOFFM4Z9ney0GPc7WPJaCwGkpNRyla7mCDdGNwgnZfg3VkcIIjKIV3YWt1Mg0,enr:-QEcuEAgXDqrYd_TrpUWtn3zmxZ9XPm7O3GS6lV7aMJJOTsbOAAeQwSd_eoHcCXqVzTUtwTyB4855qtbd8DARnExyqHPAYJpZIJ2NIJpcIQihw1Xim11bHRpYWRkcnO4bAAzNi5ib290LTAxLmdjLXVzLWNlbnRyYWwxLWEuc2hhcmRzLnRlc3Quc3RhdHVzLmltBnZfADU2LmJvb3QtMDEuZ2MtdXMtY2VudHJhbDEtYS5zaGFyZHMudGVzdC5zdGF0dXMuaW0GAbveA4Jyc40AEAUAAQAgAEAAgAEAiXNlY3AyNTZrMaECxjqgDQ0WyRSOilYU32DA5k_XNlDis3m1VdXkK9xM6kODdGNwgnZfg3VkcIIjKIV3YWt1Mg0" > output.txt
Expected behavior
Peers stored should not be from a different cluster. Also, it is taking a lot of time for peers to be returned.
I ran the above tool for 15 minutes and only got 20 peers.
Whereas when I ran the peerExchangeClient against the same fleet nodes, I saw close to 80 unique peers returned in 2-3 minutes.
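The expected behavior above amounts to filtering discovered records by cluster before storing them. A sketch of that filter, assuming records have already been decoded: the `Record` struct is a stand-in for a decoded ENR (the real `rs` field is an RLP-encoded cluster ID plus shard information), and the cluster values used are illustrative.

```go
package main

import "fmt"

// Record is a simplified stand-in for a decoded ENR; Cluster holds the
// cluster ID taken from the "rs" field, with 0 meaning the field is absent.
type Record struct {
	ID      string
	Cluster uint16
}

// filterByCluster keeps only records advertising the expected cluster ID,
// so peers from other clusters (or with no rs field) are never stored.
func filterByCluster(recs []Record, cluster uint16) []Record {
	var out []Record
	for _, r := range recs {
		if r.Cluster == cluster {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	recs := []Record{
		{ID: "node-a", Cluster: 16}, // status cluster
		{ID: "node-b", Cluster: 1},  // TWN
		{ID: "node-c", Cluster: 0},  // no rs field
	}
	for _, r := range filterByCluster(recs, 16) {
		fmt.Println(r.ID) // prints node-a
	}
}
```

Note that filtering at storage time does not speed up discovery itself — it only prevents unusable peers from occupying peer-store slots.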
nwaku version/commit hash
v0.28.0-2-ga96a6b94
Additional things to verify/confirm in nwaku