Message propagation times with waku-rln #42
Comments
This analysis is awesome, thanks @alrevuelta!!
Shadow simulations behave as if the nodes had "infinitely fast CPU"; in other words, time only passes in the simulation when the nodes are doing network operations (or sleeping). That's the benefit and drawback of Shadow: it isn't CPU-bottlenecked, so you can simulate as many nodes as your RAM allows, but it can't take into account computation-induced delays.
You can also use swap, though that will of course slow the "real time" duration of the simulation.
More info here: shadow/shadow#1675. It seems there is actually a flag to enable time passing when doing CPU computations, but it would need more investigation.
Oh I see, thanks. This indeed has a great impact on the results. Will check that flag.
Can't find it in the docs. Is there anything to be configured, or will it just use my available swap? Thanks!
Weekly Update
Reran the simulations, taking rln CPU time into account (simulated). The report was updated.
Weekly Update
tldr: We present the results of 1000 `nwaku` nodes running `rln` with different message sizes, in a realistic network with bandwidth limitations and network delays. The goal is to study the message propagation delay distribution and how it is affected by i) rln and ii) message size in a real environment. We observe that for messages of `10kB` the average end-to-end propagation delay is `508 ms`. We also observe that message propagation delays are severely affected when increasing the message size, which indicates that it is not a good idea to use waku for messages of e.g. `500kB`. See simulation parameters.

Introduction
Waku uses relay as a routing protocol, which is an adaptation of gossipsub. It routes messages following a publisher/subscriber architecture, where nodes can publish messages or subscribe to topics. If message `m` is published to topic `t`, all `i` nodes `n_1...n_i` subscribed to `t` will get `m`. The `relay` protocol ensures that every node gets the messages of the topics it is subscribed to.

However, since `relay` works in a decentralized manner, all nodes contribute to the gossiping of a message until it has successfully reached all the interested nodes (those subscribed to it). This means that a message can travel multiple hops until it reaches all nodes. The number of hops determines the message propagation time, which is measured as the time difference between when a node published the message and when another node received it.

This issue aims to go from theory to practice, by i) understanding message propagation times in theory and ii) presenting nwaku simulation results in an end-to-end setup with rln, with real message propagation times.
Theory
Let's start with message propagation times in theory. At a high level, they depend on:

`D`, one of the most important parameters. It sets the number of hops a message travels to reach all nodes: the higher `D`, the fewer the hops and the lower the delay. Note that a higher `D` also implies more bandwidth consumption.

In a D-regular graph, like the one formed by waku nodes around a topic, the maximum number of hops a message can travel to reach all nodes can be calculated as `ceil(log(total_nodes)/log(D))`. For example, with 1000 nodes and `D=6`: `log(1000)/log(6) = 3.85`, rounded up to 4. So in a network with 1000 nodes and `D=6`, no matter which node publishes the message, in 4 hops it will reach all the nodes. Note that this is the "worst case", since some nodes might be directly connected to the publisher, so they will get the message in just 1 hop.
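The hop-count formula above can be sketched in a few lines (Python is used here purely for illustration; the function name is mine, not from the waku codebase):

```python
import math

def max_hops(total_nodes: int, d: int) -> int:
    """Worst-case number of hops for a message to reach every node
    in a D-regular gossip mesh: ceil(log(total_nodes) / log(D))."""
    return math.ceil(math.log(total_nodes) / math.log(d))

# The example from the text: 1000 nodes, D=6
print(max_hops(1000, 6))  # 4
```

Running it for the other `D` values mentioned later (4 to 12) shows how quickly the hop count, and therefore the delay, shrinks as `D` grows.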
But how long does it take to traverse each hop? It depends on per-hop factors such as the link latency, the message transmission time, and any processing delay at each node.
Assuming a message `m` that travels 4 hops from node `n1` (publisher) to `n5` (subscriber), we can calculate the message propagation time as `mpt = ipt_1 + ipt_2 + ipt_3 + ipt_4`, where `ipt` is the individual propagation time between each pair of nodes in the chain.

However, specific message propagation times are of limited use; we need average times under specific conditions. And for this, we need simulations.
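As a toy illustration of the sum above (the per-hop `ipt` values below are hypothetical, not measured):

```python
def message_propagation_time(hop_delays):
    """End-to-end propagation time is simply the sum of the
    individual per-hop propagation times ipt_i along the path."""
    return sum(hop_delays)

# Hypothetical per-hop times in seconds for a 4-hop path n1 -> n5;
# each ipt bundles link latency, transmission time, and processing.
ipt = [0.12, 0.13, 0.11, 0.14]
print(round(message_propagation_time(ipt), 3))  # 0.5
```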
Simulations
Using the shadow simulator, we have developed a tool that allows simulating message propagation delays of `nwaku` (using a slightly modified branch, mainly to instrument it with tools to measure the times, and starting from an already connected mesh). Thanks @Menduist for the help. Note that running this simulation requires a significant amount of resources; it was done with 256 GB of RAM.

The configuration of the simulation is (see config):
- `latency=100ms`. Average latency in our current waku network. Thanks @vpavlin for the measurements. See this for live data.
- `down_bandwidth=83Mbps`, `up_bandwidth=38Mbps`. As shown in Table 2, that's the worldwide median speed.
- `D=6`, which is the current `nwaku` configuration.
- `nodes=1000`. Number of nodes used in the simulation.
- `nwaku` was used with a minor modification.
- `10` messages were published, which led to `9990` received messages.
- Since `shadow` doesn't take CPU time into account (for now), we simulate it with `sleepAsync` as per RLN Key Benchmarks #23 findings: `0.012 seconds` for proof verification and `0.15 seconds` for proof generation.

Results
The following figure shows the message propagation time with real simulations, showing the distribution in a network with the above configuration for three different message sizes: `10kB`, `100kB`, `500kB`. Note that the whiskers indicate the best/worst values and the box contains the P25 to P75 values. The average `mu` and P95 are also shown. Raw data here.

Important note: the first messages sent in the simulations are omitted, since they show an abnormal propagation delay that doesn't reflect reality. This is due to how flow control works in TCP: right after connecting, the sender node has no idea of the "bandwidth" of the receiver node, so it starts sending packets at a lower rate. This translates into high transmission times, and it's more pronounced when dealing with big message sizes. In other words, on a 100Mbps link, 100Mbits won't be sent in 1 second, or at least not at the beginning, while the node is slowly increasing the rate based on the ACK/NACK ratio. This is explained in more detail here.
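A rough, hypothetical model of this ramp-up effect (the MSS and initial window below are common TCP defaults, not values taken from the simulation, and losses/receiver windows are ignored):

```python
def slow_start_rtts(message_bytes: int, mss: int = 1460,
                    init_cwnd_segments: int = 10) -> int:
    """Toy model of TCP slow start: the congestion window starts
    small and doubles every RTT, so a fresh connection is paced by
    the window rather than the link bandwidth. Returns how many
    RTTs are needed to push `message_bytes` through a cold
    connection under this idealized model."""
    cwnd = init_cwnd_segments * mss  # initial window in bytes
    sent, rtts = 0, 0
    while sent < message_bytes:
        sent += cwnd
        cwnd *= 2  # exponential growth phase
        rtts += 1
    return rtts

# A 10 kB message fits in the initial window, while 500 kB needs
# several round trips before the window is large enough:
print(slow_start_rtts(10_000))   # 1
print(slow_start_rtts(500_000))  # 6
```

This is why the first messages on a cold connection show inflated delays, and why the effect is much more visible at `500kB` than at `10kB`.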
Conclusions:
- For `10kB` messages the average propagation delay is `508 ms`, quite reasonable for applications using waku. The variance is acceptable, with 95% of the messages arriving in `<627 ms`.
- For `10kB` messages the best-case propagation delay is `263 ms`. This corresponds to the nodes that are just 1 hop from the publisher. The proof generation time of `0.15 seconds` contributes the most; the rest is the inter-node latency and the transmission of the message itself.
- Propagation delays degrade notably for `100kB` and `500kB` messages, so it's probably not a good idea to use waku for such big messages. Note that these simulations had 1000 nodes; if we scale to 10000 or beyond, propagation times would be worse.

Future work:
- Different `D` values (average of 6, ranging from 4 to 12) have a huge impact on the bandwidth a node consumes. Are we willing to lower `D` in order to reduce bandwidth but increase message propagation times?
- `shadow` doesn't take CPU time into account; it's currently simulated for rln, which should be the biggest bottleneck. Once `shadow` has this feature, the times will be more accurate.