bug/regression: relayed messages reach recently started peer with a big delay (~60 seconds) #2388

fbarbu15 · 2024-02-01T10:38:43Z

Problem

This is probably only noticeable in automated tests, where we start the node before each test and stop it at the end.
But the impact is big: the suite execution duration nearly tripled.

Also, this might uncover a bigger issue.

To reproduce

Start 2 relay nodes connected to the same topic
Immediately after they start, publish a message from node1 (with POST /relay/v1/messages)
Check that the relayed message reaches both nodes (with GET /relay/v1/messages)
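
The steps above can be timed with a small polling helper. This is only a sketch: `measure_delivery_delay` and its parameters are hypothetical names, and the `publish` and `fetch` callables are assumed to wrap the POST /relay/v1/messages call on node1 and the GET /relay/v1/messages call on node2. The exact path parameters and request body are not shown in this issue, so they are left to the caller.

```python
import time
from typing import Callable, Optional


def measure_delivery_delay(
    publish: Callable[[], None],
    fetch: Callable[[], list],
    timeout_s: float = 90.0,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Publish once, then poll until fetch() returns a non-empty list.

    publish: hypothetical wrapper around the publish call on node1.
    fetch:   hypothetical wrapper around the read call on node2.
    Returns the elapsed seconds, or None if nothing arrived in timeout_s.
    """
    publish()
    start = clock()
    while clock() - start < timeout_s:
        if fetch():
            return clock() - start
        sleep(poll_interval_s)
    return None
```

Injecting `clock` and `sleep` keeps the helper testable without real waiting. With the default 90 second timeout it would cover the roughly 60 second delay reported here.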

Actual behavior

Node1 sees the messages immediately; node2, however, takes about 60 seconds to see the messages published by node1.

nwaku version/commit hash

This can be seen with harbor.status.im/wakuorg/nwaku:latest but doesn't reproduce with harbor.status.im/wakuorg/nwaku:v0.24.0 (where it takes around 15 seconds) nor with harbor.status.im/wakuorg/go-waku:latest (where it takes just 1 second).
Added docker logs for these versions for comparison purposes:

go-waku_latest_1_second.zip
nwaku_latest_60_seconds.zip
nwaku_v0.24.0_12_seconds.zip

Docker start flags

node1: ['--listen-address=0.0.0.0', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=TRACE', '--rest-relay-cache-capacity=100', '--websocket-port=14657', '--rest-port=14655', '--tcp-port=14656', '--discv5-udp-port=14658', '--rest-address=0.0.0.0', '--nat=extip:172.18.94.230', '--peer-exchange=true', '--discv5-discovery=true', '--cluster-id=0', '--metrics-server=true', '--metrics-server-address=0.0.0.0', '--metrics-server-port=14659', '--metrics-logging=true', '--relay=true', '--nodekey=30348dd51465150e04a5d9d932c72864c8967f806cce60b5d26afeca1e77eb68']
node2: ['--listen-address=0.0.0.0', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=TRACE', '--rest-relay-cache-capacity=100', '--websocket-port=16322', '--rest-port=16320', '--tcp-port=16321', '--discv5-udp-port=16323', '--rest-address=0.0.0.0', '--nat=extip:172.18.194.133', '--peer-exchange=true', '--discv5-discovery=true', '--cluster-id=0', '--metrics-server=true', '--metrics-server-address=0.0.0.0', '--metrics-server-port=16324', '--metrics-logging=true', '--relay=true', '--discv5-bootstrap-node=enr:-Kq4QNtbZhpyehoDXnigU0Si_Hr1g-dVvNx-AnQ-UvdoygcbBJxRhNIwLldG_8g2cOEQrpXdc_fwJh_HEyXbOgcVlKkBgmlkgnY0gmlwhKwSXuaKbXVsdGlhZGRyc4wACgSsEl7mBjlB3QOJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CCOUCDdWRwgjlChXdha3UyAQ']


gabrielmer · 2024-02-13T16:04:13Z

I really suspect it's related to this #2332 (comment)

fbarbu15 · 2024-02-13T16:22:57Z

I really suspect it's related to this #2332 (comment)

Yes, the timing is right: it started to reproduce the day that PR was merged. Thanks @gabrielmer

gabrielmer · 2024-02-14T07:53:59Z

@SionoiS what do you think we should do?

SionoiS · 2024-02-14T12:51:05Z

The issue is that the connectivity check interval was changed from 15s to 60s.

After adding a peer to the manager, you have to wait up to 60s for the next check to connect to this new peer.

To speed up the tests, add the peers to the peer store before starting the node, or force the connection manually instead of waiting on the peer manager. That would make the tests even faster than with the previous 15s wait.
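
The effect of the interval change can be illustrated with a toy model. This is a sketch under a loud assumption: it treats the connectivity check as firing at fixed multiples of the interval counted from node start, which is a simplification and not nwaku's actual scheduling. `wait_until_next_check` is a hypothetical name; only the 15s and 60s values come from the comment above.

```python
import math


def wait_until_next_check(added_at_s: float, interval_s: float) -> float:
    """Seconds a newly added peer waits for the next periodic check.

    Simplified model (assumption): checks fire at t = 0, interval, 2*interval, ...
    measured from node start. A peer added exactly on a tick waits 0 seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    next_tick = math.ceil(added_at_s / interval_s) * interval_s
    return next_tick - added_at_s


# A peer added 1 second after start, under the old and new intervals.
old_wait = wait_until_next_check(1.0, 15.0)  # 14.0 seconds
new_wait = wait_until_next_check(1.0, 60.0)  # 59.0 seconds
```

Under this model a peer discovered right after startup waits almost the full interval, which is consistent with the roughly 15 second and 60 second delays reported in this issue.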

fbarbu15 · 2024-02-14T12:53:22Z

@SionoiS thanks for the explanation. How do we force the connection manually?

SionoiS · 2024-02-14T12:57:24Z

Call either

nwaku/waku/node/peer_manager/peer_manager.nim

Line 660 in d00065e

proc connectToRelayPeers*(pm: PeerManager) {.async.} =

Or

nwaku/waku/node/peer_manager/peer_manager.nim

Line 679 in d00065e

proc manageRelayPeers*(pm: PeerManager) {.async.} =

for the new shard aware peer management.

fbarbu15 · 2024-02-14T13:03:11Z

The tests use nwaku as a docker container, but I could use this one instead: https://waku-org.github.io/waku-rest-api/#post-/admin/v1/peers Is the result the same?

SionoiS · 2024-02-14T13:08:19Z

The tests use nwaku as a docker container, but I could use this one instead: https://waku-org.github.io/waku-rest-api/#post-/admin/v1/peers Is the result the same?

Ah, sorry, I misunderstood the context. In that case, yes, that endpoint would connect to the peer directly, bypassing the peer management.
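
A minimal sketch of that workaround from a test harness, using only the Python standard library. Assumptions, none of which are confirmed in this thread: the endpoint accepts a JSON array of multiaddress strings as its body, and the node was started with `--rest-admin=true` (as in the flags above). `build_connect_request` and `connect_peers` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Sequence


def build_connect_request(
    rest_base_url: str, peer_multiaddrs: Sequence[str]
) -> urllib.request.Request:
    """Build a POST /admin/v1/peers request.

    Assumption: the body is a JSON array of multiaddr strings.
    """
    body = json.dumps(list(peer_multiaddrs)).encode("utf-8")
    return urllib.request.Request(
        url=rest_base_url.rstrip("/") + "/admin/v1/peers",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def connect_peers(
    rest_base_url: str, peer_multiaddrs: Sequence[str], timeout_s: float = 10.0
) -> int:
    """Send the request and return the HTTP status code."""
    req = build_connect_request(rest_base_url, peer_multiaddrs)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status
```

In the setup from this issue, the test could call `connect_peers` against node2's REST port (16320) with node1's multiaddress, built from its external IP and TCP port (172.18.94.230, 14656) plus its peer ID, right after both containers start. The peer ID is not given in this issue and would have to be read from node1 first.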

fbarbu15 · 2024-02-14T13:08:49Z

Great, thanks for the workaround. Closing the issue

fbarbu15 added the bug label Feb 1, 2024

romanzac mentioned this issue Feb 14, 2024

bug: message not delivered during interop test #2369

Open

fbarbu15 closed this as completed Feb 14, 2024
