Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: wakunode2 systemd unit restarts about 10-15x per day #2173

Open
calinah opened this issue Oct 31, 2023 · 6 comments
Open

bug: wakunode2 systemd unit restarts about 10-15x per day #2173

calinah opened this issue Oct 31, 2023 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@calinah
Copy link

calinah commented Oct 31, 2023

Problem

We are seeing a lot of restarts for the systemd unit running wakunode2 with version 0.20.0

To reproduce

Our systemd unit definition:

[Unit]
Description=Waku2 Node
After=network.target
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Type=simple
Restart=on-failure
RestartSec=5
TimeoutSec=900
User=root
Nice=0
LimitNOFILE=200000
WorkingDirectory=/root/nwaku
ExecStart=/root/nwaku/build/wakunode2 \
	--relay=true \
    --topic=/waku/2/graphcast-v0-testnet/proto \
    --store=false \
    --tcp-port=31900 \
    --nat=extip:$IP \
    --nodekey=$NODE_KEY \
    --rest=true \
    --rest-address=0.0.0.0 \
    --dns-discovery=true \
    --dns-discovery-url=enrtree://$KEY@testnet.bootnodes.graphcast.xyz  \
    --rpc=true \
    --rpc-address=0.0.0.0 
    --rpc-admin=true \
    --metrics-server-address=0.0.0.0 \
    --metrics-server=true \
    --pubsub-topic='/waku/2/graphcast-v0-testnet/proto'     --relay-peer-exchange=true \
    --discv5-discovery=true \
    --discv5-bootstrap-node=enr:$ENR \
    --filter=true \
    --lightpush=true \
    --log-level=DEBUG \
    --websocket-secure-support=true \
    --websocket-secure-key-path=/etc/letsencrypt/live/waku-0.testnet.bootnodes.graphcast.xyz/privkey.pem \
    --websocket-secure-cert-path=/etc/letsencrypt/live/waku-0.testnet.bootnodes.graphcast.xyz/fullchain.pem

KillSignal=SIGHUP

[Install]
WantedBy=multi-user.target

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots/logs

wakunode2 runs for a while, is connected to peers and eventually starts failing with the following:

Oct 31 09:11:01 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:01.320+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*rtebTB requestId=27157f48d7b4cb7406a>
Oct 31 09:11:01 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:01.393+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*rtebTB requestId=5b0c5b95afa6ce73fc8>
Oct 31 09:11:01 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:01.435+00:00 Unexpected bytes while waiting for EOF     topics="libp2p lpstream" tid=606926 file=lpstream.nim:351 s=6540c4a42cfa009143ec38c3
Oct 31 09:11:01 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:01.733+00:00 Unexpected bytes while waiting for EOF     topics="libp2p lpstream" tid=606926 file=lpstream.nim:351 s=6540c4a52cfa009143ec38c9
Oct 31 09:11:01 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:01.781+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*rtebTB requestId=fd9442c76b2aad49dc8>
Oct 31 09:11:01 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:01.783+00:00 Unexpected bytes while waiting for EOF     topics="libp2p lpstream" tid=606926 file=lpstream.nim:351 s=6540c4a52cfa009143ec38ca
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.242+00:00 Unexpected bytes while waiting for EOF     topics="libp2p lpstream" tid=606926 file=lpstream.nim:351 s=6540c4a62cfa009143ec38cc
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.326+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*rtebTB requestId=82a3bd58904935c53cf>
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.505+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*rtebTB requestId=a489fc067e586a8f4ac>
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.509+00:00 AsyncStream Error                          topics="libp2p wstransport" tid=606926 file=wstransport.nim:276 exc="Incomplete data sent or received"
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.509+00:00 Unable to get a connection                 topics="libp2p switch" tid=606926 file=switch.nim:262
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.509+00:00 About to accept incoming connection        topics="libp2p switch" tid=606926 file=switch.nim:245
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.509+00:00 unsubscribing pubsub peer                  topics="libp2p pubsub" tid=606926 file=pubsub.nim:135 peerId=16U*GbKJEL
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.510+00:00 Failed to handle connection                topics="libp2p tcptransport" tid=606926 file=tcptransport.nim:241 exc="(107) Transport endpoint is not >
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.510+00:00 Unable to get a connection                 topics="libp2p switch" tid=606926 file=switch.nim:262
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.510+00:00 About to accept incoming connection        topics="libp2p switch" tid=606926 file=switch.nim:245
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.513+00:00 exiting pubsub read loop                   topics="libp2p pubsubpeer" tid=606926 file=pubsubpeer.nim:166 conn=16U*GbKJEL:6540c3402cfa009143ec2f7a >
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.513+00:00 exiting pubsub read loop                   topics="libp2p pubsubpeer" tid=606926 file=pubsubpeer.nim:166 conn=16U*GbKJEL:6540c3402cfa009143ec2f78 >
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.514+00:00 Could not establish send connection        topics="libp2p pubsubpeer" tid=606926 file=pubsubpeer.nim:221 msg="Couldn't get muxed stream"
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.514+00:00 AsyncStream Error                          topics="libp2p wstransport" tid=606926 file=wstransport.nim:276 exc="Incomplete data sent or received"
Oct 31 09:11:02 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:11:02.514+00:00 Unable to get a connection
.
.
.
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: INF 2023-10-31 09:12:12.109+00:00 added filter subscritpiton                 topics="waku filter" tid=606926 file=protocol.nim:80 peerId=16U*m9VE7w pubsubTopic=/waku/2/graphcast-v0>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.192+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*shdmWU requestId=cc4446ea8c686f27b6e>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.192+00:00 No peers for topic, skipping publish       topics="libp2p gossipsub" tid=606926 file=gossipsub.nim:538 peersOnTopic=0 connectedPeers=0 topic=/waku>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.216+00:00 Unexpected bytes while waiting for EOF     topics="libp2p lpstream" tid=606926 file=lpstream.nim:351 s=6540c4ec2cfa009143ec3b69
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.290+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*shdmWU requestId=6e5e60cd19edc93924a>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.290+00:00 No peers for topic, skipping publish       topics="libp2p gossipsub" tid=606926 file=gossipsub.nim:538 peersOnTopic=0 connectedPeers=0 topic=/waku>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.438+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*shdmWU requestId=c7071688d5ba88f4783>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.438+00:00 No peers for topic, skipping publish       topics="libp2p gossipsub" tid=606926 file=gossipsub.nim:538 peersOnTopic=0 connectedPeers=0 topic=/waku>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.449+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*shdmWU requestId=c77a32a9ca931c2dfac>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.449+00:00 No peers for topic, skipping publish       topics="libp2p gossipsub" tid=606926 file=gossipsub.nim:538 peersOnTopic=0 connectedPeers=0 topic=/waku>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.546+00:00 push request                               topics="waku lightpush" tid=606926 file=protocol.nim:57 peerId=16U*shdmWU requestId=1f775d5aac4b8fa2133>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: DBG 2023-10-31 09:12:12.546+00:00 No peers for topic, skipping publish       topics="libp2p gossipsub" tid=606926 file=gossipsub.nim:538 peersOnTopic=0 connectedPeers=0 topic=/waku>
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/apps/wakunode2/wakunode2.nim(144) wakunode2
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(1245) runForever
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(269) poll
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nim-chronos/chronos/asyncfutures2.nim(373) futureContinue
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/iterators.nim(240) handleMessage
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/assertions.nim(38) failedAssertImpl
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/assertions.nim(28) raiseAssert
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(54) sysFatal
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: [[reraised from:
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/apps/wakunode2/wakunode2.nim(144) wakunode2
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(1245) runForever
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(269) poll
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: /root/nwaku/vendor/nim-chronos/chronos/asyncfutures2.nim(395) futureContinue
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: ]]
Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: Error: unhandled exception: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/iterators.nim(240, 11) `len(a) == L` the length of the seq changed while iterating over it >
Oct 31 09:12:12 nwaku-fsn2 systemd[1]: nwaku.service: Main process exited, code=exited, status=1/FAILURE
Oct 31 09:12:12 nwaku-fsn2 systemd[1]: nwaku.service: Failed with result 'exit-code'.
Oct 31 09:12:12 nwaku-fsn2 systemd[1]: nwaku.service: Consumed 1min 47.452s CPU time.
Oct 31 09:12:17 nwaku-fsn2 systemd[1]: nwaku.service: Scheduled restart job, restart counter is at 5.
Oct 31 09:12:17 nwaku-fsn2 systemd[1]: Stopped Waku2 Node.
Oct 31 09:12:17 nwaku-fsn2 systemd[1]: nwaku.service: Consumed 1min 47.452s CPU time.
Oct 31 09:12:17 nwaku-fsn2 systemd[1]: Started Waku2 Node.

Additionally the systemd unit tries restarting and quite often is fails restarting with the following:

 Oct 31 09:12:23 nwaku-fsn2 wakunode2[608759]: ERR 2023-10-31 09:12:23.172+00:00 5/7 Starting node and protocols failed     topics="wakunode main" tid=608759 file=wakunode2.nim:92 error="failed to start waku node: (98) Address >
Oct 31 09:12:23 nwaku-fsn2 systemd[1]: nwaku.service: Main process exited, code=exited, status=1/FAILURE
Oct 31 09:12:23 nwaku-fsn2 systemd[1]: nwaku.service: Failed with result 'exit-code'.
Oct 31 09:12:28 nwaku-fsn2 systemd[1]: nwaku.service: Scheduled restart job, restart counter is at 7.
Oct 31 09:12:28 nwaku-fsn2 systemd[1]: Stopped Waku2 Node.
@calinah calinah added the bug Something isn't working label Oct 31, 2023
@chair28980 chair28980 assigned chair28980 and vpavlin and unassigned chair28980 Oct 31, 2023
@calinah
Copy link
Author

calinah commented Nov 1, 2023

@vpavlin as discussed this am I've upgraded the nodes to v0.21.1 and after a few hours of running it restarted again with the following error:

Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: ryXvCZE4a12hH4VTcterTFPL1WMnn/proto\", \"/subgraph-radio-pillar-mainnet/0/QmPmmr3BzNdet6DMsqgVcxV9WFGXdtcQdmrR5qs9GoQ3iF/proto\", \"/subgraph-radio-pillar-m>
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/apps/wakunode2/wakunode2.nim(144) wakunode2
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(1245) runForever
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(269) poll
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nim-chronos/chronos/asyncfutures2.nim(373) futureContinue
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/iterators.nim(240) handleMessage
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/assertions.nim(38) failedAssertImpl
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/assertions.nim(28) raiseAssert
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(54) sysFatal
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: [[reraised from:
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/apps/wakunode2/wakunode2.nim(144) wakunode2
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(1245) runForever
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nim-chronos/chronos/asyncloop.nim(269) poll
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/vendor/nim-chronos/chronos/asyncfutures2.nim(395) futureContinue
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: ]]
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: Error: unhandled exception: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/iterators.nim(240, 11) `len(a) == L` the length of the seq changed >
Oct 31 16:25:19 nwaku-hil2 systemd[1]: nwaku.service: Main process exited, code=exited, status=1/FAILURE
Oct 31 16:25:19 nwaku-hil2 systemd[1]: nwaku.service: Failed with result 'exit-code'.
Oct 31 16:25:19 nwaku-hil2 systemd[1]: nwaku.service: Consumed 25.543s CPU time.
Oct 31 16:25:24 nwaku-hil2 systemd[1]: nwaku.service: Scheduled restart job, restart counter is at 2.
Oct 31 16:25:24 nwaku-hil2 systemd[1]: Stopped Waku2 Node.
Oct 31 16:25:24 nwaku-hil2 systemd[1]: nwaku.service: Consumed 25.543s CPU time.
Oct 31 16:25:24 nwaku-hil2 systemd[1]: Started Waku2 Node.
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.732+00:00 Finding peers using Waku DNS discovery     topics="waku dnsdisc" tid=651244 file=waku_dnsdisc.nim:51
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.746+00:00 Successfully discovered ENR                topics="waku dnsdisc" tid=651244 file=waku_dnsdisc.nim:65 count=2
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.746+00:00 Successfully discovered nodes              topics="waku dnsdisc" tid=651244 file=waku_dnsdisc.nim:82 count=2
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 Initializing networking                    tid=651244 file=waku_node.nim:140 addrs="@[/ip4/5.78.81.99/tcp/31900, /ip4/5.78>
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting relay protocol                    topics="waku node" tid=651244 file=waku_node.nim:313
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 relay mounted successfully                 topics="waku node" tid=651244 file=waku_node.nim:332
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting rendezvous discovery protocol     topics="waku node" tid=651244 file=waku_node.nim:844
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting libp2p ping protocol              topics="waku node" tid=651244 file=waku_node.nim:802
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting store client                      topics="waku node" tid=651244 file=waku_node.nim:603
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting light push                        topics="waku node" tid=651244 file=waku_node.nim:659
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting filter protocol                   topics="waku node" tid=651244 file=waku_node.nim:342
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 Starting discovery v5 service              topics="waku discv5" tid=651244 file=waku_discv5.nim:268
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 Discv5: discoverable ENR                   topics="waku discv5" tid=651244 file=waku_discv5.nim:282 enr=enr:-Kq4QDaEzgGgyM>
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 Starting discovery v5 search               topics="waku discv5" tid=651244 file=waku_discv5.nim:247
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 Starting Waku node                         topics="waku node" tid=651244 file=waku_node.nim:859 version=v0.20.0
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 PeerInfo                                   topics="waku node" tid=651244 file=waku_node.nim:862 peerId=16U*nyqqcH addrs=@[]
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 Listening on                               topics="waku node" tid=651244 file=waku_node.nim:869 full=[/ip4/5.78.81.99/tcp/>
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 DNS: discoverable ENR                      topics="waku node" tid=651244 file=waku_node.nim:870 enr=enr:-Kq4QDaEzgGgyMKZdR>
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 starting relay protocol                    topics="waku node" tid=651244 file=waku_node.nim:282
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 relay started successfully                 topics="waku node" tid=651244 file=waku_node.nim:303
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 Stopping AutonatService                    topics="libp2p autonatservice" tid=651244 file=service.nim:203
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: WRN 2023-10-31 16:25:24.749+00:00 service is already stopped                 topics="libp2p switch" tid=651244 file=switch.nim:86
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting libp2p ping protocol              topics="waku node" tid=651244 file=waku_node.nim:802
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting store client                      topics="waku node" tid=651244 file=waku_node.nim:603
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting light push                        topics="waku node" tid=651244 file=waku_node.nim:659
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 mounting filter protocol                   topics="waku node" tid=651244 file=waku_node.nim:342
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 Starting discovery v5 service              topics="waku discv5" tid=651244 file=waku_discv5.nim:268
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 Discv5: discoverable ENR                   topics="waku discv5" tid=651244 file=waku_discv5.nim:282 enr=enr:-Kq4QDaEzgGgyM>
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.748+00:00 Starting discovery v5 search               topics="waku discv5" tid=651244 file=waku_discv5.nim:247
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 Starting Waku node                         topics="waku node" tid=651244 file=waku_node.nim:859 version=v0.20.0
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 PeerInfo                                   topics="waku node" tid=651244 file=waku_node.nim:862 peerId=16U*nyqqcH addrs=@[]
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 Listening on                               topics="waku node" tid=651244 file=waku_node.nim:869 full=[/ip4/5.78.81.99/tcp/>
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 DNS: discoverable ENR                      topics="waku node" tid=651244 file=waku_node.nim:870 enr=enr:-Kq4QDaEzgGgyMKZdR>
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 starting relay protocol                    topics="waku node" tid=651244 file=waku_node.nim:282
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 relay started successfully                 topics="waku node" tid=651244 file=waku_node.nim:303
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 Stopping AutonatService                    topics="libp2p autonatservice" tid=651244 file=service.nim:203
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: WRN 2023-10-31 16:25:24.749+00:00 service is already stopped                 topics="libp2p switch" tid=651244 file=switch.nim:86
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: INF 2023-10-31 16:25:24.749+00:00 Stopping AutonatService                    topics="libp2p autonatservice" tid=651244 file=service.nim:203
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: WRN 2023-10-31 16:25:24.749+00:00 service is already stopped                 topics="libp2p switch" tid=651244 file=switch.nim:86
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: WRN 2023-10-31 16:25:24.749+00:00 Stopping relay without starting it         topics="libp2p relay" tid=651244 file=relay.nim:373
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: WRN 2023-10-31 16:25:24.749+00:00 Stopping rendezvous without starting it    topics="libp2p discovery rendezvous" tid=651244 file=rendezvous.nim:688
Oct 31 16:25:24 nwaku-hil2 wakunode2[651244]: ERR 2023-10-31 16:25:24.749+00:00 5/7 Starting node and protocols failed     topics="wakunode main" tid=651244 file=wakunode2.nim:92 error="failed to start >
Oct 31 16:25:24 nwaku-hil2 systemd[1]: nwaku.service: Main process exited, code=exited, status=1/FAILURE
Oct 31 16:25:24 nwaku-hil2 systemd[1]: nwaku.service: Failed with result 'exit-code'.
Oct 31 16:25:29 nwaku-hil2 systemd[1]: nwaku.service: Scheduled restart job, restart counter is at 3.
Oct 31 16:25:29 nwaku-hil2 systemd[1]: Stopped Waku2 Node.

@vpavlin
Copy link
Member

vpavlin commented Nov 1, 2023

Thanks for verifying! We'll start digging then:) Can you share more logs before the error kicks in?

@calinah
Copy link
Author

calinah commented Nov 1, 2023

@vpavlin here's more logs - everything seems completely healthy before the failure:

Oct 31 16:24:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:12.749+00:00 Total spam messages                        topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:79 count=0.0
Oct 31 16:24:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:12.749+00:00 Total invalid messages                     topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:80 count=0.0
Oct 31 16:24:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:12.749+00:00 Total valid messages                       topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:81 count=0.0
Oct 31 16:24:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:12.749+00:00 Total errors                               topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:82 count=0.0
Oct 31 16:24:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:12.749+00:00 Total proofs verified                      topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:83 count=0.0
Oct 31 16:24:13 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:13.619+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"a349cd07>
Oct 31 16:24:16 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:16.177+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"cacedf7d>
Oct 31 16:24:18 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:18.628+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"d84ffee9>
Oct 31 16:24:21 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:21.176+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"a79816d5>
Oct 31 16:24:23 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:23.619+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"87779158>
Oct 31 16:24:26 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:26.180+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"f6e84570>
Oct 31 16:24:28 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:28.623+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"f3b4e4cb>
Oct 31 16:24:31 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:31.177+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"a904c938>
Oct 31 16:24:33 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:33.620+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"30d74254>
Oct 31 16:24:36 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:36.182+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"b7e4efb5>
Oct 31 16:24:38 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:38.628+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"d09144f9>
Oct 31 16:24:41 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:41.180+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"b73fe382>
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total connections initiated                topics="waku node metrics" tid=650599 file=waku_metrics.nim:53 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total messages                             topics="waku node metrics" tid=650599 file=waku_metrics.nim:54 count=83.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total store peers                          topics="waku node metrics" tid=650599 file=waku_metrics.nim:55 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total peer exchange peers                  topics="waku node metrics" tid=650599 file=waku_metrics.nim:56 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total lightpush peers                      topics="waku node metrics" tid=650599 file=waku_metrics.nim:57 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total filter peers                         topics="waku node metrics" tid=650599 file=waku_metrics.nim:58 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total active filter subscriptions          topics="waku node metrics" tid=650599 file=waku_metrics.nim:59 count=30.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total errors                               topics="waku node metrics" tid=650599 file=waku_metrics.nim:60 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total messages                             topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:78 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total spam messages                        topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:79 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total invalid messages                     topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:80 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total valid messages                       topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:81 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total errors                               topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:82 count=0.0
Oct 31 16:24:42 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:42.750+00:00 Total proofs verified                      topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:83 count=0.0
Oct 31 16:24:43 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:43.621+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"1f9602ec>
Oct 31 16:24:46 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:46.185+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"01ab9ecc>
Oct 31 16:24:47 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:47.492+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"be2d619a>
Oct 31 16:24:48 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:48.619+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"fa1ec40c>
Oct 31 16:24:51 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:51.180+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"92590aae>
Oct 31 16:24:53 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:53.619+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"22c4f3e0>
Oct 31 16:24:56 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:56.181+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"40b5db2f>
Oct 31 16:24:56 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:56.458+00:00 Dialing multiple peers                     topics="waku node peer_manager" tid=650599 file=peer_manager.nim:520 numOfPeers=2
Oct 31 16:24:56 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:56.458+00:00 Finished dialing multiple peers            topics="waku node peer_manager" tid=650599 file=peer_manager.nim:533 successfulConns=0 attempted=2
Oct 31 16:24:58 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:24:58.621+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"dde62ecb>
Oct 31 16:25:01 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:01.181+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"1e163c61>
Oct 31 16:25:02 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:02.743+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"414e724f>
Oct 31 16:25:03 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:03.637+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"dc6c7d40>
Oct 31 16:25:06 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:06.177+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"e2ffa5eb>
Oct 31 16:25:06 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:06.574+00:00 added filter subscritpiton                 topics="waku filter" tid=650599 file=protocol.nim:80 peerId=16U*E1mkaH pubsubTopic=/waku/2/graphcast-v0>
Oct 31 16:25:08 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:08.625+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"f231a196>
Oct 31 16:25:11 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:11.181+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"ee8db722>
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.668+00:00 Relay peer connections                     topics="waku node peer_manager" tid=650599 file=peer_manager.nim:685 inRelayConns=30/27 outRelayConns=5>
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.751+00:00 Total connections initiated                topics="waku node metrics" tid=650599 file=waku_metrics.nim:53 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.751+00:00 Total messages                             topics="waku node metrics" tid=650599 file=waku_metrics.nim:54 count=83.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.751+00:00 Total store peers                          topics="waku node metrics" tid=650599 file=waku_metrics.nim:55 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.751+00:00 Total peer exchange peers                  topics="waku node metrics" tid=650599 file=waku_metrics.nim:56 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total lightpush peers                      topics="waku node metrics" tid=650599 file=waku_metrics.nim:57 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total filter peers                         topics="waku node metrics" tid=650599 file=waku_metrics.nim:58 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total active filter subscriptions          topics="waku node metrics" tid=650599 file=waku_metrics.nim:59 count=31.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total errors                               topics="waku node metrics" tid=650599 file=waku_metrics.nim:60 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total messages                             topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:78 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total spam messages                        topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:79 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total invalid messages                     topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:80 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total valid messages                       topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:81 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total errors                               topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:82 count=0.0
Oct 31 16:25:12 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:12.752+00:00 Total proofs verified                      topics="waku rln_relay" tid=650599 file=protocol_metrics.nim:83 count=0.0
Oct 31 16:25:13 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:13.625+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"f1361f96>
Oct 31 16:25:16 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:16.186+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*niS4Cf request="(requestId: \"d19c5861>
Oct 31 16:25:16 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:16.460+00:00 Dialing multiple peers                     topics="waku node peer_manager" tid=650599 file=peer_manager.nim:520 numOfPeers=1
Oct 31 16:25:16 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:16.460+00:00 Finished dialing multiple peers            topics="waku node peer_manager" tid=650599 file=peer_manager.nim:533 successfulConns=0 attempted=1
Oct 31 16:25:18 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:18.624+00:00 received filter subscribe request          topics="waku filter" tid=650599 file=protocol.nim:111 peerId=16U*AocsUB request="(requestId: \"5b393f7a>
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: INF 2023-10-31 16:25:19.623+00:00 added filter subscritpiton                 topics="waku filter" tid=650599 file=protocol.nim:80 peerId=16U*8Ht2Ui pubsubTopic=/waku/2/graphcast-v0>
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: ryXvCZE4a12hH4VTcterTFPL1WMnn/proto\", \"/subgraph-radio-pillar-mainnet/0/QmPmmr3BzNdet6DMsqgVcxV9WFGXdtcQdmrR5qs9GoQ3iF/proto\", \"/subgraph-radio-pillar-mainnet/0/QmPfUDgFuUmkWGq>
Oct 31 16:25:19 nwaku-hil2 wakunode2[650599]: /root/nwaku/apps/wakunode2/wakunode2.nim(144) wakunode2

@vpavlin
Copy link
Member

vpavlin commented Nov 10, 2023

I am not 100% sure, but my hunch would be a potential race condition in how we use Nim table implementation?

Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: Error: unhandled exception: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/iterators.nim(240, 11) len(a) == L the length of the seq changed while iterating over it >

Becuase we could be potentially reading from, adding to and deleting items from the filter.subscriptons table, so it could happen that we remove or add an item while calling something that iterates over it.

@jm-clius Any thoughts? Is it possible that multiple filter requests get handled in parallel? Should we use SharedTable instead?

@vpavlin
Copy link
Member

vpavlin commented Nov 10, 2023

Regarding the systemd restarts, I consulted it with a friend from systemd team and he suggested that maybe we do not set SO_REUSEADDR on some sockets and it seems like we are not. Based on strace output, we set this option for the main TCP port we listen on, REST and JSON RPC APIs, but not for metrics and dscv5.

He say it is possible that the socket then stays in some locked state in kernel for some time, which results in that error when systemd tries to restart it.

Given where the error happens, I'd say it is due to DiscV5 port.

@jm-clius any thoughts who would be the right person to reach out to?

@Ivansete-status
Copy link
Collaborator

Morning all!
This is a super tricky issue. I hope the following comments help and don't add too much noise :)

The next error...

Oct 31 09:12:12 nwaku-fsn2 wakunode2[606926]: Error: unhandled exception: /root/nwaku/vendor/nimbus-build-system/vendor/Nim/lib/system/iterators.nim(240, 11) len(a) == L the length of the seq changed while iterating over it >

...happens due to an assert failure in:
https://github.com/status-im/Nim/blob/3c9b68dc157804885b14a1984efc25e8b7cc861d/lib/system/iterators.nim#L240

IMHO, the most plausible candidate to cause this issue is the next point:

for multiaddr in remotePeerInfo.addrs.items:
pb.write(2, multiaddr)
for proto in remotePeerInfo.protocols.items:
pb.write(3, proto)

Given this problem is very difficult to replicate from our side, I suggest creating a tailored image with additional logs in order to pinpoint the precise place where this happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Icebox
Development

No branches or pull requests

4 participants